A Test Collection for Entity Search in DBpedia

With this SIGIR ’13 short paper, we try to address some of the action points that were identified as important priorities for entity-oriented and semantic search at the JIWES workshop held at SIGIR ’12 (see the detailed workshop report). Namely: (A1) Getting more representative information needs and favoring long queries over short ones. (A2) Limiting search to a smaller, fixed set of entity types (as opposed to arbitrary types of entities). (A3) Using test collections that integrate both structured and unstructured information about entities.

An IR test collection has three main ingredients: a data collection, a set of queries, and corresponding relevance judgments. We propose DBpedia, a community effort to extract structured information from Wikipedia, as the data collection. It is one of the most comprehensive knowledge bases on the web, describing 3.64M entities (in version 3.7). We took entity-oriented queries from a number of benchmarking evaluation campaigns, synthesized them into a single query set, and mapped the known relevant answers to DBpedia. This mapping involved a series of not-too-exciting yet necessary data cleansing steps, such as normalizing URIs, replacing redirects, removing duplicates, and filtering out non-entity results. In the end, we have 485 queries with an average of 27 relevant entities per query.
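For concreteness, here is a minimal sketch of what such a cleansing pipeline might look like. All function and variable names are hypothetical illustrations, not the actual scripts behind the collection:

```python
from urllib.parse import unquote

def normalize_uri(uri):
    """Decode percent-encoding and strip surrounding noise."""
    return unquote(uri.strip()).rstrip("/")

def clean_qrels(qrels, redirects, entity_uris):
    """Map raw relevant answers to canonical DBpedia entities.

    qrels:       {query_id: [uri, ...]}   raw relevance judgments
    redirects:   {from_uri: to_uri}       DBpedia redirect pairs
    entity_uris: set of URIs denoting actual entities
    """
    cleaned = {}
    for qid, uris in qrels.items():
        seen, kept = set(), []
        for uri in map(normalize_uri, uris):
            uri = redirects.get(uri, uri)  # replace redirects with targets
            if uri in seen or uri not in entity_uris:
                continue                   # drop duplicates and non-entities
            seen.add(uri)
            kept.append(uri)
        cleaned[qid] = kept
    return cleaned
```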

Now, let’s see how this relates to the action points outlined above. (A1) We consider a broad range of information needs, ranging from short keyword queries to natural language questions. The average query length, computed over the whole query set, is 5.3 terms, more than double the length of typical web search queries (around 2.4 terms). (A2) DBpedia has a consistent ontology comprising 320 classes, organized into a hierarchy six levels deep; this allows type information to be incorporated at different granularities. (A3) As DBpedia is extracted from Wikipedia, there is ample textual content available for those who wish to combine structured and unstructured information about entities.
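To illustrate (A2), here is a minimal sketch of how entity types could be rolled up the ontology hierarchy so that matching can happen at a chosen granularity. The data structures are my own hypothetical simplification (the real ontology is available as RDF):

```python
def ancestors(dbo_class, parent_of):
    """Walk up the ontology from a class toward the root.

    parent_of: {class: parent} subclass relations, e.g.
               {"SoccerPlayer": "Athlete", "Athlete": "Person"}
    """
    chain = [dbo_class]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain  # e.g. ["SoccerPlayer", "Athlete", "Person"]

def type_at_depth(dbo_class, parent_of, depth):
    """Return the ancestor at the given depth below the root, i.e. a
    coarser type for smaller depths (depth 0 = top-level class)."""
    top_down = list(reversed(ancestors(dbo_class, parent_of)))
    return top_down[min(depth, len(top_down) - 1)]
```

For example, `type_at_depth("SoccerPlayer", parents, 0)` would yield "Person", while a larger depth keeps the more specific class, letting a retrieval model trade type precision for coverage.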

The paper also includes a set of baseline results using variants of two popular retrieval models: language models and BM25. We found that the various query subsets (originating from different benchmarking campaigns) exhibit different levels of difficulty, which was expected. What was rather surprising, however, was that none of the more advanced multi-field variants could really improve over the simplest possible single-field approach. A large number of topics were affected, but the numbers of topics helped and hurt were about the same. The breakdowns by query subset also suggest that there is no one-size-fits-all way to effectively address all types of information needs represented in this collection. This could give rise to novel approaches in the future; for example, one could first identify the type of the query and then choose the retrieval model accordingly.
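To make the single-field vs. multi-field distinction concrete, below is a minimal sketch of query-likelihood scoring with Dirichlet smoothing, first over a single flattened representation and then as a fielded mixture in the spirit of the Mixture of Language Models (MLM). This is my own illustration, not the exact implementation behind the reported baselines:

```python
import math

def lm_score(query_terms, tf, doc_len, cf, coll_len, mu=2000):
    """Query likelihood with Dirichlet smoothing over a single field.

    tf: term -> frequency in the entity's flattened representation
    cf: term -> frequency in the whole collection
    """
    score = 0.0
    for t in query_terms:
        p_coll = cf.get(t, 0) / coll_len
        p = (tf.get(t, 0) + mu * p_coll) / (doc_len + mu)
        if p > 0:
            score += math.log(p)
    return score

def mlm_score(query_terms, fields, weights, mu=2000):
    """MLM-style scoring: interpolate per-field term probabilities.

    fields:  {name: (tf, doc_len, cf, coll_len)} per-field statistics
    weights: {name: w} field weights summing to 1
    """
    score = 0.0
    for t in query_terms:
        p = 0.0
        for name, (tf, dlen, cf, clen) in fields.items():
            p_coll = cf.get(t, 0) / clen
            p += weights[name] * (tf.get(t, 0) + mu * p_coll) / (dlen + mu)
        if p > 0:
            score += math.log(p)
    return score
```

The fielded variant introduces field weights that must be tuned, which is exactly where the multi-field models could gain (or, as we observed, fail to gain) over the flat representation.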

The resources developed as part of this study are made available here. You are also welcome to check out the poster I presented at SIGIR ’13.
If you have (or are planning to have) a paper that uses this collection, I would be happy to hear about it!

Survey on Expertise Retrieval

Together with Yi Fang (Purdue University, USA), Maarten de Rijke (University of Amsterdam, The Netherlands), Pavel Serdyukov (Yandex, Russia), and Luo Si (Purdue University, USA), I wrote a survey paper on Expertise Retrieval for the Foundations and Trends in Information Retrieval (FnTIR) journal, which is now available online. (If your organization doesn’t have a subscription, you can get a free copy from my homepage.)

The study offers a comprehensive overview of expertise retrieval, primarily from an IR perspective, but many other aspects of this multi-faceted research area are also covered. Our main focus is on models and algorithms, which we organize into five groups of basic approaches. We discuss extensions of these models as well as practical considerations. At the end of the survey, we identify a number of possible future directions; these could be of particular interest to those currently working in this area.
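As a small taste of the material, here is a sketch of one of the basic approaches: a document-based generative model in the spirit of what is often called Model 2, where a candidate's expertise on a query is estimated by aggregating the retrieval scores of their associated documents. The data structures below are hypothetical:

```python
def expertise_score(query_doc_scores, candidate_docs):
    """Document-based expertise estimation (Model 2 style):
    p(q|ca) = sum over d of p(q|d) * p(d|ca).

    query_doc_scores: {doc_id: p(q|d)} from any document retrieval model
    candidate_docs:   {doc_id: p(d|ca)} document-candidate associations
    """
    return sum(query_doc_scores.get(d, 0.0) * assoc
               for d, assoc in candidate_docs.items())
```

How the document-candidate associations are estimated (authorship, mentions, co-occurrence) is itself a core topic in the survey.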

A Living Lab for Product Search

Earlier today, at the CLEF 2011 conference, I presented joint work with Leif Azzopardi, entitled Towards a Living Lab for Information Retrieval Research and Development. A proposal for a living lab for product search tasks. The abstract follows:

The notion of having a “living lab” to undertake evaluations has been proposed by a number of proponents within the field of Information Retrieval (IR). However, what such a living lab might look like and how it might be set up has not been discussed in detail. Living labs have a number of appealing points, such as realistic evaluation contexts where tasks are directly linked to user experience, and the closer integration of research/academia and development/industry, facilitating more efficient knowledge transfer. However, operationalizing a living lab opens up a number of concerns regarding security, privacy, etc., as well as challenges regarding the design, development and maintenance of the infrastructure required to support such evaluations. Here, we aim to further the discussion on living labs for IR evaluation and propose one possible architecture to create such an evaluation environment. To focus discussion, we put forward a proposal for a living lab on product search tasks within the context of an online shop.

Full paper | Presentation slides
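To give a flavor of what such an environment might involve, here is a rough sketch of my own (not the architecture from the paper): an online shop interleaves rankings from a baseline and an experimental system for live queries and attributes each click to the system that contributed the clicked result:

```python
import random

class LivingLabSketch:
    """Serves live product-search queries by interleaving a baseline
    and an experimental ranking, logging which system earned each click."""

    def __init__(self, baseline, experimental):
        self.baseline = baseline          # callable: query -> [product_id]
        self.experimental = experimental  # callable: query -> [product_id]
        self.clicks = []                  # (query, system) pairs

    def serve(self, query):
        """Merge the two rankings with a coin flip per pick (a crude
        form of interleaved evaluation), recording each item's origin."""
        pools = {"base": list(self.baseline(query)),
                 "exp": list(self.experimental(query))}
        ranking, origin = [], {}
        while pools["base"] or pools["exp"]:
            who = random.choice([s for s, p in pools.items() if p])
            item = pools[who].pop(0)
            if item not in origin:
                ranking.append(item)
                origin[item] = who
        return ranking, origin

    def record_click(self, query, origin, item):
        self.clicks.append((query, origin[item]))
```

In a real deployment, of course, the security, privacy, and infrastructure concerns raised in the abstract would dominate the design.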

We are keen to get feedback from the community on whether we should develop this initiative further. If you’re at CLEF this week, come talk to me.

TREC Entity 2010 overview

The TREC Entity 2010 overview paper is now available online. We will soon start the discussion about the 2011 edition on the track’s mailing list.

Best @INEX2009 Entity ranking

The UvA ISLA team (consisting of me, Marc Bron, Maarten de Rijke, and Wouter Weerkamp) achieved top performance at the Entity Ranking (XER) track at INEX 2009, on both tasks (entity ranking and list completion). Our submission employed a slightly tweaked variation of the best-performing models described in our paper entitled Category-based Query Modeling for Entity Search; this work will be presented at ECIR at the end of this month (the paper is available online).

Although we did really well at INEX, our achievement is somewhat weakened by the fact that only 5 teams participated in the XER track (including us). The number of participating teams was 8 in 2007, 6 in 2008, and 5 in 2009. So, what’s the future of INEX-XER (or of INEX, for that matter)?

Update Apr 19, 2010: the paper describing our approach @INEX is available online.