Survey on Expertise Retrieval

Together with Yi Fang (Purdue University, USA), Maarten de Rijke (University of Amsterdam, The Netherlands), Pavel Serdyukov (Yandex, Russia), and Luo Si (Purdue University, USA), I wrote a survey paper on Expertise Retrieval for the Foundations and Trends in Information Retrieval (FnTIR) journal, which is now available online. (If your organization doesn’t have a subscription, you can get a free copy from my homepage.)

The study offers a comprehensive overview of expertise retrieval, primarily from an IR perspective, but many other aspects of this multi-faceted research area are also covered. Our main attention is on models and algorithms, which are organized in five groups of basic approaches. We discuss extensions of these models as well as practical considerations. At the end of the survey, we identify a number of possible future directions; these could be of particular interest to those currently working in this area.

A Living Lab for Product Search

Earlier today at the CLEF 2011 conference, I presented joint work with Leif Azzopardi, entitled Towards a Living Lab for Information Retrieval Research and Development: a proposal for a living lab for product search tasks. The abstract follows:

The notion of having a “living lab” to undertake evaluations has been proposed by a number of proponents within the field of Information Retrieval (IR). However, what such a living lab might look like and how it might be set up has not been discussed in detail. Living labs have a number of appealing points such as realistic evaluation contexts where tasks are directly linked to user experience and the closer integration of research/academia and development/industry facilitating more efficient knowledge transfer. However, operationalizing a living lab opens up a number of concerns regarding security, privacy, etc. as well as challenges regarding the design, development and maintenance of the infrastructure required to support such evaluations. Here, we aim to further the discussion on living labs for IR evaluation and propose one possible architecture to create such an evaluation environment. To focus discussion, we put forward a proposal for a living lab on product search tasks within the context of an online shop.

Full paper | Presentation slides

We are keen to get feedback from the community to see if we should continue to develop this initiative further. If you’re at CLEF this week, come talk to me.

TREC Entity 2010 overview

The TREC Entity 2010 overview paper is now available online. We will soon start the discussion about the 2011 edition on the track’s mailing list.

Best @INEX2009 Entity ranking

The UvA ISLA team (consisting of me, Marc Bron, Maarten de Rijke, and Wouter Weerkamp) achieved top performance at the Entity Ranking (XER) track at INEX 2009, on both tasks (entity ranking and list completion). Our submission employed a slightly tweaked variation of the best performing models we describe in our paper entitled Category-based Query Modeling for Entity Search; this work will be presented at ECIR at the end of this month (the paper is available online).

Although we did really well at INEX, our achievement is somewhat diminished by the fact that only 5 teams (including us) participated in the XER track. The number of participating teams was 8 in 2007, 6 in 2008, and 5 in 2009. So, what’s the future of INEX-XER (or INEX for that matter)?

Update Apr 19, 2010: the paper describing our approach @INEX is available online.

Last bundle of updates for 2009

I haven’t had time to post entries on my blog over the past few weeks (or even months; has it really been that long?). Anyway, here are a couple of things worth mentioning before 2009 is officially over.

A newer version of the EARS toolkit has been released. Major changes concern document-entity associations and faster computation of candidate models, as well as support for MS Visual Studio. See the changelog for details.

Our paper entitled Category-based Query Modeling for Entity Search, with Krisztian Balog, Marc Bron, and Maarten de Rijke as authors, has been accepted to ECIR 2010 and is available online now.

Abstract. Users often search for entities instead of documents and in this setting are willing to provide extra input, in addition to a query, such as category information and example entities. We propose a general probabilistic framework for entity search to evaluate and provide insight into the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation, and also demonstrate the effectiveness of category-based expansion using example entities. Our best performing model shows very competitive performance on the INEX-XER entity ranking and list completion tasks.

See also: ECIR 2010 accepted papers, posters, and demos.

The TREC Enterprise 2008 overview paper has finally been posted to the proceedings.

Happy 2010!