Future challenges in expertise retrieval

This was the title of the workshop I organized at SIGIR 2008 in July. The main objective of the workshop was to bring people from different research communities together, to discuss recent advances in expertise retrieval, and to define a research roadmap for the next years.
I think (and I hope I’m not alone in this) that the workshop was a success, with many interesting papers and lively discussions. If you’re interested in expert finding but missed it, now is your chance to find out which themes were discussed: check out the workshop summary that was recently published in the December 2008 issue of SIGIR Forum.

A Language Modeling Framework for Expert Finding

Our first paper on formal models for expertise retrieval, Formal Models for Expert Finding in Enterprise Corpora by Krisztian Balog, Leif Azzopardi, and Maarten de Rijke from SIGIR’06, has been very influential. It has received 70 citations according to Google Scholar so far, and the models we laid down there (especially “Model 2”) have become the de facto baselines against which other approaches compare themselves.

A Language Modeling Framework for Expert Finding, from the same authors, will be published in the January 2009 issue of Information Processing & Management. Actually, it has been available online since September 2008, but I have not posted about it yet, so it’s time to make up for it!
The IPM paper can be seen as an extension of the SIGIR’06 work. Additions include the proximity-based versions of candidate and document models (Models 1B and 2B), a solution for setting the smoothing parameter for each model by automatic means, advanced document-candidate associations, and an extensive empirical comparison of the different methods, followed by a detailed analysis of the results.

ECAI 2008 paper online

Finding Key Bloggers, One Post At A Time by Wouter Weerkamp, Krisztian Balog and Maarten de Rijke is available online now. Our idea of applying expertise retrieval models to the task of blog distillation was first described in a SIGIR 2008 poster titled Bloggers as Experts. The conclusion of that work was that expert finding Model 1 can compete with the state of the art on the blog distillation task. In the ECAI paper we explore additional blog-specific features (including representation, number of comments, post length, and temporal ordering) and, in addition, a combination of these. We find that these result in significant improvements over the baseline.

SIGIR 2008 papers

I’ve got one full paper and two posters accepted at this year’s SIGIR conference.
The paper titled A Few Examples Go A Long Way: Constructing Query Models from Elaborate Query Formulations (co-authored by Wouter Weerkamp and Maarten de Rijke) addresses the document search task set out at TREC 2007. Our scenario is one where the topic description consists of a short query (of a few keywords) together with examples of key reference pages. Our main research goal is to investigate ways of utilizing these example documents provided by the users. In particular, we use these “sample documents” for query expansion, by sampling terms from them both independent of and dependent on the original query. We find that the query-independent expansion method helps to address the “aspect recall” problem, by identifying relevant documents that are not identified by the other query models we consider.
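To make the query-independent expansion idea concrete, here is a minimal sketch. It is not the estimator from the paper: it assumes whitespace tokenization, maximum-likelihood term probabilities, equal weight for every sample document, and a simple linear mix with the original query. The names expansion_terms, query_model, lam and k are made up for this illustration.

```python
from collections import Counter


def expansion_terms(sample_docs, k=5):
    """Pick the k most probable terms from the sample documents.

    Assumption: each sample document contributes equally, and P(t|d)
    is a plain maximum-likelihood estimate. The original query is not
    consulted at all, which is what makes this query-independent.
    """
    scores = Counter()
    for doc in sample_docs:
        tokens = doc.lower().split()
        if not tokens:
            continue
        for term, n in Counter(tokens).items():
            scores[term] += (n / len(tokens)) / len(sample_docs)
    # Sort by probability, break ties alphabetically for determinism.
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    total = sum(p for _, p in top)
    return {t: p / total for t, p in top} if total > 0 else {}


def query_model(query, sample_docs, lam=0.5, k=5):
    """Mix the original query with the expansion terms.

    lam is the (hypothetical) weight kept on the original query.
    """
    q_counts = Counter(query.lower().split())
    q_len = sum(q_counts.values())
    original = {t: n / q_len for t, n in q_counts.items()}
    expanded = expansion_terms(sample_docs, k)
    if not expanded:
        return original
    terms = set(original) | set(expanded)
    return {
        t: lam * original.get(t, 0.0) + (1 - lam) * expanded.get(t, 0.0)
        for t in terms
    }
```

Calling query_model("expert finding", ["expert search expert", "finding people"], k=2) returns a term distribution that sums to one; terms that occur only in the sample pages can enter the query this way, which is the intuition behind the gain in aspect recall.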

In the poster paper titled Parsimonious Relevance Models (co-authored by Edgar Meij, Wouter Weerkamp, and Maarten de Rijke) we describe a method for applying parsimonious language models to re-estimate the term probabilities assigned by relevance models. The results of our experimental evaluation (performed on six TREC collections) indicate that parsimonious relevance models significantly outperform their non-parsimonized counterparts on most measures.

Finally, the poster titled Bloggers as Experts (co-authored by Wouter Weerkamp and Maarten de Rijke) views the blog distillation task (finding blogs that are principally devoted to a given topic) as an association finding task between topics and bloggers. Under this view, it resembles the expert finding task (for which a range of models has been proposed). We adopt two expert finding models (Model 1 and Model 2 from our SIGIR 2006 paper) to determine their effectiveness as feed distillation strategies. We find that out-of-the-box expert finding methods can achieve competitive scores on the feed distillation task. However, unlike in expert finding, where Model 2 performed consistently better, for the blog distillation task Model 1 is the preferred strategy.
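For readers who have not seen the two models, the difference can be sketched in a few lines. This is a toy illustration, not the implementation used in the experiments: it assumes whitespace tokenization, non-empty documents, a uniform P(d|ca) over each candidate's documents (for blogs: a blogger's posts), and a single fixed smoothing weight lam. The function name model_scores is hypothetical. Model 1 first pools a candidate's documents into one smoothed language model and then scores the query against it; Model 2 scores the query against every document separately and then sums over the documents associated with the candidate.

```python
from collections import Counter


def model_scores(query, docs_by_candidate, lam=0.5):
    """Return (model1, model2): two dicts mapping candidate -> P(q|ca)."""
    q = query.lower().split()
    tokenized = {
        ca: [d.lower().split() for d in docs]
        for ca, docs in docs_by_candidate.items()
    }
    # Background (collection) term statistics used for smoothing.
    collection = Counter(t for docs in tokenized.values() for d in docs for t in d)
    total = sum(collection.values())

    def p_col(t):
        return collection[t] / total

    def p_ml(t, d):
        return d.count(t) / len(d)

    model1, model2 = {}, {}
    for ca, docs in tokenized.items():
        p_d_ca = 1.0 / len(docs)  # assumption: uniform association strength

        # Model 1: build a candidate language model, smooth it, score the query.
        s1 = 1.0
        for t in q:
            p_t_ca = sum(p_ml(t, d) * p_d_ca for d in docs)
            s1 *= (1 - lam) * p_t_ca + lam * p_col(t)

        # Model 2: score the query per document, then aggregate per candidate.
        s2 = 0.0
        for d in docs:
            p_q_d = 1.0
            for t in q:
                p_q_d *= (1 - lam) * p_ml(t, d) + lam * p_col(t)
            s2 += p_q_d * p_d_ca

        model1[ca], model2[ca] = s1, s2
    return model1, model2
```

With a single document per candidate the two scores coincide; they diverge once query terms are spread over several documents, because Model 1 lets terms from different posts combine while Model 2 requires a single post to match the query well.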