Expertise Retrieval Workshop at SIGIR 2008

I’m co-organizing a workshop at SIGIR 2008 (together with Yong Yu) titled Future Challenges in Expertise Retrieval (fCHER). Since the introduction of the Expert Finding task at TREC 2005, rapid progress has been made in modeling, algorithms, and evaluation. Expertise retrieval has now reached the point where it is appropriate to assess progress, bring together people from different research communities, and define a research agenda for the coming years. The workshop aims to determine what we have accomplished in expertise retrieval and where we need to go from here.

fCHER website and CfP

SIGIR 2008 papers

I’ve got one full paper and two posters accepted at this year’s SIGIR conference.
The paper titled A Few Examples Go A Long Way: Constructing Query Models from Elaborate Query Formulations (co-authored by Wouter Weerkamp and Maarten de Rijke) addresses the document search task set out at TREC 2007. Our scenario is one where the topic description consists of a short query (of a few keywords) together with examples of key reference pages. Our main research goal is to investigate ways of utilizing these example documents provided by the users. In particular, we use these “sample documents” for query expansion, by sampling terms from them both independent of and dependent on the original query. We find that the query-independent expansion method helps to address the “aspect recall” problem, by identifying relevant documents that are not identified by the other query models we consider.

In the poster paper titled Parsimonious Relevance Models (co-authored by Edgar Meij, Wouter Weerkamp, and Maarten de Rijke) we describe a method for applying parsimonious language models to re-estimate the term probabilities assigned by relevance models. The results of our experimental evaluation (performed on six TREC collections) indicate that parsimonious relevance models significantly outperform their non-parsimonized counterparts on most measures.
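For readers who want a feel for the idea, here is a minimal Python sketch of parsimonious re-estimation via EM. It is an illustration under stated assumptions, not the code used for the poster: the function name `parsimonize`, the mixing weight `lam`, the pruning threshold, and the toy numbers are all made up for this example.

```python
def parsimonize(weights, background, lam=0.1, iterations=50, threshold=1e-4):
    """Re-estimate term probabilities so that mass moves away from terms
    the background (collection) model already explains well.

    weights:    term -> initial weight (e.g. a relevance-model probability)
    background: term -> collection probability
    lam, iterations, threshold: hypothetical settings, chosen for illustration
    """
    total = sum(weights.values())
    probs = {t: w / total for t, w in weights.items()}
    for _ in range(iterations):
        # E-step: share of each term's weight attributed to the specific model
        expected = {}
        for t, p in probs.items():
            num = lam * p
            denom = num + (1.0 - lam) * background.get(t, 1e-9)
            expected[t] = weights[t] * num / denom
        # M-step: renormalize the expected weights into probabilities
        norm = sum(expected.values())
        probs = {t: e / norm for t, e in expected.items()}
        # Prune terms that fall below the threshold, then renormalize
        kept = {t: p for t, p in probs.items() if p >= threshold}
        if not kept:
            break
        norm = sum(kept.values())
        probs = {t: p / norm for t, p in kept.items()}
    return probs


# Toy data (invented): two common words and two topical words
toy_weights = {"the": 50, "of": 30, "expert": 10, "retrieval": 10}
toy_background = {"the": 0.5, "of": 0.3, "expert": 0.001, "retrieval": 0.001}
result = parsimonize(toy_weights, toy_background)
```

On this toy input the common words lose probability mass to the topical ones, which is the intended effect: the re-estimated model keeps only the terms that distinguish it from the collection.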

Finally, the poster titled Bloggers as Experts (co-authored by Wouter Weerkamp and Maarten de Rijke) views the blog distillation task (finding blogs that are principally devoted to a given topic) as an association finding task between topics and bloggers. Under this view, it resembles the expert finding task (for which a range of models have been proposed). We adopt two expert finding models (Model 1 and Model 2 from our SIGIR 2006 paper) to determine their effectiveness as feed distillation strategies. We find that out-of-the-box expert finding methods can achieve competitive scores on the feed distillation task. However, as opposed to expert finding, where Model 2 performed consistently better, for the blog distillation task Model 1 is the preferred strategy.
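The difference between the two strategies can be sketched in a few lines of Python. This is a simplified illustration, not the implementation behind the poster: it assumes Jelinek-Mercer smoothing with a made-up weight `LAM`, plain maximum-likelihood document models, and hand-picked association weights p(d|ca); all names and data below are hypothetical. In the blog setting, a "candidate" is a blog and its "documents" are posts.

```python
from collections import Counter
from math import prod

LAM = 0.5  # hypothetical smoothing weight


def mle(tokens):
    """Maximum-likelihood unigram model for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def model1(query, cand_docs, doc_models, collection, lam=LAM):
    """Candidate-model view: first build one language model per candidate
    from its associated documents, then score the query against it."""
    score = 1.0
    for t in query:
        p_t_ca = sum(doc_models[d].get(t, 0.0) * w for d, w in cand_docs.items())
        score *= (1 - lam) * p_t_ca + lam * collection.get(t, 0.0)
    return score


def model2(query, cand_docs, doc_models, collection, lam=LAM):
    """Document-model view: score the query against each document,
    then aggregate the scores over the candidate's documents."""
    return sum(
        w * prod((1 - lam) * doc_models[d].get(t, 0.0) + lam * collection.get(t, 0.0)
                 for t in query)
        for d, w in cand_docs.items()
    )


# Toy data (invented for illustration)
docs = {
    "d1": ["expert", "search"],
    "d2": ["blog", "search"],
    "d3": ["blog", "feed"],
}
assoc = {"alice": {"d1": 1.0}, "bob": {"d2": 0.5, "d3": 0.5}}  # p(d|ca)

doc_models = {d: mle(tokens) for d, tokens in docs.items()}
collection = mle([t for tokens in docs.values() for t in tokens])

query = ["expert"]
scores1 = {ca: model1(query, ds, doc_models, collection) for ca, ds in assoc.items()}
scores2 = {ca: model2(query, ds, doc_models, collection) for ca, ds in assoc.items()}
```

For a single-term query the two functions return the same score. They diverge on multi-term queries, where the document-model view only rewards query terms that occur together in the same document, while the candidate-model view pools evidence across documents first.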

SAW 2008 accepted papers

The review process is over, and the list of accepted papers is available at

Out of 18 submissions, 10 papers were accepted, which gives the workshop an acceptance rate of about 56%. All accepted papers will be presented on May 6th, 2008. They will also be published, together with the presentation slides, in the open content proceedings of all BIS 2008 Workshops (at as well as in a book or on CD.

Thesis completed

I am happy to announce that my thesis titled People Search in the Enterprise has been completed and submitted to the committee.

The thesis focuses on two core expertise retrieval tasks: (1) expert finding, that is, identifying a list of people who are knowledgeable about a given topic (“Who are the experts on topic X?”), and (2) expert profiling, that is, returning a list of topics that a person is knowledgeable about (“What topics does person Y know about?”). In the thesis, expertise retrieval is approached as an association finding task between people and topics.

The main contribution of the thesis is a generative probabilistic modeling framework that captures the expert finding and profiling tasks in a uniform way. On top of this general framework, two families of models are introduced by adapting generative language modeling techniques for document retrieval in a transparent and theoretically sound way.

Throughout the thesis we evaluate and compare these baseline models across different organizational settings, and systematically explore and analyze the experimental results. We show that our baseline models are robust and deliver very competitive performance.

Through a series of examples we demonstrate that our generic models are able to incorporate and exploit special characteristics and features of test collections and/or the organizational settings that they represent. Additionally, we address a number of related tasks, including finding similar experts, mining contact details of people, and enterprise document search.

Finally, we provide further examples that illustrate the generic nature of our baseline models and apply them to find associations between topics and entities other than people.

Assuming that the committee’s answer is affirmative, the thesis is going to be printed in early June 2008.