Last bundle of updates for 2009

I haven’t had time to post entries on my blog over the past few weeks (or even months; has it really been that long?). Anyway, here are a couple of things worth mentioning before 2009 is officially over.

A newer version of the EARS toolkit has been released. Major changes concern document-entity associations and faster computation of candidate models, as well as support for MS Visual Studio. See the changelog for details.

Our paper entitled Category-based Query Modeling for Entity Search, with Krisztian Balog, Marc Bron, and Maarten de Rijke as authors, has been accepted to ECIR 2010 and is available online now.

Abstract. Users often search for entities instead of documents and in this setting are willing to provide extra input, in addition to a query, such as category information and example entities. We propose a general probabilistic framework for entity search to evaluate and provide insight in the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation, and also demonstrate the effectiveness of category-based expansion using example entities. Our best performing model shows very competitive performance on the INEX-XER entity ranking and list completion tasks.

See also: ECIR 2010 accepted papers, posters, and demos.

The TREC Enterprise 2008 overview paper has finally been posted to the proceedings.

Happy 2010!

TREC Enterprise 2008 overview

The overview paper of the TREC 2008 Enterprise track is finally available. While I was not an organizer of the track, I helped finish the paper, and the track organizers generously rewarded my contribution with first authorship. The document still needs to go through the NIST approval process, but I am allowed to distribute it as a draft.
[Download PDF|BibTeX].

Despite having my name on the overview paper, I am still wearing a participant’s hat. So the first question that comes to mind is: how did we do? (“We” being team ISLA, consisting of Maarten de Rijke and me.) To cut a long story short: we won! Of course, TREC (according to some people) is not a competition. I am not going to take sides on that matter (at least not in this post), so let me translate the simple “we won” statement from ordinary to scientific language: our run showed the best performance among all submissions for the expert finding task of the TREC 2008 Enterprise track. In fact, our runs took both first and second place for all metrics and for all three versions of the official qrels (they differ in how assessor agreement was handled). Our best run employed a combination of three models: a proximity-based candidate model, a document-based model, and a Web-based variation of the candidate model; our second-best run is the same, but without the Web-based component. See the details in our paper [Download PDF|BibTeX].
Needless to say, I am very pleased with these results. Seeing my investment in expert finding research result in state-of-the-art performance feels just great.

Seminar on Searching and Ranking in Enterprises

Today, on the occasion of the PhD defense of Pavel Serdyukov, a seminar on enterprise search was held at the University of Twente. Three of Pavel’s committee members gave talks: David Hawking, Iadh Ounis, and Maarten de Rijke.
Summaries of the talks will be uploaded soon.
Of course, the main attraction of the day was Pavel’s defense. His PhD thesis is entitled The search for expertise: Beyond direct evidence. He faced interesting and sometimes quite challenging questions, but handled them to the satisfaction of the committee. Congratulations Pavel, I mean, Dr. Serdyukov!

A Language Modeling Framework for Expert Finding

Our first paper on formal models for expertise retrieval, Formal Models for Expert Finding in Enterprise Corpora by Krisztian Balog, Leif Azzopardi, and Maarten de Rijke from SIGIR’06, has been very influential. So far it has received 70 citations according to Google Scholar, and the models we laid down there (especially “Model 2”) have become the de facto baselines against which other approaches compare themselves.

A Language Modeling Framework for Expert Finding, by the same authors, will be published in the January 2009 issue of Information Processing & Management. It has actually been available online since September 2008, but I have not posted about it yet, so it’s time to make up for that!
The IPM paper can be seen as an extension of the SIGIR’06 work. Additions include the proximity-based versions of candidate and document models (Models 1B and 2B), a solution for setting the smoothing parameter for each model by automatic means, advanced document-candidate associations, and an extensive empirical comparison of the different methods, followed by a detailed analysis of the results.
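For readers who have not seen the two basic models, a small Python sketch may help show how they differ. It is a simplified sketch assuming the basic SIGIR’06 formulation with a uniform candidate prior: Model 1 (the candidate model) first aggregates the documents associated with a candidate into a term distribution p(t|ca), smooths it with the collection, and then scores the query; Model 2 (the document model) scores each smoothed document against the query and then aggregates those scores using the associations p(d|ca). The toy documents, the association weights in `ASSOC`, the helper names, and the fixed smoothing parameter `lam=0.5` are all hypothetical; the proximity-based variants (Models 1B and 2B) and the automatic smoothing estimation from the IPM paper are not shown.

```python
from math import prod

# Hypothetical toy collection: each document is a list of tokens.
DOCS = [
    ["expert", "search", "enterprise"],
    ["language", "model", "search"],
    ["cooking", "recipes"],
]
# Hypothetical document-candidate associations p(d|ca); each candidate's weights sum to 1.
ASSOC = {"alice": {0: 0.5, 1: 0.5}, "bob": {2: 1.0}}


def p_term_doc(term, doc):
    """Maximum-likelihood estimate p(t|d)."""
    return doc.count(term) / len(doc)


def p_term_coll(term, docs):
    """Background (collection) probability p(t)."""
    return sum(d.count(term) for d in docs) / sum(len(d) for d in docs)


def model1(query, cand, docs, assoc, lam=0.5):
    """Candidate model: aggregate documents into p(t|ca), smooth, then score the query."""
    score = 1.0
    for term in query:  # a repeated query term is simply multiplied in again
        p_t_ca = sum(p_term_doc(term, docs[d]) * w for d, w in assoc[cand].items())
        score *= (1 - lam) * p_t_ca + lam * p_term_coll(term, docs)
    return score


def model2(query, cand, docs, assoc, lam=0.5):
    """Document model: score each smoothed document, then aggregate with p(d|ca)."""
    return sum(
        prod((1 - lam) * p_term_doc(t, docs[d]) + lam * p_term_coll(t, docs) for t in query) * w
        for d, w in assoc[cand].items()
    )


if __name__ == "__main__":
    for q in (["search"], ["expert", "model"]):
        for cand in ASSOC:
            print(q, cand, round(model1(q, cand, DOCS, ASSOC), 4), round(model2(q, cand, DOCS, ASSOC), 4))
```

For a single-term query the two models give the same score (as long as each candidate's p(d|ca) sums to 1), because both are then linear in p(t|d). They diverge for multi-term queries: Model 1 smooths one aggregated candidate representation, whereas Model 2 smooths and scores every document separately before combining.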

TREC 2008, 2009

The ILPS group of the University of Amsterdam participated in three tracks at TREC 2008: blog, enterprise, and relevance feedback. The working notes paper describing our approaches is available online.
Results for the Enterprise track were not yet available at the time of writing, so the paper only describes the runs we submitted.
At the conference (November 19-21, Gaithersburg, USA), I will present some interesting findings we came across concerning the expert finding task. I hope the title of my talk sounds promising: Now that you’ve bought into Model 2, we’ll tell you why to get Model 1.

After four successful years, the Enterprise track is coming to an end. Personally, I am extremely grateful to the TRECENT organizers (Peter Bailey, Nick Craswell, Ian Soboroff, Paul Thomas, and Arjen P. de Vries, strictly in alphabetical order) for coordinating the track and making this platform available to the research community!

A new track, Entity Ranking, will run from 2009; I am co-organizing it with Arjen P. de Vries, Paul Thomas, and Thijs Westerveld. I’m not supposed to share details about it at this point, but it will have something to do with searching for entities (such as people) in web data. We hope to attract participants who would otherwise have worked on expert finding in the Enterprise track. Specifics should follow after TREC.