TREC Enterprise 2008 overview

The overview paper of the TREC 2008 Enterprise track is finally available. Although I was not an organizer of the track, I helped finish the paper, and the track organizers generously rewarded my contribution with first authorship. The document still has to go through the NIST approval process, but I am allowed to distribute it as a "draft".
[Download PDF | BibTeX]

Despite having my name on the overview paper, I am still wearing a participant's hat. So the first question that comes to mind is: how did we do? ("We" being team ISLA, consisting of Maarten de Rijke and me.) To cut a long story short: we won! Of course, TREC is (according to some people) not a competition. I am not going to take sides on that matter (at least not in this post), so let me translate the simple "we won" from ordinary into scientific language: our run showed the best performance among all submissions for the expert finding task of the TREC 2008 Enterprise track. In fact, we took both first and second place on all metrics and for all three versions of the official qrels (which differ in how assessor agreement was handled). Our best run combined three models: a proximity-based candidate model, a document-based model, and a Web-based variation of the candidate model; our second-best run is the same, but without the Web-based component. See the details in our paper [Download PDF | BibTeX].
Needless to say, I am very pleased with these results. Seeing my investment in expert finding research result in state-of-the-art performance feels just great.

500+ thesis downloads

My thesis hit a significant milestone last week when it crossed the 500-download mark. It took less than eight months to get there since it was made available online in July 2008.

The first release of EARS (Entity and Association Retrieval System), the implementation of the models introduced in the thesis, is expected before the end of this month.

Future challenges in expertise retrieval

This was the title of the workshop I organized at SIGIR 2008 in July. The main objective of the workshop was to bring together people from different research communities, to discuss recent advances in expertise retrieval, and to define a research roadmap for the coming years.
I think (and I hope I'm not alone in this) that the workshop was a success, with many interesting papers and lively discussions. If you're interested in expert finding but missed it, now is your chance to find out which themes were discussed: check out the workshop summary, recently published in the December 2008 issue of SIGIR Forum.

A Language Modeling Framework for Expert Finding

Our first paper on formal models for expertise retrieval, Formal Models for Expert Finding in Enterprise Corpora by Krisztian Balog, Leif Azzopardi, and Maarten de Rijke from SIGIR’06, has been very influential. It has received 70 citations according to Google Scholar so far, and the models we laid down there (especially “Model 2”) have become the de facto baselines against which other approaches compare themselves.

A Language Modeling Framework for Expert Finding, by the same authors, will be published in the January 2009 issue of Information Processing & Management. It has actually been available online since September 2008, but I have not posted about it yet, so it's time to make up for that!
The IPM paper can be seen as an extension of the SIGIR’06 work. Additions include the proximity-based versions of candidate and document models (Models 1B and 2B), a solution for setting the smoothing parameter for each model by automatic means, advanced document-candidate associations, and an extensive empirical comparison of the different methods, followed by a detailed analysis of the results.
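For readers new to these models, here is a minimal sketch of the basic document-based model ("Model 2"), which scores a candidate ca by p(q|ca) = Σ_d p(q|θ_d) · p(d|ca). It is not the EARS implementation or the exact setup from the paper. The input format (token lists per document, an `assoc` dictionary of document-candidate association weights normalized into p(d|ca)), the function name `model2_score`, and the fixed Jelinek-Mercer smoothing weight `lam` are all illustrative assumptions. The paper's proximity-based variants and automatic smoothing estimation are not shown.

```python
from collections import Counter
from math import prod  # Python 3.8+


def model2_score(query, candidate, docs, assoc, lam=0.5):
    """Illustrative sketch of the document-based model ("Model 2").

    p(q|ca) = sum_d p(q|theta_d) * p(d|ca), where
    p(q|theta_d) = prod_{t in q} ((1 - lam) * p(t|d) + lam * p(t)).

    query:  list of query terms (repeated terms count repeatedly)
    docs:   dict doc_id -> list of tokens (hypothetical input format)
    assoc:  dict (doc_id, candidate) -> association weight (hypothetical)
    lam:    fixed smoothing weight (an assumption; not the paper's method)
    """
    # Collection language model p(t), used for smoothing.
    coll = Counter(t for toks in docs.values() for t in toks)
    coll_len = sum(coll.values())

    # p(d|ca): normalize the candidate's association weights to sum to 1.
    weights = {d: w for (d, ca), w in assoc.items() if ca == candidate}
    total = sum(weights.values())
    if total == 0:
        return 0.0  # candidate is not associated with any document

    score = 0.0
    for d, w in weights.items():
        tf = Counter(docs[d])
        dlen = len(docs[d])
        # Smoothed query likelihood of document d.
        p_q_d = prod(
            (1 - lam) * (tf[t] / dlen if dlen else 0.0)
            + lam * coll[t] / coll_len
            for t in query
        )
        score += p_q_d * (w / total)
    return score


docs = {"d1": ["expert", "finding", "search"], "d2": ["language", "model"]}
assoc = {("d1", "alice"): 1.0, ("d2", "bob"): 1.0}
print(model2_score(["expert"], "alice", docs, assoc))
print(model2_score(["expert"], "bob", docs, assoc))
```

In this toy example, "alice" is associated with the only document that mentions "expert", so she ranks above "bob", who receives only the smoothed background probability.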

PhD thesis online

My PhD thesis, titled People Search in the Enterprise, is now available online. Contact me if you would like a paperback copy!

Among the contributions of the thesis is a collection of resources, including software and data. These will come in several releases, starting very soon…