Future research directions in IR

Wondering what your next IR conference paper should be about? This is the billion-dollar question (well, at least for IR researchers), and I surely won’t answer it for you. But here is a hint.
(I just came across this on Facebook, thanks to Arjen P. de Vries and Claudia Hauff. It is evidence that, if you cut through all the clutter, Facebook can sometimes be a great tool for finding serendipitous information. Maybe this is also something to think about…)
The list contains papers nominated by prominent IR researchers as work “that, in their opinion, represent[s] important new directions, research areas, or results in the IR field.”
I must say I thoroughly enjoyed reading it. And yes, it does make me feel good to see our ECIR paper from last year with Elena Smirnova on the list :)

Language Modeling Overview

The boom of language modeling (LM) approaches to information retrieval started in 1998 with Ponte and Croft’s SIGIR’98 paper (which, btw, is close to the milestone of 1000 citations according to Google Scholar). At about the same time, and apparently independently of Ponte and Croft’s work, Hiemstra and Kraaij, as well as Miller et al., proposed the same idea of scoring documents by query likelihood.
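To make “scoring documents by query likelihood” concrete, here is a minimal Python sketch. It assumes a unigram document model smoothed by linear interpolation with the collection model (Jelinek-Mercer smoothing), which is one common choice and not necessarily the exact estimator used in any of the papers above; the weight `lam=0.5` and the toy documents are illustrative assumptions. Documents are ranked by log p(q|d), the sum of the log-probabilities of the query terms.

```python
import math
from collections import Counter


def query_likelihood(query, doc, collection, lam=0.5):
    """Return log p(q|d) under a Jelinek-Mercer smoothed unigram model:
    p(t|d) = (1 - lam) * tf(t, d) / |d| + lam * cf(t) / |C|.
    `lam` is an illustrative value, not a tuned parameter."""
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_coll = coll_tf[term] / len(collection)
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


# Toy data (hypothetical).
docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "expert finding in enterprise corpora".split(),
}
collection = [t for d in docs.values() for t in d]
query = "language retrieval".split()

ranking = sorted(docs, key=lambda d: query_likelihood(query, docs[d], collection), reverse=True)
print(ranking)  # d1 ranks first: it contains both query terms
```

Smoothing is what keeps a document that misses one query term from getting a zero score; the collection model supplies a small background probability instead.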

The last decade has witnessed tremendous progress in the use and development of LM techniques. Language models are attractive because of their solid foundations in statistical theory and their strong empirical performance. Furthermore, they provide a principled way of modeling various special retrieval tasks; expert finding is a prominent example.

The latest issue of Foundations and Trends in Information Retrieval features an excellent article, Statistical Language Models for Information Retrieval: A Critical Review, by ChengXiang Zhai. It is a great survey that covers a wide spectrum of work on LMs, with many useful references for further reading. I highly recommend it, both to experts in language modeling and to newcomers to the field.

A Language Modeling Framework for Expert Finding

Our first paper on formal models for expertise retrieval, Formal Models for Expert Finding in Enterprise Corpora by Krisztian Balog, Leif Azzopardi, and Maarten de Rijke (SIGIR’06), has been very influential. It has received 70 citations according to Google Scholar so far, and the models we introduced there (especially “Model 2”) have become the de facto baselines against which other approaches are compared.

A Language Modeling Framework for Expert Finding, by the same authors, will be published in the January 2009 issue of Information Processing & Management. It has actually been available online since September 2008, but I have not posted about it yet, so it’s time to make up for it!
The IPM paper can be seen as an extension of the SIGIR’06 work. Additions include proximity-based versions of the candidate and document models (Models 1B and 2B), a method for setting the smoothing parameter of each model automatically, advanced document-candidate associations, and an extensive empirical comparison of the different methods, followed by a detailed analysis of the results.
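For readers new to these models, the following is a rough Python sketch of the document-centric idea behind Model 2: a candidate ca is scored by p(q|ca) = Σ_d p(q|θ_d) · p(d|ca), that is, by the query likelihood of each document associated with the candidate, weighted by the strength of that association. The Jelinek-Mercer smoothing, the value `lam=0.5`, and the hand-made association table are assumptions for illustration only; they are not the settings or the automatic smoothing method from the paper, and the proximity-based variants (1B, 2B) are not shown.

```python
from collections import Counter


def doc_likelihood(query, doc, coll_tf, coll_len, lam=0.5):
    """p(q|theta_d): product of Jelinek-Mercer smoothed term probabilities."""
    tf = Counter(doc)
    prob = 1.0
    for term in query:
        p_doc = tf[term] / len(doc) if doc else 0.0
        prob *= (1 - lam) * p_doc + lam * coll_tf[term] / coll_len
    return prob


def model2_score(query, candidate, docs, assoc, lam=0.5):
    """Document-centric score p(q|ca) = sum_d p(q|theta_d) * p(d|ca).
    `assoc[candidate]` maps doc ids to p(d|ca) and is an assumed input."""
    collection = [t for d in docs.values() for t in d]
    coll_tf, coll_len = Counter(collection), len(collection)
    return sum(
        doc_likelihood(query, docs[d], coll_tf, coll_len, lam) * p_d_ca
        for d, p_d_ca in assoc[candidate].items()
    )


# Hypothetical documents, candidates, and associations.
docs = {
    "d1": "language models for expert finding".split(),
    "d2": "enterprise search evaluation".split(),
}
assoc = {"alice": {"d1": 1.0}, "bob": {"d2": 1.0}}
query = "expert finding".split()

ranking = sorted(assoc, key=lambda ca: model2_score(query, ca, docs, assoc), reverse=True)
print(ranking)  # alice ranks first: her document matches the query
```

Multiplying raw probabilities can underflow for long queries; a real implementation would more likely work in log space (e.g., with a log-sum-exp over documents).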

Happy new year & welcome back

I took a little break from work to celebrate Christmas, spend time with the family, etc. I am back online now and ready to commit myself to full-time thesis writing for the next several weeks.

As for expert search material, here is a quick update.

  • My recent paper with Maarten de Rijke, Associating People and Documents, has been accepted to ECIR 2008. Common to most expertise search approaches is a component that estimates the strength of the association between a document and a person. In this paper we carefully analyze how different association methods contribute to retrieval performance. The camera-ready version of the paper will be available from the Publications page after Jan 11.
  • Maarten, Leif Azzopardi, and I submitted a paper titled A Language Modeling Framework for Expertise Search to the Information Processing and Management (IPM) journal. In this paper we describe our language modeling approaches to expertise search in detail and integrate them into a generative probabilistic framework. As it is a journal submission rather than a conference paper, it may take some time before it is published.
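The document-person association component mentioned above can be illustrated with a small Python sketch that turns raw person mentions into p(d|ca), normalized per candidate. The two weighting schemes (boolean: any mention counts as 1; frequency: the number of mentions) and the input format are hypothetical illustrations; they are not necessarily the association methods evaluated in the paper.

```python
from collections import defaultdict


def associations(mentions, scheme="boolean"):
    """Estimate p(d|ca) from person mentions.

    mentions: {doc_id: {candidate: mention_count}} (hypothetical format).
    Returns {candidate: {doc_id: p(d|ca)}}, normalized per candidate.
    """
    if scheme not in ("boolean", "frequency"):
        raise ValueError(f"unknown scheme: {scheme}")
    raw = defaultdict(dict)
    for doc_id, counts in mentions.items():
        for cand, n in counts.items():
            if n > 0:
                # boolean: any mention counts as 1; frequency: raw mention count
                raw[cand][doc_id] = 1.0 if scheme == "boolean" else float(n)
    probs = {}
    for cand, weights in raw.items():
        total = sum(weights.values())
        probs[cand] = {d: w / total for d, w in weights.items()}
    return probs


mentions = {"d1": {"alice": 3, "bob": 1}, "d2": {"alice": 1}}
boolean = associations(mentions, "boolean")
frequency = associations(mentions, "frequency")
print(boolean)    # alice: d1 0.5, d2 0.5; bob: d1 1.0
print(frequency)  # alice: d1 0.75, d2 0.25; bob: d1 1.0
```

Under the frequency scheme, d1 gets more weight for alice because she is mentioned there three times; the boolean scheme treats both of her documents equally.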

There is some reading material from CIKM 2007:

It looks like expertise retrieval is gaining more and more popularity at IR conferences. While browsing the list of accepted papers for ECIR 2008, I found 3 full papers (out of 33) and 1 short paper (out of 19) on expert search, which gives the topic a solid presence.

  • (Serdyukov and Hiemstra)
    Modeling documents as mixtures of persons for expert finding [full]
  • (Balog and de Rijke)
    Associating People and Documents [full]
  • (Macdonald et al.)
    High Quality Expertise Evidence for Expert Search [full]
  • (Macdonald and Ounis)
    Expert Search Evaluation by Supporting Documents [short]