A Late “Happy New Year!”

Never too late for a happy new year…
I was pretending to be on vacation (while, in fact, working on an interesting proposal), but now I’m officially back in business.

I wanted my first 2009 post to be on “looking back on 2008”, but I had to face reality and realize that writing that summary might be too hard and definitely too time-consuming.

Nevertheless, I still wanted to summarize my scientific output somehow, and then I came across a great website called QuadSearch. It ranks your publications by citation count and calculates statistics and research impact indexes, such as the H-index and G-index. The coverage is not perfect, but it is pretty decent, as far as I can tell.

And the numbers are…

H-INDEX (Hirsch Number): 8
Egghe’s G-INDEX: 13
Maximum Cites: 74
Total Cites: 214, Total Articles: 34
Cites/Paper: 6.2941

The top 5 papers from this chart are:

  1. Formal models for expert finding in enterprise corpora; SIGIR 2006 (Cited by 74)
  2. Finding experts and their details in e-mail corpora; WWW 2006 (Cited by 27)
  3. Language Modeling Approaches for Enterprise Tasks; TREC 2005 (Cited by 16)
  4. Why are they excited? identifying and explaining spikes in blog mood level; EACL 2006 (Cited by 13)
  5. Broad expertise retrieval in sparse data environments; SIGIR 2007 (Cited by 13)
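
For the curious, here is a minimal sketch of how the two indexes are commonly defined. It is not QuadSearch's code: the function names are my own, and the g-index variant shown is the one capped at the number of papers. Since only the top five citation counts are listed above, the sketch runs on those five alone, so it cannot reproduce the 8 and 13 reported for the full set of 34 articles.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations.

    This variant caps g at the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


# Citation counts of the five papers listed above (not the full 34).
top5 = [74, 27, 16, 13, 13]

print(h_index(top5))       # 5
print(g_index(top5))       # 5
print(round(214 / 34, 4))  # 6.2941, the cites/paper figure
```

The last line just re-derives the cites/paper figure from the reported totals (214 citations over 34 articles).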

Let’s see how much these numbers improve in 2009 :)

Future challenges in expertise retrieval

This was the title of the workshop I organized at SIGIR 2008 in July. The main objective of the workshop was to bring people from different research communities together, to discuss recent advances in expertise retrieval, and to define a research roadmap for the coming years.
I think (and I hope I’m not alone in this) that the workshop was a success, with many interesting papers and lively discussions. If you’re interested in expert finding but missed it, now is your chance to find out what themes were discussed; check out the workshop summary that was recently published in the December 2008 issue of SIGIR Forum.

A Language Modeling Framework for Expert Finding

Our first paper on formal models for expertise retrieval, Formal Models for Expert Finding in Enterprise Corpora by Krisztian Balog, Leif Azzopardi, and Maarten de Rijke from SIGIR’06, has been very influential. It has received 70 citations according to Google Scholar so far, and the models we laid down there (especially “Model 2”) have become the de facto baselines against which other approaches compare themselves.

A Language Modeling Framework for Expert Finding, from the same authors, will be published in the January 2009 issue of Information Processing & Management. Actually, it has been available online since September 2008, but I have not posted about it yet, so it’s time to make up for it!
The IPM paper can be seen as an extension of the SIGIR’06 work. Additions include the proximity-based versions of candidate and document models (Models 1B and 2B), a solution for setting the smoothing parameter for each model by automatic means, advanced document-candidate associations, and an extensive empirical comparison of the different methods, followed by a detailed analysis of the results.

TREC 2008, 2009

The ILPS group of the University of Amsterdam participated in three tracks at TREC 2008: blog, enterprise, and relevance feedback. The working notes paper describing our approaches is available online.
Results for the Enterprise track were not available at the time of writing, so the paper only describes the runs we submitted.
At the conference (November 19-21, Gaithersburg, USA), I will present some interesting findings on the expert finding task. I hope that the title of my talk sounds promising: Now that you’ve bought into Model 2, we’ll tell you why to get Model 1.

After four successful years, the Enterprise track is coming to an end. Personally, I am extremely grateful to the TRECENT organizers (Peter Bailey, Nick Craswell, Ian Soboroff, Paul Thomas, and Arjen P. de Vries, strictly in alphabetical order) for coordinating the track and making this platform available to the research community!

A new track, Entity Ranking, which I’m co-organizing with Arjen P. de Vries, Paul Thomas, and Thijs Westerveld, will run from 2009. I’m not supposed to share details about it at this point, but it will have something to do with searching for entities (such as people) in web data. We hope to attract participants who would have performed expert finding in the Enterprise track… specifics should follow after TREC.

Thesis in the news

NWO has also released an English translation of the article about my thesis work; it had escaped my attention when I posted the link to the Dutch version. A number of news sites clearly did not miss it and picked up the press release (I believe the credit for this goes to NWO). It’s basically the same story, published under various titles and, since one of these is in French, I can say in multiple languages. Here is a list of the appearances I have found so far. (If you find one that is not on the list, please post it in a comment!)