Krisztian Balog

Thesis resources #1: CSIRO candidates and associations

July 29, 2008 by krisztianbalog

As promised, it’s now time to start sharing some of the resources that I produced during my thesis work. This first release contains two CSIRO-related items: the list of CSIRO candidates (e-mail addresses) and a list of document-candidate associations.
I was keen to make these available before the submission deadline for the Expert Search runs at the TREC 2008 Enterprise track. These lists are, of course, far from perfect, but they worked quite well for me. If you have comments, suggestions, improved versions, etc., feel free to contact me!
The files are available from the same location as the CERC collection (so you’ll need the same username and password): http://es.csiro.au/cerc/data/balog. Thanks to Paul Thomas for arranging the hosting!

PhD thesis online

July 15, 2008 by krisztianbalog

My PhD thesis, titled People Search in the Enterprise, is now available online. Contact me if you want a paperback version!

Among the contributions of the thesis is a collection of resources, including both software code and data. These will come in several releases, starting very soon…

fCHER program

July 14, 2008 by krisztianbalog

The program of the Future Challenges in Expertise Retrieval (fCHER) SIGIR 2008 workshop is available here.

fCHER papers

June 13, 2008 by krisztianbalog

The list of papers accepted for the Future Challenges in Expertise Retrieval (fCHER) workshop at SIGIR 2008 can be found here.

ECAI 2008 paper online

May 22, 2008 by krisztianbalog

Finding Key Bloggers, One Post At A Time by Wouter Weerkamp, Krisztian Balog and Maarten de Rijke is now available online. Our idea of applying expertise retrieval models to the task of blog distillation was first described in a SIGIR 2008 poster titled Bloggers as Experts. The conclusion of that work was that the expert finding Model 1 can compete with the state of the art on the blog distillation task. In the ECAI paper we explore additional blog-specific features (including representation, number of comments, post length, and temporal ordering) as well as a combination of these. We find that these result in significant improvements over the baseline.