Thesis resources #1: CSIRO candidates and associations

As promised, it’s time to start sharing some of the resources I produced during my thesis work. This first release contains two CSIRO-related items: a list of CSIRO candidates (e-mail addresses) and a list of document-candidate associations.
I was keen to make these available before the submission deadline for the Expert Search runs at the TREC 2008 Enterprise track. These lists are, of course, far from perfect, but they worked quite well for me. If you have comments, suggestions, improved versions, etc., feel free to contact me!
The files are available at the same place as the CERC collection (so you’ll need the same username and password): http://es.csiro.au/cerc/data/balog. Thanks to Paul Thomas for arranging the hosting!

TREC Enterprise preparations

In preparation for the TREC conference, I spent quite some time this week running additional experiments on the CSIRO collection. There were difficulties along the way, but in the end the road led to compelling results and some lessons learned. I will present our group’s results next Friday at TREC and will, of course, put the slides online after the talk.

While the problem of expertise management is not new in the Knowledge Management field, it was the introduction of the Expert search task at TREC 2005 that really attracted the attention of the IR community. This interest increased in 2006, and I expect it has grown further this year. I started working on expert finding as soon as I began my PhD in September 2005, and looking back over the past two years, I feel I backed the right horse.

Turning back to the TREC Enterprise track: the Discussion search task, featured in 2005 and 2006, received much less interest. In fact, it was replaced with a Document search task this year. While document search may be considered less innovative, I do find it interesting for the following reasons:

  • the (new) CSIRO collection is quite different from both the W3C and the WWW settings;
  • topics are broader than in the case of W3C, and topic definitions include example documents; and last but not least,
  • the same set of topics is used in document and expert search.

I am looking forward to hearing how others tackled this year’s Enterprise search tasks and what the organizers’ plans are for next year, and of course I am curious to see how our approaches performed compared to those of other participants. While W3C became the de facto standard evaluation set for expert search, it represents only one type of intranet. Therefore, I hope that forthcoming publications on expert search will include results using CSIRO and the TREC 2007 platform too.