TREC 2010 summary
The 19th Text REtrieval Conference (TREC) took place at the “usual” time and place: Gaithersburg, MD, in the second half of November. Seven tracks ran in 2010: Blog, Chemical IR, Entity, Legal, Relevance Feedback, Session, and Web.
The Entity track was very popular both in terms of the number of participants and the number of posters presented. The proposed approaches displayed a great degree of diversity and made the presentations very interesting. I don’t want to repeat myself, so I refer to the posts on the Entity website for the conference summary and plans for 2011.
As to TREC 2011, the Chemical IR, Entity, Session, Legal, and Web tracks will continue. The Blog track will migrate to a new Microblog track and will investigate social search, especially search over Twitter data. Two more new tracks will be added: Crowdsourcing (as a means of evaluation) and Medical records (content-based access to the free text fields of medical records, e.g., find patients with disease X treated with Y). Finally, CMU is planning another Web crawl, successor to ClueWeb09; one idea is to have a smaller set of pages, but crawled regularly over a period of time.
TREC Entity: overview of 2009 and plans for 2010
The Entity track overview paper has been added to the TREC 2009 online Proceedings [direct link to the pdf].
The track continues in 2010. An overview of what happened at the 2009 TREC conference (entity wise), along with plans for the 2010 edition has been published on the track’s website. There is some discussion on the mailing list too.
TREC Enterprise 2008 overview
The overview paper of the TREC 2008 Enterprise track is -finally- available. While I was not an organizer of the track, I helped out with finishing the paper; the track organizers generously awarded my contribution with a first authorship. The document still needs to undergo the NIST approval process, but I am allowed to distribute it as “draft”.
[Dowload PDF|BibTex].
Despite having my name on the overview paper, I am still wearing a participant’s hat. So the first questions that comes to mind is: How did we do? (We is team ISLA, consisting of Maarten de Rijke and me.) To cut the story short — we won! Of course, TREC (according to some people) is not a competition. I am not going to take a side on that matter (at least not in this post), so let me translate the simple “we won” statement from ordinary to scientific language: our run showed the best performance among all submissions for the expert finding task of the TREC 2008 Enterprise track. Actually, we achieved both first and second place for all metrics and for all three different versions of the official qrels (they differ in how assessor agreement was handled). Our best run employed a combination of three models: a proximity-based candidate model, a document-based model, and a Web-based variation of the candidate model; our second best run is the same, but without the Web-based component. See the details in our paper [Download PDF|BibTex].
Needless to say, I am very content with these results. Seeing that my investments into research on expert finding has resulted in the state-of-the-art feels just great.
TREC Entity – draft guidelines available
The first year of the track will investigate the problem of related entity finding. Given the name and homepage of an entity, as well as the type of the target entity, find related entities that are of target type. Entity types are limited to people, organizations, and products. Participating systems need to return homepages of related entities, and, optionally, a string answer that represents the entity concisely (i.e., the name of the entity).
The draft guidelines are available at this page.