TREC Entity related developments

There has been a lot of silence on this blog since May. This is not because I have too little to say, but because I have too much to do :)

A lot of effort has gone into organizing the TREC Entity track; those who are interested can follow developments on the track’s mailing list and blog. Topics are available for both the main (Related Entity Finding) and the pilot (Entity List Completion) tasks. Developing topics for the latter involved some engineering work that I think might be worth sharing; I’m planning to do so, but don’t take it as a promise.

Another Entity track related development is that Marc Bron, Maarten de Rijke and I have had a paper accepted at CIKM 2010. In this paper, we propose a generative modeling framework for addressing the related entity finding (REF) task and perform a detailed analysis of four core components: co-occurrence models, type filtering, context modeling and homepage finding. Check out the abstract or the full paper. We have made a number of resources used in the paper available to help others repeat and improve upon our experiments.

TREC Entity 2010 draft guidelines

The draft guidelines for the 2010 edition of the track have been posted on the track’s website.

In 2010, Related Entity Finding (REF) runs as the main task of the track. A number of changes have been made compared to the previous edition. We have also attempted to clarify some issues, such as what is and what is not an entity homepage.
In addition, the track introduces a second challenge, entity list completion (ELC), which will run as a pilot task.

Your feedback is not only welcome, but encouraged! Post it as a comment on the guidelines page or send it to the mailing list.

SemSearch2010 workshop at WWW

The 3rd Semantic Search workshop (SemSearch2010) was held on Monday in conjunction with the WWW2010 conference in Raleigh, NC, USA.
This post is about the highlights of the workshop, with some personal comments at the end. For more information, check the posts by Christian Grant and Jeff Dalton, and the #semsearch2010 hashtag on Twitter.

TREC Entity: overview of 2009 and plans for 2010

The Entity track overview paper has been added to the TREC 2009 online Proceedings [direct link to the pdf].
The track continues in 2010. An overview of what happened at the 2009 TREC conference (entity-wise), along with plans for the 2010 edition, has been published on the track’s website. There is some discussion on the mailing list too.

Best @INEX2009 Entity ranking

The UvA ISLA team (consisting of me, Marc Bron, Maarten de Rijke, and Wouter Weerkamp) achieved top performance in the Entity Ranking (XER) track at INEX 2009, on both tasks (entity ranking and list completion). Our submission employed a slightly tweaked variation of the best-performing models we describe in our paper entitled Category-based Query Modeling for Entity Search; this work will be presented at ECIR at the end of this month (the paper is available online).

Although we did really well at INEX, our achievement is somewhat weakened by the fact that only 5 teams participated in the XER track (including us). The number of participating teams was 8 in 2007, 6 in 2008, and 5 in 2009. So, what’s the future of INEX-XER (or INEX for that matter)?

Update Apr 19, 2010: the paper describing our approach @INEX is available online.