TREC Entity related developments

There has been a lot of silence on this blog since May. This is not because I have too little to say, but because I have too much to do :)

A lot of effort has gone into organizing the TREC Entity track; those who are interested can follow developments on the track’s mailing list and blog. Topics are available for both the main (Related Entity Finding) and the pilot (Entity List Completion) tasks. Developing topics for the latter involved some engineering work that I think might be worth sharing; I’m planning to do so, but don’t take it as a promise.

Another Entity track related development is that Marc Bron, Maarten de Rijke and I have had a paper accepted at CIKM 2010. In this paper, we propose a generative modeling framework for addressing the related entity finding (REF) task and perform a detailed analysis of four core components: co-occurrence models, type filtering, context modeling and homepage finding. Check out the abstract or the full paper. We have made a number of resources used in the paper available to help others repeat and improve upon our experiments.

TREC Entity 2010 draft guidelines

The draft guidelines for the 2010 edition of the track have been posted on the track’s website.

In 2010, Related Entity Finding (REF) runs as the main task of the track. A number of changes have been made relative to the previous edition. We have also attempted to clarify issues such as what is and what is not an entity homepage.
In addition, the track introduces a second challenge, entity list completion (ELC), which will run as a pilot task.

Your feedback is not only welcome, but encouraged! Post it as a comment on the guidelines page or send it to the mailing list.

Two evaluation campaigns related to entity/expert search

The CLEF 2010 labs will feature two evaluation campaigns that are potentially of interest to people working in the area of entity/people/expert search.

The third WePS Evaluation Workshop (WePS3) focuses on two tasks related to web entity search:

  • Task 1: Clustering and Attribute Extraction for Web People Search.
    Given a set of web search results for a person name, the task is to cluster the pages according to the different people sharing the name and extract certain biographical attributes for each person. [details]
  • Task 2: Name ambiguity resolution for Online Reputation Management.
    Given a set of Twitter entries containing an (ambiguous) company name, and given the home page of the company, the task is to discriminate entries that do not refer to the company. Entries will be given in two languages: English and Spanish. [details]

The Cross-lingual Expert Search (CriES) workshop addresses the problem of multi-lingual expert search in social media environments. The workshop also includes a pilot challenge, which is very much like the expert finding task at the TREC Enterprise track: given a document collection and a query topic, return a ranked list of people who are likely to be experts on the topic. However, the document collection is a multilingual social environment (Yahoo! Answers) and topics come in four different languages (English, German, French, Spanish).

Update on the TREC Entity track

The main development that I am pleased to report is the release of the final test topics. The test set comprises 20 topics, which is fewer than we originally aimed for, but this is what could be achieved within the time limits. We certainly wanted to avoid extending the deadlines even further.

Since the number of queries is probably too low to support generalizable conclusions, evaluation will primarily focus on per-topic analysis of the results, rather than on average measures.
It is also worth noting that many of the “primary” entity homepages may not be included in the Category B subset of the collection. In such cases the “descriptive” pages (including the entity’s Wikipedia page) are the best available.
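
To illustrate why per-topic analysis can be more informative than averages with so few topics, here is a minimal sketch in Python. It is not the track's evaluation procedure: the function names, the topic IDs and the scores are all made up for illustration, and the score could stand for any per-topic measure.

```python
from statistics import mean


def per_topic_deltas(run_a, run_b):
    """Return {topic_id: score_b - score_a} for topics scored in both runs."""
    shared = sorted(set(run_a) & set(run_b))
    return {topic: run_b[topic] - run_a[topic] for topic in shared}


def summarize(run_a, run_b):
    """Contrast the averaged view with a per-topic win/loss/tie count."""
    deltas = per_topic_deltas(run_a, run_b)
    wins = sum(1 for d in deltas.values() if d > 0)
    losses = sum(1 for d in deltas.values() if d < 0)
    return {
        "mean_a": mean(run_a[t] for t in deltas),
        "mean_b": mean(run_b[t] for t in deltas),
        "wins_b": wins,
        "losses_b": losses,
        "ties": len(deltas) - wins - losses,
    }


# Hypothetical per-topic scores for two runs (invented numbers, four topics).
run_a = {1: 0.5, 2: 0.25, 3: 0.0, 4: 0.75}
run_b = {1: 0.5, 2: 0.5, 3: 0.25, 4: 0.0}

summary = summarize(run_a, run_b)
print(summary)
print(per_topic_deltas(run_a, run_b))
```

In this invented example run B has the lower average, yet it is better on two topics, tied on one and worse on only one; a single topic drives the difference in the means, which is exactly the kind of effect a per-topic look makes visible.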

The test topics can be downloaded from the TREC site (you need to be a registered participant for TREC 2009 to be able to access them).

The track’s guidelines have been updated and can be considered final, although minor changes or additions are possible, should anything need clarification.

The submission deadline is Sept 21, so there is still plenty of time. In fact, this might attract some more teams to participate, given that submissions for all other TREC tracks are due by the end of August, and many of these tracks use the same collection.