EARS released

After a period of development I am ready to release EARS to the world. EARS is an open source toolkit for entity-oriented search and discovery in large text collections. The association finding framework and models implemented in EARS were originally developed for expertise retrieval in an organizational setting, during my PhD studies. These models are robust and generic, and can be applied to finding associations between topics and entities other than people.

At present, EARS supports two main tasks: finding entities (“Which entities are associated with topic X?”) and profiling entities (“What topics is an entity associated with?”), and implements two baseline search strategies for accomplishing these tasks; these became popularly known as “Model 1” and “Model 2”.
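To give a feel for how the two strategies differ, here is a minimal sketch in Python. It is an illustration only and does not reflect the actual EARS code or API. It assumes the usual description of the two models: "Model 1" first builds a single term distribution per entity from its associated documents and then scores the query against it, while "Model 2" scores the query against each document separately and then sums those scores over the entity's associated documents. The function names, the association weights, the smoothing parameter `lam`, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def term_probs(tokens):
    """Maximum-likelihood term distribution for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def rank_entities(query, docs, assoc, lam=0.5):
    """Score entities for a query with two simplified strategies.

    docs:  doc_id -> list of tokens
    assoc: entity -> {doc_id: association weight} (hypothetical weights)
    lam:   weight of the collection model used for smoothing (assumed value)
    """
    doc_lm = {d: term_probs(toks) for d, toks in docs.items()}
    coll_lm = term_probs([t for toks in docs.values() for t in toks])

    def smoothed(p_t, t):
        # Linear interpolation with the collection-wide term probability.
        return (1 - lam) * p_t + lam * coll_lm.get(t, 0.0)

    model1, model2 = {}, {}
    for entity, weights in assoc.items():
        # "Model 1": aggregate term probabilities over documents first,
        # then take the product over query terms.
        score1 = 1.0
        for t in query:
            p = sum(w * doc_lm[d].get(t, 0.0) for d, w in weights.items())
            score1 *= smoothed(p, t)
        model1[entity] = score1

        # "Model 2": compute the query likelihood per document first,
        # then sum over the entity's associated documents.
        score2 = 0.0
        for d, w in weights.items():
            p_q = 1.0
            for t in query:
                p_q *= smoothed(doc_lm[d].get(t, 0.0), t)
            score2 += w * p_q
        model2[entity] = score2
    return model1, model2


# Toy data, invented for this example.
docs = {
    "d1": ["entity", "search", "search"],
    "d2": ["cooking", "recipes"],
}
assoc = {"alice": {"d1": 1.0}, "bob": {"d2": 1.0}}
m1, m2 = rank_entities(["search"], docs, assoc)
```

With one associated document per entity the two strategies give identical scores; they start to diverge once an entity is associated with several documents and the query has more than one term.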

A software system is never finished, and EARS is no exception to that rule. It is, however, an active research project with ongoing development and enhancements. A number of new models and features will be included in upcoming releases. Feedback, comments, and suggestions are always welcome.

The toolkit is available at http://code.google.com/p/ears/.

Update on the TREC Entity track

The main development that I am pleased to report is the release of the final test topics. The test set comprises 20 topics, which is fewer than we originally aimed for, but this is what could be achieved within the time limits. We certainly wanted to avoid extending the deadlines even further.

Since the number of queries is probably too low to support generalizable conclusions, evaluation will primarily focus on per-topic analysis of the results, rather than on average measures.
It is also worth noting that many of the “primary” entity homepages may not be included in the Category B subset of the collection. In such cases the “descriptive” pages (including the entity’s Wikipedia page) are the best available.
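To illustrate why per-topic analysis matters with so few topics, here is a small sketch in Python. The run names and scores are made up for this example and are not track results, and the sketch does not assume any particular evaluation measure. It shows how one run can come out ahead on the average while losing on most individual topics.

```python
from statistics import mean


def per_topic_diff(run_a, run_b):
    """Score difference (run_a minus run_b) for every topic both runs share."""
    topics = sorted(run_a.keys() & run_b.keys())
    return {t: run_a[t] - run_b[t] for t in topics}


# Hypothetical per-topic scores for two runs (invented numbers).
run_a = {"t1": 0.50, "t2": 0.10, "t3": 0.30}
run_b = {"t1": 0.10, "t2": 0.30, "t3": 0.35}

diffs = per_topic_diff(run_a, run_b)
wins = sum(1 for d in diffs.values() if d > 0)
losses = sum(1 for d in diffs.values() if d < 0)

print("mean A:", mean(run_a.values()), "mean B:", mean(run_b.values()))
print("A wins on", wins, "topic(s) and loses on", losses)
```

In this toy case run A has the higher mean only because of a single topic, which is exactly the kind of effect an average over 20 topics can hide.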

The test topics can be downloaded from the TREC site (you need to be a registered participant for TREC 2009 to be able to access them).

The track’s guidelines have been updated and can be considered final, although minor changes or additions are possible, should anything need clarification.

The submission deadline is Sept 21, so there is still plenty of time. In fact, this might attract some more teams to participate, given that submissions for all other TREC tracks are due by the end of August, and many of these tracks use the same collection.

The good and the bad news

A quick update on the TREC Entity track, which reminds me of the classic good news, bad news situation. The good news is that we have just reached 100 members on the TREC entity mailing list. The bad news is that almost all of them are mute.
On a more serious note, the track guidelines need to be finalized very soon. One way of interpreting the silence is that people are happy with the proposed task and all details are clear. There may be other (less positive) interpretations. Whichever the case might be, in the absence of discussion, the organizers will simply dictate what is to be done.

Seminar on Searching and Ranking in Enterprises

Today, on the occasion of the PhD defense of Pavel Serdyukov, a seminar on enterprise search was held at the University of Twente. Three of Pavel’s committee members gave talks: David Hawking, Iadh Ounis, and Maarten de Rijke.
The summaries of the talks will soon be uploaded.
Of course, the main attraction of the day was Pavel’s defense. His PhD thesis is entitled The search for expertise: Beyond direct evidence. He was confronted with interesting and sometimes quite challenging questions, but handled them to the satisfaction of the committee. Congratulations, Pavel, I mean, Dr. Serdyukov!

Back on TREC

Yes, things have been quiet lately on the TREC Entity homepage. Now that training topics have been made available, I sincerely hope that this is about to change. We are in the process of developing test topics and finalizing the guidelines, so make sure your voice is heard if you want something different…