Last bundle of updates for 2009

I haven’t had time to post entries on my blog over the past few weeks (or even months; has it really been that long?). Anyway, here are a couple of things worth mentioning before 2009 is officially over.

A newer version of the EARS toolkit has been released. Major changes concern document-entity associations and faster computation of candidate models, as well as support for MS Visual Studio. See the changelog for details.

Our paper “Category-based Query Modeling for Entity Search” (by Krisztian Balog, Marc Bron, and Maarten de Rijke) has been accepted to ECIR 2010 and is now available online.

Abstract. Users often search for entities instead of documents and in this setting are willing to provide extra input, in addition to a query, such as category information and example entities. We propose a general probabilistic framework for entity search to evaluate and provide insight into the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation, and also demonstrate the effectiveness of category-based expansion using example entities. Our best performing model shows very competitive performance on the INEX-XER entity ranking and list completion tasks.

See also: ECIR 2010 accepted papers, posters, and demos.

The TREC Enterprise 2008 overview paper has finally been posted to the proceedings.

Happy 2010!

EARS released

After a period of development I am ready to release EARS to the world. EARS is an open source toolkit for entity-oriented search and discovery in large text collections. The association finding framework and models implemented in EARS were originally developed for expertise retrieval in an organizational setting, during my PhD studies. These models are robust and generic, and can be applied to finding associations between topics and entities other than people.

At present, EARS supports two main tasks: finding entities (“Which entities are associated with topic X?”) and profiling entities (“What topics is an entity associated with?”), and implements two baseline search strategies for accomplishing these tasks; these became popularly known as “Model 1” and “Model 2”.
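For readers who have not come across the two strategies, here is a minimal sketch of how they are commonly formulated in the expertise retrieval literature. It is not the EARS API or its actual implementation: the function names, the smoothing weight lam, the uniform association weights, and the toy collection are all assumptions made for illustration. Model 1 first builds a single smoothed term distribution per entity from its associated documents and scores the query against that; Model 2 scores the query against each document separately and then sums the document scores, weighted by the document-entity association.

```python
from collections import Counter
import math


def term_probs(tokens):
    """Maximum-likelihood term distribution p(t | tokens)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def background_model(docs):
    """Collection-wide term distribution p(t), used for smoothing."""
    return term_probs([t for tokens in docs.values() for t in tokens])


def model1_score(query, entity, docs, assoc, lam=0.5):
    """Candidate-based scoring: p(q|e) = prod_t [(1-lam) * sum_d p(t|d) p(d|e) + lam * p(t)]."""
    bg = background_model(docs)
    doc_lm = {d: term_probs(tokens) for d, tokens in docs.items()}
    log_score = 0.0
    for t in query:
        # Term probability under the entity model, aggregated over its documents.
        p_t_entity = sum(doc_lm[d].get(t, 0.0) * w for d, w in assoc[entity].items())
        p = (1 - lam) * p_t_entity + lam * bg.get(t, 0.0)
        if p == 0.0:  # query term does not occur anywhere in the collection
            return 0.0
        log_score += math.log(p)
    return math.exp(log_score)


def model2_score(query, entity, docs, assoc, lam=0.5):
    """Document-based scoring: p(q|e) = sum_d p(d|e) * prod_t [(1-lam) * p(t|d) + lam * p(t)]."""
    bg = background_model(docs)
    doc_lm = {d: term_probs(tokens) for d, tokens in docs.items()}
    score = 0.0
    for d, w in assoc[entity].items():
        p_q_d = 1.0
        for t in query:
            p_q_d *= (1 - lam) * doc_lm[d].get(t, 0.0) + lam * bg.get(t, 0.0)
        score += w * p_q_d
    return score


# Hypothetical toy collection: document id -> tokens.
docs = {
    "d1": ["entity", "search", "ranking"],
    "d2": ["entity", "search", "search"],
    "d3": ["cooking", "recipes", "pasta"],
}
# Hypothetical document-entity associations: entity -> {document id: p(d|e)}.
assoc = {
    "alice": {"d1": 0.5, "d2": 0.5},
    "bob": {"d3": 1.0},
}
query = ["entity", "search"]

for name in assoc:
    print(name, model1_score(query, name, docs, assoc), model2_score(query, name, docs, assoc))
```

Finding entities then amounts to ranking all entities by one of these scores for a given topic. Profiling can reuse the same scoring functions by fixing the entity and ranking a set of candidate topics instead.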

A software system will never be finished, and EARS is no exception to that rule. It is an active research project with ongoing development and enhancements. A number of new models and features will be included in upcoming releases. Feedback, comments, and suggestions are always welcome.

The toolkit is available at http://code.google.com/p/ears/.