SARA is organizing a kick-off meeting for its Proof-of-Concept Hadoop service on Dec 7, 2010 at the Science Park, Amsterdam. A major part of the event will be a “hackathon”, a hands-on introduction to Hadoop, with the support of two Hadoop experts: Edgar Meij and Djoerd Hiemstra. It’s a good opportunity to learn about Hadoop and play with it on existing datasets (for example Wikipedia, ENRON, or White House access records), or on a case of your choice.
CLEF 2010 Keynote 2
The second CLEF 2010 keynote, entitled Retrieval Evaluation in Practice, was given by Ricardo Baeza-Yates. As with yesterday’s keynote, here are my raw notes from the lecture.
CLEF 2010 Keynote 1
I am at the CLEF conference this week. Here are my raw, unedited notes from the first keynote, IR Between Science and Engineering, and the Role of Experimentation, by Norbert Fuhr.
TREC Entity related developments
There has been a lot of silence on this blog since May. This is not because I have too little to say, but because I have too much to do :)
A lot of effort has gone into organizing the TREC Entity track; those who are interested can follow developments on the track’s mailing list and blog. Topics are available for both the main (Related Entity Finding) and the pilot (Entity List Completion) tasks. Developing topics for the latter involved some engineering work that I think might be worth sharing; I’m planning to do so, but don’t take it as a promise.
Another Entity track-related development is that Marc Bron, Maarten de Rijke and I have had a paper accepted at CIKM 2010. In this paper, we propose a generative modeling framework for addressing the related entity finding (REF) task and perform a detailed analysis of four core components: co-occurrence models, type filtering, context modeling and homepage finding. Check out the abstract or the full paper. We have made a number of resources used in the paper available to help others repeat and improve upon our experiments.
TREC Entity 2010 draft guidelines
The draft guidelines for the 2010 edition of the track have been posted on the track’s website.
In 2010, Related Entity Finding (REF) runs as the main task of the track. A number of changes have been made since the previous edition. We also attempted to clarify issues, such as what is and what is not an entity homepage.
In addition, the track introduces a second challenge, entity list completion (ELC), which will run as a pilot task.
Your feedback is not only welcome, but encouraged! Post your comments on the guidelines page or send them to the mailing list.