Krisztian Balog

TREC Entity 2010 overview

March 29, 2011 by krisztianbalog

The TREC Entity 2010 overview paper is now available online. We will soon start the discussion about the 2011 edition on the track’s mailing list.

Yahoo! Semantic Search Challenge

March 3, 2011 by krisztianbalog

The 3rd Semantic Search Workshop (SemSearch’10) organized an Entity Search Challenge last year (see my notes from the event). The competition is being organized again this year. There are two tasks: entity search (queries refer to a particular entity) and list search (complex queries with multiple possible answers). The collection is the Billion Triple Challenge 2009 (BTC-2009) data set, the same as last year. This is also the data set we used at the TREC Entity track in 2010, so I encourage all TREC Entity participants to take part, and vice versa.
Yahoo! is even offering a cash prize of $500 to the winner of each task; it’s more of a symbolic reward than real remuneration ;-) but anyway, it’s not the money we academics are after, is it?
The submission deadline is Mar 21. For more details see:

Switching colours

February 16, 2011 by krisztianbalog

As of this month, I am a postdoc at the Database Systems research group, headed by Prof. Kjetil Nørvåg at the Norwegian University of Science and Technology (NTNU) in Trondheim, Norway. I would like to say a big thank you to all my former colleagues in Amsterdam for providing an extremely friendly and inspiring research environment over the past several years. I wish you the best of luck, and hope to see you at the next conference!

My research interests remain essentially unchanged: capturing, representing, and organizing information related to entities in semantically meaningful ways. And big data, of course.

LHD-11 Call for papers

January 11, 2011 by krisztianbalog

Workshop on Discovering Meaning On the Go in Large Heterogeneous Data 2011 (LHD-11)

Held at the Twenty-second International Joint Conference on Artificial Intelligence (IJCAI-11), July 16, 2011, Barcelona, Spain.

This workshop is designed to bring together people from different fields working on the dynamic matching, interpretation, and integration of heterogeneous data, so that ideas, techniques, and problems can be shared and discussed in a broad context. A key part of this aim is attracting participants from industry as well as from academia.

To interact successfully in an open and heterogeneous environment, a system must be able to integrate data from other systems dynamically and adaptively, “on the go”. This may not be a precise process, but rather a matter of reaching a good enough understanding to let the interaction proceed successfully. With the advent of the Web, massive amounts of information are available online that can assist in this task, but this information is often chaotically organised, stored in a wide variety of data formats, and difficult to interpret.

~~Deadline for abstract submission: March 14, 2011~~
Update: Submission deadline extended to April 4th, 2011


TREC 2010 summary

November 29, 2010 by krisztianbalog

The 19th Text REtrieval Conference (TREC) took place at the “usual” time and place: Gaithersburg, MD, in the second half of November. Seven tracks ran in 2010: Blog, Chemical IR, Entity, Legal, Relevance Feedback, Session, and Web.
The Entity track was very popular, both in terms of the number of participants and the number of posters presented. The proposed approaches were highly diverse, which made the presentations very interesting. I don’t want to repeat myself, so I refer you to the posts on the Entity track website for the conference summary and the plans for 2011.
As for TREC 2011, the Chemical IR, Entity, Session, Legal, and Web tracks will continue. The Blog track will migrate to a new Microblog track, which will investigate social search, especially search over Twitter data. Two more new tracks will be added: Crowdsourcing (as a means of evaluation) and Medical Records (content-based access to the free-text fields of medical records, e.g., find patients with disease X treated with Y). Finally, CMU is planning another Web crawl as a successor to ClueWeb09; one idea is to have a smaller set of pages, but crawled regularly over a period of time.