Krisztian Balog

Awarded the Victorine van Schaickprijs 2009

October 11, 2009 by krisztianbalog

On the 9th of October 2009, I received the Victorine van Schaickprijs 2009 award for my PhD dissertation entitled “People Search in the Enterprise”. This award is given out yearly by the Victorine van Schaick Funds to one selected publication (journal article, book, or report) in the area of library and information sciences; it comes with a cash prize of €1500 and a bronze medal.

This year, the Board of the Foundation chose my thesis as the winner because of “its impact on the discipline and because it is of interest to a wide circle of colleagues”. The Jury report adds: “The jury appreciates especially his willingness to undertake research in less explored areas of the field.”

I would like to use this opportunity to express my gratitude to my thesis supervisor, Prof. Maarten de Rijke. I would like to thank the selection committee again for this award: I am extremely pleased with this recognition of my work.

Official report of the Award Ceremony (in Dutch).

EARS released

September 14, 2009 by krisztianbalog

After a period of development, I am ready to release EARS to the world. EARS is an open-source toolkit for entity-oriented search and discovery in large text collections. The association finding framework and models implemented in EARS were originally developed during my PhD studies for expertise retrieval in an organizational setting. These models are robust and generic, and can be applied to finding associations between topics and entities other than people.

At present, EARS supports two main tasks: finding entities (“Which entities are associated with topic X?”) and profiling entities (“What topics is an entity associated with?”), and implements two baseline search strategies for accomplishing these tasks; these became popularly known as “Model 1” and “Model 2”.
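
To give a feel for the two tasks, here is a minimal sketch in Python. It does not use the EARS code base or its interface: the toy documents, the function names (`model1`, `model2`, `find_entities`, `profile_entity`), the uniform document-entity association, and the smoothing weight are all assumptions made for illustration. The sketch follows the usual description of the two strategies: Model 1 first builds a language model for each entity from its associated documents and then scores the query against it, while Model 2 scores the query against each document and then aggregates over the documents associated with the entity.

```python
import math

# Toy collection (made up): document id -> (tokens, entities mentioned in it).
DOCS = {
    "d1": ("entity search ranking models".split(), {"alice"}),
    "d2": ("search evaluation ranking".split(), {"alice", "bob"}),
    "d3": ("database indexing storage".split(), {"bob"}),
}
ENTITIES = sorted({e for _, ents in DOCS.values() for e in ents})
COLLECTION_SIZE = sum(len(tokens) for tokens, _ in DOCS.values())
LAMBDA = 0.5  # smoothing weight; an arbitrary choice for this sketch


def p_term_doc(term, doc_id):
    """Maximum-likelihood estimate of a term given a document."""
    tokens, _ = DOCS[doc_id]
    return tokens.count(term) / len(tokens)


def p_term_collection(term):
    """Background probability of a term in the whole collection."""
    count = sum(tokens.count(term) for tokens, _ in DOCS.values())
    return count / COLLECTION_SIZE


def p_doc_entity(doc_id, entity):
    """Assumed association: uniform over the documents mentioning the entity."""
    mentioning = [d for d, (_, ents) in DOCS.items() if entity in ents]
    if doc_id not in mentioning:
        return 0.0
    return 1.0 / len(mentioning)


def model1(query, entity):
    """Entity-centric: build a smoothed entity language model, then score the query."""
    score = 1.0
    for term in query:
        p_entity = sum(p_term_doc(term, d) * p_doc_entity(d, entity) for d in DOCS)
        score *= (1 - LAMBDA) * p_entity + LAMBDA * p_term_collection(term)
    return score


def model2(query, entity):
    """Document-centric: score the query per document, then sum over associated documents."""
    total = 0.0
    for d in DOCS:
        p_query_doc = math.prod(
            (1 - LAMBDA) * p_term_doc(t, d) + LAMBDA * p_term_collection(t)
            for t in query
        )
        total += p_query_doc * p_doc_entity(d, entity)
    return total


def find_entities(query, model):
    """Entity finding: rank all entities for one topic."""
    scored = [(e, model(query, e)) for e in ENTITIES]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def profile_entity(entity, topics, model):
    """Entity profiling: rank a set of topics for one entity."""
    scored = [(name, model(query, entity)) for name, query in topics.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topics, each expressed as a short keyword query.
TOPICS = {"search": ["search", "ranking"], "databases": ["database", "indexing"]}
print(find_entities(["search", "ranking"], model1))
print(profile_entity("alice", TOPICS, model2))
```

On this toy data both models rank alice above bob for the query “search ranking”, and “search” comes out as alice’s top topic. Note that the profiling function simply compares raw query likelihoods across topics, which is a simplification.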

A software system is never finished, and EARS is no exception: it is an active research project with ongoing development and enhancements. A number of new models and features will be included in upcoming releases. Feedback, comments, and suggestions are always welcome.

The toolkit is available at http://code.google.com/p/ears/.

Update on the TREC Entity track

August 1, 2009 by krisztianbalog

The main development that I am pleased to report is the release of the final test topics. The test set comprises 20 topics, which is fewer than we originally aimed for, but this is what could be achieved within the time limits. We certainly wanted to avoid extending the deadlines even further.

Since the number of queries is probably too low to support generalizable conclusions, evaluation will primarily focus on per-topic analysis of the results, rather than on average measures.
It is also worth noting that many of the “primary” entity homepages may not be included in the Category B subset of the collection. In such cases the “descriptive” pages (including the entity’s Wikipedia page) are the best available.

The test topics can be downloaded from the TREC site (you need to be a registered participant for TREC 2009 to be able to access them).

The track’s guidelines have been updated and can be considered final, although minor changes or additions are possible, should anything need clarification.

The submission deadline is Sept 21, so there is still plenty of time. In fact, this might attract some more teams to participate, given that submissions for all other TREC tracks are due by the end of August, and many of these tracks use the same collection.

The good and the bad news

June 24, 2009 by krisztianbalog

A quick update on the TREC Entity track, which reminds me of the classic good news, bad news situation. The good news is that we have just reached 100 members on the TREC Entity mailing list. The bad news is that almost all of them are mute.
On a more serious note, the track guidelines need to be finalized very soon. One way of interpreting the silence is that people are happy with the proposed task and all details are clear. There may be other (less positive) interpretations. Whichever the case might be, in the absence of discussion, the organizers will simply dictate what is to be done.

Seminar on Searching and Ranking in Enterprises

June 24, 2009 by krisztianbalog

Today, on the occasion of the PhD defense of Pavel Serdyukov, a seminar on enterprise search was held at the University of Twente. Three of Pavel’s committee members gave talks: David Hawking, Iadh Ounis, and Maarten de Rijke.
The summaries of the talks will soon be uploaded.
Of course, the main attraction of the day was Pavel’s defense. His PhD thesis is entitled “The search for expertise: Beyond direct evidence”. He was confronted with interesting and sometimes quite challenging questions, but handled them to the satisfaction of the committee. Congratulations, Pavel, I mean, Dr. Serdyukov!