The DBpedia-Entity v2 Test Collection

The DBpedia-Entity collection is a standard test set for entity search. It is meant for evaluating retrieval systems that return a ranked list of entities in response to a free-text user query. The first version of the collection (DBpedia-Entity v1) was released in 2013, based on DBpedia v3.7. It was created by assembling search queries from a number of entity-oriented benchmarking campaigns (TREC, INEX, SemSearch, etc.) and mapping relevant results to DBpedia. An updated version of the collection, DBpedia-Entity v2, was released in 2017 as the result of a collaborative effort between the IAI group of the University of Stavanger, the Norwegian University of Science and Technology, Wayne State University, and Carnegie Mellon University. It was published at the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’17), where it received a Best Short Paper Honorable Mention Award.

DBpedia-Entity v2 is based on DBpedia version 2015-10 (specifically on the English subset) and comes with graded relevance assessments collected via crowdsourcing. We also report on the performance of a selection of retrieval methods using this collection.

The collection is available here.
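
Graded relevance assessments like these are usually consumed by rank-based metrics. As a rough illustration, here is a minimal sketch of NDCG@k for a single query. It assumes the relevance grade is used directly as the gain and that the qrels are held in a plain dictionary. The entity identifiers and the function names `dcg` and `ndcg` are made up for the example and are not part of the collection or its tooling.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the top-k gains (log2 rank discount)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg(ranking, qrels, k=10):
    """NDCG@k for one query.

    ranking: list of entity IDs, best first (system output).
    qrels:   dict mapping entity ID -> graded relevance (unjudged counts as 0).
    """
    gains = [qrels.get(entity, 0) for entity in ranking]
    ideal = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg(ideal, k)
    return dcg(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for one query: 2 = highly relevant, 1 = relevant, 0 = not relevant.
example_qrels = {"<dbpedia:A>": 2, "<dbpedia:B>": 1, "<dbpedia:C>": 0}
perfect = ndcg(["<dbpedia:A>", "<dbpedia:B>", "<dbpedia:C>"], example_qrels)
swapped = ndcg(["<dbpedia:B>", "<dbpedia:A>", "<dbpedia:C>"], example_qrels)
```

With these toy judgments, the ideally ordered ranking scores 1.0 and the swapped ranking scores lower. Averaging the per-query values over all queries gives a system-level score.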

SIGIR’17 papers

Our group has 2 full papers, 3 short papers, and 1 demo at SIGIR this year. The preprints are available. See you in Japan!

  • EntiTables: Smart Assistance for Entity-Focused Tables, S. Zhang and K. Balog. [PDF]
  • Dynamic Factual Summaries for Entity Cards, F. Hasibi, K. Balog, and S. E. Bratsberg. [PDF]
  • Target Type Identification for Entity-Bearing Queries, D. Garigliotti, F. Hasibi, and K. Balog. [PDF|Extended version]
  • Generating Query Suggestions to Support Task-Based Search, D. Garigliotti and K. Balog. [PDF]
  • DBpedia-Entity v2: A Test Collection for Entity Search, F. Hasibi, F. Nikolaev, C. Xiong, K. Balog, S. E. Bratsberg, A. Kotov, and J. Callan. [PDF]
  • Nordlys: A Toolkit for Entity-Oriented and Semantic Search, F. Hasibi, K. Balog, D. Garigliotti, and S. Zhang. [PDF]

PhD position in Deep Learning

I have a fully funded PhD position in deep learning.

Deep neural networks, a.k.a. deep learning, have transformed the fields of computer vision, speech recognition, and machine translation, and now rival human-level performance in a range of tasks. While the idea of neural networks dates back several decades, their recent success is attributed to three key factors: (1) vast computational power, (2) algorithmic advances, and (3) the availability of massive amounts of training data.
There is no doubt that deep learning will continue to transform other fields as well, including that of information retrieval. One major challenge is that for most information retrieval tasks, training data is not available in huge quantities. This is unlike, for example, object recognition, where there are large-scale resources at one’s disposal to train neural networks with (tens of) millions of parameters (e.g., the ImageNet database contains over 14 million images).

Deep learning is inspired by how the brain works. Yet, humans can learn and generalize from a very small number of examples. (A child, for example, does not need to see thousands of instances of cats, in many different sizes and from numerous different angles, to be able to recognize a cat and tell it apart from a dog.) Can deep neural networks be enhanced with this capability, i.e., to be able to learn and generalize from sparsely labeled data? The aim of this project is to answer this question, specifically, in the application domain of information retrieval.

Details and application instructions can be found here.
Application deadline: March 26, 2017.

Important note: there are multiple projects advertised within the call. You need to indicate that you are applying for this specific project. Feel free to contact me directly for more information.

WSDM paper

Earlier today, Jan Benetka presented our paper “Anticipating Information Needs Based on Check-in Activity” at the WSDM’17 conference in Cambridge, UK.

In this work we address the development of a smart personal assistant that is capable of anticipating a user’s information needs based on a novel type of context: the person’s activity inferred from her check-in records on a location-based social network. Our main contribution is a method that translates a check-in activity into an information need, which is in turn addressed with an appropriate information card. This task is challenging because of the large number of possible activities and related information needs, all of which have to be served by a mobile dashboard that is limited in size. Our approach considers each possible activity that might follow the last (and already finished) activity, and selects the top information cards such that they maximize the likelihood of satisfying the user’s information needs across all possible future scenarios. The proposed models also incorporate knowledge about the temporal dynamics of information needs. Using a combination of historical check-in data and manual assessments collected via crowdsourcing, we experimentally demonstrate the effectiveness of our approach.
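
To make the selection step more concrete, here is a deliberately simplified sketch. It is not the model from the paper. It assumes we already have two hypothetical probability tables, one for the next activity given the last check-in and one for information needs given an activity. It scores each need by its probability marginalized over the possible next activities and ignores temporal dynamics entirely. The function name `select_cards` and all the numbers are invented for illustration.

```python
from collections import defaultdict


def select_cards(p_next_activity, p_need_given_activity, k):
    """Pick the k information needs (cards) with the highest marginal probability.

    p_next_activity:       dict activity -> P(activity | last check-in)
    p_need_given_activity: dict activity -> dict need -> P(need | activity)
    """
    scores = defaultdict(float)
    for activity, p_activity in p_next_activity.items():
        for need, p_need in p_need_given_activity.get(activity, {}).items():
            # Sum over future scenarios: P(need) += P(activity) * P(need | activity)
            scores[need] += p_activity * p_need
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [need for need, _ in ranked[:k]]


# Invented numbers: after the last check-in, the user most likely goes to a restaurant.
next_activity = {"restaurant": 0.6, "cinema": 0.4}
needs = {
    "restaurant": {"menu": 0.7, "directions": 0.3},
    "cinema": {"showtimes": 0.8, "directions": 0.2},
}
top_cards = select_cards(next_activity, needs, k=2)
```

With these toy numbers, the two dashboard slots go to "menu" and "showtimes". The "directions" card is useful in both scenarios, but its combined probability is still lower than that of either selected card.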

Presentation slides and resources can be found at zero-query.com.