Entity Linking and Retrieval tutorial at WWW’13

Earlier this week, Edgar Meij, Daan Odijk, and I gave a half-day tutorial on Entity Linking and Retrieval at the WWW’13 conference.

The tutorial consists of three parts: (i) entity linking (Edgar), (ii) entity retrieval (me), and (iii) a hands-on lab session (Daan). The hands-on session is further subdivided into entity linking and entity retrieval parts. The slides are available on GitHub. We also created a Mendeley group with all the papers that were discussed. The tags "entity linking" and "entity retrieval" indicate which part of the tutorial each paper belongs to. We intend to maintain and expand this repository, so you might find it useful to follow this group.

Given that this was a half-day tutorial, we had to be quite selective in what we presented. We will give a full-day version of the same tutorial at SIGIR’13 in July. If you have suggestions for improvements, or pointers to papers, approaches, services, etc. that we could or should cover (yes, this includes your own work), don’t hesitate to get in touch with us!

First picks from 2013

It’s almost mid-February, so I won’t even attempt to make this a Happy New Year entry. And I’ll keep it short.

As of Jan 1 this year, I’m working as an Associate Professor at the University of Stavanger. Don’t look for the IR group’s homepage; there is no such thing. Yet ;)

Briefly about (some of) my recent work. Not surprisingly, it’s all related to entities. In a SPIRE’12 paper we study ad-hoc entity retrieval in Linked Data in a distributed setting, with a focus on the problems of collection ranking and collection selection. In a short position paper written for the ESAIR’12 workshop, we discuss how to make entity retrieval temporally aware, using semantic knowledge bases that are enriched with temporal information (like YAGO2). In a CIKM’12 poster we introduce the task of target type identification for entity-oriented queries, where types are organized hierarchically. We also made all related resources publicly available.
Most recently, just earlier this week, I gave a lecture on Semistructured Data Search at the PROMISE Winter School. At some point in the not-too-distant future there might be a written version of this material. So if you have any feedback, comments, suggestions, etc. please don’t hesitate to contact me.

Finally, I decided to set up and maintain a separate page with a list of entity-oriented benchmarking campaigns, workshops, and journal special issues. I hope people will find it useful. If you know of something that should be added there, let me know.

JIWES summary

The First Joint International Workshop on Entity-oriented and Semantic Search (JIWES) was held on Aug 16, 2012 in Portland, Oregon, USA, in conjunction with the 35th Annual International ACM SIGIR Conference (SIGIR 2012). The objective of the workshop was to bring together academic researchers and industry practitioners working on entity-oriented search to discuss tasks and challenges, and to uncover the next frontiers for academic research on the topic. The workshop program comprised two invited talks, eight refereed papers divided into two technical paper sessions, and a group discussion.

In the forthcoming issue of SIGIR Forum we give a detailed summary of the workshop; the preprint of this article is available here. The workshop papers are available online in the ACM Digital Library and at the workshop website. The latter also contains copies of the slides for most presentations.

JIWES@SIGIR’12 CfP

Call for Papers
1st Joint Intl. Workshop on Entity-oriented and Semantic Search (JIWES)
http://km.aifb.kit.edu/ws/jiwes2012/

WORKSHOP THEME
The workshop encompasses various tasks and approaches that go beyond the traditional bag-of-words paradigm and incorporate an explicit representation of the semantics behind information needs and relevant content. This kind of semantic search, based on concepts, entities and the relations between them, has attracted attention both from industry and from the research community. The workshop aims to bring together people from different communities (IR, SW, DB, NLP, HCI, etc.) and backgrounds (both academics and industry practitioners) to identify and discuss emerging trends, tasks and challenges. This joint workshop is a sequel to the Entity-oriented and Semantic Search Workshop series held at different conferences in previous years.

TOPICS
The workshop aims to gather work that addresses entities along three dimensions: tasks, data and interaction. Tasks include entity search (search for entities or documents representing entities), relation search (search for entities related to a given entity), as well as more complex tasks (involving multiple entities, including spatiotemporal relations between them, or multiple queries). In the data dimension, we consider (web/enterprise) documents (possibly annotated with entities/relations), LOD, as well as user-generated content. The interaction dimension gives room for research into user interaction with entities, including how to display results and whether to aggregate over multiple entities to construct entity profiles.

The workshop especially encourages submissions on the interface of IR and other disciplines, such as the Semantic Web, Databases, Computational Linguistics, Data Mining, Machine Learning, or Human Computer Interaction. Examples of topics of interest include (but are not limited to):

  • Data acquisition and processing (crawling, storage, and indexing)
  • Dealing with noisy, vague and incomplete data
  • Integration of data from multiple sources
  • Identification, resolution, and representation of entities (in documents and in queries)
  • Retrieval and ranking
  • Semantic query modeling (detecting, modeling, and understanding search intents)
  • Novel entity-oriented information access tasks
  • Interaction paradigms (natural language, keyword-based, and hybrid interfaces) and result representation
  • Test collections and evaluation methodology
  • Case studies and applications

We particularly encourage formal evaluation of approaches using previously established evaluation benchmarks.

SUBMISSION INFORMATION
We invite submissions of regular research papers (max. 6 pages), position papers (max. 3 pages), and demo descriptions (max. 3 pages). All submissions will be reviewed by at least two program committee members, and will be assessed based on their novelty, technical quality, potential impact, and clarity of writing. Selection follows a standard double-blind procedure. All accepted papers will be published as part of the SIGIR workshop proceedings and will be indexed in the ACM Digital Library.

Please submit in PDF format to:
http://www.easychair.org/conferences/?conf=jiwes2012
using the ACM SIG Proceedings style (for LaTeX, use the “Option 2” style):
http://www.acm.org/sigs/publications/proceedings-templates

BEST CONTRIBUTION AWARD
The best contribution (paper/presentation) will receive an award sponsored by Yandex.

WORKSHOP FORMAT
The workshop will consist of invited talks, oral presentations, and open-forum discussions.

IMPORTANT DATES

  • Submissions due: July 2, 2012 (extended to July 9, 2012)
  • Notification of acceptance: July 23, 2012
  • Camera-ready submission: Aug 1, 2012
  • Workshop date: Aug 16, 2012

ORGANIZING COMMITTEE

  • Krisztian Balog (NTNU, Norway)
  • David Carmel (IBM Research Haifa)
  • Arjen P. de Vries (CWI/TU Delft, The Netherlands)
  • Daniel M. Herzig (Karlsruhe Institute of Technology, Germany)
  • Peter Mika (Yahoo! Research, Barcelona)
  • Haggai Roitman (IBM Research Haifa)
  • Ralf Schenkel (Saarland University/MPII)
  • Pavel Serdyukov (Yandex, Russia)
  • Thanh Tran Duc (Karlsruhe Institute of Technology, Germany)

PROGRAM COMMITTEE
To be announced.

CONTACT
jiwes.workshop@gmail.com

Entity-oriented evaluation efforts in 2012

I’ve received a couple of emails asking about TREC Entity 2012. For those who don’t know yet: the track won’t run in 2012.

In a nutshell, the level of participation in 2011 was much lower than we would have wished, especially for the REF task; as a consequence, the resulting pools are probably not of great quality. The ELC task was more successful in terms of the number of submissions, but I can’t yet say anything about their quality: the relevance assessments are still to be done (this has unfortunately been long delayed, mostly because I haven’t had time to finish the assessment interface). Apart from the ELC results, last year’s efforts have been documented in the 2011 track overview paper.

Why not continue in 2012? We did not see a point in repeating the related entity finding task; over the three years of the track we managed to build a healthy-sized topic set for those who want to work on this. And we simply didn’t have a great idea for a “next big thing.” The track is not necessarily over; I’d prefer to say it’s on hold.

There are, however, a number of entity-related evaluation campaigns running in 2012. I compiled a list of these (and will try to keep it updated).

  • TREC Knowledge Base Acceleration (KBA): This is a new TREC track. The first edition will feature a special filtering task: given an incoming text stream (news and social media content) and a target entity from a knowledge base (for now: people, specified by their Freebase and Wikipedia entries), generate a score for each item (“document”) based on how “pertinent” it is to the target KB node. The first month of the incoming stream will come with human-generated labels and can be used as training data; the remaining months are for evaluation.
  • INEX Data Centric Track: (Not sure it’ll run in 2012, as the call is not out yet.) Last year’s track used the IMDB data collection and defined two tasks. The ad hoc search task has informational requests to be answered by a ranked list of IMDB entities (specifically, persons or movies). The faceted search task asks for a restricted list of facets and facet values to help the user refine the query through a multi-step search session.
  • TAC Knowledge Base Population (KBP): The track investigates tasks related to extracting information about entities with reference to an external knowledge source (Wikipedia infoboxes). KBP 2011 had three tasks. Entity linking: given an entity name (person, organization, or geopolitical entity) and a document containing that name, determine the KB node for that entity, or add a new node for the entity if it is not already in the KB. Slot filling: given a named entity and a predefined set of attributes (“slots”) for the entity type, augment the KB node for that entity by extracting all new learnable slot values from a large corpus of documents. Temporal slot filling: similar to the regular slot-filling task, but additionally requires a time interval to be specified for each extracted slot value.
  • CLEF RepLab: This new CLEF Lab sets out to study the problem of online reputation management (ORM); in a sense, this effort continues the WePS3 ORM task and takes it to the next level by defining a longer-term research agenda and by setting up various tasks within the problem domain. The website is not up yet, but according to the CLEF Labs flyer, two tasks will be evaluated on Twitter data: a monitoring task, where the goal is to thematically cluster tweets that include a company’s name (this seems to be exactly the same as the WePS3 ORM task), and a profiling task, where the goal is to annotate tweets according to their polarity (i.e., whether they have positive or negative implications for the company’s reputation).
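To make the KBA data setup above a bit more concrete, here is a minimal Python sketch of the time-based split it describes: items from the first month of the stream serve as (labeled) training data, and everything after that is held out for evaluation. This is only an illustration under stated assumptions, not the official KBA tooling: the `(timestamp, doc_id)` record layout, the `split_stream` helper, and the sample documents are all hypothetical, and “first month” is taken to mean the calendar month of the earliest item.

```python
from datetime import datetime


def split_stream(items):
    """Split a stream into training (first calendar month) and evaluation (later months).

    Hypothetical record layout: each item is a (timestamp, doc_id) tuple.
    This is not the official KBA corpus format.
    """
    if not items:
        return [], []
    # Order the stream chronologically so the earliest item defines the first month.
    items = sorted(items, key=lambda item: item[0])
    first = items[0][0]
    first_month = (first.year, first.month)
    train = [it for it in items if (it[0].year, it[0].month) == first_month]
    evaluation = [it for it in items if (it[0].year, it[0].month) != first_month]
    return train, evaluation


# Made-up example stream, for illustration only.
stream = [
    (datetime(2011, 10, 7), "doc-a"),
    (datetime(2011, 11, 2), "doc-b"),
    (datetime(2011, 10, 30), "doc-c"),
    (datetime(2012, 1, 15), "doc-d"),
]
train, evaluation = split_stream(stream)
print([d for _, d in train])       # ['doc-a', 'doc-c']
print([d for _, d in evaluation])  # ['doc-b', 'doc-d']
```

Splitting strictly by time, rather than sampling items at random, keeps any information from the evaluation period out of training, which is the point of a filtering setup over a stream.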

Feel free to send me a message about anything that might be added here.