Research

[This page is outdated; an update is in progress.]

People Search in the Enterprise

The rapid growth in the amount of information available online in recent years has renewed interest in a broad range of IR-related areas that go beyond standard document retrieval. Some of this attention has fallen on entity retrieval, an emerging area that differs from traditional document retrieval in a number of ways. Entities are not represented directly as retrievable units, the way documents are; instead, we need to identify them “indirectly” through their occurrences in documents. This brings new, exciting challenges to the information retrieval and information extraction fields.
In my research, I focus on one particular type of entity: people. I propose two information access tasks, both within an enterprise (or organizational) setting: (i) people finding, which is concerned with retrieving individuals who meet some criteria, and (ii) people profiling, which is about characterizing a specific person. Both tasks are explored along two main axes: topical and social.
My approach builds on a probabilistic retrieval framework based on language modeling techniques. Evidence is collected from multiple sources and integrated with a restricted information extraction task. The language modeling setting allows us to do this in a transparent manner, and provides a particularly convenient and natural way of modeling the tasks at hand.

Selected publications:

Data:

MoodViews

MoodViews is a collection of tools for tracking the stream of mood-annotated text made available by LiveJournal. Our research aim is to develop novel methods for searching, discovering, and retrieving blogs. We believe that non-factual aspects of blog entries, such as moods, are an important part of what makes people read and navigate around blogs. In addition to Moodsignals, we are currently working on a number of new tools to track, explore, and analyze moods.

Selected publications:

WebCLEF – The CLEF Crosslingual Web Track

For multilingual and cross-lingual retrieval, the web is the natural and common setting. In the European context, many of the issues for which people turn to the web are essentially multilingual; these include culture, economy, education, leisure, and travel. For IR folks, working with web data is simply very attractive. WebCLEF is about evaluating cross-language retrieval systems in a web setting.

Selected publications: