Research
[This page is outdated; an update is in progress.]
People Search in the Enterprise
The large increase in recent years in the amount of information available online has led to renewed interest in a broad range of IR-related areas that go beyond standard document retrieval. Some of this attention has fallen on entity retrieval, an emerging area that differs from traditional document retrieval in a number of ways. Entities are not represented directly as retrievable units, the way documents are, so we need to identify them indirectly through their occurrences in documents. This brings new, exciting challenges to the information retrieval and information extraction fields.
In my research, I focus on one particular type of entity: people. I propose two information access tasks, both within an enterprise (or organizational) setting: (i) people finding, which is concerned with the retrieval of individuals that meet some criteria, and (ii) people profiling, which is about characterizing a specific person. Both tasks are explored along two main axes: topical and social.
The approach I take uses a probabilistic retrieval framework based on language modeling techniques. Evidence is collected from multiple sources and is integrated with a restricted information extraction task. The language modeling setting allows us to do this in a transparent manner, and provides a particularly convenient and natural way of modeling the tasks at hand.
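To make the idea of combining evidence in a language modeling setting more concrete, here is a minimal Python sketch of one way a person could be scored against a query. It is an illustration under stated assumptions, not necessarily the models used in the publications on this page: each document gets a smoothed unigram query likelihood p(q|d), and a person's score is the sum of those likelihoods weighted by a normalized person-document association. All names (`doc_query_likelihood`, `rank_people`, the toy documents, and the association weights) are hypothetical, and the choice of Jelinek-Mercer smoothing with `lam=0.5` is an assumption.

```python
from collections import Counter


def doc_query_likelihood(query_terms, doc_terms, coll_counts, coll_len, lam=0.5):
    """p(q|d) under a unigram model with Jelinek-Mercer smoothing (assumed)."""
    counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    prob = 1.0
    for term in query_terms:
        p_doc = counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        # Interpolate the document estimate with the collection estimate.
        prob *= (1 - lam) * p_doc + lam * p_coll
    return prob


def rank_people(query_terms, docs, associations, lam=0.5):
    """Score each person as sum over d of p(q|d) * p(d|person), best first."""
    coll_counts = Counter(t for terms in docs.values() for t in terms)
    coll_len = sum(coll_counts.values())
    scores = {}
    for person, doc_weights in associations.items():
        total = sum(doc_weights.values())
        if total == 0:
            scores[person] = 0.0
            continue
        scores[person] = sum(
            doc_query_likelihood(query_terms, docs[d], coll_counts, coll_len, lam)
            * (weight / total)  # normalized association strength, p(d|person)
            for d, weight in doc_weights.items()
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: tokenized documents and person-document associations.
docs = {
    "d1": ["language", "modeling", "retrieval"],
    "d2": ["travel", "budget", "report"],
    "d3": ["retrieval", "evaluation", "language"],
}
associations = {
    "alice": {"d1": 1.0, "d3": 1.0},
    "bob": {"d2": 1.0},
}

ranking = rank_people(["language", "retrieval"], docs, associations)
```

On this toy data, `ranking` places "alice" ahead of "bob", because both of her associated documents contain the query terms while his document only receives the smoothed collection probability. In a real setting the association weights would themselves have to be estimated, for example from names or e-mail addresses recognized in documents, which is where the information extraction step comes in.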
Selected publications:
- A Language Modeling Framework for Expert Finding, IPM 2009
- Non-Local Evidence for Expert Finding, CIKM 2008
- Associating People and Documents, ECIR 2008
- Finding Similar Experts, SIGIR 2007
- Broad Expertise Retrieval in Sparse Data Environments, SIGIR 2007
- Determining Expert Profiles (With an Application to Expert Finding), IJCAI 2007
- Finding Experts and their Details in E-mail Corpora, WWW 2006
- Formal Models for Expert Finding in Enterprise Corpora, SIGIR 2006
Data:
MoodViews
MoodViews is a collection of tools for tracking the stream of mood-annotated text made available by LiveJournal. Our research aim is to develop novel methods for searching, discovering, and retrieving blogs. We believe that non-factual aspects of blog entries, such as moods, are an important part of what makes people read and navigate around blogs. In addition to Moodsignals, we are currently working on a number of new tools to track, explore, and analyze moods.
Selected publications:
- How to Overcome Tiredness: Estimating Topic-Mood Associations, ICWSM 2007
- Decomposing Bloggers’ Moods, WWE 2006
- Why Are They Excited? Identifying and Explaining Spikes in Blog Mood Levels, EACL 2006
WebCLEF – The CLEF Crosslingual Web Track
For multilingual and cross-lingual retrieval, the web is the natural and common setting. In the European context, many issues for which people turn to the web are essentially multilingual, including culture, economy, education, leisure, and travel. For IR folks, working with web data is simply very attractive. WebCLEF is about evaluating cross-language retrieval systems in a web setting.
Selected publications: