People Search in the Enterprise
PhD Thesis, University of Amsterdam, 2008
Within an organizational setting it is natural to look not only for documents, but for entities: answers, services, objects, … people! The work described in this thesis focuses on core algorithms for two information access tasks: expert finding and profiling.
The main contribution of the thesis is a generative probabilistic modeling framework for capturing the expert finding and profiling tasks in a uniform way. On top of this general framework, two main families of models are introduced by adapting generative language modeling techniques for document retrieval in a transparent and theoretically sound way.
Throughout the thesis we evaluate and compare these models across different organizational settings, and systematically explore and analyze the experimental results obtained. Through a series of examples we demonstrate that the models can incorporate and exploit special characteristics and features of various organizational settings. Finally, we provide further examples that illustrate the generic nature of the models and apply them to find associations between topics and entities other than people.
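To give a flavor of what adapting generative language modeling for document retrieval to expert finding can look like, here is a minimal sketch. It is an illustration under stated assumptions, not the formulation used in the thesis or in EARS: it scores each candidate by summing smoothed query likelihoods over the documents associated with that candidate. The function names, the smoothing weight `lam`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def smoothed_query_likelihood(query_terms, doc_terms, collection_counts,
                              collection_len, lam=0.5):
    """p(q|d) under a unigram document model with Jelinek-Mercer smoothing.

    lam is the weight of the collection (background) model; 0.5 is an
    arbitrary illustrative value.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    log_p = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_col = collection_counts[term] / collection_len
        p = (1 - lam) * p_doc + lam * p_col
        if p == 0.0:
            # Term unseen in the whole collection: the query has zero likelihood.
            return 0.0
        log_p += math.log(p)
    return math.exp(log_p)


def rank_candidates(query_terms, docs, associations, lam=0.5):
    """Rank candidates by sum_d p(q|d) * p(d|candidate).

    docs:         {doc_id: [term, ...]}
    associations: {candidate: {doc_id: weight}}; weights are normalized
                  per candidate to obtain p(d|candidate).
    """
    collection_counts = Counter(t for terms in docs.values() for t in terms)
    collection_len = sum(collection_counts.values())
    scores = {}
    for candidate, doc_weights in associations.items():
        total = sum(doc_weights.values())
        if total <= 0:
            scores[candidate] = 0.0
            continue
        scores[candidate] = sum(
            smoothed_query_likelihood(query_terms, docs[doc_id],
                                      collection_counts, collection_len, lam)
            * (weight / total)
            for doc_id, weight in doc_weights.items()
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    toy_docs = {
        "d1": ["language", "models", "retrieval"],
        "d2": ["cooking", "recipes"],
    }
    toy_associations = {"alice": {"d1": 1.0}, "bob": {"d2": 1.0}}
    for name, score in rank_candidates(["retrieval"], toy_docs, toy_associations):
        print(name, round(score, 4))
```

Reading the same scores per candidate rather than per topic gives a simple view of profiling: for a fixed person, rank the topics by how strongly the associated documents support them.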
- Long abstract (PDF); appeared in Dec 2008 issue of SIGIR Forum
- Summary in Dutch: short or long version
In the news
My thesis work has received press coverage around the world. Check out the list here.
I received the Best Doctoral Consortium Paper Award at the ACM SIGIR Conference in 2007 for my dissertation topic, and the Victorine van Schaickprijs in 2009 for my PhD thesis.
The contributions of the thesis include a collection of resources: software code as well as data. These will come in several releases.
The resources available so far are:
- the Entity and Association Retrieval System (EARS), which is the implementation of the models introduced in the thesis, released as an open-source toolkit under the BSD license. EARS is written in C++ and is built on top of the Lemur language modeling toolkit;
- the UvT Expert Collection;
- the candidate list and document-candidate associations for the CSIRO collection; these are hosted at CSIRO and are accessible using the CERC username/password.
Additional resources that may end up here at some point:
- lists of document-candidate associations for the W3C collection;
- baseline runs reported in the thesis in TREC format, along with the corresponding EARS configuration settings.
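For readers unfamiliar with the TREC run format mentioned above, the following sketch writes a ranked list in the standard six-column layout (topic ID, the literal `Q0`, retrieved item ID, rank, score, run tag). The helper name `format_trec_run`, the topic ID, the candidate IDs, and the run tag are made up for illustration and do not correspond to any run reported in the thesis.

```python
def format_trec_run(topic_id, ranked, run_tag):
    """Format (item_id, score) pairs, best first, as TREC run lines."""
    lines = []
    for rank, (item_id, score) in enumerate(ranked, start=1):
        # Columns: topic, literal "Q0", item ID, rank, score, run tag.
        lines.append(f"{topic_id} Q0 {item_id} {rank} {score:.4f} {run_tag}")
    return lines


if __name__ == "__main__":
    # Hypothetical ranking for a single topic.
    example = [("candidate-0001", 0.5), ("candidate-0002", 0.25)]
    print("\n".join(format_trec_run("EX01", example, "baseline")))
```

Files in this layout can be scored against relevance judgments with standard TREC evaluation tooling.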