Temporal Expertise Profiling

Expertise is not a static concept. Personal interests change over time, as does the landscape of the fields themselves: knowledge becomes outdated, new topics emerge, and so on.
In recent work, Jan Rybak, Kjetil Nørvåg, and I have been capturing, modeling, and characterizing the changes in a person’s expertise over time.

The basic idea, which we presented in an ECIR’14 short paper, is the following. The expertise of an individual is modeled as a series of profile snapshots. Each profile snapshot is a weighted tree: the hierarchy represents the taxonomy of expertise areas, and the weights reflect the person’s knowledge of the corresponding topics. By displaying a series of profile snapshots on a timeline, we obtain a complete overview of the development of expertise over time. In addition, we identify and characterize important changes that occur in these profiles. See our colorful poster for an illustration.
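To make this representation concrete, here is a minimal sketch in Python. The class names, the yearly snapshot granularity, and the simple per-topic weight difference used to surface changes are illustrative assumptions on my part, not the exact data structures or change measures from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class TopicNode:
    """A node in the expertise taxonomy (e.g., an ACM CCS category)."""
    name: str
    weight: float = 0.0  # the person's knowledge of this topic
    children: list["TopicNode"] = field(default_factory=list)

@dataclass
class ProfileSnapshot:
    """One person's expertise profile at a given point in time."""
    person: str
    timestamp: str   # e.g., "2013" for a yearly snapshot (assumed granularity)
    root: TopicNode  # root of the weighted taxonomy tree

def topic_weights(node: TopicNode, prefix: str = "") -> dict[str, float]:
    """Flatten a snapshot tree into {topic path: weight} for comparison."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    weights = {path: node.weight}
    for child in node.children:
        weights.update(topic_weights(child, path))
    return weights

def profile_change(a: ProfileSnapshot, b: ProfileSnapshot) -> dict[str, float]:
    """Per-topic weight differences between two consecutive snapshots."""
    wa, wb = topic_weights(a.root), topic_weights(b.root)
    return {t: wb.get(t, 0.0) - wa.get(t, 0.0) for t in wa.keys() | wb.keys()}
```

A sequence of such snapshots, one per time period, is what gets laid out on the timeline; large positive or negative differences between consecutive snapshots are natural candidates for the important changes mentioned above.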

In an upcoming SIGIR’14 demo paper we introduce ExperTime, a web-based system in which we have implemented these ideas. While our approach is generic, the system is specific to the computer science domain: we use publications from DBLP, classified according to the 1998 ACM Computing Classification System. Jan also created a short video that explains the underlying ideas and introduces the main features of the system.

The next step on our research agenda is the evaluation of temporal expertise profiles. This is a challenging problem for two reasons: (1) the notions of focus and topic change are subjective and are likely to vary from person to person, and (2) the complexity of the task is beyond the point where TREC-like benchmark evaluations are feasible. The feedback we plan to obtain with the ExperTime system, both implicit and explicit, will provide invaluable information to guide the development of an appropriate evaluation methodology.

If you are interested in your temporal expertise profile, you are kindly invited to sign up and claim it. Or, it might already be ready and waiting for you: http://bit.ly/expertime.

Survey on Expertise Retrieval

Together with Yi Fang (Purdue University, USA), Maarten de Rijke (University of Amsterdam, The Netherlands), Pavel Serdyukov (Yandex, Russia), and Luo Si (Purdue University, USA), I wrote a survey paper on Expertise Retrieval for the Foundations and Trends in Information Retrieval (FnTIR) journal, which is now available online. (If your organization doesn’t have a subscription, you can get a free copy from my homepage.)

The survey offers a comprehensive overview of expertise retrieval, primarily from an IR perspective, though many other aspects of this multi-faceted research area are also covered. Our main focus is on models and algorithms, which we organize into five groups of basic approaches. We discuss extensions of these models as well as practical considerations. At the end of the survey, we identify a number of possible future directions; these should be of particular interest to those currently working in this area.

Two evaluation campaigns related to entity/expert search

The CLEF 2010 labs will feature two evaluation campaigns that are potentially of interest to people working in the area of entity/people/expert search.

The third WePS Evaluation Workshop (WePS3) focuses on two tasks related to web entity search:

  • Task 1: Clustering and Attribute Extraction for Web People Search.
    Given a set of web search results for a person name, the task is to cluster the pages according to the different people sharing the name and extract certain biographical attributes for each person. [details]
  • Task 2: Name ambiguity resolution for Online Reputation Management.
    Given a set of Twitter entries containing an (ambiguous) company name, and given the home page of the company, the task is to identify the entries that do not refer to the company. Entries will be given in two languages: English and Spanish. [details]

The Cross-lingual Expert Search (CriES) workshop addresses the problem of multilingual expert search in social media environments. The workshop also includes a pilot challenge, which is very much like the expert finding task at the TREC Enterprise track: given a document collection and a query topic, return a ranked list of people who are likely to be experts on the topic. However, the document collection here is a multilingual social environment (Yahoo! Answers), and topics come in four different languages (English, German, French, and Spanish).

Last bundle of updates for 2009

I haven’t had time to post entries on my blog over the past few weeks (or even months — has it really been that long?). Anyway, here are a couple of things worth mentioning before 2009 is officially over.

A new version of the EARS toolkit has been released. The major changes concern document-entity associations, faster computation of candidate models, and support for MS Visual Studio. See the changelog for details.

Our paper Category-based Query Modeling for Entity Search, authored by Krisztian Balog, Marc Bron, and Maarten de Rijke, has been accepted at ECIR 2010 and is now available online.

Abstract. Users often search for entities instead of documents, and in this setting are willing to provide extra input, in addition to a query, such as category information and example entities. We propose a general probabilistic framework for entity search to evaluate and provide insight into the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation, and also demonstrate the effectiveness of category-based expansion using example entities. Our best performing model shows very competitive performance on the INEX-XER entity ranking and list completion tasks.
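As a rough illustration of the category-based representation idea, the sketch below represents the query and each candidate entity as probability distributions over categories and ranks entities by KL divergence. The divergence measure, the smoothing constant, and the function names are my own illustrative choices here, not the paper's actual model.

```python
import math
from collections import Counter

def category_distribution(categories: list[str]) -> dict[str, float]:
    """Maximum-likelihood distribution over a bag of category labels."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def kl_divergence(query: dict[str, float], entity: dict[str, float],
                  epsilon: float = 1e-6) -> float:
    """KL(query || entity), with epsilon-smoothing for unseen categories."""
    return sum(p * math.log(p / entity.get(c, epsilon))
               for c, p in query.items())

def rank_entities(query_categories: list[str],
                  entities: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank entities by increasing divergence from the query's category model."""
    q = category_distribution(query_categories)
    scored = [(name, kl_divergence(q, category_distribution(cats)))
              for name, cats in entities.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

Category-based expansion with example entities would, in this picture, amount to mixing the categories of the examples into the query distribution before ranking.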

See also: ECIR 2010 accepted papers, posters, and demos.

The TREC Enterprise 2008 overview paper has finally been posted to the proceedings.

Happy 2010!

TREC Enterprise 2008 overview

The overview paper of the TREC 2008 Enterprise track is -finally- available. While I was not an organizer of the track, I helped out with finishing the paper; the track organizers generously rewarded my contribution with first authorship. The document still needs to undergo the NIST approval process, but I am allowed to distribute it as a “draft”.
[Download PDF|BibTeX].

Despite having my name on the overview paper, I am still wearing a participant’s hat. So the first question that comes to mind is: how did we do? (“We” is team ISLA, consisting of Maarten de Rijke and me.) To cut a long story short — we won! Of course, TREC (according to some people) is not a competition. I am not going to take sides on that matter (at least not in this post), so let me translate the simple “we won” statement from ordinary to scientific language: our run showed the best performance among all submissions for the expert finding task of the TREC 2008 Enterprise track. In fact, we achieved both first and second place on all metrics and for all three versions of the official qrels (which differ in how assessor agreement was handled). Our best run employed a combination of three models: a proximity-based candidate model, a document-based model, and a Web-based variation of the candidate model; our second-best run was the same, but without the Web-based component. See the details in our paper [Download PDF|BibTeX].
Needless to say, I am very content with these results. Seeing that my investment in research on expert finding has resulted in the state of the art feels just great.
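For readers curious about what such a model combination can look like in practice, here is a minimal sketch of a weighted linear combination of per-model expert scores. The min-max normalization, the mixture weights, and the example numbers are conventions I am assuming for illustration; the actual combination method is described in the paper.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize so that the models are on a comparable scale."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {expert: (s - lo) / span for expert, s in scores.items()}

def combine(models: list[dict[str, float]],
            weights: list[float]) -> list[tuple[str, float]]:
    """Weighted linear combination of per-model expert scores."""
    combined: dict[str, float] = {}
    for model, w in zip(models, weights):
        for expert, score in normalize(model).items():
            combined[expert] = combined.get(expert, 0.0) + w * score
    # Highest combined score first, i.e., the final expert ranking.
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical usage: one score dictionary per component model.
proximity = {"alice": 4.2, "bob": 1.0}
document = {"alice": 0.7, "bob": 0.9}
web = {"alice": 12.0, "bob": 3.0}
ranking = combine([proximity, document, web], weights=[0.5, 0.3, 0.2])
```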