Summary note on the Expert Search session that took place on 2 April at the ECIR 2008 conference in Glasgow.
The session featured four presentations. The first three focused on the TREC flavor of the expert search task; together they represented well the three main European groups currently doing research in this area (University of Glasgow, University of Amsterdam, University of Twente). The fourth presentation addressed a different but related task: finding users who are potential recipients of an e-mail message. But let’s proceed in order.
Craig Macdonald presented a paper titled High Quality Expertise Evidence for Expert Search [DOI]. The aim of their work is to identify high-quality evidence for expert search by predicting the quality of the documents in expertise profiles, that is, which documents are likely to be good indicators of expertise. The techniques they use include identifying candidates’ possible homepages and clustering the documents in each profile to determine the candidate’s main areas of expertise. These techniques are integrated into their Voting Model for expert search. The main findings are that clustering and proximity techniques seem very promising. However, in contrast to Web search settings, features such as URLs and inlinks did not yield large performance improvements.
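In the Voting Model, each retrieved document in a candidate’s profile counts as a vote for that candidate. As a rough illustration, here is a minimal sketch of one simple voting technique, CombSUM, which sums the retrieval scores of a candidate’s retrieved profile documents. The document scores and profiles are made-up toy data, and the quality-based evidence the paper adds (homepages, clustering) is not modeled here.

```python
from collections import defaultdict


def combsum_votes(doc_scores, profiles):
    """Rank candidates by summing the scores of their retrieved profile documents.

    doc_scores: dict doc_id -> retrieval score for the query (toy input)
    profiles:   dict candidate -> set of doc_ids in that candidate's profile
    """
    totals = defaultdict(float)
    for candidate, docs in profiles.items():
        for doc_id in docs:
            if doc_id in doc_scores:  # only retrieved documents cast a vote
                totals[candidate] += doc_scores[doc_id]
    # Highest aggregated score first; ties broken by name for a stable order
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


doc_scores = {"d1": 2.0, "d2": 1.5, "d3": 0.5}
profiles = {"alice": {"d1", "d3"}, "bob": {"d2"}, "carol": {"d4"}}
print(combsum_votes(doc_scores, profiles))  # [('alice', 2.5), ('bob', 1.5)]
```

A candidate with no retrieved documents (carol) receives no votes and is simply absent from the ranking.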
Second in the session, I presented our paper (co-authored by Maarten de Rijke) titled Associating People and Documents [PDF]. It focuses on a feature shared by many of the models proposed for the expert finding task: associations between people and documents. For example, if someone is strongly associated with an important document on a given topic, this person is more likely to be an expert on the topic than someone who is not associated with any documents on the topic. Despite the important role of associations between candidate experts and documents in today’s expert finding models, such associations have received relatively little attention in the research community. While a number of techniques have already been used, they have never been compared. This gave rise to the research questions we addressed in this paper: What is the impact of document-candidate associations on the end-to-end performance of expert finding models? What are effective ways of capturing the strength of these associations? How sensitive are expert finding models to different document-candidate association methods? We show that refined ways of estimating the strength of associations between people and documents lead to significant improvements over the state of the art.
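To make the role of association strength concrete, here is a minimal sketch of document-centric scoring, where a candidate’s score is the sum over documents of the document’s relevance times its association with the candidate. The two association estimators below (boolean, and a mention-share heuristic) are illustrative assumptions, not necessarily the methods compared in the paper; the point is only that swapping the estimator can reorder the candidates.

```python
def rank_candidates(doc_relevance, mentions, assoc):
    """Document-centric expert scoring: score(ca) = sum_d rel(d) * assoc(d, ca).

    doc_relevance: dict doc_id -> relevance of the document to the query
    mentions:      dict doc_id -> dict candidate -> number of mentions in that doc
    assoc:         function (doc_mentions, candidate) -> association strength
    """
    scores = {}
    for doc_id, rel in doc_relevance.items():
        doc_mentions = mentions.get(doc_id, {})
        for candidate in doc_mentions:
            strength = assoc(doc_mentions, candidate)
            scores[candidate] = scores.get(candidate, 0.0) + rel * strength
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def boolean_assoc(doc_mentions, candidate):
    # Every mentioned candidate is associated with the same strength
    return 1.0


def frequency_assoc(doc_mentions, candidate):
    # Hypothetical: strength is the candidate's share of all mentions in the doc
    return doc_mentions[candidate] / sum(doc_mentions.values())


doc_relevance = {"d1": 0.75, "d2": 0.5}
mentions = {"d1": {"alice": 3, "bob": 1}, "d2": {"bob": 1, "carol": 1}}
print(rank_candidates(doc_relevance, mentions, boolean_assoc))
# [('bob', 1.25), ('alice', 0.75), ('carol', 0.5)]
print(rank_candidates(doc_relevance, mentions, frequency_assoc))
# [('alice', 0.5625), ('bob', 0.4375), ('carol', 0.25)]
```

With boolean associations, bob wins by appearing in both relevant documents; weighting by mention share instead favors alice, who dominates the most relevant document.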
The slides of the presentation are available here.
The paper Modeling documents as mixtures of persons for expert finding [PDF] by Pavel Serdyukov and Djoerd Hiemstra, winner of the best student paper award, was presented by Pavel. This paper seems to me a continuation of the work put forward in their SIGIR 2007 poster. They propose a person-centric method that combines the features of both document- and profile-centric expert finding approaches. Model 2 from Balog et al. (SIGIR 2006) is taken as a baseline, but a principal difference is that they keep the conditional dependence between query terms and candidate mentions, and regard people as generators of the document’s content.
My personal opinion is that the modeling part of the work has been done in a very nice and principled way. On the other hand, I have some concerns regarding the experimental evaluation, which in fact are not specific to this paper but hold for all their published work on expert search. First, they keep limiting themselves to the e-mail archive (lists) part of the W3C collection. This is probably the main reason their scores are relatively low in absolute terms, i.e., around the TREC median. It would be interesting to see how their methods perform on the full collection, and whether they can compete with the state of the art. Second, it is not really safe to draw conclusions when the statistical significance of the differences is not tested.
In any case, I am looking forward to seeing how their other line of work on expert finding, based on the graph-based relevance propagation framework, develops.
The final presentation of the session was given by Vitor Carvalho. The paper Ranking Users for Intelligent Message Addressing [PDF] investigates a task related to expert search: finding people who are potential recipients of an e-mail message under composition, given its current contents, its previously specified recipients, or the first few letters of the intended recipient’s contact (intelligent auto-completion). The techniques proposed for this task include a TF.IDF classifier, K-Nearest Neighbors, and Model 1 and Model 2 from Balog et al. (SIGIR 2006). They also investigated combinations of the proposed methods using fusion techniques, which led to improvements over the baseline.
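This summary does not say which fusion technique the paper used. As a generic illustration of score-level fusion, here is a minimal sketch of CombSUM over min-max normalized scores from two rankers. The user names and scores are invented, and the two rankings merely stand in for methods such as the TF.IDF classifier and K-Nearest Neighbors.

```python
def min_max_normalize(scores):
    """Rescale a dict of scores to [0, 1]; a constant ranking maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def combsum_fusion(*rankings):
    """CombSUM: sum each user's normalized score across all input rankings."""
    fused = {}
    for scores in rankings:
        for user, s in min_max_normalize(scores).items():
            fused[user] = fused.get(user, 0.0) + s
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy scores from two hypothetical recipient rankers for one draft message
tfidf = {"ann": 0.75, "ben": 0.5, "cat": 0.25}
knn = {"ben": 3.0, "cat": 2.0, "dan": 1.0}
print(combsum_fusion(tfidf, knn))
# [('ben', 1.5), ('ann', 1.0), ('cat', 0.5), ('dan', 0.0)]
```

Normalizing first keeps rankers with different score scales (here, similarities versus neighbor counts) from dominating the sum.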