Experimental evaluation has always been central to Information Retrieval (IR) research. The field is increasingly moving towards online evaluation, which involves experimenting with real, unsuspecting users in their natural task environments, a so-called living lab. In particular, with the recent introduction of the Living Labs for IR Evaluation initiative at CLEF and the OpenSearch track at TREC, researchers now have direct access to such labs. With these benchmarking platforms in place, we believe that online evaluation will be an exciting area of research in the coming years. This half-day tutorial aims to provide a comprehensive overview of the underlying theory and to complement it with practical guidance.
Among the many approaches proposed for entity linking, the TAGME system has received considerable attention and is considered a must-have baseline. In this paper, we examine the repeatability, reproducibility, and generalizability of TAGME by comparing results obtained from its public API with (re)implementations from scratch. We find that the results reported in the original paper cannot be repeated due to the unavailability of data sources. Some of the results are reproducible only through the provided API, while the rest are not reproducible. We further show that the TAGME approach generalizes to the task of entity linking in queries. Finally, we share insights gained during this process and formulate lessons learned to inform future reproducibility efforts.