Krisztian Balog

EACL paper featured on the Google Research Blog

April 10, 2026 (updated April 14, 2026) by krisztianbalog

I’m excited to share that our EACL 2026 paper has been featured on the Google Research Blog!

We explore how to move beyond simple performance metrics to ensure that simulated users actually behave like real ones, and we introduce a unique dual-agent data collection protocol that enables counterfactual validation. We also publicly release a new dataset of 4k+ human-AI shopping conversations.

Read the full deep-dive here: https://research.google/blog/convapparel-measuring-and-bridging-the-realism-gap-in-user-simulators/

CACM Opinion piece available online

April 7, 2026 (updated April 14, 2026) by krisztianbalog

I’m happy to share that our latest opinion piece, “The Indispensable Role of User Simulation in the Pursuit of AGI,” is now available in Communications of the ACM.

In this article, we argue that the path to Artificial General Intelligence (AGI) is currently blocked by two major bottlenecks: the lack of scalable evaluation and the scarcity of high-quality interaction data. We propose that user simulation is not just a helpful tool, but a critical catalyst for overcoming these challenges.

Read the full piece here: https://cacm.acm.org/opinion/the-indispensable-role-of-user-simulation-in-the-pursuit-of-agi/

EACL’26 and ECIR’26 papers

March 13, 2026 by krisztianbalog

I’m excited to share some recent research we’ve been doing in the areas of user simulation, recommender systems, and explainability. The following papers will be presented at the upcoming EACL and ECIR conferences. Importantly, all these papers come with publicly available resources!

ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders (EACL full paper, with Google colleagues O. Meshi, S. Goldman, A. Caciularu, G. Tennenholtz, J. Jeong, A. Globerson, and C. Boutilier) — This work proposes a comprehensive validation framework for user simulators, combining statistical alignment, a human-likeness score, and counterfactual validation.
Trust Me on This: A User Study of Trustworthiness for RAG Responses (ECIR short paper, with W. Łajewska) — This study investigates how different types of explanations can influence user trust in a RAG setting.
UserSimCRS v2: Simulation-Based Evaluation for Conversational Recommender Systems (ECIR resource paper, with N. Bernard) — This paper presents significant extensions to the UserSimCRS toolkit, including LLM-based simulators, support for a wider range of CRSs and datasets, and new evaluation metrics and utilities.
Sim4IA-Bench: A User Simulation Benchmark Suite for Next Query and Utterance Prediction (ECIR resource paper, with A. K. Kruff, C. K. Kreutz, T. Breuer, and P. Schaer) — This work presents the simulation benchmark that is the result of the micro shared-tasks we ran at the Sim4IA workshop @SIGIR2025.
SciNUP: Natural Language User Interest Profiles for Scientific Literature Recommendation (ECIR resource paper, with M. Arustashvili) — This paper introduces a synthetic dataset for NL profile-based recommendation in the scholarly domain.

SIGIR’25 contributions

June 26, 2025 (updated March 13, 2026) by krisztianbalog

I’m happy to share that I’ll be attending SIGIR ’25, which is shaping up to be a busy and exciting event.

Accepted papers:

“Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation” — perspectives paper with Don Metzler and Zhen Qin [PDF]
“GINGER: Grounded Information Nugget-Based Generation of Responses” — short paper with W. Łajewska [PDF]
“MultiConAD: A Unified Multilingual Conversational Dataset for Early Alzheimer’s Detection” — resource paper with Arezo Shakeri and Mina Farmanbar [PDF]

In addition to the papers, I’ll also be giving a tutorial, together with Nolwenn Bernard, Saber Zerhoudi, and ChengXiang Zhai, on “Theory and Toolkits for User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation” [website]. The tutorial covers key simulation methodologies, with a particular focus on recent advancements leveraging LLMs. Crucially, we will also provide practical guidance, highlighting relevant toolkits, libraries, and datasets available to researchers and practitioners.

Finally, I’m co-organizing the Second SIGIR Workshop on Simulations for Information Access (Sim4IA 2025) together with Philipp Schaer, Christin Katharina Kreutz, Timo Breuer, and Andreas Konstantin Kruff [website]. The workshop features a keynote, invited tech talks, a panel discussion, and (micro) shared tasks for simulating interactions with a traditional search engine or a conversational assistant.

If you’re attending the conference, please come say hello, drop into the tutorial or workshop, or reach out ahead of time—I’d love to connect.

PhD position in Large Language Models for Recommendation

October 7, 2024 (updated October 25, 2024) by krisztianbalog

I have a PhD position in Large Language Models for Recommendation, funded by the NorwAI research-based innovation center.

The proposed PhD project aims to advance the field of personalized recommender systems by harnessing the natural language reasoning capabilities of large language models (LLMs). The research will focus on three key areas:

developing methods to construct natural language user interest profiles that enhance transparency and provide user control over recommendations;

designing conversational recommendation systems that utilize LLMs to effectively elicit user preferences and generate tailored responses, including both recommendations and explanations; and

developing approaches to mitigate limitations of LLMs through retrieval-augmented generation (RAG) and tool use.

Overall, this project seeks to push the boundaries of how LLMs can be applied to create more intuitive, responsive, and user-centered recommendation systems.

See the details on Jobbnorge. The application deadline is October 31.