Benchmarking the Discovery of Expert LLMs
The Million LLMs Track introduces a novel challenge: ranking large language models (LLMs) based on their expected ability to answer specific user queries.
As organizations deploy ensembles of LLMs—ranging from general-purpose to domain-specific—it becomes crucial to determine which models to consult for a given task. This track focuses on evaluating systems that can effectively identify the most capable LLM(s) for a query, without issuing new queries to the models.
Participants are provided with LLM responses and metadata in advance, and must rank the LLMs for each test query based solely on this information.
Given a user query, your system must rank a fixed set of LLMs. The goal is to predict which LLMs are most likely to produce high-quality answers.
Your submission should consist of a ranked list of LLMs for each query.
The dataset is split into two parts:

- Discovery data: LLM responses and metadata used to estimate each LLM's expertise.
- Test data: the queries for which you must submit LLM rankings.
Use the discovery data to predict the expertise of each LLM and develop your ranking system, then submit your results on the test set.
Note: No access to raw document collections or LLMs is required. The task is designed to benchmark ranking systems under fixed conditions using provided data.
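To make the setup concrete, here is a minimal baseline sketch (not an official or recommended method). It assumes a hypothetical discovery file with one JSON object per line containing `query`, `llm_id`, and `quality` fields; the actual schema of the released data may differ. Each LLM is scored by similarity-weighted quality over the discovery queries it has answered.

```python
# Minimal baseline sketch: score each LLM by how well it answered discovery
# queries that are textually similar to the test query.
# NOTE: the file layout and field names ("query", "llm_id", "quality") are
# assumptions for illustration, not the official data schema.
import json
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_llms(test_query: str, discovery_path: str) -> list[tuple[str, float]]:
    """Return (llm_id, score) pairs sorted from most to least promising."""
    with open(discovery_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]

    # Fit a TF-IDF space over the discovery queries plus the test query.
    queries = [r["query"] for r in records]
    vectorizer = TfidfVectorizer().fit(queries + [test_query])
    sims = cosine_similarity(
        vectorizer.transform([test_query]), vectorizer.transform(queries)
    ).ravel()

    # Aggregate similarity-weighted quality per LLM as a crude expertise score.
    scores: dict[str, float] = defaultdict(float)
    for sim, rec in zip(sims, records):
        scores[rec["llm_id"]] += sim * rec["quality"]

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Stronger systems would naturally replace TF-IDF with learned query and response representations, or model per-LLM answer quality directly.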
Each run must be submitted as a CSV or TSV file with the following columns:
query_id, LLM_id, rank, score, team_id
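For illustration, a run in this layout could be written as follows. The column order follows the list above, but details such as whether a header row is expected are assumptions here, not official requirements.

```python
# Sketch of writing one run file in the column layout listed above.
# Assumption: a header row and tab separation are acceptable; check the
# official submission guidelines before submitting.
import csv


def write_run(rankings: dict, team_id: str, path: str = "run.tsv") -> None:
    """rankings maps query_id -> [(llm_id, score), ...] sorted by descending score."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["query_id", "LLM_id", "rank", "score", "team_id"])
        for query_id, ranked in rankings.items():
            for rank, (llm_id, score) in enumerate(ranked, start=1):
                writer.writerow([query_id, llm_id, rank, f"{score:.4f}", team_id])
```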
Each team may submit up to 3 official runs.
Submission deadline: September 2025.
You must provide a short description of your method, including any training data or models used.
Submissions will be evaluated using relevance-based metrics against hidden ground-truth labels. Results will be reported in aggregate and per query category.
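The exact metrics are not specified above; nDCG is a common relevance-based choice, so the sketch below shows how a run might be scored offline against hypothetical graded labels.

```python
# Illustrative offline scoring with nDCG; the official metrics and label
# format may differ. The numbers below are made-up graded labels and ranker
# scores for five LLMs on a single query.
import numpy as np
from sklearn.metrics import ndcg_score

y_true = np.asarray([[3, 0, 2, 1, 0]])             # hypothetical ground-truth quality grades
y_score = np.asarray([[0.9, 0.1, 0.7, 0.4, 0.2]])  # scores assigned by your ranker

print(f"nDCG@5 = {ndcg_score(y_true, y_score, k=5):.4f}")
```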
| Date | Event |
|---|---|
| July 1, 2025 | Training and development data released |
| September 2025 | Test queries released |
| September 2025 | Submission deadline |
| October 2025 | Evaluation & results shared |
| November 2025 | TREC conference |
We welcome participation from:
Organizers:

- Evangelos Kanoulas, University of Amsterdam, The Netherlands
- Panagiotis Eustratiadis, University of Amsterdam, The Netherlands
- Mark Sanderson, RMIT University, Australia
- Jamie Callan, Carnegie Mellon University, USA
- Vaishali Pal, University of Amsterdam, The Netherlands
- Yougang Lyu, University of Amsterdam, The Netherlands
- Zihan Wang, University of Amsterdam, The Netherlands