CLEF eHealth 2020 – Task 2: Consumer Health Search

The 2020 CLEF eHealth Task 2 on consumer health search builds on the information retrieval tasks that have run at CLEF eHealth since its inception. The consumer health search task follows the standard information retrieval shared-challenge paradigm: participants are provided with a test collection, consisting of a set of documents and a set of topics, against which to develop retrieval techniques. Runs submitted by participants are pooled, and manual relevance assessment is conducted.

This year the lab offers two subtasks:

  1. Adhoc subtask
  2. Spoken queries subtask

The document collection is common to all subtasks; only the topics change (they are provided in several versions).


Important Dates

  • CLEF 2018 Collection Released (corpus + topics): January 2020 [released]
  • Result submission: 15th April 2020, anywhere on Earth
  • Participants’ working notes papers submitted [CEUR-WS]: 31st May 2020
  • Notification of Acceptance Participant Papers [CEUR-WS]: 15th June 2020
  • Camera Ready Copy of Participant Papers [CEUR-WS] due: 29th June 2020
  • CLEFeHealth2020 one-day lab session: Sept 2020

Document Collection

The document collection used is the collection newly introduced in 2018, named clefehealth2018. It consists of over 5 million medical webpages from selected domains, acquired from CommonCrawl. Given the positive feedback received for this document collection, it will be used again in the 2020 CHS task.

Document collection structure:

How to get the document collection?


Topics

Historically, the CLEF eHealth IR task has released text queries representative of layperson medical information needs in various scenarios. In recent years, query variations issued by multiple laypeople for the same information need have been offered. In this year's task we extend this to spoken queries, generated by six individuals using the information needs derived for the 2018 challenge. We also provide manual textual transcripts of these spoken queries, as well as automatic speech-to-text transcriptions.

Topics for subtask 1: Adhoc IR

Topics for subtask 2: Spoken queries retrieval

Evaluation Methodology

The challenge this year is as follows: given the query variants for an information need, participants must retrieve the relevant documents from the provided document collection. This is divided into a number of sub-tasks, which can be completed using the spoken queries, the textual transcripts of the queries, or the provided automatic speech-to-text transcripts of the queries.

Participants can submit multiple runs for each subtask. The evaluation measures used for the ad-hoc search are NDCG@10, BPref, and RBP.
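Official scoring will typically be carried out with standard IR evaluation tooling such as trec_eval over the pooled relevance judgements. As an illustration only, NDCG@10 can be sketched as below, using the common exponential-gain formulation; the exact gain and discount conventions, and the relevance grades shown, are assumptions and may differ from the official setup.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k ranked results.

    Uses the exponential gain (2^rel - 1) and log2(rank + 1) discount.
    """
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """NDCG: the run's DCG divided by the DCG of an ideal (re-sorted) ranking."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance labels (0 = not relevant, 2 = highly relevant)
# for the top 10 documents returned by one run on one topic.
run_gains = [2, 0, 1, 2, 0, 0, 1, 0, 0, 0]
print(round(ndcg_at_k(run_gains, 10), 4))
```

A per-topic NDCG@10 computed this way would then be averaged over all topics to score a run.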