CLEF eHealth 2020 – Task 1: Multilingual Information Extraction

This year’s Multilingual Information Extraction (IE) task continues the growth path established in previous years’ CLEF eHealth IE challenges. The 2020 task focuses on ICD coding of clinical text in Spanish. Coding also includes some textual evidence annotations. A subtask focuses on term mapping, which is now a crucial and pressing need. The terms considered are extracted from EHRs in Spanish and are manually linked to HPO, ICD10 and SNOMED CT. These tasks can be treated as named entity recognition and normalisation tasks, but also as text classification tasks.

This year the lab proposes three sub-tracks:

  1. ICD10-CM [CIE10 Diagnóstico] code assignment. This sub-track evaluates systems that predict ICD10-CM codes (in the Spanish translation, CIE10-Diagnóstico codes).
  2. ICD10-PCS [CIE10 Procedimiento] code assignment. This sub-track evaluates systems that predict ICD10-PCS codes (in the Spanish translation, CIE10-Procedimiento codes).
  3. Explainable AI. Systems are required to submit the text reference that supports each predicted code (both ICD10-CM and ICD10-PCS). This sub-track assesses the correctness of the provided reference in addition to the code prediction.


  • Train, Development and Additional set release: January 13
  • Evaluation Script release: January 20
  • Test set release (includes Background set): March 2
  • End of evaluation period: Participant results submissions: May 10
  • Results notified. Test set with GS annotations release: May 12
  • Participants’ working notes papers submitted: July 17
  • Notification of acceptance participant papers: TBC
  • Camera ready paper submission: August 28
  • CLEF 2020: September 22-25



For this task, we have prepared a corpus of clinical cases. The CodiEsp corpus of 1,000 clinical case studies was selected manually by a practicing physician. The corpus is distributed as plain text in UTF-8 encoding, with each clinical case stored as a single file whose name is the clinical case identifier. Annotations are released in a tab-separated file with the following fields:

articleID ICD10-code

Tab-separated files for the third sub-track on Explainable AI contain an extra field that provides the position in the text of the text-reference:

articleID label ICD10-code text-reference reference-position
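The two annotation layouts above could be loaded along the following lines. This is a minimal sketch, not the organisers' tooling: the function names `read_codes` and `read_explainable` are hypothetical, the sample rows are made up, and because the format of the reference-position field is not specified here, the sketch keeps it as an uninterpreted string.

```python
import csv
import io
from collections import defaultdict


def read_codes(handle):
    """Group codes by document from rows of the form: articleID <TAB> ICD10-code."""
    codes = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        codes[row[0]].append(row[1])
    return dict(codes)


def read_explainable(handle):
    """Read rows: articleID, label, ICD10-code, text-reference, reference-position."""
    records = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        article_id, label, code, reference, position = row[:5]
        records.append({
            "articleID": article_id,
            "label": label,
            "code": code,
            "reference": reference,
            "position": position,  # kept as a raw string (format assumed unknown)
        })
    return records


# Made-up sample rows, for illustration only.
sample_codes = "doc1\ta00.0\ndoc1\tb11.1\ndoc2\tc22.2\n"
sample_refs = "doc1\tDIAGNOSTICO\ta00.0\tsome text span\t10 24\n"

codes_by_doc = read_codes(io.StringIO(sample_codes))
reference_rows = read_explainable(io.StringIO(sample_refs))
```

With real data, the `io.StringIO` objects would be replaced by files opened with `encoding="utf-8"` and `newline=""`.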

The entire CodiEsp corpus has been randomly sampled into three subsets: training, development and test. The training set comprises 500 clinical cases, and the development and test sets comprise 250 clinical cases each. Together with the test set, we will release an additional collection of more than 2,000 documents (the background set). This prevents participating teams from correcting predictions manually and encourages systems that can scale to larger data collections.

The training, development and test sets are available on Zenodo. Additional information about them is available on our group webpage.


Complete the registration at the CLEF 2020 webpage. On the registration page, click the subscribe button, fill in your contact details, select CLEF eHealth Task 1 – Multilingual Information Extraction, and submit.

Submission Guidelines

Submit your predictions on EasyChair. Submissions are due by 10 May 2020 at 23:55 (GMT).

For further instructions on the submission procedure and format, visit the CodiEsp webpage at our group site.

Evaluation Methodology

For sub-tracks 1 and 2, participants will submit ranked coding predictions: for every document, a list of possible codes ordered by confidence or relevance. Since these sub-tracks are ranking competitions, they will be evaluated with a standard ranking metric, Mean Average Precision (MAP).
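As an illustration of the metric, MAP can be computed as sketched below. This is the textbook definition under stated assumptions, not the official evaluation script: duplicate codes in a ranking are counted once, documents are averaged over the gold-standard set, and a document with no predictions scores zero. The function names are hypothetical, and the official script may treat these edge cases differently.

```python
def average_precision(ranked_codes, gold_codes):
    """Average precision of one ranked code list against the gold codes."""
    gold = set(gold_codes)
    if not gold:
        return 0.0
    seen = set()
    hits = 0
    precision_sum = 0.0
    for code in ranked_codes:
        if code in seen:
            continue  # assumption: repeated codes are ignored
        seen.add(code)
        rank = len(seen)
        if code in gold:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(gold)


def mean_average_precision(predictions, gold):
    """Mean of per-document average precision over all gold documents."""
    if not gold:
        return 0.0
    total = sum(
        average_precision(predictions.get(doc, []), codes)
        for doc, codes in gold.items()
    )
    return total / len(gold)


# Made-up example: two documents, codes are placeholders.
gold = {"doc1": ["a", "c"], "doc2": ["x"]}
predictions = {"doc1": ["a", "b", "c"], "doc2": ["y", "x"]}
map_score = mean_average_precision(predictions, gold)
```

In this example, doc1 has correct codes at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2; doc2 has its single correct code at rank 2, giving 1/2.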

For the Explainable AI sub-track, the explainability of the systems will be considered in addition to their performance on the test set. To evaluate both, systems have to provide a reference in the text that supports each code assignment. That reference is cross-checked against the true reference used by the expert annotators. Only correct codes with correct references are counted as valid. Precision, Recall, and F1-score are then used to evaluate system performance (F1-score is the primary metric).
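The scoring rule described above can be sketched as follows. The sketch assumes each prediction is a tuple of document identifier, code, and reference position, that a prediction counts only on an exact match of all three with a gold annotation, and that scores are pooled over the whole test set. These matching and averaging choices, and the name `precision_recall_f1`, are assumptions for illustration; the official evaluation library defines the actual behaviour.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 over sets of (articleID, code, position) tuples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # correct code AND correct reference
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: the second prediction has the right code but the wrong
# reference position, so it does not count as correct.
gold = {("doc1", "a", "0 5"), ("doc1", "b", "10 12"), ("doc2", "c", "1 2")}
predicted = {("doc1", "a", "0 5"), ("doc1", "b", "7 9")}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Here one of two predictions is valid and one of three gold annotations is recovered, so precision is 1/2 and recall is 1/3.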

An evaluation library is available.

Further information

For further information and resources, please visit the CodiEsp webpage at our group site.