Datasets

CLEF eHealth Acknowledgement, Citation, and Licensing Policy

Thank you for choosing to use data, software, or other resources provided to you by the CLEF eHealth initiative. All our datasets are freely available for research purposes. The table below lists the datasets, the related tasks, and the links to get access.

Please find below our guidelines for acknowledging and citing the initiative and its resources, together with instructions about our licensing policy.

Recommended Acknowledgement

If you use data, software, or other resources provided to you by the CLEF eHealth initiative in an academic presentation, research paper, or other publication, please include the following acknowledgement:

We gratefully acknowledge the contribution of the people and organizations involved in the CLEF eHealth initiative as participants, organizers, or funders.

Citation Recommendation

Please cite the most relevant lab or task overview(s). The list of these overviews is available at References.

Licensing Information

Please pay careful attention to the licensing agreement associated with each data, software, or other resource release from the CLEF eHealth initiative. You can find the details in the task-specific overviews at References.

| Category | Year | Task | Link |
|---|---|---|---|
| Information extraction | 2013-2014 | IE from clinical reports: the MIMIC II dataset was used for two tasks in 2013 and one task in 2014. | Link |
| Information extraction | 2015-2016 | The goal of the task is to perform named entity recognition in a corpus of biomedical articles in French. | Link |
| Information extraction | 2017-2018 | (Details to be added) | (Link to be added) |
| Information extraction | 2019 | The goal of the task is to assign ICD-10 codes to health-related documents, with a focus on the German language and on non-technical summaries (NTPs) of animal experiments. | Training Data; Test Data |
| Information management | 2014 | The goal of the task is to design visualization systems for eHealth data. The corpus contains clinical reports, annotations, patient search queries, and matching relevant web documents. | Link |
| Information management | 2015-2016 | The goal of the 2015 task is to design correction systems for speech recognition output from nursing handovers. The goal of the 2016 task is to fill out a handover form with 35 headings, using information extracted from the free-form text handover reports. | Link |
| Technology assisted reviews | 2017-2019 | The goal of the task is to design visualization systems for eHealth data. The corpus contains clinical reports, annotations, patient search queries, and matching relevant web documents. | Link |
| Information retrieval | 2013-2018 | The goal of the task is to improve information retrieval systems so that they better handle health consumer queries. The dataset contains queries in multiple languages, web documents, and relevance judgements (including judgements of other dimensions of relevance). | 2013 document collection (also used in 2014-15); 2016-2017 document collection: Clueweb12-B13; 2018 document collection: (link to be added); 2013-18 queries, qrels, etc. |

Link to the original page
https://sites.google.com/site/clefehealth/datasets