This year’s Multilingual Information Extraction (IE) task continues the growth path identified in previous years’ CLEF eHealth IE challenges. The 2021 task focuses on Named Entity Recognition in Spanish clinical text from the domain of radiology reports, specifically ultrasound reports.
The task targets the detection of seven different entities as well as hedge cues. Targeted entities include Anatomical Entities but also Findings, describing a pathological or abnormal event, and indicators of probability or future outcomes.
SpRadIE offers multiple challenges to motivate participants to find creative solutions, such as integrating background knowledge from additional resources, or the usage of other additional (also cross-lingual) datasets to supplement the given training dataset:
- Domain-specific language: Radiology reports tend to be written in haste, with mistakes and high variability: typos, inconsistencies and a telegraphic style. Moreover, resources for clinical text are scarce, particularly for languages other than English.
- Semantic split: Training, development and test sets cover different semantic fields (e.g. heart- or liver-related reports), so that various topics and their corresponding entities that occur in the test dataset have not been seen in the training dataset.
- Small data: To approximate realistic deployment conditions, only a small number of annotated reports will be available during training, and the rest will be used for evaluation.
- Complex entities: The linguistic form of entities presents some particular difficulties: lengthier entities with inner structure, embedded entities and discontinuities.
- 30 April 2021: Registration closes
- 8 May 2021: Submissions close
- 21 May 2021: Publication of results
- 31 May 2021: Submission of participant papers
Runs must be submitted on Easychair: https://easychair.org/conferences/?conf=clefehealth2021runs
Each team can make up to 4 submissions. For the final report of results, only the best submission for each team will be selected.
Each submission must be compressed into a ZIP file that includes:
- a team description (plain text file with a brief description of the team),
- a solution/run description (plain text file with a brief description of the system used for the submission),
- the system output files (.ann files for the test set; see “Submission format” below for more information).
The name of the ZIP file must include the team name, the task name (Task 1), and a unique submission number for this task. For instance, the third submission of Team Raleigh to Task 1 has to be named TeamRaleigh_Task1_Run03.zip. When preparing your submission ZIP file, please follow the instructions above. If the same team submits to more than one task, please use the same team name in the submissions for all your tasks. When submitting, please specify the task (i.e., Task 1. Multilingual Information Extraction) as your submission topic.
The first submission is due May 1st 2021 at 23:55 (GMT). The rest of the submissions are due by May 8th 2021 at 23:55 (GMT) on EasyChair at https://easychair.org/conferences/?conf=clefehealth2021runs.
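The naming rule and the required archive contents can be sketched in Python as follows. This is only an illustration under stated assumptions, not an official packaging tool: the function names, the description file names (team.txt, run.txt), the flat archive layout and the two-digit zero-padding of the run number (inferred from the single "Run03" example) are all hypothetical choices.

```python
import re
import tempfile
import zipfile
from pathlib import Path


def submission_name(team: str, task: int, run: int) -> str:
    """Build a name such as TeamRaleigh_Task1_Run03.zip."""
    # Assumption: the team name is written without spaces or punctuation.
    clean_team = re.sub(r"[^A-Za-z0-9]", "", team)
    # Assumption: run numbers are zero-padded to two digits, as in "Run03".
    return f"{clean_team}_Task{task}_Run{run:02d}.zip"


def build_submission(team, run, team_desc, run_desc, ann_dir, out_dir, task=1):
    """Zip the two description files plus every .ann file found in ann_dir."""
    ann_files = sorted(Path(ann_dir).glob("*.ann"))
    if not ann_files:
        raise ValueError(f"no .ann files found in {ann_dir}")
    zip_path = Path(out_dir) / submission_name(team, task, run)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(team_desc, arcname=Path(team_desc).name)
        zf.write(run_desc, arcname=Path(run_desc).name)
        for ann in ann_files:
            # Only annotation files are added; test-set text files stay out.
            zf.write(ann, arcname=ann.name)
    return zip_path


# Demo with dummy files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "team.txt").write_text("Team description")
    (tmp / "run.txt").write_text("Run description")
    (tmp / "out").mkdir()
    (tmp / "out" / "57108.ann").write_text("T1\tNegation 70 73\tsin\n")
    (tmp / "out" / "57108.txt").write_text("report text")  # must not be packed
    path = build_submission(
        "Team Raleigh", 3, tmp / "team.txt", tmp / "run.txt", tmp / "out", tmp
    )
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    print(path.name, names)
```

Globbing only for *.ann files is a simple way to keep the test-set text files out of the archive, since those must not be submitted.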
Please do NOT update your submission after your task-specific submission deadline unless the organizers have emailed you a request to do this (e.g., a minor problem in a submission format).
The format for the submission (for system output files) must follow the Brat Standoff format. For each text file in the test set, an annotation file (with extension .ann) must be provided. Text files must not be included.
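Before packaging, it may help to check that every test document has an annotation file. The sketch below assumes that the test texts are stored as .txt files and that each .ann file shares the base name of its text file, which is the usual brat convention; the function name and directory layout are hypothetical.

```python
import tempfile
from pathlib import Path


def missing_annotations(test_dir, output_dir):
    """Return base names of test documents without a matching .ann file."""
    expected = {p.stem for p in Path(test_dir).glob("*.txt")}
    produced = {p.stem for p in Path(output_dir).glob("*.ann")}
    return sorted(expected - produced)


# Demo: two test documents, but only one annotation file was produced.
with tempfile.TemporaryDirectory() as tmp:
    test_dir = Path(tmp) / "test"
    out_dir = Path(tmp) / "out"
    test_dir.mkdir()
    out_dir.mkdir()
    (test_dir / "a.txt").write_text("texto")
    (test_dir / "b.txt").write_text("texto")
    (out_dir / "a.ann").write_text("")
    missing = missing_annotations(test_dir, out_dir)
    print(missing)
```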
Annotation file example (from sameSampleStarting/57108_brat.ann):
T1 Negation 70 73 sin
T2 Finding 74 86 alteraciones
T3 Negation 88 104
T4 Finding 105 131
T6 Finding 165 180
T7 Finding 185 196
T10 Anatomical_Entity 8 45 Ecoestructura parenquimatosa
T11 Anatomical_Entity 51 69 nucleos de la base
T5 Finding 133 163 desviaciones de la linea media
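Lines like the ones above can be read with a few lines of Python. The sketch below is a minimal reader for text-bound ("T") annotations, not an official tool for this task. It relies on the brat standoff convention that the ID, the type-plus-offsets field and the covered text are separated by tab characters (shown as spaces in the example above), and that discontinuous entities list their fragments separated by ";". The class and function names are hypothetical, and the length check assumes that the fragments of a discontinuous entity are joined by single spaces in the text field.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBound:
    ann_id: str
    label: str
    spans: List[Tuple[int, int]]
    text: str


def parse_textbound(line: str) -> TextBound:
    """Parse one text-bound ("T") line of a brat standoff file."""
    parts = line.rstrip("\n").split("\t")
    ann_id, type_and_offsets = parts[0], parts[1]
    # The covered text may be absent; fall back to an empty string.
    text = parts[2] if len(parts) > 2 else ""
    label, offsets = type_and_offsets.split(" ", 1)
    spans = []
    # Discontinuous entities look like "Finding 0 5;10 15".
    for fragment in offsets.split(";"):
        start, end = fragment.split()
        spans.append((int(start), int(end)))
    return TextBound(ann_id, label, spans, text)


def span_length_matches(ann: TextBound) -> bool:
    """Sanity check: summed span length versus length of the covered text."""
    # Assumption: fragments are joined by single spaces in the text field.
    expected = sum(end - start for start, end in ann.spans) + len(ann.spans) - 1
    return expected == len(ann.text)


negation = parse_textbound("T1\tNegation 70 73\tsin")
finding = parse_textbound("T5\tFinding 133 163\tdesviaciones de la linea media")
print(negation, span_length_matches(negation))
print(finding, span_length_matches(finding))
```

A check like span_length_matches can flag offset mistakes early, although it cannot replace a comparison against the actual report text.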
- Registration: http://clef2021-labs-registration.dei.unipd.it/
- Datasets: to download them, fill in this form
- Further information: https://sites.google.com/view/spradie-2020/