CLEF eHealth 2016 – Task 1

Training data available: Link to the dataset (30 Oct 2015). See below for details.

Submission system open: Link to the Easy Chair system (12 Apr 2016). 
Test data available: Link to the dataset (15 Apr 2016).
Submissions due: 1 May 2016 at midnight (UTC-11:00); closed for new or updated submissions thereafter.

CLEF eHealth 2016 Task 1 addresses clinical information extraction related to Australian nursing shift changes. It extends the 2015 Task 1a of converting verbal nursing handover to written free-text records; in 2016, we challenge participants to maximise the correctness of structuring these written free-text records by pre-filling a handover form, that is, by automatically identifying relevant text snippets for each slot of the form. This year we aim to lower the entry barrier and encourage novelty by providing participants with not only evaluation code but also processing code.

Only fully automated means are allowed, that is, human-in-the-loop approaches are not permitted. All communication is in English. See the following paper for the organisers’ initial work on this task: Suominen H, Zhou L, Hanlen L, Ferraro G. Benchmarking Clinical Speech Recognition and Information Extraction: New Data, Methods, and Evaluations. JMIR Med Inform 2015;3(2):e19.

Targeted Participants

The task is open for everybody. We particularly welcome academic and industrial researchers, scientists, engineers, and graduate students in speech recognition, natural language processing, and biomedical/health informatics to participate. We also encourage participation by multi-disciplinary teams that combine technological skills with content expertise in nursing.

Overall Task

The 2016 task is a part of a bigger processing cascade that combines voice recording, speech recognition, information extraction, information visualisation, and clinical sign-off.  More specifically, the cascade is outlined as follows:

Step 1. Verbal handover audio is recorded and speech recognition transcribes these voice recordings into computer-readable free-form text (i.e., CLEF eHealth 2015 Task 1a).

Step 2. Information extraction is used to pre-fill a handover form by automatically identifying relevant text snippets for each slot of the form (i.e., CLEF eHealth 2016 Task 1).

Step 3. An information visualisation system associates the pre-filled form with the original context of the extracted information in the speech-recognised, free-form text by highlighting text for a clinician to proof, edit, and sign off.

An empirical clinical justification for this cascade has been provided by Dawson et al. (2014), Johnson et al. (2014a, 2014b), and Suominen et al. (2014).

Data Set

The data set is called NICTA Synthetic Nursing Handover Data. It has been developed at NICTA since 2012 for clinical speech recognition and information extraction related to nursing shift-change handover.

The data set has been created by Maricel Angel, registered nurse (RN) with over twelve years’ experience in clinical nursing, supported by Hanna Suominen, Adj/Prof in machine learning for communication and health computing, Leif Hanlen, Adj/Prof in health and Adj/Assoc Prof in software engineering, and Liyuan Zhou, Software Engineer. The text is thus very similar to real documents in Australian English (which cannot be made available). This data creation included the following steps:

  1. generation of patient profiles,
  2. creation of written, free-form text documents,
  3. development of a structured handover form with 50 headings to fill out,
  4. using this form and the written, free-form text documents to create written, structured documents,
  5. creation of spoken, free-form text documents,
  6. using a speech recognition engine with different vocabularies to convert the spoken documents to written, free-form text, and
  7. using an information extraction system to fill out the handover form from the written, free-form text documents.

The data release has been approved at NICTA, and the RN has consented in writing. The license of the spoken, free-form text documents (i.e., WMA and WAV files) is Creative Commons – Attribution Alone – Non-commercial – No Derivative Works (CC-BY-NC-ND) for the purposes of testing speech recognition and language processing algorithms. Our intention is to allow others to test their computational methods against these files with appropriate acknowledgement. The remaining documents (i.e., DOCX and TXT files) are licensed under Creative Commons – Attribution Alone (CC-BY). Our intention is to allow others to use these text and image files for any purpose with appropriate acknowledgement, as detailed here.
For further information on the data creation and the form, we refer the reader to the Methods section of the aforementioned Suominen H, Zhou L, Hanlen L, Ferraro G. Benchmarking Clinical Speech Recognition and Information Extraction: New Data, Methods, and Evaluations. JMIR Med Inform 2015;3(2):e19.

Data Examples

***[TODO: Add the picture]***

The figure is a re-print from Suominen H, Zhou L, Hanlen L, Ferraro G. Benchmarking Clinical Speech Recognition and Information Extraction: New Data, Methods, and Evaluations. JMIR Med Inform 2015;3(2):e19.

Training Set

In total, 200 synthetic patient cases can be used for training and validation. More precisely, this consists of the 100 training and 100 testing cases (now used for validation) from CLEF eHealth 2015 Task 1a that have been annotated with respect to the information extraction task. The sets are independent and their comparative analysis has been published as Suominen, Hanlen, et al. (2015). Both sets can be used for method development, but we strongly recommend using the former 100 cases for training and the latter 100 cases for validation.
Link to the dataset (i.e., data set 1 for training and data set 2 for validation) was provided on this page on 30 October 2015.

Test Set and Submission

An independent test set of another 100 cases (i.e., data set 3) was developed for the purposes of this task. A link to the dataset was provided on this page on 15 April 2016. Participants must stop all development before downloading this independent test set – we do not allow its use for method development.

Participant solutions for the test set are due by 1 May 2016 (UTC-11:00). No extension will be given. All solutions need to be submitted as a ZIP file using the task’s official EasyChair system. Each team is allowed to submit up to two methods/compilations. To supplement the submissions, participants are expected to give us details on their team and method, as well as their solutions for data sets 1 and 2. Please follow the detailed instructions below.

Please submit separately to each task.  Each Task 1 submission must consist of the following items:

  1. Address of Correspondence: address, city, post code, (state), country
  2. Author(s): first name, last name, email, country, organisation
  3. Title: Instead of entering a paper title, please specify your team name here. A good name is short but identifying. For example, Mayo, Limsi, and UTHealthCCB have been used before. If the same team submits to more than one task, please use the same team name.
  4. Keywords: Instead of entering three or more keywords to characterise a paper, please use this field to describe your methods/compilations. We encourage using MeSH or ACM keywords.
  5. ZIP file: This file is an archive of at least five files:
    1. Team description as team.txt (max 100 words): Please write a short general description of your team. For example, you may say that “5 PhD students, supervised by 2 Professors, collaborated” or “A multi-disciplinary approach was followed by a clinician and biostatistician bringing in content expertise, a computational linguist capturing this as features of the learning method, and two machine learning researchers choosing and developing the learning method”.
    2. Method description as methods.txt (max 100 words per method and 100 words per additional data set): Please write a short general description of the method(s)/compilation(s) you used. If you have two methods/compilations, please refer to them with letters A and B (e.g., Method A and Method B) and indicate clearly how they differ from each other (e.g., specify the parameter values or name the different learning algorithms). If you have one method/compilation to submit, there is no need to use these letters. If you use additional data to build your method, please describe these data, their possible annotations, and their use. Note: In May, you can write a more thorough description of your submission as a working note paper, but this short description provides crucial information to the organisers and is used to write lab and task overviews. Hence, please write this description carefully. 
    3. Processing output(s) for 
      1. the data set 1 as train_A.txt and train_B.txt where the letter reflects your method/compilation ID: If you submit only one method/compilation, then you include only train.txt. 
      2. the data set 2 as validation_A.txt and validation_B.txt where the letter reflects your method/compilation ID: If you submit only one method/compilation, then you include only validation.txt. 
      3. the data set 3 as test_A.txt and test_B.txt where the letter reflects your method/compilation ID: If you submit only one method/compilation, then you include only test.txt. 
      • ALL these files need to follow the tabulator (TAB) -separated three-column format documentID TAB word TAB class, and
      1. Class labels need to be spelled precisely the same way as in the data set 1. The label list is also available here.
      2. Each line must describe precisely one word as defined by the Stanford CoreNLP tokenisation and proceed from the first to the last word of a given document.
      3. Documents must appear in the lexicographic (string) order of their file names – the order in which a terminal typically lists them – that is, for example for the data set 1 (train): 0.txt, 1.txt, 10.txt, 100.txt, 11.txt, 12.txt, …, 19.txt, 2.txt, 20.txt, 21.txt, …, 3.txt, 30.txt, …, 99.txt.
      • In order to assist with this formatting, we have provided you with these three documents without the last column here. We have also provided you with shortcuts to data set 1 (train), data set 2 (validation), and data set 3 (test).
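To illustrate the required output format, the following Python sketch writes one TAB-separated line per token (documentID TAB word TAB class) and orders documents lexicographically by file name, as in the example above. The directory layout, the labelling dictionary, and the whitespace tokenisation are hypothetical stand-ins for illustration only – the official tokenisation is that of Stanford CoreNLP.

```python
import os

def write_predictions(doc_dir, labels, out_path):
    """Write one TAB-separated line per token: documentID TAB word TAB class.

    `labels` is assumed to map (doc_id, token_index) -> class label;
    tokens here come from naive whitespace splitting, a stand-in for
    Stanford CoreNLP tokenisation.
    """
    # Lexicographic (string) order: 0.txt, 1.txt, 10.txt, 100.txt, 11.txt, ...
    doc_files = sorted(f for f in os.listdir(doc_dir) if f.endswith(".txt"))
    with open(out_path, "w", encoding="utf-8") as out:
        for fname in doc_files:
            doc_id = fname[:-len(".txt")]
            with open(os.path.join(doc_dir, fname), encoding="utf-8") as doc:
                tokens = doc.read().split()
            for i, word in enumerate(tokens):
                # Default to NA for tokens irrelevant to every heading.
                label = labels.get((doc_id, i), "NA")
                out.write(f"{doc_id}\t{word}\t{label}\n")
```

Note that Python’s `sorted` on strings gives exactly the lexicographic file ordering required above, so no custom comparator is needed.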

Do not hesitate to contact the task leader (Hanna Suominen) if you require further assistance. Thank you for participating!

Evaluation Methods

We will measure Precision, Recall, and F1 (i.e., the harmonic mean of Precision and Recall) as implemented in the CoNLL 2000 Shared Task on Chunking. We will evaluate performance both separately for every heading of the form (i.e., category) and over all categories present in the training data (available here). For the latter, we will use both macro- and micro-averaging over all categories other than NA, the label for text irrelevant to every heading. We will also report performance on the dominant NA category separately. Because our desire is to perform well in all classes, and not only in the majority classes, the macro-averaged results will be emphasised over the micro-averaged results, and the macro-averaged F1 over all categories other than NA will be used to rank the participant submissions.
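As a rough illustration of the ranking measure, the sketch below computes per-class Precision, Recall, and F1 at token level and macro-averages F1 over all classes except NA. This is a simplified stand-in, not the official scorer: the official evaluation uses the CoNLL 2000 chunk-level implementation, whereas this sketch counts individual token labels.

```python
from collections import defaultdict

def per_class_prf(gold, pred, exclude=("NA",)):
    """Token-level precision/recall/F1 per class (simplified stand-in
    for the chunk-level CoNLL 2000 evaluation)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where the gold label differs
            fn[g] += 1  # missed the gold label g
    scores = {}
    for c in set(gold) | set(pred):
        if c in exclude:
            continue  # NA is excluded from the ranking measure
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores

def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally,
    which is why macro-averaging rewards performance in minority classes."""
    return sum(f for _, _, f in scores.values()) / len(scores) if scores else 0.0
```

Micro-averaging would instead pool the tp/fp/fn counts across classes before computing a single F1, which lets the majority classes dominate – hence the emphasis on the macro-averaged figure above.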


Please remember to register on the main CLEF 2016 registration page

Contact Information

The best (and maybe the fastest) way to get your questions answered is to join the clef-ehealth mailing lists: