CLEF eHealth 2014 – Task 2

Task 2: Information Extraction from Clinical Text: Disease/Disorder Template Filling

To support the continuum of care, our goal is to develop annotated data, resources, and methods that make clinical documents easier to understand from the nurses' and patients' perspective. As with the ShARe corpus from ShARe/CLEFeHealth2013 Tasks 1 and 2, we open the lab for method and resource submissions to be evaluated statistically. We extend Task 1 from 2013 by focusing this year’s task on Disease/Disorder Template Filling. For this task, participants will be provided an empty template for each disease/disorder mention; each template consists of the mention’s Unified Medical Language System concept unique identifier (CUI), mention boundaries, and unfilled attribute:value slots. Participants are asked to develop attribute classifiers that predict the value for each attribute:value slot of the provided disease/disorder mention.

Disease/Disorder templates consist of 10 different attributes: Negation Indicator, Subject Class, Uncertainty Indicator, Course Class, Severity Class, Conditional Class, Generic Class, Body Location, DocTime Class, and Temporal Expression. There are two attribute:value slot types: normalization and cue. The ShARe/CLEFeHealth2013 Tasks 1 and 2 corpus and Disease/Disorder Template annotations in English will serve as an initial development set (n=300 documents of four clinical report types), and new annotations will be developed to create an unseen evaluation set (n=133 discharge summaries).

Task 2 Dataset

To support the continuum of care, our goal is to develop annotated data, resources, and methods that make clinical documents easier to understand for patients. We extend Task 1 from 2013 by focusing this year’s task on Disease/Disorder Template Filling. For this task, participants will be provided an empty template for each disease/disorder mention; each template consists of the mention’s Unified Medical Language System concept unique identifier (CUI) and mention boundaries. Participants are required to fill in values for each of 10 attributes. Attributes have two slot types: a normalized category (normalization) and the lexical cue from the sentence that indicates the normalized value (cue). Task 2a will evaluate participants’ ability to predict each normalization slot value; Task 2b will evaluate participants’ ability to predict the cue slot value for each disease/disorder template.

There are 10 different attribute types: Negation Indicator, Subject Class, Uncertainty Indicator, Course Class, Severity Class, Conditional Class, Generic Class, Body Location, DocTime Class, and Temporal Expression. Normalization values for nine of the attributes come from a closed list of possible values, such as “yes, no” for Negation Indicator. Normalized values for the tenth attribute, Body Location, are UMLS concept unique identifiers (CUIs). The definition of each attribute type can be found in Table 1.

Table 1. Disease/Disorder Attribute Types with definitions and norm and cue slot values.   

*Default Slot Values; **CEM = Clinical Element Models, the original source of many of the attributes (http://www.clinicalelement.com)

The training dataset will contain templates in a “|”-delimited format with: a) the disorder CUI assigned to the template as well as the character boundaries of the named entity, and b) the default values for each of the 10 attributes of the disease/disorder. Each template will have the following format:

DD_DocName|DD_Spans|DD_CUI|Norm_NI|Cue_NI|Norm_SC|Cue_SC|Norm_UI|Cue_UI|Norm_CC|Cue_CC|Norm_SV|Cue_SV|Norm_CO|Cue_CO|Norm_GC|Cue_GC|Norm_BL|Cue_BL|Norm_DT|Norm_TE|Cue_TE
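
As an illustration only (this is not part of the official distribution), a minimal Python sketch for splitting one such line into named fields could look like the following; the field order mirrors the header above, and the example line is the default template shown later on this page:

FIELDS = [
    "DD_DocName", "DD_Spans", "DD_CUI",
    "Norm_NI", "Cue_NI", "Norm_SC", "Cue_SC", "Norm_UI", "Cue_UI",
    "Norm_CC", "Cue_CC", "Norm_SV", "Cue_SV", "Norm_CO", "Cue_CO",
    "Norm_GC", "Cue_GC", "Norm_BL", "Cue_BL",
    "Norm_DT",   # DocTime Class has a normalization slot but no cue slot
    "Norm_TE", "Cue_TE",
]

def parse_template(line):
    """Split one pipe-delimited disease/disorder template line into a field->value dict."""
    values = line.strip().split("|")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))

# Default values are marked with a leading "*" in the examples on this page.
example = ("09388-093839-DISCHARGE_SUMMARY.txt|30-36|C0040128|*no|*NULL|*patient|*NULL|*no|*NULL|"
           "*false|*NULL|*unmarked|*NULL|*false|*NULL|*false|*NULL|*NULL|*NULL|*Unknown|*None|*NULL")
template = parse_template(example)
print(template["DD_CUI"], template["Norm_SV"])   # -> C0040128 *unmarked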

The default values for the Normalization slots are shown in Table 2. The default value for the Cue slot is NULL. The default values will be provided for each attribute in the template in the test set. See Table 2 below for disease/disorder attribute types, example sentences, and their Normalization and Cue slot values.

Table 2. Attribute types with example sentences and their norm and cue slot values.


The ShARe/CLEFeHealth2013 Task 1 corpus and Disease/Disorder Template annotations will serve as the training set (n=300 documents of four clinical report types). The test set comprises an unseen evaluation set (n=133 discharge summaries). Participants are required to participate in Task 2a and have the option to participate in Task 2b.

Task 2a and 2b Example: For the sentence “The patient has an extensive thyroid history.”, participants are provided the following disease/disorder template with default values:

09388-093839-DISCHARGE_SUMMARY.txt|30-36|C0040128|*no|*NULL|*patient|*NULL|*no|*NULL|*false|*NULL|*unmarked|*NULL|*false|*NULL|*false|*NULL|*NULL|*NULL|*Unknown|*None|*NULL

Task 2a) Assign Normalization values to the ten attributes. Participants will keep or update the Normalization values. For the example sentence, Task 2a changes the template to:

09388-093839-DISCHARGE_SUMMARY.txt|30-36|C0040128|*no|*NULL|*patient|*NULL|*no|*NULL|*false|*NULL|*unmarked|*NULL|severe|*NULL|*false|*NULL|C0040132|*NULL|Before|*None|*NULL

Task 2b) Assign Cue values to the nine attributes with cues. Participants will keep or update the Cue values. For the example sentence, Task 2b changes the template to:

09388-093839-DISCHARGE_SUMMARY.txt|30-36|C0040128|*no|*NULL|*patient|*NULL|*no|*NULL|*false|*NULL|*unmarked|*NULL|severe|20-28|*false|*NULL|C0040132|30-36|Before|*None|*NULL

Please note that the patient Cue span is not annotated in ShARe, since it is an attribute default.
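
To make the mechanics of Tasks 2a and 2b concrete, here is a hedged sketch that builds on the parsing example above (the helper functions are illustrative, not part of the official tooling); it overwrites the slots changed in the example and serializes the template back into the pipe-delimited submission format:

def fill_slots(template, updates):
    """Return a copy of the template with selected slots replaced by predicted values."""
    filled = dict(template)
    for slot, value in updates.items():
        if slot not in filled:
            raise KeyError("unknown slot: %s" % slot)
        filled[slot] = value
    return filled

def to_line(template):
    """Serialize a template dict back to the pipe-delimited format, preserving field order."""
    return "|".join(template[field] for field in FIELDS)

# Task 2a: update normalization slots (Severity, Body Location, DocTime in the example above).
run_2a = fill_slots(template, {"Norm_SV": "severe", "Norm_BL": "C0040132", "Norm_DT": "Before"})
# Task 2b: additionally update the corresponding cue spans.
run_2b = fill_slots(run_2a, {"Cue_SV": "20-28", "Cue_BL": "30-36"})
print(to_line(run_2b))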

Task 2 Guidelines

Attention lab participants: you should now start writing your working notes papers!

*Online working notes internal review submission deadline June 3rd.
**Online working notes camera ready for CLEF submission deadline June 7th.
Details on preparing working notes & link to the working notes submission system are available at:  http://clef2014.clef-initiative.eu/index.php?page=Pages/instructions_for_authors.html

The ShARe/CLEF Task 2 disease/disorder guidelines provide an overview of the original ShARe attribute guidelines with extensions for the 2014 ShARe/CLEF Task 2. We encourage participants to review these guidelines prior to system development. Participants will be provided training and test data sets. The evaluation will be conducted using the withheld test disease/disorder templates. Participating teams are asked to stop development as soon as they download the test disease/disorder templates. Teams are allowed to use any outside resources in their algorithms.  

Timeline

(Small) Example data set release: Dec 9 2013
(Full) Training data set release: Jan 10 2014
Test data set release: April 23 2014
Test data submissions: May 1 2014

Task 2 Evaluations
Post-submission, normalization and cue detection assessment will be conducted on the test disease/disorder templates to generate the complete result set. To do this, task participants will be asked to submit up to two runs for Task 2a and Task 2b.

  • Task 2a is mandatory for participation.
  • Task 2b is optional for participation.

Submitted runs must follow the template format.

Submitting your runs:

To submit your runs to Task 2a and 2b, please follow these guidelines carefully.

1. Follow the task-specific submission deadline of 1 May 2014.

2. Navigate to our EasyChair submission page for CLEFeHealth2014 runs (https://www.easychair.org/conferences/?conf=clefehealth2014) and submit separately to each task by selecting “New Submission”. You will submit all runs for one task at the same time. After you have created a new submission, you can update it, but no updates of runs are accepted after the deadline has passed.

3. List all your team members as “Authors”. “Address for Correspondence” and “Corresponding author” refer to your team leader. Note: you can acknowledge people not listed as authors separately in the working notes (to be submitted by June 7; instructions will be provided in due time) – we wish this process to be very similar to defining the list of authors in scientific papers.

4. Please provide the task and your team name as “Title” (e.g., “Task 2a: Team NICTA” or “Task 2a using extra annotations: Team NICTA”) and a short description (max 100 words) of your team as “Abstract”. See the category list below the abstract field for the task names. If you submit to multiple tasks, please copy and paste the same description to all your submissions and use the same team name in all submissions.

5. Choose a “category” and one or more “Groups” to describe your submission. We allow up to 2 runs for Task 2a and Task 2b.

6. Please provide 3-10 “Keywords” that describe the different runs in the submission, including methods (e.g., MetaMap, Support Vector Machines, Weka) and resources (e.g., Unified Medical Language System, expert annotation). You will provide a narrative description later in the process.

7. As “Paper” please submit a zip file including the runs for this task. Please name each run as follows: “name + run + task + add/noadd” (e.g., TeamNICTA.1.2a.add) where name refers to your team name; run to the run ID; task to 2a or 2b; and “add/noadd” to the use of additional annotations.  Please follow the file formats available at https://sites.google.com/a/dcu.ie/clefehealth2014/task-2/2014-dataset.

8. As the mandatory attachment file, please provide a txt file with a description of the submission. Please structure this file by using your run-file names above. For each run, provide a max 200 word summary of the processing pipeline (i.e., methods and resources). Be sure to describe differences between the runs in the submission.

Evaluation Metrics:

Evaluation will focus on Accuracy for Task 2a and F1-measure for Task 2b. We will evaluate each task by overall performance and by attribute type.

(2a) predict each attribute’s normalization slot value

Evaluation measure: Accuracy (overall and per attribute type)
Accuracy = Correct/Total 
Correct = Number of attribute:value slots with the correct normalization value
Total = Number of attribute:value slots
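
As a rough illustration of this measure (not the official scoring script), accuracy over gold and system templates that have been parsed as sketched above could be computed as follows; stripping a leading “*” default marker is an assumption about how defaults are flagged in the distributed files:

NORM_SLOTS = ["Norm_NI", "Norm_SC", "Norm_UI", "Norm_CC", "Norm_SV",
              "Norm_CO", "Norm_GC", "Norm_BL", "Norm_DT", "Norm_TE"]

def accuracy(gold_templates, system_templates, slots=NORM_SLOTS):
    """Accuracy over the given normalization slots, overall or for a single attribute type."""
    correct = total = 0
    for gold, system in zip(gold_templates, system_templates):
        for slot in slots:
            total += 1
            # Assumption: a leading "*" only marks a default value and is not part of the value.
            if gold[slot].lstrip("*") == system[slot].lstrip("*"):
                correct += 1
    return correct / total if total else 0.0

# Per-attribute accuracy, e.g. Severity Class: accuracy(gold_templates, system_templates, ["Norm_SV"])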

(2b) predict each attribute’s cue slot value

Evaluation measure: F1-score (overall and per attribute type)
F1-score = (2 * Recall * Precision) / (Recall + Precision)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
TP = same span 
FP = spurious span 
FN = missing span

Exact F1-score: span is identical to the reference standard span
Overlapping F1-score: span overlaps the reference standard span
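
The following sketch (again, not the official scorer) shows one way to compute the cue F1-score for a single attribute type under both matching criteria, assuming cue spans are given as (start, end) character offsets:

def spans_overlap(a, b):
    """True if two (start, end) character spans share at least one character."""
    return a[0] <= b[1] and b[0] <= a[1]

def cue_f1(gold_spans, system_spans, exact=True):
    """F1-score over cue spans; exact=True requires identical boundaries, exact=False any overlap."""
    match = (lambda s, g: s == g) if exact else spans_overlap
    tp = sum(1 for s in system_spans if any(match(s, g) for g in gold_spans))
    fp = len(system_spans) - tp                                                    # spurious spans
    fn = sum(1 for g in gold_spans if not any(match(s, g) for s in system_spans))  # missing spans
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Example: cue_f1([(20, 28)], [(20, 28)]) -> 1.0; cue_f1([(20, 28)], [(22, 30)], exact=False) -> 1.0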

Evaluation Write-up:

Participating groups in Task 2 are asked to submit a report (working notes) describing their Task 2 experiments.
Details on preparing working notes & link to the working notes submission system are available at:  http://clef2014.clef-initiative.eu/index.php?page=Pages/instructions_for_authors.html

Task 2 Getting Started

Obtaining Task 2 Dataset

The dataset will be distributed through the PhysioNet website. The steps for accessing the ShARe dataset for this year’s Task 2 can be found below.

  • 1. Register for CLEF eHealth 2014: http://147.162.2.122:8888/clef2014labs/
  • 2. Obtain a human subjects training certificate. If you do not have a certificate, you can take the CITI training course (https://www.citiprogram.org/Default.asp) or the NIH training course (http://phrp.nihtraining.com/users/login.php). Note: first-time users need to create an account in order to be able to take the courses. Expect a couple of hours’ work to complete the certification. Please save an electronic copy of the certificate – it will be needed in the subsequent steps to obtain the data.
  • 3. Go to the Physionet site: http://physionet.org/mimic2/mimic2_access.shtml
  • 4. Click on the link for “creating a PhysioNetWorks account” (near middle of page) (https://physionet.org/pnw/login) and follow the instructions.
  • 5. Go to this site and accept the terms of the DUA: https://physionet.org/works/MIMICIIClinicalDatabase/access.shtml. You will receive an email telling you to fill in your information on the DUA and email it back with your human subjects training certificate. Important: fill out the DUA using the word “ShARe/CLEF” in the description of the project and mail it back (pasted into the email) with your human subjects certificate attached. As the general research area for which the data will be used, enter CLEF (plus perhaps something more descriptive).
  • 6. Once you are approved, the organizers will add you to the PhysioNetWorks ShARe/CLEF eHealth 2014 account as a reviewer. We will send you an email informing you that you can go to the PhysioNetWorks website and click on the authorized users link to access the data (it will ask you to log in with your PhysioNetWorks account): https://physionet.org/works/ShAReCLEFeHealth2014Task2/

Note: If you participated in CLEF eHealth 2013 and obtained permissions, you will skip Steps 2-5 and will be provided access to the 2014 dataset following successful Step 1 registration.

Please note that all individuals working on the data need to individually obtain a human subjects training certificate, apply for a Physionet account, and sign their own DUA on the Physionet site.

To register for the task on the CLEF site, it is sufficient to register only one participant per participating group, but for access to the Task 2 data, each participating individual needs her/his own access permission from PhysioNet.

Timeline

(Small) Example data set release: Dec 9 2013
(Full) Training data set release: Jan 10 2014
Test data set release: April 23 2014
Test data set submissions due: May 1 2014
Online working notes (internal review) due: June 3 2014
Online working notes (camera ready for CLEF) due: June 7 2014

Information and Discussion Forum

General information and discussions during the task will be organised through the following Google group:
https://groups.google.com/forum/?hl=en#!forum/share-clef-ehealth-2014–tasks-2

Reviewing Task 2 Dataset and Annotations

We are providing a GUI interface for calculating outcome measures, as well as for visualizing system annotations against reference standard annotations. Use of the Evaluation Workbench is completely optional. Because the Evaluation Workbench is still under development, we would appreciate your feedback and questions if you choose to use it.

A. Memory issues. You need to allocate extra heap space when you run the Workbench with all the files, or you will get an “out of memory” error. To do so, use a terminal (or shell) program, go to the directory containing the startup.properties file, and type:

java -Xms512m -Xmx1024m -jar Eval*.jar 

B. Startup Properties file and GUI. The Evaluation Workbench relies on a parameter file called “startup.properties”. Since the Workbench is a tool for comparing two sets of annotations, the properties refer to the first (or gold standard) and second (or system) annotators. The following properties will need to be set using the Startup properties GUI before selecting “Initialize” to start the Workbench:

WorkbenchDirectory: Full path of the directory where the executable (.jar) file is located. For example,
WorkbenchDirectory=/Users/wendyc/Desktop/EvaluationWorkbenchFolderDistribution_2014ShARECLEF

TextInputDirectory: Directory containing the clinical reports (every document is a single text file in the directory). For example,
TextInputDirectory=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/corpus

AnnotationInputDirectoryFirstAnnotator / AnnotationInputDirectorySecondAnnotator: Directories containing the two sets of annotations (gold standard annotations first, system annotations second). If you do not have system annotations but just want to view the gold standard annotations, point both input directories to the gold standard annotations. For example,

AnnotationInputDirectoryFirstAnnotator=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/ShAReTask2TrainingKnowtatorFiles

AnnotationInputDirectorySecondAnnotator=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/ShAReTask2TrainingKnowtatorFiles

Knowtator Schema File: File containing the Protégé ontology file representing the ShARe schema. For example,

Knowtator Schema File=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/SHARe_Jan18_2012_base.pont

Classification Labels: Labels for classes, attributes, and relations between classes in the ShARe schema. For example,

Classification Labels= DefaultClassificationProperties

or

Classification Labels=associatedcode,associatedCode,distal_or_proximal_normalization,negation_indicator_normalization,negation_indicator_normalization,severity_normalization,course_normalization,subject_normalization_CU,Strength number,Strength unit,Strength,Dosage,Frequency number,Frequencyunit,Frequency,Duration,Route,Form,Attributes_medication,disease_disorder,Disease_Disorder,severity,negation_indicator,LABEL,degree_of,subject_class,TIMEX3,uncertainty_indicator_class,subject
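
For reference, the properties above collected into a single startup.properties file might look like the following sketch; the paths are the illustrative examples from this page and must be replaced with the locations on your own machine (and checked against the file shipped with the Workbench distribution):

# Illustrative startup.properties (example paths only)
WorkbenchDirectory=/Users/wendyc/Desktop/EvaluationWorkbenchFolderDistribution_2014ShARECLEF
TextInputDirectory=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/corpus
AnnotationInputDirectoryFirstAnnotator=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/ShAReTask2TrainingKnowtatorFiles
AnnotationInputDirectorySecondAnnotator=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/ShAReTask2TrainingKnowtatorFiles
Knowtator Schema File=/Users/wendyc/Desktop/CLEFEvaluationWorkbenchFolderDistribution_2014ShARECLEF/SHARe_Jan18_2012_base.pont
Classification Labels=DefaultClassificationProperties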

**Please remember to set pathnames appropriate for your operating system.  MacOS / Unix pathnames are in the form “/applications/EvaluationWorkbench/…”, whereas Windows paths are in the form “c:\\Program Files\\Evaluation Workbench\\…” (escape characters included).  After setting paths appropriately for your computer and operating system, you can activate the Workbench by going to the distribution directory and using the mouse to double-click the EvaluationWorkbench.jar icon.**

Select “Save” once you have set these parameters in the GUI, then “Initialize” to start the Evaluation Workbench.

C. Short tutorial on Evaluation Workbench (5 minute video here: http://screencast.com/t/QzaMLwWwFe):

  • To open the workbench, double click on the EvaluationWorkbench.jar file and follow the steps to set the parameters described in B.
  • To view the 2014 ShARe/CLEF template annotations, select “Utilities” from the tool bar, then select “Convert annotations to pipe-delimited format (CLEF 2)”
  • To navigate the Workbench, most operations will involve holding down the CTRL key until the mouse is moved to a desired position; once the desired position is reached, release the CTRL key. 
  • You can view the 2014 ShARe/CLEF template for a given annotation by holding down the CTRL key and hovering over an annotation.
  • The Workbench displays information in several panes:
    • Statistics pane: rows are classifications (e.g., Disorder CUI); columns display a contingency table of counts and several outcome measures (e.g., F-measure). The intersecting cell is the outcome measure for that particular classification. When a cell is highlighted, the reports generating that value are shown in the Reports pane. When you move the mouse over a report in the Reports pane, that report will appear in the Document pane.
    • The Document pane displays annotations for the selected document. The parameter button with label “Display=” selects whether to view a single annotation set at a time (gold or system), or to view both at once. Pink annotations are those that occur in only one source, and so indicate a false negative error (if it appears in the gold but not the system annotation set) or false positive (if it appears in the system but not the gold set). Highlighting an annotation in the document pane updates the statistics pane to reflect statistics for that classification. It also shows the attributes and relationships for that annotation (not relevant for this dataset but in other datasets you may have attributes like Negation status or relationships like Location of).
    • The Detail panel on the lower right side displays relevant parameters, report names, attribute, and relation information. The parameters include “Annotator” (whether the currently selected annotator is Gold or System), “Display” (whether you are viewing gold annotations, system annotations, or both), MatchMode (whether matches must be exact or any-character overlap) and MouseCtrl (whether the ctrl key must be held down to activate selections).
  • You can store the evaluation measures to a file by selecting File->StoreOutcomeMeasures, and entering a selected file name.

To participate in an electronic dialogue about use of the Workbench, please sign up for the google group: https://groups.google.com/forum/?fromgroups#!forum/evaluation-workbench