CLEF eHealth Task 2015 – Task 1a

Task 1a: Clinical Speech Recognition

The CLEFeHealth 2015 Task 1a addresses clinical speech recognition related to Australian nursing shift changes. The aim is to convert verbal nursing handover to written free-text records. We challenge the participants to minimize word-detection errors by

  1. addressing the correctness of the speech recognition engine itself and/or 
  2. improving this through post-processing methods for the recognised text.

Only fully automated means are allowed, that is, human-in-the-loop approaches are not permitted. All communication is in English. See Suominen and Ferraro (2013) and Suominen, Zhou et al. (2015) for the organisers’ initial work on this task. 

Targeted Participants

The task is open to everybody. We particularly welcome academic and industrial researchers, scientists, engineers, and graduate students in speech recognition, natural language processing, and biomedical/health informatics to participate. We also encourage participation by multi-disciplinary teams that combine technological skills with content expertise in nursing.

Overall Task

The 2015 task is a part of a bigger processing cascade that combines voice recording, speech recognition, information extraction, information visualisation, and clinical sign-off. More specifically, the cascade is outlined as follows:

  1. Verbal handover audio is recorded.
  2. Speech recognition transcribes these voice recordings into computer-readable free-form text.
  3. Information extraction is used to pre-fill a handover form by automatically identifying relevant text-snippets for each slot of the form.
  4. An information visualisation system associates the pre-filled form with the original context of the extracted information in the speech-recognised, free-form text by highlighting text.
  5. A clinician proof-reads, edits, and signs off the form.
Steps 1-2 above are included in the 2015 task. An empirical clinical justification for this cascade has been provided by Dawson et al. (2014), Johnson et al. (2014a, 2014b), and Suominen et al. (2014).
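Schematically, the part of the cascade covered by the 2015 task is a two-stage pipeline. In the sketch below, recognise and postprocess are hypothetical placeholders for a team's speech recognition engine (category 1.a.1) and text post-processing method (category 1.a.2):

#!/bin/bash
# Steps 1-2 of the cascade: handover audio in, recognised (and optionally
# post-processed) free-form text out. Both commands are placeholders for a
# team's own tools.
for audio in 100audiofiles/*.wav
do
  recognise "$audio" | postprocess > "$(basename "$audio" .wav).txt"
done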

Data Set

The data set is called NICTA Synthetic Nursing Handover Data. It has been developed for clinical speech recognition and information extraction related to nursing shift-change handover at NICTA in 2012-2014.

The data set has been created by Maricel Angel, a registered nurse (RN) with over twelve years’ experience in clinical nursing, supported by Hanna Suominen, Adj/Prof in machine learning for communication and health computing, Leif Hanlen, Adj/Prof in health and Adj/Assoc Prof in software engineering, and Liyuan Zhou, Software Engineer. The text is thus very similar to real documents in Australian English (which cannot be made available). The data creation included the following steps:

  1. generation of patient profiles,
  2. creation of written, free-form text documents,
  3. development of a structured handover form,
  4. using this form and the written, free-form text documents to create written, structured documents,
  5. creation of spoken, free-form text documents,
  6. using a speech recognition engine with different vocabularies to convert the spoken documents to written, free-form text, and
  7. using an information extraction system to fill out the handover form from the written, free-form text documents.

Data Examples

Speech-recognised document

Own now on bed 3 he is then Harry 70 is 71 years old under Dr Greco he came in with arrhythmia he complained of chest pain this morning in ECG was done and reviewed by the team he was given some and leaning in morphine for the pain in she is still tachycardic in new meds have been ordered in the bedtime is still 4 hours checks for one full minute are still waiting for echocardiogram this afternoon he is BP is just normal though he is scarring meals of 3 for the tachycardia larger otherwise he still for more new taurine

Reference document

Ken harris, bed three, 71 yrs old under Dr Gregor, came in with arrhythmia. He complained of chest pain this am and ECG was done and was reviewed by the team. He was given some anginine and morphine for the pain. Still tachycardic and new meds have been ordered in the medchart. still for pulse checks for one full minute. Still awaiting echo this afternoon. His BP is just normal though he is scoring MEWS of 3 for the tachycardia. He is still for monitoring.

Top-5 Errors

Substitutions (speech recognition – reference): years – yrs, in – and, one – 1, also – obs, to – 2
Insertions (speech recognition – reference): and, is, in, she, are
Deletions (speech recognition – reference): is, are, and, s, obs

Sound-alike substitutions

Single words (speech recognition – reference): Geylor – Gayler, dialyses – dialysis, results – result, harrowed – Harrod, cord – GORD
Multi words (speech recognition – reference): george desilva s – jorge de silva, in ampulla – and ambulant, aspergilloses are – she aspergillosis he, blanford – plan for, can assume – cannot seem
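These recurring confusions suggest a simple baseline for the post-processing category (1.a.2): context-free rewriting of known misrecognitions in the recognised text. The sketch below applies a few of the single-word mappings listed above and is illustrative only; run it inside a folder of speech-recognised TXT documents.

#!/bin/bash
# Naive category 1.a.2 post-processing sketch: rewrite recurring confusions
# from the speech-recognised form to the reference form. Whole-word matching
# (\b) requires GNU sed. Mappings such as in -> and are deliberately omitted
# because replacing common words without context would over-correct.
for file in *.txt
do
  sed -i -e 's/\byears\b/yrs/g' \
         -e 's/\bdialyses\b/dialysis/g' \
         -e 's/\bharrowed\b/Harrod/g' \
         -e 's/\bcord\b/GORD/g' "$file"
done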

Training Set

The data set (download as a zip file) includes the following documents:

  1. Folder initialisation (used in the 2015 task to initialise a speech recognition engine): initialisation details for speech recognition using Dragon Medical 11.0 (i.e., i) DOCX for the written, free-form text document that originates from the Dragon software release and ii) WMA for the spoken, free-form text document by the RN)
  2. Folder 100profiles (not needed in the 2015 task): 100 patient profiles (DOCX)
  3. Folder 101writtenfreetextreports (used in the 2015 task as the written reference standard): 101 written, free-form text documents (TXT)
  4. Folder 100audiofiles (used in the 2015 task as speech to be recognised): 100 spoken, free-form text documents by the RN (WAV)
  5. Folder 100x6speechrecognised (documents related to the nursing vocabulary are used in the post-processing part of the 2015 task as speech-recognised text): 100 speech-recognised, written, free-form text documents for six Dragon vocabularies (TXT)
  6. Folder 101informationextraction (not needed in the 2015 task): 101 written, structured documents for information extraction that include i) the reference standard text, ii) features used by our best system, iii) form categories with respect to the reference standard, and iv) form categories with respect to our best information extraction system (TXT in the CRF++ format).
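After unzipping, the folder contents can be sanity-checked against the counts above. This sketch assumes the folder names as listed and lower-case file extensions:

#!/bin/bash
# Sanity-check the unzipped NICTA Synthetic Nursing Handover Data.
ls 100audiofiles/*.wav | wc -l                    # expected: 100
ls 101writtenfreetextreports/*.txt | wc -l        # expected: 101
find 100x6speechrecognised -name '*.txt' | wc -l  # expected: 600 (100 x 6)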

The data release has been approved at NICTA and the RN has consented in writing. The license of the spoken, free-form text documents (i.e., WMA and WAV files) is Creative Commons – Attribution Alone – Non-commercial – No Derivative Works (CC-BY-NC-ND) for the purposes of testing speech recognition and language processing algorithms. Our intention is to allow others to test their computational methods against these files with appropriate acknowledgement. The remaining documents (i.e., DOCX and TXT files) are licensed under Creative Commons – Attribution Alone (CC-BY). Our intention is to allow others to use these text and image files for any purpose with appropriate acknowledgement. In both cases, the acknowledgement requirement is to cite Suominen, Zhou et al. (2015).

Test Set and Submission

Updated on 23 April 2015.

An independent test set was released in April 2015 and submissions are due on 1 May 2015. The same RN has created these written and spoken documents. The aforementioned release approval and licensing policy apply to this test set too.

Teams need to submit their solutions to this test set. The submission file must follow the evaluation file format specified below. When submitting, the teams must assign one of the following three categories to each submission file:

1.a.1: development of the speech recognition engine itself,

1.a.2: development of post-processing methods for the recognised text, 

1.a.3: solutions based on both 1.a.1 and 1.a.2.

Each team is allowed to submit up to 2 methods/compilations to the category 1.a.1 (i.e., teams can submit 2 different methods or 2 alternating compilations (e.g., different parameter values or vocabularies in use)). Similarly, each team is allowed to submit up to 2 files to the category 1.a.2. If participating in both categories 1.a.1 and 1.a.2, teams must submit all possible combinations of these methods as their category 1.a.3 submission (i.e., up to 2 x 2 = 4 files).
All submissions must be made by using the lab’s EasyChair system by 1 May 2015. Each submission must consist of the following items:

  1. Address of Correspondence: address, city, post code, (state), country
  2. Author(s): first name, last name, email, country, organisation
  3. Title: Instead of entering a paper title, please specify your team name here. A good name is something short but identifying. For example, Mayo, Limsi, and UTHealthCCB have been used before.
  4. Keywords: Instead of entering three or more keywords to characterise a paper, please use this field to describe your methods/compilations. We encourage using MeSH or ACM keywords.
  5. Topics: please tick 1.a.1, 1.a.2, or 1.a.3, depending on your aforementioned submission category.
  6. ZIP file: This file is an archive of at least four files (a packaging sketch follows this list):
    1. Team description as team.txt (max 100 words): Please write a short general description of your team. For example, you may say that “5 PhD students, supervised by 2 Professors, collaborated” or “A multi-disciplinary approach was followed by a clinician and biostatistician bringing in content expertise, a computational linguist capturing this as features of the learning method and two machine learning researchers choosing and developing the learning method”.
    2. Method description as methods.txt (max 100 words per method): Please write a short general description of the method(s)/compilation(s) you used. If you submit multiple methods/compilations, please indicate clearly how they differ from each other. Please use the same names as you use to name your output files. Note: In May, you can write a more thorough description of your submission as a working note paper (see the important dates and guidelines for working notes), but this short description provides crucial information to the organisers and is used to write lab and task overviews. Hence, please write this description carefully.
    3. Processing output(s) for the training set as train_1.txt, train_2.txt, …, train_4.txt (the numbering here reflects your method/compilation IDs; if you submit only one method/compilation, then you include only train_1.txt): Please make sure this is formatted the same way as the model document below (see the output file and make sure you can perform the Running SCTK part on this file yourself).
    4. Processing output(s) for the test set as test_1.txt, test_2.txt, …, test_4.txt (the numbering here reflects your method/compilation IDs; if you submit only one method/compilation, then you include only test_1.txt): Again, please make sure this is formatted the same way as the model documents below (see the reference file and the output file below), but naturally the document IDs originate from the test set.
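For example, a complete archive for a team could be assembled as follows; the archive name task1a_submission.zip is a hypothetical choice:

#!/bin/bash
# Package a Task 1a submission: team and method descriptions plus the
# trn-formatted outputs for the training and test sets.
zip task1a_submission.zip team.txt methods.txt train_*.txt test_*.txt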

Do not hesitate to contact the task leader (Hanna Suominen, hanna.suominen@nicta.com) if you require further assistance. Thank you for participating!

Evaluation Methods

We challenge the participants to maximise the number of correctly detected words. This correctness is evaluated using the number of correct, deleted, inserted, and substituted words. The official measure is the error rate percentage (Err, see below), as defined by the Speech Recognition Scoring Toolkit (SCTK), Version 2.4.0, available here. Usage guidelines are available here. We also provide some helpful tips below.
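For clarity, Err follows the standard word error rate definition. With C correct, S substituted, D deleted, and I inserted words, scored against a reference of N = C + S + D words,

Err = (S + D + I) / N x 100    and    Corr = C / N x 100.

Because insertions are counted in the numerator but not in N, Err can exceed 100. The percentages in the sys file below satisfy these identities; for example, for speaker 100, Err = 30.9 + 7.3 + 30.9 = 69.1.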

Running SCTK:

  1. Input: We preprocess the files by removing extra new lines (generated by the speech recognition engine) and removing punctuation. We use the default encoding (-e) for the extended ASCII. We use the hypothesis file (-h) format trn. We use the default format (i.e., trn) for the reference file (-r).
  2. Alignment: We use the option -d for the GNU diff alignment.
  3. Output: We can choose the option -o to get all output files.
  4. Testing that the package works OK: run the following command under the folder where you installed the package (i.e., under mypath/sctk-2.4.0): bin/sclite -r STT_reference_trn_ASCII.txt -h STT_reference_trn_ASCII.txt trn -i spu_id -o all (comparing the reference with itself should report 0.0 Err)
  5. Real evaluation: bin/sclite -r reference_trn_ASCII.txt -h myoutput_trn_ASCII.txt -i spu_id -o all
  6. You can experiment with this command on the training set by using the reference file and the output file from the Dragon software with the nursing vocabulary. Teams need to submit their test set results in this SCTK reference file format. This applies to solutions for the a) speech recognition engine, b) post-processing, and c) combined a&b tracks.
  7. This should result in the following document-specific and overall evaluation results as a sys file:
    
                     SYSTEM SUMMARY PERCENTAGES by SPEAKER                      


       ,----------------------------------------------------------------.
       |                          nursing.txt                           |
       |----------------------------------------------------------------|
       | SPKR   | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
       |--------+-------------+-----------------------------------------|
       | 100    |    1     55 | 61.8   30.9    7.3   30.9   69.1  100.0 |
       |--------+-------------+-----------------------------------------|
       | 10     |    1     84 | 78.6   20.2    1.2   46.4   67.9  100.0 |
       |--------+-------------+-----------------------------------------|


[ . . .]


       |--------+-------------+-----------------------------------------|
       | 99     |    1     65 | 67.7   26.2    6.2   15.4   47.7  100.0 |
       |--------+-------------+-----------------------------------------|
       | 9      |    1     36 | 75.0   22.2    2.8   27.8   52.8  100.0 |
       |================================================================|
       | Sum/Avg|  100   7277 | 72.3   24.1    3.6   28.2   55.9  100.0 |
       |================================================================|
       |  Mean  |  1.0   72.8 | 72.5   23.9    3.5   30.3   57.8  100.0 |
       |  S.D.  |  0.0   34.2 |  6.6    6.3    2.6   14.9   17.0    0.0 |
       | Median |  1.0   65.5 | 73.7   23.5    3.1   26.7   55.5  100.0 |
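The overall result is on the Sum/Avg row. Assuming the default sclite output naming, where the sys file is written next to the hypothesis file as <hypothesis file>.sys, the overall row can be extracted as follows:

#!/bin/bash
# Print the overall (Sum/Avg) row of the sclite system summary.
grep 'Sum/Avg' myoutput_trn_ASCII.txt.sys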

An example script for removing punctuation and changing to ASCII:

#!/bin/bash
# Convert each .txt file into a single SCTK trn line of the form
# "text (fileid_V)" and concatenate all lines into one ASCII reference file.
for file in *.txt
do
  # Transliterate to ASCII, dropping characters that cannot be mapped.
  iconv -f UTF8 -t ASCII//TRANSLIT//IGNORE "$file" > "$file.mod.txt"
  # Replace punctuation and line breaks with spaces.
  tr '[:punct:]\n\r' ' ' < "$file.mod.txt" > "$file"
  # Append the trn utterance ID " (fileid_V)" built from the file name.
  echo " ($file" | cut -d '.' -f 1 | tee -a "$file"
  echo "_V)" | tee -a "$file"
  # Collapse the document onto a single line.
  tr -d '\n\r' < "$file" > "$file.mod.txt"
  cp "$file.mod.txt" "$file"
  rm "$file.mod.txt"
done

# Concatenate the one-line documents and re-insert a line break after each
# closing "_V)" marker (GNU sed; tr cannot insert a string here).
cat *.txt > output.txt
iconv -f UTF8 -t ASCII//TRANSLIT//IGNORE output.txt > nopunctreference_trn_ASCII.txt
sed 's/_V)/_V)\n/g' nopunctreference_trn_ASCII.txt > output.txt
cp output.txt nopunctreference_trn_ASCII.txt
rm output.txt

# Move the result next to the SCTK installation; adjust the destination
# to your own sctk-2.4.0 folder.
mv nopunctreference_trn_ASCII.txt /Users/hannasuominen/Downloads/sctk-2.4.0/nopunctreference_trn_ASCII.txt