BioNLP / JNLPBA Shared Task 2004


This page provides a brief summary of the BioNLP / JNLPBA Shared Task 2004 and links to its resources. Please see also the BioNLP / JNLPBA Shared Task 2004 home page.

The BioNLP / JNLPBA Shared Task 2004 involves the identification and classification of technical terms referring to concepts of interest to biologists in the domain of molecular biology. The task was organized by the GENIA Project, based on the annotations of the GENIA Term corpus (version 3.02).

Corpus format

The JNLPBA corpus is distributed in IOB format, with each line containing a single token and its tag, separated by a tab character. Sentences are separated by blank lines.
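As a rough illustration of the format described above, the following Python sketch reads tab-separated token/tag lines into sentences. The function name `read_iob` and the sample tokens and tags are hypothetical and chosen only for illustration. The sketch also assumes every non-blank line is a token line, so any other lines present in the distributed files (for example, document identifiers) would need separate handling.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]


def read_iob(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences as lists of (token, tag) pairs.

    Assumes one "token<TAB>tag" pair per line and blank lines
    between sentences, as described in the text above.
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        # Split on the last tab so the tag is always the final field.
        token, tag = line.rsplit("\t", 1)
        sentence.append((token, tag))
    # Emit the final sentence when the input lacks a trailing blank line.
    if sentence:
        yield sentence


# Illustrative sample only; tokens and tags are made up for this sketch.
sample = (
    "IL-2\tB-DNA\n"
    "gene\tI-DNA\n"
    "expression\tO\n"
    "\n"
    "NF-kappa\tB-protein\n"
    "B\tI-protein\n"
)

sentences = list(read_iob(sample.splitlines()))
for sent in sentences:
    print(sent)
```

To read a real file, an open file handle can be passed in place of `sample.splitlines()`, since the function only needs an iterable of lines.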


For detailed documentation about the task, see the BioNLP / JNLPBA Shared Task 2004 home page.


  • Jin-Dong Kim, Tomoko Ohta, Yoshimasa Tsuruoka, Yuka Tateisi, and Nigel Collier. (2004). Introduction to the Bio-Entity Recognition Task at JNLPBA. In Proceedings of the International Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-04), pp. 70--75.


  • Training data
    • 2,000 PubMed abstracts with term annotation
  • Evaluation data
    • 404 PubMed abstracts, each provided in two files: one with term annotation and one without. An evaluation tool is also included.
  • Tagging results
    • Tagging results by participating systems. Note that the original submissions have been cleaned (to remove illegal tag sequences) and normalized (to include Medline UIDs).
  • Evaluation tool
    • Updated evaluation tool. Use this tool to obtain evaluation results equivalent to those of the shared task.
