Term annotation

Overview

Identifying linguistic expressions that refer to entities of interest in molecular biology, such as proteins, genes and cells, is a fundamental task in biomolecular text mining. The GENIA term annotation identifies physical biological entities as well as other important terms. It covers all 1,999 abstracts of the primary GENIA corpus.

Example


Corpus format

The GENIA Term corpus is available in an XML format described in the GENIA Corpus Manual.
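As a rough illustration of how term annotations can be read from XML, here is a minimal Python sketch. It assumes markup in the style of the GENIA 3.02 release, where each term is a `cons` element whose `lex` attribute holds a normalized form and whose `sem` attribute holds a GENIA ontology class, and where terms can nest. These element and attribute names are assumptions for illustration and should be checked against the GENIA Corpus Manual. The `SAMPLE` text and the `extract_terms` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature sample. ASSUMPTION: GENIA 3.02-style markup, where
# <cons> marks a term, "lex" is its normalized form and "sem" its ontology class.
SAMPLE = """<set>
<article>
<abstract>
<sentence><cons lex="IL-2_gene" sem="G#DNA_domain_or_region"><cons lex="IL-2" sem="G#protein_molecule">IL-2</cons> gene</cons> expression in <cons lex="T_cell" sem="G#cell_type">T cells</cons>.</sentence>
</abstract>
</article>
</set>"""


def extract_terms(root):
    """Yield (surface text, lex, sem) for every <cons> element, nested ones included.

    root.iter() walks the tree in document order, so an outer term is yielded
    before the terms nested inside it.
    """
    for cons in root.iter("cons"):
        # itertext() joins the text of the element and all of its descendants.
        surface = "".join(cons.itertext())
        yield surface, cons.get("lex"), cons.get("sem")


root = ET.fromstring(SAMPLE)
terms = list(extract_terms(root))
for surface, lex, sem in terms:
    print(f"{surface!r:12} lex={lex} sem={sem}")

# Count how often each semantic class occurs in the sample.
print(Counter(sem for _, _, sem in terms))
```

Using `root.iter("cons")` keeps nested terms such as "IL-2" inside "IL-2 gene" as separate entries. In the real corpus, some terms (for example, coordinated expressions) may carry compound or absent `sem` values; `cons.get` simply returns `None` when an attribute is missing.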

Major applications

Documentation

Encoding scheme

  • Kim, Jin-Dong, Tomoko Ohta, Yuka Tateisi and Jun'ichi Tsujii. GENIA Corpus Manual - Encoding schemes for the corpus and annotation. Technical Report (TR-NLP-UT-2006-1). Tsujii Laboratory, University of Tokyo, 2006.

Annotation guidelines

  • Kim, Jin-Dong, Tomoko Ohta, Yuka Tateisi and Jun'ichi Tsujii. GENIA Ontology. Technical Report (TR-NLP-UT-2006-2). Tsujii Laboratory, University of Tokyo, 2006.
  • Kim, Jin-Dong and Jun'ichi Tsujii. GENIA Corpus Curation Framework. Technical Report (TR-NLP-UT-2006-3). Tsujii Laboratory, University of Tokyo, 2006.

Publications

  • Ohta, Tomoko, Yuka Tateisi, Hideki Mima and Jun'ichi Tsujii. The GENIA Corpus: an Annotated Research Abstract Corpus in Molecular Biology Domain. In the Proceedings of the Human Language Technology Conference (HLT 2002). San Diego, USA, March 2002.
  • Kim, Jin-Dong, Tomoko Ohta, Yuka Tateisi and Jun'ichi Tsujii. GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics. 19(suppl. 1). pp. i180-i182, Oxford University Press, 2003. ISSN 1367-4803.
  • Kim, Jin-Dong, Tomoko Ohta, Yoshimasa Tsuruoka, Yuka Tateisi and Nigel Collier. Introduction to the Bio-Entity Recognition Task at JNLPBA. In the Proceedings of the International Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-04). Geneva, Switzerland, pp. 70-75, 2004.

Download

Acknowledgments

Tomoko Ohta: GENIA term corpus annotation coordinator