T4SS event corpus


Type IV secretion systems (T4SS) are mechanisms for transferring DNA and proteins across cellular boundaries. T4SS are found in a broad range of Bacteria and in some Archaea. Because these systems enable gene transfer across cellular membranes, they contribute to the spread of antibiotic resistance and virulence genes, which makes them especially important in infectious disease research. To explore how structured event extraction from text can support a better understanding of these systems, we annotated a corpus of T4SS-relevant documents using the GENIA event representation.

This corpus was produced in part as a preparatory study for the organization of the BioNLP Shared Task 2011 Infectious Diseases (ID) task. The ID corpus provides a larger and more comprehensive set of annotations for associated events.

Corpus format

The corpus is distributed in the GENIA Event corpus XML format.
  • Kim, Jin-Dong, Tomoko Ohta, Yuka Tateisi and Jun'ichi Tsujii. GENIA Corpus Manual - Encoding schemes for the corpus and annotation. Technical Report (TR-NLP-UT-2006-1). Tsujii Laboratory, University of Tokyo, 2006.
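
Before writing a parser for the distributed files, it can help to see which XML elements they actually contain. The following is a minimal sketch that makes no assumptions about the GENIA element names and simply counts every tag it encounters; the function name count_tags and the file name in the usage note are hypothetical and not part of the corpus distribution.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Return a Counter mapping each XML element tag to its frequency.

    `source` may be a file path or a binary file object. No element names
    are assumed; the function reports whatever tags the file contains.
    """
    counts = Counter()
    # iterparse streams the document, so a large corpus file does not
    # need to be held in memory all at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        # Child elements have already been counted by the time the
        # parent's "end" event fires, so the subtree can be released.
        elem.clear()
    return counts
```

Calling count_tags("t4ss_corpus.xml") (a placeholder path) and printing the result of most_common() gives a quick inventory of the element types to check against the encoding schemes described in the GENIA Corpus Manual.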

Annotation guidelines

The corpus is annotated following the GENIA Event corpus annotation guidelines, adapted as described in "Towards Event Extraction from Full Texts on Infectious Diseases".

  • Tomoko Ohta, Jin-Dong Kim and Jun'ichi Tsujii. Guidelines for event annotation. University of Tokyo Technical Report, 2007.