T4SS event corpus


Type IV secretion systems (T4SS) are mechanisms for transferring DNA and proteins across cellular boundaries. T4SS are found in a broad range of Bacteria and in some Archaea. By enabling gene transfer across cellular membranes, these systems contribute to the spread of antibiotic resistance and virulence genes, which makes them an especially important subject of infectious disease research. To explore how structured event extraction from text can support a better understanding of these systems, we annotated a corpus of T4SS-relevant documents using the GENIA event representation.

This corpus was produced in part as a preparatory study for the organization of the BioNLP Shared Task 2011 Infectious Diseases (ID) task. The ID corpus includes a larger and more comprehensive set of annotations for the associated events.

Corpus format

The corpus is distributed in the GENIA Event corpus XML format.
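
As a first look at a distributed XML file, it can help to tally which elements it contains before writing a schema-specific reader. The following is a minimal sketch in Python using only the standard library. It makes no assumption about the GENIA Event corpus schema: the function name count_elements and the sample document are hypothetical, and the element names in the sample are placeholders rather than actual corpus markup.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(source):
    """Count how often each element name occurs in an XML document.

    `source` may be a file path or a binary file-like object.
    """
    counts = Counter()
    # iterparse reads the document incrementally instead of building it all at once
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's children and text once it is counted
    return counts


# Hypothetical miniature document; the element names are placeholders,
# not the actual GENIA Event corpus schema.
sample = b"<doc><sentence><event/><event/></sentence><sentence/></doc>"
counts = count_elements(io.BytesIO(sample))
print(counts.most_common())
```

To apply the same function to the corpus, pass the path of a downloaded corpus file in place of the in-memory sample.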

Annotation guidelines

The corpus is annotated following the GENIA Event corpus annotation guidelines, adapted as described in "Towards Event Extraction from Full Texts on Infectious Diseases".

