DNA methylation corpus


DNA methylation is a key mechanism of epigenetic control of gene expression and is implicated in many cancers, but until recently there had been little study of automatic information extraction for DNA methylation. To explore the opportunities for automatically extracting information on DNA methylation from the literature, we annotated a corpus of relevant documents using the GENIA event representation.

This corpus was produced in part as a preparatory study for the organization of the BioNLP Shared Task 2011 Epigenetics and Post-translational Modifications (EPI) task. The EPI corpus annotations include a larger and more comprehensive set of annotations for associated events.

Corpus format

The DNA methylation corpus is distributed in the BioNLP Shared Task-flavored standoff format.
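
As a rough illustration of what reading such standoff annotations can look like, the following Python sketch parses text-bound ("T") and event ("E") lines in the tab-separated layout used by the BioNLP Shared Task. The sample lines, the type names (Protein, Entity, DNA_methylation), the role names (Theme, Site), and the helper parse_standoff are assumptions made for this example; they are not taken from the corpus files, so check them against the distributed data before relying on them.

```python
from dataclasses import dataclass, field


@dataclass
class TextBound:
    id: str
    type: str
    start: int  # character offset into the document text (inclusive)
    end: int    # character offset into the document text (exclusive)
    text: str


@dataclass
class Event:
    id: str
    type: str
    trigger: str                               # ID of the trigger annotation
    args: list = field(default_factory=list)   # (role, target ID) pairs


def parse_standoff(lines):
    """Parse text-bound (T) and event (E) lines; other line types are skipped."""
    entities, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # Layout assumed here: ID <TAB> TYPE START END <TAB> TEXT
            ann_type, start, end = fields[1].split(" ")
            entities[ann_id] = TextBound(ann_id, ann_type, int(start), int(end), fields[2])
        elif ann_id.startswith("E"):
            # Layout assumed here: ID <TAB> TYPE:TRIGGER ROLE:ARG ROLE:ARG ...
            head, *rest = fields[1].split()
            ev_type, trigger = head.split(":")
            args = [tuple(a.split(":")) for a in rest]
            events[ann_id] = Event(ann_id, ev_type, trigger, args)
    return entities, events


# Invented sample for the text "p16 promoter methylation" (not from the corpus).
SAMPLE = [
    "T1\tProtein 0 3\tp16",
    "T2\tEntity 4 12\tpromoter",
    "T3\tDNA_methylation 13 24\tmethylation",
    "E1\tDNA_methylation:T3 Theme:T1 Site:T2",
]

entities, events = parse_standoff(SAMPLE)
for ev in events.values():
    print(ev.type, entities[ev.trigger].text, ev.args)
```

Keeping annotations in dictionaries keyed by ID makes it straightforward to resolve an event's trigger and arguments back to their text spans.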

Annotation guidelines

The DNA methylation corpus is annotated following the GENIA Event corpus annotation guidelines:
  • Tomoko Ohta, Jin-Dong Kim and Jun’ichi Tsujii, Guidelines for event annotation, University of Tokyo Technical Report, 2007.