Meta-knowledge corpus


The meta-knowledge corpus is an extension of the GENIA Event corpus annotation, created by the National Centre for Text Mining (NaCTeM).

The meta-knowledge corpus adds an extra level of annotation to the GENIA Event corpus, which encodes the interpretation of each event according to its textual context.

To illustrate the type of annotation performed, consider the examples below. In each sentence, the event (triggered by the verb activate) and its participants (narL gene product as the CAUSE and nitrate reductase operon as the THEME) are identical, although the way in which the event should be interpreted is different in each case.
  1. It is known that the narL gene product activates the nitrate reductase operon
  2. We examined whether the narL gene product activates the nitrate reductase operon
  3. The narL gene product did not activate the nitrate reductase operon
  4. These results suggest that the narL gene product is activated by the nitrate reductase operon
  5. The narL gene product partially activated the nitrate reductase operon
  6. Previous studies have shown that the narL gene product activates the nitrate reductase operon

In sentence 1), the word known tells us that the event is a generally accepted fact, while in 2), the interpretation is completely different: the word examined denotes that the event is under investigation, and hence the truth value of the event is unknown. The presence of the word not in sentence 3) shows that the event is negated, i.e. it did not happen. In sentence 4), the verb suggest, together with its subject, adds speculation regarding the truth of the event. The word partially in sentence 5) does not challenge the truth of the event, but rather conveys that the strength or intensity of the event is less than may be expected by default. Finally, the phrase previous studies in sentence 6) shows that the event is based on information available in previously published papers, rather than relating to new information from the current study.
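The separation between an event and its interpretation can be sketched in a few lines of Python. This is only an illustration of the idea described above: the class names, field names and the wording of each reading are assumptions made for this example, and they are not the labels or data model used in the corpus itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """The core event, which is identical in all six example sentences."""
    trigger: str
    cause: str
    theme: str


@dataclass(frozen=True)
class Interpretation:
    """How one sentence asks us to read the event (hypothetical structure)."""
    clue: str      # the word or phrase in the sentence that signals the reading
    reading: str   # informal description, not an official corpus label


EVENT = Event("activate", "narL gene product", "nitrate reductase operon")

# Keys are the sentence numbers from the list of examples.
INTERPRETATIONS = {
    1: Interpretation("known", "generally accepted fact"),
    2: Interpretation("examined", "under investigation, truth value unknown"),
    3: Interpretation("not", "negated"),
    4: Interpretation("suggest", "speculated"),
    5: Interpretation("partially", "reduced strength or intensity"),
    6: Interpretation("previous studies", "reported in earlier publications"),
}


def describe(sentence_no: int) -> str:
    """Render the shared event together with one sentence's interpretation."""
    interp = INTERPRETATIONS[sentence_no]
    return (
        f"{EVENT.cause} -[{EVENT.trigger}]-> {EVENT.theme}: "
        f"{interp.reading} (clue: '{interp.clue}')"
    )
```

Calling `describe(n)` for each sentence number gives six different readings of one and the same `EVENT` object, which is the distinction the extra annotation level is meant to capture.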


The annotations starting with "META" correspond to the meta-knowledge annotations.

Corpus format

The meta-knowledge corpus is available in XML format. The format is the same as that of the GENIA Event corpus, with a few small additions to allow meta-knowledge values and clue phrases to be annotated.
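As a rough sketch of how meta-knowledge values and clue phrases attached to events might be read from XML, the Python snippet below parses a small invented fragment with the standard library. The element and attribute names (`event`, `meta`, `name`, `value`, `cue`) are placeholders chosen for this example and should not be taken as the corpus's actual schema; consult the corpus files and the annotation guidelines for the real markup.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration only; NOT the real corpus markup.
SAMPLE = """
<sentence>
  <event id="E1">
    <meta name="polarity" value="negative" cue="not"/>
  </event>
</sentence>
"""


def meta_values(xml_text: str) -> dict:
    """Map each event id to {dimension name: (value, clue phrase)}."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for event in root.iter("event"):
        result[event.get("id")] = {
            meta.get("name"): (meta.get("value"), meta.get("cue"))
            for meta in event.findall("meta")
        }
    return result
```

With the real corpus, only the element and attribute names passed to `iter`, `findall` and `get` would need to change to match the published format.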


Annotation guidelines

Meta-Knowledge Annotation of Bio-Events: Annotation Guidelines.  Paul Thompson, Raheel Nawaz, John McNaught and Sophia Ananiadou. University of Manchester Technical Report. 2010.


Thompson, P., Nawaz, R., McNaught, J. and Ananiadou, S. Enriching a biomedical event corpus with meta-knowledge annotation. BMC Bioinformatics 2011, 12:393 (Open Access; Highly accessed)


Please see the meta-knowledge corpus home page to download the corpus and annotation guidelines, and for more detailed information about the corpus.


Paul Thompson: Annotation coordination and scheme design
Raheel Nawaz: Scheme design and annotation analysis
Sophia Ananiadou: Principal investigator
John McNaught: Co-investigator
Maria Aretoulaki: Annotation
Syed Amir Iqbal: Annotation