Hedge Classification

Please cite Medlock and Briscoe (2007) when using any of the material on this page.

Annotation Guidelines

The annotation guidelines (with relevant examples) described in the paper "Weakly Supervised Learning for Hedge Classification in Scientific Literature" (Medlock and Briscoe 2007) can be downloaded here.

Data

The hedge classification data is provided in three formats:
- Tokenized (.tok extension)
- Tokenized & Stemmed (.stm extension)
- Tokenized & Stemmed + Bigrams (.bgm extension)

The following files are available in all formats:
- spec_seeds / nspec_seeds: Seed data for the spec and nspec classes
- spec_test / nspec_test: Test data for the spec and nspec classes
- pool: Unlabeled pool

The following files are additionally available, in the .stm and .bgm formats only:
- spec_train / nspec_train: Training data sets automatically induced using the probabilistic acquisition model of Medlock and Briscoe (2007)
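
The availability rules above can be summarised in a short Python sketch. It assumes that each file is named by joining the base name and the format extension with a dot (for example spec_seeds.tok); this page does not state that naming scheme, so check it against the extracted archives. The function name expected_files is hypothetical.

```python
# Formats and base names as listed on this page.
FORMATS = ("tok", "stm", "bgm")
COMMON_FILES = ("spec_seeds", "nspec_seeds", "spec_test", "nspec_test", "pool")
TRAIN_FILES = ("spec_train", "nspec_train")
TRAIN_FORMATS = ("stm", "bgm")  # training sets are not provided as .tok


def expected_files(fmt):
    """Return the file names expected for one format.

    ASSUMPTION: files are named "<base name>.<extension>"; the page
    only gives the base names and the extensions separately.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    names = list(COMMON_FILES)
    if fmt in TRAIN_FORMATS:
        names.extend(TRAIN_FILES)
    return [f"{name}.{fmt}" for name in names]


if __name__ == "__main__":
    for fmt in FORMATS:
        print(fmt, expected_files(fmt))
```

Under these assumptions the tokenized format yields five files, while the stemmed and bigram formats yield seven each.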

Downloads (gzipped tar files):
- Tokenized: tok.tar.gz
- Tokenized & Stemmed: stm.tar.gz
- Tokenized & Stemmed + Bigrams: bgm.tar.gz
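
As a minimal sketch of unpacking one of these archives once it has been downloaded, the following Python uses only the standard library. The function name extract_archive and the destination directory are hypothetical, and the internal layout of the archives is not described on this page, so the function simply lists whatever files it finds after extraction.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a gzipped tar file and return the extracted file paths, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject unsafe members
            # (absolute paths, links pointing outside dest, and so on).
            tar.extractall(dest, filter="data")
        else:
            # Older Python versions: no extraction filter is available,
            # so only use this on archives from a trusted source.
            tar.extractall(dest)
    return sorted(p for p in dest.rglob("*") if p.is_file())


if __name__ == "__main__":
    # ASSUMPTION: tok.tar.gz has already been downloaded to the
    # current directory; "hedge_data/tok" is an arbitrary destination.
    for path in extract_archive("tok.tar.gz", "hedge_data/tok"):
        print(path)
```

The same call should work for stm.tar.gz and bgm.tar.gz, since all three downloads are described as gzipped tar files.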

For related resources, see the Flyslip project page from the University of Cambridge.