Relation Extraction.
1. Relation Extraction
In the last post we briefly introduced Information Extraction and one of its tasks: Named Entity Recognition (NER). This time we will continue: not only extracting entities, but also extracting the relationships among entities: the IS-A relation, the instance-of relation, etc. (more can be found in the WordNet thesaurus). For example, after we extract entities from a company report, we get Company/Location/Date; to build a more advanced knowledge structure, we focus on relation triples like Company-Founding, e.g. Founding-year(IBM, 1911) and Founding-location(IBM, New York).
Why Relation Extraction?
- create new structured knowledge bases, useful for downstream applications;
- augment current knowledge bases;
- support Question-Answering systems.
Two resources:
- Automated Content Extraction (ACE) gives 17 relations from the 2008 “Relation Extraction Task”; these come with specific guidelines for what to extract;
- Unified Medical Language System (UMLS) specifies 54 relations among 134 entity types.
To extract these relations, we can use hand-written patterns, supervised learning, semi-supervised learning, or unsupervised learning.
2. Hand-Written Patterns
First let’s see the simplest one - the IS-A relation. The early intuition comes from Hearst (1992):
- Y such as X ((, X)* (, and|or) X);
- such Y as X;
- X or other Y;
- X and other Y;
- Y including X;
- Y, especially X.
There are more relations like Located-in, Founded, Cures, etc. Named Entities are also helpful when extracting relations.
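The Hearst patterns above can be sketched with regular expressions. This is a minimal illustration, not a full implementation: noun phrases are simplified to capitalized word sequences (a real system would use an NP chunker), and only two of the patterns are shown.

```python
import re

# Crude NP placeholder: one or more capitalized words.
# A real system would use a noun-phrase chunker instead.
NP = r"[A-Z][a-zA-Z]+(?: [A-Z][a-zA-Z]+)*"

# Two Hearst patterns, each paired with a function that orders the
# match groups as (hyponym X, hypernym Y).
PATTERNS = [
    (re.compile(rf"({NP}) such as ({NP})"),
     lambda m: (m.group(2), m.group(1))),   # "Y such as X"
    (re.compile(rf"({NP}) and other ({NP})"),
     lambda m: (m.group(1), m.group(2))),   # "X and other Y"
]

def extract_isa(sentence):
    """Return (X, Y) pairs where the patterns suggest X IS-A Y."""
    pairs = []
    for pattern, order in PATTERNS:
        for m in pattern.finditer(sentence):
            pairs.append(order(m))
    return pairs
```

For example, `extract_isa("Composers such as Mozart are studied.")` yields the pair `("Mozart", "Composers")`, i.e. Mozart IS-A composer.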
Advantages:
- hand-written rules tend to be high-precision;
- can be tailored to specific domains.
Disadvantages:
- hand-written rules often have low recall;
- writing them is time-consuming work.
3. Supervised Relation Extraction
The basic task for the classifier is to decide whether any two entities are related. The specific steps are as follows:
- choose a set of relations we’d like to extract;
- choose a set of relevant named entities;
- find and label data: choose a corpus, label the named entities in the corpus, hand-label the relations between these entities, then split into training and test sets;
- train a classifier on training set.
For features, we could extract word-based features (words before/after the target entities, words between target entities), entity-based features (POS tags of entities) and syntactic features (constituent path, base syntactic chunk path, typed-dependency path), etc.
Advantages:
- can get high accuracy, given enough training data and test data similar to the training data.
Disadvantages:
- labeling large training data is expensive;
- classifier may not generalize well to different genres.
4. Semi-Supervised and Unsupervised Relation Extraction
When we have no labeled data, or even no training data at all, we may still have a few seed tuples or a few high-precision patterns.
In that case we can use bootstrapping: use the seeds to directly learn to populate a relation. First gather a set of seed pairs that have relation \(R\), then iterate:
- find sentences with these pairs;
- look at the context between or around the pair and generalize the context to create patterns;
- use the patterns to grep for more pairs.
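The loop above can be sketched in a few lines. This is a deliberately naive version: "generalizing the context" here just means taking the literal string between the two entities, and new pairs are read off as the single words adjacent to a matched pattern; a real system scores patterns and filters noisy ones.

```python
def bootstrap(sentences, seeds, rounds=2):
    """Naive bootstrapping sketch: grow a set of related pairs
    from seed pairs by harvesting and reapplying contexts."""
    pairs = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # 1. find sentences containing a known pair and harvest
        #    the context between the two mentions as a "pattern"
        for s in sentences:
            for x, y in list(pairs):
                if x in s and y in s and s.index(x) < s.index(y):
                    between = s[s.index(x) + len(x):s.index(y)].strip()
                    patterns.add(between)
        # 2. use the patterns to grep for new pairs: take the word
        #    right before and right after each pattern occurrence
        for s in sentences:
            for p in patterns:
                if p and p in s:
                    left, _, right = s.partition(p)
                    x = left.strip().split()[-1] if left.strip() else None
                    y = right.strip().split()[0] if right.strip() else None
                    if x and y:
                        pairs.add((x, y.strip(".,")))
    return pairs, patterns
```

For example, starting from the seed pair ("IBM", "Armonk") and the sentences "IBM is headquartered in Armonk." and "Microsoft is headquartered in Redmond.", the harvested pattern "is headquartered in" lets the second round discover the new pair ("Microsoft", "Redmond").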
There are also more advanced methods, such as Distant Supervision.