Parsing Introduction.
1. Two Views of Syntactic Structure
In statistical parsing, there are two views of syntactic structure:
- Constituency
Phrase structure organizes words into nested constituents; more intuitively, it segments a sentence with nested brackets.
- Dependency
Dependency structure shows which words depend on (modify or are arguments of) which other words, using dependency arcs.
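To make the two views concrete, here is a minimal Python sketch that encodes one sentence both ways. The sentence, the tag and relation labels, and the helper names (bracket, leaves) are illustrative assumptions, not taken from any particular treebank or library.

```python
# Constituency view: nested tuples of the form (label, child, child, ...).
# A leaf is a plain string (the word itself).
constituency = (
    "S",
    ("NP", ("DT", "the"), ("NN", "cat")),
    ("VP", ("VBD", "chased"), ("NP", ("DT", "a"), ("NN", "mouse"))),
)


def bracket(tree):
    """Render a nested-tuple tree as a bracketed string."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    _, *children = tree
    return [word for child in children for word in leaves(child)]


# Dependency view: one (head, dependent, relation) arc per word.
# Words are numbered from 1; index 0 is an artificial ROOT node.
words = ["the", "cat", "chased", "a", "mouse"]
arcs = [
    (2, 1, "det"),    # cat -> the
    (3, 2, "nsubj"),  # chased -> cat
    (0, 3, "root"),   # ROOT -> chased
    (5, 4, "det"),    # mouse -> a
    (3, 5, "obj"),    # chased -> mouse
]

print(bracket(constituency))
for head, dep, rel in arcs:
    head_word = "ROOT" if head == 0 else words[head - 1]
    print(f"{head_word} --{rel}--> {words[dep - 1]}")
```

Both structures cover the same five words: the constituency tree groups them into nested phrases, while the dependency arcs give every word exactly one head.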
Before statistical parsing models were introduced, classical parsers were based on a symbolic grammar (context-free grammar, CFG) and a lexicon. A major problem was that they scaled very badly and did not give broad coverage.
Proposed solutions include:
- adding categorical constraints to limit unlikely or odd parses
- using statistical parsing to find the most likely parses of a sentence
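As a rough illustration of the statistical idea, the sketch below scores two candidate parses of "saw man with telescope" by multiplying rule probabilities and keeps the more likely one. The grammar, the probabilities, and the names RULE_PROB, rules and tree_prob are all made up for this example; a real parser would estimate probabilities from annotated data and would also score the words themselves.

```python
import math

# Hypothetical rule probabilities (invented for illustration, not estimated
# from data). Probabilities for each left-hand side sum to 1.
RULE_PROB = {
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("N",)): 0.8,
    ("PP", ("P", "NP")): 1.0,
}


def rules(tree):
    """Yield (parent, child_labels) for each node above the word level."""
    label, *children = tree
    if all(isinstance(c, str) for c in children):
        return  # preterminal such as ("N", "man"): word probabilities are ignored here
    yield (label, tuple(c[0] for c in children))
    for child in children:
        yield from rules(child)


def tree_prob(tree):
    """Probability of a tree as the product of its rule probabilities."""
    return math.prod(RULE_PROB[r] for r in rules(tree))


pp = ("PP", ("P", "with"), ("NP", ("N", "telescope")))

# Candidate 1: the PP attaches to the verb (the seeing was done with a telescope).
verb_attach = ("VP", ("V", "saw"), ("NP", ("N", "man")), pp)

# Candidate 2: the PP attaches to the noun (the man has a telescope).
noun_attach = ("VP", ("V", "saw"), ("NP", ("NP", ("N", "man")), pp))

best = max([verb_attach, noun_attach], key=tree_prob)
print("verb attachment" if best is verb_attach else "noun attachment")
```

With these invented numbers the verb attachment scores higher; different probabilities could just as easily reverse the choice, which is why the numbers need to come from data.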
Annotated data, such as treebanks, were built, which brings several benefits:
- reusability of the labor (many parsers and POS taggers can be built on the same data)
- broad coverage
- frequencies and distributional information
- a way to evaluate systems
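The "frequencies and distributional information" benefit can be sketched in a few lines of Python: count how often each rule occurs in a set of annotated trees and turn the counts into relative frequencies. The two-tree toy_treebank and the helper name productions are assumptions made for this example only.

```python
from collections import Counter

# A toy "treebank" of two hand-annotated trees (invented for illustration).
# Trees are nested tuples of the form (label, child, ...); leaves are words.
toy_treebank = [
    ("S", ("NP", ("N", "dogs")), ("VP", ("V", "bark"))),
    ("S", ("NP", ("N", "cats")), ("VP", ("V", "chase"), ("NP", ("N", "mice")))),
]


def productions(tree):
    """Yield (parent, child_labels) for each node above the word level."""
    label, *children = tree
    if all(isinstance(c, str) for c in children):
        return  # skip preterminals such as ("N", "dogs")
    yield (label, tuple(c[0] for c in children))
    for child in children:
        yield from productions(child)


# Raw rule frequencies over the whole treebank.
counts = Counter(rule for tree in toy_treebank for rule in productions(tree))

# Total count per left-hand side, used to normalize.
lhs_totals = Counter()
for (lhs, _), n in counts.items():
    lhs_totals[lhs] += n

# Relative-frequency estimate: count(rule) / count(all rules with the same parent).
probs = {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}: {p:.2f}")
```

The same annotated trees could equally serve as a gold standard for evaluation, which is the reusability point above: the data is annotated once and used for several purposes.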
2. Exponential Problem in Parsing
A key parsing decision is how to “attach” various constituents.