Parts of Speech and POS Tagging
1. Parts of Speech
Parts of speech are also known as lexical categories or word classes. They fall into two broad classes:
- Open class words
  - Nouns (Proper / Common): IBM, Italy / cat, snow
  - Verbs (Main): see, register
  - Adjectives: old, older, oldest
  - Adverbs: slowly
  - …
- Closed class words
  - Determiners: the, some
  - Conjunctions: and, or
  - Pronouns: he, it
  - Verbs (Modals and auxiliaries): can, had
  - Prepositions: to, with
  - Particles: off, up
  - Interjections: ow, eh
  - …
2. POS Tagging
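POS tagging is the task of determining the tag for a particular instance of a word in context. Words often have more than one possible POS; for example, "back" can be an adjective (the back door), a noun (on my back), or a verb (to back the bill). See the Penn Treebank tag set for a standard inventory of tags.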
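Use cases:
- text-to-speech (e.g., "record" is pronounced differently as a noun and as a verb)
- pattern matching with regular expressions over tag sequences (e.g., quick-and-dirty noun-phrase detection)
- as input to, or to speed up, a full parser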
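One way to use tags with regular expressions is quick-and-dirty noun-phrase detection. The sketch below is illustrative only: the tagged sentence is made up, and mapping each Penn Treebank tag to a single symbol (D, J, N, O) is a simplifying assumption so that string offsets line up with token positions. The pattern `D?J*N+` means an optional determiner, any number of adjectives, then one or more common nouns.

```python
import re

def tags_to_symbols(tags):
    """Map each Penn Treebank tag to one symbol so string indices equal token indices."""
    symbols = []
    for tag in tags:
        if tag == "DT":
            symbols.append("D")
        elif tag.startswith("JJ"):      # JJ, JJR, JJS
            symbols.append("J")
        elif tag in ("NN", "NNS"):      # common nouns, singular and plural
            symbols.append("N")
        else:
            symbols.append("O")         # any other tag
    return "".join(symbols)

# Optional determiner, any number of adjectives, then one or more nouns.
NP_PATTERN = re.compile(r"D?J*N+")

def find_noun_phrases(tagged):
    """Return the word spans whose tag sequence matches NP_PATTERN."""
    words = [w for w, _ in tagged]
    symbols = tags_to_symbols([t for _, t in tagged])
    return [" ".join(words[m.start():m.end()]) for m in NP_PATTERN.finditer(symbols)]

# Hypothetical tagged sentence: (word, tag) pairs.
tagged = [("the", "DT"), ("old", "JJ"), ("red", "JJ"), ("car", "NN"),
          ("hit", "VBD"), ("a", "DT"), ("lamp", "NN"), ("post", "NN")]
print(find_noun_phrases(tagged))  # ['the old red car', 'a lamp post']
```

Because the pattern only sees tags, it works for any words the tagger has labeled, which is why even an imperfect tagger makes this kind of shallow extraction cheap.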
State-of-the-art POS taggers reach about 97% per-token accuracy. A simple baseline tags every word with its most frequent tag and tags unknown words as nouns.
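The most-frequent-tag baseline can be sketched as follows, assuming training data is a list of sentences of (word, tag) pairs. The toy corpus is made up, and the choice of "NN" as the tag for unknown words is an assumption that follows the Penn Treebank convention for singular common nouns.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map each training word to the tag it most often receives."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_baseline(words, most_frequent, unknown_tag="NN"):
    """Tag known words with their most frequent tag and unknown words as nouns."""
    return [(w, most_frequent.get(w, unknown_tag)) for w in words]

def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(p == g for (_, p), (_, g) in zip(predicted, gold))
    return correct / len(gold)

# Tiny made-up training data: "saw" is a verb twice and a noun once.
train = [
    [("I", "PRP"), ("saw", "VBD"), ("the", "DT"), ("saw", "NN")],
    [("they", "PRP"), ("saw", "VBD"), ("a", "DT"), ("dog", "NN")],
]
model = train_baseline(train)
predicted = tag_baseline(["the", "saw", "barked"], model)
print(predicted)  # [('the', 'DT'), ('saw', 'VBD'), ('barked', 'NN')]
gold = [("the", "DT"), ("saw", "NN"), ("barked", "VBD")]
print(accuracy(predicted, gold))  # 0.3333333333333333
```

Both errors are typical of this baseline: it ignores context, so "saw" after "the" still gets its majority verb tag, and the unknown-word rule mislabels the unseen verb "barked".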
Sources of information for POS tagging:
- knowledge of neighboring words (context)
- knowledge of each word's POS probabilities (e.g., how often a given word is a noun versus a verb)
- features (prefixes, suffixes, capitalization, word shapes) fed into a classifier (e.g., a maximum entropy model) or a sequence model (e.g., an HMM)
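The first two sources combine naturally in a hidden Markov model: transition probabilities P(tag_i | tag_{i-1}) capture context, and emission probabilities P(word | tag) capture each word's POS probabilities. Below is a minimal Viterbi decoding sketch. The tag set and all probabilities are made up for illustration rather than estimated from a corpus, and the fixed `unk` emission probability for unseen word/tag pairs is a simplifying assumption (real taggers usually handle unknown words with suffix or shape features).

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Most probable tag sequence under a first-order HMM, computed in log space."""
    # best[i][t]: best log-probability of any tag sequence for words[:i+1] ending in t
    best = [{t: math.log(start_p[t]) + math.log(emit_p[t].get(words[0], unk)) for t in tags}]
    back = [{t: None for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Choose the previous tag that maximizes the path score into t.
            prev, score = max(
                ((s, best[i - 1][s] + math.log(trans_p[s][t])) for s in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + math.log(emit_p[t].get(words[i], unk))
            back[i][t] = prev
    # Trace back from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy HMM (made-up numbers): "saw" is more often a verb than a noun.
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
trans_p = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
emit_p = {
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"man": 0.4, "dog": 0.4, "saw": 0.2},
    "VB": {"saw": 0.7, "barks": 0.3},
}
print(viterbi(["the", "man", "saw", "the", "saw"], tags, start_p, trans_p, emit_p))
# ['DT', 'NN', 'VB', 'DT', 'NN']
```

The second "saw" comes out as a noun even though its emission probability favors the verb tag, because the transition DT → NN outweighs it; this is exactly the contextual evidence the most-frequent-tag baseline lacks.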
An interesting observation: a straight (non-sequence) classifier that uses only word features works as well as a basic sequence model.
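A straight classifier tags each token independently from features of the words alone, with no previously predicted tags. The sketch below shows one possible feature extractor; the exact feature set (lowercased word, 3-character prefix and suffix, capitalization, collapsed word shape, neighboring words) and the `token_features` name are assumptions for illustration. The resulting feature dictionaries could be fed to any per-token classifier, such as a maximum entropy (logistic regression) model.

```python
import re

def word_shape(word):
    """Map characters to classes: uppercase -> X, lowercase -> x, digit -> d; keep others."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    return re.sub(r"[0-9]", "d", shape)

def short_shape(word):
    """Collapse runs of the same shape character, e.g. 'Xxxxx' -> 'Xx'."""
    return re.sub(r"(.)\1+", r"\1", word_shape(word))

def token_features(words, i):
    """Hypothetical features for words[i]; uses words only, never predicted tags."""
    w = words[i]
    return {
        "word=" + w.lower(): 1,
        "prefix3=" + w[:3].lower(): 1,
        "suffix3=" + w[-3:].lower(): 1,
        "is_capitalized": int(w[:1].isupper()),
        "shape=" + short_shape(w): 1,
        "prev_word=" + (words[i - 1].lower() if i > 0 else "<s>"): 1,
        "next_word=" + (words[i + 1].lower() if i + 1 < len(words) else "</s>"): 1,
    }

print(word_shape("IBM-2024"))   # XXX-dddd
print(short_shape("Italy"))     # Xx
print(token_features(["I", "saw", "Italy"], 2))
```

Shape and suffix features are what let such a classifier guess sensible tags for unknown words, for example a capitalized unseen word as a proper noun or a word ending in "-ly" as an adverb.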