Stanford NLP (Coursera) Notes (11) - POS Tagging | Bangda Sun

Part-of-Speech and Tagging.

1. Parts of Speech

Parts of speech are also known as lexical categories or word classes. They fall into two groups:

Open class words
- Nouns (Proper / Common): IBM, Italy
- Verbs (Main): see, register
- Adjectives: old, older, oldest
- Adverbs: slowly
- …
Closed class words
- Determiners: the, some
- Conjunctions: and, or
- Pronouns: he, it
- Verbs (Modals / Auxiliaries): can, had
- Prepositions: to, with
- Particles: off, up
- Interjections: ow, eh
- …

2. POS Tagging

POS tagging determines the tag for a particular instance of a word. Words often have more than one possible POS, so the tag depends on how the word is used. See the Penn Treebank tagset for more.

Use cases:

- Text-to-speech
- Writing regexps over the tagged output
- As input to, or to speed up, a full parser
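
To make the regexp use case concrete, here is a minimal sketch that matches simple noun phrases over tagger output. The pattern (optional determiner, any adjectives, one or more nouns), the one-letter coding of tags, and the example sentence are illustrative assumptions, not something fixed by the notes; the tags follow Penn Treebank naming.

```python
import re

def coarse_code(tag):
    """Collapse a Penn Treebank-style tag into one letter (hypothetical coding)."""
    if tag == "DT":
        return "D"  # determiner
    if tag.startswith("JJ"):
        return "A"  # adjective
    if tag.startswith("NN"):
        return "N"  # noun
    return "O"      # anything else

# Optional determiner, any number of adjectives, one or more nouns.
NOUN_PHRASE = re.compile(r"D?A*N+")

def find_noun_phrases(tagged):
    """Return the word spans whose tag sequence matches NOUN_PHRASE."""
    # One letter per token, so match offsets are also token indices.
    codes = "".join(coarse_code(tag) for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in NOUN_PHRASE.finditer(codes)
    ]

# Hand-tagged example sentence (made up for illustration).
tagged_sentence = [
    ("the", "DT"), ("old", "JJ"), ("parser", "NN"), ("reads", "VBZ"),
    ("raw", "JJ"), ("input", "NN"), ("slowly", "RB"),
]
phrases = find_noun_phrases(tagged_sentence)
```

Encoding each tag as a single character keeps the regexp short and lets match positions map directly back to tokens.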

The best current POS taggers reach about 97% accuracy. A simple baseline model tags every word with its most frequent tag and tags unknown words as nouns.
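
The baseline described above can be sketched in a few lines. This is only a sketch under stated assumptions: the toy corpus is made up, the function names are hypothetical, and `NN` is assumed as the noun tag for unknown words (Penn Treebank style).

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_baseline(words, most_frequent_tag, unknown_tag="NN"):
    """Tag known words with their most frequent tag, unknown words as nouns."""
    return [(word, most_frequent_tag.get(word, unknown_tag)) for word in words]

# Toy training data (made up for illustration).
toy_corpus = [
    [("the", "DT"), ("back", "NN"), ("hurts", "VBZ")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("promised", "VBD"), ("to", "TO"), ("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
model = train_baseline(toy_corpus)

# "back" is seen as NN twice and VB once; "window" never appears in training.
tagged = tag_baseline(["the", "back", "window"], model)
```

Note that the baseline ignores context entirely: "back" is always tagged `NN` here, even after "to", which is exactly the kind of error that context-aware models are meant to fix.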

Sources of information for POS tagging:

- knowledge of neighboring words (context)
- knowledge of word POS probabilities
- word features (prefixes, suffixes, capitalization, word shapes) used with a classifier (maximum entropy model) or a sequence model (HMM)
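
The kinds of features listed above could be extracted roughly as follows. This is a sketch, not the feature set of any particular tagger: the feature names, the 3-character affix length, the `<s>` / `</s>` boundary markers, and the collapsed word-shape encoding are all assumptions made for illustration. The resulting dictionary is the sort of input one might pass to a maximum entropy classifier.

```python
import re

def word_shape(word):
    """Map letters to X/x and digits to d, then collapse repeated characters."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"\d", "d", shape)
    return re.sub(r"(.)\1+", r"\1", shape)

def extract_features(words, i):
    """Build a feature dictionary for the word at position i in the sentence."""
    word = words[i]
    return {
        "word": word.lower(),
        "prefix3": word[:3].lower(),
        "suffix3": word[-3:].lower(),
        "is_capitalized": word[:1].isupper(),
        "shape": word_shape(word),
        # Neighboring words give the classifier some context.
        "prev_word": words[i - 1].lower() if i > 0 else "<s>",
        "next_word": words[i + 1].lower() if i < len(words) - 1 else "</s>",
    }

sentence = ["Italy", "registered", "35-year", "bonds"]
features = extract_features(sentence, 0)
```

Suffixes and word shapes matter most for unknown words: a token that was never seen in training can still look like a past-tense verb ("-ed") or a number-noun compound (shape `d-x`).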

An interesting observation: using only word-based features in a straight classifier works as well as a basic sequence model.