Bangda Sun

Practice makes perfect

Stanford NLP (coursera) Notes (11) - POS Tagging

Part-of-Speech and Tagging.

1. Parts of Speech

Parts of speech are also known as lexical categories or word classes. They fall into two groups:

  • Open class words
    • Nouns (Proper / Common): IBM, Italy
    • Verbs (Main): see, register
    • Adjectives: old, older, oldest
    • Adverbs: slowly
  • Closed class words
    • Determiners: the, some
    • Conjunctions: and, or
    • Pronouns: he, it
    • Verbs (Modals): can, had
    • Prepositions: to, with
    • Particles: off, up
    • Interjections: ow, eh

2. POS Tagging

POS tagging determines the tag for a particular instance of a word in its context. Words often have more than one possible POS; for example, "can" may be a modal verb or a noun. See the Penn Treebank tag set for more.

Use cases:

  • Text-to-speech (pronunciation can depend on the POS)
  • Writing regexps over the tagged output
  • As input to a full parser, or to speed one up
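One way to read the regexp use case is sketched below: match a pattern such as "optional determiner, any adjectives, one or more nouns" over the tag sequence to pull out simple noun phrases. The function names, the one-letter tag encoding, and the pattern itself are illustrative assumptions, not something taken from the course.

```python
import re

def coarse(tag):
    """Map a Penn Treebank tag to a one-letter code (hypothetical encoding)."""
    if tag == "DT":
        return "D"          # determiner
    if tag.startswith("JJ"):
        return "A"          # adjective (JJ, JJR, JJS)
    if tag.startswith("NN"):
        return "N"          # noun (NN, NNS, NNP, NNPS)
    return "O"              # anything else

def find_noun_phrases(tagged):
    """Return simple noun phrases from a list of (word, tag) pairs."""
    # One character per token, so regex offsets are also token offsets.
    codes = "".join(coarse(tag) for _, tag in tagged)
    phrases = []
    for match in re.finditer(r"D?A*N+", codes):
        words = [word for word, _ in tagged[match.start():match.end()]]
        phrases.append(" ".join(words))
    return phrases

tagged_sentence = [
    ("the", "DT"), ("old", "JJ"), ("dog", "NN"),
    ("saw", "VBD"), ("Italy", "NNP"),
]
phrases = find_noun_phrases(tagged_sentence)
print(phrases)
```

Encoding each token as a single character keeps the pattern short and lets the match positions index directly into the token list.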

The best current POS taggers reach about 97% accuracy. A simple baseline model tags every word with its most frequent tag and tags unknown words as nouns.
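The baseline just described can be sketched in a few lines. This is a minimal illustration on a made-up toy corpus; the function names, the lowercasing step, and the choice of "NN" as the unknown-word tag are assumptions for the example.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Learn the most frequent tag for each word seen in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}

def tag_baseline(words, most_frequent_tag, unknown_tag="NN"):
    """Tag known words with their most frequent tag, unknown words as nouns."""
    return [(word, most_frequent_tag.get(word.lower(), unknown_tag))
            for word in words]

# Toy training data (invented for illustration, Penn Treebank style tags).
toy_corpus = [
    [("the", "DT"), ("dog", "NN"), ("can", "MD"), ("run", "VB")],
    [("the", "DT"), ("can", "NN"), ("is", "VBZ"), ("old", "JJ")],
    [("he", "PRP"), ("can", "MD"), ("see", "VB"), ("the", "DT"), ("dog", "NN")],
]

model = train_baseline(toy_corpus)
tagged = tag_baseline(["the", "cat", "can", "see"], model)
print(tagged)
```

Here "can" is tagged MD because that is its most frequent tag in the toy data, even where it is used as a noun, and the unseen word "cat" falls back to NN. This baseline ignores context entirely, which is the gap the information sources below try to close.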

Sources of information for POS tagging:

  • knowledge of neighboring words (context)
  • knowledge of word POS probabilities
  • word features (prefixes, suffixes, capitalization, word shapes) fed to a classifier (maximum entropy model) or a sequence model (HMM)
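The feature types listed above could be extracted roughly as follows before being passed to a classifier. This is only a sketch: the feature names, the 3-character affix length, the shape encoding, and the "<START>" / "<END>" padding symbols are all assumptions chosen for illustration.

```python
import re

def word_shape(word):
    """Collapse characters into classes: X = uppercase, x = lowercase, d = digit."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return shape

def extract_features(words, i):
    """Build a feature dict for the word at position i in a sentence."""
    word = words[i]
    return {
        "word": word.lower(),
        "prefix3": word[:3].lower(),
        "suffix3": word[-3:].lower(),
        "is_capitalized": word[0].isupper(),
        "shape": word_shape(word),
        # Neighboring words supply the context information.
        "prev_word": words[i - 1].lower() if i > 0 else "<START>",
        "next_word": words[i + 1].lower() if i < len(words) - 1 else "<END>",
    }

sentence = ["IBM", "registered", "slowly"]
features = extract_features(sentence, 1)
print(features)
```

A dict like this can be vectorized and given to any standard classifier. Suffixes and word shape are especially useful for unknown words, where word identity carries no information.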

An interesting observation: a straight classifier that uses only words as features works as well as a basic sequence model.