NLP introduction.
Recently I watched the videos and slides of the Stanford Coursera course on Natural Language Processing by Dan Jurafsky and Christopher Manning. Although CS224d is popular, I still wanted to start with more basic material; neural nets are not everything. Therefore I will spend several posts going through this course. The course materials can be found here, and they will be my main references.
Generally speaking, NLP means using computers / machines to process and understand the natural language used by humans, both spoken and written. With the rapid development of AI, it has become more and more popular. NLP applications are everywhere: question answering (Siri), information extraction (email analysis), sentiment analysis (product reviews on Amazon), machine translation (Google Translate), etc.
One of the main difficulties in NLP is language ambiguity, and it is everywhere: the classic example "I made her duck" can mean either that I cooked a duck for her or that I caused her to lower her head. Taking English as an example, factors like non-standard English, segmentation issues, idioms, neologisms, and tricky entity names all make understanding natural language harder.
Current language tasks can be divided into three categories:
- mostly solved, like spam detection, POS tagging, named entity recognition;
- making good progress, like sentiment analysis, word sense disambiguation, parsing, machine translation and information extraction;
- still really hard, like question answering, paraphrasing, text summarization.
This course mainly focuses on the first two categories; the contents include
- Basic Text Processing
- Minimum Edit Distance
- Language Modeling
- Spelling Correction
- Text Classification
- Sentiment Analysis
- Maximum Entropy Model
- Information Extraction and Named Entity Recognition
- Relation Extraction
- POS Tagging
- Parsing (Probabilistic Parsing, Lexicalized Parsing, Dependency Parsing)
- Information Retrieval
- Semantics
- Question Answering
- Summarization
Next time we will discuss Basic Text Processing and Minimum Edit Distance!
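As a small preview of the Minimum Edit Distance topic, here is a minimal sketch of the Levenshtein distance (the standard dynamic-programming formulation with unit costs; note that some of the course slides instead charge 2 for a substitution):

```python
def edit_distance(s, t):
    """Levenshtein distance: minimum number of insertions,
    deletions, and substitutions needed to turn s into t."""
    m, n = len(s), len(t)
    # prev[j] holds the distance between the current prefix of s and t[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

print(edit_distance("intention", "execution"))  # 5
```

The "intention" / "execution" pair is the example used in the course slides; with unit substitution cost its distance is 5.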