Shizengengoshorinotameno shinsougakushuu (Neural Network Methods for Natural Language Processing)
336 pages, B5 format
Kyoritsu Shuppan Co., Ltd.
Artificial intelligence has recently been attracting widespread attention. Its basis, the neural network (deep learning), was originally a pattern recognition technology. A typical example of pattern recognition is image recognition, in which a computer identifies what is depicted in a given image (for example, a cat). The achievements that sparked the recent boom came out of this field. Natural language processing, however, an array of techniques for analyzing sentences and texts, is based on frameworks that differ from those of pattern recognition. Some of its techniques, such as document classification (for example, determining whether a given article is about politics or entertainment), share the character of pattern recognition, but others, such as machine translation, differ from it substantially.
Identifying that a cat is depicted in an image and translating “A cat is lying.” into Japanese are significantly different processes. For instance, if the pixels of an image are yellow and brown, a tabby cat is implied. Such colors correspond to the physical attribute of wavelength, which allows them to be compared (for example, yellow and brown are similar in color, whereas red is different). The meaning of the word “cat,” however, is not a physical attribute. The word “cat” by itself does not say whether cats are more like kittens or more like desks. Furthermore, although a person can easily recognize a cat in individual photos, it is difficult to explain the reasoning behind that recognition: both cats and dogs have two ears and four legs. In contrast, the process of translation can be explained to a certain extent: “cat” is “neko” in Japanese, a subject should be followed by a predicate, and so on. Pattern recognition thus targets collections of physical features, in tasks where individual instances are easy to identify but the identification is hard to state as rules. Continuing the earlier example, one cannot write down rules describing what cats look like. Instead, by giving a computer a generalization framework and showing it numerous cats, one can train it to learn the concept of a cat.
Two mechanisms are required to carry out natural language processing (such as machine translation) with pattern recognition technologies. First, the meanings of words must be represented as physical quantities so that they can be compared. This can be achieved by representing a word through the frequencies (physical quantities) of the words that tend to occur around it. Second, a generalization framework is needed for learning the properties of given sentences and texts. Given many pairs of parallel sentences, such a framework can learn what kinds of English sentences occur and how words should be arranged to translate them. Surprisingly, such generalization can produce better results than human-written rules.
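The first mechanism, representing a word by the frequencies of the words around it, can be illustrated with a small sketch. The toy corpus, window size, and function names below are all hypothetical choices for illustration, not anything from the book; the point is only that words appearing in similar contexts (here “cat” and “kitten”) end up with similar count vectors, while words from different contexts (“cat” and “desk”) do not.

```python
from collections import Counter, defaultdict
import math

# Toy corpus (hypothetical): each sentence is a list of words.
corpus = [
    "the cat sat on the mat".split(),
    "the kitten sat on the mat".split(),
    "the desk stood in the office".split(),
]

WINDOW = 2  # words within this distance count as context

# For every word, count how often each context word appears nearby.
cooc = defaultdict(Counter)
for sent in corpus:
    for i, word in enumerate(sent):
        for j in range(max(0, i - WINDOW), min(len(sent), i + WINDOW + 1)):
            if i != j:
                cooc[word][sent[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v))

# "cat" and "kitten" share contexts ("the", "sat", "on"), so their
# vectors are more similar to each other than "cat" is to "desk".
print(cosine(cooc["cat"], cooc["kitten"]))
print(cosine(cooc["cat"], cooc["desk"]))
```

Because the counts are physical quantities, word meanings become comparable by ordinary vector arithmetic; modern word embeddings refine this same idea with learned, dense vectors.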
This book introduces the basics of neural network technologies and explains the two mechanisms above clearly and in detail. Although it is a technical treatise aimed primarily at graduate students specializing in natural language processing and machine learning, readers with a basic knowledge of vectors and matrices will have no difficulty following its outline. Being a technical treatise, the book says little about how far such applications can be developed and does not discuss ethical or social issues. It would be our immense pleasure if reading it helps answer the question “What is artificial intelligence?” and motivates readers to ponder it further.
(Written by KATO Tsuneaki, Professor, Graduate School of Arts and Sciences / 2019)
Table of Contents
PART I Supervised Classification and Feed-forward Neural Networks
2 Learning Basics and Linear Models
3 From Linear Models to Multi-layer Perceptrons
4 Feed-forward Neural Networks
5 Neural Network Training
PART II Working with Natural Language Data
6 Features for Textual Data
7 Case Studies of NLP Features
8 From Textual Features to Inputs
9 Language Modeling
10 Pre-trained Word Representations
11 Using Word Embeddings
12 Case Study: A Feed-forward Architecture for Sentence Meaning Inference
PART III Specialized Architectures
13 Ngram Detectors: Convolutional Neural Networks
14 Recurrent Neural Networks: Modeling Sequences and Stacks
15 Concrete Recurrent Neural Network Architectures
16 Modeling with Recurrent Networks
17 Conditional Generation
PART IV Additional Topics
18 Modeling Trees with Recursive Neural Networks
19 Structured Output Prediction
20 Cascaded, Multi-task and Semi-supervised Learning