Communication and Natural Language
1. Communication and language
The environment of an AI system (or agent) may include a "natural" environment (the world) and a "social" environment (other agents, either human beings or computer systems). While the system's interaction with the natural environment can be studied concretely in physics, chemistry, biology, and so on, social interaction can be considered as "communication", which is ultimately a special case of the former.
Communication happens between goal-driven agents, where the interaction is carried out in a language, so it can be considered abstractly without involving all the concrete details of the underlying processes.
A language is a symbol system that allows its users to map commonly used symbols to internal representations (which may use another language), independent of the concrete media and sensorimotor devices. Major levels of language analysis include syntax (form), semantics (meaning), and pragmatics (usage).
The study of communication between AI systems happens mainly in the field of "multi-agent systems". Major topics in the study include
For communication between AI systems and human beings, natural language processing (NLP) consists of understanding (NLU, comprehension) and generation (NLG, production). NLP has been considered part of AI from the very beginning of the field, though the depth of processing depends on the requirements of specific tasks.
NLP is more difficult than the processing of machine languages, mainly because human languages were not designed with fixed syntax, semantics, and pragmatics but evolved historically, so these aspects (and the mappings among them) are never fully specified; they change over time and depend on the context and the users. The related problems have long been studied in linguistics and its branches, especially psycholinguistics, cognitive linguistics, and computational linguistics.
2. Symbolic (rule-based) approaches in NLP
In its early years, NLP research was strongly influenced by the linguistic theory of Chomsky, which assumes that:
- Human linguistic competence is modular and innate,
- Syntactic knowledge takes the form of a universal grammar, which can be studied without considering semantics and pragmatics.
Stages of NLU:
- Parsing: analyzing the syntactic structure of each sentence into a tree, according to a given grammar (a minimal parser sketch follows this list).
- Interpreting: mapping a parsing tree to an internal representation, according to given definitions of the words.
- Enhancing: augmenting the representation with background knowledge to get the intended meaning.
- Using: supporting multiple usages of the represented sentences, according to given algorithms.
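To make the parsing stage concrete, here is a minimal sketch in Python: a hand-written recursive-descent parser for a two-rule toy grammar. The grammar, lexicon, and function names are illustrative assumptions, not part of any particular NLP system.

```python
# A minimal sketch of the parsing stage: a recursive-descent parser for a toy
# grammar (S -> NP VP, NP -> Det N, VP -> V NP). Grammar and lexicon are invented.

LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "chased": "V", "saw": "V",
}

def parse(tokens):
    """Return a parse tree (nested tuples) for S -> NP VP, or None on failure."""
    np, pos = parse_np(tokens, 0)
    if np is None:
        return None
    vp, pos = parse_vp(tokens, pos)
    if vp is None or pos != len(tokens):
        return None
    return ("S", np, vp)

def parse_np(tokens, pos):
    # NP -> Det N
    if pos + 1 < len(tokens) and LEXICON.get(tokens[pos]) == "Det" \
            and LEXICON.get(tokens[pos + 1]) == "N":
        return ("NP", ("Det", tokens[pos]), ("N", tokens[pos + 1])), pos + 2
    return None, pos

def parse_vp(tokens, pos):
    # VP -> V NP
    if pos < len(tokens) and LEXICON.get(tokens[pos]) == "V":
        np, end = parse_np(tokens, pos + 1)
        if np is not None:
            return ("VP", ("V", tokens[pos]), np), end
    return None, pos

print(parse("the dog chased a cat".split()))
# ('S', ('NP', ('Det', 'the'), ('N', 'dog')),
#       ('VP', ('V', 'chased'), ('NP', ('Det', 'a'), ('N', 'cat'))))
```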
Stages of NLG (a simplified pipeline sketch follows the list):
- Content determination
- Document structuring
- Aggregation
- Lexical choice
- Referring expression generation
- Realization
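The following sketch collapses these stages into a tiny template-based pipeline (content determination, lexical choice, and realization with trivial aggregation); the weather data and templates are invented for illustration only.

```python
# A simplified template-based NLG pipeline over a toy weather record.

DATA = {"city": "Boston", "temp_c": 3, "rain_prob": 0.8}

def determine_content(data):
    """Content determination: select which facts are worth reporting."""
    messages = [("temperature", data["city"], data["temp_c"])]
    if data["rain_prob"] > 0.5:
        messages.append(("rain_likely", data["city"], data["rain_prob"]))
    return messages

def choose_words(message):
    """Lexical choice: map an abstract message to a sentence template."""
    kind, city, value = message
    if kind == "temperature":
        return f"In {city}, the temperature is {value} degrees Celsius."
    if kind == "rain_likely":
        return f"Rain is likely in {city} ({round(value * 100)}% chance)."
    return ""

def realize(messages):
    """Realization (with trivial aggregation): join the sentences into a text."""
    return " ".join(choose_words(m) for m in messages)

print(realize(determine_content(DATA)))
# In Boston, the temperature is 3 degrees Celsius. Rain is likely in Boston (80% chance).
```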
The rule-based approaches lack flexibility and naturalness. Major issues:
3. Statistical and neural approaches in NLP
Basic (implicit) assumptions:
- Linguistic knowledge can be discovered from the statistical regularities in the actual usage data,
- The meaning of a lexical item (word, phrase, etc.) is fully determined by its occurring contexts, i.e., the items around it.
- Linguistic items with similar occurrence distributions have similar meanings (e.g., Latent semantic analysis); see the co-occurrence sketch after this list.
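Here is a minimal sketch of that distributional assumption: build co-occurrence count vectors from a tiny invented corpus and compare words by cosine similarity. Real systems use large corpora plus dimensionality reduction (as in Latent semantic analysis) or neural embeddings.

```python
# Words with similar contexts get similar co-occurrence vectors, hence high cosine similarity.
from collections import Counter
from math import sqrt

corpus = [
    "the cat drinks milk", "the dog drinks water",
    "the cat chases the dog", "the dog chases the cat",
]

def context_vector(word, window=1):
    """Count the words appearing within `window` positions of `word`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

print(cosine(context_vector("cat"), context_vector("dog")))   # high: similar contexts
print(cosine(context_vector("cat"), context_vector("milk")))  # lower: different contexts
```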
Statistical NLP treats a natural language as an existing random process that can be learned from sample data:
- Starting with a corpus, i.e., a collection of actual texts, with or without labels/annotations.
- Learning relevant probability distributions, such as n-grams (a bigram sketch follows this list),
- Processing language (NLU and NLG) as pattern association and prediction.
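The sketch below estimates a bigram (2-gram) model from a toy corpus and uses it to predict the most probable next word; the corpus is invented, and a real model would be trained on far more data and use smoothing for unseen n-grams.

```python
# Estimating bigram probabilities by relative frequency and predicting the next word.
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent word pairs
contexts = Counter(corpus[:-1])              # counts of first words of those pairs

def next_word_probs(word):
    """P(next | word) estimated by relative frequency."""
    return {w2: c / contexts[word] for (w1, w2), c in bigrams.items() if w1 == word}

print(next_word_probs("the"))   # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
probs = next_word_probs("sat")
print(max(probs, key=probs.get))  # 'on'
```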
Deep neural networks have been widely used in NLP, often by treating an NLP problem as learning an end-to-end mapping between linguistic materials. Representative techniques:
- Word embeddings (e.g., Word2vec) represent words as vectors, using neural networks to predict the context of each word. Consequently, words are placed in a vector space where the similarity of their contexts in the training data determines their "distance" in the space.
- Recurrent neural networks (e.g., Long short-term memory) use feedback connections to encode a sequence of arbitrary length into a fixed-length vector.
- Transformer models use an "encoder" to map input sequences into vectors and a "decoder" to map the vectors into output sequences, so as to provide a sequence-to-sequence model. An attention mechanism decides the influence of each input position on each output position in the context (a scaled dot-product attention sketch follows this list).
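The following sketch shows scaled dot-product attention over toy query/key/value matrices; shapes and numbers are illustrative assumptions, and real Transformers add multiple heads, learned projections, and positional encodings.

```python
# Scaled dot-product attention: each output row is a weighted average of V's rows,
# with weights given by the (softmax-normalized) similarity of queries to keys.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # attention weights, rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 output positions, dimension 4
K = rng.normal(size=(5, 4))   # 5 input positions
V = rng.normal(size=(5, 4))
out, w = attention(Q, K, V)
print(out.shape, w.shape)     # (3, 4) (3, 5)
```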
Large Language Models (LLMs) are neural networks that are (pre-)trained on a huge amount of language material and can be used for various NLP tasks. The basic function of an LLM is to predict the next token in the input stream, as illustrated below, though in learning to do so it also acquires syntactic, semantic, and ontological knowledge embedded in the training materials.
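A brief illustration of next-token prediction, assuming the Hugging Face `transformers` library and the small `gpt2` checkpoint are available; any pretrained causal language model would serve the same purpose.

```python
# Continue a prompt token by token with a pretrained causal language model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "A language is a symbol system that"
result = generator(prompt, max_new_tokens=20, do_sample=False)  # greedy decoding
print(result[0]["generated_text"])  # the prompt, continued one predicted token at a time
```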
Though NLP techniques have achieved remarkable progress in solving various language-related problems, there have been many related debates:
Issues concerning language and intelligence:
An alternative approach under development is neither purely rule-based nor purely statistical: NLP by Reasoning/Learning, with Understanding as Conceptualizing.
Readings
- Poole and Mackworth: Sections 8.5, 9.6.6, 14.1, 15.7, 16.3
- Russell and Norvig: Chapters 23, 24
- Luger: Chapter 15