CIS 5603. Artificial Intelligence

Learning as Approximation

1. Learning in AI

Learning has been considered an important feature of AI from the beginning of the field (as shown in Turing 1950 and the Dartmouth meeting). In the earlier days (say, the 1980s), there were many approaches in Machine Learning, and similar work has also been labeled Data Mining or Knowledge Discovery. In recent years the field has been dominated by deep learning, though there are still different opinions on what learning is, whether it is a necessary feature of intelligence, and how it is related to other cognitive processes, as well as to the study of learning in psychology and cognitive science.

A representative definition of Machine Learning (ML) is "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E" (Tom M. Mitchell). A more recent specification is shown in this figure (from a book by Peter Flach): a (meta-level) machine learning algorithm takes the given training data as input and produces a model as output; the model can then serve as an (object-level) algorithm that solves the domain problem. Since the learning process replaces programming by human programmers, the application scope of computers is greatly extended.
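As a minimal sketch of this two-level picture (the trivial mean-predictor and the function names below are illustrative assumptions, not part of the notes), a learning algorithm can be written as a Python function that takes training data and returns a model, and the model then works like an ordinary program on new inputs:

    # "learn" plays the role of the (meta-level) learning algorithm;
    # the returned "model" is the (object-level) algorithm that solves the domain problem.
    def learn(training_data):
        """Take (input, output) pairs and produce a model (here: predict the mean output)."""
        outputs = [y for (_, y) in training_data]
        mean = sum(outputs) / len(outputs)

        def model(x):
            # The learned object-level algorithm: maps a new input to a prediction.
            return mean

        return model

    # The learning algorithm is run once on the training data ...
    model = learn([(1, 2.0), (2, 4.1), (3, 5.9)])
    # ... and the resulting model is then used like a hand-written program.
    print(model(4))   # prediction for a new input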

2. Supervised learning

Supervised learning is a special case of the above figure, where the training data are sample input-output pairs, and the model to be learned is a function that generalizes the mappings in the training data.

An early attempt (under the name of "scientific discovery") was exemplified by the BACON system, which guesses the expression relating the variables (say, if X and Y are positively correlated, calculate X/Y; if they are negatively correlated, calculate X*Y; then check whether the new term is constant). The program "re-discovered" several laws in physics and chemistry, though its explanation of scientific discovery has been questioned (cartoon).
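The following sketch is a much simplified reconstruction of that heuristic (not BACON's actual code; the data and the constancy threshold are made up). Applied to data obeying Boyle's law, where pressure and volume vary inversely, it finds that their product is constant:

    # Simplified BACON-style heuristic:
    # if two quantities rise and fall together, try their ratio;
    # if one rises as the other falls, try their product;
    # then check whether the new term is (nearly) constant.
    def bacon_step(xs, ys):
        increasing_together = (xs[1] > xs[0]) == (ys[1] > ys[0])
        if increasing_together:
            terms = [x / y for x, y in zip(xs, ys)]
            label = "X/Y"
        else:
            terms = [x * y for x, y in zip(xs, ys)]
            label = "X*Y"
        nearly_constant = max(terms) - min(terms) < 0.01 * abs(terms[0])
        return label, terms, nearly_constant

    # Boyle's law data: pressure * volume is constant at fixed temperature.
    volume   = [1.0, 2.0, 4.0, 8.0]
    pressure = [8.0, 4.0, 2.0, 1.0]

    label, terms, ok = bacon_step(volume, pressure)
    print(label, terms, "constant" if ok else "not constant")
    # -> X*Y [8.0, 8.0, 8.0, 8.0] constant, i.e., P*V = const is "re-discovered"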

A different approach is to approximate the target function statistically. The simplest case is linear regression, where the best linear function is found to summarize the training data. Since the type of the expression (or call it the "model") has been determined at the design stage, the only things left to be learned are the parameters in the expression.
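For instance, fitting y = a*x + b by least squares (a generic sketch using NumPy; the numbers are invented for illustration): the model type, a line, is fixed in advance, and learning only estimates the parameters a and b from the data.

    import numpy as np

    # Training data: noisy samples of an underlying linear relation.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

    # The model type y = a*x + b is fixed at design time;
    # learning only estimates the parameters a and b (least squares).
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

    print(f"learned model: y = {a:.2f}*x + {b:.2f}")
    print("prediction for x=5:", a * 5 + b)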

More complicated approaches have been used to approximate various functions in multidimensional spaces so that future inputs can be properly mapped, under the assumption that nearby inputs produce nearby outputs. When the number of dimensions is large, "features" of the input are used as an intermediate abstraction between the input and the output.
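One simple way to exploit the "nearby inputs produce nearby outputs" assumption is nearest-neighbor prediction (used here only as one representative method; the data and parameter k are illustrative): a new input is mapped to the average output of its k closest training examples.

    import numpy as np

    def knn_predict(X_train, y_train, x_new, k=3):
        """Predict the output of x_new as the average output of its k nearest training inputs."""
        distances = np.linalg.norm(X_train - x_new, axis=1)  # distance in input space
        nearest = np.argsort(distances)[:k]                  # indices of the k closest examples
        return y_train[nearest].mean()

    # Two-dimensional inputs with outputs that vary smoothly over the input space.
    X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2]], dtype=float)
    y_train = np.array([0.0, 1.0, 1.0, 2.0, 4.0])

    print(knn_predict(X_train, y_train, np.array([0.9, 0.9])))  # close to the value at [1, 1]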

3. Artificial neural networks

Artificial neural networks (ANNs) provide flexible models for function approximation. Major issues:

Readings