CIS 5603. Artificial Intelligence

Learning as Approximation

1. Learning in AI

Learning has been considered an important feature of AI from the beginning of the field (as shown in Turing 1950 and the Dartmouth meeting). In the earlier days (say, the 1980s), there were many approaches in Machine Learning, and similar work has also been labeled Data Mining or Knowledge Discovery. In recent years the field has been dominated by deep learning, though there are still different opinions on what learning is, whether it is a necessary feature of intelligence, and how it is related to other cognitive processes, as well as to the study of learning in psychology and cognitive science.

A representative definition of Machine Learning (ML) is "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E" (Tom M. Mitchell). A more recent specification is shown in this figure (from a book by Peter Flach): a (meta-level) machine learning algorithm takes the given training data as input and produces a model as output; the model can then serve as an (object-level) algorithm that solves the domain problem. Since the learning process replaces programming by human programmers, the application scope of computers is greatly extended.
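As a minimal sketch of this two-level picture (the trivial mean-predictor and the function names below are illustrative assumptions, not part of the notes), a learning algorithm can be written as a Python function that takes training data and returns a model, and the model then works like an ordinary program on new inputs:

    # "learn" plays the role of the (meta-level) learning algorithm;
    # the returned "model" is the (object-level) algorithm that solves the domain problem.
    def learn(training_data):
        """Take (input, output) pairs and produce a model (here: predict the mean output)."""
        outputs = [y for (_, y) in training_data]
        mean = sum(outputs) / len(outputs)

        def model(x):
            # The learned object-level algorithm: maps a new input to a prediction.
            return mean

        return model

    # The learning algorithm is run once on the training data ...
    model = learn([(1, 2.0), (2, 4.1), (3, 5.9)])
    # ... and the resulting model is then used like a hand-written program.
    print(model(4))   # prediction for a new input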

2. Supervised learning

Supervised learning is a special case of the above figure, where the training data are sample input-output pairs, and the model to be learned is a function that generalizes the mappings in the training data.

An early attempt (under the name of "scientific discovery") was exemplified by the BACON system, which guesses the expression relating the variables (say, if X and Y are positively correlated, calculate X/Y; if they are negatively correlated, calculate X*Y; then check whether the new term is constant). The program "re-discovered" several laws in physics and chemistry, though its explanation of scientific discovery has been questioned (cartoon).
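The following sketch is a much simplified reconstruction of that heuristic (not BACON's actual code; the data and the constancy threshold are made up). Applied to data obeying Boyle's law, where pressure and volume vary inversely, it finds that their product is constant:

    # Simplified BACON-style heuristic:
    # if two quantities rise and fall together, try their ratio;
    # if one rises as the other falls, try their product;
    # then check whether the new term is (nearly) constant.
    def bacon_step(xs, ys):
        increasing_together = (xs[1] > xs[0]) == (ys[1] > ys[0])
        if increasing_together:
            terms = [x / y for x, y in zip(xs, ys)]
            label = "X/Y"
        else:
            terms = [x * y for x, y in zip(xs, ys)]
            label = "X*Y"
        nearly_constant = max(terms) - min(terms) < 0.01 * abs(terms[0])
        return label, terms, nearly_constant

    # Boyle's law data: pressure * volume is constant at fixed temperature.
    volume   = [1.0, 2.0, 4.0, 8.0]
    pressure = [8.0, 4.0, 2.0, 1.0]

    label, terms, ok = bacon_step(volume, pressure)
    print(label, terms, "constant" if ok else "not constant")
    # -> X*Y [8.0, 8.0, 8.0, 8.0] constant, i.e., P*V = const is "re-discovered"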

A different approach is to approximate the target function statistically. The simplest case is linear regression, where the best linear function is found to summarize the training data. Since the type of the expression (or call it the "model") has been determined at the design stage, the only things left to be learned are the parameters in the expression.
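For instance, fitting y = a*x + b by least squares (a generic sketch using NumPy; the numbers are invented for illustration): the model type, a line, is fixed in advance, and learning only estimates the parameters a and b from the data.

    import numpy as np

    # Training data: noisy samples of an underlying linear relation.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

    # The model type y = a*x + b is fixed at design time;
    # learning only estimates the parameters a and b (least squares).
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

    print(f"learned model: y = {a:.2f}*x + {b:.2f}")
    print("prediction for x=5:", a * 5 + b)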

More complicated approaches have been used to approximate various functions in multidimensional spaces so that future inputs can be properly mapped, under the assumption that nearby inputs produce nearby outputs. When the number of dimensions is large, "features" of the input are used as an intermediate abstraction between the input and the output.
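One simple way to exploit the "nearby inputs produce nearby outputs" assumption is nearest-neighbor prediction (used here only as one representative method; the data and parameter k are illustrative): a new input is mapped to the average output of its k closest training examples.

    import numpy as np

    def knn_predict(X_train, y_train, x_new, k=3):
        """Predict the output of x_new as the average output of its k nearest training inputs."""
        distances = np.linalg.norm(X_train - x_new, axis=1)  # distance in input space
        nearest = np.argsort(distances)[:k]                  # indices of the k closest examples
        return y_train[nearest].mean()

    # Two-dimensional inputs with outputs that vary smoothly over the input space.
    X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2]], dtype=float)
    y_train = np.array([0.0, 1.0, 1.0, 2.0, 4.0])

    print(knn_predict(X_train, y_train, np.array([0.9, 0.9])))  # close to the value at [1, 1]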

3. Artificial neural networks

Artificial neural networks (ANNs) provide flexible models for function approximation. Major issues:

Readings