3203. Introduction to Artificial Intelligence
Learning: Connectionist Approaches
1. Neural networks
A neural network (NN) gets its inspiration from the human brain, though most NNs do not attempt to simulate a real neural net accurately. The emphasis is on accounting for intelligence via the statistical and dynamic regularities of highly interconnected, large-scale networks formed by simple units. For this reason, this approach is also called the connectionist model or the PDP (Parallel Distributed Processing) model.
A NN consists of interconnected nodes. Each node in the network is like an
artificial neuron in that it accepts input signals, takes a weighted sum of them as the total input to the unit, and generates an output according to a simple (but usually nonlinear) function. For example, it can be a threshold function of the input.
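To make this concrete, here is a minimal sketch in Python of one such unit with a threshold activation; the function name neuron_output and the particular weights and threshold are illustrative assumptions, not part of the notes.

# A single artificial neuron: a weighted sum of the inputs passed through a threshold.
def neuron_output(inputs, weights, threshold=0.0):
    total = sum(x * w for x, w in zip(inputs, weights))   # total input to the unit
    return 1.0 if total >= threshold else 0.0             # simple threshold activation

# Example: with these weights and threshold the unit computes logical AND.
print(neuron_output([1, 1], [0.6, 0.6], threshold=1.0))   # 1.0
print(neuron_output([1, 0], [0.6, 0.6], threshold=1.0))   # 0.0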
Each time the NN is used, an input activation vector is applied to the input nodes, then the activation values of the other nodes are calculated according to the activation sent out of the input nodes. This process is repeated until a certain condition is satisfied, then the activation values of certain nodes are taken to be the output. The weight values on the links may be adjusted according to a learning algorithm.
Overall, a neural network often corresponds to a function that maps input vectors to output vectors, though the function is not explicitly expressed by a formula, but implicitly represented by the NN, in its structure and weight values.
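This view of an NN as an implicitly represented vector-to-vector function can be sketched as follows (Python with NumPy; the 2-3-1 architecture, tanh activation, and random untrained weights are illustrative assumptions).

import numpy as np

def forward(x, layers):
    # Propagate an input vector through a list of (weights, bias) layers;
    # each layer applies a weighted sum followed by a nonlinear function.
    a = np.asarray(x, dtype=float)
    for W, b in layers:
        a = np.tanh(W @ a + b)
    return a

# A small 2-3-1 network with fixed random weights, purely for illustration.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 2)), np.zeros(3)),
          (rng.normal(size=(1, 3)), np.zeros(1))]
print(forward([0.5, -1.0], layers))   # the output vector determined by the weights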
2. Perceptron and back-propagation
A simple perceptron consists of an input layer of neurons and an output layer. It learns by adjusting the weights according to the error, that is, the difference between the actual output and the target output. Such a NN can learn a linearly separable function after repeated training.
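A minimal sketch of this error-driven weight adjustment in Python, assuming a two-input perceptron with a threshold output and logical OR as the (linearly separable) target function; the learning rate and epoch count are illustrative.

def train_perceptron(samples, epochs=20, lr=0.1):
    # Adjust the weights according to the error (target output - actual output).
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            actual = 1.0 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0.0
            error = target - actual
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Logical OR is linearly separable, so repeated training converges.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(data))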
For more complicated functions, one popular learning algorithm is "back-propagation" in multilayer perceptrons, which are fully connected, layered, feedforward networks. Typically, such a NN has an input layer, a hidden layer, and an output layer. The weights of the links are initialized to random numbers. Then, the network is trained by repeating the following procedure for each training case (a code sketch follows this list):
- Apply the input values to the input layer, and use the current weights to calculate the activation value of the hidden layer, then the output layer.
- Compute the difference between the actual output and the target output.
- The weights of the links connecting the output layer and the hidden layer are adjusted to reduce the difference as much as possible (given the current activation of the hidden layer).
- The previous step is repeated for the links connecting the hidden layer and the input layer, using the error propagated back from the output layer.
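The following is a compact sketch of this procedure, written in Python with NumPy and assuming a 2-4-1 network, sigmoid activations, batch updates, and the XOR problem as training data; the learning rate, hidden-layer size, and iteration count are illustrative, and a different random initialization may need more iterations to converge.

import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # training inputs
T = np.array([[0], [1], [1], [0]], dtype=float)               # target outputs (XOR)

W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)     # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)     # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(10000):
    # Forward pass: input layer -> hidden layer -> output layer.
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # Backward pass: compute error signals, then adjust the output-side
    # weights and the hidden-side weights to reduce the difference.
    dY = (Y - T) * Y * (1 - Y)          # error signal at the output layer
    dH = (dY @ W2.T) * H * (1 - H)      # error propagated back to the hidden layer
    W2 -= lr * (H.T @ dY); b2 -= lr * dY.sum(axis=0)
    W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(axis=0)

print(np.round(Y, 2))   # should approach [[0], [1], [1], [0]]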
In principle, a three-layer perceptron can approximate a wide class of functions.
3. Hebbian learning
NNs can be used for unsupervised learning, too, where no "correct answer" is provided for each training example. The approach suggested by Hebb is to increase the weight of a link between two nodes if both are activated by the same input signal, and to decrease the weight if one is activated and the other is not.
After repeated training, an "associative memory" will be formed, such that when part of an input pattern is activated, the other part will become active, too.
Hebbian learning can also be used in supervised learning, by remembering each input/output pair according to the Hebbian rule.
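A minimal sketch of a Hebbian associative memory in Python with NumPy, assuming +1/-1 activation coding so that the outer-product update raises the weight between two co-activated nodes and lowers it when one is active and the other is not; the function names and the stored pattern are illustrative.

import numpy as np

def hebbian_train(patterns):
    # Build the weight matrix by the Hebbian rule over all stored patterns.
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)      # +1 for co-activated pairs, -1 for mismatched pairs
    np.fill_diagonal(W, 0)       # no self-connections
    return W

def recall(W, cue, steps=5):
    # Complete a partial or corrupted pattern by repeatedly updating activations.
    s = cue.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1
    return s

# Store one +1/-1 pattern, then recall it from a cue with one element flipped.
p = np.array([1, -1, 1, 1, -1, -1], dtype=float)
W = hebbian_train(p.reshape(1, -1))
cue = p.copy(); cue[0] = -1
print(recall(W, cue))            # recovers the stored pattern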
4. Hierarchical networks
In recent years, certain NN architectures and algorithms have become popular. What makes them different from the traditional models is the stress on the hierarchical nature of the processing.
Deep learning refers to approaches that build multiple levels of features or representations of the data. [Demonstrations]
Hierarchical temporal memory (HTM) is a model that takes the large-scale structure of the brain into account.
5. Capability and limitation of NN
Compared to traditional symbolic approaches, an NN is characterized by its emphasis on learning and its tolerance of uncertainty.
Typical applications: categorization, pattern recognition, data mining, and so on.
Limitations (questions to consider before applying an NN to a problem):
- Can the problem be naturally represented as a mapping from an input vector to an output vector?
- Under the vector representation, should similar inputs produce similar outputs?
- Are there sufficient and stable training data?
- Is there enough time for the training?
- Do we need an explanation of the internal process?