AN INTRODUCTION TO NEURAL NETWORKS

Copyright © 1991, 1997, David W. Clark

Artificial neural networks are computer programs that simulate biological neural networks.

To process vague, noisy, or incomplete information, researchers are turning to biological neural systems as a model for a new computing paradigm, since biological neural systems process this type of information seemingly effortlessly. Neuroscientists have learned much about biological neural systems in recent decades, and engineers are using this knowledge to create artificial neural systems in the laboratory. The results to date reveal robust and elegant solutions.

Artificial neural systems are unlike artificial intelligence programs. Artificial intelligence programs use deductive reasoning to apply known rules to situations to produce outputs. Each new situation may require that another rule be implemented, so the programs can become quite large and complicated in an attempt to address all possible situations. Artificial neural systems, however, automatically construct associations based upon the results of known situations. For each new situation, the neural system adjusts itself automatically and eventually generalizes from the situations it has seen.

Artificial neural systems are unlike digital computers.

Digital Computers                            Neural Networks
-----------------                            ---------------
Deductive reasoning: known rules are         Inductive reasoning: given input and
applied to input data to produce output.     output data (training examples), the
                                             rules are constructed.
Computation is centralized, synchronous,     Computation is collective,
and serial.                                  asynchronous, and parallel.
Memory is packetted, literally stored,       Memory is distributed, internalized,
and location addressable.                    and content addressable.
Not fault tolerant: one transistor goes      Fault tolerant: redundancy and
and it no longer works.                      sharing of responsibilities.
Fast: measured in millionths of a second.    Slow: measured in thousandths of a
                                             second.
Exact.                                       Inexact.
Static connectivity.                         Dynamic connectivity.
Applicable if well defined rules with        Applicable if rules are unknown or
precise input data.                          complicated, or if data is noisy or
                                             partial.

Artificial neural systems are trained from experience. An artificial neural system is constructed and then simply presented with historical cause and effect situations. The artificial neural system then shapes itself to build an internal representation of the governing rules. Later, once trained, the artificial neural system can be presented with a hypothetical situation to produce a prediction of real event results.

Biological Neural Systems

Artificial neural systems are inspired by biological neural systems. The elementary building block of biological neural systems is the neuron.


The single cell neuron consists of the cell body, or soma, the dendrites, and the axon. The dendrites receive signals from the axons of other neurons. The small space between the axon of one neuron and the dendrite of another is the synapse. The afferent dendrites conduct impulses toward the soma. The efferent axon conducts impulses away from the soma.

Biological neural systems are highly distributed. In the human brain, there are between 10^10 and 10^11 neurons [Kuffler,1984]. Biological neural systems are highly interconnected. A single neuron may receive input from as many as 8x10^4 neighboring neurons [Kuffler,1984].

The function of the neuron is to integrate the input it receives through the synapses on its dendrites and either generate an action potential or not. The action potential is generally a 1 millisecond electrical pulse of about 0.1 V amplitude travelling along the axon at up to 120 meters/second [Kuffler,1984]. When the action potential reaches a synapse at the end of the axon, the electrical signal is converted to a chemical signal to be communicated across the synaptic gap to the postsynaptic neuron. At the membrane of the postsynaptic neuron, the chemical signal is converted back to an electrical signal to be conveyed along the dendrite to the soma. A synapse can be either excitatory or inhibitory. Input from an excitatory synapse increases the internal activation level of the neuron, while input from an inhibitory synapse reduces it.

Adrian [Adrian,1946] showed that the neuron is an all-or-nothing processor. Below a threshold level of activity, the neuron produces no output. At the threshold level of activity, the neuron produces an action potential and then resets itself. At higher levels of activity, the action potential increases in frequency but not in amplitude. At even higher levels of activity there is no further increase in amplitude or frequency. Input activity versus neuron firing frequency thus plots as an s-shaped curve.


This s-shaped curve can be described by the equation:

    f(z) = 1/(1 + e^(-a(z-T)))

where z is the input activity, T is a threshold, and 'a' is a measure of the slope of f(z). This function is commonly known as a sigmoid.
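As a sketch, the sigmoid can be computed directly from the equation above; the particular values of the slope a and threshold T below are illustrative choices, not values taken from the text.

```python
import math

def sigmoid(z, a=1.0, T=0.0):
    """Sigmoid activation: f(z) = 1 / (1 + e^(-a(z - T))).

    z is the input activity, T the threshold, and 'a' the slope.
    """
    return 1.0 / (1.0 + math.exp(-a * (z - T)))

# At the threshold the output is exactly 0.5; well below the
# threshold the output approaches 0, and well above it approaches 1.
print(sigmoid(0.0, a=2.0, T=0.0))    # 0.5
print(sigmoid(-10.0, a=2.0, T=0.0))  # near 0
print(sigmoid(10.0, a=2.0, T=0.0))   # near 1
```

Increasing a steepens the transition around T, which anticipates the step-function limit used for the threshold gate later in the text.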

The mystery facing researchers now is in finding the mechanism for learning in biological neural systems. Changing the strengths of the synapses is the focus of attention. One long-standing theory, called Hebbian learning [Hebb,1949], states that the strength of a synapse between two neurons increases when both neurons fire simultaneously. A more recent theory, put forth by Alkon [Alkon,1989], is based upon conditioned and unconditioned stimuli received on neighboring synapses.
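Hebbian learning can be sketched as a simple weight update: the synaptic strength grows only when the pre- and postsynaptic neurons fire together. The learning rate and the binary firing values here are illustrative assumptions, not part of Hebb's original formulation.

```python
def hebbian_update(w, pre, post, rate=0.25):
    """Hebb's rule sketch: increase synaptic weight w only when the
    presynaptic activity (pre) and postsynaptic activity (post)
    coincide. rate is an assumed learning-rate constant."""
    return w + rate * pre * post

w = 0.5
w = hebbian_update(w, pre=1, post=1)  # both neurons fire: weight grows
print(w)  # 0.75
w = hebbian_update(w, pre=1, post=0)  # postsynaptic neuron silent: no change
print(w)  # 0.75
```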

The Threshold Gate

McCulloch and Pitts [McCulloch,1943] proposed a simple mathematical expression for a formal neuron model and showed that any finite logical expression could be realized with combinations of formal neurons. Engineers further simplified the formal neuron into what became known as the threshold gate. In the limit as a goes to infinity, the sigmoid function becomes the simple step function. The output of the threshold gate thus becomes binary, as are its inputs. The synapse function can be replaced by a simple positive or negative scaling factor called a weight. The output y of the threshold gate, when excited by a binary input vector x, is then described simply by:

        { 1  iff  w1x1 + w2x2 + ... + wnxn >= T
    y = {
        { 0  iff  w1x1 + w2x2 + ... + wnxn <  T




When presented with a pattern of binary inputs, the threshold gate will output either a zero or a one. The threshold gate may thus be a building block for realizing switching functions.
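The definition above translates directly into code. This sketch realizes the two-input OR function with weights (2, 2) and threshold 1, one workable choice (and the same one used in the worked example later in the text), though many other weight/threshold combinations realize OR equally well.

```python
def threshold_gate(x, w, T):
    """Threshold gate: output 1 iff the weighted sum of the binary
    inputs x meets or exceeds the threshold T, else 0."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= T else 0

# Two-input OR realized with weights (2, 2) and threshold 1.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, threshold_gate(x, (2, 2), 1))
```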

One threshold gate realizes one threshold function. A threshold gate with n inputs can realize a subset of the 2^(2^n) switching functions called linearly separable threshold functions. A switching function can be represented geometrically by an n-dimensional hypercube having n mutually orthogonal basis vectors and 2^n vertices. Each of the 2^n vertices represents an output state for the input combination given by the coordinates of the vertex. A labeled n-dimensional hypercube is a threshold function if its true vertices can be separated from its false vertices by an (n-1)-dimensional hyperplane. This hyperplane is referred to as a linear separating surface. Shown below is the n-cube representation (a square) for the two-dimensional OR function. An (n-1)-dimensional linear separating surface (a straight line) is drawn to separate the true and false vertices. Also shown is the n-cube representation for the non-linearly separable XOR function. All attempts to separate the true and false vertices with a straight line fail.
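The contrast between OR and XOR can be illustrated by brute force: search a small grid of weights and thresholds for a single threshold gate realizing each truth table. The integer search grid is an illustrative assumption; it demonstrates the separability of OR and the failure for XOR on that grid rather than proving the general claim.

```python
def gate(x, w, T):
    """Threshold gate: 1 iff the weighted input sum reaches T."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= T else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
or_table = [0, 1, 1, 1]
xor_table = [0, 1, 1, 0]

def realizable(targets, grid=range(-4, 5)):
    """Search an assumed small integer grid of (w1, w2, T) for a
    single threshold gate matching the target truth table."""
    return any(
        all(gate(x, (w1, w2), T) == t for x, t in zip(inputs, targets))
        for w1 in grid for w2 in grid for T in grid
    )

print(realizable(or_table))   # True: OR is linearly separable
print(realizable(xor_table))  # False: XOR is not
```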


The equation for the separating surface is of the form:

    w1x1 + w2x2 + ... + wnxn  =  T 

where xi is the ith component of a binary vector x and T is the threshold. The separating surface intersects the ith axis of the hypercube at T/wi. The vector perpendicular to the separating surface and passing through the origin is the isobar vector and has direction:

    (  w1 , w2 , ... , wn  ) 

The distance between the origin and the separating surface is:

    T/(w1^2 + w2^2 + ... + wn^2)^(1/2)

along the isobar vector.

For example, consider the OR problem. One possible equation for the separating surface is:

        2x1 + 2x2 = 1 

where w1 = 2, w2 = 2, and threshold = 1. The separating surface intersects the x1 axis at T/w1 = 1/2, and intersects the x2 axis at T/w2 = 1/2. The distance between the origin and the separating surface is:

        T/(w1^2 + w2^2)^(1/2) = 1/(2(2)^(1/2)).
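The intercepts and the distance for the OR example can be checked numerically; this sketch simply re-evaluates the formulas above with w1 = w2 = 2 and T = 1.

```python
import math

w = (2.0, 2.0)  # weights from the OR example
T = 1.0         # threshold

# Axis intercepts of the separating surface: T / wi
print(T / w[0], T / w[1])  # 0.5 0.5

# Distance from the origin to the separating surface,
# measured along the isobar vector: T / sqrt(w1^2 + w2^2)
dist = T / math.sqrt(w[0]**2 + w[1]**2)
print(dist)  # 1/(2*sqrt(2)), about 0.354
```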



References

J. Anderson, "A Memory Storage Model Utilizing Spatial Correlation Functions," Kybernetik, vol. 5, no. 3, pp. 113-119, 1969

J. Anderson, "A Simple Neural Network Generating an Interactive Memory," Mathematical Biosciences 14:197-220, 1972.

E. Adrian, The Physical Background of Perception, Clarendon Press, Oxford, 1946.

D. Alkon, "Memory Storage and Neural Systems," Scientific American, July 1989.

R. Duda, P. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.

M.L. Dertouzos, Threshold Logic: A Synthesis Approach, MIT Press, Cambridge Mass., 1965.

D. Hebb, The Organization of Behavior, Wiley, New York, 1949.

M.H. Hassoun and D.W. Clark, "An Adaptive Attentive Learning Algorithm for Single Layer Neural Networks," in Learning Algorithms II, Proceedings of the First Annual IEEE Conference on Neural Networks, vol. I, pp. 431-440, 1988.

M.H. Hassoun and J. Song, "Adaptive Ho-Kashyap Rules for Perceptron Training," IEEE Trans. Neural Networks, vol. 3, no. 1, 1992.

M.H. Hassoun, "Adaptive Dynamic Heterassociative Neural Memories for Pattern Classification," in Optical Pattern Recognition, H-K. Liu, ed., Proc. SPIE, Vol 1053, pp 75-83, 1989.

Y-C. Ho and R.L. Kashyap, "An Algorithm for Linear Inequalities and its Applications," IEEE Trans. Elec. Comp., EC-14(5), pp. 683-688, 1965.

J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Nat. Acad of Sci, vol. 79, pp. 2554-2558, 1982.

J. Hopfield and D. Tank, Biological Cybernetics, Vol. 52, pp. 141-152, 1985.

J. Hopfield and D. Tank, IEEE Transactions Circuits and Systems, Vol. CAS-33, 1986.

T. Kohonen, "Correlation Matrix Memories," IEEE Transactions on Computers, C-21, pp 353-359, 1972.

T. Kohonen and M. Ruohonon, IEEE Trans on Computers, vol. 22, pp 701-702, 1973.

T. Kohonen, Self Organization and Associative Memories, Springer-Verlag, New York, 1984.

B. Kosko, "Bidirectional Associative Memories," IEEE Trans. Systems, Man, Cybernetics, vol. SMC-18, pp 49-60, 1988.

S. Kuffler, J. Nicholls, A Martin, From Neuron to Brain, Sinauer Publishers, Sunderland, Mass., 2nd Ed., 1984.

W. McCulloch, W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics 5:115-133, 1943.

S. Muroga, Threshold Logic and its Applications, John Wiley and Sons, New York, 1972.

F. Rosenblatt, "The Perceptron - a Perceiving and Recognizing Automaton," Report 85-460-1, Cornell Aeronautical Laboratory, Ithaca, N.Y., January 1957.

F. Rosenblatt, "The Perceptron: a Probabilistic Model for Information Storage and Organization in the Brain," Psychological Review 65:386-408, 1958.

F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan Books, Washington, D.C., 1962.

C.L. Sheng, Threshold Logic, Academic Press, New York, 1969.

K. Steinbach, IEEE Trans. Electronic Computers, vol. EC-12, 1963.

B. Widrow and M.E. Hoff, Jr., "Adaptive Switching Circuits," 1960 IRE WESCON Convention Record, Part 4, pp. 96-104, August 1960.

D. Willshaw, O. Buneman, H. Longuet-Higgins, "Non-holographic Associative Memory," Nature 222:960-962, 1969.