Expert Systems and Heuristic Programming
CS 530 Summer 2005
The notes are a work in progress. They supplement the presentations and the text. In most cases the material in the notes cover the same material as the presentations, although in a different style and a different order. The collection of notes will be expanded as the class progresses. If you find problems with the notes, please feel free to let me know. All of the documents are presented as pdfs.
In contrast to the symbol manipulation models of the previous chapter, this chapter approaches machine learning from the view that intelligence arises from the collective behavior of a large numbers of simple interacting components.
General introductions to connnectionism are readily available. See: http://plato.stanford.edu/entries/connectionism/, http://carbon.cudenver.edu/~mryder/itc_data/connectionism.html and http://en.wikipedia.org/wiki/Connectionism. an interesting corporate perspective can be found at http://www.makhfi.com/index.htm. A fun swf (by an engineer) can be found at http://cialab.ee.washington.edu/nn_tutorial/nn_tutorial_1-01.html. ANNs with Java abound. See: http://rfhs8012.fh-regensburg.de/~saj39122/jfroehl/diplom/e-index.html and http://www.jooneworld.com/. If you have already looked at IBM ABLE you wil find an ANN component included.
Chapter 11 presents neurally-inspired models, also known as parallel distributed processing (PDP) or connectionist systems, which deemphasize the explicit use of symbols in problem solving. Connectionists hold that intelligence arises in systems of simple, interacting components (biological or artificial neurons) through a process of learning or adaptation by which the connections between components are adjusted. Processing in these systems is distributed across collections or layers of neurons. Problem solving is parallel in the sense that all the neurons within the collection or layer process their inputs simultaneously and independently. These systems are also claimed to degrade gracefully because information and processing are not centrally controlled but distributed across the networks nodes and layers. In connectionist models there is, however, a strong representational character, a strong inductive bias, both in the creation of input parameters as well as in the interpretation of output values. To build a neural network, for example, the designer must create a scheme for encoding patterns in the world into numerical quantities. The choice/bias of an encoding scheme can play a crucial role in the eventual success or failure of the network to learn.
In connectionist systems, processing is parallel and distributed with no manipulation of symbols as symbols. Patterns in a domain are encoded as numerical vectors. The connections between components, or neurons, are also represented by numerical values. Finally, the transformation of patterns is the result of a numerical operations, usually, matrix multiplications. These "designer's choices" for a connectionist architecture constitute the inductive bias of the system.
The algorithms and architectures that implement these techniques are usually trained or conditioned rather than explicitly programmed. This is a major strength of the approach: an appropriately designed network architecture and learning algorithm can often capture invariances in the world, even in the form of strange attractors, without being explicitly programmed to recognize them. How this happens makes up the material of Chapter 11. Make sure to read Sections 11.1 and 11.2. These sections serve as an historical overview of work in connectionist systems. The topics included are: the artificial neuron, McCullock-Pitts systems, perceptron learning, the basic components of neural network learning, the "mechanical" neuron, the McCulloch-Pitts (1943) neuron, perceptron learning, and the delta rule.
Section 11.3 ntroduces nets with hidden layers, and the backpropagation-learning rule. These innovations were introduced in the evolution of artificial neural networks to overcome problems the early systems had in generalizing across data points that were not linearly separable. Backpropagation is an algorithm for apportioning "blame" for incorrect responses to the nodes of a multilayered system with continuous thresholding.
Section 11.4 presents models for competitive learning developed by Kohonen (1984) and Hecht-Nielsen (1987). In these models, network weight vectors are used to represent patterns rather than connection strengths. The winner-take-all learning algorithm selects the node whose pattern of weights is most like the input vector and adjusts it to make it more like the input vector. It is unsupervised in that winning is simply identifying the node whose current weight vector most closely resembles the input vector. The combination of Kohonen with Grossberg (1982) layers in a single network offers a model for stimulus-response learning called counterpropagation learning.
Section 11.5 examines Hebb's (1949) model of reinforcement learning. Hebb conjectured that each time one neuron contributes to the firing of another neuron, the strength of the pathway between the neurons is increased. Hebbian learning is modeled by a simple algorithm for adjusting connection weights. Both unsupervised and supervised versions of Hebbian learning are possible. A Hebbian based model for pattern retrieval from memory, callled the linear associator, is also examined.
Section 11.6 introduces an important family of networks called attractor networks. These networks employ feedback connections to repeatedly cycle a signal within the network. The network output is considered to be the network state upon reaching equilibrium. Network weights are constructed so that a set of attractors is created. Input patterns within an attractor basin reach equilibrium at that attractor. The attractors can therefore be used to store patterns in a memory. Given an input pattern, we retrieve either the closest stored pattern in the network or a pattern associated with the closest stored pattern. The first type of memory is called autoassociative, the second type heteroassociative. John Hopfield (1982), a theoretical physicist, defined a class of attractor networks whose convergence can be represented by energy minimization. Hopfield networks can be used to solve constraint satisfaction problems, such as the traveling salesperson problem, by mapping the optimization function into an energy function.
Clearly there is enogh material here for a separate course, and there is!. In this course attempt to understand the main differences between this sort of learning and sybolic lerarning, the basic procedures (algorithms) for the various kinds of subsymbolic learning, and the differences ammong them.