Training an Elman Network :: Recurrent Networks (Neural Network Toolbox)

Training an Elman Network

Elman networks can be trained with either of two functions, train or adapt.

When the function train is used to train an Elman network, the following occurs.

At each epoch:

1. The entire input sequence is presented to the network, and its outputs are calculated and compared with the target sequence to generate an error sequence.
2. For each time step, the error is backpropagated to find the gradient of the error with respect to each weight and bias. This gradient is only an approximation, because the contributions of the weights and biases to the error via the delayed recurrent connection are ignored.
3. This approximate gradient is then used to update the weights and biases with the backpropagation training function chosen by the user. The function traingdx is recommended.
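The epoch-level steps above can be sketched in plain MATLAB without the toolbox. This is only an illustration under stated assumptions, not the toolbox implementation: the weight names (IW, LW, b1, W2, b2), the deterministic initial values, the fixed learning rate, and the use of plain gradient descent in place of traingdx are all choices made for this sketch. The key point it shows is that the previous hidden output (a1prev) is treated as a constant input during the backward pass, which is what makes the gradient approximate.

```matlab
% Sketch: batch (per-epoch) training of a tiny Elman network with the
% truncated gradient. Plain gradient descent stands in for traingdx here.
P = [1 0 1 1 1 0 1 1];                  % input sequence from the example
T = [0 (P(1:end-1)+P(2:end) == 2)];     % target sequence from the example
S = 5;                                  % number of hidden neurons

% Deterministic small initial weights (arbitrary choice for this sketch)
IW = 0.5*sin((1:S)');                   % input -> hidden, S x 1
LW = 0.1*cos(reshape(1:S*S,S,S));       % context -> hidden, S x S
b1 = zeros(S,1);                        % hidden biases
W2 = 0.5*cos(1:S);                      % hidden -> output, 1 x S
b2 = 0;                                 % output bias

sigm = @(n) 1./(1+exp(-n));             % stand-in for logsig; tanh plays tansig
lr = 0.1;                               % fixed learning rate (assumed value)
nEpochs = 500;
sse = zeros(1,nEpochs);                 % sum-squared error per epoch

for epoch = 1:nEpochs
    gIW = zeros(size(IW)); gLW = zeros(size(LW)); gb1 = zeros(size(b1));
    gW2 = zeros(size(W2)); gb2 = 0;
    a1prev = zeros(S,1);                % context units start at zero
    for t = 1:numel(P)
        % Step 1: forward pass and error for time step t
        a1 = tanh(IW*P(t) + LW*a1prev + b1);
        a2 = sigm(W2*a1 + b2);
        e  = T(t) - a2;
        sse(epoch) = sse(epoch) + e^2;
        % Step 2: backward pass. a1prev is treated as a constant input, so
        % nothing is propagated back through the recurrent delay.
        d2 = -e * a2*(1-a2);
        d1 = (W2' * d2) .* (1 - a1.^2);
        gW2 = gW2 + d2*a1';    gb2 = gb2 + d2;
        gIW = gIW + d1*P(t);   gLW = gLW + d1*a1prev';   gb1 = gb1 + d1;
        a1prev = a1;
    end
    % Step 3: one update per epoch, using the gradient summed over the sequence
    IW = IW - lr*gIW;  LW = LW - lr*gLW;  b1 = b1 - lr*gb1;
    W2 = W2 - lr*gW2;  b2 = b2 - lr*gb2;
end
fprintf('SSE: first epoch %.4f, last epoch %.4f\n', sse(1), sse(end));
```

In the toolbox itself, the update in step 3 is delegated to whichever training function is assigned to the network, so momentum and the adaptive learning rate of traingdx would replace the fixed-rate update used here.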

When using the function adapt to train an Elman network, the following occurs.

At each time step:

1. An input vector is presented to the network, and the network output is compared with the target to generate an error.
2. The error is backpropagated to find the gradient of the error with respect to each weight and bias. This gradient is only an approximation, because the contributions of the weights and biases to the error via the delayed recurrent connection are ignored.
3. This approximate gradient is then used to update the weights and biases with the learning function chosen by the user. The function learngdm is recommended.
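The per-time-step procedure above can be sketched the same way in plain MATLAB. Again this is a hedged illustration rather than the toolbox code: the cell array w, the initial values, the number of passes, and the values of lr and mc are assumptions, and the update rule is written in the style of gradient descent with momentum (the kind of rule learngdm applies) rather than copied from it. The difference from epoch-based training is that the weights change after every time step instead of once per pass through the sequence.

```matlab
% Sketch: incremental (per-time-step) training with a momentum update.
P = [1 0 1 1 1 0 1 1];                  % input sequence from the example
T = [0 (P(1:end-1)+P(2:end) == 2)];     % target sequence from the example
S = 5;                                  % number of hidden neurons

% Parameters in one cell array: {IW, LW, b1, W2, b2} (names are illustrative)
w  = {0.5*sin((1:S)'), 0.1*cos(reshape(1:S*S,S,S)), zeros(S,1), 0.5*cos(1:S), 0};
dw = cellfun(@(x) zeros(size(x)), w, 'UniformOutput', false);  % previous changes

sigm = @(n) 1./(1+exp(-n));             % stand-in for logsig; tanh plays tansig
lr = 0.1;  mc = 0.9;                    % learning rate and momentum (assumed values)
nPasses = 200;                          % passes through the sequence
sse = zeros(1,nPasses);

for pass = 1:nPasses
    a1prev = zeros(S,1);                % context units start at zero
    for t = 1:numel(P)
        % Step 1: present one input vector and compute the error
        a1 = tanh(w{1}*P(t) + w{2}*a1prev + w{3});
        a2 = sigm(w{4}*a1 + w{5});
        e  = T(t) - a2;
        sse(pass) = sse(pass) + e^2;
        % Step 2: approximate negative gradient (a1prev treated as constant)
        d2 = e * a2*(1-a2);
        d1 = (w{4}' * d2) .* (1 - a1.^2);
        g  = {d1*P(t), d1*a1prev', d1, d2*a1', d2};
        % Step 3: momentum update applied immediately, at this time step
        for k = 1:numel(w)
            dw{k} = mc*dw{k} + (1-mc)*lr*g{k};
            w{k}  = w{k} + dw{k};
        end
        a1prev = a1;
    end
end
fprintf('SSE: first pass %.4f, last pass %.4f\n', sse(1), sse(end));
```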

Elman networks are not as reliable as some other kinds of networks, because both training and adaptation use an approximation of the error gradient.

For an Elman network to have the best chance of learning a problem, it needs more neurons in its hidden layer than another method would actually require for a solution. While a solution may exist with fewer neurons, the Elman network is less able to find the most appropriate weights for its hidden neurons, because the error gradient is approximated. Starting with a fair number of neurons therefore makes it more likely that the hidden neurons will begin by dividing up the input space in useful ways.

The function train trains an Elman network to generate a sequence of target vectors when it is presented with a given sequence of input vectors. The input vectors and target vectors are passed to train as matrices P and T. The function train takes these vectors and the initial weights and biases of the network, trains the network using backpropagation with momentum and an adaptive learning rate (the recommended traingdx), and returns new weights and biases.

Let us continue with the example of the previous section, and suppose that we want to train a network with an input P and targets T as defined below:

P = round(rand(1,8))
P =
     1     0     1     1     1     0     1     1

and

T = [0 (P(1:end-1)+P(2:end) == 2)]
T =
     0     0     0     1     1     0     0     1

Here T is defined to be 0, except when two consecutive 1's occur in P, in which case T is 1 at the position of the second 1.
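To make the rule behind T concrete, the vectorized expression can be checked against an explicit loop. This is a small sketch that uses the sample P printed above (since P comes from rand, a fresh run would give a different sequence); the names T_loop and T_vec are introduced here only for illustration.

```matlab
P = [1 0 1 1 1 0 1 1];                  % the sample input shown above
T_loop = zeros(size(P));                % the first target is always 0
for k = 2:numel(P)
    % Target is 1 only when the previous and current inputs are both 1
    T_loop(k) = (P(k-1) == 1) && (P(k) == 1);
end
T_vec = [0 (P(1:end-1)+P(2:end) == 2)]; % vectorized form used in the text
disp(isequal(T_loop, T_vec))            % the two forms should agree
```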

As noted previously, our network has five hidden neurons in the first layer.

net = newelm([0 1],[5 1],{'tansig','logsig'});

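We use trainbfg as the training function and train for 100 epochs. The matrices P and T are first converted to sequence form with con2seq. After training, we simulate the network with the input sequence and calculate the difference between the target output and the simulated network output.

Pseq = con2seq(P);              % input matrix -> cell array sequence
Tseq = con2seq(T);              % target matrix -> cell array sequence
net.trainFcn = 'trainbfg';      % training function named in the text
net.trainParam.epochs = 100;    % number of training epochs
net = train(net,Pseq,Tseq);
Y = sim(net,Pseq);              % simulate the trained network
z = seq2con(Y);                 % sequence output back to matrix form
diff1 = T - z{1,1}              % target minus simulated output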

Note that the difference between the target and the simulated output of the trained network is very small. Thus, the network has been trained to produce the desired output sequence when presented with the input sequence P.

See Chapter 11 for an application of the Elman network to the detection of wave amplitudes.
