Summary
Backpropagation can train multilayer feed-forward networks with differentiable transfer functions to perform function approximation, pattern association, and pattern classification. (Other types of networks can be trained as well, although the multilayer network is most commonly used.) The term backpropagation refers to the process by which derivatives of network error, with respect to network weights and biases, can be computed. This process can be used with a number of different optimization strategies.
The architecture of a multilayer network is not completely constrained by the problem to be solved. The number of inputs to the network is constrained by the problem, and the number of neurons in the output layer is constrained by the number of outputs required by the problem. However, the number of layers between network inputs and the output layer and the sizes of the layers are up to the designer.
The two-layer sigmoid/linear network can represent any functional relationship between inputs and outputs if the sigmoid layer has enough neurons.
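For example, a two-layer network of this kind might be created and trained as follows. This is only a sketch: the data p and t, the hidden-layer size of 10, and the parameter values are illustrative assumptions, not values taken from this chapter.

    % Sketch of a two-layer tansig/purelin network for function approximation.
    % The data and the hidden-layer size (10) are assumed for illustration.
    p = -1:0.05:1;                          % example inputs
    t = sin(2*pi*p);                        % example targets
    net = newff(minmax(p), [10 1], {'tansig' 'purelin'}, 'trainlm');
    net.trainParam.epochs = 300;            % maximum number of epochs
    net.trainParam.goal = 1e-4;             % mean squared error goal
    net = train(net, p, t);                 % batch backpropagation training
    a = sim(net, p);                        % simulate the trained network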
There are several different backpropagation training algorithms. They have a variety of different computation and storage requirements, and no one algorithm is best suited to all problems. The following list summarizes the training algorithms included in the toolbox.
Function | Description
traingd | Basic gradient descent. Slow response; can be used in incremental mode training.
traingdm | Gradient descent with momentum. Generally faster than traingd. Can be used in incremental mode training.
traingdx | Adaptive learning rate. Faster training than traingd, but can only be used in batch mode training.
trainrp | Resilient backpropagation. Simple batch mode training algorithm with fast convergence and minimal storage requirements.
traincgf | Fletcher-Reeves conjugate gradient algorithm. Has smallest storage requirements of the conjugate gradient algorithms.
traincgp | Polak-Ribière conjugate gradient algorithm. Slightly larger storage requirements than traincgf. Faster convergence on some problems.
traincgb | Powell-Beale conjugate gradient algorithm. Slightly larger storage requirements than traincgp. Generally faster convergence.
trainscg | Scaled conjugate gradient algorithm. The only conjugate gradient algorithm that requires no line search. A very good general purpose training algorithm.
trainbfg | BFGS quasi-Newton method. Requires storage of the approximate Hessian matrix and has more computation in each iteration than conjugate gradient algorithms, but usually converges in fewer iterations.
trainoss | One step secant method. Compromise between conjugate gradient methods and quasi-Newton methods.
trainlm | Levenberg-Marquardt algorithm. Fastest training algorithm for networks of moderate size. Has a memory reduction feature for use when the training set is large.
trainbr | Bayesian regularization. Modification of the Levenberg-Marquardt training algorithm to produce networks that generalize well. Reduces the difficulty of determining the optimum network architecture.
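The training function is normally chosen when the network is created (the fourth argument to newff), but it can also be changed afterward through the network object. A minimal sketch, assuming a network net and training data p and t already exist:

    % Switch an existing network to scaled conjugate gradient training.
    % net, p, and t are assumed to be defined already.
    net.trainFcn = 'trainscg';              % select the training algorithm
    net.trainParam.epochs = 500;            % training parameters are reset to the
                                            % defaults of the new algorithm
    [net, tr] = train(net, p, t);           % tr records the training history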
One problem that can occur when training neural networks is that the network can overfit on the training set and not generalize well to new data outside the training set. This can be prevented by training with trainbr, but it can also be prevented by using early stopping with any of the other training routines. This requires that the user pass a validation set to the training algorithm, in addition to the standard training set.
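A minimal early-stopping sketch, assuming the training data p and t and a separate validation set pval and tval are already defined; the validation data are passed to train in a structure with fields P and T:

    % Early stopping: training halts when the error on the validation set
    % begins to rise. pval and tval are assumed validation data.
    VV.P = pval;                            % validation inputs
    VV.T = tval;                            % validation targets
    net.trainFcn = 'traingdx';              % any training function above can be used
    [net, tr] = train(net, p, t, [], [], VV);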
To produce the most efficient training, it is often helpful to preprocess the data before training. It is also helpful to analyze the network response after training is complete. The toolbox contains a number of routines for pre- and post-processing. They are summarized in the following table.
Function | Description
premnmx | Normalize data to fall in the range [-1,1].
postmnmx | Inverse of premnmx. Used to convert data back to standard units.
tramnmx | Normalize data using previously computed minimums and maximums. Used to preprocess new inputs to networks that have been trained with data normalized with premnmx.
prestd | Normalize data to have zero mean and unity standard deviation.
poststd | Inverse of prestd. Used to convert data back to standard units.
trastd | Normalize data using previously computed means and standard deviations. Used to preprocess new inputs to networks that have been trained with data normalized with prestd.
prepca | Principal component analysis. Reduces the dimension of the input vectors and uncorrelates their components.
trapca | Preprocess data using a previously computed principal component transformation matrix. Used to preprocess new inputs to networks that have been trained with data transformed with prepca.
postreg | Linear regression between network outputs and targets. Used to determine the adequacy of the network fit.
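A minimal pre/post-processing sketch, assuming raw training data p and t, new inputs pnew, and a suitably sized network net already exist; the 0.02 fraction passed to prepca is an illustrative choice:

    % Normalize, reduce dimension, train, and check the fit.
    [pn, meanp, stdp, tn, meant, stdt] = prestd(p, t);   % zero mean, unit std. dev.
    [ptrans, transMat] = prepca(pn, 0.02);  % drop components contributing less
                                            % than 2% of the total variation
    net = train(net, ptrans, tn);           % train on the preprocessed data
    an = sim(net, ptrans);
    a = poststd(an, meant, stdt);           % convert outputs back to original units
    [m, b, r] = postreg(a, t);              % regression of outputs against targets
    % New inputs must pass through the same transformations:
    pnewn = trastd(pnew, meanp, stdp);
    pnewtrans = trapca(pnewn, transMat);
    anew = poststd(sim(net, pnewtrans), meant, stdt);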