Data Structure

The mlp training algorithm uses the same data structure as the mlp( ) 'C' function described in mlp.htm. However, we need to add extra arrays to hold the output errors for each layer. We also need an extra array T[ ] to hold the correct outputs corresponding to each set of presented inputs. The original data structure with these extra arrays added is shown below:

As in mlp( ), we need to re-organise this data structure to take advantage of the 'C' language's pointer arithmetic capabilities. An extra pointer array L[ ] is required to access the output arrays for the different layers. Another extra pointer array E[ ] is similarly required to access the error array for the different layers:

Pointer variable po = *(L + nl) is used in conjunction with a neuron number 'nn' within the layer 'nl' in order to access the output 'o' of a given neuron. The output of a given neuron is therefore o = *(po + nn). The corresponding pointer variable pe = *(E + nl) is used similarly to access a neuron's activation level error. The activation error for a given neuron is therefore ∂F/∂a = *(pe + nn).
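Since the original diagrams are not reproduced here, the following is a minimal sketch of how these arrays and pointers might be laid out. The layer sizes, the use of 'double', the convention that L0 holds the network's inputs, and the helper names get_output( ) and get_error( ) are all assumptions made for illustration; only L[ ], E[ ], T[ ], po, pe, nl and nn come from the text.

```c
#include <stddef.h>

/* Hypothetical layer sizes: 3 inputs, then active layers of 3, 3 and 2 neurons. */
enum { N0 = 3, N1 = 3, N2 = 3, N3 = 2 };

static double L0[N0], L1[N1], L2[N2], L3[N3]; /* outputs of each layer (L0 assumed to hold the inputs) */
static double E1[N1], E2[N2], E3[N3];         /* error arrays, one per active layer */
static double T[N3];                          /* correct outputs for the presented inputs */

/* Pointer arrays indexed by layer number nl. */
static double *L[] = { L0, L1, L2, L3 };
static double *E[] = { NULL, E1, E2, E3 };    /* the input layer has no error array */

/* Output 'o' of neuron nn within layer nl. */
double get_output(int nl, int nn)
{
    double *po = *(L + nl);
    return *(po + nn);
}

/* Activation level error of neuron nn within layer nl (nl must be 1 or more). */
double get_error(int nl, int nn)
{
    double *pe = *(E + nl);
    return *(pe + nn);
}
```

With this layout, get_output(3, 1) reads the same element as L3[1], and get_error(2, 2) reads the same element as E2[2].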

The method of accessing the inter-neural connection weights is exactly the same as it was in the mlp( ) function discussed in document mlp.htm. It is repeated below for convenience:

Note that one more order of indirection is needed for weights. A 'pointer-to-a-pointer' variable ppw = *(W + nl) is established which is used to set up the pointer variable pw = *(ppw + nn). A given weight is then obtained using pw so that w = *(pw + ni).
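The extra level of indirection can be sketched as follows. This is an illustration only: the array sizes, the 'double' type and the helper name get_weight( ) are assumptions, and only the output layer (layer 3) is filled in.

```c
#include <stddef.h>

/* Hypothetical sizes: layer 3 has 2 neurons, each with 3 inputs from layer 2. */
static double W30[3], W31[3];                  /* input weights of neurons 0 and 1 of layer 3 */
static double *W3[] = { W30, W31 };            /* one pointer per neuron within layer 3 */
static double **W[] = { NULL, NULL, NULL, W3 }; /* one pointer per layer; only layer 3 is set up */

/* Weight on input ni of neuron nn within layer nl. */
double get_weight(int nl, int nn, int ni)
{
    double **ppw = *(W + nl);   /* pointer to the layer's array of weight-array pointers */
    double *pw = *(ppw + nn);   /* pointer to the chosen neuron's weight array */
    return *(pw + ni);          /* the weight itself */
}
```

Here get_weight(3, 1, 2) reads the same element as W31[2].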

Training Procedure

The training procedure is a matter of adjusting the weights of the network until the observed output is as close a match as possible to the correct output provided in the training data file. Working out by how much and in which direction each weight must be altered, and then making the alteration, is done in a number of separate computational steps as follows:

Step 1:

Each element of the output layer's error array must be primed with the relative amount by which the corresponding neuron's output signal affects the overall output function F, as follows:

The index 'j' signifies the neuron number 'nn' of a neuron within the output layer. This corresponds to the element number of each of the three arrays where the correct output, observed output and error differential relating to that neuron are stored. A program loop must be used to work out the error differential ∂F/∂o for each value of j, i.e. for each neuron in the output layer.
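The formula for this step is not reproduced here, so the loop below is only a sketch under a stated assumption: that F is half the sum of the squared differences between observed and correct outputs, which gives ∂F/∂o = o - t for each output neuron. That sign agrees with the subtraction used when the weights are adjusted in Step 3. The function name and parameter names are hypothetical.

```c
/* Step 1 sketch: prime the output layer's error array with dF/do.
   Assumes F = 0.5 * sum over j of (o[j] - t[j])^2, so dF/do[j] = o[j] - t[j]. */
void prime_output_errors(const double *o, const double *t, double *e, int J)
{
    int j;
    for (j = 0; j < J; j++)     /* one pass per neuron in the output layer */
        e[j] = o[j] - t[j];     /* observed output minus correct output */
}
```

In terms of the arrays named in the text, this would be called as prime_output_errors(L3, T, E3, J).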

Step 2:

The error differential of each neuron's output signal must then be replaced by the error differential of its activation level. In other words, we want to find how the error differential is propagated back through the neuron's sigmoid function. We saw earlier that this is given by:

To do this we lift ∂F/∂o out of the appropriate element of the error array E3[ ] and multiply it by ∂o/∂a, which is computed from the value of 'o' held in the corresponding element of the output array L3[ ]. We then place the result back in the same element of the error array, as illustrated below:

Again a program loop must be used to work out the error differential ∂F/∂a for each value of 'j', i.e. for each neuron in the output layer.
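A sketch of this loop is given below. It assumes the standard logistic sigmoid o = 1/(1 + exp(-a)), for which ∂o/∂a = o(1 - o); if the sigmoid used in mlp( ) is scaled differently, the expression for ∂o/∂a changes accordingly. The function name is hypothetical.

```c
/* Step 2 sketch: replace dF/do in each element of the error array by dF/da.
   Assumes a logistic sigmoid, so do/da = o * (1 - o). */
void output_to_activation_errors(const double *o, double *e, int J)
{
    int j;
    for (j = 0; j < J; j++)             /* one pass per neuron in the layer */
        e[j] *= o[j] * (1.0 - o[j]);    /* dF/da = dF/do * do/da */
}
```

For the output layer this would be called as output_to_activation_errors(L3, E3, J).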

Step 3:

Now we can adjust the output layer's input weights. The inputs to this layer are the outputs from the previous layer. We therefore signify them by the letter 'o' subscripted by the index 'i' indicating the neuron number 'nn' within the previous layer ['j' indicates the neuron number 'nn' within this layer]. The weight adjustment process is illustrated below:

The weights are adjusted in the following way. The weight on each input to the first neuron (j = 0) is adjusted in turn. We start with the weight on input i = 0 and finish with the weight on input i = 2.

In general, if there are I neurons in the previous layer, we start with the weight on input i = 0 and finish by adjusting the weight on input i = I - 1.

We then do the same for the weights on the inputs to the second neuron (j = 1). In general, if there are J neurons in the layer we are dealing with then we do the same for the weights on the inputs to the other neurons.

To do all this we need two nested program loops:

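for(j = 0; j < J; j++)        /* each neuron in the output layer */
  for(i = 0; i < I; i++)      /* each input from the previous layer */
    W[j][i] -= neta * E3[j] * L2[i];   /* neta is the learning rate */

Direct reference to array elements by the indexes i and j has been used above rather than pointer references. This is to help you understand the process more clearly at this stage. Here W stands for the output layer's weights, indexed first by neuron j and then by input i, which is the same order as the pointer scheme pw = *(ppw + nn), w = *(pw + ni) described earlier.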
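For comparison, the same two loops can be sketched with pointer references, following the neuron-then-input order of the weight pointers described under Data Structure. The function name and the parameter name po_prev are hypothetical; ppw, pw, pe and neta are used as in the text.

```c
/* Step 3 sketch: adjust the input weights of one layer using pointer references.
   ppw     : one pointer per neuron in this layer, each to that neuron's I weights
   pe      : this layer's error array, holding dF/da for each neuron
   po_prev : the previous layer's output array (the inputs to this layer)
   neta    : the learning rate */
void adjust_weights(double **ppw, const double *pe, const double *po_prev,
                    int J, int I, double neta)
{
    int j, i;
    for (j = 0; j < J; j++) {
        double *pw = *(ppw + j);        /* weight array of neuron j */
        for (i = 0; i < I; i++)
            *(pw + i) -= neta * *(pe + j) * *(po_prev + i);
    }
}
```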

Step 4:

Finally we must prime the previous layer's error array with ∂F/∂o ready for when we come to process that layer in the next program pass.

We prime the error array at this stage because the items we need are easier to reference from the point of view of the current layer than from the point of view of the previous layer. [Note that because we are working backwards through the network the previous layer is the next layer to be processed.]

The process of calculating ∂F/∂o for the first neuron of the previous layer is illustrated below:

The value of ∂F/∂o is obtained by adding up the products of each pair of horizontally corresponding weight and error elements, i.e. you add the product of E3[0] and W30[0] to the product of E3[1] and W31[0].

The weight elements required to obtain the values of ∂F/∂o for the second and third elements of E2[ ] are shown below:
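The summation described above can be sketched as a pair of nested loops. The function name and the parameter name pe_prev are hypothetical; the weights are again assumed to be reached neuron first and input second.

```c
/* Step 4 sketch: prime the previous layer's error array with dF/do.
   ppw     : one pointer per neuron in this layer, each to that neuron's I weights
   pe      : this layer's error array, holding dF/da for each of its J neurons
   pe_prev : the previous layer's error array, with I elements, to be primed */
void prime_previous_errors(double **ppw, const double *pe, double *pe_prev,
                           int J, int I)
{
    int i, j;
    for (i = 0; i < I; i++) {           /* each neuron of the previous layer */
        double sum = 0.0;
        for (j = 0; j < J; j++)         /* each neuron of this layer */
            sum += *(pe + j) * *(*(ppw + j) + i);
        *(pe_prev + i) = sum;
    }
}
```

With J = 2 and I = 3 this gives E2[0] = E3[0] * W30[0] + E3[1] * W31[0], as in the worked case above, and likewise for E2[1] and E2[2] using elements 1 and 2 of each weight array.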

Having primed the error array for the previous layer, we loop back to Step 2 and repeat Steps 2 through 4 with the array indexing set to the point of view of the previous layer. We then perform Steps 2 and 3 a third time, with respect to the second layer back from the output layer.

Kolmogorov's Theorem is commonly interpreted as showing that a three-layer perceptron can represent any continuous function of its inputs. Consequently we never need more than three active layers. Therefore on this third and final pass of the loop we do not need to do Step 4, since there are no further error arrays to prime.
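Putting the four steps together, one complete backward pass might be sketched as below. This is an illustration rather than the actual training function: it assumes active layers numbered 1 to NL with layer 0 holding the network's inputs, the error formula ∂F/∂o = o - t, and a logistic sigmoid. The function name train_step( ) and the size array n[ ] are hypothetical.

```c
/* One backward pass over all active layers.
   L[nl], E[nl] : output and error arrays of layer nl (E[0] is unused)
   W[nl][j][i]  : weight on input i of neuron j within layer nl
   n[nl]        : number of neurons in layer nl (n[0] = number of inputs)
   T            : correct outputs;  neta : learning rate */
void train_step(double **L, double **E, double ***W, const int *n,
                const double *T, int NL, double neta)
{
    int nl, j, i;

    /* Step 1: prime the output layer's error array with dF/do (assumed o - t). */
    for (j = 0; j < n[NL]; j++)
        E[NL][j] = L[NL][j] - T[j];

    for (nl = NL; nl >= 1; nl--) {      /* work backwards through the layers */
        double *po = L[nl], *pe = E[nl], *po_prev = L[nl - 1];
        double **ppw = W[nl];
        int J = n[nl], I = n[nl - 1];

        /* Step 2: convert dF/do into dF/da (logistic sigmoid assumed). */
        for (j = 0; j < J; j++)
            pe[j] *= po[j] * (1.0 - po[j]);

        /* Step 3: adjust this layer's input weights. */
        for (j = 0; j < J; j++)
            for (i = 0; i < I; i++)
                ppw[j][i] -= neta * pe[j] * po_prev[i];

        /* Step 4: prime the previous layer's error array; skipped on the final pass. */
        if (nl > 1) {
            double *pe_prev = E[nl - 1];
            for (i = 0; i < I; i++) {
                double sum = 0.0;
                for (j = 0; j < J; j++)
                    sum += pe[j] * ppw[j][i];
                pe_prev[i] = sum;
            }
        }
    }
}
```

Note that, following the order of the steps as given, Step 4 reads the weights after Step 3 has adjusted them. Many textbook presentations of back-propagation instead compute the propagated errors from the weights as they stood before adjustment, which would mean swapping Steps 3 and 4 in this sketch.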

