The Training Function

Skeleton 'C' Function

These 4 steps have been put together within an appropriate layer control loop to form the skeleton 'C' function below:
void mlptrain(
  short *pi,      //pointer to inputs
  short *pt,      //pointer to true outputs
  int h) 
{
  for(nl = NAL; nl > 0; nl--)  //for each layer of the network
  {
    if(nl == NAL)  //If doing output layer (ie first time thru) 
    {
      PRIME EACH ELEMENT OF THE ERROR ARRAY WITH -(t_j - o_j).
    }
    for(nn = 0; nn < NN; nn++) //for each neuron in this layer
    {
      MULTIPLY NEURON'S PRIMED ERROR VALUE BY ∂o/∂a
      THEN ADJUST ALL THIS NEURON'S INPUT WEIGHTS
    }
    if(nl > 1)     //If not yet reached first active layer 
    {
      PRIME THE PREVIOUS LAYER'S ERROR ARRAY ELEMENTS
      WITH THIS LAYER'S ERROR * WEIGHT SUMMATIONS.
    }
  }
}
This expands into a complete 'C' function as follows:

The Complete 'C' Function

short	E1[N1], E2[N2], E3[N3],  //output errors for each layer
  *E[] = {NULL, E1, E2, E3};   //array of ptrs to above arrays

void mlptrain(
  short *pi,        //pointer to inputs
  short *pt,        //pointer to true outputs
  int h      )
{
  int nl;           //layer number
  L[0] = pi;        //points to start of network inputs array
  h += 15;          //shift factor to multiply by neta / R

  for(nl = NAL; nl > 0; nl--)   //for each layer of the network
  {
    short **ppw = *(W + nl),    //ptr to access layer's weights
    *pe = *(E + nl),            //ptr to layer's output errors
    *po = *(L + nl);            //ptr to layer's neural outputs
    int nn, ni,                 //neuron & input No within current layer
    NN = *(N + nl),             //number of neurons in this layer
    NI = *(N - 1 + nl);         //number of inputs to this layer

    if(nl == NAL)               //If doing output layer, 

      //Prime each element of the error array with -(t_j - o_j)

      for(nn = 0; nn < NN; nn++)
        *(pe + nn) = *(po + nn) - *(pt + nn);

      pi = *(L + nl - 1);      //ptr to start of layer's inputs

      //For each neuron in this layer, compute the output error

      for(nn = 0; nn < NN; nn++)
      {
        short *pw = *(ppw + nn);  //ptr to neurone's first weight 
        long o = *(po + nn),      //this neurone's output signal
          e = ((((R + o) * (R - o)) >> 15) * *(pe + nn)) >> 13; 
        if(e > R) e = R; if (e < -R) e = -R;
        *(pe + nn)=e;             //∂F/∂a = ∂o/∂a
                                  // * last time's summation†

        for(ni = 0; ni < NI; ni++)   //adjust each input weight
          *(pw + ni) -= ((e * *(pi + ni)) / NI) >> h;
      }
      if(nl > 1)   //If not yet reached the first active layer
      {
        short *ps = *(E + nl - 1);   /* pointer to previous
                                     layer's output errors*/
        for(ni = 0; ni < NI; ni++)   // for each input weight
        {                            //to this layer
          long Hi = 0, Lo = 0;       //See mlp() for explan-
          for(nn = 0; nn < NN; nn++) //ation of following code
          {
            long P = (long)*(pe + nn) * *(*(ppw + nn) + ni);
            Hi += P >> 16; Lo += P & 0xFFFF;
          }
          *(ps + ni) = ((Hi << 1) + (Lo >> 15)) / NN;
        }
      }   //End prime previous layer's error array elements 
          //with this layer's error * weight summations. †
  }
}

Overview

A training example comprises one input pattern plus its corresponding correct output pattern. The training data file contains a number of training examples which together represent the full range of patterns to which the network is being trained to respond.

The function mlptrain() adjusts the network's weights to minimise the error between the network's output and the known correct output for each given training example. The function is therefore called once after each training example has been presented to the network.

[A training example is presented to the network by calling the function mlp() discussed in the document MLP.htm.]
One pass through the training data file therefore involves one call to mlptrain() per example. The constant of proportionality neta is then reduced and the pass repeated. This continues until the error function F for each training example cannot be reduced any further.

Please refer to the complete 'C' function listing while reading the following.

Input Arguments

The function mlptrain() has to be told where to find its inputs by means of a pointer pi which is passed to it as an input argument. Immediately after entry this value is placed in pointer array element L[0]. The pointer pi is then used in the rest of the function to point to the input values pertaining to the neuron currently being processed.

However, mlptrain( ) also needs to be told where to find the correct output responses corresponding to those inputs. We let it know this by means of a second pointer pt which we pass to it as a second input argument.

The constant of proportionality neta must be reduced externally to mlptrain() for successive presentations of the training data file. It must therefore be passed to mlptrain() as an argument.

In the 'C' code, instead of multiplying the weight increment by neta, we right-shift it by an amount 'h'. For example, if neta = ¼ then h = 2. Since we are using integer arithmetic, we add 15 to this right-shift factor to give it the additional effect of dividing by R (the maximum integer value, 32768, which is the 15th power of 2).
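
The effect of the combined shift can be checked in isolation. The following is only a sketch: the helper name scale_by_neta() is hypothetical and does not appear in mlptrain(), and it assumes R = 32768 and that neta is always a power of ½.

```c
#define R_SHIFT 15   /* R = 32768 = 2 to the power 15 (assumed) */

/* Hypothetical helper, not part of mlptrain(): scales a 32-bit product
   of two R-scaled values back to R scale and multiplies it by
   neta = 1 / 2^h in a single right-shift.
   Note: right-shifting a negative value is implementation-defined in C;
   most compilers perform an arithmetic shift. */
long scale_by_neta(long product, int h)
{
  return product >> (h + R_SHIFT);
}
```

For instance, a full-scale error (32768) times a half-scale input (16384) with neta = ¼ (h = 2) gives 4096, which is 0.125 on the R scale.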

The prototyped envelope of mlptrain( ) must therefore be:

void mlptrain(short *pi, short *pt, int h)
{

}

Declarations

The neural input, output and weights arrays are declared externally within the mlp source file. Only the additional error arrays E1[ ], E2[ ] and E3[ ] plus their associated pointer array E[ ] need to be declared for mlptrain(). The start addresses of E1[ ], E2[ ] and E3[ ] are placed in the elements of E[ ]. E[0] is primed with a NULL pointer value because there is no error array E0[ ] for the input layer since the input layer has no active neurons.

Layer Loop

The outermost loop in mlptrain() steps from the output layer backwards through the network finishing with the active layer next to the input layer. Thus the passive input layer is Layer 0, the first active layer is Layer 1 and the output layer is Layer 3. The constant NAL (Number of Active Layers) is defined externally within the mlp source file. Its value is in fact 3. The layer loop thus comprises the statements:
int nl;
for (nl = NAL; nl > 0; nl--)
{

}
The layer number nl starts equal to 3 and the loop is executed for nl = 3. The variable nl is then decremented (by nl--) and the loop is executed for nl = 2. It is then decremented again and the loop is executed for nl = 1. It is then decremented again whereupon it becomes zero and therefore the test nl > 0 fails so the loop terminates without executing for nl = 0.

Pointers

The first task done inside the layer loop is to set up pointers to point to the input weights, the output values and output errors for the layer concerned. The pointer variable ppw is a pointer-to-a-pointer-to-an-integer. It is set equal to the address found in the nl-th element of the pointer array W[ ] by the statement:
ppw = *(W + nl);
It points to the start address of one of the secondary pointer arrays W3[ ], W2[ ] or W1[ ] according to the current value of nl. Since ppw is not referred to outside the Layer Loop, the statement in the actual function also declares it as a pointer to a pointer to a short integer at the same time viz:
short **ppw = *(W + nl),
The pointer pe is set up to point to the start address of the layer's error array E3[ ], E2[ ] or E1[ ] according to the current value of nl. Since it too is not referred to outside the layer loop, the setting up statement also declares it:
*pe = *(E + nl),
Since the declaration for ppw was terminated by a comma instead of a semi-colon, the word 'short' does not have to be repeated.

Finally, the pointer variable po is set to point to the start address of the layer's output array L3[ ], L2[ ] or L1[ ] according to the current value of nl:

po = *(L + nl);
This is declared in the same way as the other two, and for the same reason as before, the 'short' type word does not have to be repeated.

Neuron Loop

We now need to deal with each neuron in turn within the current layer nl. To do this we establish a loop within the layer loop which we shall call the Neuron Loop. For this we need an integer variable nn (neuron number) to tell us which of this layer's neurons we are currently dealing with. We therefore declare nn:
int nn;
We also need to know how many neurons there are in this layer. This is held in the appropriate element of the array N[ ]. This array is declared globally and initialised in the mlp source file. The number of neurons NN in Layer Nº nl is therefore N[nl]. We therefore declare and assign NN as:
int NN = *(N + nl);
We therefore establish the neuron loop within the layer loop as follows:
int nn, NN = *(N + nl);
for(nn = 0; nn < NN; nn++)
{

}
Note that, unlike the layers, we can count the neurons upwards from 0 to NN - 1. In fact we need to loop through all the neurons in the current layer up to 3 times, each time doing something quite different.

Priming The Output Errors

If we are on the very first pass of the Layer Loop (ie we are dealing with the output layer) we first need to prime the error array E3[ ] with -(t_nn - o_nn) for each neuron.

The pointer pe points to the start address of this layer's error array, E3[ ], ie it points to the error array element corresponding to this layer's Neuron Nº 0.

The pointer value pe + nn therefore points to the element of this layer's error array which contains the error value corresponding to this layer's Neuron Nº nn. This holds in general, ie for any layer nl.

The error value ε_nn for Neuron Nº nn is therefore the contents of the array element whose address is pe + nn, ie ε_nn = *(pe + nn).

In the same way, the values for the given correct output for a given neuron nn and the observed output for the neuron nn are given by:

t_nn = *(pt + nn) and o_nn = *(po + nn)

The error differential ∂F/∂o for each output neuron is therefore computed by the statement:

*(pe + nn) = *(po + nn) - *(pt + nn);
The complete program fragment for priming the error array for the output layer is therefore:
if(nl == NAL)                   //If doing output layer, 
  //Prime each element of error array with  -(t_j - o_j)
  for (nn = 0; nn < NN; nn++)
    *(pe + nn) = *(po + nn) - *(pt + nn);
Syntactically the above is a single 'C' statement so no braces are needed.
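
As a sketch only, the same priming step can be wrapped in a small stand-alone function so that it can be tried outside mlptrain(). The function name prime_output_errors() and its argument list are assumptions made for this illustration.

```c
/* Hypothetical stand-alone version of the output-layer priming step.
   pe = error array, po = observed outputs, pt = true (target) outputs,
   NN = number of neurons in the output layer.
   Each error is o - t, ie -(t - o). */
void prime_output_errors(short *pe, const short *po, const short *pt, int NN)
{
  int nn;
  for (nn = 0; nn < NN; nn++)
    *(pe + nn) = *(po + nn) - *(pt + nn);
}
```

With observed outputs {1000, -2000, 0} and true outputs {4000, -2000, 500} the error array becomes {-3000, 0, -500}: an output that is too low gives a negative error and a correct output gives zero.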

Activation Error

Having primed each of the output layer's error array elements with ∂F/∂o for each output neuron we now have to multiply each by ∂o/∂a (the first differential of the Sigmoid function) and replace it with the resulting activation error differential ∂F/∂a.

We first have to get the neurone's output value 'o'. Because we are about to multiply integers we will obtain 'o' as a 'long' 32-bit integer as follows so that we do not lose any precision from the 32-bit product:

long o = *(po + nn);
This ensures that even though other operands in the multiplication may be only 16-bit integers, the product will have full 32-bit precision.

We can then compute ∂F/∂a (which we shall call 'e') as follows:

long e = ((((R + o) * (R - o)) >> 15) * *(pe + nn)) >> 13;

The terms (R + o) and (R - o) are scaled to a maximum absolute value of 32768. Their product is therefore scaled to the square of 32768. To bring the scale of the product back to 32768 we divide it by 32768. We achieve this by right-shifting it 15 places (32768 is the 15th power of 2).

Be aware that although it has been right-shifted 15 binary places to scale it back to 32768, the product (R + o) * (R - o) is still held as a 32-bit 'long' quantity. This ensures that before it is multiplied by ∂F/∂o, ∂F/∂o is itself converted to a 'long' so that the full 32-bit precision of the resulting 'long' product is preserved.

Finally, we shift this 'long' (32-bit) product into a 'short' (16-bit) register by right-shifting the whole lot 16 places.

However at this point we also need to multiply the result by the sigmoid's shaping constant k. Since it has been decided to make k = 8 we simply need to left-shift our result by 3. So that we hold on to as many of the least-significant bits in the final result as possible, we combine the right-shift of 16 and the left-shift of 3 as a single right-shift of 13.

The resulting value of ∂F/∂a is then stored into the neurone's output error array element.

The complete program fragment for converting ∂F/∂o to ∂F/∂a for each neuron in this layer is therefore:

for(nn = 0; nn < NN; nn++)   //For each neuron in this layer
{                            //compute the output error.
  long o = *(po + nn);       //this neurone's output signal
  long e = ((((R + o) * (R - o)) >> 15) * *(pe + nn)) >> 13; 
  if(e > R) e = R;
  if (e < -R) e = -R;
  *(pe + nn) = e;            //∂F/∂a = ∂o/∂a * ∂F/∂o
}
The activation error differential ∂F/∂a is stored as a 'short' integer. Since in the final scaling we are shifting only 13 binary places right instead of 16 there is a remote possibility that the result could overflow 16-bits. That is the reason for the range-limiting 'if' statements above just before the result is stored.
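
The scaling can be followed with a few numbers. The sketch below isolates the calculation in a hypothetical function activation_error(); the name is an assumption, R is taken to be 32768 as stated elsewhere in this document, and the result is returned as a 'long' rather than stored into the error array.

```c
#define R 32768L   /* full-scale value, assumed to be 2 to the power 15 */

/* Hypothetical stand-alone version of the dF/da calculation.
   o_in = the neuron's output, dFdo = its primed error value.
   Right-shifting a negative 'long' is implementation-defined in C;
   most compilers perform an arithmetic shift. */
long activation_error(short o_in, short dFdo)
{
  long o = o_in;                                       /* widen first */
  long e = ((((R + o) * (R - o)) >> 15) * dFdo) >> 13; /* slope * error * k */
  if (e > R) e = R;                                    /* range-limit */
  if (e < -R) e = -R;
  return e;
}
```

With o = 0 (the steepest part of the sigmoid) an error of 1000 becomes 4000. With o = 32767 (saturated) the same error becomes 0. With o = 0 and an error of 20000 the raw result would be 80000, so the range limit returns 32768.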

Weight Adjustments

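Two levels of indirection are required to access weights. As well as ppw we therefore need a second pointer pw which points directly to the start address of the weights on the inputs to a neuron nn in layer nl. The two pointers relate as follows: ppw points to the layer's array of pointers, one per neuron; element nn of that array, *(ppw + nn), is the address of neuron nn's first weight; and the weight on input ni of that neuron is therefore *(*(ppw + nn) + ni).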

Since the value of ppw is constant for a given layer, we have already set it up at the start of the layer loop. But here we need to set up pw which we declare and assign at the same time as follows:

short *pw = *(ppw + nn);   //pointer to neurone's first weight
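
The two levels of indirection can be tried out on a very small example. This is only a sketch: the arrays w10[ ] and w11[ ], their contents, and the function weight_at() are hypothetical, and only one active layer with 2 neurons of 3 inputs each is set up.

```c
#include <stddef.h>

static short w10[3] = {11, 12, 13};  /* weights of Layer 1, Neuron 0 (made-up values) */
static short w11[3] = {21, 22, 23};  /* weights of Layer 1, Neuron 1 (made-up values) */
static short *W1[2] = {w10, w11};    /* secondary pointer array: one entry per neuron */
static short **W[2] = {NULL, W1};    /* primary pointer array: one entry per layer    */

/* Hypothetical accessor: returns the weight on input ni of neuron nn in layer nl */
short weight_at(int nl, int nn, int ni)
{
  short **ppw = *(W + nl);    /* first level: the layer's pointer array   */
  short *pw = *(ppw + nn);    /* second level: the neuron's first weight  */
  return *(pw + ni);          /* the weight itself                        */
}
```

Here weight_at(1, 0, 2) yields 13 and weight_at(1, 1, 0) yields 21. The single expression *(*(*(W + 1) + 1) + 2) reaches the same place as weight_at(1, 1, 2).
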
To adjust the input weights to a neuron we need to know the strengths of its input signals from the previous layer. These of course are the output signals generated by the neurons in the previous layer. These are held in the previous layer's output array L(nl-1) [ ]. So if we are adjusting Layer 3's weights we will find each neurone's input signals in L2[ ].

To address a neurone's inputs we therefore need to set up another pointer to point to the start address of the previous layer's outputs array L(nl-1)[ ] as follows:

pi = *(L + nl - 1);   //pointer to start of this layer's inputs
For this purpose we have made use of the now-redundant input argument pointer variable pi.

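Each input weight of the neuron we are currently dealing with must be reduced by an amount proportional to the product of the neuron's activation error ∂F/∂a and the strength of the signal arriving on that input, the constant of proportionality being neta.

Using the pointers we have just set up to address the operands we can now implement this within mlptrain() as follows:

*(pw + ni) -= ((e * *(pi + ni)) / NI) >> h;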

Since 'e' is a 'long', the product e * *(pi + ni) is evaluated as a 'long'. Both 'e' and *(pi + ni) are scaled to an absolute maximum of R (= 32768). Their product is therefore scaled to R-squared. To re-scale the right-hand side to 32768 we need to divide by R. That is why 'h', the shift factor equivalent of neta was increased by 15 at the beginning of mlptrain().

The activation level error differential ∂F/∂a is contributed to by the errors in all the inputs to the neuron concerned. We therefore divide by the number of inputs to the neuron, NI. To preserve as much precision as possible we do this division before doing the right-shift.

The complete program fragment for adjusting the input weights of one neuron (Neuron Nº nn of Layer Nº nl) is as follows:

short *pw = *(ppw + nn);  //pointer to neurone's first weight 
pi = *(L + nl - 1);       //ptr to start of this layer's inputs
//Adjust the input weight for each input to this neuron
for(ni = 0; ni < NI; ni++)
  *(pw + ni) -= ((e * *(pi + ni)) / NI) >> h;
Since this must be done for all the neurons in the current layer, the above program fragment must go inside the neuron loop immediately following the computation of ∂F/∂a.
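
To see the size of a typical adjustment, the weight update can be isolated in a small function. This is a sketch under assumptions: the name adjust_weights() and its argument list are hypothetical, and h is taken to be the shift factor after 15 has already been added to it.

```c
/* Hypothetical stand-alone version of the weight adjustment for one neuron.
   pw = the neuron's first weight, pi = the layer's first input,
   e  = the neuron's activation error, NI = number of inputs,
   h  = total right-shift (neta shift + 15).
   Right-shifting a negative value is implementation-defined in C. */
void adjust_weights(short *pw, const short *pi, long e, int NI, int h)
{
  int ni;
  for (ni = 0; ni < NI; ni++)
    *(pw + ni) -= ((e * *(pi + ni)) / NI) >> h;
}
```

With e = 4000, two inputs of 16384 and 0, both weights initially 1000 and h = 2 + 15, the first weight falls by 250 to 750 while the second, whose input is silent, is left unchanged.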

The updating of the current layer is now finished.

Priming the Next Layer

Because we have the pointers pe and ppw already set up, it is more convenient to prime the previous layer's error array with the required values for ∂F/∂o now rather than at the start of the next pass of the layer loop. All we lack is a way to address the elements of the previous layer's error array E(nl-1)[ ]. We provide this by declaring and assigning a pointer ps to point to the start address of E(nl-1)[ ] as follows:
short *ps = *(E + nl - 1);
//pointer to previous layer's output errors
Remember that we can only get ∂F/∂o directly in the case of the output layer, and we have already done it for the output layer. Whichever pass of the layer loop we are on, the previous layer at this point is necessarily always a hidden layer. We must therefore compute its ∂F/∂o values by summation: ∂F/∂o for input ni is the sum, over every neuron nn in the current layer, of that neuron's ∂F/∂a multiplied by the weight w_ni,nn on its input ni.

To maintain full precision for the product of the activation error differential ∂F/∂a and the weight w_ni,nn we cast one of them to a 'long' and assign the result to a 'long' variable:

long P = (long)*(pe + nn) * *(*(ppw + nn) + ni);

Notice that since we are stepping between the weights arrays during the summation loop we cannot simply assign pw = *(ppw + nn) since nn is a variable during the loop. Looking at the weight matrix, we are stepping vertically up the matrix instead of across it as we were in the previous loop.

We need to accumulate the sum of these 'long' products without losing precision or risking overflow. We achieve this by the split accumulator method shown below:

long Hi = 0, Lo = 0;
for(nn = 0; nn < NN; nn++)
{
  long P = (long)*(pe + nn) * *(*(ppw + nn) + ni);
  Hi += P >> 16;
  Lo += P & 0xFFFF;
}
The 'long' product P is split into upper and lower 16-bit halves. Each half is added into a separate 'long' accumulator, Hi or Lo. This is repeated for nn = 0 to NN - 1. At the end of the summation loop the two accumulators are recombined: Hi is shifted left one place and the upper part of Lo (Lo >> 15) is added to it, which yields the full sum divided by R:
*(ps + ni) = ((Hi << 1) + (Lo >> 15)) / NN;
The summation is then divided by the number of neurons in the current layer in order to rescale the sum to a 'short'. The result is the value of ∂F/∂o for neuron ni of the previous layer, which is then stored in element ni of E(nl-1)[ ].
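
One way to convince oneself that the split accumulator gives the right answer is to compare it with a plain summation in a wider accumulator. The sketch below does this for two ordinary arrays; the function names split_sum() and wide_sum() are hypothetical, and wide_sum() relies on the 64-bit 'long long' type, which the code in this document does not otherwise use.

```c
/* Split-accumulator sum of the products a[i] * b[i], divided by
   R (= 32768) and then by n, following the statements above.
   Illustrated here for non-negative products only: shifting negative
   values is implementation-defined or undefined in C. */
short split_sum(const short *a, const short *b, int n)
{
  long Hi = 0, Lo = 0;
  int i;
  for (i = 0; i < n; i++)
  {
    long P = (long)a[i] * b[i];
    Hi += P >> 16;         /* upper half of the product */
    Lo += P & 0xFFFF;      /* lower half of the product */
  }
  return (short)(((Hi << 1) + (Lo >> 15)) / n);
}

/* Reference version using a single 64-bit accumulator */
short wide_sum(const short *a, const short *b, int n)
{
  long long sum = 0;
  int i;
  for (i = 0; i < n; i++)
    sum += (long long)a[i] * b[i];
  return (short)((sum >> 15) / n);
}
```

For a = {100, 200} and b = {300, 400} both functions return 1, and for two products of 16384 * 16384 both return 8192. The split version gets there without ever needing more than 32 bits.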

The split accumulator summation is used in the 'neuron' part of the mlp() function which is described in the document neuron.htm and is fully exercised and demonstrated by the program NEURON.C.

This summation must be done for every neuron in the previous layer (ie the next layer back towards the network's input). We must therefore set the summation loop within another loop which steps through each neuron of that previous layer:

//pointer to previous layer's output errors
short *ps = *(E + nl - 1);
for (ni = 0; ni < NI; ni++)   //for each input to this layer
{
  DO THE SUMMATION LOOP
  *(ps + ni) = ((Hi << 1) + (Lo >> 15)) / NN;
}
Since the neurons of the previous layer provide the inputs for the current layer, the previous layer's error array E(nl-1)[ ] has NI elements. We can therefore use the same loop arrangement we used earlier to adjust the input weights. To address the elements of the previous layer's error array we can therefore use the input index ni in conjunction with a new pointer ps which we must declare outside the loop where we must also set it to point to the start address of the previous layer's error array E(nl-1)[ ].
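
Putting the input loop and the summation loop together gives something like the following sketch. The function name prime_previous_layer() and its argument list are assumptions made for this illustration; inside mlptrain() the same pointers are simply local variables of the layer loop.

```c
/* Hypothetical stand-alone version of the priming step.
   ps  = previous layer's error array (NI elements, written here)
   pe  = current layer's activation errors (NN elements)
   ppw = current layer's pointer array, one pointer per neuron
   Illustrated for non-negative products only (see the shift caveats
   for negative values in C). */
void prime_previous_layer(short *ps, const short *pe, short **ppw,
                          int NI, int NN)
{
  int ni, nn;
  for (ni = 0; ni < NI; ni++)       /* for each input to this layer  */
  {
    long Hi = 0, Lo = 0;
    for (nn = 0; nn < NN; nn++)     /* step 'vertically': same input */
    {                               /* index, successive neurons     */
      long P = (long)*(pe + nn) * *(*(ppw + nn) + ni);
      Hi += P >> 16;
      Lo += P & 0xFFFF;
    }
    *(ps + ni) = (short)(((Hi << 1) + (Lo >> 15)) / NN);
  }
}
```

With two neurons whose weights are {16384, 0} and {16384, 100} and whose activation errors are 16384 and 200, the previous layer's error array is primed with {4146, 0}.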

The last error array we need to prime is E1[ ], that of the first hidden layer of the network (ie the first active layer following the passive input layer). There are no connection weights on the direct inputs to the network from the outside world. Therefore there is no array called E0[ ]. We prime E1[ ] during the pass of the layer loop in which we are dealing with Layer 2. Therefore we must only perform this priming process for Layers 3 and 2. We must therefore condition the execution of the error array priming with an 'if' statement as follows:

if(nl > 1)  //If not yet reached the first active layer 
{
  PRIME THE PREVIOUS LAYER'S ERROR ARRAY
}
We have now reached the end of the layer loop. We therefore at this point loop back to do the previous layer (the next layer back towards the input) until we have dealt with all the layers of the network.

Rearrangements

You will notice that in the listing of the complete 'C' function above we have relocated some of the pointer declaration/assignment statements so that they are no longer immediately next to the program fragments to which they each directly relate. This is simply to avoid them being repeatedly set up within a loop in which they are constants.