Extreme Learning Machines - Step by Step to Success.

Extreme Learning Machines

Extreme learning machines are a form of neural network that offers speed, efficiency, and low computational complexity through simplicity and directness.

Composition of an Extreme Learning Machine (ELM):
    

  • Input layer: Takes in the features of the data.
  • Random hidden layer: Processes the input using fixed, randomly assigned weights and activation functions. 
  • Output layer: A linear layer that computes the final output by solving a linear system using the activations from the hidden layer.
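
To make the three layers concrete, here is a minimal sketch of an ELM forward pass, assuming dense weight matrices and a tanh activation; the names (elmForward, wRandom, beta) are illustrative rather than anything standard. Only beta is learned; wRandom is generated once and never updated.

static float[] elmForward(float[] x, float[][] wRandom, float[][] beta) {
    // Random hidden layer: fixed random weights followed by a nonlinear activation.
    float[] h = new float[wRandom.length];
    for (int j = 0; j < wRandom.length; j++) {
        float sum = 0f;
        for (int i = 0; i < x.length; i++) sum += wRandom[j][i] * x[i];
        h[j] = (float) Math.tanh(sum);       // tanh assumed as the activation
    }
    // Output layer: a plain linear combination of the hidden activations.
    float[] y = new float[beta.length];
    for (int k = 0; k < beta.length; k++) {
        float sum = 0f;
        for (int j = 0; j < h.length; j++) sum += beta[k][j] * h[j];
        y[k] = sum;
    }
    return y;
}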

 

Key Features of ELMs:

  • Random Initialization of Hidden Layer Weights: The weights connecting the input to the hidden layer are randomly assigned and remain fixed.

  • Simple Training Process: Only the output layer weights are learned, by solving a simple linear system, usually by least squares (see the sketch after this list).

  • Universality: ELMs have been shown to approximate any continuous function, which makes them universal approximators.

  • High Speed and Efficiency: ELMs are capable of training very quickly compared to traditional backpropagation-based neural networks.
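
Since only the output layer is trained, the whole learning step reduces to one least squares solve. Below is a minimal sketch under simple assumptions: a single output, the normal equations with a small ridge term for numerical stability, and plain Gaussian elimination (in practice a least squares or pseudoinverse routine from a linear algebra library would be used). H holds the hidden-layer activations, one row per training example, and t holds the targets; the names are illustrative.

static float[] solveOutputWeights(float[][] H, float[] t, float lambda) {
    int m = H.length;                    // number of training examples (rows of H)
    int n = H[0].length;                 // number of hidden neurons (columns of H)
    double[][] A = new double[n][n];     // H^T H + lambda * I
    double[] b = new double[n];          // H^T t
    for (int r = 0; r < m; r++) {
        for (int i = 0; i < n; i++) {
            b[i] += H[r][i] * t[r];
            for (int j = 0; j < n; j++) A[i][j] += H[r][i] * H[r][j];
        }
    }
    for (int i = 0; i < n; i++) A[i][i] += lambda;   // small ridge term for stability
    // Gaussian elimination with partial pivoting.
    for (int col = 0; col < n; col++) {
        int pivot = col;
        for (int r = col + 1; r < n; r++) {
            if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
        }
        double[] rowTmp = A[col]; A[col] = A[pivot]; A[pivot] = rowTmp;
        double bTmp = b[col]; b[col] = b[pivot]; b[pivot] = bTmp;
        for (int r = col + 1; r < n; r++) {
            double f = A[r][col] / A[col][col];
            for (int c = col; c < n; c++) A[r][c] -= f * A[col][c];
            b[r] -= f * b[col];
        }
    }
    // Back substitution gives the output layer weights.
    float[] beta = new float[n];
    for (int i = n - 1; i >= 0; i--) {
        double sum = b[i];
        for (int j = i + 1; j < n; j++) sum -= A[i][j] * beta[j];
        beta[i] = (float) (sum / A[i][i]);
    }
    return beta;
}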


Improvements to the ELM structure:

The random hidden layer weights form a matrix that requires storing a large number of parameters, and a similarly large number of compute operations each time it is applied to the input.

Recognizing that the random matrix performs a random projection of the input, you can replace the expensive random matrix operation with fast random projection algorithms from the compressive sensing literature.

Typical fast transforms (the FFT, the Walsh Hadamard transform) perform one-to-all point-to-pattern mappings, and also map those particular patterns back to points.
For example, the FFT maps a point to a sine or cosine wave and the reverse.

Take a natural image in pixel form, apply a fixed random pattern of sign (+/-) flips to it, and then pass it through a fast transform. Any embedded sine or cosine waves dissolve into randomness, and a linear system responds by outputting noise drawn from the Gaussian distribution (central limit theorem).

For natural data it is often enough to apply random sign flipping followed by a fast transform to obtain a one-to-all linear random projection:
RP(x) = HF(x)
where F is a fixed random pattern of sign flips and H is a fast transform.

However, for sparse input data two rounds are necessary, lest a single non-zero input produce only a known pattern on the output (e.g. a sine or cosine wave):

RP(x) = HFHF(x)

You are now in a position to replace the expensive square matrix operation with much cheaper fast-transform-based random projections (see the code section below).

Other random projection improvements:

Error Spreading:

You can use fast random projections (their one-to-all property) to spread training errors across all the output layer neurons, making every neuron involved in helping the others solve their problems.

Locality Sensitive Hashing:

Binarizing the output of a random projection gives you a locality sensitive hash (LSH).
You can use locality sensitive hashing to select weights, or blocks of weights, from a vast pool of weights for a particular input, and then have the ELM use those weights to process that input. This gives the ELM vast size while keeping the compute requirement very low.
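
A minimal sketch of how that selection could look, assuming a fast random projection helper rp (as sketched after the transform code below) and a hypothetical weightPool holding 2^bits blocks of weights; none of these names come from the original post.

static float[] selectWeightBlock(float[] x, float[][] weightPool, int bits) {
    float[] p = rp(x.clone());           // fast random projection of the input
    int hash = 0;
    for (int i = 0; i < bits; i++) {     // sign of each projected component gives one hash bit
        hash = (hash << 1) | (p[i] < 0f ? 1 : 0);
    }
    return weightPool[hash];             // pool indexed by the LSH value; needs bits <= p.length
}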

Code (Java):

Fixed random pattern of sign flips:

static void signFlip(float[] x) {
    int r = 0x9E3779B9;                  // fixed random seed (odd, so the MCG below never reaches zero)
    for (int i = 0; i < x.length; i++) {
        if (r < 0) x[i] = -x[i];         // flip the sign when the high bit of r is set
        r *= 0x93D765DD;                 // multiplicative congruential generator (MCG) step
    }
}

Fast Walsh Hadamard transform - a very suitable transform for random projection.
The input vector length must be a power of 2, e.g. 16, 32, 64, ...

static void whtN(float[] vec) {
    int n = vec.length;
    int hs = 1;                          // current butterfly half-size
    while (hs < n) {
        int i = 0;
        while (i < n) {
            final int j = i + hs;
            while (i < j) {              // one butterfly block
                float a = vec[i];
                float b = vec[i + hs];
                vec[i] = a + b;          // sum goes to the lower half
                vec[i + hs] = a - b;     // difference goes to the upper half
                i += 1;
            }
            i += hs;                     // skip over the block just processed
        }
        hs += hs;                        // double the butterfly size each pass
    }
    float scale = 1f / (float) Math.sqrt(n);
    for (int i = 0; i < n; i++) {
        vec[i] *= scale;                 // normalize so the transform preserves vector length
    }
}
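
Putting the two routines above together gives the fast random projections from the text. A small sketch: rp is one round, RP(x) = HF(x), for dense natural data, and rp2 is two rounds, RP(x) = HFHF(x), for sparse inputs. Both work in place on the input array, and for brevity both rounds reuse the same sign-flip pattern; in practice each round could be given its own fixed pattern.

static float[] rp(float[] x) {
    signFlip(x);          // F: fixed random pattern of sign flips
    whtN(x);              // H: fast Walsh Hadamard transform
    return x;             // RP(x) = HF(x)
}

static float[] rp2(float[] x) {
    return rp(rp(x));     // RP(x) = HFHF(x), two rounds for sparse input data
}

// Example: float[] projected = rp(signal.clone());  (signal length must be a power of 2)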







