Using Extreme Learning Machines as a Data Sponge
1. Extreme Learning Machines (ELMs) as the “sponge”
ELMs (and similar fixed-feature random projection networks) already have these sponge properties:
- Fast training — weights in the hidden layer are fixed random projections; only the output weights are learned (often by linear regression).
- Good interpolation in the training region, if the random features span the relevant subspace.
- Low complexity — less tuning, smaller training cost.
If your “sponge” role really is just to store associative patterns and map them into a useful feature space, then an ELM (or random Fourier features, or kernel approximations) can do that far more cheaply than a full backprop-trained deep net.
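The core recipe can be sketched in a few lines of NumPy: a frozen random hidden layer acts as the sponge, and only the linear readout is fitted, in closed form via ridge regression. The task, sizes, and hyperparameters here are illustrative choices, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: learn y = sin(3x) on [-1, 1].
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])

# The "sponge": a fixed random hidden layer. These weights are never trained.
n_hidden = 300
W = rng.normal(scale=2.0, size=(1, n_hidden))   # random input weights
b = rng.uniform(-1, 1, size=n_hidden)           # random biases
H = np.tanh(X @ W + b)                          # hidden activations

# Only the output weights are learned, by ridge regression (closed form).
lam = 1e-3
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

# Predictions inside the training region interpolate well.
X_test = np.linspace(-1, 1, 50).reshape(-1, 1)
y_pred = np.tanh(X_test @ W + b) @ beta
print(np.max(np.abs(y_pred - np.sin(3 * X_test[:, 0]))))
```

Note that the only “training” is one linear solve — no gradients, no epochs — which is where the cheapness comes from.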
The trade-off:
- The feature mapping is fixed, so you lose the adaptive representation learning that deep nets excel at.
- You may need a lot of random features to match the coverage of a trained net, which can inflate memory.
- Without adaptation, robustness to domain shift can be worse unless the random mapping is very high-dimensional.
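The feature-count point can be made concrete with random Fourier features (the Rahimi–Recht construction for the RBF kernel): the kernel-approximation error shrinks only on the order of 1/sqrt(D), so matching a learned representation's coverage can require a large D and correspondingly more memory. This is a sketch with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

def rff(X, D, gamma=1.0, seed=0):
    """Random Fourier features approximating the RBF kernel
    k(x, z) = exp(-gamma * ||x - z||^2)."""
    r = np.random.default_rng(seed)
    d = X.shape[1]
    W = r.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = r.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.normal(size=(100, 5))
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

errs = {}
for D in (100, 1000, 10000):
    Z = rff(X, D, seed=D)
    errs[D] = np.abs(Z @ Z.T - K_exact).max()
    print(D, errs[D])  # error shrinks as D grows
```

A 100x improvement in approximation error costs roughly 10,000x the features — that is the memory inflation in miniature.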
2. Engineering extrapolation into the system
If you swap a dense ReLU net for an ELM, the extrapolation still has to come from somewhere. Options include:
- External retrieval/memory (RAG, vector databases) — doesn’t require backprop through the memory.
- Symbolic or algorithmic modules — e.g., planners, calculators, simulators.
- Learned controllers or routers that select modules — these can be trained without full backprop into the sponge, using reinforcement learning or policy gradients.
- Feature-space composition — e.g., combining ELM features with structured representations (graphs, trees) before feeding into downstream algorithms.
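The first option is the simplest to see in code: external retrieval can be plain nearest-neighbor lookup over stored key vectors (which could be ELM features of stored items), and nothing in it needs to be differentiable. A minimal sketch with made-up keys and payloads:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical memory: unit-norm key vectors paired with arbitrary
# payloads. No part of this lookup is differentiable.
keys = rng.normal(size=(1000, 64))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
payloads = [f"item-{i}" for i in range(1000)]

def retrieve(query, k=3):
    """Return the k payloads whose keys are most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    scores = keys @ q                 # cosine similarity against all keys
    top = np.argsort(scores)[::-1][:k]
    return [payloads[i] for i in top]

# A query near key 42 should surface item-42 first.
print(retrieve(keys[42] + 0.01 * rng.normal(size=64)))
```

Real systems replace the brute-force scan with an approximate index (e.g. a vector database), but the interface — query in, payloads out, no gradients — is the same.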
3. When backprop through the whole system matters
You’re right — in many state-of-the-art systems, the engineered modules are differentiable and trained end-to-end:
- Vision–language models: image encoder + text decoder + cross-attention are all trained together.
- Tool-augmented LLMs with differentiable retrievers: the retriever is fine-tuned via gradients from the language loss.
- Reinforcement learning with differentiable environments (rare, but exists).
End-to-end backprop allows the neural core to co-adapt its representation to the needs of the surrounding modules, which often improves efficiency and sample complexity.
If you replace the core with a fixed-feature ELM, you lose this co-adaptation — the surrounding modules have to work with whatever feature space the ELM gives them.
4. Hybrid compromise
One possible hybrid is:
- Cheap, partially trainable sponge:
  - Randomized features for most of the projection
  - A small trainable bottleneck layer for adaptation
- Non-differentiable extrapolation modules:
  - Retrieval, simulation, symbolic reasoning
- Occasional fine-tuning of the bottleneck or output weights using a meta-learning loop, rather than continuous backprop through everything.
This keeps the sponge inexpensive, but lets the engineered shell do the heavy extrapolation lifting.
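A minimal sketch of the cheap-sponge-plus-occasional-fine-tuning pattern, under assumed toy conditions (a task that drifts slightly; the bottleneck is omitted and only the output weights are fine-tuned, for brevity): the random projection is fitted once in closed form, then adapted with a handful of gradient steps when new data arrives.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(X, W, b):
    """Frozen random projection -- the cheap 'sponge' part."""
    return np.tanh(X @ W + b)

# Sponge: fixed random projection, never trained.
W = rng.normal(size=(1, 200))
b = rng.uniform(-1, 1, size=200)

# Initial fit: closed-form ridge regression on task A (y = sin(2x)).
X_a = rng.uniform(-1, 1, size=(300, 1))
y_a = np.sin(2 * X_a[:, 0])
H = features(X_a, W, b)
beta = np.linalg.solve(H.T @ H + 1e-3 * np.eye(200), H.T @ y_a)

# Later, the task drifts to sin(2x + 0.3). Instead of retraining
# everything, take a few gradient steps on the output weights only.
X_b = rng.uniform(-1, 1, size=(300, 1))
y_b = np.sin(2 * X_b[:, 0] + 0.3)
H_b = features(X_b, W, b)
mse_before = np.mean((H_b @ beta - y_b) ** 2)

# Step size chosen from the largest eigenvalue to keep plain GD stable.
lr = 1.0 / np.linalg.eigvalsh(H_b.T @ H_b / len(y_b))[-1]
for _ in range(200):
    grad = H_b.T @ (H_b @ beta - y_b) / len(y_b)
    beta -= lr * grad
mse_after = np.mean((H_b @ beta - y_b) ** 2)
print(mse_before, mse_after)  # fine-tuning lowers error on the drifted task
```

The expensive parts (the projection, and in a fuller version the extrapolation modules around it) stay untouched; only a small linear head moves, and only when needed.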