Change of Basis Transforms - Machine Learning & Neural Networks




Change-of-basis transforms are interesting in machine learning and neural networks because they re-express data in a new coordinate system, and sometimes the new basis reveals patterns, structure, or computational shortcuts that were hidden in the original one.

Here’s why they matter and where the “interesting” comes from:


1. Geometry: different coordinates, same space

  • A change of basis is just a rotation/reflection (orthogonal case) or a more general invertible linear transformation.

  • It preserves all the information in the vector — nothing is lost — but the representation changes.

  • In ML, this means you can choose a basis that makes the structure of your problem easier to work with (see the small numerical sketch below).
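
A minimal NumPy sketch of this point, using an arbitrary invertible matrix B as the new basis: the coordinates change, but changing back recovers the original vector exactly.

```python
import numpy as np

# A vector expressed in the standard basis.
x = np.array([3.0, 1.0])

# Columns of B are the new basis vectors; any invertible matrix works.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Coordinates of the same vector in the new basis: solve B @ c = x.
c = np.linalg.solve(B, x)

# Changing back recovers x exactly; no information is lost.
print(c)                        # coordinates in the new basis
print(np.allclose(B @ c, x))    # True
```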


2. Sparsity and interpretability

  • Some bases make signals sparse — most coefficients are zero or small.

    • Example: images are not sparse in the pixel basis, but often sparse in a wavelet or DCT basis (illustrated with a 1-D DCT in the sketch below).

  • Sparsity is gold in ML: it helps with compression, denoising, and learning from fewer examples.
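
As a rough illustration (assuming SciPy is available, and using a hand-picked smooth 1-D signal rather than an image), a signal that needs every sample in the raw basis concentrates essentially all of its energy in a couple of DCT coefficients:

```python
import numpy as np
from scipy.fft import dct

# A smooth 1-D signal: dense in the sample ("pixel") basis.
n = 256
t = (np.arange(n) + 0.5) / n
signal = np.cos(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)

# The orthonormal DCT-II is a change of basis (norm='ortho' keeps it orthogonal).
coeffs = dct(signal, norm='ortho')

# Count how many coefficients are needed to capture 99.9% of the signal energy.
energy = np.sort(coeffs ** 2)[::-1]
k = np.searchsorted(np.cumsum(energy), 0.999 * energy.sum()) + 1
print(f"{k} of {n} DCT coefficients carry 99.9% of the energy")
```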


3. Separating important from irrelevant components

  • In the right basis, important signal directions can be isolated from noise directions.

  • PCA (Principal Component Analysis) is essentially a data-driven change of basis that aligns axes with the directions of largest variance (sketched below).

  • This is useful for dimensionality reduction and avoiding overfitting.
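
A minimal sketch of PCA as a change of basis, done directly in NumPy on synthetic data: the eigenvectors of the covariance matrix become the new orthonormal axes, and projecting onto the leading ones is the dimensionality-reduction step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 500 points in 3-D that mostly vary along one hidden direction.
latent = rng.normal(size=(500, 1))
X = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(500, 3))

# Center the data, then eigendecompose its covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)            # returned in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The eigenvectors form the new orthonormal basis; projecting is the change of basis.
scores = Xc @ eigvecs                              # coordinates in the PCA basis
print("fraction of variance per axis:", eigvals / eigvals.sum())
print("reduced 1-D representation shape:", scores[:, :1].shape)
```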


4. Computational efficiency

  • Some fixed transforms (FFT, FWHT, DCT) can be applied in O(N log N) instead of O(N²) — a huge speed gain.

  • In neural networks, these can be structured weight matrices:

    • Example: Replace a dense layer with HDP (Hadamard × diagonal random sign × permutation) for fast random projections (see the sketch after this list).

  • You get roughly the same expressiveness but with far fewer multiplications and parameters.
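
A minimal sketch of such a structured projection, assuming a hand-rolled fast Walsh-Hadamard transform (the standard O(N log N) butterfly): random signs, a permutation, and the Hadamard transform together act as a cheap random orthogonal change of basis, and no dense weight matrix is ever stored.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, O(N log N); len(x) must be a power of two."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)      # normalized so the transform is orthogonal

rng = np.random.default_rng(0)
n = 1024
perm = rng.permutation(n)                   # P: random permutation
signs = rng.choice([-1.0, 1.0], size=n)     # D: random diagonal signs

def hdp_project(x):
    # H @ D @ P @ x, applied as cheap elementwise ops plus the fast transform.
    return fwht(signs * x[perm])

x = rng.normal(size=n)
y = hdp_project(x)
print(np.isclose(np.linalg.norm(x), np.linalg.norm(y)))   # True: orthogonal, norm preserved
```

The storage here is O(N) (the sign vector and the permutation) instead of O(N²) for a dense weight matrix, which is where the savings in multiplications and parameters comes from.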


5. Random features and kernels

  • Change-of-basis transforms with randomly chosen basis vectors can approximate kernels or preserve geometry (the Johnson–Lindenstrauss lemma).

  • Random Fourier features, the Fastfood transform, and related techniques are structured changes of basis that make kernel methods scalable (a random Fourier feature sketch follows below).
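
A minimal random Fourier feature sketch for the Gaussian/RBF kernel (the Rahimi–Recht construction): random frequencies and phases define a randomized cosine basis, and dot products of the new features approximate the kernel value. The dimensions and bandwidth below are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 5, 2000, 0.5     # input dim, number of random features, RBF bandwidth

# Random frequencies (matched to the kernel's spectrum) and random phases.
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(x):
    """Map x into D random Fourier features; dot products approximate the RBF kernel."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))   # true kernel value k(x, y)
approx = rff(x) @ rff(y)                        # inner product in the new basis
print(f"exact {exact:.4f}  vs  RFF approximation {approx:.4f}")
```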


6. Information flow in networks

  • In a deep net, each layer can be seen as a change of basis + nonlinearity.

  • In the NTK (Neural Tangent Kernel) / infinite-width perspective, the initial random weights define a fixed basis in function space, and learning adjusts the coefficients.

  • Orthogonal basis changes can improve gradient flow and help avoid signal collapse (see the norm-preservation sketch below).
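
A small sketch of the norm-preservation point, with pure linear layers in NumPy: orthogonal weight matrices keep the signal's norm constant at any depth, while carelessly scaled Gaussian layers shrink it toward zero (the mirror-image failure, explosion, happens when the scale is too large).

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 256, 50
x = rng.normal(size=n)

# Orthogonal layers: each is a pure change of basis, so the norm never drifts.
h_orth = x.copy()
for _ in range(depth):
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))   # random orthogonal matrix
    h_orth = Q @ h_orth

# Poorly scaled Gaussian layers: the signal collapses exponentially with depth.
h_gauss = x.copy()
for _ in range(depth):
    W = rng.normal(scale=1.0 / n, size=(n, n))
    h_gauss = W @ h_gauss

print("input norm:          ", np.linalg.norm(x))
print("after orthogonal net:", np.linalg.norm(h_orth))    # ~ unchanged
print("after gaussian net:  ", np.linalg.norm(h_gauss))   # ~ 0 (signal collapse)
```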


7. Connections to physics and signal processing

  • In physics, choosing the right basis (Fourier, eigenmodes) can turn hard problems into separable ones.

  • In ML, the same logic applies: if your task has a symmetry or repeating structure, picking the right basis makes it easier for the network to learn (see the convolution example below).
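
The convolution theorem is a concrete version of this: a shift-invariant (convolution) operator is dense in the sample basis but diagonal in the Fourier basis, so an O(N²) computation collapses to one multiplication per coefficient. A quick NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
x = rng.normal(size=n)     # a signal
k = rng.normal(size=n)     # a circular convolution kernel

# Sample basis: circular convolution as an explicit O(N^2) double loop.
slow = np.array([sum(x[j] * k[(s - j) % n] for j in range(n)) for s in range(n)])

# Fourier basis: the same operator is diagonal, one multiplication per coefficient.
fast = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

print(np.allclose(slow, fast))   # True: same operation, separable in the right basis
```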


Core takeaway:
Change-of-basis transforms in ML are powerful because they can make patterns more visible, computations faster, features sparser, and training easier — all without losing information. In many cases, the “magic” is choosing a basis that matches the problem’s structure, whether that’s learned (PCA, learned embeddings) or fixed (Fourier, Hadamard, wavelets).
