The Convex Hull and the Manifold - neural network training and test data

August 08, 2025

In the accompanying figure, the orange polygon is the convex hull of the training data, the blue curve is the natural image manifold, the red dots are training points, and the green dot is a test point that lies outside the convex hull but still on the manifold.

1. Convex Hull of Training Data

Definition (geometry view):
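The convex hull of a set of points is the smallest convex set containing them. In two dimensions, imagine stretching a rubber band around your scattered data points in feature space: the shape it snaps to is the convex hull. In higher dimensions the same idea holds, with the rubber band replaced by a tight convex wrapper.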
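In symbols, the hull is the set of all convex combinations of the points. The names $x_1, \dots, x_n$ (training points) and $\lambda_i$ (weights) are introduced here only for illustration:

$$
\operatorname{conv}(x_1, \dots, x_n) = \left\{ \sum_{i=1}^{n} \lambda_i x_i \;:\; \lambda_i \ge 0,\ \sum_{i=1}^{n} \lambda_i = 1 \right\}
$$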
In the context of image data:
If each image is a point in a very high-dimensional space (e.g., a $32 \times 32$ RGB image = 3,072-D vector), the convex hull is a gigantic, convex polytope containing all your training examples.
Importantly:
- Many points inside the convex hull do not correspond to valid, natural images.
- Convex combinations of training images (weighted averages whose weights are nonnegative and sum to 1) often produce weird blends, not real-world photos.
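A minimal sketch of the blending point, using only the Python standard library. The helper `convex_combination` and the two tiny 2x2 "images" are made up for illustration; real images would be much longer vectors, but the arithmetic is the same.

```python
def convex_combination(points, weights, tol=1e-9):
    """Return sum_i weights[i] * points[i] for flat, equal-length vectors."""
    if len(points) != len(weights):
        raise ValueError("need one weight per point")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be nonnegative and sum to 1")
    dim = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim)]


# Two toy 2x2 grayscale "images", flattened row by row (values in [0, 1]).
left_bright = [1.0, 0.0,
               1.0, 0.0]
right_bright = [0.0, 1.0,
                0.0, 1.0]

blend = convex_combination([left_bright, right_bright], [0.5, 0.5])
print(blend)  # [0.5, 0.5, 0.5, 0.5]
```

The blend sits inside the convex hull by construction, yet it is a flat gray square that has lost the edge both inputs share. Averaging two photographs pixel by pixel has the same flavor: a double exposure rather than a new photograph.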
Neural network connection:
The convex hull is relevant for linear models or models that behave nearly linearly in some regions: if the test point lies outside the convex hull of the training data, you’re extrapolating, which is risky.
Deep nets, however, learn nonlinear functions, and they can often handle points outside the convex hull as long as those points lie on or near the same low-dimensional manifold as the training data.
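For a 2-D picture like the one described at the top of the post, the "inside or outside the hull" check can be sketched in plain Python. This is an illustrative sketch, not a tool for real image data: it uses Andrew's monotone chain to build the polygon, the function names are made up here, and it only works in the plane. In thousands of dimensions, hull membership is usually posed as a linear feasibility problem over the weights instead.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_2d(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull_2d(hull, q):
    """True if q is inside or on the boundary of a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        raise ValueError("needs at least 3 non-collinear hull vertices")
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


# Hypothetical 2-D "training set" and two query points.
train = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
hull = convex_hull_2d(train)
print(hull)                      # [(0, 0), (4, 0), (4, 3), (0, 3)]
print(in_hull_2d(hull, (1, 1)))  # True: interpolation
print(in_hull_2d(hull, (5, 1)))  # False: extrapolation
```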

2. Low-Dimensional Manifold for Natural Images

Definition (manifold hypothesis):
Natural images, despite being high-dimensional vectors, occupy a thin, curved subset of the huge ambient space.
This manifold has much lower intrinsic dimensionality — perhaps dozens or hundreds of degrees of freedom, instead of thousands.
Why this happens:
Physical constraints (lighting, shapes, textures, viewpoint, etc.) and semantics greatly limit the possible configurations of pixels.
Random points in pixel space are almost always noise — far off the natural image manifold.
Training and test data location:
Both lie on or near the same manifold.
Good generalization happens because the test set samples the same manifold as the training set, even if it lies outside the convex hull of the training samples.
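A toy version of this situation, assuming the unit circle stands in for the manifold (a deliberately simple choice, not a model of real images). The sketch certifies that a test point is outside the hull with a separating direction: if some direction `w` scores the test point strictly higher than every training point, it also scores it higher than every convex combination of them.

```python
import math


def on_circle(theta):
    """Point on the unit circle, standing in for a 1-D data manifold."""
    return (math.cos(theta), math.sin(theta))


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


# Hypothetical training set: 8 points spread over a quarter arc.
train = [on_circle(k * (math.pi / 2) / 7) for k in range(8)]

# Test point on the same circle, beyond the sampled arc.
test = on_circle(3 * math.pi / 4)

# Separating-direction certificate: w = test gives the test point a score of 1
# and every training point a score of cos(angle difference) < 1.
w = test
margin = dot(w, test) - max(dot(w, x) for x in train)
print(margin > 0)  # True: the test point is outside the convex hull

# The midpoint of two training points is inside the hull by construction,
# but its distance from the origin is below 1, so it is off the circle.
mid = tuple((a + b) / 2 for a, b in zip(train[0], train[-1]))
print(math.hypot(*mid) < 1)  # True
```

The same certificate works for any angle that is not already in the training set, including angles between two training points, because the cosine of a nonzero angle difference (less than a full turn) is below 1. On a curved manifold like this one, every genuinely new sample is outside the hull, while the hull's interior is filled with off-manifold points.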

3. Relationship Between Convex Hull & Manifold

Key idea:
The convex hull is a crude, convex wrapper around your training points. The low-dimensional manifold is a much thinner, curved structure winding through space.
The manifold typically pokes outside the convex hull in some places, and the interior of the hull is mostly empty of natural images except where the manifold passes through it.
Reasoning intuition in high dimensions:
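- In very high dimensions, the convex hull of even a large number of points typically occupies a vanishingly small fraction of the ambient space, so a fresh sample is very unlikely to land inside it.
- The manifold is thinner still: it is like a filament or sheet that threads through the hull and, in places, extends beyond it.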
- The hull contains many “interpolations” between real images that aren’t realistic at all — they’re just valid convex combinations, not valid points on the manifold.
- Neural networks don’t simply interpolate in the convex hull; they learn to follow the manifold structure.
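A rough way to see how fast the hull shrinks relative to the space, under a loud simplifying assumption: training and test points are drawn independently and uniformly from the unit cube, which natural images are not. The hull always fits inside the axis-aligned bounding box of the training points, so the chance that a test point lands in that box is an upper bound on the chance that it lands in the hull. For `n` training points in `d` dimensions, each coordinate of the test point avoids being the smallest or largest of `n + 1` values with probability `(n - 1) / (n + 1)`, and the coordinates are independent. The function names below are made up for this sketch.

```python
import random


def box_bound(n, d):
    """Closed-form P(uniform test point lies in the bounding box of n uniform points)."""
    return ((n - 1) / (n + 1)) ** d


def box_rate(n, d, trials, rng):
    """Monte Carlo estimate of the same probability, as a sanity check."""
    hits = 0
    for _ in range(trials):
        inside = True
        for _ in range(d):
            # Coordinates are independent, so each axis can be sampled on its own.
            coords = [rng.random() for _ in range(n)]
            t = rng.random()
            if not (min(coords) <= t <= max(coords)):
                inside = False
                break
        hits += inside
    return hits / trials


# Upper bound on the hull's expected volume fraction for 1,000 training points.
for d in (2, 32, 3072):
    print(d, box_bound(1000, d))

# Small-scale check that simulation and formula agree.
print(box_rate(10, 5, 2000, random.Random(0)), box_bound(10, 5))
```

With 1,000 points the bound is close to 1 in two dimensions but falls to roughly 0.002 at 3,072 dimensions, and the true hull is smaller than its bounding box.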

Science Limelight
