Posts

Low Level Class Methods in Java

  1. Bitwise and integer operations: Integer & Long (also in Long with long versions)
     bitCount(int x) — popcount (Hamming weight)
     numberOfLeadingZeros(int x)
     numberOfTrailingZeros(int x)
     highestOneBit(int x) — isolate MSB
     lowestOneBit(int x) — isolate LSB
     rotateLeft(int x, int n)
     rotateRight(int x, int n)
     reverse(int x) — bit reversal
     reverseBytes(int x) — byte swap
     compareUnsigned(int x, int y)
     divideUnsigned(int dividend, int divisor)
     remainderUnsigned(int dividend, int divisor)
     toUnsignedLong(int x)
     Const: BYTES, SIZE
  2. Unsigned helpers
     Byte.toUnsignedInt(byte b)
     Short.toUnsignedInt(short s)
     Integer.toUnsignedLong(int i)
     Integer.compareUnsigned(...), Integer.divideUnsigned(...), Integer.remainderUnsigned(...)
     Same for Long (compareUnsigned, divideUnsigned, remainderUnsigned)
  3. IEEE-754 float/double bit manipulation
     Float
     floatToIntBits(float f) — normalizes NaN
     floatToRawIntBits(floa...
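A tiny, self-contained sketch (class name BitOpsDemo is just for this example) showing what a few of these standard java.lang.Integer calls return:

    public class BitOpsDemo {
        public static void main(String[] args) {
            int x = 0b0001_0110;                                   // 22

            System.out.println(Integer.bitCount(x));               // 3  (popcount)
            System.out.println(Integer.numberOfLeadingZeros(x));   // 27
            System.out.println(Integer.numberOfTrailingZeros(x));  // 1
            System.out.println(Integer.highestOneBit(x));          // 16 (isolate MSB)
            System.out.println(Integer.lowestOneBit(x));           // 2  (isolate LSB)
            System.out.println(Integer.toBinaryString(Integer.rotateLeft(x, 4))); // 101100000
            System.out.println(Integer.toHexString(Integer.reverseBytes(0x11223344))); // 44332211

            // Unsigned views of the same 32 signed bits
            int neg = -1;                                           // 0xFFFFFFFF
            System.out.println(Integer.toUnsignedLong(neg));        // 4294967295
            System.out.println(Integer.compareUnsigned(neg, 1));    // positive: 0xFFFFFFFF is "bigger" unsigned
            System.out.println(Integer.divideUnsigned(neg, 2));     // 2147483647
        }
    }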

Low level programming with Java

Java hides a surprisingly capable “low-level subset” for doing bit-twiddly, memory-adjacent work while still staying safe(-ish). Here’s a compact map with the key tools, what they’re for, and tiny examples.

1) Bitwise core: operators & integer promotions
   Types: byte (8, signed), short (16, signed), char (16, unsigned), int (32, signed), long (64, signed).
   Operators: & | ^ ~ << >> >>>
   >> = arithmetic right shift (sign-extends); >>> = logical right shift (zero-fills).
   Small types (byte/short/char) promote to int before ops—mask when you need exact widths.

   int packRGBA(int r, int g, int b, int a) {
       return ((r & 0xFF) << 24) | ((g & 0xFF) << 16) | ((b & 0xFF) << 8) | (a & 0xFF);
   }
   int r = (rgba >>> 24) & 0xFF;

2) Unsigned helpers (on signed primitives)
   Convert/compare/divide as if unsigned: Byte.toUnsignedInt(byte b), Short.toUnsignedInt(shor...
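A small runnable sketch of the two gotchas above, arithmetic vs. logical right shift and byte-to-int promotion, plus a round-trip of the packRGBA example (class and variable names are just for illustration):

    public class ShiftAndPromotion {
        public static void main(String[] args) {
            int neg = -16;
            System.out.println(neg >> 2);        // -4: arithmetic shift keeps the sign bit
            System.out.println(neg >>> 2);       // 1073741820: logical shift fills with zeros

            // byte promotes to int with sign extension; mask to get the raw 0..255 value
            byte b = (byte) 0xF0;                // stored as -16
            System.out.println(b >> 4);          // -1 (sign-extended, probably not what you wanted)
            System.out.println((b & 0xFF) >> 4); // 15 (mask first, then shift)

            // Round-trip the packRGBA example from above
            int rgba = packRGBA(0x12, 0x34, 0x56, 0x78);
            System.out.println(Integer.toHexString(rgba));                 // 12345678
            System.out.println(Integer.toHexString((rgba >>> 24) & 0xFF)); // 12
        }

        static int packRGBA(int r, int g, int b, int a) {
            return ((r & 0xFF) << 24) | ((g & 0xFF) << 16) | ((b & 0xFF) << 8) | (a & 0xFF);
        }
    }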

Using Extreme Learning Machines as a Data Sponge

  Extreme Learning Machines (ELMs) as the “sponge”

  ELMs (and similar fixed-feature random projection networks) already have these sponge properties:
  Fast training — weights in the hidden layer are fixed random projections; only the output weights are learned (often by linear regression).
  Good interpolation in the training region, if the random features span the relevant subspace.
  Low complexity — less tuning, smaller training cost.

  If your “sponge” role really is just to store associative patterns and map them into a useful feature space, then an ELM (or random Fourier features, or kernel approximations) can do that far more cheaply than a full backprop-trained deep net.

  The trade-off:
  The feature mapping is fixed, so you lose the adaptive representation learning that deep nets excel at.
  You may need a lot of random features to match the coverage of a trained net, which can inflate memory.
  Without adaptation, robustness to domain shift can be worse unles...
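To make the training recipe concrete, here is a minimal, illustrative ELM sketch in plain Java (TinyELM and its method names are invented for this example): a fixed random tanh hidden layer, with only the output weights fit by ridge regression via the normal equations.

    import java.util.Random;

    public class TinyELM {
        final int hidden;
        final double[][] w;   // fixed random hidden weights: hidden x inputDim
        final double[] bias;  // fixed random hidden biases
        double[] beta;        // learned output weights (the only trained part)

        TinyELM(int inputDim, int hidden, long seed) {
            this.hidden = hidden;
            Random rnd = new Random(seed);
            w = new double[hidden][inputDim];
            bias = new double[hidden];
            for (int h = 0; h < hidden; h++) {
                bias[h] = rnd.nextGaussian();
                for (int d = 0; d < inputDim; d++) w[h][d] = rnd.nextGaussian();
            }
        }

        // Fixed random feature map: tanh(W x + b)
        double[] features(double[] x) {
            double[] f = new double[hidden];
            for (int h = 0; h < hidden; h++) {
                double s = bias[h];
                for (int d = 0; d < x.length; d++) s += w[h][d] * x[d];
                f[h] = Math.tanh(s);
            }
            return f;
        }

        // Train only the output weights: ridge regression, (H^T H + lambda I) beta = H^T y
        void fit(double[][] X, double[] y, double lambda) {
            double[][] a = new double[hidden][hidden + 1]; // augmented [H^T H + lambda I | H^T y]
            for (int n = 0; n < X.length; n++) {
                double[] f = features(X[n]);
                for (int i = 0; i < hidden; i++) {
                    for (int j = 0; j < hidden; j++) a[i][j] += f[i] * f[j];
                    a[i][hidden] += f[i] * y[n];
                }
            }
            for (int i = 0; i < hidden; i++) a[i][i] += lambda;
            beta = solve(a);
        }

        double predict(double[] x) {
            double[] f = features(x);
            double out = 0;
            for (int h = 0; h < hidden; h++) out += beta[h] * f[h];
            return out;
        }

        // Gaussian elimination with partial pivoting on the augmented system
        static double[] solve(double[][] a) {
            int n = a.length;
            for (int col = 0; col < n; col++) {
                int piv = col;
                for (int r = col + 1; r < n; r++) if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
                double[] t = a[col]; a[col] = a[piv]; a[piv] = t;
                for (int r = col + 1; r < n; r++) {
                    double factor = a[r][col] / a[col][col];
                    for (int c = col; c <= n; c++) a[r][c] -= factor * a[col][c];
                }
            }
            double[] x = new double[n];
            for (int r = n - 1; r >= 0; r--) {
                double s = a[r][n];
                for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
                x[r] = s / a[r][r];
            }
            return x;
        }

        public static void main(String[] args) {
            // Soak up noise-free samples of sin(x) on [0, 2*pi], then query inside and outside that range.
            TinyELM elm = new TinyELM(1, 50, 42);
            int n = 200;
            double[][] X = new double[n][1];
            double[] y = new double[n];
            for (int i = 0; i < n; i++) {
                X[i][0] = 2 * Math.PI * i / (n - 1);
                y[i] = Math.sin(X[i][0]);
            }
            elm.fit(X, y, 1e-3);
            System.out.println(elm.predict(new double[]{1.0}) + "  (target " + Math.sin(1.0) + ")");   // usually close: inside the training region
            System.out.println(elm.predict(new double[]{10.0}) + "  (target " + Math.sin(10.0) + ")"); // usually far off: outside it
        }
    }

Swapping Math.tanh for a ReLU or for random Fourier features changes only the feature map; the fit step stays the same linear solve, which is where the cheap-training claim comes from.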

Can Neural Networks alone extrapolate or only interpolate?

  Why do dense ReLU-activated neural networks—which in essence combine lots of linear (weighted-sum) interpolation with only a few points of non-linearity (the "hinges" of ReLU)—perform so well in practice? And is this limited but focused non-linearity a feature or a limitation? Especially as the non-linearity per parameter ratio decreases with layer width.

  Here's what the current theory and empirical research reveal:

  1. Universal Approximation and the Role of Non-linearity
  Due to the Universal Approximation Theorem, even a network with just one hidden layer and a non-polynomial activation (like ReLU) can approximate any continuous function arbitrarily well, assuming it's sufficiently wide or deep (Wikipedia). However, this theorem is an existence guarantee only—it doesn’t say how to train such networks, nor how many neurons you actually need in practice. This shows that dense networks with ReLU do have the expressive capacity—but training them effectively (t...
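A tiny, hand-weighted illustration of the "hinges" point (not a trained network, purely for intuition): three ReLU units reproduce a tent function exactly between their hinges, and beyond the outermost hinge the function can only continue along a single linear piece.

    public class ReluHinges {
        static double relu(double z) { return Math.max(0.0, z); }

        // One hidden layer, three ReLU units, hand-picked weights:
        // f(x) = relu(x) - 2*relu(x - 1) + relu(x - 2)
        // Piecewise linear with hinges at 0, 1, 2: a tent peaking at f(1) = 1.
        static double tent(double x) {
            return relu(x) - 2 * relu(x - 1) + relu(x - 2);
        }

        public static void main(String[] args) {
            for (double x : new double[]{-1, 0, 0.5, 1, 1.5, 2, 3, 10}) {
                System.out.printf("f(%.1f) = %.1f%n", x, tent(x));
            }
            // Between hinges the function is purely linear (interpolation);
            // beyond the outermost hinges it stays on one linear piece (here, constant 0),
            // which is the sense in which a ReLU net can only "extrapolate" linearly.
        }
    }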

Data Sponge + Engineered Shell - Modern Learning Systems

Data Sponge

Modern neural nets often act like very large parametric associative memories (a “data sponge”), and much of the apparent extrapolation/robust behavior in deployed systems comes from non-parametric and engineered components around the network (retrieval, tool-use, symbolic modules, long-term memory, routing, RL controllers, etc.). Those surrounding components enable—or at least strongly shape—the system’s ability to go beyond the network’s raw parametric interpolation. Below I sketch why, point to concrete evidence, list limitations, and give practical experiments you (or a lab) could run to test the idea.

Why the “data sponge + engineered shell” view makes sense

Large models memorize and act like associative memory. The model parameters store huge amounts of statistical patterns and factual associations learned from data; in many tasks the model is effectively retrieving and stitching together stored patterns rather than performing symbolic reasoning from first princ...
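A toy sketch of the "sponge + shell" framing (all names invented here; assumes a recent JDK for records): a non-parametric nearest-neighbor memory is consulted first, and only queries far from anything stored fall through to the parametric model.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Function;

    public class SpongeWithShell {
        // The "shell": a nearest-neighbor memory wrapped around a parametric model.
        record Memory(double[] key, String value) {}

        private final List<Memory> store = new ArrayList<>();
        private final Function<double[], String> parametricModel; // stand-in for the "sponge"
        private final double threshold;

        SpongeWithShell(Function<double[], String> parametricModel, double threshold) {
            this.parametricModel = parametricModel;
            this.threshold = threshold;
        }

        void remember(double[] key, String value) { store.add(new Memory(key, value)); }

        String answer(double[] query) {
            Memory best = null;
            double bestDist = Double.MAX_VALUE;
            for (Memory m : store) {
                double d = dist(query, m.key());
                if (d < bestDist) { bestDist = d; best = m; }
            }
            // Retrieval path: trust stored associations when the query is near known data.
            if (best != null && bestDist <= threshold) return best.value();
            // Fallback: the parametric model interpolates from whatever it absorbed in training.
            return parametricModel.apply(query);
        }

        static double dist(double[] a, double[] b) {
            double s = 0;
            for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
            return Math.sqrt(s);
        }

        public static void main(String[] args) {
            SpongeWithShell system = new SpongeWithShell(q -> "model guess", 0.5);
            system.remember(new double[]{1.0, 2.0}, "stored answer A");
            System.out.println(system.answer(new double[]{1.1, 2.0})); // near memory -> "stored answer A"
            System.out.println(system.answer(new double[]{9.0, 9.0})); // far from memory -> "model guess"
        }
    }

The point is purely structural: the retrieval path, the threshold, and the fallback policy live outside the learned parameters, which is where the "engineered shell" does its work.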