Here's what actually issued. On July 6, 2021, Intel Corporation was granted US11055604B2, "Per kernel Kmeans compression for neural networks," inventors including Yonatan Glesner and Gal Novik. The CPC mix pairs G06N 3/04 (network architecture) with G06F memory codes (G06F 3/0608, 3/0644, 3/0673), which signals a method aimed at how weights are stored and laid out in memory, not only at model accuracy.
The mechanism is weight clustering. A trained network's weights are numerous and often highly redundant. K-means groups similar weight values into k clusters; you then store the k cluster centers plus a short index per weight (about log2(k) bits) instead of every full-precision value. Doing this per kernel, that is, separately for each filter, fits each codebook to that filter's own weight distribution rather than applying one codebook globally. The result is a smaller model that fits in less memory and moves fewer bytes.
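To make the idea concrete, here is a minimal sketch of per-kernel weight clustering in Python with NumPy. It is an illustration of the general technique, not the method as claimed in the patent: the function names (`kmeans_1d`, `compress_per_kernel`, `decompress`, `compressed_bits`), the evenly spaced initialization, the fixed iteration count, and the 32-bit center size are all assumptions made for the example.

```python
import numpy as np


def kmeans_1d(values, k, iters=20):
    """Plain Lloyd's k-means on a 1-D array of scalar weights (illustrative)."""
    # Assumption: initialise centers evenly across the value range.
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        # Assign every weight to its nearest center.
        idx = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        # Move each center to the mean of its members; empty clusters stay put.
        for j in range(k):
            members = values[idx == j]
            if members.size:
                centers[j] = members.mean()
    # Final assignment against the updated centers.
    idx = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
    return centers, idx


def compress_per_kernel(weights, k=4):
    """Cluster each output filter separately: one codebook per kernel."""
    out_channels = weights.shape[0]
    flat = weights.reshape(out_channels, -1)
    codebooks = np.empty((out_channels, k), dtype=np.float32)
    # uint8 indices for simplicity (k <= 256); a real format would bit-pack them.
    indices = np.empty(flat.shape, dtype=np.uint8)
    for f in range(out_channels):
        centers, idx = kmeans_1d(flat[f].astype(np.float64), k)
        codebooks[f] = centers
        indices[f] = idx
    return codebooks, indices


def decompress(codebooks, indices, shape):
    """Rebuild approximate weights by looking up each index in its kernel's codebook."""
    lookup = np.take_along_axis(codebooks, indices.astype(np.int64), axis=1)
    return lookup.reshape(shape)


def compressed_bits(out_channels, weights_per_kernel, k, center_bits=32):
    """Storage estimate: k centers per kernel plus ceil(log2 k) bits per weight."""
    index_bits = int(np.ceil(np.log2(k)))
    return out_channels * (k * center_bits + weights_per_kernel * index_bits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 3, 3, 3)).astype(np.float32)  # 64 filters of 27 weights
    codebooks, indices = compress_per_kernel(w, k=4)
    w_hat = decompress(codebooks, indices, w.shape)
    print("mean abs error:", float(np.abs(w - w_hat).mean()))
    print("original bits:", w.size * 32)
    print("compressed bits:", compressed_bits(64, 27, 4))
```

For that hypothetical layer (64 filters of 27 weights each, k = 4), the estimate is 64 × (4 × 32 + 27 × 2) = 11,648 bits against 64 × 27 × 32 = 55,296 bits uncompressed, roughly a 4.7× reduction, assuming the 2-bit indices are bit-packed. The per-kernel codebooks cost more overhead than a single global codebook would, which is the trade the per-kernel approach makes for a closer fit to each filter.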
Why does this matter strategically? Compression is the gating technology for running models where memory and power are scarce — phones, sensors, embedded silicon. Intel, whose business spans CPUs to edge accelerators, has a direct interest in owning methods that shrink models to fit its hardware. The G06F memory CPCs underline that the claimed value is in the storage representation.
On scope: this is a granted B2 and therefore enforceable, but it is directed at per-kernel K-means clustering specifically. Generic quantization, pruning, and other compression families are not what the title describes, and the precise boundary is set by the claims, starting with claim 1.
The takeaway: US11055604B2 is a silicon vendor patenting the model-shrinking layer that decides whether a network runs on a small device at all — an unglamorous but decisive piece of the inference stack.