Here's what was published, and published is not granted: application US20250061316A1, "Dynamic Quantization and Memory Management of Key-Value Cache for Serving Large Language Models," published February 20, 2025, assigned to Intel Corporation, with inventors including Sameh Gobriel and Nilesh Jain. Its CPC codes are G06N 3/0495 (model compression) and G06N 3/082.
The mechanism attacks the same bottleneck as the headline KV-cache filings, but from the compression angle. The KV-cache is large, and storing it in full precision is wasteful. This application's approach is dynamic quantization, which represents the cached keys and values in fewer bits and adjusts that precision as needed, combined with active memory management of the cache. Shrink the cache and you fit longer contexts or more concurrent requests into the same memory.
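To make "fewer bits, adjusted as needed" and "fit more in the same memory" concrete, here is a minimal sketch. It does not reproduce the application's claimed scheme. It assumes per-group asymmetric min-max quantization and a hypothetical policy that picks the fewest bits whose worst-case reconstruction error stays under a tolerance. The function names, the candidate bit widths (2, 4, 8), the tolerance, and the model shape are all illustrative assumptions, not Intel's.

```python
from dataclasses import dataclass


@dataclass
class QuantizedGroup:
    bits: int
    scale: float        # step size between adjacent quantization levels
    zero_point: float   # group minimum (asymmetric quantization offset)
    codes: list[int]    # integer codes in [0, 2**bits - 1]


def quantize_group(values, bits):
    """Asymmetric min-max quantization of one group of cached K or V values."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Round to the nearest level and clamp against floating-point overshoot.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return QuantizedGroup(bits, scale, lo, codes)


def dequantize_group(group):
    """Reconstruct approximate values from the integer codes."""
    return [group.zero_point + c * group.scale for c in group.codes]


def quantize_dynamic(values, candidate_bits=(2, 4, 8), tol=0.05):
    """Hypothetical dynamic policy: use the fewest bits whose max error is <= tol."""
    for bits in sorted(candidate_bits):
        group = quantize_group(values, bits)
        err = max(abs(a - b) for a, b in zip(values, dequantize_group(group)))
        if err <= tol:
            return group
    # No candidate met the tolerance: fall back to the widest format.
    return quantize_group(values, max(candidate_bits))


def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bits):
    """KV-cache payload size: K and V tensors for every layer, head, and token.

    Ignores the per-group scale/zero-point metadata a real quantized cache stores.
    """
    elements = 2 * layers * kv_heads * head_dim * seq_len * batch
    return elements * bits // 8


if __name__ == "__main__":
    # Hypothetical model shape, chosen only so the numbers come out round.
    cfg = dict(layers=32, kv_heads=32, head_dim=128, seq_len=4096, batch=1)
    for bits in (16, 8, 4):
        # 16-bit: 2.0 GiB, 8-bit: 1.0 GiB, 4-bit: 0.5 GiB
        print(bits, kv_cache_bytes(**cfg, bits=bits) / 2**30, "GiB")

    smooth = [0.01 * i for i in range(16)]  # narrow range: 2 bits suffice
    spiky = smooth[:-1] + [8.0]             # one outlier widens the range
    print(quantize_dynamic(smooth).bits, quantize_dynamic(spiky).bits)  # 2 8
```

The sizing function shows why precision matters: every halving of bits per element halves the cache, so the same memory holds twice the tokens or twice the concurrent requests. The outlier case shows why a dynamic policy is attractive: a single large value stretches the quantization range, so a fixed low bit width would lose accuracy on exactly the groups that need it most.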
It's instructive to read this alongside the other 2025 KV-cache records: the algorithm-and-model labs file methods for what to cache and how to route it, while silicon vendors such as Intel file methods for how to store it cheaply on real hardware. The G06N 3/0495 compression code marks Intel's contribution as the storage-efficiency layer of the same problem.
Because this is a publication, the correct phrasing is "claims as filed." Until a patent grants with allowed claims, its scope is undetermined and nothing is enforceable. For now, the document signals the direction of Intel's serving-efficiency research.
The takeaway: US20250061316A1 is a hardware vendor's entry in the KV-cache fight, pairing dynamic quantization with memory management. Read next to the model-lab filings, it shows the whole industry converging on the cache as the binding constraint of LLM serving.