Google Patent Claims On-Device Speech Model Distillation

A patent granted June 23, 2026 and assigned to Google LLC claims a federated distillation method that compresses a large server-side speech recognition encoder into a smaller on-device one using principal component analysis. This is an issued patent, not a pending application, and it lands in a recent Google cluster pointed at on-device, privacy-preserving model training.

On June 23, 2026, the U.S. Patent and Trademark Office issued a patent that is less about a new speech-recognition architecture than about a way to move one: it takes a large speech model trained on a server and reconstitutes a smaller, faithful copy of it inside a phone, without shipping the user's voice data back to the server to do it. Titled "Federated knowledge distillation on an encoder of a global ASR model and/or an encoder of a client ASR model," the patent is assigned to Google LLC and carries the number US12664978B2. Its status matters: this is a granted patent, not a published application, so the claims below are ones an examiner allowed and that Google now holds, not merely language it filed. With that established, the question worth answering is what claim 1 actually covers.

The setup the claim describes is a pair of automatic speech recognition (ASR) models built as RNN-Transducers (RNN-T), the streaming architecture that pairs an audio encoder with a prediction network and a joint network. There is a global model on the server with a large encoder, and a client model on the device whose encoder is explicitly "smaller than the global encoder." Critically, the client model reuses the global model's prediction network and joint network unchanged — so the only thing being learned, and the only thing being shrunk, is the encoder. That framing is the whole point: distillation here is an encoder-to-encoder operation, and the rest of the model is held constant on both sides.
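The sharing arrangement described above can be sketched in a few lines. This is only a toy illustration, not the patent's implementation: the `Rnnt` container, the use of single weight matrices as stand-ins for each network, and all of the sizes are hypothetical and chosen only to show that the client reuses the global prediction and joint networks while carrying its own smaller encoder.

```python
from dataclasses import dataclass

import numpy as np

FEATURE_DIM = 80  # hypothetical acoustic feature size


@dataclass
class Rnnt:
    """Toy RNN-T container; each component is a single weight matrix."""
    encoder: np.ndarray
    prediction_network: np.ndarray
    joint_network: np.ndarray


rng = np.random.default_rng(0)

# Server-side model with a large encoder (sizes are illustrative).
global_model = Rnnt(
    encoder=rng.normal(size=(FEATURE_DIM, 512)),
    prediction_network=rng.normal(size=(64, 64)),
    joint_network=rng.normal(size=(64, 32)),
)

# Client model: a new, smaller encoder. The prediction and joint networks
# are the very same objects as the global model's, not retrained copies.
client_model = Rnnt(
    encoder=rng.normal(size=(FEATURE_DIM, 16)),
    prediction_network=global_model.prediction_network,
    joint_network=global_model.joint_network,
)

# Only the client encoder is treated as trainable during distillation.
trainable_parts = ("encoder",)
```

In a real system the components would be neural network modules rather than bare matrices; the point of the sketch is only which parts are shared and which part is learned.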

The mechanism that makes the transfer work is principal component analysis. As claimed, the method first processes a set of training instances through PCA to produce two artifacts: a mean vector and a set of principal directions — a compact, lower-dimensional coordinate system that captures the dominant structure of the global model's representations. Training then proceeds instance by instance. For each utterance, the client model produces "one or more predicted coefficients," the global model produces its output, and a loss is computed from four things together: the global output, the client's predicted coefficients, the PCA mean vector, and the PCA principal directions. The system updates only the client encoder against that loss. In plain terms, the small on-device encoder is not asked to match the big model in its full output space; it is asked to predict where the big model's behavior lands in the compressed PCA space, a far smaller target to learn.

distilling information from a global automatic speech recognition (“ASR”) model to generate a client ASR model, wherein the global ASR model includes a global encoder, a prediction model, and a joint network, wherein the client ASR model includes a client encoder that is smaller than the global encoder of the global ASR model, the prediction model of the global ASR model, and the joint network of the global ASR model, and wherein distilling the global ASR model to generate the client ASR model comprises: processing a set of training instances using principal component analysis (“PCA”) to generate (a) a mean vector for the set of training instances and (b) a set of principal directions for the set of training instances… generating a loss based on the global output… the one or more predicted coefficients… the mean vector… and the set of principal directions… and updating one or more portions of the client encoder based on comparing the loss and the one or more predicted coefficients.
Source: Federated knowledge distillation on an encoder of a global ASR model and/or an encoder of a client ASR model, US12664978B2
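One way to picture how a loss can be built from those four ingredients is the NumPy sketch below. It is a guess at one plausible form, not the claimed method: the claim does not spell out the loss, so the squared-error reconstruction loss, the choice to run PCA over global output vectors, the synthetic data, and the helper names `fit_pca` and `distillation_loss` are all assumptions made for illustration.

```python
import numpy as np


def fit_pca(samples, num_components):
    """Return the mean vector and the top principal directions (as rows)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_components]


def distillation_loss(global_output, predicted_coeffs, mean, directions):
    """Assumed loss: squared error between the global output and the point
    reconstructed from the predicted coefficients in PCA space."""
    reconstruction = mean + predicted_coeffs @ directions
    return float(np.mean((global_output - reconstruction) ** 2))


rng = np.random.default_rng(0)

# Synthetic stand-in for global outputs: 200 vectors of size 32 that lie
# close to a 4-dimensional subspace, plus a little noise.
latent = rng.normal(size=(200, 4))
mixing = rng.normal(size=(4, 32))
global_outputs = latent @ mixing + 0.01 * rng.normal(size=(200, 32))

mean, directions = fit_pca(global_outputs, num_components=4)

# Coefficients a perfect client would predict: the projection of each
# centered global output onto the principal directions.
target_coeffs = (global_outputs - mean) @ directions.T

# An untrained client, modeled here as predicting all-zero coefficients.
zero_coeffs = np.zeros_like(target_coeffs)

loss_ideal = distillation_loss(global_outputs, target_coeffs, mean, directions)
loss_zero = distillation_loss(global_outputs, zero_coeffs, mean, directions)
```

Under these assumptions the client only has to predict 4 numbers per vector instead of 32, and `loss_ideal` comes out far below `loss_zero`, which is the sense in which the compressed PCA space is a smaller target to learn.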

Where the claims land in the CPC landscape

The classification places the grant precisely. Its primary CPC home is G10L 15/16 — speech recognition using neural networks — which is exactly where a method whose subject is an ASR encoder belongs, and it carries the G06N machine-learning family for the distillation and PCA training machinery itself. That pairing is informative: the claimed contribution is not a new acoustic front end or a new language model, but a training procedure that sits on top of an existing neural ASR stack. The G10L 15/16 placement signals the application domain (speech), while the G06N association signals that the allowed novelty lives in how the smaller model is trained, not in the recognition pipeline it ultimately runs.

The dependent claims sharpen the boundaries of what issued. Claim 4 fixes both models as RNN-Transducers. Claim 5 narrows the PCA step to Bregman PCA, a generalization of standard PCA suited to non-Gaussian representations. Claims 6 and 7 are the on-device claims: the client model is "stored locally at a client device" while the global model sits "at a server remote from the client device," and the global encoder's memory footprint is explicitly larger than the client encoder's. Claim 8 notes the global model is initially trained on "non-private training data." And claim 9 closes the loop the title promises — distilling information back from the client model to the global model, updating the global encoder from client-side predicted coefficients. Independent claim 10 recasts that reverse direction as its own method, and claim 16 casts the forward direction as a system claim. Read together, the allowed scope is a bidirectional, PCA-mediated encoder-distillation protocol with a server-large / device-small asymmetry written directly into the claims.

A cluster directed at on-device, privacy-preserving training

The patent did not issue alone. It sits at the head of a tightly themed run of Google grants that share the same concern: training and personalizing models without centralizing raw user data. Issued alongside it is US12664977B2, directed to on-the-fly parameter compression for federated learning, which addresses the size of model updates and is a natural companion to distilling a smaller encoder. Where US12664978B2 shrinks the model, this neighboring grant is directed at shrinking what crosses the network during federated rounds. The two read as adjacent solutions to the same on-device-training bottleneck: the size of the model and the size of the update.

The cluster widens from there into personalization and on-device inference. US12664976B2 is directed to query-replay personalization of a large language model, a method for adapting an LLM to a user from their own queries rather than from a centrally pooled dataset. US12664818B2 is directed to on-device face recognition, another instance of pushing a recognition model down to the client so the underlying biometric or behavioral signal need not be transmitted. Across the four grants the throughline is consistent, and the CPC spread reflects it: speech recognition at G10L for US12664978B2, the G06N learning family underneath the distillation and federated-compression methods, and personalization and on-device recognition rounding out a portfolio repeatedly directed at keeping the training signal where the data already is, on the device.

What US12664978B2 claims, then, is narrower and more concrete than "federated learning for speech." It is a specific encoder-distillation procedure: take a large server ASR encoder, run PCA to build a compact coordinate system, and train a smaller device encoder to predict coefficients in that space against a loss assembled from the global output and the PCA components, with the prediction and joint networks held fixed and the distillation runnable in both directions. Because the device side learns to predict PCA coefficients rather than to match raw outputs, the protocol is structured so that the on-device training the claims describe can proceed in a federated setting without raw audio leaving the device. The scope is what the allowed claims recite, and they recite it plainly. The grant's company is equally clear: it issued in a Google cluster directed, repeatedly, at the same problem of training capable models on the client without pulling the client's data to the server.
