Just Issued: Amazon RL Model-Compression Patent

A November-2022 Amazon grant uses reinforcement learning to learn how to compress ML models. RL meets model efficiency.

Here's what actually issued. On November 15, 2022, Amazon Technologies, Inc. was granted US11501173B1, "Reinforcement learning for training compression policies for machine learning models," inventors including Gurumurthy Swaminathan and Ragav Venkatesan. The CPC codes are G06N 5/003 and G06N 20/00 — machine learning broadly.

Two AI techniques are stacked. Model compression shrinks a trained network so it runs cheaper; the open question is always how to compress — which layers to prune, how aggressively, at what bit-width. This grant's answer is to learn the compression policy itself with reinforcement learning: an RL agent treats compression choices as actions, gets rewarded for small-and-accurate results, and discovers a compression strategy rather than hand-tuning one.

“A compression policy to produce compression profiles for compressing trained machine learning models may be trained using reinforcement learning. An iterative reinforcement learning may be performed response to a search request.” (U.S. Patent No. 11,501,173)

The independent claim spells the loop out as a service. Claim 1 describes a "compression profile search system" containing a "reinforcement agent process," exposed through an interface that receives a "compression profile search request" for one or more machine learning models, each trained on one or more data sets, from a client. The system then iterates "until a training criteria is satisfied," and each iteration does three concrete things. First, the RL agent generates "a plurality of different compression profiles" for the models according to a "prospective compression policy" — so on every pass the agent proposes a whole batch of candidate ways to compress, not one. Second, the system directs "performance of different respective versions" of the models corresponding to those profiles, run against the same data sets the models were trained on — meaning each candidate compression is actually instantiated and measured, not estimated. Third, the agent applies "a reward function for one or more performance criteria to performance results" of those versions and uses the reward to update the policy, and the updated policy is then used to generate the next iteration's profiles. When the loop ends, the interface returns the latest policy as the "final compression policy" plus generated profiles in response to the request.
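To make the loop in claim 1 concrete, here is a minimal Python sketch of a propose-measure-reward-update search. It is not the patent's implementation. The softmax policy, the REINFORCE-style update with a batch-mean baseline, the fixed iteration count standing in for the training criteria, the toy `evaluate` function, and every name and constant below are assumptions made for illustration. In particular, `evaluate` fakes the measurement step with a formula, whereas the claim has real compressed versions run against the data sets.

```python
import math
import random

PRUNE_CHOICES = [0.0, 0.25, 0.5, 0.75]  # hypothetical per-layer prune ratios

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_profile(policy, rng):
    """Draw one compression profile: an index into PRUNE_CHOICES per layer."""
    return [rng.choices(range(len(PRUNE_CHOICES)), weights=softmax(logits))[0]
            for logits in policy]

def evaluate(profile):
    """Toy stand-in for building and running a compressed model version.
    Returns (accuracy, relative_size); both formulas are invented."""
    ratios = [PRUNE_CHOICES[i] for i in profile]
    mean_ratio = sum(ratios) / len(ratios)
    accuracy = 0.95 - 0.30 * mean_ratio ** 2  # assumed accuracy decay
    return accuracy, 1.0 - mean_ratio

def reward_fn(accuracy, size):
    """Assumed reward: favor accuracy, penalize remaining size."""
    return accuracy - 0.5 * size

def search(num_layers=4, batch=8, iterations=300, lr=2.0, seed=0):
    rng = random.Random(seed)
    # The "prospective compression policy": one row of logits per layer.
    policy = [[0.0] * len(PRUNE_CHOICES) for _ in range(num_layers)]
    for _ in range(iterations):  # fixed count stands in for the training criteria
        # Step 1: generate a batch of different profiles from the current policy.
        profiles = [sample_profile(policy, rng) for _ in range(batch)]
        # Steps 2 and 3: measure each version, then apply the reward function.
        rewards = [reward_fn(*evaluate(p)) for p in profiles]
        baseline = sum(rewards) / len(rewards)
        probs = [softmax(logits) for logits in policy]
        # Update the policy; the next iteration samples from the updated logits.
        for profile, r in zip(profiles, rewards):
            for layer, choice in enumerate(profile):
                for k in range(len(PRUNE_CHOICES)):
                    indicator = 1.0 if k == choice else 0.0
                    step = (r - baseline) * (indicator - probs[layer][k])
                    policy[layer][k] += lr * step / batch
    return policy
```

Calling `search()` returns the final logits. Taking the highest-scoring choice per layer gives one profile, which loosely mirrors the claim's return of a final policy plus generated profiles.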

That structure makes the reward signal the heart of the invention, and the dependent claims show the user controls it. Claim 2 lets the client specify the performance criteria for the final policy; claim 7 lets the client specify the compression policy; claim 8 lets the client specify the reward function itself. So a customer can say, in effect, "compress my model, and define 'good' as this accuracy floor at this size" — and the RL search optimizes against exactly that. Claim 4 even lets the request name reinforcement learning as the technique to use "out of a plurality of different search techniques supported by the system," which tells you the system is built as a general profile-search service with RL as one selectable strategy.
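As a rough illustration of what a client-specified reward and a selectable search technique could look like, here is a small Python sketch. None of these names come from the patent or from any AWS API. `ProfileSearchRequest`, `accuracy_floor_reward`, `SUPPORTED_TECHNIQUES`, and the metric keys are all hypothetical, chosen only to show the shape of "define good as this accuracy floor at this size."

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]            # e.g. {"accuracy": 0.92, "size_ratio": 0.40}
RewardFn = Callable[[Metrics], float]

# Hypothetical set of search techniques a service might offer.
SUPPORTED_TECHNIQUES = {"reinforcement_learning", "random_search"}

def accuracy_floor_reward(floor: float) -> RewardFn:
    """Build a reward that favors smaller models while accuracy stays >= floor."""
    def reward(metrics: Metrics) -> float:
        if metrics["accuracy"] < floor:
            return -1.0                      # below the floor: flat penalty
        return 1.0 - metrics["size_ratio"]   # at or above it: smaller is better
    return reward

@dataclass
class ProfileSearchRequest:
    """Hypothetical request body: which models, which technique, which reward."""
    model_ids: List[str]
    technique: str
    reward: RewardFn

def validate(request: ProfileSearchRequest) -> ProfileSearchRequest:
    """Reject requests naming a technique the (imagined) service does not offer."""
    if request.technique not in SUPPORTED_TECHNIQUES:
        raise ValueError(f"unsupported technique: {request.technique}")
    return request
```

A client would build the request with `reward=accuracy_floor_reward(0.90)` and `technique="reinforcement_learning"`. The search then scores every measured version with the client's function rather than a built-in one.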

Other dependent claims add operational detail that betrays a fleet mindset. Claim 3 stores the generated compression profiles "for subsequent access by the reinforcement learning agent process when updating the compression policy" — the agent remembers what it has already tried, so the search compounds rather than repeating itself. Claim 11 has the different model versions "share one or more portions of execution state amongst computing resources," an efficiency move when you are evaluating many candidate compressions in parallel. Claim 12 provides the trained policy to "a model compression system implemented as part of a machine learning service offered by a provider network" — explicitly a managed-cloud service. And claim 20 narrows the models to convolutional neural networks and the profiles to identifying "one or more channels in different layers of the convolutional neural network to prune," giving the abstract policy a concrete compression action: structured channel pruning.
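To show what a channel-pruning profile of the kind claim 20 describes might look like as data, here is a small Python sketch. The profile format (layer name mapped to a set of output-channel indices), the plain chain of convolution layers, the 3x3 kernel, and the decision to ignore biases are all assumptions for illustration, not details from the patent.

```python
def pruned_param_count(layers, profile, kernel=3):
    """Count conv weights left after applying a channel-pruning profile.

    layers:  list of (name, in_channels, out_channels) for a plain conv chain.
    profile: dict mapping layer name -> set of output-channel indices to prune.
    Biases are ignored to keep the arithmetic simple.
    """
    total = 0
    prev_kept = None
    for name, in_ch, out_ch in layers:
        drop = profile.get(name, set())
        if any(c < 0 or c >= out_ch for c in drop):
            raise ValueError(f"channel index out of range in {name}")
        kept_out = out_ch - len(drop)
        # Pruning a layer's output channels also removes the matching
        # input channels of the next layer in the chain.
        kept_in = in_ch if prev_kept is None else prev_kept
        total += kept_in * kept_out * kernel * kernel
        prev_kept = kept_out
    return total

layers = [("conv1", 3, 16), ("conv2", 16, 32)]
profile = {"conv1": {0, 1, 2, 3}, "conv2": {5, 7}}

print(pruned_param_count(layers, {}))       # 5040 weights unpruned
print(pruned_param_count(layers, profile))  # 3564 weights after pruning
```

Dropping 4 of 16 channels in the first layer shrinks both layers, because the second layer loses the corresponding input channels. That knock-on effect is part of what makes the action set structured.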

The strategic logic is cloud economics. Amazon runs models at enormous scale on AWS; a method that automatically finds better compression policies translates into lower serving cost across a huge fleet, and the claims are written as a customer-facing service — request in, trained policy and profiles out, client-specified reward. Owning the RL-for-compression method protects an efficiency lever that compounds with volume, and the "share execution state" and "provider network" language shows the design was meant to run that search cheaply at scale.

On scope, the usual caveat applies. This is a granted patent (the B1 kind code marks a grant with no earlier published application) and it is enforceable, but the claims describe using reinforcement learning to train compression policies through this specific iterate-evaluate-reward-update search loop, exposed as a request-driven service with client-specified criteria. They do not cover model compression generally, nor reinforcement learning generally; they cover only the particular marriage of the two as claimed, down to generating multiple profiles per iteration and measuring real compressed versions. Read claim 1 for the boundary.

One detail worth dwelling on is why measuring real compressed versions, rather than estimating them, is load-bearing in these claims. The reward in claim 1 is applied to "performance results of the different respective versions" — meaning each candidate compression is actually built and run against the data sets the model was trained on, and the measured accuracy-and-size of that real artifact is what feeds the reward function. That is more expensive than predicting compressed performance from a heuristic, but it is also what makes the learned policy trustworthy: the RL agent is rewarded for outcomes it has empirically observed, not for a proxy that might mislead it. The claims clearly anticipate that expense and engineer around it. Claim 11's sharing of "execution state amongst computing resources" amortizes the cost of standing up many model versions at once, and claim 3's stored profile history keeps the agent from re-evaluating compressions it has already measured. Claim 20's narrowing to convolutional networks and channel pruning gives the search a concrete, structured action set — pick which channels in which layers to drop — which is both hardware-friendly (it removes whole feature maps rather than scattering zeros) and tractable for an RL agent to explore. Taken together, the dependent claims read as a working blueprint for running a reward-driven compression search economically at fleet scale, not just an abstract pairing of RL with compression.
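One way to picture the stored-profile idea is a store that runs each distinct profile once and reuses the measured result afterwards. The Python sketch below is a guess at the general shape, not the patent's design. `ProfileStore`, `run_version`, and the tuple-keyed dictionary are hypothetical names and choices.

```python
class ProfileStore:
    """Remember measured results so each distinct profile is run only once."""

    def __init__(self, run_version):
        # run_version: callable that builds and runs a compressed version
        # for a profile and returns its measured results (assumed expensive).
        self._run_version = run_version
        self._results = {}
        self.runs = 0

    def measure(self, profile):
        key = tuple(profile)              # profiles must be hashable as tuples
        if key not in self._results:
            self._results[key] = self._run_version(profile)
            self.runs += 1                # count real evaluations only
        return self._results[key]

    def history(self):
        """All (profile, results) pairs seen so far, for the agent to consult."""
        return list(self._results.items())


def fake_run(profile):
    # Stand-in for a real measurement; the numbers are invented.
    return {"accuracy": 0.9, "size_ratio": 1.0 - sum(profile) / len(profile)}

store = ProfileStore(fake_run)
store.measure([0.25, 0.5])
store.measure([0.25, 0.5])   # served from the store, no second run
store.measure([0.5, 0.5])
print(store.runs)            # 2
```

The saving depends on how often the agent re-proposes a profile it has already seen, which is more likely with a small, discrete action set such as per-layer channel choices.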

The takeaway: US11501173B1 is a clean example of AI-automating-AI IP, using one technique (RL) to automate another (compression), from a hyperscaler whose incentive is to drive down per-inference cost at fleet scale, and whose claims read like a managed cloud service with a tunable reward.

