Just Issued: NVIDIA Data Augmentation Patent (G06V) | AlgorithmClaims

A June 2026 NVIDIA grant covers the unglamorous step that often decides whether a model works at all — manufacturing training data.

Here's what actually issued. On June 9, 2026, NVIDIA Corporation was granted US12651480B2, "Data set generation and augmentation for machine learning models." The inventors include Yuzhuo Ren, Weili Nie, Arash Vahdat, and Animashree Anandkumar — names associated with NVIDIA's generative and vision research. The CPC list is entirely in the vision class: G06V 40/176 and G06V 40/164 (facial-feature subgroups), plus G06V 10/62, G06V 10/774, and G06V 10/82. That classification footprint is the first clue to scope — this is image-focused data work, not a general data-pipeline patent.

The claimed contribution, read plainly, is a method for expanding and varying a training set. Data augmentation takes real examples and produces variations (geometric and photometric transforms, occlusions, or synthetic examples drawn to resemble the real distribution) so the model sees richer variety and learns the underlying pattern rather than the quirks of the specific images it was handed. The grant claims a particular approach to generating and augmenting that data for ML models, with the vision subgroups pointing at image (and, given G06V 40/16x, facial-feature) applications.
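For readers who have not written augmentation code, here is a minimal sketch of the generic transforms named above: one geometric, one photometric, one occlusion. It is textbook augmentation on a toy grayscale image stored as nested lists, not the method claimed in US12651480B2. The function names, the 0.7 to 1.3 brightness range, and the quarter-size occlusion patch are all assumptions chosen for illustration.

```python
import random

def hflip(img):
    """Geometric transform: mirror each row left to right."""
    return [row[::-1] for row in img]

def scale_brightness(img, factor):
    """Photometric transform: scale pixel values, clamped to 0..255."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]

def occlude(img, top, left, height, width, fill=0):
    """Occlusion: overwrite a rectangular patch with a constant value."""
    out = [row[:] for row in img]  # copy so the input image is not mutated
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = fill
    return out

def augment(img, rng):
    """Apply a random combination of the three transforms to one image."""
    out = img
    if rng.random() < 0.5:
        out = hflip(out)
    # Hypothetical brightness range; real pipelines tune this per data set.
    out = scale_brightness(out, rng.uniform(0.7, 1.3))
    h, w = len(out), len(out[0])
    top, left = rng.randrange(h), rng.randrange(w)
    # Hypothetical patch size: roughly a quarter of each dimension.
    return occlude(out, top, left, max(1, h // 4), max(1, w // 4))

# Toy 4x4 grayscale "image"; a real pipeline would use image arrays.
image = [[10, 20, 30, 40],
         [50, 60, 70, 80],
         [90, 100, 110, 120],
         [130, 140, 150, 160]]

rng = random.Random(0)  # seeded so repeated runs produce the same variants
variants = [augment(image, rng) for _ in range(3)]
```

Each call to `augment` yields a different variant of the same source image, which is the whole point: one real example becomes several plausible training examples.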

“A machine learning model (MLM) may be trained and evaluated. Attribute-based performance metrics may be analyzed to identify attributes for which the MLM is performing below a threshold when each are present in a sample.” (U.S. Patent No. 12,651,480)
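One plausible reading of that quoted passage can be sketched in a few lines: score the model separately for each attribute and flag the attributes whose score falls below a cutoff. This is an illustration of the idea only, not the claimed method. The attribute labels, the use of accuracy as the metric, and the 0.6 threshold are all assumptions made up for the example.

```python
from collections import defaultdict

def per_attribute_accuracy(samples):
    """Compute accuracy over the samples in which each attribute is present.

    samples: iterable of (attributes, correct) pairs, where attributes is a
    set of attribute labels and correct is True if the model got it right.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for attributes, correct in samples:
        for attr in attributes:
            totals[attr] += 1
            hits[attr] += int(correct)
    return {attr: hits[attr] / totals[attr] for attr in totals}

def underperforming(samples, threshold):
    """Return the attributes whose accuracy is below the threshold, sorted."""
    accuracy = per_attribute_accuracy(samples)
    return sorted(attr for attr, value in accuracy.items() if value < threshold)

# Hypothetical evaluation results: (attributes present, prediction correct?)
eval_results = [
    ({"glasses", "indoor"}, True),
    ({"glasses", "low_light"}, False),
    ({"low_light"}, False),
    ({"indoor"}, True),
    ({"glasses"}, True),
    ({"low_light", "indoor"}, True),
]

weak = underperforming(eval_results, threshold=0.6)
print(weak)  # ['low_light']
```

In this toy data the model is right on one of three low-light samples, so `low_light` is the attribute a targeted augmentation step would aim to cover with more training variety.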

Why does a step this unglamorous warrant a granted patent? Because data is the constraint everyone hits, and augmentation is the cheapest defense against overfitting — manufacturing useful variety beats collecting more real data, which is slow, expensive, and sometimes impossible. A method that reliably generates good training variety is directly valuable, and NVIDIA, whose hardware runs the training, has a clear strategic reason to own pieces of the workflow that feeds its chips.

On scope, the usual claim-reading discipline applies. This is granted (B2) and enforceable, but the claims cover specific augmentation/generation techniques, not the entire idea of "making more training data." The vision-class CPCs, especially the facial-feature subgroups, suggest the allowed claims are tied to image and face-oriented augmentation rather than a domain-agnostic monopoly. Read the claim language for the actual boundary; don't infer it from the broad title.

The takeaway for the IP reader: US12651480B2 is a clean example of a chipmaker patenting the data plumbing, not just the silicon. The architecture patents get the headlines; grants like this one stake out the boring, decisive step where many models quietly succeed or fail. It issued, it's specific, and it's another data point in NVIDIA's pattern of filing across the application layer it accelerates.

