Just Published: NVIDIA Multimodal Data-Selection Patent Application

A December 2025 NVIDIA patent application describes using foundation models to select and enrich multimodal training datasets: in short, curating data with models.

Here is what was published, with the caveat that a published application is not a granted patent. Application US20250384660A1, "Foundation Models for Multimodal Semantic Data Selection and Dataset Enrichment," was published on December 18, 2025. It is assigned to NVIDIA Corporation, and the named inventors include Jose Alvarez and Sifei Liu. The CPC codes are the vision classes G06V 10/762, G06V 10/82, and G06V 20/70, so the multimodality named in the title is anchored in image data.

The mechanism is to use models to curate data. Training a strong model depends as much on which data you train on as on the architecture. The application uses foundation models for two tasks: semantic data selection, which means choosing the most useful examples from a large pool, and dataset enrichment, which means augmenting the data to fill gaps. In effect, one model decides what a downstream model should learn from, across multiple modalities.

“In various examples, a system can perform multimodal selection of data to generate and/or enrich efficient datasets. The system can retrieve clusters of image frames generated according to semantic characteristics, such as semantic embeddings, of the image frames.” (U.S. Patent Application 2025/0384660 A1)
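To make the quoted idea concrete, here is a minimal sketch of cluster-based selection over frame embeddings. It is an illustration under stated assumptions, not the method claimed in the application: the function names (cluster_frames, select_representatives) are hypothetical, plain k-means with farthest-point seeding is a stand-in for whatever clustering the filing contemplates, and random vectors stand in for embeddings that a real foundation model would produce.

```python
import numpy as np


def normalize(embeddings):
    """Scale each row to unit length so distances reflect direction only.
    Assumes no row is all zeros."""
    x = np.asarray(embeddings, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def assign(x, centroids):
    """Index of the nearest centroid (squared Euclidean) for every row of x."""
    diffs = x[:, None, :] - centroids[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)


def farthest_point_init(x, k):
    """Deterministic seeding: start at row 0, then repeatedly take the row
    farthest from all centroids chosen so far."""
    chosen = [0]
    min_dist = np.linalg.norm(x - x[0], axis=1)
    for _ in range(k - 1):
        nxt = int(min_dist.argmax())
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[nxt], axis=1))
    return x[chosen].copy()


def cluster_frames(embeddings, k, iters=10):
    """Hypothetical helper: k-means over unit-normalized frame embeddings.
    Returns (labels, centroids)."""
    x = normalize(embeddings)
    centroids = farthest_point_init(x, k)
    for _ in range(iters):
        labels = assign(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return assign(x, centroids), centroids


def select_representatives(embeddings, labels, centroids, per_cluster=1):
    """Hypothetical helper: keep the frames closest to each cluster centroid,
    so the selected subset covers every semantic group."""
    x = normalize(embeddings)
    picked = []
    for j in range(len(centroids)):
        idx = np.flatnonzero(labels == j)
        dist = np.linalg.norm(x[idx] - centroids[j], axis=1)
        picked.extend(idx[np.argsort(dist)[:per_cluster]].tolist())
    return sorted(picked)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((200, 16))  # assumption: stand-in embeddings
    labels, centroids = cluster_frames(frames, k=5)
    print("cluster sizes:", np.bincount(labels, minlength=5))
    print("selected frames:", select_representatives(frames, labels, centroids))
```

In a sketch like this, the cluster sizes double as a rough enrichment signal: a cluster with very few members marks a semantic region where the pool is thin and more data might be worth sourcing. That reading is an assumption for illustration, not something the quoted passage states.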

This is the data-centric thread that runs through NVIDIA's filings: the company patents not just architectures and silicon but the data plumbing that feeds training. As the field accepts that data quality is often the binding constraint, methods that automate selection and enrichment become strategically valuable, and they tie naturally to NVIDIA's role across the training stack.

Because this is a published application rather than a grant, it is best read as a statement of intent. The claims as filed describe what NVIDIA seeks; the allowed claims, if a patent issues, will set the actual scope. The vision CPCs and the multimodal title indicate the worked domain, but the enforceable boundary awaits prosecution.

The takeaway: US20250384660A1 is a recently published marker on the data-curation layer, covering the use of foundation models to choose and enrich multimodal training data. It is consistent with NVIDIA's pattern of staking claims across the application layer it accelerates.
