Here's what actually issued. On June 9, 2026, NVIDIA Corporation was granted US12651459B2, "Synthesizing video from audio using one or more neural networks," with inventors Ming-Yu Liu, Ting-Chun Wang, and Arun Mallya, a generative-media research lineage. The CPC list crosses three areas: G06V 20/46 and G06V 40/169 (video and facial-region vision), G06N 3/04 (neural networks), and G10L 15/16 (speech). A patent classified under vision, speech, and neural networks at once has the structural profile of an audio-to-video synthesis method, and the classifications signal that before the claims do.

Read plainly, the claimed contribution takes an audio input and produces corresponding video — the textbook example being a face whose lip and head motion track a speech signal. A neural network learns the statistical relationship between sound and the visual motion that produces it, then, given new audio, generates the frames that would plausibly accompany it. "Plausibly" is load-bearing: the method synthesizes a convincing video, it does not recover a true one. The G06V 40/169 facial-region code is the tell that talking-face synthesis is squarely within the claimed territory.
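To make the "learn a mapping, then generate from new audio" idea concrete, here is a deliberately tiny sketch. It is not the patented method and uses no neural network: it assumes a single hypothetical visual parameter (a "mouth opening" value per video frame) and fits a straight line from per-frame audio loudness to that value, using made-up training pairs. Every name in it (`frame_energy`, `fit_line`, `predict_mouth_opening`) is invented for illustration only.

```python
import math

def frame_energy(samples, frame_len):
    """RMS loudness of each non-overlapping frame of audio samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

def predict_mouth_opening(samples, frame_len, a, b):
    """Map new audio to one plausible mouth-opening value per frame."""
    return [max(0.0, a * e + b) for e in frame_energy(samples, frame_len)]

# Assumed (synthetic) training pairs: louder frames go with a wider mouth.
train_energy = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
train_opening = [0.1 + 2.0 * e for e in train_energy]
a, b = fit_line(train_energy, train_opening)

# New audio the model never saw: three frames of a sine tone.
frame_len = 100
new_audio = [0.5 * math.sin(2 * math.pi * i / frame_len) for i in range(3 * frame_len)]
predicted = predict_mouth_opening(new_audio, frame_len, a, b)
print(predicted)
```

The output is a sequence of plausible motion values inferred from sound, not a recovery of what any real mouth did. A real system would replace the straight line with a neural network and the single number with full image frames, but the shape of the pipeline (fit on paired audio and video, then generate from audio alone) is the same.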

On scope and status, precision matters. The B2 kind code marks a granted patent whose application was previously published, and the grant is enforceable as of its issue date. But the claims cover a specific audio-driven synthesis method using one or more neural networks, not the entire idea of generative video. The cross-class CPC codes point toward the audio-to-motion mapping, particularly for faces, though classifications only describe subject matter and the claims alone set the legal boundary; this is not a monopoly on text-to-video or generative media writ large. Read claim 1 for the boundary rather than inferring breadth from the general-sounding title.

Why is a company best known for accelerators filing on media generation? Because NVIDIA's pattern, visible across its portfolio, is to own pieces of the workloads its hardware accelerates — not only the silicon. A granted claim on audio-to-video synthesis is a stake in the generative-media application layer, the same posture as its data-augmentation grant (US12651480B2) covered elsewhere on this site. The chips run the models; the patents claim the methods that run on them.

The takeaway for the IP reader: US12651459B2 is a clean, issued example of generative-media IP accumulating at a hardware company. It sits at the center of two conversations, one about useful applications (dubbing, avatars, accessibility) and one about deepfake risk, but the patent itself is narrower than either framing: a specific method for turning audio into corresponding video, granted and classifiable, and worth reading for exactly what its claims cover.