Just Issued: NVIDIA Audio-to-Video Synthesis Patent | AlgorithmClaims

A June 2026 NVIDIA grant turns a voice track into matching video. What issued, what it covers, and why the chip company is filing on generative media.

Here's what actually issued. On June 9, 2026, NVIDIA Corporation was granted US12651459B2, "Synthesizing video from audio using one or more neural networks," with inventors Ming-Yu Liu, Ting-Chun Wang, and Arun Mallya, a team with a generative-media research lineage. The CPC list crosses three areas: G06V 20/46 and G06V 40/169 (video and facial-region vision), G06N 3/04 (neural networks), and G10L 15/16 (speech). CPC codes attach to the patent as a whole rather than to any single claim, but a grant classified across vision, speech, and neural networks at once points, structurally, to an audio-to-video synthesis method, and the classifications say so before the claims do.

Read plainly, the claimed contribution takes an audio input and produces corresponding video; the textbook example is a face whose lip and head motion track a speech signal. A neural network learns the statistical relationship between sound and the visual motion that produces it, then, given new audio, generates the frames that would plausibly accompany it. "Plausibly" is load-bearing: the method synthesizes a convincing video; it does not recover a true one. The G06V 40/169 facial-region code is the tell that talking-face synthesis is squarely within the claimed territory.
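To make the learn-then-generate idea concrete, here is a deliberately tiny sketch. It is not the patented method and not NVIDIA's code: a one-variable least-squares fit stands in for the neural network, a single "mouth opening" number stands in for video frames, and the training data is synthetic. Every name in it (`fit_line`, `synthesize_mouth_track`, the energy and mouth values) is hypothetical and chosen only for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def synthesize_mouth_track(model, audio_energy):
    """Generate one mouth-opening value per audio frame, clamped to [0, 1]."""
    slope, intercept = model
    return [min(1.0, max(0.0, slope * e + intercept)) for e in audio_energy]

# Hypothetical training pairs: per-frame audio energy and mouth opening in [0, 1].
# The 0.8 / 0.1 relationship is made up so the toy model has something to learn.
rng = random.Random(0)
train_energy = [rng.random() for _ in range(200)]
train_mouth = [0.8 * e + 0.1 + rng.gauss(0, 0.02) for e in train_energy]

# "Training": learn the statistical relationship between sound and motion.
model = fit_line(train_energy, train_mouth)

# "Synthesis": new audio yields a plausible motion track, not a recovered recording.
new_energy = [0.0, 0.25, 0.5, 1.0]
mouth_track = synthesize_mouth_track(model, new_energy)
print(len(mouth_track))  # one generated value per input audio frame
```

The point of the toy is the shape of the pipeline: paired audio and motion go in during training, and only audio goes in at generation time. The output is whatever the learned mapping considers likely, which is why "plausibly" is the right word.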

“Apparatuses, systems, and techniques are presented to reduce an amount of data to be transmitted for media content.” (U.S. Patent No. 12,651,459)

On scope and status, precision matters. The patent is granted (kind code B2) and enforceable as of its issue date. But the claims cover a specific audio-driven synthesis method using one or more neural networks, not the entire idea of generative video. The cross-class CPC list points toward the audio-to-motion mapping, particularly for faces; it is not a monopoly on text-to-video or generative media writ large. Read claim 1 for the boundary rather than inferring breadth from the general-sounding title.

Why is a company best known for accelerators filing on media generation? Because NVIDIA's pattern, visible across its portfolio, is to own pieces of the workloads its hardware accelerates — not only the silicon. A granted claim on audio-to-video synthesis is a stake in the generative-media application layer, the same posture as its data-augmentation grant (US12651480B2) covered elsewhere on this site. The chips run the models; the patents claim the methods that run on them.

The takeaway for the IP reader: US12651459B2 is a clean, issued example of generative-media IP accumulating at a hardware company. It sits at the center of both useful applications — dubbing, avatars, accessibility — and the deepfake-risk conversation, but the patent itself is narrower than either of those framings: a specific method for turning audio into corresponding video, granted and classifiable, and worth reading for exactly what its claims allow.

