Who Owns the Transformer and Attention Patents?

A landscape read of who holds attention- and transformer-related neural-network patents, grounded in real granted records across Microsoft, Google, Qualcomm, and others — and a caution about what patent counts do and do not show.

"Who owns the transformer?" is one of the most common questions about AI intellectual property, and the accurate answer is that no one owns the transformer as a concept. The transformer architecture and its central attention mechanism were introduced in academic research published in 2017, and that paper put the design into the open literature where anyone could study and build on it. Patents protect specific claims, not ideas, so the broad architecture is not controlled by a single patent or a single assignee. What exists instead is a dense field of narrower grants, each claiming a specific technical variation on attention, transformers, or their training. Reading that field correctly means resisting the urge to translate patent counts into control of the idea.

The distinction matters because of how patents work. A patent does not protect a general concept; it protects what its claims specifically recite, limitation by limitation. So even where a company holds many transformer-related grants, each grant covers a particular implementation — a specific masking scheme, a specific way of combining attention with other layers, a specific hardware mapping. Holding such a patent does not give the holder rights over every transformer; it gives rights over systems that practice that claim's specific limitations. The foundational architecture sits beneath all of them, unowned.
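The limitation-by-limitation point can be modeled as a simple set check: a system falls within a claim only if it meets every limitation the claim recites (the "all-elements" idea). The sketch below is a toy under loud assumptions: claims are reduced to sets of feature labels, and the claim number and labels are hypothetical, not taken from any real patent. Real analysis also turns on claim construction and the doctrine of equivalents, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A claim modeled as the set of limitations it recites (hypothetical toy)."""
    number: str
    limitations: frozenset[str]


def practices(system_features: set[str], claim: Claim) -> bool:
    """A system reads on the claim only if it meets every recited limitation.

    Simplification: literal label matching only; no claim construction,
    no doctrine of equivalents.
    """
    return claim.limitations <= system_features


# Hypothetical claim; the limitation labels are illustrative only.
mask_claim = Claim(
    number="hypothetical-1",
    limitations=frozenset({"transformer", "attention_matrix", "mask_modifies_attention"}),
)

plain_transformer = {"transformer", "attention_matrix"}
masked_transformer = {"transformer", "attention_matrix", "mask_modifies_attention", "layer_norm"}

print(practices(plain_transformer, mask_claim))   # one limitation is missing
print(practices(masked_transformer, mask_claim))  # every limitation is present
```

Extra features in a system (here `layer_norm`) do not help or hurt; what decides the outcome is whether any recited limitation is absent. That is why a narrow claim says little about transformers that lack its specific limitations.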

What the granted record actually shows

A search of granted neural-network patents reciting transformer and attention structures returns a spread of assignees, each on a narrow technical variation. Microsoft Technology Licensing, LLC holds US12260338B2, "Transformer-based neural network including a mask attention network" (issued March 25, 2025, classified G06N 3/088, G06N 3/045, G06N 3/063), which claims a transformer using a "mask attention network" that modifies an attention data structure with mask values — a specific variation, not attention in general. Qualcomm Incorporated holds US12652401B2, "Neural network with transformer based video coding tool" (issued June 9, 2026), applying transformer blocks to capture non-local correlations in video coding. Records in this space also appear under Advanced New Technologies, Bank of America, and others — a reminder that transformer patenting is not confined to the AI labs but extends to anyone applying the architecture to a specific domain.

"A transformer-based neural network includes at least one mask attention network (MAN). The MAN computes an original attention data structure that expresses influence between pairs of data items in a sequence of data items." (Abstract of US12260338B2, "Transformer-based neural network including a mask attention network," Microsoft Technology Licensing, LLC)

That abstract is instructive precisely because it is so specific. The grant is not for "attention" or for "the transformer"; it is for a transformer that includes a mask attention network performing a particular modification of the attention data structure. Its claim scope is the mask-attention mechanism, not the general idea of attention. Across the landscape the pattern repeats: each assignee's grants cluster around the specific extensions, optimizations, and applications that assignee actually built and filed on. The classification places these records where you would expect: CPC symbols in G06N 3/00 (the neural-network branch), often G06N 3/045 (combinations of networks, within the G06N 3/04 architecture group) and G06N 3/08 for learning methods, with vision-applied transformers also picking up G06V symbols.
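To make "an attention data structure that expresses influence between pairs of data items," modified by mask values, concrete, here is a generic masked scaled dot-product attention in plain Python. It is deliberately not the claimed MAN: US12260338B2's claims define their own mask computation, which this sketch does not reproduce. The causal mask used in the example is an assumption chosen only for illustration.

```python
import math


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax; entries equal to -inf receive zero weight."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_weights(q: list[list[float]], k: list[list[float]],
                             mask: list[list[float]]) -> list[list[float]]:
    """Generic masked scaled dot-product attention weights (not the claimed MAN).

    q, k: one d-dimensional query/key vector per item in the sequence.
    mask: n x n additive mask; 0.0 keeps a pair, -inf suppresses it.
    Assumes every row keeps at least one pair; an all -inf row has no valid softmax.
    """
    d = len(q[0])
    # Pairwise influence scores between items i and j (an n x n attention structure).
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]
    # Modify that structure with the mask values, then normalize each row.
    masked = [[s + m for s, m in zip(s_row, m_row)] for s_row, m_row in zip(scores, mask)]
    return [softmax(row) for row in masked]


# Example: a causal mask (an illustrative assumption) over three items.
n = 3
causal = [[0.0 if j <= i else -math.inf for j in range(n)] for i in range(n)]
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = masked_attention_weights(q, k, causal)
print(weights[0])  # item 0 may attend only to itself
```

The general technique (score every pair, adjust the scores with a mask, normalize) is decades of public practice; a patent in this space claims one particular way of producing or applying the mask, which is the narrowness the paragraph above describes.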

Why counts mislead, and what to read instead

The temptation in landscape analysis is to count grants per assignee and declare a leader. That count is real data, but it answers a narrow question, namely how much filing activity an assignee has in a defined search, and not the question people think it answers. It does not measure who controls the architecture (no one does), it does not measure the breadth or strength of any individual claim, and it is sensitive to the search itself: which keywords, which CPC subgroups, which date window, and whether published applications are mixed in with grants. A count drawn from the whole G06N 3/00 branch will look different from one restricted to records that explicitly recite "transformer" and "attention," and both differ from a count that also includes the vision-applied G06V layer. Timing adds another caveat: counts and classifications shift as applications publish and issue, so any count is only a snapshot of the day the search was run.
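The query-dependence of counts is easy to demonstrate. The sketch below runs four search definitions over the same set of records; the records, assignee names, and CPC assignments are hypothetical toy data, not real search results. CPC scope is approximated by string-prefix matching, which is a simplification of the real CPC hierarchy.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    assignee: str
    kind: str              # "grant" or "application"
    cpc: tuple[str, ...]   # CPC symbols attached to the record
    text: str              # title/abstract text used for keyword matching


# Hypothetical toy records for illustration only; these are not real search results.
TOY_RECORDS = [
    Record("Assignee A", "grant", ("G06N 3/045", "G06N 3/08"), "transformer with attention masking"),
    Record("Assignee A", "application", ("G06N 3/045",), "transformer attention for ranking"),
    Record("Assignee B", "grant", ("G06N 3/063",), "hardware accelerator for neural networks"),
    Record("Assignee B", "grant", ("G06V 10/82",), "vision transformer with attention for detection"),
    Record("Assignee C", "grant", ("G06N 3/08",), "training method for recurrent networks"),
]


def in_scope(symbol: str, prefixes: tuple[str, ...]) -> bool:
    # Simplification: string-prefix matching only approximates the CPC hierarchy.
    return any(symbol.startswith(p) for p in prefixes)


def count_by_assignee(records, prefixes, keywords=(), grants_only=True) -> Counter:
    """Count records per assignee that pass one particular search definition."""
    counts = Counter()
    for r in records:
        if grants_only and r.kind != "grant":
            continue
        if not any(in_scope(s, prefixes) for s in r.cpc):
            continue
        if not all(kw in r.text for kw in keywords):
            continue
        counts[r.assignee] += 1
    return counts


TERMS = ("transformer", "attention")
broad = count_by_assignee(TOY_RECORDS, ("G06N 3/",))
narrow = count_by_assignee(TOY_RECORDS, ("G06N 3/",), keywords=TERMS)
with_vision = count_by_assignee(TOY_RECORDS, ("G06N 3/", "G06V"), keywords=TERMS)
with_apps = count_by_assignee(TOY_RECORDS, ("G06N 3/",), keywords=TERMS, grants_only=False)

for name, result in [("broad", broad), ("narrow", narrow),
                     ("with_vision", with_vision), ("with_apps", with_apps)]:
    print(name, dict(result))
```

Same data, four definitions, four different pictures: a three-way tie, a single holder, a two-way tie, and a single holder with double the count once applications are mixed in. None of those numbers says anything about claim breadth or enforceability.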

A more honest landscape read does three things. It states the foundational fact — the core architecture came from public research and is not owned by one patent. It identifies the specific clusters — which assignees hold grants on which variations (mask attention, video-coding transformers, source-code summarization, and so on), deep-linking the exemplar grants so the claim language can be checked rather than summarized. And it caveats the counts — noting that they reflect filing activity in specific implementations, are query-dependent, and say nothing about claim strength or enforceability. The standout records are more informative than the totals: US12260338B2 tells you Microsoft built and protected a mask-attention variation; US12652401B2 tells you Qualcomm protected a transformer-based video-coding tool. Those are concrete, checkable facts about who filed what.

The same search surfaces a long tail that underscores how diffuse the landscape is. Records reciting transformer and attention structures appear under FPT USA Corp. (US12147791B1, a transformer-based system for source-code summarization), Realtek Semiconductor (US12462548B2, a convolutional-transformer signal-processing method), and Advanced New Technologies (US10977449B2 and US11210474B2, transformer-layer language processing) — assignees spanning semiconductors, enterprise software, and platform companies, none of them the research lab that introduced the architecture. That spread is the signal. When a foundational technique enters the field through public research, the patent activity that follows is overwhelmingly about applications and refinements, filed by whoever is deploying the technique in their own domain. The result is a landscape with no center of gravity in ownership terms: many holders, each with narrow grants on the piece they built, layered over a shared and unowned architectural base.
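Reading "record by record" starts with organizing the exemplar records by assignee and variation, each paired with a link where the claim language can be checked. The sketch below uses only the records and variation descriptions named in this article; linking to Google Patents is an assumption about which database to use, and any full-text patent database would serve.

```python
from collections import defaultdict

# Records named in this article: (assignee, publication number, variation as described here).
ARTICLE_RECORDS = [
    ("Microsoft Technology Licensing, LLC", "US12260338B2", "mask attention network"),
    ("Qualcomm Incorporated", "US12652401B2", "transformer-based video coding tool"),
    ("FPT USA Corp.", "US12147791B1", "transformer-based source-code summarization"),
    ("Realtek Semiconductor", "US12462548B2", "convolutional-transformer signal processing"),
    ("Advanced New Technologies", "US10977449B2", "transformer-layer language processing"),
    ("Advanced New Technologies", "US11210474B2", "transformer-layer language processing"),
]


def patent_link(number: str) -> str:
    # Assumption: Google Patents as the lookup service.
    return f"https://patents.google.com/patent/{number}"


def clusters(records):
    """Group records by assignee, keeping each variation next to a checkable link."""
    grouped = defaultdict(list)
    for assignee, number, variation in records:
        grouped[assignee].append((variation, patent_link(number)))
    return dict(grouped)


for assignee, items in clusters(ARTICLE_RECORDS).items():
    print(assignee)
    for variation, url in items:
        print(f"  {variation}: {url}")
```

The output is a map of who filed on which piece, not a ranking; the next step is opening each link and reading the independent claims.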

So the answer to "who owns the transformer" is layered. The architecture itself: no one. The specific implementations: many different assignees, each on their own narrow grants, with Microsoft, Google, Qualcomm, and a long tail of domain players all holding records in the G06N neural-network branch. The honest landscape is a field of narrow claims sitting on top of an unowned foundation — and the right way to read it is record by record, claim by claim, not count by count.
