Claim Breakdown: Anthropic's Computer-Use Agent Patent | AlgorithmClaims

A May 2026 Anthropic grant covers an agent that reads a screen and operates it. Reading the claim structure shows what the IP behind 'computer use' actually protects.

Here is what issued and the claim structure behind it. On May 5, 2026, Anthropic, PBC was granted US12619815B2, "Magnitude invariant multimodal agent for efficient image-text interface automation." The CPC list is unusually broad for a single grant: eighteen codes, among them entries in G06V (vision: 10/82, 10/774, 20/40, 30/41), G06N (neural networks and learning: 3/0455, 3/091, 20/00, 5/04), and G06F (interfaces and text: 3/0481, 3/0484, 40/166, 40/174, 40/284). That spread is the fingerprint of a system claim that perceives, reasons, and acts.

The independent claim, read structurally, is a loop. The agent obtains a visual representation of a user interface; it processes that representation with one or more neural networks to identify interface elements and associated text; it selects an action with respect to an identified element; and it causes that action to be performed. Then it repeats. "Multimodal" because the claim fuses image and text features; "interface automation" because the output is a UI action, not a sentence.
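The four-step loop described above can be sketched in a few lines of Python. This is a minimal illustration of the observe, perceive, select, act structure only, not the patented implementation: the names `Element`, `Action`, `run_agent`, and the toy stand-in functions are all hypothetical, and the "neural network" step is replaced by a trivial stub.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Element:
    """A hypothetical interface element: its associated text and position."""
    label: str
    x: int
    y: int


@dataclass
class Action:
    """A hypothetical UI action aimed at one identified element."""
    kind: str
    target: Element


def run_agent(
    observe: Callable[[], object],
    perceive: Callable[[object], List[Element]],
    select: Callable[[List[Element], List[Action]], Optional[Action]],
    perform: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat observe -> perceive -> select -> perform until nothing is selected."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = observe()                   # obtain a visual representation
        elements = perceive(screen)          # identify elements and their text
        action = select(elements, history)   # choose an action on one element
        if action is None:                   # nothing left to do
            break
        perform(action)                      # cause the action to be performed
        history.append(action)
    return history


# Toy stand-ins (assumptions): each "screen" is just a list of visible labels.
screens = [["Login"], ["Submit"], []]
state = {"step": 0}


def observe():
    return screens[state["step"]]


def perceive(screen):
    return [Element(label=text, x=0, y=i) for i, text in enumerate(screen)]


def select(elements, history):
    return Action("click", elements[0]) if elements else None


def perform(action):
    state["step"] += 1  # pretend the click advanced the UI to the next screen


history = run_agent(observe, perceive, select, perform)
print([a.target.label for a in history])  # ['Login', 'Submit']
```

Passing the four steps in as functions keeps the loop itself separate from how any one step is carried out, which mirrors how the claim is read here: the structure is what is claimed, and the output of each pass is a UI action rather than text.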

“A system for magnitude-invariant image-text agentic interface automation is disclosed. A bit vectorization logic is configured to convert image patches in a plurality of image patches into magnitude-invariant bit vectors, and generate a plurality of lines of magnitude-invariant bit vectors.” (U.S. Patent No. 12,619,815)
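The quoted text does not say how the bit vectors are computed. As a hedged illustration of what "converting image patches into magnitude-invariant bit vectors" could mean in the simplest case, the sketch below thresholds each pixel against its patch mean, so the resulting bits do not change when every pixel value is multiplied by the same positive factor. The thresholding rule, the function names `patch_to_bits` and `image_to_patch_lines`, and the reading of "lines" as rows of patches are all assumptions for illustration, not details taken from the patent. The sketch covers invariance to pixel-value magnitude only.

```python
from typing import List

Image = List[List[float]]


def patch_to_bits(patch: Image) -> List[int]:
    """Binarize a patch against its own mean (assumed rule, not from the patent)."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def image_to_patch_lines(image: Image, size: int) -> List[List[List[int]]]:
    """Cut the image into size x size patches; return one line of bit vectors per patch row."""
    lines = []
    for top in range(0, len(image) - size + 1, size):
        line = []
        for left in range(0, len(image[0]) - size + 1, size):
            patch = [row[left:left + size] for row in image[top:top + size]]
            line.append(patch_to_bits(patch))
        lines.append(line)
    return lines


image = [
    [10, 20, 200, 210],
    [30, 40, 220, 230],
    [5, 5, 5, 90],
    [5, 5, 80, 5],
]
# The same image with every pixel value doubled (a uniform magnitude change).
brighter = [[2 * p for p in row] for row in image]

lines = image_to_patch_lines(image, size=2)
print(lines)
print(lines == image_to_patch_lines(brighter, size=2))  # True
```

Because each bit records only whether a pixel sits above or below its patch mean, a uniform multiplication of all pixel values leaves every comparison unchanged.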

The load-bearing phrase is "magnitude invariant." The quoted text attaches it to the bit vectors that image patches are converted into. Read as a claim limitation, it is the robustness limitation: the perception step is constructed so that recognition does not depend on the absolute scale of interface elements. That is not decorative language. It is the inventors telling the examiner (and the reader) that the claimed method specifically handles the brittleness that breaks naive screen-driving agents when resolution or element size changes. A claim that names its robustness mechanism is a claim that expects to be read narrowly around it.

Note the family. US12387036B1, "Multimodal agent for efficient image-text interface automation," issued earlier, on August 12, 2025, with largely overlapping inventors (Elsen, Hawthorne, Odena, Nye, and others) and a tighter CPC set in G06V and G06F. The B1 came first; the B2 with the "magnitude invariant" refinement followed. Reading them in sequence shows the prosecution direction — from a base interface-automation agent toward a scale-robust variant.

What the claim does not give you is a benchmark. A claim of this kind describes how the agent is constructed to work; it makes no representation about success rate on real tasks. Screen-driving agents remain famously fragile, and nothing in claim 1 promises otherwise. The grant tells you that Anthropic treats this perception-action method as core enough to protect twice, not that it finishes your ten-step task unattended.

Read this way, the IP behind "computer use" stops being a black box. US12619815B2 protects a specific, scale-robust loop that turns screenshots into UI actions. That is the claimed invention: not the idea of an agent, but a particular method for making one perceive an interface reliably enough to operate it.
