Just Published: Microsoft Disentangled-Attention LM (2021)

A 2021 Microsoft patent application on efficient transformer language models with disentangled attention and multi-step decoding. Published, not granted.

Here's what was published, and published is not granted. On October 28, 2021, application US20210334475A1, "Efficient Transformer Language Models With Disentangled Attention and Multi-Step Decoding," was published, assigned to Microsoft Technology Licensing, LLC, with inventors Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. The CPC codes are G06F 40/40 (natural-language processing) and G06N 20/00 (machine learning). The inventor list maps to Microsoft's DeBERTa line of work.

The technical hook is "disentangled attention." Standard transformer attention bundles a token's content and its position into a single representation. Disentangling them, so that attention is computed from content and relative position as separate components (content-to-content, content-to-position, and position-to-content terms in the published DeBERTa work), lets the model reason about word relationships more precisely, which improved benchmark performance in the research this application tracks. "Multi-step decoding" appears to describe producing the model's output across several refinement passes rather than in a single step.
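To make the content/position split concrete, here is a minimal Python sketch of the disentangled-attention score as described in the published DeBERTa paper. It is an illustration of the research idea, not necessarily the method as claimed in this application. It assumes the relative-position query and key vectors (`qr`, `kr`) are already projected, one vector per relative-distance bucket, and that `k` is the maximum relative distance; the function names are hypothetical.

```python
import math


def rel_bucket(i, j, k):
    """Clip the relative distance i - j into a bucket index in [0, 2k - 1].

    Distances at or beyond +/-k share the outermost buckets, so only
    2k relative-position vectors are needed.
    """
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def disentangled_scores(qc, kc, qr, kr, k):
    """Unnormalized attention scores A[i][j] for a single head.

    qc, kc: content query/key vectors, one per token (N x d)
    qr, kr: relative-position query/key vectors, one per bucket (2k x d)
    """
    n, d = len(qc), len(qc[0])
    # Three score terms are summed, so scale by sqrt(3d) instead of sqrt(d).
    scale = 1.0 / math.sqrt(3 * d)
    scores = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(qc[i], kc[j])                     # content-to-content
            c2p = dot(qc[i], kr[rel_bucket(i, j, k)])   # content-to-position
            p2c = dot(kc[j], qr[rel_bucket(j, i, k)])   # position-to-content
            row.append((c2c + c2p + p2c) * scale)
        scores.append(row)
    return scores


def softmax_rows(m):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(x - mx) for x in row]
        s = sum(e)
        out.append([x / s for x in e])
    return out


if __name__ == "__main__":
    # Toy example: 3 tokens, d = 2, maximum relative distance k = 2 (4 buckets).
    qc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    kc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    qr = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.2, 0.0]]
    kr = [[0.0, 0.2], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
    for row in softmax_rows(disentangled_scores(qc, kc, qr, kr, k=2)):
        print([round(p, 3) for p in row])
```

The point of the design is that position enters only through a lookup on the clipped relative distance, so the two position terms depend on how far apart two tokens are rather than on where each one sits in the sequence.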

“Systems and methods are provided for facilitating the building and use of natural language understanding models. The systems and methods identify a plurality of tokens and use them to generate one or more pre-trained natural language models using a transformer.” (U.S. Patent Application 2021/0334475 A1)

Because this is a publication, the right verb is "claims as filed," not "covers." An A1 document reflects what the applicant sought, not what an examiner has allowed. The claims may narrow substantially before any grant issues, and until one does, there is nothing enforceable here. The publication's value is as a window into Microsoft's language-model research direction and its intent to protect it.

For scope discipline, that distinction is the whole story. A reader who calls this "Microsoft's transformer patent" is wrong twice over: it is an application, not a patent, and even the issued version would cover the specific disentangled-attention method, not transformers writ large.

The takeaway: US20210334475A1 is a published marker of Microsoft's efficient-LM research, useful for tracking where the company is investing — but anyone assessing enforceability should wait for the grant and read the allowed claims, because the gap between a 2021 filing and an issued claim set is exactly where scope gets decided.
