Here's what actually issued. On January 24, 2023, Salesforce.com, Inc. was granted US11562147B2, "Unified vision and dialogue transformer with BERT," naming inventors Yue Wang, Chu Hong Hoi, and Shafiq Joty. The CPC codes pair dialogue and language classes (G06F 40/35, G06F 40/284) with a pattern-recognition class on the vision side (G06K 9/6217) and a neural-network learning class (G06N 3/08).

The mechanism is multimodal unification. Vision-and-dialogue tasks, such as discussing the contents of an image across a conversation, traditionally needed separate models for seeing and for talking. This grant unifies them in one transformer built on BERT, so a single model jointly handles the visual input and the dialogue turns. Fusing modalities in one architecture is the direction the field took as multimodal models became central.
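To make "one transformer for both modalities" concrete, here is a minimal sketch of how a single-stream input might be assembled before it reaches a BERT-style encoder. It is an illustration under stated assumptions, not the method claimed in the patent: the `[IMG_i]` placeholders, the whitespace tokenizer, the 0/1 segment convention, and the names `UnifiedInput` and `build_unified_input` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnifiedInput:
    tokens: List[str]       # one sequence holding both modalities
    segment_ids: List[int]  # assumed convention: 0 = visual, 1 = dialogue


def build_unified_input(num_regions: int, dialogue_turns: List[str]) -> UnifiedInput:
    """Concatenate image-region slots and dialogue tokens into one sequence."""
    # Visual segment: [CLS], one placeholder per image region, then [SEP].
    # A real model would put region feature vectors here, not strings.
    visual = ["[CLS]"] + [f"[IMG_{i}]" for i in range(num_regions)] + ["[SEP]"]

    # Dialogue segment: whitespace-split turns, each closed by [SEP].
    # (Hypothetical stand-in for a WordPiece tokenizer.)
    text: List[str] = []
    for turn in dialogue_turns:
        text.extend(turn.lower().split())
        text.append("[SEP]")

    return UnifiedInput(
        tokens=visual + text,
        segment_ids=[0] * len(visual) + [1] * len(text),
    )


example = build_unified_input(2, ["What is on the table", "A red mug"])
print(example.tokens)
print(example.segment_ids)
```

Because both modalities sit in one sequence, a single encoder's self-attention can relate any dialogue token to any image region, which is the practical meaning of "jointly handles" above.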

The CPC footprint is the tell here: the simultaneous presence of a vision code and dialogue/language codes flags genuine multimodality, which Adaeze's desk standard says to name explicitly. For Salesforce, whose research lab produced a steady stream of multimodal and dialogue work, the grant secures a unified-architecture method as visual conversation became a real product surface.
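The co-occurrence check described above can be sketched as a small heuristic. This is an assumption-laden illustration, not an official CPC tool: the prefix groupings in `VISION_PREFIXES` and `LANGUAGE_PREFIXES` and the function `is_multimodal` are hypothetical, chosen only to match the codes on this grant.

```python
from typing import Iterable

# Assumed prefix groupings for illustration only; not an official CPC taxonomy.
VISION_PREFIXES = ("G06K 9/", "G06V")
LANGUAGE_PREFIXES = ("G06F 40/",)


def is_multimodal(cpc_codes: Iterable[str]) -> bool:
    """Flag a patent when vision and language codes appear together."""
    codes = list(cpc_codes)
    has_vision = any(code.startswith(VISION_PREFIXES) for code in codes)
    has_language = any(code.startswith(LANGUAGE_PREFIXES) for code in codes)
    return has_vision and has_language


codes = ["G06F 40/35", "G06F 40/284", "G06K 9/6217", "G06N 3/08"]
print(is_multimodal(codes))
```

A flag like this only suggests where to look. Whether the patent is genuinely multimodal still has to be confirmed against the claims.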

On scope: granted B2, enforceable, but the claims describe a specific unified vision-dialogue transformer. They do not cover multimodal models generally, nor BERT, nor every visual-dialogue system. The independent claim sets the line.

The takeaway: US11562147B2 is early multimodal IP arriving in granted form. It covers a single-transformer method fusing seeing and talking, held by a software company whose research consistently fed its product roadmap.