Here's what actually issued. On October 3, 2023, International Business Machines Corporation was granted US11775839B2, "Frequently asked questions and document retrieval using bidirectional encoder representations from transformers (BERT) model trained on generated paraphrases," with inventors including Yosi Mass and Haggai Roitman. The CPC codes are G06N 3/088, G06N 3/045 (network architectures), and the retrieval code G06F 16/24578.

The mechanism is retrieval with a twist. BERT, the bidirectional transformer encoder, turns text into embeddings that can be compared for semantic similarity, which is the backbone of modern search and of the retrieval step in retrieval-augmented generation (RAG). The claimed twist is the training data: the model is trained on generated paraphrases, synthetic rewordings of questions, so it learns to match a user's phrasing to a stored FAQ or document even when the wording differs.
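To make the matching step concrete, here is a minimal sketch of similarity-based FAQ retrieval. It is not the patented method and it does not run BERT: the `embed` function is a toy bag-of-words stand-in for an encoder, and the names `embed`, `cosine`, `retrieve`, and the sample `FAQS` are all assumptions made for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an encoder: a bag-of-words count vector.
    A real system would call a BERT-style model here instead."""
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, faqs):
    """Return the stored FAQ whose vector is closest to the query's."""
    q = embed(query)
    return max(faqs, key=lambda faq: cosine(q, embed(faq)))

# Hypothetical FAQ store.
FAQS = [
    "How do I reset my password?",
    "How do I cancel my subscription?",
]

best = retrieve("I forgot my password", FAQS)
```

Because this stand-in only counts shared words, it works when the query happens to reuse the FAQ's vocabulary and degrades when it does not. That gap between wording and meaning is the one a learned encoder is supposed to close.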

That data-augmentation angle is the heart of the invention. Generating paraphrases to make a retriever robust to phrasing variation is a concrete engineering method, and it's directly relevant to enterprise help-desk and knowledge-base products — exactly IBM's commercial turf. As RAG systems proliferate, retrieval quality becomes a competitive surface, and methods that improve it are worth owning.
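As a rough illustration of the data-augmentation idea, the sketch below expands a small FAQ set into labeled training pairs. It makes no claim about how the patent generates paraphrases: `generate_paraphrases` uses hypothetical string templates where a real pipeline would presumably use a generative model, and `build_training_pairs`, the FAQ ids, and the label scheme are assumptions made for the example.

```python
import random

# Hypothetical FAQ store keyed by id.
FAQS = {
    "faq-1": "How do I reset my password?",
    "faq-2": "How do I cancel my subscription?",
}

# Hypothetical template rewrites standing in for a learned paraphrase generator.
TEMPLATES = [
    ("How do I", "What is the way to"),
    ("How do I", "Can you tell me how to"),
]

def generate_paraphrases(question):
    """Return synthetic rewordings of a question (toy template substitution)."""
    return [question.replace(old, new) for old, new in TEMPLATES if old in question]

def build_training_pairs(faqs, seed=0):
    """Build (paraphrase, faq_text, label) triples.
    label 1: the paraphrase was generated from that FAQ.
    label 0: the paraphrase is paired with a different, randomly chosen FAQ."""
    rng = random.Random(seed)
    ids = sorted(faqs)
    pairs = []
    for faq_id in ids:
        for paraphrase in generate_paraphrases(faqs[faq_id]):
            pairs.append((paraphrase, faqs[faq_id], 1))
            other_id = rng.choice([i for i in ids if i != faq_id])
            pairs.append((paraphrase, faqs[other_id], 0))
    return pairs

pairs = build_training_pairs(FAQS)
```

Pairs like these are the kind of input a matching model could be fine-tuned on, so that a reworded question scores high against its source FAQ and low against the others.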

On scope, the usual line holds: this is a granted B2 patent and therefore enforceable, but the claims cover only this specific paraphrase-trained BERT retrieval method. They do not lock up BERT, semantic retrieval, or RAG generally. Claim 1 defines the boundary.

The takeaway: US11775839B2 is retrieval IP arriving as the RAG pattern goes mainstream — a method that improves a retriever through synthetic paraphrase training, held by an incumbent with a deep enterprise-search business.