Mixture-of-Experts Perplexity Routing: The Claims

A freshly published application is directed to deciding which experts a downstream transformer block should run, using a perplexity measurement taken mid-network. It is a pending application, not a granted patent.

Among the applications that published on June 18, 2026, one assigned to Microsoft Technology Licensing, LLC is directed to a narrow question inside a large language model: when a transformer reaches a mixture-of-experts layer, which experts should actually do the work? The application, titled "Machine Learning Model Processing Based on Perplexity" and carrying publication number US20260170296A1, describes measuring the model's own uncertainty partway through the network and using that measurement to decide which experts a later block will run. It is a published application, meaning a document the patent office has made public, and not a granted patent.

The distinction matters before any reading of scope. A publication discloses what an applicant has asked the office to consider; it does not establish that any claim has been allowed, and the claims that ultimately issue, if any, may be narrower than what is published. With that framing in place, the independent method here can be stated plainly from the record.

What the application is directed to

The method operates a model that includes a sequence of transformer blocks. Input data arrives at one block and is processed through a mixture-of-experts layer — the now-standard arrangement in which a layer holds many "expert" sub-networks but routes each input through only a subset of them. The application adds a step on top of that arrangement: at an auxiliary classifier, the method determines a measure of perplexity of the processed data. Perplexity is a familiar quantity in language modeling, a measure of how surprised a model is by what it is seeing. Based on that measure, the method indicates one or more experts in a downstream transformer block that will subsequently process the input, and then fetches the weight matrices for the indicated experts.
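For reference, perplexity is conventionally written in one of the two forms below: the first for a single predictive distribution p over classes, the second for a sequence of N tokens. The passages of the application summarized here do not say which form the auxiliary classifier computes, so the symbols are illustrative rather than taken from the filing.

```latex
\mathrm{PPL}(p) = \exp\!\Big(-\sum_{i} p_i \ln p_i\Big),
\qquad
\mathrm{PPL}(x_{1:N}) = \exp\!\Big(-\frac{1}{N}\sum_{t=1}^{N} \ln p(x_t \mid x_{<t})\Big)
```

Under the first form, a uniform distribution over k outcomes has perplexity k, and a distribution that puts all its mass on one outcome has perplexity 1.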

Read in order, the claimed sequence is: process at a mixture-of-experts layer, measure perplexity at an auxiliary classifier, indicate downstream experts from that measure, and fetch the weights for those experts. The load-bearing idea is the coupling between an in-network uncertainty signal and a weight-fetch decision made before the downstream block runs. The application is directed to using a perplexity measurement, rather than only a learned gating function applied at the layer itself, to govern which experts' parameters are brought in for the next block.
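The four-step sequence can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the linear auxiliary classifier, the exp-of-entropy perplexity, the threshold table that maps perplexity to expert ids, and every name below (ROUTING_TABLE, route_step, STORE, and so on) are hypothetical stand-ins, since the published record as summarized here does not specify them.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]

# ASSUMED routing table (not from the application): perplexity upper bound -> expert ids.
ROUTING_TABLE = [(1.5, [0]), (3.0, [0, 1]), (math.inf, [0, 1, 2, 3])]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perplexity(probs: Sequence[float]) -> float:
    """exp(entropy) of a distribution; 1.0 means fully confident."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(entropy)


def indicate_experts(ppl: float) -> List[int]:
    """Step 3 (assumed rule): pick downstream experts from a perplexity bucket."""
    for upper_bound, expert_ids in ROUTING_TABLE:
        if ppl < upper_bound:
            return expert_ids
    return ROUTING_TABLE[-1][1]


def fetch_weights(expert_ids: Sequence[int], store: Dict[int, Matrix]) -> Dict[int, Matrix]:
    """Step 4: bring in weight matrices only for the indicated experts."""
    return {e: store[e] for e in expert_ids}


def route_step(hidden: Sequence[float], aux_weights: Matrix,
               store: Dict[int, Matrix]) -> Dict[int, Matrix]:
    # Step 1 happened upstream: `hidden` stands in for the MoE layer's output.
    # Step 2: a hypothetical linear auxiliary classifier, then perplexity.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in aux_weights]
    ppl = perplexity(softmax(logits))
    expert_ids = indicate_experts(ppl)        # step 3
    return fetch_weights(expert_ids, store)   # step 4


AUX_WEIGHTS = [[5.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
STORE = {e: [[float(e)]] for e in range(4)}  # toy 1x1 "weight matrices"

confident = route_step([1.0, 0.0], AUX_WEIGHTS, STORE)  # peaked classifier output
uncertain = route_step([0.0, 1.0], AUX_WEIGHTS, STORE)  # uniform classifier output
print(sorted(confident), sorted(uncertain))  # prints [0] [0, 1, 2, 3]
```

In this toy setup a confident upstream signal stages a single expert and an uncertain one stages all four. The direction of that relationship is a choice made for the sketch only; the summary of the filing says the indication is based on the perplexity measure without saying how.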

"At an auxiliary classifier, a measure of perplexity of the processed input data is determined. Based on the determined measure of perplexity, one or more experts in a downstream transformer block that will subsequently process the input data are indicated. Weight matrices are then fetched for the indicated one or more experts."
Source: Machine Learning Model Processing Based on Perplexity, US20260170296A1

The fetch step is worth isolating because it points at where the disclosure sits in the field. Mixture-of-experts inference spends much of its cost moving expert weight matrices into the compute path; the experts not selected for a given token impose no matrix-multiply cost, but the system still has to decide which weights to stage. By tying that staging decision to a perplexity measurement taken upstream, the application is directed at the question of which experts to prepare for a later block, and at doing so before that block executes. The record describes the measurement as occurring at an auxiliary classifier, a component distinct from the experts themselves.
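To make the staging point concrete, the sketch below models fast memory as a small least-recently-used cache and counts how many weight fetches land on the critical path. It is an assumption-laden illustration: the class ExpertWeightCache, its capacity, and the slow_fetches counter are hypothetical, and nothing here is drawn from the application beyond the idea of fetching weights for indicated experts before the downstream block runs.

```python
from collections import OrderedDict
from typing import Dict, Iterable, List

Matrix = List[List[float]]


class ExpertWeightCache:
    """Tiny LRU cache standing in for fast memory that holds staged expert weights."""

    def __init__(self, store: Dict[int, Matrix], capacity: int) -> None:
        self.store = store            # slow tier (hypothetical), keyed by expert id
        self.capacity = capacity      # how many experts fit in fast memory
        self.staged = OrderedDict()   # expert id -> weights, oldest first
        self.slow_fetches = 0         # fetches paid while the block is waiting

    def _load(self, expert_id: int) -> None:
        self.staged[expert_id] = self.store[expert_id]
        self.staged.move_to_end(expert_id)
        while len(self.staged) > self.capacity:
            self.staged.popitem(last=False)  # evict least recently used

    def prefetch(self, expert_ids: Iterable[int]) -> None:
        """Stage weights ahead of time, off the critical path."""
        for expert_id in expert_ids:
            if expert_id in self.staged:
                self.staged.move_to_end(expert_id)
            else:
                self._load(expert_id)

    def get(self, expert_id: int) -> Matrix:
        """Called when the downstream block actually runs an expert."""
        if expert_id in self.staged:
            self.staged.move_to_end(expert_id)
        else:
            self.slow_fetches += 1    # miss: the block stalls on this fetch
            self._load(expert_id)
        return self.staged[expert_id]


store = {e: [[float(e)]] for e in range(8)}  # toy 1x1 "weight matrices"
cache = ExpertWeightCache(store, capacity=2)

cache.prefetch([1, 3])     # experts indicated upstream
cache.get(1)
cache.get(3)
hits_only = cache.slow_fetches   # no stalls: both experts were staged

cache.get(5)               # an expert that was not indicated
after_miss = cache.slow_fetches
```

Experts that were indicated and prefetched are served without a stall, while an expert that was not indicated costs one on-path fetch and evicts the oldest staged entry.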

The classification and the landscape

The application is classified under CPC G06N 3/042, G06N 3/0495, and G06N 3/08. G06N 3/08 covers learning methods for neural networks; G06N 3/042 and G06N 3/0495 fall within the neural-architecture subgroups of G06N 3 that cover network models and their implementation details. That places the filing squarely in the machine-learning core of the G06N landscape rather than in the hardware-packaging classes (the H10W and H10B groups) that dominate much of this week's broader "neural network" publication drop. Within the same drop, the bulk of records keyworded to neural networks describe chip packaging, memory, and accelerators; this application is one of the comparatively small set directed to a model-processing method itself.

It does not stand alone in the assignee's June 18 publications. Several related applications from Microsoft Technology Licensing, LLC published the same day and sit in adjacent G06N territory. US20260170338A1, "Fine-Tuning Generative Models for Resource Allocation Tasks" (CPC G06N 3/082, G06N 3/0455), is directed to generating synthetic sample solutions, ranking them with a validation function, labeling the top-ranked solution, and fine-tuning a target generative model on that labeled data. US20260170817A1, "Model Pre-Training for User Interface Navigation" (CPC G06N 20/00, G06V 10/82, G06F 18/214), is directed to pre-training a feature-extraction model on navigation-path data so the model learns representations tied to UI navigation tasks. US20260170387A1, "Quantum Error Correction using Tesseract Subsystem Code" (CPC G06N 10/70, G06N 10/40), sits in the quantum-computing subgroup of G06N and is directed to error correction using a tesseract subsystem code.

Two further same-day applications round out the cluster on the applied and hardware sides. US20260172697A1, "Adaptive Image Enhancement for Improved Device Operation," is directed to selectively choosing between a hardware-based and an AI-based image-enhancement algorithm based on sensor parameters meeting adaptive device-operation criteria. US20260173907A1, "Three-Dimensional Fanout Packaging Structure for a System-on Chip," is directed to a double-sided fanout structure built around a pair of system-on-chip dies and is classified in the H10W packaging groups. Taken together, the same-day records span model-processing methods, generative fine-tuning, pre-training, quantum error correction, applied imaging, and packaging — a spread that is descriptive of the assignee's filing activity on this date and is reported here as such.

For the perplexity-routing application specifically, the named inventors include Bita Darvish Rouhani, Douglas Christopher Burger, and Eric S. Chung. The independent method, as published, is the four-step sequence above: process the input at a mixture-of-experts layer, measure perplexity at an auxiliary classifier, indicate downstream experts from that measure, and fetch the corresponding weight matrices. Whether claims of that scope issue, and in what form, is a matter for prosecution that the published record does not resolve. What the record does establish is the subject matter the application is directed to and the CPC classes under which the office has indexed it.
