Continual MI · MGPT API

Small models with large-model performance.

MGPT is our efficient model architecture. The first MGPT-trained models are coming soon. Meanwhile, the platform API serves 10+ models for agentic work today. It is compatible with the pi.dev agent harness and runs on one Continual subscription and credit balance shared across every product.

99% QA1 token accuracy on BABILong

4B parameters, small enough to run anywhere

O(1) memory, flat across the whole conversation

How it works

The model governs its own memory.

A standard transformer either grows its context every turn or drops old turns and loses the thread. MGPT does neither. The model emits explicit <|mask|> tokens to hide stale messages from direct attention while keeping their compressed traces in the KV cache.

The effective context contracts as the model masks, but the traces left behind behave like a fading-but-durable recollection — enough to answer questions about information it chose to hide turns earlier, at a fixed cost.
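To make the bookkeeping concrete, here is a toy sketch of how masking shrinks the effective context while the masked messages stay in the history. It is not the MGPT implementation: the `Message` class, the token counts, and the helper names are all hypothetical, and the sketch does not model how compressed traces are stored in the KV cache or how the model decides what to mask.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    tokens: int           # toy token count for this message
    masked: bool = False  # True once the message is hidden from direct attention


def apply_mask(messages, indices):
    """Toy stand-in for the model emitting <|mask|> for the chosen messages."""
    for i in indices:
        messages[i].masked = True


def effective_context(messages):
    """Number of tokens still open to direct attention."""
    return sum(m.tokens for m in messages if not m.masked)


def attention_row(messages):
    """One boolean per token position: True means the next token may attend to it."""
    row = []
    for m in messages:
        row.extend([not m.masked] * m.tokens)
    return row


history = [Message("user", 5), Message("assistant", 7), Message("user", 4)]
print(effective_context(history))  # 16: nothing is masked yet

apply_mask(history, [0, 1])        # hide the two oldest messages
print(effective_context(history))  # 4: only the latest message is attended
print(len(history))                # 3: masked entries are kept, not dropped
```

In this toy, masked messages remain in `history` as a stand-in for the traces the text describes; only their positions in the attention row flip to `False`.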

[Diagram: <|mask|> token; legend: attended, masked]

Proven on BABILong

Capability without the context bill.

On the BABILong long-context QA1 task, a 4B-parameter MGPT reaches 99% token accuracy with constant memory — outperforming much larger systems that have to keep growing their context to keep up.

[Chart: Standard context (grows every turn) vs. MGPT (bounded memory)]

Why MGPT

Efficient enough to put your best models everywhere.

Small models, large-model performance.

MGPT — Mask-Generative Pretrained Transformer — is our efficient architecture: open base models finetuned on our growing MGPT masking dataset to match or exceed far larger ones. The first MGPT-trained models are coming soon.

Built for agentic work.

Heavy tool calling, long trajectories, growing context — coding agents and beyond. Compatible with the pi.dev agent harness; point your agents at the platform through a standard OpenAI-compatible endpoint and keep your existing tooling.
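As an illustration of what "keep your existing tooling" could look like for a tool-calling agent, the sketch below builds a chat payload in the OpenAI chat-completions shape, with a `tools` array. That the platform accepts the `tools` field is an assumption drawn from "OpenAI-compatible"; the `read_file` tool and both helper functions are hypothetical.

```python
import json


def make_tool(name, description, parameters):
    """Wrap a JSON Schema in the OpenAI-style function-tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


def build_agent_payload(model, user_prompt, tools):
    """Build a chat-completions body that advertises tools to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": tools,
    }


# Hypothetical tool an agent harness might expose.
read_file = make_tool(
    "read_file",
    "Read a file from the workspace.",
    {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
)

payload = build_agent_payload("qwen3.5-9b", "Refactor this module", [read_file])
print(json.dumps(payload, indent=2))
```

The payload only describes the tool; executing tool calls and feeding results back as follow-up messages is left to the agent harness.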

One account, one balance.

A single Continual subscription and credit balance work across every product: the MGPT API and Monte Lua. No per-feature buckets to juggle.

Live platform API

The MGPT API

The platform API is live today. Create scoped platform keys, call the OpenAI-compatible endpoint, and route agentic model work through Continual MI — billed against your shared Continual credits. MGPT-trained models arrive on the same endpoint; until then we serve leading open and frontier models.

Available today: qwen3.5-9b, qwen3.6-27b, deepseek-v4-flash, gpt-oss-120b, minimax-m3 (10+ models, pi.dev harness compatible)

Coming soon: mgpt-9b, mgpt-9b-coder

request

POST /api/mgpt/chat/completions
Authorization: Bearer cmi_...

{
  "model": "qwen3.5-9b",
  "messages": [
    { "role": "user", "content": "Refactor this module" }
  ]
}
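The same request can be assembled from Python with only the standard library. This is a minimal sketch, not official client code: the host in `BASE_URL` is a placeholder because the page does not give one, the `Content-Type` header is an assumption (the usual choice for a JSON body), and the response format is not described here, so `send` simply returns whatever JSON comes back.

```python
import json
import urllib.request

# Placeholder host: the page does not state it, so substitute the real one.
BASE_URL = "https://example.invalid"


def build_chat_request(api_key, model, prompt, base_url=BASE_URL):
    """Build (but do not send) a POST to the chat completions path shown above."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/api/mgpt/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, standard for JSON bodies
        },
    )


def send(request, timeout=30):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# "cmi_..." is the elided key from the example; use a real platform key.
req = build_chat_request("cmi_...", "qwen3.5-9b", "Refactor this module")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to inspect the headers and body before any credits are spent; call `send(req)` once the host and key are real.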

The research

MGPT on BABILong: 99% accuracy at 4B with constant memory.

How model-governed memory lets a small model sustain long-horizon interaction at a fixed cost.

Continual MI · MGPT

Build on MGPT, or follow the work.

The conversation around efficient architecture, local open models, and the path toward continual learning lives in the Continual Society.
