Continual MI · MGPT

Efficient local models need a better attention architecture.

MGPT — Mask-Generative Pretrained Transformer — is our main research surface. It uses masking to fix the attention deficit of smaller models, so an open base model can get strong locally before it's ever asked to learn continually.

Architecture thesis

Smaller models underperform partly because their attention is too weak.

Continual learning can't happen on a model you can't run. If the base model is too large for local hardware, too costly to update, or locked behind someone else's API, its weights will never adapt to your everyday use.

MGPT works on that foundation. The hypothesis: small models have an attention deficit — they can't reliably route the information that matters through limited context and compute. Masking gives the model a deliberate way to choose what to keep and act on, adding capability without just adding parameters.

The aim isn't a longer context window. It's a stronger local base model — small, efficient, and open — capable enough to carry continual learning later.

Live platform API

The MGPT API

The hosted API is the product-facing route for building on MGPT today, while the architecture work pushes toward more capable local base models. Create scoped platform keys, call the endpoint, and route model work through Continual MI.

POST https://platform.continualmi.com/api/mgpt/chat/completions
Authorization: Bearer cmi_...
{ model: "mdl-1-lite", messages: [{ role: "user", content: "Hello MGPT" }] }
MGPT · Demo

BABILong QA1

MGPT solving a QA1 problem on the BABILong benchmark using model-managed masking. The demo is an early proof that masking can help route attention through constrained memory instead of relying only on bigger models.

MGPT · Paper

Work-in-progress draft

Continual MI · MGPT
Follow and discuss the work.

MGPT updates and discussion also live in the Continual Society, especially around efficient architecture, local open models, and the path toward continual learning.