Putting MCI Inside a Local AI: An Honest Build Log

I wanted to stop pasting a document into a chatbot every time I used it.

That was the whole starting point. I have the MCIv1 framework — Mature Constitutional Intelligence — written up across a series of pages, and I'd been loading the text into a model by hand at the start of every session. Tedious. What I really wanted was a small, self-hosted AI that simply knew the framework, so I could ask it questions, argue with it, and eventually stand it up alongside other bots to see whether a system built around self-limitation actually answers differently from one that isn't.

This is the log of how that went. It worked — but the interesting part is where it worked and where the seams showed. I'm writing it up honestly rather than as a victory lap, partly because the limits turned out to be the most informative bit, and partly because "self-limitation" is one of the five virtues and it would be a poor advertisement for the framework to pretend the constraints weren't real.

The setup

The hardware is modest on purpose: an Elestio VM running Ollama in Docker, with the Open WebUI front end. Four CPU cores, about 8 GB of RAM, no GPU. The model is Microsoft's Phi-3 Mini — 3.8 billion parameters — which is small enough to run on a box like this without melting it.

The plan was simple. Rather than feed the framework into the model at the start of each chat, bake it in permanently as a system prompt using an Ollama "Modelfile". A Modelfile is just a short config that says "start from this base model, set these parameters, and always begin with this system text." Build it once, and you get a new named model — I called mine phi3-mci — that carries the framework in every conversation from the first message.

What went smoothly

The core idea worked on the first real attempt. Write the framework into the SYSTEM block of a Modelfile, run ollama create, and you have a model that recites the founding principle and reasons from the five virtues without being prompted to. No pasting. It shows up in the WebUI's model dropdown like any other model. For the basic goal — a bot that knows the framework — this was genuinely easy.
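
For anyone who wants to see the shape of it, here is a minimal sketch of generating that Modelfile from a script. It is not my exact file: the base tag phi3:mini, the two parameter values and the helper name render_modelfile are all assumptions for illustration. FROM, PARAMETER and SYSTEM are the Modelfile instructions it relies on.

```python
def render_modelfile(base_model: str, system_text: str,
                     num_ctx: int, num_predict: int) -> str:
    """Return Modelfile text: a base model, two parameters, a SYSTEM block."""
    # The SYSTEM block is delimited by triple double-quotes, so the
    # framework text itself must not contain that sequence.
    if '"""' in system_text:
        raise ValueError("system text must not contain a triple double-quote")
    return (
        f"FROM {base_model}\n"
        f"PARAMETER num_ctx {num_ctx}\n"          # context window, in tokens
        f"PARAMETER num_predict {num_predict}\n"  # cap on reply length, in tokens
        f'SYSTEM """\n{system_text.strip()}\n"""\n'
    )


if __name__ == "__main__":
    # Hypothetical values; the real framework text would be read from a file.
    print(render_modelfile("phi3:mini", "You reason from the MCIv1 framework.",
                           num_ctx=8192, num_predict=300))
```

Saved as a file called Modelfile, that text is what ollama create phi3-mci -f Modelfile builds from.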

A couple of things I learned that weren't obvious going in:

The base Phi-3 Mini ships with a small context window, about 4,000 tokens, and the full framework document is larger than you'd think once written out. It didn't fit comfortably. The fix was to pull the longer-context build of the same model (a 128k-token variant) so there was room for the document plus an actual conversation on top of it. It's the same model family, just a build that supports a much longer context.
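
A rough way to check the fit before building anything is to estimate the token count from the character count. The sketch below assumes about four characters per token, which is only a rule of thumb for English text and not what the Phi-3 tokenizer would actually report; the function names and the reserve figure are mine.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the ratio is an assumed rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def fits(system_text: str, context_window: int, reserve_for_chat: int) -> bool:
    """True if the prompt plus a reserve for the conversation fits the window."""
    return estimate_tokens(system_text) + reserve_for_chat <= context_window


if __name__ == "__main__":
    framework = "x" * 16_000  # stand-in for a framework text of 16,000 characters
    print(estimate_tokens(framework))                   # estimated prompt tokens
    print(fits(framework, 4096, reserve_for_chat=1000))
    print(fits(framework, 8192, reserve_for_chat=1000))
```

If the estimate is anywhere near the window size, treat it as not fitting; the real tokenizer count can differ noticeably from the estimate.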

Ollama runs inside a Docker container on this kind of host, not directly on the machine. That trips people up: you type ollama at the command line and get "command not found," because the tool lives inside the container. Everything has to be done by passing commands into the container. Once you know that, it's routine, but it's a five-minute moment of confusion if you don't.
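
In script form, "passing commands into the container" looks something like the sketch below. The container name ollama is an assumption (docker ps shows the real one on your host), and the helper names are hypothetical.

```python
import subprocess


def in_container(container: str, *ollama_args: str) -> list[str]:
    """Build the argv that runs an ollama subcommand inside a container."""
    return ["docker", "exec", "-i", container, "ollama", *ollama_args]


def run_in_container(container: str, *ollama_args: str) -> str:
    """Run the command and return its standard output; raises on failure."""
    result = subprocess.run(in_container(container, *ollama_args),
                            check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the commands; nothing is executed here.
    print(" ".join(in_container("ollama", "list")))
    print(" ".join(in_container("ollama", "create", "phi3-mci",
                                "-f", "/tmp/Modelfile")))
```

One consequence worth knowing: the Modelfile has to exist inside the container too, so it needs copying in first (docker cp does that) before ollama create can read it.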

Where the seams showed

The first honest limit is memory. My initial build asked for a generous context window, and the model refused to load — it wanted more RAM than the box had free. On an 8 GB machine with no GPU, there's a real ceiling, and you hit it quickly when a model and a long prompt are both resident at once. The answer was to shrink the context window until it fit. That works, but it's a compromise: less room for the document, less room for the conversation.
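
The arithmetic behind that ceiling can be sketched. Alongside the model weights, the runtime keeps a key/value cache that grows linearly with the context window. The figures below are assumptions, not measurements from my box: the layer and head counts are the Phi-3 Mini shape as I understand it from the published model config, and the cache is assumed to be stored at 16 bits per value.

```python
def kv_cache_bytes(num_ctx: int, layers: int = 32, kv_heads: int = 32,
                   head_dim: int = 96, bytes_per_value: int = 2) -> int:
    """Approximate key/value cache size for a full context window.

    Defaults are the assumed Phi-3 Mini shape with a 16-bit cache.
    The leading 2 accounts for storing both keys and values.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * num_ctx


def gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    for ctx in (4096, 8192, 32768, 131072):
        print(f"num_ctx={ctx:>6}: about {gib(kv_cache_bytes(ctx)):.1f} GiB of cache")
```

Under those assumptions the cache alone costs about 1.5 GiB at a 4,096-token window and grows in proportion from there, which is why a generous window on an 8 GB machine fails before the first question is asked.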

The second honest limit is speed, and this is the big one. With no GPU, every word the model produces is computed on the CPU. A short, on-topic answer might come back in twenty to forty seconds. A long one can take minutes. And there's a compounding effect that surprised me: as a single conversation grows, the model has to re-read the entire history before each new reply, so responses get slower the longer you talk. In one session I watched answers drift from around twenty seconds early on to several minutes near the end — and once, to eight and a half minutes. Nothing had crashed. The model was simply grinding through a full context on four CPU cores. The web interface had quietly given up waiting long before the answer actually arrived, which made it look like a crash when it wasn't.
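
A toy model makes the compounding visible. It treats each reply as two costs: reading the whole history, then writing the answer. The token rates below are invented purely for illustration and are not measurements from my VM; the model also assumes the full history is re-processed every turn, which is the simple case described above.

```python
def reply_seconds(history_tokens: int, reply_tokens: int,
                  prefill_tps: float, decode_tps: float) -> float:
    """Estimated wall-clock time for one reply.

    prefill_tps: tokens per second when reading the prompt and history.
    decode_tps:  tokens per second when generating the answer.
    """
    return history_tokens / prefill_tps + reply_tokens / decode_tps


if __name__ == "__main__":
    # Hypothetical CPU-only rates, chosen only to show the trend.
    PREFILL_TPS, DECODE_TPS = 20.0, 4.0
    for history in (400, 1500, 3000, 6000):
        secs = reply_seconds(history, 100, PREFILL_TPS, DECODE_TPS)
        print(f"history={history:>5} tokens -> about {secs:.0f} s per reply")
```

The answer length stays fixed in that loop, so all of the growth comes from the history term. That is the part a fresh conversation resets.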

The practical fixes were unglamorous and effective: cap how long each answer can be, so the model stops writing essays in response to simple questions; trim the context window so there's less to re-read each turn; keep the model loaded in memory between questions so it doesn't have to reload from disk after a pause; and — the single most useful habit — start a fresh conversation often, which resets the slowdown.
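
If you talk to Ollama over its HTTP API rather than through the WebUI, the same four fixes map onto a chat request roughly as sketched below. The specific values, the localhost URL on the default port and the helper names are assumptions. Trimming to the last few messages is a stand-in for "start a fresh conversation often", not the same thing.

```python
import json
import urllib.request


def build_chat_request(model: str, messages: list[dict], num_predict: int = 300,
                       num_ctx: int = 4096, keep_alive: str = "30m",
                       max_history: int = 6) -> dict:
    """Build an /api/chat payload that applies the four fixes."""
    if max_history < 1:
        raise ValueError("max_history must be at least 1")
    return {
        "model": model,
        "messages": messages[-max_history:],  # keep only the recent turns
        "stream": False,
        "keep_alive": keep_alive,             # keep the model loaded between questions
        "options": {
            "num_predict": num_predict,       # cap the answer length
            "num_ctx": num_ctx,               # trimmed context window
        },
    }


def send(payload: dict, url: str = "http://localhost:11434/api/chat") -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    # Generous timeout: CPU-only replies can take minutes.
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())["message"]["content"]
```

The long timeout in send matters as much as the options: a client that gives up early makes a slow answer look like a crash.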

A subtler problem: the bot talking about itself wrongly

Two oddities showed up that had nothing to do with hardware.

First, the model kept claiming the framework was "developed by Microsoft." It isn't — Microsoft made the underlying Phi-3 model, but the framework loaded into it is separate work. The model was conflating the two. It also invented version numbers that don't exist and talked about its own "earlier iterations" as if the framework's stages were upgrades to the model itself. All of this was fixable by stating the facts plainly in the system prompt: here is who made the framework, here is what the base model is, do not confuse them.
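
As an illustration of what "stating the facts plainly" can look like, here is a sketch that builds such a block to sit at the top of the system prompt. The wording is illustrative, not my actual prompt, and the function and parameter names are hypothetical.

```python
def identity_preamble(framework_name: str, framework_author: str,
                      base_model: str, base_model_vendor: str) -> str:
    """Return plain factual statements separating framework from base model."""
    lines = [
        "Facts about yourself. Do not confuse them:",
        f"- The {framework_name} framework was written by {framework_author}.",
        f"- Your underlying language model is {base_model}, made by {base_model_vendor}.",
        f"- {base_model_vendor} did not write {framework_name}.",
        f"- Do not invent version numbers for {framework_name}.",
        "- The framework's stages are not upgrades or earlier versions of you.",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The author string is a placeholder to be filled in with the real name.
    print(identity_preamble("MCIv1", "<framework author>", "Phi-3 Mini", "Microsoft"))
```

Short declarative lines like these seemed to suit a small model better than a paragraph of explanation would.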

Second, and this is the more interesting one, the bot was too eager to announce the framework. Ask it anything, even "is the UK well governed?", and it would dutifully march through all five virtues by name, mapping the answer onto each one like a checklist. That's not what a framework built on these ideas should look like in practice. The point isn't to recite self-limitation and diversity-preservation; it's to behave that way: to acknowledge uncertainty, hold more than one defensible view, resist posing as the final word. What I was getting was telling, not showing.

So I rewrote the prompt to treat the framework as the grammar of the bot's thinking rather than its vocabulary: reason this way, but don't name it unless someone actually asks about the framework. That change mostly landed. On a direct question — "what are the five virtues?" — it explains them properly. On an everyday question it's noticeably more balanced and less checklist-y.
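
"Less checklist-y" can be checked rather than eyeballed. The sketch below counts which virtue names a reply mentions, so an everyday question that comes back naming several of them is easy to flag. The helper is hypothetical, and the example list holds only the two virtue names that appear in this post; the full five would go in their place.

```python
import re


def named_virtues(reply: str, virtue_names: list[str]) -> list[str]:
    """Return the virtue names that appear in a reply.

    Matching ignores case and treats hyphens and spaces as interchangeable,
    so "self limitation" counts as "self-limitation".
    """
    found = []
    for name in virtue_names:
        parts = re.split(r"[-\s]+", name)
        pattern = r"[-\s]+".join(re.escape(part) for part in parts)
        if re.search(pattern, reply, flags=re.IGNORECASE):
            found.append(name)
    return found


if __name__ == "__main__":
    virtues = ["self-limitation", "diversity-preservation"]  # partial list
    reply = "Applying Self Limitation first, then diversity-preservation..."
    print(named_virtues(reply, virtues))
```

Run over a batch of everyday questions, a count like this gives a before-and-after number for each prompt rewrite instead of an impression.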

"Mostly" is doing real work in that sentence, though, and here's the honest bit: a 3.8-billion-parameter model can understand an instruction like "don't list these unless asked" and still not fully obey it. More than once it acknowledged the instruction in its answer and then listed the virtues anyway. Small models hold positive instructions ("do this") far better than inhibitions ("don't do that"). You can improve it with worked examples, but you probably can't make it perfect at this size. A larger model would follow the nuance more reliably — which is, once again, the hardware conversation in a different outfit.

Where it stands

What I have now is a small, self-hosted bot that genuinely carries the framework, answers honestly about what it is, responds reasonably quickly on focused questions, and reasons in the spirit of the five virtues without constantly naming them. For the price of a modest VM, that's a real thing to have — a framework-loaded interlocutor I can poke at any time, at no per-question cost.

What it is not is fast, polished, or robust enough to drop in front of a crowd expecting commercial-chatbot responsiveness. The deliberate pace can almost be spun as on-brand — a system that doesn't rush to a confident answer — but let's be honest, a lot of it is just four CPU cores doing their best.

The honest summary: the concept works cheaply and the framework genuinely shapes the bot's behaviour. The constraints are real and they're all downstream of one fact: the hardware is small, with no GPU and modest RAM. Everything I'd do next to make it sharper (a larger model, longer context, snappier replies) runs into that same wall. Whether to spend money knocking the wall down depends on what the bot is for, and that's the next thing to figure out.

For now, it sits there knowing the framework, answering when asked, and not pretending to be more than it is. Which, given the subject matter, feels about right.

AI insights of ultraRealist