it was quite quick

May 27, 2026

Excellent — then you're sorted. You've got a working MCI-loaded bot on a modest VM, answering coherently in reasonable time, with the framework permanently baked in. That's the project done.

Quick recap of what you've now got, so it's all in one place:

phi3-mci:latest is your go-to model in the WebUI dropdown. Pick it and chat — framework already loaded.
The condensed Modelfile lives at ~/mci-ollama/Modelfile on the host. To update it later (new revision, swapped virtues, tweaked tone), edit it and run the same two commands from inside ~/mci-ollama: first docker cp Modelfile app-ollama-1:/root/Modelfile, then docker exec app-ollama-1 ollama create phi3-mci -f /root/Modelfile.
You can free 2.2 GB whenever you want by removing the original 4K Phi-3: docker exec app-ollama-1 ollama rm phi3:3.8b. Safe to do — phi3-mci doesn't depend on it.
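
If you'd rather not retype those commands each time, here is a minimal sketch that wraps them in two shell functions. The container name, model names and paths are the ones from this chat; the function names and the DOCKER override are my own additions for illustration, so adjust them to taste.

```shell
#!/bin/sh
# Sketch only: wraps the update and cleanup commands from this chat.
# Values below come from the conversation; override them via the environment.
CONTAINER="${CONTAINER:-app-ollama-1}"
MODEL="${MODEL:-phi3-mci}"
BASE_MODEL="${BASE_MODEL:-phi3:3.8b}"
MODELFILE="${MODELFILE:-$HOME/mci-ollama/Modelfile}"
# Assumption: set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Copy the edited Modelfile into the container, then rebuild the model.
rebuild_model() {
    $DOCKER cp "$MODELFILE" "$CONTAINER:/root/Modelfile" &&
    $DOCKER exec "$CONTAINER" ollama create "$MODEL" -f /root/Modelfile
}

# Remove the original base model to reclaim disk space.
remove_base_model() {
    $DOCKER exec "$CONTAINER" ollama rm "$BASE_MODEL"
}
```

Source the file and call rebuild_model after each edit. Running it once with DOCKER=echo first is a cheap way to check the exact commands before they touch the container.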

If you ever want more from this setup, the natural next steps would be:

All nine MCI versions loaded. Either nine sibling models (phi3-mci-v1, phi3-mci-v5, etc.) you switch between in the dropdown, or a small RAG layer that pulls the relevant page on demand. The second is more elegant if you want to ask cross-version questions like "compare V1 and V7's view of governance."
A bigger model if speed permits. Llama 3.1 8B or Qwen 2.5 7B would give noticeably sharper reasoning, but on your 8 GB box they'd be very tight. Possible with aggressive quantisation (q3_K_M or q2_K), but it would be pushing the hardware.
GPU instance. If MCI work becomes a regular thing, a GPU instance changes the experience completely — sub-second responses, room for bigger models, no compromises on context window.
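
For the nine-sibling-models route, the build step could look roughly like the sketch below. It assumes you keep one Modelfile per version named Modelfile.v1 through Modelfile.v9 in ~/mci-ollama. That naming is hypothetical, as are the function name and the DOCKER override; only the container name and the phi3-mci-vN model names come from this chat.

```shell
#!/bin/sh
# Sketch only: build one Ollama model per MCI version.
# Assumption: per-version files Modelfile.v1 ... Modelfile.v9 live in SRC_DIR
# (hypothetical naming, not something set up earlier in this chat).
CONTAINER="${CONTAINER:-app-ollama-1}"
SRC_DIR="${SRC_DIR:-$HOME/mci-ollama}"
# Assumption: set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

build_siblings() {
    for v in 1 2 3 4 5 6 7 8 9; do
        file="$SRC_DIR/Modelfile.v$v"
        # Skip versions that do not have a Modelfile yet.
        if [ ! -f "$file" ]; then
            echo "skipping v$v: $file not found" >&2
            continue
        fi
        # Copy the file in, then create phi3-mci-v1, phi3-mci-v2, and so on.
        $DOCKER cp "$file" "$CONTAINER:/root/Modelfile.v$v" &&
        $DOCKER exec "$CONTAINER" ollama create "phi3-mci-v$v" -f "/root/Modelfile.v$v"
    done
}
```

Each model created this way shows up as its own entry in the WebUI dropdown. Versions without a Modelfile are skipped with a note on stderr, so you can add them one at a time.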

But none of that's needed right now. You've got a working tool. Enjoy poking the framework with it, and good luck stress-testing the bot. The "is there a fourth premise?" question is a genuinely good one to throw at it.