AI Architecture — Published: 24 March 2026 · 7 min read
Beyond Trivial RAG: MCP, Guarded Tools, and Fire Safety Review
Some details of this engagement remain confidential. What we can share is the technical pattern: how an OntosLab Frontline Deployed AI Engineer moved beyond simple retrieval to build a multi-step task assistant for fire safety review inside a London council, using MCP, guarded backend tools, and grounded reasoning across plans, inspections, records, and regulations.
Most enterprise AI systems still stop at what we would call trivial RAG. A user asks a question. The system embeds the prompt. A vector database returns nearby chunks. Those chunks are passed to a language model. An answer comes back. For internal knowledge search, that can be useful. For real operational work, it quickly becomes insufficient.
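That trivial pipeline can be sketched in a few lines. This is an illustrative toy, assuming a bag-of-words similarity in place of real dense embeddings; every name and chunk here is hypothetical:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use dense vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def trivial_rag(query: str, chunks: list[str], k: int = 2) -> str:
    # Embed, rank by similarity, stuff the top chunks into a prompt.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    context = "\n".join(ranked[:k])
    # No permissions check, no revision authority, no audit trail.
    return f"Context:\n{context}\n\nQuestion: {query}"

chunks = [
    "Fire door inspection report for Block A, March 2024.",
    "Original drawing pack for Block A, revision B.",
    "Canteen menu for the civic centre.",
]
prompt = trivial_rag("Are the fire door records for Block A current?", chunks)
```

Run it and the off-topic canteen chunk outranks the drawing pack purely on token overlap: similarity knows nothing about relevance, let alone authority.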
The problem is not retrieval. Retrieval is necessary. The problem is assuming retrieval is the whole architecture. In a real fire safety review, the task is rarely “find me similar text”. The task is more like this: Are the fire doors in this building recorded against the current plan set, do the latest inspection notes align with the door schedule, and is there a regulatory issue that needs escalation? That is not one search. It is a sequence of checks across multiple systems, each of which may have different permissions, different authority, and different operational meaning.
This is where MCP becomes useful. Not as a replacement for RAG, but as the layer that lets the model move from passive document retrieval to active, controlled tool use. The model no longer has to pretend that everything is one big document pile. Instead, it can reason over a set of capabilities exposed to it in a governed way.
“Trivial RAG retrieves passages. Production AI makes controlled decisions across systems.”
The limit of simple retrieval
A vector search can find related text. It cannot guarantee operational truth.
Take a simple example. A housing officer asks: Are the plans for this block current to the original architecture diagrams, and do the fire safety door records still match? A conventional RAG system might retrieve the original drawing pack, a later refurbishment note, a fire door inspection report, and a policy document that happens to contain the word “door”. That sounds promising until you realise what has not happened. Nobody has checked which plan revision is authoritative. Nobody has resolved whether the inspection refers to the same door identifiers. Nobody has checked whether the user is even allowed to view all of those records together. Nobody has created an audit trail explaining how the conclusion was reached.
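The first two missing checks can be made concrete. A hedged sketch, with a hypothetical data model: resolving which plan revision is actually authoritative, and whether the inspected door identifiers cover the approved schedule:

```python
from dataclasses import dataclass

@dataclass
class PlanRevision:
    revision: str
    status: str  # e.g. "superseded" or "approved" — statuses are illustrative

def current_revision(revisions: list[PlanRevision]) -> PlanRevision:
    # The authoritative revision is a property of the record, not of
    # whichever PDF happens to be most similar to the query.
    approved = [r for r in revisions if r.status == "approved"]
    if not approved:
        raise LookupError("no approved revision on record")
    return approved[-1]  # assumes the list is in issue order

def doors_missed(schedule_ids: set[str], inspected_ids: set[str]) -> set[str]:
    # Doors on the approved schedule that the latest inspection did not cover.
    return schedule_ids - inspected_ids

revs = [PlanRevision("A", "superseded"), PlanRevision("B", "approved")]
missing = doors_missed({"FD-01", "FD-02", "FD-03"}, {"FD-01", "FD-03"})
```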
This is exactly the kind of gap that matters in regulated environments. A plausible answer is not enough. You need a system that can perform the next step deliberately, and then the step after that, with the right controls around each one.
The architecture
From one-shot RAG to multi-step MCP reasoning.
The pattern we designed is straightforward to describe, even if it is much more powerful in practice:
User → LLM reasoning → MCP tool call → guarded backend execution → result → LLM reasoning → final answer
That flow matters because it separates reasoning from execution. The language model does not directly reach into every system. It decides what it needs next. The MCP host or client passes that request on. The MCP server and backend enforce the rules, execute the task, and return only the result that should come back into context.
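That separation can be shown in miniature. In this sketch a dispatcher stands in for the MCP host/server boundary; the tool name `get_current_plan_revision` follows the article, but the registry, request shape, and returned data are illustrative, not the MCP wire format:

```python
def get_current_plan_revision(block_id: str) -> dict:
    # Stand-in for a real query against the plan repository.
    return {"block": block_id, "revision": "B", "status": "approved"}

TOOLS = {"get_current_plan_revision": get_current_plan_revision}

def dispatch(request: dict) -> dict:
    # The model emits a structured *request*; execution happens here,
    # behind the boundary — never inside the model.
    name = request["tool"]
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return {"result": TOOLS[name](**request["args"])}

# The model's side of the exchange is just structured data:
reply = dispatch({"tool": "get_current_plan_revision",
                  "args": {"block_id": "BLK-7"}})
```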
In practical terms, the flow looks like this:
The LLM decides what tool it wants. It may determine that the first step is not to answer, but to call a tool such as get_current_plan_revision for a specific block.
The MCP host or client passes that request. The tool call is formalised rather than improvised, with known parameters and known intent.
The MCP server and backend execute the task. That execution may involve checking a document repository, querying a housing asset register, resolving plan metadata, or applying a permissions filter before any data is returned.
The result comes back to the LLM. The model now has a grounded output: the latest approved plan revision, not just a similar PDF.
The LLM decides whether another step is needed. It may now call compare_fire_door_schedule, then get_latest_inspection_notes, then check_relevant_regulations, assembling a chain of evidence rather than a bag of text fragments.
Only after those steps does the model produce the final answer. At that point the output is no longer just retrieved prose. It is the result of a controlled, multi-stage review.
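The steps above can be sketched as a single review loop. Here a scripted planner stands in for the model's reasoning, and every tool is a hypothetical stub returning canned data; in production each call would cross the guarded MCP boundary:

```python
def get_current_plan_revision(block: str) -> dict:
    return {"revision": "B"}

def compare_fire_door_schedule(block: str, revision: str) -> dict:
    return {"mismatches": ["FD-02"]}

def get_latest_inspection_notes(block: str) -> dict:
    return {"notes": "FD-02 closer replaced, re-inspection due"}

def check_relevant_regulations(mismatches: list) -> dict:
    return {"escalate": bool(mismatches)}

def review(block: str) -> dict:
    # Each step is chosen in light of the previous result, assembling
    # a chain of evidence rather than a bag of text fragments.
    evidence = {"plan": get_current_plan_revision(block)}
    evidence["doors"] = compare_fire_door_schedule(
        block, evidence["plan"]["revision"])
    if evidence["doors"]["mismatches"]:
        evidence["inspection"] = get_latest_inspection_notes(block)
    evidence["regs"] = check_relevant_regulations(
        evidence["doors"]["mismatches"])
    # The final answer is grounded in the whole chain, not one retrieval.
    return evidence

chain = review("BLK-7")
```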
Permissions: access is enforced in the backend. The model may request a tool, but the backend decides what the user is authorised to see. Security is not left to prompt wording.

Auditability: every step can be logged. Which tool was called, what it returned, and how the answer was formed can all be recorded, creating a defensible operational trail.

Multiple data sources: plans, records, inspections, regulations. The model can reason across several authoritative systems instead of forcing everything into one vector index and hoping similarity search will sort it out.

Multi-step decisions: the model can ask what comes next. It can retrieve, compare, check, and only then answer, which is what real review work usually requires.
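The first two of those properties meet in the backend guard. A minimal sketch, assuming hypothetical roles, tools, and records: the guard checks permission before executing, and appends every call to an audit log, allowed or not:

```python
AUDIT_LOG: list[dict] = []

PERMITTED = {
    "housing_officer": {"get_current_plan_revision"},
    "fire_safety_lead": {"get_current_plan_revision",
                         "get_latest_inspection_notes"},
}

def get_current_plan_revision(block: str) -> dict:
    return {"block": block, "revision": "B"}

def get_latest_inspection_notes(block: str) -> dict:
    return {"block": block, "notes": "all doors inspected"}

TOOLS = {f.__name__: f for f in (get_current_plan_revision,
                                 get_latest_inspection_notes)}

def guarded_call(role: str, tool: str, **args) -> dict:
    # Authorisation lives here, not in the prompt.
    allowed = tool in PERMITTED.get(role, set())
    result = TOOLS[tool](**args) if allowed else {"error": "forbidden"}
    # Every attempt is recorded, including denials.
    AUDIT_LOG.append({"role": role, "tool": tool,
                      "args": args, "allowed": allowed})
    return result

ok = guarded_call("housing_officer", "get_current_plan_revision",
                  block="BLK-7")
denied = guarded_call("housing_officer", "get_latest_inspection_notes",
                      block="BLK-7")
```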
Why MCP matters
Because real enterprise AI needs more than a retriever.
MCP becomes valuable once you care about permissions, auditability, multiple data sources, multi-step decisions, and portability across model providers. Those are not side concerns. They are the difference between a demo and a production architecture.
Permissions matter because a fire safety assistant may have access to building plans, operational notes, inspection records, and regulatory references, but not every user should be able to access every combination of those materials. Auditability matters because in regulated settings, the path to the answer is often as important as the answer itself. Multiple data sources matter because operational truth rarely lives in one index. Multi-step decisions matter because many real questions cannot be answered in one retrieval pass. Portability matters because organisations should not have to rebuild the entire orchestration layer every time they change model provider.
That last point is easy to underestimate. Once your system exposes stable tools such as get_current_plan_revision, compare_fire_door_schedule, and check_relevant_regulations, the orchestration pattern becomes less dependent on any one model vendor. The model reasons. The tools remain yours.
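One way to picture that portability: declare the tool contract once, in your own terms, and render it into whatever envelope a given model provider expects. Both provider formats below are simplified illustrations, not real vendor schemas:

```python
TOOL_SPECS = [
    {"name": "get_current_plan_revision",
     "description": "Latest approved plan revision for a block",
     "params": {"block_id": "string"}},
    {"name": "compare_fire_door_schedule",
     "description": "Diff the door schedule against a plan revision",
     "params": {"block_id": "string", "revision": "string"}},
]

def to_provider_format(spec: dict, provider: str) -> dict:
    # Each provider wants a slightly different envelope; the underlying
    # capability, and the backend behind it, never changes.
    if provider == "provider_a":
        return {"type": "function", "function": {
            "name": spec["name"],
            "description": spec["description"],
            "parameters": spec["params"]}}
    if provider == "provider_b":
        return {"tool_name": spec["name"],
                "input_schema": spec["params"]}
    raise ValueError(f"unknown provider: {provider}")

rendered = [to_provider_format(s, "provider_a") for s in TOOL_SPECS]
```

Switching provider means re-rendering the specs, not rebuilding the orchestration layer.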
“The model should choose the next capability. The backend should decide what is allowed. The answer should emerge from that chain.”
What this changes
The system stops behaving like document chat and starts behaving like a task assistant.
That is the real shift. We are not interested in making a pile of documents conversational for its own sake. We are interested in building systems that help professionals complete serious work under real constraints. In this case, that means helping officers review whether building plans are current, whether fire safety door records align, whether the relevant regulations have been checked, and whether more action is needed.
RAG still has an important role to play inside that architecture. Retrieval is one of the tools. It is just no longer the whole system. Once you allow the model to use guarded tools through MCP, and once those tools are backed by permissions, logging, and authoritative data sources, the architecture becomes much more interesting — and much more useful.
That is where we see enterprise AI moving. Not away from retrieval, but beyond trivial retrieval. Toward systems in which the model can reason, select the next operation, and work through a governed chain of evidence before it answers.
Work with us
Got a similar problem in your organisation?
If your team is working on AI in compliance, housing, infrastructure, public sector operations, or any environment where retrieval alone is not enough, we can help design a grounded, tool-using architecture that works in production.