Sovereign AI: enterprise generative AI that runs on your servers, not somebody else's.
The architecture regulated enterprises use to deploy production generative AI without sending a single byte of customer data outside their network perimeter. Definition, reference stack, deployment timeline, compliance posture, and how MindMap Digital ships it in 6–9 weeks.
Sovereign AI, defined.
Sovereign AI is an architecture pattern with four non-negotiable properties. First, customer data never leaves the network perimeter. Second, model weights run on hardware the customer controls. Third, inference logs stay in the customer's own SIEM with a complete provenance trail. Fourth, the entire stack can operate air-gapped with zero outbound calls to any external LLM provider, telemetry endpoint or model registry.
It is not a sovereign cloud region. It is not a Bring Your Own Key arrangement on top of a hyperscaler API. It is not a Microsoft Azure deployment in your country's borders. It is the deployment model the Saudi Central Bank's SAMA Cyber Resilience Framework, the Reserve Bank of India's Master Direction on IT Governance and the UK Information Commissioner's Office have signalled as the only acceptable posture for regulated AI workloads.
What "sovereign" actually means in production
Data never leaves the perimeter
Inference, embeddings, RAG, fine-tuning — all on your hardware. No outbound calls to OpenAI, Anthropic, Gemini, Bedrock or Vertex. Network egress is explicitly blocked at the cluster namespace level so no component can phone home even if it tried to.
Model weights you own
Open-weights LLMs (Llama 3.3, Mistral, Qwen 2.5, DeepSeek V3) stored on disks you control. No vendor can revoke them, deprecate them on 60-day notice, or change pricing mid-engagement. The model serving your customers on Monday is the same model that served them the previous Friday — auditable, frozen, and yours.
Audit trails in your SIEM
Every prompt, every retrieval, every model call streams into the SIEM your security team already operates. Provenance is complete: a regulator can trace any answer back to the document set it was grounded on and the model version that produced it.
Compliance pre-mapped
GDPR, UK DPA, India DPDP Act, HIPAA, SAMA, RBI Master Directions, the EU AI Act high-risk system provisions, SOC 2 and ISO 27001 — controls are built into the accelerator architecture rather than bolted on at the end.
The five-layer sovereign stack
Containerised, Kubernetes-native, runs on bare metal or any CNCF-conformant cluster. We deploy this stack inside your perimeter, integrated with your identity provider, your monitoring estate and your SIEM.
The regulators driving sovereign AI
The pressure is no longer abstract. Across the geographies our customers operate in, the explicit guidance is the same: model weights and inference under the regulated entity's exclusive control.
SAMA
Saudi Central Bank — Cyber Resilience Framework requires localisation + explicit guidance against cross-border AI inference.
RBI
Reserve Bank of India — Master Direction on IT Governance specifies AI/ML model lifecycle artefacts on infrastructure under the regulated entity's exclusive control.
ICO (UK)
UK Information Commissioner's Office has signalled that LLM prompts containing PHI constitute a cross-border transfer subject to UK GDPR Article 44.
EU AI Act
Article 14 + Annex III high-risk system requirements push BFSI, healthcare and HR workloads toward auditable on-prem deployments.
DPDP Act
India's Digital Personal Data Protection Act treats health and financial data as a special category with explicit localisation expectations.
HIPAA + BAA
US covered entities increasingly find on-prem deployment is faster than completing the BAA gymnastics required for cloud LLM use at scale.
From clean cluster to first production prompt in 11 days
This is the timeline observed across the last fourteen sovereign deployments at MindMap Digital. The full first-pilot engagement, including use case and adoption work, completes in 6–9 weeks.
Use-case fit
Pick a workflow with regulated data that today can't use cloud LLMs. Common starting points: internal compliance Q&A, clinical-coding assist, branch-staff knowledge assistant, regulatory document extraction.
Cluster provisioning
Bare metal, VMware, OpenShift or any CNCF-conformant Kubernetes. Two A100 or H100 GPUs for inference, a 3-node CPU pool for vector DB and orchestration, MinIO for object storage, Keycloak for identity. Three days.
Stack deployment
We containerise vLLM, the vector DB, the embedding worker, the retrieval API and a LangGraph orchestration layer. Network egress blocked at namespace level. Four days.
Customer corpus + evals
Load the first document corpus, run our standard eval harness plus the customer's SME-built evaluation set. Four days.
Hypercare + rollout
Phased rollout — typically 5% of users, 20%, then full — with our delivery team embedded for a 45-day hypercare period. Production-stable within 90 days of go-live.
On-prem is now cheaper than the cloud API, at any meaningful enterprise volume.
A single A100 80GB GPU costs roughly $15,000 amortised over three years, runs Llama 3.3 8B at 60+ tokens per second per request, and can sustain 30–50 concurrent enterprise users with sub-second time-to-first-token. The fully-loaded cost — power, cooling, rack space and a fraction of an SRE FTE — works out to roughly $0.10 per million tokens served. That's an order of magnitude below frontier cloud API pricing for the same quality tier, and the gap widens as you move to 70B-class hardware because per-token cost scales sublinearly with throughput.
For any enterprise running more than 200M tokens per month — and most regulated-industry customers we work with run 5–50B — on-prem is now the cheaper option, not just the more compliant one. The 2023 argument that "we'll figure out compliance later, the cloud is so much cheaper" has inverted to "the cloud is more expensive at our volume, and we still have the compliance problem."
Sovereign AI across the portfolio
Generative AI service →
End-to-end sovereign LLM deployment — model serving, RAG, fine-tuning and evaluation, on your hardware.
Redacto →
PII redaction that runs inside the perimeter so document data is masked before it ever crosses a boundary.
Google Cloud sovereign tier →
Where regulated workloads can run on GCP with Assured Workloads, VPC Service Controls and customer-managed encryption keys.
117 accelerator library →
Every accelerator is air-gap capable by default. Browse the production-tested catalogue.
West African bank — sovereign LLM platform →
A reference deployment: tier-1 bank running an open-weights LLM platform inside its own data centre.
Pan-African bank — sovereign WhatsApp banking →
Six million customers served through a WhatsApp channel running entirely inside the bank's perimeter.
Sovereign AI — the questions buyers ask
What is sovereign AI?
Sovereign AI is an architecture pattern with four non-negotiable properties: (1) data never leaves the customer's network perimeter; (2) model weights run on hardware the customer controls; (3) inference logs stay in the customer's own SIEM; and (4) the entire stack can operate air-gapped with zero outbound calls to any external provider. It is the deployment model regulated enterprises in banking, healthcare, defence and government require.
Why is sovereign AI suddenly important in 2026?
Three forces converged. Regulators (SAMA in Saudi Arabia, RBI in India, the UK's ICO, the EU AI Act, India's DPDP Act) explicitly require model lifecycle artefacts and customer data to remain under the regulated entity's exclusive control. The open-weights model gap closed — Llama 3.3 70B, Qwen 2.5 72B and DeepSeek V3 are now within single-digit percentage points of GPT-4 class evals on enterprise workloads. And the unit economics inverted: at any meaningful enterprise volume, on-prem amortised cost is now lower than cloud API per-token cost.
Can a sovereign AI stack match the quality of GPT-4 or Claude?
For the workloads enterprises actually deploy — document Q&A, structured extraction, classification, summarisation, agentic workflow orchestration — open-weights models served on customer infrastructure now match the frontier closed-source models on standard evals. For multilingual and low-resource-language workloads, the open models are demonstrably ahead. The capability gap that justified cloud dependency three years ago has largely closed.
What is the reference architecture for sovereign AI?
A Kubernetes-native deployment on bare metal, VMware, OpenShift or any CNCF-conformant cluster, with: an inference server (vLLM or TGI) running an open-weights LLM (Llama, Mistral, Qwen, DeepSeek) on 1–2 A100 or H100 GPUs; a vector database (pgvector for under 10M chunks, Qdrant for 10–100M, Milvus beyond); a local embedding model (nomic-embed-text or BGE-M3); a retrieval API with hybrid dense plus BM25 search and re-ranking; SSO integration with the customer's identity provider; and explicit network egress blocking at the namespace level so no component can call out.
How long does sovereign AI take to deploy?
MindMap Digital's average sovereign deployment runs eleven days from clean cluster to first production prompt across the last fourteen implementations: three days infrastructure provisioning, four days integration with the customer's identity, network and monitoring estate, and four days loading the first document corpus, running the eval suite and signing off user-acceptance criteria. The full first-pilot engagement, including the use case and adoption work, completes in 6–9 weeks.
Is sovereign AI compliant with GDPR, DPDP, HIPAA and SAMA?
Yes. Because no data leaves the customer's network perimeter and no model weights are held by a third party, sovereign AI by design satisfies the data-residency and processing-control requirements of GDPR, the UK Data Protection Act, India's DPDP Act, HIPAA, the Saudi SAMA Cyber Resilience Framework, RBI Master Direction on IT Governance and the EU AI Act's high-risk system provisions. The audit posture is also stronger: every inference log stays in the customer's own SIEM with a complete provenance trail.
What does sovereign AI cost compared to cloud LLM APIs?
For any enterprise running more than 200M tokens per month — and most regulated-industry customers run 5–50B — on-prem is now cheaper than frontier cloud APIs at the same quality tier. A single A100 80GB amortised over three years, plus power, cooling, rack space and a fraction of an SRE FTE, comes to roughly $0.10 per million tokens served. That is an order of magnitude below cloud frontier API pricing, and the gap widens as you move from 8B-class to 70B-class hardware because per-token cost scales sublinearly with throughput.
Why can't the cloud LLM vendors offer sovereign AI?
OpenAI, Anthropic and Google's frontier models depend on centralised training data flywheels, multi-tenant inference economics, and continuous model rollouts. Their unit economics break if a meaningful share of customers take the sovereign tier; their release cadence breaks if customers freeze model versions for compliance reasons; and none of them ship the actual model weights to your data centre with a perpetual licence that survives a contract dispute or a regulator instruction to cut off cross-border services. Azure for Government and AWS GovCloud are sovereign-flavoured cloud, not sovereign in the architectural sense regulators are starting to demand.
Score your sovereign-AI readiness. In 2 minutes.
Six questions on data, infrastructure, use cases and compliance — your tier, your gaps, and the engagement that fits.