Question 1

What is sovereign AI?

Accepted Answer

Sovereign AI is an architecture where (1) customer data never leaves the customer's network perimeter, (2) model weights run on hardware the customer controls, (3) inference logs stay in the customer's own SIEM, and (4) the entire stack can operate air-gapped with zero outbound calls. It is not a sovereign cloud region, not a Bring Your Own Key arrangement, and not a hyperscaler deployment in your country's borders.

Question 2

How is sovereign AI different from BYOK?

Accepted Answer

Bring Your Own Key gives the customer cryptographic control over data at rest but doesn't address inference, model lifecycle, or the structural dependence on the cloud LLM vendor's continued operation. Sovereign AI closes all four of these gaps. For SAMA, RBI and the EU AI Act, the distinction is the one that matters.

Question 3

When is on-prem AI actually cheaper than cloud APIs?

Accepted Answer

On-prem is cheaper above roughly 200M tokens per month for the lean single-rack architecture, and stays meaningfully cheaper from there. At 5B tokens per month enterprises typically pay 5-8x more on cloud APIs than equivalent on-prem capacity. The economics inverted in 2024-25 as cloud-API pricing flattened and GPU costs continued to fall.

Question 4

Which open-weights model do you recommend?

Accepted Answer

Default to Llama 3.3 70B unless multilingual coverage pushes you to Qwen 2.5 (better Chinese, Arabic, South-Asian languages) or licensing pushes you to Apache 2.0 alternatives (Qwen, Gemma, DeepSeek). On the enterprise workloads we ship — document Q&A, structured extraction, classification, summarisation, agentic orchestration — the capability gap across these families is now in the single-digit percentage points.

Question 5

Which vector database should I pick?

Accepted Answer

pgvector for deployments under 10M chunks (one less system to back up — operational simplicity dominates at this scale). Qdrant for 10M-100M chunks. Milvus beyond 100M chunks or for GPU-accelerated indexing. We've deprecated Chroma from our reference architecture, and we never recommend Pinecone for sovereign deployment because air-gap is a non-starter.

Question 6

When does the EU AI Act become enforceable for high-risk AI?

Accepted Answer

2 August 2026 for Annex III high-risk systems (BFSI credit scoring, HR screening, healthcare diagnostic support, critical infrastructure, etc.). The full Articles 9-15 evidence stack must be in place by that date for any AI system serving EU residents or EU-located decisioning.

Question 7

What is Annex III high-risk AI?

Accepted Answer

Annex III enumerates 8 categories automatically classified as high-risk under the EU AI Act: biometric identification, critical infrastructure management, education and vocational training, employment and worker management, access to essential services (credit scoring is canonical), law enforcement, migration and border control, and administration of justice. Healthcare diagnostic support is high-risk via Annex III combined with the Medical Devices Regulation route.

Question 8

What is Article 25 and why does it matter?

Accepted Answer

Article 25 converts a deployer of an AI system into a provider when they make substantial modifications, rebrand, or repurpose the AI from its vendor's intended use. Provider status triggers the full Articles 9-15 obligation stack. Across MindMap's audit of regulated enterprise AI portfolios, 70% contain at least one Article 25 trigger — typically fine-tuning a foundation model or white-labelling a vendor system.

Question 9

What does Article 14 (human oversight) actually require?

Accepted Answer

Effective human oversight means the human can (a) understand the system's capabilities and limitations, (b) monitor operation to detect anomalies, (c) decide not to use the system in any particular case, (d) interpret output correctly, and (e) intervene to override. Most existing 'human in the loop' implementations don't meet this bar. Effective oversight is a protocol with documented competence requirements and explicit override authority.

Question 10

What's the penalty for non-compliance?

Accepted Answer

€35M or 7% of global turnover (whichever is higher) for prohibited practices, €15M or 3% for high-risk system non-compliance, and €7.5M or 1.5% for supplying incorrect information to authorities. Penalties scale with enterprise size and are imposed by Member State market-surveillance authorities.

Question 11

What if my enterprise isn't ready by 2 August 2026?

Accepted Answer

Median enterprise readiness across our 50-enterprise benchmark is 38% — only 14% would survive a supervisory audit today. We expect supervisory leniency for customers with a credible 12-month remediation plan in place at the deadline (consistent with how MiFID II, GDPR, and DORA enforcement actually unfolded). The supervisors that will move first will be the ones who have publicly stated they will: ICO, BaFin, ACPR, Banca d'Italia.

Question 12

What is agentic AI?

Accepted Answer

Agentic AI refers to any LLM-driven system that takes multiple sequential steps to complete a task, where the LLM chooses among tools or actions at each step based on the previous step's outcome. This excludes pure RAG (one-shot retrieval, one-shot generation) and simple prompt-engineering pipelines. Production agent runtimes have bounded reasoning steps, structured tool definitions, persistent reasoning traces, and budget controls.

Question 13

What patterns work in production for agentic AI in regulated industries?

Accepted Answer

Three patterns survive supervisor review: (1) bounded ReAct with reasoning-trace persistence, (2) planner-executor with explicit plan persistence, (3) multi-agent orchestration with hand-off contracts. Three patterns consistently fail audit: silent loops, hidden tool calls, unbounded reasoning. Four engineering controls separate prototypes from production: hashed tool registry, content-addressed reasoning-trace storage, budget enforcement, separate verification pass.

Question 14

How do I defend against prompt injection?

Accepted Answer

There is no silver-bullet defence. The layered architecture: input guardrails filter obvious injection patterns, output guardrails block leaked secrets, agentic boundaries prevent privilege escalation, and architecturally you don't put highly-privileged tool calls behind a prompt at all. For regulated workloads the regulator increasingly expects an explicit prompt-injection threat model in the security review.

Question 15

What's the ReAct pattern?

Accepted Answer

ReAct (Reason + Act) is the foundational agent design pattern — the LLM alternates between explicit reasoning steps ('thought') and tool-using action steps ('action'), looping until the goal is met. Production-grade ReAct has bounded reasoning steps (typically 8-12 with hard fail-over to human review), structured tool definitions with output validation, explicit reasoning-trace persistence, and budget controls.

Question 16

Should I use RAG or fine-tune?

Accepted Answer

Do retrieval first, exhaust it, and only fine-tune when you're solving a format, style or domain-vocabulary problem that prompting can't enforce. 80% of RAG quality lives in retrieval (chunking, hybrid search, re-ranking), not generation. Teams that go straight to fine-tuning the LLM as the first quality lever almost always discover six weeks later they could have got the same lift from better chunking and a re-ranker.

Question 17

How do I evaluate a RAG system in production?

Accepted Answer

Four metrics matter: context precision (retrieved chunks relevant?), context recall (all relevant chunks retrieved?), answer faithfulness (response grounded in retrieved context?), answer relevance (response addresses the question?). Run RAGAS or equivalent on every change. Also maintain a custom SME-written eval set (200-500 questions) scored by a strong LLM as judge.

Question 18

Why does hybrid retrieval beat dense-only?

Accepted Answer

Pure dense retrieval fails on rare entities (drug names, regulation IDs, ticket numbers, customer codes) where the embedding model has never seen the token. Hybrid retrieval combines BM25 (sparse) and dense vector search with Reciprocal Rank Fusion, typically lifting answer-correct rate from 71% to 89% on corpora that include legal citations or technical entities. Add a cross-encoder re-ranker for another 8-15 points.

Question 19

What does it take to hit 94% straight-through processing?

Accepted Answer

Four engineering choices: (1) classifier accuracy above 98% (below this, routing fails); (2) type-specific extraction strategies (one strategy across heterogeneous corpora reliably under-performs); (3) explicit confidence-scoring with field-level routing to human review; (4) exception-handling design treated as a first-class workflow, not an afterthought. The vendor-demo 95%+ STP collapses to 75-85% in production unless these four are in place.

Question 20

Template OCR or LLM-extraction — when do I pick which?

Accepted Answer

Template OCR with field coordinates works on highly-structured documents (60% of typical enterprise document mix). It collapses on the 40% long-tail driving the operational pain — contracts, correspondence, free-form claims. LLM extraction handles that long tail. Production pattern: cheap classifier routes structured docs to template extraction, unstructured docs to LLM extraction with schema-driven prompting.

Question 21

What's the ROI on moving to high-STP IDP?

Accepted Answer

For a mid-market BFSI customer processing 25,000 monthly docs at 50% STP, the move to 94% STP typically delivers €1.5-2.5M annual benefit (per-doc cost reduction + FTE reallocation) against a €280k implementation + €60k annual licence. Payback in 4-7 months. Model your specifics at /tools/idp-roi-calculator.

Question 22

How long does a typical engagement take to reach production?

Accepted Answer

6-9 weeks from signed contract to production. Median across our last 14 sovereign LLM deployments: 11 days from clean cluster to first production prompt for the platform itself, then the use-case build runs in parallel. The 117-accelerator library means we never start from zero — most engagements reach production faster than the customer's internal change-management can absorb.

Question 23

What does a typical engagement cost?

Accepted Answer

Indicative ranges: AI Readiness Sprint €40-80k (2 weeks). First Pilot €180-340k (6-9 weeks contract to production). Managed AI CoE €30-80k/month against an annual contract. Sovereign platform deployment €220-450k depending on architecture. Numbers move with deployment complexity, customer-side integration scope, and language coverage. We share the full cost build-up on a scoping call.

Question 24

Can you deploy entirely on-premise / air-gapped?

Accepted Answer

Yes — every accelerator supports fully on-premise, air-gapped deployment with zero internet dependency. Inference, embeddings, RAG, fine-tuning all run inside the customer perimeter. This is the default architecture model for our regulated-industry customers (central banks, hospitals, insurers) and the only acceptable deployment posture for SAMA, RBI Master Direction, NHS DSPT, and EU AI Act high-risk workloads.

Question 25

Which compliance frameworks do you support?

Accepted Answer

GDPR, UK GDPR, India DPDP Act, HIPAA, SOC 2, ISO 27001, PCI DSS — plus sector frameworks (SAMA in Saudi Arabia, RBI Master Direction in India, NHS DSPT in the UK, DORA for EU financial entities, EU AI Act). Compliance is built into the accelerator architecture, not bolted on at the end.

Question 26

What is MindMap Digital?

Accepted Answer

MindMap Digital is an enterprise AI engineering firm that designs, builds, and deploys production AI for regulated industries — banking, insurance, healthcare, government, pharma. Founded 2017, headquartered in Hyderabad with offices across India, UAE, UK and US. We've shipped 117 production AI accelerators to 50+ Fortune-class customers across India, the UK, EU, Gulf, North America, Africa and APAC.

Question 27

Who is Saurabh Goenka?

Accepted Answer

Saurabh Goenka is the founder and CEO of MindMap Digital. He has spent the last nine years building enterprise AI engineering capability for regulated industries. He is a Chartered Accountant by training, a Forbes Business Council member since 2021, and the recipient of the NASSCOM Tech Excellence 2026 Healthcare AI Award, ET NOW 40 Under 40 (2026), Outlook Dynamic Leaders (2025), and ICAI 40 Under 40 (2021).

Question 28

What recognition has MindMap Digital received?

Accepted Answer

NASSCOM Tech Excellence 2026 — Healthcare AI category winner. ET NOW 40 Under 40 (2026, Saurabh Goenka). Outlook Dynamic Leaders (2025). ET Family Business Award (2023). ICAI 40 Under 40 (2021). Forbes Business Council member (2021–present). Smartsheet Platinum partner. Member of multiple industry advisory groups on regulated-industry AI.

Question 29

Where are MindMap's offices?

Accepted Answer

Headquartered in Hyderabad, India, with hubs in Bengaluru and Noida. International offices in Dubai (UAE), London (UK), and Lewes, Delaware (USA). The majority of our enterprise customers are international, spanning North America, the United Kingdom, continental Europe, the Middle East, Africa and APAC.

Enterprise AI FAQ

Sovereign & on-premise AI

EU AI Act compliance

Agentic AI in production

Retrieval-augmented generation

Document intelligence (IDP)

Engagement & commercial model

About MindMap Digital

Question not answered here?