Frequently Asked Questions

Enterprise AI FAQ

29 answers across 7 clusters — sovereign AI, EU AI Act, agentic AI, RAG, document intelligence, engagement model, and the company itself. Use the jump links below or scroll through.

Sovereign & on-premise AI · EU AI Act compliance · Agentic AI in production · Retrieval-augmented generation · Document intelligence (IDP) · Engagement & commercial model · About MindMap Digital

Sovereign & on-premise AI

What sovereign deployment means, how it differs from BYOK, and when it makes economic sense.

What is sovereign AI?
Sovereign AI is an architecture where (1) customer data never leaves the customer's network perimeter, (2) model weights run on hardware the customer controls, (3) inference logs stay in the customer's own SIEM, and (4) the entire stack can operate air-gapped with zero outbound calls. It is not a sovereign cloud region, not a Bring Your Own Key arrangement, and not a hyperscaler deployment within your country's borders.
How is sovereign AI different from BYOK?
Bring Your Own Key gives the customer cryptographic control over data at rest but doesn't address inference, model lifecycle, or the structural dependence on the cloud LLM vendor's continued operation. Sovereign AI closes all of these gaps. For SAMA, RBI and the EU AI Act, that distinction is the one that matters.
When is on-prem AI actually cheaper than cloud APIs?
On-prem is cheaper above roughly 200M tokens per month for the lean single-rack architecture, and stays meaningfully cheaper from there. At 5B tokens per month enterprises typically pay 5-8x more on cloud APIs than equivalent on-prem capacity. The economics inverted in 2024-25 as cloud-API pricing flattened and GPU costs continued to fall.
Which open-weights model do you recommend?
Default to Llama 3.3 70B unless multilingual coverage pushes you to Qwen 2.5 (better Chinese, Arabic, South-Asian languages) or licensing pushes you to Apache 2.0 alternatives (Qwen, Gemma, DeepSeek). On the enterprise workloads we ship — document Q&A, structured extraction, classification, summarisation, agentic orchestration — the capability gap across these families is now in the single-digit percentage points.

EU AI Act compliance

Scope, Annex III, Articles 9-15, deadlines, deployer vs provider.

When does the EU AI Act become enforceable for high-risk AI?
2 August 2026 for Annex III high-risk systems (BFSI credit scoring, HR screening, healthcare diagnostic support, critical infrastructure, etc.). The full Articles 9-15 evidence stack must be in place by that date for any AI system serving EU residents or EU-located decisioning.
What is Annex III high-risk AI?
Annex III enumerates 8 categories automatically classified as high-risk under the EU AI Act: biometric identification, critical infrastructure management, education and vocational training, employment and worker management, access to essential services (credit scoring is canonical), law enforcement, migration and border control, and administration of justice. Healthcare diagnostic support is high-risk via Annex III combined with the Medical Devices Regulation route.
What is Article 25 and why does it matter?
Article 25 converts a deployer of an AI system into a provider when they make substantial modifications, rebrand, or repurpose the AI from its vendor's intended use. Provider status triggers the full Articles 9-15 obligation stack. Across MindMap's audit of regulated enterprise AI portfolios, 70% contain at least one Article 25 trigger — typically fine-tuning a foundation model or white-labelling a vendor system.
What does Article 14 (human oversight) actually require?
Effective human oversight means the human can (a) understand the system's capabilities and limitations, (b) monitor operation to detect anomalies, (c) decide not to use the system in any particular case, (d) interpret output correctly, and (e) intervene to override. Most existing 'human in the loop' implementations don't meet this bar. Effective oversight is a protocol with documented competence requirements and explicit override authority.
What's the penalty for non-compliance?
€35M or 7% of global turnover (whichever is higher) for prohibited practices, €15M or 3% for high-risk system non-compliance, and €7.5M or 1% for supplying incorrect information to authorities. Penalties scale with enterprise size and are imposed by Member State market-surveillance authorities.
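As a rough illustration of the "whichever is higher" rule, the ceiling for a tier is the larger of the fixed amount and the turnover percentage. The sketch below uses only the first two tiers quoted above; the names `TIERS` and `max_penalty` are hypothetical, and it ignores any SME-specific treatment or supervisory discretion.

```python
from decimal import Decimal

# (fixed cap in EUR, share of global annual turnover) per tier,
# taken from the figures quoted in the answer above.
TIERS = {
    "prohibited_practice": (Decimal("35000000"), Decimal("0.07")),
    "high_risk_non_compliance": (Decimal("15000000"), Decimal("0.03")),
}


def max_penalty(tier: str, global_turnover_eur: Decimal) -> Decimal:
    """Return the penalty ceiling: the higher of the fixed cap and the turnover share."""
    fixed_cap, share = TIERS[tier]
    return max(fixed_cap, share * global_turnover_eur)


# Example: a group with EUR 2bn turnover facing a high-risk non-compliance finding.
ceiling = max_penalty("high_risk_non_compliance", Decimal("2000000000"))
```

For a €2bn group the 3% share (€60M) exceeds the €15M fixed cap, so the percentage drives the ceiling; for a €100M business the fixed cap applies.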
What if my enterprise isn't ready by 2 August 2026?
Median enterprise readiness across our 50-enterprise benchmark is 38% — only 14% would survive a supervisory audit today. We expect supervisory leniency for customers with a credible 12-month remediation plan in place at the deadline (consistent with how MiFID II, GDPR, and DORA enforcement actually unfolded). The supervisors that will move first will be the ones who have publicly stated they will: ICO, BaFin, ACPR, Banca d'Italia.

Agentic AI in production

Production patterns, audit substrate, evaluation harness.

What is agentic AI?
Agentic AI refers to any LLM-driven system that takes multiple sequential steps to complete a task, where the LLM chooses among tools or actions at each step based on the previous step's outcome. This excludes pure RAG (one-shot retrieval, one-shot generation) and simple prompt-engineering pipelines. Production agent runtimes have bounded reasoning steps, structured tool definitions, persistent reasoning traces, and budget controls.
What patterns work in production for agentic AI in regulated industries?
Three patterns survive supervisor review: (1) bounded ReAct with reasoning-trace persistence, (2) planner-executor with explicit plan persistence, (3) multi-agent orchestration with hand-off contracts. Three patterns consistently fail audit: silent loops, hidden tool calls, unbounded reasoning. Four engineering controls separate prototypes from production: hashed tool registry, content-addressed reasoning-trace storage, budget enforcement, separate verification pass.
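Two of the controls named above, the hashed tool registry and content-addressed reasoning-trace storage, can be sketched with nothing more than SHA-256 over canonical JSON. This is a minimal in-memory illustration, not a description of any particular product: `ToolRegistry`, `TraceStore` and `canonical_hash` are hypothetical names, and a real deployment would back the store with durable, append-only storage.

```python
import hashlib
import json


def canonical_hash(obj) -> str:
    """SHA-256 over a canonical JSON encoding, so equal content gives equal keys."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ToolRegistry:
    """Tools are callable only when the caller presents the hash of the approved definition."""

    def __init__(self):
        self._tools = {}  # name -> (definition_hash, callable)

    def register(self, name, definition, fn) -> str:
        definition_hash = canonical_hash(definition)
        self._tools[name] = (definition_hash, fn)
        return definition_hash

    def call(self, name, expected_hash, **kwargs):
        actual_hash, fn = self._tools[name]
        if actual_hash != expected_hash:
            # The definition changed since it was reviewed: refuse to run it.
            raise PermissionError(f"tool {name!r} does not match the approved definition")
        return fn(**kwargs)


class TraceStore:
    """Content-addressed store: the key of a reasoning step is the hash of its content."""

    def __init__(self):
        self._blobs = {}

    def put(self, step: dict) -> str:
        key = canonical_hash(step)
        self._blobs[key] = step
        return key

    def get(self, key: str) -> dict:
        step = self._blobs[key]
        if canonical_hash(step) != key:
            raise ValueError("stored trace step no longer matches its address")
        return step
```

Because the key is derived from the content, a step that is altered after the fact no longer matches its address, which is what makes the trace usable as audit evidence.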
How do I defend against prompt injection?
There is no silver-bullet defence. Use a layered architecture instead: input guardrails filter obvious injection patterns, output guardrails block leaked secrets, agentic boundaries prevent privilege escalation, and, architecturally, no highly privileged tool call sits behind a prompt at all. For regulated workloads the regulator increasingly expects an explicit prompt-injection threat model in the security review.
What's the ReAct pattern?
ReAct (Reason + Act) is the foundational agent design pattern — the LLM alternates between explicit reasoning steps ('thought') and tool-using action steps ('action'), looping until the goal is met. Production-grade ReAct has bounded reasoning steps (typically 8-12 with hard fail-over to human review), structured tool definitions with output validation, explicit reasoning-trace persistence, and budget controls.
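The bounded loop described above can be sketched as follows. This is an illustrative skeleton under stated assumptions, not a production runtime: `policy` stands in for the LLM call and is assumed to return a dict with `thought`, `action`, optional `args`, an optional `tokens` count, and an `answer` when the action is `finish`. All names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AgentResult:
    status: str                      # "done" or "escalated_to_human"
    answer: Optional[str]
    trace: List[dict] = field(default_factory=list)


def run_bounded_react(goal: str, policy: Callable, tools: Dict[str, Callable],
                      max_steps: int = 8, token_budget: int = 10_000) -> AgentResult:
    trace: List[dict] = []
    tokens_spent = 0
    for step_no in range(1, max_steps + 1):
        decision = policy(goal, trace)           # the "thought" plus the chosen action
        tokens_spent += decision.get("tokens", 0)
        if tokens_spent > token_budget:          # budget control: stop and escalate
            break
        if decision["action"] == "finish":
            trace.append({"step": step_no, "thought": decision["thought"], "action": "finish"})
            return AgentResult("done", decision["answer"], trace)
        tool = tools.get(decision["action"])
        if tool is None:
            observation = f"error: unknown tool {decision['action']!r}"
        else:
            observation = tool(**decision.get("args", {}))
        # Every step is recorded, so there are no hidden tool calls.
        trace.append({"step": step_no, "thought": decision["thought"],
                      "action": decision["action"], "args": decision.get("args", {}),
                      "observation": observation})
    # Step or token bound hit without finishing: hard fail-over to human review.
    return AgentResult("escalated_to_human", None, trace)


# Scripted stand-in for the LLM, used only to exercise the loop.
def demo_policy(goal, trace):
    if not trace:
        return {"thought": "need the rate", "action": "lookup", "args": {"key": "rate"}, "tokens": 50}
    return {"thought": "have the rate", "action": "finish",
            "answer": trace[-1]["observation"], "tokens": 20}


demo_tools = {"lookup": lambda key: {"rate": "4.5%"}[key]}
result = run_bounded_react("What is the rate?", demo_policy, demo_tools)
```

The loop can only exit in two ways: an explicit `finish` action, or escalation once the step or token bound is hit. That is the property that rules out silent loops and unbounded reasoning.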

Retrieval-augmented generation

Vector databases, chunking, hybrid retrieval, evaluation.

Which vector database should I pick?
pgvector for deployments under 10M chunks (one less system to back up; operational simplicity dominates at this scale). Qdrant for 10M-100M chunks. Milvus beyond 100M chunks or for GPU-accelerated indexing. We've deprecated Chroma from our reference architecture, and we never recommend Pinecone for sovereign deployment because it cannot run inside an air-gapped perimeter.
Should I use RAG or fine-tune?
Do retrieval first, exhaust it, and only fine-tune when you're solving a format, style or domain-vocabulary problem that prompting can't enforce. 80% of RAG quality lives in retrieval (chunking, hybrid search, re-ranking), not generation. Teams that go straight to fine-tuning the LLM as the first quality lever almost always discover six weeks later they could have got the same lift from better chunking and a re-ranker.
How do I evaluate a RAG system in production?
Four metrics matter: context precision (retrieved chunks relevant?), context recall (all relevant chunks retrieved?), answer faithfulness (response grounded in retrieved context?), answer relevance (response addresses the question?). Run RAGAS or equivalent on every change. Also maintain a custom SME-written eval set (200-500 questions) scored by a strong LLM as judge.
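The two retrieval-side metrics are easy to compute once an SME has labelled which chunks are relevant for each question. The sketch below is a simplified set-based version for illustration; it is not how RAGAS computes its metrics, and the function names and chunk IDs are hypothetical. Faithfulness and answer relevance need a judge model and are not shown.

```python
from typing import List, Set


def context_precision(retrieved: List[str], relevant: Set[str]) -> float:
    """Share of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for chunk_id in retrieved if chunk_id in relevant)
    return hits / len(retrieved)


def context_recall(retrieved: List[str], relevant: Set[str]) -> float:
    """Share of the relevant chunks that made it into the retrieved set."""
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(set(retrieved) & relevant) / len(relevant)


# One labelled question: four chunks retrieved, three chunks marked relevant.
retrieved_ids = ["a", "b", "c", "d"]
relevant_ids = {"a", "c", "e"}
precision = context_precision(retrieved_ids, relevant_ids)  # 2 of 4 retrieved are relevant
recall = context_recall(retrieved_ids, relevant_ids)        # 2 of 3 relevant were retrieved
```

Averaging both numbers over the SME-written eval set on every change gives a cheap regression signal before the more expensive LLM-as-judge pass.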
Why does hybrid retrieval beat dense-only?
Pure dense retrieval fails on rare entities (drug names, regulation IDs, ticket numbers, customer codes) where the embedding model has never seen the token. Hybrid retrieval combines BM25 (sparse) and dense vector search with Reciprocal Rank Fusion, typically lifting answer-correct rate from 71% to 89% on corpora that include legal citations or technical entities. Add a cross-encoder re-ranker for another 8-15 points.
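Reciprocal Rank Fusion itself is only a few lines: each document scores the sum of 1 / (k + rank) across the ranked lists it appears in. The sketch below assumes the two retrievers have already returned ordered document IDs; `k = 60` is the commonly used default rather than a tuned value, and the document IDs are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["d3", "d1", "d2"]   # sparse ranking (hypothetical IDs)
dense_hits = ["d1", "d4", "d3"]  # dense ranking (hypothetical IDs)
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Because RRF works on ranks rather than raw scores, BM25 and cosine-similarity outputs can be combined without normalising their score scales, and documents that both retrievers agree on rise to the top.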

Document intelligence (IDP)

Straight-through processing, schema-driven extraction, when LLMs beat template OCR.

What does it take to hit 94% straight-through processing?
Four engineering choices: (1) classifier accuracy above 98% (below this, routing fails); (2) type-specific extraction strategies (one strategy across heterogeneous corpora reliably under-performs); (3) explicit confidence-scoring with field-level routing to human review; (4) exception-handling design treated as a first-class workflow, not an afterthought. The vendor-demo 95%+ STP collapses to 75-85% in production unless these four are in place.
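Choice (3), field-level routing on confidence, might look like the sketch below. The thresholds, field names and the `route_document` helper are hypothetical and only illustrate the idea that a document goes straight through when every field clears its own bar; only the failing fields are sent to a reviewer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0-1.0, as reported by the extractor


def route_document(fields: List[ExtractedField],
                   thresholds: Dict[str, float],
                   default_threshold: float = 0.90) -> dict:
    """Send only low-confidence fields to human review; the rest pass straight through."""
    review = [f.name for f in fields
              if f.confidence < thresholds.get(f.name, default_threshold)]
    return {"straight_through": not review, "review_fields": review}


# Hypothetical per-field thresholds: payment-critical fields get a stricter bar.
THRESHOLDS = {"iban": 0.99, "amount": 0.97}

decision = route_document(
    [ExtractedField("iban", "DE00 0000", 0.995),
     ExtractedField("amount", "1,250.00", 0.93),
     ExtractedField("invoice_date", "2025-03-01", 0.95)],
    THRESHOLDS,
)
```

Here only `amount` falls below its threshold, so the reviewer sees one field rather than the whole document.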
Template OCR or LLM-extraction — when do I pick which?
Template OCR with field coordinates works on highly-structured documents (60% of typical enterprise document mix). It collapses on the 40% long-tail driving the operational pain — contracts, correspondence, free-form claims. LLM extraction handles that long tail. Production pattern: cheap classifier routes structured docs to template extraction, unstructured docs to LLM extraction with schema-driven prompting.
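The routing-plus-schema pattern can be sketched without committing to any particular model or OCR engine. Everything named below (`STRUCTURED_TYPES`, `CLAIM_SCHEMA`, `pick_extractor`, `build_prompt`, `validate_response`) is a hypothetical illustration; the LLM call itself is left out, and the response string in the example is hand-written.

```python
import json
from typing import Dict, List, Tuple

# Document types assumed to have stable layouts (hypothetical list).
STRUCTURED_TYPES = {"invoice", "kyc_form", "payslip"}

# Schema for one long-tail document type: field name -> JSON type name.
CLAIM_SCHEMA = {"claimant_name": "string", "incident_date": "string", "amount_claimed": "number"}
PY_TYPES = {"string": str, "number": (int, float)}


def pick_extractor(doc_type: str) -> str:
    """Structured documents go to template extraction, everything else to the LLM."""
    return "template" if doc_type in STRUCTURED_TYPES else "llm"


def build_prompt(schema: Dict[str, str], document_text: str) -> str:
    """Schema-driven prompt: the field list comes from the schema, not from hand-written text."""
    field_lines = "\n".join(f'- "{name}": {json_type}' for name, json_type in schema.items())
    return ("Extract the following fields and reply with a single JSON object.\n"
            f"{field_lines}\n"
            "Use null for any field that is not present.\n\n"
            f"Document:\n{document_text}")


def validate_response(schema: Dict[str, str], raw: str) -> Tuple[dict, List[str]]:
    """Parse the model output and list every field that is missing or has the wrong type."""
    data = json.loads(raw)
    errors = []
    for name, json_type in schema.items():
        if name not in data:
            errors.append(f"missing: {name}")
        elif data[name] is not None and not isinstance(data[name], PY_TYPES[json_type]):
            errors.append(f"wrong type: {name}")
    return data, errors


prompt = build_prompt(CLAIM_SCHEMA, "I slipped on 3 May and claim 1200 euros. J. Doe")
parsed, problems = validate_response(
    CLAIM_SCHEMA, '{"claimant_name": "J. Doe", "incident_date": "3 May", "amount_claimed": "1200"}')
```

Validating against the same schema that generated the prompt means a mistyped field (here the amount arrives as a string) is caught before it reaches downstream systems and can be routed to review.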
What's the ROI on moving to high-STP IDP?
For a mid-market BFSI customer processing 25,000 monthly docs at 50% STP, the move to 94% STP typically delivers €1.5-2.5M annual benefit (per-doc cost reduction + FTE reallocation) against a €280k implementation + €60k annual licence. Payback in 4-7 months. Model your specifics at /tools/idp-roi-calculator.

Engagement & commercial model

Time to production, pricing structure, how a typical engagement runs.

How long does a typical engagement take to reach production?
6-9 weeks from signed contract to production. Median across our last 14 sovereign LLM deployments: 11 days from clean cluster to first production prompt for the platform itself, then the use-case build runs in parallel. The 117-accelerator library means we never start from zero — most engagements reach production faster than the customer's internal change-management can absorb.
What does a typical engagement cost?
Indicative ranges: AI Readiness Sprint €40-80k (2 weeks). First Pilot €180-340k (6-9 weeks contract to production). Managed AI CoE €30-80k/month against an annual contract. Sovereign platform deployment €220-450k depending on architecture. Numbers move with deployment complexity, customer-side integration scope, and language coverage. We share the full cost build-up on a scoping call.
Can you deploy entirely on-premise / air-gapped?
Yes — every accelerator supports fully on-premise, air-gapped deployment with zero internet dependency. Inference, embeddings, RAG, fine-tuning all run inside the customer perimeter. This is the default architecture model for our regulated-industry customers (central banks, hospitals, insurers) and the only acceptable deployment posture for SAMA, RBI Master Direction, NHS DSPT, and EU AI Act high-risk workloads.
Which compliance frameworks do you support?
GDPR, UK GDPR, India DPDP Act, HIPAA, SOC 2, ISO 27001, PCI DSS — plus sector frameworks (SAMA in Saudi Arabia, RBI Master Direction in India, NHS DSPT in the UK, DORA for EU financial entities, EU AI Act). Compliance is built into the accelerator architecture, not bolted on at the end.

About MindMap Digital

Company, founder, recognition, geographic footprint.

What is MindMap Digital?
MindMap Digital is an enterprise AI engineering firm that designs, builds, and deploys production AI for regulated industries — banking, insurance, healthcare, government, pharma. Founded 2017, headquartered in Hyderabad with offices across India, UAE, UK and US. We've shipped 117 production AI accelerators to 50+ Fortune-class customers across India, the UK, EU, Gulf, North America, Africa and APAC.
Who is Saurabh Goenka?
Saurabh Goenka is the founder and CEO of MindMap Digital. He has spent the last nine years building enterprise AI engineering capability for regulated industries. He is a Chartered Accountant by training, a Forbes Business Council member since 2021, and the recipient of the NASSCOM Tech Excellence 2026 Healthcare AI Award, ET NOW 40 Under 40 (2026), Outlook Dynamic Leaders (2025), and ICAI 40 Under 40 (2021).
What recognition has MindMap Digital received?
NASSCOM Tech Excellence 2026 — Healthcare AI category winner. ET NOW 40 Under 40 (2026, Saurabh Goenka). Outlook Dynamic Leaders (2025). ET Family Business Award (2023). ICAI 40 Under 40 (2021). Forbes Business Council member (2021–present). Smartsheet Platinum partner. Member of multiple industry advisory groups on regulated-industry AI.
Where are MindMap's offices?
Headquartered in Hyderabad, India, with hubs in Bengaluru and Noida. International offices in Dubai (UAE), London (UK), and Lewes, Delaware (USA). The majority of our enterprise customers are international, spanning North America, the United Kingdom, continental Europe, the Middle East, Africa and APAC.

Question not answered here?

Email hello@mindmapdigital.ai or book a 20-minute call with the engineering team.
