This is Part 1 of the RAG Enterprise Series — the anchor post. Parts 2–5 apply this framework to Travel & Tourism, Hospital Management, Wealth Management, and Personal Banking. Parts 6–8 cover the supporting stack, Mamba/SSMs, and PageIndex.
Scope: Four RAG sophistication levels applied across Travel & Tourism, Hospital Management, Wealth Management, and Personal Banking. Each section covers real-world use cases, domain-specific challenges, how LLM + RAG architecture addresses them, and the full supporting stack including memory, prompt engineering, fine-tuning, and embedding improvements.
Quick note: this article covers four domains, four RAG levels each, plus the full supporting stack. It is intentionally long — bookmark it, come back with coffee, or read it in sections.
1. The RAG Levels — Recap and Framing
| Level | Name | Core Mechanism | Accuracy Range | Primary Constraint |
|---|---|---|---|---|
| L1 | Vanilla RAG | Dense vector → top-k → prompt | 70–80% | Single retrieval pass, semantic drift |
| L2 | Hybrid RAG | Dense (semantic) + Sparse (BM25) → rerank → prompt | 82–90% | Static retrieval, no multi-document synthesis |
| L3 | GraphRAG | Vector + structured knowledge graph + ontology traversal | 92–99% | Ontology investment, relationship modeling |
| L4 | Agentic RAG | Retrieve → reflect → re-query loop → multi-hop synthesis | 95–99%+ | Latency, cost, loop-control complexity |
In plain terms: L1 guesses, L2 narrows, L3 reasons, L4 debates with itself until it's confident. Pick your complexity based on what the problem actually needs, not on which architecture diagram looks coolest.
A note on the retrieval assumption: All four levels above assume that retrieval works by similarity — embed the query, embed document chunks, find the nearest vectors. This is the right default for corpus-level search across thousands of documents. But for structured professional documents (financial filings, clinical guidelines, legal agreements, regulatory disclosures), there is an emerging alternative: reasoning-based retrieval, where the LLM navigates a document's structure directly instead of searching a vector space. Section 6.4 introduces this paradigm, and Section 10 applies it across all four domains.
Architectural framing: The levels are not milestones to progress through linearly — they are tools. A production system at a bank might use L1 for FAQ deflection, L2 for product search, L3 for compliance checks, and L4 for portfolio incident analysis. The architectural decision is: which level of retrieval sophistication does this specific problem require, and can the organization afford the ontology and latency cost of higher levels?
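To make the L2 mechanism concrete, here is a minimal sketch of the fusion step in hybrid retrieval: dense and sparse retrievers each return a ranked list of document IDs, and Reciprocal Rank Fusion (RRF) merges them before reranking. The function name and inputs are illustrative, not from any specific library.

```python
from collections import defaultdict

def rrf_fuse(dense_ranking, sparse_ranking, k=60, top_k=5):
    """Merge two ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is the commonly used damping constant from the RRF paper.
    """
    scores = defaultdict(float)
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Documents ranked well by BOTH retrievers rise to the top.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# A doc ranked #2 semantically and #1 lexically beats either list's sole leader:
rrf_fuse(["a", "b", "c"], ["b", "c", "d"])  # "b" comes first
```

The appeal of RRF is that it needs no score calibration between the dense and sparse backends; only ranks matter, which is why it is a common default fusion step before a cross-encoder rerank.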
Decision Framework — Which Level for Which Problem?
Use this framework when designing a RAG system for any enterprise problem:
Question: Does the answer exist in a single document?
YES → L1 is sufficient
NO → continue
Question: Does the query contain exact identifiers (codes, tickers, drug names, amounts)?
YES → L2 minimum (hybrid required)
NO → L1 may suffice if purely semantic
Question: Does the answer require reasoning across multiple facts in relationship to each other?
YES → L3 (GraphRAG) if relationships are pre-definable
NO → L2 sufficient
Question: Is the relationship structure pre-known and consistent?
YES → L3 (invest in ontology)
NO → L4 (let the agent discover the retrieval path)
Question: Does answering require iterative refinement — "I need more context before I can answer"?
YES → L4 (agentic loop)
NO → L3 sufficient
Question: Is the latency tolerance under 2 seconds AND context per turn under 4,000 tokens?
YES → L1 or L2 with any backend
NO → Evaluate Mamba-backed L3/L4 before concluding infeasible
(Mamba's 5× throughput + constant memory enables 3–5s L4 responses
in configurations where Transformers require 15–20s due to KV cache pressure)
Question: Does answering require ingesting a document longer than 8,000 tokens in a single pass?
(examples: full offering memorandum, complete EHR summary, full insurance policy)
YES → Consider Mamba-based backend; chunk + average with Transformer will lose coherence
NO → Standard Transformer backend sufficient
Question: Is the answer in a specific known document with logical section structure?
(examples: SEC filing, clinical guideline, fare manual, mortgage agreement)
YES → Consider reasoning-based retrieval (PageIndex) instead of or alongside vector search
— eliminates chunking artifacts, follows cross-references, provides audit trail
NO → Vector/hybrid retrieval is the right mechanism
Question: Is accuracy life-critical or regulatory-binding?
YES → L3 minimum; L4 with human-in-the-loop for final decision
NO → L1/L2 with appropriate hedging
Decision Matrix by Domain and Use Case
| Use Case | Domain | Recommended Level | Rationale |
|---|---|---|---|
| Policy FAQ | All | L1 | Single doc, static knowledge |
| Exact identifier lookup | Travel, Finance, Banking | L2 | BM25 required |
| Destination semantic search | Travel | L2 | Semantic + keyword fusion |
| Clinical protocol lookup | Healthcare | L2 | Exact drug/code matching critical |
| Drug interaction checking | Healthcare | L2–L3 | Exact names + relationship graph |
| Differential diagnosis | Healthcare | L3 | Multi-symptom → multi-condition reasoning |
| Visa route eligibility | Travel | L3 | Multi-hop nationality + route + transit rules |
| Visa regulation navigation | Travel | L2 + PageIndex | Known document, cross-referenced sections, conditional logic |
| Fare rules interpretation | Travel | PageIndex | Precise conditional logic in long fare manuals |
| Itinerary planning | Travel | L3–L4 | Constraint satisfaction + multi-source |
| Suitability assessment | Wealth | L3 | Regulatory rules as graph edges |
| SEC filing analysis | Wealth | L2 + PageIndex | Known document, cross-referenced notes, precise table extraction |
| IPS compliance check | Wealth | L3 + PageIndex | Portfolio state vs. constraint graph + IPS document navigation |
| Proactive portfolio review | Wealth | L4 | Multi-client × multi-event synthesis |
| Clinical guideline navigation | Healthcare | PageIndex | Multi-constraint lookup across sections of a known guideline |
| Cash flow diagnosis | Banking | L4 | Multi-hop transaction + income + product |
| Benefits guide navigation | Banking | PageIndex | Known document, cross-referenced coverage sections |
| Mortgage prepayment analysis | Banking | PageIndex | Known document, conditional penalty calculations |
| Sepsis warning | Healthcare | L4 | Multi-source patient data temporal synthesis |
| Incident post-mortem (network) | Infra/Ops | L4 | "Which Q3 changes contributed to today's incident?" |
What's Next in This Series
This post is the map. The rest of the series is the territory.
| Part | Post | Topic |
|---|---|---|
| 1 | You are here | The Four RAG Levels — Decision Framework |
| 2 | RAG in Travel & Tourism Systems | GDS, visa routing, itinerary planning |
| 3 | RAG in Hospital Management | Zero hallucination tolerance, clinical precision |
| 4 | RAG in Wealth Management | Fiduciary constraints, suitability, MiFID II |
| 5 | RAG in Personal Banking | Scale, AML, transaction intelligence |
| 6 | The RAG Supporting Stack | Memory, prompt engineering, fine-tuning, embeddings |
| 7 | Mamba and SSMs for RAG | What the generation backbone change means |
| 8 | PageIndex and Vectorless RAG | Reasoning-based retrieval for professional documents |