This is Part 1 of the RAG Enterprise Series — the anchor post. Parts 2–5 apply this framework to Travel & Tourism, Hospital Management, Wealth Management, and Personal Banking. Parts 6–8 cover the supporting stack, Mamba/SSMs, and PageIndex.
Scope: Four RAG sophistication levels applied across Travel & Tourism, Hospital Management, Wealth Management, and Personal Banking. Each section covers real-world use cases, domain-specific challenges, how LLM + RAG architecture addresses them, and the full supporting stack including memory, prompt engineering, fine-tuning, and embedding improvements.
Quick note: this article covers four domains, four RAG levels each, plus the full supporting stack. It is intentionally long — bookmark it, come back with coffee, or read it in sections.
1. The RAG Levels — Recap and Framing
| Level | Name | Core Mechanism | Accuracy Range | Primary Constraint |
|---|---|---|---|---|
| L1 | Vanilla RAG | Dense vector → top-k → prompt | 70–80% | Single retrieval pass, semantic drift |
| L2 | Hybrid RAG | Dense (semantic) + Sparse (BM25) → rerank → prompt | 82–90% | Static retrieval, no multi-document synthesis |
| L3 | GraphRAG | Vector + structured knowledge graph + ontology traversal | 92–99% | Ontology investment, relationship modeling |
| L4 | Agentic RAG | Retrieve → reflect → re-query loop → multi-hop synthesis | 95–99%+ | Latency, cost, loop-control complexity |
In plain terms: L1 guesses, L2 narrows, L3 reasons, L4 debates with itself until it's confident. Pick your complexity based on what the problem actually needs, not on which architecture diagram looks coolest.
A note on the retrieval assumption: All four levels above assume that retrieval works by similarity — embed the query, embed document chunks, find the nearest vectors. This is the right default for corpus-level search across thousands of documents. But for structured professional documents (financial filings, clinical guidelines, legal agreements, regulatory disclosures), there is an emerging alternative: reasoning-based retrieval, where the LLM navigates a document's structure directly instead of searching a vector space. Section 6.4 introduces this paradigm, and Section 10 applies it across all four domains.
Architectural framing: The levels are not milestones to progress through linearly — they are tools. A production system at a bank might use L1 for FAQ deflection, L2 for product search, L3 for compliance checks, and L4 for portfolio incident analysis. The architectural decision is: which level of retrieval sophistication does this specific problem require, and can the organization afford the ontology and latency cost of higher levels?
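To make the L2 mechanism concrete, here is a minimal sketch of the fusion step in hybrid retrieval: dense and sparse retrievers each return a ranked list of document IDs, and Reciprocal Rank Fusion (RRF) merges them before reranking. The function name and inputs are illustrative, not from any specific library.

```python
from collections import defaultdict

def rrf_fuse(dense_ranking, sparse_ranking, k=60, top_k=5):
    """Merge two ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is the commonly used damping constant from the RRF paper.
    """
    scores = defaultdict(float)
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Documents ranked well by BOTH retrievers rise to the top.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# A doc ranked #2 semantically and #1 lexically beats either list's sole leader:
rrf_fuse(["a", "b", "c"], ["b", "c", "d"])  # "b" comes first
```

The appeal of RRF is that it needs no score calibration between the dense and sparse backends; only ranks matter, which is why it is a common default fusion step before a cross-encoder rerank.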
Decision Framework — Which Level for Which Problem?
Use this framework when designing a RAG system for any enterprise problem:
Question: Does the answer exist in a single document?
YES → L1 is sufficient
NO → continue
Question: Does the query contain exact identifiers (codes, tickers, drug names, amounts)?
YES → L2 minimum (hybrid required)
NO → L1 may suffice if purely semantic
Question: Does the answer require reasoning across multiple facts in relationship to each other?
YES → L3 (GraphRAG) if relationships are pre-definable
NO → L2 sufficient
Question: Is the relationship structure pre-known and consistent?
YES → L3 (invest in ontology)
NO → L4 (let the agent discover the retrieval path)
Question: Does answering require iterative refinement — "I need more context before I can answer"?
YES → L4 (agentic loop)
NO → L3 sufficient
Question: Is the latency tolerance under 2 seconds AND context per turn under 4,000 tokens?
YES → L1 or L2 with any backend
NO → Evaluate Mamba-backed L3/L4 before concluding infeasible
(Mamba's 5× throughput + constant memory enables 3–5s L4 responses
in configurations where Transformers require 15–20s due to KV cache pressure)
Question: Does answering require ingesting a document longer than 8,000 tokens in a single pass?
(examples: full offering memorandum, complete EHR summary, full insurance policy)
YES → Consider Mamba-based backend; chunk + average with Transformer will lose coherence
NO → Standard Transformer backend sufficient
Question: Is the answer in a specific known document with logical section structure?
(examples: SEC filing, clinical guideline, fare manual, mortgage agreement)
YES → Consider reasoning-based retrieval (PageIndex) instead of or alongside vector search
— eliminates chunking artifacts, follows cross-references, provides audit trail
NO → Vector/hybrid retrieval is the right mechanism
Question: Is accuracy life-critical or regulatory-binding?
YES → L3 minimum; L4 with human-in-the-loop for final decision
NO → L1/L2 with appropriate hedging
Decision Matrix by Domain and Use Case
| Use Case | Domain | Recommended Level | Rationale |
|---|---|---|---|
| Policy FAQ | All | L1 | Single doc, static knowledge |
| Exact identifier lookup | Travel, Finance, Banking | L2 | BM25 required |
| Destination semantic search | Travel | L2 | Semantic + keyword fusion |
| Clinical protocol lookup | Healthcare | L2 | Exact drug/code matching critical |
| Drug interaction checking | Healthcare | L2–L3 | Exact names + relationship graph |
| Differential diagnosis | Healthcare | L3 | Multi-symptom → multi-condition reasoning |
| Visa route eligibility | Travel | L3 | Multi-hop nationality + route + transit rules |
| Visa regulation navigation | Travel | L2 + PageIndex | Known document, cross-referenced sections, conditional logic |
| Fare rules interpretation | Travel | PageIndex | Precise conditional logic in long fare manuals |
| Itinerary planning | Travel | L3–L4 | Constraint satisfaction + multi-source |
| Suitability assessment | Wealth | L3 | Regulatory rules as graph edges |
| SEC filing analysis | Wealth | L2 + PageIndex | Known document, cross-referenced notes, precise table extraction |
| IPS compliance check | Wealth | L3 + PageIndex | Portfolio state vs. constraint graph + IPS document navigation |
| Proactive portfolio review | Wealth | L4 | Multi-client × multi-event synthesis |
| Clinical guideline navigation | Healthcare | PageIndex | Multi-constraint lookup across sections of a known guideline |
| Cash flow diagnosis | Banking | L4 | Multi-hop transaction + income + product |
| Benefits guide navigation | Banking | PageIndex | Known document, cross-referenced coverage sections |
| Mortgage prepayment analysis | Banking | PageIndex | Known document, conditional penalty calculations |
| Sepsis warning | Healthcare | L4 | Multi-source patient data temporal synthesis |
| Incident post-mortem (network) | Infra/Ops | L4 | "Which Q3 changes contributed to today's incident?" |
What's Next in This Series
This post is the map. The rest of the series is the territory.
| Part | Post | Topic |
|---|---|---|
| 1 | You are here | The Four RAG Levels — Decision Framework |
| 2 | RAG in Travel & Tourism Systems | GDS, visa routing, itinerary planning |
| 3 | RAG in Hospital Management | Zero hallucination tolerance, clinical precision |
| 4 | RAG in Wealth Management | Fiduciary constraints, suitability, MiFID II |
| 5 | RAG in Personal Banking | Scale, AML, transaction intelligence |
| 6 | The RAG Supporting Stack | Memory, prompt engineering, fine-tuning, embeddings |
| 7 | Mamba and SSMs for RAG | What the generation backbone change means |
| 8 | PageIndex and Vectorless RAG | Reasoning-based retrieval for professional documents |