RAG policy knowledge platform

Production retrieval-augmented generation system grounding GPT-4 responses in 2,000+ proprietary insurance policy documents. Reduced agent handle time by 41%.

Tech stack

OpenAI API · Pinecone · LangChain · FastAPI · Apache Airflow · Python · MLflow · Docker · AWS ECS · PostgreSQL

Problem

Insurance agents spent 11 minutes per call locating policy clauses across a fragmented document store of 2,000+ PDFs. The existing keyword search returned long, unranked result lists with no source citations.

Architecture

The system is a two-stage RAG pipeline: an offline ingestion stage and an online query path.

Ingestion (Airflow, weekly): PDF text extraction → sliding-window chunking (512 tokens, 10% overlap) → OpenAI text-embedding-3-large → upsert to Pinecone with structured metadata (policy_id, effective_date, clause_type).
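The sliding-window chunking step can be sketched as below. This is a minimal illustration, not the production code: whitespace-split words stand in for real model tokens, and the 10% overlap is approximated by advancing the window by 90% of its size each step.

```python
def chunk_tokens(tokens, size=512, overlap_frac=0.10):
    """Sliding-window chunking: fixed-size windows with ~10% overlap.

    `tokens` is any sequence (here, a list standing in for tokenizer
    output). Consecutive chunks share roughly `size * overlap_frac`
    tokens, so no clause is split without context on either side.
    """
    step = int(size * (1 - overlap_frac))  # advance ~461 tokens per window
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + size]
        if not window:
            break
        chunks.append(window)
        if start + size >= len(tokens):  # final window reached the end
            break
    return chunks


# Each chunk is then embedded and upserted to Pinecone with its metadata;
# the record shape below mirrors the fields named above (policy_id,
# effective_date, clause_type), with placeholder values for illustration.
def to_record(policy_id, chunk_index, embedding, metadata):
    return {
        "id": f"{policy_id}-{chunk_index}",
        "values": embedding,           # vector from text-embedding-3-large
        "metadata": metadata,          # policy_id, effective_date, clause_type
    }
```

Overlap means a clause falling on a chunk boundary still appears intact in at least one window, which matters for retrieval recall on legal text.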

Query (FastAPI, real-time): user query → embedding → top-8 Pinecone ANN retrieval → cross-encoder re-ranking (ms-marco-MiniLM) → GPT-4 generation with strict citation prompt → structured JSON response with source refs.
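The online path above can be sketched as a single function with the model calls injected as parameters. The stubs in the demo are placeholders: the real components are the OpenAI embedding API, Pinecone ANN search, the ms-marco-MiniLM cross-encoder, and GPT-4 with the citation prompt.

```python
def answer_query(query, embed, retrieve, rerank, generate, k=8, top_n=3):
    """Embed → top-k ANN retrieval → cross-encoder re-rank → generation.

    Returns a structured response with source refs, mirroring the
    JSON-with-citations contract described above. `top_n` (a sketch
    parameter, not from the original) caps the re-ranked context.
    """
    qvec = embed(query)
    candidates = retrieve(qvec, k)            # top-8 nearest chunks
    scored = rerank(query, candidates)        # (score, chunk) pairs
    scored.sort(key=lambda pair: pair[0], reverse=True)
    context = [chunk for _, chunk in scored[:top_n]]
    answer = generate(query, context)         # citation-constrained prompt
    return {
        "answer": answer,
        "sources": [c["metadata"]["policy_id"] for c in context],
    }


# Demo with stub components (stand-ins for the real models/services):
stub = answer_query(
    "Is flood damage covered under the standard homeowner policy?",
    embed=lambda q: [0.0],
    retrieve=lambda vec, k: [
        {"text": f"chunk {i}", "metadata": {"policy_id": f"POL-{i}"}}
        for i in range(k)
    ],
    rerank=lambda q, chunks: [(float(i), c) for i, c in enumerate(chunks)],
    generate=lambda q, ctx: "cited answer",
)
```

Separating retrieval from re-ranking this way keeps the cheap ANN stage wide (top-8) and lets the slower cross-encoder spend its latency budget on only a handful of candidates.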

Evaluation: offline RAGAS harness against 200 manually labelled QA pairs, tracked in MLflow. Every prompt or retrieval-config change had to pass evaluation before deployment.
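The pre-deployment gate implied above can be sketched as a simple threshold check over the metrics the harness reports. The metric names and threshold values here are illustrative, not taken from the original.

```python
def passes_gate(metrics, thresholds):
    """Deployment gate: every tracked metric must meet its threshold.

    `metrics` are scores from the offline eval run (e.g. RAGAS
    faithfulness on the labelled QA set); `thresholds` encode the
    minimum acceptable values. A missing metric fails the gate.
    """
    return all(
        name in metrics and metrics[name] >= minimum
        for name, minimum in thresholds.items()
    )


# Illustrative run: a candidate config's scores vs. hypothetical floors.
candidate = {"faithfulness": 0.992, "answer_relevancy": 0.94}
floors = {"faithfulness": 0.98, "answer_relevancy": 0.90}
```

Gating on the golden set turns prompt and retrieval tweaks into regression-tested changes rather than eyeballed ones, which is what keeps the hallucination rate measurable release over release.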

Results

  • Answer faithfulness: 99.2% (0.8% hallucination rate on golden set).
  • Agent handle time: -41% (11 min → 6.5 min).
  • p95 query latency: 4.2 seconds end-to-end.

Diagrams

[Two architecture diagrams: the offline ingestion pipeline and the online query pipeline.]