AI Platform Lead
Remote (Madrid, ES)
Tech stack
Python, Apache Airflow, pdfminer, OpenAI GPT-4, text-embedding-3-large, Pinecone, cross-encoder re-ranking (ms-marco-MiniLM), NeMo Guardrails, RAGAS, MLflow.
Context
Velantis Insurance is a mid-size Spanish insurer with a 60-year-old product catalogue. Customer service agents spent an average of 11 minutes per call locating the correct clause across 2,000+ policy documents stored as PDFs. The AI team was a greenfield hire: I was employee #1.
Challenge
The task was to design an AI-assisted knowledge retrieval system that could answer natural-language questions about policy coverage with source citations, in under five seconds, with accuracy high enough for agent use (we set the bar at <2% harmful hallucinations). Budget was constrained: no GPU cluster, no proprietary LLM fine-tuning.
Architectural decisions
I designed a retrieval-augmented generation (RAG) pipeline using OpenAI GPT-4 as the generation backbone and Pinecone as the vector store:
- Document processing pipeline: a weekly Apache Airflow DAG handles PDF extraction via pdfminer, chunking with sliding windows (512 tokens, 10% overlap), embedding via text-embedding-3-large, and upsert to Pinecone with metadata (policy_id, effective_date, clause_ref).
- Query pipeline: user query → embedding → top-8 Pinecone retrieval → re-ranking with a cross-encoder (ms-marco-MiniLM) → GPT-4 with a system prompt enforcing the citation format.
- Evaluation framework: an offline evaluation harness built around a manually curated golden set of 200 QA pairs; answer relevance (RAGAS), faithfulness, and context precision tracked in MLflow, re-run on every prompt or retrieval change.
- Guardrails: NeMo Guardrails for off-topic deflection; a source citation is required in every response (hallucination rate fell to 0.8%).
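The sliding-window chunking step can be sketched as follows. This is a minimal illustration: it operates on an already-tokenized list (the real pipeline would use the embedding model's tokenizer), and `chunk_tokens` is a hypothetical helper, not the production code.

```python
def chunk_tokens(tokens, window=512, overlap=0.10):
    """Split a token sequence into overlapping windows (10% overlap by default)."""
    step = int(window * (1 - overlap))  # advance ~90% of the window each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):  # last window already reached the end
            break
    return chunks

# A 1,000-token document yields three overlapping chunks.
chunks = chunk_tokens(list(range(1000)))
```

Each chunk would then be embedded and upserted to Pinecone together with its metadata.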
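The retrieve-then-rerank step can be sketched as below. The cross-encoder is abstracted behind a `score_fn` parameter so the sketch stays self-contained; in the real pipeline this would be the ms-marco-MiniLM cross-encoder scoring (query, chunk) pairs. Both `rerank` and the toy word-overlap scorer are illustrative assumptions.

```python
def rerank(query, candidates, score_fn, top_k=3):
    """Re-order retrieved chunks by relevance score, highest first."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

def overlap_score(query, text):
    """Toy stand-in for a cross-encoder: count shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

docs = ["flood damage clause", "pet insurance rider", "water damage coverage limits"]
top = rerank("water damage", docs, overlap_score, top_k=2)
```

The top-ranked chunks are then passed to GPT-4 as context for the cited answer.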
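The citation requirement can be enforced with a simple post-check before an answer reaches the agent. The `[POLICY-ID §clause]` format and the `has_citation` helper are illustrative assumptions, not the production guardrail configuration.

```python
import re

# Hypothetical citation format, e.g. "[HOME-2019 §4.2]" (policy_id + clause_ref)
CITATION_RE = re.compile(r"\[[A-Z0-9-]+ §\d+(\.\d+)*\]")

def has_citation(answer: str) -> bool:
    """Reject answers that cite no source clause."""
    return CITATION_RE.search(answer) is not None
```

Answers failing the check would be regenerated or deflected rather than shown to the agent.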
Outcomes
- Average handle time: 11 minutes → 6.5 minutes (-41%).
- Agent satisfaction score: 3.1 → 4.6 / 5.
- Hallucination rate on golden set: 0.8% (below the 2% target).
- System adopted by 3,000 agents across 4 contact centres within 6 months of launch.