Real-time fraud detection lakehouse
Streaming lakehouse on Apache Iceberg + Kafka processing 180K events/sec for a European payments processor, feeding ML fraud scores back to the transaction engine in under 200ms.
Tech stack
Debezium · Kafka (Amazon MSK) · Apache Flink · Apache Iceberg on S3 · Redis · XGBoost · FastAPI on ECS · MLflow · Airflow · PostgreSQL
Problem
A payments processor was detecting fraud with a nightly batch job; by the time it ran each morning, €2M+ in fraudulent transactions had already cleared and had to be reversed. The new SLA target: a fraud score delivered to the transaction engine within 200ms of event ingestion.
Architecture
Ingestion: Debezium CDC from core banking PostgreSQL → Kafka (MSK) → Flink consumer writing raw events to Iceberg on S3 (bronze layer). Throughput: 180,000 events/second at peak.
Feature pipeline: Flink stateful operators computing 48 real-time features per transaction (velocity checks, geo-anomaly, merchant risk score) with 30-day sliding windows materialised in Iceberg snapshots.
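As one concrete example of the velocity checks, a per-card transaction count and amount sum over a sliding window can be held in keyed state and updated per event. A sketch of the window logic in plain Python (the Flink job keeps equivalent per-key state; class and field names are illustrative):

```python
from collections import defaultdict, deque

class VelocityWindow:
    """Per-card sliding-window velocity features (count + amount sum).

    Mirrors what a keyed stateful operator would hold; here the window
    is in-process and driven by event time.
    """
    def __init__(self, window_ms: int):
        self.window_ms = window_ms
        self.events = defaultdict(deque)  # card_id -> deque[(ts_ms, amount)]

    def update(self, card_id: str, ts_ms: int, amount: int) -> dict:
        q = self.events[card_id]
        q.append((ts_ms, amount))
        while q and q[0][0] <= ts_ms - self.window_ms:  # evict expired events
            q.popleft()
        return {"txn_count": len(q), "amount_sum": sum(a for _, a in q)}

w = VelocityWindow(window_ms=3_600_000)  # 1-hour window for illustration
w.update("c-9", 0, 1000)
print(w.update("c-9", 1_800_000, 500))   # both events still inside the window
print(w.update("c-9", 4_000_000, 200))   # first event has expired
```

The production features use the same evict-then-aggregate shape, just with 30-day windows checkpointed to Iceberg snapshots rather than an in-memory deque.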
Scoring: online feature store (Redis) populated from the Flink pipeline; XGBoost model served via FastAPI on ECS; model artefacts versioned in MLflow, promoted via a shadow-mode A/B framework before production cutover.
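The shadow-mode framework amounts to: every transaction is scored by the production (champion) model and, silently, by the candidate (challenger); only the champion's score is returned to the transaction engine, while both are logged for offline comparison before cutover. A hedged sketch with stand-in models (the real service loads versioned XGBoost artefacts from MLflow):

```python
from typing import Callable

ScoreFn = Callable[[dict], float]

def make_shadow_scorer(champion: ScoreFn, challenger: ScoreFn, log: list):
    """Score with both models; act only on the champion's output."""
    def score(features: dict) -> float:
        champ = champion(features)
        shadow = challenger(features)          # never affects the decision
        log.append({"champion": champ, "challenger": shadow})
        return champ                           # only this reaches the engine
    return score

# Stand-in models for illustration; production calls XGBoost predict_proba.
champion = lambda f: min(1.0, f["amount_sum"] / 100_000)
challenger = lambda f: min(1.0, f["txn_count"] / 50)

audit_log: list = []
scorer = make_shadow_scorer(champion, challenger, audit_log)
print(scorer({"amount_sum": 25_000, "txn_count": 40}))  # 0.25 (champion only)
```

Because the challenger's scores accumulate in the log without influencing decisions, precision/recall on live traffic can be measured before the challenger is promoted.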
Feedback loop: fraud labels from the chargeback system written back via Kafka → Iceberg gold layer → weekly model retraining Airflow DAG.
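The core step of the weekly retraining DAG is a label join: chargeback outcomes, which arrive days after the transaction, are matched back to the original feature rows; transactions with no chargeback inside the label window are treated as negatives. A minimal sketch of that join (function and field names are illustrative, not the DAG's actual task names):

```python
def build_training_set(transactions: list[dict], chargebacks: set[str]) -> list[dict]:
    """Join delayed fraud labels onto transaction feature rows.

    A transaction is labelled 1 if a chargeback was filed against it
    within the label window, else 0 (presumed legitimate).
    """
    return [{**txn, "label": int(txn["txn_id"] in chargebacks)}
            for txn in transactions]

txns = [{"txn_id": "t-1", "amount_cents": 4200},
        {"txn_id": "t-2", "amount_cents": 900}]
charged_back = {"t-2"}
print(build_training_set(txns, charged_back))
```

In the pipeline this join runs over the Iceberg gold layer, and the resulting labelled set feeds the XGBoost retraining job.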
Results
- Fraud detection latency: batch (8+ hours) → 180ms p95.
- Fraud loss rate: down 23% in the first 90 days post-launch.
- False positive rate: 0.4%, below the 0.5% SLA target.
- System handles Black Friday peaks (3× normal throughput) with no horizontal scaling changes.