Enterprise RAG Platform
Multi-tenant retrieval system over 12M+ documents with hybrid search, reranking, and citation-grounded answers. Reduced hallucinations by 64% vs baseline.
Senior AI Engineer with 8+ years architecting RAG systems, ML pipelines, and vector-search products in Python — from notebook to scaled production.
I help teams take AI from prototype to production — focusing on retrieval quality, evaluation harnesses, and the unglamorous infrastructure that keeps LLM products reliable.
Fine-tuning, evaluating, and deploying transformer models for real workloads.
Designing retrieval pipelines on pgvector, Pinecone, Weaviate, and Qdrant.
Streaming, batch, and feature pipelines with Airflow, Spark, and dbt.
Latency, cost, observability, and guardrails for LLM systems at scale.
A focused toolkit refined across eight years of building data, ML, and AI systems in production.
A snapshot of recent AI systems I designed, built, and shipped end-to-end.
Tool-using agent with planning, web search, and code execution. Built a deterministic eval harness with 200+ scenarios and CI gating on regressions.
Streaming feature pipeline serving 30+ models with point-in-time correctness. Migrated from batch to event-driven, cutting model staleness from hours to seconds.
Fine-tuned vision transformer for manufacturing QA on edge devices. An active-learning loop reduced the labeled data needed by 70%.
From data pipelines to LLM platforms — building systems that earn their keep in production.
Open to senior AI engineering, ML platform, and applied research roles. Remote-friendly, EU/US time zones.
Reach out — I usually reply within a day. Happy to share case studies, code samples, or jump on an intro call.