Senior PM · 15 Years · CFA · Now Building AI Products

From user problem
to shipped product.

Senior PM (Societe Generale, CFA) shipping eval-first AI products end to end — from grounded RAG to multi-repo anomaly detection across the Nasdaq-100.

PM Philosophy
Ground every claim, then measure it

AI that can't cite its source is a liability. I build with provenance first, then put an eval suite around it so quality is a number, not a vibe.

Ship to learn, not to finish

Every release is a question to the market. I design for fast feedback loops — not for perfect launches.

PMs who build earn trust

I write code, ship prototypes, and build tooling. Engineers respect PMs who understand what they're asking for.

1,900+
Filings scored across the Nasdaq-100
700+
Grounded, cited sources in production RAG
33→90%
Retrieval precision, via eval-driven tuning
56×
Warm-search latency cut (11.2s → 0.2s)
Featured Project

RedInk

Eval-first financial anomaly detection across the Nasdaq-100 — where the signal isn't the anomaly, it's the gap between the numbers and the story management tells about them.


More Work

Also Building

Each one a rung on the same ladder — every link below is live.

Grounded RAG · Live on Cloud Run
PM Confessional

A grounded "Decision Coach" over 700+ verbatim PM regrets from Lenny's Podcast. Precision 33%→90% via prompt iteration; confidence threshold tuned to refuse rather than guess (top-1 relevance 30%→80%).
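A minimal sketch of the refuse-rather-than-guess idea, assuming each retrieved passage carries a similarity score in [0, 1]. The `Hit` type, the `answer_or_refuse` name, and the 0.75 default are illustrative assumptions, not the values used in PM Confessional.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source_id: str   # where the quote came from, kept for citation
    text: str        # the verbatim passage
    score: float     # assumed similarity in [0, 1]; higher is more relevant


def answer_or_refuse(hits: list[Hit], threshold: float = 0.75) -> dict:
    """Return the best hit with its source, or refuse when confidence is low."""
    if not hits:
        return {"status": "refused", "reason": "no candidates"}
    best = max(hits, key=lambda h: h.score)
    if best.score < threshold:
        # Below the threshold we decline instead of returning a weak match.
        return {
            "status": "refused",
            "reason": f"best score {best.score:.2f} is below {threshold:.2f}",
        }
    return {"status": "answered", "source_id": best.source_id, "quote": best.text}
```

Raising the threshold trades coverage for precision: more queries are refused, but the ones that are answered are more likely to be relevant.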

0→1 Web App · Live on Cloud Run
Strategic Fit Canvas

Résumé + JD → an AI-scored candidate-fit radar, with auth, batch processing, an analytics dashboard, and a feedback loop. Kept deliberately raw — the "before" of the arc.

Open Source · Python · MCP
Product Management OS

An open-source AI system that reads your goals + backlog and surfaces what to work on next. The meta-tool that runs this whole portfolio — building the system that builds the products.


The Arc

A deliberate progression

Each product taught the next. Read top to bottom: the capability ladder from shipping to systems to agents.

0→1 · Raw
Strategic Fit Canvas
Shipping a real web app end to end — auth, uploads, deploy.
Grounding + Evals
PM Confessional
RAG with provenance; precision auditing; threshold tuning; a 56× latency fix.
Eval-first Systems
RedInk · Flagship
Multi-repo architecture; LLM-judge eval suite; narrative divergence; telemetry.
Agentic
RedInk v2 (in progress)
Multi-step call-vs-filing agent; labelling a golden set before building.
Meta / Infra
Product Management OS
Building the system that builds the products — and learning in public.

Learned to ship → learned to measure → learned to build systems → learned to make them agentic. All in public.


The AI-PM Bar

What each product proves

The capabilities a senior AI PM is expected to own — and where each shipped product demonstrates them.

Product · Grounding / RAG · Evals · Agentic · Telemetry · Shipped live
RedInk
PM Confessional
Strategic Fit Canvas
Product Management OS
Legend: Demonstrated · Partial / in progress · Not the focus

Build in Public

I ship the lessons, not just the products

A few from the feed — each one a real decision from building these tools, in the open.

Evals
"A 4.9/5 became 42% FAIL the moment I switched to binary."

Numerical rubrics smooth over the failures you most need to see. Moving RedInk's eval suite to binary PASS/FAIL (per Hamel Husain's method) surfaced a routing bug — the model pointing analysts to the wrong filing artifact — and cut the failure rate 41% in one iteration, with no prompt change.

RedInk · 3,585 impressions
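The Evals post above argues that an averaged rubric can hide failures that a binary check exposes. A small sketch of that effect follows, using made-up records (not RedInk data) and a hypothetical check name, `points_to_right_filing`.

```python
from statistics import mean

# Illustrative records only. Each has a 1-5 rubric score plus named
# binary checks; a record passes only if every check is True.
graded = [
    {"id": "q1", "rubric_score": 5, "checks": {"points_to_right_filing": True}},
    {"id": "q2", "rubric_score": 5, "checks": {"points_to_right_filing": False}},
    {"id": "q3", "rubric_score": 5, "checks": {"points_to_right_filing": True}},
    {"id": "q4", "rubric_score": 4, "checks": {"points_to_right_filing": False}},
    {"id": "q5", "rubric_score": 5, "checks": {"points_to_right_filing": True}},
]


def rubric_average(records):
    """Mean of the numeric rubric scores."""
    return mean(r["rubric_score"] for r in records)


def binary_fail_rate(records):
    """Share of records with at least one failed check."""
    failed = [r for r in records if not all(r["checks"].values())]
    return len(failed) / len(records)


def failures_by_check(records):
    """Count failures per check name, to show where the problem clusters."""
    counts = {}
    for r in records:
        for name, ok in r["checks"].items():
            if not ok:
                counts[name] = counts.get(name, 0) + 1
    return counts
```

On this toy data the rubric average stays high while two of the five records fail the binary check, and the per-check counts point at a single failure mode.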
Precision
"I built a gate, not a filter."

Three independent anomaly signals, including one that teaches the system to notice what management chose not to address. ALERT fires only when all three agree. Send one false positive and an analyst forgives you; send three and they stop opening your alerts. Precision is a trust problem before it's a technical one.

RedInk · 687 impressions
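The gate described in the Precision post above can be sketched as a logical AND over three signals. The signal names below are hypothetical placeholders; the post only says there are three independent signals, one of which concerns what management left unaddressed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    # Hypothetical names for the three independent signals.
    numeric_outlier: bool
    narrative_divergence: bool
    unaddressed_by_management: bool


def gate(signals: Signals) -> str:
    """Fire ALERT only when every signal agrees (a gate)."""
    votes = (
        signals.numeric_outlier,
        signals.narrative_divergence,
        signals.unaddressed_by_management,
    )
    return "ALERT" if all(votes) else "NO_ALERT"


def loose_filter(signals: Signals) -> str:
    """For contrast: fire when any single signal trips (a filter)."""
    votes = (
        signals.numeric_outlier,
        signals.narrative_divergence,
        signals.unaddressed_by_management,
    )
    return "ALERT" if any(votes) else "NO_ALERT"
```

Requiring agreement lowers recall but raises precision, which is the trade the post argues for when analyst trust is the constraint.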
Latency
"I was calling an LLM because I could — not because every query needed it."

PM Confessional's search took 11 seconds. I dropped warm searches to about 0.2s by skipping the expensive rerank when internal confidence is already high, falling back to Gemini Flash-Lite when it isn't, and caching embeddings. The fix was judgment about when to spend the call, not a faster model.

PM Confessional · 1,084 impressions
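A rough sketch of the pattern in the Latency post above: cache embeddings, return the vector ranking when the top score is confident, and call the expensive reranker only otherwise. Everything here is an assumption for illustration: `embed` is a toy letter-frequency stand-in for a real embedding model, `rerank` is any callable supplied by the caller (standing in for the LLM fallback), and the 0.8 cutoff is arbitrary.

```python
from functools import lru_cache
from typing import Callable, Sequence

CONFIDENCE_CUTOFF = 0.8  # assumed value, not the production setting


@lru_cache(maxsize=4096)
def embed(text: str) -> tuple[float, ...]:
    """Toy embedding: a unit-length letter-frequency vector, cached per text."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = sum(c * c for c in counts) ** 0.5 or 1.0
    return tuple(c / norm for c in counts)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def search(
    query: str,
    docs: Sequence[str],
    rerank: Callable[[str, list[str]], list[str]],
    cutoff: float = CONFIDENCE_CUTOFF,
) -> tuple[list[str], bool]:
    """Return (ranked docs, whether the expensive reranker was called)."""
    if not docs:
        return [], False
    q = embed(query)
    scored = sorted(((cosine(q, embed(d)), d) for d in docs), reverse=True)
    ranked = [d for _, d in scored]
    if scored[0][0] >= cutoff:
        return ranked, False          # confident: skip the expensive call
    return rerank(query, ranked), True  # unsure: spend the call
```

The second element of the return value makes it easy to log how often the slow path is taken, which is the number that determines average latency.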
Prompting
"Too bullshitty. Trash. Wordy vomit."

My manager's review of an exec-summary agent I built over our org's goals, repos, and sprints. Tightening the instructions didn't fix it. Feeding it five summaries he'd actually written did: one pass later it had learned that the audience cares about clients onboarded and CSAT, not plumbing. Few-shot examples beat instruction tweaking.

Internal agent · build-in-public
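The few-shot approach in the Prompting post above comes down to placing real examples ahead of the new request. A minimal sketch of assembling such a prompt is below; the function name, section labels, and placeholder strings are assumptions, not the prompt the internal agent uses.

```python
def build_few_shot_prompt(task: str, examples: list[str], context: str) -> str:
    """Assemble a prompt: task, then worked examples, then the new input."""
    parts = [task.strip(), ""]
    for i, example in enumerate(examples, start=1):
        parts.append(f"Example summary {i}:")
        parts.append(example.strip())
        parts.append("")
    parts.append("Context for the new summary:")
    parts.append(context.strip())
    parts.append("")
    parts.append("New summary:")
    return "\n".join(parts)


# Placeholder inputs; real use would pass summaries the reader actually wrote.
prompt = build_few_shot_prompt(
    task="Write an executive summary in the style of the examples.",
    examples=["<summary written by the manager>", "<another real summary>"],
    context="<this sprint's goals, repos, and outcomes>",
)
```

The examples carry the style and priorities implicitly, so the task instruction itself can stay short.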
Read all posts on LinkedIn ↗

Product Philosophy

How I Think

The principles that shape every product decision I make.

01
User problems over feature requests

Features are solutions in search of problems. I spend more time understanding why users behave the way they do than cataloguing what they ask for. The best product insights live one question deeper — "why does that matter to you?" is where the real brief is.

02
Quality is a number, not a vibe

For AI products especially, "it feels better" isn't shippable. I put an eval harness around the thing — precision, TPR/TNR, an LLM-as-judge with a golden set — so I can tell whether a prompt change helped, regressed, or just moved the demo. Metrics inform; judgment decides; but I refuse to fly blind.
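As a concrete reading of "quality is a number", here is a small sketch that scores predictions against a golden set and reports precision, TPR, and TNR. The data shape (item id mapped to a boolean label) and the function names are assumptions for illustration.

```python
def confusion(golden: dict[str, bool], predicted: dict[str, bool]) -> dict[str, int]:
    """Count true/false positives and negatives over the golden set."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for item_id, truth in golden.items():
        guess = predicted[item_id]
        if truth and guess:
            counts["tp"] += 1
        elif truth and not guess:
            counts["fn"] += 1
        elif not truth and guess:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts


def ratio(numerator: int, denominator: int):
    """Return numerator / denominator, or None when undefined."""
    return numerator / denominator if denominator else None


def metrics(golden: dict[str, bool], predicted: dict[str, bool]) -> dict:
    c = confusion(golden, predicted)
    return {
        "precision": ratio(c["tp"], c["tp"] + c["fp"]),  # flagged items that were right
        "tpr": ratio(c["tp"], c["tp"] + c["fn"]),        # real positives caught
        "tnr": ratio(c["tn"], c["tn"] + c["fp"]),        # real negatives left alone
    }
```

Running `metrics` before and after a prompt change on the same golden set gives a direct comparison of whether the change helped or regressed.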

03
Build the smallest thing that proves the point

Momentum beats perfection. I bias toward shipping a prototype that answers the key risk question over writing a detailed spec for a product no one has validated — then I iterate fast on real signal. Scoping RedInk to the top anomalies instead of all 1,371 filings was exactly this call.


Skills

What I Bring

Product
Product Strategy
Prioritisation
User Research
PRDs & Specs
Go-to-Market
AI & Evals
Evals & Measurement
Grounding / RAG
Agentic Workflows
Prompt Engineering
Build & Cloud
Google Cloud (GCP)
Claude API
MCP Protocol
Python

Get in touch

Let's build something worth trusting.

I'm looking for senior AI PM roles where grounding, evals, and shipping matter. If that's the bar you're hiring for, let's talk.