public · MIT · applied ai
Sentinel
AI code review with hybrid retrieval and a deterministic eval harness.
Eval fixtures: 98 (hand-curated PR ground truth)
Categories scored: 4 (security · bug · perf · style)
CI gate: −5% F1 (any category regression fails build)
The problem
The market is full of "AI code review" wrappers around a prompt. The hard part is not generating text — it is knowing whether the system catches real issues without fooling yourself. Sentinel separates the production pipeline, a deterministic scorer, and a curated eval set so quality is measurable, not vibed.
Architecture
GitHub webhook (HMAC verified, X-GitHub-Delivery idempotent)→
FastAPI service→
hybrid retrieval over PR history: BM25 for exact identifiers + pgvector dense embeddings, fused via RRF→
structured Pydantic v2 review with cost guardrails (daily budget + per-PR cap + circuit breaker)→
deterministic scorer over 98 fixtures yielding per-category P/R/F1→
CI gate that fails the build if any category regresses >5%.
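The webhook entry point can be sketched as follows. This is a minimal illustration, not Sentinel's actual code: it assumes the signature arrives in GitHub's `X-Hub-Signature-256` header, and `verify_signature` and `DeliveryLog` are hypothetical names. A real service would persist delivery IDs (for example in PostgreSQL) rather than in memory.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a GitHub-style 'sha256=<hex>' signature against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest takes time independent of where the inputs differ,
    # so the check does not leak the expected digest through timing.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


class DeliveryLog:
    """In-memory stand-in (assumption) for a persistent store of delivery IDs."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_time(self, delivery_id: str) -> bool:
        """Return True only the first time an X-GitHub-Delivery ID is seen."""
        if delivery_id in self._seen:
            return False
        self._seen.add(delivery_id)
        return True
```

A handler would reject the request when `verify_signature` returns False, and skip processing when `first_time` returns False, so a redelivered webhook does not trigger a second review.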
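To illustrate the fusion step, here is a small sketch of Reciprocal Rank Fusion (RRF) over two ranked lists of document IDs. The function name `rrf_fuse` and the constant `k = 60` (a commonly used default) are assumptions; the source does not state which value Sentinel uses.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to keep output deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]    # hypothetical BM25 ranking
dense_hits = ["b", "d", "a"]   # hypothetical pgvector ranking
print(rrf_fuse([bm25_hits, dense_hits]))  # ['b', 'a', 'd', 'c']
```

RRF uses only ranks, not raw scores, so BM25 scores and embedding distances never need to be normalized onto a common scale.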
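The three cost guardrails could be combined in a single check along these lines. This is a simplified sketch under stated assumptions: `CostGuard`, its field names, and the failure threshold of 3 are hypothetical, and the daily reset and circuit-breaker cooldown that a real service needs are omitted.

```python
from dataclasses import dataclass


@dataclass
class CostGuard:
    daily_budget_usd: float
    per_pr_cap_usd: float
    max_consecutive_failures: int = 3  # assumed threshold for opening the circuit
    spent_today_usd: float = 0.0
    consecutive_failures: int = 0

    def allow(self, estimated_cost_usd: float) -> bool:
        """Return True if a review with this estimated cost may run."""
        if self.consecutive_failures >= self.max_consecutive_failures:
            return False  # circuit breaker is open
        if estimated_cost_usd > self.per_pr_cap_usd:
            return False  # this single PR would exceed its cap
        # Daily budget: reject if this review would push spend past the limit.
        return self.spent_today_usd + estimated_cost_usd <= self.daily_budget_usd

    def record_success(self, actual_cost_usd: float) -> None:
        self.spent_today_usd += actual_cost_usd
        self.consecutive_failures = 0  # a success closes the circuit again

    def record_failure(self) -> None:
        self.consecutive_failures += 1
```

Checking the guard before the model call, rather than after, means an oversized PR or an exhausted budget costs nothing.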
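The scorer and CI gate can be sketched as below. Several details are assumptions, not taken from the source: a finding is modeled as a `(category, location)` pair matched exactly, the 5% threshold is read as an absolute drop of 0.05 in F1, and the names `score` and `regressions` are hypothetical.

```python
from collections import Counter

CATEGORIES = ("security", "bug", "perf", "style")
Finding = tuple[str, str]  # (category, hypothetical location key such as "app.py:42")


def score(expected: list[set[Finding]], predicted: list[set[Finding]]) -> dict[str, dict[str, float]]:
    """Compute per-category precision, recall, and F1 across all fixtures."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for want, got in zip(expected, predicted):
        for category, _ in want & got:
            tp[category] += 1  # found and expected
        for category, _ in got - want:
            fp[category] += 1  # reported but not in ground truth
        for category, _ in want - got:
            fn[category] += 1  # in ground truth but missed
    report = {}
    for c in CATEGORIES:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def regressions(baseline: dict, current: dict, max_drop: float = 0.05) -> list[str]:
    """List categories whose F1 fell by more than max_drop relative to baseline."""
    return [c for c in CATEGORIES if baseline[c]["f1"] - current[c]["f1"] > max_drop]
```

In CI, a script would call `regressions` against a stored baseline report and exit with a non-zero status when the list is non-empty. Because the scorer involves no model calls or randomness, the same predictions always produce the same report.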
Key decisions
Python · FastAPI · Next.js · PostgreSQL · pgvector · BM25 · Docker