Open-Source AI Coding
Benchmark Suite
I'm open-sourcing the benchmark suite I use to evaluate AI coding performance on real development work. Join the waitlist and I'll send you the repo, setup guide, and launch note as soon as it's ready.
What You'll Get at Launch
Everything you need to inspect the benchmark, run it locally, and use the same framework with your own models.
Real-World Tasks
Benchmark models on actual bug fixes, refactors, and feature builds instead of toy prompts and synthetic scores.
Reproducible Runs
Use the same benchmark flow I use to baseline model performance and compare runs fairly.
Open Repo + Docs
Get the benchmark code, setup notes, and starter guidance so you can run the suite yourself.
Launch Updates
Be the first to know when the repo goes live and when new benchmark cases and results are added.
Get Notified When the Repo Is Ready
Join the waitlist and I'll email you as soon as the benchmark repo is public, documented, and ready to use.