II · Writing
Writing.
Technical essays on coding agents, evaluation methodology, and shipping with AI. The work behind the work.
The three failure modes I see in every coding agent
2026-05-07 · 5 min
Months of designing adversarial tasks surface the same failure patterns across Claude Code, Codex CLI, and Gemini CLI.
coding agents · failure modes · red-teaming
Why I built an executable benchmark for coding agents
2026-05-07 · 5 min
LLM-as-judge fails on code. Jest tests don't. The eval design behind the Coding-Agent Shootout.
evals · coding agents · methodology