II · Writing
Writing.
Technical essays on coding agents, evaluation methodology, and shipping with AI. The work behind the work.
The three failure modes I see in every coding agent
2026-05-07 · 5 min
After designing adversarial tasks for coding-agent qualifications, I keep seeing the same patterns across Claude Code, Codex CLI, and Gemini CLI.
coding agents · failure modes · red-teaming
Why I built an executable benchmark for coding agents
2026-05-07 · 5 min
LLM-as-judge fails on code. Jest tests don't. The eval design behind the Coding-Agent Shootout.
evals · coding agents · methodology