Evals for Next.js up to 15.5.6 to test AI model competency at Next.js
by vercel · TypeScript
Last 12 weeks · 104 commits
2 of 6 standards met
Adds experiment configs and results for Claude Sonnet 4.6, both with and without AGENTS.md. Base pass rate is 70% (14/20). With AGENTS.md the pass rate jumps to 100% (20/20), a +30 point delta. The six evals that flipped from fail to pass with docs are , , , , , and . No regressions. The map and in are updated to include the new experiments.
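The baseline/docs comparison above (14/20 → 20/20, a +30 point delta, six flips, no regressions) can be sketched as a small computation. The `Results` shape and the `docsImpact` helper below are assumptions for illustration, not the actual types used in vercel/next-evals-oss:

```typescript
// Hypothetical result shape: eval name -> passed. The real experiment
// files in the repo may store results differently.
type Results = Record<string, boolean>;

interface DocsImpact {
  baselineRate: number;   // percent passed without AGENTS.md
  docsRate: number;       // percent passed with AGENTS.md
  delta: number;          // percentage-point change
  newlyPassed: string[];  // failed at baseline, passed with docs
  newlyFailed: string[];  // passed at baseline, failed with docs (regressions)
}

function docsImpact(baseline: Results, withDocs: Results): DocsImpact {
  const names = Object.keys(baseline);
  const rate = (r: Results) =>
    (100 * names.filter((n) => r[n]).length) / names.length;
  return {
    baselineRate: rate(baseline),
    docsRate: rate(withDocs),
    delta: rate(withDocs) - rate(baseline),
    newlyPassed: names.filter((n) => !baseline[n] && withDocs[n]),
    newlyFailed: names.filter((n) => baseline[n] && !withDocs[n]),
  };
}
```

With 20 evals where 14 pass at baseline and all 20 pass with docs, `delta` comes out to 30, `newlyPassed` lists the six flipped evals, and an empty `newlyFailed` is exactly the "no regressions" claim.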
Repository: vercel/next-evals-oss
Description: Evals for Next.js up to 15.5.6 to test AI model competency at Next.js
Stars: 232 · Forks: 32
Primary language: TypeScript · Languages: TypeScript (90%), JavaScript (8.7%), CSS (1.3%)
License: MIT · Homepage: https://nextjs.org/evals
Open PRs: 10 · Open issues: 5 · Last activity: 3d ago · Community health: 62%
Top contributors: gaojude, mclenhard, vercel[bot], timneutkens, quuu, elsigh
Adds , an eval that tasks the agent with making client-side navigation to a products page instant using `unstable_instant`, `getProducts` from `@/lib/data`, and the `window.__EXPERIMENTAL_NEXT_TESTING__.navigation.lock()` / `unlock()` testing hooks. The description also references `@sparticuz/chromium`, `puppeteer-core`, `dnf`, `VERCEL=1`, `/tmp`, `instant(page, fn)`, `navigation-testing-lock.js`, `node_modules/next/dist/docs`, `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, and an `unstable_instant` export.
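The lock/unlock testing hook can be modeled in a few lines. The real hook lives inside Next.js behind `window.__EXPERIMENTAL_NEXT_TESTING__`; everything below (class name, methods, blocking behavior) is an illustrative assumption, not the actual implementation:

```typescript
// Illustrative stand-in for a navigation lock like the
// window.__EXPERIMENTAL_NEXT_TESTING__.navigation hook the eval uses.
class NavigationLock {
  private depth = 0;
  private blocked: string[] = [];

  // lock()/unlock() nest, so overlapping test phases don't fight.
  lock(): void {
    this.depth++;
  }

  unlock(): void {
    if (this.depth === 0) throw new Error("unlock() without matching lock()");
    this.depth--;
  }

  // A router would consult the lock before performing a real navigation.
  // While locked, the attempt is recorded instead of executed, so a test
  // harness can inspect what an interaction tried to do.
  tryNavigate(url: string): boolean {
    if (this.depth > 0) {
      this.blocked.push(url);
      return false;
    }
    return true;
  }

  blockedNavigations(): readonly string[] {
    return this.blocked;
  }
}
```

A helper in the spirit of `instant(page, fn)` might lock, run the interaction under `puppeteer-core`, then assert on the recorded attempts before unlocking; the exact semantics are the eval's, not this sketch's.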
Summary
- Add experiment configs and results for 7 models (Claude Opus 4.6, Claude Sonnet 4.5, Cursor Composer 1.5, Gemini 3.0 Pro Preview via Gemini CLI, Gemini 3.0 Pro Preview via OpenCode, GPT 5.2 Codex xhigh, GPT 5.3 Codex xhigh)
- Update export script to merge variants into their base experiments, attaching a field with baseline/docs success rates and newly passed/failed evals
- Standardize all agents-md experiments to use an identical AGENTS.md prompt with BEGIN/END markers and deprecation notices
- Add GEMINI.md to all agents-md experiments
- Fix agent-023 eval: restore starting code to unsolved state, use absolute URL, remove cacheComponents trap
- Bump @vercel/agent-eval to 0.8.0
- Bump all evals to next@16.2.0-canary.41

Results (with AGENTS.md)
- **Zero regressions**: no eval that passed without AGENTS.md started failing with it.

Test plan
- [x] All 7 pairs correctly matched and exported with docsImpact
- [x] All agents-md experiments use standardized prompt
- [x] GEMINI.md created for all agents
- [x] agent-023 eval restored to unsolved starting state
- [x] Bumped @vercel/agent-eval to 0.8.0
- [x] All evals use next@16.2.0-canary.41
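The standardization step above keeps one shared prompt between BEGIN/END markers across AGENTS.md, CLAUDE.md, and GEMINI.md. A minimal sketch of that idea, assuming hypothetical marker strings (the repo's actual markers and tooling may differ):

```typescript
// Sketch: keep a shared prompt block inside an agents file between
// BEGIN/END markers. Marker text here is an assumption, not the
// repo's actual markers.
const BEGIN = "<!-- BEGIN NEXT-EVALS PROMPT -->";
const END = "<!-- END NEXT-EVALS PROMPT -->";

function withSharedPrompt(fileContents: string, prompt: string): string {
  const block = `${BEGIN}\n${prompt}\n${END}`;
  const start = fileContents.indexOf(BEGIN);
  const end = fileContents.indexOf(END);
  if (start !== -1 && end !== -1) {
    // Replace the existing marked block in place, keeping surrounding text.
    return (
      fileContents.slice(0, start) +
      block +
      fileContents.slice(end + END.length)
    );
  }
  // No markers yet: append the block.
  return fileContents.trimEnd() + "\n\n" + block + "\n";
}
```

Because replacement is idempotent, the same function can regenerate every agents file from one source prompt, which is what makes "identical AGENTS.md prompt" enforceable across experiments.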
This PR amends .gitignore, changing from to , because a number of outputs from Claude Code and dry runs from Grok had been committed. It also adds `.DS_Store` so that stray macOS metadata files aren't committed. I've also removed all of those outputs, as I believe they were included accidentally. Please shout if there's a contributor guide or standard you'd like me to follow, as I will likely have a few more PRs coming.