conformance

Conformance Tests for MCP

by modelcontextprotocol

TypeScript

74 stars · 59 forks · 34 contributors · Active · 15h ago · Since 2025 · v0.1.16

Meet the team

pcarleton: 117 contributions

felixweinberger: 29 contributions

dependabot[bot] (bot): 14 contributions

nbarbettini: 5 contributions

maxisbey: 4 contributions

mikekistler: 4 contributions

CaitieM20: 3 contributions

localden: 3 contributions

Languages

TypeScript: 98.2%

JavaScript: 1.8%

Commit activity

Last 12 weeks · 79 commits

Community health

5 of 6 standards met

✓ README · ✓ License · ✓ Contributing · ✓ Code of Conduct · ○ Issue Template · ✓ PR Template

Recent PRs & issues

Active · Last activity 15h ago

fix(server-stateless): declare the elicitation capability on the streaming-probe request (Open PR)

Fixes #382

What

The sep-2575-http-server-no-independent-requests-on-stream check drives its streaming probe by calling under the scenario's shared envelope, whose is intentionally empty. The fixture legitimately needs the elicitation capability, so a capability-enforcing server MUST reject the probe with . That rejection is the exact behavior requires and rewards earlier in the same scenario. Since #372 turned missing-prerequisite skips into hard failures, one check requires the rejection that makes the other untestable.

This PR declares the capability at the probe site only: itself stays empty because depends on it. This matches the intent recorded for this check in #296 ("Call a tool that needs sampling/elicitation, scan stream frames") and strictly widens coverage: capability-enforcing servers become testable; servers that ignore capability declarations behave exactly as before.

The sibling probe () was audited for the same latent problem: its fixture keys off , not a client capability, so it needs no change.

Self-tests (prove it passes and fails)

Two tests added to , modeled on the existing mock patterns:

- Green: a spec-correct, capability-enforcing mock (rejects without and without , each with a spec-shaped ) now passes BOTH formerly-contradictory checks. Against without the fix this test fails exactly as real servers do: the stream check reads .
- Guard: a server that rejects the streaming probe despite the declared capability still produces the red untestable outcome, not a vacuous SUCCESS.

333/333, clean.

Real-SDK evidence (per CONTRIBUTING)

Run against the PHP SDK (logiscape/mcp-sdk-php v2 ), whose everything-server enforces per-request capability declaration:

- Before: 27/28; fails:
- After: 28/28.

No regression against this repo's own example server (): 30/30 before and after (it does not enforce capabilities on this fixture, so the added declaration is inert there).
Note on sequencing

Draft PR #351 hoists/shares this same probe stream between checks; this fix is written against current and is a one-expression change at the probe site, so whichever lands first, the other rebases trivially. Happy to coordinate.

🤖 Generated with Claude Code
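The "declare the capability at the probe site only" idea in this PR can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual diff: `RequestEnvelope`, `clientCapabilities`, `sharedEnvelope`, and `withCapabilities` are hypothetical names chosen for the example, since the real identifiers are not shown on this page.

```typescript
// Hypothetical shapes for illustration; these names are assumptions and do
// not come from the conformance repo.
type ClientCapabilities = Record<string, object>;

interface RequestEnvelope {
  clientCapabilities: ClientCapabilities;
}

// Shared by every check in the scenario. It deliberately declares nothing,
// so a "reject undeclared capability" check stays meaningful.
const sharedEnvelope: RequestEnvelope = { clientCapabilities: {} };

// Build a per-request envelope that adds capabilities without mutating the
// shared one.
function withCapabilities(
  base: RequestEnvelope,
  extra: ClientCapabilities,
): RequestEnvelope {
  return {
    ...base,
    clientCapabilities: { ...base.clientCapabilities, ...extra },
  };
}

// Only the streaming probe declares elicitation.
const probeEnvelope = withCapabilities(sharedEnvelope, { elicitation: {} });
```

Copying rather than mutating keeps the shared envelope empty for the checks that depend on it, which is the property the PR description says must be preserved.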

logiscapedev · 5h ago

server-stateless: `sep-2575-http-server-no-independent-requests-on-stream` probe never declares the elicitation capability its own fixture requires (Open Issue)

Describe the bug

The scenario builds one shared request envelope with (src/scenarios/server/stateless.ts, ) and reuses it for every check. The sep-2575-http-server-no-independent-requests-on-stream check drives its streaming probe by calling with under that empty-capabilities envelope.

Per SEP-2575, a server MUST reject a request that exercises a capability the client did not declare with . That rejection is the exact behavior the sibling check (same scenario, same shared envelope, fixture ) verifies and rewards two checks earlier. So a spec-correct, capability-enforcing server necessarily answers the streaming probe with , and the check can never be exercised against it. Before #372 this reported SKIPPED; since #372 (alpha.8) it scores a hard failure:

the 'test_streaming_elicitation' call was rejected (code -32021), so the response stream could not be exercised

The scenario is now internally contradictory: one check requires the rejection that breaks the other.

To Reproduce

Steps to reproduce the behavior:

1. Run at against any server that enforces per-request capability declaration (observed with the PHP SDK, logiscape/mcp-sdk-php v2 , whose everything-server exposes the fixture):

27/28 checks pass; fails with the message above. The only ways to "pass" today are to strip the capability requirement from the fixture (defeating its purpose of demonstrating SEP-2322 elicitation-as-) or to stop enforcing capability declaration (failing ).

Expected behavior

Declare the capability the probe needs, at the probe site only. The shared must stay empty because depends on it:

This matches the intent recorded in tracking issue #296 for this check ("Call a tool that needs sampling/elicitation, scan stream frames") and strictly widens the set of servers the check can exercise: capability-enforcing servers become testable, and servers that ignore capability declarations behave as before.

Additional context

Draft PR #351 refactors this same probe (shares the stream with a new check).
I have a fix with a self-test ready and verified against a real SDK per CONTRIBUTING (both outcomes: the untestable failure without the change, 28/28 with it) and will open a PR referencing this issue.
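To make the contradiction concrete, here is a small sketch of a capability-enforcing server reduced to a single function. The error code -32021 and the tool name `test_streaming_elicitation` are taken from the failure message quoted in this issue; the function `callTool`, the `requiredCapability` table, and the result shape are assumptions made up for the example.

```typescript
// Hypothetical mock for illustration only. The code -32021 is the one quoted
// in the failure message; every other name and shape here is assumed.
const MISSING_CAPABILITY_CODE = -32021;

type Capabilities = Record<string, object>;
type CallResult = { ok: true } | { ok: false; code: number };

// Capability each fixture tool needs from the client (assumed mapping).
const requiredCapability: Record<string, string> = {
  test_streaming_elicitation: "elicitation",
};

// Reject any call whose required capability the client did not declare.
function callTool(name: string, declared: Capabilities): CallResult {
  const needed = requiredCapability[name];
  if (needed !== undefined && !(needed in declared)) {
    return { ok: false, code: MISSING_CAPABILITY_CODE };
  }
  return { ok: true };
}

// Empty envelope: the rejection one check rewards, and the reason the
// streaming probe can never run.
const rejected = callTool("test_streaming_elicitation", {});

// Capability declared at the probe site: the call proceeds, so the stream
// can be exercised.
const accepted = callTool("test_streaming_elicitation", { elicitation: {} });
```

Under these assumptions the same server passes the rejection check with the empty envelope and becomes testable by the streaming check once the probe declares what its fixture needs.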

logiscapedev · 5h ago

Fix SEP-986(-ish) strict tool name checks: align with SHOULD semantics and 2025-11-25 spec (Open Issue)

Describe the bug

The suite already emits a check from (#240 / #238), but it does not correctly enforce the documentation-level SHOULD rules for tool names:

1. Wrong severity: violations emit , but every normative sentence in the spec prose is SHOULD / SHOULD NOT, not MUST. Per AGENTS.md, SHOULD requirements must emit (Tier-1 CI still treats WARNING as a failure, see #245).
2. Wrong rules: the check validates chars and (from the original SEP-986 draft). The published 2025-11-25 and draft prose says chars and allows only , , , , , (no forward slash).
3. Wrong version gate: is tagged , but the Tool Names normative prose exists only from onward. Proof from the spec source on GitHub (, ):
   - Absent, : (L177) jumps to (L193); no heading.
   - Absent, : same structure, (L180) then (L198); no .
   - Present, : (L217) under .
   - Present, : (L308).
4. Misleading scenario prose: description and comments say tool names “MUST” match the format; the spec says SHOULD.
5. No negative vitest: unlike other wire/format checks (e.g. , ), there is no deliberately non-conformant fixture proving the check catches violations.
6. Schema vs docs gap: in , (via ) is an unconstrained with no , , or . Conformance is the right place to test the documentation SHOULD until/unless the schema adds constraints.

Timeline note: SEP-986 was opened 2025-07-16, nearly a month after the dated spec was published. The rule did not exist anywhere in the spec tree at the time that revision was cut; it landed first in draft (PR #1603, 2025-10-09) and only entered a dated release with . Tagging the check therefore backdates a requirement that postdates that version.

Format drift (64 + → 128, no )

The finalized SEP file was not updated when the rule entered the spec: still says 1–64 chars and allows (example: ). When SEP-986 was integrated into the spec, PR #1603 wrote as 1–128 chars, only (no ; examples not ); see the PR #1603 diff.
No separate SEP introduced that change; the dated spec prose diverged from the SEP markdown at merge time. PR #1603’s additional context explicitly links typescript-sdk#900, whose validation matches the spec text (128 max, rejects in tests) rather than the SEP file, so the reference TypeScript SDK treatment plausibly drove what landed in the spec. Conformance #238 / #240 then encoded the stale SEP rules (64 + ), not the PR #1603 spec diff.

Misalignment is still open on the SEP side: Solido on modelcontextprotocol#986 (Feb 2026) (2025-11-25 spec vs SEP on ); typescript-sdk#1502 closed in favor of tracking against current spec / #1512.

No equivalent hint exists elsewhere in the * or trees (only “unique identifier” prose, unconstrained in schema, and illustrative examples like , not charset/length SHOULDs). The check should run only from onward (including draft), not on or .

When was this introduced?

To Reproduce

Steps to reproduce the behavior:

1. Run against a reference server on :
2. Observe check : it runs even though the requirement is not in 2025-06-18 prose, and uses the 64-char / slash-allowed rules.
3. Point the suite at a server advertising a tool named (space) or .repeat(100) (valid under 2025-11-25 length, invalid under the harness’s 64-char rule); behavior does not match the spec text.
4. Inspect in ; violations return instead of .
5. Search and ; there is no fixture for invalid tool names.

Expected behavior

For and draft only:

- After , validate each advertised against the spec prose for that version:
  - Length SHOULD be 1–128 (inclusive)
  - Characters SHOULD be only (no spaces, commas, , etc.)
  - (Optional follow-up checks: uniqueness within server, case-sensitivity; these are harder to test passively from a single list call)
- Emit (not ) when any name violates the SHOULD rules; when all conform; when is empty (current behavior is fine).
- Do not emit the check for or (gate via on the check or scenario, or ).
- Update scenario description/comments to say SHOULD, not MUST.
- Add a negative vitest: broken server fixture advertising → expect .
- Add traceability rows (one check ID per SHOULD sentence exercised, or one consolidated row if that matches manifest convention).
- Resolve SEP-986 (64 + ) vs 2025-11-25 spec (128, no ) against the spec diff before coding: the check must track the dated spec text, not the stale SEP markdown alone.

For / : check should not run (no false signal).

Logs

Example from the current (incorrect) implementation when a name violates the harness rule:

Expected after fix:

Additional context

Precedent for enforcing documentation SHOULD semantics

This repo already treats SHOULD-level spec prose as checks that Tier-1 SDKs must still pass:

- Convention: AGENTS.md, "Severity follows the spec keyword."
- Negative-test precedent: + broken fixtures.

JSON Schema gap

in the machine-readable schema is only:

No or length bounds. “Strict tool names” is therefore a prose SHOULD requirement until the schema catches up, the same class of problem as other doc-only constraints conformance already tests.

Prior issues and PRs (attempted, but problem remains)

Several upstream efforts touched SEP-986 / tool names. None fully resolved the conformance gaps listed above (SHOULD severity, 2025-11-25 rules, version gate, negative fixture, traceability).

Conformance repo ()

No open issue or PR in conformance tracks correcting #240's gaps.

Spec repo ()

SDK repos (runtime validation, not conformance coverage)

These implement warn-at-registration (or log) in SDKs. They do not replace a server-side conformance check against a live response.

Takeaway for this issue

#240 is the only merged conformance change to date. It closed #238 prematurely relative to the current 2025-11-25 / draft prose and AGENTS.md severity rules. This issue is a fix-and-complete follow-up, not greenfield work; consider referencing #238/#240 in the GitHub issue and optionally reopening #238 rather than filing from scratch.
Adjacent open policy (not tool-name-specific)

conformance#245: whether SHOULD-level checks count toward Tier-1 (relevant if severity is corrected to )

Acceptance criteria

[ ] uses for SHOULD violations on 2025-11-25 and draft
[ ] Validation rules match 2025-11-25 / draft prose (128 max, no ), confirmed against spec diff
[ ] Check does not run on 2025-06-18 or 2025-03-26
[ ] Scenario description says SHOULD, not MUST
[ ] Negative vitest + broken-server fixture proves the check fires
[ ] Reference SDK / everything-server passes on applicable versions
[ ] traceability added if required by manifest workflow
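The behavior this issue asks for (SHOULD-level severity, the 128-character rule, and a version gate) might look roughly like the sketch below. Several things here are assumptions and should be checked against the dated spec and the repo before use: the allowed character set (ASCII letters, digits, underscore, hyphen, dot) is one reading of the 2025-11-25 prose, the status names `SUCCESS` and `WARNING` are inferred from the discussion above, and `checkToolNames` and `toolNameCheckApplies` are hypothetical helpers. The empty-list case, which the issue leaves at current behavior, is not modelled specially.

```typescript
// Status names and the result shape are assumptions for illustration.
type Status = "SUCCESS" | "WARNING";

// Assumed reading of the 2025-11-25 prose: 1 to 128 characters drawn from
// A-Z, a-z, 0-9, underscore, hyphen, and dot. No slash, space, or comma.
const TOOL_NAME_PATTERN = /^[A-Za-z0-9_.-]{1,128}$/;

// Dated versions (YYYY-MM-DD) sort lexicographically; "draft" is treated as
// newer than any dated release.
function toolNameCheckApplies(specVersion: string): boolean {
  return specVersion === "draft" || specVersion >= "2025-11-25";
}

// Returns undefined when the check should not be emitted for this version.
function checkToolNames(
  specVersion: string,
  names: string[],
): { status: Status; violations: string[] } | undefined {
  if (!toolNameCheckApplies(specVersion)) return undefined;
  const violations = names.filter((name) => !TOOL_NAME_PATTERN.test(name));
  // A SHOULD violation is reported as a warning, never as a hard failure.
  return { status: violations.length > 0 ? "WARNING" : "SUCCESS", violations };
}
```

A negative test in the spirit of the acceptance criteria would feed a name containing a space or a slash on 2025-11-25 and expect `WARNING`, and feed the same name on 2025-06-18 and expect no check at all.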

canardleteer · 9h ago

Recent fixes

chore: bump version to 0.2.0-alpha.9 (Merged PR)

Bump to for release. Contains exactly one change since : #376 — the assertion now matches the schema's object shape, so spec-correct servers pass the capability checks. After merge: dispatch the CI workflow with to publish, as with previous alphas.

felixweinberger · 15h ago

Structured data for AI agents

Repository: modelcontextprotocol/conformance. Description: Conformance Tests for MCP. Stars: 74, Forks: 59. Primary language: TypeScript. Languages: TypeScript (98.2%), JavaScript (1.8%). Latest release: v0.1.16 (3mo ago). Open PRs: 35, open issues: 37. Last activity: 15h ago. Community health: 87%. Top contributors: pcarleton, felixweinberger, dependabot[bot], nbarbettini, maxisbey, mikekistler, CaitieM20, localden, pja-ant, Yuan325 and others.