evalite

Evaluate your LLM-powered apps with TypeScript

by mattpocock

ai evals typescript

1.4k stars · 64 forks · 20 contributors · Active · 1w ago · Since 2024

Meet the team

mattpocock: 525 contributions

github-actions[bot] (bot): 74 contributions

christianklotz: 13 contributions

DogPawHat: 10 contributions

Languages

TypeScript: 89.8%

MDX: 7.9%

CSS: 1.1%

JavaScript: 0.8%

HTML: 0.2%

Astro: 0.2%

Community health

3 of 6 standards met

Met: README, License, Contributing

Not met: Code of Conduct, Issue Template, PR Template

Recent PRs & issues

Active · Last activity 1w ago

v1 (open PR)

mattpocock · 1w ago

Recent fixes

`wrapAISDKModel` fails on Groq's `openai/gpt-oss-120b` (closed issue)

As per the title, the wrapper does not work with Groq models.

teocns · 6d ago

Evalite peer dependency conflict with AI SDK v6 (requires ai@^5)

Structured data for AI agents

Repository: mattpocock/evalite
Description: Evaluate your LLM-powered apps with TypeScript
Stars: 1401
Forks: 64
Primary language: TypeScript
Languages: TypeScript (89.8%), MDX (7.9%), CSS (1.1%), JavaScript (0.8%), HTML (0.2%)
License: MIT
Homepage: https://www.evalite.dev/
Topics: ai, evals, typescript
Latest release: evalite@0.19.0 (3mo ago)
Open PRs: 9
Open issues: 41
Last activity: 1w ago
Community health: 57%
Top contributors: mattpocock, github-actions[bot], christianklotz, DogPawHat, jacobparis, tyom, dschlabach, vojtaholik, cdavis1324, iamladi and others

Migrate AI SDK integration from v5 to v6 (closed issue)

Problem Statement

Evalite's AI SDK integration is pinned to v5 (, ). AI SDK v6 has been released with breaking changes including renamed types (V2→V3), deprecation of /, and renamed test mocks. Users adopting AI SDK v6 in their projects cannot use it with the current version of Evalite since the peer dependency requires .

Solution

Upgrade Evalite's AI SDK integration from v5 to v6. This includes bumping all dependencies, migrating to the new type names, replacing deprecated calls with the new + pattern in all built-in scorers, and updating documentation to reflect the new APIs. The public API surface will use version-agnostic type aliases (, from ) instead of version-specific types, future-proofing against further SDK version bumps.

User Stories

1. As a developer using AI SDK v6 in my project, I want Evalite to accept v6 as a peer dependency, so that I don't have conflicting AI SDK versions in my project.
2. As a developer writing eval files, I want to work with AI SDK v6 models, so that my tracing and caching continue to function after upgrading.
3. As a developer using Evalite's built-in scorers (faithfulness, answerCorrectness, answerRelevancy, contextRecall, noiseSensitivity), I want them to work with AI SDK v6 models, so that I can evaluate my LLM outputs without errors.
4. As a developer using or scorers, I want to pass AI SDK v6 embedding models without generic type parameters, so that the API matches the v6 type system where no longer takes a generic.
5. As a developer reading Evalite's TypeScript types, I want scorer option types to use and (version-agnostic), so that type names don't break on future SDK version bumps.
6. As a developer using from in my tests, I want Evalite's test fixtures and examples to use the same mock class, so that the codebase is consistent with v6 conventions.
7. As a developer reading Evalite's documentation, I want the API reference to show the correct type signature for v6, so that I'm not confused by outdated references.
8. As a developer reading the "Works With All AI SDK Methods" docs, I want to see the current + pattern for structured output instead of the deprecated / examples, so that I learn the recommended approach.
9. As a developer running Evalite's integration tests, I want the ai-sdk-traces and ai-sdk-caching tests to pass with v6 dependencies, so that I have confidence the middleware layer works correctly.
10. As a developer looking at the example package, I want the example eval files to work with v6 dependencies, so that I can reference them as working code samples.

Implementation Decisions

Dependency updates

- peer dependency: →
- dev dependency: →
- : → (in test and example packages)
- All packages in the monorepo (root, evalite, evalite-tests, example, evalite-ui) get their AI SDK deps bumped

Type migration strategy

- Public-facing types use version-agnostic aliases imported from : ,
- Internal middleware types use version-specific types from : ,
- drops its generic parameter (was , now just )

generateObject → generateText + Output migration

All built-in scorers currently use with . These migrate to where the result changes from to . The helper should remain compatible as a valid schema for .
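To make the call-site change concrete, here is a minimal sketch of how one scorer step might look before and after the migration. The two `...Stub` functions are local stand-ins written for this sketch, not the real `ai` package, and the property names `object` and `output` are an assumption based on the AI SDK's documented result shapes rather than something spelled out in this issue. The point is only that the call and the property read on the result change while the returned data stays the same.

```typescript
// Local stand-ins that only mimic result shapes; they are NOT the real `ai` package.
type Statements = { statements: string[] };

// v5-style stand-in (assumption): the structured result sits under `object`.
async function generateObjectStub(_opts: { prompt: string }): Promise<{ object: Statements }> {
  return { object: { statements: ["Paris is in France."] } };
}

// v6-style stand-in (assumption): text generation with an output spec,
// with the structured result under `output`.
async function generateTextStub(_opts: {
  prompt: string;
  output: { type: "object" };
}): Promise<{ text: string; output: Statements }> {
  const output: Statements = { statements: ["Paris is in France."] };
  return { text: JSON.stringify(output), output };
}

// Before: the scorer step reads `result.object`.
async function decomposeBefore(answer: string): Promise<string[]> {
  const result = await generateObjectStub({ prompt: `Split into statements: ${answer}` });
  return result.object.statements;
}

// After: the same step reads `result.output`; the returned data is unchanged.
async function decomposeAfter(answer: string): Promise<string[]> {
  const result = await generateTextStub({
    prompt: `Split into statements: ${answer}`,
    output: { type: "object" },
  });
  return result.output.statements;
}
```

Because both versions return the same value to the rest of the scorer, tests that check scorer behavior should not need to change.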
Call sites:

- (3 call sites: decomposeIntoStatements, evaluateStatementFaithfulness, evaluateStatementsSimple)
- (1 call site)
- (1 call site)
- (1 call site)

Mock class renames

- → across all test fixtures and example files
- Constructor shape must be verified after install: the V3 mock may have a different response structure

Middleware verification

- The and middleware callbacks in must be verified against the V3 interface after installing v6
- Key areas to check: structure, shape, shape
- Token usage fields used by Evalite (, , ) are unchanged in v6

Documentation updates

- : Update → in signature, replace / examples with / + pattern
- Scorer docs already use version-agnostic types; no changes needed
- : Review for any v5-specific references

Changeset

- Minor version bump for the package

Testing Decisions

Existing integration tests are sufficient. The and test suites in already exercise the middleware layer (tracing, caching, cache config precedence, cache config disabled). If these pass with v6 deps, the migration is verified.

No new tests are required specifically for the → migration in scorers, since the scorer behavior is unchanged; only the underlying AI SDK call mechanism changes.

The prior art for these tests is in ; they use (after rename) for deterministic testing without real API calls.

A good test here tests external behavior (traces are captured, cache hits/misses work correctly) rather than implementation details (which AI SDK function was called internally).

Out of Scope

- Migrating from to Zod schemas: The helper still works in v6 and the scorers' hand-written JSON schemas are well-tested. A Zod migration could be done separately.
- New AI SDK v6 features: ToolLoopAgent, stable MCP support, DevTools integration, enhanced reranking, etc. are not part of this migration.
- Dual v5/v6 support: Evalite will require after this change. Users on v5 should stay on the previous Evalite version.
- async migration: Evalite does not use /, so this v6 breaking change does not apply.
- Provider-specific changes: OpenAI's default change, Azure Responses API, Vertex metadata key rename, etc. are provider concerns and don't affect Evalite's core.

Further Notes

- An automated codemod exists () but we're doing all changes manually for full control.
- The type from (used in ) is unchanged in v6.
- The type (already used in ) is a version-agnostic alias that existed in v5 and continues in v6; it simply points to the latest version-specific type.
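The version-agnostic alias idea can be illustrated with a small self-contained sketch. All names here (`ModelV3`, `Model`, `ScorerOptions`, `describeScorer`) are hypothetical and invented for illustration; they are not Evalite's or the AI SDK's actual types. The sketch shows why public option types that reference only the alias would not need renaming when the underlying version-specific type changes.

```typescript
// Hypothetical version-specific model shape (illustrative only).
// A previous major version would have exposed a similar `ModelV2`.
interface ModelV3 {
  specificationVersion: "v3";
  modelId: string;
}

// The alias points at the latest version-specific type.
// On a future SDK bump, only this line would need to change.
type Model = ModelV3;

// Public-facing option type: it references the alias, never `ModelV3` directly.
interface ScorerOptions {
  model: Model;
}

// Hypothetical consumer of the public option type.
function describeScorer(opts: ScorerOptions): string {
  return `scorer using ${opts.model.modelId} (${opts.model.specificationVersion})`;
}

const model: Model = { specificationVersion: "v3", modelId: "mock-model" };
const description = describeScorer({ model });
```

The trade-off is that internal code which depends on version-specific details, such as middleware, would still import the versioned type directly, which matches the split between public-facing and internal types described in the type migration strategy.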