Last 12 weeks · 177 commits
1 of 6 standards met
Repository: sveltejs/ai. Description: All things AI related...current home of the svelte LLM benchmark Stars: 37, Forks: 2. Primary language: TypeScript. Languages: TypeScript (88.7%), Svelte (10.9%), JavaScript (0.3%), Shell (0.1%). Homepage: https://sveltejs.github.io/ai/ Open PRs: 3, open issues: 3. Last activity: 4w ago. Community health: 50%. Top contributors: khromov, paoloricciuti.
Test Name: svelte-reactivity

Test Description: It would be nice to test the ability to use purposefully non-deeply-reactive values correctly, avoid common gotchas, and avoid letting LLMs reimplement svelte/reactivity functionality. For example: https://svelte.dev/playground/6133806fa4ff4ce09e2f0d393e1d6084?version=5.46.4

Usage of createSubscriber too -- but I can't think of a way of testing a generalized approach that checks for footguns like creating useless subscribers, or creating a new one on every "get", etc. (Not in the proposed prompt.) Skills would help a lot here, but I'm interested in seeing their "native" approach.

The proposed prompt was generated by Gemini 3 Flash, but I changed most of it since it was too extensive. Even now it's not very generic.

Proposed Prompt:

Component Task: Create a dashboard that tracks multiple counters using different data structures. The goal is to demonstrate the nuances of Svelte 5's deep reactivity, specifically focusing on why values require explicit wrapping compared to standard POJOs.

Requirements:
- Deep Reactivity (POJO): Show a standard nested object where properties are automatically reactive (e.g., ).
- Correct SvelteMap Usage: Implement a SvelteMap where values are explicitly wrapped in before being stored, ensuring deep reactivity works as expected.
- Safe Lifecycle: Ensure a proper cleanup function is returned.

Reference Implementation: (optional)

Additional Context: _No response_
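A minimal sketch of what a reference implementation for the three requirements might look like. It assumes Svelte 5 runes (`$state`, `$effect`) and `SvelteMap` from `svelte/reactivity`; the counter names, interval, and markup are illustrative only, not part of the proposed prompt:

```svelte
<script>
	import { SvelteMap } from 'svelte/reactivity';

	// 1. POJO: $state proxies nested properties, so they are deeply reactive.
	let pojo = $state({ nested: { count: 0 } });

	// 2. SvelteMap: the map's structure is reactive, but plain-object values
	// are not -- wrap each value in $state before storing it.
	const counters = new SvelteMap();
	function addCounter(key) {
		const entry = $state({ count: 0 });
		counters.set(key, entry);
	}
	addCounter('clicks');

	// 3. Safe lifecycle: the effect returns a cleanup function.
	$effect(() => {
		const id = setInterval(() => pojo.nested.count++, 1000);
		return () => clearInterval(id);
	});
</script>

<button onclick={() => counters.get('clicks').count++}>
	clicks: {counters.get('clicks').count}
</button>
<p>POJO: {pojo.nested.count}</p>
```

Storing the `$state`-wrapped entry (rather than a plain object) is what makes `counters.get('clicks').count++` update the template; this is exactly the gotcha the test is meant to probe.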
Right now, the models are scored solely on the number of tests they pass...a more nuanced score that also reflects how the tests are passed would be wonderful. This could involve:
- Whether the model is using the MCP server or not
- Whether the model is using the Test tool or not
- The number of steps it took to complete
- The number of tokens it took to complete
- Possibly cost (?)
- Other ideas?
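One way the ideas above could be combined is a weighted score: pass rate dominates, tool usage earns a small bonus, and steps/tokens apply a mild efficiency penalty. The metric names, weights, and thresholds below are entirely hypothetical, not the benchmark's actual schema:

```typescript
// Hypothetical per-run metrics; field names are illustrative.
interface RunMetrics {
  testsPassed: number;
  testsTotal: number;
  usedMcpServer: boolean;
  usedTestTool: boolean;
  steps: number;
  tokens: number;
}

// Pass rate carries most of the weight; tool usage adds a small bonus;
// steps and tokens reduce an efficiency term, clamped at zero.
function nuancedScore(m: RunMetrics): number {
  const passRate = m.testsTotal > 0 ? m.testsPassed / m.testsTotal : 0;
  const toolBonus = (m.usedMcpServer ? 0.05 : 0) + (m.usedTestTool ? 0.05 : 0);
  const efficiency = Math.max(0, 1 - m.steps / 100 - m.tokens / 1_000_000);
  return passRate * 0.8 + toolBonus + efficiency * 0.1;
}

console.log(
  nuancedScore({
    testsPassed: 9,
    testsTotal: 10,
    usedMcpServer: true,
    usedTestTool: false,
    steps: 20,
    tokens: 50_000,
  }).toFixed(3)
);
```

Cost could slot in the same way as tokens, as another penalty term, once runs record it.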
This is to show the generated code as syntax-highlighted in the HTML report.