Problem

A durable agent submission can exhaust its recovery budget while a framework-owned task tool call is still open in the parent conversation. The child task conversation is durable and reattachable, but retry exhaustion currently runs before the safe task-child repair path. The parent stream can then record a terminal submission advisory like without a matching terminal event for the parent tool_start.

Root cause hypothesis

checks before the recovery branch. The safe repair path for unresolved task calls exists in ( -> -> ), but retry exhaustion bypasses it.

Desired behavior

Before terminalizing an exhausted durable submission, Flue should make parent task-tool state deterministic:

- If an unresolved parent call has a retained child session, reattach/repair it when safe, or emit a deterministic terminal task tool error if repair is unsafe.
- The parent transcript should not contain a tool_start with only a submission-level advisory and no terminal tool event/result.
- If the submission still fails, the terminal advisory should include structured interrupted tool metadata that apps can use to settle their own run state.

This is not asking Flue to blindly replay arbitrary tools. The special case is framework-owned calls with durable child conversation topology already retained by Flue.
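The desired ordering can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not Flue's implementation: every type, field, and event name below (`OpenTaskCall`, `repairIsSafe`, `tool_reattached`, `interruptedToolCallIds`, and so on) is hypothetical. It models only the path where the submission still fails after the repair attempt.

```typescript
// Hypothetical shapes for illustration; none of these names come from Flue.
type OpenTaskCall = {
  toolCallId: string;
  childSessionId?: string; // retained durable child conversation, if any
  repairIsSafe: boolean; // outcome of whatever safety check the runtime applies
};

type ParentEvent =
  | { type: "tool_reattached"; toolCallId: string; childSessionId: string }
  | { type: "tool_error"; toolCallId: string; reason: string }
  | { type: "submission_failed"; interruptedToolCallIds: string[] };

// Every open tool_start is either reattached or given a terminal tool error
// BEFORE the submission-level advisory is emitted.
function settleOpenTaskCalls(openCalls: OpenTaskCall[]): ParentEvent[] {
  const events: ParentEvent[] = [];
  const interrupted: string[] = [];
  for (const call of openCalls) {
    if (call.childSessionId !== undefined && call.repairIsSafe) {
      events.push({
        type: "tool_reattached",
        toolCallId: call.toolCallId,
        childSessionId: call.childSessionId,
      });
    } else {
      events.push({
        type: "tool_error",
        toolCallId: call.toolCallId,
        reason:
          call.childSessionId === undefined
            ? "no retained child session"
            : "repair unsafe",
      });
      interrupted.push(call.toolCallId);
    }
  }
  // The advisory carries structured metadata about the interrupted calls.
  events.push({ type: "submission_failed", interruptedToolCallIds: interrupted });
  return events;
}
```

With this ordering an application reading the parent transcript never sees a tool_start whose only follow-up is the submission-level advisory.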
Problem

The direct agent prompt APIs appear to generate inside Flue rather than letting the caller provide one. That makes direct prompts hard to use safely from an application authority that needs durable, idempotent admission across retries.

The issue is separate from #307: #307 fixed durable user-message replay after a direct prompt is admitted. This request is about exposing the admission identity at the public API boundary so the application can own retry/reconciliation semantics.

Current versions checked: , , and .

Distributed-systems failure case

A typical application flow looks like this:

1. App authority commits product intent locally, for example .
2. App calls Flue direct prompt over a network/runtime boundary.
3. Flue may successfully admit the prompt.
4. The response may still be lost, the Worker may be replaced, or the caller may crash before recording the receipt.
5. The app retries the same product intent.

For this to be safe, the caller needs to retry with the same stable operation identity. If Flue generates only after the call crosses the boundary, the caller cannot prove whether a retry is the same admission or a new admission. That pushes applications toward duplicate prompts, custom sidecar wrappers, or weaker reconciliation logic.

This is the usual at-least-once distributed systems shape: exactly-once delivery is not available across the boundary, so the callee should accept a caller-owned idempotency key / operation identity for the mutation.

Desired behavior

Allow callers to pass an optional through direct prompt APIs. If omitted, Flue can keep today's generated-id behavior.

Sketch:

For React, the same should be possible from options:

Expected semantics:

- The supplied becomes the canonical submission identity for the direct prompt.
- Retrying the same direct prompt with the same should reconcile to the same durable submission/admission rather than create a second logical input.
- If Flue cannot guarantee full idempotent no-op semantics yet, still accepting and persisting caller-supplied is useful because it gives applications a stable correlation/fencing key for reconciliation.
- If is omitted, current behavior remains unchanged.

Why this matters

Flue already has strong runtime/session machinery. Applications built on top of it still need to coordinate product authority with Flue admission. Without caller-supplied , the boundary forces app-level wrappers around direct prompts just to recover from commit-success/response-lost and replacement races.

With caller-supplied , the clean layering is:

- application authority owns product intent and persists the stable submission id;
- Flue owns runtime admission/execution/replay for that submission;
- retries and late receipts can be fenced by the same id instead of inferred from timing.

Local patch shape that worked for us

We patched the public API and runtime path narrowly:

- / React accept ;
- SDK direct prompt requests include it when present;
- direct runtime payload schema accepts optional ;
- direct submission input uses ;
- Node and Cloudflare direct-admission paths pass the payload value through.

That was enough for our application boundary to use Flue's own submission identity instead of inventing a separate wrapper identity.
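The expected semantics amount to admission keyed by a caller-owned identity. The sketch below is an in-memory illustration only: `AdmissionStore`, `admit`, and the `submissionId` parameter name are assumptions made for this example, not Flue's public API, and a real implementation would persist admissions durably.

```typescript
// Hypothetical in-memory model of idempotent admission; not Flue's API.
type Admission = { submissionId: string; text: string; admittedAt: number };

class AdmissionStore {
  private readonly bySubmissionId = new Map<string, Admission>();
  private generated = 0;

  // When the caller supplies an id, a retry reconciles to the first admission.
  // When it is omitted, an id is generated and every call is a new admission.
  admit(text: string, submissionId?: string): { admission: Admission; created: boolean } {
    const id = submissionId ?? `generated-${++this.generated}`;
    const existing = this.bySubmissionId.get(id);
    if (existing) {
      return { admission: existing, created: false };
    }
    const admission: Admission = { submissionId: id, text, admittedAt: Date.now() };
    this.bySubmissionId.set(id, admission);
    return { admission, created: true };
  }

  get size(): number {
    return this.bySubmissionId.size;
  }
}
```

An application that persisted its id before the first call can retry after a lost response and read `created: false` as "already admitted", rather than inferring it from timing.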
Describe the Bug

Any model resolved through whose id is not in pi-ai's catalog (gateway presets, brand-new model ids) hydrates from , which hardcodes (). The registration API offers no way to override this: the per-model record only accepts and ().

Downstream, pi-ai's → replaces every image block with the literal text whenever lacks . Tool-result images are gated on the same check.

Net effect: images sent via the option on / (the feature shipped in #99) silently never reach any model behind a registered provider, unless the exact model id happens to be in pi-ai's catalog.

The user message still carries the `prepareImagePrompt()` `@preset/...` `registerProvider()` `contextWindow` `maxTokens` `resolveModel('openrouter/@preset/my-vision-model')` `input: ['text', 'image']` `resolveModel('openrouter/@preset/vision-model').input` `['text']` `session.prompt(text, { images: [{ type: 'image', data, mimeType: 'image/png' }] })` `"(image omitted: model does not support images)"` `input` `buildModelFromRegistration` `tsc --noEmit` clean): https://github.com/steven4354/flue/commit/addf7e30

Happy to open a PR from that branch if invited (per CONTRIBUTING.md, not opening one unsolicited).

🤖 Filed with Claude Code
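The gating described in this report can be illustrated with a small stand-alone model. This is a sketch under assumptions, not pi-ai's or Flue's code: `ModelRecord`, `hydrateModel`, and `downgradeImages` are hypothetical names, and the `overrides` parameter stands in for the kind of per-model override the report asks for. The placeholder string is the one quoted in the report.

```typescript
// Hypothetical model of the behavior; not pi-ai's or Flue's actual code.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

type ModelRecord = { id: string; input: Array<"text" | "image"> };

const IMAGE_OMITTED = "(image omitted: model does not support images)";

// Models outside the catalog default to text-only input unless the
// registration supplies an override (the override is the assumed fix).
function hydrateModel(
  id: string,
  overrides: { input?: Array<"text" | "image"> } = {},
): ModelRecord {
  return { id, input: overrides.input ?? ["text"] };
}

// Image blocks survive only when the model record declares image input.
function downgradeImages(model: ModelRecord, blocks: ContentBlock[]): ContentBlock[] {
  if (model.input.includes("image")) {
    return blocks;
  }
  return blocks.map((block): ContentBlock =>
    block.type === "image" ? { type: "text", text: IMAGE_OMITTED } : block,
  );
}
```

Under this model, a preset hydrated without an override loses its images silently, while the same preset hydrated with `input: ["text", "image"]` keeps them.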
Summary

Flue's model-facing tool surface today is two paths flattened into one array passed to the harness:

- Built-in s assembled in (, , , , , , optional ).
- User-supplied accepted at / / / scope. () returns over HTTP/SSE; the user wires those tools into the same slot manually. No stdio MCP today.

Both flatten in and reach as one parallel-tool-call array (). The model sees every tool as a separate function-call definition.

This proposes two related additions, both opt-in and additive:

1. A tool broker: a single registry that all tool sources feed into, with normalized / , a scoped secrets store, a binary / policy chain, and centralized lifecycle (so users stop hand-rolling per server). Inspired by 's plugin/dispatch shape; not a dependency on it.
2. A code-mode execution path: a second, opt-in way for the broker to expose its tools to the model. Instead of N function-call definitions, the model receives a generated TypeScript API and a single tool, then writes one snippet that orchestrates many calls in a sandbox. Pattern from https://blog.cloudflare.com/code-mode/.

Motivation

- Discovery is fragmented. No single ; agents must know which connection or which array each tool came from.
- No shared secrets or policy. Today each MCP server gets its header inline at the call site; a custom reads itself. There's no scoped secrets store and no place to express "block all writes during a planning prompt"; running is treated identically to .
- Connection lifecycle is hand-rolled. Every call needs a matching , typically . Multiplied across servers and call paths this gets noisy.
- Adding sources is open-coded. A future OpenAPI / GraphQL adapter would re-implement registration, schema normalization, and lifecycle.
- Token cost grows with tool count. With many MCP servers attached, every turn ships every tool schema. Code mode eliminates per-step LLM round-trips for chained calls and lets the model handle a richer typed API than flat function definitions.
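The broker in item 1 of the summary could look roughly like the following. This is a minimal sketch, not the proposed API: `ToolBroker`, `ToolProvider`, the `"allow"` / `"block"` decision values, and the `provider.tool` naming scheme are all assumptions for illustration, and invocation is synchronous only to keep the example short. Policies are checked first-match-wins, as Part 1 of the proposal describes.

```typescript
// Hypothetical broker shape; names and decision values are assumptions.
type ToolSpec = { name: string; description: string };

interface ToolProvider {
  id: string;
  listTools(): ToolSpec[];
  invoke(tool: string, args: unknown): unknown;
}

type Decision = "allow" | "block";
// A policy returns undefined when it has no opinion about a tool.
type Policy = (qualifiedName: string) => Decision | undefined;

type InvokeResult =
  | { ok: true; value: unknown }
  | { ok: false; error: { code: "blocked" | "unknown_tool"; tool: string } };

class ToolBroker {
  private readonly providers = new Map<string, ToolProvider>();
  private readonly policies: Policy[] = [];

  register(provider: ToolProvider): void {
    this.providers.set(provider.id, provider);
  }

  addPolicy(policy: Policy): void {
    this.policies.push(policy);
  }

  // One discovery surface across every provider.
  list(): string[] {
    const names: string[] = [];
    this.providers.forEach((provider) => {
      for (const tool of provider.listTools()) {
        names.push(`${provider.id}.${tool.name}`);
      }
    });
    return names;
  }

  invoke(qualifiedName: string, args: unknown): InvokeResult {
    // First policy with an opinion wins; no opinion at all means allow.
    for (const policy of this.policies) {
      const decision = policy(qualifiedName);
      if (decision === "block") {
        return { ok: false, error: { code: "blocked", tool: qualifiedName } };
      }
      if (decision === "allow") break;
    }
    const dot = qualifiedName.indexOf(".");
    const provider = this.providers.get(qualifiedName.slice(0, dot));
    const tool = qualifiedName.slice(dot + 1);
    if (!provider || !provider.listTools().some((t) => t.name === tool)) {
      return { ok: false, error: { code: "unknown_tool", tool: qualifiedName } };
    }
    return { ok: true, value: provider.invoke(tool, args) };
  }
}
```

A blocked call comes back as a structured error value rather than a thrown exception, so the model can see the refusal and adapt.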
Proposal, Part 1: Broker

A owns registration and dispatch; providers are the extension point. Existing paths become the first three providers; future sources plug in identically.

Before: manual connection lifecycle, secrets inline, no policy hook

(Adapted from the README "Remote MCP Tools" example.)

After: one registration site, one secrets store, one policy chain, automatic lifecycle

Provider interface (modeled on executor's plugin shape, simplified):

Policies are binary ( | ), evaluated first-match-wins per the scope stack (prompt > session > init). No mid-execution elicitation: flue's primary deployment targets (Workers, GH Actions) are unattended. A blocked invoke returns a structured error to the model so it can adapt.

Proposal, Part 2: Code mode (opt-in exposure)

Today every registered tool becomes a separate function-call definition the model picks from. With many providers attached, that means many schemas every turn, and data-dependent chains (where step N+1 needs the parsed result of step N) round-trip the LLM at every step, with every intermediate result flowing through the model's context window.

Code mode flips this: the broker generates a TypeScript namespace from the registered tools and gives the model a single tool. The model writes one snippet; it runs in a sandbox; tool calls are local function invocations.

What the model sees in code mode (generated from registered tools):

Before: function-calling, data-dependent chain

Task: for each open PR with no review, look up CODEOWNERS for its changed files and assign the matching owner.

Six turns. Every intermediate result (~85 KB) ships through the model's context, even though the model never needed the raw bodies; it only used , , file paths.

After: code mode, one snippet

One turn, one model-side payload (the snippet), one result (the summary). The 85 KB of intermediate data stays in the sandbox.

Sandbox.
Two viable starting points: QuickJS (small, embeddable, Workers-compatible) or Deno subprocess (richer stdlib, Node-only). Default behavior: each call inside the snippet is marshalled out of the sandbox and dispatched through on the host, so policies, secrets, and providers behave identically; the only difference is who is asking. Snippet stdout returns to the model; thrown errors are surfaced as structured results so the model can repair and re-run.

Where each tool actually runs (important: three different runtimes are now in play):

The code-mode sandbox is only the runtime for the orchestration snippet itself. It does not host MCP servers (flue doesn't host them anywhere; they're remote), it does not host built-in tools (those live in the agent sandbox), and it does not see secrets directly. This keeps the policy/secrets enforcement boundary on flue's host: a malicious or buggy snippet cannot bypass policies by calling MCP directly, because it has no network access to MCP endpoints. A future per-provider could allow the snippet to call certain providers directly (skipping the host RPC) for performance; out of scope for this issue but called out in open questions.

Why both modes. Function-calling stays the default for short-horizon tasks and tool-poor agents (lower latency, no sandbox boot, easier debugging). Code mode wins when tool count is high or the call chain is deep and data-dependent. A future could pick per-turn; out of scope for this issue but the config slot is reserved.

How current code maps in

The broker is purely additive. Existing and call sites can keep working as thin wrappers over it for one or two minor versions before deprecation.

Non-goals

- DB-backed catalog (executor has one; in-memory + file-secrets is enough for v1).
- Cross-process catalog sharing.
- Mid-execution elicitation / interactive approval: unattended-only.
- Stdio/subprocess MCP support (still out of scope per the README; stays HTTP/SSE).
- exposure heuristic (mentioned only to keep the API forward-compatible).
- per-provider (snippet calls a provider directly, bypassing host broker): interesting for perf but defer.

Open questions

1. Naming: , , or just ?
2. Schema source of truth: (current) or accept JSON Schema verbatim from providers?
3. Sandbox default for code mode: QuickJS (Workers-compatible) or Deno (richer)? Or ship both with an adapter interface?
4. Code-mode binding generation: emit to the prompt verbatim, or a more compact shape?
5. MCP runtime / isolation in code mode. Confirm the proposed default: snippet sandbox has no direct network or MCP access, every call RPCs back through the host broker. Alternative designs to consider: (a) per-server to let the snippet call MCP HTTPS endpoints directly (faster loops, but secrets must enter the sandbox and policy enforcement is bypassed); (b) if/when stdio MCP ever lands, decide whether stdio MCP processes are spawned by the host or inside the code sandbox. Worth pinning before any prototype.
6. Workers compat for secrets: file store won't work; need a KV/Secrets-binding adapter spec.
7. Deprecation window for once lands.

Prior art

- and show the registry + plugin shape this proposal mirrors. Differences: executor uses Effect for fiber pause/resume and ships a DB-backed catalog; this proposal stays plain-async + in-memory to match flue's footprint, and drops elicitation since flue is unattended-first.
- Cloudflare's code mode (https://blog.cloudflare.com/code-mode/): V8-isolate sandbox, MCP-schemas-as-TypeScript, single tool. Argues that LLMs handle real TS APIs better than tool-schema function-call lists, and that local chaining beats per-step LLM round-trips.
- MCP itself: already gives flue a normalized external-tool model; this proposal generalizes that pattern beyond MCP and adds an alternative model-facing exposure.
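Part 2 says the broker generates a TypeScript namespace from the registered tools, and open question 4 asks what shape to emit. One possible verbatim shape is sketched below. `ToolDef`, its flat `params` map, and `generateNamespace` are hypothetical simplifications: real tool schemas would be richer than three primitive types, and the proposal does not fix the output format.

```typescript
// Hypothetical binding generator; the tool definition shape is simplified.
type ParamType = "string" | "number" | "boolean";

type ToolDef = {
  name: string;
  description: string;
  params: Record<string, ParamType>;
};

// Emits one ambient function declaration per registered tool, wrapped in a
// single namespace the model can program against.
function generateNamespace(namespaceName: string, tools: ToolDef[]): string {
  const lines: string[] = [`declare namespace ${namespaceName} {`];
  for (const tool of tools) {
    const params = Object.entries(tool.params)
      .map(([key, type]) => `${key}: ${type}`)
      .join("; ");
    lines.push(`  /** ${tool.description} */`);
    lines.push(`  function ${tool.name}(args: { ${params} }): Promise<unknown>;`);
  }
  lines.push("}");
  return lines.join("\n");
}
```

Because the declarations are plain text, the same generator output could be sent to the prompt verbatim or post-processed into a more compact form, which is the trade-off the open question raises.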
When wrapping a generated Cloudflare agent Durable Object class (for example to apply platform instrumentation via the extension's wrap callback), the callback's public type is the structural agent-extension class, not the generated Durable Object constructor. As a result, downstream code has to assert/verify constructability at runtime to bridge from the structural extension type to the concrete Durable Object class shape the platform instrumentation expects, and repeat that assertion after wrapping. It would be useful for the extension/wrap surface to expose (or preserve) the generated Durable Object class type so consumers can wrap it with platform instrumentation without runtime constructability assertions or unsafe casts.
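One way to preserve the class type is to make the wrap surface generic over the concrete constructor. The sketch below is purely illustrative: `applyWrap`, `withInstrumentation`, and `GeneratedAgent` are hypothetical stand-ins, not the extension's real API or a real Durable Object class, and a `Proxy` plays the role of platform instrumentation.

```typescript
// Hypothetical names throughout; this is not Flue's extension API.
type Constructor<T = object> = new (...args: any[]) => T;

// Generic over the concrete class, so the wrap callback receives and returns
// the same constructor type instead of a structural stand-in.
function applyWrap<C extends Constructor>(generated: C, wrap: (cls: C) => C): C {
  return wrap(generated);
}

const constructed: string[] = [];

// Stand-in for platform instrumentation: a Proxy that records constructions.
function withInstrumentation<C extends Constructor>(cls: C): C {
  return new Proxy(cls, {
    construct(target, args, newTarget) {
      constructed.push(target.name);
      return Reflect.construct(target, args, newTarget);
    },
  });
}

// Stand-in for a generated agent class.
class GeneratedAgent {
  label: string;
  constructor(label: string) {
    this.label = label;
  }
  describe(): string {
    return `agent:${this.label}`;
  }
}

// No cast or runtime constructability check is needed on either side.
const Wrapped = applyWrap(GeneratedAgent, withInstrumentation);
const instance = new Wrapped("demo");
```

Here `Wrapped` keeps the type `typeof GeneratedAgent`, so `instance.describe()` type-checks without assertions before or after wrapping.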
Downstream clients that render agent progress need structured, durable UI milestones without reconstructing them from transcript text, tool output strings, or private application state.

Some agent workflows have user-visible progress that is not exactly a chat message: status milestones, tool/delegation progress, resumable workflow progress, approval gates, and other activity that should appear in a timeline or activity panel. Without a typed public shape for these milestones, clients have to stitch together separate state fields and conversation/tool messages, which makes renderers brittle and difficult to keep aligned with runtime behavior.

It would be useful for Flue to expose typed public activity/data parts or equivalent metadata for agent progress, for example:

- stable activity IDs
- activity kind/type
- display visibility
- associated turn/submission identity when applicable
- status/progress fields
- resumable/approval state when applicable
- timestamps suitable for ordering without client-side guessing

The goal is for clients to render durable agent activity directly from Flue's public conversation/runtime contract, rather than deriving timeline rows from free text or private state shape.
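A typed shape along these lines might look like the following. Every name here (`ActivityPart`, the `kind` and `status` unions, `toTimeline`) is a hypothetical illustration of the requested fields, not an existing Flue type.

```typescript
// Hypothetical activity part; field names mirror the list above.
type ActivityStatus =
  | "pending"
  | "running"
  | "awaiting_approval"
  | "completed"
  | "failed";

interface ActivityPart {
  id: string; // stable across updates to the same activity
  kind: "status" | "tool" | "delegation" | "workflow" | "approval";
  visibility: "visible" | "hidden";
  submissionId?: string; // associated turn/submission, when applicable
  status: ActivityStatus;
  progress?: number; // 0..1, when the activity reports progress
  resumable?: boolean;
  timestamp: number; // epoch milliseconds, used for ordering
}

// Collapses updates to the latest part per activity id, drops hidden
// activities, and orders rows by timestamp (id breaks ties).
function toTimeline(parts: ActivityPart[]): ActivityPart[] {
  const latest = new Map<string, ActivityPart>();
  for (const part of parts) {
    const seen = latest.get(part.id);
    if (!seen || part.timestamp >= seen.timestamp) {
      latest.set(part.id, part);
    }
  }
  return Array.from(latest.values())
    .filter((part) => part.visibility === "visible")
    .sort((a, b) => a.timestamp - b.timestamp || a.id.localeCompare(b.id));
}
```

With stable ids and timestamps in the public contract, a client can build its timeline from these parts alone instead of parsing transcript text.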
Repository: withastro/flue. Description: The sandbox agent framework. Stars: 7026, Forks: 402. Primary language: TypeScript. Languages: TypeScript (91.1%), JavaScript (4.3%), Astro (3.6%), MDX (0.6%), CSS (0.4%). License: Apache-2.0. Homepage: https://www.flueframework.com Open PRs: 0, open issues: 8. Last activity: 4h ago. Community health: 87%. Top contributors: FredKSchott, stainlu, cpojer, ketankhairnar, elithrar, github-actions[bot], mhart, chris-plucker, zozo123, toonverbeek and others.