libxev is a cross-platform, high-performance event loop that provides abstractions for non-blocking IO, timers, events, and more and works on Linux (io_uring or epoll), macOS (kqueue), and Wasm + WASI. Available as both a Zig and C API.
by mitchellh · Zig
Last 12 weeks · 4 commits
2 of 6 standards met
TL;DR

On the IOCP backend, re-initializing/re-arming a while it is still linked in corrupts the event loop. 's resets the whole struct, clearing both and the intrusive pointer, which produces two symptoms from one root cause:

1. Null-result panic: later unwraps ("Completion queue items MUST have a result set") on the cleared result → .
2. Orphaned completions: because was also cleared, popping sets , permanently dropping every completion queued after (their callbacks never fire → silent stalls). Same defect as #169.

This is the IOCP counterpart of the multi-queue-membership class that #169 describes and that #224 is fixing for kqueue, but no open PR touches .

Verified mechanism (against 34fa508)

does then (); asserts .
a default field, so in resets it to null.

So if is in as and any path re-initializes it via (or the branch of ), then on the next drain:

→ null → panic at the unwrap in the completions loop ().
→ null → becomes null and , are orphaned.

Secondary hazard: double-cancellation

A canceled timer is pushed to with its state kept (so the active count can be decremented on processing). If runs on it while it is still pending there, it sees and schedules a second cancellation; the second then removes the timer from where it no longer exists (heap corruption) and pushes it to the queue again.

How it was hit

A Ghostty-based Windows terminal (ghostinthewsl) on IOCP: the cursor-blink timer (renderer thread) re-arms/resets/cancels the same timer completion from several events (blink re-arm, focus change, content-driven reset), so it can be re-initialized while still queued. Symptoms matched exactly: intermittent crashes, and on an older related port, input/scroll freezing preceding the crash (the orphaned-completion stall). Windows/IOCP is WIP per the README.

Relation to #169 / #170 / #224

#169 is exactly this class (" not cleared when re-enqueuing a completion between queues").
#170 clears before re-enqueue but only in shared/kqueue paths; it does not touch .
#224 (open) fixes three kqueue completion-lifecycle bugs, including a defensive before pushes a kevent-returned completion (its fix #3, "Fixes #169"). But #224 only changes / ; IOCP is untouched, and none of these address the -null symptom.

So the kqueue side of this class is being handled in #224; this issue is specifically the uncovered IOCP gap (and the symptom). (The kqueue side is intentionally out of scope here; #224 owns it.)

What I ran in production (a mitigation, not a fix)

As a pragmatic guard in a fork, I skip a popped entry whose is null, before touching its reused state:

Full patch (also carries a parity guard in kqueue, which I'd defer to #224): https://patch-diff.githubusercontent.com/raw/nanasess/libxev/pull/1.patch

Why this mitigation is empirically sufficient for this case

With a single repeating timer, the queued completion typically has no successors, so clearing its orphans nothing; only the -null panic remains, which the guard skips. A build has run ~1 week with zero crashes (previously it crashed within minutes–hours). Note that in neither the safety check nor fires, so the corruption would be silent there; the explicit guard still takes effect. The general case (a queued completion with successors) is not mitigated; that needs the ownership fix.

Suggested direction

Consistent with #169 / #170 / #224: prevent a completion from being re-initialized/re-armed while it is a member of an intrusive queue (or dequeue it safely first), rather than clearing / after the fact. The correct fix touches the loop's ownership model, so I'm leaving the design to you.

Disclosure / environment

I'm not fluent in Zig, and I'm not confident this patch is the correct fix; it's a mitigation I verified empirically, not something I can vouch for at the internals level. This was diagnosed and written with AI assistance. I'm providing it as a reference patch only (not a PR) and leaving the proper fix to you.

Windows 11 + WSL2, , . Minidumps / symbolized stacks available on request.
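The failure mode described in this issue can be modeled outside of libxev. The following is a minimal sketch in Python, not libxev's code: `Completion`, `IntrusiveQueue`, `drain`, and `scenario` are hypothetical names invented for illustration, and the only assumption carried over from the report is that re-initializing a completion clears both its result and its intrusive link while it is still queued.

```python
class Completion:
    """Hypothetical stand-in for a loop completion (not libxev's struct)."""

    def __init__(self, name):
        self.name = name
        self.reinit()

    def reinit(self):
        # Models a whole-struct reset: the result and the intrusive
        # link are both cleared, even if the node is still queued.
        self.result = None
        self.next = None


class IntrusiveQueue:
    """Singly linked FIFO whose link field lives inside each node."""

    def __init__(self):
        self.head = None
        self.tail = None

    def push(self, node):
        if node.next is not None:
            raise AssertionError("node is already linked")
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def pop(self):
        node = self.head
        if node is None:
            return None
        # If node.next was cleared behind the queue's back, every
        # successor becomes unreachable at this point.
        self.head = node.next
        if self.head is None:
            self.tail = None
        node.next = None
        return node


def drain(queue, guard=False):
    """Pop everything; return the names whose callbacks would fire."""
    fired = []
    while (c := queue.pop()) is not None:
        if c.result is None:
            if guard:
                continue  # mitigation: skip the entry instead of unwrapping
            raise RuntimeError("completion popped without a result")
        fired.append(c.name)
    return fired


def scenario(guard):
    q = IntrusiveQueue()
    a, b, c = Completion("a"), Completion("b"), Completion("c")
    for node in (a, b, c):
        node.result = "ok"
        q.push(node)
    a.reinit()  # re-armed while still linked in the queue
    return drain(q, guard=guard)
```

In this model, `scenario(guard=False)` raises on the cleared result, while `scenario(guard=True)` avoids the error but returns an empty list: `b` and `c` never fire. That matches the report's point that a skip-on-null guard only covers the case where the queued completion has no successors.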
Three fixes for the kqueue backend, each with a reproducing test:

1. now flushes pending EV_DELETE changes before returning. When a disarm occurs in the event-processing path, the EV_DELETE was queued but exited before flushing it to the kernel, leaving a stale filter with a dangling udata pointer. Related to PR #209.
2. Completions that return without kqueue registration (close, shutdown, no-threadpool fallback) now set state to before re-entering the submissions queue. Previously the stale state caused to route through (cancellation) instead of (resubmit). Related to PR #170.
3. Defensive before pushing kevent-returned completions to the completions queue in . Prevents the queue.zig assertion failure when a completion is present in multiple queues concurrently. Fixes issue #169.

10 tests added (all fail without their respective fixes):

tick(0) EV_DELETE flush (6 tests, including a synchronous reproduction of the race path via cross-pipe callback write)
rearm state for non-kqueue completions (1 test)
queue re-enqueue invariants (3 tests)
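The third fix can be illustrated with a small model. This is a sketch under stated assumptions, not the PR's code: `Node`, `CheckedQueue`, and `push_defensive` are hypothetical names, and the queue is assumed to reject any node that still carries a link when pushed, which is the kind of invariant the PR text attributes to the queue.zig assertion.

```python
class Node:
    """Hypothetical queue member; the link is stored in the node itself."""

    def __init__(self, name):
        self.name = name
        self.next = None


class CheckedQueue:
    """Intrusive FIFO that refuses nodes carrying a leftover link."""

    def __init__(self):
        self.head = None
        self.tail = None

    def push(self, node):
        # Assumed invariant: a node being pushed must be unlinked.
        if node.next is not None:
            raise AssertionError("node still linked")
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def pop(self):
        node = self.head
        if node is None:
            return None
        self.head = node.next
        if self.head is None:
            self.tail = None
        node.next = None
        return node


def push_defensive(queue, node):
    """Clear a stale link first, then push (hypothetical helper)."""
    node.next = None
    queue.push(node)
```

A node whose `next` still points at an earlier neighbor trips the check in `push`, while `push_defensive` accepts it. The clear is only safe when the node is no longer reachable from another queue; clearing the link of a node that is still queued elsewhere drops its successors there.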
This makes libxev work with Zig 0.16. Fair warning that this is messy. The goal was to keep libxev working with minimal external API changes. That means we don't play nicely with yet, and I'm not 100% sure what the path forward is for that: in many ways libxev is an alternate implementation, but std.Io doesn't have full coverage for our functionality, so we can't simply switch to it either. In cases where we need an Io here, I use the global single-threaded Io, which preserves the Zig 0.15 behavior. Like I said, the goal is to get people who use libxev (including me) upgraded to 0.16, not to fully adapt to the new idioms. Particularly nasty is the large number of shims we need in and to address removed APIs from Zig.
Repository: mitchellh/libxev. Stars: 3529, Forks: 177. Primary language: Zig. Languages: Zig (98%), CSS (0.9%), C++ (0.6%), Nix (0.3%), JavaScript (0.1%). License: MIT. Topics: async, c, epoll, io-uring, kqueue, wasi, webassembly, zig. Open PRs: 16, open issues: 34. Last activity: 2d ago. Community health: 42%. Top contributors: mitchellh, dependabot[bot], Corendos, charlesrocket, ianic, steeve, recursiveGecko, linuxy, kcbanner, rockorager and others.