Hi,

As part of starting my new role contracting for the Foundation, helping deal with a flood of bugs found by the next generation of AI tooling (foundation > New Contractor Position: AI Security Engineer), I have a report of minor UB in memchr when safe APIs are used incorrectly. The tool spit out this test case, which is safe code that fails under Miri: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=39095e2573c976db66e2d70958f5edbf

This involves a user who imports internal details of your library, but they are public. The user then needs to ignore the documentation and pass a longer needle at runtime than they did at construction time, but these are safe functions, so users are allowed to ignore documentation. In the Rust community, UB is not considered a reasonable punishment for misuse of safe public APIs.

Some of the useful quotes out of the raw AI word vomit:

> Every […] in memchr 2.8.1 exposes a safe […] that decouples the search-time needle from the construction-time needle. The unsafe inner function does not validate […]
>
> - `arch::generic::packedpair::Finder::<V>::find(haystack, needle)`
> - `haystack.len() >= self.min_haystack_len`
> - `min_haystack_len = max(construction_needle.len(), max_pair_index + V::BYTES)`
> - `needle.len()`
> - `find_in_chunk`
> - `end.sub(needle.len())`
> - `is_equal_raw(needle.as_ptr(), cur, needle.len())`
> - `needle.len() > haystack.len()`
> - `end.sub(needle.len())`
> - `unsafe fn`
> - `target_feature`
>
> […] `# Safety` precondition, and the public wrapper does not mark its function unsafe.

On a meta note, we are still trying to figure out how best to handle these AI reports. If you have advice on how I could do it better in the future, please let me know!
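To make the length mismatch concrete, here is a minimal sketch in safe Rust of the bookkeeping described above. It is a toy model, not memchr's code: `ToyFinder`, `VECTOR_BYTES`, and `last_start` are hypothetical names, and `VECTOR_BYTES = 16` is an assumed stand-in for `V::BYTES`. It only shows that a minimum haystack length derived from the construction-time needle says nothing about the search-time needle.

```rust
/// Assumed stand-in for `V::BYTES`; the real value depends on the vector type.
const VECTOR_BYTES: usize = 16;

/// Hypothetical toy finder that only models the length bookkeeping.
struct ToyFinder {
    min_haystack_len: usize,
}

impl ToyFinder {
    /// Mirrors the quoted formula:
    /// min_haystack_len = max(construction_needle.len(), max_pair_index + V::BYTES)
    fn new(construction_needle: &[u8], max_pair_index: usize) -> ToyFinder {
        ToyFinder {
            min_haystack_len: construction_needle
                .len()
                .max(max_pair_index + VECTOR_BYTES),
        }
    }

    /// Returns the last offset at which `needle` could start in `haystack`,
    /// or `None` if no such offset exists.
    fn last_start(&self, haystack: &[u8], needle: &[u8]) -> Option<usize> {
        // This check only involves the construction-time needle.
        if haystack.len() < self.min_haystack_len {
            return None;
        }
        // The search-time needle may still be longer than the haystack,
        // so the subtraction has to be checked rather than assumed valid.
        haystack.len().checked_sub(needle.len())
    }
}
```

With a finder built from a 2-byte needle and a 20-byte haystack, the minimum-length check passes for any search-time needle, including a 32-byte one. Only the checked subtraction notices that the longer needle cannot fit.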
The main hitch here is that the lower level APIs require the caller to pass the needle to both the constructor and the search routine. The constructor does some pre-computation based on that needle, and the surrounding code relied on that pre-computation for soundness. If the caller passes a different needle to the search routine than the one given to the constructor, then undefined behavior can be triggered.

This likely falls under "malicious caller" because it would be odd for the caller to not follow the documented contract (by passing two different needles). However, the contract is not a safety invariant and the routine is not marked `unsafe`. So this API was just plain unsound.

The fix is thankfully very easy: subtract one pointer from the other to get the distance between them, and compare that distance with the needle length. The invariant between the two pointers does not depend on the needle, so this subtraction is always safe.

Fixes #225
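The pointer-distance check described above can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: `fits` and `toy_find` are hypothetical names, the search is a naive byte-by-byte scan instead of a vectorized one, and `cur` and `end` are assumed to always point into the same haystack with `cur <= end`.

```rust
/// Hypothetical helper: reports whether `needle_len` bytes can be read
/// starting at `cur` without going past `end`.
///
/// # Safety
///
/// `cur` and `end` must be derived from the same allocation, and `cur`
/// must not be greater than `end`.
unsafe fn fits(cur: *const u8, end: *const u8, needle_len: usize) -> bool {
    // SAFETY: the caller guarantees both pointers come from the same
    // allocation and that cur <= end, so the distance is non-negative.
    let remaining = unsafe { end.offset_from(cur) } as usize;
    // Comparing lengths never forms an out-of-bounds pointer, unlike
    // computing `end.sub(needle_len)` with an oversized needle.
    needle_len <= remaining
}

/// Naive toy search that stays sound for any needle length.
fn toy_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let range = haystack.as_ptr_range();
    let (start, end) = (range.start, range.end);
    let mut cur = start;
    loop {
        // SAFETY: cur starts at `start` and only advances while it is
        // strictly below `end`, so start <= cur <= end always holds.
        if !unsafe { fits(cur, end, needle.len()) } {
            return None;
        }
        // SAFETY: `fits` confirmed that needle.len() bytes are readable
        // starting at cur.
        let window = unsafe { std::slice::from_raw_parts(cur, needle.len()) };
        if window == needle {
            // SAFETY: cur and start point into the same haystack.
            return Some(unsafe { cur.offset_from(start) } as usize);
        }
        // SAFETY: the needle is non-empty here (an empty needle matches
        // immediately), so `fits` implies cur < end.
        cur = unsafe { cur.add(1) };
    }
}
```

Because the check depends only on the two haystack pointers and the length of the needle actually passed in, a needle longer than the haystack simply yields `None`. For example, `toy_find(b"abc", b"abcdef")` returns `None` and never reads outside the haystack.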
Repository: BurntSushi/memchr. Description: Optimized string search routines for Rust. Stars: 1490, Forks: 148. Primary language: Rust. Languages: Rust (99.9%), Python (0.1%). License: Unlicense. Topics: bytes, memchr, rabin-karp, rust, simd, string, string-searching, twoway. Open PRs: 23, open issues: 14. Last activity: 4d ago. Community health: 57%. Top contributors: BurntSushi, bluss, atouchet, waywardmonkeys, cholcombe973, mkroening, nicokoch, allan2, alexcrichton, dflemstr and others.