memchr

Optimized string search routines for Rust.

by BurntSushi
Topics: bytes, memchr, rabin-karp, rust, simd, string, string-searching, twoway

Rust

1.5k stars · 148 forks · 52 contributors · Active · 4d ago · Since 2015 · Unlicense

Meet the team

BurntSushi · 186 contributions
bluss · 18 contributions
atouchet · 6 contributions
waywardmonkeys · 4 contributions
cholcombe973 · 4 contributions
mkroening · 3 contributions
nicokoch · 3 contributions
allan2 · 3 contributions

Languages

Rust 99.9%
Python 0.1%

Commit activity

Last 12 weeks · 11 commits

Community health

3 of 6 standards met

Community health score: 57%
✓ README · ✓ License · ✓ Contributing · ○ Code of Conduct · ○ Issue Template · ○ PR Template

Recent PRs & issues

Active · Last activity 4d ago
Fix NeonMoveMask and align naming with actual semantics (Open PR)

Fixes #219. I also updated the naming to align with the semantics given by the comment "The mask has all of its bits set except for the first N least significant bits", and saved a few bit ops.

HansBrende · 1mo ago
Show test failures for 219 (Open PR)

I'm only opening this PR to show the test failures when I revert my change from #220. Will close after the tests complete.

HansBrende · 1mo ago

Recent fixes

arch: add `unsafe` to internal routines (Merged PR)

These should have been marked . They are marked as such on and .

BurntSushi · 4d ago
UB through misuse of safe APIs (Closed Issue)

Hi,

As part of starting my new role contracting for the Foundation, helping deal with a flood of bugs found by the next generation of AI tooling (foundation > New Contractor Position: AI Security Engineer), I have found minor UB in memchr when safe APIs are used incorrectly. The tool spit out this test case, which is safe code that fails under Miri: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=39095e2573c976db66e2d70958f5edbf

This involves a user who imports internal details of your library, but they are public. Then the user needs to ignore the documentation and pass a longer needle at runtime than they did at construction time, but these are safe functions, so users are allowed to ignore documentation. In the Rust community, UB is not considered a reasonable punishment for misuse of safe public APIs.

Some of the useful quotes out of the raw AI word vomit:

Every in memchr 2.8.1 exposes a safe that decouples the search-time needle from the construction-time needle. The unsafe inner function does not validate `arch::generic::packedpair::Finder::::find(haystack, needle)` `haystack.len() >= self.min_haystack_len` `min_haystack_len = max(construction_needle.len(), max_pair_index + V::BYTES)` `needle.len()` `find_in_chunk` `end.sub(needle.len())` `is_equal_raw(needle.as_ptr(), cur, needle.len())` `needle.len() > haystack.len()` `end.sub(needle.len())` `unsafe fn` `target_feature` `# Safety` precondition, and the public wrapper does not mark its function unsafe.

On a meta note, we are still trying to figure out how best to handle these AI reports. If you have advice on how I could do it better in the future, please let me know!

Eh2406 · 3w ago
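
The mismatch described in this issue can be illustrated with a toy model. This is only a sketch: `ToyFinder`, `last_start`, and the simplified `min_haystack_len` are hypothetical names made up for illustration and are not memchr's API. It shows why a length check derived from the construction-time needle says nothing about a different needle passed at search time, and it uses checked arithmetic where the reported code used unchecked pointer arithmetic.

```rust
/// Hypothetical stand-in for a searcher that precomputes a minimum haystack
/// length from the needle it was constructed with. Not memchr's real type.
struct ToyFinder {
    min_haystack_len: usize,
}

impl ToyFinder {
    fn new(construction_needle: &[u8]) -> ToyFinder {
        // Simplified assumption: the minimum is just the needle length.
        ToyFinder { min_haystack_len: construction_needle.len() }
    }

    /// Returns the last offset at which `needle` could start in `haystack`,
    /// or `None` if it cannot fit at all.
    fn last_start(&self, haystack: &[u8], needle: &[u8]) -> Option<usize> {
        // This check only reflects the construction-time needle.
        if haystack.len() < self.min_haystack_len {
            return None;
        }
        // The search-time needle may be longer, so the subtraction must be
        // checked. An unchecked pointer subtraction here would step before
        // the start of the haystack.
        haystack.len().checked_sub(needle.len())
    }
}

fn main() {
    let finder = ToyFinder::new(b"ab");
    let haystack = b"abc";

    // Same needle as at construction: the precomputed check is sufficient.
    assert_eq!(finder.last_start(haystack, b"ab"), Some(1));

    // A longer needle passes the construction-time check, and only the
    // checked subtraction rejects it.
    assert_eq!(finder.last_start(haystack, b"abcdef"), None);
}
```
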
arch: fix undefined behavior in lower level (but public) APIs (Merged PR)

The main hitch here is that the lower level APIs require the caller to pass the needle in both the constructor and the search routine. The constructor does some pre-computation based on that needle that the surrounding code relied upon for soundness. If the caller passes in a different needle than what was given to the constructor, then one could trigger undefined behavior. This likely falls under "malicious caller" because it would be odd for the caller to not follow the documented contract (by passing two different needles). However, the contract is not a safety invariant and the routine is not marked . So this API was just plain unsound. The fix is thankfully very easy: subtract from to get a distance between the pointers and compare that with the needle length. The invariant between and is not dependent on the needle, so this subtraction is always safe. Fixes #225

BurntSushi · 3w ago
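
The kind of check this PR describes, comparing the distance between two pointers with the needle length, might look roughly like the sketch below. The names `needle_fits` and `starts_with_checked` are hypothetical, and this is not the code from the PR; it only assumes a current pointer and an end pointer derived from the same haystack slice.

```rust
/// Hypothetical helper, not memchr's actual code.
///
/// # Safety
///
/// `cur` and `end` must be derived from the same allocation, with
/// `cur <= end`.
unsafe fn needle_fits(cur: *const u8, end: *const u8, needle: &[u8]) -> bool {
    // The distance between the pointers depends only on the haystack, so
    // computing it is valid no matter which needle the caller passes.
    let remaining = unsafe { end.offset_from(cur) } as usize;
    needle.len() <= remaining
}

/// Safe wrapper: reports whether `haystack` begins with `needle`, rejecting
/// needles that are longer than the haystack before reading any bytes.
fn starts_with_checked(haystack: &[u8], needle: &[u8]) -> bool {
    let range = haystack.as_ptr_range();
    // SAFETY: both pointers come from the same slice and start <= end.
    if !unsafe { needle_fits(range.start, range.end, needle) } {
        return false;
    }
    &haystack[..needle.len()] == needle
}

fn main() {
    assert!(starts_with_checked(b"abc", b"ab"));
    // A needle longer than the haystack is rejected by the distance check.
    assert!(!starts_with_checked(b"abc", b"abcdef"));
}
```

Because the comparison uses the search-time needle directly, the safe wrapper no longer relies on the caller passing the same needle that was used at construction.
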

Structured data for AI agents

Repository: BurntSushi/memchr. Description: Optimized string search routines for Rust. Stars: 1490, Forks: 148. Primary language: Rust. Languages: Rust (99.9%), Python (0.1%). License: Unlicense. Topics: bytes, memchr, rabin-karp, rust, simd, string, string-searching, twoway. Open PRs: 23, open issues: 14. Last activity: 4d ago. Community health: 57%. Top contributors: BurntSushi, bluss, atouchet, waywardmonkeys, cholcombe973, mkroening, nicokoch, allan2, alexcrichton, dflemstr and others.
