I have byte slices for which I would like to obtain the number of characters, as given by `bstr`'s `chars().count()`. I have noticed that when the byte slice is a valid UTF-8 string, it is about 8x faster to convert the byte slice to a `&str` and then call `str::chars().count()` instead:

~~~ rust
fn len(b: &[u8]) -> usize {
    use bstr::ByteSlice;
    // 1. only use `bstr`
    //b.chars().count()
    // 2. use `bstr` only as fallback if the bytes cannot be converted to UTF-8
    core::str::from_utf8(b).map_or_else(|_| b.chars().count(), |s| s.chars().count())
}

fn main() {
    let mut s = String::new();
    // Recount the whole string after every push, so the counting cost dominates.
    while len(s.as_bytes()) < 10000 {
        s.push('a');
    }
}
~~~

(I also ran this experiment in a real-world program to rule out that the Rust compiler applies some nifty optimization here that distorts the results, but I still got roughly the same performance difference.)

~~~
$ hyperfine -M 5 -L v only,fallback "target/release/bstr-{v}"
Benchmark 1: target/release/bstr-only
  Time (mean ± σ):      2.505 s ±  0.016 s    [User: 2.495 s, System: 0.001 s]
  Range (min … max):    2.487 s …  2.522 s    5 runs

Benchmark 2: target/release/bstr-fallback
  Time (mean ± σ):     313.8 ms ±   0.2 ms    [User: 311.7 ms, System: 0.9 ms]
  Range (min … max):   313.5 ms … 314.0 ms    5 runs

Summary
  target/release/bstr-fallback ran
    7.98 ± 0.05 times faster than target/release/bstr-only
~~~

This was very unexpected to me, to the point that I think it should either be pointed out in the documentation or, even better, be sped up in the implementation. I have looked at `bstr`'s implementation, but I have not seen any obvious way to speed it up.

The question is: why are `core::str::from_utf8` + `str::chars().count()` so fast that their combination can walk through all the bytes _twice_ in 1/8 of the time that `bstr`'s `chars().count()` takes to walk through them once? Is this some dark SIMD magic? Given what the standard library's source uses here, do you think it would be worthwhile to adapt `bstr`'s implementation to use something more akin to that? I would be motivated to try that, but first I would like to hear your opinion on this.
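To make the question more concrete, here is a minimal sketch of one reason why counting over already-validated UTF-8 can be cheap: in valid UTF-8, the number of characters equals the number of bytes that are not continuation bytes (bit pattern `10xxxxxx`), so no character has to be decoded. The names `count_non_continuation` and `char_len` are made up for illustration, and `String::from_utf8_lossy` is only a standard-library stand-in for the `bstr` fallback, which may not treat invalid sequences in exactly the same way. This is a sketch under those assumptions, not a description of how `bstr` or the standard library is actually implemented.

```rust
/// Counts characters in valid UTF-8 by counting every byte that is not a
/// continuation byte (continuation bytes look like 10xxxxxx).
fn count_non_continuation(s: &str) -> usize {
    s.as_bytes().iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}

/// Hypothetical helper: validate once, then count without decoding.
/// Invalid input takes a slower, lossy path (assumption: stand-in for `bstr`).
fn char_len(b: &[u8]) -> usize {
    match core::str::from_utf8(b) {
        Ok(s) => count_non_continuation(s),
        Err(_) => String::from_utf8_lossy(b).chars().count(),
    }
}

fn main() {
    // 1-, 2-, 3- and 4-byte characters: 10 bytes, 4 characters.
    let text = "aé€😀";
    assert_eq!(count_non_continuation(text), text.chars().count());
    println!("{}", char_len(text.as_bytes()));

    // Invalid UTF-8 goes through the fallback path.
    println!("{}", char_len(b"a\xFFb"));
}
```

The simple byte loop in `count_non_continuation` has no data-dependent branches, which is the kind of code a compiler can often vectorize; whether that fully explains the factor of 8 measured above is an open question.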
Repository: BurntSushi/bstr. Description: A string type for Rust that is not required to be valid UTF-8. Stars: 1046, Forks: 68. Primary language: Rust. Languages: Rust (96.2%), Shell (3.8%). Topics: byte-string, bytes, graphemes, substring, substring-search, unicode, utf-8. Open PRs: 9, open issues: 24. Last activity: 3w ago. Community health: 42%. Top contributors: BurntSushi, lopopolo, joshtriplett, atouchet, m-ou-se, LingMan, TethysSvensson, erickt, ggriffiniii, Freaky and others.
This feature was renamed. Hopefully it will be stabilized soon. Fixes #217