Hello! I would like to propose an optimization for XZ decompression.

Currently, XZ decompression is performed sequentially in a single thread. We have implemented a parallel block decompressor in a fork of the underlying dependency at (which functions as a drop-in replacement). By leveraging the XZ format's native index block boundaries, the parallel decompressor parses the index backwards in O(1) time and decompresses independent blocks concurrently using a worker pool. To keep memory utilization bounded and limit garbage collector overhead, we also introduced pooling for both decompressed block buffers and the large LZMA decoder dictionary slices.

Benchmarks (on a 2-core / 4-thread CPU with a 20 MB payload):
Vanilla decompressor: ~14 MB/s
Optimized sequential decompressor: ~30 MB/s
Optimized parallel decompressor: ~70 MB/s

On systems with 4, 8, or more physical cores, the throughput scales near-linearly, exceeding 120 MB/s. We can utilize this optimization when the input source implements and the total compressed stream size is known.

Integration Instruction
To use the parallel reader within your Go code when dealing with random-access inputs, you can check if the underlying stream supports seeking and pass it to the parallel decompressor.
Summary
This pull request implements multi-threaded, concurrent XZ decompression for independent blocks when the input stream supports random access. This change is a direct response to the issue described in #75.

Performance Impact
By leveraging block-level parallelism, decompression performance scales near-linearly with available CPU cores on multi-block XZ files.
Sequential baseline: ~14–32 MB/s
Parallel decompression (multi-core): up to 70–80 MB/s (on modern quad-core CPUs)

Changes & Implementation Details
1. Dependency Switch: Replaced the original dependency with the fork. The fork includes a performance-optimized sequential decoder (using register caching and manual inlining) and the new worker pool implementation.
2. Interface Check: The implementation in now checks if the incoming implements (supporting both and ).
3. Parallel Activation: When seek/read-at capabilities are verified, we calculate the stream size, account for potential starting offsets (by wrapping with ), and launch the concurrent .
4. Resilient Fallback: If the stream is unseekable (standard sequential pipes) or initialization fails, the decoder safely falls back to standard sequential decompression without errors.
5. Testing: Added a comprehensive suite of unit tests in covering happy paths, offset streams, sequential fallbacks, seek error recovery, and partial reads.

Fixes #75
What would you like to have changed?
I think this mostly applies to , but may apply to others as well. There's currently a way to skip preserving UID and GID during the archive creation process. I'd like to request the same option for extraction.

Why is this feature a useful, necessary, and/or important addition to this project?
I'm using it inside a project where I don't care about the UID and GID because they're modified later anyway. My terminal is flooded with warnings that I'd rather not see.

What alternatives are there, or what are you doing in the meantime to work around the lack of this feature?
I'm thinking about processing the output and filtering these warnings out.

Please link to any relevant issues, pull requests, or other discussions.
N/A
What version of the package or command are you using?
github.com/mholt/archives v0.1.5
Regressed from: https://github.com/mholt/archiver/issues/338

What are you trying to do?
with a zip file that does not have explicit directory entries. Archiver works fine when there are explicit directory entries like:

What steps did you take?
This returns:

What did you expect to happen, and what actually happened instead?
Should return

How do you think this should be fixed?
I'm guessing the call should be smart enough to recognize that when files are listed under directories that are not represented in the directory list, an entry should be stubbed in for each of those directories.

Please link to any related issues, pull requests, and/or discussion
Original Fix: https://github.com/mholt/archiver/pull/339
Original Bug: https://github.com/mholt/archiver/issues/338

Bonus: What do you use this package for, and do you have any other suggestions or feedback?
to browse different archives and archive file nesting.
What version of the package or command are you using?
v0.1.5

What are you trying to do?
Unpack an ImageMagick release archive.

What steps did you take?
Created a PR to add ImageMagick to the aqua registry in https://github.com/aquaproj/aqua-registry/pull/50281 and ran CI tests, which failed to unpack the ARM64 archive.

What did you expect to happen, and what actually happened instead?
Expected the archive to unpack successfully; got error , see https://github.com/aquaproj/aqua/issues/4639

How do you think this should be fixed?
According to @bodgit, author of , it can be fixed by upgrading to v1.6.2, released 3 weeks ago with support for ARM64 executable compression, see https://github.com/bodgit/sevenzip/issues/449

Please link to any related issues, pull requests, and/or discussion
Original PR that hit the error: https://github.com/aquaproj/aqua-registry/pull/50281
Aqua issue: https://github.com/aquaproj/aqua/issues/4639
Sevenzip issue: https://github.com/bodgit/sevenzip/issues/449

Bonus: What do you use this package for, and do you have any other suggestions or feedback?
Aqua is a package management tool and uses archives to unpack release archives.
Repository: mholt/archives.
Description: Cross-platform library to create & extract archives, compress & decompress files, and walk virtual file systems across various formats.
Stars: 427, Forks: 40.
Primary language: Go. Languages: Go (100%).
License: MIT.
Homepage: https://pkg.go.dev/github.com/mholt/archives
Topics: 7zip, archives, brotli, bzip2, compression, extract, fs, go, golang, gzip, lz4, lzip, rar, snappy, streams, tar, xz, zip, zlib, zstandard.
Latest release: v0.1.5 (8mo ago).
Open PRs: 2, open issues: 6. Last activity: 1mo ago. Community health: 57%.
Top contributors: mholt, M0Rf30, sephriot, darkliquid, dpgarrick, dirkmueller, Gusted, solvingj, joonas, mikelolasagasti and others.