When looking at hexdumps, I frequently find myself wishing for easier ways to distinguish between the different bytes that are just specified as in a normal hexdump. I previously experimented with using the Unicode Braille characters for that. When I tried hexyl, I was immediately captured by the idea of adding color, and thought that color gradients would give yet another way to tell different bytes apart.

I thought an image would say more than a thousand words, so I went ahead and implemented this instead of opening an issue first. I hope this is fine for you.

Without further ado, here is what all bytes would look like with the options after this PR is merged:

[Image: all bytes hexdump]

I made several design decisions when implementing this:

1. Setting the colors and the character set are two separate command line options. (They go well together, though, so I will not distinguish specifically between them here. Which option does what is hopefully obvious.)
2. While different bytes should have different colors, there should still be different classes of bytes, so several gradients are used.
3. The colors should roughly get "hotter" as the byte values increase.
4. should not be in the same class as the other ASCII-non-printable characters, as it has a very different numerical value. Picking this color was difficult, since it had to fit into the general region but still be different enough not to be easily confused.
5. likely makes more sense in the category of printable ASCII than non-printable ASCII, since it is very likely to appear frequently in normal text.
6. , , and are not really printable characters, but are still important enough to have their own symbols.
7. All other non-printable characters are represented using Braille characters corresponding to their bits.
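As a rough illustration of item 7, here is a minimal sketch of mapping a byte to a Braille character by its bits. It assumes the Unicode dot order (bit 0 is dot 1, up to bit 7 is dot 8), which may differ from the bit-to-dot mapping chosen in this PR; the function name `braille_for_byte` is hypothetical.

```rust
/// Map a byte to the Unicode Braille pattern whose raised dots mirror the
/// byte's set bits.
///
/// Assumption: this uses the Unicode dot order (bit 0 = dot 1, ..., bit 7 =
/// dot 8). The PR may arrange the bits differently.
fn braille_for_byte(byte: u8) -> char {
    // The Braille Patterns block spans U+2800..=U+28FF, with one code point
    // for each possible 8-bit dot pattern.
    char::from_u32(0x2800 + u32::from(byte))
        .expect("U+2800..=U+28FF are valid Unicode scalar values")
}
```

With this ordering, `0x00` maps to the blank pattern U+2800 and `0xff` to the fully raised pattern U+28FF, so bytes that differ in a single bit differ in exactly one dot.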
Here are some further examples to hopefully showcase how this can help pattern recognition:

UTF8: [Image: UTF8 hexdump]

UTF16LE with BOM: [Image: UTF16LE with BOM hexdump]

Randomness: [Image: randomness hexdump]

$MFT entry: [Image: $MFT entry hexdump]

hexyl binary (start): [Image: hexyl binary hexdump]

Unresolved question: How should this interact with the option? Currently this option ignores the color scheme. One option would be to show a smaller variant of the gradients with just 3-4 colors there.

Let me know if you think anything should be adjusted.

EDIT: Added different character for tabs and changed 0x7f color.
Repository: sharkdp/hexyl. Description: A command-line hex viewer Stars: 9988, Forks: 258. Primary language: Rust. Languages: Rust (100%). License: Apache-2.0. Topics: binary-data, command-line, hexadecimal, rust, tool. Latest release: v0.17.0 (2w ago). Open PRs: 4, open issues: 18. Last activity: 2w ago. Community health: 57%. Top contributors: sharkdp, sharifhsn, ErichDonGubler, merkrafter, RinHizakura, mkatychev, tommilligan, sorairolake, selfup, arnavb and others.
When I tried to use Hexyl on content with characters from the extended ASCII table (the 8-bit table, as opposed to the original 7-bit one), I was shown representations. After some reading, I stumbled on this:

"default: Show printable ASCII characters as-is, '⋄' for NULL bytes, ' ' for space, '_' for other ASCII whitespace, '•' for other ASCII characters, and '×' for non-ASCII bytes"

However, I'd say many readers in Europe expect "ASCII" here to cover the far-from-recent extended ASCII table (https://en.wikipedia.org/wiki/Extended_ASCII), which uses 8 bits, covers many of the most used Latin characters in Europe, and is usually widely supported.

If for any reason you prefer to stick to the original 7-bit table, I'd suggest clarifying it, unless "ASCII" is considered clear enough for most people and I'm wrong on that one. I can open a PR if need be (although I'm not sure how to word this yet).

I'd be interested, though, in why you chose not to show any UTF character when the font allows for it, and, if you prefer to stick to 7-bit ASCII, why so? (While I can still see situations where Unicode would not be well supported, I don't think that would be the case for extended ASCII?)
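To make the quoted rule concrete, here is a minimal sketch of the classification as the help text describes it. This is not hexyl's actual source. The function name `default_char` is hypothetical, and using Rust's `is_ascii_whitespace` (tab, line feed, form feed, carriage return, space) as the definition of "other ASCII whitespace" is an assumption.

```rust
/// Character shown for a byte under the quoted "default" rule.
///
/// Assumption: a re-implementation based only on the help text, not on
/// hexyl's code.
fn default_char(byte: u8) -> char {
    match byte {
        0x00 => '⋄',                          // NULL byte
        b' ' => ' ',                          // space is shown as-is
        b if b.is_ascii_whitespace() => '_',  // tab, LF, FF, CR
        b if b.is_ascii_graphic() => b as char, // printable ASCII, 0x21..=0x7e
        b if b.is_ascii() => '•',             // remaining 7-bit bytes
        _ => '×',                             // 0x80..=0xff, non-ASCII
    }
}
```

Under this reading, a byte such as `0xe9` (é in Latin-1) falls into the last arm and is shown as '×', which matches the behavior described above: every byte with the high bit set is treated alike, whatever 8-bit code page the data uses.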
When dumping data at large positions, which take more than 8 characters to display, a few things get misaligned. This happens both when ing that far, or when just using a to fake the position higher.

Working:

Misaligned:

Note that the first column has just 8 characters of dashes/spaces, instead of 10. Additionally, if a dump crosses the threshold, the alignment still gets thrown off, even with no borders or squeezing.
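One possible direction, sketched under assumptions: derive the width of the position column from the largest offset that will be printed, and use that same width for the border and for every row. The names `offset_width`, `border_segment`, and `format_offset` are hypothetical and do not refer to hexyl's internals.

```rust
/// Number of hex digits needed to print `offset`, never fewer than 8.
fn offset_width(offset: u64) -> usize {
    // Count significant bits, then round up to whole hex digits (4 bits each).
    let bits = 64 - offset.leading_zeros() as usize;
    let digits = ((bits + 3) / 4).max(1);
    digits.max(8)
}

/// Horizontal border piece above the position column, sized to match the
/// widest offset in the dump.
fn border_segment(max_offset: u64) -> String {
    "─".repeat(offset_width(max_offset))
}

/// Zero-padded hex offset using a width shared by every row.
fn format_offset(offset: u64, width: usize) -> String {
    format!("{:0width$x}", offset, width = width)
}
```

Computing the width once from the final offset, rather than per line, would also keep rows aligned when a dump crosses the 8-digit threshold partway through.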
While trying to track down a bug in another piece of software, I was looking for a way to get the hex representation of file names (handling Unicode) and thought of Hexyl; unfortunately, the file name isn't part of the hex representation. Although this is a niche use case, an option to show the file name's hex representation would help too.
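To illustrate what is being asked for, here is a minimal sketch that renders a file name's bytes as hex. It is a standalone helper, not a hexyl feature, and the name `file_name_hex` is hypothetical. It assumes the name is valid UTF-8 and returns `None` otherwise, so it would not cover arbitrary non-UTF-8 names.

```rust
use std::path::Path;

/// Hex representation of the final path component's UTF-8 bytes, separated
/// by spaces. Returns `None` if the path has no file name or the name is not
/// valid UTF-8.
fn file_name_hex(path: &Path) -> Option<String> {
    let name = path.file_name()?.to_str()?;
    let hex: Vec<String> = name.bytes().map(|b| format!("{:02x}", b)).collect();
    Some(hex.join(" "))
}
```

For a file named `é.txt`, this yields `c3 a9 2e 74 78 74`, which makes it easy to see whether the name uses a precomposed character or a combining sequence.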