On AArch64, non-smashable calls to the runtime (movz/movk/blr sequences) are significantly more expensive than on x86 due to instruction encoding overhead. This patch reorders functions during retranslateAll so that functions with the highest runtime call density (weighted by profile execution counts) are placed first in code.hot. The idea is that clustering call-heavy functions near the start of the TC improves icache locality for runtime call sequences.

Controlled by two new config options:
  Eval.JitRuntimeCallReorder (bool, default: true, AArch64 only)
  Eval.JitRuntimeCallReorderLimitMB (uint32, default: 64)

Diagnostic output available via TRACE=mcg:6.
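The ordering described above can be sketched roughly as follows. This is a standalone illustration, not the patch itself: the FuncProfile record, the density formula (runtime call sites per byte of code, scaled by execution count), and the reading of the limit as a byte budget for hoisted functions are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical per-function record; field names are illustrative and are
// not taken from the HHVM sources.
struct FuncProfile {
  std::string name;
  uint64_t runtimeCalls;  // non-smashable runtime call sites
  uint64_t codeBytes;     // size of the function's code in code.hot
  uint64_t execCount;     // profile execution count
};

// Assumed weighting: call sites per byte, scaled by execution count.
double weightedDensity(const FuncProfile& f) {
  if (f.codeBytes == 0) return 0.0;
  return static_cast<double>(f.runtimeCalls) *
         static_cast<double>(f.execCount) /
         static_cast<double>(f.codeBytes);
}

// Returns function names in placement order. Functions with a non-zero
// density are hoisted to the front in decreasing density while they fit in
// limitBytes (an assumed interpretation of the MB limit); everything else
// keeps its original relative order.
std::vector<std::string> reorderByRuntimeCallDensity(
    const std::vector<FuncProfile>& funcs, uint64_t limitBytes) {
  std::vector<std::size_t> idx(funcs.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::stable_sort(idx.begin(), idx.end(),
                   [&](std::size_t a, std::size_t b) {
                     return weightedDensity(funcs[a]) >
                            weightedDensity(funcs[b]);
                   });

  std::vector<bool> hoisted(funcs.size(), false);
  std::vector<std::string> order;
  uint64_t used = 0;
  for (std::size_t i : idx) {
    if (weightedDensity(funcs[i]) <= 0.0) break;  // no runtime calls left
    if (used + funcs[i].codeBytes > limitBytes) continue;  // over budget
    used += funcs[i].codeBytes;
    hoisted[i] = true;
    order.push_back(funcs[i].name);
  }
  for (std::size_t i = 0; i < funcs.size(); ++i) {
    if (!hoisted[i]) order.push_back(funcs[i].name);
  }
  return order;
}
```

A stable sort is used so that functions with equal density keep their existing relative order, which keeps the layout deterministic across runs with the same profile.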
Summary: The idea is to use the MATCH instruction right after loading the tags. MATCH checks whether any byte loaded from memory is equal to the needle and sets the flags accordingly, so we can use it to branch quickly when no byte is equal to the needle.

The emitted asm looks like this:

  2c3c70: a40e4141 ld1b {z1.b}, p0/z, [x10, x14]
  2c3c74: 45208021 match p1.b, p0/z, z1.b, z0.b
  2c3c78: 540001e0 b.eq 2c3cb4 // b.none
  2c3c7c: 6e208c21 cmeq v1.16b, v1.16b, v0.16b
  2c3c80: 910041ae add x14, x13, #0x10
  2c3c84: 05800701 and z1.b, z1.b, #0x11
  2c3c88: 0f0c8421 shrn v1.8b, v1.8h, #4
  ...

The cmeq instruction is likely to be executed speculatively alongside match. I also tried using the output of match in a broadcast instruction:

  svdup_n_u8_z(outPred, 17);

The dup replaces the cmeq+and, but it was still equal to or slower than cmeq+and, further suggesting that the cmeq executes alongside match. I'll ask ARM engineers whether they recommend either of the two sequences. On newer CPUs implementing SVE2.1, we will be able to move the predicate into a SIMD register. The code layout could perhaps be improved to be less ugly.
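For readers without the ISA reference at hand, the control flow in the listing can be modelled in portable scalar C++. This is only a sketch of the logic, not the SVE2 code from the diff: the function names are made up for the example, the chunk is assumed to hold 16 tag bytes, and the one-bit-per-4-bit-group mask layout is my reading of the and #0x11 / shrn #4 pair rather than something the summary states.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar stand-in for MATCH followed by the flag test: true if any tag
// byte equals the needle.
bool anyTagMatches(const std::array<uint8_t, 16>& tags, uint8_t needle) {
  for (uint8_t t : tags) {
    if (t == needle) return true;
  }
  return false;
}

// Returns 0 on the fast path (the b.none branch). Otherwise builds a mask
// with bit 4*i set for each matching tag i, modelling cmeq + and + shrn.
uint64_t tagMatchMask(const std::array<uint8_t, 16>& tags, uint8_t needle) {
  if (!anyTagMatches(tags, needle)) {
    return 0;  // no byte matched: skip the mask construction entirely
  }
  uint64_t mask = 0;
  for (std::size_t i = 0; i < tags.size(); ++i) {
    if (tags[i] == needle) {
      mask |= uint64_t{1} << (4 * i);
    }
  }
  return mask;
}
```

In the scalar model the early exit saves nothing, since the check and the mask loop do the same comparisons; the point is only to show where the branch sits relative to the mask construction.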
Benchmark shows ~10% reduction in find latency:

Before:
  Find f14node[11]              28.08ns   35.61M
  Find f14val[11]     98.784%   28.43ns   35.18M
  Find f14vec[11]     101.61%   27.64ns   36.18M
  Find f14node[11]              28.07ns   35.62M
  Find f14val[11]     98.209%   28.58ns   34.99M
  Find f14vec[11]     101.54%   27.65ns   36.17M
  Find f14node[11]              28.07ns   35.62M
  Find f14val[11]     97.935%   28.66ns   34.89M
  Find f14vec[11]     101.42%   27.68ns   36.13M
  Find f14node[11]              28.05ns   35.65M
  Find f14val[11]     97.259%   28.84ns   34.67M
  Find f14vec[11]     100.40%   27.94ns   35.79M

After:
  Find f14node[11]              25.81ns   38.75M
  Find f14val[11]     99.176%   26.02ns   38.43M
  Find f14vec[11]     100.04%   25.80ns   38.76M
  Find f14node[11]              25.81ns   38.75M
  Find f14val[11]     99.176%   26.02ns   38.43M
  Find f14vec[11]     100.02%   25.80ns   38.76M
  Find f14node[11]              26.33ns   37.98M
  Find f14val[11]     101.23%   26.01ns   38.45M
  Find f14vec[11]     95.719%   27.50ns   36.36M
  Find f14node[11]              26.36ns   37.93M
  Find f14val[11]     101.80%   25.90ns   38.62M
  Find f14vec[11]     96.690%   27.26ns   36.68M

Improvement is likely to be higher on dense maps, and lower on sparse maps without the tags in cache.

Reviewed By: yfeldblum
Differential Revision: D93997423
Repository: facebook/hhvm. Description: A virtual machine for executing programs written in Hack. Stars: 18605, Forks: 3075. Primary language: C++. Languages: C++ (44.1%), Hack (30.1%), OCaml (12%), Rust (9.5%), Python (1.5%). Homepage: https://hhvm.com Topics: hack, hacklang, hhvm, php. Latest release: HHVM-3.15.0 (9y ago). Open PRs: 100, open issues: 425. Last activity: 2h ago. Community health: 87%. Top contributors: edwinsmith, jdelong, oulgen, jano, ljw1004, yfeldblum, ricklavoie, vassilmladenov, fredemmott, ptarjan and others.
Last 12 weeks · 2261 commits
Describe the bug
A clear and concise description of what the bug is.

Standalone code, or other way to reproduce the problem
This should not depend on installing any libraries or frameworks. Ideally, it should be possible to copy-paste this into a single file and reproduce the problem by running hhvm and/or hh_client.

Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Actual behavior
Copy-paste output, or add a screenshot to illustrate what actually happens. Copy-pasted text output (e.g. from or ) is preferred to screenshots.

Environment
Operating system: For example, 'Debian Squeeze', 'Ubuntu 18.04', 'MacOS Catalina'.
Installation method: For example, 'built from source', 'apt-get with dl.hhvm.com repository', 'hhvm/hhvm on dockerhub', 'homebrew'.
HHVM Version: Please include the output of and

Additional context
Add any other context about the problem here.