Lately I’ve been spending time on Mastodon for … reasons. Here’s my link: https://oldbytes.space/@blakespot. In my recent time on the platform, I have found quite a few excellent retrocomputing-related posts by creative members of the community. One such post, by Dougall, links to his blog post entitled “Why is Rosetta 2 fast?”.
Rosetta 2 is the ahead-of-time translator in macOS Big Sur (and later) that, upon launch of an x64 Intel binary, translates it to 64-bit ARM code before it runs on ARM-based Apple Silicon processors. It is not a real-time emulator. It translates the entire binary, once, at launch time, making best-guess choices along the way. Dougall delves into various aspects of Rosetta 2 in an effort to explain why it is so performant; in many instances the translated binary runs faster on Apple Silicon than on the fastest Intel machines that Apple has ever released. It’s impressive.
It’s a fascinating read for a tech nerd like me who has a particular interest in OS technology. But one detail of the post really grabbed my attention.
Apple’s secret extension
There are only a handful of different instructions that account for 90% of all operations executed, and, near the top of that list are addition and subtraction. On ARM these can optionally set the four-bit NZCV register, whereas on x86 these always set six flag bits: CF, ZF, SF and OF (which correspond well-enough to NZCV), as well as PF (the parity flag) and AF (the adjust flag).
Emulating the last two in software is possible (and seems to be supported by Rosetta 2 for Linux), but can be rather expensive. Most software won’t notice if you get these wrong, but some software will. The Apple M1 has an undocumented extension that, when enabled, ensures instructions like ADDS, SUBS and CMP compute PF and AF and store them as bits 26 and 27 of NZCV respectively, providing accurate emulation with no performance penalty.
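To make the cost of those two extra flags concrete, here is a rough Python sketch of what a software emulator has to work out for a single x86 ADD. This is only an illustration of the x86 flag definitions, not Rosetta 2's actual code. The function names `x86_add_flags` and `pack_nzcv` are my own, and the packed layout simply assumes the standard ARM positions for N, Z, C and V (bits 31 to 28) plus the bits 26 and 27 quoted above for PF and AF.

```python
def x86_add_flags(a: int, b: int, bits: int = 32) -> dict:
    """Compute the six x86 status flags for `a + b` (illustrative sketch)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    return {
        "result": result,
        "CF": int(full > mask),                 # carry out of the top bit
        "ZF": int(result == 0),                 # result is zero
        "SF": int(bool(result & sign)),         # top bit of the result
        # signed overflow: both inputs differ in sign from the result
        "OF": int(bool((a ^ result) & (b ^ result) & sign)),
        # PF: set when the LOW BYTE of the result has an even number of 1 bits
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),
        # AF: carry from bit 3 into bit 4
        "AF": int(bool((a ^ b ^ result) & 0x10)),
    }


def pack_nzcv(flags: dict) -> int:
    """Pack flags into an NZCV-style word (assumed layout, see lead-in).

    Only valid as written for addition: on subtraction ARM's C flag is the
    inverse of the x86 borrow, which this sketch does not handle.
    """
    return (
        (flags["SF"] << 31)    # N
        | (flags["ZF"] << 30)  # Z
        | (flags["CF"] << 29)  # C
        | (flags["OF"] << 28)  # V
        | (flags["AF"] << 27)  # AF, per the quoted description
        | (flags["PF"] << 26)  # PF, per the quoted description
    )


if __name__ == "__main__":
    flags = x86_add_flags(0x0F, 0x01)
    print(flags, hex(pack_nzcv(flags)))
```

The parity count and the extra XOR-and-mask for AF are cheap on their own, but an emulator would have to do that work (or at least keep enough state around to do it lazily) for every flag-setting add, subtract and compare, which is where the expense comes from.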
Intrigued by mention of this “secret extension,” I reached out to the author and asked if he could expand on what Apple has done here. And, he obliged. As he explained in his multi-part Mastodon response: