ARM64 vs ARM32 - What's different for Linux programmers

When ARM introduced 64-bit support to its architecture, it aimed for compatibility with prior 32-bit software. But for Linux programmers, there remain some significant differences that can affect code behaviour. Here are some we found and the workarounds we developed for them.

Undo developer, Isa Smith, explains these differences in this thought-provoking article for EDN. See the original article or read on for more.

I had originally planned to call this article “What’s NEW in ARMv8 for Linux Programmers?” However, I think “what’s different” is much more apt. And, just for the record, by “ARMv8-A” I mean AArch64, with the A64 instruction set, also known as arm64 or ARM64. I’ve used AArch64 registers in the examples, but many of the issues I’ve described also happen in the ARMv8-A 32-bit execution state.

To help frame the problems discussed here, let me start by giving a little background on the sort of codebase we have here at Undo. Our core technology is a record and replay engine, which works by recording all non-deterministic input to a program and uses just-in-time compilation (JIT) to keep track of the program state. Our technology started on x86 (32 and 64-bit) and had progressed to have fairly complete, maturing support on ARM 32-bit when we began adapting it to work on AArch64. I joined the company after almost all of the low hanging fruit had been grabbed (as well as many rather higher up the tree, to be fair) leaving us with some tricky problems to tackle when it came to moving to ARMv8.

This leads me to my first simple, but possibly helpful, observation: ARM64 is much more similar to ARM 32-bit (aka AArch32) than it is to x86. ARM64 is still quite RISC (though the cryptographic acceleration instructions do lead to raised eyebrows in a RISC architecture). So I don’t intend to try to cover the many differences between x86 and either ARM version. Nor do I want to rehash the differences between AArch32 and AArch64 -- there are already good resources to explore those differences.

Also, a lot of ARM versus ARM64 resources focus on the instruction set and architectural differences. These differences are not really relevant to most Linux user space application developers, beyond the very obvious, such as “your pointers are bigger.” But, as we discovered, there are differences important to Linux user space developers, four of which I'll discuss here. These differences fall into several categories, and some fall into more than one. The categories are:

  • Differences due to migrating to use a fairly new kernel version.
  • Differences due to the architecture and instruction set (where this is relevant to user space programmers).
  • Ptrace differences. We use ptrace a lot, so this was very important to us.

I will try to use the following format in the next sections:

  • A brief explanation of the area.
  • What is the difference? Why is this different? (Sometimes it is easier to understand a change in behaviour by looking at a few assembly instructions than it is from a wordy description, so I'll provide that code.)
  • How did we encounter it?
  • How did we overcome it?
  • Where to find out more information.


1. Changes to ptrace

ptrace provides process tracing capabilities to user space programs.

There have been a number of changes to the requests accepted by ptrace(). These changes produce the most pleasant of all incompatibilities to analyse: compilation errors. Our error reports were for undefined symbols PTRACE_GETREGS (for general registers), PTRACE_GETFPREGS (for floating point and SIMD registers), and PTRACE_GETHBPREGS (for hardware breakpoint registers), as well as the SET versions of these requests.

The man page for ptrace was no help at all in resolving these errors, so we dug deeper. We had a look at the kernel source, and it turns out that there is usually an architecture-independent ptrace code path (ptrace_request() in kernel/ptrace.c), and separate architecture-dependent paths (e.g. arch_ptrace() in arch/arm/kernel/ptrace.c). Although the arm64 version has a compat_arch_ptrace() for AArch32 applications, the arm64 arch_ptrace() directly calls ptrace_request() and does not add any additional ptrace request types.

The solution is to use PTRACE_GETREGSET and PTRACE_SETREGSET with various different arguments to read these registers.

Here is a table of the GETREGS-style requests and the closest equivalent GETREGSET requests. Different register sets are selected by passing different values in the addr argument to ptrace().

ARM 32-bit request    AArch64 register set (addr argument to GETREGSET)
GETREGS               NT_PRSTATUS
GETFPREGS             NT_PRFPREG
GETHBPREGS            NT_ARM_HW_BREAK, NT_ARM_HW_WATCH

Table 1. ARM 32-bit and closest equivalent AArch64 ptrace requests.

Note that NT_ARM_HW_BREAK and NT_ARM_HW_WATCH behave identically in a GETREGSET request.

Using GETREGSET is not as simple as using GETREGS, though. For a GETREGS request like this:

ptrace(PTRACE_GETREGS, 0, 0, regs);

GETREGSET would look like this:

struct
{
    void*   buf;
    size_t  len;
} my_iovec = { regs, sizeof(*regs) };

ptrace(PTRACE_GETREGSET, 0, (void*) NT_PRSTATUS, &my_iovec);
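For a fuller picture, here is a minimal sketch of the same request wrapped in a function, using the standard struct iovec from <sys/uio.h> rather than a hand-rolled struct. The helper name read_gp_regs is hypothetical, and the sketch assumes the tracee is already attached and stopped.

```c
#include <elf.h>          /* NT_PRSTATUS */
#include <stdint.h>       /* uintptr_t */
#include <sys/ptrace.h>   /* ptrace(), PTRACE_GETREGSET */
#include <sys/types.h>    /* pid_t */
#include <sys/uio.h>      /* struct iovec */
#include <sys/user.h>     /* struct user_regs_struct */

/* Hypothetical helper: read the general registers of a tracee that is
 * already attached and stopped. Returns the number of bytes the kernel
 * reported in iov_len, or -1 on failure (errno is set by ptrace). */
long read_gp_regs(pid_t pid, struct user_regs_struct *regs)
{
    struct iovec iov = { .iov_base = regs, .iov_len = sizeof(*regs) };

    /* addr selects the register set; data points at the iovec. */
    if (ptrace(PTRACE_GETREGSET, pid,
               (void *)(uintptr_t)NT_PRSTATUS, &iov) == -1)
        return -1;

    /* The kernel updates iov_len to the amount of data it returned. */
    return (long)iov.iov_len;
}
```

Checking the returned length against sizeof(struct user_regs_struct) is a cheap way to notice when the kernel's idea of the register set differs from the one you compiled against.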

Note, too, that I have said “the closest equivalent GETREGSET request.” Naturally, the AArch64 register set is different from the ARM 32-bit one, but there are more differences between the two beyond the register set.

Figure 1 shows the registers returned by an ARM 32-bit GETREGS request and by an AArch64 GETREGSET request.

Figure 1. GETREGS and GETREGSET.

Those familiar with AArch64 may notice that with GETREGSET we’ve been given a “cpsr” register, yet the hardware architecture does not have one. What's returned with GETREGSET has been synthesised into a cpsr-like layout from the individually accessible fields on AArch64. 
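As an illustration of that cpsr-like layout, the sketch below pulls the condition flags out of the value. It assumes the value has been read from the pstate field of the AArch64 general register set, and that the N, Z, C and V flags sit in bits 31 to 28 as they do in the 32-bit CPSR. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition flags. */
struct nzcv_flags {
    bool n;  /* negative */
    bool z;  /* zero */
    bool c;  /* carry */
    bool v;  /* overflow */
};

/* Decode the condition flags from a cpsr-like value.
 * Assumption: N, Z, C, V occupy bits 31, 30, 29, 28 respectively. */
struct nzcv_flags decode_nzcv(uint64_t pstate)
{
    struct nzcv_flags f = {
        .n = (pstate >> 31) & 1,
        .z = (pstate >> 30) & 1,
        .c = (pstate >> 29) & 1,
        .v = (pstate >> 28) & 1,
    };
    return f;
}
```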

A more notable difference between the two is the lack of orig_r0 (or orig_x0) for GETREGSET. This lack has to do with syscalls. On ARM 32-bit, the syscall number is placed in r7 and the syscall arguments are placed in the argument registers r0-r3 before the syscall (SVC) instruction. The value returned from the syscall is placed in r0 (as per the usual APCS, r7 in exceptional circumstances). After the kernel returns from the syscall, orig_r0 provides the original first argument to the syscall (which has been overwritten by the return value).

I actually don’t know what use a “normal” application is supposed to make of this original first argument. We use it for our support of restart_syscall, where the return value is ERESTART_RESTARTBLOCK.
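To make the role of orig_r0 concrete, here is a toy model in C. It is not kernel code: the struct, the function names and the TOY_RESTART_CODE value are all hypothetical. It only shows why a saved copy of the first argument is needed once the return value has overwritten r0.

```c
/* Toy register file: just the registers this discussion cares about. */
struct toy_regs {
    long r0;       /* first argument on entry, return value on exit */
    long r7;       /* syscall number */
    long orig_r0;  /* copy of the first argument, saved at entry */
};

/* Placeholder standing in for a "please restart" return code. */
#define TOY_RESTART_CODE (-1000L)

/* At syscall entry, remember the first argument. */
void toy_syscall_enter(struct toy_regs *r)
{
    r->orig_r0 = r->r0;
}

/* At syscall exit, the return value overwrites r0. */
void toy_syscall_return(struct toy_regs *r, long result)
{
    r->r0 = result;
}

/* To restart, the first argument must be recovered from orig_r0,
 * because r0 no longer holds it. */
void toy_syscall_restart(struct toy_regs *r)
{
    r->r0 = r->orig_r0;
}
```

Without orig_r0 in the toy struct, toy_syscall_restart would have nothing to restore from. That is the position a tracer is in when it attaches part-way through a restart on AArch64.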

Unfortunately the lack of orig_x0 is a problem for us that we have yet to resolve in all circumstances. If we have recorded the entry to the syscall, then we have all the information we need. However, if we have attached during a restart_syscall, then we do not know the original value of x0. Our only option is to allow the kernel to restart the syscall, but this restart is inefficient for us as we can’t optimise the recording of the syscall.

Returning to the subject of GETREGS versus GETREGSET: GETHBPREGS and NT_ARM_HW_BREAK are also significantly different. For a GETHBPREGS request, you use the addr field in the ptrace call to request a particular hardware breakpoint register. NT_ARM_HW_BREAK returns all hardware breakpoint registers.
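A rough sketch of the "all registers at once" style follows. The struct is my own mirror of what I understand the kernel's struct user_hwdebug_state to look like (an info word, padding, then 16 address/control pairs), and the reading of the low byte of dbg_info as the slot count is likewise an assumption. Check both against the arm64 kernel headers before relying on them. The helper names are hypothetical.

```c
#include <elf.h>          /* NT_ARM_HW_BREAK, where the libc provides it */
#include <stdint.h>
#include <sys/ptrace.h>   /* ptrace(), PTRACE_GETREGSET */
#include <sys/types.h>    /* pid_t */
#include <sys/uio.h>      /* struct iovec */

#ifndef NT_ARM_HW_BREAK
#define NT_ARM_HW_BREAK 0x402   /* fallback if <elf.h> lacks it */
#endif

/* Assumed layout, mirroring the kernel's struct user_hwdebug_state. */
struct hwdebug_state_sketch {
    uint32_t dbg_info;
    uint32_t pad;
    struct {
        uint64_t addr;
        uint32_t ctrl;
        uint32_t pad;
    } dbg_regs[16];
};

/* Assumption: the low byte of dbg_info is the number of slots. */
int hw_slot_count(uint32_t dbg_info)
{
    return (int)(dbg_info & 0xff);
}

/* Fetch every hardware breakpoint register of a stopped tracee in one
 * request. Returns the byte count the kernel reported, or -1 on error. */
long read_all_hw_breakpoints(pid_t pid, struct hwdebug_state_sketch *st)
{
    struct iovec iov = { .iov_base = st, .iov_len = sizeof(*st) };

    if (ptrace(PTRACE_GETREGSET, pid,
               (void *)(uintptr_t)NT_ARM_HW_BREAK, &iov) == -1)
        return -1;
    return (long)iov.iov_len;
}
```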

The best place to look for more information on these ptrace differences is the AArch64 ptrace source file: arch/arm64/kernel/ptrace.c