Using x86_64 hardware to compile applications for ARM

Nowadays, everyone has heard about IoT or Edge computing, but this is nothing new. Embedded systems/devices have been around for a long time, although their use was not as extensive as it is today.

IoT, Edge computing and Embedded systems have one thing in common: all of them are "computers" running inside many different devices such as TVs, cars, etc.

Embedded systems/devices were not meant to run an Operating System like the one we use on our laptops; sometimes they used purpose-built Operating Systems. IoT and Edge computing have further popularized the use of Linux on many devices, standardizing the Operating System and helping to lower costs.

These devices do not use x86_64 processors, or at least they are not limited to them. As the hardware architecture is different from that of our laptops, the binaries used on these devices are different as well. You are probably thinking of using a device with the same processor to build everything needed: the whole Operating System and all the utilities/programs. That is an option, the obvious one. The problem with this approach is that these devices usually do not have enough CPU power or memory to be efficient at compilation tasks. IoT, Edge and Embedded devices are normally small, which means limited CPU, power and memory.

If you own a Raspberry Pi you can try to compile some piece of software, like a kernel or a Python module, and then do the same on your laptop. Compare the times.

To solve this problem we can compile on one hardware architecture for a different one. We use a computer with more resources, CPU and memory, to build the binaries. This computer does not need to use the same processor family as the target architecture. This is called cross-compilation.

A cross-compiler is a compiler that is able to generate code for a CPU different from the one it runs on. For instance, we can use a cross-compiler on an x86_64 computer to create a binary for ARM processors.

You will also hear about toolchains. To create a binary you need more than a compiler:

  • A linker, to link libraries.
  • Libraries to be linked.
  • Maybe other tools

A toolchain is a set of compiler, linker, libraries and tools needed to create a binary.
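As a rough illustration, the pieces of a GNU cross toolchain are usually separate programs that share a target prefix. The sketch below assumes the aarch64-linux-gnu prefix used by the Debian packages installed later in this post; the function names and the list of tools are only illustrative, not an exhaustive definition of a toolchain.

```shell
# Print the usual program names of a GNU cross toolchain for a given prefix.
# The prefix defaults to aarch64-linux-gnu (an assumption for this example).
toolchain_tools() {
  triplet=${1:-aarch64-linux-gnu}
  for tool in gcc ld as ar objdump readelf strip; do
    echo "$triplet-$tool"
  done
}

# Report which of those programs can be found on PATH.
check_toolchain() {
  for t in $(toolchain_tools "$1"); do
    if command -v "$t" >/dev/null 2>&1; then
      echo "found: $t"
    else
      echo "missing: $t"
    fi
  done
}
```

Running check_toolchain aarch64-linux-gnu inside the build environment gives a quick view of what is installed.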

Some Linux distributions provide cross-compilers, so we can use them. But what if our favorite Linux distribution does not provide one? Well, in this case we can create a virtual machine and install a Linux distribution that ships a cross-compiler. That's a good solution, but we will need some resources to create and run the virtual machine.

A better one is using containers.

We need to install podman and buildah on our laptop/computer running Linux. You can also use docker, but I will use podman and buildah. You can check how to install podman and buildah.

So we are going to create a container image to compile software for ARM64 using an x86_64 computer.

The first thing we need to do is to create a container image that includes the cross-compiler. To create this image based on Debian we have the following Dockerfile:

FROM debian:stable
RUN apt update -y && apt install -y gcc make gcc-aarch64-linux-gnu binutils-aarch64-linux-gnu libgmp-dev && mkdir /srv/src
WORKDIR /srv/src

So we need to create the container image:

$ buildah bud -f Dockerfile -t debian-cross-compiling

We can see the container image after a while:

$ podman images
REPOSITORY                          TAG      IMAGE ID      CREATED         SIZE
localhost/debian-cross-compiling    latest   02851a52ad6b  54 seconds ago  862 MB
docker.io/library/debian            stable   fd388d9cf0ba  4 days ago      129 MB
$

This container image is an x86_64 image, not an ARM64 one. What it contains is a toolchain for ARM64.
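To double-check which architecture an image was built for, podman can print it from the image metadata. This is a minimal sketch; image_arch is a hypothetical helper name, and the PODMAN variable is only there so the command line can be previewed (for example with PODMAN=echo) without running podman.

```shell
# Print the architecture recorded in a container image's metadata.
# PODMAN is an optional override (hypothetical convention for this sketch).
image_arch() {
  "${PODMAN:-podman}" image inspect --format '{{.Architecture}}' "$1"
}
```

On an x86_64 host, image_arch localhost/debian-cross-compiling should report amd64, which is the name container tools use for x86_64.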

As the container needs access to our source code, we will map the directory where the source code is stored to the container's /srv/src directory. Instead of gcc we have to use the cross-compiler, so aarch64-linux-gnu-gcc will be our compiler.

A simple C program will be compiled:

#include <stdio.h>

int main(void) {
  printf("Hello world!!\n");
  return(0);
}

To create the ARM binary/executable:

$ podman run --rm -it -v /home/jadebustos/working/ache4bits/cross-compiling/hello-world:/srv/src localhost/debian-cross-compiling aarch64-linux-gnu-gcc hello.c -o hello-aarch64 -static
$ ls -lh /home/jadebustos/working/ache4bits/cross-compiling/hello-world
total 16K
-rwxr-xr-x 1 jadebustos jadebustos 9.2K May 15 23:10 hello-aarch64
-rw-r--r-- 1 jadebustos jadebustos 81 May 15 23:08 hello.c
$
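The podman command above is long, so it can be handy to wrap it in a small shell function. The sketch below is one way to do it; cross_gcc is a hypothetical name, the -it flags are left out on the assumption that the compiler needs no terminal, and PODMAN can be set to echo to preview the command instead of running it.

```shell
# Run the cross-compiler from the container image against a source directory.
# Usage: cross_gcc SOURCE_DIR [gcc arguments...]
cross_gcc() {
  src_dir=$1
  shift
  # Map the source directory to /srv/src, the image's working directory.
  "${PODMAN:-podman}" run --rm -v "$src_dir":/srv/src \
    localhost/debian-cross-compiling aarch64-linux-gnu-gcc "$@"
}
```

With that in place the hello world build becomes cross_gcc "$PWD" hello.c -o hello-aarch64 -static.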

If we try to run the ARM binary/executable on the x86_64 computer:

$ /home/jadebustos/working/ache4bits/cross-compiling/hello-world/hello-aarch64
bash: ./hello-aarch64: cannot execute binary file: Exec format error
$

We can also compile it for x86_64 and compare both binaries:

$ gcc hello.c -o hello-x86_64
$ file /home/jadebustos/working/ache4bits/cross-compiling/hello-world/hello-x86_64
hello-x86_64: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=f466c44b7c1545650fb4b61e312f8264a7e9d569, for GNU/Linux 3.2.0, not stripped
$ file /home/jadebustos/working/ache4bits/cross-compiling/hello-world/hello-aarch64
hello-aarch64: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=5884fec249c5efe64a3da7fc6aa670705aac8548, for GNU/Linux 3.7.0, not stripped
$
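The file command gets the architecture from the ELF header, which is also why the kernel refuses a binary built for another machine with "Exec format error". As a small sketch of the idea, the function below reads the e_machine field directly. The name elf_machine is hypothetical, only three machine codes are mapped, and little-endian ELF files are assumed.

```shell
# Print the target architecture stored in an ELF file's header.
elf_machine() {
  # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not-elf"
    return 1
  fi
  # e_machine is a 16-bit field at byte offset 18; in little-endian
  # files (assumed here) its low byte comes first.
  code=$(od -An -tu1 -j18 -N1 "$1" | tr -d ' \n')
  case "$code" in
    62)  echo "x86-64" ;;
    183) echo "aarch64" ;;
    40)  echo "arm" ;;
    *)   echo "unknown($code)" ;;
  esac
}
```

Calling elf_machine hello-aarch64 and elf_machine hello-x86_64 should agree with what file reports.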

As we can see, we have two different binaries/executables, one for each platform. We can copy hello-aarch64 to an ARM64 system such as a Raspberry Pi 4 and successfully execute it.

Quite easy, huh?

You can start cross-compiling all you want!!!

Unfortunately, reality is different.

Compiling simple programs such as the hello world above is quite simple, but when we have to link external libraries problems can arise.

We are going to cross-compile another C program, pi-mid-point-rule.c. This program computes an approximation of pi using the mid point rule to approximate an integral. It uses the OpenMP library for parallel computing:
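For reference, the mid point rule approximates an integral by splitting the interval into n subintervals and evaluating the integrand at the midpoint of each one. The source of pi-mid-point-rule.c is not shown in this post, so the integrand below is an assumption (a common textbook choice), not necessarily the one the program uses:

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx
\;\approx\; h \sum_{i=0}^{n-1} \frac{4}{1+x_i^2},
\qquad h = \frac{1}{n},
\quad x_i = \left(i + \tfrac{1}{2}\right) h
```

Each term of the sum is independent of the others, which is what makes this kind of computation a natural fit for OpenMP.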

$ podman run --rm -it -v /home/jadebustos/working/ache4bits/cross-compiling/mid-point/:/srv/src localhost/debian-cross-compiling aarch64-linux-gnu-gcc pi-mid-point-rule.c -o pi-mid-point-rule_aarch64 -fopenmp -static
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/bin/ld: /usr/lib/gcc-cross/aarch64-linux-gnu/10/libgomp.a(oacc-profiling.o): in function `goacc_profiling_initialize': (.text+0x7f8): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
$ ls -lh /home/jadebustos/working/ache4bits/cross-compiling/mid-point/
total 1.3M
-rw-r--r-- 1 jadebustos jadebustos 175 May 16 00:03 ChangeLog
-rw-r--r-- 1 jadebustos jadebustos 4.9K May 16 00:03 chudnovsky-gmp.c
-rw-r--r-- 1 jadebustos jadebustos 7.6K May 16 00:03 chudnovsky-gmp-omp.c
-rw-r--r-- 1 jadebustos jadebustos 459 May 16 00:03 Makefile
-rw-r--r-- 1 jadebustos jadebustos 676 May 16 00:03 mytime.c
-rw-r--r-- 1 jadebustos jadebustos 572 May 16 00:03 mytime.h
-rw-r--r-- 1 jadebustos jadebustos 1.9K May 16 00:03 pi2txt.c
-rwxr-xr-x 1 jadebustos jadebustos 1.2M May 16 00:13 pi-mid-point-rule_aarch64
-rw-r--r-- 1 jadebustos jadebustos 2.3K May 16 00:03 pi-mid-point-rule.c
-rw-r--r-- 1 jadebustos jadebustos 3.0K May 16 00:03 README.md
$

We can see a warning after cross-compilation:

`goacc_profiling_initialize': (.text+0x7f8): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking.

As we can see, although the binary/executable was statically linked, there are still some requirements to fulfill on the ARM device where we want to run it. These requirements are related to the glibc version used in the container for linking: the dlopen call mentioned in the warning loads shared libraries at runtime, and those have to come from the same glibc version. This means we could face problems when running this kind of binary even though it has been statically linked.

So the target ARM system where the binary/executable will run needs to share some libraries with the cross-compiling environment.
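One quick way to spot a possible mismatch is to compare the glibc version of the build environment with the one on the target. The sketch below parses the first line of ldd --version, which ends with the version number on glibc systems. glibc_version is a hypothetical helper, and it is an assumption that the version reported inside the build container matches the glibc the cross toolchain links against.

```shell
# Read `ldd --version` output on stdin and print the glibc version,
# which is the last field of the first line.
glibc_version() {
  head -n 1 | awk '{ print $NF }'
}
```

Running ldd --version | glibc_version both in the build container and on the ARM device and comparing the two values gives a first hint of whether the warning could matter.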

As we can see, the cross-compiled binary can be executed on an ARM64 processor:

[jadebustos@fedora mid-point]$ ./pi-mid-point-rule_aarch64 4 1
pi: 3.14159265312413195088837109134519589492087924293966704649112990043434163328628372
[jadebustos@fedora mid-point]$ cat /etc/fedora-release
Fedora release 36 (Thirty Six)
[jadebustos@fedora mid-point]$ uname -m
aarch64
[jadebustos@fedora mid-point]$ lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A72
Model: 3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r0p3
BogoMIPS: 108.00
Flags: fp asimd evtstrm crc32 cpuid
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 192 KiB (4 instances)
L2: 1 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Spec store bypass: Not affected
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Mitigation; Branch predictor hardening, BHB
Srbds: Not affected
Tsx async abort: Not affected
[jadebustos@fedora mid-point]$

If we face problems running the cross-compiled binary due to library dependencies on the target ARM64 system, we can try to execute the binary inside a container. We can create an ARM64 container based on the same Linux distribution used for cross-compiling and run the binary that way.
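A minimal sketch of that idea, to be run on the ARM64 system: it starts a container from the same Debian base image used for the build and executes the binary from a mapped directory. run_in_container is a hypothetical name, PODMAN can be set to echo to preview the command, and it is an assumption that the Debian release on the target matches the one used for the build.

```shell
# Run a command from a host directory inside a Debian container.
# Usage: run_in_container DIRECTORY COMMAND [arguments...]
run_in_container() {
  work_dir=$1
  shift
  # On an ARM64 host podman pulls the arm64 variant of the image.
  "${PODMAN:-podman}" run --rm -v "$work_dir":/srv/src -w /srv/src \
    docker.io/library/debian:stable "$@"
}
```

For example: run_in_container "$PWD" ./pi-mid-point-rule_aarch64 4 1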

These library issues are something we have to take into account when cross-compiling, and depending on the source code they can become a real pain in the ass. But we can hit another problem as well.

Now we are going to compile the C program chudnovsky-gmp-omp.c. This program computes digits of pi with the Chudnovsky algorithm, using OpenMP for parallel computing and the GNU Multiple Precision Arithmetic Library (GMP):

$ podman run --rm -it -v /home/jadebustos/working/ache4bits/cross-compiling/mid-point/:/srv/src localhost/debian-cross-compiling aarch64-linux-gnu-gcc -c mytime.c
$ podman run --rm -it -v /home/jadebustos/working/ache4bits/cross-compiling/mid-point/:/srv/src localhost/debian-cross-compiling aarch64-linux-gnu-gcc chudnovsky-gmp-omp.c -o chudnovsky-gmp-omp_aarch64 mytime.o -lgmp -fopenmp -static
chudnovsky-gmp-omp.c:19:10: fatal error: gmp.h: No such file or directory
  19 | #include <gmp.h>
     |           ^~~~~~~
compilation terminated.
$ ls -lh /home/jadebustos/working/ache4bits/cross-compiling/mid-point/
total 1,3M
-rw-r--r-- 1 jadebustos jadebustos 175 may 16 00:03 ChangeLog
-rw-r--r-- 1 jadebustos jadebustos 4,9K may 16 00:03 chudnovsky-gmp.c
-rw-r--r-- 1 jadebustos jadebustos 7,6K may 16 00:03 chudnovsky-gmp-omp.c
-rw-r--r-- 1 jadebustos jadebustos 459 may 16 00:03 Makefile
-rw-r--r-- 1 jadebustos jadebustos 676 may 16 00:03 mytime.c
-rw-r--r-- 1 jadebustos jadebustos 572 may 16 00:03 mytime.h
-rw-r--r-- 1 jadebustos jadebustos 1,5K may 16 23:51 mytime.o
-rw-r--r-- 1 jadebustos jadebustos 1,9K may 16 00:03 pi2txt.c
-rwxr-xr-x 1 jadebustos jadebustos 1,2M may 16 00:13 pi-mid-point-rule_aarch64
-rw-r--r-- 1 jadebustos jadebustos 2,3K may 16 00:03 pi-mid-point-rule.c
-rw-r--r-- 1 jadebustos jadebustos 3,0K may 16 00:03 README.md
$

If we look for the missing header file inside the container:

$ podman run --rm -it -v /home/jadebustos/working/ache4bits/cross-compiling/mid-point/:/srv/src localhost/debian-cross-compiling find / -name gmp.h
/usr/include/x86_64-linux-gnu/gmp.h
find: '/proc/tty/driver': Permission denied
$

If the header file is present, what's the problem? The problem is that the header file and the library to be linked are present in the container only for the x86_64 architecture; they are provided by the libgmp-dev package.

As we were able to cross-compile a C program linked with the OpenMP library, let's look for the OpenMP header file:

$ podman run --rm -it -v /home/jadebustos/working/ache4bits/cross-compiling/mid-point/:/srv/src localhost/debian-cross-compiling find / -name omp.h
/usr/lib/gcc/x86_64-linux-gnu/10/include/omp.h
/usr/lib/gcc-cross/aarch64-linux-gnu/10/include/omp.h
find: '/proc/tty/driver': Permission denied
$

We can see that there are two omp.h header files. The paths indicate that one is for the architecture the container runs on, x86_64, and the other is for the architecture we are compiling for, aarch64.
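To see where a given compiler actually looks for headers, gcc can print its include search path when run with -v. The sketch below filters that output down to the list of directories; include_dirs is a hypothetical helper name.

```shell
# Read `gcc -E -v` output on stdin and print the directories searched
# for #include <...> headers, one per line.
include_dirs() {
  sed -n '/^#include <...> search starts here:/,/^End of search list\./p' |
    sed -e '1d' -e '$d' -e 's/^ *//'
}
```

Inside the container, aarch64-linux-gnu-gcc -E -v -x c /dev/null 2>&1 | include_dirs lists the directories the cross-compiler searches, and running the same pipeline with gcc shows the native ones.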

To produce an aarch64 binary, the libraries we link against must be built for aarch64 too. These libraries are present in the toolchain for the target architecture. We can check which package provides each file:

$ podman run --rm -it -v /home/jadebustos/working/ache4bits/cross-compiling/mid-point/:/srv/src localhost/debian-cross-compiling dpkg -S /usr/lib/gcc/x86_64-linux-gnu/10/include/omp.h
libgcc-10-dev:amd64: /usr/lib/gcc/x86_64-linux-gnu/10/include/omp.h
$ podman run --rm -it -v /home/jadebustos/working/ache4bits/cross-compiling/mid-point/:/srv/src localhost/debian-cross-compiling dpkg -S /usr/lib/gcc-cross/aarch64-linux-gnu/10/include/omp.h
libgcc-10-dev-arm64-cross: /usr/lib/gcc-cross/aarch64-linux-gnu/10/include/omp.h
$

Libraries to be linked for the aarch64 architecture are provided by packages ending in -dev-arm64-cross (in Debian). The problem we had with the GMP library is that it is not provided in the toolchain for the aarch64 architecture. We can see that its header file is not provided by the toolchain:

$ podman run --rm -it -v /home/jadebustos/working/ache4bits/cross-compiling/mid-point/:/srv/src localhost/debian-cross-compiling dpkg -S /usr/include/x86_64-linux-gnu/gmp.h
libgmp-dev:amd64: /usr/include/x86_64-linux-gnu/gmp.h
$

So in this case we will also need to cross-compile the GMP library with all its dependencies. Not so fun, huh?
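A rough sketch of what that could look like, assuming GMP's standard configure-based build. The helper names and the /opt/gmp-aarch64 prefix are hypothetical, and the commands are meant to run inside the container from an unpacked GMP source tree, with any extra build requirements (such as m4) already installed. The first function only prints the commands so they can be reviewed before running them.

```shell
# Print the commands for a static aarch64 build of GMP.
# Usage: gmp_cross_build_cmds [HOST_TRIPLET] [PREFIX]
gmp_cross_build_cmds() {
  host=${1:-aarch64-linux-gnu}
  prefix=${2:-/opt/gmp-aarch64}   # example install path, not from the post
  printf '%s\n' \
    "./configure --host=$host --prefix=$prefix --enable-static --disable-shared" \
    "make" \
    "make install"
}

# Print the compiler flags needed to use the library installed under PREFIX.
gmp_link_flags() {
  prefix=${1:-/opt/gmp-aarch64}
  echo "-I$prefix/include -L$prefix/lib -lgmp"
}
```

The output of gmp_link_flags would then be added to the aarch64-linux-gnu-gcc command line in place of the plain -lgmp.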

Cross-compiling is an incredibly good tool for developers, but it presents some challenges. Although all the examples in this post use static linking, it is possible to create binaries/executables using dynamic linking too.
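With dynamic linking the target system has to provide every shared library the binary asks for. As a small sketch, the function below extracts those library names from the output of readelf -d; needed_libs is a hypothetical helper, and aarch64-linux-gnu-readelf is assumed to be available from the binutils package installed in the image.

```shell
# Read `readelf -d` output on stdin and print the shared libraries
# listed in NEEDED entries, one per line.
needed_libs() {
  sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}
```

For a dynamically linked binary, aarch64-linux-gnu-readelf -d hello-aarch64 | needed_libs lists what must exist on the ARM64 system.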

We have used containers to cross-compile software, and in the same way we can use a container to deploy and execute the application.

Running applications in containers helps us avoid dependency problems. Although in this post we have not executed the ARM binaries/executables using containers, we can use containers to distribute applications without worrying about the underlying Operating System, decoupling Operating System updates from application updates.
