Ponyhof

Towards Stable Rust UEFI Firmware

2022-09-07T00:00:00+00:00

While Tianocore EDKII still dominates the UEFI development world, there has been continuous effort to enable Rust for firmware development. But so far the tools involved have not been stabilised. We have now started an effort to remedy this and get stable Rust support for UEFI targets.

The Rust compiler has gained support for multiple UEFI targets in the past, namely:

aarch64-unknown-uefi @Tier-3
i686-unknown-uefi @Tier-3
x86_64-unknown-uefi @Tier-3

This allows building Rust UEFI Applications with a standard compiler by simply passing --target <arch>-unknown-uefi to cargo or rustc. Unfortunately, Tier-3 support means no compiler builds are distributed via the Rust release channels, nor does the Rust-CI guarantee the targets build successfully. Moreover, this implies that a nightly/unstable compiler is required to build for those targets, even though no nightly Rust Language features are required.

Raising support of these targets to Tier-2 will include automatic toolchain builds distributed via Rust release channels. Hence, a nightly/unstable compiler will no longer be required. Automatic CI builds will guarantee the targets build successfully and do not randomly break. This will greatly improve the trust in the platform and significantly enhance the developer experience.

Rust support for UEFI has been documented in the rustc-book section UEFI Platform Support. You can follow and support the Major Change Proposal (MCP) to raise support to Tier-2 on the Rust Compiler-Team Tracker.

Meson with MSVC on GitHub Actions

2021-04-21T00:00:00+00:00

The Meson Build System provides support for running on Microsoft Windows, including support for Microsoft Visual Studio C++. GitHub Actions provides public access to CI machines running Microsoft Windows. But trying to tie both together is not as straightforward as it sounds.

Sometimes you stumble over a task you never thought you would have to deal with. This story is about one of those times. In particular, I was faced with running CI tests for a simple C library on Microsoft Visual Studio C++ (MSVC). Luckily, GitHub already provides simple access to machines running Microsoft Windows Server 2016 and 2019, so this sounded like a straightforward task. Unfortunately, my infinite ignorance of anything Windows made this harder than it should have been.

The root of this problem is that the Meson Build System needs to run in the MSVC Developer Shell. This shell has all the necessary environment variables prepared for a particular install of MSVC. Since you can have multiple versions installed in parallel, Meson cannot know which install to use if run outside of such a shell. Unfortunately, GitHub Actions has no simple way to enter this shell. Therefore, running Meson on GitHub Actions will end up using GCC rather than MSVC, since this is what it detects by default in the GitHub Actions Environment. This is not what we wanted, so adjustments are needed.

Luckily, Microsoft provides a tool called vswhere which finds MSVC installs on a Windows system. We can use this to find the required setup scripts and then import the environment variables into our GitHub Actions setup. This tool is pre-deployed on GitHub Actions, so we can simply invoke it to find a suitable MSVC install. From there on, we look for DevShell.dll, which provides the required integration. We load it into PowerShell and invoke the provided Enter-VsDevShell function. By comparing our own environment variables before and after that call, we can extract the changes and export them into the GitHub Actions environment. Thus, the following workflow-steps will have access to those variables as well.
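The before/after comparison described above can be sketched in a few lines. The real action is written in PowerShell; the Python below is only an illustration of the idea, and the variable names and values in the demo are hypothetical. It relies on the documented GitHub Actions convention that lines appended to the file named by `GITHUB_ENV` become environment variables for later steps. Variables that were removed by the developer shell are not tracked in this sketch.

```python
import os


def env_changes(before, after):
    """Return variables that were added or changed between two snapshots."""
    return {k: v for k, v in after.items() if before.get(k) != v}


def format_for_github_env(changes):
    """Render changes as text for the file named by $GITHUB_ENV.

    Single-line values use NAME=value; multi-line values use the
    NAME<<DELIMITER form. The delimiter string is an arbitrary choice.
    """
    lines = []
    for name, value in sorted(changes.items()):
        if "\n" in value:
            lines += [f"{name}<<__EOF__", value, "__EOF__"]
        else:
            lines.append(f"{name}={value}")
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    before = {"PATH": "C:\\Windows", "TEMP": "C:\\Temp"}
    # Hypothetical values standing in for what a developer shell would set.
    after = {"PATH": "C:\\VS\\bin;C:\\Windows", "TEMP": "C:\\Temp", "EXAMPLE_ARCH": "x64"}
    text = format_for_github_env(env_changes(before, after))
    print(text, end="")
    target = os.environ.get("GITHUB_ENV")
    if target:  # only set inside a GitHub Actions job
        with open(target, "a", encoding="utf-8") as handle:
            handle.write(text)
```

Unchanged variables such as `TEMP` are skipped, so only the delta introduced by the developer shell is exported.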

I plugged this into a re-usable GitHub Action using the new composite type. To use it in a GitHub Actions workflow, simply use:

- name: Prepare MSVC
  uses: bus1/cabuild/action/msdevshell@v1
  with:
    architecture: x64

This queries the MSVC environment and exports it to your GitHub Actions job. Subsequent steps will thus run as if in an MSVC Developer Shell. A full example is appended at the bottom, which shows how to get Meson to compile and test a project on MSVC for both Windows Server 2016 and 2019.

If you would rather import the code into your own project, you can find it on GitHub. Note that this uses PowerShell syntax, so it might look alien to Linux developers.

While this is only roughly 50 lines of PowerShell scripting, it still feels a bit too hacky. The Meson developers are aware of this, but so far no patches have found their way upstream. Let's hope that this workaround will one day be obsolete and Meson invokes vswhere itself.

A full example workflow follows:

name: Continuous Integration

on: [push, pull_request]

jobs:
  ci-msvc:
    name: CI with MSVC
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [windows-2016, windows-latest]

    steps:
    - name: Fetch Sources
      uses: actions/checkout@v2
    - name: Setup Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.x'
    - name: Install Python Dependencies
      run: pip install meson ninja
    - name: Prepare MSVC
      uses: bus1/cabuild/action/msdevshell@v1
      with:
        architecture: x64
    - name: Prepare Build
      run: meson setup build
    - name: Run Build
      run: meson compile -v -C build
    - name: Run Test Suite
      run: meson test -v -C build

Locating D-Bus Resource Leaks

2021-04-14T00:00:00+00:00

With dbus-broker we have introduced the resource-accounting of bus1 into the D-Bus world. We believe it greatly improves and strengthens the resource distribution of the D-Bus message bus, and we have already found a handful of resource leaks that way. However, it can be a daunting task to solve resource exhaustion bugs, so I decided to describe the steps we took to resolve a recent resource leak in the openQA package.

A few days ago, Adam Williamson approached me ¹ with a bug in the openQA package, where he saw the log stream filled with messages like:

dbus-broker[<pid>]: Peer :1.<id> is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
dbus-broker[<pid>]: UID <uid> exceeded its 'bytes' quota on UID <uid>.

This is the typical sign of a resource exhaustion in dbus-broker. When the message broker generates or forwards messages to an individual client, it will queue them as outgoing messages and push them into the unix-socket of the client. If this client does not dequeue messages, this queue might fill up. If a limit is reached, something needs to be done. Since D-Bus is not a lossy protocol, dropping messages is not an option. Instead, the message broker will either refuse new incoming operations or disconnect a client. All resources are accounted on UIDs, which means multiple clients of the same user share the same resource limits.

Depending on what message is sent, it is accounted either on the receiver or sender. Furthermore, some messages can be refused by the broker, others cannot. The exact rules are described in the wiki ².

In the case of openQA, the first step was to query the accounting information of the running message broker:

sudo dbus-send --system --dest=org.freedesktop.DBus --type=method_call --print-reply /org/freedesktop/DBus org.freedesktop.DBus.Debug.Stats.GetStats

(Replace --system with --session to query the session or user bus.)

While preferably this query is performed when the resource exhaustion happens, it will often yield useful information under normal operation as well. Resources are often consumed slowly, so the accumulation will still show up.

The output ³ of this query shows a list of all D-Bus clients with their accounting information. Furthermore, it lists all UIDs that have clients connected to this message bus, again with all accounting information. The challenge is to find suspicious entries in this huge data dump. The most promising solution so far was to search for "OutgoingBytes" and check for big numbers. This shows the number of bytes queued in the message broker for a particular client. It is usually 0, since the kernel queues are big enough to hold most normal messages. Even if it is not 0, it is usually just a couple of KiB.

In this case, we checked for "OutgoingBytes", and found:

dict entry(
    string "OutgoingBytes"
    uint32 62173024
)

Roughly 62 MB (about 59 MiB) of messages are waiting to be delivered to that client. Expanding the logs to show the surrounding block, we see:

struct {
    string ":1.211366"
    array [
        dict entry(
            string "UnixUserID"
            variant                            uint32 991
        )
        dict entry(
            string "ProcessID"
            variant                            uint32 674968
        )
        [...]
    ]

    array [
        [...]
        dict entry(
            string "Matches"
            uint32 1
        )
        [...]
        dict entry(
            string "OutgoingBytes"
            uint32 62173024
        )
        [...]
    ]
}

This tells us the PID 674968 of user 991 has roughly 62 MB (about 59 MiB) of data queued, and it is likely not dequeuing the data. Furthermore, we see it has 1 message filter (D-Bus match rule) installed. D-Bus message filters will cause matching D-Bus signals to be delivered to a client. So a likely problem is that this client keeps receiving signals, but does not dispatch its client socket.

We dug further, and the data dump includes more such clients. Matching the PIDs back to processes via ps auxf, we found that each and every one of those suspicious entries was /usr/bin/isotovideo: backend. The code of this process is part of the os-autoinst repository, in this case qemu.pm. A quick look showed only a single use of D-Bus ⁴. At first glance, this looks alright. It creates a system-bus connection via the Net::DBus perl module, dispatches a method-call, and returns the result. However, we know this process has a match-rule installed (assuming the dbus-broker logs are correct), so we checked further and found that the Net::DBus module always installs a match-rule on NameOwnerChanged. Furthermore, it caches the system-bus connection in a global variable, sharing it across users in the same code-base.
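Searching the dump by hand gets tedious, so the lookup can be scripted. The following is a minimal sketch written against the excerpt shown above, not against the full GetStats output: it assumes every peer block starts with a `string ":1.…"` unique name and that counters appear as a `string "Key"` line followed by a line containing `uint32 N`. The function names and the 1 MiB threshold are illustrative choices.

```python
import re

SAMPLE = '''\
struct {
    string ":1.211366"
    array [
        dict entry(
            string "UnixUserID"
            variant                            uint32 991
        )
        dict entry(
            string "ProcessID"
            variant                            uint32 674968
        )
    ]
    array [
        dict entry(
            string "Matches"
            uint32 1
        )
        dict entry(
            string "OutgoingBytes"
            uint32 62173024
        )
    ]
}
'''

STRING_RE = re.compile(r'string "([^"]*)"')
UINT_RE = re.compile(r'uint32 (\d+)')


def parse_peers(dump):
    """Collect the uint32 counters that follow each unique name (":1.x")."""
    peers, current, key = {}, None, None
    for line in dump.splitlines():
        string = STRING_RE.search(line)
        number = UINT_RE.search(line)
        if string and string.group(1).startswith(":"):
            current, key = peers.setdefault(string.group(1), {}), None
        elif string:
            key = string.group(1)  # remember the key, the value follows
        elif number and current is not None and key is not None:
            current[key] = int(number.group(1))
            key = None
    return peers


def peers_with_backlog(dump, threshold=1024 * 1024):
    """Return (name, pid, outgoing_bytes) for peers at or above the threshold."""
    hits = [(name, fields.get("ProcessID"), fields["OutgoingBytes"])
            for name, fields in parse_peers(dump).items()
            if fields.get("OutgoingBytes", 0) >= threshold]
    return sorted(hits, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for name, pid, size in peers_with_backlog(SAMPLE):
        print(f"{name} pid={pid} queued={size / 2**20:.1f} MiB")
```

The reported PIDs can then be matched to processes, for example with ps.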

Long story short, the os-autoinst qemu module created a D-Bus connection which was idle in the background and never dispatched by any code. However, the connection has a match-rule installed, and the message broker kept sending matching signals to that connection. This data accumulated and eventually exceeded the resource quota of that client. A workaround was quickly provided, and it will hopefully resolve this problem ⁵.

Hopefully, this short recap will be helpful to debug other similar situations. You are always welcome to message us on bus1-devel@googlegroups or on the dbus-broker GitHub issue tracker if you need help.

Inside Specs: ELF Segments and Sections

2020-04-26T00:00:00+00:00

The ELF data format divides object files into segments and sections, which has long caused confusion. The terms segment and section can be used interchangeably in almost all cases in the English language ([1], [2]). What is often overlooked is that the ELF specification explicitly meant both to mean almost the same. They merely provide two views of the same data, but use different terms to allow referring to them more easily.

When we look at the defining specification (gABI: System V Application Binary Interface) we find this quote in the introduction:

Object files participate in program linking (building a program) and program execution (running a program). For convenience and efficiency, the object file format provides parallel views of a file’s contents, reflecting the differing needs of those activities.

This is, in my opinion, a crucial detail often overlooked. The ELF data format explicitly provides two views of the same data. The difference between segments and sections is thus not what data they contain, but how they index the same data. The specification goes a step further:

A program header table tells the system how to create a process image. Files used to build a process image (execute a program) must have a program header table; relocatable files do not need one.

A section header table contains information describing the file’s sections. Every section has an entry in the table; each entry gives information such as the section name, the section size, and so on. Files used during linking must have a section header table; other object files may or may not have one.

Keep in mind that the program header table is effectively a segment header table. Therefore, the specification explicitly says that these two data views do not have to be present in a specific file. Depending on the use case, the format allows for only segments or only sections.

To summarize, an ELF object file contains data and machine code of a program, which itself is divided into many parts. The ELF format then provides two different views of this same content: segments and sections. However, these are views of the data present in the file: they do not define the content, but merely index it.

As a closing note, we must acknowledge how all this evolved over time, though. While the ELF specification provides this neat dual-view, a lot of this freedom is not actually used in most ELF files. Instead, most files are effectively split into many small sections, and the segments merely provide a grouping of sequential sections in the file. Sections have become the tool that drives the data in ELF files, and segments have become a view of that data. But this was a purely artificial interpretation and is not rooted in the ELF data format.
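The two views are visible directly in the file header, which stores an offset and entry count for each table. The sketch below reads those fields from a 64-bit little-endian ELF header using the field order of `Elf64_Ehdr` from the gABI. The function name is made up for this example, and the demo header is synthetic (a hypothetical relocatable object with no program header table).

```python
import struct

# Elf64_Ehdr: e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
# e_shstrndx (64 bytes in total).
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")


def header_tables(data):
    """Return location and size of the segment and section header tables."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian files")
    (_ident, _type, _machine, _version, _entry, phoff, shoff, _flags,
     _ehsize, phentsize, phnum, shentsize, shnum, _shstrndx) = EHDR64.unpack_from(data)
    return {
        "segments": {"offset": phoff, "count": phnum, "entry_size": phentsize},
        "sections": {"offset": shoff, "count": shnum, "entry_size": shentsize},
    }


if __name__ == "__main__":
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    # Synthetic relocatable object: no program headers, five sections.
    blob = EHDR64.pack(ident, 1, 62, 1, 0, 0, 64, 0, 64, 0, 0, 64, 5, 4)
    print(header_tables(blob))
```

A segment count of zero in a relocatable file matches the quoted statement that such files do not need a program header table.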

Fair Resource Distribution Algorithm v1

2019-03-13T00:00:00+00:00

Imagine a finite resource that you want to distribute amongst peers in a fair manner. If you know the number of peers to be n, the problem becomes trivial and you can assign every peer 1/n-th of the total. This way every peer gets the same amount, while no part of the resource stays unused. But what if the number of peers is only known retrospectively? That is, how many resources do you grant a peer if you do not know whether there are more peers or not? How do you define “fairness”? And how do you make sure as little of the resource as possible stays unused?

The fairdist algorithm provides one possible solution to this problem. It defines how many resources a new peer is assigned, considering the following properties:

The total amount of resources already distributed to other peers. This is also referred to by the term consumption.
The number of peers that already got resources assigned.
The amount of resources remaining. That is, the resources that are remaining to be distributed. This is also referred to by the term reserve.

The following is a mathematical proof of the properties of the fairdist algorithm. For the reference implementation of the algorithm and information on the different applications of it, see the r-fairdist project.

Prerequisites

We define a set of symbols up front, to keep the proofs shorter. Whenever these symbols are mentioned, the following definition applies:

Let $c \in \mathbb{R}_{\geq 0}$ be a total amount of consumed resources.
Let $r \in \mathbb{R}_{\geq 0}$ be a total amount of reserved resources.
Let $n \in \mathbb{N}_0$ be a number of peers that consumed resources.
Let $A: \mathbb{N}_0 \to \mathbb{R}_{>0}$ be a function that computes the proportion of $r$ a peer can consume, based on the number of peers $n$ that currently have resources consumed.
Let $G: \mathbb{N}_0 \to \mathbb{R}_{>0}$ be a function that computes the proportion of $c + r$ a peer is guaranteed, based on the number of peers $n$ that currently have resources consumed.

The algorithm considers a total amount of resources, but splits it into two separate parts, the remaining reserve $r$ and the consumed part $c$. Their sum represents the total amount that was initially available. It then declares a function $A$, which is the resource allocator. It will later on be used to calculate how many resources of the reserve a peer can allocate: $\frac{r}{A(n)}$. That is, $A$ defines the proportion of the reserve a new peer gets access to. The smaller it is, the more a peer gets.

Similarly, the guarantee $G$ is used to declare a lower bound of the total resources the allocator $A$ grants a new peer. That is, while $A$ is a function applied to allocations, $G$ is a property the allocator will guarantee you. Unlike an allocation, $G$ will later on be calculated based on the total amount of resources: $\frac{c + r}{G(n)}$. Again, the function defines the proportion that is guaranteed. So the smaller $G$ is, the stronger the guarantee becomes.
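To make the bookkeeping concrete, here is a minimal sketch of a single allocation step: a new peer receives $\frac{r}{A(n)}$, which moves from the reserve to the consumption, and the peer count grows by one. The function name `allocate` and the constant allocator used in the demo are illustrative assumptions; this is not the r-fairdist reference implementation.

```python
from fractions import Fraction


def allocate(c, r, n, A):
    """Grant one new peer r / A(n); return (share, c, r, n) after the grant."""
    share = Fraction(r) / A(n)
    return share, c + share, r - share, n + 1


if __name__ == "__main__":
    c, r, n = Fraction(0), Fraction(120), 0
    for _ in range(3):
        # Illustrative allocator: every new peer gets half of the reserve.
        share, c, r, n = allocate(c, r, n, lambda peers: 2)
        print(f"peer {n - 1}: share={share} consumed={c} reserve={r}")
```

Exact fractions are used so that the sum $c + r$ stays constant without rounding noise.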

Definition

The allocator $A$ is said to guarantee the limit $G$, if there exists a function $R: \mathbb{N}_+ \to \mathbb{R}_{>0}$ so that for all $c$, $r$, and $n$:

\[r \geq \frac{c}{R(n)}\]

implies both:

\[\frac{r}{A(n)} \geq \frac{c+r}{G(n)}\]
\[r - \frac{r}{A(n)} \geq \frac{c + \frac{r}{A(n)}}{R(n+1)}\]

The idea here is to define a function $A$ which calculates how many resources a new peer can allocate. That is, considering a new peer requests resources, it will get $\frac{1}{A(n)}$ of the reserve. The first property of this implication guarantees that this allocation is bigger than, or equal to, the guaranteed total for each peer. The guaranteed total is calculated through $G$ based on the total amount of resources (which is the consumption plus the reserve).

If you now pick an allocator $A$ and a guarantee $G$ that fulfil this definition, then $A$ can be used to serve resource requests from new peers, and regardless of how many peers request resources, each one is guaranteed an amount equal to, or bigger than, the guarantee $G$.

This definition requires the existence of a reserve watermark $R$. It uses this watermark as a selector for an inductive step. That is, if the requirements of this reserve selector are true, the second implication guarantees that they are true for an infinite number of following allocations. That is, the right hand side of the second implication matches exactly the requirement of the implication, once a single allocation was performed (i.e., a resource chunk was subtracted from the reserve and added to the total consumption, while the number of peers increased by one).

Note that if $c$ is $0$, then the requirement of the implication is true for all $r$ and $R$. This guarantees that there is always a situation where allocator $A$ can actually be applied.

Lemma 1

To prove an allocator $A$ guarantees $G$, it is sufficient to show that $R$ fulfils:

\[R(n) \leq \frac{G(n)}{A(n)} - 1\]

and

\[A(n) \geq \frac{1+R(n+1)}{R(n+1)-R(n)}\]

This lemma is used to make it easier to prove a specific allocator guarantees a specific limit. Without it, each proof of the different allocators would have to replicate it.

However, this lemma also gives a better feeling of what the different functions actually mean. For instance, it clearly shows $A$ must always be smaller than $G$, and that by a considerable amount. If $A = G$, then no $R$ would ever fulfil this requirement (remember: $R(n) > 0$). At the same time, you can see the closer $A$ and $G$ are together, the smaller $R$ gets, and as such the requirements on the reserve get harder to fulfil.

The second requirement gives you a recursive equation to find an $R$ for any allocator you pick. Hence, in combination both these requirements show you an iterative process to find $A$ and $R$, for any guarantee $G$ you pick. However, the closer $A$ and $G$ get, the harder it becomes to solve the recursive equation.

Proof

To show this lemma is true, we must show both implications of the definition are true. As first step, we show the first implication is true, which is:

\[r \geq \frac{c}{R(n)} \implies \frac{r}{A(n)} \geq \frac{c+r}{G(n)}\]

We show this by starting with the left-hand side and showing it implies the right-hand side, using the requisite of this lemma.

\[\begin{align} r &\geq_{req} \frac{c}{R(n)}\\[8pt] R(n) &\geq \frac{c}{r}\\[8pt] \frac{G(n)}{A(n)} - 1 \geq_{req} R(n) &\geq \frac{c}{r}\\[8pt] \frac{G(n)}{A(n)} - 1 &\geq \frac{c}{r}\\[8pt] \frac{r G(n)}{A(n)} - r &\geq c\\[8pt] \frac{r G(n)}{A(n)} &\geq c+r\\[8pt] \frac{r}{A(n)} &\geq \frac{c+r}{G(n)}\\[8pt] \end{align}\]

As second step, we need to show the second implication of the definition is true, which is:

\[r \geq \frac{c}{R(n)} \implies r - \frac{r}{A(n)} \geq \frac{c + \frac{r}{A(n)}}{R(n+1)}\]

To prove this, we start with the second requisite of this lemma and then show it implies the right-hand side of the implication, using the requisite of the implication.

\[\begin{align} A(n) &\geq_{req} \frac{1+R(n+1)}{R(n+1)-R(n)}\\[16pt] A(n)R(n+1) - A(n)R(n) &\geq 1+R(n+1)\\[16pt] -A(n)R(n) &\geq 1+R(n+1) - A(n)R(n+1)\\[16pt] A(n)R(n) &\leq A(n)R(n+1)-R(n+1)-1\\[16pt] R(n) &\leq R(n+1) - \frac{R(n+1)}{A(n)} - \frac{1}{A(n)}\\[16pt] \end{align}\]

Hint: The following introduction of $c$ is correct, since $R(n)$ is per definition greater than $0$, so neither side can be 0.

\[\begin{align} \frac{c}{R(n)} &\geq \frac{c}{R(n+1) - \frac{R(n+1)}{A(n)} - \frac{1}{A(n)}}\\[16pt] r \geq_{req} \frac{c}{R(n)} &\geq \frac{c}{R(n+1) - \frac{R(n+1)}{A(n)} - \frac{1}{A(n)}}\\[16pt] r &\geq \frac{c}{R(n+1) - \frac{R(n+1)}{A(n)} - \frac{1}{A(n)}}\\[16pt] rR(n+1) - \frac{rR(n+1)}{A(n)} - \frac{r}{A(n)} &\geq c\\[16pt] rR(n+1) - \frac{rR(n+1)}{A(n)} &\geq c + \frac{r}{A(n)}\\[16pt] r - \frac{r}{A(n)} &\geq \frac{c + \frac{r}{A(n)}}{R(n+1)}\\[16pt] \end{align}\]

Theorem

The following allocators each guarantee the specified limit:

\[\begin{align} A_1(n) &:= 2\\ G_1(n) &:= 2^{n+1} = \mathcal{O}(2^n)\\ \\ A_2(n) &:= n+2\\ G_2(n) &:= n^2+3n+2 = \mathcal{O}(n^2)\\ \\ A_3(n) &:= (n+2) \log_2(n+2) + (n+2)\\ G_3(n) &:= \mathcal{O}(n \log_2(n)^2)\\ \end{align}\]

This theorem defines three different allocators for different guarantees. The last one provides the strongest guarantee. Both the allocation and the guarantee are quasilinear. It is thus a good fit for fair allocation schemes, while still being reasonably fast to compute.

The other two provide quadratic and exponential guarantees and are mostly listed for documentational purposes. With the quasilinear guarantees at hand, there is little reason to use the other two.
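The exponential and quadratic pairs can be checked numerically by simulating a run of allocations and comparing every share against $\frac{c + r}{G(n)}$. The sketch below does this with exact fractions; the helper names are made up for this example. The quasilinear pair is left out because $G_3$ is only given asymptotically, so there is no concrete bound to compare against.

```python
from fractions import Fraction

# (A, G) pairs as listed in the theorem.
ALLOCATORS = {
    "exponential": (lambda n: 2, lambda n: 2 ** (n + 1)),
    "quadratic": (lambda n: n + 2, lambda n: n * n + 3 * n + 2),
}


def shares(total, peers, A):
    """Yield the share handed to each of `peers` consecutive new peers."""
    r = Fraction(total)
    for n in range(peers):
        share = r / A(n)
        r -= share
        yield share


def guarantee_holds(total, peers, A, G):
    """Check share_n >= total / G(n) for every peer in the run."""
    return all(share >= Fraction(total) / G(n)
               for n, share in enumerate(shares(total, peers, A)))


if __name__ == "__main__":
    for name, (A, G) in ALLOCATORS.items():
        first = [str(s) for s in shares(12, 3, A)]
        print(name, first, guarantee_holds(12, 40, A, G))
```

In both cases each share turns out to equal the guaranteed amount exactly, which shows that the stated guarantees are tight for these allocators.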

As you might notice, this theorem does not provide a solution where $A$ and $G$ become infinitesimally close. It remains open what such a solution would look like. However, the listed quasilinear solution is good enough that it is unlikely that better options exist which can still be calculated in a reasonable amount of time.

Proof

We provide a function $R$ for each pair. We then substitute them in Lemma 1 and show through equivalence transformations that the assertions are true.

Proof 1: Exponential Guarantee

Allocator: $A(n) := 2$
Guarantee: $G(n) := 2^{n+1}$

Let $R(n) := 2^n - 1$.

Part 1:

\[\begin{align} R(n) &\leq_{lemma} \frac{G(n)}{A(n)} - 1\\[8pt] 2^n - 1 &\leq \frac{2^{n+1}}{2} - 1\\[8pt] 2^n &\leq \frac{2^{n+1}}{2}\\[8pt] 2^n &\leq 2^n\\[8pt] \end{align}\]

Part 2:

\[\begin{align} A(n) &\geq \frac{1+R(n+1)}{R(n+1)-R(n)}\\[8pt] 2 &\geq \frac{1+(2^{n+1} - 1)}{(2^{n+1} - 1)-(2^n - 1)}\\[8pt] 2 &\geq \frac{2^{n+1}}{2^{n+1} - 2^n}\\[8pt] 2 &\geq \frac{2^{n+1}}{2^{n+1}(1 - \frac{1}{2})}\\[8pt] 2 &\geq \frac{1}{1 - \frac{1}{2}}\\[8pt] 2 &\geq 2\\[8pt] \end{align}\]

Proof 2: Polynomial Guarantee

Allocator: $A(n) := n+2$
Guarantee: $G(n) := n^2 + 3n + 2$

Let $R(n) := n$.

Part 1:

\[\begin{align} R(n) &\leq_{lemma} \frac{G(n)}{A(n)} - 1\\[8pt] n &\leq \frac{n^2 + 3n + 2}{n+2} - 1\\[8pt] n+1 &\leq \frac{n^2 + 3n + 2}{n+2}\\[8pt] (n+1)(n+2) &\leq n^2 + 3n + 2\\[8pt] n^2 + 3n + 2 &\leq n^2 + 3n + 2\\[8pt] \end{align}\]

Part 2:

\[\begin{align} A(n) &\geq \frac{1+R(n+1)}{R(n+1)-R(n)}\\[8pt] n+2 &\geq \frac{1+(n+1)}{(n+1)-(n)}\\[8pt] n+2 &\geq \frac{n+2}{n-n+1}\\[8pt] n+2 &\geq n+2\\[8pt] \end{align}\]

Proof 3: Quasilinear Guarantee

Allocator: $A(n) := (n+2) \log_2(n+2) + (n+2)$
Guarantee: $G(n) := \mathcal{O}(n \log_2(n)^2)$

Let $R(n) := \log_2(n+1)$.

Part 1:

\[\begin{align} R(n) &\leq_{lemma} \frac{G(n)}{A(n)} - 1\\[8pt] \log_2(n+1) &\leq \frac{\mathcal{O}(n \log_2 (n)^2)}{(n+2) \log_2(n+2) + (n+2)} - 1\\[8pt] \log_2(n+1) + 1 &\leq \frac{\mathcal{O}(n \log_2 (n)^2)}{(n+2) \log_2(n+2) + (n+2)}\\[8pt] \log_2(n+1) + 1 &\leq \frac{\mathcal{O}(n \log_2 (n)^2)}{(n+2) (\log_2(n+2) + 1)}\\[8pt] \log_2(n+1) + 1 \leq \log_2(n+2) + 1 &\leq \frac{\mathcal{O}(n \log_2 (n)^2)}{(n+2) (\log_2(n+2) + 1)}\\[8pt] (\log_2(n+1) + 1)^2 &\leq \frac{\mathcal{O}(n \log_2 (n)^2)}{(n+2)}\\[8pt] (n+2) \cdot (\log_2(n+1) + 1)^2 &\leq \mathcal{O}(n \cdot \log_2 (n)^2)\\[8pt] \end{align}\]

Part 2:

For this part, we rely on the following property:

\[\frac{1}{n+1} \leq \log(n+1) - \log(n) \leq \frac{1}{n}\]

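This is true for the natural logarithm for all $n \in \mathbb{N}_+$. For $\log_2$ only the lower bound carries over (because $\log_2 x = \frac{\ln x}{\ln 2}$ and $\ln 2 < 1$), and the lower bound is all that is needed below.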

We now show the second requirement of the Lemma is true. However, we cannot use equivalence transformations as in the other proofs. Hence, we show it by implication.

\[\begin{align} n+2 &= n+2\\[8pt] (n+2) (1+\log_2(n+2)) &= (n+2) (1+\log_2(n+2))\\[8pt] (n+2) (1+\log_2(n+2)) &= \frac{1+\log_2(n+2)}{\frac{1}{n+2}}\\[8pt] (n+2) \log_2(n+2) + (n+2) &= \frac{1+\log_2(n+2)}{\frac{1}{n+2}}\\[8pt] (n+2) \log_2(n+2) + (n+2) &\geq \frac{1+\log_2(n+2)}{\log_2(n+2) - \log_2(n+1)}\\[8pt] (n+2) \log_2(n+2) + (n+2) &\geq \frac{1+\log_2((n+1)+1)}{\log_2((n+1)+1) - \log_2(n+1)}\\[8pt] A(n) &\geq \frac{1+R(n+1)}{R(n+1)-R(n)}\\[8pt] \end{align}\]
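As a sanity check, the two requirements of Lemma 1 can be evaluated numerically for the three choices of $R$ used in the proofs. This is a floating-point sketch, not a proof; the function name and the range of $n$ are arbitrary. For the quasilinear case only the second requirement is checked, since $G_3$ is stated asymptotically.

```python
import math

CASES = {
    "exponential": dict(A=lambda n: 2, G=lambda n: 2 ** (n + 1),
                        R=lambda n: 2 ** n - 1),
    "quadratic": dict(A=lambda n: n + 2, G=lambda n: n * n + 3 * n + 2,
                      R=lambda n: n),
    "quasilinear": dict(A=lambda n: (n + 2) * math.log2(n + 2) + (n + 2),
                        G=None, R=lambda n: math.log2(n + 1)),
}


def lemma_holds(A, R, G=None, upto=200):
    """Evaluate both requirements of Lemma 1 for n = 0 .. upto - 1."""
    for n in range(upto):
        # First requirement: R(n) <= G(n) / A(n) - 1 (skipped without a G).
        if G is not None and R(n) > G(n) / A(n) - 1:
            return False
        # Second requirement: A(n) >= (1 + R(n+1)) / (R(n+1) - R(n)).
        if A(n) < (1 + R(n + 1)) / (R(n + 1) - R(n)):
            return False
    return True


if __name__ == "__main__":
    for name, case in CASES.items():
        print(name, lemma_holds(**case))
```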

Goodbye Gnu-EFI!

2019-01-31T00:00:00+00:00

The recommended way to link UEFI applications on Linux was until now through GNU-EFI, a toolchain provided by the GNU Project that bridges from the ELF world into COFF/PE32+. But why don’t we compile directly to native UEFI? A short dive into the past of GNU Toolchains, its remnants, and a surprisingly simple way out.

The Linux World (and many UNIX Derivatives for that matter) is modeled around ELF. With statically linked languages becoming more prevalent, the impact of the ABI diminishes, but it still defines properties far beyond just how to call functions. The ABI your system uses also affects how compiler and linker interact, how binaries export information (especially symbols), and what features application developers can make use of. We have become used to ELF, and require its properties in places we didn’t expect.

UEFI does not use ELF. For all that matters, UEFI follows Microsoft Windows. This means, UEFI uses COFF/PE32+ (or short PE+). If we compile binaries for UEFI, they must target PE+. And the GNU Compiler Collection can do this… somewhat.

Conceptually, GCC supports many languages, ABIs, targets, and architectures in a single code-base. Technically, though, every compiled instance of GCC compiles from one language to one target. Your compiler that takes C and produces x86-64 is actually specific to x86_64-pc-linux-gnu. You cannot tweak it to compile UEFI binaries. Instead, you need another instance of GCC, one that takes C and produces x86_64-w64-mingw32. You probably know this combination under the name MinGW.

But this is not what GNU went for. Instead, for reasons that still puzzle me to this day, the GNU project decided against using its own software and instead produced something named GNU-EFI. The goal of GNU-EFI is to allow writing UEFI applications using the common GNU Toolchain (meaning you compile ELF binaries for Linux). They achieve this by linking a PE+ Stub, which at runtime performs required relocations, parameter translations, and jumps into the ELF application. You effectively write a free-standing Linux Application, add a wrapping layer and then execute it on UEFI. It works, but is needlessly complex.

Is this really the best way to compile for UEFI? Not anymore!

The LLVM toolchain (clang compiler plus lld linker) combines all supported targets in a single toolchain, offering a target selector --target to let LLVM know what to compile for. So as long as you have clang and lld installed, you can compile native UEFI binaries just like normal local compilation:

# Normal local compile+link
$ clang \
        $CFLAGS \
        -o OBJECT \
        -c [SOURCES…]
$ clang \
        $LDFLAGS \
        -o BINARY \
        [OBJECTS…]

To make this compile for UEFI targets, you simply set:

CFLAGS+= \
        --target x86_64-unknown-windows \
        -ffreestanding \
        -fshort-wchar \
        -mno-red-zone

LDFLAGS+= \
        --target x86_64-unknown-windows \
        -nostdlib \
        -Wl,-entry:efi_main \
        -Wl,-subsystem:efi_application \
        -fuse-ld=lld-link

The two special things are --target <TRIPLE> and -fuse-ld=<LINKER>. The former instructs both compiler and linker to produce COFF/PE32+ objects compatible with the Microsoft Windows Platform (which matches the UEFI platform). The latter selects the linker to use. Mind you, using the default linker will very likely fail (default being ld or ld-gold). Currently, you either have to use lld-link (PE+ backend of the LLVM linker), or you need a version of GNU-ld compiled for a PE+ toolchain. I recommend LLVM lld.

Voilà! No need for GNU-EFI, no need to mess with separated toolchains. With LLVM you get all this through your local toolchain.
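To confirm that the linker really produced a PE32+ image with the EFI application subsystem, the headers of the output file can be inspected. The sketch below reads the handful of fields needed for that check, following the published PE/COFF layout (DOS header, PE signature, 20-byte COFF header, optional header with the subsystem at offset 68). The function names are made up for this example, and the demo image is a synthetic buffer rather than real linker output.

```python
import struct

EFI_APPLICATION = 10    # IMAGE_SUBSYSTEM_EFI_APPLICATION
PE32_PLUS = 0x20B       # optional-header magic of PE32+
MACHINE_AMD64 = 0x8664  # IMAGE_FILE_MACHINE_AMD64


def describe_pe(data):
    """Extract machine, optional-header magic, and subsystem from a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    optional = pe_offset + 4 + 20  # skip signature and COFF file header
    (magic,) = struct.unpack_from("<H", data, optional)
    (subsystem,) = struct.unpack_from("<H", data, optional + 68)
    return {"machine": machine, "magic": magic, "subsystem": subsystem}


def looks_like_uefi_app(data):
    """True if the image is PE32+ and marked as an EFI application."""
    info = describe_pe(data)
    return info["magic"] == PE32_PLUS and info["subsystem"] == EFI_APPLICATION


if __name__ == "__main__":
    # Synthetic image carrying just the fields read above.
    blob = bytearray(0x40 + 4 + 20 + 70)
    blob[:2] = b"MZ"
    struct.pack_into("<I", blob, 0x3C, 0x40)
    blob[0x40:0x44] = b"PE\0\0"
    struct.pack_into("<H", blob, 0x44, MACHINE_AMD64)
    struct.pack_into("<H", blob, 0x58, PE32_PLUS)
    struct.pack_into("<H", blob, 0x58 + 68, EFI_APPLICATION)
    print(describe_pe(bytes(blob)), looks_like_uefi_app(bytes(blob)))
```

For a real build, pass the contents of the linked binary to `looks_like_uefi_app` instead of the synthetic buffer.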

If you use Meson Build, the c-efi project even provides you an example cross-file. A native meson C project can then be compiled for UEFI by nothing more than passing --cross-file x86_64-unknown-uefi to meson. See its sources for details.

The c-efi project also provides the protocol constants and definitions from the UEFI specification, so you don’t have to extract them yourself.

Exec in VM

2018-01-10T00:00:00+00:00

Almost everyone these days relies on continuous integration. And it seems, once you got accustomed to it, you never want to work without it again. Unfortunately, most CI systems lack cross-architecture capabilities. As a systems engineer with lots of C projects, I was desperately looking for a solution to run my tests on little-endian, big-endian, 32bit, and 64bit machines. So far, without any luck. Hence, I patched together qemu, docker, fedora, and some bash scripts to get a tool that allows me to execute scripts from the command-line in a VM ad-hoc.

My ultimate goal is to type vmrun make as a replacement for make, and it spawns a virtual machine, mounts the current directory into the machine, executes make inside of it, returning the exit-code to my shell. Of course, it could be extended to support selecting the target architecture and/or OS image to use. So eventually, it might look something like:

vmrun \
    --image fedora-ci \
    --architecture armv7hl \
    -- \
            meson setup build && ninja -C build

As a developer, I would love having this at hand. I can easily compile and run projects in foreign architectures, without the requirement of setting up non-volatile VMs, moving data in and out of the machine, and also getting automation and scripting support.

Containers already allow this kind of setup. Using docker or systemd-nspawn you can get something similar already:

docker run \
    --interactive \
    --rm \
    --tty \
    --volume "$PWD:/mnt/cwd" \
    --workdir /mnt/cwd \
    fedora-ci \
            sh -c 'meson setup build && ninja -C build'

systemd-nspawn \
    --bind "$PWD:/mnt/cwd" \
    --chdir /mnt/cwd \
    --ephemeral \
    --image fedora-ci \
            sh -c 'meson setup build && ninja -C build'

This, however, has one major drawback: This can only run native binaries. If you want to run code in a foreign architecture, you need a kernel for that architecture as well. There are options like qemu-user, though they cannot provide perfect compatibility. They only get you so far.

Hence, you need some machine emulator. So how about we execute the image inside of qemu, rather than in a container? Sounds easier than it is:

Needs to Boot: Unlike in a container, the virtual machine needs to boot a kernel, user-space, and prepare the execution environment. This means, we cannot simply specify a script or binary to execute by qemu. We must actually boot the image and instruct the image to execute a given binary.

One way to get this to work on Fedora is to craft a special .service file and pull it in after boot is done. Make the service file execute your binary and then power off the machine when done, or on failure.
No Exit-Code Propagation: The qemu emulator does not propagate the exit-code of the code executed in the virtual machine. Hence, we need a side-channel to detect whether the script executed successfully. This is easily done by hooking up a separate serial-line and making your OS write success into it, once everything succeeded.

Maybe someone wants to hook up a qemu extension to propagate Exit-Codes?
No Bind Mounts: The biggest issue is, we cannot simply bind-mount the directory of the caller into the virtual machine. This is particularly bad, because there is no simple alternative solution. The closest possible solution I am aware of is to share the directory via NFS or 9pfs.

Maybe someone can figure out a way to do this. All my attempts failed. While I successfully shared the directory, either performance suffered, or random features failed, which were expected by some development tools (e.g., file-locks or mmap failed). I am not saying the tools are broken, but just that I couldn’t make it work. Help welcome!

(Also be aware that you suddenly run into UID and permission issues. The entire qemu machine runs as an unprivileged user, so it will only be able to access/write files as that user. But inside of the VM, you are free to use sudo and friends. There is no way to propagate this to the outside. This might be fine, but it is a source of confusion.)
No Image Hubs: While docker gave us image stores for free (e.g., Docker Hub, Quay.io, etc.), there is nothing like it for virtual machine images. Companies seem unwilling to provide the world with free terabytes of storage.

Solution: Use docker.

While docker stores images in a format unsuitable for qemu, we can still use its storage. I simply took my XFS-qcow2 image-file and threw it into a docker container. While at it, I threw in a qemu binary with all its dependencies as well. This combined image can now be pushed to docker repositories and be hosted on Docker Hub and friends. As a consumer, you simply fetch the docker image and execute the qemu-binary inside of it, including its embedded OS image.
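
Coming back to the boot and exit-code points in the list above: a minimal sketch of the guest-side wrapper and the host-side check could look like the following. The function names and the report path are assumptions made for illustration. In a real guest, the report target would be the extra serial line, and the wrapper would be started from the .service file and power off the machine afterwards.

```shell
#!/bin/sh
# Sketch only: guest_run and host_status are hypothetical helper names.

# Guest side: run the payload, then write the verdict to the side channel.
# $1 is the report target (in a VM: the second serial device), the rest is
# the command to execute.
guest_run() {
    report="$1"
    shift
    if "$@"; then
        echo "success" > "$report"
    else
        # $? still holds the exit status of the failed payload here.
        echo "failure $?" > "$report"
    fi
    # A real guest would now power off, e.g. via `systemctl poweroff`.
}

# Host side: qemu's own exit code says nothing about the payload, so derive
# the result from whatever the serial line captured.
host_status() {
    grep -q '^success' "$1"
}
```

On the host, the serial line would have to be captured into a file first (redirecting it with qemu's -serial file:... option is one way, assumed here rather than taken from the actual setup). Calling host_status on that file then yields an exit code the calling shell can act on.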

I went forth and threw together all the bits and pieces. But, sadly, I cannot provide you the vmrun tool as I described it above. I simply ran into too many issues around sharing a directory. However, I did end up with something close:

docker run \
        --rm \
        -it \
        -v "$PWD/myscript.sh:/mnt/cherryimages/input/main:ro" \
        cherrypick/cherryimages-fedora-vmrun:ci-x86_64-to-armv7hl-20180110-1

This command executes $PWD/myscript.sh inside of a fedora armv7hl image, hosted by an x86_64 qemu. For reproducibility, I tagged the image at the time of this blog-post as cherrypick/cherryimages-fedora-vmrun:ci-x86_64-to-armv7hl-20180110-1. If you want the latest image, use cherrypick/cherryimages-fedora-vmrun:ci-x86_64-to-armv7hl-latest. Other tags exist as well. Just check out the repository, if interested. The Dockerfile sources as of the time of this post can be found on GitHub.

Unlike the vmrun tool I described above, this shares the input script read-only. Furthermore, it shares the input as a FAT16 volume (qemu can create this on-the-fly via the vvfat driver), so its size is quite limited, and filesystem attributes are mostly discarded. In the end, its only use is to push a script into the machine to execute (alternatively, you can push an entire directory into the machine, but the entrypoint must be named main).

For my personal use, I now added a script that fetches a git-repository, runs the embedded tests, and returns. In combination with this docker-qemu-image, I can easily run my CI on foreign architectures. Maybe some day I will pick this up again and get a proper vmrun tool (or maybe someone else does?). Until then, I will stick to the reduced version, as it serves my needs. Sadly, there are still too many variables that cannot be auto-detected (How much memory to give to the VM? Which devices to forward? Which CPU features to enable?), and too many hacks required (String-conversions required between different command-lines… Getting a distribution to boot fast in these containers… Sharing data correctly into and out of the VM…).

In the end, I think it is just too much of a hassle to turn into a project I can maintain and support. The tools I needed do not provide proper APIs, but require me to lump together command-lines, PID-files, and magic configurations. Maybe some day we will get there? Until then, let’s make use of qemu-user and avoid system integration tests…

Cross-Bootstrap Fedora

2018-01-09T00:00:00+00:00

I recently had to assemble Linux distribution images to be run in containers and virtual machines. While most package managers provide tools to bootstrap an entire distribution into a target directory (e.g., debootstrap, dnf --installroot, zypper, pacstrap, …), I needed to do that for foreign architectures. Fortunately, Fedora has me covered!

If you use dnf --installroot=/path, dnf will perform the given operations in a separate directory tree, rather than your file-system root. It is easy to use this with dnf install to install an entire Fedora distribution into some custom directory. Unfortunately, RPM allows scripts to be run as part of the installation process of packages. Those scripts might invoke binaries of the target architecture as part of the installation. Hence, before we can cross bootstrap Fedora, we need one more tool: qemu-user-static

The qemu project provides two kinds of emulators:

System Emulators: These are the commonly known emulators used to emulate an entire machine of a given architecture. They can be used to run virtual machines of any kind. The binaries are usually called qemu-system-<arch>.
User Emulators: These emulators are much less known. They emulate the Linux user-space of your target architecture of choice. That is, they execute binaries of foreign architectures on your machine, translating at the syscall boundary. Hence, you can run MIPS binaries on your x86_64 machine running a normal x86_64 kernel, as long as you use the MIPS user emulator, qemu-mips. The binaries are usually called qemu-<arch>.

Fedora provides a package called qemu-user-static, which provides statically linked qemu user-space emulators and hooks them up with the kernel-binfmt configuration. Hence, with the package installed, you can directly execute binaries of foreign architectures, and the kernel will use the qemu emulators to run the binaries. Since the qemu emulators are statically linked, they will work just fine in chroots as well.

With this in mind, you can simply add --forcearch=<arch> to dnf to bootstrap Fedora for a foreign architecture. For instance, this bootstraps just bash and all its dependencies as 32-bit ARM targets:

dnf \
        -y \
        --repo=fedora \
        --repo=updates \
        --releasever=27 \
        --forcearch=armv7hl \
        --installroot=/some/path \
        install \
                bash

For more information, have a look at Nathaniel McCallum’s introduction of the --forcearch argument to dnf.

Rethinking the D-Bus Message Bus

2017-08-23T00:00:00+00:00

Later this year, on November 21, 2017, D-Bus will see its 15th birthday. An impressive age, only shy of the KDE and GNOME projects, whose collaboration inspired the creation of this independent IPC system. While still relied upon by the most recent KDE and GNOME releases, D-Bus is not free of criticism. Despite its age and mighty advocates, it never gained traction outside of its origins. On the contrary, it has long been criticized as bloated, over-engineered, and orphaned. Though, when looking into those claims, you’re often left with unsubstantiated ranting about the environment D-Bus is used in. If you rather want a glimpse into the deeper issues, the best place to look is the D-Bus bug-tracker, including the assessments of the D-Bus developers themselves. The bugs range from uncontrolled memory usage, over silent dropping of messages, to dead-locks by design, unsolved for up to 7 years. Looking closer, most of them simply cannot be solved without breaking guarantees long given by dbus-daemon(1), the reference implementation. Hence, workarounds have been put in place to keep them under control.

Nevertheless, these issues still bugged us! Which is why we rethought some of the fundamental concepts behind the shared Message Buses defined by the D-Bus Specification. We developed a new architecture that is designed particularly for the use-cases of modern D-Bus, and it allows us to solve several long standing issues with dbus-daemon(1). With this in mind, we set out to implement an alternative D-Bus Message Bus. Half a year later, we hereby announce the dbus-broker project!

But before we dive into the project, let’s first have a look at some of the long standing open bug reports on D-Bus. A selection:

Bug #33606: “stop dbus-daemon memory usage ballooning if a client is slow to read”

The bug-report describes a situation where the memory-usage of dbus-daemon(1) grows in an uncontrolled manner, if inflight messages keep piling up in the incoming and outgoing queues of the daemon. Despite being reported more than 6 years ago, there is no satisfying solution to the issue.

What it boils down to is the fact that dbus-daemon(1) does not judge messages based on their message type. Hence, whether a message was triggered by a peer itself (e.g., a method call), or triggered by another peer (e.g., a method reply), the message is always accounted on the sender of the message. Hence, if those messages are piled up in outgoing queues in dbus-daemon(1), the sender of those messages is accounted and punished for them. This can be misused by malicious applications that simply trigger a target peer to send messages (like method replies and signals), but they never read those messages but leave them queued. As a result, there is still no agreed upon way to decide who to punish for excessive buffering.
Bug #80817: “messages with abusive recursion are silently dropped”

Depending on the linux kernel you use, consecutively queued unix-domain-sockets may be rejected by sendmsg(2). This can have the effect of dbus-daemon(1) being unable to forward a message. The message will be silently dropped, without notifying anyone.

There is no known workaround for this issue, since the time of sendmsg(2) might be too late for proper error-handling, due to output buffering or short writes.

Similarly, Bug #52372 describes another situation where messages are silently dropped, if they are queued on an activatable name but their sender disconnects before the destination is activated.

Lastly, dbus-daemon(1) might fail any message and reply with an error message. That is, method-calls but also method-replies, signals, and error-messages can all be rejected for arbitrary reasons by dbus-daemon(1) and trigger an error-reply. Nearly no application is prepared for asynchronous error-replies to its attempt to send a method reply or signal. Again, this stems from dbus-daemon(1) never judging messages by their type. Despite method-transactions being stateful, there is no reliable way for a peer to cancel a message transaction. Any attempt to do so might fail. The same is true for a signal-subscription.

There are some more similar scenarios where dbus-daemon(1) has to silently drop messages, or unexpectedly rejects messages, thus breaking the rule of reliability. This is not about catching errors in client libraries, but this is about either messages being silently discarded or asynchronously rejected.
Bug #28355: “dbus-daemon hangs while starting if users are in LDAP/NIS/etc.”

In addition to client-side policies, dbus-daemon(1) implements a mandatory access control mechanism, based on uids, gids, and message content. This, however, requires D-Bus to resolve user-names and group-names to IDs, which will involve NSS, and as such LDAP/NIS/etc. This has long been a source of deadlocks, when using D-Bus to implement those NSS modules themselves. Workarounds are available, but the problem itself is not solved.
Bug #83938: “improve data structures for pending replies”

This bug-report concerns the method-call tracking in dbus-daemon(1), which is used to allow exactly one reply per method-call, but not more. A list of open reply windows is kept to track pending method-calls. In dbus-daemon(1), this is a global, linked list, searched whenever a reply is sent. By queuing up too many replies on too many connections, lookups on this list will consume a considerable amount of time, slowing down the entire bus.

While the issue at hand can be solved, and has been solved, there remain many similar global data-structures in dbus-daemon(1), that are shared across all users. Some of them can be fixed, some cannot, since D-Bus defines some global behavior (like broadcast matching and name-ownership/handover). This prevents D-Bus from scaling nicely with more processors being added to a system.

In fact, the name-registry of D-Bus, and the atomic hand-over of queued name owners, requires huge global state-tracking without any known efficient, parallel solution.

Furthermore, many of the employed workarounds simply introduce per-peer limits for those global resources. By setting them low enough, their scope has been kept under control. However, history shows that those limits have violated application expectations several times.

None of the issues mentioned here is critical enough for D-Bus to become unbearable. On the contrary, D-Bus is still popular and no serious replacement is even close to being considered a contender. Furthermore, suitable workarounds have often been put in place to control those issues.

But we kept being annoyed by these fundamental problems, so we set forth to solve them in dbus-broker(1). What we came up with is a set of theoretical rules and concepts for a different message bus:

No Shared Medium

This is a rather theoretical change. Previously, the D-Bus Message Bus followed the model of actual physically wired buses, where peers place messages on a shared medium for others to fetch. The problem here is to guarantee fairness, and to make peers accountable for excessive use. In D-Bus the problem can be reduced to outgoing queues in the message broker. Whenever many peers send messages to the same destination, they fill the same message queue. If that queue runs full, someone needs to be held accountable. Was the destination too slow reading messages and should be disconnected? Did a sender flood the destination with an unreasonable amount of messages? Or did an innocent 3rd party just send a single message, but happened to be the final straw?

We decided to overcome this by throwing the model of a shared medium overboard. We no longer consider a D-Bus Message Bus a global medium that all peers are connected to and submit messages to. We rather consider a bus a set of distinct peers with no global state. Whenever a peer sends a message, we consider this a transaction between the sender and the destination (or multiple destinations in case of multicasts). We try to avoid any global state or context. We want every action taken by a peer to only affect the source and target of the action, but nothing else.

While nice in theory, D-Bus does not allow this. There is global state, and it is hard-coded in the D-Bus specification with many existing applications relying on it. However, we still tried to stick to this as closely as possible. In particular, this means:
- Whenever a peer creates an object in the bus manager, it must be linked and indexed on a specific peer. There must not be any global lists or maps. Whenever the bus manager performs a transaction, it must be able to collect all objects that affect it by just looking at the involved peers.
  
  This rule is, in some corner-cases, violated to keep compatibility to the specification. That is, if applications rely on global behavior, it will still work. However, anything that can be indexed, is indexed, and as long as applications don’t rely on obscure D-Bus features, they will never end up in those global data-structures.
- We now judge messages by their message types. We implement proper message transactions and always know whom to hold accountable for inflight messages. Moreover, every peer now has a limited incoming queue, which every other peer gets a fair share of. Whenever a peer exceeds their share of another peer’s queue, one of the two has exceeded their configured limits and the message must be rejected. Details on how we dynamically adjust those shares can be found in the online documentation.
  
  We still need to decide who is at fault. Is the sender to blame or the receiver? Our solution is to base this on the question whether a message is unsolicited. That is, for unsolicited messages, the sender is to blame. For solicited messages, the receiver is to blame. Effectively, this means whenever you send a method call, you are to blame if you did not account for the reply. In case of signals, we simply treat a subscription as the intention of the subscriber to receive an unlimited stream of signals, thus making subscribed signals solicited.
  
  Lastly, in case of unsolicited messages, we reply with an error, and expect every peer to be able to deal with asynchronous errors to unsolicited messages. By contrast, solicited messages never yield an error. Instead, we always consider the receiver of solicited messages to be at fault, and thus throw them off the bus.
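
The blame rule described above can be sketched as a small decision function. This is only an illustration of the rule, not code taken from dbus-broker: the function name and the textual verdicts are made up, and treating a signal without a matching subscription as unsolicited is an assumption derived from the rule rather than something stated explicitly.

```shell
#!/bin/sh
# Sketch only: the function name and the textual verdicts are hypothetical.

# Decide who is at fault when a message does not fit into the receiver's
# queue share. $1 is the D-Bus message type, $2 is "yes" if the receiver
# subscribed to the signal (only relevant for signals).
blame() {
    case "$1" in
        method_call)
            # Unsolicited: nobody asked for it, so the sender is at fault
            # and gets an error reply.
            echo "sender: reply with error" ;;
        method_return|error)
            # Solicited: the receiver asked for it with a method call, so
            # it should have accounted for the reply.
            echo "receiver: disconnect" ;;
        signal)
            # Assumption: only subscribed signals count as solicited.
            if [ "${2:-no}" = "yes" ]; then
                echo "receiver: disconnect"
            else
                echo "sender: reply with error"
            fi ;;
        *)
            echo "unknown message type: $1" >&2
            return 1 ;;
    esac
}
```

For example, blame method_call points at the sender, while blame method_return and blame signal yes both point at the receiver.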
No IPC to implement IPC

D-Bus is an IPC mechanism to allow other processes to communicate. We firmly believe that the implementation of an IPC mechanism should not use IPC itself. Otherwise, deadlocks are a constant threat.

This means, the transaction of a message (whatever kind) should not depend on any other means but local data. We do not read files, we do not invoke NSS, we do not call into D-Bus. Instead, the operation of the bus manager regarding message transactions is a self-contained process without any external hooks or callbacks.
User-based Accounting

Any resource and any object allocated in the bus must be accounted on a user. We do not account based on peers, but always account based on users.

In particular, this means we never have stacked accounting. We have limits for specific resources, but all those limits only ever affect the user accounting. That is, you can no longer exceed limits by simply connecting multiple times to the bus, or by creating objects that have separate accounting. Instead, whenever an action is accounted, it will be accounted on the calling user, regardless of the peer or object through which the action is performed.
Reliability

Never ever shall a message be silently dropped! Any error condition must be caught and handled, and must never put peers into unexpected situations.

If a situation arises where we cannot gracefully handle an error condition, we exit. We never put the burden on the peers, nor do we silently ignore it.
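
To make the User-based Accounting rule from the list above more concrete, here is a minimal sketch of a quota check that is keyed on the user only. Everything in it (LIMIT, charge, the usage_<uid> variables, the unit of the budget) is hypothetical and chosen for illustration; it does not mirror the actual data structures of dbus-broker.

```shell
#!/bin/sh
# Sketch only: LIMIT, charge and the usage_<uid> variables are hypothetical.

LIMIT=100   # per-user budget for one kind of resource, arbitrary unit

# Charge $3 units to the user $1 on behalf of peer $2. The peer is
# deliberately ignored for accounting: only the user is ever charged.
# The uid must be numeric, since it becomes part of a variable name.
charge() {
    uid="$1"
    amount="$3"
    eval "used=\${usage_${uid}:-0}"
    if [ $((used + amount)) -gt "$LIMIT" ]; then
        return 1    # over quota, no matter which peer asked
    fi
    eval "usage_${uid}=$((used + amount))"
}
```

With this, charge 1000 peer-a 60 succeeds, but a following charge 1000 peer-b 60 fails: the second connection of the same user does not get a fresh budget. A different user, as in charge 1001 peer-c 60, is unaffected.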

With these in mind, we implemented an independent D-Bus Message Bus and named it dbus-broker. It is available on GitHub and already capable of booting a full Fedora Desktop System. Some of its properties are:

Pure Bus Implementation

One of our aims was to make the bus manager a pure implementation with as little policy as possible. Furthermore, following our rule of “No IPC to implement IPC”, we stripped all external communication from it. The result is a standalone program we call dbus-broker, which implements a Message Bus as defined by the D-Bus specification. The only external control channel is a private socketpair that must be passed down by the parent process that spawns dbus-broker(1). This channel is used to control the broker at runtime, as well as get notified about specific events like name activation.

On top of this, we implemented a launcher compatible with dbus-daemon(1), employing dbus-broker(1). This dbus-broker-launch(1) program implements the dbus-daemon(1) semantics of a system and session/user message bus.
Local Only

We only implement local IPC. No remote transports are supported. We believe that this is beyond the realm of D-Bus. You can always employ ssh tunneling to get remote D-Bus working, just like most projects do already.
No Legacy

We do not implement legacy D-Bus features. Anything that is marked as deprecated was dropped, as long as it is not relied upon by crucial infrastructure. We are compatible with dbus-daemon(1), so use it if your system still requires those legacy features.

All those deviations are documented in our online wiki. Each case comes with a rationale why we decided to drop support for it.
Linux Only

A lot of functionality we rely on is simply not available on other operating systems. dbus-daemon(1) is still around (and will stay around), so there will always be a working D-Bus Message Bus for other operating systems.

Note that we rely on several peculiar features of the Linux kernel to implement a secure message broker (including its accounting for inflight FDs, its output queueing on AF_UNIX including the SIOCOUTQ ioctl, edge-triggered event notification, the SO_PEERGROUPS socket option, and more). We fixed several bugs upstream just a few weeks ago, and we will continue to do so. But we are not in a position to review other kernels for the same guarantees.
Pipelining

We support SASL pipelining for fast connection attempts. This means, all SASL D-Bus authentication requests can be queued up without waiting for their replies, including any following Hello() call or other D-Bus Message.

This allows connecting to the message broker without waiting for a single roundtrip.
No Spec-Deviation

We do not intend to add features not standardized in the D-Bus Specification, nor do we intend to deviate. However, we do sometimes deviate from the behavior of the reference implementation. All those deviations are carefully considered and documented.

Our intention is to base this implementation on the ideas described above, and thus fix some of the fundamental issues we see in D-Bus. We report all our findings back and recommend solutions to upstream dbus-daemon(1). Discussion and development of the D-Bus specification still happens upstream. We are not the persons to contact for extensions of the specification, but we will happily collaborate on the upstream mailing-list and bug-tracker with whoever wants to discuss D-Bus.
Runtime Broker Control

The message broker process provides a control API to its parent process via a private connection. It allows feeding in the initial state, but also controlling the broker at runtime.

While it can be, and is, used to implement compatibility with the dbus-daemon configuration files, it is also possible to modify the broker at runtime, if necessary. This includes adding and removing listener sockets and activatable names at runtime. Thus, the appearance of activatable names can now be scheduled arbitrarily.
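
As an illustration of the Pipelining property listed above, the following sketch prints the bytes a client could write in one go for EXTERNAL authentication, instead of waiting for the server after each line. The helper name is hypothetical, and the exact set of commands a given client pipelines may differ. The binary Hello() message that would follow is not generated here.

```shell
#!/bin/sh
# Sketch only: sasl_pipeline is a hypothetical helper name.

# Print everything a client sends up front for EXTERNAL authentication as
# the user id $1, without waiting for any server reply in between.
sasl_pipeline() {
    # The uid is transmitted as the hex encoding of its decimal string.
    hex_uid="$(printf '%s' "$1" | od -An -tx1 | tr -d ' \n')"
    printf '\000'                               # initial NUL byte
    printf 'AUTH EXTERNAL %s\r\n' "$hex_uid"    # authentication request
    printf 'NEGOTIATE_UNIX_FD\r\n'              # ask for FD passing
    printf 'BEGIN\r\n'                          # switch to message stream
    # A real client would append the binary Hello() method call right here.
}
```

A non-pipelining client would instead read the server's reply after AUTH and after NEGOTIATE_UNIX_FD before sending the next line, which costs one roundtrip each.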

Please be aware that the dbus-broker project is still experimental. While we successfully use it on our machines to run the system and session/user bus, we do not recommend deploying it on production machines at this time. We are not aware of any critical bugs, but we do want more testing before recommending its deployment.

If you are curious and want to try it out, there are packages available for Fedora and Arch Linux. Other distributions will follow. The online documentation also contains information on how to compile and deploy it manually.

Project Wiki
Issue Tracker
Last Release: v3
Fedora Packages in Copr
Arch Linux Packages in AUR

Moved Blog!

2017-07-17T00:00:00+00:00

The Ponyhof blog was moved over from wordpress to here. Let’s see how this will work out!

Thanks to Barry Clark for the nice jekyll guides and examples.