The Server Chose Violence

Diego Culton April 27, 2024


Hubris’s oddest syscall

2024-04-08

I’m continuing to reflect on the past four years with Hubris — April Fool’s
Day was, appropriately enough, the fourth anniversary of the first Hubris user
program, and today is the fourth anniversary of the first kernel code. (I wrote
the user program first to help me understand what the kernel’s API wanted to
look like.)

Of all of Hubris’s design decisions, there’s one that gets a “wait what”
response more often than any other. It’s also proving to be a critical part of
the system’s overall robustness. In this post, I’ll take a look at our 13th and
oddest syscall, REPLY_FAULT.

A brief overview of Hubris IPC

Hubris uses a small, application-independent kernel, and puts most of the code
— drivers, application logic, network stack, etc. — in separately compiled
isolated tasks. These tasks can communicate with each other using a cross-task
messaging system (inter-process communication, or IPC). (This section will do a
sort of “Hubris in a nutshell” — if you’d like to learn more I recommend the
Reference Manual.)

IPC in Hubris consists of three core operations, implemented in the kernel,
which tasks can request using syscalls:

- RECV collects the highest priority incoming message, or blocks until one
  arrives.
- SEND stops the caller and transfers a message (and control!) to the
  receiving task. The caller is parked until it gets a response.
- REPLY delivers a response to a task that had previously used SEND,
  allowing it to continue.

The Hubris IPC scheme is deliberately designed to work a lot like a function
call, at least from the perspective of the client.
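
To make the "function call" framing concrete, here is a minimal host-side sketch of what a client stub might look like. Everything in it is hypothetical: the SendSyscall trait, the Gpio handle, the operation number, and the FakeKernel are stand-ins invented for illustration, not the real Hubris API or its generated code.

```rust
/// Hypothetical stand-in for the SEND syscall. A real SEND blocks the caller
/// until the server replies; here it returns the number of reply bytes.
trait SendSyscall {
    fn send(&mut self, task: usize, op: u16, msg: &[u8], reply: &mut [u8]) -> usize;
}

/// Hypothetical client-side handle for a GPIO driver task.
struct Gpio<'k, K: SendSyscall> {
    kernel: &'k mut K,
    task: usize,
}

impl<'k, K: SendSyscall> Gpio<'k, K> {
    /// Made-up operation number for this sketch.
    const OP_READ_PIN: u16 = 1;

    /// Reads like an ordinary method call; underneath it is a single SEND.
    fn read_pin(&mut self, pin: u8) -> bool {
        let mut reply = [0u8; 1];
        let n = self
            .kernel
            .send(self.task, Self::OP_READ_PIN, &[pin], &mut reply);
        n == 1 && reply[0] != 0
    }
}

/// Fake kernel plus server so the sketch runs on a host: only pin 3 is high.
struct FakeKernel {
    sends: usize,
}

impl SendSyscall for FakeKernel {
    fn send(&mut self, _task: usize, _op: u16, msg: &[u8], reply: &mut [u8]) -> usize {
        self.sends += 1;
        reply[0] = u8::from(msg[0] == 3);
        1
    }
}
```

From the caller's side, `gpio.read_pin(3)` is indistinguishable from a local method call, which is the property the IPC design is aiming for.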

We often talk about “clients” and “servers” in Hubris, and it’s worth noting
that these are roles tasks play. A client is just a task using SEND, and a
server is a task using RECV and REPLY, but the roles are not mutually exclusive.
A task may be a server to some other tasks, and simultaneously a client to
different tasks. For instance, an “LED Blinker” task may call (client) into a
“GPIO driver” task (server), which itself may call (client) into a supervisory
task (server).

To underscore this point, here’s a graph of the IPC flow (green arrows) between
tasks (rectangles) in Oxide’s production server firmware. Notice that almost all
tasks have arrows both coming out (client) and coming in (server).

[Figure: a directed graph showing layers of tasks in our firmware, with edges drawn between them.]

New and exciting failure modes

When writing a function or procedure in almost any programming language, you
make some assumptions about your callers’ behavior. This creates preconditions
for calling the function. Depending on the language, some are explicit, and some
are implicit. In Rust, for instance, if your function takes an argument of
type String, it’s reasonable to assume your caller passes in a String and
not a bool.

Your function has the backing of the compiler here: the caller has to pass a
compatible type for all arguments, or the compiler won’t let them attempt to
call the function. It’s possible to subvert this if you work at it, of course,
but it’s hard to subvert it by accident.

The compiler and linker conspire behind the scenes to make sure that your
program calls the function you intended. This ensures that you won’t be
surprised by code that attempts to call pet_cat and winds up calling
fire_missiles instead, except in very rare circumstances.

Because IPC crosses task boundaries, and tasks in Hubris are separately compiled
programs, you have to be careful making these same assumptions with IPC. If a
client is compiled against the wrong interface, or confuses one task for
another, the compiler won’t have any idea, since it sees only a single program
at a time. In this respect, IPC acts more like communication over a network.

Every task on Hubris that acts as an IPC server has to deal with the following
potential errors:

- Getting a message with an operation code that isn’t even appropriate for your
  interface, like “operation number 48” in a two-operation interface.
- Receiving an uninterpretable bag of bytes instead of the message type you were
  expecting, or a message that is much too short or long.
- Not getting the sort of loaned memory you require (e.g. you need it writable
  but you receive it read-only, or don’t receive it at all).
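
The three checks above can be sketched as a single validation function on the server side. This is an illustration under assumptions, not the code Hubris generates: the two-operation GPIO-style interface, the Incoming and Lease types, and the ClientError names are all hypothetical.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Op {
    SetPin,
    ReadPin,
}

/// Hypothetical names for the three kinds of client misbehavior.
#[derive(Debug, PartialEq)]
enum ClientError {
    UnknownOperation,
    BadMessage,
    BadLease,
}

/// Hypothetical description of memory the client has loaned to the server.
struct Lease {
    writable: bool,
    len: usize,
}

struct Incoming<'a> {
    op: u16,
    payload: &'a [u8],
    lease: Option<Lease>,
}

fn validate(msg: &Incoming<'_>) -> Result<Op, ClientError> {
    // 1. The operation code must belong to this two-operation interface.
    let (op, expected_len) = match msg.op {
        1 => (Op::SetPin, 2),  // pin number + level
        2 => (Op::ReadPin, 1), // pin number
        _ => return Err(ClientError::UnknownOperation),
    };
    // 2. The message must be exactly the size this operation expects.
    if msg.payload.len() != expected_len {
        return Err(ClientError::BadMessage);
    }
    // 3. In this sketch, ReadPin writes its result into loaned memory,
    //    so it needs a writable lease of at least one byte.
    if op == Op::ReadPin {
        match &msg.lease {
            Some(lease) if lease.writable && lease.len >= 1 => {}
            _ => return Err(ClientError::BadLease),
        }
    }
    Ok(op)
}
```

The open question, which the rest of this post is about, is what the server should do with an `Err` from a function like this.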

But I describe those as potential errors because, in practice…

None of this happens in normal, correct programs

In a normal Hubris program, none of these things happen.

Tasks are connected to each other by configuration in the build system, so it’s
hard to confuse one for the other. Clients use generated Rust code to construct
and send IPCs to servers, which use different generated Rust code to handle the
result. This lets us squint and pretend that the type system works across task
boundaries — it doesn’t, really, but our tools produce a pretty good illusion.

I always hate to penalize the “good” programs for error cases that they can’t
actually hit. All of the obvious ways of handling the potential but unlikely
errors (described above) hurt good programs.

For example: making all IPC operations return a Result<T, IpcError> where the
good programs can’t actually hit any case in IpcError means that, in practice,
they’ll just unwrap() it. That’s a fairly large operation in terms of code
size — especially when we know the code will never be used! — and costs time
at runtime to check for errors that won’t happen.

To keep every client from needing to unwrap() a bazillion errors, we could put
the unwrap() (or more generally a panic!) into the generated code. This
might reduce the code size impact (by centralizing the panic! in one location)
but won’t reduce the cost at runtime.

There’s also a different kind of cost: a design cost. To be able to return a
universal error from any operation, and have it be understood by a caller
attempting any other operation, we have to make rules about the message
encoding. Every operation must be capable of returning an error, every operation
must have a way of encoding this particular error, and the encoding of this
error by all operations must be identical.

This means you can’t express an operation that can’t fail, which is
particularly annoying: as we’ve built our firmware infrastructure on Hubris, we
keep finding operations that really can’t fail. Setting a GPIO pin, for example.
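
The difference in API shape can be sketched as follows. The Pins type and both methods are hypothetical; they model a GPIO operation locally rather than over IPC, purely to contrast a "universal error" signature with an infallible one.

```rust
/// Hypothetical universal IPC error that every operation would have to carry.
#[allow(dead_code)]
#[derive(Debug)]
enum IpcError {
    UnknownOperation,
    BadMessage,
    BadLease,
}

/// Toy model of a bank of 32 GPIO pins.
struct Pins(u32);

impl Pins {
    /// "Universal error" shape: even an operation that cannot fail must
    /// return a Result, because the encoding rules demand it.
    fn set_checked(&mut self, pin: u8, high: bool) -> Result<(), IpcError> {
        self.set(pin, high);
        Ok(())
    }

    /// Infallible shape: the signature states that setting a pin cannot fail.
    fn set(&mut self, pin: u8, high: bool) {
        let mask = 1u32 << (pin % 32);
        if high {
            self.0 |= mask;
        } else {
            self.0 &= !mask;
        }
    }
}

/// Correct callers of the checked API end up unwrapping errors they can
/// never actually see.
fn turn_on_checked(pins: &mut Pins) {
    pins.set_checked(3, true).unwrap();
    pins.set_checked(5, true).unwrap();
}

/// The same caller against the infallible API.
fn turn_on(pins: &mut Pins) {
    pins.set(3, true);
    pins.set(5, true);
}
```

Both callers do the same work, but only the first carries a panic path at every call site.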

So we dearly needed an alternative to this “universal error code” approach. I
drew inspiration from a weird design decision I made in the Hubris kernel API:
the Hubris kernel is unusually aggressive.

The kernel is not having any of your nonsense.

In most operating systems, if you violate the preconditions for a system call,
you get a polite error code back from the kernel — or, at worst, an exception
handler or signal handler gets triggered. You have an opportunity to recover.

Take Unix for example. If you call close on a file descriptor you never
opened, you get an error code back. If you call open and hand it a null
pointer instead of a pathname? You get an error code back. Both of these are
violations of a system call’s preconditions, and both are handled through the
same error mechanism that handles “file not found” and other cases that can
happen in a correct program.

On Hubris, if you break a system call’s preconditions, your task is immediately
destroyed with no opportunity to do anything else.¹

More specifically, the kernel delivers a synthetic fault. This is very similar
to the hardware faults that a task receives if it, say, dereferences a null
pointer, or divides by zero. Those are produced by the CPU for breaking the
processor architecture’s rules. Synthetic faults, on the other hand, are
produced by the kernel for breaking the kernel’s rules.

For example, when a task calls SEND, it passes the kernel the index of the
intended recipient task, and a pointer to some memory containing the message. If
the recipient task index is out of range for the application? Synthetic fault.
If the message pointer points to memory the task doesn’t actually have access
to? Synthetic fault.
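
As a rough model of that behavior (not the actual kernel code), a SEND entry point that faults instead of returning an error might look like the sketch below. The Task, Region, and Fault types and the sys_send signature are assumptions made up for this example.

```rust
#[derive(Debug, PartialEq)]
enum Fault {
    BadTaskIndex,
    MemoryAccess { address: usize },
}

#[derive(Debug, PartialEq)]
enum TaskState {
    Runnable,
    WaitingToSend { to: usize },
    /// Terminal in this model: nothing ever moves a task out of Faulted.
    Faulted(Fault),
}

/// A range of memory the task may access; `end` is exclusive.
struct Region {
    start: usize,
    end: usize,
}

struct Task {
    state: TaskState,
    regions: Vec<Region>,
}

impl Task {
    /// True if `addr..addr + len` lies entirely inside one of the task's regions.
    fn can_access(&self, addr: usize, len: usize) -> bool {
        match addr.checked_add(len) {
            Some(end) => self
                .regions
                .iter()
                .any(|r| addr >= r.start && end <= r.end),
            None => false,
        }
    }
}

/// Model of SEND: there is no error return. The caller either starts
/// waiting to send, or it is faulted on the spot.
fn sys_send(tasks: &mut [Task], caller: usize, to: usize, msg_addr: usize, msg_len: usize) {
    let task_count = tasks.len();
    let task = &mut tasks[caller];
    task.state = if to >= task_count {
        TaskState::Faulted(Fault::BadTaskIndex)
    } else if !task.can_access(msg_addr, msg_len) {
        TaskState::Faulted(Fault::MemoryAccess { address: msg_addr })
    } else {
        TaskState::WaitingToSend { to }
    };
}
```

Note that sys_send returns nothing: there is no value a misbehaving caller could inspect and ignore.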

Early in the system’s design, I decided not to permit recoverable/resumable
faults. That is, when a program takes a fault — whether it’s hardware or
synthetic — the task is dead. It can run no further instructions. There is no
way to “fix” the problem and resume the task. This was a conscious choice to
avoid some subtle failure modes and simplify reasoning about the system.²

But combined with the kernel’s habit of faulting any task that looks at it
funny, this makes the system’s behavior very unusual compared to most operating
systems.

And it’s been great.

Initially I was concerned that I’d made the kernel too aggressive, but in
practice, this has meant that errors are caught very early in development. A
fault is hard to miss, and literally cannot be ignored the way an error code
might be. Humility (our debugger) happily prints a detailed description of any
fault it finds; in fact, one made an appearance in my last Hubris-related
post, although in that case it was being reported in error:

mem fault (precise: 0x801bffd) in syscall (was: wait: reply from i2c_driver/gen0)

This is a synthetic fault that a task receives for handing the kernel a pointer
to some memory (at address 0x801bffd in this case) that the task can’t
actually access.

This behavior was so nice to use in practice, in fact, that it suggested a way
to fix our IPC error reporting woes: generalize the same mechanism.

The server isn’t having any of your nonsense, either.

Once I realized that our unusually strict kernel was actually helping
developers instead of hindering them, I was inspired to implement Hubris’s 13th
and oddest syscall: REPLY_FAULT.

I mentioned REPLY earlier, the mechanism servers use to respond to their
clients. More specifically:

1. When a client uses SEND, the kernel marks the client’s task as “waiting to
   send” to the recipient task.
2. When the recipient uses RECV, one client task “waiting to send” to it is
   updated to “waiting for reply.” The client task will remain in that state
   until something changes (usually, the server using REPLY).
3. REPLY only works on a task marked as “waiting for reply” from the specific
   server task that is attempting to reply. It switches the client task back
   into a “runnable” state.

REPLY_FAULT is basically the same thing, except instead of delivering a
message and making the task runnable, it delivers a fault and makes the task
dead. With REPLY_FAULT, we can avoid having unnecessary error handling on IPC
operations, because correct programs will just go on as if the problem can’t
occur — and incorrect programs won’t get to handle the error at all!

Like REPLY, a server can only REPLY_FAULT a task that is waiting for its
reply. You can’t use REPLY_FAULT to kill random tasks, only the set of tasks
from which you have RECV’d a message and not yet REPLY’d.
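
The authority rule can be sketched in the same style. As before, this is a host-side model under assumptions: the State type, the FaultReason names, and the boolean return value are invented for illustration and are not the kernel's actual definitions.

```rust
/// Hypothetical reasons a server might give for faulting a client.
#[derive(Debug, PartialEq, Clone, Copy)]
enum FaultReason {
    UnknownOperation,
    BadMessage,
    BadLease,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum State {
    Runnable,
    WaitingForReply { from: usize },
    /// Dead: the task runs no further instructions in this model.
    Faulted { by: usize, reason: FaultReason },
}

/// REPLY_FAULT: same precondition as REPLY, but the client ends up faulted
/// instead of runnable. Returns whether the fault was delivered.
fn reply_fault(tasks: &mut [State], server: usize, client: usize, reason: FaultReason) -> bool {
    if tasks[client] == (State::WaitingForReply { from: server }) {
        tasks[client] = State::Faulted { by: server, reason };
        true
    } else {
        // Not waiting on this server: the request has no effect.
        false
    }
}
```

The only tasks a server can reach this way are the ones currently blocked on it, which is what keeps REPLY_FAULT from becoming a general-purpose kill switch.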

Our system now uses REPLY_FAULT to handle the three cases I mentioned earlier:
a bogus operation code; a corrupt, truncated, or otherwise nonsensical
message; or a client that doesn’t send the right kind of loaned memory.

But REPLY_FAULT also provides a way to define and implement new kinds of
errors — application-specific errors — such as access control rules. For
instance, the Hubris IP stack assigns IP ports to tasks statically. If a task
tries to mess with another task’s IP port, the IP stack faults them. This gets
us the same sort of “fail fast” developer experience, with the smaller and
simpler code that results from not handling “theoretical” errors that can’t
occur in practice.
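
A minimal sketch of that kind of access-control check, assuming a static ownership table: the port numbers, task indices, and the Outcome type below are made up for illustration and do not reflect the real IP stack's configuration or code. Treating a port that no task owns as a fault is also an assumption of this sketch.

```rust
/// Hypothetical static assignment of IP ports to task indices.
const PORT_OWNERS: &[(u16, usize)] = &[(67, 3), (7777, 5)];

/// What the server decides to do with the request.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Carry on and eventually REPLY as usual.
    Proceed,
    /// REPLY_FAULT the sender.
    Fault,
}

/// A task may only touch a port that was statically assigned to it.
fn check_port_access(sender: usize, port: u16) -> Outcome {
    match PORT_OWNERS.iter().find(|entry| entry.0 == port) {
        Some(&(_, owner)) if owner == sender => Outcome::Proceed,
        _ => Outcome::Fault,
    }
}
```

Because the table is fixed at build time, a correct task can never hit the Fault arm, so it needs no code to handle it.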

As with the kernel’s aggressive handling of errors in system calls, I was
initially concerned that REPLY_FAULT would be too extreme. After I had the idea, I
delayed several months before starting implementation, basically trying to talk
myself out of it.

I was being too careful. REPLY_FAULT has been great. A new developer on the
system recently cited it as one of the more surprising and pleasant parts of
developing on Hubris, which is what inspired me to write this post.

The joy of panicking other programs

I mentioned earlier that Hubris IPC was explicitly designed to behave like a
Rust function call from the perspective of the client.

Well, if you violate the preconditions on a Rust function call, the function
will normally respond with a panic!.

REPLY_FAULT essentially provides a way for servers to generate cross-process
panic! in their clients, without requiring clients to contain code to do it
— or, perhaps more importantly, without requiring clients to cooperate in the
process at all.

Overall, this combines with some other system features to make Hubris
“aggressively hostile to malicious programs,” as Eliza Weissman recently
described it. Since attempts at exploitation often manifest first as errors or
misuse of APIs, a system that responds to any misbehavior by wiping the state of
the misbehaving component ought to be harder to exploit. (This hypothesis has
yet to be tested! Please reach out to me if you’re interested in trying to
exploit Hubris. I will help!)

In practice, the only downside I’ve observed from these decisions is that the
system is really difficult to fuzz test. Because I like chaos
engineering, I’ve implemented a small “chaos” task that generates random
IPCs and system calls to test other components for bugs, and almost anything it
does causes it to get immediately reset. To be useful, it has to base all of its
decisions off the one piece of state that is observably different each time it
starts: the system uptime counter. (However, REPLY_FAULT does provide a way
for servers to force chaos upon their clients by randomly killing them, an
option I haven’t fully evaluated.)
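
To illustrate the uptime trick, here is a sketch of how a task with no other surviving state could derive a different sequence of bogus IPCs on each restart. This is not the actual chaos task: the xorshift generator, the ChaosIpc fields, and the function names are assumptions chosen for the example.

```rust
/// Small xorshift PRNG (shift triple 13, 7, 17), seeded from the uptime.
struct XorShift64(u64);

impl XorShift64 {
    /// The uptime counter is the only input that differs between restarts.
    fn from_uptime(ticks: u64) -> Self {
        // xorshift must not start from zero, so substitute a fixed constant.
        XorShift64(if ticks == 0 { 0x9E37_79B9_7F4A_7C15 } else { ticks })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical description of one randomly generated IPC.
#[derive(Debug, PartialEq)]
struct ChaosIpc {
    task: usize,
    op: u16,
    len: usize,
}

/// Picks a target task, operation code, and message length.
/// `task_count` must be nonzero.
fn next_ipc(rng: &mut XorShift64, task_count: usize, max_len: usize) -> ChaosIpc {
    ChaosIpc {
        task: (rng.next_u64() % task_count as u64) as usize,
        op: rng.next_u64() as u16,
        len: (rng.next_u64() % (max_len as u64 + 1)) as usize,
    }
}
```

A side effect of this design is reproducibility: the same uptime value always produces the same sequence of requests.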

But normal Hubris tasks don’t dynamically generate IPC messages, particularly
ones that are deliberately bogus. In practice, they can carry on without
realizing REPLY_FAULT even exists — because unless they do something really
unusual, they will never see the business end of it anyway.