Porting SBCL to the Nintendo Switch
For the past two years Charles Zhang and I have been working on getting my game engine, Trial, running on the Nintendo Switch. The primary challenge in doing this is porting the underlying Common Lisp runtime to work on this platform. We knew going into this that it was going to be hard, but it has proven to be quite a bit more tricky than expected. I’d like to outline some of the challenges of the platform here for posterity, though please also understand that due to Nintendo’s NDA I can’t go into too much detail.
Current Status
I want to start off with where we are at, at the time of writing this article. We managed to port the runtime and compiler to the point where we can compile and execute arbitrary lisp code directly on the Switch. We can also interface with shared libraries, and I’ve ported a variety of operating system portability libraries that Trial needs to work on the Switch as well.
The above photo shows Trial’s REPL example running on the Switch devkit. Trial is setting up the OpenGL context, managing input, allocating shaders, all that good stuff, to get the text shown on screen; the Switch does not offer a terminal of its own.
Unfortunately it also crashes shortly after as SBCL is trying to engage its garbage collector. The Switch has some unique constraints in that regard that we haven’t managed to work around quite yet. We also can’t output any audio yet, since the C callback mechanism is also broken. And of course, there’s potentially a lot of other issues yet to rear their head, especially with regards to performance.
Whatever the case, we’ve gotten pretty far! This work hasn’t been free, however. While I’m fine not paying myself a fair salary, I can’t in good conscience have Charles invest so much of his valuable time into this for nothing. So I’ve been paying him on a monthly basis for all the work he’s been doing on this port. Up until now that has cost me ~17’000 USD. As you may or may not know, I’m self-employed. All of my income stems from sales of Kandria and donations from generous supporters on Patreon, GitHub, and Ko-Fi. On a good month this totals about 1’200 USD. On a bad month this totals to about 600 USD. That would be hard to get by in a cheap country, and it’s practically impossible in Zürich, Switzerland.
I manage to get by by living with my parents and being relatively frugal with my own personal expenses. Everything I actually earn and more goes back into hiring people like Charles to do cool stuff. Now, I’m ostensibly a game developer by trade, and I am working on a currently unannounced project. Games are very expensive to produce, and I do not have enough reserves to bankroll it anymore. As such, it has become very difficult to decide what to spend my limited resources on, and especially a project like this is much more likely to be axed given that I doubt Kandria sales on the Switch would even recoup the porting costs.
To get to the point: if you think this is a cool project and you would like to help us make the last few hurdles for it to be completed, please consider supporting me on Patreon, GitHub, or Ko-Fi. On Patreon you get news for every new library I release (usually at least one a month) and an exclusive monthly roundup of the current development progress of the unannounced game. Thanks!
An Overview
First, here’s what’s publicly known about the Switch’s environment: user code runs on an ARM64 Cortex-A57 chip with four cores and 4 GB RAM, and on top of a proprietary microkernel operating system that was initially developed for the Nintendo 3Ds.
SBCL already has an ARM64 Linux port, so the code generation side is already solved. Kandria also easily fits into 4GB RAM, so there’s no issues there either. The difficulties in the port reside entirely in interfacing with the surrounding proprietary operating system of the switch. The system has some constraints that usual PC operating systems do not have, which are especially problematic for something like Lisp as you’ll see in the next section.
Fortunately for us, and this is the reason I even considered a port in the first place, the Switch is also the only console to support the OpenGL graphics library for rendering, which Trial is based upon. Porting Trial itself to another graphics library would be a gigantic effort that I don’t intend on undertaking any time soon. The Xbox only supports DirectX, though supposedly there’s an OpenGL -> DirectX layer that Microsoft developed, so that might be possible. The Playstation on the other hand apparently still sports a completely proprietary graphics API, so I don’t even want to think about porting to that platform.
Anyway, in order to get started developing I had to first get access. I was lucky enough that Nintendo of Europe is fairly accommodating to indies and did grant my request. I then had to buy a devkit, which costs somewhere around 400 USD. The devkit and its SDK only run on Windows, which isn’t surprising, but will also be a relevant headache later.
Before we can get on to the difficulties in building SBCL for the Switch, let’s first take a look at how SBCL is normally built on a PC.
Building SBCL
SBCL is primarily written in Lisp itself. There is a small C runtime as well, which you use a usual C compiler to compile, but before it can do that, there’s some things it needs to know about the operating system environment it compiles for. The runtime also doesn’t have a compiler of its own, so it can’t compile any Lisp code. In order to get the whole process kicked off, SBCL requires another Lisp implementation to bootstrap with, ideally another version of itself.
The build then proceeds in roughly five phases:
build-config
This step just gathers whatever build configuration options you want for your target and spits them out into a readable format for the rest of the build process.
make-host-1
Now we build the cross-compiler with the host Lisp compiler, and at the same time emit C header files describing Lisp object layouts in memory as C structs for the next step.
make-target-1
Next we run the target C compiler to create the C runtime. As mentioned, this uses a standard C compiler, which can itself be a cross-compiler. The C runtime includes the garbage collector and other glue to the operating system environment. This step also produces some constants the target Lisp compiler and runtime needs to know about by using the C compiler to read out relevant operating system headers.
make-host-2
With the target runtime built, we build the target Lisp system (compiler and the standard library) using the Lisp cross-compiler built by the Lisp host compiler in
make-host-1
. This step produces a “cold core” that the runtime can jump into, and can be done purely on the host machine. This cold core is not complete, and needs to be executed on the target machine with the target runtime to finish bootstrapping, notably to initialize the object system, which requires runtime compilation. This is done in
make-target-2
The cold core produced in the last step is loaded into the target runtime, and finishes the bootstrapping procedure to compile and load the rest of the Lisp system. After the Lisp system is loaded into memory, the memory is dumped out into a “warm core”, which can be loaded back into memory in a new process with the target runtime. From this point on, you can load new code and dump new images at will.
Notable here is the need to run Lisp code on the target machine itself. We can’t cross-compile “purely” on the host, not in the least because user Lisp code cannot be compiled without also being run like batch-compiled C code can, and when it is run it assumes that it is in the target environment. So we really don’t have much of a choice in the matter.
In order to deploy an application, we proceed similar to
make-target-2
: We compile in Lisp code incrementally and then when we have everything we need we dump out a core with the runtime attached to it. This results in a single binary with a data blob attached.When the SBCL runtime starts up it looks for a core blob, maps it into memory, marks pages with code in them as executable, and then jumps to the entry function the user designated. This all is a problem for the Switch.
Building for the Switch
The Switch is not a PC environment. It doesn’t have a shell, command line, or compiler suite on it to run the build as we usually do. Worse still, its operating system does not allow you to create executable pages, so even if we could run the compilation steps on there we couldn’t incrementally compile anything on it like we usually do for Lisp code.
But all is not lost. Most of the code is not platform dependent and can simply be compiled for ARM64 as usual. All we need to do is make sure that anything that touches the surrounding environment in some way knows that we’re actually trying to compile for the Switch, then we can use another ARM64 environment like Linux to create our implementation.
With that in mind, here’s what our steps look like:
build-config
We run this on some host system, using a special flag to indicate that we’re building for the Switch. We also enable thefasteval
contrib. We needfasteval
to step in for any place where we would usually invoke the compiler at runtime, since we absolutely cannot do that on the Switch.
make-host-1
This step doesn’t change. We just get different headers that prep for the Switch platform.
make-target-1
Now we use the C compiler the Nintendo SDK provides for us, which can cross-compile for the Switch. Unfortunately the OS is not POSIX compliant, so we had to create a custom runtime target in SBCL that stubs out and papers over the operating system environment differences that we care about, like dynamic linking, mapping pages, and so on.
Here is where things get a bit weird. We are now moving on to compiling Lisp code, and we want to do so on a Linux host system. So we have to…
build-config
(2)We now create a normal ARM64 Linux system with the same feature set as for the Switch. This involves the usual steps as before, though with a special flag to inform some parts of the Lisp process that we’re going to ultimately target the Switch.
make-host-1
(2)
make-target-1
(2)
make-host-2
make-target-2
With all of this done we now have a slightly special SBCL build for Linux ARM64. We can now move on to compiling user code.
For user code we now perform some tricks to make it think it’s running on the Switch, rather than on Linux. In particular we modify
*features*
to include:nx
(the Switch code name) and not:linux
,:unix
, or:posix
. Once that is set up and ASDF has been neutered, we can compile our program (like Trial) “as usual” and at the end dump out a new core.We’ve solved the problem of actually compiling the code, but we still need to figure out how to get the code started on the Switch, since it does not allow us to do the usual core-mapping strategy. As such, attaching the new core to the runtime we made for the Switch won’t work.
To make this work, we make use of two relatively unknown features of SBCL: immobile-code, and elfination. Usually when SBCL compiles code at runtime, it sticks it into a page somewhere, and marks that page executable. The code itself however could become unneeded at some point, at which point we’d like to garbage collect it. We can then reclaim the space it took up, and to do so compact the rest of the code around it. The immobile-code feature allows SBCL to take up a different strategy, where code is put into special reserved code pages and remains there. This means it can’t be garbage collected, but it instead can take advantage of more traditional operating system support. Typically executables have pre-marked sections that the operating system knows to contain code, so it can take care of the mapping when the program is started, rather than the program doing it on its own like SBCL usually does.
OK, so we can generate code and prevent it from being moved. But we still have a core at the end of our build that we now need to transform into the separate code and data sections needed for a typical executable. This is done with the elfination step.
The elfinator looks at a core and performs assembly rewriting to make the code position-independent (a requirement for Address Space Layout Randomisation), and then tears it out into two separate files, a pure code assembly file, and a pure data payload file.
We can now take those two files and link them together with the runtime that the C compiler produced and get a completed SBCL that runs like any other executable would. So here’s the last steps of the build process:
Run the elfinator to generate the assembly files
Link the final binary
Run the Nintendo SDK’s authoring tools to bundle metadata, shared libraries, assets, and the application binary into one final package
That’s quite an involved build setup. Not to mention that we need at least an ARM64 Linux machine to run most of the build on, as well as either an AMD64 Windows machine (or an AMD64 Linux machine with Wine) to run the Nintendo SDK compiler and authoring tools.
I usually use an AMD64 Linux machine, so there’s a total of three machines involved: The AMD64 “driver,” the ARM64 build host, and a Windows VM to talk to the devkit with.
I wrote a special build system with all sorts of messed up caching and cross-machine synchronisation logic to automate all of this, which was quite a bit of work to get going, especially since the build should also be drivable from an MSYS2/Windows setup. Lots of fun with path mangling!
So now we have a full Lisp system, including user code, compiling for and being able to run on the Switch. Wow! I’ve skipped over a lot of the nitty-gritty dealing with getting the build properly aware of which target it’s building for, making the elfinator and immobile-code working on ARM64, and porting all of the support libraries like pathname-utils, libmixed, cl-gamepad, etc. Again, most of the details we can’t openly talk about due to the NDA. However, we have upstreamed what work we could, and all of the Lisp libraries don’t have a private fork.
It’s worth noting though that elfination wasn’t initially designed to produce position independent executable Lisp code, which is usually full of absolute pointers. So we needed to do a lot of work in the SBCL compiler and runtime to support load time relocation of absolute pointers and make sure code objects (which usually contain code constants) no longer have absolute pointers, as the GC can’t modify executable sections. Not even the OS loader is allowed to modify executable sections to relocate absolute pointer. We did this by relocating absolute pointers like code constants outside of the text space into a read-writable space close enough to rewrite constant references in code to load from this r/w space instead, which the loader and the moving GC can fixup pointers at.
Instead of interfacing directly with the Nintendo SDK, I’ve opted to create my own C libraries that have a custom interface the Lisp libraries interface with in order to access the operating system functionality it needs. That way I can at least publish the Lisp bits openly, and only keep the small C library private. Anyway, now that we can run stuff we’re not done yet. Our system actually needs to keep running, too, and that brings us to
The Garbage Collector
Garbage collection is a huge topic in itself and there’s a ton of different techniques to make it work efficiently. The standard GC for SBCL is called “gencgc”, a Generational Garbage Collector. Generational meaning it keeps separate “generations” of objects and scans the generations in different frequencies, copying them over to another generation’s location to compact the space. None of this is inherently an issue for the Switch, if it weren’t for multithreading.
When multiple threads are involved, we can’t just move objects around, as another thread could be accessing it at any time. The easiest way to resolve this conflict is to park all threads before engaging garbage collection. So the question becomes: when a thread wants to start garbage collection, how does it get the other threads to park?
On Unix systems a pretty handy trick is used: we can use the signalling mechanism to send a signal to the other threads, which then take that hint to park.
On the Switch we don’t have any signal mechanism. In fact, we can’t interrupt threads at all. So we instead need to somehow get each thread to figure out that it should park on its own. The typical strategy for this is called “safepoints”.
Essentially we modify the compiler a little bit to inject some extra code that checks whether the thread should park or not. This strategy has some issues, namely:
Adding a check isn’t free. So we want to check as little as possible
If we don’t check frequently enough, we are going to stall all the other threads because GC can’t begin until they’re all parked
If we have to inject a lot of instructions for a check, it is going to disrupt CPU cache lines and pipelining optimisations
The current safepoint system in SBCL was written for Windows, which similarly does not have inter-process signal handlers. However, unlike the Switch, it does still have signal handling for the current thread. So the current safepoint implementation was written with this strategy:
Each thread keeps a page around that a safepoint just writes a word to. When GC is engaged, those pages are marked as read-only, so that when the safepoint is hit and the other thread tries to write to the page, a segmentation fault is triggered and the thread can park. This is efficient, since we only need a single instruction to write into the page.
On the Switch we can’t use this trick either, so we have to actually insert a more complex check, which can be tricky to get working as intended, as all parallel algorithms tend to be.
Since safepoints aren’t necessary on any other platform than Windows, it also hasn’t been tested anywhere else, so aside from modifying it for this new platform it’s also just unstable. It is apparently quite a big mess in the code base and would ideally be redone from scratch, but hopefully we don’t have to go quite that far.
I’d also like to give special mention to the issue that CLOS presents. Usually SBCL defers compilation of the “discriminating function” that is needed to dispatch to methods to the first call of the generic function. This is done because CLOS is highly dynamic and allows adding and removing methods pretty much at any time, and there’s usually no good point in time that the system knows it is complete. Of course, on the Switch we can’t invoke the compiler, so we can’t really do this. For now our strategy has been to instead rely on the fast evaluator. We stub out the
compile
function to create a lambda that executes the code via the evaluator instead. This has the advantage of working with any user code that relies oncompile
as well, though it is obviously much slower for execution than it would be if we could actually compile.This neatly brings us to
Future Work
The fasteval trick is mostly a fallback. Ideally I’d like to explore options to freeze as much of CLOS in place as possible right before the final image is dumped and compile as much as possible ahead of time. I’d also like to investigate the block compilation mode that Charles restored some years back more closely.
It’s very possible that the Switch’s underpowered processor will also force us to implement further optimisations, especially on the side of my engine and the code in Kandria itself. Up until now I’ve been able to get away with comparatively little optimisation, since even computers of ten years ago are more than fast enough to run what I need for the game. However, I’m not so sure that the Switch could match up to that even if it didn’t also introduce additional constraints on performance with its lack of operating system support.
First, though, we need to get the garbage collector running fully. It runs enough to boot up and get into Trial’s main loop, but as soon as it hits multi-generation compaction, it falls flat on its face.
Next we need to get callbacks from C working again. Apparently this is a part of the SBCL codebase that can only be described as “a mess,” involving lots of hand-rolled assembly routines, which probably need some adjustments to work correctly with immobile-code and elfination. Callbacks fortunately are relatively rare, Trial only needs them for sound playback via libmixed.
There’s also been some other issues that we’ve kept in the back of our heads but don’t require our immediate attention, as well as some extra portability features I know I’ll have to work on in Trial before its selftest suite fully passes on the Switch.
Conclusion
I’ll be sure to add an addendum here should the state of the port significantly change in the future. Some people have also asked me if the work could be made public, or if I’d be willing to share it.
The answer to that is that while I would desperately like to share it all publicly, the NDA prevents us from doing so. We still upstream and publicise whatever we can, but some bits that tie directly into the Nintendo SDK cannot be shared with anyone that hasn’t also signed the NDA. So, in the very remote possibility that someone other than me is crazy enough to want to publish a Common Lisp game on the Nintendo Switch, they can reach out to me and I’ll happily give them access to our porting work once the NDA has been signed.
Naturally, I’ll also keep people updated more closely on what’s going on in the monthly updates for Patrons. With that all said, I once again plead with you to consider supporting me on Patreon, GitHub, or Ko-Fi. All the income from these will, for the foreseeable future, be going towards funding the SBCL port to the Switch as well as the current game project.
Thank you as always for reading, and I hope to share more exciting news with you soon!