Pisces Moon OS · Architectural Whitepaper

The SPI Bus Treaty

A four-rule shared-bus arbitration protocol for concurrent SD, LoRa, and display operation on the ESP32-S3 — and the reason the Ghost Engine never stops.

Eric Becker · Fluid Fortune · v1.2.0 · May 19, 2026 · Audit Verified

Abstract

On the LilyGo T-Deck Plus, T-LoRa Pager, and similar devices, a single SPI bus on the Espressif ESP32-S3 is wired to three independent consumers: the MicroSD card, the SX1262 LoRa radio, and the ST7789/ST7796U display controller. The reference firmware ecosystem for these devices is built around single-purpose use cases where, at any given moment, only one of these consumers is active. Pisces Moon OS is not that. It runs the Ghost Engine, a persistent Core 0 background process that wardrives, scans BLE, and logs GPS continuously, alongside any Core 1 foreground application, with the LoRa mesh radio, the SD card, and the display all active simultaneously.

This document specifies the SPI Bus Treaty: a formal four-rule arbitration protocol enforced by a recursive mutex (spi_mutex) that every component of the operating system and every third-party ELF module must obey. The protocol was developed in field conditions in 2025–2026 and is now a platform-level contract. The v1.2.0 audit verified 54 take sites and 56 give sites across 12 files. The Treaty is the architectural precondition for everything else Pisces Moon does.

I.

The Physics of the Conflict

SPI is a synchronous serial protocol with four signal lines: SCK (clock), MOSI (master out, slave in), MISO (master in, slave out), and an active-low chip-select per peripheral. The ESP32-S3 has two general-purpose SPI controllers (SPI2 and SPI3); its other two, SPI0 and SPI1, are reserved for flash and PSRAM access. The LilyGo T-Deck Plus, T-LoRa Pager, and M5Stack Cardputer ADV all wire SD, LoRa, and display to the same general-purpose SPI bus, distinguished only by chip-select line.

When two consumers attempt to drive the bus simultaneously — even momentarily — the failure modes are not graceful. The SD card's CMD0 initialization sequence is timing-sensitive: a glitch on SCK during command issuance corrupts the response and the controller enters an error state requiring a full power cycle to recover. The LoRa radio's SX1262 SPI command set assumes the bus is quiet for the duration of a transaction; concurrent SD traffic causes the radio to mis-parse its own command queue and enter undefined states. The display controller's pixel-flush sequence relies on tight timing between command bytes and data bytes; SPI contention produces visible tearing, dropped frames, or display freeze.

None of these failures present as “SPI contention.” The Guru Meditation crash dumps point at memory addresses where execution happened to be when the corruption propagated — not at the source of the corruption. The first symptom of an SPI conflict in this OS was a non-deterministic reboot during heavy field operation. The hypothesis that the SPI bus was the shared resource being contested took several months and many field sessions to establish.

II.

Four Field Failures

The Treaty was not designed in advance. It was extracted from four specific failure modes that recurred across field operations in 2025–2026. Each is documented here with the symptom, the root cause, and the rule it produced.

Failure 1 · Downtown Los Angeles, March 2025
Guru Meditation under concurrent SD + LoRa load

T-Deck Plus running Ghost Engine wardrive + LoRa mesh receiver in a dense RF environment (40+ APs, active LoRa mesh traffic). Crashes occurred at non-deterministic intervals, between 4 and 90 minutes of sustained operation. Crash addresses varied. The bench environment (laboratory with single AP and no LoRa peer) never produced the crash.

Root cause: SD write operations and LoRa receive interrupts both manipulated the SPI peripheral block. With no arbitration, the LoRa interrupt would occasionally fire during an SD transaction, corrupting both. The corruption propagated as memory damage that crashed in arbitrary subsystems.

Produced Rule 1: hit and run.

Failure 2 · Lincoln Heights, May 2025
Watchdog reboot under sustained SD write

Ghost Engine writing wardrive CSV continuously during a long-form field session. Every 15–25 minutes, the watchdog timer fired and the device rebooted. No crash dump — clean reboot.

Root cause: an SD write extending across multiple sectors held the SPI bus for tens of milliseconds. During this window, the LoRa radio's interrupt handler could not service incoming packets, and the BLE stack's scan callback could not flush its result buffer. The watchdog detected the resulting starvation and rebooted the device as a recovery action.

Produced Rule 2: no extended holds.

Failure 3 · Pasadena, July 2025
WiFi scanner deadlock with Gemini AI client

The Gemini AI client (Core 1 foreground app) and the Ghost Engine wardrive scanner (Core 0 background) both required WiFi radio access. The ESP32-S3's WiFi peripheral is single-instance: only one consumer can hold it at a time. Both subsystems would block waiting for the radio, producing visible UI freezes and silent wardrive interruptions.

Root cause: WiFi radio arbitration was not explicit. Each subsystem assumed it could use the radio whenever it needed to. The deadlock was probabilistic — short Gemini queries usually completed without conflict, but a single long inference would block the wardrive long enough for the watchdog to intervene.

Produced Rule 3: radio-traffic mutual exclusion via shared boolean flag.

Failure 4 · Bench, December 2025
Nuke operation under Treaty discipline

The Ghost Partition Nuke — sector-level zero-overwrite of sensitive data files before FAT unlinking — must hold the SPI bus for the duration of the wipe, in violation of Rule 2 (no extended holds). The naive implementation would crash the LoRa radio mid-wipe, corrupting both the wipe and the radio state. The wipe could not be safely executed during normal operation.

Root cause: the operation that needs the bus most exclusively is also the most sensitive to interruption. Halting the wipe partway through leaves the data partially destroyed and partially recoverable, which is the worst possible outcome.

Produced Rule 4: destructive operations under Treaty discipline at boot, before Ghost Engine spawn.

III.

The Four Rules

The Treaty consists of four rules. They are binding on first-party OS code, third-party ELF modules loaded at runtime from SD, and any future hardware target. Violations surface as crashes, watchdog reboots, or test failures, often minutes after the offending transaction. The rules are short enough to be remembered and specific enough to be checked.

Rule 1
Hit and run
Every SPI consumer acquires the bus mutex, performs its operation, and releases the mutex immediately. No idle holds. No long-running state retained across mutex releases. The mutex is a hot resource: take it, use it, return it. The recursive mutex (spi_mutex) permits nested operations within a single component, but the outermost component is responsible for releasing the same number of times it acquired.
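One way to make the take/give pairing hard to get wrong is a scope-bound guard. The following is a minimal host-side sketch, not the project's implementation: std::recursive_mutex stands in for spi_mutex, and SpiBus, SpiGuard, write_record, and log_sample are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <mutex>

// Host-side model of the bus lock. On the device this role is played by
// spi_mutex; std::recursive_mutex is a stand-in (assumption).
class SpiBus {
public:
    void take() { mutex_.lock(); ++depth_; }
    void give() { assert(depth_ > 0); --depth_; mutex_.unlock(); }
    int depth() const { return depth_; }  // nesting level of the current holder
private:
    std::recursive_mutex mutex_;
    int depth_ = 0;  // only modified while mutex_ is held
};

// RAII guard: takes on construction and gives on destruction, so every
// acquire is matched by exactly one release on every exit path.
class SpiGuard {
public:
    explicit SpiGuard(SpiBus& bus) : bus_(bus) { bus_.take(); }
    ~SpiGuard() { bus_.give(); }
    SpiGuard(const SpiGuard&) = delete;
    SpiGuard& operator=(const SpiGuard&) = delete;
private:
    SpiBus& bus_;
};

// Hypothetical inner helper: nests inside a caller that already holds the bus.
inline int write_record(SpiBus& bus) {
    SpiGuard guard(bus);
    return bus.depth();  // depth observed during the "transaction"
}

// Hypothetical outer operation: one take, one nested call, released on return.
inline int log_sample(SpiBus& bus) {
    SpiGuard guard(bus);
    return write_record(bus);
}
```

Calling log_sample on a fresh SpiBus observes a nesting depth of 2 inside the helper and leaves the depth at 0 afterwards, which is the "release as many times as you acquired" obligation expressed through scopes.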
Rule 2
No extended holds
During normal operation, no operation may hold the SPI bus for longer than 50 milliseconds. Operations that would exceed this budget (large SD writes, long display flushes, multi-sector erases) must be chunked into smaller atomic transactions, each acquiring and releasing the mutex independently. The 50 ms budget is chosen to fit within the SX1262's RX-mode interrupt service window and the BLE stack's callback flush interval.
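Chunking can be sketched as a loop in which each chunk is its own take/give cycle. This is an illustrative host-side model under stated assumptions: FakeCard and chunked_write are hypothetical names, std::recursive_mutex stands in for spi_mutex, and the chunk size that actually fits the 50 ms budget has to be measured on the target hardware, which this sketch does not do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Stand-in for the SD card: records bytes and counts bus acquisitions.
struct FakeCard {
    std::vector<std::uint8_t> stored;
    std::size_t transactions = 0;
};

// Writes `data` in chunks of at most `chunk_bytes`. Each chunk is a separate
// take/give cycle, so other bus consumers can run between chunks.
// Returns the total number of transactions recorded on the card.
inline std::size_t chunked_write(std::recursive_mutex& spi_mutex, FakeCard& card,
                                 const std::vector<std::uint8_t>& data,
                                 std::size_t chunk_bytes) {
    assert(chunk_bytes > 0);
    std::size_t offset = 0;
    while (offset < data.size()) {
        const std::size_t n = std::min(chunk_bytes, data.size() - offset);
        {
            std::lock_guard<std::recursive_mutex> hold(spi_mutex);  // take
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(offset);
            card.stored.insert(card.stored.end(), first,
                               first + static_cast<std::ptrdiff_t>(n));
            ++card.transactions;
        }  // give: the bus is free before the next chunk starts
        offset += n;
    }
    return card.transactions;
}
```

With a 10,000-byte buffer and a 4,096-byte chunk size, the write completes in three separate bus acquisitions instead of one long hold.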
Rule 3
Radio-traffic mutual exclusion via shared boolean flag
The WiFi radio is a single-instance peripheral on the ESP32-S3. The Ghost Engine wardrive scanner and the foreground AI client share access via a wifi_in_use boolean. The flag is set by the consumer claiming the radio and cleared on release. Consumers check the flag before claiming; if it is set, they yield and retry. Because the two consumers run on different cores, the check and the set need to happen as a single atomic step, or both can observe a clear flag and claim the radio at once. On the Cardputer ADV (no PSRAM), this rule is extended into full WiFi mode-locking with hard teardown: CLIENT and SCANNER modes are mutually exclusive at the OS level, with full esp_wifi_stop() + esp_wifi_deinit() between transitions.
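A flag shared between two cores is only safe if checking and setting it happen as one indivisible step. The sketch below shows one way to do that with a compare-and-swap. It is an assumption-laden illustration: the text names only a wifi_in_use boolean, so the std::atomic wrapper and the RadioClaim class are hypothetical and do not describe how the firmware stores the flag.

```cpp
#include <atomic>
#include <cassert>

// Models the wifi_in_use flag described in the text. Wrapping it in
// std::atomic<bool> is this sketch's assumption, not the documented design.
class RadioClaim {
public:
    // Atomically claims the radio if it is free.
    // Returns true if the caller now owns the radio, false if it was in use.
    bool try_claim() {
        bool expected = false;
        return wifi_in_use_.compare_exchange_strong(expected, true);
    }
    // Clears the flag. Only the current owner should call this.
    void release() { wifi_in_use_.store(false); }
    bool in_use() const { return wifi_in_use_.load(); }
private:
    std::atomic<bool> wifi_in_use_{false};
};
```

A consumer that gets false from try_claim() yields and tries again later, matching the yield-and-retry behaviour in the rule. A consumer that gets true performs its radio work and then calls release().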
Rule 4
Destructive operations under Treaty discipline at boot
Operations that violate Rule 2 by necessity — Nuke sector overwrite, full SD format, encrypted partition key destruction — execute under spi_mutex discipline at boot, before the Ghost Engine task spawns. During this boot window, no other SPI consumer is active. The operation has exclusive access to the bus for as long as it needs. The Ghost Engine starts only after all destructive operations have completed and the bus has returned to normal Treaty governance.
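The ordering constraint in Rule 4 can be sketched as a boot routine that drains every pending destructive operation under the bus lock before anything else is started. This is a host-side illustration only: BootLog, boot_sequence, and the callbacks are hypothetical, and std::recursive_mutex stands in for spi_mutex.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Records the order in which boot steps ran, for inspection.
struct BootLog {
    std::vector<std::string> events;
};

// Runs all destructive operations while holding the bus, then starts the
// background task. Nothing else is running yet, so the long hold cannot
// starve another SPI consumer.
inline void boot_sequence(std::recursive_mutex& spi_mutex,
                          const std::vector<std::function<void(BootLog&)>>& destructive_ops,
                          const std::function<void(BootLog&)>& spawn_background,
                          BootLog& log) {
    {
        std::lock_guard<std::recursive_mutex> hold(spi_mutex);
        for (const auto& op : destructive_ops) {
            op(log);  // may hold the bus far longer than the normal budget
        }
    }  // bus released: normal Treaty rules apply from here on
    spawn_background(log);
}
```

The design point is that the exemption from Rule 2 is granted by position in the boot order, not by a special flag: once spawn_background has run, no code path is allowed a long hold.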

IV.

v1.2.0 Audit Results

The Treaty is enforced by a recursive mutex (spi_mutex), not by static analysis. Every site that touches the SPI bus must call spi_mutex_take() before its first transaction and spi_mutex_give() after its last. A manual audit was conducted for v1.2.0 to verify that every take has a matching give, that no operation exceeds the 50 ms hold budget, and that no consumer holds the bus during a yield.
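The audit described here was manual. A mechanical first pass that counts call sites per file could complement it, and might look like the sketch below. SiteCount, count_occurrences, and count_sites are hypothetical helpers. The sketch matches plain text, so it also counts occurrences inside comments and string literals, and it says nothing about hold times or about whether a given take and give pair up at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of take and give call sites found in one source file.
struct SiteCount {
    std::size_t takes = 0;
    std::size_t gives = 0;
};

// Counts non-overlapping occurrences of `needle` in `text`.
inline std::size_t count_occurrences(const std::string& text, const std::string& needle) {
    if (needle.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t pos = text.find(needle); pos != std::string::npos;
         pos = text.find(needle, pos + needle.size())) {
        ++count;
    }
    return count;
}

// Counts textual call sites of the two functions named in the text.
inline SiteCount count_sites(const std::string& source) {
    SiteCount c;
    c.takes = count_occurrences(source, "spi_mutex_take(");
    c.gives = count_occurrences(source, "spi_mutex_give(");
    return c;
}
```

A file whose give count exceeds its take count is not necessarily wrong, but it is a prompt for a human to confirm that the extra give sits on a recovery path.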

File                      Takes  Gives  Notes
core/ghost_engine.cpp        12     12  Wardrive scan write loop, BLE callback flush
core/wardrive_task.cpp        8      8  Session file creation + per-frame write
radio/lora_driver.cpp         6      6  SX1262 command queue + interrupt handler
radio/sd_filesystem.cpp       9     10  One extra give covers a rare recovery path
ui/display_driver.cpp         7      7  Pixel flush + command sequencing
boot/late_sd_init.cpp         4      4  Cold SPI restart for T-LoRa Pager
security/nuke.cpp             3      3  Boot-time only, exclusive bus access
elf/sandbox.cpp               2      2  ELF module load + verify
apps/recorder.cpp             1      1  Audio capture write
apps/mesh_messenger.cpp       1      1  Message persistence
apps/audio_player.cpp         1      1  WAV streaming read
apps/elf_browser.cpp          0      1  Defensive give in error path
Total                        54     56  Two extra gives are defensive recovery paths

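The two extra gives are intentional defensive code: the SD filesystem driver and the ELF browser each contain a recovery path that releases the mutex in error conditions where a previous acquire may not be tracked. These gives are harmless when the calling task holds no outstanding acquire: assuming spi_mutex wraps a FreeRTOS recursive mutex, a give from a task that is not the holder fails and leaves the mutex unchanged. A defensive give issued while the task still holds an outer acquire would instead release one nesting level early, so that is the case the audit has to rule out.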
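Whether an extra give is harmless depends on how the mutex treats a give from a task that does not hold it. The model below illustrates one such behaviour: a give from a non-holder is rejected and changes nothing, which is how FreeRTOS recursive mutexes behave. That spi_mutex_give() follows the same semantics is an assumption, and ModelRecursiveMutex and its integer task ids are hypothetical.

```cpp
#include <cassert>

constexpr int kNoTask = -1;

// Single-threaded model of an owner-checked recursive mutex.
// Tasks are identified by plain integers for illustration.
class ModelRecursiveMutex {
public:
    // Succeeds if the mutex is free or already held by `task`.
    bool take(int task) {
        if (depth_ > 0 && holder_ != task) return false;
        holder_ = task;
        ++depth_;
        return true;
    }
    // Succeeds only for the current holder. A give from any other task,
    // or with nothing held, is rejected and leaves the state unchanged.
    bool give(int task) {
        if (depth_ == 0 || holder_ != task) return false;
        if (--depth_ == 0) holder_ = kNoTask;
        return true;
    }
    int depth() const { return depth_; }
private:
    int holder_ = kNoTask;
    int depth_ = 0;
};
```

In this model a defensive give with nothing held returns false and is a no-op. The same call made while the task still holds an outer acquire would succeed and drop one nesting level early, so a recovery path is only safe when it cannot run inside a nested hold.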

V.

Prior Art & Precedent

The technique of mutex-arbitrated SPI is not new. Every operating system kernel that drives more than one SPI peripheral simultaneously uses some form of arbitration. The contribution of the SPI Bus Treaty is not the technique — it is the protocol: the named, documented, audited contract that every component must obey, published as a public reference for any future project on this hardware class.

The Treaty stands in a tradition of named arbitration protocols:

  • Unix filesystem locking conventions (1970s) — established formal rules for competing processes sharing a filesystem. The conventions were not new techniques (file locks predate Unix); the contribution was naming the rules and making them platform-wide.
  • Apollo Guidance Computer priority scheduling protocol (1969) — formal rules for competing tasks sharing a processor under load. The 1202 alarm during the lunar landing was the protocol working as designed: the lower-priority task was sacrificed to preserve the higher-priority one.
  • Nintendo N64 RSP time budget (1996) — formal rules for competing subsystems sharing the Reality Signal Processor. The budget was not new (time-slicing predates the N64); the contribution was the documented protocol that every game developer had to follow.

In each case the solution was not a patch — it was a protocol. A named standard. A thing that could be taught, documented, and enforced across an entire platform. The SPI Bus Treaty is the application of this approach to the ESP32-S3 hardware class. To the best of our knowledge, no prior published reference for this protocol exists on this chip family. If reviewers can identify prior art, we will document and credit it — this paper is offered with that invitation.

VI.

Enforcement & Contract

The Treaty is enforced by three mechanisms that work together:

  1. The recursive mutex (spi_mutex). Every SPI consumer must acquire the mutex before its first transaction and release it after its last. The mutex is recursive, so nested operations within a single component do not deadlock. The mutex is mandatory: there is no path that proceeds to the bus without it. A consumer that cannot acquire the mutex must yield and retry.
  2. Audit. The v1.2.0 audit (Section IV) verified that every take has a matching give. The audit is repeated for every release. Future versions will add static analysis to enforce the pattern at compile time.
  3. Documentation. The Treaty is documented in this paper and in the source tree. Third-party ELF module developers are required to follow the Treaty as a condition of running on the platform. The Treaty is the public contract — any module that violates it is a bug in that module, not in the OS.
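The yield-and-retry behaviour in item 1 can be sketched as a bounded loop around a non-blocking acquire. This is an illustration under assumptions: take_with_retry is a hypothetical helper, the acquire attempt is passed in as a callable so the sketch stays independent of any particular mutex API, and std::this_thread::yield() stands in for whatever yield primitive the firmware uses.

```cpp
#include <cassert>
#include <functional>
#include <thread>

// Calls `try_take` until it succeeds or `max_attempts` is exhausted,
// yielding the CPU between attempts.
// Returns the number of attempts used, or 0 if the bus was never acquired.
inline int take_with_retry(const std::function<bool()>& try_take, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_take()) return attempt;
        std::this_thread::yield();  // let the current holder finish its transaction
    }
    return 0;
}
```

With a std::recursive_mutex named m, the callable could be `[&m] { return m.try_lock(); }`. A result of 0 means the caller never got the bus and must not touch it. Skipping the transaction or reporting an error are the only valid responses.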

The Treaty is what makes the Ghost Engine possible. The Ghost Engine is the entire reason Pisces Moon OS exists. The relationship between the two is not incidental: the OS was designed around the Engine, and the Engine could not run without the Treaty. The Treaty doesn't change. The hardware around the Treaty does. The same four rules protect the bus on the T-Deck Plus (8 MB PSRAM), the T-LoRa Pager (8 MB PSRAM, deferred boot), and the Cardputer ADV (no PSRAM, mode-locked WiFi). The protocol generalizes across the full memory range of the ESP32-S3 boards targeted so far.

Two-device live-traffic validation (bench, May 20, 2026): Mesh Messenger send/receive confirmed between T-Deck Plus and Cardputer ADV with Cap LoRa-1262, both on the LongFast channel, concurrent with active wardrive logging on both devices. SD transcripts were written to /mesh_logs/messages.csv on each device while WiFi scan results were simultaneously written to wardrive CSV. Both Rule 1 (hit and run) and Rule 2 (no extended holds) survived contact with real LoRa interrupt traffic interleaved with SD writes on the same bus. The Treaty held at the 8 MB PSRAM and no-PSRAM ends of the chip tier simultaneously, which is strong evidence that the protocol is correctly specified, not merely correctly observed.

The contribution of this document is the protocol itself — named, documented, audited, and offered as a starting point for any future project on this hardware class that needs to coordinate concurrent peripheral access on a shared SPI bus. The protocol is open source under AGPL-3.0. The reference implementation is at github.com/FluidFortune/pisces-moon-os.

The SPI Bus Treaty — v1.2.0 — May 19, 2026

Eric Becker · Fluid Fortune · forge@fluidfortune.com

