A self-hosting compiler. Zero C dependencies.
Rail compiles itself to a ~1.0 MB ARM64 binary. Six backends: native ARM64, Linux ARM64, Linux x86_64, WASM, Cortex-M4, RISC-V — and it JIT-emits its own Metal GPU kernels. One language for the whole stack — compiler, TLS, plasma sim, site generator, training loop.
2-pass byte-identical self-compile · BSL 1.1 · Detroit, MI
Every module below is Rail source, compiled by Rail, running on the same ~1.0 MB binary. No FFI. No shims.
Pure-Rail handshake, x25519, ChaCha20-Poly1305, ECDSA P-256/P-384/P-521, RSA-PSS, Ed25519 verify. Chain-walk to trust store by default.
Raw TCP, chunked-transfer decoder, keep-alive, strict HTTPS GET/POST. Live against Anthropic, Slack, Cloudflare.
Full JSON parser/emitter. SQLite via Rail-native bindings. Regex with backtracking engine.
Metal GPU matmul (474 GFLOPS sustained). Reverse-mode autodiff. Transformer, optim, checkpoint.
Byte-pair encoding, sampling, detokenization. Trained against the Rail corpus itself.
2D magnetohydrodynamics solver. Orszag-Tang vortex. Conservation to machine precision.
Modern systems are piles of indirection. A hello-world prints through five runtimes before it touches a screen. Each runtime promises to hide the layer beneath it — and each hiding costs you the ability to reason about what's actually happening.
Rail takes the other side of that trade. Short distance between source and silicon. One binary. One language. If you want to know what a line of Rail does, you read the compiler — which is written in the same Rail. The answer is always reachable.
Rail now generates Metal Shading Language source from its op-DAG, JIT-compiles it via Metal's newLibraryWithSource:, and dispatches the kernel at runtime. Every kernel the GPU executes is emitted by an attested Rail binary. Two hand-fused kernels land on the same dispatch path: RMSNorm + Q/K/V matmul fused (35× faster at training shapes), SiLU(gate)·up fused (18× faster). bf16 numerics regime sidesteps fp16's step-2759 NaN cliff — 10k-step training stable. 141/141 still green; 2-pass byte-identical self-bootstrap unchanged.
Rail produces its own aarch64 Linux ELF binaries. Encoder, assembler, static linker, ELF writer — all pure Rail. On the supported subset of inputs the build pipeline invokes no external as, ld, or codesign. Four hand programs (exit, fib, hello, BSS counter) run byte-equivalent to canonical as + ld output on Pi Zero 2 W.
ARM64 and x86_64 backends reach full parity (140/140 and 136/136). JIT-first REPL at 0.1 ms/line. Single-program agentic loop: Rail calls Anthropic, JIT-lowers the response, executes, returns — no shell, no Python, no subprocess. Public reproducible hard-bench: frontier LLM + 1 KB Rail spec scores 30/30 on a held-out suite.
Pure-Rail PUT of arbitrary byte bodies, including the 945 KB rail_native binary itself. Five stdlib bugs fixed: file.close shadow, DNS UDP timeout, O(N²) string_to_bytes, length == 0 O(N) traps, recv-loop keep-alive hang. End-to-end attestation now signs the binary that just compiled itself.
The attestation pipeline is now Rail-native end-to-end including the Pi-side HTTP signer (pi_sign_server.rail, 118 KB ELF, replaces ~110 LOC Python). Linux ARM64 cross-compile gains real _atof, _snprintf %.15g, and _rail_print_float — previous stubs silently returned zero for every float literal.
The attestation pipeline that v3.8.0 introduced is now Rail-native end-to-end. sc_reduce/sc_muladd + RFC 8032 sign in pure Rail, RFC §A.4 byte-identical. Pi HTTP signer on fleet0:9102 replaces the SSH dance — release-attest 49 s → 27 s.
Every tagged release, every test pass, every 2-pass fixed point now signed against a live entropy beacon by the Pi-hosted fleet0 Ed25519 witness key. Mission control at /system.
17-day silent wrong-result bug fixed at root in all_params_int. Runtime-mmap arena (4 GB+ scaling). Rail-native fp32×fp16 GPU matmul. Parallel rerank (7.1× at N=8).
Undefined idents now fail at link time with a named symbol instead of a runtime SIGSEGV. Parser accepts multi-line compound expressions inside unclosed brackets.
Chain-walk is now the default. Trust-store caching of SPKI hashes. Non-strict paths parked under _unsafe_noverify.
RFC 8032 §5.1 verifier clean against test vectors. 140/140, 2-pass byte-identical.
TCP_NODELAY + NST seq advance. ECDSA P-521 wired into cert chain + leaf verify.
compile.rail quadratic fixes. probe4 195s → 12s (16×).
Every tagged release binds sha256(rail_native) to a
live entropy beacon pulse and an Ed25519 signature from the Pi
witness. The verifier is one curl away.