Every tagged release, in order.
Public since v1.0. Each entry points at the commit that ships it. CHANGELOG.md in the repo is the canonical source.
Rail now generates Metal Shading Language source from its op-DAG, JIT-compiles it via Metal's newLibraryWithSource:, and dispatches the kernel at runtime. Every kernel the GPU executes is emitted by an attested Rail binary — the substrate piece needed for end-to-end attested GPU training.
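As a rough illustration of the op-DAG to MSL step described above, the sketch below lowers a tiny expression DAG to Metal Shading Language source text. It is written in Python rather than Rail. The tuple node shapes, the function names, and the kernel name are assumptions made for this example, not Rail's actual op-DAG types or emitter.

```python
# Hypothetical miniature of DAG-to-MSL emission. Node shapes and names are
# illustrative assumptions, not the types in stdlib/jit_node.rail.

def emit_expr(node):
    """Lower one DAG node to an MSL expression string."""
    kind = node[0]
    if kind == "input":                      # ("input", buffer_name)
        return f"{node[1]}[i]"
    if kind == "mul":                        # ("mul", lhs, rhs)
        return f"({emit_expr(node[1])} * {emit_expr(node[2])})"
    if kind == "silu":                       # ("silu", x) -> x / (1 + exp(-x))
        x = emit_expr(node[1])
        return f"({x} / (1.0f + exp(-{x})))"
    raise ValueError(f"unknown op: {kind}")

def collect_inputs(node, seen):
    """Collect input buffer names in first-use order."""
    if node[0] == "input":
        if node[1] not in seen:
            seen.append(node[1])
    else:
        for child in node[1:]:
            collect_inputs(child, seen)
    return seen

def emit_kernel(name, root):
    """Emit a one-thread-per-element MSL kernel computing `root` into `out`."""
    inputs = collect_inputs(root, [])
    params = [f"device const float* {n} [[buffer({k})]]" for k, n in enumerate(inputs)]
    params.append(f"device float* out [[buffer({len(inputs)})]]")
    params.append("uint i [[thread_position_in_grid]]")
    return (
        "#include <metal_stdlib>\n"
        "using namespace metal;\n\n"
        f"kernel void {name}(\n    " + ",\n    ".join(params) + ") {\n"
        f"    out[i] = {emit_expr(root)};\n"
        "}\n"
    )

# SiLU(gate) * up, elementwise
dag = ("mul", ("silu", ("input", "gate")), ("input", "up"))
msl_source = emit_kernel("silu_hadamard", dag)
```

The resulting string is the kind of text that would be handed to Metal's newLibraryWithSource: for runtime compilation; the compile and dispatch steps are outside this sketch.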
- stdlib/jit_node.rail (op-DAG types) + stdlib/jit_tape.rail (tracers) + stdlib/jit_match.rail (DAG matcher) + stdlib/jit_emit.rail (MSL emitter)
- fused_rmsnorm_qkv (RMSNorm + 3 matmul in one threadgroup-per-row dispatch, 35× faster at training shapes), fused_silu_hadamard (SiLU(gate) * up elementwise, 18× faster)
- tgl_matmul_bf16 + matmul_bf16 Rail wrapper. f32 exponent range sidesteps fp16's step-2759 NaN cliff. 10k-step training stable
- tgl_jit_compile_from_tmp_file drives Metal's newLibraryWithSource: from a Rail-emitted .metal file; pipeline IDs cached for reuse

Rail produces its own aarch64 Linux ELF binaries. Encoder, assembler, static linker, and ELF writer are all pure Rail. On the supported subset of inputs, the build pipeline invokes no external as, ld, or codesign.
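To make the ELF-writer part concrete, here is a minimal Python sketch of a static aarch64 executable image: one Elf64_Ehdr, one PT_LOAD program header, then the code. The field layout follows the ELF-64 format; the load address, the single-segment layout, and the function name are assumptions for this example and are not taken from Rail's own writer.

```python
import struct

EM_AARCH64 = 183            # e_machine value for AArch64
BASE = 0x400000             # assumed load address for this sketch
EHDR_SIZE, PHDR_SIZE = 64, 56

def build_elf(code: bytes) -> bytes:
    """Return a static ELF image: Elf64_Ehdr + one PT_LOAD phdr + code."""
    entry = BASE + EHDR_SIZE + PHDR_SIZE       # code follows the two headers
    total = EHDR_SIZE + PHDR_SIZE + len(code)
    # Magic, ELFCLASS64, little-endian, EV_CURRENT, System V ABI, 8 pad bytes
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        ident,
        2,              # e_type: ET_EXEC
        EM_AARCH64,     # e_machine
        1,              # e_version
        entry,          # e_entry
        EHDR_SIZE,      # e_phoff: program header right after the ELF header
        0,              # e_shoff: no section headers
        0,              # e_flags
        EHDR_SIZE,      # e_ehsize
        PHDR_SIZE,      # e_phentsize
        1,              # e_phnum
        0, 0, 0,        # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,              # p_type: PT_LOAD
        5,              # p_flags: PF_R | PF_X
        0,              # p_offset: map the file from its start
        BASE, BASE,     # p_vaddr, p_paddr
        total, total,   # p_filesz, p_memsz
        0x1000,         # p_align
    )
    return ehdr + phdr + code

# exit(42) on aarch64 Linux: mov x0, #42 ; mov x8, #93 ; svc #0
CODE = struct.pack("<3I", 0xD2800540, 0xD2800BA8, 0xD4000001)
image = build_elf(CODE)
```

Writing `image` to a file and marking it executable should give a binary that exits with status 42 on an aarch64 Linux host, under the assumptions above.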
- jit/arm64.rail: 23 new encoders for the Linux mnemonic set (ldrb/strb, clz, neg, cmn, rev, fneg, fcvt, tbnz, stp/ldp pre/post-index, asr/lsr/lsl immediate)
- stdlib/elf.rail: 175-line Elf64_Ehdr + program-header writer for static aarch64 binaries
- tools/v5/elf_asm.rail: 567-line section-aware ARM64 assembler + static linker with adrp / :lo12: symbol resolution
- as + ld output on Pi Zero 2 W via Tailscale

ARM64 and x86_64 backends reach full parity (140/140 and 136/136). JIT-first REPL at 0.1 ms/line. Single-program agentic loop: Rail calls Anthropic, JIT-lowers the response, executes, returns; no shell, no Python, no subprocess. Public reproducible hard-bench: frontier LLM + 1 KB Rail spec scores 30/30 on a held-out suite. Multi-witness Ed25519 provenance with pulse_id binding closes the prior session-replay gap.
An API token was leaked in public git history and has been rotated. No customer data affected.
The attestation pipeline is now Rail-native end-to-end including the Pi-side HTTP signer. The hot path no longer touches Python at all. Linux ARM64 cross-compile produces useful binaries beyond hello-world.
- tools/attest/pi_sign_server.rail: HTTP signer on fleet0:9102. Replaces the ~110 LOC Python signer with a 118 KB ELF. Same wire format, same backing shell signer, end-to-end verified
- linux_libc.s: three real implementations replacing silent stubs: _atof (real number parser, the previous stub returned 0.0 for every Rail float literal), _snprintf %.15g formatter (real digit-extract, the previous stub wrote literal "0"), _rail_print_float (Linux-ABI clone of the Mac stub)

The attestation pipeline that v3.8.0 introduced is now Rail-native end-to-end. Mac-side orchestrator, scalar arithmetic mod L, sign function, signer transport: all Rail. Only the Pi-side key material still touches OpenSSL via the existing shell signer (wrapped in a Python HTTP server).
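For readers unfamiliar with the scalar arithmetic mod L mentioned above, the two operations can be written as a small Python reference using arbitrary-precision integers. L is the Ed25519 group order from RFC 8032. The names sc_reduce and sc_muladd match those used in these notes for stdlib/ed25519_scalar.rail, but the code below is an independent sketch, not a port of the Rail source or of the project's oracle.

```python
import hashlib

# Order of the Ed25519 base point (RFC 8032)
L = 2**252 + 27742317777372353535851937790883648493

def sc_reduce(wide: bytes) -> bytes:
    """Reduce a 64-byte little-endian integer mod L to a 32-byte scalar."""
    if len(wide) != 64:
        raise ValueError("expected 64 bytes")
    return (int.from_bytes(wide, "little") % L).to_bytes(32, "little")

def sc_muladd(a: bytes, b: bytes, c: bytes) -> bytes:
    """Return (a*b + c) mod L for little-endian scalars."""
    ai = int.from_bytes(a, "little")
    bi = int.from_bytes(b, "little")
    ci = int.from_bytes(c, "little")
    return ((ai * bi + ci) % L).to_bytes(32, "little")

def scalar(n: int) -> bytes:
    """Helper for the examples: encode a small integer as a 32-byte scalar."""
    return n.to_bytes(32, "little")

# A SHA-512 digest is 64 bytes, so it can be reduced directly.
reduced_empty_hash = sc_reduce(hashlib.sha512(b"").digest())
```

In Ed25519 signing, S is computed as sc_muladd(k, a, r), where k and r come from reducing SHA-512 digests with sc_reduce.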
- stdlib/ed25519_scalar.rail: sc_reduce (64-byte mod L) and sc_muladd ((a·b + c) mod L) in pure Rail. 8/8 vectors pass including SHA-512('') mod L matching the Python oracle byte-for-byte
- stdlib/ed25519_sign.rail: full RFC 8032 §5.1.6 sign. r, R, k, S = sc_muladd(k, a, r), sig = R || S. RFC §A.4 vector 1 byte-identical for both pk and sig + round-trip verify=1; PASS on first compile
- tools/attest/attest.rail: Rail-native attestation orchestrator. SHA-256 over input bytes, HTTPS GET /entropy/pulse, JSON parse, HTTP POST signer, JSON outer build, write to <input>.attestation.json. Zero shell-out on the request path
- tools/attest/pi_sign_server.py + com.ledatic.attest_sign.service: Pi HTTP signer on fleet0:9102 over Tailscale, token-authed. Replaces the per-attest SSH dance; release-attest wall time 49 s → 27 s
- ./rail_native linux foo.rail produces working ELF; hello/fact/fold run on Pi
- _rail_chained_malloc, mk_lxf_step_into, double-buffered beacon loop

Every tagged release, every ./rail_native test pass, and every 2-pass self-compile fixed point now binds to a live entropy beacon pulse_id and an Ed25519 signature from the project's fleet0 Pi witness (pk_fp = cac5f21a70564aeb).
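The binding described above (content hash, beacon pulse_id, signature) can be sketched in Python as follows. The JSON field names, the message format that joins the digest and the pulse_id, and the function names are assumptions for illustration; the actual wire format is not given in these notes. The signer is passed in as a callable so the sketch stays independent of any particular Ed25519 library.

```python
import hashlib
import json

def build_attestation(input_bytes: bytes, pulse_id: str, sign) -> dict:
    """Bind a SHA-256 content hash to a beacon pulse and sign the pair.

    `sign` is any callable taking message bytes and returning signature
    bytes. In the real pipeline this role is played by the remote signer.
    """
    digest = hashlib.sha256(input_bytes).hexdigest()
    # Assumed message format: signing digest and pulse_id together ties the
    # signature to one specific pulse.
    message = f"{digest}:{pulse_id}".encode()
    return {
        "sha256": digest,          # hypothetical field name
        "pulse_id": pulse_id,      # hypothetical field name
        "sig": sign(message).hex(),
    }

def write_attestation(input_path: str, record: dict) -> str:
    """Write the record next to the input as <input>.attestation.json."""
    out_path = input_path + ".attestation.json"
    with open(out_path, "w") as f:
        json.dump(record, f, indent=2)
    return out_path

# Placeholder signer for the example only; NOT Ed25519.
def demo_sign(message: bytes) -> bytes:
    return hashlib.sha256(b"demo-key" + message).digest()

record = build_attestation(b"hello", "pulse-0001", demo_sign)
```

Because the pulse_id is part of the signed message in this sketch, a signature captured for one pulse does not validate against a different pulse.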
- verify.sh + fleet0.pub.pem
- com.ledatic.attest_daily (06:00), com.ledatic.fleet_attest (60 s)
- libtensor_gpu.dylib before tests, ending 14 days of red on tensor link errors

Substrate work + new inference paths + diagnostic infrastructure. Three real bugs (one fixed at root, one workaround'd at source, one falsified). 140/140 green; byte-identical self-bootstrap verified.
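One bug closed in this release reinterpreted float bits as ints. The Python sketch below shows why that class of bug yields silently wrong numbers rather than a crash. It assumes 64-bit IEEE 754 doubles and only illustrates the effect of treating a float's raw bits as an integer; it does not reproduce the register-ABI path where the bug lived.

```python
import struct

def bits_as_int(x: float) -> int:
    """Read the raw 64 bits of a double as a signed integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

def add_as_ints(a: float, b: float) -> float:
    """Integer-add the raw bit patterns, then read the sum back as a double.

    This mimics an integer add emitted where a float add was intended.
    """
    raw = (bits_as_int(a) + bits_as_int(b)) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", raw))[0]

one_bits = bits_as_int(1.0)          # 0x3FF0000000000000
wrong_sum = add_as_ints(1.0, 1.0)    # exponent fields add, giving 2**1023
```

The intended result of 1.0 + 1.0 is 2.0; the integer add instead produces a finite but enormous value, which is why such a bug can go unnoticed until a downstream result looks off.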
- body_has_float guard to all_params_int. Closes a 17-day silent wrong-result bug that reinterpreted float bits as ints in tail-recursive register-ABI calls (RMSNorm CPU, AdamW weight decay, LayerNorm CPU backward)
- RAIL_ARENA_MB env var (default 1 GB, scales to 4 GB+). Long-context training (seq=2048+) now mechanically tractable on macOS
- alloc_stats_snapshot builtin + RAIL_ARENA_TRACE
- stdlib/tensor.rail:matmul_mixed): 2× tighter than all-fp16
- parallel_rerank.sh validated 7.1× wall-clock at N=8
- ./rail_native quick: 15 critical tests in ~5 s

Two codegen + parser fixes; both gated by 2-pass byte-identical self-bootstrap.
- _RAIL_UNDEFINED_IDENT_<name>) instead of silently producing a binary that segfaults at runtime
- (...)/[...] or before strictly-greater-indented (/[
- returns_float tracks let-bound floats through V (variable) AST nodes, fixing two latent miscompile bugs in MHD axisym + MPD source-term paths
- 14af7d5d…)

Chain-walk to trust store is now the default for https_get_url / https_post_url. Old leaf-only bodies moved under _unsafe_noverify. _strict aliases kept for one release.
- asn1_find_spki / ts_has_spki_hash

hc_read_random now pipes urandom via stdout instead of writing a tmp file. serve_static returns 400 on .. / \.
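A minimal Python sketch of the kind of request-path check that the serve_static change implies. The function name and the exact matching rule are assumptions: here a path is rejected if it contains a backslash or has `..` as a whole path segment, which may be stricter or looser than what serve_static actually does.

```python
def static_status(request_path: str) -> int:
    """Return 400 for paths that could escape the static root, else 200.

    Assumed rule for this sketch: reject any backslash, and reject `..`
    when it appears as a complete segment between slashes. Percent-decoding
    is out of scope here and would need to happen before this check.
    """
    if "\\" in request_path:
        return 400
    if ".." in request_path.split("/"):
        return 400
    return 200
```

For example, `/css/../../etc/passwd` and `/img\..\secret` are rejected, while `/index.html` and a file name such as `/notes..txt` pass under this rule.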
RFC 8032 §5.1 verifier clean against TEST 1. Canonical ed_d_bytes LE hex. Flattened mutual recursion in ed_pow_bytes_iter.
TCP_NODELAY + pre-wrapped first-request coalesced with handshake + NST seq advance. P-521 structural clone of P-384 with 33 limbs.
edit_dist O(3n) → bounded. compile_funcs O(N²) → O(N). probe4 195 s → 12 s (16×).
Pre-handshake socket coalescing + body framing prep. Stepping stone to v3.2.0’s streaming bodies and v3.3.0’s keep-alive.
Pure-Rail TLS 1.3 + chain validation. ECDSA-P256/P384, RSA-PSS/PKCS1, SHA-256/384/512, x25519, ChaCha20-Poly1305. socat retired from the fleet.
Bump arena + mark-sweep in ARM64 assembly. Runtime safety for head [] / tail []. 116-test baseline green.