The Prism Compiler
This document describes the prism compiler, from source text to native binary across its three backends.
1. Architecture
Compilation is a pipeline from source text to a native binary. Each phase is a total function over the program, and there are no per-module artifacts.
| Phase | Role | Owner |
|---|---|---|
| Lex | text to tokens, then layout | src/lex/ |
| Parse | tokens to surface AST | src/parse/, src/syntax/grammar.lalrpop |
| Resolve | load imports, canonicalize names, merge | src/resolve/ |
| Desugar | surface sugar to core surface | src/syntax/desugar/ |
| Check | type and effect inference | src/tc/ |
| Elaborate | surface to CBPV / ANF core (match compilation, Section 8) | src/core/elaborate/ |
| Effect lower | remove handlers and operations | src/core/effect_lower/ |
| Reference count | insert dup/drop, then reuse | src/core/fbip.rs |
| Codegen | core to interpreter, LLVM, or MLIR | src/eval/, src/codegen/ |
The driver (src/driver/) exposes these as subcommands: a bare prism <file.pr> compiles a single file to a native binary named after the source (override with -o), prism build compiles the enclosing project (the nearest prism.toml) and fails outside one, prism run interprets, prism check runs the front end only, prism fmt formats, and prism dump <phase> prints an intermediate form, where <phase> is tokens, ast, types, core, core-json (the core as a JSON tree the Lean model reads, Section 13), core-hash (a content-addressed hash of each definition’s elaborated core, src/core/hash.rs), fbip (core after reference-count insertion and reuse), lowered (after effect lowering), llvm, or mlir (the last gated on the MLIR backend feature).
2. Lexing and Layout
The lexer produces a token stream and trivia (comments and spacing) that the formatter uses to reproduce source faithfully. An interpolated string is lexed by re-lexing each { ... } hole at its absolute source offset, so spans inside holes remain accurate. A layout pass then rewrites the stream, inserting virtual block-open, block-close, and separator tokens according to the offside rule of Spec 3.6, which the grammar consumes as ordinary terminals.
3. Parsing
The grammar is an LALR(1) grammar in LALRPOP (src/syntax/grammar.lalrpop), with two entry points: a whole program, and a single expression for the REPL. Parsing produces the surface AST of src/syntax/ast.rs. Type and parse errors are rendered with a source caret.
4. Name Resolution and Modules
Resolution loads every transitively imported module, rewrites each top-level definition to a globally unique canonical symbol (an export as Data.Map.insert, a private as Data.Map@helper), resolves qualified and re-exported references to those symbols, and merges all modules into one flat program. This is a whole-program renamer: the entire program is checked and compiled from source on every build. The canonical-symbol scheme makes the merge sound, since two modules can export the same short name without collision.
Moving to incremental, per-module compilation is planned but not implemented; the interprocedural analyses below (effect lowering, borrow signatures, instance coherence) are what make it nontrivial, since each crosses module boundaries.
5. Desugaring
Desugaring rewrites surface sugar into the smaller core-surface language the checker and elaborator handle (src/syntax/desugar/), each rule shown as surface form and the form it lowers to.
Function composition lowers to a lambda, kept as sugar only so the operator survives formatting.
fn inc(x) = x + 1
fn dbl(x) = x * 2
-- forward composition: inc, then dbl
fn main() = println((inc >> dbl)(10))
An arithmetic sequence lowers to a prelude enumeration call.
fn main() =
println(sum([1..5]))
println(sum([1, 3..9]))
A list comprehension (and the statement for) lowers to a stream (a producer performing the Emit effect, Section 9) that emits each surviving element, collected with scollect (a stream consumer that gathers the emissions into a list), so it fuses with no intermediate list.
fn main() = println(sum([x * x for x in srange(1, 6)]))
A record update rebuilds the constructor along the named fields; on a uniquely owned value the rebuild is the in-place write of Section 10.
type Vec2 = Vec2 { x: Int, y: Int }
fn main() =
let v = Vec2 { x = 1, y = 2 }
let w = Vec2 { ..v, x = 7 }
println(w.x)
println(w.y)
deriving (Lens) synthesizes a getter and a functional setter per field.
-- `deriving (Lens)` synthesizes a getter `<f>_of` and a functional setter
-- `with_<f>` per field. They are ordinary functions, no optic types needed.
-- On a uniquely owned value the setter is FBIP-reused.
--
-- Expected output:
-- 3
-- 7
-- 9
-- 4
type Vec2 = Vec2 { x: Int, y: Int } deriving (Lens)
fn main() : !{IO} Unit =
let v = Vec2 { x = 3, y = 4 }
println(x_of(v))
let v2 = with_x(v, 7)
println(x_of(v2))
let v3 = with_y(with_x(v, 9), 4)
println(v3.x)
println(v3.y)
The failure fallback a ?? b runs a under a Fail handler that yields b if a fails.
-- at_list fails past the end of the list; ?? supplies the fallback
fn main() = println(at_list([10, 20, 30], 5) ?? 99)
A method call e.m(args) is uniform-function-call sugar: the receiver becomes the first argument.
fn inc(x) = x + 1
fn double(x) = x * 2
-- a dot-chain reads left to right: inc(5), then double(...)
fn main() = println(5.inc().double())
A string with interpolation holes becomes a concatenation of its literal pieces and the show of each hole.
fn main() =
let name = "Prism"
let n = 42
println("hello {name}, n = {n}")
try/catch/throw is subtractive handler sugar: one nested final ctl clause (the non-resumable handler clause of Spec 7.2) per arm, each discharging one error label.
error NotFound(String)
fn lookup(k : String) : !{NotFound} Int =
if k == "a" then
1
else
throw NotFound(k)
fn main() =
println(try lookup("a") catch { NotFound(k) => 0 })
println(try lookup("z") catch { NotFound(k) => 0 })
transact body else fallback snapshots every live var, runs the body under a Fail handler, and restores the snapshots on failure, so a failed attempt leaves observable state unchanged.
-- Transactional rollback. A purchase that fails on insufficient funds leaves
-- balance and stock untouched, as if it never happened. The first buy succeeds.
-- The second overspends, so `guard` fails inside the `transact` and both vars
-- roll back to their pre-attempt values.
fn main() =
var balance := 100
var stock := 5
let r1 =
transact
balance -= 40
stock -= 1
guard(balance >= 0)
1
else
0
println(balance)
println(stock)
println(r1)
let r2 =
transact
balance -= 90
stock -= 1
guard(balance >= 0)
1
else
0
println(balance)
println(stock)
println(r2)
Optional chaining a?.b is force(a).b, where force raises fail() on None, so a path short-circuits at the first None and an enclosing ?? supplies the default.
-- The `??` operator runs its left side in a `Fail` context and falls back to
-- the right on failure. It reads as Verse's `or`. The failable accessors
-- at_list / at_map / force raise `fail()` on a miss, so lookups default
-- cleanly. They chain: `a ?? b ?? c` nests right, each `??` its own handler.
-- Optional chaining `a?.b` is `force(a).b`, so a path short-circuits on the
-- first `None` and defaults with `??`. Expected output:
--
-- 10
-- -1
-- 80
-- 8080
-- 443
-- 7
-- 99
-- 42
-- -1
-- -1
type City = City { zip: Int }
type Addr = Addr { city: Option(City) }
fn main() =
let xs = [10, 20, 30]
println(xs.at_list(0) ?? 0 - 1)
println(xs.at_list(9) ?? 0 - 1)
let cfg = map_from_list([("host", 80)])
println(cfg.at_map("host") ?? 8080)
println(cfg.at_map("port") ?? 8080)
println(cfg.at_map("port") ?? cfg.at_map("https") ?? 443)
println(force(Some(7)) ?? 0)
println(force(None) ?? force(None) ?? 99)
-- Optional chaining: the whole path short-circuits if any link is `None`.
let full = Some(Addr { city = Some(City { zip = 42 }) })
println(full?.city?.zip ?? 0 - 1)
let middle = Some(Addr { city = None })
println(middle?.city?.zip ?? 0 - 1)
let outer = (None : Option(Addr))
println(outer?.city?.zip ?? 0 - 1)
A with f <- handler { .. } block binds a first-class handler instance over a fresh private effect; f.op(..) targets it by name.
effect Read {
ctl read() : String
}
-- `with conf <- handler { .. }` binds a first-class handler instance;
-- `conf.read()` dispatches to it by name.
fn main() =
with conf <- handler
read(k) => k("conf.toml")
return r => r
println(conf.read())
A trailing block argument is appended as the call’s last argument.
fn twice(f : (Int) -> Int) = f(f(0))
-- the trailing `fn (x) ...` block becomes twice's last argument
fn main() =
let r = twice(\(x) -> x + 10)
println(r)
A bidirectional pattern synonym desugars to a view call in match position and a make call in expression position.
type Vec2 = Vec2 { x: Int, y: Int }
-- a bidirectional pattern synonym: matches when y == 0, binding x
pattern OnXAxis(x) for Vec2 =
view \(v) -> if v.y == 0 then Some(v.x) else None
make \(x) -> Vec2 { x = x, y = 0 }
fn describe(v) =
match v of
OnXAxis(x) => x
_ => 0 - 1
fn main() =
println(describe(Vec2 { x = 5, y = 0 }))
println(describe(Vec2 { x = 5, y = 3 }))
A nested path update rebuilds the single-constructor spine (the chain of nested constructor cells) along the path.
type Vec2 = Vec2 { x: Int, y: Int }
type Line = Line { from: Vec2, to: Vec2 }
fn main() =
let l = Line { from = Vec2 { x = 0, y = 0 }, to = Vec2 { x = 1, y = 1 } }
let l2 = { l | from.x = 9 }
println(l2.from.x)
deriving (Eq, Ord, Show) generates one structural instance per class.
type Color = Red | Green | Blue deriving (Eq, Ord, Show)
fn main() =
println(show(Green))
println(Red == Red)
The postfix e? unwraps Ok and short-circuits on Err.
fn half(n) =
if even(n) then
Ok(n / 2)
else
Err("odd")
-- `half(n)?` unwraps Ok and short-circuits the chain on the first Err.
fn quarter(n) =
let h = half(n)?
let q = half(h)?
Ok(q)
fn main() = println(result_or(0 - 1, quarter(8)))
The var desugaring is shown with full Source / Desugared / Core stage tabs in Spec 7.4; default and named arguments lower to positional ones in the same pass.
6. Type and Effect Inference
Type inference is the bidirectional, higher-rank algorithm of Dunfield & Krishnaswami (2013) (src/tc/); the surface rules are in Spec 5. Type classes elaborate to dictionary-passing: a constraint becomes a hidden parameter, resolved to a global instance, a passed dictionary, or a projection of a superclass dictionary.
Instances are global, but each records its defining module, so coherence is checked by provenance (src/tc/classes.rs). Resolution is coherent: for each (class, type-head) there is exactly one canonical instance, and implicit resolution always selects it. A single instance for a head is canonical automatically. When two or more instances share a head, one must be designated with a top-level canonical Class(Head) = name declaration (Spec 6.1). An undesignated overlap is a hard error reported at definition, naming the candidates and their modules, with a source caret when they point into the program being compiled. An orphan instance (defined apart from both its class and its head type) is reported as a warning. An explicit override is written at the use site as a trailing using argument, f(args, using name), which changes nothing else’s resolution.
Indexing (a[i], a[i] := v) is resolved the same way the Show and ^ lowerings are, by type-directed dispatch at elaboration: the checker records each sub-expression’s type in a span-keyed table, and the elaborator reads the receiver’s head type back and emits the matching builtin or accessor (Array to at_array/array_set, HashMap to at_hashmap/hm_insert, String to at_byte, List to at_list). A receiver whose type is still an unsolved existential when first synthesized (a var indexed before its initializer fixes its state type) defers to one pass at the end of the declaration, after the initializer has constrained it. No class or type-system extension is involved, so concrete indexing ships today; the desugar target is index/index_set, leaving room for a user-extensible Index class later.
Effect-row inference is principal: each declaration infers its most general row from its body alone. The row unifier in src/tc/subsume.rs discovers every label on its own (a row is a function’s effect set; Spec 5) from direct performs, applied effect-carrying callees, builtin rows, and mask. At a call it adds the callee’s row to the caller’s ambient row (the effect set accumulated for the body so far), and a handler removes the operations it discharges. The row is the single source of truth: there is no separate set-pass seed and no subset reconciliation against one.
A syntactic set-pass (a pass that computes a set of operation labels by a call-graph fixpoint, src/types/effects.rs) still runs, but only to feed the syntactic purity checks: it confirms a konst declaration and a declared-pure instance method perform nothing. It no longer seeds the row. After lowering, reconcile_effects checks the operations the lowered code actually performs against the inferred row, and the interpreter parity oracle (Section 13) is the final backstop. Effect lowering computes its own per-function latent operation set by an independent call-graph fixpoint (Section 9), so the two phases no longer share the set-pass result.
7. The Core Calculus
Elaboration lowers the surface language to a call-by-push-value core (Levy, 2004; src/core/cbpv.rs) in A-normal form. CBPV separates values, which are inert, from computations, which can be run; Thunk freezes a computation into a value and Force runs it. A-normal form names every intermediate result with a Bind, making evaluation order explicit and each operation and allocation syntactically distinguished, enabling the later effect and reference-counting passes. The grammar below is the elaborated core; the reference-count pass (Section 10) later adds dup, drop, and reuse nodes to it.
value -> Var sym
| Int i | I64 i | U64 i | Float f | Bool b | Unit | Str s
| Ctor sym tag [ value, ... ]
| Tuple [ value, ... ]
| Thunk comp -- suspend a computation as a value
comp -> Return value -- produce a value
| Bind comp sym comp -- run, name the result, continue
| Force value -- run a thunk
| Lam [ sym, ... ] comp
| App comp [ value, ... ]
| Call sym [ value, ... ] -- direct call to a top-level fn
| If value comp comp
| Case value [ (pat, comp), ... ]
| Prim op value value -- primitive arithmetic / compare
| Do sym [ value, ... ] -- perform an effect operation
| Handle { body, return_var, return_body, ops } -- install a handler
| Mask [ sym, ... ] comp -- bypass matching handlers
For example, a constructor applied to a call elaborates so the call is named before the constructor is built:
-- surface
fn f(y) = Cons(g(y), Nil)
-- core (every intermediate result is named by a Bind; arguments are values)
Lam [y]
(Bind (Call g [Var y]) x
(Return (Ctor Cons 1 [Var x, Ctor Nil 0 []])))
This calculus is modeled in Lean 4 (de Moura & Ullrich, 2021): models/Prism.lean mirrors the core one variant at a time with a substitution small-step relation, on top of which the model adds an executable abstract machine that mirrors the interpreter and is proved to agree with it. Section 13 describes the model and how it is run as a third differential oracle.
8. Pattern-Match Compilation
A match is compiled to a decision tree (src/core/elaborate/match_compile.rs). The arms form a matrix whose rows are arms and columns are argument positions. The compiler selects a column, partitions the arms by the head of that column’s patterns, and emits a test: a Case on the constructor tag of the scrutinee (the value being matched) for a constructor column, or a chain of equality tests for a scalar column. Wildcard rows form a default sub-matrix shared by the branches that fall through. A guarded arm compiles to a conditional that re-enters the remaining arms when the guard fails. Exhaustiveness, proven by the checker (Spec 9), guarantees every scrutinee reaches an arm.
9. Effect Lowering
Effect lowering compiles away the Handle, Do, and Mask nodes of the core. An operation is delimited control (an effect suspended and resumed within a handler’s scope): Handle is the delimiter, and the resumption k is the continuation captured between a perform site and its handler (Spec 7). The three strategies below are three compilations of that one mechanism, differing in how much of k they make manifest, from nothing to a heap-allocated tree. The compiler tries them in that order and takes the first that applies, so it reifies as little of the continuation as the program allows; a check then confirms no effect construct survives.
Two erasure pre-passes run before the strategy cascade, each recognizing a statically fixed handler shape and rewriting it to direct code, leaving everything else for the strategies. Var erasure (src/core/effect_lower/erase_var.rs) rewrites an escape-checked local var (a closed two-operation State handler, Spec 7.4) to a mutable cell: get becomes a cell read, set a cell write, and the block is wrapped in a fresh-cell allocation. It is sound exactly because the escape analysis proved the var’s continuation is never resumed more than once, so the shared cell and pure-state copies agree; a multishot use disables it. Control erasure (src/core/effect_lower/erase_control.rs) rewrites the internal break/continue/return effects (Spec 8.3), whose final ctl handlers have fixed templates, back to direct control flow. It runs after var erasure, so a pure imperative loop has lost all of its effect operations by the time the cascade classifies it and falls into the trivial pure path (no effect constructs remain), compiling to a musttail loop with no per-iteration allocation.
Evidence passing is the fast path for tail-resumptive handlers (every clause calls k exactly once, in tail position, so the continuation need never be captured at all). Each operation is assigned a stable numeric id by sorting the operation names, and a call-graph fixpoint computes each function’s latent set, the operations still performed anywhere in its call-graph closure. An effectful function then gains one extra parameter per latent operation, ev@<id>, a thunk holding the active handler clause. Performing an operation forces its evidence thunk directly; a handle binds fresh evidence for its body’s latent operations; and every call site appends the callee’s evidence, in ascending id order, so the convention is positional and stable. A first-class thunk that escapes carries evidence parameters for its own latent operations, threaded at each force site. No continuation is reified and no per-operation cell is allocated. What evidence to thread where is computed by an interprocedural least-fixpoint flow analysis (src/core/effect_lower/flow.rs) that derives, for every function, the operation signature of the thunk it returns and of each thunk-valued parameter.
State threading and stream fusion is the path for a uniform single-operation handler, the shape a stream consumer takes: a handler that folds every emit into an accumulator. Such a handler clause is rewritten to an accumulator transformer \acc -> acc', and the producer it wraps becomes a loop that threads the accumulator through each emission instead of allocating a value per step. A consumer that can stop early, like stake, returns a two-state tag (continue or done) that the producer checks, so the loop exits without unwinding. This reifies one small tag cell per early-terminating handler and, like evidence passing, no free-monad cell, so a smap/skeep/stake/ssum pipeline allocates neither an intermediate list nor a per-operation cell.
-- Streams as effects: a stream is a producer performing Emit(a), the
-- transformers smap/skeep/stake are handlers that re-emit, and the consumers
-- ssum/scollect/for are handlers that fold. A dot chain nests the handlers over
-- one producer, with no intermediate lists. stake stops the source early by
-- dropping its continuation. Expected output:
--
-- 220
-- lo
-- hi
-- 0
-- 3
-- 6
fn square(n) = n * n
fn main() =
println(srange(1, 1000).smap(square).skeep(even).stake(5).ssum())
for w in sof(["lo", "hi"]) do
println(w)
let xs = srange(0, 1000000).smap(\(n) -> n * 3).stake(3).scollect()
for x in sof(xs) do
println(x)
The free-monad fallback applies when an effect escapes static tracking: buried in data, dynamically applied, masked, genuinely multishot (a clause that resumes k more than once), or self-referential (a handler whose own body performs the effect it handles). A multishot handler forces this path because the two fast paths erase k, and a continuation invoked more than once must exist as a reusable value. Here the delimited continuation is reified in full: each computation becomes a tree of EPure and EOp cells threaded by ebind (shown below), and k is an explicit closure field of each EOp, so a clause can hold it, drop it, or apply it repeatedly. A handle becomes a generated driver function that case-dispatches the reified tree: an EPure runs the return clause, an EOp whose id the handler names and whose skip count is zero runs the matching clause, and any other EOp is re-emitted outward with a re-entry continuation, which is how an inner handler forwards an operation it does not catch. An EOp carries a skip field, its mask depth, the number of matching handlers it must still bypass; a mask driver increments it and the handler driver only fires when it is zero. This is exactly the interpreter’s dispatch (Section 11), so the two agree by construction. Each EOp allocation bumps the PRISM_EFFOP_STATS counter, so the fallback’s cost is observable, and a default-on warning (silenceable with PRISM_QUIET) names the functions that lost fusion and the cause when a program takes this path, so a pipeline meant to stay fused can be steered back. The generated drivers are closed by construction: a per-handler driver takes exactly its clauses’ captured free variables as parameters, and the fixed-binder templates (ebind, the mask drivers) use a reserved binder band and never nest, so a binder cannot capture a free occurrence. Lowering is local: when an effectful thunk escapes, only the connected component entangled with it (closed over the call graph, but leaving pure closure-inert helpers shared, and over shared operations) is converted to the free-monad form, while unrelated functions stay on their fused paths, provided the component’s operations are disjoint from the rest; when the split is not clean lowering falls back to converting every effectful function together. A convention-boundary check, run in both modes, validates the split and turns a missed monadic/direct boundary into a compile-time internal error.
Constant-stack driving changes how a closed handler on this fallback is run, not what it reifies. By default such a handler is driven by a single self-tail-recursive loop, {n}@region, rather than a pair of mutually recursive driver functions: the loop case-dispatches the same EPure/EOp tree but re-enters itself by a musttail self-call on the resumed continuation, so an iterative or deeply nested resumption runs in constant native stack where the mutually recursive driver grew it per step. Two clause shapes qualify. A tail-resumptive clause (every resume is the head of a tail application) re-drives the operation’s continuation queue with qApply. A function-answer state clause, the parameter-passing pattern whose answer is a function S -> A (rd(u, r) => \s -> r(s)(s), wr(v, r) => \s -> r(())(v)) applied once at the handler’s use site, threads the state in an accumulator parameter and folds that use-site application into the loop’s entry, so the pending-apply chain that would otherwise grow the stack per iteration lives in the accumulator instead. The reification is unchanged, so the per-operation EOp cost stays and the only zero-cell routes remain the evidence and state paths above; the gain is purely that a parameter-passing loop no longer overflows (the bounded-stack performance gate pins a million-iteration State loop completing in a 2 MB stack). An open handler, a multishot or escaping resume, or any clause outside these shapes keeps the mutually recursive driver, whose qApply the loop reuses, so the free-monad machinery is the substrate it drives rather than a thing it replaces. This is on by default and reverts under PRISM_NATIVE_EFFECTS=0; the interpreter oracle’s whole-corpus parity holds byte-for-byte either way.
-- Reified computation tree (the free-monad fallback)
EPure v -- a finished result
EOp id skip arg k -- a suspended operation:
-- id : numeric identity of the operation
-- skip : matching handlers still to bypass (mask depth)
-- arg : the operation argument
-- k : resumption closure \resume_value -> Eff
-- Sequencing extends the tree without forcing it:
ebind (EPure v) f = f v
ebind (EOp id skip a k) f = EOp id skip a (\r -> ebind (k r) f)
The example below exercises this path: an inner handler catches Log and forwards raise outward to an Exn handler, the two effects interleaving across the nesting.
effect Log {
ctl log(Int) : Unit
}
effect Exn {
ctl raise(Int) : Int
}
-- Cross-effect forwarding: an inner handler re-emits what it does not catch.
--
-- work() performs both Log and Exn. The inner handler interprets Log (prints
-- and resumes) but knows nothing about raise. When raise fires inside the Log
-- handler, that handler forwards it outward to the enclosing Exn handler, which
-- discharges it. Resumption threads back through the inner handler, so the two
-- effects interleave across nested handlers. The native free-monad lowering
-- drives this the same way the interpreter does.
fn work() : !{Log, Exn} Int =
log(1)
log(2)
let x = raise(0 - 7)
log(x)
x
fn run() =
handle (handle work() with {
log(n, k) => let _ = println(n) in k(()),
return r => r
}) with
raise(code, k) => code
return r => r
fn main() = println(run())
The fallback reifies one cell per pending operation, so its cost is proportional to the operations in flight; the fast paths avoid it where they apply.
10. Reference Counting and FBIP Reuse
Reference counting runs after effect lowering, over the handler-free core, so it counts evidence parameters and any reified cells as ordinary values. Memory is managed by Perceus-style reference counting (Reinking et al., 2021; src/core/fbip.rs): every parameter and binding is owned and consumed exactly once on every control-flow path from its binding to the end of its scope; a second use inserts a dup and an unused value inserts a drop. Perceus places these operations precisely rather than conservatively at scope exit, which frees a cell at the earliest point the reuse pass below can claim it. Closure captures are borrowed (read without being consumed) and duplicated before a consuming use, as is a borrow parameter (Spec 10). The parameters a function borrows are recorded as a per-function bit vector, its interprocedural borrow signature, which every caller consults to place its dup/drop correctly. Because that signature crosses call sites, it is one of the analyses that complicates the move to separate compilation (Section 4).
The reuse pass then turns drops into in-place updates. When a uniquely owned scrutinee is dropped and the continuation rebuilds a constructor of the same or smaller size, the drop becomes a scoped reuse node, WithReuse { token, freed, body }: it frees the cell once and binds a reuse token over the continuation, and the rebuild spends that token with an in-place Reuse(token, ctor), so map and tree rebuilds mutate the spine in place. The token is a binder that only a Reuse may name, and the rewrite spends it on every control path or declines wholesale (keeping the safe no-reuse body), so freeing a cell once and spending its token at exactly one allocation are well-formedness properties of the term rather than a condition checked afterward.
The fip/fbip annotations (Spec 10) are the fully-in-place discipline of Lorenzen et al. (2023), here static checks layered on these passes. fbip proves zero fresh allocation and a call-graph closure over annotated, allocation-free callees. fip adds two further properties: linearity (each owned binding is consumed at most once, checked on the source term, with scalars exempt because adjusting the count of an unboxed word costs nothing) and bounded stack. The tail-call and tail-modulo-cons (a tail call whose result is wrapped in one constructor) classification (src/core/tailrec.rs) is shared with codegen, so an accepted fip function always lowers to a loop; acceptance never outruns what the backend emits.
-- FP^2 in-place list operations (Lorenzen/Leijen/Swierstra, ICFP 2023),
-- statically checked. The annotation makes the compiler PROVE the function
-- allocates no fresh cell: every `Cons` it builds reuses one it just matched and
-- dropped. Run with PRISM_REUSE_STATS=1 to watch the reuse hits, or
-- PRISM_CHECK_LEAKS=1 to confirm zero live cells at exit.
--
-- `rev_onto` and `bump` are `fip`: linear (each binding used once) AND bounded
-- stack. `rev_onto` is a plain tail call and `bump` is a tail-modulo-constructor
-- (`Cons(.., bump(t))`), so both lower to a constant-stack loop, not recursion.
-- `cap_at` is only `fbip`: its `h` is read by the guard AND rebuilt into the
-- `Cons`, two uses of one value, so it is not linear (its element type is generic
-- `a`, so `h` cannot be assumed an immediate). Zero-allocation still holds.
--
-- Prints:
-- 10
-- 6
-- 22
-- Reverse onto an accumulator, the canonical fip. Each input `Cons` is matched,
-- freed, and immediately reused as the next accumulator cell, so the reversal
-- runs in place with zero allocation, and the tail call makes it a loop.
fip fn rev_onto(xs, acc) =
match xs of
Nil => acc
Cons(h, t) => rev_onto(t, Cons(h, acc))
-- The seeding wrapper is not fbip: that initial `Nil` is a genuine fresh
-- allocation with no cell to reuse, so it stays un-annotated.
fn reverse_ip(xs) = rev_onto(xs, Nil)
-- Spine-rebuilding map: drop each `Cons`, rebuild it around the bumped head.
fip fn bump(xs) =
match xs of
Nil => Nil
Cons(h, t) => Cons(h + 1, bump(t))
-- Saturating transform: rebuild every cell, head chosen by a test. Both `if`
-- branches end in a `Cons` reusing the same freed cell, so the reuse credit is
-- spent exactly once on every path. (A `filter` that DROPS cells could not be
-- fbip: the discard path frees a cell with no local allocation to reuse, and
-- this runtime has no cross-call reuse credit, so it would allocate.)
fbip fn cap_at(xs) =
match xs of
Nil => Nil
Cons(h, t) =>
if h > 9 then
Cons(9, cap_at(t))
else
Cons(h, cap_at(t))
fn main() =
println(sum(reverse_ip([1, 2, 3, 4])))
println(sum(bump([1, 1, 1])))
println(sum(cap_at([1, 20, 3, 40])))
11. Backends
Prism has three backends over one core: a tree-walking interpreter that is the reference oracle, and two native backends that must match it byte for byte. The native backends share a single generic emitter, so the differences below are narrow.
11.1 The Interpreter
The tree-walking interpreter (src/eval/) is a flat CEK (control, environment, continuation-stack) machine. Pending work lives on an explicit heap stack of frames rather than the host call stack, so object-program recursion never overflows it. A frame is one of: Bind (await a result, then continue with the rest of a sequence), Args (await a function before applying it), Handle (an installed handler), Mask (a masking frame), and Restore (unwind a name binding; a Restore already on top marks tail position, which is where the machine recognizes a tail call).
This machine makes the delimited continuation of Spec 7 concrete: performing an operation searches the frame stack outward for a matching Handle, decrementing the skip count past masked frames, and the captured continuation is exactly the slice of frames between the do and that handler, the handler included. Resuming pushes a clone of that slice back onto the stack, so the same resumption can be pushed again, which makes k multishot. The free-monad backend reifies this same frame slice as the k closure of an EOp (Section 9); evidence passing never materializes it.
11.2 The Shared Emitter
Both native backends drive one generic emitter (src/codegen/emit.rs) behind an Isa trait that abstracts instruction emission, so they differ only in instruction spelling. The emitter owns case dispatch, constructor allocation and reuse, and tail-call lowering: a self-tail call of equal arity becomes a musttail loop, and a constructor- or accumulator-shaped tail call (one whose result feeds a constructor or an integer accumulator) becomes a destination-passing loop, one that writes its result into an address passed as a hidden parameter rather than returning it, using the same classification the fip check reads (Section 10).
11.3 LLVM
The LLVM backend (src/codegen/llvm.rs) implements Isa over inkwell, emitting LLVM IR that clang compiles and links against the runtime. This is the default native path.
11.4 MLIR
The MLIR backend (src/codegen/mlir.rs) implements the same Isa by writing textual MLIR in the llvm dialect. Sharing the emitter makes its output byte-identical to the LLVM backend’s, which the parity gate (Section 13) enforces.
11.5 WebAssembly
The compiler front end and the interpreter also compile to WebAssembly (src/wasm.rs), so Prism type-checks and runs in the browser. This target hosts the interpreter, not the native code generators; the LLVM and MLIR backends are absent there.
12. The Runtime
The C runtime (runtime/prism_rt.c) is linked with the code each backend emits. It assumes an LP64 target (64-bit pointers and long) and uses mimalloc when available. The data representation below is shared by the backends and the runtime.
12.1 Value Representation
Every value occupies one 64-bit word, tagged by its low bit so that a single representation serves both scalars and pointers under polymorphism.
A Prism value is one 64-bit word, distinguished by its low bit:
...xxx1 immediate: a small integer or other unboxed scalar (value in the
high 63 bits; a small Int is (n << 1) | 1)
00000000 unit (the zero word)
...xxx0 a pointer to a heap cell (word-aligned, so the low bit is clear)
prism_rc_inc / prism_rc_dec take the raw word: an immediate or unit is skipped
without a dereference, so dup and drop are no-ops on non-cell values under
polymorphism.
A float does not fit the immediate scheme, so it is boxed: wrapped in a one-field cell holding the raw double bits, which are read back out (unboxed) at every float operation. Boxing makes a float field self-describing, so the collector frees it without interpreting its payload.
12.2 Cell Layout
A heap cell is a three-word header followed by its fields.
offset field meaning
0 refcount number of live references to this cell
8 tag constructor tag; reserved values mark String and bignum cells
16 arity number of fields (or byte length for a String)
24 fields arity words, each a value or pointer (UTF-8 bytes for a String)
Constructor tags follow declaration order (for Option(a) = None | Some(a), None is 0 and Some is 1). Two tag values are reserved, 0x53545200 for a string and 0x42494700 for a bignum (Section 12.5), marking cells whose payload is raw bytes or limbs rather than child values; the collector and the reuse pass (Section 10) read the tag to avoid recursing into them.
12.3 Reference Counting
prism_rc_inc and prism_rc_dec take the raw value word and return immediately on an immediate or unit, so counting is a no-op on non-cell values. Decrement to a nonzero count just decrements. Decrement to zero frees the cell, but freeing is iterative, not recursive: the dead cell’s now-zero refcount word is reused as a link field in an intrusive worklist of cells pending free, so a structure of any depth is reclaimed in constant auxiliary space without growing the C stack. A string or bignum tag short-circuits the child traversal.
12.4 In-Place Reuse
The reuse pass of Section 10 emits two runtime calls. prism_reuse_token(v) inspects a cell about to be dropped: if it is uniquely owned (refcount 1), it drops the cell’s children and returns the shell as a token, leaving the live-cell count untouched; otherwise it decrements and returns null. prism_reuse_alloc(token, n) overwrites the token’s header for the new constructor when the token is non-null, and falls back to a fresh allocation when it is null. A uniquely owned spine is therefore mutated in place, and a shared one transparently copies.
12.5 Integers
A small integer is an immediate, (n << 1) | 1. An operation whose fixed-width result would overflow promotes to a bignum: a cell tagged 0x42494700 storing the value in sign-magnitude form (sign and magnitude kept separate). Its header word is a signed limb count whose sign is the value’s sign; the magnitude follows as that many little-endian u64 limbs (base-2^64 digits) with no leading zero limb. Zero is a count of zero with no limbs. Each surface arithmetic operation takes a fast path on two immediates with a checked-overflow primitive and falls back to magnitude routines (add, subtract, multiply, and a shift-subtract long division) that renormalize the result, demoting back to an immediate when it again fits. The surface Int is this unbounded integer. The I64 and U64 lanes are raw machine words and wrap rather than promote.
12.6 Strings
A string is a cell tagged 0x53545200 whose field words hold its UTF-8 bytes inline, length-prefixed by the arity word and NUL-terminated for C interop. Each string the program builds, including a literal at each use, is a counted cell, so the leak counter (Section 12.7) accounts for strings like any other allocation. Two indexing families coexist: char_at, substring, and str_len work in Unicode codepoints, walking the UTF-8 encoding (and so are O(n)), while byte_at and byte_len give O(1) raw-byte access for a scanner or hash.
12.7 Instrumentation
Three environment-gated counters report to stderr at exit, leaving stdout (the parity-checked channel) untouched. PRISM_CHECK_LEAKS reports the live-cell balance, which a clean run drives to zero. PRISM_REUSE_STATS reports how many cells the reuse pass rewrote in place. PRISM_EFFOP_STATS reports how many free-monad EOp cells were allocated, which the performance gate asserts is zero on the fusion corpus.
12.8 Growable Arrays
The growable Array(a) (Spec 12) is an ordinary cell, { rc, tag 0, arity cap+1, len, elem0 .. }, with the length word stored odd-tagged (low bit set, so the collector skips it as an immediate per Section 12.1) and unused slots held at zero. Because it is a normal cell, reference counting recurses into its live elements with no special case. Every array operation borrows its array argument. array_get returns a counted element; array_set, array_push, and array_pop write in place when the array is uniquely owned (refcount 1) and copy otherwise, so functional array code runs as mutation exactly when ownership permits. array_push doubles the capacity when full, making appends amortized O(1). The prelude’s HashMap is a separate-chaining hash table layered on this array, with an FNV-1a hash written in Prism (so iteration order is a deterministic function of the inserts); it is library code, not a runtime primitive.
12.9 Primitive Sort
sort is a runtime primitive (prism_sort_prim) that borrows a list and returns it sorted, dispatched on a key kind. Arbitrary-precision Int keys use a bignum-aware stable bottom-up merge sort, ping-ponging between two buffers; fixed-width keys use a radix sort over a derived key. When the input spine is uniquely owned, the sorted heads are written back into the existing cells with no allocation; a shared spine is copied with its elements shared. The Cons and Nil tags are read off the input spine, so no list layout is baked into the runtime.
12.10 Input, Output, and Randomness
The runtime provides the impure builtins. read_int and read_line read stdin; read_file, write_file, append_file, and file_exists operate on files; getenv reads the environment; system runs a shell command and returns its exit code; eprint and eprintln write to stderr, leaving the parity-checked stdout untouched; and args_count and arg (wrapped by the prelude’s args) read the command line. Randomness is a SplitMix64 generator: rand advances it and srand seeds it, so a seeded run is deterministic and reproducible. Because these touch the world, the parity harness (Section 13) runs only the programs that avoid them.
13. Verification
Two test gates hold the implementation to its claims. The parity harness (tests/parity.rs) is differential testing with the interpreter as the reference: it runs every example on the interpreter and each native backend and asserts byte-identical output, and with PRISM_CHECK_LEAKS set, zero leaked cells.
The performance gate (tests/perf_gate.rs) asserts that the optimizations actually fire, so a regression that leaves output unchanged is still caught. With PRISM_EFFOP_STATS set, it requires zero free-monad cells allocated on the fusion corpus (the stream and multi-handler programs such as streams.pr), confirming that the evidence and state paths of Section 9 reify nothing. It also pins local monadification: a program that pairs an escaping effectful closure with an unrelated fused pipeline must allocate no more cells than the escape alone, so the pipeline stays fused despite the escape. With PRISM_REUSE_STATS set, it requires in-place reuse to fire on the reuse corpus (list.pr), confirming the reuse pass of Section 10 rewrites drops into in-place updates.
13.1 The Lean Model
Beyond the differential gates, the core calculus is mechanized in Lean 4 (the models/ directory, built with lake). Prism.lean defines the syntax and a substitution-based small-step relation Step with its determinism theorem (Step.deterministic). CEK.lean then defines the abstract machine the compiler actually runs (Section 11.1): an environment machine with a continuation stack, Rv runtime values carrying closures and thunks, curried application, and the deep, mask-aware handler capture that makes resume multishot. The machine is a total, executable step function, so it is deterministic by construction and runnable.
The model’s central theorem connects the two. A big-step natural semantics specifies what a program evaluates to, and bigstep_runs proves the machine implements it (a forward simulation under any continuation stack), so the abstract machine is a faithful realization of the specification rather than an independent artifact. Meta.lean adds the supporting metatheory: a unique-normal-form corollary, substitution lemmas, and a progress trichotomy (every computation is a value, takes a step, or is an explicit Stuck error, with stuckNoStep confirming the classification is a genuine partition). Dynamics.lean covers the effect machinery, proving the machine reaches a handler exactly when one is in scope (effect_progress) and is stuck on an unhandled operation otherwise (effect_unhandled). Every theorem is sorry-free; the proofs depend only on propext/Quot.sound, with Classical.choice added only where the model evaluates IEEE floats (whose arithmetic Lean defines non-constructively).
13.2 The Model as a Differential Oracle
The Lean machine is run as a third oracle alongside the interpreter and native backends. prism dump core-json <file> serializes the elaborated core to a JSON tree (src/core/json.rs), which models/Json.lean decodes back into the Lean syntax, and the oracle executable (models/Oracle.lean) runs the verified machine on it and prints the result, rendering floats through a port of the runtime’s fmt_g so output is byte-identical. models/diff_against_rust.sh pipes each fixture through prism dump core-json | oracle and compares against prism run, so the verified model is checked against the interpreter on the compiler’s actual core, not a hand-transcription. models/Certificates.lean records the same agreements as kernel-checked rfl theorems for the curated set. The grammar in the specification is itself single-sourced from models/grammar.ebnf.
14. The Interactive Shell
Running prism with no arguments starts a read-eval-print loop (src/repl/) backed by the interpreter of Section 11. It is a typed REPL: an entered expression is parsed through the expression entry point of Section 3, inferred, elaborated, and evaluated, and its type and effect row are shown above the value.
A session accumulates state. An expression is evaluated and its result bound to it; a let binds a name for reuse; and a top-level fn, type, class, instance, or effect declaration is added to the session so later input sees it. Declarations entered for a name shadow earlier ones.
Commands begin with :; any unambiguous prefix resolves to its command, GHCi-style, so :r is :reload and :lo is :load.
| Command | Action |
|---|---|
:type e | show the type and effect row of expression e |
:kind T | show the kind of a type constructor |
:info n | describe a binding, type, or class |
:browse | list the bindings this session has added over the prelude |
:core | dump the lowered core IR of the session |
:load f | load declarations from a file, making it the active file |
:reload | re-read the active file from disk |
:edit [f] | open a file (or a scratch buffer) in $EDITOR, then load it |
:set [+-]flags | toggle options; bare :set lists them |
:quit | leave the shell |
Two :set toggles exist: t (types) shows the inferred type and effect row of each result, on by default, and s (timing) reports evaluation time. A multi-line block runs between :{ and :}, or is auto-detected when a line opens a layout block that is not yet closed.
15. The Formatter
prism fmt parses a file to the surface AST and prints that tree back to text. It is not a token reflow: parsing discards whitespace, so the printer reconstructs all layout from scratch, and the only things carried across from the original are the trivia the lexer recorded (Section 2) and, as a last resort, raw byte ranges. An already-formatted file is a fixed point, which is what prism fmt --check verifies by comparing the printed output to the input byte for byte. The implementation lives in src/fmt/, with mod.rs driving expression and block layout, decl.rs and pat.rs handling declarations and patterns, and ops.rs the operator-precedence parenthesization.
Trivia reattachment. Comments and blank lines are not nodes in the AST. The lexer records them in marginalia’s TriviaTable, a side table keyed by byte offset that the parser fills alongside the token stream. The printer queries that table by source range: between(lo, hi) yields the trivia events lying in a gap and after(end) yields whatever trails the final declaration. Because every AST node carries its source span, “the comment above this arm” or “the blank line between these two statements” is recovered as the trivia lying in the gap between one node’s end offset and the next node’s start offset. This is why trivia survives not just between top-level declarations but inside function bodies, handler arms, and match arms, each printer threads the byte offset where the previous construct ended and re-emits the comments found before the next one begins. Line comments and deliberate blank-line grouping round-trip; block comments carry no placeable layout and are dropped.
Two layout engines. Expressions use a hand-rolled width model: each node is first rendered inline by fmt_expr_inline, and if that string fits the 80-column budget at the current indent it is kept, otherwise the node falls back to a broken multi-line form via fmt_expr_break. Patterns instead reuse marginalia’s Doc combinator engine (block, concat, comma, pretty_at), which makes the same fit-or-break choice generically over their nested constructor, tuple, and record structure. The two coexist because expression breaking is context-sensitive (statement position, offside blocks, trailing-lambda calls, operator precedence) in ways the generic combinators do not capture, whereas patterns are purely structural and a stock pretty-printer suffices.
Layout versus flat. A Mode is threaded through every printer. Layout emits Prism’s offside blocks, the indentation-significant let chains and match/handle/if/for bodies. Flat is for bracketed contexts (tuples, argument lists, an inline match { ... }) where the virtual layout tokens are suppressed, so only inline let ... in and braced arms are legal there. In statement position match and if always lay out across lines even when they would fit on one, since stacked arms and branches read better, matching how those forms are written in other languages.
Surface restoration. Desugaring (Section 5) rewrites several surface forms into a smaller core before formatting ever sees them: UFCS recv.f(x) becomes f(recv, x), e? and string interpolation become marker calls, and pattern-let and ?-binding become single-arm matches. The parser tags each lowered node with a synthetic span or a synth flag, and the formatter keys off those flags to reprint the original surface rather than the lowered shape. This is what makes formatting idempotent: the printed output desugars back to the same core, so a second pass changes nothing.
Never destroy code. The formatter borrows the original source for one purpose, a verbatim fallback: any node it cannot otherwise print is emitted as its exact original byte range. Together with parse-or-fail (an unparseable file yields an error, never a mangled rewrite), this guarantees prism fmt is information-preserving even on constructs the printer does not yet model.