Introduction
Welcome to From Zero to QED, an informal introduction to formality in Lean 4. This article series teaches the language from first principles. Lean is expressive but the learning resources remain scattered and incomplete. This series is a best effort to fill that gap.
Note
This is the beta release. There are bound to be typos, errors, and rough edges. If you spot something, send a PR on GitHub.
Tip
This article is itself a giant checkable theorem. Every code sample, every proof, every definition is extracted from source files that the Lean compiler typechecks on every build. If the article compiles, the theorems are valid. The full source lives in the GitHub repository.
What This Series Covers
The series divides into two arcs. The first arc treats Lean as a programming language. You will learn the syntax, type system, control flow, polymorphism, monads, and IO. By the end of this arc you can write real programs in Lean.
The second arc treats Lean as a theorem prover. You will learn to write proofs, understand type theory and dependent types, master tactics, and eventually prove classic mathematical results. The series concludes with the emerging intersection of theorem proving with artificial intelligence, and why formal methods may matter more in the coming decade than they have in the previous five.
No prior experience with theorem provers is assumed. Familiarity with a typed functional language like Haskell, OCaml, or Scala helps but is not strictly required.
Getting Started
There are several ways to follow along with the examples, from zero-install browser options to full local setup.
Option 1: Browser
Lean Live runs Lean 4 in your browser with no installation. Copy code snippets from the text and paste them into the editor. For compatibility with examples in this series, set the toolchain to leanprover/lean4:v4.26.0 and Mathlib to v4.26.0 in the settings. Some later articles require Mathlib, which Lean Live supports but loads slowly on first use.
Option 2: One-Click Cloud Environment
Launch a complete Lean 4 environment in your browser with no local setup:
- Open in GitHub Codespaces - Free for 120 core-hours/month
- Open in Gitpod - Free tier available
Both options provide VS Code in the browser with Lean 4, the language extension, and all dependencies pre-installed. The environment runs lake exe cache get automatically on startup to download prebuilt Mathlib artifacts.
Option 3: Dev Container (Docker + VS Code)
If you have Docker and VS Code installed locally, clone the repo and open it in VS Code:
git clone https://github.com/sdiehl/zero-to-qed
code zero-to-qed
VS Code will detect the .devcontainer configuration and prompt you to “Reopen in Container”. This builds the same environment locally, giving you cloud-like convenience with local performance.
Option 4: Local Installation
For the full experience, install Lean 4 with VS Code and the Lean 4 extension. Other editors work too (Zed, Emacs, Neovim all have Lean support) but VS Code is the best documented and most widely used. Clone the repository:
git clone https://github.com/sdiehl/zero-to-qed
cd zero-to-qed
lake exe cache get # Download prebuilt Mathlib (saves hours)
lake build
The lake exe cache get command downloads prebuilt artifacts for Mathlib, reducing the initial build from hours to minutes. Without it, Lake compiles Mathlib from source, which tests your patience more than your code.
You can also serve the documentation locally by running just serve if you have mdBook installed.
Reading Paths
Different readers come to this material with different goals. Here are suggested paths through the series:
Complete beginners to typed functional programming: Read linearly. Arc I builds the programming foundation you need. Do not skip ahead to proofs until you are comfortable with pattern matching, recursion, and type classes. The concepts in Polymorphism are essential for understanding how Lean’s type system works.
Systems programmers wanting verification: Read Arc I thoroughly since you will use these features in production code. In Arc II, focus on Proofs, Proof Strategy, Verified Programs, and Model Checking. The Type Theory article provides the foundation but can be revisited as needed.
AI researchers interested in theorem proving: After covering the basics, jump to Proofs, then Tactics Reference, and finally Artificial Intelligence. The intermediate articles on type theory and algebraic structures can wait until you need them for specific formalization tasks.
Mathematicians new to programming: Start with Basics and Control Flow to learn Lean as a language, then proceed linearly through Arc II. You may skim Effects and IO on first reading since they focus on computational side effects rather than proof.
Article dependencies: Most articles build on previous ones, but some can be read independently. Classic Proofs requires only Proofs and Proof Strategy. Algebraic Structures requires Type Classes. Mathlib requires familiarity with tactics from earlier articles but not deep type theory.
Repository Structure
Code samples are extracted from Lean source files. Each article corresponds to modules in the src/ directory:
| Article | Source File |
|---|---|
| Basics | src/ZeroToQED/Basics.lean |
| Data Structures | src/ZeroToQED/DataStructures.lean |
| Control Flow | src/ZeroToQED/ControlFlow.lean |
| Polymorphism | src/ZeroToQED/Polymorphism.lean |
| Effects | src/ZeroToQED/Effects.lean |
| IO | src/ZeroToQED/IO.lean |
| Proofs | src/ZeroToQED/Proving.lean |
| Type Theory | src/ZeroToQED/TypeTheory.lean |
| Tactics | src/ZeroToQED/Tactics.lean |
Larger examples live in src/Examples/:
| Example | Source File | Run Command |
|---|---|---|
| Magic: The Gathering | MagicTheGathering.lean | lake exe mtg |
| D&D Character Generator | DndCharacter.lean | lake exe dnd 42 |
| ATM Withdrawal | ATM.lean | lake exe atm |
| Parser Combinators | ParserCombinators.lean | lake exe parsers |
| Game of Life | GameOfLife.lean | lake exe life |
| Stack Machine | StackMachine.lean | lake exe stack |
| Circuit Breaker | CircuitBreaker.lean | cargo test -p circuit-breaker |
Open these files in VS Code to explore with full IDE support. The Infoview panel shows types and proof states as you navigate.
Additional learning resources are collected in the References appendix. This series is an informal introduction to formality. If you want the stuffy formal introduction to formality, see Theorem Proving in Lean 4, Functional Programming in Lean, Mathematics in Lean, or university courses from CMU, Imperial, and Brown. They are more rigorous.
Why?
This article covers motivation and context. If you already know why you want to learn Lean, skip to Theorem Provers or directly to Basics.
Software keeps getting more complex. Every year brings more dependencies, more attack surface, more emergent behavior no one designed or intended. We build systems by stacking abstractions, trusting that each layer does what it claims. And this mostly works, except when it doesn’t. Sometimes an algorithm trades against itself and the stock market loses a trillion dollars in thirty-six minutes. Sometimes a missing bounds check in a cryptographic library lets attackers read arbitrary memory from half the servers on Earth. The complexity is the natural consequence of building systems larger than any individual can hold in their head, maintained by teams that turn over faster than the code does.
And now we are increasingly automating the production of code itself. Large language models now generate plausible programs at unprecedented scale. They have read every GitHub repository, every Stack Overflow answer, every tutorial and textbook. They produce code that looks right, compiles often, and works sometimes. They are plausibility generators: statistical engines that have learned what code typically looks like without understanding what code actually means. The gap between “looks correct” and “is correct” has always mattered. It is about to matter more.
One response is to build richer formal systems that can constrain the chaos. Languages where the compiler verifies rather than merely translates. Type systems that encode invariants the programmer used to track mentally. Proof assistants that check whether code does what it claims, mechanically and exhaustively. The map must match the territory, and these tools force you to make the correspondence explicit. They also give us something to hold the output of language models against: a verifier that can check whether generated code satisfies its specification, turning plausibility into proof.
Lean sits at this intersection: a theorem prover backed by a million-line mathematical library and a general-purpose programming language fast enough for production use. You can prove theorems about prime numbers in the morning and write a web server in the afternoon, using the same language, the same tools, the same mental model. Write a claim, and the compiler tells you whether it holds. No trust required. No hand-waving accepted.
If Dwarf Fortress is the most complex simulation ever built and Factorio is crystallized obsession in game form, then Lean is what happens when you point that same energy at the foundations of mathematics. Like Dwarf Fortress, losing is fun, because even a failed proof attempt teaches you something about why it failed. Like Factorio, there is always one more lemma to optimize, one more proof to refactor, one more elegant solution lurking just beyond the current mess.
So why learn Lean then? Maybe you want to get a taste of what the future of mathematics looks like. Maybe you want to learn what the engineers at top quant firms and AI labs are building, because that seems like a reasonable bet on where things are heading. Maybe you want to build high-assurance software to model and trade markets. Maybe you have no practical reason at all and just want to learn something new. All of these work. There is no entrance exam.
There are also practical reasons. Start small: prove that a parser handles every edge case, that a compiler preserves meaning, that a refactor did not silently break an invariant. These are theorems about code you actually write. Scale up from there. The world runs on systems that allocate resources: power grids, packet routing, markets matching buyers and sellers. A bug in market infrastructure misallocates capital, distorts prices, rewards the wrong participants. The matching algorithm that assigns medical residents to hospitals, the slot allocation system that schedules aircraft across congested airspace, high-assurance trading systems: these are theorems waiting to be verified. The gap between “we tested it extensively” and “we proved it correct” is the gap between confidence and certainty.
Most people do not need dependent type theory or formal verification for their day jobs. But necessity is a boring criterion for what to learn. Humans decipher Linear B, collide particle jets to conjure baryons from vacuum, compose fugues, map the cosmic microwave background to the first seconds after time began, and verify QED’s prediction of the electron magnetic moment to twelve decimal places. None of these have direct economic outcomes, yet we do them anyway. Much of this text is about similar pursuits: still early, straddling the line between practical and theoretical. That borderland is where the future is always born.
But what makes this cluster of ideas so rewarding goes beyond any single trend: the feedback is absolute. When a proof type-checks, it is correct. The compiler does not care about your reputation or your confidence. It cares whether the logic holds. You discover what you actually understand versus what you thought you understood. A proof assistant is a bicycle for the mind: it amplifies your ability to think correctly.
Lean will not let you wave your hands. It will not accept “it is obvious that”, “because of abstract nonsense” or “this margin is too narrow” as justifications. The feedback loop is immediate and unforgiving. This can be frustrating, but it is also what makes the successes satisfying. When a proof finally compiles, you know it works.
There is a kind of peace in this. The world outside is full of claims that cannot be verified, arguments that cannot be resolved, systems that fail in ways no one predicted. Inside a proof assistant, the rules are clear. A theorem holds or it does not. The machine tells you which. You can build something that will still be correct in a hundred years, long after the context that motivated it has faded. The difficulty is part of the appeal. Easy things do not teach you much, and they do not last.
If this sounds interesting, keep reading. If not, there are plenty of other things worth learning.
Why Lean Specifically?
Among proof assistants, several mature options exist. Coq has the largest ecosystem and decades of industrial use. Isabelle offers powerful automation and a massive library of formalized mathematics. Agda provides elegant dependent types with a minimalist core. Each has devoted communities and hard-won expertise. The choice between them involves tradeoffs: Coq’s tactic language is battle-tested but showing its age; Isabelle’s automation can feel like magic until it fails mysteriously; Agda prioritizes purity over pragmatism.
Lean 4 occupies a distinctive position. The system emerged from Microsoft Research in 2013 and has evolved through four major versions. Lean 4, released in 2021, was a ground-up rewrite that reimagined the tool as both proof assistant and practical programming language. The implementation is largely written in Lean itself, a feat of bootstrapping that demonstrates the language’s capabilities. The mathematical library, Mathlib, contains over a million lines of formalized mathematics spanning algebra, analysis, topology, number theory, and beyond.
Lean has better tooling than other theorem provers. The error messages are informative. The IDE integration works. You can ask for hints, search the library, and see exactly what remains to prove.
This article series is my (admittedly flawed) attempt to take you from zero knowledge of Lean to writing your own proofs and programs, with enough depth to tackle real problems. Whether you finish it or abandon it halfway through, you will have spent your time on something worthwhile.
Theorem Provers
This article covers the history and landscape of theorem provers. If you do not care about context, skip to Basics.
The idea of mechanizing mathematical reasoning dates back centuries, but the modern era of theorem proving began in the 1960s and 1970s when researchers first attempted to implement formal logic on computers. These early systems were primitive by today’s standards, but they established the fundamental insight that proofs could be represented as data structures and verified by algorithms.
A note on terminology: automated theorem provers attempt to find proofs without human guidance. Proof assistants (also called interactive theorem provers) have humans construct proofs while the machine verifies each step. Lean is a proof assistant with enough automation that the line blurs. This article uses the terms loosely, as most practitioners do.
Early Systems
The first generation of theorem provers emerged from two distinct traditions. One tradition, exemplified by systems like Automath developed by Nicolaas de Bruijn in the late 1960s, focused on encoding existing mathematical proofs in a formal language that a computer could check. De Bruijn’s work introduced many concepts that remain central to modern systems, including the idea that types could depend on values and that propositions could be represented as types. The other tradition focused on automated theorem proving, attempting to have computers discover proofs on their own through search procedures. While fully automated proving remains intractable for most interesting mathematics, techniques from this tradition inform the automation tactics available in modern proof assistants.
The 1980s saw the development of several influential systems. The Calculus of Constructions, introduced by Thierry Coquand and Gérard Huet, provided a unified foundation combining dependent types with a hierarchy of universes. This calculus became the theoretical basis for Coq, which remains one of the most widely used proof assistants today. Coq pioneered many features now standard in the field, including tactic-based proof development, extraction of executable programs from proofs, and a module system for organizing large developments. Major verification efforts in Coq include the CompCert certified C compiler and the mathematical proof of the four color theorem.
The LCF Tradition
Around the same time, researchers in Edinburgh developed the LCF system and its descendants, which introduced the influential LCF architecture. In this design, there is a small trusted kernel that defines what constitutes a valid proof, and all proof construction must ultimately pass through this kernel. This approach provides strong guarantees because only the kernel needs to be trusted, while tactics and automation can be implemented in untrusted code. The HOL family of provers, including HOL4 and Isabelle/HOL, descend from this tradition. Isabelle in particular has been used for major verification efforts including the seL4 verified operating system kernel.
First-Order Theorem Proving
A parallel tradition built theorem provers on first-order logic rather than type theory. The Boyer-Moore family of provers, culminating in ACL2, used an untyped computational substrate based on Lisp with powerful automation heuristics for discovering proofs. ACL2 achieved notable industrial successes, including verification of AMD’s floating-point division after the Pentium FDIV bug made hardware correctness suddenly interesting to executives.
Despite these successes, first-order theorem proving has not been widely adopted outside specialized industrial applications. First-order logic imposes an expressiveness ceiling that makes formalizing modern mathematics awkward. Without dependent types, you cannot easily express properties like “a vector of length n” or “a sorted list.” These systems rely heavily on opaque automation heuristics rather than user-programmable tactics, which makes it harder to understand why proofs fail and how to fix them. Most importantly, there is no Curry-Howard correspondence linking proofs to programs, which means verified algorithms cannot be extracted into executable code.
The contrast is instructive. Type-theoretic systems grew ecosystems of thousands of users, million-line mathematical libraries, and active research communities. First-order provers remained specialized tools for specific classes of problems. The Curry-Howard insight that proofs are programs and types are propositions turned out to be generatively powerful in ways that first-order theorem proving was not. When you can express your specification, your implementation, and your correctness proof in the same language, each informs the others. This unity is what makes dependent type theory feel like mathematics rather than a checkbox.
Dependent Type Theory
The development of Martin-Löf type theory in the 1970s and 1980s provided another foundational framework that influenced systems like Agda and later Idris. Per Martin-Löf’s intensional type theory emphasized the computational content of proofs and introduced identity types as a way to reason about equality. Agda, developed primarily at Chalmers University, implements a variant of this theory with sophisticated support for dependent pattern matching. Its syntax influenced Lean’s design, and it remains popular for research in type theory and programming language semantics.
Idris took a different approach by prioritizing practical programming with dependent types rather than theorem proving per se. Idris demonstrated that dependent types could be integrated into a language designed for general-purpose programming, with features like implicit arguments and type-driven development making dependently typed code more accessible to working programmers. Many of these ergonomic innovations influenced Lean 4’s design.
The 2010s brought renewed interest in the foundations of mathematics through homotopy type theory, which reinterprets types as spaces and equality as paths. This perspective, developed by Vladimir Voevodsky and others, led to new proof assistants like Cubical Agda that implement univalent foundations. While Lean does not natively support cubical type theory, the mathematical insights from this research have influenced how the community thinks about equality and transport.
The Ecosystem Problem
Most theorem provers die the same death: they work, but nobody uses them. The software is correct, the theory is sound, the papers get cited, and then the maintainer graduates or retires and the codebase rots. This is not a failure of engineering. It is a failure of ecosystem.
A theorem prover without a library is a programming language without packages. You can write everything from scratch, but you will not. The activation energy is too high. Automath proved you could encode mathematics in the 1960s. Mizar built a large library but with a syntax that looked like it was designed to repel newcomers. The Boyer-Moore provers achieved industrial success at AMD but never grew a community. Each system had technical merit. None achieved escape velocity.
The systems that thrive today share a common pattern: a killer application that proved the approach could work at scale. Coq had CompCert. Isabelle had seL4. These existence proofs mattered. When someone asked “can you build anything real?” there was an answer.
Lean’s Position
Lean emerged from this rich history. The first version was developed by Leonardo de Moura at Microsoft Research starting in 2013, with the goal of building a system suitable for both interactive theorem proving and automated reasoning. Lean 2 and Lean 3 refined the system and built a substantial mathematical library. Lean 4, shipped in 2021, was a ground-up rewrite with several things done right.
Speed. Lean 4 compiles to C and runs fast. Not “fast for a theorem prover” but actually fast. You can write command-line tools, build systems, even games. This matters because you interact with a proof assistant through its editor, and editor responsiveness determines whether people finish their proofs or give up.
Metaprogramming. Lean 4’s tactic framework is written in Lean itself. You can inspect it, modify it, and write your own tactics without learning a separate metalanguage. In Coq, the tactic language (Ltac) is a different beast from the term language. In Lean, tactics are just programs.
Syntax. Lean looks like a normal programming language. Functions are functions. Pattern matching works how you expect. Unicode is optional. Lower friction means more users.
Mathlib. At 1.9 million lines, Mathlib is the largest coherent mathematical library ever created. When people ask “can I formalize real mathematics?” the answer is: probably someone already did, go look it up. Mathlib covers undergraduate and graduate-level material across algebra, analysis, topology, number theory, and other areas. It demonstrates that modern proof assistants can handle serious mathematics, not just toy examples.
Community momentum. Mathlib grows by thousands of theorems monthly. The Lean Zulip is active and welcoming. Kevin Buzzard teaches undergraduates at Imperial. Terence Tao formalizes his papers. When working mathematicians adopt your tool, the library grows faster.
AI integration. Lean is the default target for neural theorem proving research. DeepSeek-Prover, LeanDojo, and others chose Lean because the metaprogramming API makes tool integration tractable. This creates a flywheel: more AI tooling attracts more users attracts more AI tooling.
The Modern Landscape
Today’s theorem provers share many features despite their different foundations. Most support some form of dependent types, allowing types to depend on values and enabling precise specifications. Most provide tactic languages for interactive proof development alongside term-mode proof construction. Most include automation ranging from simple rewriting to sophisticated decision procedures.
The systems differ in their logical foundations, their approach to equality and computation, their support for classical versus constructive reasoning, and their emphasis on programming versus pure mathematics. Lean occupies a distinctive position by providing classical logic by default while maintaining strong computational properties, and by treating programming and proving as equally important activities.
Agda has elegant syntax and supports cubical type theory for people who care about homotopy, but has no substantial mathematical library. Idris pioneered practical dependent types for programming, but the community has not coalesced around a shared library. The HOL family uses simpler type theories without full dependent types, which makes automation easier but specifications harder. These systems have their niches but face the same ecosystem challenges that have always separated survivors from the graveyard.
Getting Started
If you are starting today, Lean is a reasonable choice. The syntax is approachable, the tooling is modern, the library is substantial, and the community is active. The alternatives are not wrong, but the path is less well-trodden.
If you want the theoretical foundations: Type Theory covers the core calculus. Dependent Types explains why types can mention values. Tactics and Proof Strategy cover how to actually get proofs done. Artificial Intelligence discusses where this is heading.
Basics
True to the title of this article series, we start from zero. Not “Hello, World!” but an actual zero: the natural number that forms the foundation of arithmetic.
Zero
-- From Zero to QED: let's start at the very beginning
def zero : Nat := 0
#eval zero -- Output: 0
-- The natural numbers are defined inductively:
-- Nat.zero is the base case
-- Nat.succ n is the successor of n
def one : Nat := Nat.succ Nat.zero
def two : Nat := Nat.succ (Nat.succ Nat.zero)
#eval one -- Output: 1
#eval two -- Output: 2
-- Of course, we can just write the literals directly
def fortyTwo : Nat := 42
-- The answer to life, the universe, and everything
theorem deep_thought : fortyTwo = 6 * 7 := rfl
This first example introduces three toplevel declarations that you will use constantly:
- def defines a named value or function. Here def zero : Nat := 0 declares that zero has type Nat (natural number) and equals 0. Every Lean program is built from def declarations.
- #eval evaluates an expression and prints the result. This command runs code immediately, useful for testing as you work. Commands starting with # are interactive queries that do not create permanent definitions.
- theorem declares a proposition to be proved. The name deep_thought labels the statement fortyTwo = 6 * 7, and rfl (reflexivity) proves it by computation: both sides reduce to 42. Unlike def, theorem proofs are opaque and never unfold during type checking.
The natural numbers are perhaps the most fundamental type in mathematics and programming. Lean represents them inductively: zero is the base case, and every other natural number is the successor of another. This simple construction gives us the entire infinite sequence 0, 1, 2, 3, and so on.
Natural Numbers
Natural numbers in Lean represent non-negative integers, defined inductively just as Peano intended in 1889. They support standard arithmetic operations, but subtraction truncates at zero since negative results would fall outside the type. This has caused approximately as many bugs as unsigned integers in C, which is to say: more than anyone wants to admit.
-- Natural numbers (Nat) are non-negative integers: 0, 1, 2, 3, ...
def myNat : Nat := 42
def anotherNat : Nat := 100
-- Basic arithmetic
#eval 3 + 5 -- 8
#eval 10 - 3 -- 7 (truncated subtraction: 3 - 10 = 0)
#eval 4 * 7 -- 28
#eval 17 / 5 -- 3 (integer division)
#eval 17 % 5 -- 2 (modulo)
-- Natural number subtraction truncates at zero
#eval 3 - 10 -- 0, not -7
-- Comparison returns Bool
#eval 5 < 10 -- true
#eval 5 ≤ 5 -- true
#eval 10 == 10 -- true
Lean has two equality operators. The == operator is decidable equality, returning a Bool for use in programs. The = operator is propositional equality, returning a Prop for use in proofs. For runtime computation, use ==. For stating theorems, use =. Both work with #eval because Lean can decide equality for natural numbers.
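A minimal sketch of the distinction (the theorem name and the use of decide are purely illustrative):

```lean
-- == computes a Bool, for use in programs
#eval 2 + 2 == 4             -- true

-- = states a proposition, for use in theorems
theorem two_plus_two : 2 + 2 = 4 := rfl

-- decide converts a decidable proposition into a Bool, bridging the two
#eval decide (2 + 2 = 4)     -- true
```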
Integers
When you need negative numbers, use Int. Integer arithmetic behaves as you would expect from standard mathematics, unburdened by the horrors of two’s complement overflow that have plagued systems programmers since the PDP-11.
-- Integers (Int) include negative numbers
def myInt : Int := -17
def posInt : Int := 42
-- Integer arithmetic handles negatives properly
-- Starting with a negative infers Int automatically
#eval -5 + 3 -- -2
#eval -4 * -3 -- 12
#eval -17 / 5 -- -3
#eval -17 % 5 -- -2
-- But positive minus larger needs annotation
#eval (3 : Int) - 10 -- -7
-- Converting between Nat and Int
#eval Int.ofNat 42 -- 42 as Int
#eval (42 : Int).toNat -- 42 as Nat
#eval (-5 : Int).toNat -- 0 (negative becomes 0)
#eval (42 : Int).natAbs -- 42 (absolute value as Nat)
#eval (-42 : Int).natAbs -- 42
Comments
Lean supports several comment styles. Single-line comments begin with -- and extend to the end of the line. Block comments are delimited by /- and -/ and can span multiple lines. Unlike C-style comments, Lean’s block comments nest properly, so you can comment out code that already contains comments.
Documentation comments are special. A comment starting with /-- attaches to the following declaration and is extracted by documentation tools. A comment starting with /-! provides module-level documentation, typically placed at the top of a file. Both support markdown formatting.
-- Single-line comments start with double dash
-- Everything after -- is ignored until end of line
/- Block comments are delimited by /- and -/
They can span multiple lines
and are useful for temporarily disabling code -/
/-- Documentation comments start with /-- and end with -/
They attach to the following declaration and support markdown.
Use these to document your API. -/
def documented (n : Nat) : Nat := n + 1
/-! Module-level documentation uses /-! and -/
Place these at the top of a file to describe the module's purpose.
Documentation tools extract these comments automatically. -/
-- Comments nest properly:
/- outer /- inner -/ still outer -/
-- The #check command is not affected by comments on the same line
#check Nat -- this comment doesn't interfere
Modules and Namespaces
Lean organizes code into modules and namespaces. This section covers the practical syntax; we revisit the underlying mechanics in Type Theory.
Files and Modules. Each .lean file defines a module. The file Foo/Bar/Baz.lean defines module Foo.Bar.Baz. To use definitions from another module, import it at the top of your file with import Mathlib.Data.Nat.Prime or import Mathlib for an entire library. Imports are transitive: if A imports B and B imports C, then A has access to C’s definitions. The Lake build system (covered in Build System) manages dependencies and ensures modules are compiled in the correct order.
Namespaces. Namespaces group related definitions under a common prefix. They prevent name collisions and organize large codebases:
namespace Geometry2
structure Point2 where
x : Float
y : Float
def theOrigin : Point2 := ⟨0.0, 0.0⟩
def dist (p q : Point2) : Float :=
let dx := p.x - q.x
let dy := p.y - q.y
Float.sqrt (dx * dx + dy * dy)
end Geometry2
-- Access with full path
#eval Geometry2.dist Geometry2.theOrigin ⟨3.0, 4.0⟩ -- 5.0
The angle brackets ⟨ and ⟩ are shorthand for structure constructors. These two lines are equivalent:
def explicit : Point2 := Point2.mk 3.0 4.0
def shorthand : Point2 := ⟨3.0, 4.0⟩
The open command brings namespace contents into scope, so you can write dist instead of Geometry2.dist:
-- Open brings namespace contents into scope
open Geometry2 in
#eval dist theOrigin ⟨3.0, 4.0⟩ -- 5.0
-- Open for a definition
open Geometry2 in
def unitCirclePoint (θ : Float) : Point2 := ⟨Float.cos θ, Float.sin θ⟩
Sections and Variables. The section command creates a scope for temporary declarations. Variables declared with variable inside a section are automatically added as parameters to definitions that use them:
section VectorOps
variable (α : Type) [Add α] [Mul α]
-- α and the instances are automatically added as parameters to definitions that use them
def doubleIt (x : α) : α := x + x
def squareIt (x : α) : α := x * x
end VectorOps
#eval doubleIt Nat 21 -- 42
#eval squareIt Nat 7 -- 49
The bracket notation deserves explanation. Round brackets mark explicit arguments you pass directly. Square brackets mark instance arguments that Lean finds automatically through type class resolution. Here, [Add α] means the type must have an Add instance, which provides the + operator. Curly braces mark implicit arguments that Lean infers from context.
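As a sketch, here are all three bracket kinds in one definition (addToSelf is an illustrative name, not part of the series source):

```lean
-- {α : Type}  implicit: inferred from the argument
-- [Add α]     instance: found by type class resolution
-- (x : α)     explicit: passed directly
def addToSelf {α : Type} [Add α] (x : α) : α := x + x

#eval addToSelf 21          -- 42 (α inferred as Nat)
#eval addToSelf (-3 : Int)  -- -6 (α inferred as Int)
```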
Scoping. Lean provides several mechanisms for limiting where names are visible. At the expression level, let bindings introduce local variables that exist only within the expression body. The where clause does the same but places definitions after their use, which some find more readable. Scopes nest, and inner bindings shadow outer ones with the same name.
-- let bindings create local scope within an expression
def hypotenuse (a b : Float) : Float :=
let aSquared := a * a
let bSquared := b * b
Float.sqrt (aSquared + bSquared)
-- aSquared and bSquared are not visible outside this def
-- where clauses provide the same scoping, but definitions come after use
def quadraticRoots (a b c : Float) : Float × Float :=
((-b + discriminant) / denom, (-b - discriminant) / denom)
where
discriminant := Float.sqrt (b * b - 4 * a * c)
denom := 2 * a
-- Scopes nest: inner let shadows outer
def shadowExample : Nat :=
let x := 1
let result :=
let x := 2 -- shadows outer x
x + 10 -- uses inner x: 12
result + x -- uses outer x: 12 + 1 = 13
#eval shadowExample -- 13
At the declaration level, sections scope variable declarations (as shown above), namespaces scope definitions under a prefix, and private restricts visibility to the current file. The general principle: introduce names in the narrowest scope that makes sense. Local computations belong in let or where. Shared helpers belong in a namespace. Implementation details belong behind private.
Visibility. By default, all definitions are public. Mark definitions as private to hide them outside the current file:
namespace Internal
private def helperVal : Nat := 42
def publicApi : Nat := helperVal * 2
end Internal
#eval Internal.publicApi -- 84
-- helperVal is not accessible outside this file
Export. The export command re-exports definitions from one namespace into another, making them available without opening the original:
namespace Math
def square (x : Nat) : Nat := x * x
def cube (x : Nat) : Nat := x * x * x
end Math
namespace Prelude
-- Re-export square from Math into Prelude
export Math (square)
end Prelude
-- Now square is available via Prelude without opening Math
#eval Prelude.square 5 -- 25
The Init Namespace. Every Lean file automatically imports the Init namespace, which provides foundational types and functions without explicit imports. This is Lean’s equivalent of Haskell’s Prelude or OCaml’s Stdlib, though the design differs.
| Category | Contents |
|---|---|
| Core types | Unit, Bool, Nat, Int, String, Char, Option, List, Array |
| Monads | Id, Option, Except, StateM, ReaderM, IO, plus transformers StateT, ReaderT, ExceptT, OptionT |
| Type classes | Monad, Functor, Applicative, ToString, Repr, Inhabited, BEq, Ord, Hashable |
| Proof primitives | Eq, And, Or, Not, True, False, Exists |
Haskell’s Prelude is imported unqualified by default, meaning all its names are directly available. You disable this with NoImplicitPrelude. OCaml takes the opposite approach: all modules are available qualified (you write List.map), and you must explicitly open List to use names unqualified.
Lean splits the difference. The Init namespace is always available without qualification. Unlike Haskell, there is no pragma to disable it, but you can shadow any definition with your own. The Init hierarchy is organized into submodules (Init.Prelude, Init.Data, Init.Control), but from the user’s perspective it appears as a unified set of defaults.
Functions
Functions are first-class values in Lean. You can define them in multiple ways and partially apply them to create new functions.
def add (x : Nat) (y : Nat) : Nat :=
x + y
def double : Nat → Nat :=
fun x => 2 * x
def addFive := add 5 -- Partially applied function
#eval add 3 4 -- Output: 7
#eval double 21 -- Output: 42
#eval addFive 10 -- Output: 15
When declaring multiple parameters of the same type, you can group them: (x y z : Nat) is identical to (x : Nat) (y : Nat) (z : Nat). Use whichever reads better.
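For example (sumThree is a throwaway name):

```lean
-- (x y z : Nat) is shorthand for (x : Nat) (y : Nat) (z : Nat)
def sumThree (x y z : Nat) : Nat := x + y + z

#eval sumThree 1 2 3  -- 6
```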
Pattern Matching
Pattern matching is a powerful feature for destructuring data and defining functions by cases.
def factorial : Nat → Nat
| 0 => 1
| n + 1 => (n + 1) * factorial n
def describe : Nat → String
| 0 => "zero"
| 1 => "one"
| 2 => "two"
| n => s!"many ({n})"
#eval factorial 5 -- Output: 120
#eval describe 0 -- Output: "zero"
#eval describe 100 -- Output: "many (100)"
The factorial definition uses n + 1 rather than the seemingly more natural | n => n * factorial (n - 1). This is not a style choice. Lean must verify that recursive calls terminate, and it does this by checking that arguments decrease structurally. The pattern n + 1 desugars to Nat.succ n, explicitly matching a successor. When you recurse on n, Lean sees that n is structurally smaller than Nat.succ n. With | n => ... factorial (n - 1), Lean cannot immediately see that n - 1 is smaller than n (subtraction is a function, not a constructor), so termination checking fails.
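To make the desugaring concrete, here is the same function with the constructors spelled out; it is accepted for the same reason (factorial' is an illustrative name):

```lean
-- The pattern n + 1 is sugar for Nat.succ n
def factorial' : Nat → Nat
  | Nat.zero   => 1
  | Nat.succ n => (n + 1) * factorial' n

#eval factorial' 5  -- 120
```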
The describe function uses string interpolation with s!"many ({n})". The s! prefix enables interpolation: expressions inside {...} are evaluated and converted to strings. Without the prefix, curly braces are literal characters.
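Two quick illustrations of the prefix at work:

```lean
#eval s!"2 + 2 = {2 + 2}"   -- "2 + 2 = 4"
#eval "2 + 2 = {2 + 2}"     -- "2 + 2 = {2 + 2}" (no s!, braces stay literal)
```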
More Declarations
Abbreviations are transparent definitions that unfold automatically during elaboration. Use them for type aliases:
-- abbrev creates a transparent abbreviation (always unfolded)
abbrev NatPair := Nat × Nat
abbrev Predicate' (α : Type) := α → Bool
def isEvenPred : Predicate' Nat := fun n => n % 2 == 0
def sumPair (p : NatPair) : Nat := p.1 + p.2
#eval isEvenPred 4 -- true
#eval sumPair (3, 7) -- 10
The interactive commands #check, #print, and #reduce help you explore code:
-- #check shows the type of an expression
#check (fun x : Nat => x + 1) -- Nat → Nat
#check @List.map -- shows full polymorphic type
-- #print shows information about a declaration
#print Nat.add
#print List
-- #reduce reduces an expression to normal form
#reduce (fun x => x + 1) 5 -- 6
#reduce List.map (· + 1) [1, 2, 3] -- [2, 3, 4]
A complete reference of all declarations appears in the appendix. Advanced declarations like axiom, opaque, universe, notation, and set_option are covered in later articles where they arise naturally.
Function Composition
Lean provides three operators for combining functions:
| Operator | Name | Direction | Meaning |
|---|---|---|---|
f ∘ g | compose | right-to-left | fun x => f (g x) |
x |> f | pipe | left-to-right | f x |
f <| x | apply | right-to-left | f x |
Composition builds new functions by chaining existing ones. The ∘ operator reads right-to-left: in f ∘ g, apply g first, then f. Pipelines with |> read left-to-right, which often matches how you think about data transformations: “take this, then do that, then do this.”
def twice (n : Nat) := n * 2
def square (n : Nat) := n * n
def inc (n : Nat) := n + 1
-- ∘ composes right-to-left: (f ∘ g) x = f (g x)
#eval (square ∘ twice) 3 -- 36: square(twice(3)) = square(6)
#eval (twice ∘ square) 3 -- 18: twice(square(3)) = twice(9)
#eval (inc ∘ square ∘ twice) 3 -- 37: inc(square(twice(3)))
-- |> pipes left-to-right: x |> f |> g = g (f x)
#eval 3 |> twice |> square |> inc -- 37: same computation, opposite order
#eval 10 |> twice -- 20
#eval [1, 2, 3] |> List.reverse -- [3, 2, 1]
-- <| is low-precedence application: f <| x = f x
-- useful to avoid parentheses
#eval String.length <| "hello" ++ " world" -- 11
#eval List.map twice <| [1, 2, 3] -- [2, 4, 6]
-- chaining with methods vs pipes
#eval [1, 2, 3].map twice |> List.reverse -- [6, 4, 2]
#eval ([1, 2, 3].map twice).reverse -- same
-- composition builds new functions without naming arguments
def processThenSquare := square ∘ twice ∘ inc
#eval processThenSquare 2 -- 36: square(twice(inc(2)))
The <| operator is just function application with low precedence. It lets you write f <| expensive computation instead of f (expensive computation). Some find this cleaner; others prefer explicit parentheses. Use whichever reads better.
Point-free style defines functions without naming their arguments: square ∘ twice rather than fun x => square (twice x). This can be elegant for simple compositions but obscure for complex ones. The goal is clarity, not cleverness.
Higher-Order Functions
Higher-order functions take functions as arguments. The classics: map transforms each element, filter keeps elements matching a predicate, foldl reduces a list to a single value by accumulating from the left.
And because Lean is also a theorem prover, we can prove properties by computation. The theorem add_comm_example states that 2 + 3 = 3 + 2, and rfl proves it because both sides reduce to 5. The examples at the end go further: reversing a list twice returns the original, list lengths add correctly, mapping a function produces the expected result.
What is the difference between #eval [1,2,3].reverse.reverse = [1,2,3] and example : [1,2,3].reverse.reverse = [1,2,3] := rfl? The #eval runs at runtime and prints true. The example is verified at compile time by the type checker, and if the equality did not hold, compilation would fail. Both check the same fact, but example catches the error before you ship. Note that this rfl proof verifies this specific list; proving it for all lists requires a theorem with induction. We cover proofs properly in Proofs.
-- Higher-order functions
#eval [1, 2, 3, 4, 5].map (· * 2) -- [2, 4, 6, 8, 10]
#eval [1, 2, 3, 4, 5].filter (· > 2) -- [3, 4, 5]
#eval [1, 2, 3, 4, 5].foldl (· + ·) 100 -- 115 (100 + 1 + 2 + 3 + 4 + 5)
-- Programs are proofs: addition is commutative
theorem add_comm_example : 2 + 3 = 3 + 2 := rfl
-- Computation IS proof: complex operations verified by evaluation
example : [1, 2, 3].reverse.reverse = [1, 2, 3] := rfl
example : [1, 2, 3].length + [4, 5].length = 5 := rfl
example : [1, 2, 3].map (· + 10) = [11, 12, 13] := rfl
From Values to Structure
For a complete reference of all toplevel declarations (def, theorem, inductive, structure, etc.) and interactive commands (#eval, #check, #print), see Appendix B.
You now have the building blocks: numbers, functions, modules, and the fundamental declarations. Next we cover the data structures that make programs useful: lists, arrays, maps, and user-defined types. After that, we explore control flow, polymorphism, effects, and IO. By the end of Arc I, you will have built a D&D character generator, which is either a useful demonstration of structured programming or an excuse to start a D&D campaign. Possibly both.
Lake Build System
This chapter covers project setup and the Lake build system. If you already have your environment configured, skip to Data Structures. Return here when you need multi-file projects or dependencies.
Every programming language eventually grows a build system, and that build system eventually grows into a small civilization with its own customs and territorial disputes. Lake is Lean’s entry in this tradition. It borrows good ideas from Cargo, is written in Lean itself, and mostly works. The documentation is sparse, the error messages occasionally cryptic, and there are two competing configuration formats that do almost but not quite the same thing. Welcome to the frontier.
That said, Lake gets the job done. Paired with Elan for version management, you get reproducible builds and workable dependency management. Code that compiles on your machine will usually compile on other machines. For a young ecosystem, this is not nothing.
Elan
Elan is the Lean version manager. It downloads, installs, and switches between different versions of Lean. Most users install Elan first and then let it manage their Lean installation. On Unix systems, installation is a single command that downloads and runs the installer script. On Windows, a dedicated installer is available.
Once installed, Elan reads a lean-toolchain file in your project directory to determine which Lean version to use. This file typically contains a single line specifying the version, such as leanprover/lean4:v4.3.0 or simply leanprover/lean4:stable for the latest stable release. When you enter a directory containing this file, Elan automatically activates the correct toolchain. If that version is not installed, Elan downloads it transparently.
This per-project versioning solves a common problem in software development. Different projects may require different Lean versions, and Elan lets them coexist without conflict. You can work on a project using Lean 4.2 in one terminal and a project using Lean 4.5 in another. The toolchain file checked into version control ensures all collaborators use the same Lean version.
Elan also manages additional toolchain components. The Lean installation includes the compiler, the language server for editor integration, and documentation tools. Updates happen through Elan with commands like elan update to fetch the latest versions.
Lake
Lake is the build system and package manager for Lean. The name combines “Lean” and “Make,” and every Lean project contains a lakefile.lean that describes its structure, dependencies, and build configuration. A minimal lakefile declares the package name and defines one or more build targets. The most common targets are libraries, which compile Lean source files into modules that other code can import, and executables, which produce standalone programs. Lake reads this configuration and orchestrates compilation, handling dependencies between modules automatically.
import Lake
open Lake DSL
package myproject where
version := v!"0.1.0"
lean_lib MyLib where
roots := #[`MyLib]
@[default_target]
lean_exe myapp where
root := `Main
This lakefile defines a package named myproject containing a library called MyLib and an executable called myapp. The library compiles all modules under the MyLib namespace, while the executable uses Main as its entry point. The @[default_target] attribute marks myapp as the target built when you run lake build without arguments.
Dependencies on external packages are declared in the lakefile using the require keyword. Lake fetches dependencies from Git repositories, and you can specify versions through tags, branches, or commit hashes. When you build your project, Lake first ensures all dependencies are available and up to date, then compiles them before your own code. Reservoir serves as the community package registry, indexing Lean packages and providing searchable documentation, dependency graphs, and build status for the ecosystem.
require mathlib from git
"https://github.com/leanprover-community/mathlib4" @ "v4.3.0"
require aesop from git
"https://github.com/leanprover-community/aesop" @ "master"
Lake maintains a lake-manifest.json file that records the exact versions of all dependencies. This lockfile ensures reproducible builds across different machines and times. When you run lake update, Lake fetches the latest versions matching your constraints and updates the manifest.
The build process produces artifacts in a .lake directory within your project. Compiled Lean files become .olean files containing serialized proof terms and compiled code. These intermediate files enable incremental compilation, where Lake only recompiles modules that have changed or whose dependencies have changed. For large projects like Mathlib, this incremental approach is essential for practical development.
Lake also supports downloading precompiled artifacts called caches. Mathlib maintains a cache of compiled artifacts for anyone who would rather not spend hours rebuilding from source. The lake exe cache get command fetches these artifacts, reducing initial setup from hours to minutes.
Project Structure
A typical Lean project follows a conventional directory layout. The lakefile sits at the project root alongside the lean-toolchain file. Source files live in directories matching their module namespaces. A module named MyLib.Data.List would be in the file MyLib/Data/List.lean. This correspondence between filesystem paths and module names makes navigation straightforward.
myproject/
  lakefile.lean
  lean-toolchain
  lake-manifest.json
  MyLib/
    Basic.lean
    Data/
      List.lean
      Vector.lean
  Main.lean
Test files typically live in a separate directory, often called Test or Tests, with their own library target in the lakefile. Documentation, examples, and scripts occupy other directories as needed. Lake does not enforce a particular structure beyond the lakefile requirements, but conventions have emerged from the community.
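For instance, a lakefile might declare a separate test library next to the main targets; the names below are illustrative rather than a required convention:

```lean
-- In lakefile.lean: a test library rooted at Tests/
lean_lib Tests where
  roots := #[`Tests]
```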
Common Commands
Building a project uses lake build, which compiles all default targets. You can build specific targets by name, like lake build MyLib or lake build myapp. For development, lake build after editing a file recompiles only what changed.
Running an executable uses lake exe followed by the executable name, like lake exe myapp. Arguments after the executable name pass through to the program. You can also use lake run with the executable target name.
Managing dependencies uses lake update to refresh the manifest with the latest matching versions. After modifying the lakefile to add or change dependencies, running lake update fetches and locks the new versions.
Cleaning build artifacts uses lake clean, which removes the .lake/build directory.
The lake env command prints environment variables that configure Lean to find your project’s modules. This is useful when running Lean directly or integrating with external tools.
Editor Integration
The Lean language server provides IDE features like error highlighting, go-to-definition, type information on hover, and code completion. Lake integrates with the language server by providing project configuration. When you open a Lean file in an editor with Lean support, the language server reads your lakefile to understand the project structure.
Visual Studio Code with the lean4 extension is the most popular editor setup. The extension automatically starts the language server and provides a panel showing proof states and messages. Other editors like Emacs and Neovim have community-maintained Lean integrations that communicate with the same language server.
For the language server to work correctly, it must know about your project configuration. Opening a Lean file outside a Lake project, or opening a file before dependencies are built, can cause errors. Building the project with lake build before editing ensures the language server has the information it needs.
Unicode Symbol Entry
Lean uses Unicode symbols extensively. Mathematical notation has evolved over centuries to be information-dense and readable. Writing ∀ n, n + 0 = n is clearer than forall n, n + 0 = n, and α → β is more familiar to mathematicians than alpha -> beta. Rather than inventing ASCII approximations, Lean embraces the notation that mathematicians already use.
In VS Code with the Lean extension, type a backslash followed by a name to produce symbols. The editor replaces the sequence as you type. Hover over any symbol in existing code to see how to type it.
| Input | Symbol | Input | Symbol | Input | Symbol |
|---|---|---|---|---|---|
\to | → | \and | ∧ | \alpha | α |
\le | ≤ | \or | ∨ | \beta | β |
\ge | ≥ | \not | ¬ | \N | ℕ |
\ne | ≠ | \forall | ∀ | \Z | ℤ |
\< | ⟨ | \exists | ∃ | \R | ℝ |
\> | ⟩ | \in | ∈ | \x | × |
\comp | ∘ | \sub | ⊂ | \l | λ |
The angle brackets ⟨ and ⟩ deserve special mention. They are shorthand for structure constructors. These two forms are equivalent:
def explicit : Point := Point.mk 3.0 4.0
def shorthand : Point := ⟨3.0, 4.0⟩
You will see angle brackets throughout Lean code wherever structures are constructed.
The Interactive Workflow
Lean development is fundamentally interactive. Unlike batch compilers where you write code, compile, and hope for the best, Lean provides continuous feedback as you type. This tight feedback loop is not a convenience feature but the primary way you develop in Lean.
The Infoview panel is your window into Lean’s reasoning. In VS Code, it appears on the right side when you open a Lean file. As you move your cursor through the code, the Infoview updates to show the state at that position. When writing proofs, it displays the current goal: what hypotheses you have available and what remains to be proved. When writing programs, it shows types and values. This panel is essential for understanding what Lean sees at every point in your code.
Consider a simple proof in progress:
theorem add_comm (n m : Nat) : n + m = m + n := by
induction n with
| zero => simp
| succ n ih => _
When your cursor is on the underscore, the Infoview shows:
case succ
m n : Nat
ih : n + m = m + n
⊢ n + 1 + m = m + (n + 1)
This goal state tells you everything: you are in the succ case of an induction, you have m and n as natural numbers, you have an induction hypothesis ih, and you must prove the equation shown after the turnstile ⊢. Without this feedback, tactic proving would be like navigating a maze blindfolded.
Running Single Files
Not every experiment needs a full project. For quick tests, you can run a single Lean file without creating a Lake project. Create a file like hello.lean:
#eval "Hello, world!"
Run it with:
lake env lean hello.lean
Each #eval command in the file prints its result. You do not need a main function or IO monad for simple output. This is the quickest path from idea to result.
For interactive development, the VS Code Infoview shows #eval results as you type, without running any commands. Place your cursor after an #eval line and the result appears in the panel. This feedback loop is often faster than switching to a terminal.
When your experiment grows beyond a single file, or when you need dependencies, create a proper Lake project. But for exploring syntax, testing small functions, or working through exercises, single files work well.
Evaluation Commands
Lean provides several commands that evaluate expressions and report results directly in the editor. These are invaluable for exploration and debugging.
The simplest demonstration:
#eval "Hello, world!" -- Hello, world!
#check displays the type of an expression:
#check 1 + 1 -- 1 + 1 : Nat
#check [1, 2, 3] -- [1, 2, 3] : List Nat
#check fun x => x -- fun x => x : ?m.1 → ?m.1
#eval evaluates an expression and shows its value:
#eval 2 + 2 -- 4
#eval [1, 2, 3].length -- 3
#eval "hello".toUpper -- "HELLO"
#print shows the definition of a constant, including theorems:
#print Nat.add_comm
-- theorem Nat.add_comm : ∀ (n m : Nat), n + m = m + n := ...
#reduce fully reduces an expression using definitional equality:
#reduce (fun x => x + 1) 5 -- 6
These commands appear as blue underlines in VS Code. Hover over them or check the Infoview to see results. They let you test ideas immediately without writing a full program or proof.
Reading Error Messages
Lean’s error messages are verbose but precise. They tell you exactly what went wrong, which is both a blessing and a curse for newcomers who may find the detail overwhelming.
A type mismatch error shows what was expected and what was provided:
type mismatch
h
has type
P
but is expected to have type
Q
This means you tried to use a proof of P where a proof of Q was needed. Look at the goal state to understand what type you actually need.
An unknown identifier error means a name is not in scope:
unknown identifier 'foo'
Check for typos, missing imports, or hypotheses you forgot to introduce.
An unsolved goals error at the end of a proof means you have not proved everything:
unsolved goals
⊢ P ∧ Q
Your proof is incomplete. Look at what remains and continue.
The habit of reading error messages carefully, rather than guessing at fixes, will save hours of confusion. Lean is trying to help; let it.
Mathlib Projects
Projects depending on Mathlib benefit from additional tooling. The cache executable bundled with Mathlib downloads prebuilt artifacts, avoiding the need to compile Mathlib yourself. After adding Mathlib as a dependency, running lake exe cache get fetches compiled files for your Lean version.
Mathlib projects often use a template that includes recommended configuration. The template sets up the toolchain file, lakefile, and auxiliary files for continuous integration and documentation generation. Starting from this template ensures compatibility with Mathlib’s infrastructure.
Because Mathlib updates frequently, projects must balance using new features against the cost of keeping up with changes. Pinning to specific Mathlib versions provides stability, while tracking recent versions provides access to new material. The Mathlib changelog documents breaking changes to help with updates.
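As a sketch, a minimal lakefile.lean that pins Mathlib might look like the following (the package name and the v4.26.0 tag are illustrative; match the tag to the version in your lean-toolchain file):
import Lake
open Lake DSL

package «my_project»

-- Pinning to a release tag keeps builds reproducible
require mathlib from git
  "https://github.com/leanprover-community/mathlib4.git" @ "v4.26.0"

lean_lib «MyProject»
After adding or changing the dependency, run lake update once to resolve it, then lake exe cache get to fetch the prebuilt artifacts.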
Command Reference
| Command | Description | Example |
|---|---|---|
| lake new | Create a new project | lake new myproject |
| lake init | Initialize project in current directory | lake init myproject |
| lake build | Build default targets | lake build |
| lake build <target> | Build specific target | lake build MyLib |
| lake clean | Remove build artifacts | lake clean |
| lake update | Update dependencies to latest versions | lake update |
| lake exe <name> | Run an executable | lake exe myapp --flag |
| lake env | Print environment variables | lake env |
| lake script run | Run a lakefile script | lake script run test |
| lake test | Run project tests | lake test |
| lake exe cache get | Download Mathlib cache | lake exe cache get |
| elan show | Show installed toolchains | elan show |
| elan update | Update all toolchains | elan update |
| elan default | Set default toolchain | elan default leanprover/lean4:stable |
| elan override | Set directory-specific toolchain | elan override set leanprover/lean4:v4.3.0 |
Compiler Backend and Runtime
Lean 4 compiles to C code, which is then compiled to native executables using a system C compiler (typically Clang or GCC). This compilation pipeline differs from most theorem provers, which either interpret code or extract to another language like OCaml or Haskell. The choice to target C provides portability and enables linking with existing C libraries.
The compilation process involves several stages. Lean source code is first type-checked and elaborated into the Lean kernel language. Proof terms are then erased since they have no computational content: proofs exist to satisfy the type checker, and once verified, they are deleted before the program runs. The remaining code is converted to an intermediate representation that resembles a simplified functional language. This intermediate form is then translated to C code that Lake compiles with your system’s C compiler. We explore the relationship between programming and proving in detail in Proofs.
Lean’s runtime uses reference counting rather than tracing garbage collection. Each heap-allocated object maintains a count of references to it. When the count drops to zero, the object is immediately freed. This approach has lower latency than tracing collectors since there are no garbage collection pauses. The Counting Immutable Beans paper describes the design in detail.
Reference counting enables a technique the Lean developers call Functional But In-Place. When you perform a functional update on a data structure and the original has a reference count of one, the runtime can reuse the memory in place rather than allocating new storage. This means that pure functional code operating on unshared data achieves performance comparable to imperative mutation. The Array type in Lean exploits this property: appending to an unshared array mutates it in place despite the pure functional semantics.
The runtime is strict, not lazy like Haskell. All function arguments are evaluated before the function body executes. This makes performance more predictable but requires different idioms for infinite data structures or expensive computations that might not be needed. Lean provides explicit thunks via the Thunk type when lazy evaluation is required.
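A small sketch of explicit laziness with Thunk (the names expensiveSum and useIfNeeded are made up for illustration):
def expensiveSum : Thunk Nat :=
  Thunk.mk fun _ => (List.range 1000).foldl (· + ·) 0

def useIfNeeded (b : Bool) (t : Thunk Nat) : Nat :=
  if b then t.get else 0  -- t.get forces the computation; otherwise it never runs

#eval useIfNeeded true expensiveSum -- 499500
#eval useIfNeeded false expensiveSum -- 0 (the sum is never computed)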
Caution
The ecosystem lacks mature libraries for common tasks like HTTP clients, database connectors, encryption, and async I/O. While the Axiomed project is building HTTP support and the community has created socket bindings, these are far less polished than equivalents in established languages. Linking against system libraries requires out-of-band setup that Lake cannot manage portably across operating systems. Parallelism is supported in the form of cooperative scheduling on multiple threads.
Binary sizes tend to be large because the generated executable links in the Lean runtime, and any Mathlib dependencies add substantially to it. Compile times for projects depending on Mathlib can be lengthy, though the cache system mitigates this for incremental builds. The compiler itself is under active development, with the Year 3 Roadmap promising improvements to code generation, smaller binaries, and better reference counting.
For production systems, Lean is best suited as a specification and verification tool rather than as the implementation language. A practical pattern is to write formal specifications in Lean, prove properties about algorithms, then implement the actual system in a production language while using Lean-generated tests or runtime checks to verify the implementation matches the specification. Alternatively, Lean excels for tools where correctness matters more than ecosystem maturity: proof automation, code generators, domain-specific languages, and programs where the type system’s expressiveness justifies the ecosystem tradeoffs.
Data Structures
Programs are nothing without data to manipulate. Here we cover the types that hold your data: from simple primitives like booleans and characters to collections like lists and arrays to user-defined structures and inductive types. By the end, you will have the vocabulary to represent any data your program needs.
Unit
The Unit type has exactly one value: (). It serves as a placeholder when a function has no meaningful return value, similar to void in C except that void is a lie and Unit is honest about being boring. Every function can return Unit because there is only one possible value to return.
-- Unit has exactly one value: ()
def nothing : Unit := ()
-- Often used for side-effecting functions
def printAndReturn : IO Unit := do
IO.println "Side effect!"
return ()
-- Unit in function types indicates "no meaningful return value"
def greetIO (name : String) : IO Unit :=
IO.println s!"Hello, {name}!"
Empty
The Empty type has no values at all. You can write a function from Empty to anything because you will never have to actually produce an output; there are no inputs to handle. Empty represents logical impossibility and marks unreachable code branches. If you somehow obtain a value of type Empty, you can derive anything from it, a principle the medievals called ex falso quodlibet: from falsehood, anything follows.
-- Empty has no values at all
-- It represents impossibility or unreachable code
-- If you have a value of type Empty, you can prove anything
def absurd' {α : Type} (e : Empty) : α :=
Empty.elim e
-- Empty is useful for marking impossible cases
inductive Void where -- Custom empty type (equivalent to Empty)
Note
For those with category theory background:
Unit is the terminal object (for any type \(A\), there exists exactly one function \(A \to \text{Unit}\)), and Empty is the initial object (for any type \(A\), there exists exactly one function \(\text{Empty} \to A\)). In logic, these correspond to \(\top\) (true) and \(\bot\) (false). You do not need this perspective to use these types effectively. The Proofs and Type Theory articles explain the deeper connections, including the Curry-Howard correspondence that links types to logic.
Booleans
Booleans represent truth values and form the basis of conditional logic. George Boole would be pleased, though he might find it curious that his algebra of logic became the foundation for arguments about whether 0 or 1 should represent truth.
-- Booleans: true and false
def myBool : Bool := true
def myFalse : Bool := false
-- Boolean operations
#eval true && false -- false (and)
#eval true || false -- true (or)
#eval !true -- false (not)
#eval true ^^ false -- true (xor)
-- Conditionals
def absInt (x : Int) : Int :=
if x < 0 then -x else x
#eval absInt (-5) -- 5
-- Boolean decision
#eval if true then "yes" else "no" -- "yes"
#eval if false then "yes" else "no" -- "no"
Option
The Option type represents values that may or may not exist. It is Lean’s safe alternative to null references, which Tony Hoare famously called his “billion dollar mistake.” With Option, absence is explicit in the type: you cannot forget to check because the compiler will not let you. The hollow log either contains honey or it does not, and you must handle both cases.
-- Option represents a value that may or may not exist
def someValue : Option Nat := some 42
def noValue : Option Nat := none
-- Pattern matching on Option
def getOrDefault (opt : Option Nat) (default : Nat) : Nat :=
match opt with
| some x => x
| none => default
#eval getOrDefault (some 10) 0 -- 10
#eval getOrDefault none 0 -- 0
-- Option combinators
#eval (some 5).map (· * 2) -- some 10
#eval (none : Option Nat).map (· * 2) -- none
#eval (some 5).getD 0 -- 5
#eval (none : Option Nat).getD 0 -- 0
#eval (some 5).isSome -- true
#eval (some 5).isNone -- false
-- Chaining Options
#eval (some 5).bind (fun x => some (x + 1)) -- some 6
Chars
Characters in Lean are Unicode scalar values, capable of representing any character from any human language, mathematical symbols, and bears.
-- Characters are Unicode scalar values
def letterA : Char := 'A'
def digit : Char := '7'
def unicode : Char := 'λ'
def bear : Char := '🐻'
-- Character properties
#eval 'A'.isAlpha -- true
#eval '7'.isDigit -- true
#eval ' '.isWhitespace -- true
#eval 'a'.isLower -- true
#eval 'A'.isUpper -- true
-- Character to/from Nat
#eval 'A'.toNat -- 65
#eval Char.ofNat 65 -- 'A'
-- Case conversion
#eval 'a'.toUpper -- 'A'
#eval 'Z'.toLower -- 'z'
Strings
Strings are sequences of characters with a rich set of operations for text processing. They are UTF-8 encoded, which means you have already won half the battle that consumed the first decade of web development.
-- Strings are sequences of characters
def greeting : String := "Hello, Lean!"
def multiline : String := "Line 1\nLine 2\nLine 3"
-- String operations
#eval "Hello".length -- 5
#eval "Hello".append " World" -- "Hello World"
#eval "Hello" ++ " " ++ "World" -- "Hello World"
#eval "Hello".toList -- ['H', 'e', 'l', 'l', 'o']
-- String interpolation
def shipName := "Mistake Not My Current State Of Alarm"
def shipClass := "GCU"
#eval s!"The {shipClass} {shipName} has entered the system."
-- Substring operations
#eval "Hello World".take 5 -- "Hello"
#eval "Hello World".drop 6 -- "World"
#eval "Hello".isPrefixOf "Hello World" -- true
-- Splitting and joining
#eval "a,b,c".splitOn "," -- ["a", "b", "c"]
#eval ",".intercalate ["a", "b", "c"] -- "a,b,c"
Fixed-Precision Integers
For performance-critical code or when interfacing with external systems, Lean provides fixed-precision integers that map directly to machine types.
-- Fixed-precision unsigned integers
def byte : UInt8 := 255
def word : UInt16 := 65535
def dword : UInt32 := 0xDEADBEEF
def qword : UInt64 := 0xCAFEBABE12345678
-- Overflow wraps around
#eval (255 : UInt8) + 1 -- 0
-- Size type for platform-dependent sizing
def platformSize : USize := 42
-- Signed fixed-precision integers
def signedByte : Int8 := -128
def signedWord : Int16 := -32768
Floats
Lean supports IEEE 754 double-precision floating-point numbers for scientific computing and applications that require real number approximations.
-- IEEE 754 double-precision floating-point
def myFloat : Float := 3.14159
def scientific : Float := 6.022e23
def negativeFloat : Float := -273.15
-- Floating-point arithmetic
#eval 3.14 + 2.86 -- 6.0
#eval 10.0 / 3.0 -- 3.333...
#eval Float.sqrt 2.0 -- 1.414...
#eval Float.sin 0.0 -- 0.0
#eval Float.cos 0.0 -- 1.0
-- Special values
#eval (1.0 / 0.0 : Float) -- inf
#eval (0.0 / 0.0 : Float) -- nan
Tuples
Tuples combine values of potentially different types into a single value. They are the basic building block for returning multiple values from functions.
-- Tuples combine values of different types
def pair : Nat × String := (42, "answer")
def triple : Nat × String × Bool := (1, "one", true)
-- Accessing tuple elements
#eval pair.1 -- 42
#eval pair.2 -- "answer"
#eval pair.fst -- 42
#eval pair.snd -- "answer"
-- Pattern matching on tuples
def swap {α β : Type} (p : α × β) : β × α :=
let (a, b) := p
(b, a)
#eval swap (1, "hello") -- ("hello", 1)
-- Nested tuples
def nested : (Nat × Nat) × String := ((1, 2), "pair")
#eval nested.1.1 -- 1
#eval nested.1.2 -- 2
Sum Types
Sum types represent a choice between two alternatives. The Except variant is commonly used for error handling.
-- Sum types represent a choice between types
def leftValue : Nat ⊕ String := Sum.inl 42
def rightValue : Nat ⊕ String := Sum.inr "hello"
-- Pattern matching on Sum
def describeSum (s : Nat ⊕ String) : String :=
match s with
| Sum.inl n => s!"A number: {n}"
| Sum.inr str => s!"A string: {str}"
#eval describeSum leftValue -- "A number: 42"
#eval describeSum rightValue -- "A string: hello"
-- Except is like Sum but for error handling
def divideExcept (x y : Nat) : Except String Nat :=
if y == 0 then
Except.error "Division by zero"
else
Except.ok (x / y)
#eval divideExcept 10 2 -- Except.ok 5
#eval divideExcept 10 0 -- Except.error "Division by zero"
Lists
Lists are singly-linked sequences of elements, the workhorse data structure of functional programming since LISP introduced them in 1958. They support pattern matching and have a rich set of higher-order operations. Prepending is \(O(1)\); appending is \(O(n)\). If this bothers you, wait until you meet Arrays.
-- Linked lists
def myList : List Nat := [1, 2, 3, 4, 5]
def emptyList : List Nat := []
-- List construction
def consExample := 0 :: [1, 2, 3] -- [0, 1, 2, 3]
def appendExample := [1, 2] ++ [3, 4] -- [1, 2, 3, 4]
-- Common operations
#eval [1, 2, 3].length -- 3
#eval [1, 2, 3].head? -- some 1
#eval [1, 2, 3].tail? -- some [2, 3]
#eval [1, 2, 3][1]? -- some 2
#eval [1, 2, 3].reverse -- [3, 2, 1]
-- Higher-order functions
#eval [1, 2, 3].map (· * 2) -- [2, 4, 6]
#eval [1, 2, 3, 4].filter (· > 2) -- [3, 4]
#eval [1, 2, 3, 4].foldl (· + ·) 0 -- 10
-- Cartesian product via flatMap
#eval [1, 2].flatMap (fun x => [10, 20].map (fun y => x + y)) -- [11, 21, 12, 22]
Arrays
Arrays provide \(O(1)\) random access and are the preferred choice when you need indexed access without the pointer-chasing of linked lists. Thanks to Lean’s reference counting, operations on unshared arrays mutate in place, giving you the performance of imperative code with the semantics of pure functions. Purity without the performance penalty. The trick is that “unshared” does a lot of work in that sentence.
-- Arrays provide O(1) random access
def myArray : Array Nat := #[1, 2, 3, 4, 5]
def emptyArray : Array Nat := #[]
-- Array operations
#eval #[1, 2, 3].size -- 3
#eval #[1, 2, 3][0]! -- 1 (panics if out of bounds)
#eval #[1, 2, 3][1]? -- some 2
#eval #[1, 2, 3][10]? -- none
-- Modifying arrays (creates new array)
#eval #[1, 2, 3].push 4 -- #[1, 2, 3, 4]
#eval #[1, 2, 3].pop -- #[1, 2]
#eval #[1, 2, 3].set! 1 99 -- #[1, 99, 3]
-- Conversion
#eval #[1, 2, 3].toList -- [1, 2, 3]
#eval [1, 2, 3].toArray -- #[1, 2, 3]
-- Higher-order functions
#eval #[1, 2, 3].map (· * 2) -- #[2, 4, 6]
#eval #[1, 2, 3, 4].filter (· % 2 == 0) -- #[2, 4]
ByteArrays
ByteArrays are efficient arrays of bytes, useful for binary data, file I/O, and network protocols.
-- ByteArray is an efficient array of bytes
def bytes : ByteArray := ByteArray.mk #[0x48, 0x65, 0x6C, 0x6C, 0x6F]
-- Operations
#eval bytes.size -- 5
#eval bytes.get! 0 -- 72 (0x48 = 'H')
#eval bytes.toList -- [72, 101, 108, 108, 111]
-- Convert to/from String (UTF-8)
#eval "Hello".toUTF8 -- ByteArray from string
Bitvectors
Bitvectors represent fixed-width binary data and support bitwise operations. They are essential for low-level programming, cryptography, and hardware verification.
-- BitVec n is an n-bit vector
def bits8 : BitVec 8 := 0xFF
def bits16 : BitVec 16 := 0xABCD
def bits32 : BitVec 32 := 0xDEADBEEF
-- Bitwise operations
#eval (0b1100 : BitVec 4) &&& 0b1010 -- 0b1000 (AND)
#eval (0b1100 : BitVec 4) ||| 0b1010 -- 0b1110 (OR)
#eval (0b1100 : BitVec 4) ^^^ 0b1010 -- 0b0110 (XOR)
#eval ~~~(0b1100 : BitVec 4) -- 0b0011 (NOT)
-- Shifts
#eval (0b0001 : BitVec 4) <<< 2 -- 0b0100 (left shift)
#eval (0b1000 : BitVec 4) >>> 2 -- 0b0010 (right shift)
Maps and Sets
Hash maps and hash sets provide efficient key-value storage and membership testing.
-- HashMap for key-value storage
open Std in
def myMap : HashMap String Nat :=
HashMap.emptyWithCapacity
|>.insert "one" 1
|>.insert "two" 2
|>.insert "three" 3
#eval myMap.get? "two" -- some 2
#eval myMap.get? "four" -- none
#eval myMap.contains "one" -- true
#eval myMap.size -- 3
-- HashSet for unique elements
open Std in
def mySet : HashSet Nat :=
HashSet.emptyWithCapacity
|>.insert 1
|>.insert 2
|>.insert 3
|>.insert 2 -- duplicate ignored
#eval mySet.contains 2 -- true
#eval mySet.contains 5 -- false
#eval mySet.size -- 3
Structures
Structures are Lean’s way of grouping related data with named fields.
structure Point where
x : Float
y : Float
deriving Repr
def origin : Point := ⟨0.0, 0.0⟩
def myPoint : Point := { x := 3.0, y := 4.0 }
def distance (p : Point) : Float :=
Float.sqrt (p.x * p.x + p.y * p.y)
#eval distance myPoint -- Output: 5.0
Inductive Types
Inductive types allow you to define custom data types by specifying their constructors. An inductive type with no arguments on its constructors represents a finite set of values, like an enumeration. The SpellSchool type below can take exactly eight values (abjuration, conjuration, etc.), and the schoolDanger function assigns a danger rating to each via pattern matching.
When constructors take arguments, the type becomes recursive and can represent unbounded data. MyList α is parameterized by a type α and has two constructors: nil for the empty list and cons for prepending an element. The length function recurses through the structure, counting elements.
inductive SpellSchool where
| abjuration -- Protective magic
| conjuration -- Summoning things from elsewhere
| divination -- Knowing things you shouldn't
| enchantment -- Making friends (involuntarily)
| evocation -- Fireballs, obviously
| illusion -- Lying, but with magic
| necromancy -- Asking corpses for favors
| transmutation -- Turning lead into gold (or frogs)
deriving Repr, DecidableEq
def schoolDanger : SpellSchool → Nat
| .abjuration => 1
| .divination => 2
| .illusion => 3
| .transmutation => 4
| .enchantment => 5
| .conjuration => 6
| .evocation => 8 -- Fireballs in enclosed spaces
| .necromancy => 9 -- Ethical concerns
inductive MyList (α : Type) where
| nil : MyList α
| cons : α → MyList α → MyList α
def MyList.length {α : Type} : MyList α → Nat
| MyList.nil => 0
| MyList.cons _ tail => 1 + tail.length
-- Creating values and using pattern-matching functions
def ev : SpellSchool := SpellSchool.evocation
#eval schoolDanger ev -- 8
#eval schoolDanger .necromancy -- 9 (dot notation shorthand)
def myNumbers : MyList Nat := .cons 1 (.cons 2 (.cons 3 .nil))
#eval myNumbers.length -- 3
Subtypes
Subtypes refine an existing type with a predicate. The value carries both the data and a proof that the predicate holds. This is where dependent types begin to flex: instead of checking at runtime whether a number is positive, you encode positivity in the type itself. The predicate becomes part of the contract, enforced at compile time.
-- Subtypes refine a type with a predicate
def Positive := { n : Nat // n > 0 }
def five : Positive := ⟨5, by decide⟩
-- Accessing subtype values
#eval five.val -- 5 (the underlying Nat)
-- five.property is a proof that 5 > 0
-- Common subtypes
def NonEmptyList (α : Type) := { xs : List α // xs ≠ [] }
def exampleNEL : NonEmptyList Nat :=
⟨[1, 2, 3], by decide⟩
-- Safe head function for non-empty lists
def safeHead {α : Type} [Inhabited α] (nel : NonEmptyList α) : α :=
match nel.val with
| x :: _ => x
| [] => default -- unreachable due to property, but needed for totality
#eval safeHead exampleNEL -- 1
Fin
Fin n represents natural numbers strictly less than n. The type carries a proof that its value is in bounds, making it useful for safe array indexing.
-- Fin n is the type of natural numbers less than n
def smallNum : Fin 5 := 3 -- 3 is less than 5
def anotherSmall : Fin 10 := 7 -- 7 is less than 10
-- Fin values carry a proof that they're in bounds
#eval (smallNum : Fin 5).val -- 3 (the underlying Nat)
-- Useful for array indexing with guaranteed bounds
def safeIndex {α : Type} (arr : Array α) (i : Fin arr.size) : α :=
arr[i]
-- Fin arithmetic wraps around
#eval (3 : Fin 5) + 4 -- 2 (wraps: 7 mod 5 = 2)
Tip
Notice that Fin n bundles a value with a proof about that value. This pattern appears everywhere in Lean: types can contain proofs. This is not a special feature but a consequence of Lean occupying λC, the most expressive corner of the lambda cube, where types can depend on values. The Curry-Howard correspondence makes propositions into types and proofs into values.
If you are worried about runtime overhead: proofs are erased at compile time. The compiled code for Fin n carries only the natural number, not the proof. Zero runtime cost. This is proof irrelevance in action: the type checker verifies the proof exists, then discards it. You get compile-time assurance with no runtime penalty.
Type Classes
Type classes provide a way to define generic interfaces that can be implemented for different types.
def showTwice {α : Type} [ToString α] (x : α) : String :=
s!"{x} {x}"
#eval showTwice 42 -- Output: "42 42"
#eval showTwice "hello" -- Output: "hello hello"
#eval showTwice true -- Output: "true true"
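You can also define your own classes. As a sketch, the Describable class and its instances below are illustrative, not part of the standard library:
class Describable (α : Type) where
  describe : α → String

instance : Describable Bool where
  describe b := if b then "affirmative" else "negative"

instance : Describable Nat where
  describe n := s!"the number {n}"

def announce {α : Type} [Describable α] (x : α) : String :=
  s!"Received: {Describable.describe x}"

#eval announce true -- "Received: affirmative"
#eval announce (42 : Nat) -- "Received: the number 42"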
Example: Magic The Gathering
We now have enough Lean to model something from the real world. Naturally, we choose Magic: The Gathering. The game has been proven Turing complete and as hard as the halting problem, so it makes a worthy adversary.
Mana System
Five colors of mana (white, blue, black, red, green) plus colorless. An inductive type captures the colors; a structure captures costs with default values of zero:
inductive ManaColor where
| white | blue | black | red | green | colorless
deriving Repr, DecidableEq
structure ManaCost where
white : Nat := 0
blue : Nat := 0
black : Nat := 0
red : Nat := 0
green : Nat := 0
colorless : Nat := 0
deriving Repr
def ManaCost.total (c : ManaCost) : Nat :=
c.white + c.blue + c.black + c.red + c.green + c.colorless
Players accumulate mana in a pool. The key question: can you afford a spell? The pay function returns Option ManaPool, giving back the remaining mana on success or none if you cannot afford the cost:
structure ManaPool where
white : Nat := 0
blue : Nat := 0
black : Nat := 0
red : Nat := 0
green : Nat := 0
colorless : Nat := 0
deriving Repr
def ManaPool.total (p : ManaPool) : Nat :=
p.white + p.blue + p.black + p.red + p.green + p.colorless
def ManaPool.canAfford (pool : ManaPool) (cost : ManaCost) : Bool :=
pool.white >= cost.white &&
pool.blue >= cost.blue &&
pool.black >= cost.black &&
pool.red >= cost.red &&
pool.green >= cost.green &&
pool.total >= cost.total
def ManaPool.pay (pool : ManaPool) (cost : ManaCost) : Option ManaPool :=
if pool.canAfford cost then
some { white := pool.white - cost.white
blue := pool.blue - cost.blue
black := pool.black - cost.black
red := pool.red - cost.red
green := pool.green - cost.green
colorless := pool.total - cost.total -
(pool.white - cost.white) - (pool.blue - cost.blue) -
(pool.black - cost.black) - (pool.red - cost.red) -
(pool.green - cost.green) }
else
none
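A quick sanity check with an illustrative pool (these values are not from any particular game state):
def examplePool : ManaPool := { red := 2, colorless := 1 }
#eval examplePool.canAfford { red := 1, colorless := 1 } -- true
#eval examplePool.canAfford { blue := 1 } -- false (no blue in the pool)
#eval (examplePool.pay { red := 1, colorless := 1 }).isSome -- true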
Card Types
Cards come in different types. Creatures have power and toughness; instants, sorceries, and artifacts do not. An inductive type with arguments captures this: the creature constructor carries two Nat values, while others carry nothing:
inductive CardType where
| creature (power : Nat) (toughness : Nat)
| instant
| sorcery
| enchantment
| artifact
deriving Repr
structure Card where
name : String
cost : ManaCost
cardType : CardType
deriving Repr
Some iconic cards to work with. Note how .creature 2 2 constructs a creature type inline, using the dot notation shorthand:
def goblinGuide : Card :=
{ name := "Goblin Guide"
cost := { red := 1 }
cardType := .creature 2 2 }
def searingSpear : Card :=
{ name := "Searing Spear"
cost := { red := 1, colorless := 1 }
cardType := .instant }
def dayOfJudgment : Card :=
{ name := "Day of Judgment"
cost := { white := 2, colorless := 2 }
cardType := .sorcery }
def swordOfFire : Card :=
{ name := "Sword of Fire and Ice"
cost := { colorless := 3 }
cardType := .artifact }
def graveTitan : Card :=
{ name := "Grave Titan"
cost := { black := 2, colorless := 4 }
cardType := .creature 6 6 }
Pattern matching extracts information from cards. Querying a non-creature for its power returns none:
def Card.isCreature (c : Card) : Bool :=
match c.cardType with
| .creature _ _ => true
| _ => false
def Card.power (c : Card) : Option Nat :=
match c.cardType with
| .creature p _ => some p
| _ => none
def Card.toughness (c : Card) : Option Nat :=
match c.cardType with
| .creature _ t => some t
| _ => none
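A few spot checks against the cards defined above:
#eval goblinGuide.isCreature -- true
#eval searingSpear.power -- none (instants have no power)
#eval graveTitan.power -- some 6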
Combat
Creatures on the battlefield can attack and block. A Creature structure tracks accumulated damage. When damage meets or exceeds toughness, the creature dies:
structure Creature where
name : String
power : Nat
toughness : Nat
damage : Nat := 0
deriving Repr
def Creature.fromCard (c : Card) : Option Creature :=
match c.cardType with
| .creature p t => some { name := c.name, power := p, toughness := t }
| _ => none
def Creature.isAlive (c : Creature) : Bool :=
c.damage < c.toughness
def Creature.takeDamage (c : Creature) (dmg : Nat) : Creature :=
{ c with damage := c.damage + dmg }
def Creature.canBlock (blocker attacker : Creature) : Bool :=
blocker.isAlive && attacker.isAlive
def combat (attacker blocker : Creature) : Creature × Creature :=
let attackerAfter := attacker.takeDamage blocker.power
let blockerAfter := blocker.takeDamage attacker.power
(attackerAfter, blockerAfter)
Combat is simultaneous: both creatures deal damage at once. Two 2/2 creatures blocking each other results in mutual destruction.
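A sketch of that scenario, with two hypothetical 2/2 creatures:
def bearA : Creature := { name := "Bear A", power := 2, toughness := 2 }
def bearB : Creature := { name := "Bear B", power := 2, toughness := 2 }
#eval (combat bearA bearB).1.isAlive -- false
#eval (combat bearA bearB).2.isAlive -- false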
Hand Management
A hand is a list of cards. List operations let us query it: which cards can we cast with our current mana? How many creatures do we have? What is the total mana cost?
abbrev Hand := List Card
def Hand.playable (hand : Hand) (pool : ManaPool) : List Card :=
hand.filter (fun c => pool.canAfford c.cost)
def Hand.creatures (hand : Hand) : List Card :=
hand.filter Card.isCreature
def Hand.totalCost (hand : Hand) : Nat :=
hand.foldl (fun acc c => acc + c.cost.total) 0
The filter function takes a predicate and keeps matching elements. The foldl function reduces a list to a single value. These are the workhorses of functional programming, and they compose naturally: hand.filter Card.isCreature gives all creatures, hand.playable pool gives everything castable.
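For example, with a small illustrative hand and pool:
def sampleHand : Hand := [goblinGuide, searingSpear, graveTitan]
def samplePool : ManaPool := { red := 1, colorless := 1 }
#eval (sampleHand.playable samplePool).map (·.name) -- ["Goblin Guide", "Searing Spear"]
#eval sampleHand.creatures.map (·.name) -- ["Goblin Guide", "Grave Titan"]
#eval sampleHand.totalCost -- 9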
Tip
Run from the repository: lake exe mtg. The full source is on GitHub.
Beyond the Basics
The data structures covered here handle most everyday programming. For specialized needs, Mathlib and Batteries provide a comprehensive collection. The Mathlib overview catalogs everything available.
- List-like structures
- Sets
- Maps
- Trees
From Data to Control
You now have the building blocks for representing any data your program needs. Next we cover how to work with that data: control flow, recursion, and the patterns that give programs their structure.
Control Flow and Structures
Most languages let you lie to the compiler. Lean does not. There are no statements, only expressions. Every if produces a value. Every match is exhaustive or the compiler complains. Every recursive function must terminate or you must convince the system otherwise. Where imperative languages let you wave your hands, Lean demands you show your work.
Conditionals
Lean’s if-then-else is an expression, not a statement. Every branch must produce a value of the same type, and the entire expression evaluates to that value. There is no void, no implicit fall-through, no forgetting to return. The ternary operator that other languages treat as a curiosity is simply how conditionals work here.
-- If-then-else expressions
def absolute (x : Int) : Int :=
if x < 0 then -x else x
#eval absolute (-5) -- 5
#eval absolute 3 -- 3
-- Nested conditionals
def classifyNumber (n : Int) : String :=
if n < 0 then "negative"
else if n == 0 then "zero"
else "positive"
#eval classifyNumber (-10) -- "negative"
#eval classifyNumber 0 -- "zero"
#eval classifyNumber 42 -- "positive"
-- Conditionals are expressions, not statements
def minValue (a b : Nat) : Nat :=
if a < b then a else b
#eval minValue 3 7 -- 3
Pattern Matching
Pattern matching with match expressions lets you destructure data and handle cases exhaustively. The compiler verifies you have covered every possibility, which eliminates an entire category of bugs that haunt languages where the default case is “do nothing and hope for the best.” You can match on constructors, literals, and multiple values simultaneously.
-- Pattern matching with match
def describeList {α : Type} (xs : List α) : String :=
match xs with
| [] => "empty"
| [_] => "singleton"
| [_, _] => "pair"
| _ => "many elements"
#eval describeList ([] : List Nat) -- "empty"
#eval describeList [1] -- "singleton"
#eval describeList [1, 2] -- "pair"
#eval describeList [1, 2, 3, 4] -- "many elements"
-- Matching on multiple values
def fizzbuzz (n : Nat) : String :=
match n % 3, n % 5 with
| 0, 0 => "FizzBuzz"
| 0, _ => "Fizz"
| _, 0 => "Buzz"
| _, _ => toString n
#eval (List.range 16).map fizzbuzz
-- ["FizzBuzz", "1", "2", "Fizz", "4", "Buzz", ...]
-- Guards in pattern matching
def classifyAge (age : Nat) : String :=
match age with
| 0 => "infant"
| n => if n < 13 then "child"
else if n < 20 then "teenager"
else "adult"
#eval classifyAge 5 -- "child"
#eval classifyAge 15 -- "teenager"
#eval classifyAge 30 -- "adult"
Simple Recursion
Recursive functions are fundamental to functional programming. A function processing a list works through elements one by one, patient and systematic, eventually reaching the empty case and returning upstream with its catch. In Lean, functions that call themselves must be shown to terminate. For simple structural recursion on inductive types like Nat or List, Lean can automatically verify termination.
-- Simple recursion on natural numbers
def factorial : Nat → Nat
| 0 => 1
| n + 1 => (n + 1) * factorial n
#eval factorial 5 -- 120
#eval factorial 10 -- 3628800
-- Recursion on lists
def sum : List Nat → Nat
| [] => 0
| x :: xs => x + sum xs
#eval sum [1, 2, 3, 4, 5] -- 15
-- Computing the length of a list
def length {α : Type} : List α → Nat
| [] => 0
| _ :: xs => 1 + length xs
#eval length [1, 2, 3] -- 3
Tail Recursion
Naive recursion can overflow the stack for large inputs because each recursive call adds a frame. Tail recursion solves this by restructuring the computation so the recursive call is the last operation, allowing the compiler to optimize it into a loop. Scheme mandated tail call optimization in 1975. Most other languages did not, which is why stack traces exist.
-- Naive recursion can overflow the stack for large inputs
-- Tail recursion uses an accumulator to avoid this
-- Tail-recursive factorial
def factorialTR (n : Nat) : Nat :=
let rec go (acc : Nat) : Nat → Nat
| 0 => acc
| k + 1 => go (acc * (k + 1)) k
go 1 n
#eval factorialTR 20 -- 2432902008176640000
-- Tail-recursive sum
def sumTR (xs : List Nat) : Nat :=
let rec go (acc : Nat) : List Nat → Nat
| [] => acc
| x :: rest => go (acc + x) rest
go 0 xs
#eval sumTR (List.range 1000) -- Sum of 0..999
-- Tail-recursive reverse
def reverseTR {α : Type} (xs : List α) : List α :=
let rec go (acc : List α) : List α → List α
| [] => acc
| x :: rest => go (x :: acc) rest
go [] xs
#eval reverseTR [1, 2, 3, 4, 5] -- [5, 4, 3, 2, 1]
Structural Recursion
Lean requires all recursive functions to terminate, which prevents you from accidentally writing infinite loops and passing them off as proofs. For simple cases where the recursive argument structurally decreases, Lean verifies termination automatically. For more complex cases, you must provide termination hints, essentially explaining to the compiler why your clever recursion scheme actually finishes. The termination checker is not easily fooled.
-- Lean requires recursion to be well-founded
-- Structural recursion on decreasing arguments is automatic
-- Merge two sorted lists into one sorted list
def merge : List Nat → List Nat → List Nat
| [], ys => ys
| xs, [] => xs
| x :: xs, y :: ys =>
if x ≤ y then x :: merge xs (y :: ys)
else y :: merge (x :: xs) ys
-- Full merge sort: split at midpoint, recurse, merge
def mergeSort (xs : List Nat) : List Nat :=
if h : xs.length < 2 then xs
else
let mid := xs.length / 2
let left := xs.take mid
let right := xs.drop mid
have hl : left.length < xs.length := by
have h1 : mid < xs.length := Nat.div_lt_self (by omega) (by omega)
have h2 : left.length ≤ mid := List.length_take_le mid xs
omega
have hr : right.length < xs.length := by
simp only [List.length_drop, right, mid]
have : mid > 0 := Nat.div_pos (by omega) (by omega)
omega
merge (mergeSort left) (mergeSort right)
termination_by xs.length
#eval mergeSort [3, 1, 4, 1, 5, 9, 2, 6] -- [1, 1, 2, 3, 4, 5, 6, 9]
The merge function is structurally recursive: each call operates on a smaller list. The mergeSort function is trickier. It splits the list at the midpoint and recurses on both halves. Lean cannot immediately see that take and drop produce shorter lists, so we provide have clauses that prove the lengths decrease. The termination_by xs.length annotation tells Lean to measure termination by list length rather than structural decrease.
Escape Hatches: partial and do
Sometimes you just want to compute something. The termination checker is a feature, not a prison. When proving termination would require more ceremony than the problem warrants, Lean provides escape hatches.
The partial keyword marks a function that might not terminate. Lean skips the termination proof and trusts you. The tradeoff: partial functions cannot be used in proofs since a non-terminating function could “prove” anything. For computation, this is often acceptable.
-- When termination is hard to prove, partial lets you skip the proof
partial def findFixpoint (f : Nat → Nat) (x : Nat) : Nat :=
let y := f x
if y == x then x else findFixpoint f y
#eval findFixpoint (· / 2 + 1) 100 -- 2
-- Sum even Fibonacci numbers below a bound (Project Euler #2)
partial def sumEvenFibsBelow (bound : Nat) : Nat := Id.run do
let mut a := 0
let mut b := 1
let mut sum := 0
while b < bound do
if b % 2 == 0 then
sum := sum + b
let next := a + b
a := b
b := next
return sum
#eval sumEvenFibsBelow 4000000 -- 4613732
The second example uses Id.run do to write imperative-looking code in a pure context. The Id monad is the identity monad, and Id.run extracts the final value. The mut keyword introduces mutable bindings; := reassigns them. Lean compiles this into pure functional operations. The resulting code is referentially transparent, but the syntax is familiar to programmers from imperative backgrounds.
This style shines for algorithms where the functional version would be contorted. Consider Project Euler Problem 2: sum even Fibonacci numbers below four million. The imperative version is direct. The pure functional version would thread accumulators through recursive calls, which is correct but harder to read.
Use partial when exploring, prototyping, or when the termination argument would distract from the actual logic. When you need to prove properties about the function, you will need to establish termination. But not everything needs to be a theorem. Sometimes you just want an answer.
Unless
The unless keyword is syntactic sugar for if not. When you find yourself writing if !condition then ..., the negation can obscure intent. With unless, the code reads closer to how you would describe it in English: “unless this is valid, bail out early.”
def validatePositive (n : Int) : IO (Option Int) := do
unless n > 0 do
return none
return some n
#eval validatePositive 5 -- some 5
#eval validatePositive (-3) -- none
def processIfValid (values : List Int) : IO Unit := do
for v in values do
unless v >= 0 do
continue
IO.println s!"Processing: {v}"
The unless keyword works in do blocks as a guard clause. Combined with continue, it provides a clean way to skip iterations that fail some condition without nesting the rest of the loop body inside an if.
For Loops
For loops iterate over anything with a ForIn instance. Lists, arrays, ranges, and custom types all work uniformly. The syntax for x in collection do binds x to each element in turn.
def sumList (xs : List Nat) : Nat := Id.run do
let mut total := 0
for x in xs do
total := total + x
return total
#eval sumList [1, 2, 3, 4, 5] -- 15
def sumRange (n : Nat) : Nat := Id.run do
let mut total := 0
for i in [0:n] do
total := total + i
return total
#eval sumRange 10 -- 45
def sumEvens (n : Nat) : Nat := Id.run do
let mut total := 0
for i in [0:n:2] do
total := total + i
return total
#eval sumEvens 10 -- 20
def findMax (arr : Array Int) : Option Int := Id.run do
if arr.isEmpty then return none
let mut maxVal := arr[0]!
for x in arr do
if x > maxVal then maxVal := x
return some maxVal
#eval findMax #[3, 1, 4, 1, 5, 9, 2, 6] -- some 9
The range syntax [start:stop] generates numbers from start up to but not including stop. Add a third component [start:stop:step] to control the increment. Like Python ranges, these are exclusive on the right. Unlike C loops, there is no opportunity to mess up the bounds.
While Loops
While loops repeat until their condition becomes false. They work within do blocks using Id.run do for pure computation or directly in IO for effectful operations.
def countdownFrom (n : Nat) : List Nat := Id.run do
let mut result : List Nat := []
let mut i := n
while i > 0 do
result := i :: result
i := i - 1
return result.reverse
#eval countdownFrom 5 -- [5, 4, 3, 2, 1]
def gcd (a b : Nat) : Nat := Id.run do
let mut x := a
let mut y := b
while y != 0 do
let temp := y
y := x % y
x := temp
return x
#eval gcd 48 18 -- 6
#eval gcd 17 13 -- 1
The while true do pattern with early return handles cases where the exit condition is easier to express as “stop when” rather than “continue while.” The GCD example uses the standard Euclidean algorithm, which terminates because the remainder strictly decreases.
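A small sketch of that pattern (the function name is made up for illustration):
def firstPowerOfTwoAbove (n : Nat) : Nat := Id.run do
  let mut p := 1
  while true do
    if p > n then return p
    p := p * 2
  return p
#eval firstPowerOfTwoAbove 10 -- 16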
Break and Continue
Lean’s continue skips to the next iteration; early return serves as break by exiting the entire function. There is no dedicated break keyword because the do notation’s early return provides the same control flow with clearer semantics.
def findFirst (xs : List Nat) (pred : Nat → Bool) : Option Nat := Id.run do
for x in xs do
if pred x then return some x
return none
#eval findFirst [1, 2, 3, 4, 5] (· > 3) -- some 4
def sumPositives (xs : List Int) : Int := Id.run do
let mut total : Int := 0
for x in xs do
if x <= 0 then continue
total := total + x
return total
#eval sumPositives [1, -2, 3, -4, 5] -- 9
def findInMatrix (m : List (List Nat)) (target : Nat) : Option (Nat × Nat) := Id.run do
let mut i := 0
for row in m do
let mut j := 0
for val in row do
if val == target then return some (i, j)
j := j + 1
i := i + 1
return none
#eval findInMatrix [[1,2,3], [4,5,6], [7,8,9]] 5 -- some (1, 1)
In nested loops, early return exits all the way out, which is usually what you want when searching. If you need to break only the inner loop while continuing the outer, restructure into separate functions.
Mutable State
The let mut syntax introduces mutable bindings within do blocks. Assignment uses :=, and the compiler tracks that mutations happen only within the do block’s scope. Under the hood, Lean transforms this into pure functional code by threading state through the computation.
def imperative_factorial (n : Nat) : Nat := Id.run do
let mut result := 1
let mut i := n
while i > 0 do
result := result * i
i := i - 1
return result
#eval imperative_factorial 5 -- 120
def fibPair (n : Nat) : Nat × Nat := Id.run do
let mut a := 0
let mut b := 1
for _ in [0:n] do
let temp := a + b
a := b
b := temp
return (a, b)
#eval fibPair 10 -- (55, 89)
def buildReversed {α : Type} (xs : List α) : List α := Id.run do
let mut result : List α := []
for x in xs do
result := x :: result
return result
#eval buildReversed [1, 2, 3, 4] -- [4, 3, 2, 1]
def demonstrate_assignment : Nat := Id.run do
let mut x := 10
x := x + 5
x := x * 2
return x
#eval demonstrate_assignment -- 30
The Id.run do wrapper extracts the pure result from what looks like imperative code. The Id monad is the identity monad: it adds no effects, just provides the syntax. This pattern shines when the algorithm is naturally stateful but the result is pure.
Structures
Structures group related data with named fields. If you have used records in ML, structs in Rust, or data classes in Kotlin, the concept is familiar. Unlike C structs, Lean structures come with automatically generated accessor functions, projection notation, and none of the memory layout anxiety.
-- Structures group related data with named fields
structure Point where
x : Float
y : Float
deriving Repr
-- Creating structure instances
def origin : Point := { x := 0.0, y := 0.0 }
def point1 : Point := Point.mk 3.0 4.0
def point2 : Point := ⟨1.0, 2.0⟩
-- Accessing fields
#eval point1.x -- 3.0
#eval point1.y -- 4.0
-- Functions on structures
def distance (p : Point) : Float :=
Float.sqrt (p.x * p.x + p.y * p.y)
#eval distance point1 -- 5.0
The deriving Repr clause automatically generates a Repr instance, which lets #eval display the structure’s contents. Without it, Lean would not know how to print a Point. Other commonly derived instances include BEq for equality comparison with ==, Hashable for use in hash maps, and DecidableEq for propositional equality that can be checked at runtime. You can derive multiple instances by listing them: deriving Repr, BEq, Hashable. The Polymorphism article covers this in more detail.
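As a sketch, a structure deriving several instances at once (GridCell is an illustrative name, not from the standard library):
structure GridCell where
  row : Nat
  col : Nat
  deriving Repr, BEq, Hashable

#eval ({ row := 1, col := 2 } : GridCell) == { row := 1, col := 2 } -- true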
Updating Structures
Rather than modifying structures in place, Lean provides the with syntax for creating new structures based on existing ones with some fields changed. This functional update pattern means you never have to wonder whether some other part of the code is holding a reference to your data and will be surprised by your mutations.
-- Updating structures with "with" syntax
def moveRight (p : Point) (dx : Float) : Point :=
{ p with x := p.x + dx }
def moveUp (p : Point) (dy : Float) : Point :=
{ p with y := p.y + dy }
#eval moveRight origin 5.0 -- { x := 5.0, y := 0.0 }
-- Multiple field updates
def translate (p : Point) (dx dy : Float) : Point :=
{ p with x := p.x + dx, y := p.y + dy }
#eval translate origin 3.0 4.0 -- { x := 3.0, y := 4.0 }
Nested Structures
Structures can contain other structures, allowing you to build complex data hierarchies. Lean’s projection notation makes accessing nested fields readable: person.address.city works as you would hope, without the verbose getter chains of enterprise Java.
-- Nested structures
structure Rectangle where
topLeft : Point
bottomRight : Point
deriving Repr
def myRect : Rectangle := {
topLeft := { x := 0.0, y := 10.0 },
bottomRight := { x := 10.0, y := 0.0 }
}
def width (r : Rectangle) : Float :=
r.bottomRight.x - r.topLeft.x
def height (r : Rectangle) : Float :=
r.topLeft.y - r.bottomRight.y
def area (r : Rectangle) : Float :=
width r * height r
#eval area myRect -- 100.0
Default Values
Structures can have default values for their fields, making it easy to create instances with sensible defaults while overriding only specific fields.
-- Structures with default values
structure Config where
host : String := "localhost"
port : Nat := 8080
debug : Bool := false
deriving Repr
-- Use defaults
def defaultConfig : Config := {}
-- Override some defaults
def prodConfig : Config := { host := "api.example.com", port := 443 }
#eval defaultConfig -- { host := "localhost", port := 8080, debug := false }
#eval prodConfig -- { host := "api.example.com", port := 443, debug := false }
Inductive Types: Enumerations
Inductive types let you define custom data types by specifying their constructors. This is the core mechanism that makes Lean’s type system expressive: natural numbers, lists, trees, and abstract syntax all emerge from the same primitive. Simple enumerations have constructors with no arguments; more complex types carry data in each variant. Like Starfleet’s ship classification system, each variant is distinct and the compiler ensures you handle them all.
-- Simple enumerations
inductive Direction where
| north
| south
| east
| west
deriving Repr, DecidableEq
def opposite : Direction → Direction
| .north => .south
| .south => .north
| .east => .west
| .west => .east
#eval opposite Direction.north -- Direction.south
-- Starfleet vessel classes
inductive StarshipClass where
| galaxy -- Galaxy-class (Enterprise-D)
| sovereign -- Sovereign-class (Enterprise-E)
| defiant -- Defiant-class (compact warship)
| intrepid -- Intrepid-class (Voyager)
| constitution -- Constitution-class (original Enterprise)
deriving Repr, DecidableEq
def crewComplement : StarshipClass → Nat
| .galaxy => 1014 -- Families welcome
| .sovereign => 855 -- More tactical
| .defiant => 50 -- Tough little ship
| .intrepid => 141 -- Long-range science
| .constitution => 430 -- The classic
#eval crewComplement StarshipClass.defiant -- 50
-- Enums with associated data (MTG spell types)
inductive Spell where
| creature (power : Nat) (toughness : Nat) (manaCost : Nat)
| instant (manaCost : Nat)
| sorcery (manaCost : Nat)
| enchantment (manaCost : Nat) (isAura : Bool)
deriving Repr
def manaCost : Spell → Nat
| .creature _ _ cost => cost
| .instant cost => cost
| .sorcery cost => cost
| .enchantment cost _ => cost
def canBlock : Spell → Bool
| .creature _ toughness _ => toughness > 0
| _ => false
#eval manaCost (Spell.creature 3 3 4) -- 4 (e.g., a 3/3 for 4 mana)
#eval manaCost (Spell.instant 2) -- 2 (e.g., Counterspell)
#eval canBlock (Spell.creature 2 1 1) -- true
#eval canBlock (Spell.enchantment 3 true) -- false
Recursive Inductive Types
Inductive types can be recursive, allowing you to define trees, linked lists, and other self-referential structures. This is where inductive types earn their name: you define larger values in terms of smaller ones, and the recursion has a base case that grounds the whole edifice.
-- Recursive inductive types
inductive BinaryTree (α : Type) where
| leaf : BinaryTree α
| node : α → BinaryTree α → BinaryTree α → BinaryTree α
deriving Repr
-- Building trees
def exampleTree : BinaryTree Nat :=
.node 1
(.node 2 .leaf .leaf)
(.node 3
(.node 4 .leaf .leaf)
.leaf)
-- Counting nodes
def BinaryTree.size {α : Type} : BinaryTree α → Nat
| .leaf => 0
| .node _ left right => 1 + left.size + right.size
#eval exampleTree.size -- 4
-- Computing depth
def BinaryTree.depth {α : Type} : BinaryTree α → Nat
| .leaf => 0
| .node _ left right => 1 + max left.depth right.depth
#eval exampleTree.depth -- 3
-- In-order traversal
def BinaryTree.inorder {α : Type} : BinaryTree α → List α
| .leaf => []
| .node x left right => left.inorder ++ [x] ++ right.inorder
#eval exampleTree.inorder -- [2, 1, 4, 3]
Reading constructor type signatures takes practice. The node constructor has type α → BinaryTree α → BinaryTree α → BinaryTree α. In any arrow chain A → B → C → D, the last type is the return type; everything before is an input. So node takes a value of type α, a left subtree, a right subtree, and produces a tree. The leaf constructor takes no arguments and represents an empty position where the tree ends.
Parameterized Types
Inductive types can be parameterized, making them generic over the types they contain. This is how you write a List α that works for any element type, or an expression tree parameterized by its literal type. One definition, infinitely many instantiations.
-- Expression trees parameterized by the literal type
inductive Expr (α : Type) where
| lit : α → Expr α
| add : Expr α → Expr α → Expr α
| mul : Expr α → Expr α → Expr α
deriving Repr
-- Evaluate for any type with Add and Mul instances
def Expr.eval {α : Type} [Add α] [Mul α] : Expr α → α
| .lit n => n
| .add e1 e2 => e1.eval + e2.eval
| .mul e1 e2 => e1.eval * e2.eval
-- Integer expression: (2 + 3) * 4
def intExpr : Expr Int := .mul (.add (.lit 2) (.lit 3)) (.lit 4)
#eval intExpr.eval -- 20
-- Float expression: (1.5 + 2.5) * 3.0
def floatExpr : Expr Float := .mul (.add (.lit 1.5) (.lit 2.5)) (.lit 3.0)
#eval floatExpr.eval -- 12.0
-- Map a function over all literals
def Expr.map {α β : Type} (f : α → β) : Expr α → Expr β
| .lit n => .lit (f n)
| .add e1 e2 => .add (e1.map f) (e2.map f)
| .mul e1 e2 => .mul (e1.map f) (e2.map f)
-- Convert int expression to float
def floatFromInt : Expr Float := intExpr.map (fun n => Float.ofInt n)
#eval floatFromInt.eval -- 20.0
Mutual Recursion
Sometimes you need multiple definitions that refer to each other, like even and odd functions that call each other, or a parser and its sub-parsers. Lean supports mutually recursive definitions within a mutual block. The termination checker handles these jointly, so your circular definitions must still demonstrably finish.
-- Mutually recursive definitions
mutual
def isEven : Nat → Bool
| 0 => true
| n + 1 => isOdd n
def isOdd : Nat → Bool
| 0 => false
| n + 1 => isEven n
end
#eval isEven 10 -- true
#eval isOdd 7 -- true
-- Mutually recursive types
mutual
inductive Tree (α : Type) where
| node : α → Forest α → Tree α
inductive Forest (α : Type) where
| nil : Forest α
| cons : Tree α → Forest α → Forest α
end
-- Example forest
def exampleForest : Forest Nat :=
.cons (.node 1 .nil)
(.cons (.node 2 (.cons (.node 3 .nil) .nil))
.nil)
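Functions over mutually recursive types are themselves defined mutually. As a sketch, counting the nodes in a forest (the size functions are illustrative, not from the standard library):
mutual
  def Tree.size {α : Type} : Tree α → Nat
    | .node _ f => 1 + Forest.size f
  def Forest.size {α : Type} : Forest α → Nat
    | .nil => 0
    | .cons t rest => Tree.size t + Forest.size rest
end

#eval Forest.size exampleForest -- 3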
FizzBuzz
FizzBuzz is the canonical “can you actually program” interview question, famous for filtering candidates who cannot write a loop. Pattern matching on multiple conditions makes it elegant: match on whether divisible by 3, whether divisible by 5, and the four cases fall out naturally.
def fizzbuzz' (n : Nat) : String :=
match n % 3 == 0, n % 5 == 0 with
| true, true => "FizzBuzz"
| true, false => "Fizz"
| false, true => "Buzz"
| false, false => toString n
def runFizzbuzz (limit : Nat) : List String :=
(List.range limit).map fun i => fizzbuzz' (i + 1)
#eval runFizzbuzz 15
Tip
Run from the repository:
lake exe fizzbuzz 20
The Collatz Conjecture
The Collatz conjecture states that repeatedly applying a simple rule (halve if even, triple and add one if odd) eventually reaches 1 for any positive starting integer. Proposed in 1937, it remains unproven. Mathematicians have verified it for numbers up to \(2^{68}\), yet no one can prove it always works. Erdős said “Mathematics is not yet ready for such problems.”
The recursion here needs fuel (a maximum step count) because we cannot prove termination. If we could, we would have solved a famous open problem.
/-- The Collatz conjecture: every positive integer eventually reaches 1.
Unproven since 1937, but we can at least watch the journey. -/
def collatzStep (n : Nat) : Nat :=
if n % 2 == 0 then n / 2 else 3 * n + 1
def collatzSequence (n : Nat) (fuel : Nat := 1000) : List Nat :=
match fuel with
| 0 => [n] -- give up, though Collatz would be disappointed
| fuel' + 1 =>
if n <= 1 then [n]
else n :: collatzSequence (collatzStep n) fuel'
def collatzLength (n : Nat) : Nat :=
(collatzSequence n).length
-- The famous 27: takes 111 steps and peaks at 9232
#eval collatzSequence 27
#eval collatzLength 27
-- Find the longest sequence for starting values 1 to n
def longestCollatz (n : Nat) : Nat × Nat :=
(List.range n).map (· + 1)
|>.map (fun k => (k, collatzLength k))
|>.foldl (fun acc pair => if pair.2 > acc.2 then pair else acc) (1, 1)
#eval longestCollatz 100 -- (97, 119)
Tip
Run from the repository:
lake exe collatz 27
Role Playing Game Example
The constructs above combine naturally in larger programs. What better way to demonstrate this than modeling the “real world”? We will build a Dungeons & Dragons character generator. D&D is a tabletop role-playing game where players create characters with ability scores, races, and classes, then embark on adventures guided by dice rolls and a referee called the Dungeon Master. The game has been running since 1974, which makes it older than most programming languages and considerably more fun than COBOL.
Structures hold character data, inductive types represent races and classes, pattern matching computes racial bonuses, and recursion drives the dice-rolling simulation.
We start by defining the data types that model our domain. The AbilityScores structure bundles the six core abilities. Inductive types enumerate races, classes, and backgrounds. The Character structure ties everything together:
structure AbilityScores where
strength : Nat
dexterity : Nat
constitution : Nat
intelligence : Nat
wisdom : Nat
charisma : Nat
deriving Repr
inductive Race where
| human
| elf
| dwarf
| halfling
| dragonborn
| tiefling
deriving Repr, DecidableEq
inductive CharClass where
| fighter -- d10 hit die
| wizard -- d6 hit die
| rogue -- d8 hit die
| cleric -- d8 hit die
| barbarian -- d12 hit die
| bard -- d8 hit die
deriving Repr, DecidableEq
inductive Background where
| soldier
| sage
| criminal
| noble
| hermit
| entertainer
deriving Repr, DecidableEq
structure Character where
name : String
race : Race
charClass : CharClass
background : Background
level : Nat
abilities : AbilityScores
hitPoints : Nat
deriving Repr
Racial bonuses modify ability scores based on the character’s race. Pattern matching maps each race to its specific bonuses. Humans get +1 to everything; elves favor dexterity and intelligence; dwarves are sturdy:
def racialBonuses (r : Race) : AbilityScores :=
match r with
| .human => { strength := 1, dexterity := 1, constitution := 1,
intelligence := 1, wisdom := 1, charisma := 1 }
| .elf => { strength := 0, dexterity := 2, constitution := 0,
intelligence := 1, wisdom := 0, charisma := 0 }
| .dwarf => { strength := 0, dexterity := 0, constitution := 2,
intelligence := 0, wisdom := 1, charisma := 0 }
| .halfling => { strength := 0, dexterity := 2, constitution := 0,
intelligence := 0, wisdom := 0, charisma := 1 }
| .dragonborn => { strength := 2, dexterity := 0, constitution := 0,
intelligence := 0, wisdom := 0, charisma := 1 }
| .tiefling => { strength := 0, dexterity := 0, constitution := 0,
intelligence := 1, wisdom := 0, charisma := 2 }
Each character class has a different hit die, representing how much health they gain per level. Wizards are fragile with a d6, while barbarians are tanks with a d12:
def hitDie (c : CharClass) : Nat :=
match c with
| .wizard => 6
| .rogue => 8
| .bard => 8
| .cleric => 8
| .fighter => 10
| .barbarian => 12
Random character generation needs randomness. A linear congruential generator provides deterministic pseudo-random numbers, which means the same seed produces the same character:
structure RNG where
state : Nat
deriving Repr
def RNG.next (rng : RNG) : RNG × Nat :=
let newState := (rng.state * 1103515245 + 12345) % (2^31)
(⟨newState⟩, newState)
def RNG.range (rng : RNG) (lo hi : Nat) : RNG × Nat :=
let (rng', n) := rng.next
let range := hi - lo + 1
(rng', lo + n % range)
D&D ability scores use 4d6-drop-lowest: roll four six-sided dice and discard the lowest. This generates scores between 3 and 18, heavily weighted toward the middle. We thread the RNG state through each dice roll to maintain determinism:
def roll4d6DropLowest (rng : RNG) : RNG × Nat :=
let (rng1, d1) := rng.range 1 6
let (rng2, d2) := rng1.range 1 6
let (rng3, d3) := rng2.range 1 6
let (rng4, d4) := rng3.range 1 6
let dice := [d1, d2, d3, d4]
let sorted := dice.toArray.qsort (· < ·) |>.toList
let top3 := sorted.drop 1
(rng4, top3.foldl (· + ·) 0)
def rollAbilityScores (rng : RNG) : RNG × AbilityScores :=
let (rng1, str) := roll4d6DropLowest rng
let (rng2, dex) := roll4d6DropLowest rng1
let (rng3, con) := roll4d6DropLowest rng2
let (rng4, int) := roll4d6DropLowest rng3
let (rng5, wis) := roll4d6DropLowest rng4
let (rng6, cha) := roll4d6DropLowest rng5
(rng6, { strength := str, dexterity := dex, constitution := con,
intelligence := int, wisdom := wis, charisma := cha })
Character generation threads the RNG through multiple operations: pick a name, race, class, and background, roll ability scores, apply racial bonuses, and calculate starting hit points:
def applyRacialBonuses (base : AbilityScores) (race : Race) : AbilityScores :=
let bonus := racialBonuses race
{ strength := base.strength + bonus.strength
dexterity := base.dexterity + bonus.dexterity
constitution := base.constitution + bonus.constitution
intelligence := base.intelligence + bonus.intelligence
wisdom := base.wisdom + bonus.wisdom
charisma := base.charisma + bonus.charisma }
def abilityModifier (score : Nat) : Int :=
(score : Int) / 2 - 5
def startingHP (con : Nat) (c : CharClass) : Nat :=
let conMod := abilityModifier con
let baseHP := hitDie c
if conMod < 0 && baseHP < conMod.natAbs then 1
else (baseHP : Int) + conMod |>.toNat
def pickRandom {α : Type} (rng : RNG) (xs : List α) (default : α) : RNG × α :=
match xs with
| [] => (rng, default)
| _ =>
let (rng', idx) := rng.range 0 (xs.length - 1)
(rng', xs[idx]?.getD default)
def allRaces : List Race := [.human, .elf, .dwarf, .halfling, .dragonborn, .tiefling]
def allClasses : List CharClass := [.fighter, .wizard, .rogue, .cleric, .barbarian, .bard]
def allBackgrounds : List Background := [.soldier, .sage, .criminal, .noble, .hermit, .entertainer]
-- Names that would make sense at both a D&D table and a PL theory conference
def fantasyNames : List String := [
"Alonzo the Untyped", -- Church, lambda calculus
"Haskell the Memory Guzzler", -- I didn't choose the thunk life, the thunk life chose me
"Dana the Continuous", -- Scott, domain theory
"Thorin Typechecker", -- Tolkien meets compilation
"Edsger the Structured", -- Dijkstra, structured programming
"Kurt the Incomplete" -- Gödel, incompleteness theorems
]
def generateCharacter (seed : Nat) : Character :=
let rng : RNG := ⟨seed⟩
let (rng1, name) := pickRandom rng fantasyNames "Adventurer"
let (rng2, race) := pickRandom rng1 allRaces .human
let (rng3, charClass) := pickRandom rng2 allClasses .fighter
let (rng4, background) := pickRandom rng3 allBackgrounds .soldier
let (_, baseAbilities) := rollAbilityScores rng4
let abilities := applyRacialBonuses baseAbilities race
let hp := startingHP abilities.constitution charClass
{ name := name
race := race
charClass := charClass
background := background
level := 1
abilities := abilities
hitPoints := hp }
Display functions convert internal representations to human-readable strings. The modifier calculation implements D&D’s (score / 2 - 5) formula:
def raceName : Race → String
| .human => "Human"
| .elf => "Elf"
| .dwarf => "Dwarf"
| .halfling => "Halfling"
| .dragonborn => "Dragonborn"
| .tiefling => "Tiefling"
def className : CharClass → String
| .fighter => "Fighter"
| .wizard => "Wizard"
| .rogue => "Rogue"
| .cleric => "Cleric"
| .barbarian => "Barbarian"
| .bard => "Bard"
def backgroundName : Background → String
| .soldier => "Soldier"
| .sage => "Sage"
| .criminal => "Criminal"
| .noble => "Noble"
| .hermit => "Hermit"
| .entertainer => "Entertainer"
def formatModifier (score : Nat) : String :=
let m := abilityModifier score
if m >= 0 then s!"+{m}" else s!"{m}"
def printCharacter (c : Character) : IO Unit := do
IO.println "======================================"
IO.println s!" {c.name}"
IO.println "======================================"
IO.println s!" Level {c.level} {raceName c.race} {className c.charClass}"
IO.println s!" Background: {backgroundName c.background}"
IO.println ""
IO.println " ABILITY SCORES"
IO.println " --------------"
IO.println s!" STR: {c.abilities.strength} ({formatModifier c.abilities.strength})"
IO.println s!" DEX: {c.abilities.dexterity} ({formatModifier c.abilities.dexterity})"
IO.println s!" CON: {c.abilities.constitution} ({formatModifier c.abilities.constitution})"
IO.println s!" INT: {c.abilities.intelligence} ({formatModifier c.abilities.intelligence})"
IO.println s!" WIS: {c.abilities.wisdom} ({formatModifier c.abilities.wisdom})"
IO.println s!" CHA: {c.abilities.charisma} ({formatModifier c.abilities.charisma})"
IO.println ""
IO.println s!" Hit Points: {c.hitPoints}"
IO.println s!" Hit Die: d{hitDie c.charClass}"
IO.println "======================================"
The main function reads a seed from command-line arguments (defaulting to 42), generates a character, and prints it in a formatted sheet:
def main (args : List String) : IO Unit := do
let seed := match args.head? >>= String.toNat? with
| some n => n
| none => 42
IO.println ""
IO.println " D&D CHARACTER GENERATOR"
IO.println " Type-Safe Adventures Await!"
IO.println ""
let character := generateCharacter seed
printCharacter character
IO.println ""
IO.println "Your adventure begins..."
IO.println "(Use a different seed for a different character: lake exe dnd <seed>)"
Try different seeds to generate different characters. The generator uses a deterministic pseudo-random number generator, so the same seed always produces the same character.
Tip
Run from the repository:
lake exe dnd 42. The full source is on GitHub.
Toward Abstraction
With structures and inductive types, you can model complex domains. But real programs need abstraction over types themselves. Polymorphism and type classes let you write code that works for any type satisfying certain constraints. This is how you build generic libraries without sacrificing type safety.
Standard Library and Batteries
Lean’s standard library (Std) provides essential data structures and utilities. The community Batteries package extends it further with additional data structures and algorithms. This chapter surveys the most useful parts of both.
Std Data Structures
The Std namespace contains hash-based and tree-based collections that cover most common use cases.
HashMap
Hash maps provide average \(O(1)\) insertion and lookup. Use them when key order doesn’t matter.
-- HashMap: O(1) average insert/lookup
def hashmapDemo : IO Unit := do
-- Create a HashMap from key-value pairs
let mut scores : Std.HashMap String Nat := {}
scores := scores.insert "alice" 95
scores := scores.insert "bob" 87
scores := scores.insert "carol" 92
-- Lookup with get?
let aliceScore := scores.get? "alice"
IO.println s!"alice: {aliceScore}" -- some 95
-- Check membership
IO.println s!"contains bob: {scores.contains "bob"}" -- true
-- Get with default
let daveScore := scores.getD "dave" 0
IO.println s!"dave (default 0): {daveScore}" -- 0
-- Iterate over entries
IO.println "All scores:"
for (name, score) in scores do
IO.println s!" {name}: {score}"
HashSet
Hash sets efficiently track membership without associated values.
-- HashSet: O(1) average membership testing
def hashsetDemo : IO Unit := do
-- Create a HashSet
let mut seen : Std.HashSet String := {}
seen := seen.insert "apple"
seen := seen.insert "banana"
seen := seen.insert "cherry"
IO.println s!"contains apple: {seen.contains "apple"}" -- true
IO.println s!"contains grape: {seen.contains "grape"}" -- false
IO.println s!"size: {seen.size}" -- 3
-- Set operations
let more : Std.HashSet String := Std.HashSet.ofList ["banana", "date", "elderberry"]
let combined := seen.union more
IO.println s!"union size: {combined.size}" -- 5
TreeMap
Tree maps maintain keys in sorted order with \(O(\log n)\) operations. Use when you need ordered iteration or range queries.
-- TreeMap: ordered map with O(log n) operations
def treemapDemo : IO Unit := do
-- TreeMap keeps keys sorted
let mut prices : Std.TreeMap String Nat := {}
prices := prices.insert "banana" 120
prices := prices.insert "apple" 100
prices := prices.insert "cherry" 300
-- Iteration is in sorted order
IO.println "Prices (sorted by name):"
for (fruit, price) in prices do
IO.println s!" {fruit}: {price}"
-- Size and membership
IO.println s!"size: {prices.size}"
IO.println s!"contains apple: {prices.contains "apple"}"
Time
Basic timing operations are available through IO.
-- Time: dates, times, and durations
def timeDemo : IO Unit := do
-- Get current time
let now ← IO.monoNanosNow
IO.println s!"Monotonic nanoseconds: {now}"
-- Durations
let oneSecond := 1000000000 -- nanoseconds
let elapsed := now % oneSecond
IO.println s!"Nanoseconds into current second: {elapsed}"
-- Heartbeats measure elaboration work, not wall-clock time (see Std.Time for dates)
let heartbeats ← IO.getNumHeartbeats
IO.println s!"Heartbeats: {heartbeats}"
For full date/time handling, the Std.Time module provides DateTime, Duration, and timezone support.
Parsec
The Std.Internal.Parsec module provides parser combinators for building parsers from smaller pieces. It includes character-level parsers (digit, asciiLetter), repetition combinators (many, many1), and string building functions.
-- Parsec: parser combinators for parsing text
open Std.Internal.Parsec String in
def parsecDemo : IO Unit := do
-- Parser for a natural number
let parseNat : Parser Nat := digits
-- Parser for a word (one or more ASCII letters)
let parseWord : Parser String := many1Chars asciiLetter
-- Parser for comma-separated numbers
let parseNumList : Parser (List Nat) := do
let first ← parseNat
let rest ← many (skipChar ',' *> parseNat)
pure (first :: rest.toList)
-- Run parsers
match parseNat.run "12345" with
| .ok n => IO.println s!"Number: {n}" -- 12345
| .error e => IO.println s!"Error: {e}"
match parseWord.run "hello123" with
| .ok s => IO.println s!"Word: {s}" -- "hello"
| .error e => IO.println s!"Error: {e}"
match parseNumList.run "1,23,456" with
| .ok ns => IO.println s!"List: {ns}" -- [1, 23, 456]
| .error e => IO.println s!"Error: {e}"
For more advanced parsing needs, community libraries like lean4-parser and Megaparsec.lean offer additional features.
Batteries
The Batteries package provides additional data structures beyond Std. Add it to your project in lakefile.lean:
require batteries from git
"https://github.com/leanprover-community/batteries" @ "main"
Or in lakefile.toml:
[[require]]
name = "batteries"
scope = "leanprover-community"
rev = "main"
With the dependency in place, the following data structures become available.
BinaryHeap
A priority queue with \(O(\log n)\) insertion and extraction. Useful for scheduling, graph algorithms, and any problem requiring repeated min/max extraction.
-- BinaryHeap: priority queue with O(log n) push/pop
def heapDemo : IO Unit := do
-- Max-heap (largest first with >)
let heap := Batteries.BinaryHeap.empty (α := Nat) (lt := (· > ·))
|>.insert 5
|>.insert 1
|>.insert 3
|>.insert 9
IO.println s!"heap max: {heap.max}" -- some 9
-- Pop the max element
let (maxVal, heap') := heap.extractMax
IO.println s!"extracted: {maxVal}" -- some 9
IO.println s!"new max: {heap'.max}" -- some 5
The comparator determines ordering: (· > ·) for max-heap, (· < ·) for min-heap.
RBMap and RBSet
Red-black tree maps and sets with \(O(\log n)\) operations and ordered iteration. Use when you need sorted keys or efficient range queries.
-- RBMap: ordered map with O(log n) operations
def rbmapDemo : IO Unit := do
-- Create a map with String keys, Nat values
let scores : Batteries.RBMap String Nat compare :=
Batteries.RBMap.empty
|>.insert "alice" 95
|>.insert "bob" 87
|>.insert "carol" 92
-- Lookup
let aliceScore := scores.find? "alice"
let daveScore := scores.find? "dave"
IO.println s!"alice: {aliceScore}" -- some 95
IO.println s!"dave: {daveScore}" -- none
-- Convert to list for display (sorted by key)
IO.println s!"all: {scores.toList}"
Unlike HashMap, iteration order is deterministic (sorted by key).
UnionFind
Disjoint set data structure with near-constant time union and find operations. Essential for Kruskal’s algorithm, connected components, and equivalence class problems.
-- UnionFind: disjoint set with near O(1) union/find
def unionFindDemo : IO Unit := do
-- Create structure and add 5 elements (indices 0-4)
let uf := Batteries.UnionFind.empty
|>.push |>.push |>.push |>.push |>.push
-- Union 0 with 1, and 2 with 3
let uf := uf.union! 0 1
let uf := uf.union! 2 3
-- Check if elements are in same set (same root)
let root0 := uf.rootD 0
let root1 := uf.rootD 1
let root2 := uf.rootD 2
let root4 := uf.rootD 4
IO.println s!"0 and 1 same set: {root0 == root1}" -- true
IO.println s!"0 and 2 same set: {root0 == root2}" -- false
IO.println s!"4 alone: {root4 == 4}" -- true
DList
Difference lists enable \(O(1)\) concatenation by representing lists as functions. Useful when building lists by repeated appending, which would be \(O(n^2)\) with regular lists.
-- DList: difference list with O(1) append
def dlistDemo : IO Unit := do
-- Build a list by repeated appending
-- With regular List, this would be O(n^2), with DList it's O(n)
let d1 : Batteries.DList Nat := Batteries.DList.singleton 1
let d2 := d1.push 2
let d3 := d2.push 3
let d4 := d3.append (Batteries.DList.ofList [4, 5, 6])
-- Convert back to List when done building
let result := d4.toList
IO.println s!"result: {result}" -- [1, 2, 3, 4, 5, 6]
Collection Extensions
Batteries extends List, Array, and String with additional operations.
-- Batteries extends List and Array with useful functions
def batteriesListDemo : IO Unit := do
let nums := [1, 2, 3, 4, 5]
-- Chunking
let chunks := nums.toChunks 2
IO.println s!"toChunks 2: {chunks}" -- [[1, 2], [3, 4], [5]]
-- Rotating
let rotL := nums.rotateLeft 2
let rotR := nums.rotateRight 2
IO.println s!"rotateLeft 2: {rotL}" -- [3, 4, 5, 1, 2]
IO.println s!"rotateRight 2: {rotR}" -- [4, 5, 1, 2, 3]
-- Partitioning
let (small, big) := nums.partition (· ≤ 3)
IO.println s!"partition: small={small}, big={big}"
-- Erasure (remove first occurrence)
let erased := nums.erase 3
IO.println s!"erase 3: {erased}" -- [1, 2, 4, 5]
Other useful additions include List.enum (pairs elements with indices), Array.swap (exchange two elements), and various String utilities.
IO Operations
The IO monad handles all side effects. The Effects chapter covers monads in depth; here we focus on practical operations.
-- Basic IO operations
def ioDemo : IO Unit := do
-- Print to stdout
IO.println "Hello, world!"
IO.print "No newline"
IO.println "" -- newline
-- Environment variables
let path ← IO.getEnv "PATH"
IO.println s!"PATH exists: {path.isSome}"
-- Monotonic clock in milliseconds (not wall-clock epoch time)
let now ← IO.monoMsNow
IO.println s!"Timestamp: {now}"
-- Random numbers
let rand ← IO.rand 1 100
IO.println s!"Random 1-100: {rand}"
Files and Directories
-- File operations
def fileDemo : IO Unit := do
let testFile := "test_output.txt"
-- Write to file
IO.FS.writeFile testFile "Line 1\nLine 2\nLine 3\n"
-- Read entire file
let contents ← IO.FS.readFile testFile
IO.println s!"Contents:\n{contents}"
-- Read as lines
let lines ← IO.FS.lines testFile
IO.println s!"Number of lines: {lines.size}"
for line in lines do
IO.println s!" > {line}"
-- Clean up
IO.FS.removeFile testFile
-- Directory operations
def directoryDemo : IO Unit := do
-- Current working directory
let cwd ← IO.currentDir
IO.println s!"Current directory: {cwd}"
-- List directory contents
let entries ← System.FilePath.readDir "."
IO.println s!"Files in current directory: {entries.size}"
-- Print first few entries
for entry in entries.toList.take 5 do
IO.println s!" {entry.fileName}"
External Processes
-- Running external processes
def processDemo : IO Unit := do
-- Run a command and capture output
let output ← IO.Process.output {
cmd := "echo"
args := #["Hello from subprocess"]
}
IO.println s!"Exit code: {output.exitCode}"
IO.println s!"stdout: {output.stdout}"
-- Check if command succeeded
if output.exitCode == 0 then
IO.println "Command succeeded"
else
IO.println s!"Command failed: {output.stderr}"
Finding Packages
Reservoir indexes the Lean package ecosystem. Notable packages:
- mathlib4: Comprehensive mathematics library
- aesop: Proof automation via best-first search
- lean4-cli: Command-line argument parsing
- Qq: Quoted expressions for metaprogramming
- ProofWidgets: Interactive proof visualization
Practical Example
A word frequency counter combining HashMap, String operations, and list processing:
-- Practical example: word frequency counter
def countWords (text : String) : Std.HashMap String Nat :=
let words := text.toLower.splitOn " "
|>.map (fun w => w.toList.filter Char.isAlpha |> String.ofList)
|>.filter (!·.isEmpty)
words.foldl (fun map word =>
map.insert word ((map.get? word).getD 0 + 1)
) {}
def wordFreqDemo : IO Unit := do
let text := "The quick brown fox jumps over the lazy dog The dog barks"
let freq := countWords text
IO.println "Word frequencies:"
let sorted := freq.toList.toArray
|>.qsort (fun a b => a.2 > b.2)
for (word, count) in sorted do
IO.println s!" {word}: {count}"
Polymorphism and Type Classes
On September 23, 1999, the Mars Climate Orbiter disintegrated in the Martian atmosphere because one piece of software produced thrust data in pound-force seconds while another expected newton-seconds. A 327 million dollar spacecraft, destroyed by a unit conversion error that any undergraduate physics student could catch. The software worked. The math was correct. The types simply failed to express the constraints that mattered.
This is what motivates the machinery in this article. Type classes and phantom types let you encode dimensional constraints directly in the type system. You cannot add meters to seconds because the compiler will not let you. You cannot pass thrust in the wrong units because the function signature forbids it. These constraints cost nothing at runtime, they compile away completely, but they catch at compile time the very class of error that destroyed a spacecraft. By the end of this article, you will see how to build such a system yourself.
But safety is only half the story. Polymorphism is also about not writing the same code twice. A sorting algorithm should not care whether it sorts integers, strings, or financial instruments. A data structure should not be rewritten for each type it might contain. The alternative is copying code and changing types by hand, which is how bugs are born and how programmers lose their minds. Polymorphism is the machinery that makes abstraction possible without sacrificing type safety.
In 1967, Christopher Strachey drew a distinction that would shape programming languages for decades: parametric polymorphism, where code works uniformly for all types, versus ad-hoc polymorphism, where the behavior changes based on the specific type involved. The first gives you reverse : List α → List α, a function blissfully ignorant of what the list contains. The second gives you +, which does quite different things to integers, floats, and matrices. Lean provides both, unified under a type class system that traces its lineage back to the 1989 paper How to Make Ad-Hoc Polymorphism Less Ad Hoc. The result is generic code that is simultaneously flexible and precise.
Parametric Polymorphism
Functions can take type parameters, allowing them to work with any type without knowing or caring what that type is. The function length : List α → Nat counts elements whether they are integers, strings, or proof terms. The curly braces indicate implicit arguments that Lean infers automatically, sparing you the tedium of writing length (α := Int) myList everywhere.
def identity {α : Type} (x : α) : α := x
#eval identity 42 -- 42
#eval identity "hello" -- "hello"
#eval identity [1, 2, 3] -- [1, 2, 3]
def compose {α β γ : Type} (g : β → γ) (f : α → β) : α → γ :=
fun x => g (f x)
def addOne (x : Nat) : Nat := x + 1
def double (x : Nat) : Nat := x * 2
#eval compose double addOne 5 -- 12
def flip {α β γ : Type} (f : α → β → γ) : β → α → γ :=
fun b a => f a b
#eval flip Nat.sub 3 10 -- 7
Polymorphic Data Types
Data types can also be parameterized, creating generic containers that work with any element type. You define List α once and get lists of integers, lists of strings, and lists of lists for free. The alternative, writing IntList, StringList, and ListOfLists separately, is how Java programmers spent the 1990s.
def Pair (α β : Type) := α × β
def makePair {α β : Type} (a : α) (b : β) : Pair α β := (a, b)
#eval makePair 1 "one" -- (1, "one")
inductive Either (α β : Type) where
| left : α → Either α β
| right : β → Either α β
deriving Repr
def mapEither {α β γ : Type} (f : β → γ) : Either α β → Either α γ
| .left a => .left a
| .right b => .right (f b)
#eval mapEither (· + 1) (Either.right 5 : Either String Nat)
Implicit Arguments
Implicit arguments in curly braces are inferred by Lean from context. When inference fails or you want to override it, the @ prefix makes everything explicit. This escape hatch is rarely needed, but when you need it, you really need it.
def listLength {α : Type} (xs : List α) : Nat :=
match xs with
| [] => 0
| _ :: rest => 1 + listLength rest
#eval listLength [1, 2, 3] -- 3
#eval listLength ["a", "b"] -- 2
#eval @listLength Nat [1, 2, 3] -- explicit type argument
def firstOrDefault {α : Type} (xs : List α) (default : α) : α :=
match xs with
| [] => default
| x :: _ => x
#eval firstOrDefault [1, 2, 3] 0 -- 1
#eval firstOrDefault ([] : List Nat) 0 -- 0
Instance Arguments
Square brackets denote instance arguments, resolved through type class inference. When you write [Add α], you are saying “give me any type that knows how to add.” The compiler finds the right implementation automatically. This is the mechanism that lets + work on integers, floats, vectors, and anything else with an Add instance.
def printTwice {α : Type} [ToString α] (x : α) : String :=
s!"{x} and {x}"
#eval printTwice 42 -- "42 and 42"
#eval printTwice true -- "true and true"
#eval printTwice "hi" -- "hi and hi"
def maximum {α : Type} [Ord α] (xs : List α) : Option α :=
xs.foldl (init := none) fun acc x =>
match acc with
| none => some x
| some m => if compare x m == .gt then some x else some m
#eval maximum [3, 1, 4, 1, 5, 9] -- some 9
#eval maximum ["b", "a", "c"] -- some "c"
Defining Type Classes
Type classes define interfaces that types can implement, but unlike object-oriented interfaces, the implementation is external to the type. You can add new capabilities to existing types without modifying them. This is how Lean can make + work for types defined in libraries you do not control.
class Printable (α : Type) where
format : α → String
instance : Printable Nat where
format n := s!"Nat({n})"
instance : Printable Bool where
format b := if b then "yes" else "no"
instance : Printable String where
format s := s!"\"{s}\""
def showValue {α : Type} [Printable α] (x : α) : String :=
Printable.format x
#eval showValue 42 -- "Nat(42)"
#eval showValue true -- "yes"
#eval showValue "test" -- "\"test\""
Tip
A type class constraint like [Add α] is a proof obligation. The compiler must find evidence that α supports addition. Instance resolution is proof search. This connection between “finding implementations” and “finding proofs” is not a metaphor; it is the same mechanism. When you reach the Proofs article, you will see the compiler searching for proofs exactly as it searches for instances here.
Polymorphic Instances
Instances themselves can be polymorphic, building complex instances from simpler ones. If you can print an α, you can print a List α. This compositionality is the quiet superpower of type classes: small building blocks assemble into sophisticated behavior without explicit wiring.
instance {α : Type} [Printable α] : Printable (List α) where
format xs :=
let items := xs.map Printable.format
"[" ++ ", ".intercalate items ++ "]"
instance {α β : Type} [Printable α] [Printable β] : Printable (α × β) where
format p := s!"({Printable.format p.1}, {Printable.format p.2})"
#eval showValue [1, 2, 3] -- "[Nat(1), Nat(2), Nat(3)]"
#eval showValue (42, true) -- "(Nat(42), yes)"
#eval showValue [(1, true), (2, false)]
Numeric Type Classes
Type classes excel at abstracting over numeric operations. Write your algorithm once against an abstract Mul and Add, and it works for integers, rationals, complex numbers, matrices, and polynomials. The abstraction costs nothing at runtime because instance resolution happens at compile time. The generic code specializes to concrete operations in the generated code.
class Addable (α : Type) where
add : α → α → α
zero : α
instance : Addable Nat where
add := Nat.add
zero := 0
instance : Addable Int where
add := Int.add
zero := 0
instance : Addable Float where
add := Float.add
zero := 0.0
def sumList {α : Type} [Addable α] (xs : List α) : α :=
xs.foldl Addable.add Addable.zero
#eval sumList [1, 2, 3, 4, 5] -- 15
#eval sumList [1.5, 2.5, 3.0] -- 7.0
#eval sumList ([-1, 2, -3] : List Int) -- -2
Extending Classes
Type classes can extend other classes, inheriting their operations while adding new ones. An Ord instance gives you compare, and from that you get <, ≤, >, ≥, min, and max for free. The hierarchy of algebraic structures in Mathlib, from magmas through groups to rings and fields, is built this way.
class Eq' (α : Type) where
eq : α → α → Bool
class Ord' (α : Type) extends Eq' α where
lt : α → α → Bool
instance : Eq' Nat where
eq := (· == ·)
instance : Ord' Nat where
eq := (· == ·)
lt := (· < ·)
def sortedInsert {α : Type} [Ord' α] (x : α) (xs : List α) : List α :=
match xs with
| [] => [x]
| y :: ys => if Ord'.lt x y then x :: y :: ys else y :: sortedInsert x ys
#eval sortedInsert 3 [1, 2, 4, 5] -- [1, 2, 3, 4, 5]
Functor
The Functor pattern captures the idea of mapping a function over a structure while preserving its shape. Lists, options, arrays, trees, and IO actions are all functors. Once you see the pattern, you see it everywhere: any context that wraps a value and lets you transform that value without escaping the context.
class Functor' (F : Type → Type) where
map : {α β : Type} → (α → β) → F α → F β
instance : Functor' List where
map := List.map
instance : Functor' Option where
map f
| none => none
| some x => some (f x)
def doubleAll {F : Type → Type} [Functor' F] (xs : F Nat) : F Nat :=
Functor'.map (· * 2) xs
#eval doubleAll [1, 2, 3] -- [2, 4, 6]
#eval doubleAll (some 21) -- some 42
#eval doubleAll (none : Option Nat) -- none
Note
For readers familiar with category theory:
Functor here is an endofunctor on the category of types, where objects are types and morphisms are functions. The map operation lifts a morphism \(f : A \to B\) to \(F(f) : F(A) \to F(B)\). The Yoneda lemma tells us that a functor is completely determined by how morphisms map into it. You do not need category theory to use functors effectively, but if you have the background, the connection is there.
Multiple Constraints
Functions can require multiple type class constraints, combining capabilities from different classes. A sorting function needs Ord; a function that sorts and prints needs [Ord α] [Repr α]. The constraints document exactly what your function requires, nothing more, nothing less.
def showCompare {α : Type} [ToString α] [Ord α] (x y : α) : String :=
let result := match compare x y with
| .lt => "less than"
| .eq => "equal to"
| .gt => "greater than"
s!"{x} is {result} {y}"
#eval showCompare 3 5 -- "3 is less than 5"
#eval showCompare "b" "a" -- "b is greater than a"
def sortAndShow {α : Type} [ToString α] [Ord α] (xs : List α) : String :=
let sorted := xs.toArray.qsort (compare · · == .lt) |>.toList
s!"{sorted}"
#eval sortAndShow [3, 1, 4, 1, 5] -- "[1, 1, 3, 4, 5]"
The sortAndShow function demonstrates a common pattern: convert to Array for efficient in-place sorting with qsort, then convert back to List. The predicate (compare · · == .lt) returns true when the first argument is less than the second, giving ascending order.
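To sort in descending order, flip the test. A minimal sketch reusing the same pattern:
#eval [3, 1, 4, 1, 5].toArray.qsort (compare · · == .gt) |>.toList -- [5, 4, 3, 1, 1]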
Deriving Instances
Many standard type classes can be automatically derived, saving you from writing boilerplate that follows predictable patterns. The deriving clause generates instances for Repr, BEq, Hashable, and others. Let the machine do the mechanical work; save your creativity for the parts that require thought.
structure Point' where
x : Nat
y : Nat
deriving Repr, BEq, Hashable
#eval Point'.mk 1 2 == Point'.mk 1 2 -- true
#eval Point'.mk 1 2 == Point'.mk 3 4 -- false
inductive Color where
| red | green | blue
deriving Repr, DecidableEq
#eval Color.red == Color.red -- true
#eval Color.red == Color.blue -- false
structure Person where
name : String
age : Nat
deriving Repr, BEq
def alice : Person := { name := "Alice", age := 30 }
def bob : Person := { name := "Bob", age := 25 }
#eval alice == bob -- false
#eval repr alice -- { name := "Alice", age := 30 }
You will notice both BEq and DecidableEq in deriving clauses. BEq provides the == operator for boolean equality tests. DecidableEq is stronger: it decides propositional equality (a = b), which works in dependent contexts like if expressions whose branches have different types, and in proofs. For simple comparisons, BEq suffices; for anything involving the type system or proofs, you want DecidableEq.
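A minimal sketch using the Color type above: the if condition below uses propositional equality (=, not ==), so it elaborates only because deriving DecidableEq produced a Decidable instance for it.
#eval if Color.red = Color.blue then "same" else "different" -- "different"
With only BEq you would have to write == and give up the ability to use the result in dependent positions or proofs.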
Attributes
Attributes tag declarations with metadata that affects how Lean processes them. The @[simp] attribute marks a lemma for use by the simp tactic. The @[instance] attribute registers a type class instance. Attributes are how you opt declarations into various compiler subsystems.
-- Attributes attach metadata to declarations
@[simp] theorem add_zero_right' (n : Nat) : n + 0 = n := Nat.add_zero n
-- The @[simp] attribute marks this for use by the simp tactic
-- Common attributes: simp, inline, reducible, instance, class
See Tactics for how simp uses attributed lemmas.
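As a small preview of what the attribute buys you (tactics are covered properly later), here is a goal that simp can close by rewriting with the attributed lemma, alongside its other defaults:
example (n : Nat) : (n + 0) + 0 = n := by simp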
Composable Spell Effects
Type classes shine when you have multiple types that share a common interface but differ in implementation. Consider a spell system where spells can deal damage, heal, apply buffs, or inflict status effects. Each effect type is different, but all effects can be described and have a potency:
structure Damage where
amount : Nat
element : Element
deriving Repr
structure Healing where
amount : Nat
deriving Repr
structure Buff where
stat : String
bonus : Int
duration : Nat
deriving Repr
structure StatusEffect where
name : String
duration : Nat
deriving Repr
The SpellEffect type class captures this abstraction. Any type with a SpellEffect instance can describe itself and report its potency:
class SpellEffect (ε : Type) where
describe : ε → String
potency : ε → Nat
instance : SpellEffect Damage where
describe d := s!"{d.amount} {d.element.name} damage"
potency d := d.amount
instance : SpellEffect Healing where
describe h := s!"restore {h.amount} HP"
potency h := h.amount
instance : SpellEffect Buff where
describe b := s!"+{b.bonus} {b.stat} for {b.duration} turns"
potency b := b.bonus.natAbs * b.duration
instance : SpellEffect StatusEffect where
describe s := s!"{s.name} for {s.duration} turns"
potency s := s.duration * 2
Combining Effects
The real payoff comes when effects can be combined. A spell that damages AND heals (like Drain Life) or damages AND poisons (like Venom Strike) requires representing compound effects. The Effect inductive wraps all effect types and adds a compound constructor for combinations:
inductive Effect where
| damage (d : Damage)
| healing (h : Healing)
| buff (b : Buff)
| status (s : StatusEffect)
| compound (fst snd : Effect)
deriving Repr
def Effect.describe : Effect → String
| .damage d => SpellEffect.describe d
| .healing h => SpellEffect.describe h
| .buff b => SpellEffect.describe b
| .status s => SpellEffect.describe s
| .compound fst snd => s!"{fst.describe} + {snd.describe}"
def Effect.potency : Effect → Nat
| .damage d => SpellEffect.potency d
| .healing h => SpellEffect.potency h
| .buff b => SpellEffect.potency b
| .status s => SpellEffect.potency s
| .compound fst snd => fst.potency + snd.potency
instance : SpellEffect Effect where
describe := Effect.describe
potency := Effect.potency
instance : Append Effect where
append := Effect.compound
The Append instance gives us the ++ operator, so you can write .damage ⟨6, .dark⟩ ++ .healing ⟨6⟩ to combine two effects. Effects form a semigroup: you can combine any two effects into a compound effect. The describe and potency functions recurse through the structure, building descriptions like “6 dark damage + restore 6 HP” and summing potencies.
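A quick check of that combination; a minimal sketch whose output assumes the dark element renders its name as "dark":
#eval (Effect.damage ⟨6, .dark⟩ ++ Effect.healing ⟨6⟩).describe -- "6 dark damage + restore 6 HP"
#eval (Effect.damage ⟨6, .dark⟩ ++ Effect.healing ⟨6⟩).potency -- 12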
Spells and Casting
The Spell type is parameterized by its effect type. A Spell Damage deals damage; a Spell Effect can do anything:
structure Spell (ε : Type) where
name : String
manaCost : Nat
effect : ε
def castSpell {ε : Type} [SpellEffect ε] (s : Spell ε) : String :=
let desc := SpellEffect.describe s.effect
let pot := SpellEffect.potency s.effect
s!"{s.name} (Cost: {s.manaCost} MP): {desc} [Potency: {pot}]"
Simple spells have a single effect type:
def fireball : Spell Damage := ⟨"Fireball", 3, ⟨8, .fire⟩⟩
def frostbolt : Spell Damage := ⟨"Frostbolt", 2, ⟨5, .ice⟩⟩
def heal : Spell Healing := ⟨"Heal", 4, ⟨20⟩⟩
def haste : Spell Buff := ⟨"Haste", 3, ⟨"Speed", 2, 5⟩⟩
Compound spells combine multiple effects with ++:
def drainLife : Spell Effect :=
⟨"Drain Life", 4,
.damage ⟨6, .dark⟩ ++ .healing ⟨6⟩⟩
def chaosStorm : Spell Effect :=
⟨"Chaos Storm", 8,
.damage ⟨5, .fire⟩ ++ .damage ⟨5, .ice⟩ ++ .damage ⟨5, .lightning⟩⟩
def holyWrath : Spell Effect :=
⟨"Holy Wrath", 6,
.damage ⟨10, .holy⟩ ++ .buff ⟨"Defense", 3, 3⟩⟩
def venomStrike : Spell Effect :=
⟨"Venom Strike", 3,
.damage ⟨4, .dark⟩ ++ .status ⟨"Poisoned", 4⟩⟩
def battleHymn : Spell Effect :=
⟨"Battle Hymn", 5,
.buff ⟨"Strength", 2, 4⟩ ++ .buff ⟨"Speed", 1, 4⟩ ++ .healing ⟨8⟩⟩
The castSpell function works uniformly over any spell whose effect type has a SpellEffect instance. Simple spells, compound spells, future effect types you have not invented yet: all handled by the same polymorphic function.
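For instance, the same function handles a Spell Damage and a Spell Effect; a sketch whose exact strings assume each element's name matches its constructor:
#eval castSpell fireball -- "Fireball (Cost: 3 MP): 8 fire damage [Potency: 8]"
#eval castSpell venomStrike -- "Venom Strike (Cost: 3 MP): 4 dark damage + Poisoned for 4 turns [Potency: 12]"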
Tip
Run from the repository:
lake exe spells
Semigroups and Monoids
Algebraic structures like semigroups and monoids capture patterns that recur across mathematics and programming. A semigroup has an associative operation; a monoid adds an identity element. String concatenation, list append, function composition, and integer addition are all monoids. Recognizing the common structure lets you write code once and apply it to all of them.
class Semigroup (α : Type) where
append : α → α → α
class Monoid' (α : Type) extends Semigroup α where
empty : α
def concat {α : Type} [Monoid' α] (xs : List α) : α :=
xs.foldl Semigroup.append Monoid'.empty
instance : Monoid' String where
append := String.append
empty := ""
instance {α : Type} : Monoid' (List α) where
append := List.append
empty := []
#eval concat ["Hello", " ", "World"] -- "Hello World"
#eval concat [[1, 2], [3], [4, 5]] -- [1, 2, 3, 4, 5]
Example: Type-safe Dimensional Analysis
Here is the payoff for all the type class machinery. We can prevent the Mars Orbiter bug entirely, not through runtime checks but through types that make the error inexpressible. Consider representing physical quantities with phantom types:
/-!
# Type-Safe Units of Measurement
A demonstration of using phantom types and type classes to enforce dimensional
correctness at compile time with zero runtime overhead. The types exist only
in the compiler's mind; the generated code manipulates raw floats.
-/
-- Phantom types representing physical dimensions
-- These have no constructors and no runtime representation
structure Meters
structure Seconds
structure Kilograms
structure MetersPerSecond
structure MetersPerSecondSquared
structure Newtons
structure NewtonSeconds
-- A quantity is a value tagged with its unit type
-- At runtime, this is just a Float
structure Quantity (unit : Type) where
val : Float
deriving Repr
-- Type class for multiplying quantities with compatible units
class UnitMul (u1 u2 result : Type) where
-- Type class for dividing quantities with compatible units
class UnitDiv (u1 u2 result : Type) where
-- Define the dimensional relationships
instance : UnitDiv Meters Seconds MetersPerSecond where
instance : UnitDiv MetersPerSecond Seconds MetersPerSecondSquared where
instance : UnitMul Kilograms MetersPerSecondSquared Newtons where
instance : UnitMul Newtons Seconds NewtonSeconds where
-- Arithmetic operations that preserve dimensional correctness
def Quantity.add {u : Type} (x y : Quantity u) : Quantity u :=
⟨x.val + y.val⟩
def Quantity.sub {u : Type} (x y : Quantity u) : Quantity u :=
⟨x.val - y.val⟩
def Quantity.mul {u1 u2 u3 : Type} [UnitMul u1 u2 u3] (x : Quantity u1) (y : Quantity u2) : Quantity u3 :=
⟨x.val * y.val⟩
def Quantity.div {u1 u2 u3 : Type} [UnitDiv u1 u2 u3] (x : Quantity u1) (y : Quantity u2) : Quantity u3 :=
⟨x.val / y.val⟩
def Quantity.scale {u : Type} (x : Quantity u) (s : Float) : Quantity u :=
⟨x.val * s⟩
-- Smart constructors for readability
def meters (v : Float) : Quantity Meters := ⟨v⟩
def seconds (v : Float) : Quantity Seconds := ⟨v⟩
def kilograms (v : Float) : Quantity Kilograms := ⟨v⟩
def newtons (v : Float) : Quantity Newtons := ⟨v⟩
def newtonSeconds (v : Float) : Quantity NewtonSeconds := ⟨v⟩
-- Example: Computing velocity from distance and time
-- This compiles because Meters / Seconds = MetersPerSecond
def computeVelocity (distance : Quantity Meters) (time : Quantity Seconds)
: Quantity MetersPerSecond :=
distance.div time
-- Example: Computing force from mass and acceleration
-- This compiles because Kilograms * MetersPerSecondSquared = Newtons
def computeForce (mass : Quantity Kilograms) (accel : Quantity MetersPerSecondSquared)
: Quantity Newtons :=
mass.mul accel
-- Example: Computing impulse from force and time
-- This compiles because Newtons * Seconds = NewtonSeconds
def computeImpulse (force : Quantity Newtons) (time : Quantity Seconds)
: Quantity NewtonSeconds :=
force.mul time
-- The Mars Climate Orbiter scenario:
-- One team computes impulse in Newton-seconds
-- Another team expects the same units
-- The type system ensures they match
def thrusterImpulse : Quantity NewtonSeconds :=
let force := newtons 450.0
let burnTime := seconds 10.0
computeImpulse force burnTime
-- This would NOT compile - you cannot add NewtonSeconds to Meters:
-- def nonsense := thrusterImpulse.add (meters 100.0)
-- This would NOT compile - you cannot pass Meters where Seconds expected:
-- def badVelocity := computeVelocity (meters 100.0) (meters 50.0)
-- The key insight: all the Quantity wrappers and phantom types vanish at runtime.
-- The generated code is just floating-point arithmetic.
-- The safety is free.
def main : IO Unit := do
let distance := meters 100.0
let time := seconds 9.58 -- Usain Bolt's 100m record
let velocity := computeVelocity distance time
IO.println s!"Distance: {distance.val} m"
IO.println s!"Time: {time.val} s"
IO.println s!"Velocity: {velocity.val} m/s"
let mass := kilograms 1000.0
let accel : Quantity MetersPerSecondSquared := ⟨9.81⟩
let force := computeForce mass accel
IO.println s!"Mass: {mass.val} kg"
IO.println s!"Acceleration: {accel.val} m/s²"
IO.println s!"Force: {force.val} N"
let impulse := computeImpulse force (seconds 5.0)
IO.println s!"Impulse: {impulse.val} N·s"
Tip
Run from the repository:
lake exe units
The Quantity type wraps a Float but carries a phantom type parameter representing its unit. You cannot add meters to seconds because Quantity.add requires both arguments to have the same unit type. You cannot pass thrust in the wrong units because the function signature encodes the dimensional requirements.
The crucial insight is that these phantom types vanish at runtime. The Meters and Seconds types have no constructors, no fields, no runtime representation whatsoever. The generated code manipulates raw floats with raw floating-point operations. The type checker enforces dimensional correctness; the compiled program pays no cost for it. This is the dream of static typing: safety that exists only in the compiler’s mind, free at runtime, catching at compile time the very class of error that destroyed a spacecraft.
There is a broader lesson here about the direction of software. The mathematics that physicists scribble on whiteboards, the dimensional analysis that engineers perform by hand, the invariants that programmers hold in their heads and document in comments: these are all pseudocode. They are precise enough for humans to follow but not precise enough for machines to verify. The project of programming language research, from Curry and Howard through ML and Haskell to Lean and dependent types, has been to formalize this pseudocode. To turn informal reasoning into machine-checked artifacts.
As code generation by large language models becomes routine, this formalization becomes essential. A neural network can produce syntactically correct code that passes tests yet harbors subtle unit errors, off-by-one mistakes, and violated invariants. The guardrails cannot be more tests, more code review, more human attention. The guardrails must be formalisms that make entire categories of errors unrepresentable. Type classes, phantom types, dependent types: these are not academic curiosities but safety controls for a future where most code is synthesized. The Mars Climate Orbiter was written by humans who made a human error. The code that replaces them must be held to a higher standard. (For more on this trajectory, see Artificial Intelligence.)
Side Effects Ahead
Type classes and phantom types give you abstraction and compile-time safety. But programs must eventually interact with the world: reading files, handling errors, managing state. Next up: monads. Yes, those monads. Do not worry, we will not explain them using burritos.
Effects
“A monad is just a monoid in the category of endofunctors, what’s the problem?” This infamous quip became the ur-meme of functional programming, spawning a thousand blog posts explaining monads via burritos, boxes, and space suits. The tragedy is that the concept is not hard. It just got wrapped in mystique before anyone explained it clearly.
Here is what matters: programs have effects. They might fail, consult state, perform IO, branch nondeterministically, or launch the missiles. In languages without effect tracking, any function call might do any of these things. You call getUsername() and hope it only reads from a database rather than initiating thermonuclear war. The type signature offers no guarantees. The question is how to represent effects in a way that lets us reason about composition and know, from the types alone, what a function might do. Monads are one answer. They capture a pattern for sequencing operations where each step produces both a result and some context. The bind operation chains these operations, threading the context automatically. Do notation makes the sequencing readable. The interface is minimal, the applications broad.
But monads are not the only answer, and treating them as sacred obscures the deeper point. Algebraic effect systems, linear types, graded monads, and effect handlers all attack the same problem from different angles. What they share is the conviction that effects should be visible in types and that composition should be governed by laws. The specific mechanism matters less than the principle: make the structure explicit so that humans and machines can reason about it.
Lean uses monads because they work well and the ecosystem inherited them from functional programming research of the 1990s. They are a good tool. But the goal is to capture effects algebraically, whatever form that takes. When you understand monads, you understand one particularly elegant solution to sequencing effectful computations. You also understand a template for how programming abstractions should work: a minimal interface, a set of laws, and the discipline to respect both.
The Option Monad
The simplest monad handles computations that might fail. You already understand this pattern: look something up, and if it exists, do something with it. If not, propagate the absence. Every programmer has written this code a hundred times. The monad just gives it a name and a uniform interface.
def safeDivide (x y : Nat) : Option Nat :=
if y == 0 then none else some (x / y)
def safeHead {α : Type} (xs : List α) : Option α :=
match xs with
| [] => none
| x :: _ => some x
#eval safeDivide 10 2 -- some 5
#eval safeDivide 10 0 -- none
#eval safeHead [1, 2] -- some 1
#eval safeHead ([] : List Nat) -- none
Chaining Without Monads
Without the abstraction, chaining fallible operations produces the pyramid of doom: nested conditionals, each handling failure explicitly, the actual logic buried under boilerplate. This is not hypothetical. This is what error handling looks like in languages without monadic structure. It is also what early JavaScript looked like before Promises, which are, of course, monads by another name.
def computation (xs : List Nat) : Option Nat :=
match safeHead xs with
| none => none
| some x =>
match safeDivide 100 x with
| none => none
| some y => some (y + 1)
#eval computation [5, 2, 3] -- some 21
#eval computation [0, 2, 3] -- none (division by zero)
#eval computation [] -- none (empty list)
The bind Operation
The bind operation (written >>=) is the heart of the monad. It takes a value in context and a function that produces a new value in context, and chains them together. For Option, this means: if the first computation succeeded, apply the function. If it failed, propagate the failure. The pattern generalizes far beyond failure, but failure is the clearest example.
def computation' (xs : List Nat) : Option Nat :=
safeHead xs >>= fun x =>
safeDivide 100 x >>= fun y =>
some (y + 1)
#eval computation' [5, 2, 3] -- some 21
#eval computation' [0, 2, 3] -- none
#eval computation' [] -- none
Do Notation
Do notation is syntactic sugar that makes monadic code look imperative. The left arrow ← desugars to bind, and the semicolon sequences operations. This is not a concession to programmers who cannot handle functional style. It is recognition that sequential composition is how humans think about processes, and fighting that serves no purpose. The abstraction remains while the syntax yields to ergonomics.
def computationDo (xs : List Nat) : Option Nat := do
let x ← safeHead xs
let y ← safeDivide 100 x
return y + 1
#eval computationDo [5, 2, 3] -- some 21
#eval computationDo [0, 2, 3] -- none
#eval computationDo [] -- none
def validateInput (name : String) (age : Nat) : Option (String × Nat) := do
if name.isEmpty then none
if age == 0 then none
return (name, age)
#eval validateInput "Alice" 30 -- some ("Alice", 30)
#eval validateInput "" 30 -- none
#eval validateInput "Bob" 0 -- none
In validateInput, the bare none on its own line short-circuits the computation. Within a do block for Option, writing none is equivalent to early return with failure. The remaining lines are not executed, and the whole expression evaluates to none.
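An equivalent version without do notation makes the short-circuiting explicit; a sketch, not the literal desugaring:
def validateInput' (name : String) (age : Nat) : Option (String × Nat) :=
  if name.isEmpty then none
  else if age == 0 then none
  else some (name, age)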
Do Notation Desugaring
Do notation is syntactic sugar for bind chains. The compiler transforms your imperative-looking code into applications of >>= and pure. Understanding the desugaring helps when types do not match or when you want to optimize.
The key distinction is between two forms of let:
- let x ← e performs a monadic bind: it unwraps the value from the monadic context. If e : Option Nat, then x : Nat. If e evaluates to none, the computation short-circuits immediately.
- let x := e is a pure let binding: no unwrapping occurs. The value is used exactly as-is. If e : Option Nat, then x : Option Nat.
The arrow ← is doing real work: it reaches into the monad and extracts the value, handling failure automatically. The walrus := just names a value.
A monadic bind extracts the value and passes it to the continuation:
do let x ← e1; es   ⟹   e1 >>= fun x => do es
A pure let binding has no monadic involvement:
do let x := e; es   ⟹   let x := e; do es
An action without binding discards the result:
do e1; es   ⟹   e1 >>= fun () => do es
Pattern matching with a fallback handles failure:
do let some x ← e1 | fallback; es   ⟹   e1 >>= fun | some x => do es | _ => fallback
The return keyword is just pure:
return e ⟹ pure e
The ← operator can appear as a prefix within expressions. Each occurrence is hoisted to a fresh binding, processed left-to-right, inside-to-outside:
do f (← e1) (← e2); es   ⟹   do let x ← e1; let y ← e2; f x y; es
This is not the same as f e1 e2. Consider the difference:
-- If e1 : Option Nat and e2 : Option Nat:
f e1 e2 -- f receives two Option Nat values
f (← e1) (← e2) -- f receives two Nat values (unwrapped)
Use ← when you want to extract the value from a monadic context within an expression. The arrow does the unwrapping. Without it, you pass the wrapped value.
Effects like early return, mutable state, and loops with break/continue transform the entire do block rather than desugaring locally, similar to monad transformers.
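Early return is the clearest example of such a whole-block transformation. A minimal sketch (the helper firstEven is ours, not standard library):
def firstEven (xs : List Nat) : Option Nat := do
  for x in xs do
    if x % 2 == 0 then
      return x -- exits the entire do block, not just the loop
  none
#eval firstEven [1, 3, 4, 5] -- some 4
#eval firstEven [1, 3, 5] -- none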
Note
Semicolons can replace newlines in do blocks:
do let x ← e1; let y ← e2; pure (x + y). This is rarely used since multiline format is “more readable.” Fifty years of programming language research and we still cannot agree on what makes syntax objectively good. Perhaps because syntax is more fashion and culture than science.
def withBind (xs : List Nat) : Option Nat :=
safeHead xs >>= fun x =>
safeDivide 100 x >>= fun y =>
pure (y + 1)
def withDoNotation (xs : List Nat) : Option Nat := do
let x ← safeHead xs
let y ← safeDivide 100 x
return y + 1
#eval withBind [5] -- some 21
#eval withDoNotation [5] -- some 21
def mixedBindings : Option Nat := do
let x ← some 10 -- monadic bind
let y := x + 5 -- pure let
let z ← some (y * 2) -- monadic bind
return z
#eval mixedBindings -- some 30
Mutable Variables in Do
The let mut syntax introduces mutable bindings that desugar to StateT, a monad transformer that adds mutable state to any monad. Assignment with := modifies the state. The compiler threads the state automatically, transforming imperative-looking code into pure functional operations. You do not need to understand StateT to use let mut because the desugaring is automatic.
def imperativeSum (xs : List Nat) : Nat := Id.run do
let mut total := 0
for x in xs do
total := total + x
return total
def functionalSum (xs : List Nat) : Nat :=
xs.foldl (· + ·) 0
#eval imperativeSum [1, 2, 3, 4, 5] -- 15
#eval functionalSum [1, 2, 3, 4, 5] -- 15
def countValid (xs : List Nat) : IO Nat := do
let mut count := 0
for x in xs do
if x > 0 then
count := count + 1
IO.println s!"Valid: {x}"
return count
When should you use Id.run do versus plain do?
- do alone works when you are already inside a monad like IO or Option. The do block produces a monadic value.
- Id.run do is needed when you want to use imperative syntax (let mut, for loops) but return a pure value. The Id monad is the “identity” monad: it adds no effects, just provides the machinery for state threading.
In imperativeSum, the return type is Nat, not IO Nat. Without Id.run, there would be no monad to thread the mutable state through. The Id monad provides exactly that scaffolding while adding nothing else. For IO operations, you work directly in the IO monad and the mutations interleave with side effects.
The Except Monad
Option tells you that something failed but not why. Except carries the reason. This is the difference between a function returning null and a function throwing an exception with a message. The monadic structure is identical, only the context changes. This uniformity is the point. Learn the pattern once, apply it to failure, to errors, to state, to nondeterminism, to parsing, to probability distributions. The shape is always the same.
inductive ValidationError where
| emptyName
| invalidAge (age : Nat)
| missingField (field : String)
deriving Repr
def validateName (name : String) : Except ValidationError String :=
if name.isEmpty then .error .emptyName
else .ok name
def validateAge (age : Nat) : Except ValidationError Nat :=
if age == 0 || age > 150 then .error (.invalidAge age)
else .ok age
def validatePerson (name : String) (age : Nat) : Except ValidationError (String × Nat) := do
let validName ← validateName name
let validAge ← validateAge age
return (validName, validAge)
#eval validatePerson "Alice" 30 -- Except.ok ("Alice", 30)
#eval validatePerson "" 30 -- Except.error ValidationError.emptyName
#eval validatePerson "Bob" 200 -- Except.error (ValidationError.invalidAge 200)
Combining Effects: Transformer Ordering
Real programs often need multiple effects at once: error handling and logging, state and failure. Monad transformers let you combine effects by stacking them. But the order of the stack matters: different orderings give different failure semantics.
Here is the minimal demonstration:
-- Two ways to combine State and Except - the order matters!
-- State outside Except: on error, state is LOST
abbrev Rollback := StateT Nat (Except Unit)
-- Except outside State: on error, state is PRESERVED
abbrev Audit := ExceptT Unit (StateM Nat)
def countThenFailRollback : Rollback Unit := do
modify (· + 1) -- count = 1
modify (· + 1) -- count = 2
throw () -- error!
modify (· + 1) -- never reached
def countThenFailAudit : Audit Unit := do
modify (· + 1) -- count = 1
modify (· + 1) -- count = 2
throw () -- error!
modify (· + 1) -- never reached
-- Rollback: error discards the state
#eval StateT.run countThenFailRollback 0
-- Except.error () ← count is gone!
-- Audit: error preserves the state
#eval StateT.run (ExceptT.run countThenFailAudit) 0
-- (Except.error (), 2) ← count = 2 preserved
With StateT on the outside (Rollback), an error discards the accumulated state. With ExceptT on the outside (Audit), the state persists even after failure. Same operations, different semantics.
To understand why, think about what each transformer does when you “run” it:
- StateT.run takes a computation and an initial state, and returns (result, finalState)
- ExceptT.run takes a computation and returns Except Error Result
The outer transformer determines what you get back. If Except is outer, you get Except Error (Result × State), so the state is inside, preserved regardless of success. If StateT is outer, you get State → Except Error (Result × State), so on error the state is never returned.
ATM Example
Consider an ATM withdrawal, a pipeline of fallible operations that must be logged for compliance. Check the balance. Verify the daily limit. Dispense cash. Update the account. Each step can fail, and each step should be recorded. ATMs are where functional programming meets the brutal reality of mechanical cash dispensers.
inductive ATMError where
| insufficientFunds (requested available : Nat)
| dailyLimitExceeded (requested limit : Nat)
| dispenserJam (dispensedBeforeJam : Nat)
| cardRetained
deriving Repr
def ATMError.describe : ATMError → String
| .insufficientFunds req avail =>
s!"Insufficient funds: requested €{req}, available €{avail}"
| .dailyLimitExceeded req limit =>
s!"Daily limit exceeded: requested €{req}, limit €{limit}"
| .dispenserJam dispensed =>
s!"Dispenser jam after dispensing €{dispensed}"
| .cardRetained => "Card retained by machine"
structure AuditEntry where
timestamp : Nat
message : String
deriving Repr
structure AuditLog where
entries : List AuditEntry := []
nextTimestamp : Nat := 0
deriving Repr
def AuditLog.add (log : AuditLog) (msg : String) : AuditLog :=
{ entries := log.entries ++ [⟨log.nextTimestamp, msg⟩]
nextTimestamp := log.nextTimestamp + 1 }
def AuditLog.show (log : AuditLog) : List String :=
log.entries.map fun e => s!"[{e.timestamp}] {e.message}"
structure Account where
holder : String
balance : Nat
dailyWithdrawn : Nat := 0
dailyLimit : Nat := 500
deriving Repr
The withdrawal amount uses a dependent type PosNat to ensure it is positive. You cannot withdraw zero euros (pointless) or negative euros (the bank frowns upon this):
def PosNat := { n : Nat // n > 0 }
def PosNat.mk? (n : Nat) : Option PosNat :=
if h : n > 0 then some ⟨n, h⟩ else none
instance : Repr PosNat where
reprPrec p _ := repr p.val
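A quick check of the smart constructor; the output uses the Repr instance above:
#eval PosNat.mk? 300 -- some 300
#eval PosNat.mk? 0 -- none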
Two Transformer Stacks
We define two stacks with different failure semantics:
abbrev RollbackATM := StateT AuditLog (Except ATMError)
abbrev AuditATM := ExceptT ATMError (StateM AuditLog)
The operations are identical in both stacks. Here is the audit version:
def logA (msg : String) : AuditATM Unit :=
modify (·.add msg)
def checkBalanceA (acct : Account) (amount : PosNat) : AuditATM Unit := do
logA s!"Balance check: €{acct.balance} available"
if amount.val > acct.balance then
logA s!"DENIED: Insufficient funds for €{amount.val}"
throw (.insufficientFunds amount.val acct.balance)
def checkLimitA (acct : Account) (amount : PosNat) : AuditATM Unit := do
let remaining := acct.dailyLimit - acct.dailyWithdrawn
logA s!"Daily limit check: €{remaining} remaining of €{acct.dailyLimit}"
if amount.val > remaining then
logA s!"DENIED: Would exceed daily limit"
throw (.dailyLimitExceeded amount.val acct.dailyLimit)
def dispenseCashA (amount : PosNat) : AuditATM Nat := do
logA s!"Dispensing €{amount.val}..."
if amount.val > 200 then
let dispensed := 100
logA s!"ERROR: Dispenser jam after €{dispensed} dispensed"
throw (.dispenserJam dispensed)
logA s!"Cash dispensed: €{amount.val}"
pure amount.val
def updateBalanceA (acct : Account) (amount : Nat) : AuditATM Account := do
let newBalance := acct.balance - amount
logA s!"Balance updated: €{acct.balance} → €{newBalance}"
pure { acct with balance := newBalance, dailyWithdrawn := acct.dailyWithdrawn + amount }
The complete withdrawal combines all steps:
def withdrawAudit (acct : Account) (amount : PosNat) : AuditATM Account := do
logA s!"=== Withdrawal started: {acct.holder} ==="
logA s!"Requested amount: €{amount.val}"
checkBalanceA acct amount
checkLimitA acct amount
let dispensed ← dispenseCashA amount
let newAcct ← updateBalanceA acct dispensed
logA s!"=== Withdrawal complete ==="
pure newAcct
Partial Failure
Consider what happens when the dispenser jams after partially dispensing cash. Alice requests €300. The machine gives her €100, then the dispenser jams.
def runAudit (acct : Account) (amount : PosNat)
: (Except ATMError Account) × AuditLog :=
let (result, log) :=
StateT.run (ExceptT.run (withdrawAudit acct amount)) {}
(result, log)
def runRollback (acct : Account) (amount : PosNat)
: Except ATMError (Account × AuditLog) :=
StateT.run (withdrawRollback acct amount) {}
With rollback semantics (RollbackATM), the audit log is lost. The bank’s records show nothing happened. But Alice has €100 in her hand, and there is no record of what occurred.
With audit semantics (AuditATM), the log is preserved:
[0] === Withdrawal started: Alice ===
[1] Requested amount: €300
[2] Balance check: €1000 available
[3] Daily limit check: €500 remaining of €500
[4] Dispensing €300...
[5] ERROR: Dispenser jam after €100 dispensed
Now compliance knows exactly what happened: Alice got €100, the machine jammed, and manual reconciliation is needed.
def demonstrateDifference : IO Unit := do
IO.println "=== Rollback Semantics (StateT outside Except) ==="
IO.println "On error, the audit log is LOST\n"
let amount : PosNat := ⟨300, by omega⟩
match runRollback alice amount with
| .ok (acct, log) =>
IO.println s!"Success! New balance: €{acct.balance}"
IO.println "Audit log:"
for entry in log.show do IO.println s!" {entry}"
| .error e =>
IO.println s!"FAILED: {e.describe}"
IO.println "Audit log: <LOST - we only have the error>"
IO.println "Compliance cannot determine what happened.\n"
IO.println "\n=== Audit Semantics (ExceptT outside StateM) ==="
IO.println "On error, the audit log is PRESERVED\n"
let (result, log) := runAudit alice amount
match result with
| .ok acct =>
IO.println s!"Success! New balance: €{acct.balance}"
| .error e =>
IO.println s!"FAILED: {e.describe}"
IO.println "Audit log (preserved!):"
for entry in log.show do IO.println s!" {entry}"
IO.println "\nCompliance can see exactly what happened."
Tip
Run from the repository:
lake exe atm
This is why banks use audit semantics for ATM transactions. Financial regulations require knowing what happened, including partial failures. The transformer ordering is a design decision with legal implications. Get it wrong and auditors will have questions. Get it right and the code is its own documentation.
The State Monad
The state monad threads mutable state through a pure computation. You get the ergonomics of mutation, the ability to read and write a value as you go, without actually mutating anything. Each computation takes a state and returns a new state alongside its result. The threading is automatic, hidden behind the monadic interface. This is not a trick. It is a different way of thinking about state: not as a mutable box but as a value that flows through your computation, transformed at each step.
Lean provides two related types: StateT σ m α is the general state transformer that adds state of type σ to any monad m. StateM σ α is defined as StateT σ Id α, state over the identity monad, for pure stateful computations. When you do not need other effects, StateM is simpler and sufficient. When you need state combined with IO, Option, or other monads, use StateT. They share the same interface (get, set, modify), and code written for one often works for the other.
Under the hood, a stateful computation is just a function σ → (α × σ). The following shows how you would build the primitives yourself:
namespace ManualState
abbrev State (σ α : Type) := σ → (α × σ)
def get {σ : Type} : State σ σ := fun s => (s, s)
def set {σ : Type} (newState : σ) : State σ Unit := fun _ => ((), newState)
def modify {σ : Type} (f : σ → σ) : State σ Unit := fun s => ((), f s)
def run {σ α : Type} (init : σ) (m : State σ α) : α × σ :=
m init
def counter : State Nat Nat := fun n => (n, n + 1)
#eval run 0 counter -- (0, 1)
#eval run 10 counter -- (10, 11)
end ManualState
The ManualState namespace isolates these definitions from the standard library. Inside, we use natural names: get returns the current state as the result, set ignores the old state and installs a new one, modify applies a function to transform the state.
StateM in Practice
Lean’s Init namespace (see Basics) provides StateM, StateT, ExceptT, and other monad transformers without explicit imports. As noted above, StateM σ α equals StateT σ Id α, the pure-state specialization. The operations get, set, and modify work exactly like our manual versions. Combined with do notation, stateful code looks almost identical to imperative code, except that the state is explicit in the type and the purity is preserved. You can run the same computation with different initial states and get reproducible results. You can reason about what the code does without worrying about hidden mutation elsewhere.
def tick : StateM Nat Unit := modify (· + 1)
def getTicks : StateM Nat Nat := get
def countOperations : StateM Nat Nat := do
tick
tick
tick
let count ← getTicks
return count
#eval countOperations.run 0 -- (3, 3)
#eval countOperations.run 10 -- (13, 13)
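When the state needs to live alongside IO, the same interface carries over to StateT. A minimal sketch (the names here are illustrative, not from the repository):
def tickAndLog : StateT Nat IO Unit := do
  modify (· + 1)
  let n ← get
  IO.println s!"tick {n}"  -- IO actions lift automatically into StateT Nat IO
def runTicks : IO Unit := do
  let (_, final) ← StateT.run (do tickAndLog; tickAndLog) 0
  IO.println s!"final count: {final}"  -- prints 2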
The List Monad
Lists as a monad represent nondeterministic computation: a value that could be many things at once. Bind explores all combinations, like nested loops but without the nesting. This is how you generate permutations, enumerate possibilities, or implement backtracking search. The abstraction is the same, only the interpretation differs. A monad does not care whether its context is failure, state, or multiplicity. It only knows how to sequence.
def pairs (xs : List Nat) (ys : List Nat) : List (Nat × Nat) :=
xs.flatMap fun x => ys.map fun y => (x, y)
#eval pairs [1, 2] [10, 20] -- [(1, 10), (1, 20), (2, 10), (2, 20)]
def pythagTriples (n : Nat) : List (Nat × Nat × Nat) :=
(List.range n).flatMap fun a =>
(List.range n).flatMap fun b =>
(List.range n).filterMap fun c =>
if a * a + b * b == c * c && a > 0 && b > 0 then
some (a, b, c)
else
none
#eval pythagTriples 15 -- [(3, 4, 5), (4, 3, 5), (5, 12, 13), ...]
Iteration Type Classes
The ForIn type class powers for loops. Any type with a ForIn instance can be iterated with for x in collection do. The mechanism is more flexible than it first appears: you can implement custom iteration patterns, control early exit, and work in any monad.
structure CountDown where
start : Nat
instance : ForIn Id CountDown Nat where
forIn cd init f := do
let mut acc := init
let mut i := cd.start
while i > 0 do
match ← f i acc with
| .done a => return a -- break
| .yield a => acc := a -- continue
i := i - 1
return acc
def sumCountDown (n : Nat) : Nat := Id.run do
let mut total := 0
for i in CountDown.mk n do
total := total + i
return total
#eval sumCountDown 5 -- 15 (5+4+3+2+1)
The ForInStep type controls loop flow. Returning .done value breaks out of the loop with the accumulated result. Returning .yield value continues to the next iteration. This desugars to monadic operations, so early return in a for loop is not a special case but an application of the general mechanism.
def printAll (xs : List String) : IO Unit := do
for x in xs do
IO.println x
def sumWithIndex (arr : Array Nat) : Nat := Id.run do
let mut total := 0
for h : i in [0:arr.size] do
total := total + arr[i]
return total
#eval sumWithIndex #[10, 20, 30] -- 60
def manualForIn (xs : List Nat) : Option Nat :=
ForIn.forIn xs 0 fun x acc =>
if x == 0 then some (.done acc) -- early exit
else some (.yield (acc + x)) -- continue
#eval manualForIn [1, 2, 3, 4] -- some 10
#eval manualForIn [1, 2, 0, 4] -- some 3 (stopped at 0)
The forIn function can be called directly when you need explicit control over the accumulator and continuation. The callback returns some (.done acc) to break or some (.yield acc) to continue. Returning none propagates failure in the Option monad. This is how Lean unifies iteration with monadic effects.
Collection Operations
Lists and arrays share a common vocabulary of operations. These functions compose naturally into data processing pipelines.
def langs : List String := ["Lean", "Haskell", "Rust", "OCaml"]
def types : List String := ["theorem", "lazy", "systems", "modules"]
-- zip: pair elements from two collections
#eval langs.zip types
-- [("Lean", "theorem"), ("Haskell", "lazy"), ...]
-- map: transform each element
#eval langs.map String.toUpper
-- ["LEAN", "HASKELL", "RUST", "OCAML"]
-- filter: keep elements matching predicate
#eval langs.filter (·.startsWith "R")
-- ["Rust"]
-- take/drop: slice prefix or suffix
#eval langs.take 2
-- ["Lean", "Haskell"]
#eval langs.drop 2
-- ["Rust", "OCaml"]
-- filterMap: filter and transform in one pass
#eval ["42", "bad", "17"].filterMap String.toNat?
-- [42, 17]
-- find?: first element matching predicate
#eval langs.find? (·.length > 4)
-- some "Haskell"
-- any/all: check predicates
#eval langs.any (·.startsWith "L") -- true
#eval langs.all (·.length > 3) -- true
-- zipIdx: pair with indices
#eval ["a", "b", "c"].zipIdx
-- [("a", 0), ("b", 1), ("c", 2)]
The operations compose cleanly: filter selects, map transforms, filterMap fuses both. find? returns Option because absence is a valid result, not an exception.
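Chained together, they read as a single pipeline. A small sketch with illustrative data:
#eval (["3", "4", "x", "10"].filterMap String.toNat?  -- [3, 4, 10]: parse, dropping failures
        |>.filter (· % 2 == 0)                        -- [4, 10]: keep the evens
        |>.map (· * 10))                              -- [40, 100]: scale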
Folds
Folds are the fundamental iteration pattern. Every loop, every accumulation, every reduction can be expressed as a fold. Understanding folds means understanding how computation flows through a data structure.
A left fold processes elements left-to-right, accumulating from the left:
\[ \text{foldl}(f, z, [a, b, c]) = f(f(f(z, a), b), c) = ((z \oplus a) \oplus b) \oplus c \]
A right fold processes elements right-to-left, accumulating from the right:
\[ \text{foldr}(f, z, [a, b, c]) = f(a, f(b, f(c, z))) = a \oplus (b \oplus (c \oplus z)) \]
Here \(\oplus\) is just \(f\) written infix: \(a \oplus b = f(a, b)\).
For associative operations like addition, both folds give the same result. For non-associative operations, the parenthesization matters:
-- foldl: left fold, accumulator on the left
#eval [1, 2, 3, 4].foldl (· + ·) 0 -- 10
#eval ["a","b","c"].foldl (· ++ ·) "" -- "abc"
-- foldr: right fold, accumulator on the right
#eval [1, 2, 3, 4].foldr (· + ·) 0 -- 10
#eval ["a","b","c"].foldr (· ++ ·) "" -- "abc"
-- the difference: subtraction is neither associative nor commutative
-- foldl f z [a,b,c] = f(f(f(z,a),b),c) = ((z⊕a)⊕b)⊕c
-- foldr f z [a,b,c] = f(a,f(b,f(c,z))) = a⊕(b⊕(c⊕z))
#eval ([1, 2, 3, 4] : List Int).foldl (· - ·) 0 -- -10: ((((0-1)-2)-3)-4)
#eval ([1, 2, 3, 4] : List Int).foldr (· - ·) 0 -- -2: (1-(2-(3-(4-0))))
-- foldl builds left-to-right (tail recursive)
#eval [1, 2, 3].foldl (fun acc x => x :: acc) [] -- [3, 2, 1]
-- foldr builds right-to-left (preserves structure)
#eval [1, 2, 3].foldr (fun x acc => x :: acc) [] -- [1, 2, 3]
-- practical uses
#eval [10, 25, 8, 42, 3].foldl max 0 -- 42
#eval [2, 3, 4].foldl (· * ·) 1 -- 24
#eval [1, 2, 3].foldl (fun acc x => acc + x * x) 0 -- 14
The cons example reveals the structural difference. Building a list with foldl reverses order because each new element is prepended to the growing accumulator. Building with foldr preserves order because the accumulator grows from the right. This is why map is typically defined using foldr: map f xs = foldr (fun x acc => f x :: acc) [] xs.
Left folds are tail-recursive and run in constant stack space. Right folds are not tail-recursive but can work with lazy data structures since they do not need to traverse to the end before producing output. In strict languages like Lean, prefer foldl for efficiency unless you need the structural properties of foldr.
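The foldr definition of map can be checked directly. A small sketch:
-- map as a right fold: the list is rebuilt as the fold unwinds, preserving order
def mapViaFoldr {α β : Type} (f : α → β) (xs : List α) : List β :=
  xs.foldr (fun x acc => f x :: acc) []
#eval mapViaFoldr (· * 2) [1, 2, 3]  -- [2, 4, 6]
#eval [1, 2, 3].map (· * 2)          -- [2, 4, 6], same result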
The Monad Type Class
Under the hood, all monads implement the same interface. pure lifts a plain value into the monadic context. bind sequences two computations, passing the result of the first to the second. That is the entire interface. Everything else, the do notation, the specialized operations, the ergonomic helpers, builds on these two primitives. The simplicity is deliberate. A minimal interface means maximal generality.
class Monad' (M : Type → Type) extends Functor M where
pure' : {α : Type} → α → M α
bind' : {α β : Type} → M α → (α → M β) → M β
instance : Monad' Option where
map f
| none => none
| some x => some (f x)
pure' := some
bind' m f := match m with
| none => none
| some x => f x
instance : Monad' List where
map := List.map
pure' x := [x]
bind' m f := m.flatMap f
Monad Laws
Here is where the algebra becomes essential. Monads must satisfy three laws: left identity, right identity, and associativity. These are not suggestions. They are the contract that makes generic programming possible.
In the traditional bind/return formulation:
| Law | Lean | Math |
|---|---|---|
| Left Identity | pure a >>= f = f a | $\eta(a) \star f = f(a)$ |
| Right Identity | m >>= pure = m | $m \star \eta = m$ |
| Associativity | (m >>= f) >>= g = m >>= (λ x => f x >>= g) | $(m \star f) \star g = m \star (\lambda x. f(x) \star g)$ |
Note
For those with category theory background: the same laws look cleaner in the Kleisli category, where we compose monadic functions directly. If $f : A \to M B$ and $g : B \to M C$, their Kleisli composition is $g \circ f : A \to M C$:
| Law | Lean | Math |
|---|---|---|
| Left Identity | pure >=> f = f | $f \circ \eta = f$ |
| Right Identity | f >=> pure = f | $\eta \circ f = f$ |
| Associativity | (f >=> g) >=> h = f >=> (g >=> h) | $(h \circ g) \circ f = h \circ (g \circ f)$ |

The Kleisli formulation reveals that monads give you a category where objects are types and morphisms are functions $A \to M B$. The laws say pure is the identity morphism and >=> is associative composition. A monad is a way of embedding effectful computation into the compositional structure of functions. You do not need this perspective to use monads effectively.
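Lean does provide Kleisli composition as the >=> operator, so the table above is not merely notation. A quick sketch with two fallible steps (the helper functions are illustrative):
def parseNat? (s : String) : Option Nat := s.toNat?
def half? (n : Nat) : Option Nat :=
  if n % 2 == 0 then some (n / 2) else none
-- Kleisli composition: run parseNat?, feed its result to half?
#eval (parseNat? >=> half?) "10"   -- some 5
#eval (parseNat? >=> half?) "7"    -- none (7 is odd)
#eval (parseNat? >=> half?) "abc"  -- none (the parse fails)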
-- Left identity: pure a >>= f = f a
example (f : Nat → Option Nat) (a : Nat) :
(pure a >>= f) = f a := rfl
-- Right identity: m >>= pure = m
example (m : Option Nat) : (m >>= pure) = m := by
cases m <;> rfl
-- Associativity: (m >>= f) >>= g = m >>= (fun x => f x >>= g)
example (m : Option Nat) (f : Nat → Option Nat) (g : Nat → Option Nat) :
((m >>= f) >>= g) = (m >>= fun x => f x >>= g) := by
cases m <;> rfl
Tip
In Haskell (and most other languages), you can claim your monad follows the laws but the compiler takes your word for it. In Lean, you can prove it. The laws become theorems, and proving them means constructing values of the corresponding types. This is the Curry-Howard correspondence at work. The Proofs article shows how.
At this point someone usually asks what a monad “really is.” The answers have become a genre: a burrito, a spacesuit, a programmable semicolon, a monoid in the category of endofunctors. These metaphors are not wrong, but they are not enlightening either. A monad is the three laws above and nothing else. Everything follows from the laws. The metaphors are for people who want to feel like they understand before they do the work of understanding.
Note
For those who want the category theory (colloquially known as “abstract nonsense,” the field’s own term of endearment): a monad is a monoid object in the monoidal category of endofunctors under composition. Equivalently, it is a lax 2-functor from the terminal 2-category to Cat. The Kleisli category is the category of free algebras of the monad.
some is the identity morphism in the Kleisli category of Option. In Haskell it is called Just, which humorously makes it Just an endomorphism in the Kleisli category of Option. If this clarified nothing, congratulations: you understood monads before and still do now. You do not need any of this to use monads effectively.
Early Return
Do notation supports early return, loops, and mutable references, all the imperative conveniences. Combined with monads, this gives you the syntax of imperative programming with the semantics of pure functions. You can write code that reads like Python and reasons like Haskell. This is not cheating. It is the whole point: capturing effects in types so that the compiler knows what your code might do, while letting you write in whatever style is clearest.
def findFirst {α : Type}
(p : α → Bool) (xs : List α) : Option α := do
for x in xs do
if p x then return x
none
#eval findFirst (· > 5) [1, 2, 3, 7, 4] -- some 7
#eval findFirst (· > 10) [1, 2, 3] -- none
def processUntilError (xs : List Nat) : Except String (List Nat) := do
let mut results := []
for x in xs do
if x == 0 then
throw "encountered zero"
results := results ++ [x * 2]
return results
#eval processUntilError [1, 2, 3] -- Except.ok [2, 4, 6]
#eval processUntilError [1, 0, 3] -- Except.error "encountered zero"
Combining Monadic Operations
Functions like mapM and filterMap combine monadic operations over collections. Map a fallible function over a list and either get all the results or the first failure. Filter a list with a predicate that consults external state. These combinators emerge naturally once you have the abstraction. They are not special cases but instances of a general pattern, composable because they respect the monad laws.
def mayFail (x : Nat) : Option Nat :=
if x == 0 then none else some (100 / x)
def processAll (xs : List Nat) : Option (List Nat) :=
xs.mapM mayFail
#eval processAll [1, 2, 4, 5] -- some [100, 50, 25, 20]
#eval processAll [1, 0, 4, 5] -- none
def filterValid (xs : List Nat) : List Nat :=
xs.filterMap mayFail
#eval filterValid [1, 0, 2, 0, 4] -- [100, 50, 25]
The Larger Pattern
Monads are one algebraic structure among many. Functors capture mapping. Applicatives capture independent combination. Monads capture dependent sequencing. Comonads capture context extraction. Arrows generalize computation graphs. Algebraic effects decompose monads into composable pieces. Each abstraction comes with laws, and those laws are the actual content. The specific names matter less than the discipline: identify a pattern, find its laws, and build an interface that enforces them.
The trajectory of programming language research has been toward making this structure explicit. Effects that C programmers handle with conventions, functional programmers handle with types. Invariants that documentation describes, dependent types enforce. Properties that tests sample, proofs establish. Each step reduces the burden on human memory and attention, encoding knowledge in artifacts that machines can check.
This matters because the economics of software are changing. When code is cheap to generate, correctness becomes the bottleneck. A language model can produce plausible implementations faster than any human, but “plausible” is not “correct.” The leverage shifts to whoever can specify precisely what correct means. Types, laws, contracts, proofs: these are the tools for specifying. Monads are a small example, one worked case of a pattern made precise. The concept itself was always simple. Sequencing with context. The value was never in the mystery but in the laws that let us reason compositionally about programs we increasingly do not write ourselves and cannot fully understand. (For more on where this is heading, see Artificial Intelligence.)
From Abstract to Concrete
Monads describe effects abstractly. The next article makes them concrete: actual file I/O, process management, environment variables, and the runtime machinery that connects your pure descriptions to the impure world. This completes the programming half of our journey.
IO and Concurrency
Programs must touch the world. They read files, open sockets, query databases, render pixels, and occasionally launch missiles. Pure functions, by definition, cannot do any of this. They map inputs to outputs, indifferent to the universe outside their arguments. And yet we write programs precisely to change that universe, to impose our will on entropy, to build something that persists after the process exits. This is the fundamental tension of typed functional programming, and IO is how we resolve it.
The insight is to describe impure computations as pure values. An IO String is not a string obtained through side effects; it is a pure, inert description of a computation that, when executed by the runtime, will interact with the world and produce a string. The description itself is referentially transparent. You can pass it around, store it in data structures, combine it with other descriptions, all without anything actually happening. Only when the runtime interprets the description do effects occur. The effect is separated from the specification of the effect, and that separation is everything.
When effects are explicit in types, the signature readFile : FilePath → IO String tells you something important: this function touches the filesystem. Compare this to a language where any function might secretly perform IO, where you cannot know without reading the implementation whether getUserName() is a pure lookup or a network call. The type becomes documentation that cannot lie. More than that, it becomes a constraint that tools can check. A function typed as pure cannot sneak in a database write. The compiler enforces the separation.
Effects come in varieties. Observable effects change the world in ways the program can later detect: writing a file, then reading it back. Unobservable effects change the world but leave no trace the program can see: logging to stderr, incrementing a metrics counter, writing to /dev/null. Then there are effects with international consequences: the missile launch, the funds transfer, the irreversible deletion. Code that launches missiles should look different from code that formats strings. When effects are tracked in types, it does. The engineer reading the signature knows what they are dealing with. The engineer writing the code must confront, at every function boundary, whether this operation belongs in IO or whether it can remain pure. A bear foraging in the forest leaves tracks; a program touching the world should leave type signatures. This is engineering discipline: knowing the consequences of your logic and designing systems where those consequences are visible, reviewable, and constrained.
This matters for the same reason everything in this series matters: we are building systems too complex to hold in our heads, increasingly written by machines that have no heads at all. Explicit effect tracking is not academic purity worship. It is engineering discipline for an era when the codebase is larger than any human can read and the authors include entities that optimize for plausibility rather than correctness. The type signature is the contract. The contract is the only thing we can trust.
IO Basics
The IO monad represents computations that can perform side effects. All effectful operations live inside IO, and you sequence them using do notation. The “Hello, World!” of functional programming is not a string; it is IO Unit, a description of an action that, when run, will emit bytes to a file descriptor and return nothing interesting. The Unit return type is honest about its triviality: we did not call this function for its result but for its consequences.
def greet : IO Unit := do
IO.println "What is your name?"
let stdin ← IO.getStdin
let name ← stdin.getLine
IO.println s!"Hello, {name.trim}!"
def printNumbers : IO Unit := do
for i in [1, 2, 3, 4, 5] do
IO.println s!"Number: {i}"
def getCurrentTime : IO Unit := do
let now ← IO.monoMsNow
IO.println s!"Milliseconds since start: {now}"
Pure Computations in IO
Not everything in IO needs to be effectful. Pure computations can be lifted into IO when you need to mix them with effects. The pure function wraps a value in an IO that does nothing but return that value. This is not contamination; it is embedding. A pure value inside IO is like a diplomat with immunity: present in foreign territory but untouched by local laws. You can always extract the pure part because it never actually did anything.
def pureComputation : IO Nat := do
let x := 10
let y := 20
return x + y
#eval pureComputation -- 30
def combineIO : IO String := do
let a ← pure "Hello"
let b ← pure "World"
return s!"{a} {b}"
#eval combineIO -- "Hello World"
File Operations
File operations live in the IO.FS namespace: read, write, append, delete, the usual POSIX inheritance. Files are where programs meet persistence, where the ephemeral computation leaves a mark that outlasts the process. The filesystem is the boundary between the volatile and the durable, between the RAM that vanishes when power fails and the spinning rust or flash cells that remember. Every configuration file, every log, every database is ultimately bytes on a filesystem, and the operations here are how you reach them.
def writeToFile (path : String) (content : String) : IO Unit := do
IO.FS.writeFile path content
def readFromFile (path : String) : IO String := do
IO.FS.readFile path
def appendToFile (path : String) (content : String) : IO Unit := do
let handle ← IO.FS.Handle.mk path .append
handle.putStrLn content
def copyFile (src dst : String) : IO Unit := do
let content ← IO.FS.readFile src
IO.FS.writeFile dst content
Processing Lines
Reading a file line by line is the fundamental pattern of Unix text processing, the reason awk and sed exist, the shape of a thousand shell pipelines. The newline character is civilization’s oldest API, a contract that dates to the teletype and has outlived every attempt to replace it. In Lean, you get the same streaming capability with type safety and without the quoting nightmares that have spawned more security vulnerabilities than any other single source.
def processLines (path : String) : IO (List String) := do
let content ← IO.FS.readFile path
return content.splitOn "\n"
def countLines (path : String) : IO Nat := do
let lines ← processLines path
return lines.length
def filterLines' (path : String) (pred : String → Bool) : IO (List String) := do
let lines ← processLines path
return lines.filter pred
Error Handling
IO operations can fail. Files go missing, networks drop packets, disks fill up, permissions change, the NFS mount goes stale, the USB drive gets yanked mid-write. The universe is hostile to running programs, and error handling is how we cope with entropy’s preference for chaos. Like a bear preparing for winter, a robust program anticipates scarcity: the resource that was there yesterday may not be there tomorrow. Lean provides try-catch blocks that should feel familiar to anyone who has written Java, Python, or JavaScript, except that here the error handling is explicit in the type. An IO action either succeeds or throws, and try/catch is how you intercept the failure before it propagates to somewhere you cannot recover.
def safeDivideIO (x y : Nat) : IO Nat := do
if y == 0 then
throw <| IO.userError "Division by zero"
return x / y
def trySafeDivide : IO Unit := do
try
let result ← safeDivideIO 10 0
IO.println s!"Result: {result}"
catch e =>
IO.println s!"Error: {e}"
def withDefault' {α : Type} (action : IO α) (default : α) : IO α := do
try
action
catch _ =>
return default
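A typical use of withDefault' is a read with a fallback. A small sketch (the filename is illustrative):
-- Returns the file contents, or a stub if the read throws for any reason.
def readConfigOrDefault : IO String :=
  withDefault' (IO.FS.readFile "config.toml") "# empty default config\n"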
Mutable References
Sometimes you need a mutable cell, a box you can read from and write to as the computation proceeds. IO.Ref provides this: a reference that lives in IO, supporting get, set, and modify. This is controlled mutation, explicit in the type system, not the invisible aliasing that makes imperative code a maze of spooky action at a distance. The reference is a value. You can see who has access to it. You can trace where it flows. The mutation is real, but the accounting is honest.
def counterExample : IO Nat := do
let counter ← IO.mkRef 0
for _ in List.range 10 do
counter.modify (· + 1)
counter.get
#eval counterExample -- 10
def accumulate (values : List Nat) : IO Nat := do
let sum ← IO.mkRef 0
for v in values do
sum.modify (· + v)
sum.get
#eval accumulate [1, 2, 3, 4, 5] -- 15
Monad Transformers: ExceptT
Real programs layer effects. You want IO for the filesystem, but you also want typed errors that propagate cleanly. Monad transformers stack these capabilities like geological strata, each layer adding something to the computation below. ExceptT wraps another monad and adds the ability to short-circuit with a typed error. The result is a computation that can both perform IO and fail with a specific error type, the signature making both capabilities explicit. The layering is architectural, not just convenient. The type tells you exactly what can go wrong and what side effects might occur.
abbrev AppM := ExceptT String IO
def validatePositive (n : Int) : AppM Int := do
if n <= 0 then throw "Must be positive"
return n
def validateRange (n : Int) (lo hi : Int) : AppM Int := do
if n < lo || n > hi then throw s!"Must be between {lo} and {hi}"
return n
def processNumber : AppM Int := do
let n ← validatePositive 42
let m ← validateRange n 0 100
return m * 2
def runApp : IO Unit := do
match ← processNumber.run with
| .ok result => IO.println s!"Success: {result}"
| .error msg => IO.println s!"Failed: {msg}"
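Because the stack wraps IO, ordinary IO actions lift into AppM automatically inside do blocks, so the same computation can touch the filesystem and still fail with a typed error. A minimal sketch (the path handling is illustrative):
-- Read a file, then fail with a typed error if it is empty.
-- IO.FS.readFile lifts from IO into AppM without ceremony.
def readNonEmpty (path : String) : AppM String := do
  let contents ← IO.FS.readFile path
  if contents.isEmpty then
    throw s!"{path} is empty"
  return contents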
Monad Transformers: ReaderT
ReaderT provides access to a read-only environment, an implicit parameter that threads through your computation without cluttering every function signature. This is the dependency injection pattern done right: instead of global variables that any code can mutate, you have a typed environment that flows lexically through your program. Configuration, database connections, logger handles, the parameters that everything needs but nothing should own. The environment is read-only, which means you can reason about it. No function can secretly change the configuration and break something downstream.
structure Config where
verbose : Bool
maxRetries : Nat
timeout : Nat
deriving Repr
abbrev ConfigM := ReaderT Config IO
def getVerbose : ConfigM Bool := do
let cfg ← read
return cfg.verbose
def logIfVerbose (msg : String) : ConfigM Unit := do
if ← getVerbose then
IO.println s!"[LOG] {msg}"
def runWithConfig : IO Unit := do
let config : Config := { verbose := true, maxRetries := 3, timeout := 5000 }
(logIfVerbose "Starting process").run config
Running External Processes
Lean can spawn external processes and capture their output, bridging the gap between the typed world inside your program and the untyped chaos of shell commands. This is where you call out to git, invoke compilers, run linters, orchestrate builds. The interface is necessarily stringly-typed at the boundary, but you can wrap it in whatever validation your domain requires. Just remember that every subprocess is a portal to undefined behavior: the external program can do anything, and your types cannot save you from what happens on the other side.
def runCommand (cmd : String) (args : List String) : IO String := do
let output ← IO.Process.output {
cmd := cmd
args := args.toArray
}
return output.stdout
def shellExample : IO Unit := do
let result ← runCommand "echo" ["Hello", "World"]
IO.println result
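The output structure also carries the exit code and stderr, which are worth checking before trusting stdout. A defensive wrapper might look like this (a sketch, not a prescribed interface):
def runChecked (cmd : String) (args : List String) : IO String := do
  let out ← IO.Process.output { cmd := cmd, args := args.toArray }
  if out.exitCode != 0 then
    throw <| IO.userError s!"{cmd} exited with code {out.exitCode}: {out.stderr}"
  return out.stdout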
Environment Variables
Environment variables are the original configuration mechanism, a key-value store that predates databases by decades. They are inherited from parent processes, invisible in your source code, and different on every machine. This is both their power and their peril. PATH, HOME, USER, the secret sauce that makes your program behave differently in development and production. Access them through IO because reading them is an effect: the same code will behave differently depending on what the shell exported before launch.
def getEnvVar (name : String) : IO (Option String) := do
IO.getEnv name
def printPath : IO Unit := do
match ← getEnvVar "PATH" with
| some path => IO.println s!"PATH: {path}"
| none => IO.println "PATH not set"
def getCwd' : IO System.FilePath := do
IO.currentDir
The Discipline of Effects
The history of programming is a history of discovering what we should not have been allowed to do. Unrestricted goto gave us spaghetti code; we constrained control flow with structured programming. Unrestricted memory access gave us buffer overflows; we constrained pointers with ownership and garbage collection. Unrestricted side effects gave us programs impossible to test, impossible to parallelize, impossible to reason about. We are still learning to constrain effects.
Every constraint is a gift. When the type system refuses to let you perform IO in a pure function, it is saving you from bugs that would surface only in production, only under load, only on the third Tuesday of months with an R in them. The constraint is leverage.
The IO monad is a template for how to think about the boundary between your program and the world. The world is messy, stateful, concurrent, and hostile. Your program can be clean, pure, sequential, and predictable. The boundary between them should be narrow, explicit, and scrutinized. That is engineering discipline for systems that must work when no one is watching.
The Second Arc
This concludes the programming arc. You can now write typed functions, define data structures, use type classes for abstraction, sequence effects with monads, and interact with the operating system. You have written FizzBuzz, parsed command-line arguments, read files, handled errors, and spawned processes. Congratulations, you are a Lean programmer.
Time to become a Lean prover.
Lean is both a programming language and a theorem prover. The same type system that checks your functions can verify mathematical claims. The same compiler that rejects type errors can reject invalid proofs. Everything you have learned carries forward, because in Lean, proofs are programs and propositions are types. This is not a metaphor. It is a mathematical theorem called the Curry-Howard correspondence, and it means the skills you just built are directly applicable to theorem proving. You have already seen hints of this: Fin n bundles a number with a proof it is in bounds; subtypes attach predicates to values. These were not special cases but instances of the general pattern.
Consider what you already know. You know that every expression has a type, and the type checker verifies that expressions have the types you claim. You know how to construct values: literals, function applications, pattern matching, recursion. You know that the compiler rejects programs that do not type-check. These facts carry over unchanged. A proposition is just a type. A proof is just a value of that type. When you prove \(P \to Q\), you are constructing a function from P to Q. When you prove \(P \land Q\), you are constructing a pair. When you prove \(\forall n, P(n)\), you are constructing a function that takes any n and returns a proof of P n. The vocabulary changes, but the mechanics are the same.
The second arc introduces new tools for constructing these values. Tactics are commands that manipulate an incomplete proof, filling in pieces step by step. Where term-mode proof requires you to write the entire proof term at once, tactic-mode lets you work interactively: state a goal, apply a tactic, see what remains, repeat until nothing remains. The Infoview panel that showed you types and values now shows you goals and hypotheses. The tight feedback loop remains.
You will also encounter new concepts. Induction is recursion wearing a different hat. Case analysis is pattern matching. Proof by contradiction is constructing a function from \(\neg P\) to False. The jargon can be intimidating, but the underlying structures are ones you already understand. When someone says “we proceed by induction on n,” they mean “we define the proof recursively, with a base case for zero and a step case that assumes the result for n and proves it for n + 1.” You have been writing recursive functions all along.
What changes is the goal. In Arc I, you wrote programs to compute outputs from inputs. The type system ensured your functions were well-formed. In Arc II, you write proofs to establish that propositions are true. The type system ensures your proofs are valid. A proof is a program whose return type happens to be a logical statement rather than a data structure. The compiler verifying your proof is the same compiler that verified your FizzBuzz implementation. It just happens to be checking a different claim.
This is the strange magic of dependent types. The wall between computation and logic dissolves. You can write a sorting function and prove it sorts. You can implement a parser and prove it handles all inputs. You can design a protocol and prove it never deadlocks. The same language, the same compiler, the same workflow. The boundary between “programming” and “proving” becomes a matter of which types you are constructing values for.
The articles ahead cover: the foundations of proof in Lean, type theory and the Curry-Howard correspondence in depth, dependent types and their power, strategies for approaching proofs, a comprehensive tactics reference, and applications from classic mathematics to software verification. By the end, you will be proving theorems about the programs you write, and understanding why that matters.
The transition may feel abrupt. Programming and mathematics have been taught as separate subjects for so long that their unification can seem like a category error. It is not. The historical accident is that they were separated. Lean reveals them as aspects of the same thing: constructing terms of specified types. What you learned in Arc I is directly applicable to Arc II. The rest is vocabulary.
Let us begin.
Proofs
You have written functions. You have defined types. You have pattern matched, recursed, and composed. But you have not yet proved anything.
The difference matters. When you write a function, the compiler checks that types align, but it does not verify that your code does what you claim. You say this function sorts? The compiler shrugs. In theorem proving, you make claims and then justify them. The compiler verifies that your justification actually establishes the claim. You cannot bluff your way through a proof.
A bear learns to fish by watching the stream, understanding where salmon pause, developing patience for the moment when motion becomes certainty. Proving is similar. You learn to read the goal state, understand where progress stalls, develop patience for the tactic that transforms confusion into clarity.
Programming and Proving
Lean unifies programming and theorem proving through type theory. The same language that lets you define a function also lets you state and prove properties about it. Understanding how these fit together is essential before writing your first proof.
-- A function computes values - it has computational content
def factorial : Nat → Nat
| 0 => 1
| n + 1 => (n + 1) * factorial n
-- A theorem proves properties - it has no computational content at runtime
theorem factorial_pos : ∀ n, 0 < factorial n := by
intro n
induction n with
| zero => simp [factorial]
| succ n ih => simp [factorial]; omega
The factorial function computes values. It has computational content because it produces output from input. At runtime, it runs and returns numbers.
The factorial_pos theorem proves that factorial always returns a positive number. This proof convinces the type checker that the property holds, but it does not compute anything useful at runtime. The proof exists only to satisfy Lean’s verification. Once the compiler confirms the proof is valid, the proof term itself can be discarded. Proofs are checked at compile time and deleted before the program runs.
The proof uses omega, a decision procedure for linear arithmetic that we cover later in this chapter. For now, just note that it automatically handles numeric inequalities.
The distinction between def and theorem reflects this. Both define named values, but theorem marks its body as opaque: Lean will never unfold it during type checking. This prevents proofs from slowing down type checking when they appear in types (since proofs are erased before runtime, they cannot affect execution speed). A def can be unfolded and computed with; a theorem cannot. If you need a lemma that Lean should simplify through, use def or mark the theorem with @[simp].
What about proofs that appear as function arguments?
-- Proofs as function arguments: checked at compile time, erased at runtime
def safeDiv (n : Nat) (d : Nat) (h : d ≠ 0) : Nat := n / d
-- The proof argument h vanishes in the compiled code
#eval safeDiv 10 3 (by decide) -- 3
The proof h ensures at compile time that you cannot call safeDiv with a zero divisor. But at runtime, h vanishes. The compiled code receives only n and d. This is the power of Lean’s type system: proofs enforce invariants during development, then disappear from the final executable.
Notation
Before we write our first proof, we need a shared language. The notation below bridges three worlds: the mathematical symbols you find in logic textbooks, the inference rules used in programming language theory (as in Pierce’s Types and Programming Languages and Harper’s Practical Foundations for Programming Languages), and the Lean syntax you will write. Learning to read all three simultaneously is the key to fluency.
| Symbol | Name | Meaning |
|---|---|---|
| $\vdash$ | turnstile | “proves” or “entails” |
| $\Gamma$ | Gamma | the context (hypotheses we can use) |
| $\to$ | arrow | implication or function type |
| $\forall$ | for all | universal quantification |
| $\exists$ | exists | existential quantification |
| $\land$ | and | conjunction |
| $\lor$ | or | disjunction |
| $\top$ | top | truth (trivially provable) |
| $\bot$ | bottom | falsehood (unprovable) |
A judgment $\Gamma \vdash P$ reads “from context $\Gamma$, we can prove $P$.” An inference rule shows how to derive new judgments from existing ones:
\[ \frac{\Gamma \vdash P \quad \Gamma \vdash Q}{\Gamma \vdash P \land Q} \text{(∧-intro)} \]
This rule says: if you can prove $P$ and you can prove $Q$, then you can prove $P \land Q$. The premises sit above the line; the conclusion below. The name on the right identifies the rule. Every tactic you learn corresponds to one or more such rules. The tactic is the mechanism; the rule is the justification.
Each logical connective and type former comes with two kinds of rules. Introduction rules tell you how to construct a proof or value: to prove P ∧ Q, prove both P and Q. Elimination rules tell you how to use a proof or value: from P ∧ Q, you can extract P or Q. This pattern is universal. For implication, introduction is fun h => ... (assume the premise), elimination is function application (use the implication). For existence, introduction provides a witness, elimination uses the witness. Once you internalize this pattern, you can work with any connective by asking: “How do I build one?” and “How do I use one?”
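In Lean terms, the introduction and elimination rules for conjunction and implication look like this:
-- Introduction: build a proof of P ∧ Q from a proof of each part
example (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q := And.intro hp hq
-- Elimination: take the parts back out
example (P Q : Prop) (h : P ∧ Q) : P := h.left
example (P Q : Prop) (h : P ∧ Q) : Q := h.right
-- Implication: introduction is a lambda, elimination is application
example (P Q : Prop) (hpq : P → Q) (hp : P) : Q := hpq hp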
Tactics as Proof-State Transformers
You may have repressed the trauma of high school algebra, but the core idea was sound: you start with $2x + 5 = 11$ and apply operations until you reach $x = 3$. Subtract 5, divide by 2, each step transforming the equation into something simpler. The tedium was doing it by hand, error-prone and joyless. But the method itself, symbolic manipulation through mechanical transformation, turns out to be extraordinarily powerful when the machine handles the bookkeeping.
Tactics work the same way. You start with a goal (what you want to prove) and a context (what you already know). Each tactic transforms the goal into simpler subgoals. You keep applying tactics until no goals remain. The proof is the sequence of transformations, not a single flash of insight.
Think of it as a game. Your current position is the proof state: the facts you hold and the destination you seek. Each tactic is a legal move that changes your position. Some moves split one goal into two (like constructor creating two subgoals). Some moves close a goal entirely (like rfl finishing with a checkmate). You win when the board is empty.
Formally, a proof state is a judgment $\Gamma \vdash G$: context $\Gamma$, goal $G$. A tactic transforms one proof state into zero or more new proof states. When no goals remain, the proof is complete. This table is your Rosetta Stone:
| Tactic | Before | After | Rule |
|---|---|---|---|
| intro h | $\Gamma \vdash P \to Q$ | $\Gamma, h:P \vdash Q$ | $\to$-intro |
| apply f | $\Gamma \vdash Q$ | $\Gamma \vdash P$ | $\to$-elim (given $f : P \to Q$) |
| exact h | $\Gamma, h:P \vdash P$ | $\square$ | assumption |
| rfl | $\Gamma \vdash t = t$ | $\square$ | refl |
| constructor | $\Gamma \vdash P \land Q$ | $\Gamma \vdash P$, $\Gamma \vdash Q$ | $\land$-intro |
| left | $\Gamma \vdash P \lor Q$ | $\Gamma \vdash P$ | $\lor$-intro₁ |
| right | $\Gamma \vdash P \lor Q$ | $\Gamma \vdash Q$ | $\lor$-intro₂ |
| cases h | $\Gamma, h:P \lor Q \vdash R$ | $\Gamma, h:P \vdash R$, $\Gamma, h:Q \vdash R$ | $\lor$-elim |
| induction n | $\Gamma \vdash \forall n,\ P(n)$ | $\Gamma \vdash P(0)$, $\Gamma, ih:P(k) \vdash P(k{+}1)$ | Nat-ind |
| rw [h] | $\Gamma, h:a=b \vdash P[a]$ | $\Gamma, h:a=b \vdash P[b]$ | subst |
| simp | $\Gamma \vdash G$ | $\Gamma \vdash G'$ | rewrite* |
| contradiction | $\Gamma, h:\bot \vdash P$ | $\square$ | $\bot$-elim |
The symbol $\square$ marks a completed goal. Multiple entries in the “After” column mean the tactic created that many subgoals. Read left to right: you have the state on the left, you apply the tactic, you must now prove everything on the right. This is the algebra of proof. Each tactic is a function from proof states to proof states, and a complete proof is a composition that maps your theorem to $\square$.
Reading the notation: In expressions like $\Gamma, h:a=b \vdash P[a]$, the comma separates hypotheses (the “extended context”), the colon separates a hypothesis name from its type, and the turnstile $\vdash$ separates what you have from what you must prove. Lean’s InfoView displays this vertically, one hypothesis per line:
h : a = b
⊢ P[a]
The horizontal notation packs the same information into table cells. Once you can read one, you can read the other.
If the table above looks like both logic and programming, that is not a coincidence.
Proving vs Programming
The surprising insight is that proving and programming are the same activity viewed differently. A proof is a program. A theorem is a type. When you prove $P \to Q$, you are writing a function that transforms evidence for $P$ into evidence for $Q$. This correspondence, the Curry-Howard isomorphism, means that logic and computation are two views of the same underlying structure:
| Logic | Programming |
|---|---|
| proposition | type |
| proof | program |
| $P \to Q$ | function from P to Q |
| $P \land Q$ | pair (P, Q) |
| $P \lor Q$ | either P or Q |
| $\top$ | unit type |
| $\bot$ | empty type |
Every function you have written so far was secretly a proof. Every proof you write from now on is secretly a program. Two cultures, mathematicians and programmers, spoke the same language for decades without knowing it.
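You can see the correspondence in code. Swapping the components of a pair and proving that conjunction commutes have the same shape:
-- A program: swap the components of a pair
def swapPair {α β : Type} : α × β → β × α :=
  fun (a, b) => (b, a)
-- A proof: conjunction commutes. Same structure, different universe.
theorem and_comm' (P Q : Prop) : P ∧ Q → Q ∧ P :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩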
What You Already Know
The concepts from Arc I are not prerequisites for Arc II. They are the same concepts in different clothing. If you understood programming in Lean, you already understand proving. The vocabulary changes; the structures do not.
| Arc I (Programming) | Arc II (Proving) | Why They Match |
|---|---|---|
| Pattern matching on Nat constructors | The cases tactic on natural numbers | Both examine which constructor built the value |
| Recursive function with base case | Proof by induction with base case | Both reduce a problem on \(n+1\) to the same problem on \(n\) |
| Function type signature α → β | Theorem statement P → Q | Both declare what goes in and what comes out |
| Function body (the implementation) | Proof term (the justification) | Both witness that the signature/statement is inhabited |
| Returning a value of type α | Providing a term of type P (a proof of P) | Both construct an inhabitant of the required type |
| match x with \| none => ... \| some a => ... | cases h with \| none => ... \| some a => ... | Both split on constructors and handle each possibility |
| Termination checking on recursive calls | Well-founded induction on decreasing measures | Both ensure the process ends |
| Type error: expected β, got γ | Proof error: expected Q, got R | Both mean you produced the wrong thing |
When you wrote match n with | 0 => ... | n + 1 => ... in the Control Flow article, you were doing case analysis. The cases n tactic does the same thing to a proof goal. When you wrote a recursive function that called itself on n to compute a result for n + 1, you were doing induction. The induction n tactic generates exactly that structure: a base case and a step that assumes the result for n.
The syntax differs because tactics operate on proof states rather than values directly. But the reasoning is identical. If you can write a recursive function over natural numbers, you can prove a theorem about natural numbers. You have been training for this.
Your First Proof
Let us prove something undeniably true: one plus one equals two.
-- Your very first proof: 1 + 1 = 2
theorem one_plus_one : 1 + 1 = 2 := by
rfl
-- Without tactics, just a direct proof term
theorem one_plus_one' : 1 + 1 = 2 := rfl
Whitehead and Russell famously required 362 pages of Principia Mathematica before reaching this result. We have done it in three characters. This is not because we are cleverer than Russell; it is because we inherited infrastructure. The Principia was an attempt to place all of mathematics on rigorous foundations, to banish the intuition and hand-waving that had allowed paradoxes to creep into set theory. It was a heroic, doomed effort: the notation was unreadable, the proofs were uncheckable by any human in finite time, and Gödel would soon prove that the program could never fully succeed. But the ambition was right. The ambition was to make mathematics a science of proof rather than a craft of persuasion.
A century later, the ambition survives in different form. We do not write proofs in Russell’s notation; we write them in languages that machines can check. The 362 pages compress to three characters not because the mathematics got simpler but because the verification got automated. What mathematicians have been writing all along was pseudocode: informal instructions meant for human execution, full of implicit steps and assumed context, correct only if the reader filled in the gaps charitably. We are finally compiling that pseudocode.
The keyword by enters tactic mode. Instead of writing a proof term directly, you give commands that build the proof incrementally. The tactic rfl (reflexivity) says “both sides of this equation compute to the same value, so they are equal.” Lean evaluates 1 + 1, gets 2, sees that 2 = 2, and accepts the proof. No faith required. No appeals to authority. The machine checked, and the machine does not lie.
Or does it? Ken Thompson’s Reflections on Trusting Trust demonstrated that a compiler can be trojaned to insert backdoors into code it compiles, including into future versions of itself. Turtles all the way down. At some point you trust the hardware, the firmware, the operating system, the compiler that compiled your proof assistant. We choose to stop the regress somewhere, not because the regress ends but because we must act in the world despite uncertainty. This is the stoic’s bargain: do the work carefully, verify what can be verified, and accept that perfection is not on offer. The alternative is paralysis, and paralysis builds nothing.
The Goal State
When you write proofs in Lean, the editor shows you the current goal state. This is your map, your honest accounting of where you stand. Unlike tests that can pass while bugs lurk, unlike documentation that drifts from reality, the goal state cannot lie. It tells you exactly what you have (hypotheses) and exactly what you need to prove (the goal). The gap between aspiration and achievement is always visible.
-- Demonstrating how the goal state changes
theorem add_zero (n : Nat) : n + 0 = n := by
rfl
theorem zero_add (n : Nat) : 0 + n = n := by
simp
When you place your cursor after by in add_zero, you see:
n : Nat
⊢ n + 0 = n
The line n : Nat is your context: the facts you know, the tools you have. The symbol ⊢ (turnstile) separates what you have from what you need. The goal n + 0 = n is your obligation. After applying rfl, the goal disappears. No goals means the proof is complete. You have caught your fish.
Reflexivity: rfl
The rfl tactic proves goals of the form $a = a$ where both sides are definitionally equal. In inference rule notation:
\[ \frac{}{\Gamma \vdash a = a} \text{(refl)} \]
No premises above the line means the rule is an axiom: equality is reflexive, always, unconditionally. “Definitionally equal” means Lean can compute both sides to the same value without any lemmas. This is equality by computation, the most basic form of truth: run the program on both sides and see if you get the same answer.
-- rfl works when both sides compute to the same value
theorem two_times_three : 2 * 3 = 6 := by rfl
theorem list_length : [1, 2, 3].length = 3 := by rfl
theorem string_append : "hello " ++ "world" = "hello world" := by rfl
theorem bool_and : true && false = false := by rfl
def double (n : Nat) : Nat := n + n
theorem double_two : double 2 = 4 := by rfl
When rfl works, it means the equality is “obvious” to Lean’s computation engine. When it fails, you need other tactics to transform the goal into something rfl can handle.
How does definitional equality relate to other equality types? Definitional equality is the strongest: if a and b are definitionally equal, rfl proves a = b with no computation. Decidable equality (via DecidableEq and decide, discussed in Polymorphism) handles cases where equality can be computed at runtime, like 5 = 5 or "hello" = "hello". Propositional equality (a = b as a Prop) is the most general: you may need lemmas and rewriting to prove it. All three describe the same = type, but they differ in how much work is required to establish the proof.
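A quick illustration of the three levels:
-- Definitional: both sides compute to the same value, rfl closes it
example : 2 + 2 = 4 := rfl
-- Decidable: the equality test is executed and returns true
example : (17 : Nat) = 17 := by decide
-- Propositional: needs a rewrite lemma, since 0 + n does not reduce on its own
example (n : Nat) : 0 + n = n := by simp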
Triviality: trivial
The trivial tactic handles goals that are straightforwardly true. It combines several simple tactics and works well for basic logical facts.
theorem true_is_true : True := by
trivial
theorem one_le_two : 1 ≤ 2 := by
trivial
theorem and_true : True ∧ True := by
trivial
Simplification: simp
The simp tactic is your workhorse. It applies a database of hundreds of rewrite rules, accumulated over years by the mathlib community, to simplify the goal. This is collective knowledge made executable: every time someone proved that x + 0 = x or list.reverse.reverse = list, they added to the arsenal that simp deploys on your behalf.
theorem add_zero_simp (n : Nat) : n + 0 = n := by
simp
theorem zero_add_simp (n : Nat) : 0 + n = n := by
simp
theorem silly_arithmetic : (1 + 0) * (2 + 0) + 0 = 2 := by
simp
theorem list_append_nil {α : Type*} (xs : List α) : xs ++ [] = xs := by
simp
theorem use_hypothesis (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
simp [h]
When simp alone does not suffice, you can give it additional lemmas: simp [lemma1, lemma2]. You can also tell it to use hypotheses from your context: simp [h].
Tip
When stuck, try simp first. It solves a surprising number of goals. If it does not solve the goal completely, look at what remains.
Using Hypotheses: exact
The simplest way to close a goal is to provide exactly what is needed. If your goal is P and you have a hypothesis h : P, then exact h finishes the proof.
-- exact: provide exactly the term needed to close the goal
theorem exact_demo (P : Prop) (h : P) : P := by
exact h
-- When the goal is P and you have h : P, exact h closes it
theorem exact_with_function (P Q : Prop) (h : P) (f : P → Q) : Q := by
exact f h
The exact tactic says “this term has exactly the type we need.” It works with any expression, not just hypothesis names. If f : P → Q and h : P, then exact f h proves Q.
Introducing Assumptions: intro
When your goal is an implication $P \to Q$, you assume $P$ and prove $Q$. This is the introduction rule for implication:
\[ \frac{\Gamma, P \vdash Q}{\Gamma \vdash P \to Q} \text{(→-intro)} \]
Read this bottom-up: to prove $P \to Q$ (below the line), it suffices to prove $Q$ while assuming $P$ (above the line). The intro tactic performs this transformation, moving the antecedent from goal to hypothesis.
-- intro: move the antecedent of an implication into the context
theorem imp_self (P : Prop) : P → P := by
intro hp -- Now hp : P is in context, goal is P
exact hp -- Goal matches hypothesis exactly
After intro hp, the goal changes from P → P to just P, and you gain hypothesis hp : P. Multiple assumptions can be introduced at once: intro h1 h2 h3.
The same tactic handles universal quantifiers. When your goal is ∀ n, P n, intro n introduces n as a variable in scope:
-- intro also handles universal quantifiers
theorem forall_self (P : Nat → Prop) : (∀ n, P n) → (∀ n, P n) := by
intro h n -- h : ∀ n, P n and n : Nat now in context
exact h n -- Apply h to n to get P n
Applying Lemmas: apply
The apply tactic performs backward reasoning. When your goal is $Q$ and you have $h : P \to Q$, applying $h$ transforms the goal to $P$. You reason backward from what you want to what you need. This is the elimination rule for implication:
\[ \frac{\Gamma \vdash P}{\Gamma, h : P \to Q \vdash Q} \text{(→-elim with apply)} \]
Read this as: if you can prove $P$, and you have $h : P \to Q$, then you can prove $Q$. The apply tactic inverts this: to prove $Q$, it suffices to prove $P$.
-- apply: use a lemma whose conclusion matches the goal (backward reasoning)
theorem imp_trans (P Q R : Prop) : (P → Q) → (Q → R) → P → R := by
intro hpq hqr hp -- hpq : P → Q, hqr : Q → R, hp : P
apply hqr -- Goal changes from R to Q (backward step)
apply hpq -- Goal changes from Q to P (backward step)
exact hp -- Goal P matches hypothesis hp
In imp_trans, we have three implications chained together: $(P \to Q) \to (Q \to R) \to P \to R$. This reads as “if P implies Q, and Q implies R, then P implies R.” The arrows associate to the right, so it parses as $(P \to Q) \to ((Q \to R) \to (P \to R))$. After introducing all hypotheses, the goal is R. We apply hqr : Q → R to reduce the goal to Q, then apply hpq : P → Q to reduce it to P, then exact hp closes it.
Intermediate Steps: have
Sometimes you want to prove a helper fact before using it. The have tactic introduces a new hypothesis with its own proof. This is how knowledge accumulates: you establish a stepping stone, name it, and build on it.
-- have: introduce intermediate results
theorem have_demo (a b c d : Nat) (h1 : a = b) (h2 : b = c) (h3 : c = d) : a = d := by
have ab_eq_c : a = c := by rw [h1, h2] -- Combine first two equalities
rw [ab_eq_c, h3] -- Use the intermediate result
theorem sum_square (n : Nat) : (n + 1) * (n + 1) = n * n + 2 * n + 1 := by
have expand : (n + 1) * (n + 1) = n * n + n + n + 1 := by ring
have simplify : n + n = 2 * n := by ring
omega
The pattern have name : type := proof adds name : type to your context.
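The type annotation can be dropped when Lean can infer it from the proof term. A small sketch:
-- have without an explicit type: Lean infers b = a from Eq.symm
example (a b : Nat) (h : a = b) : b = a := by
  have h' := h.symm
  exact h'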
Case Analysis: cases
When you have a value of an inductive type, cases splits the proof into one case per constructor. This is exhaustive reasoning: you consider every possible form the value could take, and you prove your claim holds in each. The compiler ensures you miss nothing. This is how careful decisions should be made: enumerate the possibilities, handle each one, leave no branch unexamined.
theorem bool_cases (b : Bool) : b = true ∨ b = false := by
cases b with
| true => left; rfl
| false => right; rfl
theorem nat_zero_or_succ (n : Nat) : n = 0 ∨ n ≥ 1 := by
cases n with
| zero => left; rfl
| succ m => right; simp
theorem option_destruct (o : Option Nat) : o = none ∨ ∃ n, o = some n := by
cases o with
| none => left; rfl
| some n => right; exact ⟨n, rfl⟩
For Bool, there are two cases: true and false. For Nat, there are two cases: zero and succ m. For Option, there are none and some n.
The syntax ⟨n, rfl⟩ in the last example is anonymous constructor notation. The goal ∃ n, o = some n requires a witness and a proof. The angle brackets ⟨...⟩ construct the existential: n is the witness, and rfl proves o = some n (since in this branch, o is definitionally some n). This is equivalent to writing Exists.intro n rfl.
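A quick sketch showing both spellings:
-- Anonymous constructor notation versus the explicit constructor
example : ∃ n : Nat, n + 1 = 3 := ⟨2, rfl⟩
example : ∃ n : Nat, n + 1 = 3 := Exists.intro 2 rfl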
Induction
For properties of natural numbers, mathematical induction is the fundamental principle:
\[ \frac{\Gamma \vdash P(0) \quad \Gamma, P(n) \vdash P(n+1)}{\Gamma \vdash \forall n,\ P(n)} \text{(Nat-ind)} \]
Prove the base case $P(0)$. Then prove the inductive step: assuming $P(n)$, show $P(n+1)$. From these two finite proofs, you derive a statement about infinitely many numbers. The induction tactic generates both proof obligations automatically. The principle dates to Pascal and Fermat, but the mechanization is new.
theorem sum_twice (n : Nat) : n + n = 2 * n := by
induction n with
| zero => rfl
| succ n ih => omega
theorem length_append {α : Type*} (xs ys : List α) :
(xs ++ ys).length = xs.length + ys.length := by
induction xs with
| nil => simp
| cons x xs ih => simp [ih]; ring
In the succ case, you get an induction hypothesis ih that assumes the property holds for n, and you must prove it holds for n + 1.
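In sum_twice the succ case hands everything to omega, and length_append folds ih into simp. Here is a sketch where the induction hypothesis is used by hand, relying on the core lemma Nat.add_succ:
-- 0 + n = n is not definitional, so the succ case genuinely needs ih
theorem zero_add_by_induction (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]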
Arithmetic: omega
For goals involving linear arithmetic over natural numbers or integers, omega is powerful. It implements a decision procedure for Presburger arithmetic, a fragment of number theory that is provably decidable. Within its domain, omega does not search or guess; it decides.
theorem omega_simple (n : Nat) (h : n < 10) : n < 100 := by
omega
theorem omega_transitive (a b c : Int) (h1 : a < b) (h2 : b < c) : a < c := by
omega
theorem omega_sum (x y : Nat) (h : x + y = 10) : x ≤ 10 := by
omega
If your goal involves only addition, subtraction, multiplication by constants, and comparisons, try omega.
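It is worth knowing that omega also models truncated Nat subtraction correctly, which surprises newcomers (3 - 5 = 0 in Nat). Two small examples:
example (n : Nat) : n - n = 0 := by
  omega
example (a b : Nat) (h : b ≤ a) : a - b + b = a := by
  omega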
Decision Procedures: decide
For decidable propositions, decide simply computes the answer. Is 7 less than 10? Run the comparison. Is this list empty? Check. Some questions have algorithms that answer them definitively, and decide invokes those algorithms. When it works, there is nothing to prove; the computation is the proof.
theorem three_lt_five : (3 : Nat) < 5 := by
decide
theorem bool_compute : (true && false) = false := by
decide
theorem list_membership : 3 ∈ [1, 2, 3, 4, 5] := by
decide
theorem fin_in_bounds : (2 : Fin 5).val < 5 := by
decide
Putting It Together
Real proofs combine multiple tactics. You introduce assumptions, simplify, split cases, apply lemmas, and close with computation. The art is knowing which tool fits which moment. With practice, patterns emerge: implications call for intro, equalities for rw or simp, inductive types for cases or induction. The goal state guides you.
theorem worked_example (n : Nat) : n + 0 = 0 + n := by
simp
theorem worked_example2 (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
rw [h]
theorem combined_proof (n : Nat) (h : n > 0) : n - 1 + 1 = n := by
omega
theorem list_nonempty (xs : List Nat) (h : xs ≠ []) : xs.length > 0 := by
cases xs with
| nil => contradiction
| cons x xs' => simp
The Tactics You Need
| Tactic | Purpose |
|---|---|
| rfl | Prove a = a when both sides compute to the same value |
| trivial | Prove obviously true goals |
| simp | Simplify using rewrite rules |
| intro | Assume hypotheses from implications and universals |
| apply | Use a lemma whose conclusion matches the goal |
| exact | Provide exactly the term needed |
| have | Introduce intermediate results |
| cases | Split on constructors of inductive types |
| induction | Prove by induction on recursive types |
| omega | Solve linear arithmetic |
| decide | Compute decidable propositions |
| rw | Rewrite using an equality |
These twelve tactics will carry you through most of what follows.
Exercises
The best way to learn tactics is to use them. These exercises progress from straightforward applications of single tactics to combinations that require reading the goal state carefully.
-- Exercise 1: Use rfl to prove this computation
theorem ex_rfl : 3 * 4 = 12 := by
rfl
-- Exercise 2: Use simp to simplify this expression
theorem ex_simp (n : Nat) : n * 1 + 0 = n := by
simp
-- Exercise 3: Use intro and exact
theorem ex_intro_exact (P Q : Prop) (h : P) (hpq : P → Q) : Q := by
exact hpq h
-- Exercise 4: Use cases to prove this about booleans
theorem ex_bool_not_not (b : Bool) : !!b = b := by
cases b <;> rfl
-- Exercise 5: Use induction to prove addition is commutative
theorem ex_add_comm (n m : Nat) : n + m = m + n := by
induction n with
| zero => simp
| succ n ih => omega
-- Exercise 6: Use omega to prove this inequality
theorem ex_omega (x y : Nat) (h1 : x ≤ 5) (h2 : y ≤ 3) : x + y ≤ 8 := by
omega
-- Exercise 7: Combine multiple tactics
theorem ex_combined (xs : List Nat) : ([] ++ xs).length = xs.length := by
simp
-- Exercise 8: Prove implication transitivity
theorem ex_imp_chain (A B C D : Prop) : (A → B) → (B → C) → (C → D) → A → D := by
intro hab hbc hcd ha
exact hcd (hbc (hab ha))
-- Exercise 9: Use cases on a natural number
theorem ex_nat_lt (n : Nat) : n = 0 ∨ 0 < n := by
cases n with
| zero => left; rfl
| succ m => right; omega
-- Exercise 10: Prove a property about list reversal
theorem ex_reverse_nil : ([] : List Nat).reverse = [] := by
rfl
If you get stuck, ask yourself: what is the shape of my goal? What tactic handles that shape? What hypotheses do I have available? The Infoview is your guide.
The Liar’s Trap
Try to prove something false:
-- Try to prove something false. Every tactic will fail.
theorem liar : 0 = 1 := by
sorry -- Try: rfl, simp, omega, decide. Nothing works.
-- The goal state shows: ⊢ 0 = 1
-- This goal is unprovable because it is false.
Every tactic fails. rfl cannot make 0 equal 1. simp finds nothing to simplify. omega knows arithmetic and refuses. decide computes the answer and it is false. The goal state sits there, immovable: ⊢ 0 = 1. You can stare at it, curse at it, try increasingly desperate combinations. Nothing works because nothing can work. The machine will not let you prove a falsehood.
This is the point. The compiler is not your collaborator; it is your adversary. It checks every step and rejects handwaving. When someone tells you their code is correct, you can ask: does it typecheck? When someone tells you their proof is valid, you can ask: did the machine accept it? The answers are not always the same, but when they are, you know something real.
Axioms and Escape Hatches
The axiom declaration asserts something without proof. It is the escape hatch from the proof system: you declare that something is true and Lean believes you. This is extremely dangerous. If you assert something false, you can prove anything at all, including False itself. The system becomes unsound.
-- axiom asserts something without proof
-- WARNING: Incorrect axioms make the entire system inconsistent!
axiom magicNumber : Nat
axiom magicNumber_positive : magicNumber > 0
-- Use axioms only for:
-- 1. Foundational assumptions (excluded middle, choice)
-- 2. FFI bindings where proofs are impossible
-- 3. Temporary placeholders during development (prefer sorry)
Warning
Axioms should be used only in narrow circumstances: foundational assumptions like the law of excluded middle or the axiom of choice (which Mathlib already provides), FFI bindings where proofs are impossible because the implementation is external, or as temporary placeholders during development (though sorry is preferred since it generates a warning). Before adding a custom axiom, ask whether you actually need it. Usually the answer is no.
Lean’s kernel accepts axioms unconditionally. The #print axioms command shows which axioms a theorem depends on, which is useful for verifying that your proofs rely only on the standard foundational axioms you expect.
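For instance (a sketch; the exact list and its order may vary between versions):
-- Inspect which axioms a proof depends on
theorem uses_classical (P : Prop) : P ∨ ¬P := Classical.em P
#print axioms uses_classical
-- 'uses_classical' depends on axioms: [propext, Classical.choice, Quot.sound]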
The opaque declaration hides a definition’s implementation from the type checker. Unlike axiom, an opaque definition must be provided, but Lean treats it as a black box during type checking. This is useful when you want to abstract implementation details while still having a concrete definition.
-- opaque hides the implementation (never unfolds)
opaque secretKey : Nat
-- The type checker cannot see any value for secretKey
-- This is useful for abstraction barriers
De Morgan’s Little Theorem
Augustus De Morgan formalized the laws that bear his name in the 1850s: the negation of a conjunction is the disjunction of negations, and vice versa. Every programmer knows these laws intuitively from boolean expressions. Let us prove one.
-- De Morgan's Law: ¬(P ∧ Q) → (¬P ∨ ¬Q)
theorem demorgan (P Q : Prop) (h : ¬(P ∧ Q)) : ¬P ∨ ¬Q := by
by_cases hp : P
· -- Case: P is true
right
intro hq
apply h
constructor
· exact hp
· exact hq
· -- Case: P is false
left
exact hp
The proof proceeds by case analysis. We have h : ¬(P ∧ Q), a proof that P ∧ Q is false. We must show ¬P ∨ ¬Q. The by_cases tactic splits on whether P holds:
- If P is true (the by_cases tactic names this hypothesis hp : P), we go right and prove ¬Q. Why? If Q were true, then P ∧ Q would be true, contradicting h. So ¬Q.
- If P is false (in this branch hp : ¬P), we go left and prove ¬P directly. We have it: hp.
Each branch uses tactics from this article: intro, apply, exact, left, right, constructor. The contradiction tactic spots when hypotheses conflict. Read the proof slowly, watch the goal state at each step, and trace how the logical structure maps to the tactic sequence. This is the texture of real mathematics: case splits, contradictions, and the steady narrowing of possibilities until only truth remains.
De Morgan died in 1871. His laws persist in every boolean expression, every logic gate, every conditional branch. If you want to test your understanding, try proving the other direction: from ¬P ∨ ¬Q to ¬(P ∧ Q). It is easier, and it needs no by_cases or classical reasoning at all, which tells you something about the asymmetry between the two directions.
The Theory Beneath
You can now prove things. The proofs have been simple, but the mental model is in place. You understand goals, hypotheses, and the tactic dance that connects them. Next we introduce type theory and dependent types, the language for stating claims worth proving.
Type Theory
Humans classify. We sort animals into species, books into genres, people into roles, and programmers into those who have mass-assigned any to silence the compiler and those who are lying about it. Classification is how finite minds manage infinite variety. Types are classification for computation: every value belongs to a type, and the type determines what operations make sense. You can add numbers but not strings. You can take the length of a list but not the length of an integer. The type system enforces these distinctions before your program runs, which sounds obvious until you remember that most of the world’s software runs on languages where the type system’s considered opinion is “looks plausible” right up until production catches fire.
This seems pedestrian until you push it. What if types could say not just “this is a list” but “this is a list of exactly five elements”? What if they could say not just “this function returns an integer” but “this function returns a positive integer”? What if the type of a function could express its complete specification, so that any implementation with that type is correct by construction?
Dependent type theory answers yes to all of these. It is the most expressive type system in common use, and it blurs the line between programming and mathematics. A type becomes a proposition. A program becomes a proof. The compiler becomes a theorem checker. This is not metaphor; it is the Curry-Howard correspondence that we met in the previous article, now unleashed to its full power.
(The correspondence runs deeper than logic and computation. Category theory provides a third vertex: types correspond to objects, functions to morphisms, and the equations governing programs to commutative diagrams. This three-way relationship, sometimes called computational trinitarianism or the Curry-Howard-Lambek correspondence, means that insights from any vertex illuminate the others. A categorical construction suggests a type former; a type-theoretic proof technique suggests a logical inference rule; a logical connective suggests a categorical limit. The triangle constitutes a precise mathematical isomorphism, providing a conceptual map for navigating modern type theory.)
| Logic | Type Theory | Category Theory |
|---|---|---|
| Proposition | Type | Object |
| Proof | Term / Program | Morphism |
| Implication $P \to Q$ | Function type A → B | Exponential object $B^A$ |
| Conjunction $P \land Q$ | Product type A × B | Product $A \times B$ |
| Disjunction $P \lor Q$ | Sum type A ⊕ B | Coproduct $A + B$ |
| True $\top$ | Unit type Unit | Terminal object $1$ |
| False $\bot$ | Empty type Empty | Initial object $0$ |
| Universal $\forall x. P(x)$ | Dependent product (x : A) → B x | Right adjoint to pullback |
| Existential $\exists x. P(x)$ | Dependent sum (x : A) × B x | Left adjoint to pullback |
The Ladder of Expressiveness
Type systems form a ladder. Each rung lets you say more.
Simple types (C, Java): Values have types. int, string, bool. You cannot add a string to an integer. This catches typos and category errors, but nothing deeper.
Polymorphic types (Haskell, OCaml): Types can be parameterized. List α works for any element type. You write one reverse function that works on lists of integers, strings, or custom objects. The type ∀ α. List α → List α says “for any type α, give me a list of α and I’ll return a list of α.”
Dependent types (Lean, Coq, Agda): Types can depend on values. Vector α n is a list of exactly n elements. The number n is a value that appears in the type. Now the type system can express array bounds, matrix dimensions, protocol states, and any property you can state precisely.
The jump from polymorphic to dependent types is where things get interesting. Consider matrix multiplication. Two matrices can only be multiplied if the columns of the first equal the rows of the second. With dependent types:
Matrix m n → Matrix n p → Matrix m p
The shared n enforces compatibility at compile time. Multiply a \(3 \times 4\) by a \(5 \times 2\)? Type error. The bug is caught before any code runs. Your linear algebra homework now has compile errors, which is somehow both better and worse.
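Here is a minimal sketch of the idea in Lean. Matrix' and mulSketch are illustrative names (not Mathlib's Matrix API), and List.finRange is assumed to be available from Batteries/Mathlib:
-- Dimensions live in the type, so only compatible shapes elaborate
abbrev Matrix' (m n : Nat) : Type := Fin m → Fin n → Int
def mulSketch {m n p : Nat} (A : Matrix' m n) (B : Matrix' n p) : Matrix' m p :=
  fun i k => (List.finRange n).foldl (fun acc j => acc + A i j * B j k) 0
-- Applying mulSketch to a 3×4 and a 5×2 matrix is rejected by the elaborator:
-- the shared inner dimension cannot be both 4 and 5.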
The lambda cube formalizes these distinctions. Starting from the simply typed lambda calculus at the origin, each axis adds a new kind of abstraction:
- x-axis (Polymorphism): Terms can abstract over types. Moving along this axis gives you generic functions like id : ∀ α, α → α that work uniformly across all types.
- y-axis (Type Operators): Types can abstract over types. This axis adds type-level functions like List or Map k that take types as arguments and produce new types.
- z-axis (Dependent Types): Types can depend on terms. Moving along this axis allows types like Vector α n where the type itself contains a value.
The eight vertices represent all possible combinations of these three features:
- λ→ (STLC): The simply typed lambda calculus, with only term-to-term functions. (C, Pascal)
- λ2 (System F): Adds polymorphism, where terms can abstract over types. (Haskell core, ML)
- λω̲: Adds type operators, where types can abstract over types. (Rarely implemented alone)
- λP (LF): Adds dependent types, where types can depend on terms. (Twelf, LF)
- λω (System Fω): Combines polymorphism and type operators. (Haskell with type families, OCaml modules)
- λP2: Combines polymorphism and dependent types. (Rare in practice)
- λPω̲: Combines dependent types and type operators. (Rare in practice)
- λC (CoC): The Calculus of Constructions, combining all three axes. (Lean, Coq, Agda)
Lean sits at λC, the corner where all three meet.
Types as First-Class Citizens
In simple type systems, types and values live in separate worlds. You cannot write a function that takes a type as an argument or returns a type as a result. The wall between them is absolute.
Dependent types tear down this wall. Types become values. You can compute with them, pass them to functions, store them in data structures. The function that constructs Vector Int n takes a number n and returns a type. This uniformity is what makes the whole system work: if types can depend on values, then types must be values.
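A small sketch of a type-returning function (Tuple is an illustrative name):
-- A function from a number to a type: the wall between values and types is gone
def Tuple (α : Type) : Nat → Type
  | 0 => Unit
  | n + 1 => α × Tuple α n
#reduce Tuple Nat 2   -- Nat × Nat × Unit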
The theoretical foundations trace through the 20th century: Church’s simply typed lambda calculus, Martin-Löf’s intuitionistic type theory that unified logic and computation, and various attempts to resolve paradoxes that plagued early formal systems. Lean implements a refinement called the Calculus of Inductive Constructions, which adds inductive types and a hierarchy of universes to keep everything consistent. Understanding why that hierarchy exists requires a detour into the history of mathematics.
The practical experience differs from conventional programming. Types become more informative but also more demanding. You must often provide proofs alongside your code, demonstrating that values satisfy required properties. The compiler becomes an adversary that checks your reasoning at every step, as we saw with tactics. When a program type-checks, you gain strong guarantees about its behavior. When it fails, the error messages guide you toward the gap between what you claimed and what you proved.
The Foundational Crisis
This section covers historical background and can be skipped without losing the thread.
By the late 19th century, mathematics faced a crisis of foundations. Mathematicians had built analysis on set theory, set theory on logic, and logic on intuition. The foundations kept shifting. Georg Cantor’s work on infinite sets produced results that seemed paradoxical. The question became urgent: could mathematics be placed on a foundation that was provably secure?
Russell’s Paradox
In 1901, Bertrand Russell sent a letter to Gottlob Frege, who had just completed his life’s work: a logical foundation for all of mathematics. Russell’s letter contained a single question. Consider the set $R$ of all sets that do not contain themselves. Does $R$ contain itself? If yes, then by definition it should not. If no, then by definition it should. Frege’s system was inconsistent. His life’s work collapsed. He wrote back: “Hardly anything more unfortunate can befall a scientific writer than to have one of the foundations of his edifice shaken after the work is finished.”
This is the danger of self-reference. A set that asks about its own membership. A sentence that asserts its own falsehood. A type that contains itself. These constructions look innocent but harbor contradictions. Mathematics needed walls to prevent them.
Hilbert’s Program
David Hilbert proposed an ambitious response. His program, articulated in the 1920s, aimed to formalize all of mathematics in a finite, complete, and consistent axiomatic system. Complete meant every true statement could be proved. Consistent meant no contradiction could be derived. The dream was a mechanical procedure that could, in principle, determine the truth of any mathematical claim. Mathematics would become a closed system, immune to further crisis.
Principia Mathematica, published by Russell and Whitehead between 1910 and 1913, was the most sustained attempt at this vision. Three volumes, nearly 2000 pages, laboriously deriving mathematics from logical axioms. The proof that $1 + 1 = 2$ appears on page 379 of the second volume. The work demonstrated that formalization was possible but also hinted at its costs. The notation was impenetrable, the proofs were tedious, and the system still required axioms whose consistency could not be established from within.
Gödel’s Incompleteness Theorems
Two decades after Principia, Kurt Gödel showed that the consistency problem was not a limitation of Russell’s system but an inescapable feature of mathematics itself. His incompleteness theorems of 1931 proved that any consistent formal system powerful enough to express arithmetic contains true statements that cannot be proved within the system. The first theorem says completeness is impossible: there will always be truths beyond the reach of your axioms. The second theorem is worse: such a system cannot prove its own consistency. The tools Hilbert wanted to use to secure mathematics are necessarily inadequate for the task. You cannot lift yourself by your own bootstraps.
What This Means for Lean
This might seem to doom the entire enterprise of formal verification. If mathematics cannot be complete, if consistency cannot be proved, what is the point of proof assistants?
The answer is that Lean is not attempting Hilbert’s program. Nobody believes Mathlib will eventually contain all mathematical truth or that its foundations can be proved consistent using only its own axioms. The goals are more modest and more practical. What Lean actually provides is mechanical verification of derivations, not philosophical certainty about foundations.
Lean’s kernel accepts a small set of axioms: the rules of the Calculus of Inductive Constructions, plus optional classical principles like the law of excluded middle and the axiom of choice. These axioms are not provably consistent from within the system. They are simply accepted, much as working mathematicians accept ZFC set theory without demanding a consistency proof that Gödel showed cannot exist. Given these axioms, is this proof valid? That question has a definite answer, and Lean provides it.
Yes, there exist true statements about natural numbers that Lean cannot prove. Yes, Lean cannot prove its own consistency. But these limitations do not prevent you from formalizing the theorems mathematicians actually care about. The prime number theorem, the fundamental theorem of calculus, the classification of finite simple groups: none of these bump against incompleteness. The unprovable statements Gödel constructs are specifically engineered to be unprovable. They are curiosities, not obstacles to mathematical practice.
You have not solved Hilbert’s problem. You have sidestepped it. The foundations rest on trust in a small kernel and a handful of axioms that the mathematical community has examined for decades without finding contradiction. This is not absolute certainty, but it is far more than hand-waving. Principia Mathematica failed because it tried to be a closed system answering every question from first principles. Mathlib succeeds because it tries to be a library: a growing collection of verified results that mathematicians can use, extend, and build upon. The goal is not to end mathematics but to record it in a form that machines can check. That turns out to be achievable, useful, and entirely compatible with Gödel’s theorems.
With the philosophical groundwork laid, we can examine how type theory actually prevents the paradoxes that plagued earlier systems.
Universe Stratification
This section covers advanced material on type universes. Feel free to skim or skip on first reading and return later when universe-related errors arise in practice.
Type theory builds walls against self-reference through stratification. Types are organized into a hierarchy of universes. In Lean, Prop sits at Sort 0, Type 0 sits at Sort 1, Type 1 sits at Sort 2, and so on. A type at level n can only mention types at levels below n. The type Type 0 itself has type Type 1, not Type 0. This breaks the self-reference that doomed Frege’s system. You cannot ask whether Type contains itself because Type is not a single thing; it is an infinite ladder, and each rung can only see the rungs below.
universe u v w
-- Universe polymorphism (explicit universe level)
def polyIdentity (α : Sort u) (a : α) : α := a
-- Universe level expressions
def maxLevel (α : Type u) (β : Type v) : Type (max u v) := α × β
-- Type 0 contains types, Type 1 contains Type 0, etc.
example : Type 0 = Sort 1 := rfl
example : Prop = Sort 0 := rfl
-- Impredicative Prop: functions into Prop stay in Prop
def propPredicate (P : Type u → Prop) : Prop := ∀ α, P α
-- Predicative Type: function types take maximum universe
def typePredicate (P : Type u → Type v) : Type (max (u+1) v) :=
∀ α, P α
When you write universe u v w in Lean, you are declaring universe level variables. The declaration lets you define functions that work at any universe level. When you write def polyIdentity (α : Sort u) (a : α) : α := a, you are defining a function that works across the entire hierarchy. The Sort u includes both Prop (when u = 0) and Type n (when u = n + 1). This universe polymorphism lets you write single definitions that work everywhere.
The .{u} syntax declares a universe parameter local to a single definition. For file-wide universe variables, use universe u v at the top level:
-- Types themselves have types, forming a hierarchy
#check (Nat : Type 0) -- Nat lives in Type 0
#check (Type 0 : Type 1) -- Type 0 lives in Type 1
#check (Type 1 : Type 2) -- and so on...
-- Universe variables let definitions work at any level
-- The .{u} syntax declares a universe parameter for this definition
def myId.{u} (α : Type u) (x : α) : α := x
-- myId works at any universe level
#check myId Nat 42 -- α = Nat (in Type 0)
#check myId (Type 0) Nat -- α = Type 0 (in Type 1)
Predicativity
Here is a rule that sounds obvious until you think about it: you cannot be in the photograph you are taking. The photographer stands outside the frame. A committee that selects its own members creates paradoxes of legitimacy. A definition that refers to a collection containing itself is suspect. This intuition, that the definer must stand apart from the defined, is called predicativity.
Imagine a monastery where knowledge is organized into concentric walls. Acolytes in the outer ring may study only texts from their own ring. Scholars who wish to reference the entire outer collection must do so from the second ring, looking inward. Those who would survey the second ring must stand in the third. And so on, each level permitted to see only what lies below. No scholar may cite a collection that includes their own work. The hierarchy prevents the serpent from eating its tail.
This is how predicative universes work. When you quantify over all types at level n, the resulting type lives at level n+1. The definition “for all types α in Type 0, the type α → α” must itself live in Type 1 because it speaks about the entirety of Type 0. You cannot make universal claims about a collection while remaining inside it. The quantification must ascend.
-- Predicative: quantifying over Type u produces Type (u+1)
-- The result must be "larger" than what we quantify over
def predicativeExample : Type 1 := ∀ (α : Type 0), α → α
-- Check the universe levels explicitly
#check (∀ (α : Type 0), α → α : Type 1) -- Lives in Type 1
#check (∀ (α : Type 1), α → α : Type 2) -- Lives in Type 2
-- Impredicative: quantifying over Prop still produces Prop
-- Prop can "talk about itself" without ascending universe levels
def impredicativeExample : Prop := ∀ (P : Prop), P → P
#check (∀ (P : Prop), P → P : Prop) -- Still in Prop, not Type 0!
-- Why this matters: classical logic requires impredicativity
-- "For all propositions P, P or not P" is itself a proposition
def excludedMiddleType : Prop := ∀ (P : Prop), P ∨ ¬P
-- If Prop were predicative, this would land in Type 0 instead of Prop, breaking classical reasoning
-- Impredicativity lets us define propositions that quantify over all propositions
-- The danger: unrestricted impredicativity leads to paradox
-- Girard's paradox shows Type : Type is inconsistent
-- Lean avoids this by making only Prop impredicative
Lean’s Type hierarchy is predicative: ∀ (α : Type 0), α → α has type Type 1, not Type 0. This prevents Girard’s paradox, a type-theoretic version of Russell’s paradox that arises when Type : Type. The infinite regress of universes is the price of consistency.
Non-Cumulativity
In a cumulative type theory, every type at universe level n is automatically also a type at level n+1 and all higher levels. Coq and Idris work this way: if you have Nat : Type 0, you can use Nat anywhere a Type 1 is expected. The type “flows upward” through the hierarchy without explicit intervention. This makes polymorphic code more convenient since you rarely need to think about universe levels.
Lean takes the opposite approach. Each type lives at exactly one universe level. Nat has type Type 0 and only Type 0. If a function expects a Type 1 argument, you cannot pass Nat directly. You must explicitly lift it using ULift or PLift, wrapper types that move values to higher universes.
-- In Lean, Nat has type Type 0, not Type 1
#check (Nat : Type 0) -- works
-- #check (Nat : Type 1) -- would fail: Nat is not in Type 1
-- A function expecting Type 1 cannot accept Nat directly
def wantsType1 (α : Type 1) : Type 1 := α
-- This fails: Nat lives in Type 0, not Type 1
-- def broken := wantsType1 Nat
-- Solution: explicitly lift Nat to Type 1
def works := wantsType1 (ULift Nat)
-- In a cumulative system (like Coq), this would work:
-- def coqStyle := wantsType1 Nat -- Coq allows this
-- Practical example: polymorphic container at higher universe
def Container (α : Type 1) := List α
-- Cannot directly use Nat
-- def natContainer := Container Nat -- fails
-- Must lift first
def natContainer := Container (ULift Nat)
-- Extracting values requires unlifting
def sumLifted (xs : List (ULift Nat)) : Nat :=
xs.foldl (fun acc x => acc + x.down) 0
Note
This explicit lifting makes universe structure visible in your code. You always know exactly which universe level you are working at. The tradeoff is verbosity: code that would “just work” in Coq requires explicit lifts in Lean. In practice, most Lean code stays within Type 0 and Prop, so non-cumulativity rarely causes friction.
The World of Prop
Lean’s universe hierarchy has a special member at the bottom: Prop, the universe of propositions. Unlike Type, which holds computational data, Prop holds logical statements. This distinction enables two features that would be dangerous elsewhere: impredicativity and proof irrelevance. Together, they make Prop a safe space for classical reasoning.
Impredicativity of Prop
Prop breaks the predicativity rule. While ∀ (α : Type 0), α → α must live in Type 1, the analogous ∀ (P : Prop), P → P has type Prop, staying at the same level despite quantifying over all propositions. The monastery has a secret inner sanctum where the old restrictions do not apply.
-- PLift: lifts any type by exactly one level
def liftedFalse : Type := PLift False
-- ULift: lifts types by any amount
def liftedNat : Type u := ULift.{u} Nat
-- Lifting and unlifting values
def liftExample : ULift.{1} Nat := ⟨42⟩
example : liftExample.down = 42 := rfl
-- Non-cumulativity: types exist at exactly one level
def needsLifting (α : Type 1) : Type 2 := ULift.{2} α
This matters for classical logic. The law of excluded middle, ∀ (P : Prop), P ∨ ¬P, quantifies over all propositions. If Prop were predicative, this would live in Type 0, making it a computational object rather than a logical axiom. But how is impredicativity safe here when it causes paradoxes elsewhere?
Proof Irrelevance
The answer is proof irrelevance. A bear catching a salmon does not care whether the fish swam upstream via the left channel or the right. Proof irrelevance applies this principle to mathematics: any two proofs of the same proposition are equal. If you have two proofs p1 and p2 that both establish proposition P, then p1 = p2 holds definitionally. We care that the theorem is true, not which path led there.
-- Basic logical connectives
theorem and_intro (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q := ⟨hp, hq⟩
theorem or_elim (P Q R : Prop) (h : P ∨ Q) (hp : P → R) (hq : Q → R) : R :=
h.elim hp hq
theorem iff_intro (P Q : Prop) (hpq : P → Q) (hqp : Q → P) : P ↔ Q :=
⟨hpq, hqp⟩
-- Proof irrelevance demonstration
theorem proof_irrel_demo (P : Prop) (p1 p2 : P) : p1 = p2 := rfl
-- Classical logic (via choice)
open Classical in
theorem excluded_middle (P : Prop) : P ∨ ¬P := Classical.em P
The technical foundation is that Prop is a subsingleton universe. A subsingleton is a type with at most one element. For any proposition P, there is at most one proof of P up to definitional equality. This contrasts with Type, where Bool has two distinct elements true and false, and Nat has infinitely many.
Proof irrelevance is what makes impredicativity safe. You cannot extract computational content from an impredicative definition over propositions because there is nothing to extract; all witnesses are indistinguishable. The dangerous circularity is defanged. The serpent may eat its tail here because the tail has no substance.
Computational Erasure
Proof irrelevance has profound computational implications. Because all proofs of a proposition are equal, the compiler can erase proofs at runtime. A function that takes a proof argument does not actually need to receive any runtime data for that argument. This erasure is essential for performance: without it, complex proofs would bloat compiled code with useless proof terms. Your elaborate justification for why the code is correct compiles down to nothing, much like comments but with mathematical guarantees.
Proof irrelevance also enables powerful automation. When a tactic constructs a proof term, the exact structure of that term does not matter. The tactic can use whatever construction is convenient, and the result will be equal to any other proof of the same statement. This freedom simplifies tactic implementation and allows for aggressive optimization of proof search.
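A sketch of what erasure looks like in practice (nthOfProof is an illustrative name). The proof argument below exists only for the type checker; the compiled code receives just the list and the index:
-- The proof h is erased: at runtime this is an ordinary indexed lookup
def nthOfProof (xs : List Nat) (i : Nat) (h : i < xs.length) : Nat :=
  xs.get ⟨i, h⟩
#eval nthOfProof [10, 20, 30] 1 (by decide)   -- 20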
Constructive and Classical Logic
Lean’s type theory is constructive at its core. A constructive proof of existence must provide a witness: to prove ∃ n, P n, you must exhibit a specific n and show P n holds. You cannot merely argue that non-existence leads to contradiction. This discipline has a profound consequence: constructive proofs are programs. A proof of ∃ n, n * n = 4 contains the number 2. You can extract it and compute with it. The categorical semantics of this intuitionistic logic is the theory of toposes, where every topos provides a model in which constructive reasoning holds.
-- Constructive proof: we provide an explicit witness
theorem constructive_exists : ∃ n : Nat, n * n = 4 :=
⟨2, rfl⟩ -- The witness is 2, and we can compute 2 * 2 = 4
-- Constructive: we can extract and run the witness
def constructive_even : { n : Nat // n % 2 = 0 } :=
⟨4, rfl⟩ -- The subtype bundles value with proof
#eval constructive_even.val -- Outputs: 4
-- The law of excluded middle itself (classical axiom)
theorem lem (P : Prop) : P ∨ ¬P := Classical.em P
-- Double negation elimination (classical, not constructive)
-- In constructive logic, ¬¬P does not imply P
theorem dne (P : Prop) : ¬¬P → P := Classical.byContradiction
-- Classical proof by contradiction: no even number is odd
theorem even_not_odd (n : Nat) : n % 2 = 0 → ¬(n % 2 = 1) := by
intro heven hodd
omega
Classical logic adds axioms that break this computational interpretation. The law of excluded middle (P ∨ ¬P for any proposition) lets you prove existence by contradiction without producing a witness. Double negation elimination (¬¬P → P) lets you escape a double negation without constructing a direct proof. These principles are mathematically sound but computationally empty. When you prove something exists using excluded middle, the proof does not contain the thing that exists.
Lean provides classical axioms through the Classical namespace. When you use Classical.em or tactics like by_contra, you are stepping outside constructive logic. Lean tracks this: definitions that depend on classical axioms are marked noncomputable, meaning they cannot be evaluated at runtime.
-- Classical choice: given existence, extract a witness
-- This is noncomputable because we cannot run it
noncomputable def classical_witness (P : Nat → Prop) (h : ∃ n, P n) : Nat :=
Classical.choose h
-- The witness satisfies the property (but we cannot compute what it is)
theorem classical_witness_spec (P : Nat → Prop) (h : ∃ n, P n) :
P (classical_witness P h) :=
Classical.choose_spec h
-- Contrast: decidable existence gives computable witnesses
def decidable_witness (p : Nat → Bool) (bound : Nat) : Nat :=
-- We can search by enumeration because the domain is finite
(List.range bound).find? (fun n => p n) |>.getD 0
-- The key insight: constructive proofs compute, classical proofs assert
Why does this matter? For pure mathematics, classical reasoning is often more convenient. Many standard proofs use contradiction freely. But for verified programming, constructive proofs have an advantage: they produce code. A constructive proof that a sorting algorithm returns a sorted list can be extracted into an actual sorting function. A classical proof merely asserts the sorted list exists.
The practical guidance: use constructive methods when you can, classical when you must. Lean supports both. When you see noncomputable on a definition, you know it relies on classical axioms and cannot be executed. When a definition lacks that marker, it is constructive and can run. The type system tracks the distinction so you always know which world you are in.
Type Equivalences
When are two types “the same”? Having functions in both directions is not enough. You can map Bool to Nat and back, but Nat has infinitely many values while Bool has two. The round-trip loses information.
An equivalence \(A \simeq B\) requires functions in both directions that are mutual inverses: composing them in either order gives back the original value. This captures the idea that \(A\) and \(B\) have the same structure, just with different names for their elements.
-- Two types are "the same" when they are equivalent
-- An equivalence requires functions in both directions that are mutual inverses
variable {α β γ : Type*}
-- Having functions both ways is NOT enough
def boolToNat : Bool → Nat
| true => 1
| false => 0
def natToBool : Nat → Bool
| 0 => false
| _ => true
-- These are NOT inverses: natToBool (boolToNat true) = true, but
-- boolToNat (natToBool 42) = 1 ≠ 42, so the Nat → Bool → Nat round trip loses information
-- A proper equivalence between Unit ⊕ Unit and Bool
-- Both have exactly two elements
def sumUnitEquivBool : Unit ⊕ Unit ≃ Bool where
toFun | .inl () => false | .inr () => true
invFun | false => .inl () | true => .inr ()
left_inv := by intro x; cases x <;> rfl
right_inv := by intro b; cases b <;> rfl
-- Equivalences compose
example : (α ≃ β) → (β ≃ γ) → (α ≃ γ) := Equiv.trans
-- Equivalences are symmetric
example : (α ≃ β) → (β ≃ α) := Equiv.symm
-- Every type is equivalent to itself
example : α ≃ α := Equiv.refl α
-- Product types commute (up to equivalence)
def prodComm : α × β ≃ β × α where
toFun p := (p.2, p.1)
invFun p := (p.2, p.1)
left_inv := by intro ⟨a, b⟩; rfl
right_inv := by intro ⟨b, a⟩; rfl
-- Currying is an equivalence
def curryEquiv : (α × β → γ) ≃ (α → β → γ) where
toFun f a b := f (a, b)
invFun g p := g p.1 p.2
left_inv := by intro f; ext ⟨a, b⟩; rfl
right_inv := by intro g; rfl
Equivalences form a well-behaved notion of sameness. They are reflexive (every type is equivalent to itself), symmetric (if \(A \simeq B\) then \(B \simeq A\)), and transitive (equivalences compose). This makes them an equivalence relation on types, which is exactly what you want for a notion of “sameness.”
The distinction matters in mathematics and programming alike. When you prove that two types are equivalent, you can transport theorems and constructions between them. If you prove a property about Bool, the equivalence \(\text{Unit} \oplus \text{Unit} \simeq \text{Bool}\) lets you conclude the same property holds for the sum type. Mathlib uses equivalences extensively to connect different representations of the same mathematical structure.
Quotients and Parametricity
Quotient types allow you to define new types by identifying elements of an existing type according to an equivalence relation. The integers modulo n, for example, identify natural numbers that have the same remainder when divided by n. Quotients are essential for constructing mathematical objects like rational numbers, real numbers, and algebraic structures.
-- Simple modulo equivalence relation
def ModRel (n : Nat) : Nat → Nat → Prop :=
fun a b => a % n = b % n
-- Prove it's an equivalence relation
theorem ModRel.refl (n : Nat) : ∀ x, ModRel n x x :=
fun _ => rfl
theorem ModRel_symm (n : Nat) : ∀ x y, ModRel n x y → ModRel n y x :=
fun _ _ h => h.symm
theorem ModRel.trans (n : Nat) : ∀ x y z, ModRel n x y → ModRel n y z → ModRel n x z :=
fun _ _ _ hxy hyz => Eq.trans hxy hyz
-- Create setoid instance
instance ModSetoid (n : Nat) : Setoid Nat where
r := ModRel n
iseqv := {
refl := ModRel.refl n
symm := @ModRel_symm n
trans := @ModRel.trans n
}
-- Define the quotient type (integers modulo n)
def ZMod (n : Nat) : Type := Quotient (ModSetoid n)
However, quotients break parametricity. Parametricity is the principle that polymorphic functions must treat their type arguments uniformly. A function of type ∀ α, α → α can only be the identity function because it has no way to inspect what α is. It must work the same way for Nat, String, and any other type. This uniformity yields powerful “free theorems” about polymorphic functions.
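A sketch of that intuition: the body below cannot inspect α, so returning the argument is the only thing a total function of this type can do.
-- The only total inhabitant of this type is the identity function
def onlyIdentity : ∀ α : Type, α → α := fun _ a => a
#eval onlyIdentity Nat 3        -- 3
#eval onlyIdentity String "hi"  -- "hi"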
Quotients violate this uniformity through the Quot.lift operation. When you lift a function to operate on a quotient type, you must prove that the function respects the equivalence relation. This proof obligation means that functions on quotients can behave differently depending on the specific equivalence relation, breaking the uniformity that parametricity requires.
namespace ZMod
-- Constructor respecting equivalence
def mk (n : Nat) (a : Nat) : ZMod n :=
Quotient.mk (ModSetoid n) a
-- Addition operation via lifting
def add {n : Nat} [NeZero n] : ZMod n → ZMod n → ZMod n :=
Quotient.lift₂
(fun a b => mk n ((a + b) % n))
(fun a₁ a₂ b₁ b₂ h₁ h₂ => by
apply Quotient.sound
simp only [ModSetoid] at h₁ h₂ ⊢
unfold ModRel at h₁ h₂ ⊢
rw [Nat.add_mod, h₁, h₂, ← Nat.add_mod]
rfl)
-- Quotient.sound: related elements are equal
theorem mk_eq_of_rel {n : Nat} (a b : Nat) (h : ModRel n a b) :
mk n a = mk n b :=
Quotient.sound h
-- Quotient induction principle
theorem ind_on {n : Nat} {P : ZMod n → Prop} (q : ZMod n)
(h : ∀ a, P (mk n a)) : P q :=
Quotient.ind h q
end ZMod
Why is this acceptable? The trade-off is deliberate. Quotients are necessary for mathematics: you cannot construct the integers, rationals, or reals without them. The loss of parametricity is confined to quotient types and does not affect ordinary polymorphic functions. Moreover, the requirement to prove that lifted functions respect equivalence ensures that quotient operations are well-defined. You cannot accidentally distinguish between equivalent elements.
Comparative Type Systems
Different languages make different design choices in their type systems. The following table summarizes key features across proof assistants and programming languages.
| Feature | Lean 4 | Coq | Agda | Idris 2 | Haskell | Rust |
|---|---|---|---|---|---|---|
| Dependent Types | Full | Full | Full | Full | Limited | No |
| Universe Hierarchy | Predicative | Predicative | Predicative | Predicative | None | None |
| Universe Cumulativity | No | Yes | No | Yes | N/A | N/A |
| Proof Irrelevance | Yes (Prop) | Yes (Prop) | Optional | Yes | N/A | N/A |
| Tactic Language | Lean DSL | Ltac | No | Elab | N/A | N/A |
| Type Inference | Partial | Partial | Partial | Partial | Full* | Full |
| Termination Checking | Required | Required | Required | Optional | No | No |
| Linear Types | No | No | No | QTT | Extension | Ownership |
| Effects System | Monad | Monad | Monad | Algebraic | Monad | Ownership |
| Code Generation | Native | OCaml/Haskell | Haskell | Native | Native | Native |
| Cubical Type Theory | No | No | Yes | No | No | No |
| Decidable Type Checking | No | No | No | No | Yes* | Yes |
Glossary:
- Ltac: Coq’s original tactic language, a dynamically-typed scripting language for proof automation
- QTT: Quantitative Type Theory, tracks how many times each variable is used to enable linear resource management
- Predicative: A universe is predicative if quantifying over types at level n produces a type at level n+1 or higher
- Cumulativity: Whether a type at level n is automatically also at level n+1
- *: Haskell 2010 has full type inference and decidable type checking, but enabling extensions (GADTs, TypeFamilies, RankNTypes, UndecidableInstances) may require type annotations or introduce undecidability
Lean and Coq provide full dependent types with rich proof automation, making them suitable for formal verification. Agda emphasizes explicit proof terms and supports cubical type theory for constructive equality, connecting to homotopy type theory and higher topos theory. Idris 2 uses quantitative type theory to track resource usage, bridging the gap between theorem proving and systems programming.
Haskell approaches dependent types through extensions like GADTs, DataKinds, and type families. Base Haskell maintains decidable type checking, but common extensions can introduce undecidability. The language proved enormously influential as a research vehicle, demonstrating what expressive type systems could achieve. However, approximating dependent types through dozens of extensions bolted onto a core calculus not designed for them creates inherent complexity. Type-level programming in Haskell requires navigating interactions between GADTs, type families, DataKinds, and numerous other extensions, each with its own quirks and limitations. For projects that need the full expressive power of dependent types, languages built on richer type theories from first principles offer a cleaner foundation.
Rust’s ownership system provides memory safety guarantees through affine types, with decidable checking and predictable compile times. However, Rust’s type system proves memory safety, not functional correctness. The borrow checker ensures you will not have use-after-free bugs, but it cannot verify that your sorting algorithm actually sorts or that your cryptographic protocol maintains confidentiality. Proving Rust code correct requires external tools like Prusti, Creusot, Kani, or Verus. These tools have achieved real successes verifying cryptographic primitives and isolated algorithms, but verifying the business logic of production systems remains an active research frontier. Extracting verified Rust from higher-level proven specifications, as CompCert does for C, is an open problem. A verified extraction path from Lean to Rust would bridge formal specifications with systems programming, but no such mechanism exists today. Even without extraction, useful techniques remain: differential testing between a proven specification and a Rust implementation, or reducing state spaces to tractable subproblems that admit complete verification. These approaches yield a spectrum of assurance, from probabilistic confidence through exhaustive testing to total correctness proofs over bounded domains. A fully verified pipeline to production Rust remains hard, but partial verification still delivers industrial value. The Rust Formal Methods Interest Group coordinates ongoing work.
A common critique of Lean is its lack of linear or affine types, which would enable compile-time guarantees about resource usage and in-place mutation. This is a deliberate architectural choice, not an oversight. Rust verifies resource management statically at compile time through ownership; Lean minimizes the performance cost of immutability dynamically at runtime through FBIP (functional but in-place) optimizations and reference counting. The trade-off: Lean’s simpler type system makes large-scale theorem proving more tractable, since proofs do not need to thread linearity constraints through every function. You can share data freely without borrow checker complexity, and FBIP recovers imperative performance for the common case where data has a single owner.
The table above looks like a feature comparison. It is actually a map of philosophical commitments. Each row is a question about the nature of computation; each column is an answer. The language you choose chooses what thoughts are easy to think.
The fundamental trade-off is expressiveness versus automation. Full dependent types let you express arbitrary properties but require manual proof effort. Decidable type systems like Rust and Haskell infer types automatically but cannot express many important invariants. Choose based on whether you need machine-checked proofs or just strong static guarantees.
In short: Lean and Coq make you prove everything is correct, Rust makes you prove memory is safe, Haskell makes you prove effects are tracked, and most other languages just trust you not to ship on Friday.
Compiler Options
The set_option command configures compiler behavior. Most options control elaboration and pretty-printing. You will rarely need these until you hit edge cases:
-- set_option configures compiler and elaborator behavior
-- Show implicit arguments that are normally hidden
set_option pp.explicit true in
#check @id Nat 5 -- shows: @id Nat 5 : Nat
-- maxRecDepth controls recursion during elaboration (type-checking),
-- not runtime. #reduce fully unfolds expressions at compile time:
-- #reduce (List.range 500).length -- ERROR without increased limit
set_option maxRecDepth 2000 in
#reduce (List.range 500).length -- 500
The @ prefix forces explicit argument mode: @id Nat 5 passes the type Nat explicitly instead of letting Lean infer it. Combined with set_option pp.explicit true, this shows all normally-hidden arguments. The maxRecDepth option increases how deeply Lean will recurse during elaboration.
Where Types Meet Values
Type theory provides the foundation. The next article explores dependent types in depth: how types can depend on values, how propositions become types, and how proofs become programs. This is where Lean’s power as a theorem prover emerges from its foundations as a programming language.
Dependent Types
Why bother with all this? The honest answer is that most working programmers will never need dependent types, the same way most drivers will never need to understand engine timing. The dishonest answer is that dependent types will make you more productive. The true answer is somewhere stranger: ordinary type systems cannot express the constraints that actually matter. You can say “this is a list” but not “this is a list of exactly five elements.” You can say “this function returns an integer” but not “this function returns a positive integer smaller than its input.” Every time you write a comment explaining what a function really does, every time you add a runtime check for an invariant the compiler cannot see, every time a bug slips through because the types were not precise enough, you are paying the cost of an insufficiently expressive type system. The comment is a wish. The runtime check is a prayer. Dependent types are a contract.
Dependent types are the solution. They let types talk about values. The type Vector α 5 denotes a list of exactly five elements. The type Fin n represents a natural number provably less than n. Array bounds checking happens at compile time. Protocol state machines live in the types. The invariants you used to hope were true become things the compiler verifies.
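As a preview, here is a sketch of a length-indexed list; Vec is an illustrative name, kept separate from Mathlib's vector types:
-- The length is part of the type, so taking the head of an empty Vec cannot typecheck
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons : {n : Nat} → α → Vec α n → Vec α (n + 1)
def Vec.head {α : Type} {n : Nat} : Vec α (n + 1) → α
  | .cons a _ => a
-- No case for nil is needed: a Vec α (n + 1) can never be empty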
Per Martin-Löf spent the 1970s in Sweden developing this type theory where types could depend on values, not just on other types. The idea seems simple enough: if List can be parameterized by an element type, why not also by its length? But this small step has profound consequences. Suddenly types become a specification language. A function returning Vector α n does not merely return a list; it returns a list of exactly n elements, verified at compile time. Polymorphic type systems like those in Haskell (built on System FC, an extension of System Fω with type equality coercions) and OCaml (an extended ML core with row polymorphism and first-class modules) stop at the water’s edge here. Dependent types let you wade in, expressing invariants that would otherwise live only in documentation or runtime checks. Lean’s type system, based on the Calculus of Inductive Constructions, provides the full machinery: types that compute, proofs that are programs, and specifications precise enough to replace testing with theorem proving.
Warning
This section is dense and notation-heavy. If the mathematical symbols start to blur together, that is not a personal failing. It is the appropriate response to notation that took logicians fifty years to stabilize and still varies between textbooks. Skip ahead and return later. Or don’t. Some people read this material linearly, some spiral in from the edges, some open to random pages and see what sticks. There is no wrong way to learn type theory except to believe you should have understood it faster. The notation table below is a reference, not a prerequisite.
Notation
| Concept | Mathematical Notation | Lean Syntax | Description |
|---|---|---|---|
| Function type | $\alpha \to \beta$ | α → β | Non-dependent function from α to β |
| Dependent function | $\Pi (x : \alpha), \beta(x)$ | (x : α) → β x | Function where return type depends on input |
| For all | $\forall x : \alpha, P(x)$ | ∀ x : α, P x | Universal quantification |
| Exists | $\exists x : \alpha, P(x)$ | ∃ x : α, P x | Existential quantification |
| Lambda abstraction | $\lambda x. t$ | fun x => t or λ x => t | Anonymous function |
| Equivalence | $a \equiv b$ | N/A | Definitional equality |
| Conjunction | $P \land Q$ | P ∧ Q | Logical AND |
| Disjunction | $P \lor Q$ | P ∨ Q | Logical OR |
| Negation | $\neg P$ | ¬P | Logical NOT |
| Type universe | $\text{Type}_n$ | Type n | Universe of types at level n |
| Proposition universe | $\text{Prop}$ | Prop | Universe of propositions |
| Sort hierarchy | $\text{Sort}_n$ | Sort n | Unified universe hierarchy |
| Quotient type | $\alpha/{\sim}$ | Quotient s | Type obtained by quotienting α by relation ∼ |
| Natural numbers | $\mathbb{N}$ | Nat or ℕ | Natural numbers type |
| Integers | $\mathbb{Z}$ | Int or ℤ | Integer type |
| Sigma type | $\Sigma (x : \alpha), \beta(x)$ | Σ x : α, β x | Dependent pair type |
| Product type | $\alpha \times \beta$ | α × β | Cartesian product |
Custom Notation
The notation command lets you extend Lean’s syntax with domain-specific operators. Lean supports four kinds of operators: prefix (before the operand), infix (between two operands), postfix (after the operand), and mixfix (multiple tokens with operands interspersed). The general notation command can define any of these, while specialized commands like prefix, infix, and postfix provide convenient shortcuts. The bodies below are deliberately arbitrary (√ and ! both just square their argument); the point is the surface syntax, not the semantics:
-- Prefix: operator before operand
prefix:max "√" => fun (n : Nat) => n * n
#eval √5 -- 25
-- Infix: operator between operands
infix:50 " ⊕ " => fun (a b : Nat) => a + b + 1
#eval 3 ⊕ 4 -- 8
-- Postfix: operator after operand
postfix:max "!" => fun (n : Nat) => n * n
#eval 5! -- 25
-- Mixfix: multiple tokens with operands interspersed
notation "⟪" a ", " b "⟫" => (a, b)
notation "if'" c "then'" t "else'" e => if c then t else e
#eval ⟪1, 2⟫ -- (1, 2)
#eval if' true then' 42 else' 0 -- 42
Type System Overview
Lean’s type system supports definitional equality through several reduction rules. Two terms are definitionally equal when one reduces to the other, and the type checker treats them as interchangeable without proof.
Beta reduction ($\beta$) is function application. When you apply $(\lambda x. t)$ to an argument $s$, the result is $t$ with $s$ substituted for $x$. This is the computational heart of the lambda calculus.
Delta reduction ($\delta$) unfolds definitions. When you define def f x := x + 1, any occurrence of f 3 can be replaced with 3 + 1. The type checker sees through your naming conventions.
Iota reduction ($\iota$) handles pattern matching on inductive types. When a recursor meets a constructor, the match fires and computation proceeds. This is how Nat.rec applied to Nat.succ n knows to take the successor branch.
Zeta reduction ($\zeta$) substitutes let-bound variables. The expression $\text{let } x := s \text{ in } t$ reduces to $t[s/x]$. Local definitions are just convenient names.
example : (fun x => x + 1) 2 = 3 := rfl -- β-reduction
def myConst := 42
example : myConst = 42 := rfl -- δ-reduction
example : let x := 5; x + x = 10 := rfl -- ζ-reduction
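-- ι-reduction is the one rule not witnessed above: a match applied directly
-- to a constructor computes away (a minimal sketch, nothing library-specific)
example : (match ([1, 2, 3] : List Nat) with | [] => 0 | x :: _ => x) = 1 := rfl -- ι-reduction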
Functions
Function types are a built-in feature of Lean. Functions map values from one type (the domain) to another type (the codomain). In Lean’s core language, all function types are dependent. Non-dependent function types are just a special case where the parameter does not appear in the codomain.
Non-Dependent vs Dependent Functions
Non-dependent functions have a fixed return type that does not vary based on the input value. These are the only kinds of functions available in languages like Haskell (not :: Bool -> Bool) and OCaml (let not : bool -> bool). In Lean, def not : Bool → Bool uses the arrow notation to indicate non-dependence. The return type is always Bool, regardless of which boolean you pass in.
Dependent functions have a return type that can depend on the actual value of the input. The type is written as $\Pi (x : \alpha), \beta(x)$ or (x : α) → β x in Lean syntax, where the parameter name x appears in the return type β x. This feature has no equivalent in Haskell or OCaml.
Key insight: Dependent functions can return values from completely different types based on their input! This is sometimes called a dependent product (or Pi type) because it corresponds to an indexed product of sets.
Tip
The name “dependent product” may seem backwards since we are building functions, not pairs. The terminology comes from set theory: a function $f : \Pi (x : A), B(x)$ assigns to each $x \in A$ an element of $B(x)$, which is precisely an element of the Cartesian product $\prod_{x \in A} B(x)$. The “product” is over all possible inputs.
Why Dependent Types Matter
Consider this function that cannot be typed in Haskell or OCaml:
-- Return type depends on runtime value - impossible in Haskell/OCaml
def dependentTwo (b : Bool) : if b then Unit × Unit else String :=
match b with
| true => ((), ()) -- Returns a pair when b is true
| false => "two" -- Returns a string when b is false
The return type literally changes based on the runtime value. Call dependentTwo true and you get a Unit × Unit. Call dependentTwo false and you get a String. This should feel slightly transgressive. A function that returns different types? In most languages, this is either impossible or requires erasing all type information and hoping for the best. Here, the type system tracks it precisely. The function is total, the types are known, the compiler is satisfied.
-- Dependent pattern matching: types change based on scrutinee
-- Return type depends on the boolean value
def boolToType (b : Bool) : Type :=
if b then Nat else String
def boolExample (b : Bool) : boolToType b :=
match b with
| true => (42 : Nat) -- returns Nat
| false => "hello" -- returns String
-- Pattern matching on Fin must handle all cases
def finToString : Fin 3 → String
| 0 => "zero"
| 1 => "one"
| 2 => "two"
This enables encoding invariants directly in types. For example, Vector α n encodes the length n in the type itself, making it impossible to write functions that violate length constraints. Your off-by-one errors become compile-time errors. The compiler catches at build time what you would otherwise discover in production, at 3am, with the on-call phone ringing.
Typing Rules for Functions
Before diving into the formal rules: the intuition is simple. When you call a function, the type of what you get back can mention the value you passed in. When you define a function, you assume you have an input and show how to produce an output. That is all the rules below are saying, just precisely enough for a compiler to check them.
Two rules govern how types flow through function definitions and applications. The first is application: when you apply a function to an argument, the return type can mention the argument itself. If $f : \Pi (x : \alpha), \beta(x)$ and you apply it to some $a : \alpha$, the result $f \, a$ has type $\beta(a)$. The type of the output depends on the value of the input. This is the essence of dependent typing. A function Vector.head : (n : Nat) → Vector α (n + 1) → α applied to 3 yields a function expecting a Vector α 4. The 3 propagates into the type.
The second rule is abstraction: to construct a function, you assume a variable of the input type and produce a term of the output type. If $t : \beta$ under the assumption $x : \alpha$, then $\lambda x : \alpha. \, t$ has type $\Pi (x : \alpha), \beta$. The abstraction binds the variable and packages the assumption into the function type. When $\beta$ does not mention $x$, this collapses to the familiar non-dependent arrow $\alpha \to \beta$.
Beyond application and abstraction, functions satisfy eta-reduction: wrapping a function in a lambda that immediately applies it produces the same function. Formally, $\lambda x. \, f \, x \equiv f$ when $x$ does not appear free in $f$. This goes beyond simplification; it expresses extensionality: a function is determined by what it does to its arguments, not by how it is written.
Examples: Dependent and Non-Dependent Functions
-- Non-dependent function: return type is fixed
-- Similar to Haskell: double :: Int -> Int
-- or OCaml: let double : int -> int
def double (n : Nat) : Nat := n * 2
-- Another non-dependent example (like Haskell's const)
def constantFive (_ : Nat) : Nat := 5
-- Dependent function: return type depends on the input value
-- This has NO equivalent in Haskell or OCaml!
def makeVec (n : Nat) : Fin (n + 1) := ⟨n, Nat.lt_succ_self n⟩
-- A more dramatic dependent function example
-- The return TYPE changes based on the input VALUE
def two (b : Bool) : if b then Unit × Unit else String :=
match b with
| true => ((), ()) -- Returns a pair of units
| false => "two" -- Returns a string
-- This function returns different TYPES based on input:
-- two true : Unit × Unit
-- two false : String
-- Dependent pairs: the second type depends on the first value
def dependentPair : (n : Nat) × Fin n :=
⟨5, 3⟩ -- Second component must be less than first
-- Compare with non-dependent versions:
-- Haskell: (Int, Int) -- no constraint between components
-- OCaml: int * int -- no constraint between components
-- Lean enforces the invariant in the type!
-- Function composition (non-dependent)
def comp {α β γ : Type} (f : β → γ) (g : α → β) : α → γ :=
fun x => f (g x)
-- This is like Haskell's (.); note that OCaml's (@@) is application, not composition
-- But Lean can also do DEPENDENT composition!
-- Dependent function composition
def depComp {α : Type} {β : α → Type} {γ : (x : α) → β x → Type}
(f : (x : α) → (y : β x) → γ x y)
(g : (x : α) → β x) :
(x : α) → γ x (g x) :=
fun x => f x (g x)
-- Multiple parameters via currying (named after Haskell B. Curry)
def curriedAdd : Nat → Nat → Nat := fun x y => x + y
-- Function extensionality: equal outputs for equal inputs
theorem funext_example (f g : Nat → Nat) (h : ∀ x, f x = g x) : f = g :=
funext h
Currying
Currying is the fundamental technique of transforming functions with multiple parameters into sequences of single-parameter functions. Named after logician Haskell Curry (yes, the programming language is also named after him), this approach is automatic in Lean. All multi-parameter functions are internally represented as curried functions. This enables partial application, where supplying fewer arguments than a function expects creates a new function waiting for the remaining arguments.
Note
The technique was actually discovered by Moses Schönfinkel in 1924, six years before Curry’s work. Academic naming conventions are not always fair. Schönfinkel’s life ended in obscurity in a Moscow hospital; Curry became a household name among programmers who have never heard of either.
The power of currying lies in its composability. You can create specialized functions by partially applying general ones, building up complex behavior from simple building blocks. While languages like Haskell make currying explicit, Lean handles it transparently, allowing you to work with multi-parameter functions naturally while still benefiting from the flexibility of curried representations.
/-!
### Currying
Currying is the transformation of functions with multiple parameters into
a sequence of functions, each taking a single parameter. In Lean, all
multi-parameter functions are automatically curried.
-/
-- Multi-parameter function (automatically curried)
def add3 (x y z : Nat) : Nat := x + y + z
-- Equivalent to nested lambdas
def add3' : Nat → Nat → Nat → Nat :=
fun x => fun y => fun z => x + y + z
-- Partial application creates new functions
def add10 : Nat → Nat → Nat := add3 10
def add10And5 : Nat → Nat := add3 10 5
example : add10 3 7 = 20 := rfl
example : add10And5 2 = 17 := rfl
-- Function.curry: Convert uncurried to curried
def uncurriedAdd : Nat × Nat → Nat := fun p => p.1 + p.2
def curriedVer := Function.curry uncurriedAdd
example : curriedVer 3 4 = 7 := rfl
-- Function.uncurry: Convert curried to uncurried
def addPair := Function.uncurry Nat.add
example : addPair (3, 4) = 7 := rfl
-- Currying with dependent types
def depCurry {α : Type} {β : α → Type} {γ : (a : α) → β a → Type}
(f : (p : (a : α) × β a) → γ p.1 p.2) :
(a : α) → (b : β a) → γ a b :=
fun a b => f ⟨a, b⟩
Function Extensionality
Function extensionality is a fundamental principle stating that two functions are equal if and only if they produce equal outputs for all equal inputs. This principle, while intuitively obvious, is not derivable from the core rules of dependent type theory; in Lean it is a theorem, proved from the quotient primitives rather than postulated as a separate axiom. Without extensionality, we could only prove functions equal when they are definitionally equal: reducible to the same normal form.
The funext tactic in Lean implements this principle, allowing us to prove function equality by considering their behavior pointwise. This is essential for mathematical reasoning, where we often want to show that two different definitions actually describe the same function. The principle extends to dependent functions as well, where the output type can vary with the input.
/-!
### Extensionality
Function extensionality states that two functions are equal if they
produce equal outputs for all inputs. It is not provable from the core
rules alone; in Lean it is derived from the quotient primitives.
-/
-- funext: Basic function extensionality
theorem my_funext {α β : Type} (f g : α → β) :
(∀ x, f x = g x) → f = g :=
funext
-- Example: Proving function equality
def double' (n : Nat) : Nat := 2 * n
def double'' (n : Nat) : Nat := n + n
theorem doubles_equal : double' = double'' := by
funext n
simp [double', double'']
omega
-- Dependent function extensionality
theorem dep_funext {α : Type} {β : α → Type}
(f g : (x : α) → β x) :
(∀ x, f x = g x) → f = g :=
funext
-- Eta reduction: λ x, f x = f
theorem eta_reduction (f : Nat → Nat) : (fun x => f x) = f :=
funext fun _ => rfl
-- Functions equal by behavior, not syntax
def addOne : Nat → Nat := fun x => x + 1
def succFunc : Nat → Nat := Nat.succ
theorem addOne_eq_succ : addOne = succFunc := by
funext x
simp [addOne, succFunc]
Totality and Termination
Important
All functions in Lean must be total, meaning they must be defined for every possible input of the correct type. This requirement ensures logical consistency: a function that could fail or loop forever would make Lean’s logic unsound. Partiality is the enemy. The function that hangs on edge cases, the recursion that never terminates, the match that forgot a constructor: these are not just bugs but logical contradictions waiting to invalidate your theorems.
To achieve totality while allowing recursion, Lean uses well-founded recursion based on decreasing measures.
For structural recursion on inductive types, Lean automatically proves termination by observing that recursive calls operate on structurally smaller arguments. For more complex recursion patterns, you can specify custom termination measures using termination_by and provide proofs that these measures decrease with decreasing_by. This approach allows expressing any computable function while maintaining logical soundness. If you have ever written while (true) and hoped for the best, this is the universe collecting on that debt.
/-!
### Totality and Termination
All functions in Lean must be total (defined for all inputs) and
terminating. Lean uses well-founded recursion to ensure termination.
-/
-- Total function: defined for all natural numbers
def safeDivide (n : Nat) (m : Nat) : Nat :=
if m = 0 then 0 else n / m -- Returns 0 for division by zero
-- Structural recursion (automatically proven terminating)
def fact : Nat → Nat
| 0 => 1
| n + 1 => (n + 1) * fact n
-- Well-founded recursion with explicit termination proof
def gcd (a b : Nat) : Nat :=
if h : b = 0 then
a
else
have : a % b < b := Nat.mod_lt _ (Nat.pos_of_ne_zero h)
gcd b (a % b)
termination_by b
-- Mutual recursion with termination
mutual
def isEvenMut : Nat → Bool
| 0 => true
| n + 1 => isOddMut n
def isOddMut : Nat → Bool
| 0 => false
| n + 1 => isEvenMut n
end
-- Using decreasing_by for custom termination proof
def ackermann : Nat → Nat → Nat
| 0, n => n + 1
| m + 1, 0 => ackermann m 1
| m + 1, n + 1 => ackermann m (ackermann (m + 1) n)
termination_by m n => (m, n)
For non-structural recursion, you must provide a termination measure that decreases on each recursive call. The classic examples appear above: the GCD algorithm, where the second argument strictly decreases, and the Ackermann function, where the lexicographic pair of arguments decreases.
Function API Reference
Lean’s standard library provides a rich collection of function combinators in the Function namespace. These combinators, familiar from functional programming, enable point-free style and function composition. The composition operator ∘ allows building complex functions from simpler ones, while combinators like const, flip, and id provide basic building blocks for function manipulation.
Function composition in Lean satisfies the expected mathematical properties: it is associative, and the identity function acts as a neutral element. These properties are not just theorems but computational facts. They hold definitionally, meaning Lean can verify them by pure computation without requiring proof steps.
/-!
### Function API Reference
Lean provides standard function combinators in the Function namespace.
-/
-- Function.comp: Function composition
def composed := Function.comp (· + 10) (· * 2)
example : composed 5 = 20 := rfl -- (5 * 2) + 10 = 20
-- Using ∘ notation for composition
open Function in
def composed' := (· + 10) ∘ (· * 2)
example : composed' 5 = 20 := rfl
-- Function.const: Constant function
def alwaysFive := Function.const Nat 5
example : alwaysFive 100 = 5 := rfl
example : alwaysFive 999 = 5 := rfl
-- id: Identity function
example : id 42 = 42 := rfl
example : (id ∘ id) = (id : Nat → Nat) := rfl
-- flip: Swap arguments
def subtract (a b : Nat) : Int := a - b
def flippedSubtract := flip subtract
example : subtract 10 3 = 7 := rfl
example : flippedSubtract 3 10 = 7 := rfl
-- Function composition laws
open Function in
theorem comp_assoc {α β γ δ : Type} (f : γ → δ) (g : β → γ) (h : α → β) :
(f ∘ g) ∘ h = f ∘ (g ∘ h) := rfl
open Function in
theorem id_comp {α β : Type} (f : α → β) :
id ∘ f = f := rfl
open Function in
theorem comp_id {α β : Type} (f : α → β) :
f ∘ id = f := rfl
Function Properties
Mathematical properties of functions (injectivity, surjectivity, and bijectivity) play crucial roles in both mathematics and computer science. An injective function maps distinct inputs to distinct outputs, a surjective function reaches every possible output, and a bijective function is both injective and surjective, establishing a one-to-one correspondence between domain and codomain.
These properties connect to the concept of inverses. A function has a left inverse if and only if it’s injective, a right inverse if and only if it’s surjective, and a two-sided inverse if and only if it’s bijective. Lean provides definitions and theorems for reasoning about these properties, enabling formal verification of mathematical and algorithmic correctness.
/-!
### Function Properties
Important mathematical properties of functions.
-/
-- Injective: Different inputs give different outputs
def isInjective {α β : Type} (f : α → β) : Prop :=
∀ x y, f x = f y → x = y
theorem double_injective : isInjective (fun n : Nat => 2 * n) := by
intro x y h
simp only [] at h
-- If 2*x = 2*y, then x = y
have : x * 2 = y * 2 := by rw [mul_comm x 2, mul_comm y 2]; exact h
exact Nat.eq_of_mul_eq_mul_right (by norm_num : 0 < 2) this
-- Using Lean's built-in Function.Injective
example : Function.Injective (fun n : Nat => 2 * n) := by
intro x y h
simp only [] at h
have : x * 2 = y * 2 := by rw [mul_comm x 2, mul_comm y 2]; exact h
exact Nat.eq_of_mul_eq_mul_right (by norm_num : 0 < 2) this
-- Surjective: Every output is reached by some input
def isSurjective {α β : Type} (f : α → β) : Prop :=
∀ b, ∃ a, f a = b
-- Not surjective: doubling doesn't produce odd numbers
theorem double_not_surjective :
¬Function.Surjective (fun n : Nat => 2 * n) := by
intro h
obtain ⟨n, hn⟩ := h 1
simp at hn
-- 2*n is always even, but 1 is odd
cases n with
| zero => simp at hn
| succ m => simp [Nat.mul_succ] at hn
-- Bijective: Both injective and surjective
def isBijective {α β : Type} (f : α → β) : Prop :=
Function.Injective f ∧ Function.Surjective f
-- Left inverse: g ∘ f = id
theorem has_left_inverse {α β : Type} (f : α → β) (g : β → α) :
Function.LeftInverse g f ↔ ∀ a, g (f a) = a := by
rfl
-- Right inverse: f ∘ g = id
theorem has_right_inverse {α β : Type} (f : α → β) (g : β → α) :
Function.RightInverse g f ↔ ∀ b, f (g b) = b := by
rfl
-- Example: Successor and predecessor
def succInt : Int → Int := (· + 1)
def predInt : Int → Int := (· - 1)
theorem succ_pred_inverse :
Function.LeftInverse predInt succInt ∧
Function.RightInverse predInt succInt := by
constructor <;> intro x <;> simp [succInt, predInt]
-- Inverse functions
structure IsInverse {α β : Type} (f : α → β) (g : β → α) : Prop where
left : Function.LeftInverse g f
right : Function.RightInverse g f
theorem inverse_bijective {α β : Type} (f : α → β) (g : β → α)
(h : IsInverse f g) : isBijective f := by
constructor
· -- Injective
intro x y hxy
have : g (f x) = g (f y) := by rw [hxy]
rw [h.left x, h.left y] at this
exact this
· -- Surjective
intro b
use g b
exact h.right b
Implicit and Auto Parameters
While not part of core type theory, Lean’s function types record how each parameter is supplied. Implicit and explicit function types are definitionally equal. Implicit parameters are inferred from context, strict implicit parameters are only inserted once a later explicit argument is supplied, instance-implicit parameters (written in square brackets) are filled by type class resolution, and auto parameters are discharged by running a default tactic at the call site.
-- Implicit parameters (inferred from usage)
def implicitId {α : Type} (x : α) : α := x
-- Strict implicit (only inserted when a later explicit argument is given)
def strictImplicit ⦃α : Type⦄ (x : α) : α := x
-- Instance-implicit parameter (filled by type class resolution)
def instImplicitDefault {α : Type} [Inhabited α] : α := default
-- Optional parameters with default values
def withDefault (n : Nat := 10) : Nat := n * 2
example : implicitId 5 = 5 := rfl
example : withDefault = 20 := rfl
example : withDefault 3 = 6 := rfl
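-- For contrast with the instance-implicit case above, a sketch of a genuine
-- auto parameter: the default `by omega` tactic runs at each call site
-- (the name requirePos is illustrative, not from the library)
def requirePos (n : Nat) (h : n > 0 := by omega) : Nat := n - 1
example : requirePos 5 = 4 := rfl -- omega silently proves 5 > 0
-- requirePos 0 would be rejected: omega cannot prove 0 > 0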
Propositions
Propositions (Prop) are types representing logical statements. They feature proof irrelevance: any two proofs of the same proposition are definitionally equal. This means the specific proof does not matter, only that one exists. We covered this in the Type Theory article.
The Curry-Howard Correspondence Revisited
The Curry-Howard correspondence we encountered in earlier articles now reveals its full depth. With dependent types, the correspondence extends beyond simple propositional logic. Universal quantification becomes dependent function types. Existential quantification becomes dependent pair types (sigma types). The slogan “propositions are types, proofs are programs” turns out to be a precise mathematical equivalence.
| Logic | Type Theory | Lean Syntax |
|---|---|---|
| $\forall x : \alpha, P(x)$ | Dependent function $\Pi (x : \alpha), P(x)$ | ∀ x : α, P x or (x : α) → P x |
| $\exists x : \alpha, P(x)$ | Dependent pair $\Sigma (x : \alpha), P(x)$ | ∃ x : α, P x or Σ x : α, P x |
| Induction principle | Recursor | Nat.rec, List.rec, etc. |
| Proof by cases | Pattern matching | match ... with |
The dependent versions unify what simpler type systems treat separately. A proof of “for all natural numbers n, P(n) holds” is literally a function that takes any n : Nat and returns a proof of P n. A proof of “there exists a natural number n such that P(n)” is literally a pair: the witness n together with a proof of P n. This unification is not philosophical hand-waving; it is the operational semantics of Lean.
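To make the slogan concrete, here is a minimal sketch (only a standard library lemma and omega are used): the universal proof is literally a function, and the existential proof is literally a pair.
-- ∀ as a dependent function: the proof takes n and returns a proof about n
example : ∀ n : Nat, n ≤ n + 1 := fun n => Nat.le_succ n
-- ∃ as a dependent pair: a witness packaged with evidence about that witness
example : ∃ n : Nat, n > 3 := ⟨4, by omega⟩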
-- Basic logical connectives
theorem and_intro (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q := ⟨hp, hq⟩
theorem or_elim (P Q R : Prop) (h : P ∨ Q) (hp : P → R) (hq : Q → R) : R :=
h.elim hp hq
theorem iff_intro (P Q : Prop) (hpq : P → Q) (hqp : Q → P) : P ↔ Q :=
⟨hpq, hqp⟩
-- Proof irrelevance demonstration
theorem proof_irrel_demo (P : Prop) (p1 p2 : P) : p1 = p2 := rfl
-- Classical logic (via choice)
open Classical in
theorem excluded_middle (P : Prop) : P ∨ ¬P := Classical.em P
Key Properties of Propositions
- Run-time irrelevance: Propositions are erased during compilation
- Impredicativity: Propositions can quantify over types from any universe
- Restricted elimination: With limited exceptions (subsingletons), propositions cannot eliminate to non-proposition types
- Extensionality: The propext axiom enables proving that logically equivalent propositions are equal (see the example below)
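A one-line illustration of the last point; propext takes an iff and yields an equality of propositions:
-- propext: logically equivalent propositions are equal
example (P Q : Prop) (h : P ↔ Q) : P = Q := propext h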
Decidability
Decidable propositions bridge logic and computation, allowing propositions to be computed. A proposition $P$ is decidable when an algorithm can produce either a proof of $P$ or a proof of $\neg P$. In Lean this is the inductive type Decidable P, an evidence-carrying analogue of the classical disjunction $P \lor \neg P$, with two constructors:
$$\text{isTrue} : P \to \text{Decidable}(P) \qquad\qquad \text{isFalse} : \neg P \to \text{Decidable}(P)$$
This connects to constructive mathematics where decidability provides computational content:
-- Custom decidable instance
instance decidableEven (n : Nat) : Decidable (n % 2 = 0) :=
if h : n % 2 = 0 then isTrue h else isFalse h
-- Using decidability for computation
def isEven (n : Nat) : Bool := decide (n % 2 = 0)
example : isEven 4 = true := rfl
example : isEven 5 = false := rfl
-- Subsingletons: types with at most one element
theorem subsingleton_prop (P : Prop) : Subsingleton P :=
⟨fun _ _ => rfl⟩
Inductive Types
Inductive types are Lean’s primary mechanism for introducing new types. Every type is either inductive or built from universes, functions, and inductive types.
Warning
The recursor that Lean generates for each inductive type is the induction principle in computational form. If you find yourself writing a proof by induction and wondering where the induction hypothesis comes from, the answer is: the recursor. Understanding recursors deeply is optional for using Lean but essential for understanding why Lean works.
Each inductive type has:
- A single type constructor (may take parameters)
- Any number of constructors introducing new values
- A derived recursor representing an induction principle
The general form looks intimidating but says something simple: an inductive type is defined by listing all the ways you can build a value of that type. Natural numbers have two constructors (zero and successor), lists have two (empty and cons), and so on. The formal notation below just makes this precise.
$$\begin{aligned} &\textbf{inductive } C\ (\vec{\alpha} : \vec{U}) : \Pi (\vec{\beta} : \vec{V}),\ s\ \textbf{where} \\ &\quad \mid\ c_1 : \Pi (\vec{x_1} : \vec{T_1}),\ C\ \vec{\alpha}\ \vec{t_1} \\ &\quad \mid\ c_2 : \Pi (\vec{x_2} : \vec{T_2}),\ C\ \vec{\alpha}\ \vec{t_2} \\ &\quad \vdots \end{aligned}$$
Where $\vec{\alpha}$ are parameters (fixed across all constructors) and $\vec{\beta}$ are indices (can vary between constructors).
-- Basic inductive with multiple constructors
inductive Color : Type where
| red | green | blue
deriving Repr, DecidableEq
-- Parameterized inductive type
inductive Result (ε α : Type) : Type where
| ok : α → Result ε α
| error : ε → Result ε α
-- Anonymous constructor syntax for single-constructor types
structure Point where
x : Float
y : Float
def origin : Point := ⟨0.0, 0.0⟩
Indexed Families
The distinction between parameters and indices is fundamental. Parameters are fixed across the entire definition: if you declare inductive Foo (α : Type), then every constructor must produce a Foo α with that same α. Indices can vary: each constructor can target a different index value. In Vector α n, the type α is a parameter (all elements have the same type) but n is an index (constructors produce vectors of different lengths). The nil constructor produces Vector α 0. The cons constructor takes a Vector α n and produces Vector α (n + 1). The index changes; the parameter does not.
This distinction affects how Lean generates recursors and what pattern matching can learn. When you match on a Vector α n, Lean learns the specific value of the index n in each branch. Matching on nil tells you n = 0. Matching on cons tells you n = m + 1 for some m. This index refinement is what makes length-indexed vectors useful: the type system tracks information that flows from pattern matching.
For vectors (length-indexed lists), the signature is: $$\text{Vector} : \text{Type} \to \mathbb{N} \to \text{Type}$$
The recursor for indexed families captures the dependency: $$\text{Vector.rec} : \Pi (P : \Pi n,\ \text{Vector}\ \alpha\ n \to \text{Sort } u),\ \ldots$$
-- Vector: length-indexed lists
inductive Vector (α : Type) : Nat → Type where
| nil : Vector α 0
| cons : ∀ {n}, α → Vector α n → Vector α (n + 1)
def vectorHead {α : Type} {n : Nat} : Vector α (n + 1) → α
| Vector.cons a _ => a
-- Even/odd indexed list (from manual)
inductive EvenOddList (α : Type) : Bool → Type where
| nil : EvenOddList α true
| cons : ∀ {isEven}, α → EvenOddList α isEven → EvenOddList α (!isEven)
def exampleEvenList : EvenOddList Nat true :=
EvenOddList.cons 1 (EvenOddList.cons 2 EvenOddList.nil)
The Fin n type represents natural numbers strictly less than n. It is perhaps the simplest useful indexed type: the index constrains which values can exist. A Fin 3 can only be 0, 1, or 2. Attempting to construct a Fin 3 with value 3 is a type error, not a runtime error.
-- Fin n: natural numbers less than n
-- Fin 3 has exactly three values: 0, 1, 2
example : Fin 3 := 0
example : Fin 3 := 1
example : Fin 3 := 2
-- example : Fin 3 := 3 -- Error: 3 is not less than 3
-- Fin carries a proof that the value is in bounds
def two : Fin 5 := ⟨2, by omega⟩
-- Safe array indexing using Fin
def safeIndex {α : Type} (arr : Array α) (i : Fin arr.size) : α :=
arr[i] -- No bounds check needed at runtime
Vectors generalize lists by tracking their length in the type. A Vec α n is a list of exactly n elements of type α. The head function can only be called on non-empty vectors because its type requires Vec α (n + 1). No runtime check needed; the type system enforces the precondition.
-- Vector: lists with length in the type
-- A vector is a list with its length tracked in the type
inductive Vec (α : Type) : Nat → Type where
| nil : Vec α 0
| cons {n : Nat} : α → Vec α n → Vec α (n + 1)
-- The length is known at compile time
def exampleVec : Vec Nat 3 := .cons 1 (.cons 2 (.cons 3 .nil))
-- Head is safe: can only call on non-empty vectors
def Vec.head {α : Type} {n : Nat} : Vec α (n + 1) → α
| .cons x _ => x
-- Tail preserves the length relationship
def Vec.tail {α : Type} {n : Nat} : Vec α (n + 1) → Vec α n
| .cons _ xs => xs
-- Map over a vector (preserves length)
def Vec.map {α β : Type} {n : Nat} (f : α → β) : Vec α n → Vec β n
| .nil => .nil
| .cons x xs => .cons (f x) (xs.map f)
Mutual and Nested Inductive Types
Multiple inductive types can be defined simultaneously when they reference each other. Nested inductive types are defined recursively through other type constructors.
mutual
inductive Tree (α : Type) : Type where
| leaf : α → Tree α
| node : Forest α → Tree α
inductive Forest (α : Type) : Type where
| nil : Forest α
| cons : Tree α → Forest α → Forest α
end
-- Nested inductive: recursive through other type constructors
inductive NestedTree (α : Type) : Type where
| leaf : α → NestedTree α
| node : List (NestedTree α) → NestedTree α
-- Recursors are automatically generated for pattern matching
def treeSize {α : Type} : Tree α → Nat
| Tree.leaf _ => 1
| Tree.node forest => forestSize forest
where forestSize : Forest α → Nat
| Forest.nil => 0
| Forest.cons t rest => treeSize t + forestSize rest
Structures
Structures are specialized single-constructor inductive types with no indices. They provide automatic projection functions, named-field syntax, update syntax, and inheritance:
structure Person where
name : String
age : Nat
email : Option String := none
deriving Repr
-- Inheritance
structure Student extends Person where
studentId : Nat
gpa : Float
-- Field access and update syntax
def alice : Person := { name := "Alice", age := 25 }
def olderAlice := { alice with age := 26 }
example : alice.name = "Alice" := rfl
example : olderAlice.age = 26 := rfl
Sigma Types and Subtypes
The intuition for sigma types is straightforward: a sigma type is a pair where the type of the second element depends on the value of the first. Think of a labeled box: the label tells you what is inside, and the contents match the label. If the label says “length 5”, the contents are a list with exactly 5 elements. The label and contents travel together, and the type system knows they are consistent.
Sigma types (dependent pairs) package a value with data that depends on it. The notation Σ x : α, β x describes pairs where the second component’s type depends on the first component’s value. This is the dependent version of the product type α × β.
-- Sigma types: dependent pairs
-- A dependent pair where the second type depends on the first value
-- Σ (n : Nat), Fin n means "a natural number n paired with a Fin n"
def dependentPair : Σ n : Nat, Fin n := ⟨5, 3⟩
-- The second component's type depends on the first
example : dependentPair.fst = 5 := rfl
example : dependentPair.snd < dependentPair.fst := by decide
-- Contrast with non-dependent product
def regularPair : Nat × Nat := (5, 3) -- both components are Nat
Subtypes refine existing types with predicates. The type { x : α // P x } contains values of type α that satisfy predicate P. Each element bundles a value with a proof that it satisfies the constraint. This is how you express “positive integers” or “sorted lists” at the type level.
-- Subtypes: values with proofs
-- { x : Nat // x > 0 } is the type of positive naturals
def posNat : { n : Nat // n > 0 } := ⟨5, by omega⟩
-- Access the value and proof separately
example : posNat.val = 5 := rfl
example : posNat.property = (by omega : 5 > 0) := rfl
-- Functions can require positive inputs
def safeDivide (a : Nat) (b : { n : Nat // n > 0 }) : Nat :=
a / b.val
Equality as a Type
The equality type a = b is itself a dependent type: it depends on the values a and b. The only constructor is rfl : a = a, which proves that any value equals itself. Proofs of equality can be used to substitute equal values, and equality satisfies the expected properties of symmetry and transitivity.
-- Equality as a dependent type
-- The equality type: a = b is a type that is inhabited iff a equals b
-- rfl : a = a is the only constructor
example : 2 + 2 = 4 := rfl
-- Proofs of equality can be used to substitute
theorem subst_example (a b : Nat) (h : a = b) (P : Nat → Prop) (pa : P a) : P b :=
h ▸ pa -- transport pa along h
-- Equality is symmetric and transitive
theorem eq_symm (a b : Nat) (h : a = b) : b = a := h.symm
theorem eq_trans (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := h1.trans h2
Quotient Types
Quotient types create new types by identifying elements via equivalence relations. Given a type $\alpha$ and an equivalence relation $\sim$ on $\alpha$, the quotient $\alpha/\sim$ is a type where $a = b$ in $\alpha/\sim$ whenever $a \sim b$. Elements related by the relation become equal in the quotient type. Equality is respected universally, and nothing in Lean’s logic can observe differences between equal terms.
Note
Mathematicians write $\mathbb{Z} = (\mathbb{N} \times \mathbb{N})/\!\sim$ and software engineers write
type Int = Quotient (Nat × Nat) equiv. Same idea, different notation. The integer $-3$ is not any particular pair of naturals but the equivalence class of all pairs $(a, b)$ where $a + 3 = b$: so $(0, 3)$, $(1, 4)$, $(2, 5)$, and infinitely many others. Two fields, one concept, a century of mutual incomprehension that turns out to be largely notational.
For example, the integers can be constructed as $\mathbb{Z} = (\mathbb{N} \times \mathbb{N})/\sim$ where $(a,b) \sim (c,d)$ iff $a + d = b + c$.
-- Simple modulo equivalence relation
def ModRel (n : Nat) : Nat → Nat → Prop :=
fun a b => a % n = b % n
-- Prove it's an equivalence relation
theorem ModRel.refl (n : Nat) : ∀ x, ModRel n x x :=
fun _ => rfl
theorem ModRel_symm (n : Nat) : ∀ x y, ModRel n x y → ModRel n y x :=
fun _ _ h => h.symm
theorem ModRel.trans (n : Nat) : ∀ x y z, ModRel n x y → ModRel n y z → ModRel n x z :=
fun _ _ _ hxy hyz => Eq.trans hxy hyz
-- Create setoid instance
instance ModSetoid (n : Nat) : Setoid Nat where
r := ModRel n
iseqv := {
refl := ModRel.refl n
symm := @ModRel_symm n
trans := @ModRel.trans n
}
-- Define the quotient type (integers modulo n)
def ZMod (n : Nat) : Type := Quotient (ModSetoid n)
Working with Quotients
Operations on quotients must respect the equivalence relation. The Quotient.lift functions ensure operations are well-defined, while Quotient.sound asserts equality of related elements.
The quotient axioms provide:
- Quotient.mk: $\alpha \to \alpha/{\sim}$ (constructor)
- Quotient.lift: If $f : \alpha \to \beta$ respects $\sim$, then $f$ lifts to $\alpha/{\sim} \to \beta$
- Quotient.sound: If $a \sim b$, then $[a] = [b]$ in $\alpha/{\sim}$
- Quotient.exact: If $[a] = [b]$ in $\alpha/{\sim}$, then $a \sim b$
namespace ZMod
-- Constructor respecting equivalence
def mk (n : Nat) (a : Nat) : ZMod n :=
Quotient.mk (ModSetoid n) a
-- Addition operation via lifting
def add {n : Nat} [NeZero n] : ZMod n → ZMod n → ZMod n :=
Quotient.lift₂
(fun a b => mk n ((a + b) % n))
(fun a₁ a₂ b₁ b₂ h₁ h₂ => by
apply Quotient.sound
simp only [ModSetoid] at h₁ h₂ ⊢
unfold ModRel at h₁ h₂ ⊢
rw [Nat.add_mod, h₁, h₂, ← Nat.add_mod]
rfl)
-- Quotient.sound: related elements are equal
theorem mk_eq_of_rel {n : Nat} (a b : Nat) (h : ModRel n a b) :
mk n a = mk n b :=
Quotient.sound h
-- Quotient induction principle
theorem ind_on {n : Nat} {P : ZMod n → Prop} (q : ZMod n)
(h : ∀ a, P (mk n a)) : P q :=
Quotient.ind h q
end ZMod
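-- Quotient.exact, the fourth primitive listed above, goes the other way:
-- equality of classes recovers the underlying relation (a small sketch
-- against the ModSetoid instance defined earlier)
theorem rel_of_mk_eq {n : Nat} (a b : Nat)
    (h : Quotient.mk (ModSetoid n) a = Quotient.mk (ModSetoid n) b) :
    ModRel n a b :=
  Quotient.exact h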
Combined Examples
The following examples combine dependent functions, indexed families, and proof terms. Each demonstrates how types can enforce invariants that would be invisible to simpler type systems:
-- Dependent pairs (Sigma types)
def DependentPair := Σ n : Nat, Fin n
def examplePair : DependentPair := ⟨3, 2⟩
-- Type families and dependent pattern matching
def typeFamily : Nat → Type
| 0 => Unit
| 1 => Bool
| _ => Nat
def familyValue : (n : Nat) → typeFamily n
| 0 => ()
| 1 => true
| n@(_ + 2) => n * 2
-- Structural recursion: termination is proven automatically
def factorial : Nat → Nat
| 0 => 1
| n + 1 => (n + 1) * factorial n
-- Recursion that steps down by two, still accepted automatically
def div2 : Nat → Nat
| 0 => 0
| 1 => 0
| n + 2 => 1 + div2 n
example : factorial 5 = 120 := rfl
example : div2 10 = 5 := rfl
The machinery presented here forms the foundation of everything that follows. Dependent types are why Lean can serve simultaneously as a programming language and a proof assistant. When you write a type signature like Vector α n → Vector α (n + 1), you are making a mathematical claim that the compiler will verify. Specifications that the machine enforces, invariants that cannot be violated, programs that are correct by construction.
Type-Indexed State Machines
State machines appear everywhere in software: network protocols, UI workflows, resource management, authentication flows. The traditional approach represents state as a runtime value and scatters checks throughout the code. “Is the connection open? Is the user logged in? Has the transaction started?” Each check is a potential bug: forget one and you have undefined behavior, check the wrong condition and you have a security hole.
Type-indexed state machines take a different approach. Instead of tracking state at runtime and checking it manually, we encode state in the type itself. The type checker then verifies that operations happen in the correct order. Invalid sequences become type errors, caught at compile time rather than runtime.
Consider a vending machine. The naive implementation tracks balance as a runtime value, checking at each operation whether funds suffice. Bugs lurk: what if someone calls vend before inserting coins? What if returnChange is called twice? These are not type errors in conventional languages. They are runtime failures waiting to happen.
inductive Product where
| HoneyComb
| SalmonJerky
| BerryMix
| GrubBar
| AcornCrunch
deriving Repr, DecidableEq
def Product.price : Product → Nat
| .HoneyComb => 150
| .SalmonJerky => 200
| .BerryMix => 100
| .GrubBar => 125
| .AcornCrunch => 175
def Product.name : Product → String
| .HoneyComb => "Honey Comb"
| .SalmonJerky => "Salmon Jerky"
| .BerryMix => "Berry Mix"
| .GrubBar => "Grub Bar"
| .AcornCrunch => "Acorn Crunch"
The Machine type is indexed by cents inserted. This index exists only in the type system. At runtime, Machine 0 and Machine 200 are identical unit values with no data. The number is a phantom type parameter that the compiler tracks but that costs nothing at runtime.
structure Machine (cents : Nat) where mk ::
def insertCoin (coin : Nat) {n : Nat} (_m : Machine n) : Machine (n + coin) := ⟨⟩
def insertDollar {n : Nat} (m : Machine n) : Machine (n + 100) := insertCoin 100 m
def vend (p : Product) {n : Nat} (_m : Machine n) (_h : n ≥ p.price) :
Product × Machine (n - p.price) := (p, ⟨⟩)
def returnChange {n : Nat} (_m : Machine n) : Nat × Machine 0 := (n, ⟨⟩)
def empty : Machine 0 := ⟨⟩
Study the type signatures carefully. insertCoin takes a Machine n and returns a Machine (n + coin). The balance increases by exactly the inserted amount. vend requires a proof $n \geq p.price$ and returns a Machine (n - p.price). You cannot call vend without providing this proof, and the compiler will reject any attempt to vend with insufficient funds. returnChange resets to Machine 0 regardless of the input balance, modeling the fact that all remaining money is returned.
The key insight is that each operation transforms the type index in a way that reflects its effect on the state. The compiler tracks these transformations and ensures they compose correctly. If you try to write code that vends without inserting money, the type checker will demand a proof of $0 \geq 100$ (or whatever the price is), which is unprovable because it is false.
def exampleTransaction : Product × Nat := Id.run do
let m := empty
let m := insertDollar m
let m := insertDollar m
let (snack, m) := vend .BerryMix m (by native_decide)
let (change, _) := returnChange m
return (snack, change)
The example shows a complete transaction. We start with an empty machine, insert two dollars (200 cents), vend a berry mix for 100 cents, and return the remaining 100 cents as change. At each step, the type system knows exactly how much money is in the machine. The by native_decide proof discharge works because $200 \geq 100$ is decidably true.
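For contrast, here is a sketch of the kind of call the type checker refuses. The definition stays commented out because the proof obligation it would generate is unprovable:
-- Rejected: vending from an empty machine would need a proof that 0 ≥ 150
-- def badTransaction := vend .HoneyComb empty (by decide) -- decide fails: 150 ≤ 0 is false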
This pattern scales to real systems. A file handle can be indexed by whether it is open or closed: read requires Handle Open and returns Handle Open, while close takes Handle Open and returns Handle Closed. Calling read on a closed handle becomes a type error. A network socket can track connection state: you cannot send on an unconnected socket because the types forbid it.
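A sketch of the file-handle version of this idea. The names (HandleState, Handle, openFile, readLine, closeFile) are hypothetical, chosen to mirror Machine above rather than taken from any real IO API:
-- The state lives only in the type index, exactly as with Machine
inductive HandleState where
  | opened | closed
structure Handle (s : HandleState) where mk ::
def openFile (_path : String) : Handle .opened := ⟨⟩
def readLine (_h : Handle .opened) : String × Handle .opened := ("line", ⟨⟩)
def closeFile (_h : Handle .opened) : Handle .closed := ⟨⟩
-- Reading after closing is a type error: no Handle .opened value remains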
Authentication systems benefit particularly. A session token can be indexed by authentication level: Session Guest, Session User, Session Admin. Functions that require admin privileges take Session Admin and the compiler ensures you cannot access admin functionality without proper authentication. Privilege escalation bugs become impossible because the type system enforces the security policy.
The tradeoff is complexity. Type-indexed state machines require careful API design and more sophisticated type signatures. The proof obligations can become burdensome for complex protocols. But for systems where correctness matters (financial transactions, security boundaries, safety-critical code), the guarantee that invalid states are unrepresentable is worth the investment.
Constraint Satisfaction: N-Queens
The N-Queens puzzle asks: place N queens on an $N \times N$ chessboard so that no two attack each other. Queens attack along rows, columns, and diagonals. The naive approach generates placements and filters invalid ones. The dependent type approach makes invalid placements unrepresentable.
A placement is a list of column positions, one per row. Two queens attack if they share a column or diagonal:
abbrev Placement := List Nat
def attacks (r1 c1 r2 c2 : Nat) : Bool :=
c1 == c2 || (if r1 ≥ r2 then r1 - r2 else r2 - r1) == (if c1 ≥ c2 then c1 - c2 else c2 - c1)
def isSafeAux (newRow col : Nat) : Placement → Nat → Bool
| [], _ => true
| qc :: rest, row => !attacks newRow col row qc && isSafeAux newRow col rest (row + 1)
def isSafe (col : Nat) (p : Placement) : Bool := isSafeAux p.length col p 0
A valid placement has the right length, all columns in bounds, and no attacking pairs:
def Valid (n : Nat) (p : Placement) : Prop :=
p.length = n ∧ p.all (· < n) ∧
∀ i j, i < j → j < p.length → !attacks i p[i]! j p[j]!
structure Board (n : Nat) where
placement : Placement
valid : Valid n placement
The Board n type bundles a placement with its validity proof. You cannot construct a Board 8 with queens that attack each other; the proof obligation cannot be discharged. The constraint is not checked at runtime but enforced at compile time.
theorem board_length {n : Nat} (b : Board n) : b.placement.length = n := b.valid.1
The theorem board_length extracts the length invariant from a valid board. The proof is trivial projection because the invariant is baked into the type. This is the dependent types payoff: properties that would require defensive runtime checks become facts the type system guarantees.
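As a small sketch of that payoff (the helper firstColumn is ours): code consuming a Board can discharge its bounds obligation from the stored invariant instead of re-checking at runtime.
-- The index is safe because the board's proof pins the placement length
def firstColumn {n : Nat} (b : Board (n + 1)) : Nat :=
  b.placement[0]'(by rw [board_length b]; omega)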
Most software is written fast, tested hopefully, and debugged frantically. Dependent types offer a different mode: slower to write, harder to learn, guarantees that survive contact with production. Whether the tradeoff makes sense depends on how much a bug costs. For most code, the answer is “not much.” For some code, the answer is “careers” or “lives.” Know which kind you are writing.
From Theory to Practice
You now understand the type-theoretic machinery. The next article turns to strategy: how to approach proofs systematically, read goal states, choose tactics, and develop the intuition for what technique applies where. Less “what does this mean” and more “how do I make this red squiggle go away.”
Proof Strategy
The previous articles taught you individual tactics. Now we learn how to think. A proof is not a random sequence of tactics that happens to work. It is a structured argument, and understanding that structure makes the difference between flailing and fluency. The gap between knowing the tactics and knowing how to prove things is the gap between knowing the rules of chess and knowing how to not lose immediately.
The Goal State
Every proof begins with a goal and ends with no goals. The goal state is your map. Learning to read it fluently is the most important skill in tactic-based proving.
case succ
n : Nat
ih : P n
⊢ P (n + 1)
This goal state tells you everything: you are in the succ case of an induction, you have a natural number n, you have an induction hypothesis ih stating that $P(n)$ holds, and you must prove $P(n + 1)$. The turnstile $\vdash$ separates what you have from what you need.
When a proof has multiple goals, they appear stacked. The first goal is your current focus. Tactics typically operate on the first goal, though combinators like all_goals and any_goals can target multiple goals simultaneously.
Goal State Evolution
Here is an induction proof showing how the goal state evolves at each step:
-- A typical induction proof showing goal state evolution
theorem double_sum (n : Nat) : 2 * n = n + n := by
induction n with
| zero =>
-- Goal state: ⊢ 2 * 0 = 0 + 0
rfl
| succ n ih =>
-- Goal state here:
-- n : Nat
-- ih : 2 * n = n + n
-- ⊢ 2 * (n + 1) = (n + 1) + (n + 1)
omega
The diagram below visualizes how intro transforms the goal state step by step. Each box shows the context (hypotheses above the line) and the goal (below the line). Watch how intro h moves P from the goal into the context:
Categories of Tactics
Tactics fall into natural categories based on what they do to the goal state. Understanding these categories helps you choose the right tool.
Introduction tactics move structure from the goal into the context. When your goal is $P \to Q$, the tactic intro h assumes $P$ (calling it h) and changes the goal to $Q$. When your goal is $\forall x, P(x)$, the tactic intro x introduces a fresh $x$ and changes the goal to $P(x)$. Introduction tactics make progress by moving assumptions into the context, leaving a smaller goal to prove.
-- Introduction tactics: moving from goal to context
-- intro moves implications into the context
theorem intro_example (P Q : Prop) : P → Q → P := by
intro hp -- assumes P, calling it hp
intro _ -- assumes Q (unused)
exact hp -- goal is now P, which we have
-- intro works with forall too
theorem forall_intro (P : Nat → Prop) (h : ∀ n, P n) : ∀ m, P m := by
intro m -- introduces arbitrary m
exact h m -- apply universal hypothesis
Elimination tactics use structure from the context to transform the goal. When you have h : P ∧ Q and need $P$, the tactic exact h.1 extracts the left component. When you have h : P ∨ Q, the tactic cases h splits into two goals, one assuming $P$ and one assuming $Q$. Elimination tactics make progress by using what you have.
-- Elimination tactics: using hypotheses to transform goals
-- cases eliminates disjunctions
theorem cases_example (P Q R : Prop) (h : P ∨ Q) (hp : P → R) (hq : Q → R) : R := by
cases h with
| inl p => exact hp p -- left case: we have P
| inr q => exact hq q -- right case: we have Q
-- And.left/right eliminate conjunctions
theorem and_elim (P Q : Prop) (h : P ∧ Q) : P := h.1
-- exists elimination via obtain
theorem exists_elim (P : Nat → Prop) (h : ∃ n, P n ∧ n > 0) : ∃ m, P m := by
obtain ⟨n, hn, _⟩ := h
exact ⟨n, hn⟩
Rewriting tactics transform the goal using equalities. The tactic rw [h] replaces occurrences of the left side of h with the right side. The tactic simp applies many such rewrites automatically. Rewriting makes progress by simplifying toward something obviously true.
-- Rewriting tactics: transforming goals with equalities
theorem rewrite_example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
rw [h1] -- goal becomes b = c
rw [h2] -- goal becomes c = c
-- or: rw [h1, h2] in one step
theorem rewrite_reverse (a b : Nat) (h : a = b) : b = a := by
rw [← h] -- rewrite right-to-left using ←
theorem simp_example (xs : List Nat) : (xs ++ []).length = xs.length := by
simp -- applies simp lemmas automatically
Automation tactics search for proofs. The tactic simp tries simplification lemmas. The tactic omega solves linear arithmetic. The tactic aesop performs general proof search. Automation makes progress by doing work you would rather not do by hand.
Structural tactics manipulate the proof state without making logical progress. The tactic swap reorders goals. The tactic rename changes hypothesis names. The tactic clear removes unused hypotheses. These tactics keep your proof organized.
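A small sketch of that housekeeping in action; clear and swap change nothing logically, they only tidy the state:
-- clear drops an unused hypothesis, swap reorders the open goals
theorem structural_example (P Q : Prop) (hp : P) (hq : Q) (unused : P) : P ∧ Q := by
  clear unused -- remove a hypothesis we never need
  constructor
  swap -- put the Q goal first
  · exact hq
  · exact hp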
Reading the Goal
Before applying any tactic, ask: what is the shape of my goal? The outermost connective determines your next move.
Goals that require building structure call for introduction tactics. If your goal is an implication $P \to Q$, use intro to assume $P$ and reduce the goal to $Q$. Universal statements $\forall x, P(x)$ work the same way: intro x gives you an arbitrary $x$ and asks you to prove $P(x)$. For conjunctions $P \land Q$, use constructor to split into two subgoals. For disjunctions $P \lor Q$, you must commit: left obligates you to prove $P$, while right obligates you to prove $Q$. Existentials $\exists x, P(x)$ require a witness: use t provides the term $t$ and leaves you to prove $P(t)$.
Goals that are equations or basic facts call for different tactics. For equality $a = b$, try rfl if the terms are definitionally equal, simp for simplification, rw with known equalities, or ring for algebraic identities. Negation $\neg P$ is secretly an implication: since $\neg P$ means $P \to \bot$, you use intro h to assume $P$ and then derive a contradiction. If your goal is $\bot$ itself, you need to find conflicting hypotheses.
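A compact sketch exercising several of these shapes in a single proof:
-- ∧ via constructor, ∨ via left, ∃ via an explicit witness
theorem goal_shapes (P Q : Prop) (hp : P) : (P ∨ Q) ∧ ∃ n : Nat, n > 2 := by
  constructor -- split the conjunction into two goals
  · left -- commit to the left disjunct
    exact hp
  · exact ⟨3, by omega⟩ -- witness 3, then prove 3 > 2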
Reading the Context
Your context contains hypotheses. Each one is a tool waiting to be used. The shape of a hypothesis determines what you can do with it.
Hypotheses that provide conditional information let you make progress when you can satisfy their conditions. An implication $h : P \to Q$ gives you $Q$ if you can prove $P$. When your goal is $Q$, use apply h to reduce it to proving $P$. A universal $h : \forall x, P(x)$ can be instantiated at any term: specialize h t replaces $h$ with $P(t)$, or have ht := h t keeps the original.
Hypotheses that package multiple facts can be taken apart. A conjunction $h : P \land Q$ gives you both pieces: access them with h.1 and h.2, or destructure with obtain ⟨hp, hq⟩ := h. An existential $h : \exists x, P(x)$ packages a witness and a proof: obtain ⟨x, hx⟩ := h extracts both. A disjunction $h : P \lor Q$ requires case analysis since you do not know which side holds: cases h splits your proof into two branches.
An equality $h : a = b$ lets you substitute. Use rw [h] to replace $a$ with $b$ in your goal, or rw [← h] to go the other direction.
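And the mirror image for hypotheses, a sketch that unpacks an existential and instantiates a universal:
-- obtain unpacks ∃; a ∀ hypothesis is applied like a function
theorem context_shapes (P : Nat → Prop) (h : ∀ n, P n → P (n + 1))
    (hex : ∃ n, P n) : ∃ n, P (n + 1) := by
  obtain ⟨n, hn⟩ := hex -- extract the witness and its proof
  exact ⟨n, h n hn⟩ -- instantiate h at n and apply it to hn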
Proof Patterns
Certain proof structures recur constantly. Recognizing them saves time.
Direct proof: Introduce assumptions, manipulate, conclude. Most proofs follow this pattern.
-- Direct proof: apply implication and provide hypothesis
theorem direct (P Q : Prop) (h : P → Q) (hp : P) : Q := by
apply h
exact hp
Proof by cases: When you have a disjunction or an inductive type, split and prove each case.
-- Case analysis: split into cases and handle each
theorem by_cases_template (n : Nat) : n = 0 ∨ n ≥ 1 := by
cases n with
| zero => left; rfl
| succ m => right; simp
Proof by induction: For properties of recursive types, prove the base case and the inductive step.
-- Induction: prove base case, then prove successor using IH
theorem by_induction (n : Nat) : 0 + n = n := by
induction n with
| zero => rfl
| succ n ih => simp [ih]
Proof by contradiction: Assume the negation and derive $\bot$.
-- Contradiction: assume negation, derive False
theorem by_contradiction (P : Prop) (h : ¬¬P) : P := by
by_contra hnp
exact h hnp
Proof by contraposition: To prove $P \to Q$, prove $\neg Q \to \neg P$ instead.
-- Contraposition: prove ¬Q → ¬P instead of P → Q
theorem by_contraposition (P Q : Prop) (h : ¬Q → ¬P) : P → Q := by
intro hp
by_contra hnq
exact h hnq hp
Backward and Forward Reasoning
Backward reasoning works from goal toward hypotheses. Forward reasoning builds from hypotheses toward the goal:
-- Backward reasoning: working from goal toward hypotheses
-- apply works backward from the goal
theorem backward_example (P Q R : Prop) (hpq : P → Q) (hqr : Q → R) (hp : P) : R := by
apply hqr -- goal becomes Q (we need to prove what hqr needs)
apply hpq -- goal becomes P
exact hp -- we have P
-- have introduces intermediate lemmas
theorem have_example (a b : Nat) (ha : a > 5) (hb : b < 10) : a + b > 5 := by
have h1 : a ≥ 6 := ha
have h2 : b ≥ 0 := Nat.zero_le b
omega
The diagram below shows backward reasoning in action. We start with goal R and work backwards through the implications. Each apply transforms the goal into what we need to establish the premise:
-- Forward reasoning: building from hypotheses toward goal
-- calc chains equational reasoning
theorem calc_example (a b c : Nat) (h1 : a = b + 1) (h2 : b = c + 1) : a = c + 2 := by
calc a = b + 1 := h1
_ = (c + 1) + 1 := by rw [h2]
_ = c + 2 := by ring
-- obtain destructs existentials and conjunctions
theorem obtain_example (h : ∃ n : Nat, n > 0 ∧ n < 10) : ∃ m, m < 10 := by
obtain ⟨n, _, hlt⟩ := h
exact ⟨n, hlt⟩
Induction Patterns
Induction is the workhorse for recursive types:
-- Induction patterns
-- Simple structural induction on Nat
theorem nat_induction (P : Nat → Prop) (base : P 0) (step : ∀ n, P n → P (n + 1))
: ∀ n, P n := by
intro n
induction n with
| zero => exact base
| succ n ih => exact step n ih
-- Strong induction when you need smaller cases
theorem strong_induction (n : Nat) (h : n > 0) : n ≥ 1 := by
omega -- trivial here, but pattern matters
-- Induction on lists
theorem list_induction (xs : List Nat) : xs.reverse.reverse = xs := by
induction xs with
| nil => rfl
| cons x xs ih => simp [ih]
Case Splitting
When the path forward depends on which case holds:
-- Case splitting strategies
-- split_ifs handles if-then-else
def abs (n : Int) : Int := if n < 0 then -n else n
theorem abs_nonneg (n : Int) : abs n ≥ 0 := by
unfold abs
split_ifs with h
· omega -- case: n < 0, so -n ≥ 0
· omega -- case: n ≥ 0
-- by_cases for arbitrary propositions
theorem by_cases_example (P : Prop) [Decidable P] : P ∨ ¬P := by
by_cases h : P
· left; exact h
· right; exact h
-- interval_cases (or omega) for bounded natural numbers; omega closes this one directly
theorem small_cases (n : Nat) (h : n < 3) : n = 0 ∨ n = 1 ∨ n = 2 := by
omega
Proof by Contradiction
When direct proof fails, assume the negation and derive absurdity:
-- Proof by contradiction
theorem contradiction_example (P : Prop) (h : P) (hn : ¬P) : False := by
exact hn h -- ¬P is P → False
theorem by_contra_example (n : Nat) (h : ¬(n = 0)) : n > 0 := by
by_contra h'
push_neg at h'
omega
Choosing Automation
Different automation tactics excel at different domains:
-- Choosing the right automation
-- omega for linear arithmetic
theorem omega_example (a b : Nat) (h : a + b = 10) (h2 : a ≤ 3) : b ≥ 7 := by
omega
-- ring for polynomial identities
theorem ring_example (x y : Int) : (x + y)^2 = x^2 + 2*x*y + y^2 := by
ring
-- simp for definitional simplification
theorem simp_list (xs ys : List Nat) : (xs ++ ys).length = xs.length + ys.length := by
simp
-- aesop for general proof search
theorem aesop_example (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by
aesop
When You Get Stuck
Every proof hits obstacles. Here is how to get unstuck.
Simplify first. Try simp or simp only [relevant_lemmas]. Often the goal simplifies to something obvious.
Check your hypotheses. Do you have what you need? Use have to derive intermediate facts. Use obtain to destructure complex hypotheses.
Try automation. For arithmetic, try omega or linarith. For algebraic identities, try ring or field_simp. For general goals, try aesop or decide.
Work backwards. What would make your goal obviously true? If you need $P \land Q$, you need to prove both $P$ and $Q$. What tactics produce those subgoals?
Work forwards. What can you derive from your hypotheses? If you have $h : P \to Q$ and $hp : P$, you can derive $Q$.
Split the problem. Use have to state and prove intermediate lemmas. Breaking a proof into steps often reveals the path.
Read the error. Lean’s error messages are verbose but precise. “Type mismatch” tells you what was expected and what you provided. “Unknown identifier” means a name is not in scope. “Unsolved goals” means you are not done.
Use the library. Mathlib contains thousands of lemmas. Use exact? to search for lemmas that close your goal. Use apply? to search for lemmas whose conclusion matches your goal.
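To make the advice concrete, here is a small sketch (the statement is illustrative, not from the series' source): name the fact you need with have, then hand the rest to a decision procedure.
-- Getting unstuck: derive an intermediate fact, then let automation finish
example (a b : Nat) (h : a + b = 10) (ha : a ≤ 3) : 2 * b ≥ 14 := by
  have hb : b ≥ 7 := by omega  -- "split the problem": state the fact you need
  omega                        -- "try automation": linear arithmetic closes the goal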
Tactic Decision Guide
When staring at a goal, ask: what is its outermost structure? This table maps goal shapes to tactics.
By Goal Shape
| Goal looks like… | First tactic to try | What it does |
|---|---|---|
| P → Q | intro h | Assume P, prove Q |
| ∀ x, P x | intro x | Introduce arbitrary x, prove P x |
| P ∧ Q | constructor | Split into two goals: prove P, prove Q |
| P ∨ Q | left or right | Commit to proving one side |
| ∃ x, P x | use t | Provide witness t, prove P t |
| ¬P (i.e., P → False) | intro h | Assume P, derive contradiction |
| a = b (definitionally equal) | rfl | Reflexivity closes it |
| a = b (needs rewriting) | simp or rw [h] | Simplify or rewrite using equalities |
| a = b (algebraic) | ring | Solves polynomial identities |
| a < b, a ≤ b (linear) | omega or linarith | Decision procedures for linear arithmetic |
| True | trivial | Trivially true |
| False | Look for contradiction | Need conflicting hypotheses |
| Decidable proposition | decide | Compute the answer |
By Hypothesis Shape
| Hypothesis looks like… | How to use it | What it does |
|---|---|---|
| h : P → Q | apply h (goal is Q) | Changes goal to P |
| h : P → Q | have := h hp | Get Q if you have hp : P |
| h : ∀ x, P x | specialize h t | Instantiate at specific t |
| h : P ∧ Q | obtain ⟨hp, hq⟩ := h | Extract both components |
| h : P ∧ Q | h.1, h.2 | Access components directly |
| h : P ∨ Q | cases h | Split into two cases |
| h : ∃ x, P x | obtain ⟨x, hx⟩ := h | Extract witness and proof |
| h : a = b | rw [h] | Replace a with b in goal |
| h : a = b | rw [← h] | Replace b with a in goal |
| h : False | contradiction | Closes any goal |
| h : a ≠ a or conflicting facts | contradiction | Derives False automatically |
Common Proof Templates
Implication: To prove P → Q:
intro h -- assume P, call it h
... -- work toward Q
exact ... -- provide Q
Universal: To prove ∀ x, P x:
intro x -- let x be arbitrary
... -- prove P x
Conjunction: To prove P ∧ Q:
constructor -- creates two goals
· ... -- prove P
· ... -- prove Q
Existential: To prove ∃ x, P x:
use t -- provide witness t
... -- prove P t
Case split: When you have h : P ∨ Q:
cases h with
| inl hp => ... -- case where P holds
| inr hq => ... -- case where Q holds
Induction: To prove ∀ n, P n by induction:
intro n
induction n with
| zero => ... -- base case: prove P 0
| succ n ih => ... -- inductive step: ih is P n, prove P (n+1)
Contradiction: To prove P by contradiction:
by_contra h -- assume ¬P
... -- derive False
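As a worked sketch (the statement is illustrative, and it assumes Mathlib's use tactic is in scope, as elsewhere in this article), the proof below strings several of these templates together.
-- Combining templates: ∀-intro, →-intro, ∧-split, and an ∃-witness
example : ∀ n : Nat, n > 0 → n ≥ 1 ∧ ∃ m, m + 1 = n := by
  intro n hn     -- universal and implication: introduce n and the hypothesis
  constructor    -- conjunction: two goals
  · omega        -- n ≥ 1 follows from n > 0
  · use n - 1    -- existential: provide the witness n - 1
    omega        -- (n - 1) + 1 = n because n > 0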
Tactic Composition
Tactics compose in several ways. Sequencing separates tactics with newlines or semicolons, each operating on the result of the previous one. Focusing uses · to work on a single goal, with indentation grouping tactics under that focus. Combinators like <;> apply a tactic to all goals produced by the previous tactic, first | t1 | t2 tries tactics in order, and repeat t applies a tactic until it fails.
-- Tactic composition: sequencing, focusing, and combinators
-- Sequencing: separate tactics with newlines or semicolons
theorem seq_demo (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by
constructor
exact h.2
exact h.1
-- Focusing: use · to focus on a single goal
theorem focus_demo (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by
constructor
· exact h.2
· exact h.1
-- Combinators: <;> applies to all goals, first tries alternatives
theorem combinator_demo (P : Prop) (h : P) : P ∧ P ∧ P := by
constructor <;> (try constructor) <;> exact h
Next-Generation Automation
The tactics described so far require you to think. You read the goal, choose a strategy, apply tactics step by step. This is how mathematicians have always worked, and there is value in understanding your proof at every stage. But a new generation of tactics is changing the calculus of what is worth formalizing.
Higher-order tactics like aesop, grind, and SMT integration lift proof development from low-level term manipulation to structured, automated search over rich proof states. Instead of specifying every proof step, you specify goals, rule sets, or search parameters, and these tactics synthesize proof terms that Lean’s kernel then checks. The soundness guarantee remains absolute since the kernel verifies everything, but the human cost drops dramatically. This decoupling of “what should be proved” from “how to construct the term” is what makes large-scale formalization feasible.
aesop implements white-box best-first proof search, exploring a tree of proof states guided by user-configurable rules. Unlike black-box automation, aesop lets you understand and tune the search: rules are indexed via discrimination trees for rapid retrieval, and you can register domain-specific lemmas to teach it new tricks. grind draws inspiration from modern SMT solvers, maintaining a shared workspace where congruence closure, E-matching, and forward chaining cooperate on a goal. It excels when many interacting equalities and logical facts are present, automatically deriving consequences that would be tedious to script by hand. For goals requiring industrial-strength decision procedures, SMT tactics send suitable fragments to proof-producing solvers like cvc5, then reconstruct proofs inside Lean so the kernel can verify them. This lets Lean leverage decades of solver engineering while preserving the LCF-style trust model where only the small kernel must be trusted.
The strategic question is when to reach for automation versus working by hand. The temptation is to try grind on everything and move on when it works. This is efficient but opaque: you learn nothing, and when automation fails on a similar goal later, you have no insight into why. A better approach is to use automation to explore, then understand what it found. Goals that would take an hour of tedious case analysis now take seconds. This frees you to tackle harder problems. But remember: when grind closes a goal, it has found a valid proof term. It has not gained insight. That remains your job.
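As one small, hedged illustration (the lemma is illustrative, and it assumes a toolchain where grind is available, as in the Classic Proofs article later in this series): congruence closure lets grind chain equalities through function applications without any explicit rewriting.
import Mathlib.Tactic
-- grind derives f a = f b from a = b by congruence, then finishes with h2
example (f : Nat → Nat) (a b : Nat) (h1 : a = b) (h2 : f b = 0) : f a = 0 := by
  grind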
The Tactics Reference
The following article is a reference. It documents every major tactic in Lean 4 and Mathlib, organized alphabetically. You do not need to memorize it. You need to know it exists, and you need to know how to find the tactic you need.
When you encounter a goal you do not know how to prove, return here. Ask: what is the shape of my goal? What is in my context? What pattern does this proof follow? The answer will point you to the right tactic, and the reference will tell you how to use it.
The strategies in this article apply beyond Lean. The structure of mathematical argument is universal. Direct proof, case analysis, induction, contradiction: these are the fundamental patterns of reason itself. Learning them in a proof assistant merely makes them explicit. You cannot handwave past a case you forgot to consider when the computer is watching.
Congruence and Subtyping
Every type system makes tradeoffs between precision and convenience. A function that takes Nat will accept any natural number, including zero, even when zero would cause a division error three stack frames later. A function that takes Int cannot directly accept a Nat without explicit conversion, even though every natural number is an integer. The constraints are either too loose or the syntax is too verbose. Pick your frustration.
Lean provides tools to fix both problems. Subtypes let you carve out precisely the values you want: positive numbers, non-empty lists, valid indices. The constraint travels with the value, enforced by the type system. Coercions let the compiler insert safe conversions automatically, so you can pass a Nat where an Int is expected without ceremony. These mechanisms together give you precise types with ergonomic syntax.
Types are sets with attitude. A Nat carries the natural numbers along with all their operations and laws. A subtype narrows this: the positive natural numbers are the naturals with an extra constraint, a proof obligation that travels with every value. This is refinement: taking a broad type and carving out the subset you actually need.
The other direction is coercion. When Lean expects an Int but you give it a Nat, something must convert between them. Explicit casts are tedious. Coercions make the compiler do the work, inserting conversions automatically where safe. The result is code that looks like it mixes types freely but maintains type safety underneath.
Congruence handles the third concern: propagating equality. If a = b, then f a = f b. This seems obvious, but the compiler needs to be told. The congr tactic applies this principle systematically, breaking equality goals into their components.
Subtypes
A subtype refines an existing type with a predicate. Values of a subtype carry both the data and a proof that the predicate holds.
def Positive := { n : Nat // n > 0 }
def five : Positive := ⟨5, by decide⟩
#eval five.val -- 5
def NonEmpty (α : Type) := { xs : List α // xs ≠ [] }
def singletonList : NonEmpty Nat := ⟨[42], by decide⟩
Working with Subtypes
Functions on subtypes can access the underlying value and use the proof to ensure operations are safe.
def doublePositive (p : Positive) : Positive :=
⟨p.val * 2, Nat.mul_pos p.property (by decide)⟩
#eval (doublePositive five).val -- 10
def addPositive (a b : Positive) : Positive :=
⟨a.val + b.val, Nat.add_pos_left a.property b.val⟩
def safeHead {α : Type} (xs : NonEmpty α) : α :=
match h : xs.val with
| x :: _ => x
| [] => absurd h xs.property
Refinement Types
Subtypes let you express precise invariants. Common patterns include bounded numbers, non-zero values, and values satisfying specific properties.
def Even := { n : Nat // n % 2 = 0 }
def Odd := { n : Nat // n % 2 = 1 }
def zero' : Even := ⟨0, rfl⟩
def two' : Even := ⟨2, rfl⟩
def one' : Odd := ⟨1, rfl⟩
def three' : Odd := ⟨3, rfl⟩
def BoundedNat (lo hi : Nat) := { n : Nat // lo ≤ n ∧ n < hi }
def inRange : BoundedNat 0 10 := ⟨5, by omega⟩
Basic Coercions
Coercions allow automatic type conversion. When a value of type A is expected but you provide type B, Lean looks for a coercion from B to A.
instance : Coe Positive Nat where
coe p := p.val
def useAsNat (p : Positive) : Nat :=
p + 10
#eval useAsNat five -- 15
instance {α : Type} : Coe (NonEmpty α) (List α) where
coe xs := xs.val
def listLength (xs : NonEmpty Nat) : Nat :=
xs.val.length
#eval listLength singletonList -- 1
Coercion Chains
Coercions can chain together. If there is a coercion from A to B and from B to C, Lean can automatically convert from A to C.
structure Meters where
val : Float
deriving Repr
structure Kilometers where
val : Float
deriving Repr
instance : Coe Meters Float where
coe m := m.val
instance : Coe Kilometers Meters where
coe km := ⟨km.val * 1000⟩
def addDistances (a : Meters) (b : Kilometers) : Meters :=
⟨a.val + (b : Meters).val⟩
#eval addDistances ⟨500⟩ ⟨1.5⟩ -- Meters.mk 2000.0
Function Coercions
CoeFun allows values to be used as functions. This is useful for callable objects and function-like structures.
instance : CoeFun Positive (fun _ => Nat → Nat) where
coe p := fun n => p.val + n
#eval five 10 -- 15
structure Adder where
amount : Nat
instance : CoeFun Adder (fun _ => Nat → Nat) where
coe a := fun n => n + a.amount
def addFive : Adder := ⟨5⟩
#eval addFive 10 -- 15
Sort Coercions
CoeSort coerces values to sorts (types), letting a structure stand in where a type is expected. Its function-valued counterpart, CoeFun, appears again here, making a predicate structure directly callable.
structure Predicate' (α : Type) where
test : α → Bool
instance {α : Type} : CoeFun (Predicate' α) (fun _ => α → Bool) where
coe p := p.test
def isEven' : Predicate' Nat := ⟨fun n => n % 2 == 0⟩
#eval isEven' 4 -- true
#eval isEven' 5 -- false
Congruence
The congr tactic applies congruence reasoning: if you need to prove f a = f b and you know a = b, congruence can close the goal.
example (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
congr
example (f : Nat → Nat) (a b : Nat) (h : a = b) : f a = f b := by
congr
example (a b c d : Nat) (h1 : a = b) (h2 : c = d) : a + c = b + d := by
congr <;> assumption
Congruence with Multiple Arguments
Congruence works with functions of multiple arguments, generating subgoals for each argument that differs.
example (f : Nat → Nat → Nat) (a b c d : Nat)
(h1 : a = c) (h2 : b = d) : f a b = f c d := by
rw [h1, h2]
example (xs ys : List Nat) (h : xs = ys) : xs.length = ys.length := by
rw [h]
Substitution and Rewriting
The subst tactic substitutes equal terms, and rw rewrites using equalities. These are fundamental tactics for equality reasoning.
example (a b : Nat) (h : a = b) : a * a = b * b := by
subst h
rfl
example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
rw [h1, h2]
example (a b : Nat) (h : a = b) (f : Nat → Nat) : f a = f b := by
rw [h]
Type Conversion
Lean provides automatic coercion between numeric types and explicit conversion functions.
example (n : Nat) : Int := n
def natToInt (n : Nat) : Int := n
#eval natToInt 42 -- 42
def stringToNat? (s : String) : Option Nat :=
s.toNat?
#eval stringToNat? "123" -- some 123
#eval stringToNat? "abc" -- none
Decidable Propositions
A proposition is decidable if there is an algorithm to determine its truth. This enables using propositions in if-expressions.
def isPositive (n : Int) : Decidable (n > 0) :=
if h : n > 0 then isTrue h else isFalse h
def checkPositive (n : Int) : String :=
if n > 0 then "positive" else "not positive"
#eval checkPositive 5 -- "positive"
#eval checkPositive (-3) -- "not positive"
def decideEqual (a b : Nat) : Decidable (a = b) :=
if h : a = b then isTrue h else isFalse h
Type Class Inheritance
Type classes can extend other classes, creating inheritance hierarchies. A type implementing a subclass automatically implements its parent classes.
class Animal (α : Type) where
speak : α → String
class Dog (α : Type) extends Animal α where
fetch : α → String
structure Labrador where
name : String
instance : Animal Labrador where
speak lab := s!"{lab.name} says woof!"
instance : Dog Labrador where
speak lab := s!"{lab.name} says woof!"
fetch lab := s!"{lab.name} fetches the ball!"
def makeSpeak {α : Type} [Animal α] (a : α) : String :=
Animal.speak a
def rex : Labrador := ⟨"Rex"⟩
#eval makeSpeak rex -- "Rex says woof!"
#eval Dog.fetch rex -- "Rex fetches the ball!"
Structure Extension
Structures can extend other structures, inheriting their fields while adding new ones.
structure Shape where
name : String
structure Circle extends Shape where
radius : Float
structure Rectangle extends Shape where
width : Float
height : Float
def myCircle : Circle := { name := "unit circle", radius := 1.0 }
def myRect : Rectangle := { name := "square", width := 2.0, height := 2.0 }
#eval myCircle.name -- "unit circle"
#eval myCircle.radius -- 1.0
Nominal vs Structural Typing
Lean uses nominal typing: two types with identical structures are still distinct types. This prevents accidental mixing of values with different semantics. A UserId and a ProductId might both be integers underneath, but you cannot accidentally pass one where the other is expected. The bug where you deleted user 47 because product 47 was out of stock becomes a compile error. Nominal typing is the formal version of “label your variables.”
structure Meters' where
val : Float
structure Seconds where
val : Float
def distance : Meters' := ⟨100.0⟩
def time : Seconds := ⟨10.0⟩
abbrev Speed := Float
def calcSpeed (d : Meters') (t : Seconds) : Speed :=
d.val / t.val
#eval calcSpeed distance time -- 10.0
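Here is a hedged sketch of the UserId/ProductId scenario from the paragraph above; the names are illustrative, not part of the series' source.
structure UserId where
  id : Nat
structure ProductId where
  id : Nat
def deleteUser (u : UserId) : String := s!"deleting user {u.id}"
#eval deleteUser ⟨47⟩ -- "deleting user 47"
-- deleteUser (ProductId.mk 47) -- rejected at compile time: expected UserId, got ProductId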
Classic Results
The machinery is in place. You understand types, proofs, tactics, and the refinements that make specifications precise. Next we put it all together: classic mathematical proofs formalized in Lean. Euclid's lemma, the infinitude of primes, the irrationality of root two. All the greatest hits.
Classic Proofs
This article presents proofs you likely encountered in undergraduate mathematics, now written in Lean. Each example shows the traditional proof and its formalization side by side. The goal is not to teach you these theorems; you already know them. The goal is to build intuition for how mathematical reasoning translates into Lean code. When you see a proof by contradiction in English, what tactic does that become? When a textbook says “by strong induction,” what does Lean require? The side-by-side format lets you map familiar reasoning patterns onto unfamiliar syntax.
Euclid’s proof of the infinitude of primes has survived for over two thousand years. It requires no calculus, no abstract algebra, only the observation that $n! + 1$ shares no prime factors with $n!$. Yet formalizing this argument reveals hidden assumptions: that every number greater than one has a prime divisor, that primes are well-defined, that contradiction is a valid proof technique. The proofs here are not difficult by mathematical standards, but they exercise the full machinery of dependent types, tactics, and theorem proving. If you can formalize theorems that have survived two millennia of scrutiny, how hard can proving your web app correctly validates email addresses really be?
Infinitude of Primes
Traditional Proof
Theorem. There exist infinitely many prime numbers.
The proof proceeds in two parts. First, we establish that every integer $n \geq 2$ has a prime divisor. If $n$ is prime, it divides itself. Otherwise, $n$ has a proper divisor $m$ with $1 < m < n$. By strong induction, $m$ has a prime divisor $p$, and since $p \mid m$ and $m \mid n$, we have $p \mid n$.
Second, we show that for any $n$, there exists a prime $p > n$. Consider $N = n! + 1$. Since $n! \geq 1$, we have $N \geq 2$, so $N$ has a prime divisor $p$. We claim $p > n$. Suppose for contradiction that $p \leq n$. Then $p \mid n!$ since $n!$ contains all factors from 1 to $n$. Since $p \mid N$ and $p \mid n!$, we have $p \mid (N - n!) = 1$. But $p \geq 2$, so $p \nmid 1$, a contradiction. Therefore $p > n$. QED
Lean Formalization
The Lean proof mirrors this structure exactly. The theorem exists_prime_factor establishes the first part by case analysis and strong induction (via termination_by). The main theorem InfinitudeOfPrimes constructs $n! + 1$, extracts a prime divisor, then derives a contradiction using dvd_factorial and Nat.dvd_add_right.
import Mathlib.Data.Nat.Prime.Basic
import Mathlib.Data.Nat.Factorial.Basic
import Mathlib.Tactic
namespace ZeroToQED.Proofs
open Nat
theorem exists_prime_factor (n : ℕ) (hn : 2 ≤ n) : ∃ p, Nat.Prime p ∧ p ∣ n := by
by_cases hp : Nat.Prime n
· exact ⟨n, hp, dvd_refl n⟩
· obtain ⟨m, hm_dvd, hm_ne_one, hm_ne_n⟩ := exists_dvd_of_not_prime hn hp
have hm_lt : m < n := lt_of_le_of_ne (Nat.le_of_dvd (by omega) hm_dvd) hm_ne_n
have hm_ge : 2 ≤ m := by
rcases m with _ | _ | m <;> simp_all
obtain ⟨p, hp_prime, hp_dvd⟩ := exists_prime_factor m hm_ge
exact ⟨p, hp_prime, dvd_trans hp_dvd hm_dvd⟩
termination_by n
theorem factorial_pos (n : ℕ) : 0 < n ! := Nat.factorial_pos n
theorem dvd_factorial {k n : ℕ} (hk : 0 < k) (hkn : k ≤ n) : k ∣ n ! :=
Nat.dvd_factorial hk hkn
theorem InfinitudeOfPrimes : ∀ n, ∃ p > n, Nat.Prime p := by
intro n
have hN : 2 ≤ n ! + 1 := by
have hfact : 0 < n ! := factorial_pos n
omega
obtain ⟨p, hp_prime, hp_dvd⟩ := exists_prime_factor (n ! + 1) hN
refine ⟨p, ?_, hp_prime⟩
by_contra hle
push_neg at hle
have hp_pos : 0 < p := hp_prime.pos
have hdvd_fact : p ∣ n ! := dvd_factorial hp_pos hle
have hdvd_one : p ∣ 1 := (Nat.dvd_add_right hdvd_fact).mp hp_dvd
have hp_le_one : p ≤ 1 := Nat.le_of_dvd one_pos hdvd_one
have hp_ge_two : 2 ≤ p := hp_prime.two_le
omega
end ZeroToQED.Proofs
Alternative: Proof by Grind
The same theorem admits a much shorter proof using Lean’s grind tactic. The proof below defines its own IsPrime predicate and factorial to remain self-contained. Notice how each theorem body collapses to one or two lines, with grind handling the case analysis and arithmetic that required explicit by_contra, push_neg, and omega calls in the manual version.
import Mathlib.Tactic
namespace ZeroToQED.Proofs.Grind
/-- A prime is a number larger than 1 with no trivial divisors -/
def IsPrime (n : Nat) := 1 < n ∧ ∀ k, 1 < k → k < n → ¬ k ∣ n
/-- Every number larger than 1 has a prime factor -/
theorem exists_prime_factor :
∀ n, 1 < n → ∃ k, IsPrime k ∧ k ∣ n := by
intro n h1
by_cases hprime : IsPrime n
· grind [Nat.dvd_refl]
· obtain ⟨k, _⟩ : ∃ k, 1 < k ∧ k < n ∧ k ∣ n := by
simp_all [IsPrime]
obtain ⟨p, _, _⟩ := exists_prime_factor k (by grind)
grind [Nat.dvd_trans]
/-- The factorial, defined recursively -/
def factorial : Nat → Nat
| 0 => 1
| n+1 => (n + 1) * factorial n
/-- Factorial postfix notation -/
notation:10000 n "!" => factorial n
/-- The factorial is always positive -/
theorem factorial_pos : ∀ n, 0 < n ! := by
intro n; induction n <;> grind [factorial]
/-- The factorial is divided by its constituent factors -/
theorem dvd_factorial : ∀ n, ∀ k ≤ n, 0 < k → k ∣ n ! := by
intro n; induction n <;>
grind [Nat.dvd_mul_right, Nat.dvd_mul_left_of_dvd, factorial]
/-- There are infinitely many primes: for any n, there exists p > n that is prime -/
theorem InfinitudeOfPrimes : ∀ n, ∃ p > n, IsPrime p := by
intro n
have : 1 < n ! + 1 := by grind [factorial_pos]
obtain ⟨p, hp, _⟩ := exists_prime_factor (n ! + 1) this
suffices ¬p ≤ n by grind
intro (_ : p ≤ n)
have : 1 < p := hp.1
have : p ∣ n ! := dvd_factorial n p ‹p ≤ n› (by grind)
have := Nat.dvd_sub ‹p ∣ n ! + 1› ‹p ∣ n !›
grind [Nat.add_sub_cancel_left, Nat.dvd_one]
end ZeroToQED.Proofs.Grind
Both proofs establish the same theorem. The explicit version teaches you the proof structure; the grind version shows what automation can handle once you understand the underlying mathematics.
Irrationality of $\sqrt{2}$
Traditional Proof
Theorem. $\sqrt{2}$ is irrational.
The key lemma is: if $n^2$ is even, then $n$ is even. We prove the contrapositive. Suppose $n$ is odd, so $n = 2k + 1$ for some integer $k$. Then $n^2 = (2k+1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1$, which is odd. Therefore, if $n^2$ is even, $n$ cannot be odd, so $n$ must be even.
Now suppose $\sqrt{2} = p/q$ where $p, q$ are integers with $q \neq 0$ and $\gcd(p,q) = 1$. Then $2q^2 = p^2$, so $p^2$ is even, hence $p$ is even. Write $p = 2k$. Then $2q^2 = 4k^2$, so $q^2 = 2k^2$, meaning $q^2$ is even, hence $q$ is even. But then $\gcd(p,q) \geq 2$, contradicting our assumption. QED
Lean Formalization
The Lean code proves the parity lemmas explicitly. The theorem sq_odd_of_odd shows that squaring an odd number yields an odd number by expanding $(2k+1)^2$. The theorem even_of_sq_even proves the contrapositive: assuming $n$ is odd leads to $n^2$ being odd, which contradicts $n^2$ being even. The final irrationality result follows from Mathlib’s irrational_sqrt_two, which uses this same parity argument internally.
import Mathlib.NumberTheory.Real.Irrational
import Mathlib.Tactic
namespace ZeroToQED.Proofs
theorem sq_even_of_even {n : ℤ} (h : Even n) : Even (n ^ 2) := by
obtain ⟨k, hk⟩ := h
exact ⟨2 * k ^ 2, by rw [hk]; ring⟩
theorem sq_odd_of_odd {n : ℤ} (h : Odd n) : Odd (n ^ 2) := by
obtain ⟨k, hk⟩ := h
exact ⟨2 * k ^ 2 + 2 * k, by rw [hk]; ring⟩
theorem even_of_sq_even {n : ℤ} (h : Even (n ^ 2)) : Even n := by
by_contra hodd
rw [Int.not_even_iff_odd] at hodd
have hsq_odd : Odd (n ^ 2) := sq_odd_of_odd hodd
obtain ⟨k, hk⟩ := hsq_odd
obtain ⟨m, hm⟩ := h
omega
theorem sqrt2_irrational : Irrational (Real.sqrt 2) := irrational_sqrt_two
end ZeroToQED.Proofs
Euclid’s Lemma
Traditional Proof
Theorem (Euclid’s Lemma). If a prime $p$ divides a product $ab$, then $p \mid a$ or $p \mid b$.
Let $p$ be prime with $p \mid ab$. Since $p$ is prime, the only divisors of $p$ are 1 and $p$. Therefore the greatest common divisor $\gcd(p, a)$ is either 1 or $p$.
Case 1: If $\gcd(p, a) = p$, then $p \mid a$ and we are done.
Case 2: If $\gcd(p, a) = 1$, we show $p \mid b$. Consider $\gcd(pb, ab)$. Since $p \mid pb$ and $p \mid ab$, we have $p \mid \gcd(pb, ab)$. By the property $\gcd(pb, ab) = b \cdot \gcd(p, a) = b \cdot 1 = b$, we conclude $p \mid b$. QED
Lean Formalization
The Lean proof follows this GCD-based argument directly. It case-splits on whether $\gcd(p, a) = 1$ or $\gcd(p, a) > 1$. In the coprime case, it uses Nat.gcd_mul_right to establish that $\gcd(pb, ab) = b$, then shows $p$ divides this GCD. In the non-coprime case, since $p$ is prime, $\gcd(p, a) = p$, so $p \mid a$.
import Mathlib.Data.Nat.Prime.Basic
import Mathlib.Data.Nat.GCD.Basic
import Mathlib.Tactic
namespace ZeroToQED.Proofs
open Nat
theorem euclid_lemma {a b p : ℕ} (hp : Nat.Prime p) (h : p ∣ a * b) :
p ∣ a ∨ p ∣ b := by
rcases Nat.eq_or_lt_of_le (Nat.one_le_iff_ne_zero.mpr (Nat.gcd_pos_of_pos_left a hp.pos).ne')
with hcop | hncop
· right
have key : p ∣ Nat.gcd (p * b) (a * b) := Nat.dvd_gcd (dvd_mul_right p b) h
rwa [Nat.gcd_mul_right, hcop.symm, one_mul] at key
· left
have hdvd := Nat.gcd_dvd_left p a
rcases hp.eq_one_or_self_of_dvd _ hdvd with h1 | hp_eq
· omega
· have : p ∣ a := by rw [← hp_eq]; exact Nat.gcd_dvd_right p a
exact this
theorem prime_divides_product_iff {p a b : ℕ} (hp : Nat.Prime p) :
p ∣ a * b ↔ p ∣ a ∨ p ∣ b :=
⟨euclid_lemma hp, fun h => h.elim (dvd_mul_of_dvd_left · b) (dvd_mul_of_dvd_right · a)⟩
end ZeroToQED.Proofs
Binomial Theorem
Traditional Proof
Theorem (Binomial Theorem). For real numbers $x, y$ and natural number $n$: \[(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}\]
The proof proceeds by induction. The base case $n = 0$ gives $(x+y)^0 = 1 = \binom{0}{0}x^0y^0$. For the inductive step, we expand $(x+y)^{n+1} = (x+y)(x+y)^n$, distribute, and apply Pascal’s identity $\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}$ to combine terms.
As concrete examples: $(x+1)^2 = x^2 + 2x + 1$ and $(x+1)^3 = x^3 + 3x^2 + 3x + 1$. QED
Lean Formalization
Mathlib provides add_pow, which establishes the binomial theorem via the same inductive argument. Our binomial_theorem reformulates this in the standard notation. The specific cases binomial_two and binomial_three are verified by the ring tactic, which normalizes polynomial expressions.
import Mathlib.Tactic
namespace ZeroToQED.Proofs
theorem binomial_theorem (x y : ℝ) (n : ℕ) :
(x + y) ^ n = (Finset.range (n + 1)).sum fun k => ↑(n.choose k) * x ^ k * y ^ (n - k) := by
rw [add_pow]
apply Finset.sum_congr rfl
intros k _
ring
theorem binomial_two (x : ℝ) : (x + 1) ^ 2 = x ^ 2 + 2 * x + 1 := by ring
theorem binomial_three (x : ℝ) : (x + 1) ^ 3 = x ^ 3 + 3 * x ^ 2 + 3 * x + 1 := by ring
example : (2 : ℝ) ^ 5 = 32 := by norm_num
end ZeroToQED.Proofs
Fibonacci Numbers
Traditional Proof
Definition. The Fibonacci sequence: $F_0 = 0$, $F_1 = 1$, $F_{n+2} = F_{n+1} + F_n$. The sequence that appears everywhere: rabbit populations, sunflower spirals, financial markets, bad interview questions.
Theorem. $\sum_{k=0}^{n-1} F_k + 1 = F_{n+1}$
Base case ($n = 0$): The empty sum equals 0, and $0 + 1 = 1 = F_1$.
Inductive step: Assume $\sum_{k=0}^{n-1} F_k + 1 = F_{n+1}$. Then: \[\sum_{k=0}^{n} F_k + 1 = \left(\sum_{k=0}^{n-1} F_k + 1\right) + F_n = F_{n+1} + F_n = F_{n+2}\] which equals $F_{(n+1)+1}$. QED
Lean Formalization
The Lean proof follows the same structure. The definition fib uses pattern matching on 0, 1, and $n+2$. The theorem fib_sum proceeds by induction: the base case simplifies directly, and the inductive step uses Finset.sum_range_succ to split off the last term, applies the inductive hypothesis, then uses the recurrence relation.
import Mathlib.Tactic
namespace ZeroToQED.Proofs
def fib : ℕ → ℕ
| 0 => 0
| 1 => 1
| n + 2 => fib (n + 1) + fib n
theorem fib_add_two (n : ℕ) : fib (n + 2) = fib (n + 1) + fib n := rfl
theorem fib_pos {n : ℕ} (h : 0 < n) : 0 < fib n := by
cases n with
| zero => contradiction
| succ n =>
cases n with
| zero => decide
| succ m =>
rw [fib_add_two]
exact Nat.add_pos_left (fib_pos (Nat.zero_lt_succ _)) _
theorem fib_sum (n : ℕ) : (Finset.range n).sum fib + 1 = fib (n + 1) := by
induction n with
| zero => simp [fib]
| succ n ih =>
rw [Finset.sum_range_succ, add_assoc, add_comm (fib n) 1, ← add_assoc, ih]
rfl
end ZeroToQED.Proofs
Pigeonhole Principle
Traditional Proof
Theorem (Pigeonhole Principle). Let $f : A \to B$ be a function between finite sets with $|A| > |B|$. Then $f$ is not injective: there exist distinct $a_1, a_2 \in A$ with $f(a_1) = f(a_2)$.
Suppose for contradiction that $f$ is injective, meaning $f(a_1) = f(a_2)$ implies $a_1 = a_2$. An injective function from $A$ to $B$ implies $|A| \leq |B|$, since distinct elements of $A$ map to distinct elements of $B$. But we assumed $|A| > |B|$, a contradiction. Therefore $f$ is not injective, so there exist distinct $a_1 \neq a_2$ with $f(a_1) = f(a_2)$. QED
Corollary. In any group of $n > 365$ people, at least two share a birthday.
Lean Formalization
The Lean proof mirrors this argument precisely. It assumes by contradiction (by_contra hinj) that no two distinct elements collide. The push_neg tactic transforms this into: for all $a_1, a_2$, if $a_1 \neq a_2$ then $f(a_1) \neq f(a_2)$. This is exactly injectivity. We then apply Fintype.card_le_of_injective, which states that an injective function implies $|A| \leq |B|$, contradicting our hypothesis $|B| < |A|$.
import Mathlib.Data.Fintype.Card
import Mathlib.Tactic
namespace ZeroToQED.Proofs
theorem pigeonhole {α β : Type*} [Fintype α] [Fintype β]
(f : α → β) (h : Fintype.card β < Fintype.card α) :
∃ a₁ a₂ : α, a₁ ≠ a₂ ∧ f a₁ = f a₂ := by
by_contra hinj
push_neg at hinj
have inj : Function.Injective f := fun a₁ a₂ heq =>
Classical.byContradiction fun hne => hinj a₁ a₂ hne heq
exact Nat.not_lt.mpr (Fintype.card_le_of_injective f inj) h
theorem birthday_pigeonhole {n : ℕ} (hn : 365 < n) (birthday : Fin n → Fin 365) :
∃ i j : Fin n, i ≠ j ∧ birthday i = birthday j := by
have hcard : Fintype.card (Fin 365) < Fintype.card (Fin n) := by simp [hn]
exact pigeonhole birthday hcard
end ZeroToQED.Proofs
Divisibility
Traditional Proof
Definition. We write $a \mid b$, read “$a$ divides $b$”, if there exists $k$ such that $b = ak$.
Theorem. Divisibility satisfies:
- $a \mid a$ (reflexivity)
- $a \mid b \land b \mid c \Rightarrow a \mid c$ (transitivity)
- $a \mid b \land a \mid c \Rightarrow a \mid (b + c)$
- $a \mid b \Rightarrow a \mid bc$
Proof. (1) $a = a \cdot 1$, so take $k = 1$.
(2) If $b = ak$ and $c = bm$, then $c = (ak)m = a(km)$.
(3) If $b = ak$ and $c = am$, then $b + c = ak + am = a(k + m)$.
(4) If $b = ak$, then $bc = (ak)c = a(kc)$. QED
Lean Formalization
Each Lean proof constructs the witness $k$ explicitly. The obtain tactic extracts the witnesses from divisibility hypotheses, then we provide the new witness as an anonymous constructor ⟨_, _⟩. The equality proofs use rw to substitute and mul_assoc or mul_add to rearrange.
import Mathlib.Tactic
namespace ZeroToQED.Proofs
example : 3 ∣ 12 := ⟨4, rfl⟩
example : ¬5 ∣ 12 := by decide
theorem dvd_refl' (n : ℕ) : n ∣ n := ⟨1, (mul_one n).symm⟩
theorem dvd_trans' {a b c : ℕ} (hab : a ∣ b) (hbc : b ∣ c) : a ∣ c := by
obtain ⟨k, hk⟩ := hab
obtain ⟨m, hm⟩ := hbc
exact ⟨k * m, by rw [hm, hk, mul_assoc]⟩
theorem dvd_add' {a b c : ℕ} (hab : a ∣ b) (hac : a ∣ c) : a ∣ b + c := by
obtain ⟨k, hk⟩ := hab
obtain ⟨m, hm⟩ := hac
exact ⟨k + m, by rw [hk, hm, mul_add]⟩
theorem dvd_mul_right' (a b : ℕ) : a ∣ a * b := ⟨b, rfl⟩
theorem dvd_mul_left' (a b : ℕ) : b ∣ a * b := ⟨a, (mul_comm b a).symm⟩
end ZeroToQED.Proofs
Generalized Riemann Hypothesis
The proofs above are solved problems. But what about the unsolved ones? The Generalized Riemann Hypothesis asserts that all non-trivial zeros of Dirichlet L-functions have real part $\frac{1}{2}$. It has resisted proof since 1859. The statement is precise enough to formalize:
/-- The **Generalized Riemann Hypothesis** asserts that all the non-trivial zeros of the
Dirichlet L-function L(χ, s) of a primitive Dirichlet character χ have real part 1/2. -/
theorem generalized_riemann_hypothesis (q : ℕ) [NeZero q] (χ : DirichletCharacter ℂ q)
(hχ : χ.IsPrimitive) (s : ℂ) (hs : χ.LFunction s = 0)
(hs_nontrivial : s ∉ Int.cast '' trivialZeros χ) :
s.re = 1 / 2 :=
sorry
That sorry is worth a million dollars from the Clay Mathematics Institute, a Fields Medal, arguably a Nobel Prize in Physics (for its implications in quantum chaos), a Turing Award if you use a computer to help, and mass adoration from strangers on the internet. The reward structure for closing that sorry is, by any reasonable measure, excessive.
Google DeepMind maintains a repository of open mathematical conjectures formalized in Lean, including the Generalized Riemann Hypothesis. The existence of this repository says something profound: the frontier of human mathematical knowledge can now be expressed as a list of sorry statements waiting to be filled. When someone eventually proves or disproves these conjectures, the proof will compile.
Algebraic Structures
Mathematics organizes operations into hierarchies. A group is more than a set with an operation: it is a semigroup with identity and inverses, and a semigroup is a set with an associative operation. These hierarchies matter because theorems proved at one level apply to all structures below it. Prove something about groups, and it holds for integers, permutations, and matrix transformations alike.
Lean captures these hierarchies through type classes. Each algebraic structure becomes a type class, and instances register specific types as members. The type class system then automatically provides the right theorems and operations wherever they apply. The notation is convenient, but the real value is the machinery underneath: generic mathematical code that works across any conforming type.
Semigroups
A semigroup is the simplest algebraic structure: a type with a binary operation that is associative. Nothing more. No identity element, no inverses, just the guarantee that $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
-- A semigroup has an associative binary operation
class Semigroup (α : Type) where
op : α → α → α
op_assoc : ∀ a b c : α, op (op a b) c = op a (op b c)
-- Notation for our operation
infixl:70 " ⋆ " => Semigroup.op
The op_assoc field is a proof that clients can use directly. Any theorem about semigroups can invoke this associativity without asking whether a particular type satisfies it. The type class instance guarantees it.
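To see the field in use, here is a small sketch (not from the original source) of a lemma that holds for any instance of the Semigroup class and notation just defined.
-- Any bracketing of four elements can be flattened using only op_assoc
theorem op_assoc4 {α : Type} [Semigroup α] (a b c d : α) :
    ((a ⋆ b) ⋆ c) ⋆ d = a ⋆ (b ⋆ (c ⋆ d)) := by
  simp only [Semigroup.op_assoc]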
Monoids
A monoid extends a semigroup with an identity element. The identity must satisfy two laws: left identity ($e \cdot a = a$) and right identity ($a \cdot e = a$). Natural numbers under addition form a monoid with identity 0. Natural numbers under multiplication form a different monoid with identity 1. Same type, different structures.
-- A monoid adds an identity element to a semigroup
class Monoid (α : Type) extends Semigroup α where
e : α
e_op : ∀ a : α, op e a = a
op_e : ∀ a : α, op a e = a
The extends Semigroup α clause means every monoid is automatically a semigroup. Lean’s type class inheritance handles this: any function expecting a semigroup accepts a monoid. The hierarchy is operational, not merely conceptual.
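As a hedged sketch (not part of the original source), here is the additive monoid on Nat registered against the classes above; the multiplicative monoid would use (· * ·), 1, Nat.mul_assoc, Nat.one_mul, and Nat.mul_one instead.
-- Nat under addition: identity 0, associativity from the standard library
instance : Monoid Nat where
  op := (· + ·)
  op_assoc := Nat.add_assoc
  e := 0
  e_op := Nat.zero_add
  op_e := Nat.add_zero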
Groups
A group adds inverses to a monoid. Every element $a$ has an inverse $a^{-1}$ such that $a^{-1} \cdot a = e$ and $a \cdot a^{-1} = e$. Integers under addition form a group. Positive rationals under multiplication form a group. Permutations under composition form a group. The examples proliferate across mathematics.
-- A group adds inverses to a monoid
class Group (α : Type) extends Monoid α where
inv : α → α
inv_op : ∀ a : α, op (inv a) a = e
op_inv : ∀ a : α, op a (inv a) = e
-- Notation for inverse
postfix:max "⁻¹" => Group.inv
With this definition, we can prove fundamental group theorems that apply to any group. These are not approximations or heuristics; they are mathematical facts verified by the type checker.
Group Theorems
From just the group axioms, many properties follow. Cancellation laws let us simplify equations. The identity is unique, as are inverses. The inverse of a product reverses the order. These theorems are mechanical consequences of the axioms, and Lean verifies each step.
-- Now we prove fundamental group theorems from the axioms
variable {G : Type} [Group G]
-- Left cancellation: if a ⋆ b = a ⋆ c then b = c
theorem op_left_cancel (a b c : G) (h : a ⋆ b = a ⋆ c) : b = c := by
have : a⁻¹ ⋆ (a ⋆ b) = a⁻¹ ⋆ (a ⋆ c) := by rw [h]
simp only [← Semigroup.op_assoc, Group.inv_op, Monoid.e_op] at this
exact this
-- Right cancellation: if b ⋆ a = c ⋆ a then b = c
theorem op_right_cancel (a b c : G) (h : b ⋆ a = c ⋆ a) : b = c := by
have : (b ⋆ a) ⋆ a⁻¹ = (c ⋆ a) ⋆ a⁻¹ := by rw [h]
simp only [Semigroup.op_assoc, Group.op_inv, Monoid.op_e] at this
exact this
-- The identity is unique
theorem e_unique (e' : G) (h : ∀ a : G, e' ⋆ a = a) : e' = Monoid.e := by
have : e' ⋆ Monoid.e = Monoid.e := h Monoid.e
rw [Monoid.op_e] at this
exact this
-- Inverses are unique
theorem inv_unique (a b : G) (h : b ⋆ a = Monoid.e) : b = a⁻¹ := by
have step1 : b ⋆ a ⋆ a⁻¹ = Monoid.e ⋆ a⁻¹ := by rw [h]
simp only [Semigroup.op_assoc, Group.op_inv, Monoid.op_e, Monoid.e_op] at step1
exact step1
-- Double inverse: (a⁻¹)⁻¹ = a
theorem inv_inv (a : G) : (a⁻¹)⁻¹ = a := by
symm
apply inv_unique
exact Group.op_inv a
-- Inverse of product: (a ⋆ b)⁻¹ = b⁻¹ ⋆ a⁻¹
theorem op_inv_rev (a b : G) : (a ⋆ b)⁻¹ = b⁻¹ ⋆ a⁻¹ := by
symm
apply inv_unique
calc b⁻¹ ⋆ a⁻¹ ⋆ (a ⋆ b)
= b⁻¹ ⋆ (a⁻¹ ⋆ (a ⋆ b)) := by rw [Semigroup.op_assoc]
_ = b⁻¹ ⋆ (a⁻¹ ⋆ a ⋆ b) := by rw [← Semigroup.op_assoc a⁻¹ a b]
_ = b⁻¹ ⋆ (Monoid.e ⋆ b) := by rw [Group.inv_op]
_ = b⁻¹ ⋆ b := by rw [Monoid.e_op]
_ = Monoid.e := Group.inv_op b
The theorem op_inv_rev shows that $(a \cdot b)^{-1} = b^{-1} \cdot a^{-1}$. The order reverses because we need to undo the operations in reverse sequence. The proof uses our inv_unique theorem: to show two things are equal to an inverse, show they act as that inverse.
Integers Mod 2
Theory without examples is suspect. Let us build a concrete group: integers modulo 2. This group has exactly two elements (zero and one) with addition wrapping around: $1 + 1 = 0$.
-- Example: Integers mod 2 form a group under addition
inductive Z2 : Type where
| zero : Z2
| one : Z2
deriving DecidableEq, Repr
def Z2.add : Z2 → Z2 → Z2
| .zero, a => a
| .one, .zero => .one
| .one, .one => .zero
def Z2.neg : Z2 → Z2
| a => a -- In Z2, every element is its own inverse
Every element is its own inverse ($0 + 0 = 0$ and $1 + 1 = 0$), which simplifies the structure. Now we register this as a group instance:
instance : Group Z2 where
op := Z2.add
op_assoc := by
intro a b c
cases a <;> cases b <;> cases c <;> rfl
e := Z2.zero
e_op := by
intro a
cases a <;> rfl
op_e := by
intro a
cases a <;> rfl
inv := Z2.neg
inv_op := by
intro a
cases a <;> rfl
op_inv := by
intro a
cases a <;> rfl
-- Test computation
#eval (Z2.one ⋆ Z2.one : Z2) -- zero
#eval (Z2.one ⋆ Z2.zero : Z2) -- one
Each proof obligation is discharged by case analysis. With only two elements, Lean can verify each law by exhaustively checking all combinations.
-- Verify our theorems work on the concrete example
example : (Z2.one)⁻¹ = Z2.one := rfl
example : Z2.one ⋆ Z2.one⁻¹ = Z2.zero := rfl
theorem z2_self_inverse (a : Z2) : a ⋆ a = Monoid.e := by
cases a <;> rfl
-- Z2 is commutative
theorem z2_comm (a b : Z2) : a ⋆ b = b ⋆ a := by
cases a <;> cases b <;> rfl
Because Z2 is now a Group, all our general theorems apply. The op_left_cancel and inv_unique theorems work on Z2 without modification. Generic mathematics, specific verification.
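A small check of that claim (the example itself is ours, not from the source): the generic cancellation theorem specializes to Z2 with no extra work.
-- op_left_cancel, proved for an arbitrary group, applied at the concrete type Z2
example (b c : Z2) (h : Z2.one ⋆ b = Z2.one ⋆ c) : b = c :=
  op_left_cancel Z2.one b c h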
Commutative Groups
Some groups satisfy an additional property: commutativity. In a commutative (or Abelian) group, $a \cdot b = b \cdot a$ for all elements. Integer addition is commutative; matrix multiplication is not.
-- Commutative (Abelian) groups
class CommGroup (α : Type) extends Group α where
op_comm : ∀ a b : α, Semigroup.op a b = Semigroup.op b a
instance : CommGroup Z2 where
op_comm := z2_comm
Vector Spaces
Groups appear everywhere, including in linear algebra. A vector space is an Abelian group (vectors under addition) equipped with scalar multiplication satisfying certain compatibility laws. Let us build a simple 2D vector space over the integers.
-- A simple 2D vector space over integers
structure Vec2 where
x : Int
y : Int
deriving DecidableEq, Repr
def Vec2.add (v w : Vec2) : Vec2 :=
⟨v.x + w.x, v.y + w.y⟩
def Vec2.neg (v : Vec2) : Vec2 :=
⟨-v.x, -v.y⟩
def Vec2.zero : Vec2 := ⟨0, 0⟩
def Vec2.smul (c : Int) (v : Vec2) : Vec2 :=
⟨c * v.x, c * v.y⟩
infixl:65 " +ᵥ " => Vec2.add
prefix:100 "-ᵥ" => Vec2.neg
infixl:70 " •ᵥ " => Vec2.smul
The vectors form a group under addition. Each vector $(x, y)$ has inverse $(-x, -y)$, and the zero vector is the identity.
-- Vec2 forms a group under addition
theorem vec2_inv_op (a : Vec2) : Vec2.add (Vec2.neg a) a = Vec2.zero := by
simp only [Vec2.add, Vec2.neg, Vec2.zero, Int.add_comm, Int.add_right_neg]
theorem vec2_op_inv (a : Vec2) : Vec2.add a (Vec2.neg a) = Vec2.zero := by
simp only [Vec2.add, Vec2.neg, Vec2.zero, Int.add_right_neg]
instance : Group Vec2 where
op := Vec2.add
op_assoc := by
intro a b c
simp only [Vec2.add, Int.add_assoc]
e := Vec2.zero
e_op := by
intro a
simp only [Vec2.add, Vec2.zero, Int.zero_add]
op_e := by
intro a
simp only [Vec2.add, Vec2.zero, Int.add_zero]
inv := Vec2.neg
inv_op := vec2_inv_op
op_inv := vec2_op_inv
-- Vec2 is commutative
theorem vec2_comm (a b : Vec2) : a ⋆ b = b ⋆ a := by
show Vec2.add a b = Vec2.add b a
simp only [Vec2.add, Int.add_comm]
instance : CommGroup Vec2 where
op_comm := vec2_comm
Scalar multiplication satisfies the expected laws. These are the axioms that make scalar multiplication “compatible” with the vector space structure.
-- Scalar multiplication properties
theorem smul_zero (c : Int) : c •ᵥ Vec2.zero = Vec2.zero := by
simp only [Vec2.smul, Vec2.zero, Int.mul_zero]
theorem zero_smul (v : Vec2) : (0 : Int) •ᵥ v = Vec2.zero := by
simp only [Vec2.smul, Vec2.zero, Int.zero_mul]
theorem one_smul (v : Vec2) : (1 : Int) •ᵥ v = v := by
simp only [Vec2.smul, Int.one_mul]
theorem smul_add (c : Int) (v w : Vec2) :
c •ᵥ (v +ᵥ w) = (c •ᵥ v) +ᵥ (c •ᵥ w) := by
simp only [Vec2.smul, Vec2.add, Int.mul_add]
theorem add_smul (c d : Int) (v : Vec2) :
(c + d) •ᵥ v = (c •ᵥ v) +ᵥ (d •ᵥ v) := by
simp only [Vec2.smul, Vec2.add, Int.add_mul]
theorem smul_smul (c d : Int) (v : Vec2) :
c •ᵥ (d •ᵥ v) = (c * d) •ᵥ v := by
simp only [Vec2.smul, Int.mul_assoc]
Concrete computations confirm the definitions work:
-- Concrete computations
def v1 : Vec2 := ⟨1, 2⟩
def v2 : Vec2 := ⟨3, 4⟩
#eval v1 +ᵥ v2 -- { x := 4, y := 6 }
#eval 2 •ᵥ v1 -- { x := 2, y := 4 }
#eval v1 +ᵥ (-ᵥv1) -- { x := 0, y := 0 }
-- The group inverse works correctly
example : v1 +ᵥ (-ᵥv1) = Vec2.zero := by rfl
Rings
A ring has two operations: addition forming an Abelian group, and multiplication forming a monoid. Distributivity connects them: $a \cdot (b + c) = a \cdot b + a \cdot c$. Integers, polynomials, and matrices all form rings.
-- A ring combines two structures: an abelian group under addition
-- and a monoid under multiplication, linked by distributivity
class Ring (α : Type) where
-- Additive abelian group
add : α → α → α
zero : α
neg : α → α
add_assoc : ∀ a b c, add (add a b) c = add a (add b c)
zero_add : ∀ a, add zero a = a
add_zero : ∀ a, add a zero = a
neg_add : ∀ a, add (neg a) a = zero
add_comm : ∀ a b, add a b = add b a
-- Multiplicative monoid
mul : α → α → α
one : α
mul_assoc : ∀ a b c, mul (mul a b) c = mul a (mul b c)
one_mul : ∀ a, mul one a = a
mul_one : ∀ a, mul a one = a
-- Distributivity connects the two structures
left_distrib : ∀ a b c, mul a (add b c) = add (mul a b) (mul a c)
right_distrib : ∀ a b c, mul (add a b) c = add (mul a c) (mul b c)
The integers satisfy all ring axioms:
-- The integers form a ring
instance intRing : Ring Int where
add := (· + ·)
zero := 0
neg := (- ·)
mul := (· * ·)
one := 1
add_assoc := Int.add_assoc
zero_add := Int.zero_add
add_zero := Int.add_zero
neg_add := Int.add_left_neg
add_comm := Int.add_comm
mul_assoc := Int.mul_assoc
one_mul := Int.one_mul
mul_one := Int.mul_one
left_distrib := Int.mul_add
right_distrib := Int.add_mul
From these axioms, one can prove that $0 \cdot a = 0$ for any ring element $a$. This follows from distributivity: $0 \cdot a + 0 \cdot a = (0 + 0) \cdot a = 0 \cdot a$, and cancellation gives $0 \cdot a = 0$. See ring_zero_mul and ring_mul_zero in AlgebraicStructures.lean for the full proofs.
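Here is a hedged sketch of that derivation written against the Ring class above; the version in AlgebraicStructures.lean may differ in detail.
-- 0 * a = 0: from distributivity, 0*a + 0*a = (0 + 0)*a = 0*a,
-- then cancel by adding the additive inverse of 0*a on the left
theorem ring_zero_mul' {α : Type} [Ring α] (a : α) :
    Ring.mul Ring.zero a = Ring.zero := by
  have h : Ring.add (Ring.mul Ring.zero a) (Ring.mul Ring.zero a)
      = Ring.add (Ring.mul Ring.zero a) Ring.zero := by
    rw [← Ring.right_distrib, Ring.zero_add, Ring.add_zero]
  have h2 := congrArg (Ring.add (Ring.neg (Ring.mul Ring.zero a))) h
  rwa [← Ring.add_assoc, ← Ring.add_assoc, Ring.neg_add,
    Ring.zero_add, Ring.zero_add] at h2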
The Hierarchy
The structures we have defined form a hierarchy. At the base sits Semigroup, requiring only an associative operation. Monoid extends Semigroup by adding an identity element. Group extends Monoid by adding inverses. From Group, two paths diverge: CommGroup adds commutativity, while Ring combines an abelian group (for addition) with a monoid (for multiplication) linked by distributivity.
Each extension relationship means theorems flow downward. Prove something about semigroups, and it applies to monoids, groups, and rings. Lean’s type class inheritance makes this operational: any function expecting a Semigroup instance automatically accepts a Monoid, Group, or Ring.
Mathlib takes this much further. The full algebraic hierarchy includes semirings, division rings, fields, modules, algebras, and dozens of ordered variants. Each structure captures a precise set of assumptions, and theorems are proved at exactly the level of generality where they hold.
First Principles to Mathlib
We built these structures from scratch to understand how they work. In practice, you would use Mathlib’s definitions, which are battle-tested and integrated with thousands of theorems. Our Group is Mathlib’s Group. Our Ring is Mathlib’s Ring. The concepts are identical; the implementations are industrial-strength.
The value of building from first principles is understanding. When Mathlib’s ring tactic solves a polynomial identity, it is applying theorems like our ring_zero_mul millions of times per second. When type class inference finds a CommGroup instance, it is navigating a hierarchy like the one we drew. The abstraction is real, and so is the machinery underneath.
Constraints Beget Structure
One of the beautiful facts in group theory is that strong constraints force unexpected structure. Consider a group where every element is its own inverse: $g^2 = e$ for all $g$. Such groups must be abelian. The proof is a gem of algebraic reasoning.
-- If every element is its own inverse, the group must be abelian.
-- The proof: ab = (ab)⁻¹ = b⁻¹a⁻¹ = ba. Constraints beget structure.
theorem involutive_imp_comm {G : Type} [Group G]
(h : ∀ g : G, g ⋆ g = Monoid.e) : ∀ a b : G, a ⋆ b = b ⋆ a := by
intro a b
-- Key insight: if g² = e then g = g⁻¹
have inv_self : ∀ g : G, g = g⁻¹ := fun g => by
have := h g
calc g = g ⋆ Monoid.e := (Monoid.op_e _).symm
_ = g ⋆ (g ⋆ g⁻¹) := by rw [Group.op_inv]
_ = (g ⋆ g) ⋆ g⁻¹ := (Semigroup.op_assoc _ _ _).symm
_ = Monoid.e ⋆ g⁻¹ := by rw [this]
_ = g⁻¹ := Monoid.e_op _
-- Now: ab = (ab)⁻¹ = b⁻¹a⁻¹ = ba
calc a ⋆ b
= (a ⋆ b)⁻¹ := inv_self (a ⋆ b)
_ = b⁻¹ ⋆ a⁻¹ := op_inv_rev a b
_ = b ⋆ a := by rw [← inv_self a, ← inv_self b]
The key insight is that $g^2 = e$ implies $g = g^{-1}$. From there, $ab = (ab)^{-1} = b^{-1}a^{-1} = ba$. The constraint on squares forces commutativity. Our Z2 group is an example: every element squares to zero, and the group is indeed abelian.
Squaring Distributes
A related result: if squaring distributes over the group operation, meaning $(ab)^2 = a^2 b^2$ for all elements, then the group must be abelian. This is left as an exercise.
-- Exercise: If squaring distributes, the group is abelian.
-- Hint: expand (ab)² = a²b² and cancel to get ab = ba.
theorem square_distrib_imp_comm {G : Type} [Group G]
(h : ∀ a b : G, (a ⋆ b) ⋆ (a ⋆ b) = (a ⋆ a) ⋆ (b ⋆ b)) :
∀ a b : G, a ⋆ b = b ⋆ a := by
sorry
The hint is to expand both sides. On the left, $(ab)^2 = abab$. On the right, $a^2 b^2 = aabb$. The equality $abab = aabb$ lets you cancel $a$ on the left and $b$ on the right, yielding $ba = ab$. The machinery we built here forms the foundation for the full algebraic hierarchy, including Galois theory in Mathlib.
Mathlib
Mathlib is the mathematical library for Lean 4. Over a million lines of formalized mathematics, from basic logic through graduate-level algebra, analysis, and number theory. Hundreds of contributors have poured years of work into this thing. When you import it, you inherit their labor. The triangle inequality is already proven. So is the fundamental theorem of algebra. You do not need to prove that primes are infinite; someone did that in 2017 and you can just use it. The community tracks progress against a list of 100 major theorems; most are done.
The library is organized hierarchically. At the foundation sit logic, sets, and basic data types. Above these rise algebraic structures, then topology and analysis, then specialized domains like combinatorics and number theory. Each layer builds on those below. Finding what you need in a million-line codebase used to be challenging, but the community has built excellent semantic search tools powered by AI, and the Mathlib documentation provides searchable API references for every declaration.
Core Foundations
These modules provide the logical and set-theoretic foundations that everything else depends on. The logic modules formalize propositional and predicate calculus, including both constructive reasoning and classical axioms like the law of excluded middle. Set theory in Mathlib is built on top of type theory rather than replacing it: Set α is defined as α → Prop, making sets predicates on types. This foundation supports finite sets with decidable membership, order theory including lattices and Galois connections, and the infrastructure for defining mathematical structures throughout the library.
| Module | Description |
|---|---|
Mathlib.Logic.Basic | Core logical connectives, And, Or, Not, Iff, basic lemmas |
Init.Classical | Classical axioms: Classical.em, Classical.choose, Classical.byContradiction |
Mathlib.Data.Set.Basic | Set operations: union, intersection, complement, membership |
Mathlib.Data.Finset.Basic | Finite sets with decidable membership |
Mathlib.Order.Basic | Partial orders, lattices, suprema and infima |
Algebraic Hierarchy
Mathlib builds algebra through a hierarchy of type classes. Each structure adds operations and axioms to those below it. The hierarchy begins with semigroups and monoids, progresses through groups and rings, and culminates in fields and modules. This design means that a theorem about groups automatically applies to rings, fields, and any other structure that extends groups. The library includes both additive and multiplicative variants of each structure, connected by the @[to_additive] attribute that automatically generates parallel theories. Key accomplishments include complete formalizations of Galois theory, the structure theorem for finitely generated modules over PIDs, and the Nullstellensatz.
| Module | Description |
|---|---|
Mathlib.Algebra.Group.Basic | Monoids, groups, abelian groups |
Mathlib.Algebra.Ring.Basic | Semirings, rings, commutative rings |
Mathlib.Algebra.Field.Basic | Division rings, fields |
Mathlib.Algebra.Module.Basic | Modules over rings, vector spaces |
Mathlib.Algebra.Module.LinearMap.Defs | Linear maps, submodules, quotients |
Mathlib.RingTheory.Ideal.Basic | Ideals, quotient rings |
Number Systems
The standard number types and their properties, constructed with mathematical rigor. Natural numbers come from Lean’s core, but Mathlib adds comprehensive libraries for divisibility, primality, and arithmetic functions. Integers are built as a quotient of pairs of naturals. Rationals are fractions in lowest terms. Real numbers are equivalence classes of Cauchy sequences of rationals. Complex numbers are pairs of reals. Each construction comes with the expected algebraic structure and interoperability lemmas. The library also provides modular arithmetic through ZMod n, which is a field when n is prime.
| Module | Description |
|---|---|
Mathlib.Data.Nat.Prime.Defs | Prime numbers, factorization |
Mathlib.Data.Int.Basic | Integers |
Mathlib.Data.Rat.Defs | Rational numbers |
Mathlib.Data.Real.Basic | Real numbers (Cauchy completion) |
Mathlib.Data.Complex.Basic | Complex numbers |
Mathlib.Data.ZMod.Defs | Integers modulo n |
Analysis and Topology
Continuous mathematics built on topological foundations. The topology library provides general topological spaces, filters, and nets as the foundation for limits and continuity. Metric spaces add distance functions with the expected triangle inequality and completeness properties. Analysis proper includes differentiation in arbitrary normed spaces, the Fréchet derivative for multivariable calculus, and integration via measure theory. Major formalizations include the Fundamental Theorem of Calculus, the Hahn-Banach theorem, the spectral theorem for compact self-adjoint operators, and the Central Limit Theorem. The library handles both real and complex analysis through a unified framework.
| Module | Description |
|---|---|
Mathlib.Topology.Basic | Topological spaces, open sets, continuity |
Mathlib.Topology.MetricSpace.Basic | Metric spaces, distances |
Mathlib.Analysis.Normed.Field.Basic | Normed fields |
Mathlib.Analysis.Calculus.Deriv.Basic | Derivatives |
Mathlib.MeasureTheory.Measure.MeasureSpace | Measure spaces, integration |
Category Theory and Combinatorics
Abstract structures and discrete mathematics form two largely independent branches of Mathlib. The category theory library provides a comprehensive framework for categorical reasoning: categories, functors, natural transformations, adjunctions, limits, colimits, and monads. This infrastructure supports both abstract mathematics and the categorical semantics of type theory. The combinatorics library covers graph theory with simple graphs and multigraphs, the pigeonhole principle, inclusion-exclusion, and generating functions. Notable formalizations include Szemerédi’s regularity lemma, the cap set problem bound, and significant progress toward the Polynomial Freiman-Ruzsa conjecture.
| Module | Description |
|---|---|
Mathlib.CategoryTheory.Category.Basic | Categories, functors, natural transformations |
Mathlib.CategoryTheory.Limits.IsLimit | Limits and colimits |
Mathlib.CategoryTheory.Monad.Basic | Monads in category theory |
Mathlib.Combinatorics.SimpleGraph.Basic | Graph theory |
Mathlib.Combinatorics.Pigeonhole | Pigeonhole principle |
Finding What You Need
Mathlib is large. You will spend more time searching for lemmas than proving theorems, at least at first. Accept this. The good news is that the lemma you need almost certainly exists. The bad news is that it might be named something you would never guess. The community has built an ecosystem of search tools, each with different strengths.
Moogle: Semantic search for Mathlib. Type “triangle inequality for norms” and find norm_add_le. Sometimes it even understands what you meant rather than what you typed. Moogle uses embeddings trained on mathematical text, so it handles synonyms and related concepts well. Start here when you know what you want but not what it is called.
Loogle: Type signature search. If you need a lemma involving List.map and List.length, search for List.map, List.length and find List.length_map. Loogle lets you search by the shape of the types involved, using wildcards and constraints. Precise queries get precise answers. This is the tool when you know the types but not the name.
LeanSearch: Another semantic search engine over Mathlib, focused on finding relevant theorems from natural language descriptions. It provides a different ranking algorithm than Moogle, so when one fails, try the other. Sometimes the theorem you need appears on page two of one engine and page one of another.
LeanExplore: Semantic search that indexes not just theorems but also metaprogramming, tactics, and attributes. It exposes an MCP (Model Context Protocol) interface, making it accessible to AI coding assistants. Useful when you need to find not just a lemma but how to use a particular tactic or attribute.
Lean Finder: Understands proof states and theorem statements at a deeper level than keyword matching. You can paste your current goal and it will suggest relevant lemmas. Particularly useful when you are mid-proof and need something that applies to your specific situation.
Mathlib4 Docs: The auto-generated API reference for every declaration in Mathlib. Not a search engine per se, but once you know the module, this is where you browse the available lemmas. Each declaration links to its source code and shows its type signature, docstring, and related definitions.
In-editor tactics: The exact? tactic searches for lemmas that exactly match your goal. The apply? tactic finds lemmas whose conclusion unifies with your goal. Slow but thorough since they search locally compiled dependencies. Use them when web-based search fails or when you need something from a non-Mathlib dependency.
Module structure: If you need facts about prime numbers, look in Mathlib.Data.Nat.Prime. If you need topology lemmas, start in Mathlib.Topology. The Mathematics in Mathlib overview provides a map of what has been formalized and where. When all else fails, grep the source code like everyone else does.
Importing Mathlib
Most projects import Mathlib wholesale:
import Mathlib
This works, but your compile times will make you reconsider your life choices. For faster iteration during development, import only what you need:
-- Import specific modules for faster compilation
import Mathlib.Data.Nat.Prime.Basic
import Mathlib.Data.Real.Basic
import Mathlib.Algebra.Group.Basic
import Mathlib.Tactic
The Mathlib documentation lists all available modules. When your proof needs a specific lemma, check which module provides it and add that import. Or just import everything and go make coffee while it compiles.
Working with Primes
Number theory in Mathlib is surprisingly pleasant. The basics are all there, and the proofs often look like what you would write on paper if paper could check your work:
-- Working with prime numbers from Mathlib
example : Nat.Prime 17 := by decide
example : ¬ Nat.Prime 15 := by decide
-- Every number > 1 has a prime factor
example (n : Nat) (h : n > 1) : ∃ p, Nat.Prime p ∧ p ∣ n := by
have hn : n ≠ 1 := by omega
exact Nat.exists_prime_and_dvd hn
-- Infinitely many primes: for any n, there's a prime ≥ n
example (n : Nat) : ∃ p, Nat.Prime p ∧ p ≥ n := by
obtain ⟨p, hn, hp⟩ := Nat.exists_infinite_primes n
exact ⟨p, hp, hn⟩
Algebraic Structures
Type classes do the heavy lifting here. Declare that your type is a group, and you get inverses, identity laws, and associativity for free. Declare it is a ring, and multiplication distributes over addition without you lifting a finger:
-- Using algebraic structures
-- Groups: every element has an inverse
example {G : Type*} [Group G] (a : G) : a * a⁻¹ = 1 :=
mul_inv_cancel a
-- Rings: distributivity comes for free
example {R : Type*} [Ring R] (a b c : R) : a * (b + c) = a * b + a * c :=
mul_add a b c
-- Commutativity in commutative rings
example {R : Type*} [CommRing R] (a b : R) : a * b = b * a :=
mul_comm a b
Real Numbers
The reals are constructed as equivalence classes of Cauchy sequences, which is mathematically clean but occasionally leaks through the abstraction when you least expect it. Most of the time you can pretend they are just numbers:
-- Real number analysis
-- Reals are a field
example (x : ℝ) (h : x ≠ 0) : x * x⁻¹ = 1 :=
mul_inv_cancel₀ h
-- Basic inequalities
example (x y : ℝ) (hx : 0 < x) (hy : 0 < y) : 0 < x + y :=
add_pos hx hy
-- The triangle inequality
example (x y : ℝ) : |x + y| ≤ |x| + |y| :=
abs_add_le x y
Mathlib Tactics
Mathlib ships tactics that know more mathematics than most undergraduates. ring closes polynomial identities. linarith handles linear arithmetic over ordered rings. positivity proves things are positive. These are not magic; they are carefully engineered decision procedures. But from the outside, they look like magic:
-- Mathlib tactics in action
-- ring solves polynomial identities
example (x y : ℤ) : (x - y) * (x + y) = x^2 - y^2 := by ring
-- linarith handles linear arithmetic
example (x y z : ℚ) (h1 : x < y) (h2 : y < z) : x < z := by linarith
-- field_simp clears denominators
example (x : ℚ) (h : x ≠ 0) : (1 / x) * x = 1 := by field_simp
-- positivity proves positivity goals
example (x : ℝ) : 0 ≤ x^2 := by positivity
-- gcongr for monotonic reasoning
example (a b c d : ℕ) (h1 : a ≤ b) (h2 : c ≤ d) : a + c ≤ b + d := by gcongr
Using Search Tools
When stuck, let the computer do the searching. exact? trawls through Mathlib looking for a lemma that exactly matches your goal. apply? finds lemmas whose conclusion fits. These tactics are slow, but they beat staring at the screen trying to remember if the lemma is called add_comm or comm_add:
-- Finding lemmas with exact? and apply?
-- When stuck, use exact? to find matching lemmas
example (n : Nat) : n + 0 = n := by
exact Nat.add_zero n -- exact? would suggest this
-- apply? finds lemmas whose conclusion matches goal
example (a b : Nat) (h : a ∣ b) (h2 : b ∣ a) : a = b := by
exact Nat.dvd_antisymm h h2 -- apply? would find this
The Fundamental Theorem of Calculus
The Fundamental Theorem of Calculus in Mathlib is due to Yury Kudryashov and Benjamin Davidson. It comes in two parts, as you might remember from analysis.
The first part says that integrating then differentiating recovers the original function. If $f$ is integrable and tends to $c$ at $b$, then the function $u \mapsto \int_a^u f(x)\,dx$ has derivative $c$ at $b$:
theorem integral_hasStrictDerivAt_of_tendsto_ae_right
(hf : IntervalIntegrable f volume a b)
(hmeas : StronglyMeasurableAtFilter f (nhds b) volume)
(hb : Tendsto f (nhds b ⊓ ae volume) (nhds c)) :
HasStrictDerivAt (fun u => ∫ x in a..u, f x) c b
The second part says that differentiating then integrating recovers the original function up to boundary values. If $f$ is continuous on $[a,b]$ and differentiable on $(a,b)$ with integrable derivative, then:
$$\int_a^b f'(x) \, dx = f(b) - f(a)$$
theorem integral_eq_sub_of_hasDeriv_right_of_le
(hab : a ≤ b)
(hcont : ContinuousOn f (Icc a b))
(hderiv : ∀ x ∈ Ioo a b, HasDerivWithinAt f (f' x) (Ioi x) x)
(hint : IntervalIntegrable f' volume a b) :
∫ x in a..b, f' x = f b - f a
The hypotheses handle the edge cases your calculus teacher glossed over: continuity on the closed interval, differentiability on the open interior, integrability of the derivative. Centuries of refinement, machine-checked.
Resources
- Moogle - Search Mathlib using natural language queries like “triangle inequality for norms”.
- Loogle - Search by type signature when you know the shape of the lemma you need.
- LeanSearch - Semantic search over Mathlib with a focus on finding relevant theorems.
- LeanExplore - Semantic search that also indexes metaprogramming and exposes an MCP interface for AI agents.
- Lean Finder - Understands proof states and theorem statements, not just keywords.
- Mathlib4 Docs - Auto-generated API reference for every declaration in Mathlib.
- Mathematics in Mathlib - Map of formalized mathematics organized by mathematical area.
- 100 Theorems - Tracks which of 100 major theorems have been formalized in Lean.
- Zulip Chat - Ask questions in the “Is there code for X?” stream when search engines fail.
- GitHub - Source code, issue tracker, and contribution guidelines.
Verified Programs
The promise of theorem provers extends beyond mathematics. We can verify that software does what we claim it does. This article demonstrates verification techniques where the verified code and the production code are the same: everything lives within Lean.
Intrinsically-Typed Interpreters
The standard approach to building interpreters involves two phases. First, parse text into an untyped abstract syntax tree. Second, run a type checker that rejects malformed programs. This works, but the interpreter must still handle the case where a program passes the type checker but evaluates to nonsense. The runtime carries the burden of the type system’s failure modes. It is like a bouncer who checks IDs at the door but still has to deal with troublemakers inside.
Intrinsically-typed interpreters refuse to play this game. The abstract syntax tree itself encodes typing judgments. An ill-typed program cannot be constructed. The type system statically excludes runtime type errors, not by checking them at runtime, but by making them unrepresentable. The bouncer is replaced by architecture: there is no door for troublemakers to enter.
Consider a small expression language with natural numbers, booleans, arithmetic, and conditionals. We start by defining the types our language supports and a denotation function that maps them to Lean types.
inductive Ty where
| nat : Ty
| bool : Ty
deriving Repr, DecidableEq
@[reducible] def Ty.denote : Ty → Type
| .nat => Nat
| .bool => Bool
The denote function is key. It interprets our object-level types (Ty) as meta-level types (Type). When our expression language says something has type nat, we mean it evaluates to a Lean Nat. When it says bool, we mean a Lean Bool. This type-level interpretation function is what makes the entire approach work.
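As a quick sanity check, purely illustrative and not from the repository, the denotations unfold definitionally:
-- Illustrative: the denotation function reduces to ordinary Lean types
example : Ty.nat.denote = Nat := rfl
example : Ty.bool.denote = Bool := rfl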
Expressions
The expression type indexes over the result type. Each constructor precisely constrains which expressions can be built and what types they produce.
inductive Expr : Ty → Type where
| nat : Nat → Expr .nat
| bool : Bool → Expr .bool
| add : Expr .nat → Expr .nat → Expr .nat
| mul : Expr .nat → Expr .nat → Expr .nat
| lt : Expr .nat → Expr .nat → Expr .bool
| eq : Expr .nat → Expr .nat → Expr .bool
| and : Expr .bool → Expr .bool → Expr .bool
| or : Expr .bool → Expr .bool → Expr .bool
| not : Expr .bool → Expr .bool
| ite : {t : Ty} → Expr .bool → Expr t → Expr t → Expr t
Every constructor documents its typing rule. The add constructor requires both arguments to be natural number expressions and produces a natural number expression. The ite constructor requires a boolean condition and two branches of matching type.
This encoding makes ill-typed expressions unrepresentable. You cannot write add (nat 1) (bool true) because the types do not align. The Lean type checker rejects such expressions before they exist.
/-
def bad : Expr .nat := .add (.nat 1) (.bool true)
-- Error: type mismatch
-/
Evaluation
The evaluator maps expressions to their denotations. Because expressions are intrinsically typed, the evaluator is total. It never fails, never throws exceptions, never encounters impossible cases. Every pattern match is exhaustive.
def Expr.eval : {t : Ty} → Expr t → t.denote
| _, .nat n => n
| _, .bool b => b
| _, .add e₁ e₂ => e₁.eval + e₂.eval
| _, .mul e₁ e₂ => e₁.eval * e₂.eval
| _, .lt e₁ e₂ => e₁.eval < e₂.eval
| _, .eq e₁ e₂ => e₁.eval == e₂.eval
| _, .and e₁ e₂ => e₁.eval && e₂.eval
| _, .or e₁ e₂ => e₁.eval || e₂.eval
| _, .not e => !e.eval
| _, .ite c t e => if c.eval then t.eval else e.eval
The return type t.denote varies with the expression’s type index. A natural number expression evaluates to Nat. A boolean expression evaluates to Bool. This dependent return type is what makes the evaluator type-safe by construction.
def ex1 : Expr .nat := .add (.nat 2) (.nat 3)
def ex2 : Expr .bool := .lt (.nat 2) (.nat 3)
def ex3 : Expr .nat := .ite (.lt (.nat 2) (.nat 3)) (.nat 10) (.nat 20)
def ex4 : Expr .nat := .mul (.add (.nat 2) (.nat 3)) (.nat 4)
#eval ex1.eval -- 5
#eval ex2.eval -- true
#eval ex3.eval -- 10
#eval ex4.eval -- 20
Verified Optimization
Interpreters become interesting when we transform programs. Compilers do this constantly: dead code elimination, loop unrolling, strength reduction. Each transformation promises to preserve meaning while improving performance. But how do we know the promise is kept? A constant folder simplifies expressions by evaluating constant subexpressions at compile time. Adding two literal numbers produces a literal. Conditionals with constant conditions eliminate the untaken branch.
def Expr.constFold : {t : Ty} → Expr t → Expr t
| _, .nat n => .nat n
| _, .bool b => .bool b
| _, .add e₁ e₂ =>
match e₁.constFold, e₂.constFold with
| .nat n, .nat m => .nat (n + m)
| e₁', e₂' => .add e₁' e₂'
| _, .mul e₁ e₂ =>
match e₁.constFold, e₂.constFold with
| .nat n, .nat m => .nat (n * m)
| e₁', e₂' => .mul e₁' e₂'
| _, .lt e₁ e₂ => .lt e₁.constFold e₂.constFold
| _, .eq e₁ e₂ => .eq e₁.constFold e₂.constFold
| _, .and e₁ e₂ => .and e₁.constFold e₂.constFold
| _, .or e₁ e₂ => .or e₁.constFold e₂.constFold
| _, .not e => .not e.constFold
| _, .ite c t e =>
match c.constFold with
| .bool true => t.constFold
| .bool false => e.constFold
| c' => .ite c' t.constFold e.constFold
The optimization preserves types. If e : Expr t, then e.constFold : Expr t. The type indices flow through unchanged. The type system enforces this statically.
But type preservation is a weak property. We want semantic preservation: the optimized program computes the same result as the original. This requires a proof.
theorem constFold_correct : ∀ {t : Ty} (e : Expr t), e.constFold.eval = e.eval := by
intro t e
induction e with
| nat n => rfl
| bool b => rfl
| add e₁ e₂ ih₁ ih₂ =>
simp only [Expr.constFold, Expr.eval]
cases he₁ : e₁.constFold <;> cases he₂ : e₂.constFold <;>
simp only [Expr.eval, ← ih₁, ← ih₂, he₁, he₂]
| mul e₁ e₂ ih₁ ih₂ =>
simp only [Expr.constFold, Expr.eval]
cases he₁ : e₁.constFold <;> cases he₂ : e₂.constFold <;>
simp only [Expr.eval, ← ih₁, ← ih₂, he₁, he₂]
| lt e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
| eq e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
| and e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
| or e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
| not e ih => simp only [Expr.constFold, Expr.eval, ih]
| ite c t e ihc iht ihe =>
simp only [Expr.constFold, Expr.eval]
cases hc : c.constFold <;> simp only [Expr.eval, ← ihc, ← iht, ← ihe, hc]
case bool b => cases b <;> rfl
The theorem states that for any expression, evaluating the constant-folded expression yields the same result as evaluating the original. The proof proceeds by structural induction on the expression. Most cases follow directly from the induction hypotheses.
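A concrete spot check, again illustrative rather than from the repository, shows nested constants collapsing and the theorem specializing to one of the earlier expressions:
-- Illustrative: nested constants collapse to a literal,
-- and the correctness theorem specializes to any concrete expression
example : (Expr.mul (.add (.nat 2) (.nat 3)) (.nat 4)).constFold = .nat 20 := rfl
example : ex4.constFold.eval = ex4.eval := constFold_correct ex4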
A Verified Compiler
The intrinsically-typed interpreter demonstrates type safety. But real systems compile to lower-level representations. Can we verify the compiler itself? The answer is yes, and it requires remarkably little code. In roughly 40 lines, we can define a source language, a target language, compilation, and prove the compiler correct. This is CompCert in miniature.
The source language is arithmetic expressions: literals, addition, and multiplication. The target language is a stack machine with push, add, and multiply instructions. The compilation strategy is straightforward: literals become pushes, binary operations compile their arguments and then emit the operator.
-- Source: arithmetic expressions
inductive Expr where
| lit : Nat → Expr
| add : Expr → Expr → Expr
| mul : Expr → Expr → Expr
deriving Repr
-- Target: stack machine instructions
inductive Instr where
| push : Nat → Instr
| add : Instr
| mul : Instr
deriving Repr
-- What an expression means: evaluate to a number
def eval : Expr → Nat
| .lit n => n
| .add a b => eval a + eval b
| .mul a b => eval a * eval b
-- Compile expression to stack code
def compile : Expr → List Instr
| .lit n => [.push n]
| .add a b => compile a ++ compile b ++ [.add]
| .mul a b => compile a ++ compile b ++ [.mul]
-- Execute stack code
def run : List Instr → List Nat → List Nat
| [], stack => stack
| .push n :: is, stack => run is (n :: stack)
| .add :: is, b :: a :: stack => run is ((a + b) :: stack)
| .mul :: is, b :: a :: stack => run is ((a * b) :: stack)
| _ :: is, stack => run is stack
-- Lemma: execution distributes over concatenation
theorem run_append (is js : List Instr) (s : List Nat) :
run (is ++ js) s = run js (run is s) := by
induction is generalizing s with
| nil => rfl
| cons i is ih =>
cases i with
| push n => exact ih _
| add => cases s with
| nil => exact ih _
| cons b s => cases s with
| nil => exact ih _
| cons a s => exact ih _
| mul => cases s with
| nil => exact ih _
| cons b s => cases s with
| nil => exact ih _
| cons a s => exact ih _
-- THE THEOREM: the compiler is correct
-- Running compiled code pushes exactly the evaluated result
theorem compile_correct (e : Expr) (s : List Nat) :
run (compile e) s = eval e :: s := by
induction e generalizing s with
| lit n => rfl
| add a b iha ihb => simp [compile, run_append, iha, ihb, run, eval]
| mul a b iha ihb => simp [compile, run_append, iha, ihb, run, eval]
The key insight is the run_append lemma: executing concatenated instruction sequences is equivalent to executing them in order. This lets us prove correctness compositionally. The main theorem, compile_correct, states that running compiled code pushes exactly the evaluated result onto the stack.
The proof proceeds by structural induction on expressions. Literal compilation is trivially correct. For binary operations, we use run_append to split the execution: first we run the compiled left argument, then the compiled right argument, then the operator. The induction hypotheses tell us each subexpression evaluates correctly. The operator instruction combines them as expected.
-- Try it: (2 + 3) * 4
def expr : Expr := .mul (.add (.lit 2) (.lit 3)) (.lit 4)
#eval eval expr -- 20
#eval compile expr -- [push 2, push 3, add, push 4, mul]
#eval run (compile expr) [] -- [20]
-- The theorem guarantees these always match. Not by testing. By proof.
This is verified compiler technology at its most distilled. The same principles scale to CompCert, which verifies a production C compiler. The gap between 40 lines and 100,000 lines is mostly the complexity of real languages and optimizations, not the verification methodology.
Proof-Carrying Parsers
The intrinsically-typed interpreter guarantees type safety. The verified compiler guarantees semantic preservation. But what about parsers? A parser takes untrusted input and produces structured data. The traditional approach is to hope the parser is correct and test extensively. The verified approach is to make the parser carry its own proof of correctness.
A proof-carrying parser returns both the parsed result and evidence that the result matches the grammar. Invalid parses become type errors rather than runtime errors. The proof is constructed during parsing and verified by the type checker.
We define a grammar as an inductive type with constructors for characters, sequencing, alternation, repetition, and the empty string:
inductive Grammar where
| char : Char → Grammar
| seq : Grammar → Grammar → Grammar
| alt : Grammar → Grammar → Grammar
| many : Grammar → Grammar
| eps : Grammar
inductive Matches : Grammar → List Char → Prop where
| char {c} : Matches (.char c) [c]
| eps : Matches .eps []
| seq {g₁ g₂ s₁ s₂} : Matches g₁ s₁ → Matches g₂ s₂ → Matches (.seq g₁ g₂) (s₁ ++ s₂)
| altL {g₁ g₂ s} : Matches g₁ s → Matches (.alt g₁ g₂) s
| altR {g₁ g₂ s} : Matches g₂ s → Matches (.alt g₁ g₂) s
| manyNil {g} : Matches (.many g) []
| manyCons {g s₁ s₂} : Matches g s₁ → Matches (.many g) s₂ → Matches (.many g) (s₁ ++ s₂)
The Matches relation defines when a string matches a grammar. Each constructor corresponds to a grammar production: a character matches itself, sequences match concatenations, alternatives match either branch, and repetition matches zero or more occurrences.
A parse result bundles the consumed input, remaining input, and a proof that the consumed portion matches the grammar:
structure ParseResult (g : Grammar) where
consumed : List Char
rest : List Char
proof : Matches g consumed
abbrev Parser (g : Grammar) := List Char → Option (ParseResult g)
The parser combinators construct these proof terms as they parse. When pchar 'a' succeeds, it returns a ParseResult containing proof that 'a' matches Grammar.char 'a'. When pseq combines two parsers, it combines their proofs using the Matches.seq constructor:
def pchar (c : Char) : Parser (.char c) := fun
| x :: xs => if h : x = c then some ⟨[c], xs, h ▸ .char⟩ else none
| [] => none
variable {g₁ g₂ g : Grammar}
def pseq (p₁ : Parser g₁) (p₂ : Parser g₂) : Parser (.seq g₁ g₂) := fun input =>
p₁ input |>.bind fun ⟨s₁, r, pf₁⟩ => p₂ r |>.map fun ⟨s₂, r', pf₂⟩ => ⟨s₁ ++ s₂, r', .seq pf₁ pf₂⟩
def palt (p₁ : Parser g₁) (p₂ : Parser g₂) : Parser (.alt g₁ g₂) := fun input =>
(p₁ input |>.map fun ⟨s, r, pf⟩ => ⟨s, r, .altL pf⟩) <|>
(p₂ input |>.map fun ⟨s, r, pf⟩ => ⟨s, r, .altR pf⟩)
partial def pmany (p : Parser g) : Parser (.many g) := fun input =>
match p input with
| none => some ⟨[], input, .manyNil⟩
| some ⟨s₁, r, pf₁⟩ =>
if s₁.isEmpty then some ⟨[], input, .manyNil⟩
else (pmany p r).map fun ⟨s₂, r', pf₂⟩ => ⟨s₁ ++ s₂, r', .manyCons pf₁ pf₂⟩
infixl:60 " *> " => pseq
infixl:50 " <+> " => palt
postfix:90 "⁺" => pmany
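A quick usage check, illustrative and not part of the article's code, exercises the combinators on concrete input. The result bundles a proof, so here we only ask whether the parse succeeded:
-- Illustrative: the sequenced parser accepts "ab" and rejects "ba"
#eval (pseq (pchar 'a') (pchar 'b') "ab".toList).isSome -- true
#eval (pseq (pchar 'a') (pchar 'b') "ba".toList).isSome -- false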
Soundness is trivial. Every successful parse carries its proof:
theorem soundness (p : Parser g) (input : List Char) (r : ParseResult g) :
p input = some r → Matches g r.consumed := fun _ => r.proof
The theorem says: if a parser returns a result, then the consumed input matches the grammar. The proof is the identity function, because the evidence is already in the result. Proof-carrying data constructs correctness alongside the computation rather than establishing it after the fact.
The Stack Machine
We continue with another Lean-only verification example: a stack machine, the fruit fly of computer science. Like the fruit fly in genetics, stack machines are simple enough to study exhaustively yet complex enough to exhibit interesting behavior. The machine has five operations: push a value, pop the top, add the top two values, multiply them, or duplicate the top.
inductive Op where
| push : Int → Op
| pop : Op
| add : Op
| mul : Op
| dup : Op
deriving Repr, DecidableEq
The run function executes a program against a stack:
def run : List Op → List Int → List Int
| [], stack => stack
| .push n :: ops, stack => run ops (n :: stack)
| .pop :: ops, _ :: stack => run ops stack
| .pop :: ops, [] => run ops []
| .add :: ops, b :: a :: stack => run ops ((a + b) :: stack)
| .add :: ops, stack => run ops stack
| .mul :: ops, b :: a :: stack => run ops ((a * b) :: stack)
| .mul :: ops, stack => run ops stack
| .dup :: ops, x :: stack => run ops (x :: x :: stack)
| .dup :: ops, [] => run ops []
#eval run [.push 3, .push 4, .add] []
#eval run [.push 3, .push 4, .mul] []
#eval run [.push 5, .dup, .mul] []
#eval run [.push 10, .push 3, .pop] []
Universal Properties
The power of theorem proving lies not in verifying specific programs but in proving properties about all programs. Consider the composition theorem: running two programs in sequence equals running their concatenation.
theorem run_append (p1 p2 : List Op) (s : List Int) :
run (p1 ++ p2) s = run p2 (run p1 s) := by
induction p1 generalizing s with
| nil => rfl
| cons op ops ih =>
cases op with
| push n => exact ih _
| pop => cases s <;> exact ih _
| add => match s with
| [] | [_] => exact ih _
| _ :: _ :: _ => exact ih _
| mul => match s with
| [] | [_] => exact ih _
| _ :: _ :: _ => exact ih _
| dup => cases s <;> exact ih _
This theorem quantifies over all programs p1 and p2 and all initial stacks s. The proof proceeds by induction on the first program, with case analysis on each operation and the stack state. The result is a guarantee that holds for the infinite space of all possible programs.
Stack Effects
Each operation has a predictable effect on stack depth. Push and dup add one element; pop, add, and mul remove one (add and mul consume two and produce one). We can compute the total effect of a program statically:
def effect : Op → Int
| .push _ => 1
| .pop => -1
| .add => -1
| .mul => -1
| .dup => 1
def totalEffect : List Op → Int
| [] => 0
| op :: ops => effect op + totalEffect ops
theorem effect_append (p1 p2 : List Op) :
totalEffect (p1 ++ p2) = totalEffect p1 + totalEffect p2 := by
induction p1 with
| nil => simp [totalEffect]
| cons op ops ih => simp [totalEffect, ih]; ring
The effect_append theorem proves that stack effects compose additively. If program p1 changes the stack depth by n and p2 changes it by m, then p1 ++ p2 changes it by n + m. This is another universal property, holding for all programs.
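For instance, both of the following programs have a net effect of one, computed without executing anything (illustrative evaluations, assuming the definitions above):
-- Illustrative: net stack-depth change, computed statically
#eval totalEffect [.push 3, .push 4, .add] -- 1
#eval totalEffect [.push 5, .dup, .mul] -- 1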
Program Equivalence
We can also prove that certain program transformations preserve semantics. Addition and multiplication are commutative, so swapping the order of pushes does not change the result:
theorem add_comm (n m : Int) (rest : List Op) (s : List Int) :
run (.push n :: .push m :: .add :: rest) s =
run (.push m :: .push n :: .add :: rest) s := by
simp [run, Int.add_comm]
theorem mul_comm (n m : Int) (rest : List Op) (s : List Int) :
run (.push n :: .push m :: .mul :: rest) s =
run (.push m :: .push n :: .mul :: rest) s := by
simp [run, Int.mul_comm]
theorem dup_add_eq_double (n : Int) (rest : List Op) (s : List Int) :
run (.push n :: .dup :: .add :: rest) s =
run (.push (n + n) :: rest) s := by
simp [run]
theorem dup_mul_eq_square (n : Int) (rest : List Op) (s : List Int) :
run (.push n :: .dup :: .mul :: rest) s =
run (.push (n * n) :: rest) s := by
simp [run]
These theorems justify program transformations. An optimizer that reorders pushes before adds is provably correct. The dup_add_eq_double and dup_mul_eq_square theorems show that push n; dup; add computes 2n and push n; dup; mul computes n². A compiler could use these equivalences for strength reduction.
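As a sketch of how such an optimizer might look (a hypothetical helper, not part of the article's codebase), a one-pass peephole rewriter applies the dup equivalences at the head of the instruction stream; a full correctness proof would proceed by induction, in the same spirit as constFold_correct:
-- Hypothetical peephole pass, justified by dup_add_eq_double and dup_mul_eq_square
def peephole : List Op → List Op
  | .push n :: .dup :: .add :: rest => .push (n + n) :: peephole rest
  | .push n :: .dup :: .mul :: rest => .push (n * n) :: peephole rest
  | op :: rest => op :: peephole rest
  | [] => []
#eval run (peephole [.push 5, .dup, .mul]) [] -- [25]
#eval run [.push 5, .dup, .mul] [] -- [25]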
What We Proved
The stack machine demonstrates verification of universal properties. We proved that running concatenated programs equals sequential execution (composition), that stack effects compose predictably (effect additivity), that push order does not affect addition or multiplication (commutativity), and that certain instruction sequences compute the same result (equivalences).
These theorems quantify over the entire space of programs, unlike tests of specific inputs. The composition theorem alone covers infinitely many cases that no test suite could enumerate. A passing test establishes an existential claim (“there exists an input where the program works”), while a theorem establishes a universal claim (“for all inputs, the program works”). Tests sample behavior, proofs characterize it completely.
The Verification Gap
Everything so far lives entirely within Lean. The interpreter is correct by construction. The compiler preserves semantics. The parser carries its proof. The stack machine obeys universal laws. These are real theorems about real programs. And yet they share a fundamental limitation: the verified code and the production code are the same code. There is no gap to bridge because there is no bridge to cross.
Real systems are not written in Lean. They are written in Rust, C, Go, or whatever language the team knows and the platform demands. The next article explores how to bridge the gap between a verified model and a production implementation.
Model Checking
The previous article demonstrated verification techniques where everything lives within Lean. But real systems are not written in Lean. They are written in Rust, C, Go, or whatever language the team knows and the platform demands. The gap between a verified model and a production implementation is where bugs hide. A correct specification means nothing if the implementation diverges from it.
This article explores how to bridge that gap using bounded model checking and verification-guided development.
Conway’s Game of Life
To see the verification gap in concrete terms, consider Conway’s Game of Life, a zero-player game that evolves on an infinite grid. Each cell is either alive or dead, and at each step every cell is updated according to simple rules based on its eight surrounding neighbors.
The rules: a live cell with two or three live neighbors survives, a dead cell with exactly three live neighbors becomes alive, and everything else dies. From these rules emerges startling complexity: oscillators, spaceships, and patterns that compute arbitrary functions.
The Game of Life is an excellent verification target because we can prove properties about specific patterns without worrying about the infinite grid. The challenge is that the true Game of Life lives on an unbounded plane, which we cannot represent directly. We need a finite approximation that preserves the local dynamics.
The standard solution is a toroidal grid. Imagine taking a rectangular grid and gluing the top edge to the bottom edge, forming a cylinder. Then glue the left edge to the right edge, forming a torus. Geometrically, this is the surface of a donut. A cell at the right edge has its eastern neighbor on the left edge. A cell at the top has its northern neighbor at the bottom. Every cell has exactly eight neighbors, with no special boundary cases.
This topology matters for verification. On a bounded grid with walls, edge cells would have fewer neighbors, changing their evolution rules. We would need separate logic for corners, edges, and interior cells. The toroidal topology eliminates this complexity: the neighbor-counting function is uniform across all cells. More importantly, patterns that fit within the grid and do not interact with their wrapped-around selves behave exactly as they would on the infinite plane. A 5x5 blinker on a 10x10 torus evolves identically to a blinker on the infinite grid, because the pattern never grows large enough to meet itself coming around the other side.
abbrev Grid := Array (Array Bool)
def Grid.mk (n m : Nat) (f : Fin n → Fin m → Bool) : Grid :=
Array.ofFn fun i => Array.ofFn fun j => f i j
def Grid.get (g : Grid) (i j : Nat) : Bool :=
if h₁ : i < g.size then
let row := g[i]
if h₂ : j < row.size then row[j] else false
else false
def Grid.dead (n m : Nat) : Grid :=
Array.replicate n (Array.replicate m false)
def Grid.rows (g : Grid) : Nat := g.size
def Grid.cols (g : Grid) : Nat := if h : 0 < g.size then g[0].size else 0
The grid representation uses arrays of arrays, with accessor functions that handle boundary conditions. The countNeighbors function implements toroidal wrapping by computing indices modulo the grid dimensions.
def Grid.countNeighbors (g : Grid) (i j : Nat) : Nat :=
let n := g.rows
let m := g.cols
let deltas : List (Int × Int) :=
[(-1, -1), (-1, 0), (-1, 1),
(0, -1), (0, 1),
(1, -1), (1, 0), (1, 1)]
deltas.foldl (fun acc (di, dj) =>
let ni := (((i : Int) + di + n) % n).toNat
let nj := (((j : Int) + dj + m) % m).toNat
if g.get ni nj then acc + 1 else acc) 0
The step function applies Conway’s rules to every cell. The pattern matching encodes the survival conditions directly: a live cell survives with 2 or 3 neighbors, a dead cell is born with exactly 3 neighbors.
def Grid.step (g : Grid) : Grid :=
let n := g.rows
let m := g.cols
Array.ofFn fun (i : Fin n) => Array.ofFn fun (j : Fin m) =>
let neighbors := g.countNeighbors i.val j.val
let alive := g.get i.val j.val
match alive, neighbors with
| true, 2 => true
| true, 3 => true
| false, 3 => true
| _, _ => false
def Grid.stepN (g : Grid) : Nat → Grid
| 0 => g
| k + 1 => (g.step).stepN k
Now for the fun part. We can define famous patterns and prove properties about them.
The blinker is a period-2 oscillator: three cells in a row that flip between horizontal and vertical orientations, then back again.
The block is a 2x2 square that never changes. Each live cell has exactly three neighbors, so all survive. No dead cell has exactly three live neighbors, so none are born.
The glider is the star of our show. It is a spaceship: a pattern that translates across the grid. After four generations, the glider has moved one cell diagonally.
After generation 4, the pattern is identical to generation 0, but shifted one cell down and one cell right. The glider crawls across the grid forever.
-- John Conway (1937-2020) invented this cellular automaton in 1970.
def blinker : Grid := Grid.mk 5 5 fun i j =>
(i.val, j.val) ∈ [(1, 2), (2, 2), (3, 2)]
def blinkerPhase2 : Grid := Grid.mk 5 5 fun i j =>
(i.val, j.val) ∈ [(2, 1), (2, 2), (2, 3)]
def glider : Grid := Grid.mk 6 6 fun i j =>
(i.val, j.val) ∈ [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
def gliderTranslated : Grid := Grid.mk 6 6 fun i j =>
(i.val, j.val) ∈ [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
def block : Grid := Grid.mk 4 4 fun i j =>
(i.val, j.val) ∈ [(1, 1), (1, 2), (2, 1), (2, 2)]
Here is where theorem proving earns its keep. We can prove that the blinker oscillates with period 2, that the block is stable, and that the glider translates after exactly four generations.
theorem blinker_oscillates : blinker.step = blinkerPhase2 := by native_decide
theorem blinker_period_2 : blinker.stepN 2 = blinker := by native_decide
theorem glider_translates : glider.stepN 4 = gliderTranslated := by native_decide
theorem block_is_stable : block.step = block := by native_decide
The native_decide tactic does exhaustive computation. Lean evaluates the grid evolution and confirms the equality. The proof covers every cell in the grid across the specified number of generations.
We have formally verified that a glider translates diagonally after four steps. Every cellular automaton enthusiast knows this empirically, having watched countless gliders march across their screens. But we have proven it. The glider must translate. It is not a bug that the pattern moves; it is a theorem. (Readers of Greg Egan’s Permutation City may appreciate that we are now proving theorems about the computational substrate in which his characters would live.)
We can also verify that the blinker conserves population, and observe that the glider does too:
def Grid.population (g : Grid) : Nat :=
g.foldl (fun acc row => row.foldl (fun acc cell => if cell then acc + 1 else acc) acc) 0
#eval blinker.population
#eval blinker.step.population
#eval glider.population
#eval glider.step.population
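Because population is computable, the conservation claims for these specific patterns can also be stated as machine-checked theorems (illustrative additions, following the same native_decide pattern as above):
-- Illustrative: population conservation for these specific patterns
theorem blinker_conserves_population : blinker.step.population = blinker.population := by
  native_decide
theorem glider_conserves_population : (glider.stepN 4).population = glider.population := by
  native_decide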
For visualization, we can print the grids:
def Grid.toString (g : Grid) : String :=
String.intercalate "\n" <|
g.toList.map fun row =>
String.mk <| row.toList.map fun cell =>
if cell then '#' else '.'
#eval IO.println blinker.toString
#eval IO.println blinker.step.toString
#eval IO.println glider.toString
#eval IO.println (glider.stepN 4).toString
The Gap Made Concrete
Here is the sobering reality. We have a beautiful proof that gliders translate. The Lean model captures Conway’s rules precisely. The theorems are watertight. And yet, if someone writes a Game of Life implementation in Rust, our proofs say nothing about it.
The Rust implementation in examples/game-of-life/ implements the same rules. It has the same step function, the same neighbor counting, the same pattern definitions. Run it and you will see blinkers blink and gliders glide. But the Lean proofs do not transfer automatically. The Rust code might have off-by-one errors in the wrap-around logic. It might use different integer semantics. It might have subtle bugs in edge cases that our finite grid proofs never exercise.
This is the central problem of software verification. Writing proofs about mathematical models is satisfying but insufficient. Real software runs on real hardware with real bugs. The gap matters most where the stakes are highest: matching engines that execute trades, auction mechanisms that allocate resources, systems where a subtle bug can cascade into market-wide failures.
How do we bridge the gap between a verified model and a production implementation?
Verification-Guided Development
The answer comes from verification-guided development. The approach has three components. First, write the production implementation in your target language. Second, transcribe the core logic into Lean as a pure functional program. Third, prove properties about the Lean model; the proofs transfer to the production code because the transcription is exact. This technique was developed by AWS for their Cedar policy language, and it applies wherever a functional core can be isolated from imperative scaffolding.
The transcription must be faithful. Every control flow decision in the Rust code must have a corresponding decision in the Lean model. Loops become recursion. Mutable state becomes accumulator parameters. Early returns become validity flags. When the transcription is exact, we can claim that the Lean proofs apply to the Rust implementation.
To verify this correspondence, both systems produce execution traces. A trace records the state after each operation. If the Rust implementation and the Lean model produce identical traces on all inputs, the proof transfers. For finite input spaces, we can verify this exhaustively. For infinite spaces, we can sometimes prove that bounded testing implies unbounded correctness, as we will see with the circuit breaker’s uniformity theorem.
Bounded Model Checking
Many real systems require state machines with complex transition rules: network protocols, payment processing, order lifecycles, and resilience patterns. How do we connect a verified Lean model to a production Rust implementation with strong guarantees?
The circuit breaker pattern prevents cascading failures in distributed systems. When a service starts failing, the circuit breaker “trips open” to block requests, giving the service time to recover. After a timeout, it allows a test request through. If the test succeeds, the circuit closes and normal operation resumes. If the test fails, the circuit stays open.
The key insight is that each state carries different data. A closed breaker tracks failure count. An open breaker tracks when it opened (for timeout calculation). A half-open breaker needs no extra data.
structure Config where
threshold : Nat
timeout : Nat
deriving DecidableEq, Repr
inductive State where
| closed (failures : Nat) : State
| opened (openedAt : Nat) : State
| halfOpen : State
deriving DecidableEq, Repr
Events trigger transitions between states:
inductive Event where
| success : Event
| failure (time : Nat) : Event
| tick (time : Nat) : Event
| probeSuccess (time : Nat) : Event
| probeFailure (time : Nat) : Event
deriving DecidableEq, Repr
The Step Function
The entire verification approach centers on one function: step. This single function defines all circuit breaker behavior. Both Lean proofs and Rust verification target this exact definition.
def step (cfg : Config) (s : State) (e : Event) : State :=
match s, e with
| .closed _, .success => .closed 0
| .closed failures, .failure time =>
if failures + 1 >= cfg.threshold then .opened time
else .closed (failures + 1)
| .opened openedAt, .tick time =>
if time - openedAt >= cfg.timeout then .halfOpen
else .opened openedAt
| .halfOpen, .probeSuccess _ => .closed 0
| .halfOpen, .probeFailure time => .opened time
| s, _ => s
This is the source of truth. Every property we prove, every test we run, every guarantee we claim flows from this definition. The function is pure, total, and deterministic.
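Because step is pure and total, replaying a scenario is just a fold over an event list. The helper below is an illustrative addition, not part of the article's verification pipeline:
-- Illustrative: replay a whole scenario through the verified step function
def runEvents (cfg : Config) (s : State) : List Event → State
  | [] => s
  | e :: es => runEvents cfg (step cfg s e) es
-- threshold 3, timeout 5: three failures trip the breaker open,
-- a tick past the timeout moves it to halfOpen, and a successful probe closes it
#eval runEvents ⟨3, 5⟩ (.closed 0) [.failure 1, .failure 2, .failure 3, .tick 9, .probeSuccess 10]
-- closed with 0 failures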
Proving Invariants
The state invariant says that closed circuits never accumulate failures beyond the threshold. Once failures reach the threshold, the circuit must trip open.
def Invariant (cfg : Config) : State → Prop
| .closed failures => failures < cfg.threshold
| .opened _ => True
| .halfOpen => True
theorem initial_invariant (cfg : Config) (h : cfg.threshold > 0) :
Invariant cfg initial := h
theorem step_preserves_invariant (cfg : Config) (s : State) (e : Event)
(hinv : Invariant cfg s) (hpos : cfg.threshold > 0) :
Invariant cfg (step cfg s e) := by
cases s with
| closed failures =>
cases e with
| success => simp [step, Invariant, hpos]
| failure time =>
simp only [step]
split
· simp [Invariant]
· simp [Invariant]; omega
| tick _ => simp [step, Invariant]; exact hinv
| probeSuccess _ => simp [step, Invariant]; exact hinv
| probeFailure _ => simp [step, Invariant]; exact hinv
| opened openedAt =>
cases e with
| tick time =>
simp only [step]
split <;> simp [Invariant]
| _ => simp [step, Invariant]
| halfOpen =>
cases e with
| probeSuccess _ => simp [step, Invariant, hpos]
| probeFailure _ => simp [step, Invariant]
| _ => simp [step, Invariant]
We prove specific transition properties too. Success resets failures. Reaching the threshold trips the circuit. The timeout transitions to half-open. These theorems are definitionally true, following directly from the structure of step:
theorem success_resets (cfg : Config) (f : Nat) :
step cfg (.closed f) .success = .closed 0 := rfl
theorem threshold_trips (cfg : Config) (f t : Nat) (h : f + 1 >= cfg.threshold) :
step cfg (.closed f) (.failure t) = .opened t := by simp [step, h]
theorem below_threshold_increments (cfg : Config) (f t : Nat) (h : f + 1 < cfg.threshold) :
step cfg (.closed f) (.failure t) = .closed (f + 1) := by
simp [step]; omega
theorem timeout_transitions (cfg : Config) (o t : Nat) (h : t - o >= cfg.timeout) :
step cfg (.opened o) (.tick t) = .halfOpen := by simp [step, h]
theorem probe_success_closes (cfg : Config) (t : Nat) :
step cfg .halfOpen (.probeSuccess t) = .closed 0 := rfl
theorem probe_failure_reopens (cfg : Config) (t : Nat) :
step cfg .halfOpen (.probeFailure t) = .opened t := rfl
Predicate-Determined State Machines
Before presenting the main theorem, we need to understand why bounded testing can work at all for this system. The answer lies in a structural property: the circuit breaker is predicate-determined.
Look carefully at the step function. It makes exactly two comparisons: failures + 1 >= threshold (should the circuit trip?) and time - openedAt >= timeout (has the timeout elapsed?). Everything else is pattern matching on constructors. The function does not compute with the numeric values beyond these two boolean tests. It does not add timeout to threshold. It does not multiply failure counts. It does not branch on whether a timestamp is even or odd. The values flow through the function, but only these two predicates determine the control flow.
Contrast this with a function that lacks this structure:
step(count, Increment) = count + 1
Here the output depends on the magnitude of count, not just on a comparison. Testing with count=0, 1, 2, 3 tells us nothing about count=1000000. The function performs arithmetic that directly affects the output, creating infinitely many distinct behaviors.
The circuit breaker avoids this trap. When it stores failures + 1 in the new state, that value flows through unchanged until the next comparison. The function never computes failures * 2 or threshold - failures. Values are compared and stored, never combined arithmetically.
This structure has a profound consequence: if two inputs produce the same boolean comparison results, they must produce the same output constructor. With threshold=3 and failures=2, the comparison failures + 1 >= threshold yields true. With threshold=1000000 and failures=999999, the same comparison also yields true. Both inputs take the same branch. Both produce an Open state. The actual magnitudes do not matter; only the boolean outcomes do.
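Two evaluations make the point concrete (illustrative, assuming the definitions above):
-- Illustrative: same comparison outcome, same output constructor
#eval step ⟨3, 5⟩ (.closed 2) (.failure 7) -- opened at time 7
#eval step ⟨1000000, 5⟩ (.closed 999999) (.failure 7) -- opened at time 7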
The Uniformity Theorem
The predicate-determined structure enables a remarkable theorem. We formalize the observation above as the uniformity theorem. In equational form:
\[ \text{kind}(s_1) = \text{kind}(s_2) \land \text{kind}(e_1) = \text{kind}(e_2) \land \text{cmp}(s_1, e_1) = \text{cmp}(s_2, e_2) \]
\[ \implies \text{kind}(\text{step}(s_1, e_1)) = \text{kind}(\text{step}(s_2, e_2)) \]
where \(\text{kind}\) extracts the constructor (Closed, Open, or HalfOpen) and \(\text{cmp}\) extracts the boolean comparison results. The theorem says: inputs that agree on structure and comparisons produce outputs that agree on structure.
Put simply: the function does not do math with the numbers, it just asks “is this bigger than that?” Once you have tested both “yes” and “no” for each question, you have tested everything.
Proof sketch: The proof proceeds in three steps. First, we case-split on the state constructors. If the two states have different constructors (say, one is Closed and one is Open), the hypothesis sameStateKind s₁ s₂ = true is false, giving an immediate contradiction. This eliminates all off-diagonal cases. Second, for each diagonal case (both Closed, both Open, or both HalfOpen), we case-split on event constructors. Again, mismatched events contradict sameEventKind. Third, we are left with only the cases where step actually branches: (Closed, Failure) which checks the threshold, and (Open, Tick) which checks the timeout. For these, we case-split on whether each comparison is true or false. The hypothesis hsame_cmp says the comparisons have the same boolean result, so if they disagree we have a contradiction. If they agree, both calls to step take the same branch and produce outputs with the same constructor.
def sameStateKind : State → State → Bool
| .closed _, .closed _ => true
| .opened _, .opened _ => true
| .halfOpen, .halfOpen => true
| _, _ => false
def sameEventKind : Event → Event → Bool
| .success, .success => true
| .failure _, .failure _ => true
| .tick _, .tick _ => true
| .probeSuccess _, .probeSuccess _ => true
| .probeFailure _, .probeFailure _ => true
| _, _ => false
structure ComparisonResults where
thresholdReached : Bool
timeoutElapsed : Bool
deriving DecidableEq, Repr
def getComparisons (cfg : Config) (s : State) (e : Event) : ComparisonResults :=
match s, e with
| .closed f, .failure _ => ⟨f + 1 >= cfg.threshold, false⟩
| .opened o, .tick t => ⟨false, t - o >= cfg.timeout⟩
| _, _ => ⟨false, false⟩
theorem uniformity (cfg₁ cfg₂ : Config) (s₁ s₂ : State) (e₁ e₂ : Event)
(hsame_state : sameStateKind s₁ s₂)
(hsame_event : sameEventKind e₁ e₂)
(hsame_cmp : getComparisons cfg₁ s₁ e₁ = getComparisons cfg₂ s₂ e₂) :
sameStateKind (step cfg₁ s₁ e₁) (step cfg₂ s₂ e₂) := by
-- Step 1: Case split on state constructors. Off-diagonal cases (e.g., closed vs opened)
-- are contradictions since hsame_state requires matching constructors.
cases s₁ <;> cases s₂ <;> simp_all [sameStateKind]
-- Step 2: For each diagonal case (closed/closed, opened/opened, halfOpen/halfOpen),
-- split on event constructors. Again, off-diagonal cases contradict hsame_event.
all_goals cases e₁ <;> cases e₂ <;> simp_all [sameEventKind, step, getComparisons]
-- Step 3: Two cases remain where step branches on comparisons:
-- (closed, failure) branches on threshold check
case closed.closed.failure.failure f₁ f₂ t₁ t₂ =>
-- hsame_cmp says both threshold comparisons have the same boolean result.
-- Case split on whether each threshold is reached.
by_cases h₁ : f₁ + 1 >= cfg₁.threshold <;>
by_cases h₂ : f₂ + 1 >= cfg₂.threshold <;>
-- When h₁ and h₂ disagree, hsame_cmp gives a contradiction.
-- When they agree, both step to the same constructor.
simp only [h₂, ↓reduceIte] <;> simp_all
-- (opened, tick) branches on timeout check
case opened.opened.tick.tick o₁ o₂ t₁ t₂ =>
-- Same logic: hsame_cmp forces timeout comparisons to agree.
by_cases h₁ : t₁ - o₁ >= cfg₁.timeout <;>
by_cases h₂ : t₂ - o₂ >= cfg₂.timeout <;>
simp only [h₂, ↓reduceIte] <;> simp_all
The theorem states: if two inputs have the same state kind (both Closed, both Open, or both HalfOpen), the same event kind, and the same comparison results, then the outputs have the same state kind. The proof proceeds by exhaustive case analysis on state and event constructors, then shows that matching comparison results force matching output constructors.
Bounded Verification
The comparison outcomes partition the infinite input space into equivalence classes. All inputs where failures + 1 >= threshold is true behave identically (modulo the specific values stored). All inputs where it is false behave identically. Since there are only two comparisons, each boolean, there are at most four equivalence classes per (state kind, event kind) pair.
To verify the implementation for all inputs, we only need to test representatives from each equivalence class. A threshold of 3 with 2 failures represents all cases where the threshold is reached. A threshold of 3 with 0 failures represents all cases where it is not. Testing both covers the infinite space of threshold/failure combinations.
The uniformity theorem provides a mathematical proof that the equivalence classes are complete, eliminating sampling and heuristics. If an implementation passes tests covering all equivalence classes, it is correct for all inputs. Bounded testing with small values that hit both true and false for each comparison proves correctness for all values.
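Concretely, two small representatives already cover both classes for the (closed, failure) branch, and the comparison record is all the step function can observe (illustrative evaluations):
-- Illustrative: representatives of the two (closed, failure) equivalence classes
#eval getComparisons ⟨3, 5⟩ (.closed 2) (.failure 7) -- thresholdReached := true
#eval getComparisons ⟨3, 5⟩ (.closed 0) (.failure 7) -- thresholdReached := false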
Where Bounded Model Checking Applies
Many real-world state machines share this predicate-determined structure. Protocol state machines like TCP transition based on flags and sequence number comparisons, not on packet payload arithmetic; a SYN-RECEIVED state becomes ESTABLISHED when ACK is set, regardless of sequence number magnitudes. Business rule engines for order lifecycles (pending, confirmed, shipped, delivered) transition on event types and threshold comparisons like “payment received” or “inventory available,” not on order total arithmetic. Access control systems depend on role membership and policy predicates, not on computing with user IDs. Rate limiters using token buckets transition on “tokens available >= cost” comparisons where the exact count matters only for that boolean test.
For any such system, bounded model checking can provide complete verification. The recipe is straightforward: identify all comparisons in the transition function, prove (or convince yourself) that behavior depends only on comparison outcomes, generate test cases covering all combinations of comparison outcomes, and verify the implementation against these cases.
Where It Does Not Apply
The uniformity property does not hold for systems where output depends on arithmetic over unbounded values. Counters and accumulators that sum transaction amounts cannot be verified by bounded testing; the sum of [1, 2, 3] tells us nothing about [1000000, 2000000]. Cryptographic functions like hashes and encryption depend intimately on bit-level arithmetic where small inputs reveal nothing about large ones. Numerical algorithms involving floating-point, matrix operations, or differential equations have behaviors that depend on magnitude, precision, and numerical stability. Recursive depth matters too: a function that changes behavior at depth 1000 cannot be verified by testing to depth 100. Overflow-sensitive code is particularly treacherous; if the implementation uses fixed-width integers that overflow, the Lean model (using mathematical naturals) diverges at the overflow boundary, and bounded testing might miss the case entirely.
The uniformity theorem gives us a criterion: can you factor the transition function into (1) comparisons that produce booleans, and (2) value shuffling that stores results without arithmetic? If yes, bounded model checking works. If no, you need different techniques.
The Deeper Principle
The uniformity theorem exemplifies a broader principle in verification: exploit structure to reduce infinite problems to finite ones.
Note
The key insight: The circuit breaker’s predicate-determined structure lets us collapse an infinite input space into finitely many equivalence classes. This is non-trivial and depends on the specific structure of this problem. Not all state machines admit such a reduction. The uniformity theorem is a precise statement of why this particular system has this property: because step branches only on boolean comparisons, never on arithmetic over values. Systems that compute with their inputs (counters, accumulators, cryptographic functions) do not have this structure and cannot be verified this way.
Other structures enable other reductions:
- Symmetry: If a function treats all elements of a set uniformly, test one representative
- Monotonicity: If a function is monotonic, test boundary cases
- Compositionality: If a function composes smaller functions, verify the pieces
The art of verification is recognizing which structures your system has and exploiting them appropriately. For predicate-determined state machines, bounded model checking provides complete verification, justified by mathematical proof.
Test Generation
The uniformity theorem justifies generating exhaustive test cases within bounds. We enumerate all states, events, and configurations:
structure Bounds where
maxThreshold : Nat := 4
maxTimeout : Nat := 10
maxTime : Nat := 20
structure TestCase where
threshold : Nat
timeout : Nat
state : State
event : Event
expected : State
deriving Repr
This generates
\[ \sum_{t=1}^{4} 10 \times (t + 22) \times 85 = 83{,}300 \]
test cases: for each threshold \(t\), we have 10 timeouts, \(t + 22\) states (\(t\) closed states plus 21 open states plus half-open), and 85 events. Each test case records the expected output state computed by Lean’s step function. These cases are exported to JSON for Rust consumption.
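The counts in the formula come from a straightforward enumeration. The sketch below is a hypothetical version of that enumerator; the repository's actual generator also pairs each case with its expected output and serializes to JSON, which is omitted here:
-- Hypothetical enumerators matching the counts in the formula above
def allEvents (maxTime : Nat) : List Event :=
  [.success]
    ++ (List.range (maxTime + 1)).map (fun t => Event.failure t)
    ++ (List.range (maxTime + 1)).map (fun t => Event.tick t)
    ++ (List.range (maxTime + 1)).map (fun t => Event.probeSuccess t)
    ++ (List.range (maxTime + 1)).map (fun t => Event.probeFailure t)
def allStates (threshold maxTime : Nat) : List State :=
  (List.range threshold).map (fun f => State.closed f)
    ++ (List.range (maxTime + 1)).map (fun o => State.opened o)
    ++ [State.halfOpen]
#eval (allEvents 20).length -- 85
#eval (allStates 3 20).length -- 25, that is, t + 22 with t = 3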
The Rust Implementation
The Rust step function must exactly match Lean’s semantics. This is the verified core:
#[allow(clippy::match_same_arms)]
pub fn step(threshold: u64, timeout: u64, state: State, event: &Event) -> State {
match (state, event) {
(State::Closed(_), Event::Success) => State::Closed(0),
(State::Closed(failures), Event::Failure(time)) => {
if failures + 1 >= threshold {
State::Open(*time)
} else {
State::Closed(failures + 1)
}
}
(State::Open(opened_at), Event::Tick(time)) => {
if time.saturating_sub(opened_at) >= timeout {
State::HalfOpen
} else {
State::Open(opened_at)
}
}
(State::HalfOpen, Event::ProbeSuccess(_)) => State::Closed(0),
(State::HalfOpen, Event::ProbeFailure(time)) => State::Open(*time),
(s, _) => s,
}
}
Note the use of saturating_sub for the timeout check. Lean’s natural number subtraction is saturating (returns 0 for negative results), so Rust must use the same semantics to match.
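The semantics being matched is visible directly in Lean:
-- Nat subtraction truncates at zero, matching Rust's saturating_sub
#eval (3 - 5 : Nat) -- 0
#eval (5 - 3 : Nat) -- 2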
The Typestate API
The Rust typestate pattern provides an ergonomic API with compile-time state transition safety. The key insight is that every method calls the verified step function internally:
pub fn record_failure(self, now: u64) -> Result<Self, CircuitBreaker<Open>> {
let new_state = step(
self.threshold,
self.timeout,
self.state,
&Event::Failure(now),
);
match new_state {
State::Closed(_) => Ok(Self {
state: new_state,
..self
}),
State::Open(_) => Err(CircuitBreaker {
threshold: self.threshold,
timeout: self.timeout,
state: new_state,
_marker: PhantomData,
}),
State::HalfOpen => unreachable!(),
}
}
Invalid transitions are compile errors. You cannot call record_failure on a CircuitBreaker<Open>. You cannot call check_timeout on a CircuitBreaker<Closed>. The type system enforces the state machine protocol at compile time.
Exhaustive Testing
The Rust test loads all 83,300 test cases and verifies exact correspondence:
#[test]
fn exhaustive_lean_equivalence() {
let compressed = include_bytes!(concat!(
env!("CARGO_MANIFEST_DIR"),
"/testdata/exhaustive_tests.json.gz"
));
let mut decoder = GzDecoder::new(&compressed[..]);
let mut json = String::new();
decoder.read_to_string(&mut json).expect("valid gzip");
let cases: Vec<ExhaustiveTestCase> =
serde_json::from_str(&json).expect("valid exhaustive test JSON");
for case in &cases {
let actual = step(case.threshold, case.timeout, case.state, &case.event);
assert_eq!(
actual, case.expected,
"threshold={}, timeout={}, state={:?}, event={:?}",
case.threshold, case.timeout, case.state, case.event
);
}
}
The test performs exhaustive verification within bounds, covering every combination of (threshold 1-4, timeout 1-10, state, event). The uniformity theorem guarantees that if all bounded cases pass, the unbounded implementation is correct. The full Rust source is available on GitHub.
Where Trust Lives
The verification pipeline has three stages, and each introduces its own risks. Understanding where trust lies is essential to assessing the strength of the overall guarantee.
Model and Transcription Risk
The Lean model must faithfully capture the intent of the Rust implementation. Unlike systems like CompCert or Coq’s extraction mechanism, there is no automatic verified extraction from Lean to Rust. The correspondence relies on manual transcription. If the programmer makes a mistake in the transcription, a correct Lean proof says nothing about the incorrect Rust code.
The typestate API adds another layer. The ergonomic wrapper around the verified step function is verified only through unit tests, not exhaustive model checking. A bug in how the wrapper invokes step would compromise the guarantee.
Execution Equivalence Risk
Rust and Lean have different runtime semantics. Rust’s saturating_sub matches Lean’s natural number subtraction, but this correspondence is verified by testing, not by formal proof. A different integer type or subtraction operation could break the equivalence silently.
Integer overflow is particularly treacherous. Lean uses unbounded natural numbers; Rust uses fixed-width integers. If the implementation overflows where the model does not, bounded testing might miss the divergence entirely. The circuit breaker avoids this by keeping all values small, but the risk remains for systems with larger numeric ranges.
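A two-line illustration of the asymmetry (UInt8 stands in here for Rust's fixed-width types; the circuit breaker itself never approaches these bounds):
#eval 2 ^ 64             -- 18446744073709551616: Nat never overflows
#eval (255 : UInt8) + 1  -- 0: fixed-width arithmetic wraps silently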
Testing Infrastructure Risk
The verification pipeline includes components that must simply be trusted: the JSON serialization layer that exports test cases from Lean, the serde deserialization that reads them in Rust, and the file I/O that moves data between systems. A bug in any of these components could cause false positives, reporting that tests pass when the implementations actually diverge.
Defense in Depth
Despite these risks, the approach provides strong guarantees through layered defenses. The Lean model is provably correct: invariant preservation and the uniformity theorem are machine-checked proofs. The Rust step function is verified against 83,300 exhaustive test cases. The typestate API prevents invalid transitions at compile time. No single layer is impenetrable, but an attacker (or a bug) would need to defeat multiple independent mechanisms to produce an incorrect result.
The conjunction of all guarantees is captured in a single metatheorem:
theorem correctness (cfg : Config) (hpos : cfg.threshold > 0) :
(Invariant cfg initial) ∧
(∀ s e, Invariant cfg s → Invariant cfg (step cfg s e)) ∧
(∀ cfg₂ s₁ s₂ e₁ e₂,
sameStateKind s₁ s₂ → sameEventKind e₁ e₂ →
getComparisons cfg s₁ e₁ = getComparisons cfg₂ s₂ e₂ →
sameStateKind (step cfg s₁ e₁) (step cfg₂ s₂ e₂)) :=
⟨initial_invariant cfg hpos,
fun s e hinv => step_preserves_invariant cfg s e hinv hpos,
fun _ _ _ _ _ hs he hc => uniformity cfg _ _ _ _ _ hs he hc⟩
This theorem is the “golden assertion” of the circuit breaker: the initial state is valid, every transition preserves validity, and behavior depends only on comparison outcomes. If this theorem compiles, the model is correct.
Closing Thoughts
Why do we prove properties rather than test for them? Rice's theorem, from his 1953 paper Classes of Recursively Enumerable Sets and Their Decision Problems, provides the fundamental answer: every non-trivial semantic property of programs is undecidable. You cannot write a program that decides whether other programs halt, are correct, never access null, or satisfy any interesting behavioral property. The proof is by reduction from the halting problem. Verification escapes this limitation by requiring human-provided proofs that the compiler can check, rather than trying to infer properties automatically.
The examples in this series form a hierarchy of verification strength, from weakest to strongest:
- Game of Life: native_decide exhaustively checks specific finite patterns (gliders glide, blinkers blink), but the guarantees cover only those patterns and only the Lean model.
- Proof-carrying parsers: Soundness by construction within Lean, with evidence built alongside computation, though again confined to the Lean model.
- Intrinsically-typed interpreter: Ill-typed programs are unrepresentable, a structural guarantee that eliminates entire classes of bugs but only within Lean’s type system.
- Verified compiler: Semantic preservation universally over all expressions; compiled code produces the same result as interpretation. A stronger claim that quantifies over infinite inputs but remains Lean-only.
- Stack machine: Universal theorems (composition, commutativity, effect additivity) quantify over infinite program spaces with no external transfer.
- Circuit breaker: The uniformity theorem mathematically justifies that bounded testing covers unbounded inputs, enabling Lean proofs to transfer to a Rust implementation via exhaustive model checking. Only this example bridges the verification gap to production code.
Each example illustrates a different verification technique. The Game of Life and verified compiler use native_decide for exhaustive finite computation: Lean evaluates both sides and confirms equality, proof by brute force rather than insight. The stack machine uses structural induction to prove universal properties over infinite program spaces. The circuit breaker combines both: structural induction proves the uniformity theorem, which then justifies exhaustive finite testing as a complete verification technique.
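A minimal side-by-side sketch of the two styles, using standalone examples rather than the chapters' own code:
-- Brute force: the proposition is closed and decidable, so just evaluate it
example : (List.range 100).foldl (· + ·) 0 = 4950 := by native_decide
-- Insight: structural induction proves a claim about every natural number
example (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]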
The circuit breaker also demonstrates verification-guided development: we do not verify the Rust code directly. Rust’s ownership system, borrow checker, and imperative features make direct verification impractical. Instead, we carve out the functional core, transcribe it to Lean, prove properties there, and transfer the proofs back through exhaustive testing. The verification gap closes through disciplined transcription and bounded model checking justified by mathematical proof.
The techniques scale far beyond toy examples. Financial systems are a particularly compelling domain: matching engines, order books, and clearing systems where bugs can trigger flash crashes or expose participants to unbounded losses. Trading systems are state machines at heart, and the state machines that move money tend to be predicate-determined in exactly the way that makes bounded model checking viable. The theorems exist in papers, and the implementations exist in production. Verification-guided development bridges them.
Artificial Intelligence
In 2024, a computer solved one of the hardest problems at the International Mathematical Olympiad with a formally verified proof. In 2025, another hit gold-medal standard. The proofs were checked down to the axioms. No trust required. The interesting part is not that machines beat humans at competition math. The interesting part is that reason is becoming executable at scale, and that will change the world. How quickly is an open question.
The Current State
Mathlib now contains 1.9 million lines of formalized mathematics spanning algebra, analysis, topology, and number theory. It grows by thousands of theorems monthly. No single person understands all of it, and no single person needs to. The theorem you formalize today may be imported by a researcher fifty years from now working on problems we cannot imagine. The proof will still check. Meanwhile, neural networks have learned to propose proof steps that formal systems verify. The model guesses, the kernel checks. DeepSeek-Prover and LeanDojo make this practical today. PhysLean is formalizing physics itself: Maxwell’s equations, quantum mechanics, field theory. The tooling has matured faster than most expected.
We should be honest about limits. Higher-order logic is undecidable. Church and Turing settled this in 1936. Formalization is expensive: the Polynomial Freiman-Ruzsa conjecture required 20,000 lines of Lean for a 50-page paper. Some domains resist entirely. Physics says “for large N” and expects you to understand. But within scope, something remarkable becomes possible: certainty. Not high confidence. Certainty. The proof typechecks or it does not.
The Stakes
This kind of work gets funded because it pushes the frontiers of human knowledge. Science foundations fund theorem provers because they see infrastructure for the future of mathematics. Trading firms fund them because they need systems that actually work. Both are right. Knight Capital lost $440 million in 45 minutes from a deployment bug. The code did exactly what it was written to do. It was simply written wrong. Formal methods address both failure modes: they verify that the code matches the specification, and they force you to make the specification precise enough to check in the first place. You cannot prove a theorem you do not understand. For firms whose existence depends on algorithm correctness, this discipline can be existential.
The Prover-Verifier Architecture
The breakthrough behind systems like DeepSeek-Prover-V2 is architectural. The key insight: wrap a neural network in a recursive loop where a formal system acts as judge.
The pipeline works as follows. A large model decomposes a theorem into lemmas. A smaller specialized model attempts to prove each lemma in Lean. The Lean compiler checks the proof. If it fails, the model retries with the error message as feedback. If it passes, the lemma is proven with certainty. The system synthesizes proven lemmas into a complete proof.
This architecture bounds the hallucination problem for formal domains. In standard RLHF, humans grade answers, which is noisy and expensive. In prover-verifier loops, the compiler provides a binary signal: the proof typechecks or it does not. This enables pure reinforcement learning. DeepSeek-R1-Zero learned to reason without any supervised fine-tuning, developing self-correction behaviors purely from trial and error against a verifier. The model discovered “aha moments” on its own.
The synthetic data flywheel accelerates this. DeepSeek generated millions of formal statements, verified them with Lean, and fed the correct proofs back into training. No human annotation required. The loop is self-reinforcing: better provers generate more training data, which trains better provers.
The search strategy matters. DeepSeek-Prover-V2 uses Monte Carlo Tree Search to explore proof paths, backtracking when a path fails. This is the same algorithmic family as AlphaGo, applied to theorem proving. Instead of evaluating board positions, the model evaluates partial proof states.
The broader hypothesis is inference-time scaling: rather than making models bigger (training-time compute), make them think longer (inference-time compute). Early results suggest that letting models reason for more tokens improves accuracy, at least on certain benchmarks. But the upper bound on this improvement remains an open question. Whether inference-time scaling continues to yield gains, or hits diminishing returns at some threshold, is something the next generation of models will determine empirically. Test-time search, parallel rollouts, and recursive self-correction all bet on this dynamic. The bet may pay off. It may not.
Open Problem: Honesty as Strategy
William Vickrey won the 1996 Nobel Prize in Economics for a deceptively simple idea. In an auction where the highest bidder wins but pays only the second-highest bid, lying about your value cannot help you. Truthful bidding is weakly dominant. It is a theorem, provable from first principles.
abbrev Bid := Nat
abbrev Value := Nat
/-- A sealed-bid auction between two bidders -/
structure Auction where
bid1 : Bid
bid2 : Bid
deriving DecidableEq, Repr
/-- Higher bid wins; ties favor bidder 1 -/
def winner (a : Auction) : Fin 2 :=
if a.bid1 ≥ a.bid2 then 0 else 1
/-- Second-price rule: winner pays the losing bid -/
def payment (a : Auction) : Nat :=
if a.bid1 ≥ a.bid2 then a.bid2 else a.bid1
The payoff structure captures the essence: you win if you outbid, but you pay what your opponent bid, not what you bid. Your bid determines whether you win. It does not determine what you pay.
/-- Payoff: value minus payment when winning, zero when losing -/
def payoff (value : Value) (a : Auction) : Int :=
if winner a = 0 then (value : Int) - (payment a : Int) else 0
/-- Bid truthfully: declare your actual value -/
def truthful (value : Value) (otherBid : Bid) : Auction :=
⟨value, otherBid⟩
/-- Bid strategically: declare something else -/
def strategic (altBid : Bid) (otherBid : Bid) : Auction :=
⟨altBid, otherBid⟩
We can already prove that honesty never loses money:
/-- Truthful bidding never loses money -/
theorem truthful_nonneg (value : Value) (otherBid : Bid) :
payoff value (truthful value otherBid) ≥ 0 := by
unfold payoff truthful winner payment
by_cases h : value ≥ otherBid
· simp [h, Int.ofNat_le.mpr h]
· simp [h]
The deeper theorem is that honesty is optimal. No strategic deviation improves your expected outcome:
/--
The weak dominance theorem: no strategic bid beats truthful bidding.
For any valuation, any alternative bid, and any opponent behavior,
telling the truth does at least as well as any lie.
-/
theorem weak_dominance (value : Value) (altBid : Bid) (otherBid : Bid) :
payoff value (truthful value otherBid) ≥
payoff value (strategic altBid otherBid) := by
sorry
Fill in the sorry. The proof is case analysis. Overbidding makes you win auctions you should lose, paying more than the item is worth. Underbidding makes you lose auctions you should win, missing profitable trades. Truthful bidding threads the needle: you win exactly when winning is profitable.
This two-bidder result is a toy, but the insight scales. Combinatorial auctions let participants bid on bundles of assets, expressing preferences like “I want A and B together, or neither.” The optimization becomes NP-hard, but the incentive properties generalize. The VCG mechanism extends Vickrey’s insight to arbitrary allocation problems. Markets that allocate spectrum, landing slots, and financial instruments all descend from these ideas.
OneChronos builds this infrastructure for financial markets. We run combinatorial auctions that match complex orders across multiple securities simultaneously. These kinds of theorems and verification matter because they guarantee properties that no amount of testing could verify: incentive compatibility, efficiency under stated assumptions, bounds on strategic manipulation. These are hard problems at the intersection of optimization, game theory, and formal methods. If that sounds interesting, we are hiring.
Modern Reasoning Models
Frontier models have become increasingly capable at writing Lean. As of December 2025, Gemini 3.5 Pro and Claude Opus 4.5 represent the state of the art for interactive theorem proving. Google reportedly has internal models that perform even better. Six months ago these models struggled with basic tactics; now they can complete non-trivial proofs with guidance. They are not yet autonomous mathematicians, but they are useful collaborators today.
The key to effective AI-assisted theorem proving is giving models access to the proof state. Without it, they generate tactics blind and hallucinate lemma names. With it, they can read the goal, search for relevant theorems, and build proofs interactively. The Model Context Protocol standardizes this interaction, letting AI assistants query external tools through a common interface.
The ML Infrastructure Stack
Progress in neural theorem proving is measured against standardized benchmarks. MiniF2F contains 488 Olympiad-level problems (IMO, AIME, AMC) formalized across multiple proof assistants; state-of-the-art models now exceed 88% on the test set. PutnamBench offers 1724 problems from the Putnam competition, substantially harder, where even the best systems solve under 10%. ProofNet covers 371 undergraduate textbook exercises. Models are evaluated using Pass@N: generate N proof attempts, succeed if any one verifies. Higher N (32, 128, 512) reveals a model’s coverage; Pass@1 measures single-shot accuracy.
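As a rough idealization (added here for intuition, not how the benchmarks report results): if each attempt verifies independently with probability $p$, then $\text{Pass@}N = 1 - (1 - p)^N$, which is why modest single-shot accuracy can still translate into high coverage at $N$ = 128 or 512.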
LeanDojo is the foundation. It wraps Lean 4 in a Python API, turning theorem proving into an RL environment. You send a tactic; it returns success or an error message. This is the bridge between neural networks and formal verification. Every serious research project in this space builds on it.
Lean Copilot brings this into VS Code. It suggests tactics in real-time as you write proofs, using a local or remote LLM. When you accept a suggestion, Lean immediately verifies it. The human provides high-level guidance; the model fills in tedious proof steps. This collaboration is more productive than either working alone.
llmstep is a lightweight alternative, model-agnostic and easy to integrate. It calls an LLM to suggest the next proof step, designed to work with any backend.
You can build a complete prover-verifier loop today without proprietary models. Use Claude Opus 4.5 or GPT-5.2 as the prover, LeanDojo as the environment, and Lean as the verifier. The stack: prompt the model with the goal state, receive a tactic, check it with Lean, feed errors back as context, repeat. This gives you the reasoning power of frontier models with the guarantees of formal verification.
Setting Up Claude Code
Claude Code is Anthropic’s command-line tool for AI-assisted development. To use it with Lean, you need to connect it to Lean’s language server via MCP. First, ensure you have the uv package manager installed. Then, from your Lean project root (after running lake build), register the MCP server:
claude mcp add lean-lsp uvx lean-lsp-mcp
This installs the lean-lsp-mcp server, which exposes Lean’s language server to Claude. The model can now read diagnostics, inspect goal states, query hover documentation, and search Mathlib using Loogle and LeanSearch.
For richer automation, you can install the lean4-skills plugin, which provides structured workflows for common theorem proving tasks:
/plugin marketplace add cameronfreer/lean4-skills
The plugin adds specialized agents for proof optimization, sorry filling, axiom checking, and compiler-guided repair. They encode patterns that experienced Lean users apply manually, saving time on routine tasks and helping beginners discover effective strategies.
Setting Up Cursor and VS Code
For Cursor or VS Code with an MCP-compatible extension, add the server to your MCP settings:
{ "mcpServers": { "lean-lsp": { "command": "uvx", "args": ["lean-lsp-mcp"] } } }
The workflow is similar: the model reads proof states through the language server and proposes tactics based on the current goal. The human reviews, accepts or rejects, and guides the search. This collaboration is more productive than either working alone.
What Comes Next
Markets are mathematical objects. Combinatorial auctions turn resource allocation into constraint satisfaction problems that reduce to weighted set packing: find the best non-overlapping selection from exponentially many candidates. NP-hard in general, but tractable instances exist, and the boundary between hard and easy is where the interesting mathematics lives. Proving properties about these systems requires exactly the kind of formal verification this series has been building toward: that mechanisms are incentive-compatible, that they converge, that they allocate efficiently under stated assumptions. Every improvement in market mechanism design, every formally verified property of an auction protocol, translates into real systems allocating real resources. Better reasoning about markets means systems that allocate capital more efficiently, and efficient allocation is the difference between prosperity and stagnation.
The European universities doing formal methods research, the quant firms in New York and London, the AI labs in China, all contribute to this ecosystem. DeepSeek’s open-source theorem provers emerged from it. The competition is global but the infrastructure is shared. A trading firm in New York open-sources a proof automation library; researchers in Beijing build on it. An AI lab in Hangzhou releases trained models; mathematicians in Paris use them. Private incentive aligns with public good. The tools developed for trading algorithms can verify medical devices. The techniques refined for financial models can prove properties of cryptographic protocols. And as AI infrastructure itself becomes tradeable, as markets emerge for GPU compute (The AI Boom Needs a Market for Compute), data center capacity, and model inference, the same auction theory applies. The resources that train the models are allocated by the mechanisms the models might one day verify.
AI agents will increasingly act in markets: trading, lending, allocating, optimizing. This is already happening today. The question is not whether but how. An AI agent can be constrained by rules, but only if those rules are precise enough to check. Natural language policies are suggestions. Formally verified constraints are guarantees. Imagine market infrastructure where agents must prove, before executing, that their actions satisfy regulatory constraints, risk limits, fairness properties. Not “we reviewed the code” but “the system verified the proof.” The agent that cannot demonstrate compliance cannot act. Formal constraints are not a limitation on AI autonomy. They can be the tools that make AI autonomy safe.
We are building that infrastructure now, whether we recognize it or not. Every verified auction protocol, every theorem about market equilibria, becomes a potential constraint on future autonomous systems. The practical question is not whether to allow AI agents in markets but how to make them work well. Formal verification offers something concrete: constraints that actually constrain, rules that cannot be silently violated, guarantees that hold regardless of what the model learned.
If these trends continue, we may be witnessing another industrial revolution. Before mechanization, most work in the physics sense of force times distance was done by muscles. The steam engine and its descendants changed that. Most thinking used to be done by brains. In the near or far future, that may change too. This is not inevitable. It presumes breakthroughs that are plausible but not guaranteed. But the trajectory is clear enough to take seriously.
Conclusion
If you are reading this as a student or someone early in your career: this stuff is fun. Watching a proof come together, seeing the goal state shrink to nothing, getting that green checkmark from the compiler when everything finally clicks. It is like solving puzzles, except the puzzles are deep and the solutions last. The theorems you formalize will still be valid when you are gone. That is a strange thing to be able to say about your work. The field is small enough that you can make real contributions and growing fast enough that there is plenty to do.
The work is hard. The learning curve is real. There will be days when the goal state mocks you and nothing seems to work. This is normal. One lemma at a time, one proof at a time. The field is growing fast, and you can be part of it.
Resources
Libraries & Tools
- Mathlib: The formalized mathematics library
- PhysLean: Formalizing physics in Lean
- LeanDojo: ML infrastructure for theorem proving
- Lean Copilot: Neural inference in Lean
- llmstep: Lightweight model-agnostic tactic suggestion
- lean-lsp-mcp: MCP server for Lean interaction
- LeanExplore: Semantic search across Mathlib
Models & Reproductions
- DeepSeek-Prover-V2-671B: 671B parameter model built on DeepSeek-V3-Base, achieves 88.9% on MiniF2F-test using recursive subgoal decomposition and RL with binary verification feedback
- DeepSeek-Prover-V1.5: Prover-verifier codebase
- DeepSeek-R1: Reasoning model weights and documentation
- TinyZero: Minimal reproduction of DeepSeek-R1-Zero reasoning
- Open-R1: HuggingFace’s open reproduction of the R1 pipeline
- Verifiers: Modular MCTS and search for LLM reasoning
Papers & Analysis
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs: The R1-Zero paper on pure RL reasoning
- DeepSeek-Prover-V2: Formal theorem proving with tree search
- Process-Driven Autoformalization (FormL4): Compiler-guided natural language to Lean translation
- AI and Formal Verification: Kleppmann on the convergence of LLMs and proof assistants
- Technical Deep Dive: DeepSeek: Raschka’s architectural analysis
References
Official Documentation
- Lean 4 Manual
- Theorem Proving in Lean 4
- Functional Programming in Lean
- Metaprogramming in Lean 4
- Mathematics in Lean
- Mathlib Documentation
- Lean Zulip Chat
Theorem Proving Games
- Natural Number Game - HHU Düsseldorf
- Real Analysis Game - Rutgers University
- Reintroduction to Proofs - A game introducing proofs, dependent type theory, and Lean prepared by Emily Riehl for a first year seminar at Johns Hopkins (Fall 2025). Covers types, functions, products, coproducts, quantifiers, and dependent types through interactive puzzles. Source
University Courses (Lean 4)
- Functional Programming and Theorem Proving - Stanford University
- Formal Proof and Verification - Brown University
- The Mechanics of Proof - Fordham University
- Formalising Mathematics - Imperial College London
- Formalized Mathematics in Lean - University of Bonn
- Interactive Theorem Proving - LMU Munich
- Proofs and Programs - Indian Institute of Science
- Theorem Proving with Lean - University of Warwick
- Logic and Mechanized Reasoning - Carnegie Mellon University
- Lean for Scientists and Engineers - University of Maryland
- An Introduction to Lean 4 - Universitat de València
- Interactive Theorem Proving in Lean - MPI Leipzig
- The Hitchhiker’s Guide to Logical Verification - Various institutions
- Formal Methods in Mathematics - Mastermath (Netherlands)
- Logique et démonstrations assistées - Université Paris-Saclay
- Semantics and Verification of Software - RWTH Aachen
- Formal Proof - Ohio State University
- Lean Community Course Catalog - Full listing
University Courses (Lean 3)
- Logic and Proof - Carnegie Mellon University
- Modern Mathematics with Lean - University of Exeter
- Graduate Introduction to Logic - University of Hawaii
- Introduction to Proofs with Lean - Johns Hopkins University
- Logic and Modelling - Vrije Universiteit Amsterdam
- Harvard MATH 161 - University course on theorem proving with Lean and Mathlib.
Appendix A: Syntax Comparison
For readers coming from other functional languages, these tables map familiar syntax to Lean equivalents. The goal is not completeness but orientation: enough to read Lean code without constantly consulting documentation.
Type Declarations
| Concept | Haskell | OCaml | Lean 4 |
|---|---|---|---|
| Type alias | type Name = String | type name = string | abbrev Name := String |
| Product type | data Point = Point Int Int | type point = { x: int; y: int } | structure Point where x : Int; y : Int |
| Sum type | data Maybe a = Nothing \| Just a | type 'a option = None \| Some of 'a | inductive Option (α : Type) where ... |
| Recursive type | data List a = Nil \| Cons a (List a) | type 'a list = Nil \| Cons of 'a * ... | inductive List (α : Type) where ... |
| Type class | class Eq a where (==) :: a -> a -> Bool | N/A (use modules) | class Eq (α : Type) where eq : α → α → Bool |
| Instance | instance Eq Int where ... | N/A | instance : Eq Int where ... |
Function Definitions
| Concept | Haskell | OCaml | Lean 4 |
|---|---|---|---|
| Named function | f x = x + 1 | let f x = x + 1 | def f (x : Nat) := x + 1 |
| Lambda | \x -> x + 1 | fun x -> x + 1 | fun x => x + 1 |
| Type signature | f :: Int -> Int | val f : int -> int | def f : Int → Int |
| Pattern matching | case x of { Just a -> ...; Nothing -> ... } | match x with Some a -> ... \| None -> ... | match x with \| some a => ... \| none => ... |
| Guards | f x \| x > 0 = ... \| otherwise = ... | N/A (use if) | if x > 0 then ... else ... |
| Where clause | f x = y + 1 where y = x * 2 | let f x = let y = x * 2 in y + 1 | def f x := let y := x * 2; y + 1 |
| Partial application | map (+1) | List.map ((+) 1) | List.map (· + 1) |
Monads and Effects
| Concept | Haskell | OCaml | Lean 4 |
|---|---|---|---|
| Bind | x >>= f or do { a <- x; f a } | N/A (use let*) | x >>= f or do let a ← x; f a |
| Return | return x or pure x | N/A | pure x |
| Monad transformer | StateT s m a | N/A | StateT σ m α |
| IO action | IO a | unit -> 'a | IO α |
putStrLn "hello" | print_endline "hello" | IO.println "hello" |
Common Operations
| Concept | Haskell | OCaml | Lean 4 |
|---|---|---|---|
| List literal | [1, 2, 3] | [1; 2; 3] | [1, 2, 3] |
| List cons | x : xs | x :: xs | x :: xs |
| List map | map f xs | List.map f xs | xs.map f or List.map f xs |
| List filter | filter p xs | List.filter p xs | xs.filter p |
| Function composition | f . g | N/A (use fun x -> f (g x)) | f ∘ g |
| String concat | s1 ++ s2 | s1 ^ s2 | s1 ++ s2 |
| Tuple | (a, b) | (a, b) | (a, b) |
| Tuple access | fst p, snd p | fst p, snd p | p.1, p.2 |
| If expression | if c then t else f | if c then t else f | if c then t else f |
Key Differences
Explicit types: Lean requires explicit type annotations more often than Haskell. Where Haskell infers id x = x has type a -> a, Lean prefers def id (x : α) : α := x.
Unicode: Lean uses unicode operators freely: → for function types, ∀ for universal quantification, ∧ for conjunction. ASCII alternatives exist (->, forall, /\) but idiomatic Lean uses unicode.
Termination: Every Lean function must terminate. Haskell allows infinite loops; Lean rejects them. Use partial for functions you cannot prove terminating.
Dependent types: Lean’s (n : Nat) → Vector α n has no Haskell equivalent. Letting types depend on values is what makes Lean a theorem prover.
Propositions vs Booleans: Lean distinguishes Prop (logical propositions, erased at runtime) from Bool (computational booleans). Haskell’s Bool is both.
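A few of these differences in one snippet (collatz is an illustrative definition, not from the series):
#check (2 + 2 = 4)   -- Prop: a statement you prove
#eval (2 + 2 == 4)   -- Bool: a value you compute (true)
#check (2 : Fin 3)   -- a type that depends on a value: Fin 3 has exactly 3 inhabitants
-- Termination is the default; `partial` opts out when no proof is available
partial def collatz (n : Nat) : Nat :=
  if n ≤ 1 then n
  else if n % 2 == 0 then collatz (n / 2)
  else collatz (3 * n + 1)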
Appendix B: Toplevel Declarations
Every Lean file is a sequence of toplevel declarations. These are the building blocks of every program and proof. This appendix provides a quick reference for all declaration types, with links to detailed explanations in the main text.
Definitions and Proofs
| Declaration | Purpose | Example |
|---|---|---|
| def | Define a value or function | Basics |
| theorem | State and prove a proposition (opaque) | Basics, Proving |
| lemma | Same as theorem | Proving |
| example | Anonymous proof (not saved) | Type Theory |
| abbrev | Transparent abbreviation | Basics |
| opaque | Hide implementation | Proofs |
| axiom | Unproven assumption | Proofs |
The distinction between def and theorem matters for performance. Lean marks theorem proofs as opaque, meaning they are never unfolded during type checking. This keeps proof terms from bloating computations. Use def for values you need to compute with and theorem for propositions you need to prove.
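A small illustration with throwaway names:
def double (n : Nat) : Nat := n + n   -- a def unfolds, so Lean can compute with it
#eval double 21                        -- 42
theorem double_eq (n : Nat) : double n = n + n := rfl
-- the proof term of double_eq is opaque; later type checking never unfolds it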
Type Declarations
| Declaration | Purpose | Example |
|---|---|---|
| inductive | Define type with constructors | Data Structures |
| structure | Single-constructor with fields | Data Structures |
| class | Type class interface | Polymorphism |
| instance | Type class implementation | Polymorphism |
| mutual | Mutually recursive definitions | Dependent Types |
Organization
| Declaration | Purpose | Example |
|---|---|---|
| import | Load another module | Basics |
| variable | Auto-add to definitions | Basics |
| namespace | Group under prefix | Basics |
| section | Scope for variables | Basics |
| open | Bring names into scope | Basics |
| universe | Declare universe levels | Type Theory |
| attribute | Attach metadata | Polymorphism |
| export | Re-export from namespace | Basics |
| notation | Custom syntax | Dependent Types |
| set_option | Configure compiler | Type Theory |
Interactive Commands
| Command | Purpose | Example |
|---|---|---|
| #eval | Evaluate and print | Basics |
| #check | Display type | Basics |
| #print | Print declaration info | Basics |
| #reduce | Reduce to normal form | Basics |
These commands are prefixed with # to distinguish them from regular declarations. They produce output but do not contribute to the compiled program. Use them liberally during development to inspect types and evaluate expressions.
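For example:
#eval 2 ^ 10                -- 1024
#check List.map             -- displays the type of List.map
#print Nat.add              -- prints the definition of Nat.add
#reduce (fun n => n + 1) 3  -- 4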
Tactics Reference
Robin Milner’s LCF system introduced a radical idea in the 1970s: let users extend the theorem prover with custom proof procedures, but channel all proof construction through a small trusted kernel. You could write arbitrarily clever automation, and if it produced a proof, that proof was guaranteed valid. Tactics are this idea fully realized. They are programs that build proofs, metaprograms that manipulate the proof state, search procedures that explore the space of possible arguments. When you write simp and Lean simplifies your goal through dozens of rewrite steps, you are invoking a sophisticated algorithm. When you write omega and Lean discharges a linear arithmetic obligation, you are running a decision procedure. The proof terms these tactics construct may be enormous, but they are checked by the kernel, and the kernel is small enough to trust. Think of tactics as the code you write, and the kernel as the one colleague who actually reads your pull requests.
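For a tiny concrete instance of that division of labor: the tactic script below builds an ordinary proof term, the kernel checks it, and #print shows what was built (names are illustrative):
theorem flip_eq (a b : Nat) (h : a = b) : b = a := by
  symm
  exact h
#print flip_eq  -- fun a b h => ... : the proof term the tactics constructed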
Table of Contents
The following covers all the major tactics in Lean 4 and Mathlib. Click on any tactic name to jump to its documentation and examples.
- abel - Prove equalities in abelian groups
- aesop - General automation tactic
- all_goals - Apply tactic to all current goals
- any_goals - Apply tactic to any applicable goal
- apply - Apply hypotheses or lemmas to solve goals
- apply_fun - Apply function to both sides of equality
- assumption - Use hypothesis matching goal
- bound - Prove inequalities from structure
- by_cases - Perform case splitting
- by_contra - Proof by contradiction
- calc - Chain equations and inequalities
- cases - Case analysis on inductive types
- choose - Extract choice function from forall-exists
- congr - Prove equality using congruence rules
- constructor - Break down conjunctions, existentials, and iff
- contradiction - Find contradictions in hypotheses
- conv - Targeted rewriting in specific parts
- convert - Prove by showing goal equals type of expression
- decide - Run decision procedures
- exact - Provide an exact proof term
- exfalso - Prove anything from False
- ext - Prove equality of functions extensionally
- field_simp - Simplify field expressions
- fin_cases - Split finite type into cases
- first - Try tactics until one succeeds
- focus - Limit tactics to first goal
- gcongr - Prove inequalities using congruence
- generalize - Replace expressions with variables
- grind - Proof search using congruence closure
- group - Prove equalities in groups
- have - Introduce new hypotheses
- hint - Get tactic suggestions
- induction - Perform inductive proofs
- interval_cases - Split bounded values into cases
- intro - Introduce assumptions from implications and quantifiers
- left - Choose left side of disjunction
- lift - Lift variable to higher type
- linarith - Prove linear inequalities
- linear_combination - Prove from linear combinations
- module - Prove equalities in modules
- nlinarith - Handle nonlinear inequalities
- noncomm_ring - Prove in non-commutative rings
- norm_cast - Simplify by moving casts outward
- norm_num - Simplify numerical expressions
- nth_rw - Rewrite only the nth occurrence
- obtain - Destructure existentials and structures
- omega - Solve linear arithmetic over Nat and Int
- pick_goal - Move specific goal to front
- positivity - Prove positivity goals
- push_cast - Push casts inward
- push_neg - Push negations inward
- qify - Shift to rationals
- refine - Apply with holes to fill later
- rename - Rename hypotheses for clarity
- repeat - Apply tactic repeatedly until fails
- revert - Move hypotheses back to the goal
- rfl - Prove by reflexivity
- right - Choose right side of disjunction
- ring - Prove equalities in commutative rings
- rw - Rewrite using equalities
- simp - Apply simplification lemmas
- simp_all - Simplify everything including hypotheses
- simp_rw - Rewrite with simplification at each step
- smt - Discharge goals to external SMT solvers
- sorry - Admit goal without proof
- specialize - Instantiate hypothesis with specific arguments
- split - Handle if-then-else and pattern matching
- split_ifs - Case on if-then-else expressions
- subst - Substitute variable with its value
- swap - Swap first two goals
- symm - Swap symmetric relations
- tauto - Prove logical tautologies
- trans - Split transitive relations
- trivial - Prove simple goals automatically
- try - Attempt tactic, continue if fails
- use - Provide witnesses for existential goals
- zify - Shift natural numbers to integers
Logical Connectives
intro
The intro tactic moves hypotheses from the goal into the local context. When your goal is ∀ x, P x or P → Q, using intro names the bound variable or assumption and makes it available for use in the proof.
theorem intro_apply : ∀ x : Nat, x = x → x + 0 = x := by
intro x h -- Introduce x and hypothesis h : x = x
simp -- Simplify x + 0 = x
constructor
The constructor tactic applies the first constructor of an inductive type to the goal. For And (conjunction), it splits the goal into two subgoals. For Exists, it expects you to provide a witness. For Iff, it creates subgoals for both directions.
theorem constructor_example : True ∧ True := by
constructor
· trivial -- Prove first True
· trivial -- Prove second True
left and right
The left and right tactics select which side of a disjunction to prove. When your goal is P ∨ Q, use left to commit to proving P or right to prove Q.
theorem or_example : 5 < 10 ∨ 10 < 5 := by
left
simp
use
The use tactic provides a concrete witness for an existential goal. When your goal is ∃ x, P x, using use t substitutes t for x and leaves you to prove P t.
theorem use_example : ∃ x : Nat, x * 2 = 10 := by
  use 5  -- the remaining goal 5 * 2 = 10 is discharged by rfl
obtain
The obtain tactic extracts components from existential statements and structures in hypotheses. It combines have and pattern matching, letting you name both the witness and the proof simultaneously.
theorem obtain_example (h : ∃ x : Nat, x > 5 ∧ x < 10) : ∃ y, y = 7 := by
obtain ⟨_x, _hgt, _hlt⟩ := h -- Destructure the existential
exact ⟨7, rfl⟩
Applying Lemmas
exact
The exact tactic closes a goal by providing a term whose type matches the goal exactly. It performs no additional unification or elaboration beyond what is necessary.
theorem exact_example (h : 2 + 2 = 4) : 4 = 2 + 2 := by
exact h.symm
apply
The apply tactic works backwards from the goal. Given a lemma h : A → B and a goal B, using apply h reduces the goal to proving A. It unifies the conclusion of the lemma with the current goal.
refine
The refine tactic is like exact but allows placeholders written as ?_ that become new goals. This lets you partially specify a proof term while deferring some parts.
/-- `refine` allows holes with ?_ -/
theorem refine_example : ∃ x : Nat, x > 5 := by
refine ⟨10, ?_⟩ -- Use 10, but leave proof as a hole
simp -- Fill the hole: prove 10 > 5
convert
The convert tactic applies a term to the goal even when the types do not match exactly, generating side goals for the mismatches. It is useful when you have a lemma that is almost but not quite what you need.
theorem convert_example (x y : Nat) (h : x = y) : Nat.succ x = Nat.succ y := by
convert rfl using 1
rw [h]
specialize
The specialize tactic instantiates a universally quantified hypothesis with concrete values, replacing the general statement with a specific instance in your context.
theorem specialize_example (h : ∀ x : Nat, x > 0 → x ≥ 1) : 5 ≥ 1 := by
specialize h 5 (by simp)
exact h
Context Manipulation
have
The have tactic introduces a new hypothesis into the context. You state what you want to prove as an intermediate step, prove it, and then it becomes available for the rest of the proof.
theorem have_example (x : Nat) : x + 0 = x := by
have h : x + 0 = x := by simp
exact h
rename
The rename tactic changes the name of a hypothesis in the local context, making proofs more readable when auto-generated names are unclear.
theorem rename_example (h : 1 = 1) : 1 = 1 := by
exact h
revert
The revert tactic is the inverse of intro. It moves a hypothesis from the context back into the goal as an implication or universal quantifier, which is useful before applying induction or certain lemmas.
theorem revert_example (x : Nat) (h : x = 5) : x = 5 := by
revert h x
intro x h
exact h
generalize
The generalize tactic replaces a specific expression in the goal with a fresh variable, abstracting over that value. This is useful when you need to perform induction on a compound expression.
theorem generalize_example : (2 + 3) * 4 = 20 := by
  generalize h : 2 + 3 = n  -- goal becomes n * 4 = 20, with h : 2 + 3 = n
  omega
Rewriting and Simplifying
rw (rewrite)
The rw tactic replaces occurrences of the left-hand side of an equality with the right-hand side. Use rw [←h] to rewrite in the reverse direction. Multiple rewrites can be chained in a single rw [h1, h2, h3].
theorem rw_example (a b : Nat) (h : a = b) : a + 2 = b + 2 := by
rw [h] -- Rewrite a to b using h
Tip
rw rewrites the first occurrence it finds. Use rw [h] at hyp to rewrite in a hypothesis instead of the goal. If rewriting fails due to dependent types or metavariables, try simp_rw, which handles these cases more gracefully. Use nth_rw n [h] to target a specific occurrence.
simp
The simp tactic repeatedly applies lemmas marked with @[simp] to simplify the goal. It handles common algebraic identities, list operations, and logical simplifications automatically.
theorem simp_example (x : Nat) : x + 0 = x ∧ 0 + x = x := by
simp -- Simplifies both sides
Tip
Use simp only [lemma1, lemma2] for reproducible proofs. Bare simp can break when new simp lemmas are added to the library. Use simp? to see which lemmas were applied, then replace with simp only [...] for stability. In Mathlib code reviews, bare simp at non-terminal positions is discouraged.
simp_all
The simp_all tactic simplifies both the goal and all hypotheses simultaneously, using each simplified hypothesis to help simplify the others.
theorem simp_all_example (x : Nat) (h : x = 0) : x + x = 0 := by
simp_all
simp_rw
The simp_rw tactic rewrites using the given lemmas but applies simplification at each step, which helps when rewrites would otherwise fail due to associativity or other issues.
theorem simp_rw_example (x y : Nat) : (x + y) + (y + x) = 2 * (x + y) := by
simp_rw [Nat.add_comm y x]
ring
nth_rw
The nth_rw tactic rewrites only a specific occurrence of a pattern, counting from 1. This gives precise control when an expression appears multiple times and you only want to change one instance.
theorem nth_rewrite_example (x : Nat) : x + x + x = 3 * x := by
nth_rw 2 [← Nat.add_zero x] -- Rewrite only the second occurrence of x
simp
ring
norm_num
The norm_num tactic evaluates and simplifies numeric expressions, proving goals like 2 + 2 = 4 or 7 < 10 by computation. It handles arithmetic in various number types.
theorem norm_num_example : 2 ^ 3 + 5 * 7 = 43 := by
norm_num
norm_cast
The norm_cast tactic normalizes expressions involving type coercions by pushing casts outward and combining them, making goals about mixed numeric types easier to prove.
theorem norm_cast_example (n : Nat) : (n : Int) + 1 = ((n + 1) : Int) := by
norm_cast
push_cast
The push_cast tactic pushes type coercions inward through operations, distributing a cast over addition, multiplication, and other operations.
set_option linter.unusedTactic false in
theorem push_cast_example (n m : Nat) : ((n + m) : Int) = (n : Int) + (m : Int) := by
push_cast
rfl
conv
The conv tactic enters a conversion mode that lets you navigate to specific subexpressions and rewrite only there. It is invaluable when rw affects the wrong occurrence or when you need surgical precision.
theorem conv_example (x y : Nat) : x + y = y + x := by
conv =>
lhs -- Focus on left-hand side
rw [Nat.add_comm]
Tip
Navigation commands in conv mode: lhs/rhs select sides of an equation, arg n selects the nth argument, ext introduces binders, and enter [1, 2] navigates by path. Use conv_lhs or conv_rhs as shortcuts when you only need to work on one side of an equation.
Reasoning with Relations
rfl (reflexivity)
The rfl tactic proves goals of the form a = a where both sides are definitionally equal. It works even when the equality is not syntactically obvious but follows from definitions.
theorem rfl_example (x : Nat) : x = x := by
rfl
symm
The symm tactic reverses a symmetric relation like equality. If your goal is a = b and you have h : b = a, using symm on h or the goal makes them match.
theorem symm_example (x y : Nat) (h : x = y) : y = x := by
symm
exact h
trans
The trans tactic splits a transitive goal like a = c into two subgoals a = b and b = c for a chosen intermediate value b. It works for any transitive relation.
theorem trans_example (a b c : Nat) (h1 : a ≤ b) (h2 : b ≤ c) : a ≤ c := by
trans b
· exact h1
· exact h2
subst
The subst tactic eliminates a variable by substituting it everywhere with an equal expression. Given h : x = e, using subst h replaces all occurrences of x with e and removes x from the context.
theorem subst_example (x y : Nat) (h : x = 5) : x + y = 5 + y := by
subst h
rfl
ext (extensionality)
The ext tactic proves equality of functions, sets, or structures by showing they agree on all inputs or components. It introduces the necessary variables and reduces the goal to pointwise equality.
theorem ext_example (f g : Nat → Nat)
(h : ∀ x, f x = g x) : f = g := by
ext x
exact h x
calc
The calc tactic provides a structured way to write chains of equalities or inequalities. Each step shows the current expression, the relation, and the justification, mirroring traditional mathematical proofs.
theorem calc_example (a b c : Nat)
(h1 : a = b) (h2 : b = c) : a = c := by
calc a = b := h1
_ = c := h2
apply_fun
The apply_fun tactic applies a function to both sides of an equality hypothesis. It automatically generates a side goal requiring the function to be injective when needed.
theorem apply_fun_example (x y : Nat) (h : x = y) : x + 2 = y + 2 := by
apply_fun (· + 2) at h
exact h
congr
The congr tactic reduces an equality goal f a = f b to proving a = b, applying congruence recursively. It handles nested function applications by breaking them into component equalities.
theorem congr_example (f : Nat → Nat) (x y : Nat) (h : x = y) : f x = f y := by
congr
gcongr
The gcongr tactic proves inequalities by applying monotonicity lemmas. It automatically finds and applies lemmas showing that operations preserve ordering, such as adding to both sides of an inequality.
theorem gcongr_example (x y a b : Nat) (h1 : x ≤ y) (h2 : a ≤ b) : x + a ≤ y + b := by
gcongr
linear_combination
The linear_combination tactic proves an equality by showing it follows from a linear combination of given hypotheses. You specify the coefficients, and it verifies the algebra.
theorem linear_combination_example (x y : ℚ) (h1 : 2*x + y = 4) (h2 : x + 2*y = 5) :
x + y = 3 := by
linear_combination (h1 + h2) / 3
positivity
The positivity tactic proves goals asserting that an expression is positive, nonnegative, or nonzero. It analyzes the structure of the expression and applies appropriate lemmas automatically.
set_option linter.unusedVariables false in
theorem positivity_example (x : ℚ) (h : 0 < x) : 0 < x^2 + x := by
positivity
bound
The bound tactic proves inequality goals by recursively analyzing expression structure and applying bounding lemmas. It is particularly effective for expressions built from well-behaved operations.
theorem bound_example (x y : ℕ) : x ≤ x + y := by
bound
Reasoning Techniques
cases
The cases tactic performs case analysis on an inductive type, creating separate subgoals for each constructor. For a natural number, it splits into the zero case and the successor case.
theorem cases_example (n : Nat) : n = 0 ∨ n > 0 := by
cases n with
| zero => left; rfl
| succ _m => right; exact Nat.succ_pos _
induction
The induction tactic sets up a proof by induction on an inductive type. It creates a base case for each non-recursive constructor and an inductive case with an induction hypothesis for each recursive constructor.
theorem induction_example (n : Nat) : n + 0 = n := by
induction n with
| zero => rfl
| succ n _ih => rfl
Tip
Use induction n with | zero => ... | succ n ih => ... for structured case syntax. If your induction hypothesis is too weak, try revert on additional variables before inducting, or use induction n generalizing x y to strengthen the hypothesis. For mutual or nested induction, consider induction ... using with a custom recursor.
split
The split tactic splits goals involving if-then-else expressions or pattern matching into separate cases. It creates subgoals for each branch with the appropriate condition as a hypothesis.
def abs (x : Int) : Nat :=
if x ≥ 0 then x.natAbs else (-x).natAbs
theorem split_example (x : Int) : abs x ≥ 0 := by
unfold abs
split <;> simp
split_ifs
The split_ifs tactic finds all if-then-else expressions in the goal and splits on their conditions, creating cases for each combination of true and false branches.
theorem split_ifs_example (p : Prop) [Decidable p] (x y : Nat) :
(if p then x else y) ≤ max x y := by
split_ifs
· exact le_max_left x y
· exact le_max_right x y
contradiction
The contradiction tactic closes the goal by finding contradictory hypotheses in the context, such as h1 : P and h2 : ¬P, or an assumption of False.
theorem contradiction_example (h1 : False) : 0 = 1 := by
contradiction
exfalso
The exfalso tactic changes any goal to False, applying the principle of explosion. Use this when you can derive a contradiction from your hypotheses.
theorem exfalso_example (h : 0 = 1) : 5 = 10 := by
exfalso -- Goal becomes False
simp at h
by_contra
The by_contra tactic starts a proof by contradiction. It adds the negation of the goal as a hypothesis and changes the goal to False, requiring you to derive a contradiction.
theorem by_contra_example : ∀ n : Nat, n = n := by
  intro n
  by_contra h  -- h : ¬n = n, goal : False
  exact h rfl
Proof of negation vs proof by contradiction: These are often confused but differ in an important way. A proof of negation proves ¬P by assuming P and deriving False. This is constructive since ¬P is defined as P → False. A proof by contradiction proves P by assuming ¬P and deriving False. This requires classical logic (double negation elimination) because you must go from ¬¬P to P. The by_contra tactic performs proof by contradiction and relies on Classical.byContradiction. If you are proving a negation, you can use intro h instead, which is constructive.
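To see the constructive side in isolation, here is a direct proof of a negation (a small added example): introduce the assumed equality and refute it, with no classical step needed for the intro.
example : ¬(1 = 2) := by
  intro h    -- h : 1 = 2, goal : False
  simp at h  -- the hypothesis simplifies to False, closing the goal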
push_neg
The push_neg tactic pushes negations through quantifiers and connectives using De Morgan’s laws. It transforms ¬∀ x, P x into ∃ x, ¬P x and similar patterns.
theorem push_neg_example : ¬(∀ x : Nat, ∃ y, x < y) ↔ ∃ x : Nat, ∀ y, ¬(x < y) := by
push_neg
rfl
by_cases
The by_cases tactic splits the proof into two cases based on whether a proposition is true or false, adding the proposition as a hypothesis in one branch and its negation in the other.
theorem by_cases_example (p : Prop) : p ∨ ¬p := by
by_cases h : p
· left; exact h
· right; exact h
choose
The choose tactic extracts a choice function from a hypothesis of the form ∀ x, ∃ y, P x y. It produces a function f and a proof that ∀ x, P x (f x).
theorem choose_example (h : ∀ x : Nat, ∃ y : Nat, x < y) :
∃ f : Nat → Nat, ∀ x, x < f x := by
choose f hf using h
exact ⟨f, hf⟩
lift
The lift tactic replaces a variable with one of a more specific type when you have a proof justifying the lift. For example, lifting an integer to a natural number given a proof it is nonnegative.
theorem lift_example (n : ℤ) (hn : 0 ≤ n) : ∃ m : ℕ, (m : ℤ) = n := by
lift n to ℕ using hn
exact ⟨n, rfl⟩
zify
The zify tactic converts a goal about natural numbers to one about integers, which often makes subtraction and other operations easier to handle since integers are closed under subtraction.
theorem zify_example (n m : ℕ) (_ : n ≥ m) : (n - m : ℤ) = n - m := by
zify
qify
The qify tactic converts a goal about integers or naturals to one about rationals, enabling division and making certain algebraic manipulations possible.
theorem qify_example (n m : ℕ) : (n : ℚ) / (m : ℚ) = (n / m : ℚ) := by
norm_cast
Searching
assumption
The assumption tactic closes the goal if there is a hypothesis in the context that exactly matches. It searches through all available hypotheses to find one with the right type.
theorem assumption_example (P Q : Prop) (h1 : P) (_h2 : Q) : P := by
assumption -- Finds h1
trivial
The trivial tactic tries a collection of simple tactics including rfl, assumption, and contradiction to close easy goals without you specifying which approach to use.
theorem trivial_example : True := by
trivial
decide
The decide tactic evaluates decidable propositions by computation. For finite checks like 2 < 5 or membership in a finite list, it simply computes the answer and closes the goal.
theorem decide_example : 3 < 5 := by
decide
Note
decide works in the kernel and produces small proof terms but can be slow. native_decide compiles to native code and runs faster, but its proofs simply assert the result by appealing to a trusted evaluation axiom. For quick checks use decide; for expensive computations like verifying grid states in our Game of Life proofs, native_decide is essential.
hint
The hint tactic suggests which tactics might make progress on the current goal. It is a discovery tool that helps when you are unsure how to proceed.
theorem hint_example : 2 + 2 = 4 := by
simp -- hint would suggest this
General Automation
omega
The omega tactic is a decision procedure for linear arithmetic over natural numbers and integers. It handles goals involving addition, subtraction, multiplication by constants, and comparisons.
theorem omega_example (x y : Nat) : x < y → x + 1 ≤ y := by
omega
Note
omega handles Nat and Int but not Rat or Real. It solves linear constraints but fails on nonlinear multiplication like x * y < z. For rationals, try linarith after qify. For nonlinear goals, try nlinarith or polyrith.
linarith
The linarith tactic proves goals that follow from linear arithmetic over ordered rings. It combines hypotheses about inequalities to derive the goal using Fourier-Motzkin elimination.
theorem linarith_example (x y z : ℚ) (h1 : x < y) (h2 : y < z) : x < z := by
linarith
nlinarith
The nlinarith tactic extends linarith to handle some nonlinear goals by first preprocessing with polynomial arithmetic before applying linear reasoning.
theorem nlinarith_example (x : ℚ) (h : x > 0) : x^2 > 0 := by
nlinarith
smt
The smt tactic discharges goals to an external SMT solver like Z3 or cvc5. SMT (Satisfiability Modulo Theories) solvers are battle-tested tools that combine SAT solving with decision procedures for arithmetic, arrays, bitvectors, and uninterpreted functions. When omega or linarith cannot handle your goal because it involves function symbols or complex quantifier patterns, an SMT solver often can.
The smt tactic translates your goal to SMT-LIB format, calls the solver, and if the solver returns “unsatisfiable” (meaning your goal is valid), it reconstructs a proof in Lean. This is not a trusted oracle; the proof is checked by Lean’s kernel.
example (x y : Int) (h1 : x < y) (h2 : y < x + 1) : False := by
smt [h1, h2]
example (a b c : Int) (h1 : a + b = c) (h2 : a = b) : 2 * b = c := by
smt [h1, h2]
example (p q r : Prop) : (p → q) → (q → r) → p → r := by
smt
SMT solvers excel at uninterpreted functions, reasoning about function applications without knowing what the functions compute:
example (f : Int → Int) (x y : Int) (h1 : x = y) : f x = f y := by
smt [h1]
example (f : Int → Int) (x y z : Int)
(h1 : x = y) (h2 : y = z) : f x = f z := by
smt [h1, h2]
They handle quantifiers through instantiation heuristics, though this can be unpredictable:
example (f : Int → Int) (h : ∀ x, f x = x + 1) : f 5 = 6 := by
smt [h]
example (h : ∀ x : Int, x < x + 1) : ∃ y : Int, 0 < y := by
smt [h]
The real power emerges when combining theories. Here the solver mixes arithmetic with uninterpreted functions:
example (f : Int → Int) (x : Int)
(h1 : f x > 0) (h2 : f x < 2) : f x = 1 := by
smt [h1, h2]
example (f : Int → Int → Int) (a b : Int)
(h1 : f a b = a + b)
(h2 : a = 3)
(h3 : b = 4) : f a b = 7 := by
smt [h1, h2, h3]
Note
The smt tactic requires setup. First, install an SMT solver:
- macOS: brew install z3
- Ubuntu: apt install z3
Then add the lean-smt library to your lakefile.lean: require smt from git "https://github.com/ufmg-smite/lean-smt.git" @ "main"
Import with import Smt. Check the lean-smt repository for compatible Lean versions and supported solvers (Z3 and cvc5). The examples above are standalone and not part of this book’s build; copy them to your own project to try them.
ring
The ring tactic proves polynomial equalities in commutative rings by normalizing both sides to a canonical form and checking if they match. It handles addition, multiplication, and powers.
theorem ring_example (x y : ℤ) : (x + y)^2 = x^2 + 2*x*y + y^2 := by
ring
noncomm_ring
The noncomm_ring tactic proves equalities in non-commutative rings where multiplication order matters, such as matrix rings or quaternions.
theorem noncomm_ring_example (x y z : ℤ) : x * (y + z) = x * y + x * z := by
noncomm_ring
field_simp
The field_simp tactic clears denominators in field expressions by multiplying through, reducing goals involving fractions to polynomial equalities that ring can handle.
theorem field_simp_example (x y : ℚ) (hy : y ≠ 0) : x / y + 1 = (x + y) / y := by
field_simp
abel
The abel tactic proves equalities in abelian groups by normalizing expressions involving addition, subtraction, and negation to a canonical form.
theorem abel_example (x y z : ℤ) : x + y + z = z + x + y := by
abel
group
The group tactic proves equalities in groups using the group axioms. It handles the group operation, inverses, and the identity element, normalizing expressions to compare them.
theorem group_example (x y : ℤ) : x + (-x + y) = y := by
group
module
The module tactic proves equalities in modules over a ring, handling scalar multiplication and vector addition to normalize expressions.
theorem module_example (x y : ℤ) (a : ℤ) : a • (x + y) = a • x + a • y := by
module
aesop
The aesop tactic is a general-purpose automation tactic that combines many strategies including simplification, introduction rules, and case splitting to solve goals automatically.
theorem aesop_example (p q r : Prop) : p → (p → q) → (q → r) → r := by
aesop
Tip
aesop is powerful but can be slow on complex goals. Use aesop? to see what it did, then extract a faster proof. Register custom lemmas with @[aesop safe] or @[aesop unsafe 50%] to extend its knowledge. The safe rules are always applied; unsafe rules are tried with backtracking weighted by percentage.
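A sketch of the attribute syntax (the lemma names here are hypothetical and this snippet is not part of the book's build):
-- A safe rule: aesop applies it whenever it matches, without backtracking
@[aesop safe apply]
theorem my_and_intro (p q : Prop) (hp : p) (hq : q) : p ∧ q := ⟨hp, hq⟩

-- An unsafe rule: aesop tries it with backtracking, weighted at 50%
@[aesop unsafe 50% apply]
theorem my_left_disjunct (p q : Prop) (hp : p) : p ∨ q := Or.inl hp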
grind
The grind tactic is one of Lean 4’s most sophisticated automation tools. Under the hood, it maintains an e-graph (equivalence graph), a data structure that efficiently represents equivalence classes of terms. When you assert a = b, the e-graph merges the equivalence classes containing a and b. The key insight is congruence: if a = b, then f a = f b for any function f. The e-graph propagates these consequences automatically.
The algorithm works in three phases. First, congruence closure processes all equalities and computes the transitive, symmetric, reflexive closure under function application. If you know $x = y$ and $f(x) = 10$, congruence closure deduces $f(y) = 10$ without explicit rewriting. Second, forward chaining applies implications: if you have $p \land q$ and $q \to r$, it extracts $q$ from the conjunction and fires the implication to derive $r$. Third, case splitting handles disjunctions and if-then-else expressions by exploring branches.
theorem grind_example1 (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
grind
theorem grind_example2 (f : Nat → Nat) (x y : Nat)
(h1 : x = y) (h2 : f x = 10) : f y = 10 := by
grind
theorem grind_example3 (p q r : Prop)
(h1 : p ∧ q) (h2 : q → r) : p ∧ r := by
grind
theorem grind_example4 (x y : Nat) :
(if x = y then x else y) = y ∨ x = y := by
grind
The power shows up when these mechanisms combine. Here grind chains four equalities through two functions to conclude f b = 42:
-- Nested function applications with chained equalities
theorem grind_chain (f g : Nat → Nat) (a b c d : Nat)
(h1 : a = b) (h2 : c = d) (h3 : f a = g c) (h4 : g d = 42) :
f b = 42 := by
grind
-- Existential witnesses from equality reasoning
theorem grind_exists (p : Nat → Prop) (a b : Nat)
(h1 : a = b) (h2 : p a) : ∃ x, p x ∧ x = b := by
grind
Tip
grind excels at “obvious” goals that would require tedious manual rewriting. If your goal involves chained equalities, function congruence, or propositional reasoning, try grind before writing out the steps by hand. For debugging, grind? shows the proof term it constructs.
tauto
The tauto tactic proves propositional tautologies involving $\land$, $\lor$, $\to$, $\leftrightarrow$, $\lnot$, True, and False. It reasons classically, so it can close goals that require excluded middle.
theorem tauto_example (p q : Prop) : p → (p → q) → q := by
tauto
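A slightly more interesting tautology, mixing negation and an iff (an added sample, not from the book's build):
example (p q : Prop) : ¬(p ∧ q) ↔ (p → ¬q) := by
  tauto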
Goal Operations
sorry
The sorry tactic closes any goal without actually proving it, leaving a hole in the proof. Use it as a placeholder during development, but never in finished proofs as it makes theorems unsound.
theorem incomplete_proof : ∀ P : Prop, P ∨ ¬P := by
sorry -- Proof left as exercise
Warning
Any theorem containing sorry depends on the sorryAx axiom, and anything built on top of it inherits that dependency. Use #print axioms myTheorem to check whether a theorem is sorry-free. Mathlib rejects all PRs containing sorry. During development, sorry is invaluable for sketching proofs top-down, but treat each one as a debt to be paid.
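To see this on the placeholder above, ask Lean which axioms the declaration depends on (a quick check you can run in any file containing the theorem):
#print axioms incomplete_proof
-- the output lists sorryAx among the axioms the theorem depends on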
swap
The swap tactic exchanges the first two goals in the goal list, letting you work on the second goal first when that is more convenient.
theorem swap_example : True ∧ True := by
constructor
swap
· trivial -- Proves second goal first
· trivial -- Then first goal
pick_goal
The pick_goal tactic moves a specific numbered goal to the front of the goal list, allowing you to address goals in any order you choose.
theorem pick_goal_example : True ∧ True := by
constructor
pick_goal 2 -- Move second goal to front
· trivial -- Prove second goal
· trivial -- Prove first goal
all_goals
The all_goals tactic applies a given tactic to every goal in the current goal list, which is useful when multiple goals can be solved the same way.
theorem all_goals_example : (1 = 1) ∧ (2 = 2) := by
constructor
all_goals rfl
any_goals
The any_goals tactic applies a given tactic to each goal where it succeeds, skipping goals where it fails. It succeeds if it makes progress on at least one goal.
theorem any_goals_example : (1 = 1) ∧ (True) := by
constructor
any_goals rfl -- closes 1 = 1; skips True, where rfl fails
trivial
focus
The focus tactic restricts attention to the first goal, hiding all other goals. This helps ensure you complete one goal before moving to the next.
theorem focus_example : True ∧ True := by
constructor
focus
  trivial -- only the first goal is visible here
trivial
try
The try tactic attempts to apply a tactic and succeeds regardless of whether the inner tactic succeeds or fails. It is useful for optional simplification steps.
theorem try_example (p q : Prop) (hp : p) : p ∨ q := by
try assumption -- assumption fails here (no hypothesis matches p ∨ q), but try succeeds anyway
exact Or.inl hp
first
The first tactic tries a list of tactics in order and uses the first one that succeeds. It fails only if all tactics fail.
set_option linter.unusedTactic false in
set_option linter.unreachableTactic false in
theorem first_example (x : Nat) : x = x := by
first | simp | rfl | sorry
repeat
The repeat tactic applies a given tactic repeatedly until it fails to make progress. It is useful for exhaustively applying a simplification or introduction rule.
set_option linter.unusedTactic false in
set_option linter.unreachableTactic false in
/-- `repeat` applies a tactic repeatedly -/
theorem repeat_example : True ∧ True ∧ True := by
repeat constructor
all_goals trivial
Tactic Combinators
The semicolon ; sequences tactics, while <;> applies the second tactic to all goals created by the first. These combinators help write concise proof scripts.
theorem combinator_example : (True ∧ True) ∧ (True ∧ True) := by
constructor <;> (constructor <;> trivial)
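The <;> matters here: constructor creates two goals, and <;> runs the follow-up tactic on both of them, whereas a plain newline or ; would run it only on the first remaining goal and leave the second unproved. A small illustration (an added sample, not from the book's build):
example (x y : Nat) : x + 0 = x ∧ 0 + y = y := by
  constructor <;> simp   -- simp runs on both goals created by constructor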
Domain-Specific Tactics
interval_cases
The interval_cases tactic performs case analysis when a variable is known to lie in a finite range. Given bounds on a natural number, it generates a case for each possible value.
theorem interval_cases_example (n : ℕ) (h : n ≤ 2) : n = 0 ∨ n = 1 ∨ n = 2 := by
interval_cases n
· left; rfl
· right; left; rfl
· right; right; rfl
fin_cases
The fin_cases tactic performs case analysis on elements of a finite type like Fin n or Bool, creating a subgoal for each possible value of the type.
theorem fin_cases_example (i : Fin 3) : i.val < 3 := by
fin_cases i <;> simp
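fin_cases also works on other finite types such as Bool; a small illustration (an added sample, not from the book's build):
example (b : Bool) : b = true ∨ b = false := by
  fin_cases b <;> simp   -- one goal per value of b, each closed by simp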
Working with Quantifiers
Existential Quantifiers
Existential statements claim that some witness exists satisfying a property. To prove one, use use to provide the witness. To use an existential hypothesis, use obtain to extract the witness and its property.
theorem exists_intro : ∃ n : Nat, n > 10 := by
exact ⟨42, by simp⟩
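The example above supplies the witness with the anonymous constructor ⟨witness, proof⟩. The use tactic mentioned in the prose does the same job; a minimal sketch (an added sample, not from the book's build — depending on the Mathlib version, use may already discharge the numeric side goal, so the follow-up is wrapped in all_goals to work either way):
example : ∃ n : Nat, n > 10 := by
  use 42            -- provide the witness; the remaining goal, if any, is 42 > 10
  all_goals omega   -- close whatever side goals remain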
theorem exists_elim (h : ∃ n : Nat, n > 10) : True := by
obtain ⟨n, hn⟩ := h
trivial
Universal Quantifiers
Universal statements claim a property holds for all values. To prove one, use intro to introduce an arbitrary value. To use a universal hypothesis, use specialize or simply apply it to a specific value.
theorem forall_intro : ∀ x : Nat, x + 0 = x := by
intro x
simp
theorem forall_elim (h : ∀ x : Nat, x + 0 = x) : 5 + 0 = 5 := by
exact h 5
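The specialize tactic mentioned above replaces a universal hypothesis with its instance at a particular value; a small illustration (an added sample, not from the book's build):
example (h : ∀ x : Nat, x + 0 = x) : 5 + 0 = 5 := by
  specialize h 5   -- h is now 5 + 0 = 5
  exact h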
Using This Reference
You do not need to memorize this article. Bookmark it. When you encounter a goal you cannot close, return here and ask: what shape is my goal? Implication, conjunction, existential, equality? Find the matching section. The tactic you need is there. Over time, the common ones become muscle memory. The obscure ones remain here for when you need them.