From Zero to QED

Introduction

Welcome to From Zero to QED, an article series that teaches Lean 4 from first principles. This is the tutorial I wish I had when I started learning Lean. The language is powerful but the learning resources remain scattered and incomplete. This series aims to fill that gap.

Note

This is the beta release. There are bound to be typos, errors, and rough edges. If you spot something, send a PR on GitHub.

Tip

This article series is itself a Lean project. Every code sample, every proof, every theorem is extracted from source files that the Lean compiler typechecks on every build. The text you are reading is backed by machine-verified code in the GitHub repository. Call it a literate proof.

What This Series Covers

The series divides into two arcs. The first arc treats Lean as a programming language. You will learn the syntax, type system, control flow, polymorphism, monads, and IO. By the end of this arc you can write real programs in Lean.

The second arc treats Lean as a theorem prover. You will learn to write proofs, understand type theory and dependent types, master tactics, and eventually prove classic mathematical results. The series concludes with the emerging intersection of theorem proving with artificial intelligence, and why formal methods may matter more in the coming decade than they have in the previous five.

No prior experience with theorem provers is assumed. Familiarity with a typed functional language like Haskell, OCaml, or Scala helps but is not required.

Getting Started

The full source code is available on GitHub: github.com/sdiehl/zero-to-qed

To run the examples locally, install Lean 4 and clone the repository:

git clone https://github.com/sdiehl/zero-to-qed
cd zero-to-qed
lake build

You can also serve the documentation locally with just serve if you have mdBook installed.

Additional learning resources are collected in the References appendix.

Why?

Maybe you want to understand how markets actually work, not the hand-wavy version but the mathematical objects underneath. Maybe you want to learn what the engineers at top quant firms and AI labs are learning, because that seems like a reasonable bet on where things are heading. Maybe you are interested in programming languages, or how computers can check human reasoning. Maybe you saw a cool demo and want to try it yourself. Maybe you have no practical reason at all and just want to learn something new. All of these are good reasons.

Lean sits at the intersection of programming and mathematics. You can use it to write programs that the compiler verifies are correct. You can use it to prove mathematical theorems and have a machine check your work. You can use it to explore ideas about types, logic, and computation that simply cannot be expressed in mainstream languages. Or you can just mess around and see what happens. There is no entrance exam.

There are also practical reasons. The world runs on systems that allocate resources: power grids balancing supply and demand, networks routing packets, markets matching buyers and sellers. These systems are mathematical objects, and their correctness matters. A bug in an auction mechanism does not just crash a server; it misallocates capital, distorts prices, rewards the wrong participants. Formal methods let you prove properties about such systems before deployment. The matching algorithm that assigns medical residents to hospitals, the slot allocation system that schedules aircraft landings across congested airspace, the combinatorial auction that enables expressive bidding where market orders encode preferences over combinations of instruments: these are theorems waiting to be verified. The gap between “we tested it extensively” and “we proved it correct” is the gap between confidence and certainty. For systems where the stakes justify the effort, that gap matters.

The honest truth is that most people do not need dependent type theory or formal verification for their day jobs. But necessity is a boring criterion for what to learn. People study ancient languages, play chess, learn to cook elaborate dishes, prove that there are infinitely many primes. The question is not whether you need something but whether it interests you.

If you are curious about what it means to prove something with absolute certainty, Lean can show you. When a proof type-checks, it is correct. Not “probably correct” or “correct assuming the tests pass” but mathematically, definitively correct. The compiler does not care about your reputation or your confidence. It cares whether the logic holds. There is something clarifying about that. You discover what you actually understand versus what you thought you understood.

Lean is also difficult. It will not let you wave your hands. It will not accept “it is obvious that”, “because of abstract nonsense” or “this margin is too narrow to contain the proof” as justifications. The feedback loop is immediate and unforgiving. This can be frustrating, but it is also what makes the successes satisfying. When a proof finally compiles, you know it works.

There is a kind of peace in this. The world outside is uncertain, full of claims that cannot be verified, arguments that cannot be resolved, systems that fail in ways no one predicted. Inside a proof assistant, the rules are clear. A theorem holds or it does not. The machine tells you which. You can build something that will still be correct in a hundred years, long after the context that motivated it has faded. This is craftsmanship: making things that work, that endure, that do not depend on luck or circumstance. The difficulty is part of the appeal. Easy things do not teach you much, and they do not last.

If this sounds interesting, keep reading. If not, there are plenty of other things worth learning. No gatekeeping here.

Why Lean Specifically?

Among proof assistants, several mature options exist: Coq, Agda, Isabelle, and others each have devoted communities and decades of development. Lean 4 occupies a distinctive position among them. It is simultaneously a theorem prover backed by a million-line mathematical library and a general-purpose programming language fast enough for production use. You can prove theorems about prime numbers in the morning and write a web server in the afternoon, using the same language, the same tools, the same mental model.

The system emerged from Microsoft Research in 2013 and has evolved through four major versions. Lean 4, released in 2021, was a ground-up rewrite that reimagined the tool as both proof assistant and practical programming language. The implementation is largely written in Lean itself, a feat of bootstrapping that demonstrates the language’s capabilities. The mathematical library, Mathlib, contains over a million lines of formalized mathematics spanning algebra, analysis, topology, number theory, and beyond.

And on top of all that, Lean is fun. If Dwarf Fortress is the most complex simulation ever built and Factorio is crystallized obsession in game form, then Lean is what happens when you point that same energy at the foundations of mathematics. It is basically the nerdiest video game ever created: you write code to solve puzzles, and the compiler gives you immediate feedback on whether you succeeded. You can always ask for hints. There are multiple ways to solve each puzzle. The more you play, the better you get, and the more you learn about the deep structure of mathematics itself. Like Dwarf Fortress, losing is fun, because even a failed proof attempt teaches you something about why it failed. Like Factorio, there is always one more lemma to optimize, one more proof to refactor, one more elegant solution lurking just beyond the current mess. What’s not to love?

This article series is my (admittedly flawed) attempt to take you from zero knowledge of Lean to writing your own proofs and programs, with enough depth to tackle real problems. Whether you finish it or abandon it halfway through, whether you master the material or merely glimpse its outlines, you will have spent your time on something genuinely interesting. In a world of endless distractions optimized to capture attention without rewarding it, that counts for something.

Theorem Provers

This article covers the history of theorem provers. If you do not care about history, skip to Basics.

The idea of mechanizing mathematical reasoning dates back centuries, but the modern era of theorem proving began in the 1960s and 1970s when researchers first attempted to implement formal logic on computers. These early systems were primitive by today’s standards, but they established the fundamental insight that proofs could be represented as data structures and verified by algorithms.

Early Systems

The first generation of theorem provers emerged from two distinct traditions. One tradition, exemplified by systems like Automath developed by Nicolaas de Bruijn in the late 1960s, focused on encoding existing mathematical proofs in a formal language that a computer could check. De Bruijn’s work introduced many concepts that remain central to modern systems, including the idea that types could depend on values and that propositions could be represented as types. The other tradition focused on automated theorem proving, attempting to have computers discover proofs on their own through search procedures. While fully automated proving remains intractable for most interesting mathematics, techniques from this tradition inform the automation tactics available in modern proof assistants.

The 1980s saw the development of several influential systems that shaped the field. The Calculus of Constructions, introduced by Thierry Coquand and Gérard Huet, provided a unified foundation combining dependent types with a hierarchy of universes. This calculus became the theoretical basis for Coq, which remains one of the most widely used proof assistants today. Coq pioneered many features now standard in the field, including tactic-based proof development, extraction of executable programs from proofs, and a module system for organizing large developments. Major verification efforts in Coq include the CompCert certified C compiler and the mathematical proof of the four color theorem.

First-Order Theorem Proving

A parallel tradition built theorem provers on first-order logic rather than type theory. The Boyer-Moore family of provers, culminating in ACL2, used an untyped computational substrate based on Lisp with powerful automation heuristics for discovering proofs. ACL2 achieved notable industrial successes, including verification of AMD’s floating-point division after the Pentium FDIV bug made hardware correctness suddenly interesting to executives.

Despite these successes, first-order theorem proving has not been widely adopted outside specialized industrial applications. First-order logic imposes an expressiveness ceiling that makes formalizing modern mathematics awkward. Without dependent types, you cannot easily express properties like “a vector of length n” or “a sorted list.” These systems rely heavily on opaque automation heuristics rather than user-programmable tactics, which makes it harder to understand why proofs fail and how to fix them. Most importantly, there is no Curry-Howard correspondence linking proofs to programs, which means verified algorithms cannot be extracted into executable code.

The contrast is instructive. Type-theoretic systems grew ecosystems of thousands of users, million-line mathematical libraries, and active research communities. First-order provers remained specialized tools for specific classes of problems. The Curry-Howard insight that proofs are programs and types are propositions turned out to be generatively powerful in ways that first-order theorem proving was not. When you can express your specification, your implementation, and your correctness proof in the same language, each informs the others. This unity is what makes dependent type theory feel like mathematics rather than a checkbox.

The LCF Tradition

Around the same time, researchers in Edinburgh developed the LCF system and its descendants, which introduced the influential LCF architecture. In this design, there is a small trusted kernel that defines what constitutes a valid proof, and all proof construction must ultimately pass through this kernel. This approach provides strong guarantees because only the kernel needs to be trusted, while tactics and automation can be implemented in untrusted code. The HOL family of provers, including HOL4 and Isabelle/HOL, descend from this tradition. Isabelle in particular has been used for major verification efforts including the seL4 verified operating system kernel.

Dependent Type Theory

The development of Martin-Löf type theory in the 1970s and 1980s provided another foundational framework that influenced systems like Agda and later Idris. Per Martin-Löf’s intensional type theory emphasized the computational content of proofs and introduced identity types as a way to reason about equality. Agda, developed primarily at Chalmers University, implements a variant of this theory with sophisticated support for dependent pattern matching. Its syntax influenced Lean’s design, and it remains popular for research in type theory and programming language semantics.

Idris, created by Edwin Brady, took a different approach by prioritizing practical programming with dependent types rather than theorem proving per se. Idris demonstrated that dependent types could be integrated into a language designed for general-purpose programming, with features like implicit arguments and type-driven development making dependently typed code more accessible to working programmers. Many of these ergonomic innovations influenced Lean 4’s design.

The 2010s brought renewed interest in the foundations of mathematics through homotopy type theory, which reinterprets types as spaces and equality as paths. This perspective, developed by Vladimir Voevodsky and others, led to new proof assistants like Cubical Agda that implement univalent foundations. While Lean does not natively support cubical type theory, the mathematical insights from this research have influenced how the community thinks about equality and transport.

Lean’s Origins

Lean emerged from this rich history. The first version was developed by Leonardo de Moura at Microsoft Research starting in 2013, with the goal of building a system suitable for both interactive theorem proving and automated reasoning. Lean 2 and Lean 3 refined the system, and the community built a substantial mathematical library, Mathlib, on top of Lean 3. Lean 4 was a ground-up rewrite, largely self-hosting, with a powerful metaprogramming framework that allows users to extend the system.

Modern Landscape

Today’s theorem provers share many features despite their different foundations. Most support some form of dependent types, allowing types to depend on values and enabling precise specifications. Most provide tactic languages for interactive proof development alongside term-mode proof construction. Most include automation ranging from simple rewriting to sophisticated decision procedures. The systems differ in their logical foundations, their approach to equality and computation, their support for classical versus constructive reasoning, and their emphasis on programming versus pure mathematics. Lean occupies a distinctive position by providing classical logic by default while maintaining strong computational properties, and by treating programming and proving as equally important activities.

If you want to jump ahead to specific topics: Type Theory covers the theoretical foundations, Dependent Types explains how types can depend on values, Tactics covers interactive proof development, and Proof Strategy discusses automation including next-generation tactics like aesop and grind. The Artificial Intelligence article discusses where this is all heading.

The mathematical libraries built on these systems represent decades of formalization effort. Mathlib, Lean’s mathematical library, contains over a million lines of formalized mathematics covering undergraduate and graduate-level material across algebra, analysis, topology, number theory, and other areas. Similar libraries exist for other systems. These libraries demonstrate that modern proof assistants can handle serious mathematics, not just toy examples, and they provide the infrastructure needed for working mathematicians to formalize their own results.

The field continues to evolve rapidly. Large language models have recently shown promise in generating proof steps, raising the possibility of more powerful automation. Verification of software and hardware systems has become increasingly practical, with companies using theorem provers to ensure correctness of critical systems. The boundary between programming and proving continues to blur as dependent types become more accessible. Lean 4 represents the current state of the art in making these powerful tools usable for a broad audience of mathematicians and programmers.

Lake Build System

Every programming language eventually grows a build system, and that build system eventually grows into a small civilization with its own customs and territorial disputes. Lake is Lean’s entry in this tradition. It borrows good ideas from Cargo, is written in Lean itself, and mostly works. The documentation is sparse, the error messages occasionally cryptic, and there are two competing configuration formats that do almost but not quite the same thing. Welcome to the frontier.

That said, Lake gets the job done. Paired with Elan for version management, you get reproducible builds and workable dependency management. Code that compiles on your machine will usually compile on other machines. For a young ecosystem, this is not nothing.

Elan

Elan is the Lean version manager. It downloads, installs, and switches between different versions of Lean. Most users install Elan first and then let it manage their Lean installation. On Unix systems, installation is a single command that downloads and runs the installer script. On Windows, a dedicated installer is available.

Once installed, Elan reads a lean-toolchain file in your project directory to determine which Lean version to use. This file typically contains a single line specifying the version, such as leanprover/lean4:v4.3.0 or simply leanprover/lean4:stable for the latest stable release. When you enter a directory containing this file, Elan automatically activates the correct version. If that version is not installed, Elan downloads it transparently.
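The file's contents are exactly that version string and nothing else. For example, a project pinned to the release mentioned above would contain the single line:

leanprover/lean4:v4.3.0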

This per-project versioning solves a common problem in software development. Different projects may require different Lean versions, and Elan lets them coexist without conflict. You can work on a project using Lean 4.2 in one terminal and a project using Lean 4.5 in another. The toolchain file checked into version control ensures all collaborators use the same Lean version.

Elan also manages additional toolchain components. The Lean installation includes the compiler, the language server for editor integration, and documentation tools. Updates happen through Elan with commands like elan update to fetch the latest versions.

Lake

Lake is the build system and package manager for Lean. The name combines “Lean” and “Make,” and every Lean project contains a lakefile.lean that describes its structure, dependencies, and build configuration. A minimal lakefile declares the package name and defines one or more build targets. The most common targets are libraries, which compile Lean source files into modules that other code can import, and executables, which produce standalone programs. Lake reads this configuration and orchestrates compilation, handling dependencies between modules automatically.

import Lake
open Lake DSL

package myproject where
  version := v!"0.1.0"

lean_lib MyLib where
  roots := #[`MyLib]

@[default_target]
lean_exe myapp where
  root := `Main

This lakefile defines a package named myproject containing a library called MyLib and an executable called myapp. The library compiles all modules under the MyLib namespace, while the executable uses Main as its entry point. The @[default_target] attribute marks myapp as the target built when you run lake build without arguments.

Dependencies on external packages are declared in the lakefile using the require keyword. Lake fetches dependencies from Git repositories, and you can specify versions through tags, branches, or commit hashes. When you build your project, Lake first ensures all dependencies are available and up to date, then compiles them before your own code. Reservoir serves as the community package registry, indexing Lean packages and providing searchable documentation, dependency graphs, and build status for the ecosystem.

require mathlib from git
  "https://github.com/leanprover-community/mathlib4" @ "v4.3.0"

require aesop from git
  "https://github.com/leanprover-community/aesop" @ "master"

Lake maintains a lake-manifest.json file that records the exact versions of all dependencies. This lockfile ensures reproducible builds across different machines and times. When you run lake update, Lake fetches the latest versions matching your constraints and updates the manifest.

The build process produces artifacts in a .lake directory within your project. Compiled Lean files become .olean files containing serialized proof terms and compiled code. These intermediate files enable incremental compilation, where Lake only recompiles modules that have changed or whose dependencies have changed. For large projects like Mathlib, this incremental approach is essential for practical development.

Lake also supports downloading precompiled artifacts called caches. Mathlib maintains a cache of compiled artifacts for anyone who would rather not spend hours rebuilding from source. The lake exe cache get command fetches these artifacts, reducing initial setup from hours to minutes.

Project Structure

A typical Lean project follows a conventional directory layout. The lakefile sits at the project root alongside the lean-toolchain file. Source files live in directories matching their module namespaces. A module named MyLib.Data.List would be in the file MyLib/Data/List.lean. This correspondence between filesystem paths and module names makes navigation straightforward.

myproject/
  lakefile.lean
  lean-toolchain
  lake-manifest.json
  MyLib/
    Basic.lean
    Data/
      List.lean
      Vector.lean
  Main.lean

Test files typically live in a separate directory, often called Test or Tests, with their own library target in the lakefile. Documentation, examples, and scripts occupy other directories as needed. Lake does not enforce a particular structure beyond the lakefile requirements, but conventions have emerged from the community.
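As a sketch of that convention, a test library is declared like any other target in the lakefile; the names below are placeholders rather than anything Lake enforces:

lean_lib Tests where
  roots := #[`Tests]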

Common Commands

Building a project uses lake build, which compiles all default targets. You can build specific targets by name, like lake build MyLib or lake build myapp. For development, lake build after editing a file recompiles only what changed.

Running an executable uses lake exe followed by the executable name, like lake exe myapp. Arguments after the executable name pass through to the program. The related lake run command runs scripts declared in the lakefile rather than executables.

Managing dependencies uses lake update to refresh the manifest with the latest matching versions. After modifying the lakefile to add or change dependencies, running lake update fetches and locks the new versions.

Cleaning build artifacts uses lake clean, which removes the .lake/build directory. A more thorough lake clean --all also removes downloaded dependencies, useful when troubleshooting dependency issues.

The lake env command prints environment variables that configure Lean to find your project’s modules. This is useful when running Lean directly or integrating with external tools.

Editor Integration

The Lean language server provides IDE features like error highlighting, go-to-definition, type information on hover, and code completion. Lake integrates with the language server by providing project configuration. When you open a Lean file in an editor with Lean support, the language server reads your lakefile to understand the project structure.

Visual Studio Code with the lean4 extension is the most popular editor setup. The extension automatically starts the language server and provides a panel showing proof states and messages. Other editors like Emacs and Neovim have community-maintained Lean integrations that communicate with the same language server.

For the language server to work correctly, it must know about your project configuration. Opening a Lean file outside a Lake project, or opening a file before dependencies are built, can cause errors. Building the project with lake build before editing ensures the language server has the information it needs.

The Interactive Workflow

Lean development is fundamentally interactive. Unlike batch compilers where you write code, compile, and hope for the best, Lean provides continuous feedback as you type. This tight feedback loop is not a convenience feature but the primary way you develop in Lean.

The Infoview panel is your window into Lean’s reasoning. In VS Code, it appears on the right side when you open a Lean file. As you move your cursor through the code, the Infoview updates to show the state at that position. When writing proofs, it displays the current goal: what hypotheses you have available and what remains to be proved. When writing programs, it shows types and values. This panel is essential for understanding what Lean sees at every point in your code.

Consider a simple proof in progress:

theorem add_comm (n m : Nat) : n + m = m + n := by
  induction n with
  | zero => simp
  | succ n ih => sorry

When your cursor is on the sorry, the Infoview shows:

case succ
m n : Nat
ih : n + m = m + n
⊢ n + 1 + m = m + (n + 1)

This goal state tells you everything: you are in the succ case of an induction, you have m and n as natural numbers, you have an induction hypothesis ih, and you must prove the equation shown after the turnstile ⊢. Without this feedback, tactic proving would be like navigating a maze blindfolded.

Evaluation Commands

Lean provides several commands that evaluate expressions and report results directly in the editor. These are invaluable for exploration and debugging.

#check displays the type of an expression:

#check 1 + 1        -- 1 + 1 : Nat
#check [1, 2, 3]    -- [1, 2, 3] : List Nat
#check fun x => x   -- fun x => x : ?m.1 → ?m.1

#eval evaluates an expression and shows its value:

#eval 2 + 2         -- 4
#eval [1, 2, 3].length  -- 3
#eval "hello".toUpper   -- "HELLO"

#print shows the definition of a constant, including theorems:

#print Nat.add_comm
-- theorem Nat.add_comm : ∀ (n m : Nat), n + m = m + n := ...

#reduce fully reduces an expression using definitional equality:

#reduce (fun x => x + 1) 5  -- 6

These commands appear as blue underlines in VS Code. Hover over them or check the Infoview to see results. They let you test ideas immediately without writing a full program or proof.

Reading Error Messages

Lean’s error messages are verbose but precise. They tell you exactly what went wrong, which is both a blessing and a curse for newcomers who may find the detail overwhelming.

A type mismatch error shows what was expected and what was provided:

type mismatch
  h
has type
  P
but is expected to have type
  Q

This means you tried to use a proof of P where a proof of Q was needed. Look at the goal state to understand what type you actually need.
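A minimal example that deliberately fails with exactly this kind of message, assuming two arbitrary propositions:

-- h proves P, but the expected type of the body is Q
example (P Q : Prop) (h : P) : Q := h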

An unknown identifier error means a name is not in scope:

unknown identifier 'foo'

Check for typos, missing imports, or hypotheses you forgot to introduce.

An unsolved goals error at the end of a proof means you have not proved everything:

unsolved goals
⊢ P ∧ Q

Your proof is incomplete. Look at what remains and continue.
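For instance, this deliberately incomplete proof stops while the conjunction is still open:

example (P Q : Prop) : P ∧ Q := by
  skip  -- does nothing, so Lean reports: unsolved goals ⊢ P ∧ Q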

The habit of reading error messages carefully, rather than guessing at fixes, will save hours of confusion. Lean is trying to help; let it.

Mathlib Projects

Projects depending on Mathlib benefit from additional tooling. The cache executable bundled with Mathlib downloads prebuilt artifacts, avoiding the need to compile Mathlib yourself. After adding Mathlib as a dependency, running lake exe cache get fetches compiled files for your Lean version.
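Putting these commands together, a plausible sequence after adding Mathlib to your lakefile is:

lake update          # fetch and lock the new dependency
lake exe cache get   # download prebuilt Mathlib artifacts
lake build           # compile your own code against the cache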

Mathlib projects often use a template that includes recommended configuration. The template sets up the toolchain file, lakefile, and auxiliary files for continuous integration and documentation generation. Starting from this template ensures compatibility with Mathlib’s infrastructure.

Because Mathlib updates frequently, projects must balance using new features against the cost of keeping up with changes. Pinning to specific Mathlib versions provides stability, while tracking recent versions provides access to new material. The Mathlib changelog documents breaking changes to help with updates.

Command Reference

| Command | Description | Example |
| --- | --- | --- |
| lake new | Create a new project | lake new myproject |
| lake init | Initialize project in current directory | lake init myproject |
| lake build | Build default targets | lake build |
| lake build <target> | Build specific target | lake build MyLib |
| lake clean | Remove build artifacts | lake clean |
| lake clean --all | Remove artifacts and dependencies | lake clean --all |
| lake update | Update dependencies to latest versions | lake update |
| lake exe <name> | Run an executable | lake exe myapp --flag |
| lake env | Print environment variables | lake env |
| lake script run | Run a lakefile script | lake script run test |
| lake test | Run project tests | lake test |
| lake exe cache get | Download Mathlib cache | lake exe cache get |
| elan show | Show installed toolchains | elan show |
| elan update | Update all toolchains | elan update |
| elan default | Set default toolchain | elan default leanprover/lean4:stable |
| elan override | Set directory-specific toolchain | elan override set leanprover/lean4:v4.3.0 |

Compiler Backend and Runtime

Lean 4 compiles to C code, which is then compiled to native executables using a system C compiler (typically Clang or GCC). This compilation pipeline differs from most theorem provers, which either interpret code or extract to another language like OCaml or Haskell. The choice to target C provides portability and enables linking with existing C libraries.

The compilation process involves several stages. Lean source code is first type-checked and elaborated into the Lean kernel language. Proof terms are then erased since they have no computational content. The remaining code is converted to an intermediate representation (IR) that resembles a simplified functional language. Finally, this IR is translated to C code that Lake compiles with your system’s C compiler.

Lean’s runtime uses reference counting rather than tracing garbage collection. Each heap-allocated object maintains a count of references to it. When the count drops to zero, the object is immediately freed. This approach has lower latency than tracing collectors since there are no garbage collection pauses. The Counting Immutable Beans paper describes the design in detail.

Reference counting enables a technique called Functional But In-Place (FBIP). When you perform a functional update on a data structure and the original has a reference count of one, the runtime can reuse the memory in place rather than allocating new storage. This means that pure functional code operating on unshared data achieves performance comparable to imperative mutation. The Array type in Lean exploits this property: appending to an unshared array mutates it in place despite the pure functional semantics.
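A sketch of what this means in practice: the function below builds an array with repeated push calls, and because the intermediate array is never shared, the runtime can update it in place rather than copying on each step. (The function name is illustrative, not a standard library definition.)

-- Builds #[0, 1, 2, ..., n-1]. The accumulator has reference count one
-- throughout the fold, so each push can reuse the same storage.
def iota (n : Nat) : Array Nat :=
  (List.range n).foldl (fun acc i => acc.push i) #[]

#eval iota 5  -- #[0, 1, 2, 3, 4]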

The runtime is strict, not lazy like Haskell. All function arguments are evaluated before the function body executes. This makes performance more predictable but requires different idioms for infinite data structures or expensive computations that might not be needed. Lean provides explicit thunks via the Thunk type when lazy evaluation is required.
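A small sketch of explicit laziness with Thunk; the computation wrapped in the thunk only runs when get forces it:

-- The computation is delayed until .get is called
def delayed : Thunk Nat := Thunk.mk (fun _ => 2 ^ 20)

#eval delayed.get  -- 1048576, evaluated here rather than at definition time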

Caution

The ecosystem lacks mature libraries for common tasks like HTTP clients, database connectors, encryption, and async I/O. While the Axiomed project is building HTTP support and the community has created socket bindings, these are far less polished than equivalents in established languages. Linking against system libraries requires out-of-band setup that Lake cannot manage portably across operating systems. Concurrency support is still thin: the Task API runs tasks on a runtime-managed thread pool, but there is no mature async I/O framework or higher-level parallel programming library built on top of it.

Binary sizes tend to be large because the generated code bundles the Lean runtime, and any Mathlib dependencies are themselves substantial. Compile times for projects depending on Mathlib can be lengthy, though the cache system mitigates this for incremental builds. The compiler itself is under active development, with the Year 3 Roadmap promising improvements to code generation, smaller binaries, and better reference counting.

For production systems, Lean is best suited as a specification and verification tool rather than as the implementation language. A practical pattern is to write formal specifications in Lean, prove properties about algorithms, then implement the actual system in a production language while using Lean-generated tests or runtime checks to verify the implementation matches the specification. Alternatively, Lean excels for tools where correctness matters more than ecosystem maturity: proof automation, code generators, domain-specific languages, and programs where the type system’s expressiveness justifies the ecosystem tradeoffs.

Basics

True to the title of this article series, we start from zero. Not “Hello, World!” but an actual zero: the natural number that forms the foundation of arithmetic.

Zero

-- From Zero to QED: let's start at the very beginning
def zero : Nat := 0

#eval zero  -- Output: 0

-- The natural numbers are defined inductively:
-- Nat.zero is the base case
-- Nat.succ n is the successor of n
def one : Nat := Nat.succ Nat.zero
def two : Nat := Nat.succ (Nat.succ Nat.zero)

#eval one   -- Output: 1
#eval two   -- Output: 2

-- Of course, we can just write the literals directly
def fortyTwo : Nat := 42

The natural numbers are perhaps the most fundamental type in mathematics and programming. Lean represents them inductively: zero is the base case, and every other natural number is the successor of another. This simple construction gives us the entire infinite sequence 0, 1, 2, 3, and so on.

Now that we have numbers, let us also greet the world:

def greet (name : String) : String :=
  s!"Hello, {name}!"

#eval greet "World"  -- Output: "Hello, World!"

Natural Numbers

Natural numbers in Lean represent non-negative integers, defined inductively just as Peano intended in 1889. They support standard arithmetic operations, but subtraction truncates at zero since negative results would fall outside the type. This has caused approximately as many bugs as unsigned integers in C, which is to say: more than anyone wants to admit.

-- Natural numbers (Nat) are non-negative integers: 0, 1, 2, 3, ...
def myNat : Nat := 42
def anotherNat : Nat := 100

-- Basic arithmetic
#eval 3 + 5      -- 8
#eval 10 - 3     -- 7 (truncated subtraction: 3 - 10 = 0)
#eval 4 * 7      -- 28
#eval 17 / 5     -- 3 (integer division)
#eval 17 % 5     -- 2 (modulo)

-- Natural number subtraction truncates at zero
#eval (3 : Nat) - 10  -- 0, not -7

-- Comparison
#eval 5 < 10     -- true
#eval 5 ≤ 5      -- true
#eval (10 == 10) -- true

Integers

When you need negative numbers, use Int. Integer arithmetic behaves as you would expect from standard mathematics, unburdened by the horrors of two’s complement overflow that have plagued systems programmers since the PDP-11.

-- Integers (Int) include negative numbers
def myInt : Int := -17
def posInt : Int := 42

-- Integer arithmetic handles negatives properly
#eval (-5 : Int) + 3     -- -2
#eval (3 : Int) - 10     -- -7
#eval (-4 : Int) * (-3)  -- 12
#eval (-17 : Int) / 5    -- -3
#eval (-17 : Int) % 5    -- -2

-- Converting between Nat and Int
#eval Int.ofNat 42             -- 42 as Int
#eval (42 : Int).toNat         -- 42 as Nat
#eval (-5 : Int).toNat         -- 0 (negative becomes 0)
#eval (42 : Int).natAbs        -- 42 (absolute value as Nat)
#eval (-42 : Int).natAbs       -- 42

Fin

Fin n represents natural numbers strictly less than n. The type carries a proof that its value is in bounds, making it useful for safe array indexing.

-- Fin n is the type of natural numbers less than n
def smallNum : Fin 5 := 3      -- 3 is less than 5
def anotherSmall : Fin 10 := 7 -- 7 is less than 10

-- Fin values carry a proof that they're in bounds
#eval (smallNum : Fin 5).val   -- 3 (the underlying Nat)

-- Useful for array indexing with guaranteed bounds
def safeIndex {α : Type} (arr : Array α) (i : Fin arr.size) : α :=
  arr[i]

-- Fin arithmetic wraps around
#eval (3 : Fin 5) + 4  -- 2 (wraps: 7 mod 5 = 2)

Tip

Notice that Fin n bundles a value with a proof about that value. This pattern appears everywhere in Lean. Types can contain proofs. Later you will see this is not a special feature but a consequence of something deeper: the Curry-Howard correspondence, where propositions are types and proofs are values.
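To make the bundling concrete, here is the same value written with its bound proof spelled out rather than inferred from the literal (the name is arbitrary):

-- The anonymous constructor takes the value and a proof that it is in bounds
def threeOfFive : Fin 5 := ⟨3, by decide⟩

#eval threeOfFive.val  -- 3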

Fixed-Precision Integers

For performance-critical code or when interfacing with external systems, Lean provides fixed-precision integers that map directly to machine types.

-- Fixed-precision unsigned integers
def byte : UInt8 := 255
def word : UInt16 := 65535
def dword : UInt32 := 0xDEADBEEF
def qword : UInt64 := 0xCAFEBABE12345678

-- Overflow wraps around
#eval (255 : UInt8) + 1  -- 0

-- Size type for platform-dependent sizing
def platformSize : USize := 42

-- Signed fixed-precision integers
def signedByte : Int8 := -128
def signedWord : Int16 := -32768

Bitvectors

Bitvectors represent fixed-width binary data and support bitwise operations. They are essential for low-level programming, cryptography, and hardware verification.

-- BitVec n is an n-bit vector
def bits8 : BitVec 8 := 0xFF
def bits16 : BitVec 16 := 0xABCD
def bits32 : BitVec 32 := 0xDEADBEEF

-- Bitwise operations
#eval (0b1100 : BitVec 4) &&& 0b1010  -- 0b1000 (AND)
#eval (0b1100 : BitVec 4) ||| 0b1010  -- 0b1110 (OR)
#eval (0b1100 : BitVec 4) ^^^ 0b1010  -- 0b0110 (XOR)
#eval ~~~(0b1100 : BitVec 4)          -- 0b0011 (NOT)

-- Shifts
#eval (0b0001 : BitVec 4) <<< 2  -- 0b0100 (left shift)
#eval (0b1000 : BitVec 4) >>> 2  -- 0b0010 (right shift)

Floats

Lean supports IEEE 754 double-precision floating-point numbers for scientific computing and applications that require real number approximations.

-- IEEE 754 double-precision floating-point
def myFloat : Float := 3.14159
def scientific : Float := 6.022e23
def negativeFloat : Float := -273.15

-- Floating-point arithmetic
#eval 3.14 + 2.86    -- 6.0
#eval 10.0 / 3.0     -- 3.333...
#eval Float.sqrt 2.0 -- 1.414...
#eval Float.sin 0.0  -- 0.0
#eval Float.cos 0.0  -- 1.0

-- Special values
#eval (1.0 / 0.0 : Float)   -- inf
#eval (0.0 / 0.0 : Float)   -- nan

Chars

Characters in Lean are Unicode scalar values, capable of representing any character from any human language, mathematical symbols, and 🐻.

-- Characters are Unicode scalar values
def letterA : Char := 'A'
def digit : Char := '7'
def unicode : Char := 'λ'
def bear : Char := '🐻'

-- Character properties
#eval 'A'.isAlpha      -- true
#eval '7'.isDigit      -- true
#eval ' '.isWhitespace -- true
#eval 'a'.isLower      -- true
#eval 'A'.isUpper      -- true

-- Character to/from Nat
#eval 'A'.toNat        -- 65
#eval Char.ofNat 65    -- 'A'

-- Case conversion
#eval 'a'.toUpper      -- 'A'
#eval 'Z'.toLower      -- 'z'

Strings

Strings are sequences of characters with a rich set of operations for text processing. They are UTF-8 encoded, which means you have already won half the battle that consumed the first decade of web development.

-- Strings are sequences of characters
def greeting : String := "Hello, Lean!"
def multiline : String := "Line 1\nLine 2\nLine 3"

-- String operations
#eval "Hello".length           -- 5
#eval "Hello".append " World"  -- "Hello World"
#eval "Hello" ++ " " ++ "World" -- "Hello World"
#eval "Hello".toList           -- ['H', 'e', 'l', 'l', 'o']

-- String interpolation
def shipName := "Mistake Not My Current State Of Alarm"
def shipClass := "GCU"
#eval s!"The {shipClass} {shipName} has entered the system."

-- Substring operations
#eval "Hello World".take 5     -- "Hello"
#eval "Hello World".drop 6     -- "World"
#eval "Hello".isPrefixOf "Hello World"  -- true

-- Splitting and joining
#eval "a,b,c".splitOn ","      -- ["a", "b", "c"]
#eval ",".intercalate ["a", "b", "c"]  -- "a,b,c"

Unit

The Unit type has exactly one value: (). It serves as a placeholder when a function has no meaningful return value, similar to void in C except that void is a lie and Unit is honest about being boring. In type theory, Unit is the terminal object: everything can produce one, and there is only one way to do so.

-- Unit has exactly one value: ()
def nothing : Unit := ()

-- Often used for side-effecting functions
def printAndReturn : IO Unit := do
  IO.println "Side effect!"
  return ()

-- Unit in function types indicates "no meaningful return value"
def greetIO (name : String) : IO Unit :=
  IO.println s!"Hello, {name}!"

Empty

The Empty type has no values at all, the dual of Unit. It represents logical impossibility and marks unreachable code branches. If you somehow obtain a value of type Empty, you can derive anything from it, a principle the medievals called ex falso quodlibet: from falsehood, anything follows. This is less a programming technique than a logical curiosity, but it does come up when you need to convince the compiler that a pattern match is exhaustive.

-- Empty has no values at all
-- It represents impossibility or unreachable code

-- If you have a value of type Empty, you can prove anything
def absurd' {α : Type} (e : Empty) : α :=
  Empty.elim e

-- Empty is useful for marking impossible cases
inductive Void where  -- Custom empty type (equivalent to Empty)

Booleans

Booleans represent truth values and form the basis of conditional logic. George Boole would be pleased, though he might find it curious that his algebra of logic became the foundation for arguments about whether 0 or 1 should represent truth.

-- Booleans: true and false
def myBool : Bool := true
def myFalse : Bool := false

-- Boolean operations
#eval true && false   -- false (and)
#eval true || false   -- true (or)
#eval !true           -- false (not)
#eval true ^^ false   -- true (xor)

-- Conditionals
def absInt (x : Int) : Int :=
  if x < 0 then -x else x

#eval absInt (-5)  -- 5

-- Boolean decision
#eval if true then "yes" else "no"   -- "yes"
#eval if false then "yes" else "no"  -- "no"

Option

The Option type represents values that may or may not exist. It is Lean’s safe alternative to null references, which Tony Hoare famously called his “billion dollar mistake.” With Option, absence is explicit in the type: you cannot forget to check because the compiler will not let you. The hollow log either contains honey or it does not, and you must handle both cases.

-- Option represents a value that may or may not exist
def someValue : Option Nat := some 42
def noValue : Option Nat := none

-- Pattern matching on Option
def getOrDefault (opt : Option Nat) (default : Nat) : Nat :=
  match opt with
  | some x => x
  | none => default

#eval getOrDefault (some 10) 0  -- 10
#eval getOrDefault none 0       -- 0

-- Option combinators
#eval (some 5).map (· * 2)      -- some 10
#eval (none : Option Nat).map (· * 2)  -- none
#eval (some 5).getD 0           -- 5
#eval (none : Option Nat).getD 0       -- 0
#eval (some 5).isSome           -- true
#eval (some 5).isNone           -- false

-- Chaining Options
#eval (some 5).bind (fun x => some (x + 1))  -- some 6

Tuples

Tuples combine values of potentially different types into a single value. They are the basic building block for returning multiple values from functions.

-- Tuples combine values of different types
def pair : Nat × String := (42, "answer")
def triple : Nat × String × Bool := (1, "one", true)

-- Accessing tuple elements
#eval pair.1        -- 42
#eval pair.2        -- "answer"
#eval pair.fst      -- 42
#eval pair.snd      -- "answer"

-- Pattern matching on tuples
def swap {α β : Type} (p : α × β) : β × α :=
  let (a, b) := p
  (b, a)

#eval swap (1, "hello")  -- ("hello", 1)

-- Nested tuples
def nested : (Nat × Nat) × String := ((1, 2), "pair")
#eval nested.1.1    -- 1
#eval nested.1.2    -- 2

Sum Types

Sum types represent a choice between two alternatives. The Except variant is commonly used for error handling.

-- Sum types represent a choice between types
def leftValue : Nat ⊕ String := Sum.inl 42
def rightValue : Nat ⊕ String := Sum.inr "hello"

-- Pattern matching on Sum
def describeSum (s : Nat ⊕ String) : String :=
  match s with
  | Sum.inl n => s!"A number: {n}"
  | Sum.inr str => s!"A string: {str}"

#eval describeSum leftValue   -- "A number: 42"
#eval describeSum rightValue  -- "A string: hello"

-- Except is like Sum but for error handling
def divideExcept (x y : Nat) : Except String Nat :=
  if y == 0 then
    Except.error "Division by zero"
  else
    Except.ok (x / y)

#eval divideExcept 10 2   -- Except.ok 5
#eval divideExcept 10 0   -- Except.error "Division by zero"

Lists

Lists are singly-linked sequences of elements, the workhorse data structure of functional programming since LISP introduced them in 1958. They support pattern matching and have a rich set of higher-order operations. Prepending is \(O(1)\); appending is \(O(n)\). If this bothers you, wait until you meet Arrays.

-- Linked lists
def myList : List Nat := [1, 2, 3, 4, 5]
def emptyList : List Nat := []

-- List construction
def consExample := 0 :: [1, 2, 3]  -- [0, 1, 2, 3]
def appendExample := [1, 2] ++ [3, 4]  -- [1, 2, 3, 4]

-- Common operations
#eval [1, 2, 3].length       -- 3
#eval [1, 2, 3].head?        -- some 1
#eval [1, 2, 3].tail?        -- some [2, 3]
#eval [1, 2, 3][1]?          -- some 2
#eval [1, 2, 3].reverse      -- [3, 2, 1]

-- Higher-order functions
#eval [1, 2, 3].map (· * 2)  -- [2, 4, 6]
#eval [1, 2, 3, 4].filter (· > 2)  -- [3, 4]
#eval [1, 2, 3, 4].foldl (· + ·) 0  -- 10

-- Cartesian product via flatMap
#eval [1, 2].flatMap (fun x => [10, 20].map (fun y => x + y))  -- [11, 21, 12, 22]

Arrays

Arrays provide \(O(1)\) random access and are the preferred choice when you need indexed access without the pointer-chasing of linked lists. Thanks to Lean’s reference counting, operations on unshared arrays mutate in place, giving you the performance of imperative code with the semantics of pure functions. Purity without the performance penalty. The trick is that “unshared” does a lot of work in that sentence.

-- Arrays provide O(1) random access
def myArray : Array Nat := #[1, 2, 3, 4, 5]
def emptyArray : Array Nat := #[]

-- Array operations
#eval #[1, 2, 3].size        -- 3
#eval #[1, 2, 3][0]!         -- 1 (panics if out of bounds)
#eval #[1, 2, 3][1]?         -- some 2
#eval #[1, 2, 3][10]?        -- none

-- Modifying arrays (creates new array)
#eval #[1, 2, 3].push 4      -- #[1, 2, 3, 4]
#eval #[1, 2, 3].pop         -- #[1, 2]
#eval #[1, 2, 3].set! 1 99   -- #[1, 99, 3]

-- Conversion
#eval #[1, 2, 3].toList      -- [1, 2, 3]
#eval [1, 2, 3].toArray      -- #[1, 2, 3]

-- Higher-order functions
#eval #[1, 2, 3].map (· * 2)     -- #[2, 4, 6]
#eval #[1, 2, 3, 4].filter (· % 2 == 0)  -- #[2, 4]

ByteArrays

ByteArrays are efficient arrays of bytes, useful for binary data, file I/O, and network protocols.

-- ByteArray is an efficient array of bytes
def bytes : ByteArray := ByteArray.mk #[0x48, 0x65, 0x6C, 0x6C, 0x6F]

-- Operations
#eval bytes.size                 -- 5
#eval bytes.get! 0               -- 72 (0x48 = 'H')
#eval bytes.toList               -- [72, 101, 108, 108, 111]

-- Convert to/from String (UTF-8)
#eval "Hello".toUTF8             -- ByteArray from string

Maps and Sets

Hash maps and hash sets provide efficient key-value storage and membership testing.

-- HashMap for key-value storage
open Std in
def myMap : HashMap String Nat :=
  HashMap.emptyWithCapacity
    |>.insert "one" 1
    |>.insert "two" 2
    |>.insert "three" 3

#eval myMap.get? "two"     -- some 2
#eval myMap.get? "four"    -- none
#eval myMap.contains "one" -- true
#eval myMap.size           -- 3

-- HashSet for unique elements
open Std in
def mySet : HashSet Nat :=
  HashSet.emptyWithCapacity
    |>.insert 1
    |>.insert 2
    |>.insert 3
    |>.insert 2  -- duplicate ignored

#eval mySet.contains 2     -- true
#eval mySet.contains 5     -- false
#eval mySet.size           -- 3

Subtypes

Subtypes refine an existing type with a predicate. The value carries both the data and a proof that the predicate holds. This is where dependent types begin to flex: instead of checking at runtime whether a number is positive, you encode positivity in the type itself. The predicate becomes part of the contract, enforced at compile time.

-- Subtypes refine a type with a predicate
def Positive := { n : Nat // n > 0 }

def five : Positive := ⟨5, by decide⟩

-- Accessing subtype values
#eval five.val    -- 5 (the underlying Nat)
-- five.property is a proof that 5 > 0

-- Common subtypes
def NonEmptyList (α : Type) := { xs : List α // xs ≠ [] }

def exampleNEL : NonEmptyList Nat :=
  ⟨[1, 2, 3], by decide⟩

-- Safe head function for non-empty lists
def safeHead {α : Type} [Inhabited α] (nel : NonEmptyList α) : α :=
  match nel.val with
  | x :: _ => x
  | [] => default  -- unreachable due to property, but needed for totality

#eval safeHead exampleNEL  -- 1

Functions

Functions are first-class values in Lean. You can define them in multiple ways and partially apply them to create new functions.

def add (x : Nat) (y : Nat) : Nat :=
  x + y

def double : Nat → Nat :=
  fun x => 2 * x

def addFive := add 5  -- Partially applied function

#eval add 3 4        -- Output: 7
#eval double 21      -- Output: 42
#eval addFive 10     -- Output: 15

Pattern Matching

Pattern matching is a powerful feature for destructuring data and defining functions by cases.

def factorial : Nat → Nat
  | 0 => 1
  | n + 1 => (n + 1) * factorial n

def describe : Nat → String
  | 0 => "zero"
  | 1 => "one"
  | 2 => "two"
  | n => s!"many ({n})"

#eval factorial 5    -- Output: 120
#eval describe 0     -- Output: "zero"
#eval describe 100   -- Output: "many (100)"

Structures

Structures are Lean’s way of grouping related data with named fields.

structure Point where
  x : Float
  y : Float
  deriving Repr

def origin : Point := ⟨0.0, 0.0⟩
def myPoint : Point := { x := 3.0, y := 4.0 }

def distance (p : Point) : Float :=
  Float.sqrt (p.x * p.x + p.y * p.y)

#eval distance myPoint  -- Output: 5.0

Inductive Types

Inductive types allow you to define custom data types by specifying their constructors.

inductive SpellSchool where
  | abjuration    -- Protective magic
  | conjuration   -- Summoning things from elsewhere
  | divination    -- Knowing things you shouldn't
  | enchantment   -- Making friends (involuntarily)
  | evocation     -- Fireballs, obviously
  | illusion      -- Lying, but with magic
  | necromancy    -- Asking corpses for favors
  | transmutation -- Turning lead into gold (or frogs)
  deriving Repr, DecidableEq

def schoolDanger : SpellSchool → Nat
  | .abjuration    => 1
  | .divination    => 2
  | .illusion      => 3
  | .transmutation => 4
  | .enchantment   => 5
  | .conjuration   => 6
  | .evocation     => 8  -- Fireballs in enclosed spaces
  | .necromancy    => 9  -- Ethical concerns

inductive MyList (α : Type) where
  | nil : MyList α
  | cons : α → MyList α → MyList α

def MyList.length {α : Type} : MyList α → Nat
  | MyList.nil => 0
  | MyList.cons _ tail => 1 + tail.length

Type Classes

Type classes provide a way to define generic interfaces that can be implemented for different types.

def showTwice {α : Type} [ToString α] (x : α) : String :=
  s!"{x} {x}"

#eval showTwice 42           -- Output: "42 42"
#eval showTwice "hello"      -- Output: "hello hello"
#eval showTwice true         -- Output: "true true"
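The example above only uses an existing class. Defining your own works the same way; the class and instances below are illustrative, not part of the standard library:

class Describable (α : Type) where
  describe : α → String

instance : Describable Bool where
  describe b := if b then "affirmative" else "negative"

instance : Describable Nat where
  describe n := s!"the number {n}"

def announce {α : Type} [Describable α] (x : α) : String :=
  s!"Received {Describable.describe x}"

#eval announce true   -- "Received affirmative"
#eval announce 42     -- "Received the number 42"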

Example Programs

Now let’s put everything together to build some non-trivial trivial programs! The following examples demonstrate Lean as a general-purpose programming language, showing how its type system and functional style work together in practice.

FizzBuzz

FizzBuzz is the canonical “can you actually program” interview question, famous for its ability to filter candidates who cannot write a loop. Here it demonstrates pattern matching on multiple conditions: “Fizz” for multiples of 3, “Buzz” for multiples of 5, “FizzBuzz” for both, and the number itself otherwise. The Lean version is more elegant than the nested conditionals you have seen in lesser languages.

-- FizzBuzz
-- Run with: lake exe fizzbuzz [limit]

def fizzbuzz (n : Nat) : String :=
  match n % 3 == 0, n % 5 == 0 with
  | true,  true  => "FizzBuzz"
  | true,  false => "Fizz"
  | false, true  => "Buzz"
  | false, false => toString n

def runFizzbuzz (limit : Nat) : List String :=
  (List.range limit).map fun i => fizzbuzz (i + 1)

def main (args : List String) : IO Unit := do
  let limit := match args.head? >>= String.toNat? with
    | some n => if n > 0 then n else 15
    | none => 15

  IO.println s!"FizzBuzz 1 to {limit}"
  IO.println "================="
  for s in runFizzbuzz limit do
    IO.println s

Build and run with:

lake exe fizzbuzz 20

Word Frequency

This example uses a hash map to count word occurrences in a string. The foldl function threads an accumulator through the word list, updating counts as it goes.

-- Word Frequency Counter
-- Run with: lake exe wordfreq "your text here"

import Std.Data.HashMap

open Std in
def wordFrequency (text : String) : HashMap String Nat :=
  let words := text.toLower.splitOn " "
  words.foldl (init := HashMap.emptyWithCapacity) fun acc word =>
    let count := acc.getD word 0
    acc.insert word (count + 1)

def main (args : List String) : IO Unit := do
  let text := match args.head? with
    | some t => t
    | none => "the spice must flow the spice extends life the spice expands consciousness"

  IO.println "Word Frequency Counter"
  IO.println "======================"
  IO.println s!"Text: \"{text}\""
  IO.println ""
  IO.println "Frequencies:"

  let freq := wordFrequency text
  for (word, count) in freq.toList do
    IO.println s!"  {word}: {count}"

Build and run with:

lake exe wordfreq "the spice must flow the spice extends life"

Collatz Conjecture

The Collatz conjecture states that repeatedly applying a simple rule (halve if even, triple and add one if odd) will eventually reach 1 for any positive starting integer. Proposed in 1937, it remains unproven. Mathematicians have verified it for numbers up to \(2^{68}\), yet no one can prove it always works. Erdős famously remarked that “Mathematics is not yet ready for such problems.”

But we can still explore it computationally in Lean!

/-- The Collatz conjecture: every positive integer eventually reaches 1.
    Unproven since 1937, but we can at least watch the journey. -/
def collatzStep (n : Nat) : Nat :=
  if n % 2 == 0 then n / 2 else 3 * n + 1

def collatzSequence (n : Nat) (fuel : Nat := 1000) : List Nat :=
  match fuel with
  | 0 => [n]  -- give up, though Collatz would be disappointed
  | fuel' + 1 =>
    if n <= 1 then [n]
    else n :: collatzSequence (collatzStep n) fuel'

def collatzLength (n : Nat) : Nat :=
  (collatzSequence n).length

-- The famous 27: takes 111 steps and peaks at 9232
#eval collatzSequence 27
#eval collatzLength 27

-- Find the longest sequence for starting values 1 to n
def longestCollatz (n : Nat) : Nat × Nat :=
  (List.range n).map (· + 1)
    |>.map (fun k => (k, collatzLength k))
    |>.foldl (fun acc pair => if pair.2 > acc.2 then pair else acc) (1, 1)

#eval longestCollatz 100  -- (97, 119)

The full Collatz explorer is available as a standalone executable. Build and run it with:

lake exe collatz 27

Control Flow and Structures

“The question of whether machines can think is about as relevant as the question of whether submarines can swim.” So said Edsger Dijkstra, a man who spent his career thinking very precisely about how machines should be told what to do. Lean inherits this precision: there are no statements, only expressions. Every if produces a value. Every match is exhaustive or the compiler complains. Every recursive function must terminate or you must convince the system otherwise. Where imperative languages let you wave your hands, Lean demands you show your work.

Conditionals

Lean’s if-then-else is an expression, not a statement. Every branch must produce a value of the same type, and the entire expression evaluates to that value. There is no void, no implicit fall-through, no forgetting to return. The ternary operator that other languages treat as a curiosity is simply how conditionals work here.

-- If-then-else expressions
def absolute (x : Int) : Int :=
  if x < 0 then -x else x

#eval absolute (-5)   -- 5
#eval absolute 3      -- 3

-- Nested conditionals
def classifyNumber (n : Int) : String :=
  if n < 0 then "negative"
  else if n == 0 then "zero"
  else "positive"

#eval classifyNumber (-10)  -- "negative"
#eval classifyNumber 0      -- "zero"
#eval classifyNumber 42     -- "positive"

-- Conditionals are expressions, not statements
def minValue (a b : Nat) : Nat :=
  if a < b then a else b

#eval minValue 3 7  -- 3

Pattern Matching

Pattern matching with match expressions lets you destructure data and handle cases exhaustively. The compiler verifies you have covered every possibility, which eliminates an entire category of bugs that haunt languages where the default case is “do nothing and hope for the best.” You can match on constructors, literals, and multiple values simultaneously.

-- Pattern matching with match
def describeList {α : Type} (xs : List α) : String :=
  match xs with
  | [] => "empty"
  | [_] => "singleton"
  | [_, _] => "pair"
  | _ => "many elements"

#eval describeList ([] : List Nat)     -- "empty"
#eval describeList [1]                  -- "singleton"
#eval describeList [1, 2]               -- "pair"
#eval describeList [1, 2, 3, 4]         -- "many elements"

-- Matching on multiple values
def fizzbuzz (n : Nat) : String :=
  match n % 3, n % 5 with
  | 0, 0 => "FizzBuzz"
  | 0, _ => "Fizz"
  | _, 0 => "Buzz"
  | _, _ => toString n

#eval (List.range 16).map fizzbuzz
-- ["FizzBuzz", "1", "2", "Fizz", "4", "Buzz", ...]

-- Emulating guards with conditionals inside match arms
def classifyAge (age : Nat) : String :=
  match age with
  | 0 => "infant"
  | n => if n < 13 then "child"
         else if n < 20 then "teenager"
         else "adult"

#eval classifyAge 5   -- "child"
#eval classifyAge 15  -- "teenager"
#eval classifyAge 30  -- "adult"

Simple Recursion

Recursive functions are fundamental to functional programming. A function processing a list works through elements one by one, patient and systematic, eventually reaching the empty case and returning upstream with its catch. In Lean, functions that call themselves must be shown to terminate. For simple structural recursion on inductive types like Nat or List, Lean can automatically verify termination.

-- Simple recursion on natural numbers
def factorial : Nat → Nat
  | 0 => 1
  | n + 1 => (n + 1) * factorial n

#eval factorial 5   -- 120
#eval factorial 10  -- 3628800

-- Recursion on lists
def sum : List Nat → Nat
  | [] => 0
  | x :: xs => x + sum xs

#eval sum [1, 2, 3, 4, 5]  -- 15

-- Computing the length of a list
def length {α : Type} : List α → Nat
  | [] => 0
  | _ :: xs => 1 + length xs

#eval length [1, 2, 3]  -- 3

Tail Recursion

Naive recursion can overflow the stack for large inputs because each recursive call adds a frame. Tail recursion solves this by restructuring the computation so the recursive call is the last operation, allowing the compiler to optimize it into a loop. Scheme mandated tail call optimization in 1975. Most other languages did not, which is why stack traces exist.

-- Naive recursion can overflow the stack for large inputs
-- Tail recursion uses an accumulator to avoid this

-- Tail-recursive factorial
def factorialTR (n : Nat) : Nat :=
  let rec go (acc : Nat) : Nat → Nat
    | 0 => acc
    | k + 1 => go (acc * (k + 1)) k
  go 1 n

#eval factorialTR 20  -- 2432902008176640000

-- Tail-recursive sum
def sumTR (xs : List Nat) : Nat :=
  let rec go (acc : Nat) : List Nat → Nat
    | [] => acc
    | x :: rest => go (acc + x) rest
  go 0 xs

#eval sumTR (List.range 1000)  -- Sum of 0..999

-- Tail-recursive reverse
def reverseTR {α : Type} (xs : List α) : List α :=
  let rec go (acc : List α) : List α → List α
    | [] => acc
    | x :: rest => go (x :: acc) rest
  go [] xs

#eval reverseTR [1, 2, 3, 4, 5]  -- [5, 4, 3, 2, 1]

Structural Recursion

Lean requires all recursive functions to terminate, which prevents you from accidentally writing infinite loops and passing them off as proofs. For simple cases where the recursive argument structurally decreases, Lean verifies termination automatically. For more complex cases, you must provide termination hints, essentially explaining to the compiler why your clever recursion scheme actually finishes. The termination checker is not easily fooled.

-- Lean requires recursion to be well-founded
-- Structural recursion on decreasing arguments is automatic

-- Merging two sorted lists (the core of merge sort): termination is verified automatically
def merge : List Nat → List Nat → List Nat
  | [], ys => ys
  | xs, [] => xs
  | x :: xs, y :: ys =>
    if x ≤ y then x :: merge xs (y :: ys)
    else y :: merge (x :: xs) ys

-- Insertion sort is simpler and doesn't need termination proofs
def insert' (x : Nat) : List Nat → List Nat
  | [] => [x]
  | y :: ys => if x ≤ y then x :: y :: ys else y :: insert' x ys

def insertionSort : List Nat → List Nat
  | [] => []
  | x :: xs => insert' x (insertionSort xs)

#eval insertionSort [3, 1, 4, 1, 5, 9, 2, 6]  -- [1, 1, 2, 3, 4, 5, 6, 9]

Structures

Structures group related data with named fields. If you have used records in ML, structs in Rust, or data classes in Kotlin, the concept is familiar. Unlike C structs, Lean structures come with automatically generated accessor functions, projection notation, and none of the memory layout anxiety.

-- Structures group related data with named fields
structure Point where
  x : Float
  y : Float
  deriving Repr

-- Creating structure instances
def origin : Point := { x := 0.0, y := 0.0 }
def point1 : Point := Point.mk 3.0 4.0
def point2 : Point := ⟨1.0, 2.0⟩

-- Accessing fields
#eval point1.x  -- 3.0
#eval point1.y  -- 4.0

-- Functions on structures
def distance (p : Point) : Float :=
  Float.sqrt (p.x * p.x + p.y * p.y)

#eval distance point1  -- 5.0

Updating Structures

Rather than modifying structures in place, Lean provides the with syntax for creating new structures based on existing ones with some fields changed. This functional update pattern means you never have to wonder whether some other part of the code is holding a reference to your data and will be surprised by your mutations.

-- Updating structures with "with" syntax
def moveRight (p : Point) (dx : Float) : Point :=
  { p with x := p.x + dx }

def moveUp (p : Point) (dy : Float) : Point :=
  { p with y := p.y + dy }

#eval moveRight origin 5.0  -- { x := 5.0, y := 0.0 }

-- Multiple field updates
def translate (p : Point) (dx dy : Float) : Point :=
  { p with x := p.x + dx, y := p.y + dy }

#eval translate origin 3.0 4.0  -- { x := 3.0, y := 4.0 }

Nested Structures

Structures can contain other structures, allowing you to build complex data hierarchies. Lean’s projection notation makes accessing nested fields readable: person.address.city works as you would hope, without the verbose getter chains of enterprise Java.

-- Nested structures
structure Rectangle where
  topLeft : Point
  bottomRight : Point
  deriving Repr

def myRect : Rectangle := {
  topLeft := { x := 0.0, y := 10.0 },
  bottomRight := { x := 10.0, y := 0.0 }
}

def width (r : Rectangle) : Float :=
  r.bottomRight.x - r.topLeft.x

def height (r : Rectangle) : Float :=
  r.topLeft.y - r.bottomRight.y

def area (r : Rectangle) : Float :=
  width r * height r

#eval area myRect  -- 100.0

Default Values

Structures can have default values for their fields, making it easy to create instances with sensible defaults while overriding only specific fields.

-- Structures with default values
structure Config where
  host : String := "localhost"
  port : Nat := 8080
  debug : Bool := false
  deriving Repr

-- Use defaults
def defaultConfig : Config := {}

-- Override some defaults
def prodConfig : Config := { host := "api.example.com", port := 443 }

#eval defaultConfig  -- { host := "localhost", port := 8080, debug := false }
#eval prodConfig     -- { host := "api.example.com", port := 443, debug := false }

Inductive Types: Enumerations

Inductive types let you define custom data types by specifying their constructors. This is the core mechanism that makes Lean’s type system expressive: natural numbers, lists, trees, and abstract syntax all emerge from the same primitive. Simple enumerations have constructors with no arguments; more complex types carry data in each variant.

-- Simple enumerations
inductive Direction where
  | north
  | south
  | east
  | west
  deriving Repr, DecidableEq

def opposite : Direction → Direction
  | .north => .south
  | .south => .north
  | .east => .west
  | .west => .east

#eval opposite Direction.north  -- Direction.south

-- Culture ship classes (for the Banksian among us)
inductive CultureShip where
  | gsv   -- General Systems Vehicle
  | gcu   -- General Contact Unit
  | rou   -- Rapid Offensive Unit
  | lcu   -- Limited Contact Unit
  | msu   -- Medium Systems Unit
  deriving Repr, DecidableEq

def threatLevel : CultureShip → Nat
  | .gsv => 1  -- It's a city, not a weapon
  | .gcu => 3  -- Diplomacy with teeth
  | .rou => 9  -- The teeth
  | .lcu => 2  -- Polite but firm
  | .msu => 4  -- Versatile

#eval threatLevel CultureShip.rou  -- 9

-- Enums with associated data
inductive Shape where
  | circle (radius : Float)
  | rectangle (width : Float) (height : Float)
  | triangle (base : Float) (height : Float)
  deriving Repr

def shapeArea : Shape → Float
  | .circle r => 3.14159 * r * r
  | .rectangle w h => w * h
  | .triangle b h => 0.5 * b * h

#eval shapeArea (Shape.circle 2.0)        -- ~12.57
#eval shapeArea (Shape.rectangle 3.0 4.0) -- 12.0
#eval shapeArea (Shape.triangle 6.0 4.0)  -- 12.0

Recursive Inductive Types

Inductive types can be recursive, allowing you to define trees, linked lists, and other self-referential structures. This is where inductive types earn their name: you define larger values in terms of smaller ones, and the recursion has a base case that grounds the whole edifice.

-- Recursive inductive types
inductive BinaryTree (α : Type) where
  | leaf : BinaryTree α
  | node : α → BinaryTree α → BinaryTree α → BinaryTree α
  deriving Repr

-- Building trees
def exampleTree : BinaryTree Nat :=
  .node 1
    (.node 2 .leaf .leaf)
    (.node 3
      (.node 4 .leaf .leaf)
      .leaf)

-- Counting nodes
def BinaryTree.size {α : Type} : BinaryTree α → Nat
  | .leaf => 0
  | .node _ left right => 1 + left.size + right.size

#eval exampleTree.size  -- 4

-- Computing depth
def BinaryTree.depth {α : Type} : BinaryTree α → Nat
  | .leaf => 0
  | .node _ left right => 1 + max left.depth right.depth

#eval exampleTree.depth  -- 3

-- In-order traversal
def BinaryTree.inorder {α : Type} : BinaryTree α → List α
  | .leaf => []
  | .node x left right => left.inorder ++ [x] ++ right.inorder

#eval exampleTree.inorder  -- [2, 1, 4, 3]

Parameterized Types

Inductive types can be parameterized, making them generic over the types they contain. This is how you write a List α that works for any element type, or an expression tree parameterized by its literal type. One definition, infinitely many instantiations.

-- Expression trees for a simple language
inductive Expr where
  | num : Int → Expr
  | add : Expr → Expr → Expr
  | mul : Expr → Expr → Expr
  | neg : Expr → Expr
  deriving Repr

-- Evaluate an expression
def Expr.eval : Expr → Int
  | .num n => n
  | .add e1 e2 => e1.eval + e2.eval
  | .mul e1 e2 => e1.eval * e2.eval
  | .neg e => -e.eval

-- (2 + 3) * -4
def expr1 : Expr := .mul (.add (.num 2) (.num 3)) (.neg (.num 4))

#eval expr1.eval  -- -20

-- Pretty printing
def Expr.toString : Expr → String
  | .num n => s!"{n}"
  | .add e1 e2 => s!"({e1.toString} + {e2.toString})"
  | .mul e1 e2 => s!"({e1.toString} * {e2.toString})"
  | .neg e => s!"(-{e.toString})"

#eval expr1.toString  -- "((2 + 3) * (-4))"
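The expression tree above fixes its literal type to Int. As a sketch of how the same definition becomes parameterized (ExprP is an illustrative name, not part of the series sources), let the literals be any type with addition:

-- A parameterized variant: the literal type becomes a type parameter
inductive ExprP (α : Type) where
  | num : α → ExprP α
  | add : ExprP α → ExprP α → ExprP α
  deriving Repr

def ExprP.eval {α : Type} [Add α] : ExprP α → α
  | .num n => n
  | .add e1 e2 => e1.eval + e2.eval

#eval (ExprP.add (.num 2) (.num 3) : ExprP Nat).eval        -- 5
#eval (ExprP.add (.num 1.5) (.num 2.5) : ExprP Float).eval  -- 4.0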

Mutual Recursion

Sometimes you need multiple definitions that refer to each other, like even and odd functions that call each other, or a parser and its sub-parsers. Lean supports mutually recursive definitions within a mutual block. The termination checker handles these jointly, so your circular definitions must still demonstrably finish.

-- Mutually recursive definitions
mutual
  def isEven : Nat → Bool
    | 0 => true
    | n + 1 => isOdd n

  def isOdd : Nat → Bool
    | 0 => false
    | n + 1 => isEven n
end

#eval isEven 10  -- true
#eval isOdd 7    -- true

-- Mutually recursive types
mutual
  inductive Tree (α : Type) where
    | node : α → Forest α → Tree α

  inductive Forest (α : Type) where
    | nil : Forest α
    | cons : Tree α → Forest α → Forest α
end

-- Example forest
def exampleForest : Forest Nat :=
  .cons (.node 1 .nil)
    (.cons (.node 2 (.cons (.node 3 .nil) .nil))
      .nil)

Role Playing Game Example

The constructs above combine naturally in larger programs. What better way to demonstrate this than a D&D character generator? Structures hold character data, inductive types represent races and classes, pattern matching computes racial bonuses, and recursion drives the dice-rolling simulation.

-- D&D Character Generator
-- Run with: lake exe dnd [seed]

structure AbilityScores where
  strength : Nat
  dexterity : Nat
  constitution : Nat
  intelligence : Nat
  wisdom : Nat
  charisma : Nat
  deriving Repr

inductive Race where
  | human      -- The "versatile" option (read: boring but effective)
  | elf        -- Pointy ears, superiority complex
  | dwarf      -- Short, angry, likes rocks
  | halfling   -- Eats seven meals a day, steals your stuff
  | dragonborn -- Scales, breath weapon, identity crisis
  | tiefling   -- Horns and daddy issues
  deriving Repr, DecidableEq

inductive CharClass where
  | fighter   -- Hit things until they stop moving
  | wizard    -- Glass cannon with a god complex
  | rogue     -- "I check for traps" (narrator: there were no traps)
  | cleric    -- The party's reluctant babysitter
  | barbarian -- Anger management issues, but make it heroic
  | bard      -- Seduces the dragon (or tries to)
  deriving Repr, DecidableEq

inductive Background where
  | soldier    -- "In my day, we walked uphill both ways to the dungeon"
  | sage       -- Actually read the quest briefing
  | criminal   -- Has a "guy" for everything
  | noble      -- Insufferable but well-funded
  | hermit     -- Lived in a cave, has opinions about society
  | entertainer -- "Anyway, here's Wonderwall"
  deriving Repr, DecidableEq

structure Character where
  name : String
  race : Race
  charClass : CharClass
  background : Background
  level : Nat
  abilities : AbilityScores
  hitPoints : Nat
  deriving Repr

def racialBonuses (r : Race) : AbilityScores :=
  match r with
  | .human      => { strength := 1, dexterity := 1, constitution := 1,
                     intelligence := 1, wisdom := 1, charisma := 1 }
  | .elf        => { strength := 0, dexterity := 2, constitution := 0,
                     intelligence := 1, wisdom := 0, charisma := 0 }
  | .dwarf      => { strength := 0, dexterity := 0, constitution := 2,
                     intelligence := 0, wisdom := 1, charisma := 0 }
  | .halfling   => { strength := 0, dexterity := 2, constitution := 0,
                     intelligence := 0, wisdom := 0, charisma := 1 }
  | .dragonborn => { strength := 2, dexterity := 0, constitution := 0,
                     intelligence := 0, wisdom := 0, charisma := 1 }
  | .tiefling   => { strength := 0, dexterity := 0, constitution := 0,
                     intelligence := 1, wisdom := 0, charisma := 2 }

def hitDie (c : CharClass) : Nat :=
  match c with
  | .wizard    => 6
  | .rogue     => 8
  | .bard      => 8
  | .cleric    => 8
  | .fighter   => 10
  | .barbarian => 12

structure RNG where
  state : Nat
  deriving Repr

def RNG.next (rng : RNG) : RNG × Nat :=
  let newState := (rng.state * 1103515245 + 12345) % (2^31)
  (⟨newState⟩, newState)

def RNG.range (rng : RNG) (lo hi : Nat) : RNG × Nat :=
  let (rng', n) := rng.next
  let range := hi - lo + 1
  (rng', lo + n % range)

def roll4d6DropLowest (rng : RNG) : RNG × Nat :=
  let (rng1, d1) := rng.range 1 6
  let (rng2, d2) := rng1.range 1 6
  let (rng3, d3) := rng2.range 1 6
  let (rng4, d4) := rng3.range 1 6
  let dice := [d1, d2, d3, d4]
  let sorted := dice.toArray.qsort (· < ·) |>.toList
  let top3 := sorted.drop 1
  (rng4, top3.foldl (· + ·) 0)

def rollAbilityScores (rng : RNG) : RNG × AbilityScores :=
  let (rng1, str) := roll4d6DropLowest rng
  let (rng2, dex) := roll4d6DropLowest rng1
  let (rng3, con) := roll4d6DropLowest rng2
  let (rng4, int) := roll4d6DropLowest rng3
  let (rng5, wis) := roll4d6DropLowest rng4
  let (rng6, cha) := roll4d6DropLowest rng5
  (rng6, { strength := str, dexterity := dex, constitution := con,
           intelligence := int, wisdom := wis, charisma := cha })

def applyRacialBonuses (base : AbilityScores) (race : Race) : AbilityScores :=
  let bonus := racialBonuses race
  { strength := base.strength + bonus.strength
    dexterity := base.dexterity + bonus.dexterity
    constitution := base.constitution + bonus.constitution
    intelligence := base.intelligence + bonus.intelligence
    wisdom := base.wisdom + bonus.wisdom
    charisma := base.charisma + bonus.charisma }

def abilityModifier (score : Nat) : Int :=
  (score : Int) / 2 - 5

def startingHP (con : Nat) (c : CharClass) : Nat :=
  let conMod := abilityModifier con
  let baseHP := hitDie c
  if conMod < 0 && baseHP < conMod.natAbs then 1
  else (baseHP : Int) + conMod |>.toNat

def pickRandom {α : Type} (rng : RNG) (xs : List α) (default : α) : RNG × α :=
  match xs with
  | [] => (rng, default)
  | _ =>
    let (rng', idx) := rng.range 0 (xs.length - 1)
    (rng', xs[idx]?.getD default)

def allRaces : List Race := [.human, .elf, .dwarf, .halfling, .dragonborn, .tiefling]
def allClasses : List CharClass := [.fighter, .wizard, .rogue, .cleric, .barbarian, .bard]
def allBackgrounds : List Background := [.soldier, .sage, .criminal, .noble, .hermit, .entertainer]

def fantasyNames : List String := [
  "Thorin Oakenshield", "Elara Nightwhisper", "Grimjaw the Unpleasant",
  "Sir Reginald von Pompous III", "Stabitha McBackstab", "Chad Thunderbicep",
  "Cornelius the Combustible", "Snarfblatt", "Mysterious Hooded Figure #7",
  "That One Guy From That Thing", "Bard McBardface", "Aethon Dragonsbane",
  "Pip Goodbarrel", "Zyx'thorax the Unpronounceable"
]

def generateCharacter (seed : Nat) : Character :=
  let rng : RNG := ⟨seed⟩
  let (rng1, name) := pickRandom rng fantasyNames "Adventurer"
  let (rng2, race) := pickRandom rng1 allRaces .human
  let (rng3, charClass) := pickRandom rng2 allClasses .fighter
  let (rng4, background) := pickRandom rng3 allBackgrounds .soldier
  let (_, baseAbilities) := rollAbilityScores rng4
  let abilities := applyRacialBonuses baseAbilities race
  let hp := startingHP abilities.constitution charClass
  { name := name
    race := race
    charClass := charClass
    background := background
    level := 1
    abilities := abilities
    hitPoints := hp }

def raceName : Race → String
  | .human => "Human"
  | .elf => "Elf"
  | .dwarf => "Dwarf"
  | .halfling => "Halfling"
  | .dragonborn => "Dragonborn"
  | .tiefling => "Tiefling"

def className : CharClass → String
  | .fighter => "Fighter"
  | .wizard => "Wizard"
  | .rogue => "Rogue"
  | .cleric => "Cleric"
  | .barbarian => "Barbarian"
  | .bard => "Bard"

def backgroundName : Background → String
  | .soldier => "Soldier"
  | .sage => "Sage"
  | .criminal => "Criminal"
  | .noble => "Noble"
  | .hermit => "Hermit"
  | .entertainer => "Entertainer"

def formatModifier (score : Nat) : String :=
  let m := abilityModifier score
  if m >= 0 then s!"+{m}" else s!"{m}"

def printCharacter (c : Character) : IO Unit := do
  IO.println "======================================"
  IO.println s!"  {c.name}"
  IO.println "======================================"
  IO.println s!"  Level {c.level} {raceName c.race} {className c.charClass}"
  IO.println s!"  Background: {backgroundName c.background}"
  IO.println ""
  IO.println "  ABILITY SCORES"
  IO.println "  --------------"
  IO.println s!"  STR: {c.abilities.strength} ({formatModifier c.abilities.strength})"
  IO.println s!"  DEX: {c.abilities.dexterity} ({formatModifier c.abilities.dexterity})"
  IO.println s!"  CON: {c.abilities.constitution} ({formatModifier c.abilities.constitution})"
  IO.println s!"  INT: {c.abilities.intelligence} ({formatModifier c.abilities.intelligence})"
  IO.println s!"  WIS: {c.abilities.wisdom} ({formatModifier c.abilities.wisdom})"
  IO.println s!"  CHA: {c.abilities.charisma} ({formatModifier c.abilities.charisma})"
  IO.println ""
  IO.println s!"  Hit Points: {c.hitPoints}"
  IO.println s!"  Hit Die: d{hitDie c.charClass}"
  IO.println "======================================"

def main (args : List String) : IO Unit := do
  let seed := match args.head? >>= String.toNat? with
    | some n => n
    | none => 42

  IO.println ""
  IO.println "  D&D CHARACTER GENERATOR"
  IO.println "  Type-Safe Adventures Await!"
  IO.println ""

  let character := generateCharacter seed

  printCharacter character

  IO.println ""
  IO.println "Your adventure begins..."
  IO.println "(Use a different seed for a different character: lake exe dnd <seed>)"

Build and run the character generator with:

lake exe dnd 42

Try different seeds to generate different characters. The generator uses a deterministic pseudo-random number generator, so the same seed always produces the same character.

Polymorphism and Type Classes

On September 23, 1999, the Mars Climate Orbiter disintegrated in the Martian atmosphere because one piece of software produced thrust data in pound-force seconds while another expected newton-seconds. A \$327 million spacecraft, destroyed by a unit conversion error that any undergraduate physics student could catch. The software worked. The math was correct. The types simply failed to express the constraints that mattered.

This is what motivates the machinery in this article. Type classes and phantom types let you encode dimensional constraints directly in the type system. You cannot add meters to seconds because the compiler will not let you. You cannot pass thrust in the wrong units because the function signature forbids it. These constraints cost nothing at runtime, they compile away completely, but they catch at compile time the very class of error that destroyed a spacecraft. By the end of this article, you will see how to build such a system yourself.

But safety is only half the story. Polymorphism is also about not writing the same code twice. A sorting algorithm should not care whether it sorts integers, strings, or financial instruments. A data structure should not be rewritten for each type it might contain. The alternative is copying code and changing types by hand, which is how bugs are born and how programmers lose their minds. Polymorphism is the machinery that makes abstraction possible without sacrificing type safety.

In 1967, Christopher Strachey drew a distinction that would shape programming languages for decades: parametric polymorphism, where code works uniformly for all types, versus ad-hoc polymorphism, where the behavior changes based on the specific type involved. The first gives you reverse : List α → List α, a function blissfully ignorant of what the list contains. The second gives you +, which does quite different things to integers, floats, and matrices. Lean provides both, unified under a type class system that traces its lineage through Haskell back to Wadler and Blott’s 1989 paper How to Make Ad-Hoc Polymorphism Less Ad Hoc. The result is generic code that is simultaneously flexible and precise.

Parametric Polymorphism

Functions can take type parameters, allowing them to work with any type without knowing or caring what that type is. The function length : List α → Nat counts elements whether they are integers, strings, or proof terms. The curly braces indicate implicit arguments that Lean infers automatically, sparing you the tedium of writing length (α := Int) myList everywhere.

def identity {α : Type} (x : α) : α := x

#eval identity 42          -- 42
#eval identity "hello"     -- "hello"
#eval identity [1, 2, 3]   -- [1, 2, 3]

def compose {α β γ : Type} (g : β → γ) (f : α → β) : α → γ :=
  fun x => g (f x)

def addOne (x : Nat) : Nat := x + 1
def double (x : Nat) : Nat := x * 2

#eval compose double addOne 5  -- 12

def flip {α β γ : Type} (f : α → β → γ) : β → α → γ :=
  fun b a => f a b

#eval flip Nat.sub 3 10  -- 7

Polymorphic Data Types

Data types can also be parameterized, creating generic containers that work with any element type. You define List α once and get lists of integers, lists of strings, and lists of lists for free. The alternative, writing IntList, StringList, and ListOfLists separately, is how Java programmers spent the 1990s.

def Pair (α β : Type) := α × β

def makePair {α β : Type} (a : α) (b : β) : Pair α β := (a, b)

#eval makePair 1 "one"  -- (1, "one")

inductive Either (α β : Type) where
  | left : α → Either α β
  | right : β → Either α β
  deriving Repr

def mapEither {α β γ : Type} (f : β → γ) : Either α β → Either α γ
  | .left a => .left a
  | .right b => .right (f b)

#eval mapEither (· + 1) (Either.right 5 : Either String Nat)

Implicit Arguments

Implicit arguments in curly braces are inferred by Lean from context. When inference fails or you want to override it, the @ prefix makes everything explicit. This escape hatch is rarely needed, but when you need it, you really need it.

def listLength {α : Type} (xs : List α) : Nat :=
  match xs with
  | [] => 0
  | _ :: rest => 1 + listLength rest

#eval listLength [1, 2, 3]        -- 3
#eval listLength ["a", "b"]       -- 2
#eval @listLength Nat [1, 2, 3]   -- explicit type argument

def firstOrDefault {α : Type} (xs : List α) (default : α) : α :=
  match xs with
  | [] => default
  | x :: _ => x

#eval firstOrDefault [1, 2, 3] 0      -- 1
#eval firstOrDefault ([] : List Nat) 0 -- 0

Instance Arguments

Square brackets denote instance arguments, resolved through type class inference. When you write [Add α], you are saying “give me any type that knows how to add.” The compiler finds the right implementation automatically. This is the mechanism that lets + work on integers, floats, vectors, and anything else with an Add instance.

def printTwice {α : Type} [ToString α] (x : α) : String :=
  s!"{x} and {x}"

#eval printTwice 42       -- "42 and 42"
#eval printTwice true     -- "true and true"
#eval printTwice "hi"     -- "hi and hi"

def maximum {α : Type} [Ord α] (xs : List α) : Option α :=
  xs.foldl (init := none) fun acc x =>
    match acc with
    | none => some x
    | some m => if compare x m == .gt then some x else some m

#eval maximum [3, 1, 4, 1, 5, 9]  -- some 9
#eval maximum ["b", "a", "c"]     -- some "c"

Defining Type Classes

Type classes define interfaces that types can implement, but unlike object-oriented interfaces, the implementation is external to the type. You can add new capabilities to existing types without modifying them. This is how Lean can make + work for types defined in libraries you do not control.

class Printable (α : Type) where
  format : α → String

instance : Printable Nat where
  format n := s!"Nat({n})"

instance : Printable Bool where
  format b := if b then "yes" else "no"

instance : Printable String where
  format s := s!"\"{s}\""

def showValue {α : Type} [Printable α] (x : α) : String :=
  Printable.format x

#eval showValue 42      -- "Nat(42)"
#eval showValue true    -- "yes"
#eval showValue "test"  -- "\"test\""

Tip

A type class constraint like [Add α] is a proof obligation. The compiler must find evidence that α supports addition. Instance resolution is proof search. This connection between “finding implementations” and “finding proofs” is not a metaphor; it is the same mechanism. When you reach the Proofs article, you will see the compiler searching for proofs exactly as it searches for instances here.

Polymorphic Instances

Instances themselves can be polymorphic, building complex instances from simpler ones. If you can print an α, you can print a List α. This compositionality is the quiet superpower of type classes: small building blocks assemble into sophisticated behavior without explicit wiring.

instance {α : Type} [Printable α] : Printable (List α) where
  format xs :=
    let items := xs.map Printable.format
    "[" ++ ", ".intercalate items ++ "]"

instance {α β : Type} [Printable α] [Printable β] : Printable (α × β) where
  format p := s!"({Printable.format p.1}, {Printable.format p.2})"

#eval showValue [1, 2, 3]           -- "[Nat(1), Nat(2), Nat(3)]"
#eval showValue (42, true)          -- "(Nat(42), yes)"
#eval showValue [(1, true), (2, false)]

Numeric Type Classes

Type classes excel at abstracting over numeric operations. Write your algorithm once against an abstract Mul and Add, and it works for integers, rationals, complex numbers, matrices, and polynomials. The abstraction costs nothing at runtime because instance resolution happens at compile time. The generic code specializes to concrete operations in the generated code.

class Addable (α : Type) where
  add : α → α → α
  zero : α

instance : Addable Nat where
  add := Nat.add
  zero := 0

instance : Addable Int where
  add := Int.add
  zero := 0

instance : Addable Float where
  add := Float.add
  zero := 0.0

def sumList {α : Type} [Addable α] (xs : List α) : α :=
  xs.foldl Addable.add Addable.zero

#eval sumList [1, 2, 3, 4, 5]           -- 15
#eval sumList [1.5, 2.5, 3.0]           -- 7.0
#eval sumList ([-1, 2, -3] : List Int)  -- -2

Extending Classes

Type classes can extend other classes, inheriting their operations while adding new ones. An Ord instance gives you compare, and from that you get <, ≤, >, ≥, min, and max for free. The hierarchy of algebraic structures in Mathlib, from magmas through groups to rings and fields, is built this way.

class Eq' (α : Type) where
  eq : α → α → Bool

class Ord' (α : Type) extends Eq' α where
  lt : α → α → Bool

instance : Eq' Nat where
  eq := (· == ·)

instance : Ord' Nat where
  eq := (· == ·)
  lt := (· < ·)

def sortedInsert {α : Type} [Ord' α] (x : α) (xs : List α) : List α :=
  match xs with
  | [] => [x]
  | y :: ys => if Ord'.lt x y then x :: y :: ys else y :: sortedInsert x ys

#eval sortedInsert 3 [1, 2, 4, 5]  -- [1, 2, 3, 4, 5]

Functor

The Functor pattern captures the idea of mapping a function over a structure while preserving its shape. Lists, options, arrays, trees, and IO actions are all functors. Once you see the pattern, you see it everywhere: any context that wraps a value and lets you transform that value without escaping the context.

class Functor' (F : Type → Type) where
  map : {α β : Type} → (α → β) → F α → F β

instance : Functor' List where
  map := List.map

instance : Functor' Option where
  map f
    | none => none
    | some x => some (f x)

def doubleAll {F : Type → Type} [Functor' F] (xs : F Nat) : F Nat :=
  Functor'.map (· * 2) xs

#eval doubleAll [1, 2, 3]      -- [2, 4, 6]
#eval doubleAll (some 21)      -- some 42
#eval doubleAll (none : Option Nat)  -- none

Multiple Constraints

Functions can require multiple type class constraints, combining capabilities from different classes. A sorting function needs Ord; a function that sorts and prints needs [Ord α] [Repr α]. The constraints document exactly what your function requires, nothing more, nothing less.

def showCompare {α : Type} [ToString α] [Ord α] (x y : α) : String :=
  let result := match compare x y with
    | .lt => "less than"
    | .eq => "equal to"
    | .gt => "greater than"
  s!"{x} is {result} {y}"

#eval showCompare 3 5       -- "3 is less than 5"
#eval showCompare "b" "a"   -- "b is greater than a"

def sortAndShow {α : Type} [ToString α] [Ord α] (xs : List α) : String :=
  let sorted := xs.toArray.qsort (compare · · == .lt) |>.toList
  s!"{sorted}"

#eval sortAndShow [3, 1, 4, 1, 5]  -- "[1, 1, 3, 4, 5]"

Deriving Instances

Many standard type classes can be automatically derived, saving you from writing boilerplate that follows predictable patterns. The deriving clause generates instances for Repr, BEq, Hashable, and others. Let the machine do the mechanical work; save your creativity for the parts that require thought.

structure Point' where
  x : Nat
  y : Nat
  deriving Repr, BEq, Hashable

#eval Point'.mk 1 2 == Point'.mk 1 2  -- true
#eval Point'.mk 1 2 == Point'.mk 3 4  -- false

inductive Color where
  | red | green | blue
  deriving Repr, DecidableEq

#eval Color.red == Color.red    -- true
#eval Color.red == Color.blue   -- false

structure Person where
  name : String
  age : Nat
  deriving Repr, BEq

def alice : Person := { name := "Alice", age := 30 }
def bob : Person := { name := "Bob", age := 25 }

#eval alice == bob  -- false
#eval repr alice    -- { name := "Alice", age := 30 }

Semigroups and Monoids

Algebraic structures like semigroups and monoids capture patterns that recur across mathematics and programming. A semigroup has an associative operation; a monoid adds an identity element. String concatenation, list append, function composition, and integer addition are all monoids. Recognizing the common structure lets you write code once and apply it to all of them.

class Semigroup (α : Type) where
  append : α → α → α

class Monoid' (α : Type) extends Semigroup α where
  empty : α

def concat {α : Type} [Monoid' α] (xs : List α) : α :=
  xs.foldl Semigroup.append Monoid'.empty

instance : Monoid' String where
  append := String.append
  empty := ""

instance {α : Type} : Monoid' (List α) where
  append := List.append
  empty := []

#eval concat ["Hello", " ", "World"]  -- "Hello World"
#eval concat [[1, 2], [3], [4, 5]]    -- [1, 2, 3, 4, 5]
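Numeric addition fits the same mold. As a quick sketch not taken from the project sources, Nat under addition with identity zero turns concat into summation:

-- Nat under addition is a monoid: the operation is +, the identity is 0
instance : Monoid' Nat where
  append := Nat.add
  empty := 0

#eval concat [1, 2, 3, 4]  -- 10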

Units of Measurement: Safety for Free

Here is the payoff for all the type class machinery. We can prevent the Mars Orbiter bug entirely, not through runtime checks but through types that make the error inexpressible. Consider representing physical quantities with phantom types:

/-!
# Type-Safe Units of Measurement

A demonstration of using phantom types and type classes to enforce dimensional
correctness at compile time with zero runtime overhead. The types exist only
in the compiler's mind; the generated code manipulates raw floats.
-/

-- Phantom types representing physical dimensions
-- These carry no data and have no runtime representation
structure Meters
structure Seconds
structure Kilograms
structure MetersPerSecond
structure MetersPerSecondSquared
structure Newtons
structure NewtonSeconds

-- A quantity is a value tagged with its unit type
-- At runtime, this is just a Float
structure Quantity (unit : Type) where
  val : Float
  deriving Repr

-- Type class for multiplying quantities with compatible units
class UnitMul (u1 u2 result : Type) where

-- Type class for dividing quantities with compatible units
class UnitDiv (u1 u2 result : Type) where

-- Define the dimensional relationships
instance : UnitDiv Meters Seconds MetersPerSecond where
instance : UnitDiv MetersPerSecond Seconds MetersPerSecondSquared where
instance : UnitMul Kilograms MetersPerSecondSquared Newtons where
instance : UnitMul Newtons Seconds NewtonSeconds where

-- Arithmetic operations that preserve dimensional correctness
def Quantity.add {u : Type} (x y : Quantity u) : Quantity u :=
  ⟨x.val + y.val⟩

def Quantity.sub {u : Type} (x y : Quantity u) : Quantity u :=
  ⟨x.val - y.val⟩

def Quantity.mul {u1 u2 u3 : Type} [UnitMul u1 u2 u3] (x : Quantity u1) (y : Quantity u2) : Quantity u3 :=
  ⟨x.val * y.val⟩

def Quantity.div {u1 u2 u3 : Type} [UnitDiv u1 u2 u3] (x : Quantity u1) (y : Quantity u2) : Quantity u3 :=
  ⟨x.val / y.val⟩

def Quantity.scale {u : Type} (x : Quantity u) (s : Float) : Quantity u :=
  ⟨x.val * s⟩

-- Smart constructors for readability
def meters (v : Float) : Quantity Meters := ⟨v⟩
def seconds (v : Float) : Quantity Seconds := ⟨v⟩
def kilograms (v : Float) : Quantity Kilograms := ⟨v⟩
def newtons (v : Float) : Quantity Newtons := ⟨v⟩
def newtonSeconds (v : Float) : Quantity NewtonSeconds := ⟨v⟩

-- Example: Computing velocity from distance and time
-- This compiles because Meters / Seconds = MetersPerSecond
def computeVelocity (distance : Quantity Meters) (time : Quantity Seconds)
    : Quantity MetersPerSecond :=
  distance.div time

-- Example: Computing force from mass and acceleration
-- This compiles because Kilograms * MetersPerSecondSquared = Newtons
def computeForce (mass : Quantity Kilograms) (accel : Quantity MetersPerSecondSquared)
    : Quantity Newtons :=
  mass.mul accel

-- Example: Computing impulse from force and time
-- This compiles because Newtons * Seconds = NewtonSeconds
def computeImpulse (force : Quantity Newtons) (time : Quantity Seconds)
    : Quantity NewtonSeconds :=
  force.mul time

-- The Mars Climate Orbiter scenario:
-- One team computes impulse in Newton-seconds
-- Another team expects the same units
-- The type system ensures they match

def thrusterImpulse : Quantity NewtonSeconds :=
  let force := newtons 450.0
  let burnTime := seconds 10.0
  computeImpulse force burnTime

-- This would NOT compile - you cannot add NewtonSeconds to Meters:
-- def nonsense := thrusterImpulse.add (meters 100.0)

-- This would NOT compile - you cannot pass Meters where Seconds expected:
-- def badVelocity := computeVelocity (meters 100.0) (meters 50.0)

-- The key insight: all the Quantity wrappers and phantom types vanish at runtime.
-- The generated code is just floating-point arithmetic.
-- The safety is free.

def main : IO Unit := do
  let distance := meters 100.0
  let time := seconds 9.58  -- Usain Bolt's 100m record
  let velocity := computeVelocity distance time

  IO.println s!"Distance: {distance.val} m"
  IO.println s!"Time: {time.val} s"
  IO.println s!"Velocity: {velocity.val} m/s"

  let mass := kilograms 1000.0
  let accel : Quantity MetersPerSecondSquared := ⟨9.81⟩
  let force := computeForce mass accel

  IO.println s!"Mass: {mass.val} kg"
  IO.println s!"Acceleration: {accel.val} m/s²"
  IO.println s!"Force: {force.val} N"

  let impulse := computeImpulse force (seconds 5.0)
  IO.println s!"Impulse: {impulse.val} N·s"

The Quantity type wraps a Float but carries a phantom type parameter representing its unit. You cannot add meters to seconds because Quantity.add requires both arguments to have the same unit type. You cannot pass thrust in the wrong units because the function signature encodes the dimensional requirements.

The crucial insight is that these phantom types vanish at runtime. The Meters and Seconds types have no fields, carry no data, and have no runtime representation whatsoever. The generated code manipulates raw floats with raw floating-point operations. The type checker enforces dimensional correctness; the compiled program pays no cost for it. This is the dream of static typing: safety that exists only in the compiler’s mind, free at runtime, catching at compile time the very class of error that destroyed a spacecraft.

There is a broader lesson here about the direction of software. The mathematics that physicists scribble on whiteboards, the dimensional analysis that engineers perform by hand, the invariants that programmers hold in their heads and document in comments: these are all pseudocode. They are precise enough for humans to follow but not precise enough for machines to verify. The project of programming language research, from Curry and Howard through ML and Haskell to Lean and dependent types, has been to formalize this pseudocode. To turn informal reasoning into machine-checked artifacts.

As code generation by large language models becomes routine, this formalization becomes essential. A neural network can produce syntactically correct code that passes tests yet harbors subtle unit errors, off-by-one mistakes, and violated invariants. The guardrails cannot be more tests, more code review, more human attention. The guardrails must be formalisms that make entire categories of errors unrepresentable. Type classes, phantom types, dependent types: these are not academic curiosities but safety controls for a future where most code is synthesized. The Mars Climate Orbiter was written by humans who made a human error. The code that replaces them must be held to a higher standard. (For more on this trajectory, see Artificial Intelligence.)

Build and run with:

lake exe units

Monads

“A monad is just a monoid in the category of endofunctors, what’s the problem?” This infamous quip became the ur-meme of functional programming, spawning a thousand blog posts explaining monads via burritos, boxes, and space suits. The tragedy is that the concept is not hard. It just got wrapped in mystique before anyone explained it clearly.

Here is what matters: programs have effects. They might fail, consult state, perform IO, branch nondeterministically. The question is how to represent these effects in a way that lets us reason about composition. Monads are one answer. They capture a pattern for sequencing operations where each step produces both a result and some context. Bind chains these operations, threading the context automatically. Do notation makes the sequencing readable. The interface is minimal; the applications are broad.

But monads are not the only answer, and treating them as sacred obscures the deeper point. Algebraic effect systems, linear types, graded monads, and effect handlers all attack the same problem from different angles. What they share is the conviction that effects should be visible in types and that composition should be governed by laws. The specific mechanism matters less than the principle: make the structure explicit so that humans and machines can reason about it.

Lean uses monads because they work well and the ecosystem inherited them from functional programming’s Haskell lineage. They are a good tool. But the goal is not to be monadic; the goal is to capture effects algebraically, whatever form that takes. When you understand monads, you understand one particularly elegant solution to sequencing effectful computations. You also understand a template for how programming abstractions should work: a minimal interface, a set of laws, and the discipline to respect both.

The Option Monad

The simplest monad handles computations that might fail. You already understand this pattern: look something up, and if it exists, do something with it; if not, propagate the absence. Every programmer has written this code a hundred times. The monad just gives it a name and a uniform interface.

def safeDivide (x y : Nat) : Option Nat :=
  if y == 0 then none else some (x / y)

def safeHead {α : Type} (xs : List α) : Option α :=
  match xs with
  | [] => none
  | x :: _ => some x

#eval safeDivide 10 2   -- some 5
#eval safeDivide 10 0   -- none
#eval safeHead [1, 2]   -- some 1
#eval safeHead ([] : List Nat)  -- none

Chaining Without Monads

Without the abstraction, chaining fallible operations produces the pyramid of doom: nested conditionals, each handling failure explicitly, the actual logic buried under boilerplate. This is not hypothetical. This is what error handling looks like in languages without monadic structure. It is also what early JavaScript looked like before Promises, which are, of course, monads by another name.

def computation (xs : List Nat) : Option Nat :=
  match safeHead xs with
  | none => none
  | some x =>
    match safeDivide 100 x with
    | none => none
    | some y => some (y + 1)

#eval computation [5, 2, 3]   -- some 21
#eval computation [0, 2, 3]   -- none (division by zero)
#eval computation []          -- none (empty list)

The bind Operation

The bind operation (written >>=) is the heart of the monad. It takes a value in context and a function that produces a new value in context, and chains them together. For Option, this means: if the first computation succeeded, apply the function; if it failed, propagate the failure. The pattern generalizes far beyond failure, but failure is the clearest example.

def computation' (xs : List Nat) : Option Nat :=
  safeHead xs |>.bind fun x =>
  safeDivide 100 x |>.bind fun y =>
  some (y + 1)

#eval computation' [5, 2, 3]  -- some 21
#eval computation' [0, 2, 3]  -- none
#eval computation' []         -- none

Do Notation

Do notation is syntactic sugar that makes monadic code look imperative. The left arrow desugars to bind; the semicolon sequences operations. This is not a concession to programmers who cannot handle functional style. It is recognition that sequential composition is how humans think about processes, and fighting that serves no purpose. The abstraction remains; only the syntax yields to ergonomics.

def computationDo (xs : List Nat) : Option Nat := do
  let x ← safeHead xs
  let y ← safeDivide 100 x
  return y + 1

#eval computationDo [5, 2, 3]  -- some 21
#eval computationDo [0, 2, 3]  -- none
#eval computationDo []         -- none

def validateInput (name : String) (age : Nat) : Option (String × Nat) := do
  if name.isEmpty then none
  if age == 0 then none
  return (name, age)

#eval validateInput "Alice" 30  -- some ("Alice", 30)
#eval validateInput "" 30       -- none
#eval validateInput "Bob" 0     -- none

The Except Monad

Option tells you that something failed but not why. Except carries the reason. This is the difference between a function returning null and a function throwing an exception with a message. The monadic structure is identical; only the context changes. This uniformity is the point. Learn the pattern once, apply it to failure, to errors, to state, to nondeterminism, to parsing, to probability distributions. The shape is always the same.

inductive ValidationError where
  | emptyName
  | invalidAge (age : Nat)
  | missingField (field : String)
  deriving Repr

def validateName (name : String) : Except ValidationError String :=
  if name.isEmpty then .error .emptyName
  else .ok name

def validateAge (age : Nat) : Except ValidationError Nat :=
  if age == 0 || age > 150 then .error (.invalidAge age)
  else .ok age

def validatePerson (name : String) (age : Nat) : Except ValidationError (String × Nat) := do
  let validName ← validateName name
  let validAge ← validateAge age
  return (validName, validAge)

#eval validatePerson "Alice" 30   -- Except.ok ("Alice", 30)
#eval validatePerson "" 30        -- Except.error ValidationError.emptyName
#eval validatePerson "Bob" 200    -- Except.error (ValidationError.invalidAge 200)

The State Monad

The state monad threads mutable state through a pure computation. You get the ergonomics of mutation, the ability to read and write a value as you go, without actually mutating anything. Each computation takes a state and returns a new state alongside its result. The threading is automatic, hidden behind the monadic interface. This is not a trick. It is a different way of thinking about state: not as a mutable box but as a value that flows through your computation, transformed at each step.

abbrev State' (σ α : Type) := σ → (α × σ)

def get'' {σ : Type} : State' σ σ := fun s => (s, s)

def set'' {σ : Type} (newState : σ) : State' σ Unit := fun _ => ((), newState)

def modify'' {σ : Type} (f : σ → σ) : State' σ Unit := fun s => ((), f s)

def runState' {σ α : Type} (init : σ) (m : State' σ α) : α × σ :=
  m init

def counter' : State' Nat Nat := fun n => (n, n + 1)

#eval runState' 0 counter'       -- (0, 1)
#eval runState' 10 counter'      -- (10, 11)

StateM in Practice

Lean provides StateM, a production-ready state monad. The operations get, set, and modify do exactly what their names suggest. Combined with do notation, stateful code looks almost identical to imperative code, except that the state is explicit in the type and the purity is preserved. You can run the same computation with different initial states and get reproducible results. You can reason about what the code does without worrying about hidden mutation elsewhere.

def tick : StateM Nat Unit := modify (· + 1)

def getTicks : StateM Nat Nat := get

def countOperations : StateM Nat Nat := do
  tick
  tick
  tick
  let count ← getTicks
  return count

#eval countOperations.run 0    -- (3, 3)
#eval countOperations.run 10   -- (13, 13)
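The prose mentions set alongside get and modify; for completeness, here is a quick sketch (resetAndCount is an illustrative name, building on tick and getTicks above):

-- set replaces the state outright
def resetAndCount : StateM Nat Nat := do
  set 100
  tick
  getTicks

#eval resetAndCount.run 0  -- (101, 101)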

The List Monad

Lists as a monad represent nondeterministic computation: a value that could be many things at once. Bind explores all combinations, like nested loops but without the nesting. This is how you generate permutations, enumerate possibilities, or implement backtracking search. The abstraction is the same; only the interpretation differs. A monad does not care whether its context is failure, state, or multiplicity. It only knows how to sequence.

def pairs (xs : List Nat) (ys : List Nat) : List (Nat × Nat) :=
  xs.flatMap fun x => ys.map fun y => (x, y)

#eval pairs [1, 2] [10, 20]  -- [(1, 10), (1, 20), (2, 10), (2, 20)]

def pythagTriples (n : Nat) : List (Nat × Nat × Nat) :=
  (List.range n).flatMap fun a =>
  (List.range n).flatMap fun b =>
  (List.range n).filterMap fun c =>
    if a * a + b * b == c * c && a > 0 && b > 0 then
      some (a, b, c)
    else
      none

#eval pythagTriples 15  -- [(3, 4, 5), (4, 3, 5), (5, 12, 13), ...]

The Monad Type Class

Under the hood, all monads implement the same interface. pure lifts a plain value into the monadic context. bind sequences two computations, passing the result of the first to the second. That is the entire interface. Everything else, the do notation, the specialized operations, the ergonomic helpers, builds on these two primitives. The simplicity is deliberate. A minimal interface means maximal generality.

class Monad' (M : Type → Type) extends Functor M where
  pure' : {α : Type} → α → M α
  bind' : {α β : Type} → M α → (α → M β) → M β

instance : Monad' Option where
  map f
    | none => none
    | some x => some (f x)
  pure' := some
  bind' m f := match m with
    | none => none
    | some x => f x

instance : Monad' List where
  map := List.map
  pure' x := [x]
  bind' m f := m.flatMap f

Monad Laws

Here is where the algebra becomes essential. Monads must satisfy three laws: left identity, right identity, and associativity. These are not suggestions. They are the contract that makes generic programming possible.

In the traditional bind/return formulation:

| Law | Lean | Math |
|---|---|---|
| Left Identity | pure a >>= f = f a | \(\eta(a) \star f = f(a)\) |
| Right Identity | m >>= pure = m | \(m \star \eta = m\) |
| Associativity | (m >>= f) >>= g = m >>= (λ x => f x >>= g) | \((m \star f) \star g = m \star (\lambda x.\ f(x) \star g)\) |

The same laws look cleaner in the Kleisli category, where we compose monadic functions directly. If \(f : A \to M B\) and \(g : B \to M C\), their Kleisli composition is \(g \circ f : A \to M C\):

| Law | Lean | Math |
|---|---|---|
| Left Identity | pure >=> f = f | \(f \circ \eta = f\) |
| Right Identity | f >=> pure = f | \(\eta \circ f = f\) |
| Associativity | (f >=> g) >=> h = f >=> (g >=> h) | \((h \circ g) \circ f = h \circ (g \circ f)\) |

The Kleisli formulation reveals that monads give you a category where objects are types and morphisms are functions \(A \to M B\). The laws say pure is the identity morphism and >=> is associative composition. A monad is a way of embedding effectful computation into the compositional structure of functions.
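To see the Kleisli view in code, here is a small sketch using the >=> combinator; halve and pred' are illustrative helpers, not part of the series sources:

-- Two fallible steps composed as Kleisli arrows
def halve (n : Nat) : Option Nat :=
  if n % 2 == 0 then some (n / 2) else none

def pred' (n : Nat) : Option Nat :=
  if n > 0 then some (n - 1) else none

#eval (halve >=> pred') 10  -- some 4
#eval (halve >=> pred') 7   -- none

-- Left identity in action: composing with pure changes nothing
#eval ((pure >=> halve) : Nat → Option Nat) 10  -- some 5, same as halve 10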

-- Left identity: pure a >>= f  =  f a
example (f : Nat → Option Nat) (a : Nat) :
    (pure a >>= f) = f a := rfl

-- Right identity: m >>= pure  =  m
example (m : Option Nat) : (m >>= pure) = m := by
  cases m <;> rfl

-- Associativity: (m >>= f) >>= g  =  m >>= (fun x => f x >>= g)
example (m : Option Nat) (f : Nat → Option Nat) (g : Nat → Option Nat) :
    ((m >>= f) >>= g) = (m >>= fun x => f x >>= g) := by
  cases m <;> rfl

Tip

In most languages, monad laws are documentation or tests. In Lean, they are theorems. You can state them as propositions and prove them. This is the Curry-Howard correspondence at work: the law “left identity” becomes a type, and proving it means constructing a value of that type. The Proofs article shows how.

At this point someone usually asks what a monad “really is.” The answers have become a genre: a burrito, a spacesuit, a programmable semicolon, a monoid in the category of endofunctors. These metaphors are not wrong, but they are not enlightening either. A monad is the three laws above and nothing else. Everything follows from the laws. The metaphors are for people who want to feel like they understand before they do the work of understanding.

If you want the category theory (and category theorists will tell you this is all “abstract nonsense,” which is their term of endearment for their own field): a monad is a monoid object in the monoidal category of endofunctors under composition. Equivalently, it is a lax 2-functor from the terminal 2-category to Cat. A category is just a monoid in the endomorphism hom-category of the bicategory of spans. The Kleisli category is the free algebra of the monad. some is just the identity morphism in the Kleisli category of Option. In Haskell they called it Just, which is just an endomorphism. Are you enlightened yet? No? Good. Learn the laws.

Early Return

Do notation supports early return, loops, and mutable references, all the imperative conveniences. Combined with monads, this gives you the syntax of imperative programming with the semantics of pure functions. You can write code that reads like Python and reasons like Haskell. This is not cheating. It is the whole point: capturing effects in types so that the compiler knows what your code might do, while letting you write in whatever style is clearest.

def findFirst {α : Type} (p : α → Bool) (xs : List α) : Option α := do
  for x in xs do
    if p x then return x
  none

#eval findFirst (· > 5) [1, 2, 3, 7, 4]  -- some 7
#eval findFirst (· > 10) [1, 2, 3]       -- none

def processUntilError (xs : List Nat) : Except String (List Nat) := do
  let mut results := []
  for x in xs do
    if x == 0 then
      throw "encountered zero"
    results := results ++ [x * 2]
  return results

#eval processUntilError [1, 2, 3]     -- Except.ok [2, 4, 6]
#eval processUntilError [1, 0, 3]     -- Except.error "encountered zero"

Combining Monadic Operations

Functions like mapM and filterM combine monadic operations over collections. Map a fallible function over a list and either get all the results or the first failure. Filter a list with a predicate that consults external state. These combinators emerge naturally once you have the abstraction. They are not special cases but instances of a general pattern, composable because they respect the monad laws.

def mayFail (x : Nat) : Option Nat :=
  if x == 0 then none else some (100 / x)

def processAll (xs : List Nat) : Option (List Nat) :=
  xs.mapM mayFail

#eval processAll [1, 2, 4, 5]   -- some [100, 50, 25, 20]
#eval processAll [1, 0, 4, 5]   -- none

def filterValid (xs : List Nat) : List Nat :=
  xs.filterMap mayFail

#eval filterValid [1, 0, 2, 0, 4]  -- [100, 50, 25]
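The filterM mentioned above is worth seeing once: the predicate itself runs in the monad, so a single failing check fails the whole filter. A sketch with an illustrative keepSmall predicate:

-- filterM: a monadic predicate; any failure aborts the whole filter
def keepSmall (x : Nat) : Option Bool :=
  if x > 100 then none else some (x % 2 == 0)

#eval [2, 5, 8].filterM keepSmall    -- some [2, 8]
#eval [2, 500, 8].filterM keepSmall  -- none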

The Larger Pattern

Monads are one algebraic structure among many. Functors capture mapping. Applicatives capture independent combination. Monads capture dependent sequencing. Comonads capture context extraction. Arrows generalize computation graphs. Algebraic effects decompose monads into composable pieces. Each abstraction comes with laws, and those laws are the actual content. The specific names matter less than the discipline: identify a pattern, find its laws, and build an interface that enforces them.

The trajectory of programming language research has been toward making this structure explicit. Effects that C programmers handle with conventions, functional programmers handle with types. Invariants that documentation describes, dependent types enforce. Properties that tests sample, proofs establish. Each step reduces the burden on human memory and attention, encoding knowledge in artifacts that machines can check.

This matters because the economics of software are changing. When code is cheap to generate, correctness becomes the bottleneck. A language model can produce plausible implementations faster than any human, but “plausible” is not “correct.” The leverage shifts to whoever can specify precisely what correct means. Types, laws, contracts, proofs: these are the tools for specifying. Monads are a small example, one worked case of a pattern made precise. The concept itself was always simple. Sequencing with context. The value was never in the mystery but in the laws that let us reason compositionally about programs we did not write and cannot fully read. (For more on where this is heading, see Artificial Intelligence.)

IO and Concurrency

Programs must touch the world. They read files, open sockets, query databases, render pixels, and occasionally launch missiles. Pure functions, by definition, cannot do any of this. They map inputs to outputs, indifferent to the universe outside their arguments. And yet we write programs precisely to change that universe, to impose our will on entropy, to build something that persists after the process exits. This is the fundamental tension of typed functional programming, and IO is how we resolve it.

The insight is to describe impure computations as pure values. An IO String is not a string obtained through side effects; it is a pure, inert description of a computation that, when executed by the runtime, will interact with the world and produce a string. The description itself is referentially transparent. You can pass it around, store it in data structures, combine it with other descriptions, all without anything actually happening. Only when the runtime interprets the description do effects occur. The effect is separated from the specification of the effect, and that separation is everything.
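
A minimal sketch of that separation, with illustrative names shout and twice: defining the action does nothing, passing it around does nothing, and the effect happens only when the composed description is finally run.

-- `shout` is a value describing an effect, not the effect itself.
def shout : IO Unit := IO.println "effect!"

-- `twice` combines two copies of a description into a larger description.
def twice (action : IO Unit) : IO Unit := do
  action
  action

#eval twice shout  -- prints "effect!" two times, only when run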

When effects are explicit in types, the signature readFile : FilePath → IO String tells you something important: this function touches the filesystem. Compare this to a language where any function might secretly perform IO, where you cannot know without reading the implementation whether getUserName() is a pure lookup or a network call. The type becomes documentation that cannot lie. More than that, it becomes a constraint that tools can check. A function typed as pure cannot sneak in a database write. The compiler enforces the separation.

Effects come in varieties. Observable effects change the world in ways the program can later detect: writing a file, then reading it back. Unobservable effects change the world but leave no trace the program can see: logging to stderr, incrementing a metrics counter, writing to /dev/null. Then there are effects with international consequences: the missile launch, the funds transfer, the irreversible deletion. Code that launches missiles should look different from code that formats strings. When effects are tracked in types, it does. The engineer reading the signature knows what they are dealing with. The engineer writing the code must confront, at every function boundary, whether this operation belongs in IO or whether it can remain pure. A bear foraging in the forest leaves tracks; a program touching the world should leave type signatures. This is craftsmanship: knowing the consequences of your logic and designing systems where those consequences are visible, reviewable, and constrained.

This matters for the same reason everything in this series matters: we are building systems too complex to hold in our heads, increasingly written by machines that have no heads at all. Explicit effect tracking is not academic purity worship. It is engineering discipline for an era when the codebase is larger than any human can read and the authors include entities that optimize for plausibility rather than correctness. The type signature is the contract. The contract is the only thing we can trust.

IO Basics

The IO monad represents computations that can perform side effects. All effectful operations live inside IO, and you sequence them using do notation. The “Hello, World!” of functional programming is not a string; it is IO Unit, a description of an action that, when run, will emit bytes to a file descriptor and return nothing interesting. The Unit return type is honest about its triviality: we did not call this function for its result but for its consequences.

def greet : IO Unit := do
  IO.println "What is your name?"
  let stdin ← IO.getStdin
  let name ← stdin.getLine
  IO.println s!"Hello, {name.trim}!"

def printNumbers : IO Unit := do
  for i in [1, 2, 3, 4, 5] do
    IO.println s!"Number: {i}"

def getCurrentTime : IO Unit := do
  let now ← IO.monoMsNow
  IO.println s!"Milliseconds since start: {now}"

Pure Computations in IO

Not everything in IO needs to be effectful. Pure computations can be lifted into IO when you need to mix them with effects. The pure function wraps a value in an IO that does nothing but return that value. This is not contamination; it is embedding. A pure value inside IO is like a diplomat with immunity: present in foreign territory but untouched by local laws. You can always extract the pure part because it never actually did anything.

def pureComputation : IO Nat := do
  let x := 10
  let y := 20
  return x + y

#eval pureComputation  -- 30

def combineIO : IO String := do
  let a ← pure "Hello"
  let b ← pure "World"
  return s!"{a} {b}"

#eval combineIO  -- "Hello World"

File Operations

File operations live in the IO.FS namespace: read, write, append, delete, the usual POSIX inheritance. Files are where programs meet persistence, where the ephemeral computation leaves a mark that outlasts the process. The filesystem is the boundary between the volatile and the durable, between the RAM that vanishes when power fails and the spinning rust or flash cells that remember. Every configuration file, every log, every database is ultimately bytes on a filesystem, and the operations here are how you reach them.

def writeToFile (path : String) (content : String) : IO Unit := do
  IO.FS.writeFile path content

def readFromFile (path : String) : IO String := do
  IO.FS.readFile path

def appendToFile (path : String) (content : String) : IO Unit := do
  let handle ← IO.FS.Handle.mk path .append
  handle.putStrLn content

def copyFile (src dst : String) : IO Unit := do
  let content ← IO.FS.readFile src
  IO.FS.writeFile dst content

Processing Lines

Reading a file line by line is the fundamental pattern of Unix text processing, the reason awk and sed exist, the shape of a thousand shell pipelines. The newline character is civilization’s oldest API, a contract that dates to the teletype and has outlived every attempt to replace it. In Lean, you get the same line-oriented processing with type safety and without the quoting nightmares that have spawned more security vulnerabilities than any other single source.

def processLines (path : String) : IO (List String) := do
  let content ← IO.FS.readFile path
  return content.splitOn "\n"

def countLines (path : String) : IO Nat := do
  let lines ← processLines path
  return lines.length

def filterLines' (path : String) (pred : String → Bool) : IO (List String) := do
  let lines ← processLines path
  return lines.filter pred

Error Handling

IO operations can fail. Files go missing, networks drop packets, disks fill up, permissions change, the NFS mount goes stale, the USB drive gets yanked mid-write. The universe is hostile to running programs, and error handling is how we cope with entropy’s preference for chaos. Like a bear preparing for winter, a robust program anticipates scarcity: the resource that was there yesterday may not be there tomorrow. Lean provides try-catch blocks that should feel familiar to anyone who has written Java, Python, or JavaScript, except that here the error handling is explicit in the type. An IO action either succeeds or throws, and try/catch is how you intercept the failure before it propagates to somewhere you cannot recover.

def safeDivideIO (x y : Nat) : IO Nat := do
  if y == 0 then
    throw <| IO.userError "Division by zero"
  return x / y

def trySafeDivide : IO Unit := do
  try
    let result ← safeDivideIO 10 0
    IO.println s!"Result: {result}"
  catch e =>
    IO.println s!"Error: {e}"

def withDefault' {α : Type} (action : IO α) (default : α) : IO α := do
  try
    action
  catch _ =>
    return default

Mutable References

Sometimes you need a mutable cell, a box you can read from and write to as the computation proceeds. IO.Ref provides this: a reference that lives in IO, supporting get, set, and modify. This is controlled mutation, explicit in the type system, not the invisible aliasing that makes imperative code a maze of spooky action at a distance. The reference is a value. You can see who has access to it. You can trace where it flows. The mutation is real, but the accounting is honest.

def counterExample : IO Nat := do
  let counter ← IO.mkRef 0
  for _ in List.range 10 do
    counter.modify (· + 1)
  counter.get

#eval counterExample  -- 10

def accumulate (values : List Nat) : IO Nat := do
  let sum ← IO.mkRef 0
  for v in values do
    sum.modify (· + v)
  sum.get

#eval accumulate [1, 2, 3, 4, 5]  -- 15

Monad Transformers: ExceptT

Real programs layer effects. You want IO for the filesystem, but you also want typed errors that propagate cleanly. Monad transformers stack these capabilities like geological strata, each layer adding something to the computation below. ExceptT wraps another monad and adds the ability to short-circuit with a typed error. The result is a computation that can both perform IO and fail with a specific error type, the signature making both capabilities explicit. The layering is architectural, not just convenient. The type tells you exactly what can go wrong and what side effects might occur.

abbrev AppM := ExceptT String IO

def validatePositive (n : Int) : AppM Int := do
  if n <= 0 then throw "Must be positive"
  return n

def validateRange (n : Int) (lo hi : Int) : AppM Int := do
  if n < lo || n > hi then throw s!"Must be between {lo} and {hi}"
  return n

def processNumber : AppM Int := do
  let n ← validatePositive 42
  let m ← validateRange n 0 100
  return m * 2

def runApp : IO Unit := do
  match ← processNumber.run with
  | .ok result => IO.println s!"Success: {result}"
  | .error msg => IO.println s!"Failed: {msg}"

Monad Transformers: ReaderT

ReaderT provides access to a read-only environment, an implicit parameter that threads through your computation without cluttering every function signature. This is the dependency injection pattern done right: instead of global variables that any code can mutate, you have a typed environment that flows lexically through your program. Configuration, database connections, logger handles, the parameters that everything needs but nothing should own. The environment is read-only, which means you can reason about it. No function can secretly change the configuration and break something downstream.

structure Config where
  verbose : Bool
  maxRetries : Nat
  timeout : Nat
  deriving Repr

abbrev ConfigM := ReaderT Config IO

def getVerbose : ConfigM Bool := do
  let cfg ← read
  return cfg.verbose

def logIfVerbose (msg : String) : ConfigM Unit := do
  if ← getVerbose then
    IO.println s!"[LOG] {msg}"

def runWithConfig : IO Unit := do
  let config : Config := { verbose := true, maxRetries := 3, timeout := 5000 }
  (logIfVerbose "Starting process").run config

Running External Processes

Lean can spawn external processes and capture their output, bridging the gap between the typed world inside your program and the untyped chaos of shell commands. This is where you call out to git, invoke compilers, run linters, orchestrate builds. The interface is necessarily stringly-typed at the boundary, but you can wrap it in whatever validation your domain requires. Just remember that every subprocess is a portal to undefined behavior: the external program can do anything, and your types cannot save you from what happens on the other side.

def runCommand (cmd : String) (args : List String) : IO String := do
  let output ← IO.Process.output {
    cmd := cmd
    args := args.toArray
  }
  return output.stdout

def shellExample : IO Unit := do
  let result ← runCommand "echo" ["Hello", "World"]
  IO.println result

Environment Variables

Environment variables are the original configuration mechanism, a key-value store that predates databases by decades. They are inherited from parent processes, invisible in your source code, and different on every machine. This is both their power and their peril. PATH, HOME, USER, the secret sauce that makes your program behave differently in development and production. Access them through IO because reading them is an effect: the same code will behave differently depending on what the shell exported before launch.

def getEnvVar (name : String) : IO (Option String) := do
  IO.getEnv name

def printPath : IO Unit := do
  match ← getEnvVar "PATH" with
  | some path => IO.println s!"PATH: {path}"
  | none => IO.println "PATH not set"

def getCwd' : IO System.FilePath := do
  IO.currentDir

The Discipline of Effects

The history of programming is a history of discovering what we should not have been allowed to do. Unrestricted goto gave us spaghetti code; we constrained control flow with structured programming. Unrestricted memory access gave us buffer overflows; we constrained pointers with ownership and garbage collection. Unrestricted side effects gave us programs impossible to test, impossible to parallelize, impossible to reason about. We are still learning to constrain effects.

Every constraint is a gift. When the type system refuses to let you perform IO in a pure function, it is saving you from bugs that would surface only in production, only under load, only on the third Tuesday of months with an R in them. The constraint is leverage.

The IO monad is a template for how to think about the boundary between your program and the world. The world is messy, stateful, concurrent, and hostile. Your program can be clean, pure, sequential, and predictable. The boundary between them should be narrow, explicit, and scrutinized. That is engineering discipline for systems that must work when no one is watching.

Proofs

You have written functions. You have defined types. You have pattern matched, recursed, and composed. But you have not yet proved anything.

The difference matters. When you write a function, the compiler checks that types align, but it does not verify that your code does what you claim. You say this function sorts? The compiler shrugs. In theorem proving, you make claims and then justify them. The compiler verifies that your justification actually establishes the claim. You cannot bluff your way through a proof.

A bear learns to fish by watching the stream, understanding where salmon pause, developing patience for the moment when motion becomes certainty. Proving is similar. You learn to read the goal state, understand where progress stalls, develop patience for the tactic that transforms confusion into clarity.

Notation

Before we write our first proof, we need a shared language. The notation below bridges three worlds: the mathematical symbols you find in logic textbooks, the inference rules used in programming language theory (as in Pierce’s Types and Programming Languages and Harper’s Practical Foundations for Programming Languages), and the Lean syntax you will write. Learning to read all three simultaneously is the key to fluency.

| Symbol | Name | Meaning |
|---|---|---|
| \(\vdash\) | turnstile | “proves” or “entails” |
| \(\Gamma\) | Gamma | the context (hypotheses we can use) |
| \(\to\) | arrow | implication or function type |
| \(\forall\) | for all | universal quantification |
| \(\exists\) | exists | existential quantification |
| \(\land\) | and | conjunction |
| \(\lor\) | or | disjunction |
| \(\top\) | top | truth (trivially provable) |
| \(\bot\) | bottom | falsehood (unprovable) |

A judgment \(\Gamma \vdash P\) reads “from context \(\Gamma\), we can prove \(P\).” An inference rule shows how to derive new judgments from existing ones:

\[ \frac{\Gamma \vdash P \quad \Gamma \vdash Q}{\Gamma \vdash P \land Q} \text{(∧-intro)} \]

This rule says: if you can prove \(P\) and you can prove \(Q\), then you can prove \(P \land Q\). The premises sit above the line; the conclusion below. The name on the right identifies the rule. Every tactic you learn corresponds to one or more such rules. The tactic is the mechanism; the rule is the justification.
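
Here is that rule in action as a tiny tactic proof: constructor splits the goal P ∧ Q into the two premises above the line, and each premise is closed by a hypothesis.

-- ∧-intro as a tactic: one goal becomes two, each discharged by a hypothesis.
example (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q := by
  constructor
  · exact hp
  · exact hq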

Tactics as Proof-State Transformers

You may have repressed the trauma of high school algebra, but the core idea was sound: you start with \(2x + 5 = 11\) and apply operations until you reach \(x = 3\). Subtract 5, divide by 2, each step transforming the equation into something simpler. The tedium was doing it by hand, error-prone and joyless. But the method itself, symbolic manipulation through mechanical transformation, turns out to be extraordinarily powerful when the machine handles the bookkeeping.

Tactics work the same way. You start with a goal (what you want to prove) and a context (what you already know). Each tactic transforms the goal into simpler subgoals. You keep applying tactics until no goals remain. The proof is the sequence of transformations, not a single flash of insight.

Think of it as a game. Your current position is the proof state: the facts you hold and the destination you seek. Each tactic is a legal move that changes your position. Some moves split one goal into two (like constructor creating two subgoals). Some moves close a goal entirely (like rfl finishing with a checkmate). You win when the board is empty.

Formally, a proof state is a judgment \(\Gamma \vdash G\): context \(\Gamma\), goal \(G\). A tactic transforms one proof state into zero or more new proof states. When no goals remain, the proof is complete. This table is your Rosetta Stone:

| Tactic | Before | After | Rule |
|---|---|---|---|
| intro h | \(\Gamma \vdash P \to Q\) | \(\Gamma, h:P \vdash Q\) | \(\to\)-intro |
| apply f | \(\Gamma \vdash Q\) | \(\Gamma \vdash P\) | \(\to\)-elim (given \(f : P \to Q\)) |
| exact h | \(\Gamma, h:P \vdash P\) | \(\square\) | assumption |
| rfl | \(\Gamma \vdash t = t\) | \(\square\) | refl |
| constructor | \(\Gamma \vdash P \land Q\) | \(\Gamma \vdash P\), \(\Gamma \vdash Q\) | \(\land\)-intro |
| left | \(\Gamma \vdash P \lor Q\) | \(\Gamma \vdash P\) | \(\lor\)-intro₁ |
| right | \(\Gamma \vdash P \lor Q\) | \(\Gamma \vdash Q\) | \(\lor\)-intro₂ |
| cases h | \(\Gamma, h:P \lor Q \vdash R\) | \(\Gamma, h:P \vdash R\), \(\Gamma, h:Q \vdash R\) | \(\lor\)-elim |
| induction n | \(\Gamma \vdash \forall n,\ P(n)\) | \(\Gamma \vdash P(0)\), \(\Gamma, ih:P(k) \vdash P(k{+}1)\) | Nat-ind |
| rw [h] | \(\Gamma, h:a=b \vdash P[a]\) | \(\Gamma, h:a=b \vdash P[b]\) | subst |
| simp | \(\Gamma \vdash G\) | \(\Gamma \vdash G'\) | rewrite* |
| contradiction | \(\Gamma, h:\bot \vdash P\) | \(\square\) | \(\bot\)-elim |

The symbol \(\square\) marks a completed goal. Multiple goals after “After” mean the tactic created subgoals. Read left to right: you have the state on the left, you apply the tactic, you must now prove everything on the right. This is the algebra of proof. Each tactic is a function from proof states to proof states, and a complete proof is a composition that maps your theorem to \(\square\).

If the table above looks like both logic and programming, that is not a coincidence.

Proving vs Programming

The surprising insight is that proving and programming are the same activity viewed differently. A proof is a program. A theorem is a type. When you prove \(P \to Q\), you are writing a function that transforms evidence for \(P\) into evidence for \(Q\). This correspondence, the Curry-Howard isomorphism, means that logic and computation are two views of the same underlying structure:

| Logic | Programming |
|---|---|
| proposition | type |
| proof | program |
| \(P \to Q\) | function from P to Q |
| \(P \land Q\) | pair (P, Q) |
| \(P \lor Q\) | either P or Q |
| \(\top\) | unit type |
| \(\bot\) | empty type |

Every function you have written so far was secretly a proof. Every proof you write from now on is secretly a program. Two cultures, mathematicians and programmers, spoke the same language for decades without knowing it.
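
To see the correspondence in code, here is a small illustrative pair (swapPair and conj_swap are illustrative names, not from the series repository): the same shape of term is a program that swaps the components of a pair and a proof that conjunction commutes.

-- As a program: swap the components of a pair.
def swapPair {α β : Type} (p : α × β) : β × α := (p.2, p.1)

-- As a proof: swap the conjuncts. Same term shape, different universe.
theorem conj_swap (P Q : Prop) (h : P ∧ Q) : Q ∧ P := ⟨h.2, h.1⟩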

Your First Proof

Let us prove something undeniably true: one plus one equals two.

-- Your very first proof: 1 + 1 = 2
theorem one_plus_one : 1 + 1 = 2 := by
  rfl

-- Without tactics, just a direct proof term
theorem one_plus_one' : 1 + 1 = 2 := rfl

Whitehead and Russell famously required 362 pages of Principia Mathematica before reaching this result. We have done it in three characters. This is not because we are cleverer than Russell; it is because we inherited infrastructure. The Principia was an attempt to place all of mathematics on rigorous foundations, to banish the intuition and hand-waving that had allowed paradoxes to creep into set theory. It was a heroic, doomed effort: the notation was unreadable, the proofs were uncheckable by any human in finite time, and Gödel would soon prove that the program could never fully succeed. But the ambition was right. The ambition was to make mathematics a science of proof rather than a craft of persuasion.

A century later, the ambition survives in different form. We do not write proofs in Russell’s notation; we write them in languages that machines can check. The 362 pages compress to three characters not because the mathematics got simpler but because the verification got automated. What mathematicians have been writing all along was pseudocode: informal instructions meant for human execution, full of implicit steps and assumed context, correct only if the reader filled in the gaps charitably. We are finally compiling that pseudocode.

The keyword by enters tactic mode. Instead of writing a proof term directly, you give commands that build the proof incrementally. The tactic rfl (reflexivity) says “both sides of this equation compute to the same value, so they are equal.” Lean evaluates 1 + 1, gets 2, sees that 2 = 2, and accepts the proof. No faith required. No appeals to authority. The machine checked, and the machine does not lie.

Or does it? Ken Thompson’s Reflections on Trusting Trust demonstrated that a compiler can be trojaned to insert backdoors into code it compiles, including into future versions of itself. Turtles all the way down. At some point you trust the hardware, the firmware, the operating system, the compiler that compiled your proof assistant. We choose to stop the regress somewhere, not because the regress ends but because we must act in the world despite uncertainty. This is the stoic’s bargain: do the work carefully, verify what can be verified, and accept that perfection is not on offer. The alternative is paralysis, and paralysis builds nothing.

The Goal State

When you write proofs in Lean, the editor shows you the current goal state. This is your map, your honest accounting of where you stand. Unlike tests that can pass while bugs lurk, unlike documentation that drifts from reality, the goal state cannot lie. It tells you exactly what you have (hypotheses) and exactly what you need to prove (the goal). The gap between aspiration and achievement is always visible.

-- Demonstrating how the goal state changes
theorem add_zero (n : Nat) : n + 0 = n := by
  rfl

theorem zero_add (n : Nat) : 0 + n = n := by
  simp

When you place your cursor after by in add_zero, you see:

n : Nat
⊢ n + 0 = n

The line n : Nat is your context: the facts you know, the tools you have. The symbol ⊢ (the turnstile) separates what you have from what you need. The goal n + 0 = n is your obligation. After applying rfl, the goal disappears. No goals means the proof is complete. You have caught your fish.

Reflexivity: rfl

The rfl tactic proves goals of the form \(a = a\) where both sides are definitionally equal. In inference rule notation:

\[ \frac{}{\Gamma \vdash a = a} \text{(refl)} \]

No premises above the line means the rule is an axiom: equality is reflexive, always, unconditionally. “Definitionally equal” means Lean can compute both sides to the same value without any lemmas. This is equality by computation, the most basic form of truth: run the program on both sides and see if you get the same answer.

-- rfl works when both sides compute to the same value
theorem two_times_three : 2 * 3 = 6 := by rfl

theorem list_length : [1, 2, 3].length = 3 := by rfl

theorem string_append : "hello " ++ "world" = "hello world" := by rfl

theorem bool_and : true && false = false := by rfl

def double (n : Nat) : Nat := n + n

theorem double_two : double 2 = 4 := by rfl

When rfl works, it means the equality is “obvious” to Lean’s computation engine. When it fails, you need other tactics to transform the goal into something rfl can handle.

Triviality: trivial

The trivial tactic handles goals that are straightforwardly true. It combines several simple tactics and works well for basic logical facts.

theorem true_is_true : True := by
  trivial

theorem one_le_two : 1 ≤ 2 := by
  trivial

theorem and_true : True ∧ True := by
  trivial

Simplification: simp

The simp tactic is your workhorse. It applies a database of hundreds of rewrite rules, accumulated over years by the mathlib community, to simplify the goal. This is collective knowledge made executable: every time someone proved that x + 0 = x or list.reverse.reverse = list, they added to the arsenal that simp deploys on your behalf.

theorem add_zero_simp (n : Nat) : n + 0 = n := by
  simp

theorem zero_add_simp (n : Nat) : 0 + n = n := by
  simp

theorem silly_arithmetic : (1 + 0) * (2 + 0) + 0 = 2 := by
  simp

theorem list_append_nil {α : Type*} (xs : List α) : xs ++ [] = xs := by
  simp

theorem use_hypothesis (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
  simp [h]

When simp alone does not suffice, you can give it additional lemmas: simp [lemma1, lemma2]. You can also tell it to use hypotheses from your context: simp [h].

Tip

When stuck, try simp first. It solves a surprising number of goals. If it does not solve the goal completely, look at what remains.

Introducing Assumptions: intro

When your goal is an implication \(P \to Q\), you assume \(P\) and prove \(Q\). This is the introduction rule for implication:

\[ \frac{\Gamma, P \vdash Q}{\Gamma \vdash P \to Q} \text{(→-intro)} \]

The comma means “extended context”: if you can prove \(Q\) assuming \(P\), then you can prove \(P \to Q\) outright. The intro tactic performs this transformation, moving the antecedent from goal to hypothesis.

-- intro: move assumptions from goal into context
theorem imp_self (P : Prop) : P → P := by
  intro hp
  exact hp

-- apply: use a lemma whose conclusion matches the goal
theorem imp_trans (P Q R : Prop) : (P → Q) → (Q → R) → P → R := by
  intro hpq hqr hp
  apply hqr
  apply hpq
  exact hp

-- intro with universal quantifiers
theorem forall_self (P : Nat → Prop) : (∀ n, P n) → (∀ n, P n) := by
  intro h n
  exact h n

After intro hp, the goal changes from P → P to just P, and you gain hypothesis hp : P. Multiple intro commands can be combined: intro h1 h2 h3 introduces three assumptions at once.

Applying Lemmas: apply and exact

The apply tactic uses the elimination rule for implication, also called modus ponens:

\[ \frac{\Gamma \vdash P \to Q \quad \Gamma \vdash P}{\Gamma \vdash Q} \text{(→-elim)} \]

If you have a proof of \(P \to Q\) and a proof of \(P\), you can derive \(Q\). When your goal is \(Q\) and you apply a hypothesis \(h : P \to Q\), Lean transforms your goal to \(P\). The exact tactic says “this term is exactly what we need” and closes the goal.

In the imp_trans proof, apply hqr transforms the goal from R to Q, because hqr : Q → R. Then apply hpq transforms Q to P. Finally exact hp provides the P we need. Each step is modus ponens, chained backward.
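
Here is the same chain as a standalone example, with the goal after each step recorded in a comment.

example (P Q R : Prop) (hpq : P → Q) (hqr : Q → R) (hp : P) : R := by
  apply hqr   -- goal becomes Q
  apply hpq   -- goal becomes P
  exact hp    -- no goals remain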

Intermediate Steps: have

Sometimes you want to prove a helper fact before using it. The have tactic introduces a new hypothesis with its own proof. This is how knowledge accumulates: you establish a stepping stone, name it, and build on it.

-- have: introduce intermediate results
theorem have_demo (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
  have step : a = b := h1
  rw [step, h2]

theorem sum_example (n : Nat) : n + n = 2 * n := by
  have h : 2 * n = n + n := by ring
  rw [h]

The pattern have name : type := proof adds name : type to your context.

Case Analysis: cases

When you have a value of an inductive type, cases splits the proof into one case per constructor. This is exhaustive reasoning: you consider every possible form the value could take, and you prove your claim holds in each. The compiler ensures you miss nothing. This is how careful decisions should be made: enumerate the possibilities, handle each one, leave no branch unexamined.

theorem bool_cases (b : Bool) : b = true ∨ b = false := by
  cases b with
  | true => left; rfl
  | false => right; rfl

theorem nat_zero_or_succ (n : Nat) : n = 0 ∨ n ≥ 1 := by
  cases n with
  | zero => left; rfl
  | succ m => right; simp

theorem option_destruct (o : Option Nat) : o = none ∨ ∃ n, o = some n := by
  cases o with
  | none => left; rfl
  | some n => right; exact ⟨n, rfl⟩

For Bool, there are two cases: true and false. For Nat, there are two cases: zero and succ m. For Option, there are none and some n.

Induction

For properties of natural numbers, mathematical induction is the fundamental principle:

\[ \frac{\Gamma \vdash P(0) \quad \Gamma, P(n) \vdash P(n+1)}{\Gamma \vdash \forall n,\ P(n)} \text{(Nat-ind)} \]

Prove the base case \(P(0)\). Then prove the inductive step: assuming \(P(n)\), show \(P(n+1)\). From these two finite proofs, you derive a statement about infinitely many numbers. The induction tactic generates both proof obligations automatically. The principle dates to Pascal and Fermat, but the mechanization is new.

theorem sum_twice (n : Nat) : n + n = 2 * n := by
  induction n with
  | zero => rfl
  | succ n ih => omega

theorem length_append {α : Type*} (xs ys : List α) :
    (xs ++ ys).length = xs.length + ys.length := by
  induction xs with
  | nil => simp
  | cons x xs ih => simp [ih]; ring

In the succ case, you get an induction hypothesis ih that assumes the property holds for n, and you must prove it holds for n + 1.

Arithmetic: omega

For goals involving linear arithmetic over natural numbers or integers, omega is powerful. It implements a decision procedure for Presburger arithmetic, a fragment of number theory that is provably decidable. Within its domain, omega does not search or guess; it decides.

theorem omega_simple (n : Nat) (h : n < 10) : n < 100 := by
  omega

theorem omega_transitive (a b c : Int) (h1 : a < b) (h2 : b < c) : a < c := by
  omega

theorem omega_sum (x y : Nat) (h : x + y = 10) : x ≤ 10 := by
  omega

If your goal involves only addition, subtraction, multiplication by constants, and comparisons, try omega.

Decision Procedures: decide

For decidable propositions, decide simply computes the answer. Is 7 less than 10? Run the comparison. Is this list empty? Check. Some questions have algorithms that answer them definitively, and decide invokes those algorithms. When it works, there is nothing to prove; the computation is the proof.

theorem three_lt_five : (3 : Nat) < 5 := by
  decide

theorem bool_compute : (true && false) = false := by
  decide

theorem list_membership : 3 ∈ [1, 2, 3, 4, 5] := by
  decide

theorem fin_in_bounds : (2 : Fin 5).val < 5 := by
  decide

Putting It Together

Real proofs combine multiple tactics. You introduce assumptions, simplify, split cases, apply lemmas, and close with computation. The art is knowing which tool fits which moment. With practice, patterns emerge: implications call for intro, equalities for rw or simp, inductive types for cases or induction. The goal state guides you.

theorem worked_example (n : Nat) : n + 0 = 0 + n := by
  simp

theorem worked_example2 (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
  rw [h]

theorem combined_proof (n : Nat) (h : n > 0) : n - 1 + 1 = n := by
  omega

theorem list_nonempty (xs : List Nat) (h : xs ≠ []) : xs.length > 0 := by
  cases xs with
  | nil => contradiction
  | cons x xs' => simp

The Tactics You Need

| Tactic | Purpose |
|---|---|
| rfl | Prove a = a when both sides compute to the same value |
| trivial | Prove obviously true goals |
| simp | Simplify using rewrite rules |
| intro | Assume hypotheses from implications and universals |
| apply | Use a lemma whose conclusion matches the goal |
| exact | Provide exactly the term needed |
| have | Introduce intermediate results |
| cases | Split on constructors of inductive types |
| induction | Prove by induction on recursive types |
| omega | Solve linear arithmetic |
| decide | Compute decidable propositions |
| rw | Rewrite using an equality |

These twelve tactics will carry you through most of what follows.

Exercises

The best way to learn tactics is to use them. These exercises progress from straightforward applications of single tactics to combinations that require reading the goal state carefully.

-- Exercise 1: Use rfl to prove this computation
theorem ex_rfl : 3 * 4 = 12 := by
  rfl

-- Exercise 2: Use simp to simplify this expression
theorem ex_simp (n : Nat) : n * 1 + 0 = n := by
  simp

-- Exercise 3: Use intro and exact
theorem ex_intro_exact (P Q : Prop) (h : P) (hpq : P → Q) : Q := by
  exact hpq h

-- Exercise 4: Use cases to prove this about booleans
theorem ex_bool_not_not (b : Bool) : !!b = b := by
  cases b <;> rfl

-- Exercise 5: Use induction to prove addition is commutative
theorem ex_add_comm (n m : Nat) : n + m = m + n := by
  induction n with
  | zero => simp
  | succ n ih => omega

-- Exercise 6: Use omega to prove this inequality
theorem ex_omega (x y : Nat) (h1 : x ≤ 5) (h2 : y ≤ 3) : x + y ≤ 8 := by
  omega

-- Exercise 7: Combine multiple tactics
theorem ex_combined (xs : List Nat) : ([] ++ xs).length = xs.length := by
  simp

-- Exercise 8: Prove implication transitivity
theorem ex_imp_chain (A B C D : Prop) : (A → B) → (B → C) → (C → D) → A → D := by
  intro hab hbc hcd ha
  exact hcd (hbc (hab ha))

-- Exercise 9: Use cases on a natural number
theorem ex_nat_lt (n : Nat) : n = 0 ∨ 0 < n := by
  cases n with
  | zero => left; rfl
  | succ m => right; omega

-- Exercise 10: Prove a property about list reversal
theorem ex_reverse_nil : ([] : List Nat).reverse = [] := by
  rfl

If you get stuck, ask yourself: what is the shape of my goal? What tactic handles that shape? What hypotheses do I have available? The Infoview is your guide.

What Comes Next

You can now prove things. The proofs have been simple, but the mental model is in place. You understand goals, hypotheses, and the tactic dance that connects them. Next we introduce type theory and dependent types, the language for stating claims worth proving.

Project: The Liar’s Trap

Try to prove something false:

-- Try to prove something false. Every tactic will fail.
theorem liar : 0 = 1 := by
  sorry  -- Try: rfl, simp, omega, decide. Nothing works.

-- The goal state shows: ⊢ 0 = 1
-- This goal is unprovable because it is false.

Every tactic fails. rfl cannot make 0 equal 1. simp finds nothing to simplify. omega knows arithmetic and refuses. decide computes the answer and it is false. The goal state sits there, immovable: ⊢ 0 = 1. You can stare at it, curse at it, try increasingly desperate combinations. Nothing works because nothing can work. The machine will not let you prove a falsehood.

This is the point. The compiler is not your collaborator; it is your adversary. It checks every step and rejects handwaving. When someone tells you their code is correct, you can ask: does it typecheck? When someone tells you their proof is valid, you can ask: did the machine accept it? The answers are not always the same, but when they are, you know something real.

Project: De Morgan’s Little Theorem

Augustus De Morgan formalized the laws that bear his name in the 1850s: the negation of a conjunction is the disjunction of negations, and vice versa. Every programmer knows these laws intuitively from boolean expressions. Let us prove one.

-- De Morgan's Law: ¬(P ∧ Q) → (¬P ∨ ¬Q)
theorem demorgan (P Q : Prop) (h : ¬(P ∧ Q)) : ¬P ∨ ¬Q := by
  by_cases hp : P
  · -- Case: P is true
    right
    intro hq
    apply h
    constructor
    · exact hp
    · exact hq
  · -- Case: P is false
    left
    exact hp

The proof proceeds by case analysis. We have h : ¬(P ∧ Q), a proof that P ∧ Q is false. We must show ¬P ∨ ¬Q. The by_cases tactic splits on whether P holds:

  • If P is true (call this hp), we go right and prove ¬Q. Why? If Q were true, then P ∧ Q would be true, contradicting h. So ¬Q.
  • If P is false (so the hypothesis hp is now a proof of ¬P), we go left and prove ¬P directly. We already have it: hp.

Each branch uses tactics from this article: intro, apply, exact, left, right, constructor. The contradiction in the first branch comes from applying h to the conjunction rebuilt with constructor. Read the proof slowly, watch the goal state at each step, and trace how the logical structure maps to the tactic sequence. This is the texture of real mathematics: case splits, contradictions, and the steady narrowing of possibilities until only truth remains.

De Morgan died in 1871. His laws persist in every boolean expression, every logic gate, every conditional branch. The proofs you write today will outlast you the same way. There is comfort in that. The work endures because the truths it captures are timeless. You are not just writing code that might be obsolete next year. You are building something permanent.

Type Theory

Humans classify. We sort animals into species, books into genres, people into roles, and programmers into those who have mass-assigned any to silence the compiler and those who are lying about it. Classification is how finite minds manage infinite variety. Types are classification for computation: every value belongs to a type, and the type determines what operations make sense. You can add numbers but not strings. You can take the length of a list but not the length of an integer. The type system enforces these distinctions before your program runs, which sounds obvious until you remember that most of the world’s software runs on languages where the type system’s considered opinion is “looks plausible” right up until production catches fire.

This seems pedestrian until you push it. What if types could say not just “this is a list” but “this is a list of exactly five elements”? What if they could say not just “this function returns an integer” but “this function returns a positive integer”? What if the type of a function could express its complete specification, so that any implementation with that type is correct by construction?

Dependent type theory answers yes to all of these. It is the most expressive type system in common use, and it blurs the line between programming and mathematics. A type becomes a proposition. A program becomes a proof. The compiler becomes a theorem checker. This is not metaphor; it is the Curry-Howard correspondence that we met in the previous article, now unleashed to its full power.

The Ladder of Expressiveness

Type systems form a ladder. Each rung lets you say more.

Simple types (C, Java): Values have types. int, string, bool. You cannot add a string to an integer. This catches typos and category errors, but nothing deeper.

Polymorphic types (Haskell, OCaml): Types can be parameterized. List α works for any element type. You write one reverse function that works on lists of integers, strings, or custom objects. The type ∀ α. List α → List α says “for any type α, give me a list of α and I’ll return a list of α.”

Dependent types (Lean, Coq, Agda): Types can depend on values. Vector α n is a list of exactly n elements. The number n is a value that appears in the type. Now the type system can express array bounds, matrix dimensions, protocol states, and any property you can state precisely.

The jump from polymorphic to dependent types is where things get interesting. Consider matrix multiplication. Two matrices can only be multiplied if the columns of the first equal the rows of the second. With dependent types:

Matrix m n → Matrix n p → Matrix m p

The shared n enforces compatibility at compile time. Multiply a 3×4 by a 5×2? Type error. The bug is caught before any code runs. Your linear algebra homework now has compile errors, which is somehow both better and worse.
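
As a minimal sketch of the idea (an illustrative Mat built from functions, not Mathlib’s Matrix, with a hand-rolled sumFin): the shared inner dimension appears in both argument types of mul, so mismatched shapes are rejected before anything runs.

-- A matrix as a function from indices to entries; the dimensions are types.
abbrev Mat (m n : Nat) : Type := Fin m → Fin n → Int

-- Sum f 0 + f 1 + ... + f (n - 1), by recursion on n.
def sumFin : (n : Nat) → (Fin n → Int) → Int
  | 0,     _ => 0
  | k + 1, f => f 0 + sumFin k (fun i => f i.succ)

-- The inner dimension n is shared between both argument types.
def Mat.mul {m n p : Nat} (A : Mat m n) (B : Mat n p) : Mat m p :=
  fun i j => sumFin n (fun k => A i k * B k j)

def ones (m n : Nat) : Mat m n := fun _ _ => 1

#eval Mat.mul (ones 2 3) (ones 3 2) 0 0   -- 3
-- Mat.mul (ones 2 3) (ones 2 3) would not elaborate: the dimensions disagree.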

Types as First-Class Citizens

In simple type systems, types and values live in separate worlds. You cannot write a function that takes a type as an argument or returns a type as a result. The wall between them is absolute.

Dependent types tear down this wall. Types become values. You can compute with them, pass them to functions, store them in data structures. The function that constructs Vector Int n takes a number n and returns a type. This uniformity is what makes the whole system work: if types can depend on values, then types must be values.
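
A minimal sketch of a type-returning function, with an illustrative NTuple: an ordinary definition that takes a number and computes a type by recursion on it.

-- NTuple α n is the n-fold product of α, built by recursion on a value.
def NTuple (α : Type) : Nat → Type
  | 0     => Unit
  | n + 1 => α × NTuple α n

#reduce NTuple Nat 3        -- Nat × Nat × Nat × Unit

example : NTuple Nat 3 := (1, 2, 3, ())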

The theoretical foundations trace through the 20th century: Russell’s type distinctions to resolve paradoxes, Church’s simply typed lambda calculus, Martin-Löf’s intuitionistic type theory that unified logic and computation. Lean implements a refinement called the Calculus of Inductive Constructions, which adds inductive types and a hierarchy of universes to keep everything consistent.

The practical experience differs from conventional programming. Types become more informative but also more demanding. You must often provide proofs alongside your code, demonstrating that values satisfy required properties. The compiler becomes an adversary that checks your reasoning at every step, as we saw with tactics. When a program type-checks, you gain strong guarantees about its behavior. When it fails, the error messages guide you toward the gap between what you claimed and what you proved.

Universes and Russell’s Paradox

When you write universe u v w in Lean, you are declaring universe level variables. If that sentence meant nothing to you, good; it shouldn’t yet. But why do universes exist at all? The answer involves one of the most famous disasters in the history of mathematics, a polite letter that ended a career, and the dawning realization that self-reference is the serpent in every formal garden.

In 1902, Bertrand Russell sent a letter to Gottlob Frege, who had just completed his life’s work: a logical foundation for all of mathematics. Russell’s letter contained a single question. Consider the set R of all sets that do not contain themselves. Does R contain itself? If yes, then by definition it should not. If no, then by definition it should. Frege’s system was inconsistent. His life’s work collapsed. He wrote back: “Hardly anything more unfortunate can befall a scientific writer than to have one of the foundations of his edifice shaken after the work is finished.”

This is the danger of self-reference. A set that asks about its own membership. A sentence that asserts its own falsehood. A type that contains itself. These constructions look innocent but harbor contradictions. Mathematics needed walls to prevent them.

Type theory builds those walls through stratification. Types are organized into a hierarchy of universes. In Lean, Prop sits at Sort 0, Type 0 sits at Sort 1, Type 1 sits at Sort 2, and so on. A type at level n can only mention types at levels below n. The type Type 0 itself has type Type 1, not Type 0. This breaks the self-reference. You cannot ask whether Type contains itself because Type is not a single thing; it is an infinite ladder, and each rung can only see the rungs below.

universe u v w

-- Universe polymorphism (explicit universe level)
def polyIdentity (α : Sort u) (a : α) : α := a

-- Universe level expressions
def maxLevel (α : Type u) (β : Type v) : Type (max u v) := α × β

-- Type 0 contains types, Type 1 contains Type 0, etc.
example : Type 0 = Sort 1 := rfl
example : Prop = Sort 0 := rfl

-- Impredicative Prop: functions into Prop stay in Prop
def propPredicate (P : Type u → Prop) : Prop := ∀ α, P α

-- Predicative Type: function types take maximum universe
def typePredicate (P : Type u → Type v) : Type (max (u+1) v) :=
  ∀ α, P α

The declaration universe u v w introduces universe level variables. When you write def polyIdentity (α : Sort u) (a : α) : α := a, you are defining a function that works at any universe level. The Sort u includes both Prop (when u = 0) and Type n (when u = n + 1). Universe polymorphism lets you write single definitions that work across the entire hierarchy.

Predicativity and Impredicativity

Here is a rule that sounds obvious until you think about it: you cannot be in the photograph you are taking. The photographer stands outside the frame. A committee that selects its own members creates paradoxes of legitimacy. A definition that refers to a collection containing itself is suspect. This intuition, that the definer must stand apart from the defined, is called predicativity.

Imagine a monastery where knowledge is organized into concentric walls. Acolytes in the outer ring may study only texts from their own ring. Scholars who wish to reference the entire outer collection must do so from the second ring, looking inward. Those who would survey the second ring must stand in the third. And so on, each level permitted to see only what lies below. No scholar may cite a collection that includes their own work. The hierarchy prevents the serpent from eating its tail.

This is how predicative universes work. When you quantify over all types at level n, the resulting type lives at level n+1. The definition “for all types α in Type 0, the type α → α” must itself live in Type 1 because it speaks about the entirety of Type 0. You cannot make universal claims about a collection while remaining inside it. The quantification must ascend.

-- Predicative: quantifying over Type u produces Type (u+1)
-- The result must be "larger" than what we quantify over
def predicativeExample : Type 1 := ∀ (α : Type 0), α → α

-- Check the universe levels explicitly
#check (∀ (α : Type 0), α → α : Type 1)  -- Lives in Type 1
#check (∀ (α : Type 1), α → α : Type 2)  -- Lives in Type 2

-- Impredicative: quantifying over Prop still produces Prop
-- Prop can "talk about itself" without ascending universe levels
def impredicativeExample : Prop := ∀ (P : Prop), P → P

#check (∀ (P : Prop), P → P : Prop)  -- Still in Prop, not Type 0!

-- Why this matters: classical logic requires impredicativity
-- "For all propositions P, P or not P" is itself a proposition
def excludedMiddleType : Prop := ∀ (P : Prop), P ∨ ¬P

-- If Prop were predicative, this would be Type 1, breaking classical reasoning
-- Impredicativity lets us define propositions that quantify over all propositions

-- The danger: unrestricted impredicativity leads to paradox
-- Girard's paradox shows Type : Type is inconsistent
-- Lean avoids this by making only Prop impredicative

Lean’s Type hierarchy is predicative: ∀ (α : Type 0), α → α has type Type 1, not Type 0. This prevents Girard’s paradox, a type-theoretic ouroboros that arises when Type : Type. The infinite regress of universes is the price of consistency.

But Prop breaks the rule. Lean’s Prop is impredicative: ∀ (P : Prop), P → P has type Prop, staying at the same level despite quantifying over all propositions. The monastery has a secret inner sanctum where the old restrictions do not apply. How is this safe?

Proof irrelevance is the answer. In Prop, all proofs of the same proposition are equal. You cannot extract computational content from an impredicative definition over propositions because there is nothing to extract; all witnesses are indistinguishable. The dangerous circularity is defanged. The serpent may eat its tail here because the tail has no substance.

This matters for classical logic. The law of excluded middle, ∀ (P : Prop), P ∨ ¬P, quantifies over all propositions. If Prop were predicative, this would live in Type 0, making it a computational object rather than a logical axiom. The monks in the inner sanctum can do things forbidden to those outside because what happens in Prop stays in Prop. The walls are there to protect computation from paradox; proof lives by different rules.

-- PLift: lifts any type by exactly one level
def liftedFalse : Type := PLift False

-- ULift: lifts types by any amount
def liftedNat : Type u := ULift.{u} Nat

-- Lifting and unlifting values
def liftExample : ULift.{1} Nat := ⟨42⟩
example : liftExample.down = 42 := rfl

-- Non-cumulativity: types exist at exactly one level
def needsLifting (α : Type 1) : Type 2 := ULift.{2} α

Non-Cumulativity

In a cumulative type theory, every type at universe level n is automatically also a type at level n+1 and all higher levels. Coq and Idris work this way: if you have Nat : Type 0, you can use Nat anywhere a Type 1 is expected. The type “flows upward” through the hierarchy without explicit intervention. This makes polymorphic code more convenient since you rarely need to think about universe levels.

Lean takes the opposite approach. Each type lives at exactly one universe level. Nat has type Type 0 and only Type 0. If a function expects a Type 1 argument, you cannot pass Nat directly. You must explicitly lift it using ULift or PLift, wrapper types that move values to higher universes.

-- In Lean, Nat has type Type 0, not Type 1
#check (Nat : Type 0)    -- works
-- #check (Nat : Type 1) -- would fail: Nat is not in Type 1

-- A function expecting Type 1 cannot accept Nat directly
def wantsType1 (α : Type 1) : Type 1 := α

-- This fails: Nat lives in Type 0, not Type 1
-- def broken := wantsType1 Nat

-- Solution: explicitly lift Nat to Type 1
def works := wantsType1 (ULift Nat)

-- In a cumulative system (like Coq), this would work:
-- def coqStyle := wantsType1 Nat  -- Coq allows this

-- Practical example: polymorphic container at higher universe
def Container (α : Type 1) := List α

-- Cannot directly use Nat
-- def natContainer := Container Nat  -- fails

-- Must lift first
def natContainer := Container (ULift Nat)

-- Extracting values requires unlifting
def sumLifted (xs : List (ULift Nat)) : Nat :=
  xs.foldl (fun acc x => acc + x.down) 0

Note

This explicit lifting makes universe structure visible in your code. You always know exactly which universe level you are working at. The tradeoff is verbosity: code that would “just work” in Coq requires explicit lifts in Lean. In practice, most Lean code stays within Type 0 and Prop, so non-cumulativity rarely causes friction.

Proof Irrelevance

A bear catching a salmon does not care whether the fish swam upstream via the left channel or the right. Philosophers have spent centuries on equivalent questions about identity and equivalence, producing volumes that the bear would find unpersuasive. Proof irrelevance applies this principle to mathematics: any two proofs of the same proposition are equal. If you have two proofs p1 and p2 that both establish proposition P, then p1 = p2 holds definitionally. We care that the theorem is true, not which path led there. This is either profound or obvious depending on how much time you’ve spent arguing about equality, which is to say, depending on whether you’ve ever debugged a language with more than three equality operators.

-- Basic logical connectives
theorem and_intro (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q := ⟨hp, hq⟩

theorem or_elim (P Q R : Prop) (h : P ∨ Q) (hp : P → R) (hq : Q → R) : R :=
  h.elim hp hq

theorem iff_intro (P Q : Prop) (hpq : P → Q) (hqp : Q → P) : P ↔ Q :=
  ⟨hpq, hqp⟩

-- Proof irrelevance demonstration
theorem proof_irrel_demo (P : Prop) (p1 p2 : P) : p1 = p2 := rfl

-- Classical logic (via choice)
open Classical in
theorem excluded_middle (P : Prop) : P ∨ ¬P := Classical.em P

Proof irrelevance has profound computational implications. Because all proofs of a proposition are equal, the compiler can erase proofs at runtime. A function that takes a proof argument does not actually need to receive any runtime data for that argument. This erasure is essential for performance: without it, complex proofs would bloat compiled code with useless proof terms. Your elaborate justification for why the code is correct compiles down to nothing, much like comments but with mathematical guarantees.

The technical foundation is that Prop is a subsingleton universe. A subsingleton is a type with at most one element. For any proposition P, there is at most one proof of P up to definitional equality. This contrasts with Type, where Bool has two distinct elements true and false, and Nat has infinitely many.

Proof irrelevance also enables powerful automation. When a tactic constructs a proof term, the exact structure of that term does not matter. The tactic can use whatever construction is convenient, and the result will be equal to any other proof of the same statement. This freedom simplifies tactic implementation and allows for aggressive optimization of proof search.
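
One visible consequence (subtractSafe is an illustrative name, not a library function): a function can demand a proof as an argument; the proof guides type checking and then erases, leaving only the data behind at runtime.

-- The caller must supply evidence that b ≤ a; the evidence itself carries no data.
def subtractSafe (a b : Nat) (_h : b ≤ a) : Nat := a - b

#eval subtractSafe 10 3 (by decide)  -- 7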

Quotients and Parametricity

Quotient types allow you to define new types by identifying elements of an existing type according to an equivalence relation. The integers modulo n, for example, identify natural numbers that have the same remainder when divided by n. Quotients are essential for constructing mathematical objects like rational numbers, real numbers, and algebraic structures.

-- Simple modulo equivalence relation
def ModRel (n : Nat) : Nat → Nat → Prop :=
  fun a b => a % n = b % n

-- Prove it's an equivalence relation
theorem ModRel.refl (n : Nat) : ∀ x, ModRel n x x :=
  fun _ => rfl

theorem ModRel_symm (n : Nat) : ∀ x y, ModRel n x y → ModRel n y x :=
  fun _ _ h => h.symm

theorem ModRel.trans (n : Nat) : ∀ x y z, ModRel n x y → ModRel n y z → ModRel n x z :=
  fun _ _ _ hxy hyz => Eq.trans hxy hyz

-- Create setoid instance
instance ModSetoid (n : Nat) : Setoid Nat where
  r := ModRel n
  iseqv := {
    refl := ModRel.refl n
    symm := @ModRel_symm n
    trans := @ModRel.trans n
  }

-- Define the quotient type (integers modulo n)
def ZMod (n : Nat) : Type := Quotient (ModSetoid n)

However, quotients break parametricity. Parametricity is the principle that polymorphic functions must treat their type arguments uniformly. A function of type ∀ α, α → α can only be the identity function because it has no way to inspect what α is. It must work the same way for Nat, String, and any other type. This uniformity yields powerful “free theorems” about polymorphic functions.

Quotients violate this uniformity through the Quot.lift operation. When you lift a function to operate on a quotient type, you must prove that the function respects the equivalence relation. This proof obligation means that functions on quotients can behave differently depending on the specific equivalence relation, breaking the uniformity that parametricity requires.

namespace ZMod

-- Constructor respecting equivalence
def mk (n : Nat) (a : Nat) : ZMod n :=
  Quotient.mk (ModSetoid n) a

-- Addition operation via lifting (simplified - proper proof would be longer)
def add {n : Nat} [NeZero n] : ZMod n → ZMod n → ZMod n :=
  Quotient.lift₂
    (fun a b => mk n ((a + b) % n))
    (fun a₁ a₂ b₁ b₂ h₁ h₂ => by
      apply Quotient.sound
      simp only [ModSetoid] at h₁ h₂ ⊢
      unfold ModRel at h₁ h₂ ⊢
      rw [Nat.add_mod, h₁, h₂, ← Nat.add_mod]
      rfl)

-- Quotient.sound: related elements are equal
theorem mk_eq_of_rel {n : Nat} (a b : Nat) (h : ModRel n a b) :
  mk n a = mk n b :=
  Quotient.sound h

-- Quotient induction principle
theorem ind_on {n : Nat} {P : ZMod n → Prop} (q : ZMod n)
  (h : ∀ a, P (mk n a)) : P q :=
  Quotient.ind h q

end ZMod

Why is this acceptable? The trade-off is deliberate. Quotients are necessary for mathematics: you cannot construct the integers, rationals, or reals without them. The loss of parametricity is confined to quotient types and does not affect ordinary polymorphic functions. Moreover, the requirement to prove that lifted functions respect equivalence ensures that quotient operations are well-defined. You cannot accidentally distinguish between equivalent elements.

Comparative Type Systems

Different languages make different design choices in their type systems. The following table summarizes key features across proof assistants and programming languages.

| Feature | Lean 4 | Coq | Agda | Idris 2 | Haskell | Rust |
|---|---|---|---|---|---|---|
| Dependent Types | Full | Full | Full | Full | Limited | No |
| Universe Hierarchy | Predicative | Predicative | Predicative | Predicative | None | None |
| Universe Cumulativity | No | Yes | No | Yes | N/A | N/A |
| Proof Irrelevance | Yes (Prop) | Yes (Prop) | Optional | Yes | N/A | N/A |
| Tactic Language | Lean DSL | Ltac | No | Elab | N/A | N/A |
| Type Inference | Partial | Partial | Partial | Partial | Sorta Full | Full |
| Termination Checking | Required | Required | Required | Optional | No | No |
| Linear Types | No | No | No | QTT | Extension | Ownership |
| Effects System | Monad | Monad | Monad | Algebraic | Monad | Ownership |
| Code Generation | Native | OCaml/Haskell | Haskell | Native | Native | Native |
| Cubical Type Theory | No | No | Yes | No | No | No |
| Decidable Type Checking | No | No | No | No | Sorta | Yes |

Glossary:

  • Ltac: Coq’s original tactic language, a dynamically-typed scripting language for proof automation
  • QTT: Quantitative Type Theory, tracks how many times each variable is used to enable linear resource management
  • Predicative: A universe is predicative if quantifying over types at level n produces a type at level n+1 or higher
  • Cumulativity: Whether a type at level n is automatically also at level n+1
  • Sorta Full: Haskell has full type inference for base Haskell 2010, but enabling type system extensions (GADTs, TypeFamilies, RankNTypes, etc.) may require type annotations
  • Sorta (Decidable): Haskell 2010 has decidable type checking, but extensions like UndecidableInstances and TypeFamilies can make type checking undecidable or non-terminating

Lean and Coq provide full dependent types with rich proof automation, making them suitable for formal verification. Agda emphasizes explicit proof terms and supports cubical type theory for constructive equality. Idris 2 uses quantitative type theory to track resource usage, bridging the gap between theorem proving and systems programming.

Haskell approaches dependent types through extensions like GADTs, DataKinds, and type families. Base Haskell maintains decidable type checking, but common extensions can introduce undecidability. Rust’s ownership system provides memory safety guarantees through affine types, with decidable checking and predictable compile times.

A common critique of Lean is its lack of linear or affine types, which would enable compile-time guarantees about resource usage and in-place mutation. The Lean developers chose instead to rely on runtime reference counting with FBIP optimizations, trading static linearity guarantees for simpler types and the ability to share data freely without borrow checker complexity.

The table above looks like a feature comparison. It is actually a map of philosophical commitments. Each row is a question about the nature of computation; each column is an answer. The language you choose chooses what thoughts are easy to think.

The fundamental trade-off is expressiveness versus automation. Full dependent types let you express arbitrary properties but require manual proof effort. Decidable type systems like Rust and Haskell infer types automatically but cannot express many important invariants. Choose based on whether you need machine-checked proofs or just strong static guarantees.

In short: Lean and Coq make you prove everything is correct, Rust makes you prove memory is safe, Haskell makes you prove effects are tracked, and most other languages just trust you not to ship on Friday.

Dependent Types

Why bother with all this? The honest answer is that most working programmers will never need dependent types, the same way most drivers will never need to understand engine timing. The dishonest answer is that dependent types will make you more productive. The true answer is somewhere stranger: ordinary type systems cannot express the constraints that actually matter. You can say “this is a list” but not “this is a list of exactly five elements.” You can say “this function returns an integer” but not “this function returns a positive integer smaller than its input.” Every time you write a comment explaining what a function really does, every time you add a runtime check for an invariant the compiler cannot see, every time a bug slips through because the types were not precise enough, you are paying the cost of an insufficiently expressive type system. The comment is a wish. The runtime check is a prayer. Dependent types are a contract.

Dependent types are the solution. They let types talk about values. The type Vector α 5 is not just any list but a list of exactly five elements. The type Fin n is not just any natural number but one that is provably less than n. Array bounds checking happens at compile time. Protocol state machines live in the types. The invariants you used to hope were true become things the compiler verifies.

Per Martin-Löf spent the 1970s in Sweden developing this type theory where types could depend on values, not just on other types. The idea seems simple enough: if List can be parameterized by an element type, why not also by its length? But this small step has profound consequences. Suddenly types become a specification language. A function returning Vector α n does not merely return a list; it returns a list of exactly n elements, verified at compile time. The simply-typed lambda calculus that underlies Haskell and ML stops at the water’s edge here. Dependent types let you wade in, expressing invariants that would otherwise live only in documentation or runtime checks. Lean’s type system, based on the Calculus of Inductive Constructions, provides the full machinery: types that compute, proofs that are programs, and specifications precise enough to replace testing with theorem proving.

Warning

This section is dense and notation-heavy. If the mathematical symbols start to blur together, that is not a personal failing - it is the appropriate response to notation that took logicians fifty years to stabilize and still varies between textbooks. Skip ahead and return later. Or don’t. Some people read this material linearly, some spiral in from the edges, some open to random pages and see what sticks. There is no wrong way to learn type theory except to believe you should have understood it faster. The notation table below is a reference, not a prerequisite.

Notation

| Concept | Mathematical Notation | Lean Syntax | Description |
|---|---|---|---|
| Function type | \(\alpha \to \beta\) | α → β | Non-dependent function from α to β |
| Dependent function | \(\Pi (x : \alpha), \beta(x)\) | (x : α) → β x | Function where return type depends on input |
| For all | \(\forall x : \alpha, P(x)\) | ∀ x : α, P x | Universal quantification |
| Exists | \(\exists x : \alpha, P(x)\) | ∃ x : α, P x | Existential quantification |
| Lambda abstraction | \(\lambda x. t\) | fun x => t or λ x => t | Anonymous function |
| Equivalence | \(a \equiv b\) | N/A | Definitional equality |
| Conjunction | \(P \land Q\) | P ∧ Q | Logical AND |
| Disjunction | \(P \lor Q\) | P ∨ Q | Logical OR |
| Negation | \(\neg P\) | ¬P | Logical NOT |
| Type universe | \(\text{Type}_n\) | Type n | Universe of types at level n |
| Proposition universe | \(\text{Prop}\) | Prop | Universe of propositions |
| Sort hierarchy | \(\text{Sort}_n\) | Sort n | Unified universe hierarchy |
| Quotient type | \(\alpha/{\sim}\) | Quotient s | Type obtained by quotienting α by relation ∼ |
| Natural numbers | \(\mathbb{N}\) | Nat or ℕ | Natural numbers type |
| Integers | \(\mathbb{Z}\) | Int or ℤ | Integer type |
| Sigma type | \(\Sigma (x : \alpha), \beta(x)\) | Σ x : α, β x | Dependent pair type |
| Product type | \(\alpha \times \beta\) | α × β | Cartesian product |

Type System Overview

Lean’s type system supports definitional equality through several reduction rules:

  • β-reduction (beta): Function application through substitution - \((\lambda x. t) s \equiv t[s/x]\)
  • δ-reduction (delta): Replacement of defined constants with definitions
  • ι-reduction (iota): Reduction when recursors target constructors
  • ζ-reduction (zeta): Substitution of let-bound variables - \(\text{let } x := s \text{ in } t \equiv t[s/x]\)
example : (fun x => x + 1) 2 = 3 := rfl  -- β-reduction

def myConst := 42
example : myConst = 42 := rfl  -- δ-reduction

example : let x := 5; x + x = 10 := rfl  -- ζ-reduction
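
The ι rule has no example above; here is a minimal sketch of our own, applying a recursor directly to a constructor so the kernel can compute:

-- ι-reduction: the recursor computes when its target is a constructor
example : Nat.rec (motive := fun _ => Nat) 0 (fun n _ => n) Nat.zero = 0 := rfl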

Functions

Function types are a built-in feature of Lean. Functions map values from one type (the domain) to another type (the codomain). In Lean’s core language, all function types are dependent - non-dependent function types are just a special case where the parameter doesn’t appear in the codomain.

Non-Dependent vs Dependent Functions

Non-dependent functions have a fixed return type that doesn’t vary based on the input value. These are the only kinds of functions available in languages like Haskell and OCaml:

  • Haskell: not :: Bool -> Bool - always returns a Bool
  • OCaml: let not : bool -> bool - always returns a bool
  • Lean: def not : Bool → Bool - the notation indicates non-dependence

Dependent functions have a return type that can depend on the actual value of the input. The type is written as \(\Pi (x : \alpha), \beta(x)\) or (x : α) → β x in Lean syntax, where the parameter name x appears in the return type β x. This feature has no equivalent in Haskell or OCaml.

Key insight: Dependent functions can return values from completely different types based on their input! This is sometimes called a dependent product because it corresponds to an indexed product of sets.

Tip

The name “dependent product” may seem backwards since we are building functions, not pairs. The terminology comes from set theory: a function \(f : \Pi (x : A), B(x)\) assigns to each \(x \in A\) an element of \(B(x)\), which is precisely an element of the Cartesian product \(\prod_{x \in A} B(x)\). The “product” is over all possible inputs.

Why Dependent Types Matter

Consider this function that cannot be typed in Haskell or OCaml:

def two (b : Bool) : if b then Unit × Unit else String :=
  match b with
  | true => ((), ())    -- Returns a pair when b is true
  | false => "two"      -- Returns a string when b is false

The return type literally changes based on the runtime value:

  • two true has type Unit × Unit
  • two false has type String

This should feel slightly transgressive. A function that returns different types? In most languages, this is either impossible or requires erasing all type information and hoping for the best. Here, the type system tracks it precisely. The function is total, the types are known, the compiler is satisfied. This enables encoding invariants directly in types. For example, Vector α n encodes the length n in the type itself, making it impossible to write functions that violate length constraints. Your off-by-one errors become compile-time errors. The compiler catches at build time what you would otherwise discover in production, at 3am, with the on-call phone ringing.

Typing Rules for Functions

Two rules govern how types flow through function definitions and applications. The first is application: when you apply a function to an argument, the return type can mention the argument itself. If \(f : \Pi (x : \alpha), \beta(x)\) and you apply it to some \(a : \alpha\), the result \(f \, a\) has type \(\beta(a)\). The type of the output depends on the value of the input. This is the essence of dependent typing. A function Vector.head : (n : Nat) → Vector α (n + 1) → α applied to 3 yields a function expecting a Vector α 4. The 3 propagates into the type.
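
A small illustration of the application rule, using a throwaway definition of our own in place of Vector.head: the value you apply flows into the result type.

def boundedId (n : Nat) : Fin (n + 1) → Fin (n + 1) := fun i => i

#check boundedId 3  -- boundedId 3 : Fin (3 + 1) → Fin (3 + 1), i.e. Fin 4 → Fin 4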

The second rule is abstraction: to construct a function, you assume a variable of the input type and produce a term of the output type. If \(t : \beta\) under the assumption \(x : \alpha\), then \(\lambda x : \alpha. \, t\) has type \(\Pi (x : \alpha), \beta\). The abstraction binds the variable and packages the assumption into the function type. When \(\beta\) does not mention \(x\), this collapses to the familiar non-dependent arrow \(\alpha \to \beta\).

Beyond formation and elimination, functions satisfy eta-reduction: wrapping a function in a lambda that immediately applies it produces the same function. Formally, \(\lambda x. \, f \, x \equiv f\) when \(x\) does not appear free in \(f\). This is not just a simplification rule but a statement about extensionality: a function is determined by what it does to its arguments, not by how it is written.

Examples: Dependent and Non-Dependent Functions

-- Non-dependent function: return type is fixed
-- Similar to Haskell: not :: Bool -> Bool
-- or OCaml: let not : bool -> bool
def double (n : Nat) : Nat := n * 2

-- Another non-dependent example (like Haskell's const)
def constantFive (_ : Nat) : Nat := 5

-- Dependent function: return type depends on the input value
-- This has NO equivalent in Haskell or OCaml!
def makeVec (n : Nat) : Fin (n + 1) := ⟨n, Nat.lt_succ_self n⟩

-- A more dramatic dependent function example
-- The return TYPE changes based on the input VALUE
def two (b : Bool) : if b then Unit × Unit else String :=
  match b with
  | true => ((), ())    -- Returns a pair of units
  | false => "two"      -- Returns a string

-- This function returns different TYPES based on input:
-- two true : Unit × Unit
-- two false : String

-- Dependent pairs: the second type depends on the first value
def dependentPair : (n : Nat) × Fin n :=
  ⟨5, 3⟩  -- Second component must be less than first

-- Compare with non-dependent versions:
-- Haskell: (Int, Int) -- no constraint between components
-- OCaml: int * int    -- no constraint between components
-- Lean enforces the invariant in the type!

-- Function composition (non-dependent)
def comp {α β γ : Type} (f : β → γ) (g : α → β) : α → γ :=
  fun x => f (g x)

-- This is like Haskell's (.) composition operator
-- But Lean can also do DEPENDENT composition!

-- Dependent function composition
def depComp {α : Type} {β : α → Type} {γ : (x : α) → β x → Type}
    (f : (x : α) → (y : β x) → γ x y)
    (g : (x : α) → β x) :
    (x : α) → γ x (g x) :=
  fun x => f x (g x)

-- Multiple parameters via currying (named after Haskell B. Curry)
def curriedAdd : Nat → Nat → Nat := fun x y => x + y

-- Function extensionality: equal outputs for equal inputs
theorem funext_example (f g : Nat → Nat) (h : ∀ x, f x = g x) : f = g :=
  funext h

Currying

Currying is the fundamental technique of transforming functions with multiple parameters into sequences of single-parameter functions. Named after logician Haskell Curry (yes, the programming language is also named after him), this approach is automatic in Lean. All multi-parameter functions are internally represented as curried functions. This enables partial application, where supplying fewer arguments than a function expects creates a new function waiting for the remaining arguments.

Note

The technique was actually discovered by Moses Schönfinkel in 1924, six years before Curry’s work. Academic naming conventions are not always fair. Schönfinkel’s life ended in obscurity in a Moscow hospital; Curry became a household name among programmers who have never heard of Schönfinkel.

The power of currying lies in its composability. You can create specialized functions by partially applying general ones, building up complex behavior from simple building blocks. While languages like Haskell make currying explicit, Lean handles it transparently, allowing you to work with multi-parameter functions naturally while still benefiting from the flexibility of curried representations.

/-!
### Currying

Currying is the transformation of functions with multiple parameters into
a sequence of functions, each taking a single parameter. In Lean, all
multi-parameter functions are automatically curried.
-/

-- Multi-parameter function (automatically curried)
def add3 (x y z : Nat) : Nat := x + y + z

-- Equivalent to nested lambdas
def add3' : Nat → Nat → Nat → Nat :=
  fun x => fun y => fun z => x + y + z

-- Partial application creates new functions
def add10 : Nat → Nat → Nat := add3 10
def add10And5 : Nat → Nat := add3 10 5

example : add10 3 7 = 20 := rfl
example : add10And5 2 = 17 := rfl

-- Function.curry: Convert uncurried to curried
def uncurriedAdd : Nat × Nat → Nat := fun p => p.1 + p.2
def curriedVer := Function.curry uncurriedAdd

example : curriedVer 3 4 = 7 := rfl

-- Function.uncurry: Convert curried to uncurried
def addPair := Function.uncurry Nat.add
example : addPair (3, 4) = 7 := rfl

-- Currying with dependent types
def depCurry {α : Type} {β : α → Type} {γ : (a : α) → β a → Type}
    (f : (p : (a : α) × β a) → γ p.1 p.2) :
    (a : α) → (b : β a) → γ a b :=
  fun a b => f ⟨a, b⟩

Function Extensionality

Function extensionality is a fundamental principle stating that two functions are equal if and only if they produce equal outputs for all inputs. This principle, while intuitively obvious, is not derivable from the core rules of dependent type theory. In Lean it is not an extra axiom but a theorem, funext, proved from the built-in quotient types. Without extensionality, we could only prove functions equal when they were definitionally equal - reducible to the same term.

The funext tactic in Lean implements this principle, allowing us to prove function equality by considering their behavior pointwise. This is essential for mathematical reasoning, where we often want to show that two different definitions actually describe the same function. The principle extends to dependent functions as well, where the output type can vary with the input.

/-!
### Extensionality

Function extensionality states that two functions are equal if they
produce equal outputs for all inputs. This is not provable from the
core rules alone; in Lean it is a theorem derived from quotient types.
-/

-- funext: Basic function extensionality
theorem my_funext {α β : Type} (f g : α → β) :
    (∀ x, f x = g x) → f = g :=
  funext

-- Example: Proving function equality
def double' (n : Nat) : Nat := 2 * n
def double'' (n : Nat) : Nat := n + n

theorem doubles_equal : double' = double'' := by
  funext n
  simp [double', double'']
  omega

-- Dependent function extensionality
theorem dep_funext {α : Type} {β : α → Type}
    (f g : (x : α) → β x) :
    (∀ x, f x = g x) → f = g :=
  funext

-- Eta reduction: λ x, f x = f
theorem eta_reduction (f : Nat → Nat) : (fun x => f x) = f :=
  funext fun _ => rfl

-- Functions equal by behavior, not syntax
def addOne : Nat → Nat := fun x => x + 1
def succFunc : Nat → Nat := Nat.succ

theorem addOne_eq_succ : addOne = succFunc := by
  funext x
  simp [addOne, succFunc]

Totality and Termination

Important

All functions in Lean must be total - defined for every possible input of the correct type. This requirement ensures logical consistency: a function that could fail or loop forever would make Lean’s logic unsound. Partiality is the enemy. The function that hangs on edge cases, the recursion that never terminates, the match that forgot a constructor - these are not just bugs but logical contradictions waiting to invalidate your theorems.

To achieve totality while allowing recursion, Lean uses well-founded recursion based on decreasing measures.

For structural recursion on inductive types, Lean automatically proves termination by observing that recursive calls operate on structurally smaller arguments. For more complex recursion patterns, you can specify custom termination measures using termination_by and provide proofs that these measures decrease with decreasing_by. This approach allows expressing any computable function while maintaining logical soundness. If you have ever written while (true) and hoped for the best, this is the universe collecting on that debt.

/-!
### Totality and Termination

All functions in Lean must be total (defined for all inputs) and
terminating. Lean uses well-founded recursion to ensure termination.
-/

-- Total function: defined for all natural numbers
def safeDivide (n : Nat) (m : Nat) : Nat :=
  if m = 0 then 0 else n / m  -- Returns 0 for division by zero

-- Structural recursion (automatically proven terminating)
def fact : Nat → Nat
  | 0 => 1
  | n + 1 => (n + 1) * fact n

-- Well-founded recursion with explicit termination proof
def gcd (a b : Nat) : Nat :=
  if h : b = 0 then
    a
  else
    have : a % b < b := Nat.mod_lt _ (Nat.pos_of_ne_zero h)
    gcd b (a % b)
termination_by b

-- Mutual recursion with termination
mutual
  def isEvenMut : Nat → Bool
    | 0 => true
    | n + 1 => isOddMut n

  def isOddMut : Nat → Bool
    | 0 => false
    | n + 1 => isEvenMut n
end

-- Using decreasing_by for custom termination proof
def ackermann : Nat → Nat → Nat
  | 0, n => n + 1
  | m + 1, 0 => ackermann m 1
  | m + 1, n + 1 => ackermann m (ackermann (m + 1) n)
termination_by m n => (m, n)
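
When the default termination proof needs help, decreasing_by lets you supply it yourself. A minimal sketch of our own (here omega happens to close the decreasing goal):

def countdown (n : Nat) : List Nat :=
  if h : n = 0 then []
  else n :: countdown (n - 1)
termination_by n
decreasing_by
  -- Goal: n - 1 < n, with h : ¬n = 0 in context
  omega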

Function API Reference

Lean’s standard library provides a rich collection of function combinators in the Function namespace. These combinators, familiar from functional programming, enable point-free style and function composition. The composition operator allows building complex functions from simpler ones, while combinators like const, flip, and id provide basic building blocks for function manipulation.

Function composition in Lean satisfies the expected mathematical properties: it’s associative, and the identity function acts as a neutral element. These properties are not just theorems but computational facts - they hold definitionally, meaning Lean can verify them by pure computation without requiring proof steps.

/-!
### Function API Reference

Lean provides standard function combinators in the Function namespace.
-/

-- Function.comp: Function composition
def composed := Function.comp (· + 10) (· * 2)
example : composed 5 = 20 := rfl  -- (5 * 2) + 10 = 20

-- Using ∘ notation for composition
open Function in
def composed' := (· + 10) ∘ (· * 2)
example : composed' 5 = 20 := rfl

-- Function.const: Constant function
def alwaysFive := Function.const Nat 5
example : alwaysFive 100 = 5 := rfl
example : alwaysFive 999 = 5 := rfl

-- id: Identity function
example : id 42 = 42 := rfl
example : (id ∘ id) = (id : Nat → Nat) := rfl

-- flip: Swap arguments
def subtract (a b : Nat) : Int := a - b
def flippedSubtract := flip subtract
example : subtract 10 3 = 7 := rfl
example : flippedSubtract 3 10 = 7 := rfl

-- Function composition laws
open Function in
theorem comp_assoc {α β γ δ : Type} (f : γ → δ) (g : β → γ) (h : α → β) :
    (f ∘ g) ∘ h = f ∘ (g ∘ h) := rfl

open Function in
theorem id_comp {α β : Type} (f : α → β) :
    id ∘ f = f := rfl

open Function in
theorem comp_id {α β : Type} (f : α → β) :
    f ∘ id = f := rfl

Function Properties

Mathematical properties of functions - injectivity, surjectivity, and bijectivity - play crucial roles in both mathematics and computer science. An injective function maps distinct inputs to distinct outputs, a surjective function reaches every possible output, and a bijective function is both injective and surjective, establishing a one-to-one correspondence between domain and codomain.

These properties connect to the concept of inverses. A function with nonempty domain has a left inverse if and only if it’s injective, a right inverse if and only if it’s surjective, and a two-sided inverse if and only if it’s bijective. Lean provides definitions and theorems for reasoning about these properties, enabling formal verification of mathematical and algorithmic correctness.

/-!
### Function Properties

Important mathematical properties of functions.
-/

-- Injective: Different inputs give different outputs
def isInjective {α β : Type} (f : α → β) : Prop :=
  ∀ x y, f x = f y → x = y

theorem double_injective : isInjective (fun n : Nat => 2 * n) := by
  intro x y h
  simp only [] at h
  -- If 2*x = 2*y, then x = y
  have : x * 2 = y * 2 := by rw [mul_comm x 2, mul_comm y 2]; exact h
  exact Nat.eq_of_mul_eq_mul_right (by norm_num : 0 < 2) this

-- Using Lean's built-in Function.Injective
example : Function.Injective (fun n : Nat => 2 * n) := by
  intro x y h
  simp only [] at h
  have : x * 2 = y * 2 := by rw [mul_comm x 2, mul_comm y 2]; exact h
  exact Nat.eq_of_mul_eq_mul_right (by norm_num : 0 < 2) this

-- Surjective: Every output is reached by some input
def isSurjective {α β : Type} (f : α → β) : Prop :=
  ∀ b, ∃ a, f a = b

-- Not surjective: doubling doesn't produce odd numbers
theorem double_not_surjective :
    ¬Function.Surjective (fun n : Nat => 2 * n) := by
  intro h
  obtain ⟨n, hn⟩ := h 1
  simp at hn
  -- 2*n is always even, but 1 is odd
  cases n with
  | zero => simp at hn
  | succ m => simp [Nat.mul_succ] at hn

-- Bijective: Both injective and surjective
def isBijective {α β : Type} (f : α → β) : Prop :=
  Function.Injective f ∧ Function.Surjective f

-- Left inverse: g ∘ f = id
theorem has_left_inverse {α β : Type} (f : α → β) (g : β → α) :
    Function.LeftInverse g f ↔ ∀ a, g (f a) = a := by
  rfl

-- Right inverse: f ∘ g = id
theorem has_right_inverse {α β : Type} (f : α → β) (g : β → α) :
    Function.RightInverse g f ↔ ∀ b, f (g b) = b := by
  rfl

-- Example: Successor and predecessor
def succInt : Int → Int := (· + 1)
def predInt : Int → Int := (· - 1)

theorem succ_pred_inverse :
    Function.LeftInverse predInt succInt ∧
    Function.RightInverse predInt succInt := by
  constructor <;> intro x <;> simp [succInt, predInt]

-- Inverse functions
structure IsInverse {α β : Type} (f : α → β) (g : β → α) : Prop where
  left : Function.LeftInverse g f
  right : Function.RightInverse g f

theorem inverse_bijective {α β : Type} (f : α → β) (g : β → α)
    (h : IsInverse f g) : isBijective f := by
  constructor
  · -- Injective
    intro x y hxy
    have : g (f x) = g (f y) := by rw [hxy]
    rw [h.left x, h.left y] at this
    exact this
  · -- Surjective
    intro b
    use g b
    exact h.right b

Implicit and Auto Parameters

While not part of core type theory, Lean’s function types include indications of whether parameters are implicit. Implicit and explicit function types are definitionally equal. Implicit parameters are inferred from context, strict implicit parameters must be inferrable at the application site, and instance parameters (written in square brackets) are filled by type class resolution.

-- Implicit parameters (inferred from usage)
def implicitId {α : Type} (x : α) : α := x

-- Strict implicit (must be inferrable at application site)
def strictImplicit ⦃α : Type⦄ (x : α) : α := x

-- Instance parameters (filled by type class resolution)
def autoParam {α : Type} [Inhabited α] : α := default

-- Optional parameters with default values
def withDefault (n : Nat := 10) : Nat := n * 2

example : implicitId 5 = 5 := rfl
example : withDefault = 20 := rfl
example : withDefault 3 = 6 := rfl

Propositions

Propositions (Prop) are types representing logical statements. They feature proof irrelevance: any two proofs of the same proposition are definitionally equal. This means the specific proof does not matter, only that one exists. We covered this in the Type Theory article.

The Curry-Howard Correspondence Revisited

The Curry-Howard correspondence we encountered in earlier articles now reveals its full depth. With dependent types, the correspondence extends beyond simple propositional logic. Universal quantification becomes dependent function types. Existential quantification becomes dependent pair types (sigma types). The slogan “propositions are types, proofs are programs” turns out to be a precise mathematical equivalence.

| Logic | Type Theory | Lean Syntax |
|---|---|---|
| \(\forall x : \alpha, P(x)\) | Dependent function \(\Pi (x : \alpha), P(x)\) | ∀ x : α, P x or (x : α) → P x |
| \(\exists x : \alpha, P(x)\) | Dependent pair \(\Sigma (x : \alpha), P(x)\) | ∃ x : α, P x or Σ x : α, P x |
| Induction principle | Recursor | Nat.rec, List.rec, etc. |
| Proof by cases | Pattern matching | match ... with |

The dependent versions unify what simpler type systems treat separately. A proof of “for all natural numbers n, P(n) holds” is literally a function that takes any n : Nat and returns a proof of P n. A proof of “there exists a natural number n such that P(n)” is literally a pair: the witness n together with a proof of P n. This unification is not philosophical hand-waving; it is the operational semantics of Lean.

-- Basic logical connectives
theorem and_intro (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q := ⟨hp, hq⟩

theorem or_elim (P Q R : Prop) (h : P ∨ Q) (hp : P → R) (hq : Q → R) : R :=
  h.elim hp hq

theorem iff_intro (P Q : Prop) (hpq : P → Q) (hqp : Q → P) : P ↔ Q :=
  ⟨hpq, hqp⟩

-- Proof irrelevance demonstration
theorem proof_irrel_demo (P : Prop) (p1 p2 : P) : p1 = p2 := rfl

-- Classical logic (via choice)
open Classical in
theorem excluded_middle (P : Prop) : P ∨ ¬P := Classical.em P

Key Properties of Propositions

  • Run-time irrelevance: Propositions are erased during compilation
  • Impredicativity: Propositions can quantify over types from any universe
  • Restricted elimination: With limited exceptions (subsingletons), propositions cannot eliminate to non-proposition types
  • Extensionality: The propext axiom enables proving logically equivalent propositions are equal
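
Two of these properties can be seen directly in code. A small sketch of our own:

-- propext: logically equivalent propositions are equal
example (P Q : Prop) (h : P ↔ Q) : P = Q := propext h

-- Impredicativity: quantifying over every Type still yields a Prop
example : Prop := ∀ (α : Type), Nonempty α → Nonempty α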

Decidability

Decidable propositions bridge logic and computation: a decidable proposition’s truth value can actually be computed. A proposition \(P\) is decidable when we can algorithmically determine which of \(P\) or \(\neg P\) holds:

$$\text{Decidable}(P) \triangleq P \lor \neg P$$

This connects to constructive mathematics where decidability provides computational content:

-- Custom decidable instance
instance decidableEven (n : Nat) : Decidable (n % 2 = 0) :=
  if h : n % 2 = 0 then isTrue h else isFalse h

-- Using decidability for computation
def isEven (n : Nat) : Bool := decide (n % 2 = 0)

example : isEven 4 = true := rfl
example : isEven 5 = false := rfl

-- Subsingletons: types with at most one element
theorem subsingleton_prop (P : Prop) : Subsingleton P :=
  ⟨fun _ _ => rfl⟩

Universes

Lean organizes types into a hierarchy of universes to prevent paradoxes, as we discussed in the Type Theory article. Every type belongs to exactly one universe level. The Sort operator constructs universes:

Tip

Universe levels rarely matter in practice. Lean infers them automatically in most cases, and you can write thousands of lines of code without thinking about them. They become relevant when you are doing highly polymorphic library design or when the compiler complains about universe inconsistencies. If that happens, the error messages will guide you. Until then, treat them as plumbing that handles itself.

  • \(\text{Sort } 0 = \text{Prop}\) (propositions)
  • \(\text{Sort } (u + 1) = \text{Type } u\) (data types)
  • Every universe is an element of the next larger universe: \(\text{Sort } u : \text{Sort } (u + 1)\)

The universe of a Pi type \(\Pi (x : \alpha), \beta\) depends on the universes of \(\alpha\) and \(\beta\):

  • If \(\beta : \text{Prop}\), then \(\Pi (x : \alpha), \beta : \text{Prop}\) (impredicative)
  • If \(\beta : \text{Sort } v\), then \(\Pi (x : \alpha), \beta : \text{Sort } (\max(u, v))\) where \(\alpha : \text{Sort } u\)
universe u v w

-- Universe polymorphism (explicit universe level)
def polyIdentity (α : Sort u) (a : α) : α := a

-- Universe level expressions
def maxLevel (α : Type u) (β : Type v) : Type (max u v) := α × β

-- Type 0 contains types, Type 1 contains Type 0, etc.
example : Type 0 = Sort 1 := rfl
example : Prop = Sort 0 := rfl

-- Impredicative Prop: functions into Prop stay in Prop
def propPredicate (P : Type u → Prop) : Prop := ∀ α, P α

-- Predicative Type: function types take maximum universe
def typePredicate (P : Type u → Type v) : Type (max (u+1) v) :=
  ∀ α, P α

Universe Polymorphism and Lifting

Definitions can take universe parameters, and universe levels support expressions with addition and maximum operations. Lean’s universes are non-cumulative: a type in Type u is not automatically in Type (u + 1).

-- PLift: lifts any type by exactly one level
def liftedFalse : Type := PLift False

-- ULift: lifts types by any amount
def liftedNat : Type u := ULift.{u} Nat

-- Lifting and unlifting values
def liftExample : ULift.{1} Nat := ⟨42⟩
example : liftExample.down = 42 := rfl

-- Non-cumulativity: types exist at exactly one level
def needsLifting (α : Type 1) : Type 2 := ULift.{2} α

Two operators bridge universe gaps:

  • PLift: Lifts any type (including propositions) by exactly one level
  • ULift: Lifts non-proposition types by any number of levels

Inductive Types

Inductive types are Lean’s primary mechanism for introducing new types. Every type is either inductive or built from universes, functions, and inductive types.

Warning

The recursor that Lean generates for each inductive type is the induction principle in computational form. If you find yourself writing a proof by induction and wondering where the induction hypothesis comes from, the answer is: the recursor. Understanding recursors deeply is optional for using Lean but essential for understanding why Lean works.

Each inductive type has:

  • A single type constructor (may take parameters)
  • Any number of constructors introducing new values
  • A derived recursor representing an induction principle

The general form of an inductive type declaration: $$ \begin{aligned} &\textbf{inductive } C\ (\vec{\alpha} : \vec{U}) : \Pi (\vec{\beta} : \vec{V}),\ s \textbf{ where} \\ &\quad \mid\ c_1 : \Pi (\vec{x_1} : \vec{T_1}),\ C\ \vec{\alpha}\ \vec{t_1} \\ &\quad \mid\ c_2 : \Pi (\vec{x_2} : \vec{T_2}),\ C\ \vec{\alpha}\ \vec{t_2} \\ &\quad \vdots \end{aligned} $$

Where \(\vec{\alpha}\) are parameters (fixed) and \(\vec{\beta}\) are indices (can vary).

-- Basic inductive with multiple constructors
inductive Color : Type where
  | red | green | blue
  deriving Repr, DecidableEq

-- Parameterized inductive type
inductive Result (ε α : Type) : Type where
  | ok : α → Result ε α
  | error : ε → Result ε α

-- Anonymous constructor syntax for single-constructor types
structure Point where
  x : Float
  y : Float

def origin : Point := ⟨0.0, 0.0⟩

Indexed Families

The distinction between parameters and indices is fundamental:

  • Parameters must be used consistently throughout the definition
  • Indices may vary among different occurrences

For example, with vectors (length-indexed lists): $$\text{Vector} : \text{Type} \to \mathbb{N} \to \text{Type}$$

The recursor for indexed families captures the dependency: $$\text{Vector.rec} : \Pi (P : \Pi n, \text{Vector } \alpha \, n \to \text{Sort } u), \ldots$$

-- Vector: length-indexed lists
inductive Vector (α : Type) : Nat → Type where
  | nil : Vector α 0
  | cons : ∀ {n}, α → Vector α n → Vector α (n + 1)

def vectorHead {α : Type} {n : Nat} : Vector α (n + 1) → α
  | Vector.cons a _ => a

-- Even/odd indexed list (from manual)
inductive EvenOddList (α : Type) : Bool → Type where
  | nil : EvenOddList α true
  | cons : ∀ {isEven}, α → EvenOddList α isEven → EvenOddList α (!isEven)

def exampleEvenList : EvenOddList Nat true :=
  EvenOddList.cons 1 (EvenOddList.cons 2 EvenOddList.nil)

Mutual and Nested Inductive Types

Multiple inductive types can be defined simultaneously when they reference each other. Nested inductive types are defined recursively through other type constructors.

mutual
  inductive Tree (α : Type) : Type where
    | leaf : α → Tree α
    | node : Forest α → Tree α

  inductive Forest (α : Type) : Type where
    | nil : Forest α
    | cons : Tree α → Forest α → Forest α
end

-- Nested inductive: recursive through other type constructors
inductive NestedTree (α : Type) : Type where
  | leaf : α → NestedTree α
  | node : List (NestedTree α) → NestedTree α

-- Recursors are automatically generated for pattern matching
def treeSize {α : Type} : Tree α → Nat
  | Tree.leaf _ => 1
  | Tree.node forest => forestSize forest
where forestSize : Forest α → Nat
  | Forest.nil => 0
  | Forest.cons t rest => treeSize t + forestSize rest

Structures

Structures are specialized single-constructor inductive types with no indices. They provide automatic projection functions, named-field syntax, update syntax, and inheritance:

structure Person where
  name : String
  age : Nat
  email : Option String := none
  deriving Repr

-- Inheritance
structure Student extends Person where
  studentId : Nat
  gpa : Float

-- Field access and update syntax
def alice : Person := { name := "Alice", age := 25 }
def olderAlice := { alice with age := 26 }

example : alice.name = "Alice" := rfl
example : olderAlice.age = 26 := rfl

Quotient Types

Quotient types create new types by identifying elements via equivalence relations. Given a type \(\alpha\) and an equivalence relation \(\sim\) on \(\alpha\), the quotient \(\alpha/\sim\) is a type where \(a = b\) in \(\alpha/\sim\) whenever \(a \sim b\). Elements related by the relation become equal in the quotient type. Equality is respected universally, and nothing in Lean’s logic can observe differences between equal terms.

Note

Mathematicians write \(\mathbb{Z} = (\mathbb{N} \times \mathbb{N})/\!\sim\) and software engineers write type Int = Quotient (Nat × Nat) equiv. Same idea, different notation. The integer \(-3\) is not any particular pair of naturals but the equivalence class of all pairs \((a, b)\) where \(a + 3 = b\): so \((0, 3)\), \((1, 4)\), \((2, 5)\), and infinitely many others. Two fields, one concept, a century of mutual incomprehension that turns out to be largely notational.

For example, the integers can be constructed as \(\mathbb{Z} = (\mathbb{N} \times \mathbb{N})/\sim\) where \((a,b) \sim (c,d)\) iff \(a + d = b + c\).

-- Simple modulo equivalence relation
def ModRel (n : Nat) : Nat → Nat → Prop :=
  fun a b => a % n = b % n

-- Prove it's an equivalence relation
theorem ModRel.refl (n : Nat) : ∀ x, ModRel n x x :=
  fun _ => rfl

theorem ModRel_symm (n : Nat) : ∀ x y, ModRel n x y → ModRel n y x :=
  fun _ _ h => h.symm

theorem ModRel.trans (n : Nat) : ∀ x y z, ModRel n x y → ModRel n y z → ModRel n x z :=
  fun _ _ _ hxy hyz => Eq.trans hxy hyz

-- Create setoid instance
instance ModSetoid (n : Nat) : Setoid Nat where
  r := ModRel n
  iseqv := {
    refl := ModRel.refl n
    symm := @ModRel_symm n
    trans := @ModRel.trans n
  }

-- Define the quotient type (integers modulo n)
def ZMod (n : Nat) : Type := Quotient (ModSetoid n)

Working with Quotients

Operations on quotients must respect the equivalence relation. The Quotient.lift functions ensure operations are well-defined, while Quotient.sound asserts equality of related elements.

The quotient axioms provide:

  • Quotient.mk: \(\alpha \to \alpha/{\sim}\) (constructor)
  • Quotient.lift: If \(f : \alpha \to \beta\) respects \(\sim\), then \(f\) lifts to \(\alpha/{\sim} \to \beta\)
  • Quotient.sound: If \(a \sim b\), then \([a] = [b]\) in \(\alpha/{\sim}\)
  • Quotient.exact: If \([a] = [b]\) in \(\alpha/{\sim}\), then \(a \sim b\)
namespace ZMod

-- Constructor respecting equivalence
def mk (n : Nat) (a : Nat) : ZMod n :=
  Quotient.mk (ModSetoid n) a

-- Addition operation via lifting (simplified - proper proof would be longer)
def add {n : Nat} [NeZero n] : ZMod n → ZMod n → ZMod n :=
  Quotient.lift₂
    (fun a b => mk n ((a + b) % n))
    (fun a₁ a₂ b₁ b₂ h₁ h₂ => by
      apply Quotient.sound
      simp only [ModSetoid] at h₁ h₂ ⊢
      unfold ModRel at h₁ h₂ ⊢
      rw [Nat.add_mod, h₁, h₂, ← Nat.add_mod]
      rfl)

-- Quotient.sound: related elements are equal
theorem mk_eq_of_rel {n : Nat} (a b : Nat) (h : ModRel n a b) :
  mk n a = mk n b :=
  Quotient.sound h

-- Quotient induction principle
theorem ind_on {n : Nat} {P : ZMod n → Prop} (q : ZMod n)
  (h : ∀ a, P (mk n a)) : P q :=
  Quotient.ind h q

end ZMod

Combined Examples

The following examples combine dependent functions, indexed families, and proof terms. Each demonstrates how types can enforce invariants that would be invisible to simpler type systems:

-- Dependent pairs (Sigma types)
def DependentPair := Σ n : Nat, Fin n

def examplePair : DependentPair := ⟨3, 2⟩

-- Type families and dependent pattern matching
def typeFamily : Nat → Type
  | 0 => Unit
  | 1 => Bool
  | _ => Nat

def familyValue : (n : Nat) → typeFamily n
  | 0 => ()
  | 1 => true
  | n@(_ + 2) => n * 2

-- Structural recursion: termination is inferred automatically
def factorial : Nat → Nat
  | 0 => 1
  | n + 1 => (n + 1) * factorial n

-- Two-step recursion (n + 2 to n), also proven terminating automatically
def div2 : Nat → Nat
  | 0 => 0
  | 1 => 0
  | n + 2 => 1 + div2 n

example : factorial 5 = 120 := rfl
example : div2 10 = 5 := rfl

The machinery presented here forms the foundation of everything that follows. Dependent types are why Lean can serve simultaneously as a programming language and a proof assistant. When you write a type signature like Vector α n → Vector α (n + 1), you are making a mathematical claim that the compiler will verify. Specifications that the machine enforces, invariants that cannot be violated, programs that are correct by construction.

Most software is written fast, tested hopefully, and debugged frantically. Dependent types offer a different mode: slower to write, harder to learn, guarantees that survive contact with production. Whether the tradeoff makes sense depends on how much a bug costs. For most code, the answer is “not much.” For some code, the answer is “careers” or “lives.” Know which kind you are writing.

Proof Strategy

The previous articles taught you individual tactics. Now we learn how to think. A proof is not a random sequence of tactics that happens to work. It is a structured argument, and understanding that structure makes the difference between flailing and fluency. The gap between knowing the tactics and knowing how to prove things is the gap between knowing the rules of chess and knowing how to not lose immediately.

The Goal State

Every proof begins with a goal and ends with no goals. The goal state is your map. Learning to read it fluently is the most important skill in tactic-based proving.

case succ
n : Nat
ih : P n
⊢ P (n + 1)

This goal state tells you everything: you are in the succ case of an induction, you have a natural number n, you have an induction hypothesis ih stating that P n holds, and you must prove P (n + 1). The turnstile separates what you have from what you need.

When a proof has multiple goals, they appear stacked. The first goal is your current focus. Tactics typically operate on the first goal, though combinators like all_goals and any_goals can target multiple goals simultaneously.
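
Here is a small proof of our own that produces exactly this kind of goal state:

example (P : Nat → Prop) (h0 : P 0) (hs : ∀ n, P n → P (n + 1)) : ∀ n, P n := by
  intro n
  induction n with
  | zero => exact h0
  | succ n ih => exact hs n ih  -- at this point the goal was P (n + 1), with ih : P n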

Categories of Tactics

Tactics fall into natural categories based on what they do to the goal state. Understanding these categories helps you choose the right tool.

Introduction tactics move structure from the goal into the context. When your goal is P → Q, the tactic intro h assumes P (calling it h) and changes the goal to Q. When your goal is ∀ x, P x, the tactic intro x introduces a fresh x and changes the goal to P x. Introduction tactics make progress by turning the goal’s assumptions into hypotheses you can work with, leaving a smaller goal.

Elimination tactics use structure from the context to transform the goal. When you have h : P ∧ Q and need P, the tactic exact h.1 extracts the left component. When you have h : P ∨ Q, the tactic cases h splits into two goals, one assuming P and one assuming Q. Elimination tactics make progress by using what you have.

Rewriting tactics transform the goal using equalities. The tactic rw [h] replaces occurrences of the left side of h with the right side. The tactic simp applies many such rewrites automatically. Rewriting makes progress by simplifying toward something obviously true.

Automation tactics search for proofs. The tactic simp tries simplification lemmas. The tactic omega solves linear arithmetic. The tactic aesop performs general proof search. Automation makes progress by doing work you would rather not do by hand.

Structural tactics manipulate the proof state without making logical progress. The tactic swap reorders goals. The tactic rename changes hypothesis names. The tactic clear removes unused hypotheses. These tactics keep your proof organized.
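
A short example of our own that touches several categories at once - introduction, elimination, and closing each case with exact:

example (P Q R : Prop) (hpr : P → R) (hqr : Q → R) : P ∨ Q → R := by
  intro h        -- introduction: assume the disjunction
  cases h with   -- elimination: split on which disjunct holds
  | inl hp => exact hpr hp
  | inr hq => exact hqr hq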

Reading the Goal

Before applying any tactic, ask: what is the shape of my goal?

Implication (P → Q): Use intro to assume P and prove Q.

Universal (∀ x, P x): Use intro x to introduce x and prove P x.

Conjunction (P ∧ Q): Use constructor to split into two goals, one for P and one for Q. Or use And.intro directly.

Disjunction (P ∨ Q): Use left to commit to proving P, or right to commit to proving Q.

Existential (∃ x, P x): Use use t to provide a witness t and prove P t.

Equality (a = b): Try rfl if definitionally equal. Try simp for simplification. Try rw with known equalities. Try ring for algebraic identities.

Negation (¬P): Remember that ¬P is P → False. Use intro h to assume P and derive a contradiction.

False: If your goal is False, you need a contradiction. Look for hypotheses that conflict.
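
A small example of our own exercising several of these shapes (it assumes Mathlib’s use and omega are available):

example : ∀ n : Nat, n ≥ 2 → (∃ m, m + 2 = n) ∧ n + 0 = n := by
  intro n hn       -- universal, then implication: intro
  constructor      -- conjunction: split into two goals
  · use n - 2      -- existential: provide a witness
    omega          -- arithmetic goal (n - 2) + 2 = n, using hn
  · simp           -- equality: n + 0 = n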

Reading the Context

Your context contains hypotheses. Each hypothesis is a tool. Ask: what can I do with each hypothesis?

Implication (h : P → Q): If you can prove P, you can get Q. Use apply h when your goal is Q. Use have hq := h hp when you have hp : P.

Universal (h : ∀ x, P x): You can instantiate at any term. Use specialize h t to get P t. Use have ht := h t to keep the original.

Conjunction (h : P ∧ Q): You have both P and Q. Access them with h.1 and h.2, or use obtain ⟨hp, hq⟩ := h to destructure.

Disjunction (h : P ∨ Q): You have either P or Q but do not know which. Use cases h to split into cases.

Existential (h : ∃ x, P x): There is some x satisfying P. Use obtain ⟨x, hx⟩ := h to get the witness and its property.

Equality (h : a = b): You can substitute b for a. Use rw [h] to rewrite left-to-right. Use rw [← h] to rewrite right-to-left.
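
And a companion example of our own that works with hypotheses of several of these shapes:

example (P Q : Prop) (a b : Nat)
    (himp : P → Q) (hall : ∀ n : Nat, n + 0 = n)
    (hand : P ∧ a = b) : Q ∧ b + 0 = b := by
  obtain ⟨hp, hab⟩ := hand   -- conjunction: destructure into both parts
  refine ⟨himp hp, ?_⟩       -- implication: apply himp to hp : P
  rw [← hab]                 -- equality: rewrite b back to a
  exact hall a               -- universal: instantiate at a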

Proof Patterns

Certain proof structures recur constantly. Recognizing them saves time.

Direct proof: Introduce assumptions, manipulate, conclude. Most proofs follow this pattern.

theorem direct (P Q : Prop) (h : P → Q) (hp : P) : Q := by
  apply h
  exact hp

Proof by cases: When you have a disjunction or an inductive type, split and prove each case.

theorem by_cases (n : Nat) : n = 0 ∨ n ≥ 1 := by
  cases n with
  | zero => left; rfl
  | succ m => right; simp

Proof by induction: For properties of recursive types, prove the base case and the inductive step.

theorem by_induction (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => simp [ih]

Proof by contradiction: Assume the negation and derive False.

theorem by_contradiction (P : Prop) (h : ¬¬P) : P := by
  by_contra hnp
  exact h hnp

Proof by contraposition: To prove P → Q, prove ¬Q → ¬P instead.

theorem by_contraposition (P Q : Prop) (h : ¬Q → ¬P) : P → Q := by
  intro hp
  by_contra hnq
  exact h hnq hp

When You Get Stuck

Every proof hits obstacles. Here is how to get unstuck.

Simplify first. Try simp or simp only [relevant_lemmas]. Often the goal simplifies to something obvious.

Check your hypotheses. Do you have what you need? Use have to derive intermediate facts. Use obtain to destructure complex hypotheses.

Try automation. For arithmetic, try omega or linarith. For algebraic identities, try ring or field_simp. For general goals, try aesop or decide.

Work backwards. What would make your goal obviously true? If you need P ∧ Q, you need to prove both P and Q. What tactics produce those subgoals?

Work forwards. What can you derive from your hypotheses? If you have h : P → Q and hp : P, you can derive Q.

Split the problem. Use have to state and prove intermediate lemmas. Breaking a proof into steps often reveals the path.

Read the error. Lean’s error messages are verbose but precise. “Type mismatch” tells you what was expected and what you provided. “Unknown identifier” means a name is not in scope. “Unsolved goals” means you are not done.

Use the library. Mathlib contains thousands of lemmas. Use exact? to search for lemmas that close your goal. Use apply? to search for lemmas whose conclusion matches your goal.
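
Splitting a proof with have, in a tiny example of our own:

example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a + 1 = c + 1 := by
  have hac : a = c := by rw [h1, h2]   -- intermediate fact
  rw [hac]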

Tactic Composition

Tactics compose in several ways.

Sequencing: Separate tactics with newlines or semicolons. Each tactic operates on the result of the previous one.

theorem seq_demo (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by
  constructor
  exact h.2
  exact h.1

Focusing: Use · to focus on a single goal. Indentation groups tactics under a focus.

theorem focus_demo (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by
  constructor
  · exact h.2
  · exact h.1

Combinators: Use <;> to apply a tactic to all goals produced by the previous tactic. Use first | t1 | t2 to try tactics in order. Use repeat t to apply a tactic until it fails.

theorem combinator_demo (P : Prop) (h : P) : P ∧ P ∧ P := by
  constructor <;> (try constructor) <;> exact h

Next-Generation Automation

The tactics described so far require you to think. You read the goal, choose a strategy, apply tactics step by step. This is how mathematicians have always worked, and there is value in understanding your proof at every stage. But a new generation of tactics is changing the calculus of what is worth formalizing.

Higher-order tactics like aesop, grind, and SMT integration lift proof development from low-level term manipulation to structured, automated search over rich proof states. Instead of specifying every proof step, you specify goals, rule sets, or search parameters, and these tactics synthesize proof terms that Lean’s kernel then checks. The soundness guarantee remains absolute since the kernel verifies everything, but the human cost drops dramatically. This decoupling of “what should be proved” from “how to construct the term” is what makes large-scale formalization feasible.

aesop implements white-box best-first proof search, exploring a tree of proof states guided by user-configurable rules. Unlike black-box automation, aesop lets you understand and tune the search: rules are indexed via discrimination trees for rapid retrieval, and you can register domain-specific lemmas to teach it new tricks. grind draws inspiration from modern SMT solvers, maintaining a shared workspace where congruence closure, E-matching, and forward chaining cooperate on a goal. It excels when many interacting equalities and logical facts are present, automatically deriving consequences that would be tedious to script by hand. For goals requiring industrial-strength decision procedures, SMT tactics send suitable fragments to proof-producing solvers like cvc5, then reconstruct proofs inside Lean so the kernel can verify them. This lets Lean leverage decades of solver engineering while preserving the LCF-style trust model where only the small kernel must be trusted.

The strategic question is when to reach for automation versus working by hand. The temptation is to try grind on everything and move on when it works. This is efficient but opaque: you learn nothing, and when automation fails on a similar goal later, you have no insight into why. A better approach is to use automation to explore, then understand what it found. Goals that would take an hour of tedious case analysis now take seconds. This frees you to tackle harder problems. But remember: when grind closes a goal, it has found a valid proof term. It has not gained insight. That remains your job.
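
For a feel of the difference, the same goal written by hand and handed to aesop (assuming Mathlib is available):

-- By hand: name each piece of the proof term
example (P Q : Prop) (h : P ∧ Q) : Q ∧ P := ⟨h.2, h.1⟩

-- With automation: let the search find it
example (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by aesop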

What Comes Next

The following article is a reference. It documents every major tactic in Lean 4 and Mathlib, organized alphabetically. You do not need to memorize it. You need to know it exists, and you need to know how to find the tactic you need.

When you encounter a goal you do not know how to prove, return here. Ask: what is the shape of my goal? What is in my context? What pattern does this proof follow? The answer will point you to the right tactic, and the reference will tell you how to use it.

The strategies in this article apply beyond Lean. The structure of mathematical argument is universal. Direct proof, case analysis, induction, contradiction: these are the fundamental patterns of reason itself. Learning them in a proof assistant merely makes them explicit. You cannot handwave past a case you forgot to consider when the computer is watching.

Tactics Reference

Robin Milner’s LCF system introduced a radical idea in the 1970s: let users extend the theorem prover with custom proof procedures, but channel all proof construction through a small trusted kernel. You could write arbitrarily clever automation, and if it produced a proof, that proof was guaranteed valid. Tactics are this idea fully realized. They are programs that build proofs, metaprograms that manipulate the proof state, search procedures that explore the space of possible arguments. When you write simp and Lean simplifies your goal through dozens of rewrite steps, you are invoking a sophisticated algorithm. When you write omega and Lean discharges a linear arithmetic obligation, you are running a decision procedure. The proof terms these tactics construct may be enormous, but they are checked by the kernel, and the kernel is small enough to trust. Think of tactics as the code you write, and the kernel as the one colleague who actually reads your pull requests.

Table of Contents

The following covers all the major tactics in Lean 4 and Mathlib. Click on any tactic name to jump to its documentation and examples.

  • abel - Prove equalities in abelian groups
  • aesop - General automation tactic
  • all_goals - Apply tactic to all current goals
  • any_goals - Apply tactic to any applicable goal
  • apply - Apply hypotheses or lemmas to solve goals
  • apply_fun - Apply function to both sides of equality
  • assumption - Use hypothesis matching goal
  • bound - Prove inequalities from structure
  • by_cases - Perform case splitting
  • by_contra - Proof by contradiction
  • calc - Chain equations and inequalities
  • cases - Case analysis on inductive types
  • choose - Extract choice function from forall-exists
  • congr - Prove equality using congruence rules
  • constructor - Break down conjunctions, existentials, and iff
  • contradiction - Find contradictions in hypotheses
  • conv - Targeted rewriting in specific parts
  • convert - Prove by showing goal equals type of expression
  • decide - Run decision procedures
  • exact - Provide an exact proof term
  • exfalso - Prove anything from False
  • ext - Prove equality of functions extensionally
  • field_simp - Simplify field expressions
  • fin_cases - Split finite type into cases
  • first - Try tactics until one succeeds
  • focus - Limit tactics to first goal
  • gcongr - Prove inequalities using congruence
  • generalize - Replace expressions with variables
  • grind - Proof search using congruence closure
  • group - Prove equalities in groups
  • have - Introduce new hypotheses
  • hint - Get tactic suggestions
  • induction - Perform inductive proofs
  • interval_cases - Split bounded values into cases
  • intro - Introduce assumptions from implications and quantifiers
  • left - Choose left side of disjunction
  • lift - Lift variable to higher type
  • linarith - Prove linear inequalities
  • linear_combination - Prove from linear combinations
  • module - Prove equalities in modules
  • nlinarith - Handle nonlinear inequalities
  • noncomm_ring - Prove in non-commutative rings
  • norm_cast - Simplify by moving casts outward
  • norm_num - Simplify numerical expressions
  • nth_rw - Rewrite only the nth occurrence
  • obtain - Destructure existentials and structures
  • omega - Solve linear arithmetic over Nat and Int
  • pick_goal - Move specific goal to front
  • positivity - Prove positivity goals
  • push_cast - Push casts inward
  • push_neg - Push negations inward
  • qify - Shift to rationals
  • refine - Apply with holes to fill later
  • rename - Rename hypotheses for clarity
  • repeat - Apply tactic repeatedly until fails
  • revert - Move hypotheses back to the goal
  • rfl - Prove by reflexivity
  • right - Choose right side of disjunction
  • ring - Prove equalities in commutative rings
  • rw - Rewrite using equalities
  • simp - Apply simplification lemmas
  • simp_all - Simplify everything including hypotheses
  • simp_rw - Rewrite with simplification at each step
  • sorry - Admit goal without proof
  • specialize - Instantiate hypothesis with specific arguments
  • split - Handle if-then-else and pattern matching
  • split_ifs - Case on if-then-else expressions
  • subst - Substitute variable with its value
  • swap - Swap first two goals
  • symm - Swap symmetric relations
  • tauto - Prove logical tautologies
  • trans - Split transitive relations
  • trivial - Prove simple goals automatically
  • try - Attempt tactic, continue if fails
  • use - Provide witnesses for existential goals
  • zify - Shift natural numbers to integers

Logical Connectives

intro

The intro tactic moves hypotheses from the goal into the local context. When your goal is ∀ x, P x or P → Q, using intro names the bound variable or assumption and makes it available for use in the proof.

theorem intro_apply : ∀ x : Nat, x = x → x + 0 = x := by
  intro x h  -- Introduce x and hypothesis h : x = x
  simp       -- Simplify x + 0 = x

constructor

The constructor tactic applies the first constructor of an inductive type to the goal. For And (conjunction), it splits the goal into two subgoals. For Exists, it expects you to provide a witness. For Iff, it creates subgoals for both directions.

theorem constructor_example : True ∧ True := by
  constructor
  · trivial  -- Prove first True
  · trivial  -- Prove second True

left and right

The left and right tactics select which side of a disjunction to prove. When your goal is P ∨ Q, use left to commit to proving P or right to prove Q.

theorem or_example : 5 < 10 ∨ 10 < 5 := by
  left
  simp

use

The use tactic provides a concrete witness for an existential goal. When your goal is ∃ x, P x, using use t substitutes t for x and leaves you to prove P t.

theorem use_example : ∃ x : Nat, x * 2 = 10 := by
  exact ⟨5, rfl⟩

obtain

The obtain tactic extracts components from existential statements and structures in hypotheses. It combines have and pattern matching, letting you name both the witness and the proof simultaneously.

theorem have_example (x : Nat) : x + 0 = x := by
  have h : x + 0 = x := by simp
  exact h

theorem obtain_example (h : ∃ x : Nat, x > 5 ∧ x < 10) : ∃ y, y = 7 := by
  obtain ⟨_x, _hgt, _hlt⟩ := h  -- Destructure the existential
  exact ⟨7, rfl⟩

Applying Lemmas

exact

The exact tactic closes a goal by providing a term whose type matches the goal exactly. It performs no additional unification or elaboration beyond what is necessary.

theorem exact_example (h : 2 + 2 = 4) : 4 = 2 + 2 := by
  exact h.symm

/-- `refine` allows holes with ?_ -/
theorem refine_example : ∃ x : Nat, x > 5 := by
  refine ⟨10, ?_⟩  -- Use 10, but leave proof as a hole
  simp              -- Fill the hole: prove 10 > 5

apply

The apply tactic works backwards from the goal. Given a lemma h : A → B and a goal B, using apply h reduces the goal to proving A. It unifies the conclusion of the lemma with the current goal.

refine

The refine tactic is like exact but allows placeholders written as ?_ that become new goals. This lets you partially specify a proof term while deferring some parts.

/-- `refine` allows holes with ?_ -/
theorem refine_example : ∃ x : Nat, x > 5 := by
  refine ⟨10, ?_⟩  -- Use 10, but leave proof as a hole
  simp              -- Fill the hole: prove 10 > 5

convert

The convert tactic applies a term to the goal even when the types do not match exactly, generating side goals for the mismatches. It is useful when you have a lemma that is almost but not quite what you need.

theorem convert_example (x y : Nat) (h : x = y) : Nat.succ x = Nat.succ y := by
  convert rfl using 1
  rw [h]

specialize

The specialize tactic instantiates a universally quantified hypothesis with concrete values, replacing the general statement with a specific instance in your context.

theorem specialize_example (h : ∀ x : Nat, x > 0 → x ≥ 1) : 5 ≥ 1 := by
  specialize h 5 (by simp)
  exact h

Context Manipulation

have

The have tactic introduces a new hypothesis into the context. You state what you want to prove as an intermediate step, prove it, and then it becomes available for the rest of the proof.

theorem have_example (x : Nat) : x + 0 = x := by
  have h : x + 0 = x := by simp
  exact h

rename

The rename tactic changes the name of a hypothesis in the local context, making proofs more readable when auto-generated names are unclear.

theorem rename_example (h : 1 = 1) : 1 = 1 := by
  exact h  -- the hypothesis already has a clear name; rename is for unclear or auto-generated ones

revert

The revert tactic is the inverse of intro. It moves a hypothesis from the context back into the goal as an implication or universal quantifier, which is useful before applying induction or certain lemmas.

theorem revert_example (x : Nat) (h : x = 5) : x = 5 := by
  revert h x
  intro x h
  exact h

generalize

The generalize tactic replaces a specific expression in the goal with a fresh variable, abstracting over that value. This is useful when you need to perform induction on a compound expression.

theorem generalize_example : (2 + 3) * 4 = 20 := by
  simp  -- simp suffices here; `generalize h : 2 + 3 = n` would first abstract the sum

Rewriting and Simplifying

rw (rewrite)

The rw tactic replaces occurrences of the left-hand side of an equality with the right-hand side. Use rw [←h] to rewrite in the reverse direction. Multiple rewrites can be chained in a single rw [h1, h2, h3].

theorem rw_example (a b : Nat) (h : a = b) : a + 2 = b + 2 := by
  rw [h]  -- Rewrite a to b using h
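
A hedged extra example (rw_chain_example is our name, not from the repository) showing chaining and the reverse direction in a single call:

theorem rw_chain_example (a b c : Nat) (h1 : a = b) (h2 : c = b) : a = c := by
  rw [h1, ← h2]  -- rewrite a to b, then turn b back into c using h2 in reverse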

simp

The simp tactic repeatedly applies lemmas marked with @[simp] to simplify the goal. It handles common algebraic identities, list operations, and logical simplifications automatically.

theorem simp_example (x : Nat) : x + 0 = x ∧ 0 + x = x := by
  simp  -- Simplifies both sides

simp_all

The simp_all tactic simplifies both the goal and all hypotheses simultaneously, using each simplified hypothesis to help simplify the others.

theorem simp_all_example (x : Nat) (h : x = 0) : x + x = 0 := by
  simp_all

simp_rw

The simp_rw tactic rewrites using the given lemmas but applies simplification at each step, which helps when rewrites would otherwise fail due to associativity or other issues.

theorem simp_rw_example (x y : Nat) : (x + y) + (y + x) = 2 * (x + y) := by
  simp_rw [Nat.add_comm y x]
  ring

nth_rw

The nth_rw tactic rewrites only a specific occurrence of a pattern, counting from 1. This gives precise control when an expression appears multiple times and you only want to change one instance.

theorem nth_rewrite_example (x : Nat) : x + x + x = 3 * x := by
  nth_rw 2 [← Nat.add_zero x]  -- Rewrite only the second occurrence of x
  simp
  ring

norm_num

The norm_num tactic evaluates and simplifies numeric expressions, proving goals like 2 + 2 = 4 or 7 < 10 by computation. It handles arithmetic in various number types.

theorem norm_num_example : 2 ^ 3 + 5 * 7 = 43 := by
  norm_num

norm_cast

The norm_cast tactic normalizes expressions involving type coercions by pushing casts outward and combining them, making goals about mixed numeric types easier to prove.

theorem norm_cast_example (n : Nat) : (n : Int) + 1 = ((n + 1) : Int) := by
  norm_cast

push_cast

The push_cast tactic pushes type coercions inward through operations, distributing a cast over addition, multiplication, and other operations.

set_option linter.unusedTactic false in
theorem push_cast_example (n m : Nat) : ((n + m) : Int) = (n : Int) + (m : Int) := by
  push_cast
  rfl

conv

The conv tactic enters a conversion mode that lets you navigate to specific subexpressions and rewrite only there. It is invaluable when rw affects the wrong occurrence or when you need surgical precision.

theorem conv_example (x y : Nat) : x + y = y + x := by
  conv =>
    lhs  -- Focus on left-hand side
    rw [Nat.add_comm]

Reasoning with Relations

rfl (reflexivity)

The rfl tactic proves goals of the form a = a where both sides are definitionally equal. It works even when the equality is not syntactically obvious but follows from definitions.

theorem rfl_example (x : Nat) : x = x := by
  rfl

symm

The symm tactic reverses a symmetric relation like equality. If your goal is a = b and you have h : b = a, using symm on h or the goal makes them match.

theorem symm_example (x y : Nat) (h : x = y) : y = x := by
  symm
  exact h

trans

The trans tactic splits a transitive goal like a = c into two subgoals a = b and b = c for a chosen intermediate value b. It works for any transitive relation.

theorem trans_example (a b c : Nat) (h1 : a ≤ b) (h2 : b ≤ c) : a ≤ c := by
  trans b
  · exact h1
  · exact h2

subst

The subst tactic eliminates a variable by substituting it everywhere with an equal expression. Given h : x = e, using subst h replaces all occurrences of x with e and removes x from the context.

theorem subst_example (x y : Nat) (h : x = 5) : x + y = 5 + y := by
  subst h
  rfl

ext (extensionality)

The ext tactic proves equality of functions, sets, or structures by showing they agree on all inputs or components. It introduces the necessary variables and reduces the goal to pointwise equality.

theorem ext_example (f g : Nat → Nat)
    (h : ∀ x, f x = g x) : f = g := by
  ext x
  exact h x

calc

The calc tactic provides a structured way to write chains of equalities or inequalities. Each step shows the current expression, the relation, and the justification, mirroring traditional mathematical proofs.

theorem calc_example (a b c : Nat)
    (h1 : a = b) (h2 : b = c) : a = c := by
  calc a = b := h1
       _ = c := h2

apply_fun

The apply_fun tactic applies a function to both sides of an equality hypothesis. It automatically generates a side goal requiring the function to be injective when needed.

theorem apply_fun_example (x y : Nat) (h : x = y) : x + 2 = y + 2 := by
  apply_fun (· + 2) at h
  exact h

congr

The congr tactic reduces an equality goal f a = f b to proving a = b, applying congruence recursively. It handles nested function applications by breaking them into component equalities.

theorem congr_example (f : Nat → Nat) (x y : Nat) (h : x = y) : f x = f y := by
  congr

gcongr

The gcongr tactic proves inequalities by applying monotonicity lemmas. It automatically finds and applies lemmas showing that operations preserve ordering, such as adding to both sides of an inequality.

theorem gcongr_example (x y a b : Nat) (h1 : x ≤ y) (h2 : a ≤ b) : x + a ≤ y + b := by
  gcongr

linear_combination

The linear_combination tactic proves an equality by showing it follows from a linear combination of given hypotheses. You specify the coefficients, and it verifies the algebra.

theorem linear_combination_example (x y : ℚ) (h1 : 2*x + y = 4) (h2 : x + 2*y = 5) :
    x + y = 3 := by
  linear_combination (h1 + h2) / 3

positivity

The positivity tactic proves goals asserting that an expression is positive, nonnegative, or nonzero. It analyzes the structure of the expression and applies appropriate lemmas automatically.

set_option linter.unusedVariables false in
theorem positivity_example (x : ℚ) (h : 0 < x) : 0 < x^2 + x := by
  positivity

bound

The bound tactic proves inequality goals by recursively analyzing expression structure and applying bounding lemmas. It is particularly effective for expressions built from well-behaved operations.

theorem bound_example (x y : ℕ) : x ≤ x + y := by
  simp  -- simp closes this directly; `bound` targets the same kind of inequality goal

Reasoning Techniques

cases

The cases tactic performs case analysis on an inductive type, creating separate subgoals for each constructor. For a natural number, it splits into the zero case and the successor case.

theorem cases_example (n : Nat) : n = 0 ∨ n > 0 := by
  cases n with
  | zero => left; rfl
  | succ _m => right; exact Nat.succ_pos _

induction

The induction tactic sets up a proof by induction on an inductive type. It creates a base case for each non-recursive constructor and an inductive case with an induction hypothesis for each recursive constructor.

theorem induction_example (n : Nat) : n + 0 = n := by
  induction n with
  | zero => rfl
  | succ n _ih => rfl

split

The split tactic splits goals involving if-then-else expressions or pattern matching into separate cases. It creates subgoals for each branch with the appropriate condition as a hypothesis.

def abs (x : Int) : Nat :=
  if x ≥ 0 then x.natAbs else (-x).natAbs

theorem split_example (x : Int) : abs x ≥ 0 := by
  unfold abs
  split <;> simp

split_ifs

The split_ifs tactic finds all if-then-else expressions in the goal and splits on their conditions, creating cases for each combination of true and false branches.

theorem split_ifs_example (p : Prop) [Decidable p] (x y : Nat) :
    (if p then x else y) ≤ max x y := by
  split_ifs
  · exact le_max_left x y
  · exact le_max_right x y

contradiction

The contradiction tactic closes the goal by finding contradictory hypotheses in the context, such as h1 : P and h2 : ¬P, or an assumption of False.

theorem contradiction_example (h1 : False) : 0 = 1 := by
  contradiction

exfalso

The exfalso tactic changes any goal to False, applying the principle of explosion. Use this when you can derive a contradiction from your hypotheses.

theorem exfalso_example (h : 0 = 1) : 5 = 10 := by
  exfalso  -- Goal becomes False
  simp at h

by_contra

The by_contra tactic starts a proof by contradiction. It adds the negation of the goal as a hypothesis and changes the goal to False, requiring you to derive a contradiction.

theorem by_contra_example : ∀ n : Nat, n = n := by
  intro n
  rfl  -- a direct proof; `by_contra h` would instead add h : ¬(n = n) and leave the goal False

push_neg

The push_neg tactic pushes negations through quantifiers and connectives using De Morgan’s laws. It transforms ¬∀ x, P x into ∃ x, ¬P x and similar patterns.

theorem push_neg_example : ¬(∀ x : Nat, ∃ y, x < y) ↔ ∃ x : Nat, ∀ y, ¬(x < y) := by
  push_neg
  rfl

by_cases

The by_cases tactic splits the proof into two cases based on whether a proposition is true or false, adding the proposition as a hypothesis in one branch and its negation in the other.

theorem by_cases_example (p : Prop) : p ∨ ¬p := by
  by_cases h : p
  · left; exact h
  · right; exact h

choose

The choose tactic extracts a choice function from a hypothesis of the form ∀ x, ∃ y, P x y. It produces a function f and a proof that ∀ x, P x (f x).

theorem choose_example (h : ∀ x : Nat, ∃ y : Nat, x < y) :
    ∃ f : Nat → Nat, ∀ x, x < f x := by
  choose f hf using h
  exact ⟨f, hf⟩

lift

The lift tactic replaces a variable with one of a more specific type when you have a proof justifying the lift. For example, lifting an integer to a natural number given a proof it is nonnegative.

theorem lift_example (n : ℤ) (hn : 0 ≤ n) : ∃ m : ℕ, (m : ℤ) = n := by
  lift n to ℕ using hn
  exact ⟨n, rfl⟩

zify

The zify tactic converts a goal about natural numbers to one about integers, which often makes subtraction and other operations easier to handle since integers are closed under subtraction.

theorem zify_example (n m : ℕ) (_ : n ≥ m) : (n - m : ℤ) = n - m := by
  zify

qify

The qify tactic converts a goal about integers or naturals to one about rationals, enabling division and making certain algebraic manipulations possible.

theorem qify_example (n m : ℕ) : (n : ℚ) / (m : ℚ) = (n / m : ℚ) := by
  norm_cast  -- both sides already live in ℚ here; `qify` is for moving a ℕ or ℤ goal into ℚ

Searching

assumption

The assumption tactic closes the goal if there is a hypothesis in the context that exactly matches. It searches through all available hypotheses to find one with the right type.

theorem assumption_example (P Q : Prop) (h1 : P) (_h2 : Q) : P := by
  assumption  -- Finds h1

trivial

The trivial tactic tries a collection of simple tactics including rfl, assumption, and contradiction to close easy goals without you specifying which approach to use.

theorem trivial_example : True := by
  trivial

decide

The decide tactic evaluates decidable propositions by computation. For finite checks like 2 < 5 or membership in a finite list, it simply computes the answer and closes the goal.

theorem decide_example : 3 < 5 := by
  decide

hint

The hint tactic suggests which tactics might make progress on the current goal. It is a discovery tool that helps when you are unsure how to proceed.

theorem hint_example : 2 + 2 = 4 := by
  simp  -- hint would suggest this

General Automation

omega

The omega tactic is a decision procedure for linear arithmetic over natural numbers and integers. It handles goals involving addition, subtraction, multiplication by constants, and comparisons.

theorem omega_example (x y : Nat) : x < y → x + 1 ≤ y := by
  omega

linarith

The linarith tactic proves goals that follow from linear arithmetic over ordered rings. It combines hypotheses about inequalities to derive the goal using Fourier-Motzkin elimination.

theorem linarith_example (x y z : ℚ) (h1 : x < y) (h2 : y < z) : x < z := by
  linarith

nlinarith

The nlinarith tactic extends linarith to handle some nonlinear goals by first preprocessing with polynomial arithmetic before applying linear reasoning.

theorem nlinarith_example (x : ℚ) (h : x > 0) : x^2 > 0 := by
  nlinarith

ring

The ring tactic proves polynomial equalities in commutative rings by normalizing both sides to a canonical form and checking if they match. It handles addition, multiplication, and powers.

theorem ring_example (x y : ℤ) : (x + y)^2 = x^2 + 2*x*y + y^2 := by
  ring

noncomm_ring

The noncomm_ring tactic proves equalities in non-commutative rings where multiplication order matters, such as matrix rings or quaternions.

theorem noncomm_ring_example (x y z : ℤ) : x * (y + z) = x * y + x * z := by
  ring  -- ℤ is commutative, so `ring` suffices; `noncomm_ring` proves the same identity without commutativity

field_simp

The field_simp tactic clears denominators in field expressions by multiplying through, reducing goals involving fractions to polynomial equalities that ring can handle.

theorem field_simp_example (x y : ℚ) (hy : y ≠ 0) : x / y + 1 = (x + y) / y := by
  field_simp

abel

The abel tactic proves equalities in abelian groups by normalizing expressions involving addition, subtraction, and negation to a canonical form.

theorem abel_example (x y z : ℤ) : x + y + z = z + x + y := by
  abel

group

The group tactic proves equalities in groups using the group axioms. It handles multiplication, inverses, and the identity element, normalizing expressions to compare them.

theorem group_example (x y : ℤ) : x + (-x + y) = y := by
  group

module

The module tactic proves equalities in modules over a ring, handling scalar multiplication and vector addition to normalize expressions.

theorem module_example (x y : ℤ) (a : ℤ) : a • (x + y) = a • x + a • y := by
  module

aesop

The aesop tactic is a general-purpose automation tactic that combines many strategies including simplification, introduction rules, and case splitting to solve goals automatically.

theorem aesop_example (p q r : Prop) : p → (p → q) → (q → r) → r := by
  aesop

grind

grind performs proof search using congruence closure, forward chaining, and case splitting. It’s particularly effective for goals involving equational reasoning and logical connectives.

theorem grind_example1 (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
  grind

theorem grind_example2 (f : Nat → Nat) (x y : Nat)
    (h1 : x = y) (h2 : f x = 10) : f y = 10 := by
  grind

theorem grind_example3 (p q r : Prop)
    (h1 : p ∧ q) (h2 : q → r) : p ∧ r := by
  grind

theorem grind_example4 (x y : Nat) :
    (if x = y then x else y) = y ∨ x = y := by
  grind

tauto

The tauto tactic proves propositional tautologies involving ∧, ∨, →, ↔, ¬, True, and False. It handles classical and intuitionistic reasoning automatically.

theorem tauto_example (p q : Prop) : p → (p → q) → q := by
  tauto

Goal Operations

sorry

The sorry tactic closes any goal without actually proving it, leaving a hole in the proof. Use it as a placeholder during development, but never in finished proofs: anything that relies on a sorried theorem is not actually verified.

theorem incomplete_proof : ∀ P : Prop, P ∨ ¬P := by
  sorry  -- Proof left as exercise; Lean warns that the declaration uses sorry

swap

The swap tactic exchanges the first two goals in the goal list, letting you work on the second goal first when that is more convenient.

theorem swap_example : True ∧ True := by
  constructor
  swap
  · trivial  -- Proves second goal first
  · trivial  -- Then first goal

pick_goal

The pick_goal tactic moves a specific numbered goal to the front of the goal list, allowing you to address goals in any order you choose.

theorem pick_goal_example : True ∧ True := by
  constructor
  pick_goal 2  -- Move second goal to front
  · trivial    -- Prove second goal
  · trivial    -- Prove first goal

all_goals

The all_goals tactic applies a given tactic to every goal in the current goal list, which is useful when multiple goals can be solved the same way.

theorem all_goals_example : (1 = 1) ∧ (2 = 2) := by
  constructor
  all_goals rfl

any_goals

The any_goals tactic applies a given tactic to each goal where it succeeds, skipping goals where it fails. It succeeds if it makes progress on at least one goal.

theorem any_goals_example : (1 = 1) ∧ (True) := by
  constructor
  any_goals rfl  -- closes 1 = 1; skips True, where rfl does not apply
  trivial        -- finish the remaining goal

focus

The focus tactic restricts attention to the first goal, hiding all other goals. This helps ensure you complete one goal before moving to the next.

theorem focus_example : True ∧ True := by
  constructor
  · focus
      trivial
  · trivial

try

The try tactic attempts to apply a tactic and succeeds regardless of whether the inner tactic succeeds or fails. It is useful for optional simplification steps.

theorem try_example : 1 = 1 := by
  rfl  -- a direct proof; a step like `try simp` would succeed whether or not the inner tactic does

first

The first tactic tries a list of tactics in order and uses the first one that succeeds. It fails only if all tactics fail.

set_option linter.unusedTactic false in
set_option linter.unreachableTactic false in
theorem first_example (x : Nat) : x = x := by
  first | simp | rfl | sorry

repeat

The repeat tactic applies a given tactic repeatedly until it fails to make progress. It is useful for exhaustively applying a simplification or introduction rule.

set_option linter.unusedTactic false in
set_option linter.unreachableTactic false in
/-- `repeat` applies a tactic repeatedly -/
theorem repeat_example : True ∧ True ∧ True := by
  repeat constructor
  all_goals trivial

Tactic Combinators

The semicolon ; sequences tactics, while <;> applies the second tactic to all goals created by the first. These combinators help write concise proof scripts.

theorem combinator_example : (True ∧ True) ∧ (True ∧ True) := by
  constructor <;> (constructor <;> trivial)

Domain-Specific Tactics

interval_cases

The interval_cases tactic performs case analysis when a variable is known to lie in a finite range. Given bounds on a natural number, it generates a case for each possible value.

theorem interval_cases_example (n : ℕ) (h : n ≤ 2) : n = 0 ∨ n = 1 ∨ n = 2 := by
  interval_cases n
  · left; rfl
  · right; left; rfl
  · right; right; rfl

fin_cases

The fin_cases tactic performs case analysis on elements of a finite type like Fin n or Bool, creating a subgoal for each possible value of the type.

theorem fin_cases_example (i : Fin 3) : i.val < 3 := by
  fin_cases i <;> simp

Working with Quantifiers

Existential Quantifiers

Existential statements claim that some witness exists satisfying a property. To prove one, use use to provide the witness. To use an existential hypothesis, use obtain to extract the witness and its property.

theorem exists_intro : ∃ n : Nat, n > 10 := by
  exact ⟨42, by simp⟩  -- anonymous constructor; `use 42` provides the witness the same way

theorem exists_elim (h : ∃ n : Nat, n > 10) : True := by
  obtain ⟨n, hn⟩ := h
  trivial

Universal Quantifiers

Universal statements claim a property holds for all values. To prove one, use intro to introduce an arbitrary value. To use a universal hypothesis, use specialize or simply apply it to a specific value.

theorem forall_intro : ∀ x : Nat, x + 0 = x := by
  intro x
  simp

theorem forall_elim (h : ∀ x : Nat, x + 0 = x) : 5 + 0 = 5 := by
  exact h 5

Congruence and Subtyping

Every type system makes tradeoffs between precision and convenience. A function that takes Nat will accept any natural number, including zero, even when zero would cause a division error three stack frames later. A function that takes Int cannot directly accept a Nat without explicit conversion, even though every natural number is an integer. The constraints are either too loose or the syntax is too verbose. Pick your frustration.

Lean provides tools to fix both problems. Subtypes let you carve out precisely the values you want: positive numbers, non-empty lists, valid indices. The constraint travels with the value, enforced by the type system. Coercions let the compiler insert safe conversions automatically, so you can pass a Nat where an Int is expected without ceremony. These mechanisms together give you precise types with ergonomic syntax.

Types are sets with attitude. A Nat is not just the natural numbers but the natural numbers with all their operations and laws. A subtype narrows this: the positive natural numbers are the naturals with an extra constraint, a proof obligation that travels with every value. This is refinement: taking a broad type and carving out the subset you actually need.

The other direction is coercion. When Lean expects an Int but you give it a Nat, something must convert between them. Explicit casts are tedious. Coercions make the compiler do the work, inserting conversions automatically where safe. The result is code that looks like it mixes types freely but maintains type safety underneath.

Congruence handles the third concern: propagating equality. If a = b, then f a = f b. This seems obvious, but the compiler needs to be told. The congr tactic applies this principle systematically, breaking equality goals into their components.

Subtypes

A subtype refines an existing type with a predicate. Values of a subtype carry both the data and a proof that the predicate holds.

def Positive := { n : Nat // n > 0 }

def five : Positive := ⟨5, by decide⟩

#eval five.val  -- 5

def NonEmpty (α : Type) := { xs : List α // xs ≠ [] }

def singletonList : NonEmpty Nat := ⟨[42], by decide⟩

Working with Subtypes

Functions on subtypes can access the underlying value and use the proof to ensure operations are safe.

def doublePositive (p : Positive) : Positive :=
  ⟨p.val * 2, Nat.mul_pos p.property (by decide)⟩

#eval (doublePositive five).val  -- 10

def addPositive (a b : Positive) : Positive :=
  ⟨a.val + b.val, Nat.add_pos_left a.property b.val⟩

def safeHead {α : Type} (xs : NonEmpty α) : α :=
  match h : xs.val with
  | x :: _ => x
  | [] => absurd h xs.property

Refinement Types

Subtypes let you express precise invariants. Common patterns include bounded numbers, non-zero values, and values satisfying specific properties.

def Even := { n : Nat // n % 2 = 0 }
def Odd := { n : Nat // n % 2 = 1 }

def zero' : Even := ⟨0, rfl⟩
def two' : Even := ⟨2, rfl⟩
def one' : Odd := ⟨1, rfl⟩
def three' : Odd := ⟨3, rfl⟩

def BoundedNat (lo hi : Nat) := { n : Nat // lo ≤ n ∧ n < hi }

def inRange : BoundedNat 0 10 := ⟨5, by omega⟩

Basic Coercions

Coercions allow automatic type conversion. When a value of type A is expected but you provide type B, Lean looks for a coercion from B to A.

instance : Coe Positive Nat where
  coe p := p.val

def useAsNat (p : Positive) : Nat :=
  p + 10

#eval useAsNat five  -- 15

instance {α : Type} : Coe (NonEmpty α) (List α) where
  coe xs := xs.val

def listLength (xs : NonEmpty Nat) : Nat :=
  xs.val.length

#eval listLength singletonList  -- 1

Coercion Chains

Coercions can chain together. If there is a coercion from A to B and from B to C, Lean can automatically convert from A to C.

structure Meters where
  val : Float
  deriving Repr

structure Kilometers where
  val : Float
  deriving Repr

instance : Coe Meters Float where
  coe m := m.val

instance : Coe Kilometers Meters where
  coe km := ⟨km.val * 1000⟩

def addDistances (a : Meters) (b : Kilometers) : Meters :=
  ⟨a.val + (b : Meters).val⟩

#eval addDistances ⟨500⟩ ⟨1.5⟩  -- Meters.mk 2000.0

Function Coercions

CoeFun allows values to be used as functions. This is useful for callable objects and function-like structures.

instance : CoeFun Positive (fun _ => Nat → Nat) where
  coe p := fun n => p.val + n

#eval five 10  -- 15

structure Adder where
  amount : Nat

instance : CoeFun Adder (fun _ => Nat → Nat) where
  coe a := fun n => n + a.amount

def addFive : Adder := ⟨5⟩

#eval addFive 10  -- 15

Sort Coercions

While CoeFun coerces values to functions, CoeSort coerces values to types, letting a structure that bundles a carrier type stand in where a Type is expected. The predicate below continues the callable pattern; a CoeSort sketch follows it.

structure Predicate' (α : Type) where
  test : α → Bool

instance {α : Type} : CoeFun (Predicate' α) (fun _ => α → Bool) where
  coe p := p.test

def isEven' : Predicate' Nat := ⟨fun n => n % 2 == 0⟩

#eval isEven' 4   -- true
#eval isEven' 5   -- false
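
For sort coercions proper, here is a minimal sketch (the Pointed structure and every name in it are ours, not from the repository): a CoeSort instance lets a Pointed value appear wherever a Type is expected.

structure Pointed where
  carrier : Type
  pt : carrier

instance : CoeSort Pointed Type where
  coe p := p.carrier

def natPointed : Pointed := ⟨Nat, 0⟩

-- `p` is used as a type here thanks to the CoeSort instance
def takePoint (p : Pointed) (x : p) : p := x

#check takePoint natPointed natPointed.pt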

Congruence

The congr tactic applies congruence reasoning: if you need to prove f a = f b and you know a = b, congruence can close the goal.

example (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
  congr

example (f : Nat → Nat) (a b : Nat) (h : a = b) : f a = f b := by
  congr

example (a b c d : Nat) (h1 : a = b) (h2 : c = d) : a + c = b + d := by
  congr <;> assumption

Congruence with Multiple Arguments

Congruence works with functions of multiple arguments, generating subgoals for each argument that differs.

example (f : Nat → Nat → Nat) (a b c d : Nat)
    (h1 : a = c) (h2 : b = d) : f a b = f c d := by
  rw [h1, h2]

example (xs ys : List Nat) (h : xs = ys) : xs.length = ys.length := by
  rw [h]

Substitution and Rewriting

The subst tactic substitutes equal terms, and rw rewrites using equalities. These are fundamental tactics for equality reasoning.

example (a b : Nat) (h : a = b) : a * a = b * b := by
  subst h
  rfl

example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
  rw [h1, h2]

example (a b : Nat) (h : a = b) (f : Nat → Nat) : f a = f b := by
  rw [h]

Type Conversion

Lean provides automatic coercion between numeric types and explicit conversion functions.

example (n : Nat) : Int := n

def natToInt (n : Nat) : Int := n

#eval natToInt 42  -- 42

def stringToNat? (s : String) : Option Nat :=
  s.toNat?

#eval stringToNat? "123"   -- some 123
#eval stringToNat? "abc"   -- none

Decidable Propositions

A proposition is decidable if there is an algorithm to determine its truth. This enables using propositions in if-expressions.

def isPositive (n : Int) : Decidable (n > 0) :=
  if h : n > 0 then isTrue h else isFalse h

def checkPositive (n : Int) : String :=
  if n > 0 then "positive" else "not positive"

#eval checkPositive 5    -- "positive"
#eval checkPositive (-3) -- "not positive"

def decideEqual (a b : Nat) : Decidable (a = b) :=
  if h : a = b then isTrue h else isFalse h

Type Class Inheritance

Type classes can extend other classes, creating inheritance hierarchies. A type implementing a subclass automatically implements its parent classes.

class Animal (α : Type) where
  speak : α → String

class Dog (α : Type) extends Animal α where
  fetch : α → String

structure Labrador where
  name : String

instance : Animal Labrador where
  speak lab := s!"{lab.name} says woof!"

instance : Dog Labrador where
  speak lab := s!"{lab.name} says woof!"
  fetch lab := s!"{lab.name} fetches the ball!"

def makeSpeak {α : Type} [Animal α] (a : α) : String :=
  Animal.speak a

def rex : Labrador := ⟨"Rex"⟩

#eval makeSpeak rex  -- "Rex says woof!"
#eval Dog.fetch rex  -- "Rex fetches the ball!"

Universe Polymorphism

Types live in a hierarchy of universes. Type is shorthand for Type 0, and each type has a type in the next universe.

def useType (α : Type) : Type := List α

def useType' (α : Type 1) : Type 1 := List α

universe u

def polyType (α : Type u) : Type u := List α

#check (Nat : Type)
#check (Type : Type 1)
#check (Type 1 : Type 2)

Structure Extension

Structures can extend other structures, inheriting their fields while adding new ones.

structure Shape where
  name : String

structure Circle extends Shape where
  radius : Float

structure Rectangle extends Shape where
  width : Float
  height : Float

def myCircle : Circle := { name := "unit circle", radius := 1.0 }
def myRect : Rectangle := { name := "square", width := 2.0, height := 2.0 }

#eval myCircle.name    -- "unit circle"
#eval myCircle.radius  -- 1.0

Nominal vs Structural Typing

Lean uses nominal typing: two types with identical structures are still distinct types. This prevents accidental mixing of values with different semantics. A UserId and a ProductId might both be integers underneath, but you cannot accidentally pass one where the other is expected. The bug where you deleted user 47 because product 47 was out of stock becomes a compile error. Nominal typing is the formal version of “label your variables.”
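
A minimal sketch of that point (UserId, ProductId, and deleteUser are our names, echoing the prose rather than the repository):

structure UserId where
  id : Nat

structure ProductId where
  id : Nat

def deleteUser (_ : UserId) : String := "deleted user"

#eval deleteUser ⟨47⟩            -- "deleted user"
-- deleteUser (ProductId.mk 47)  -- rejected: ProductId is not a UserId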

structure Meters' where
  val : Float

structure Seconds where
  val : Float

def distance : Meters' := ⟨100.0⟩
def time : Seconds := ⟨10.0⟩

abbrev Speed := Float

def calcSpeed (d : Meters') (t : Seconds) : Speed :=
  d.val / t.val

#eval calcSpeed distance time  -- 10.0

Classic Proofs

Euclid’s proof of the infinitude of primes has survived for over two thousand years. It requires no calculus, no abstract algebra, only the observation that \(n! + 1\) shares no prime factors with \(n!\). Yet formalizing this argument reveals hidden assumptions: that every number greater than one has a prime divisor, that primes are well-defined, that contradiction is a valid proof technique. The proofs here are not difficult by mathematical standards, but they exercise the full machinery of dependent types, tactics, and theorem proving. They are finger exercises, etudes that build fluency before attempting harder compositions.

The Infinitude of Primes

Traditional Proof

Theorem. There exist infinitely many prime numbers.

The proof proceeds in two parts. First, we establish that every integer \(n \geq 2\) has a prime divisor. If \(n\) is prime, it divides itself. Otherwise, \(n\) has a proper divisor \(m\) with \(1 < m < n\). By strong induction, \(m\) has a prime divisor \(p\), and since \(p \mid m\) and \(m \mid n\), we have \(p \mid n\).

Second, we show that for any \(n\), there exists a prime \(p > n\). Consider \(N = n! + 1\). Since \(n! \geq 1\), we have \(N \geq 2\), so \(N\) has a prime divisor \(p\). We claim \(p > n\). Suppose for contradiction that \(p \leq n\). Then \(p \mid n!\) since \(n!\) contains all factors from 1 to \(n\). Since \(p \mid N\) and \(p \mid n!\), we have \(p \mid (N - n!) = 1\). But \(p \geq 2\), so \(p \nmid 1\), a contradiction. Therefore \(p > n\). QED

Lean Formalization

The Lean proof mirrors this structure exactly. The theorem exists_prime_factor establishes the first part by case analysis and strong induction (via termination_by). The main theorem InfinitudeOfPrimes constructs \(n! + 1\), extracts a prime divisor, then derives a contradiction using dvd_factorial and Nat.dvd_add_right.

theorem exists_prime_factor (n : ℕ) (hn : 2 ≤ n) : ∃ p, Nat.Prime p ∧ p ∣ n := by
  by_cases hp : Nat.Prime n
  · exact ⟨n, hp, dvd_refl n⟩
  · obtain ⟨m, hm_dvd, hm_ne_one, hm_ne_n⟩ := exists_dvd_of_not_prime hn hp
    have hm_lt : m < n := lt_of_le_of_ne (Nat.le_of_dvd (by omega) hm_dvd) hm_ne_n
    have hm_ge : 2 ≤ m := by
      rcases m with _ | _ | m <;> simp_all
    obtain ⟨p, hp_prime, hp_dvd⟩ := exists_prime_factor m hm_ge
    exact ⟨p, hp_prime, dvd_trans hp_dvd hm_dvd⟩
termination_by n

theorem factorial_pos (n : ℕ) : 0 < n ! := Nat.factorial_pos n

theorem dvd_factorial {k n : ℕ} (hk : 0 < k) (hkn : k ≤ n) : k ∣ n ! :=
  Nat.dvd_factorial hk hkn

theorem InfinitudeOfPrimes : ∀ n, ∃ p > n, Nat.Prime p := by
  intro n
  have hN : 2 ≤ n ! + 1 := by
    have hfact : 0 < n ! := factorial_pos n
    omega
  obtain ⟨p, hp_prime, hp_dvd⟩ := exists_prime_factor (n ! + 1) hN
  refine ⟨p, ?_, hp_prime⟩
  by_contra hle
  push_neg at hle
  have hp_pos : 0 < p := hp_prime.pos
  have hdvd_fact : p ∣ n ! := dvd_factorial hp_pos hle
  have hdvd_one : p ∣ 1 := (Nat.dvd_add_right hdvd_fact).mp hp_dvd
  have hp_le_one : p ≤ 1 := Nat.le_of_dvd one_pos hdvd_one
  have hp_ge_two : 2 ≤ p := hp_prime.two_le
  omega

Irrationality of the Square Root of Two

Traditional Proof

Theorem. \(\sqrt{2}\) is irrational.

The key lemma is: if \(n^2\) is even, then \(n\) is even. We prove the contrapositive. Suppose \(n\) is odd, so \(n = 2k + 1\) for some integer \(k\). Then \(n^2 = (2k+1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1\), which is odd. Therefore, if \(n^2\) is even, \(n\) cannot be odd, so \(n\) must be even.

Now suppose \(\sqrt{2} = p/q\) where \(p, q\) are integers with \(q \neq 0\) and \(\gcd(p,q) = 1\). Then \(2q^2 = p^2\), so \(p^2\) is even, hence \(p\) is even. Write \(p = 2k\). Then \(2q^2 = 4k^2\), so \(q^2 = 2k^2\), meaning \(q^2\) is even, hence \(q\) is even. But then \(\gcd(p,q) \geq 2\), contradicting our assumption. QED

Lean Formalization

The Lean code proves the parity lemmas explicitly. The theorem sq_odd_of_odd shows that squaring an odd number yields an odd number by expanding \((2k+1)^2\). The theorem even_of_sq_even proves the contrapositive: assuming \(n\) is odd leads to \(n^2\) being odd, which contradicts \(n^2\) being even. The final irrationality result follows from Mathlib’s irrational_sqrt_two, which uses this same parity argument internally.

theorem sq_even_of_even {n : ℤ} (h : Even n) : Even (n ^ 2) := by
  obtain ⟨k, hk⟩ := h
  exact ⟨2 * k ^ 2, by rw [hk]; ring⟩

theorem sq_odd_of_odd {n : ℤ} (h : Odd n) : Odd (n ^ 2) := by
  obtain ⟨k, hk⟩ := h
  exact ⟨2 * k ^ 2 + 2 * k, by rw [hk]; ring⟩

theorem even_of_sq_even {n : ℤ} (h : Even (n ^ 2)) : Even n := by
  by_contra hodd
  rw [Int.not_even_iff_odd] at hodd
  have hsq_odd : Odd (n ^ 2) := sq_odd_of_odd hodd
  obtain ⟨k, hk⟩ := hsq_odd
  obtain ⟨m, hm⟩ := h
  omega

theorem sqrt2_irrational : Irrational (Real.sqrt 2) := irrational_sqrt_two

Euclid’s Lemma

Traditional Proof

Theorem (Euclid’s Lemma). If a prime \(p\) divides a product \(ab\), then \(p \mid a\) or \(p \mid b\).

Let \(p\) be prime with \(p \mid ab\). Since \(p\) is prime, the only divisors of \(p\) are 1 and \(p\). Therefore \(\gcd(p, a) \in \{1, p\}\).

Case 1: If \(\gcd(p, a) = p\), then \(p \mid a\) and we are done.

Case 2: If \(\gcd(p, a) = 1\), we show \(p \mid b\). Consider \(\gcd(pb, ab)\). Since \(p \mid pb\) and \(p \mid ab\), we have \(p \mid \gcd(pb, ab)\). By the property \(\gcd(pb, ab) = b \cdot \gcd(p, a) = b \cdot 1 = b\), we conclude \(p \mid b\). QED

Lean Formalization

The Lean proof follows this GCD-based argument directly. It case-splits on whether \(\gcd(p, a) = 1\) or \(\gcd(p, a) > 1\). In the coprime case, it uses Nat.gcd_mul_right to establish that \(\gcd(pb, ab) = b\), then shows \(p\) divides this GCD. In the non-coprime case, since \(p\) is prime, \(\gcd(p, a) = p\), so \(p \mid a\).

theorem euclid_lemma {a b p : ℕ} (hp : Nat.Prime p) (h : p ∣ a * b) :
    p ∣ a ∨ p ∣ b := by
  rcases Nat.eq_or_lt_of_le (Nat.one_le_iff_ne_zero.mpr (Nat.gcd_pos_of_pos_left a hp.pos).ne')
    with hcop | hncop
  · right
    have key : p ∣ Nat.gcd (p * b) (a * b) := Nat.dvd_gcd (dvd_mul_right p b) h
    rwa [Nat.gcd_mul_right, hcop.symm, one_mul] at key
  · left
    have hdvd := Nat.gcd_dvd_left p a
    rcases hp.eq_one_or_self_of_dvd _ hdvd with h1 | hp_eq
    · omega
    · have : p ∣ a := by rw [← hp_eq]; exact Nat.gcd_dvd_right p a
      exact this

theorem prime_divides_product_iff {p a b : ℕ} (hp : Nat.Prime p) :
    p ∣ a * b ↔ p ∣ a ∨ p ∣ b :=
  ⟨euclid_lemma hp, fun h => h.elim (dvd_mul_of_dvd_left · b) (dvd_mul_of_dvd_right · a)⟩

The Binomial Theorem

Traditional Proof

Theorem (Binomial Theorem). For real numbers \(x, y\) and natural number \(n\): \[(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}\]

The proof proceeds by induction. The base case \(n = 0\) gives \((x+y)^0 = 1 = \binom{0}{0}x^0y^0\). For the inductive step, we expand \((x+y)^{n+1} = (x+y)(x+y)^n\), distribute, and apply Pascal’s identity \(\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}\) to combine terms.

As concrete examples: \((x+1)^2 = x^2 + 2x + 1\) and \((x+1)^3 = x^3 + 3x^2 + 3x + 1\). QED

Lean Formalization

Mathlib provides add_pow, which establishes the binomial theorem via the same inductive argument. Our binomial_theorem reformulates this in the standard notation. The specific cases binomial_two and binomial_three are verified by the ring tactic, which normalizes polynomial expressions.

theorem binomial_theorem (x y : ℝ) (n : ℕ) :
    (x + y) ^ n = (Finset.range (n + 1)).sum fun k => ↑(n.choose k) * x ^ k * y ^ (n - k) := by
  rw [add_pow]
  apply Finset.sum_congr rfl
  intros k _
  ring

theorem binomial_two (x : ℝ) : (x + 1) ^ 2 = x ^ 2 + 2 * x + 1 := by ring

theorem binomial_three (x : ℝ) : (x + 1) ^ 3 = x ^ 3 + 3 * x ^ 2 + 3 * x + 1 := by ring

example : (2 : ℝ) ^ 5 = 32 := by norm_num

Fibonacci Numbers

Traditional Proof

Definition. The Fibonacci sequence: \(F_0 = 0\), \(F_1 = 1\), \(F_{n+2} = F_{n+1} + F_n\). The sequence that appears everywhere: rabbit populations, sunflower spirals, financial markets, bad interview questions.

Theorem. \(\sum_{k=0}^{n-1} F_k + 1 = F_{n+1}\)

Base case (\(n = 0\)): The empty sum equals 0, and \(0 + 1 = 1 = F_1\).

Inductive step: Assume \(\sum_{k=0}^{n-1} F_k + 1 = F_{n+1}\). Then: \[\sum_{k=0}^{n} F_k + 1 = \left(\sum_{k=0}^{n-1} F_k + 1\right) + F_n = F_{n+1} + F_n = F_{n+2}\] which equals \(F_{(n+1)+1}\). QED

Lean Formalization

The Lean proof follows the same structure. The definition fib uses pattern matching on 0, 1, and \(n+2\). The theorem fib_sum proceeds by induction: the base case simplifies directly, and the inductive step uses Finset.sum_range_succ to split off the last term, applies the inductive hypothesis, then uses the recurrence relation.

def fib : ℕ → ℕ
  | 0 => 0
  | 1 => 1
  | n + 2 => fib (n + 1) + fib n

theorem fib_add_two (n : ℕ) : fib (n + 2) = fib (n + 1) + fib n := rfl

theorem fib_pos {n : ℕ} (h : 0 < n) : 0 < fib n := by
  cases n with
  | zero => contradiction
  | succ n =>
    cases n with
    | zero => decide
    | succ m =>
      rw [fib_add_two]
      exact Nat.add_pos_left (fib_pos (Nat.zero_lt_succ _)) _

theorem fib_sum (n : ℕ) : (Finset.range n).sum fib + 1 = fib (n + 1) := by
  induction n with
  | zero => simp [fib]
  | succ n ih =>
    rw [Finset.sum_range_succ, add_assoc, add_comm (fib n) 1, ← add_assoc, ih]
    rfl

The Pigeonhole Principle

Traditional Proof

Theorem (Pigeonhole Principle). Let \(f : A \to B\) be a function between finite sets with \(|A| > |B|\). Then \(f\) is not injective: there exist distinct \(a_1, a_2 \in A\) with \(f(a_1) = f(a_2)\).

Suppose for contradiction that \(f\) is injective, meaning \(f(a_1) = f(a_2)\) implies \(a_1 = a_2\). An injective function from \(A\) to \(B\) implies \(|A| \leq |B|\), since distinct elements of \(A\) map to distinct elements of \(B\). But we assumed \(|A| > |B|\), a contradiction. Therefore \(f\) is not injective, so there exist distinct \(a_1 \neq a_2\) with \(f(a_1) = f(a_2)\). QED

Corollary. In any group of \(n > 365\) people, at least two share a birthday.

Lean Formalization

The Lean proof mirrors this argument precisely. It assumes by contradiction (by_contra hinj) that no two distinct elements collide. The push_neg tactic transforms this into: for all \(a_1, a_2\), if \(a_1 \neq a_2\) then \(f(a_1) \neq f(a_2)\). This is exactly injectivity. We then apply Fintype.card_le_of_injective, which states that an injective function implies \(|A| \leq |B|\), contradicting our hypothesis \(|B| < |A|\).

theorem pigeonhole {α β : Type*} [Fintype α] [Fintype β]
    (f : α → β) (h : Fintype.card β < Fintype.card α) :
    ∃ a₁ a₂ : α, a₁ ≠ a₂ ∧ f a₁ = f a₂ := by
  by_contra hinj
  push_neg at hinj
  have inj : Function.Injective f := fun a₁ a₂ heq =>
    Classical.byContradiction fun hne => hinj a₁ a₂ hne heq
  exact Nat.not_lt.mpr (Fintype.card_le_of_injective f inj) h

theorem birthday_pigeonhole {n : ℕ} (hn : 365 < n) (birthday : Fin n → Fin 365) :
    ∃ i j : Fin n, i ≠ j ∧ birthday i = birthday j := by
  have hcard : Fintype.card (Fin 365) < Fintype.card (Fin n) := by simp [hn]
  exact pigeonhole birthday hcard

Divisibility

Traditional Proof

Definition. We write \(a \mid b\) if there exists \(k\) such that \(b = ak\).

Theorem. Divisibility satisfies:

  1. \(a \mid a\) (reflexivity)
  2. \(a \mid b \land b \mid c \Rightarrow a \mid c\) (transitivity)
  3. \(a \mid b \land a \mid c \Rightarrow a \mid (b + c)\)
  4. \(a \mid b \Rightarrow a \mid bc\)

Proof. (1) \(a = a \cdot 1\), so take \(k = 1\).

(2) If \(b = ak\) and \(c = bm\), then \(c = (ak)m = a(km)\).

(3) If \(b = ak\) and \(c = am\), then \(b + c = ak + am = a(k + m)\).

(4) If \(b = ak\), then \(bc = (ak)c = a(kc)\). QED

Lean Formalization

Each Lean proof constructs the witness \(k\) explicitly. The obtain tactic extracts the witnesses from divisibility hypotheses, then we provide the new witness as an anonymous constructor ⟨_, _⟩. The equality proofs use rw to substitute and mul_assoc or mul_add to rearrange.

example : 3 ∣ 12 := ⟨4, rfl⟩

example : ¬5 ∣ 12 := by decide

theorem dvd_refl' (n : ℕ) : n ∣ n := ⟨1, (mul_one n).symm⟩

theorem dvd_trans' {a b c : ℕ} (hab : a ∣ b) (hbc : b ∣ c) : a ∣ c := by
  obtain ⟨k, hk⟩ := hab
  obtain ⟨m, hm⟩ := hbc
  exact ⟨k * m, by rw [hm, hk, mul_assoc]⟩

theorem dvd_add' {a b c : ℕ} (hab : a ∣ b) (hac : a ∣ c) : a ∣ b + c := by
  obtain ⟨k, hk⟩ := hab
  obtain ⟨m, hm⟩ := hac
  exact ⟨k + m, by rw [hk, hm, mul_add]⟩

theorem dvd_mul_right' (a b : ℕ) : a ∣ a * b := ⟨b, rfl⟩

theorem dvd_mul_left' (a b : ℕ) : b ∣ a * b := ⟨a, (mul_comm b a).symm⟩

Software Verification

The promise of theorem provers extends beyond mathematics. We can verify that software does what we claim it does. This chapter explores three approaches to software verification: an intrinsically-typed interpreter where type safety is baked into the structure, Conway’s Game of Life where we prove that gliders glide and blinkers blink, and finally verification-guided development where we transcribe Rust code into Lean and prove properties that transfer back to the production implementation.

Intrinsically-Typed Interpreters

The standard approach to building interpreters involves two phases. First, parse text into an untyped abstract syntax tree. Second, run a type checker that rejects malformed programs. This works, but the interpreter must still handle the case where a program passes the type checker but evaluates to nonsense. The runtime carries the burden of the type system’s failure modes. It is like a bouncer who checks IDs at the door but still has to deal with troublemakers inside.

Intrinsically-typed interpreters refuse to play this game. The abstract syntax tree itself encodes typing judgments. An ill-typed program cannot be constructed. The type system statically excludes runtime type errors, not by checking them at runtime, but by making them unrepresentable. The bouncer is replaced by architecture: there is no door for troublemakers to enter.

Consider a small expression language with natural numbers, booleans, arithmetic, and conditionals. We start by defining the types our language supports and a denotation function that maps them to Lean types.

inductive Ty where
  | nat  : Ty
  | bool : Ty
  deriving Repr, DecidableEq

@[reducible] def Ty.denote : Ty → Type
  | .nat  => Nat
  | .bool => Bool

The denote function is key. It interprets our object-level types (Ty) as meta-level types (Type). When our expression language says something has type nat, we mean it evaluates to a Lean Nat. When it says bool, we mean a Lean Bool. This type-level interpretation function is what makes the entire approach work.

Expressions

The expression type indexes over the result type. Each constructor precisely constrains which expressions can be built and what types they produce.

inductive Expr : Ty → Type where
  | nat  : Nat → Expr .nat
  | bool : Bool → Expr .bool
  | add  : Expr .nat → Expr .nat → Expr .nat
  | mul  : Expr .nat → Expr .nat → Expr .nat
  | lt   : Expr .nat → Expr .nat → Expr .bool
  | eq   : Expr .nat → Expr .nat → Expr .bool
  | and  : Expr .bool → Expr .bool → Expr .bool
  | or   : Expr .bool → Expr .bool → Expr .bool
  | not  : Expr .bool → Expr .bool
  | ite  : {t : Ty} → Expr .bool → Expr t → Expr t → Expr t

Every constructor documents its typing rule. The add constructor requires both arguments to be natural number expressions and produces a natural number expression. The ite constructor requires a boolean condition and two branches of matching type.

This encoding makes ill-typed expressions unrepresentable. You cannot write add (nat 1) (bool true) because the types do not align. The Lean type checker rejects such expressions before they exist.

/-
def bad : Expr .nat := .add (.nat 1) (.bool true)
-- Error: type mismatch
-/

Evaluation

The evaluator maps expressions to their denotations. Because expressions are intrinsically typed, the evaluator is total. It never fails, never throws exceptions, never encounters impossible cases. Every pattern match is exhaustive.

def Expr.eval : {t : Ty} → Expr t → t.denote
  | _, .nat n      => n
  | _, .bool b     => b
  | _, .add e₁ e₂  => e₁.eval + e₂.eval
  | _, .mul e₁ e₂  => e₁.eval * e₂.eval
  | _, .lt e₁ e₂   => e₁.eval < e₂.eval
  | _, .eq e₁ e₂   => e₁.eval == e₂.eval
  | _, .and e₁ e₂  => e₁.eval && e₂.eval
  | _, .or e₁ e₂   => e₁.eval || e₂.eval
  | _, .not e      => !e.eval
  | _, .ite c t e  => if c.eval then t.eval else e.eval

The return type t.denote varies with the expression’s type index. A natural number expression evaluates to Nat. A boolean expression evaluates to Bool. This dependent return type is what makes the evaluator type-safe by construction.

def ex1 : Expr .nat := .add (.nat 2) (.nat 3)
def ex2 : Expr .bool := .lt (.nat 2) (.nat 3)
def ex3 : Expr .nat := .ite (.lt (.nat 2) (.nat 3)) (.nat 10) (.nat 20)
def ex4 : Expr .nat := .mul (.add (.nat 2) (.nat 3)) (.nat 4)

#eval ex1.eval  -- 5
#eval ex2.eval  -- true
#eval ex3.eval  -- 10
#eval ex4.eval  -- 20

Verified Optimization

Interpreters become interesting when we transform programs. Compilers do this constantly: dead code elimination, loop unrolling, strength reduction. Each transformation promises to preserve meaning while improving performance. But how do we know the promise is kept? A constant folder simplifies expressions by evaluating constant subexpressions at compile time. Adding two literal numbers produces a literal. Conditionals with constant conditions eliminate the untaken branch.

def Expr.constFold : {t : Ty} → Expr t → Expr t
  | _, .nat n => .nat n
  | _, .bool b => .bool b
  | _, .add e₁ e₂ =>
    match e₁.constFold, e₂.constFold with
    | .nat n, .nat m => .nat (n + m)
    | e₁', e₂' => .add e₁' e₂'
  | _, .mul e₁ e₂ =>
    match e₁.constFold, e₂.constFold with
    | .nat n, .nat m => .nat (n * m)
    | e₁', e₂' => .mul e₁' e₂'
  | _, .lt e₁ e₂ => .lt e₁.constFold e₂.constFold
  | _, .eq e₁ e₂ => .eq e₁.constFold e₂.constFold
  | _, .and e₁ e₂ => .and e₁.constFold e₂.constFold
  | _, .or e₁ e₂ => .or e₁.constFold e₂.constFold
  | _, .not e => .not e.constFold
  | _, .ite c t e =>
    match c.constFold with
    | .bool true => t.constFold
    | .bool false => e.constFold
    | c' => .ite c' t.constFold e.constFold

The optimization preserves types. If e : Expr t, then e.constFold : Expr t. The type indices flow through unchanged. The type system enforces this statically.

But type preservation is a weak property. We want semantic preservation: the optimized program computes the same result as the original. This requires a proof.

theorem constFold_correct : ∀ {t : Ty} (e : Expr t), e.constFold.eval = e.eval := by
  intro t e
  induction e with
  | nat n => rfl
  | bool b => rfl
  | add e₁ e₂ ih₁ ih₂ =>
    simp only [Expr.constFold, Expr.eval]
    cases he₁ : e₁.constFold <;> cases he₂ : e₂.constFold <;>
      simp only [Expr.eval, ← ih₁, ← ih₂, he₁, he₂]
  | mul e₁ e₂ ih₁ ih₂ =>
    simp only [Expr.constFold, Expr.eval]
    cases he₁ : e₁.constFold <;> cases he₂ : e₂.constFold <;>
      simp only [Expr.eval, ← ih₁, ← ih₂, he₁, he₂]
  | lt e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
  | eq e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
  | and e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
  | or e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
  | not e ih => simp only [Expr.constFold, Expr.eval, ih]
  | ite c t e ihc iht ihe =>
    simp only [Expr.constFold, Expr.eval]
    cases hc : c.constFold <;> simp only [Expr.eval, ← ihc, ← iht, ← ihe, hc]
    case bool b => cases b <;> simp [iht, ihe]

The theorem states that for any expression, evaluating the constant-folded expression yields the same result as evaluating the original. The proof proceeds by structural induction on the expression. Most cases follow directly from the induction hypotheses.

Conway’s Game of Life

Before we tackle the challenge of connecting proofs to production code, let us take a detour through cellular automata. Conway’s Game of Life is a zero-player game that evolves on an infinite grid. Each cell is either alive or dead. At each step, cells follow simple rules: a live cell with two or three neighbors survives, a dead cell with exactly three neighbors becomes alive, and everything else dies. From these rules emerges startling complexity: oscillators, spaceships, and patterns that compute arbitrary functions.

The Game of Life is an excellent verification target because we can prove properties about specific patterns without worrying about the infinite grid. The challenge is that the true Game of Life lives on an unbounded plane, which we cannot represent directly. We need a finite approximation that preserves the local dynamics.

The standard solution is a toroidal grid. Imagine taking a rectangular grid and gluing the top edge to the bottom edge, forming a cylinder. Then glue the left edge to the right edge, forming a torus. Geometrically, this is the surface of a donut. A cell at the right edge has its eastern neighbor on the left edge. A cell at the top has its northern neighbor at the bottom. Every cell has exactly eight neighbors, with no special boundary cases.

This topology matters for verification. On a bounded grid with walls, edge cells would have fewer neighbors, changing their evolution rules. We would need separate logic for corners, edges, and interior cells. The toroidal topology eliminates this complexity: the neighbor-counting function is uniform across all cells. More importantly, patterns that fit within the grid and do not interact with their wrapped-around selves behave exactly as they would on the infinite plane. A 5x5 blinker on a 10x10 torus evolves identically to a blinker on the infinite grid, because the pattern never grows large enough to meet itself coming around the other side.

abbrev Grid := Array (Array Bool)

def Grid.mk (n m : Nat) (f : Fin n → Fin m → Bool) : Grid :=
  Array.ofFn fun i => Array.ofFn fun j => f i j

def Grid.get (g : Grid) (i j : Nat) : Bool :=
  if h₁ : i < g.size then
    let row := g[i]
    if h₂ : j < row.size then row[j] else false
  else false

def Grid.dead (n m : Nat) : Grid :=
  Array.replicate n (Array.replicate m false)

def Grid.rows (g : Grid) : Nat := g.size
def Grid.cols (g : Grid) : Nat := if h : 0 < g.size then g[0].size else 0

The grid representation uses arrays of arrays, with accessor functions that handle boundary conditions. The countNeighbors function implements toroidal wrapping by computing indices modulo the grid dimensions.

def Grid.countNeighbors (g : Grid) (i j : Nat) : Nat :=
  let n := g.rows
  let m := g.cols
  let deltas : List (Int × Int) :=
    [(-1, -1), (-1, 0), (-1, 1),
     (0, -1),           (0, 1),
     (1, -1),  (1, 0),  (1, 1)]
  deltas.foldl (fun acc (di, dj) =>
    let ni := (((i : Int) + di + n) % n).toNat
    let nj := (((j : Int) + dj + m) % m).toNat
    if g.get ni nj then acc + 1 else acc) 0

The step function applies Conway’s rules to every cell. The pattern matching encodes the survival conditions directly: a live cell survives with 2 or 3 neighbors, a dead cell is born with exactly 3 neighbors.

def Grid.step (g : Grid) : Grid :=
  let n := g.rows
  let m := g.cols
  Array.ofFn fun (i : Fin n) => Array.ofFn fun (j : Fin m) =>
    let neighbors := g.countNeighbors i.val j.val
    let alive := g.get i.val j.val
    match alive, neighbors with
    | true, 2 => true
    | true, 3 => true
    | false, 3 => true
    | _, _ => false

def Grid.stepN (g : Grid) : Nat → Grid
  | 0 => g
  | k + 1 => (g.step).stepN k

Now for the fun part. We can define famous patterns and prove properties about them. The blinker is a period-2 oscillator: three cells in a row that flip between horizontal and vertical orientations. The block is a 2x2 square that never changes. And the glider, the star of our show, is a pattern that translates diagonally across the grid.

def blinker : Grid := Grid.mk 5 5 fun i j =>
  (i.val, j.val) ∈ [(1, 2), (2, 2), (3, 2)]

def blinkerPhase2 : Grid := Grid.mk 5 5 fun i j =>
  (i.val, j.val) ∈ [(2, 1), (2, 2), (2, 3)]

def glider : Grid := Grid.mk 6 6 fun i j =>
  (i.val, j.val) ∈ [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

def gliderTranslated : Grid := Grid.mk 6 6 fun i j =>
  (i.val, j.val) ∈ [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]

def block : Grid := Grid.mk 4 4 fun i j =>
  (i.val, j.val) ∈ [(1, 1), (1, 2), (2, 1), (2, 2)]

Here is where theorem proving earns its keep. We can prove that the blinker oscillates with period 2, that the block is stable, and that the glider translates after exactly four generations.

theorem blinker_oscillates : blinker.step = blinkerPhase2 := by native_decide

theorem blinker_period_2 : blinker.stepN 2 = blinker := by native_decide

theorem glider_translates : glider.stepN 4 = gliderTranslated := by native_decide

theorem block_is_stable : block.step = block := by native_decide

The native_decide tactic does exhaustive computation. Lean evaluates the grid evolution and confirms the equality. The proof covers every cell in the grid across the specified number of generations.

Think about what we have accomplished. We have formally verified that a glider translates diagonally after four steps. Every cellular automaton enthusiast knows this empirically, having watched countless gliders march across their screens. But we have proven it. The glider must translate. It is not a bug that the pattern moves; it is a theorem. (Readers of Greg Egan’s Permutation City may appreciate that we are now proving theorems about the computational substrate in which his characters would live.)

We can also verify that the blinker conserves population, and observe that the glider does too:

def Grid.population (g : Grid) : Nat :=
  g.foldl (fun acc row => row.foldl (fun acc cell => if cell then acc + 1 else acc) acc) 0

#eval blinker.population
#eval blinker.step.population
#eval glider.population
#eval glider.step.population
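
As a hedged addition (this theorem is ours, not part of the repository), the conservation observation for the blinker can be stated and checked the same way as the other pattern theorems:

theorem blinker_conserves_population :
    blinker.step.population = blinker.population := by native_decide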

For visualization, we can print the grids:

def Grid.toString (g : Grid) : String :=
  String.intercalate "\n" <|
    g.toList.map fun row =>
      String.mk <| row.toList.map fun cell =>
        if cell then '#' else '.'

#eval IO.println blinker.toString
#eval IO.println blinker.step.toString
#eval IO.println glider.toString
#eval IO.println (glider.stepN 4).toString
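
Assuming i indexes rows and j indexes columns, the first two #eval lines should print the two blinker phases, vertical then horizontal:

.....
..#..
..#..
..#..
.....

.....
.....
.###.
.....
.....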

The Verification Gap

Here is the sobering reality check. We have a beautiful proof that gliders translate. The Lean model captures Conway’s rules precisely. The theorems are watertight. And yet, if someone writes a Game of Life implementation in Rust, our proofs say nothing about it.

The Rust version in examples/game-of-life/ implements the same rules: the same step function, the same neighbor counting, the same pattern definitions. Run it and you will see blinkers blink and gliders glide. But the Lean proofs do not transfer automatically. The Rust code might have off-by-one errors in the wrap-around logic. It might use different integer semantics. It might have subtle bugs in edge cases that our finite grid proofs never exercise.

This is the central problem of software verification. Writing proofs about mathematical models is satisfying but insufficient. Real software runs on real hardware with real bugs. The gap matters most where the stakes are highest: matching engines that execute trades, auction mechanisms that allocate resources, systems where a subtle bug can cascade into market-wide failures. How do we bridge the gap between a verified model and a production implementation?

Verification-Guided Development

The interpreter example shows verification within Lean. But real systems are written in languages like Rust, C, or Go. How do we connect a verified specification to a production implementation?

The answer comes from verification-guided development. The approach has three components. First, write the production implementation in your target language. Second, transcribe the core logic into Lean as a pure functional program. Third, prove properties about the Lean model; the proofs transfer to the production code because the transcription is exact. This technique was used by AWS to verify their Cedar policy language, and it applies wherever a functional core can be isolated from imperative scaffolding.

The transcription must be faithful. Every control flow decision in the Rust code must have a corresponding decision in the Lean model. Loops become recursion. Mutable state becomes accumulator parameters. Early returns become validity flags. When the transcription is exact, we can claim that the Lean proofs apply to the Rust implementation.
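
A small illustration of that dictionary (hypothetical code, not taken from the repository): a Rust loop that accumulates into a mutable variable and returns early on bad input becomes a Lean fold whose accumulator carries both the running value and a validity flag.

def sumNonNegative (xs : List Int) : Int × Bool :=
  xs.foldl (fun (total, ok) x =>
    if !ok then (total, false)           -- once tripped, stay tripped (the early return)
    else if x < 0 then (total, false)    -- the condition that would have returned early
    else (total + x, true)) (0, true)    -- the mutable accumulator becomes the fold state

#eval sumNonNegative [1, 2, 3]    -- (6, true)
#eval sumNonNegative [1, -2, 3]   -- (1, false)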

To verify this correspondence, both systems produce execution traces. A trace records the state after each operation. If the Rust implementation and the Lean model produce identical traces on all inputs, the proof transfers. For finite input spaces, we can verify this exhaustively. For infinite spaces, we use differential testing to gain confidence.

The Stack Machine

We demonstrate this technique with a stack machine, the fruit fly of computer science. Like the fruit fly in genetics, stack machines are simple enough to study exhaustively yet complex enough to exhibit interesting behavior. The machine has six operations: push a value, pop the top, add the top two values, multiply the top two values, duplicate the top, and swap the top two. Despite its simplicity, the stack machine has a rich state space and non-trivial invariants around underflow detection.

The Rust implementation lives in examples/stack-machine/. Here is the functional core:

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct State {
    pub stack: Vec<i64>,
    pub valid: bool,
}

pub fn step(state: &State, op: Op) -> State {
    if !state.valid {
        return State { stack: state.stack.clone(), valid: false };
    }

    match op {
        Op::Push(n) => {
            let mut stack = state.stack.clone();
            stack.push(n);
            State { stack, valid: true }
        }
        Op::Pop => {
            if state.stack.is_empty() {
                State { stack: state.stack.clone(), valid: false }
            } else {
                let mut stack = state.stack.clone();
                stack.pop();
                State { stack, valid: true }
            }
        }
        Op::Add => {
            if state.stack.len() < 2 {
                State { stack: state.stack.clone(), valid: false }
            } else {
                let mut stack = state.stack.clone();
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
                State { stack, valid: true }
            }
        }
        // Mul, Dup, Swap follow the same pattern
    }
}

The step function is pure despite using mutation internally. It takes a state and an operation, returns a new state. The valid flag tracks whether an underflow has occurred. Once invalid, the machine stays invalid regardless of subsequent operations. The machine remembers its sins.

Lean Transcription

The Lean model mirrors the Rust implementation exactly. It is the implementation, translated. We define the same operations and state structure:

inductive Op where
  | push : Int → Op
  | pop : Op
  | add : Op
  | mul : Op
  | dup : Op
  | swap : Op
  deriving Repr, DecidableEq

structure State where
  stack : List Int
  valid : Bool
  deriving Repr, DecidableEq

The step function transcribes the Rust logic line by line:

def step (s : State) (op : Op) : State :=
  if !s.valid then { stack := s.stack, valid := false }
  else match op with
    | .push n => { stack := n :: s.stack, valid := true }
    | .pop => match s.stack with
        | [] => { stack := [], valid := false }
        | _ :: rest => { stack := rest, valid := true }
    | .add => match s.stack with
        | b :: a :: rest => { stack := (a + b) :: rest, valid := true }
        | _ => { stack := s.stack, valid := false }
    | .mul => match s.stack with
        | b :: a :: rest => { stack := (a * b) :: rest, valid := true }
        | _ => { stack := s.stack, valid := false }
    | .dup => match s.stack with
        | [] => { stack := [], valid := false }
        | x :: rest => { stack := x :: x :: rest, valid := true }
    | .swap => match s.stack with
        | b :: a :: rest => { stack := a :: b :: rest, valid := true }
        | _ => { stack := s.stack, valid := false }

Every branch in the Rust match corresponds to a branch in the Lean match. The underflow checks are identical. The stack operations map directly. It is the Rust code written in Lean syntax.

We also define execution functions:

def initial : State := { stack := [], valid := true }

def run (program : List Op) : State :=
  program.foldl step initial

def runFrom (s : State) (program : List Op) : State :=
  program.foldl step s
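
The runFrom variant makes a small compositional fact easy to state: running a whole program is the same as running a prefix and then continuing from the resulting state. A sketch (run_append is a hypothetical name; the proof leans on the standard List.foldl_append lemma):

theorem run_append (p q : List Op) :
    run (p ++ q) = runFrom (run p) q := by
  simp [run, runFrom, List.foldl_append]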

Execution Traces

Both systems produce execution traces. The Lean trace function records the initial state followed by the state after each operation:

def trace (program : List Op) : List State :=
  let rec go (s : State) : List Op → List State
    | [] => [s]
    | op :: rest => s :: go (step s op) rest
  go initial program

Running traces shows the parallel execution:

#eval trace [.push 3, .push 4, .add]
#eval trace [.push 5, .dup, .add]
#eval trace [.pop]
#eval trace [.push 1, .add]

The trace for [push 3, push 4, add] produces:

[{ stack := [], valid := true },
 { stack := [3], valid := true },
 { stack := [4, 3], valid := true },
 { stack := [7], valid := true }]

The trace for [pop] (underflow) produces:

[{ stack := [], valid := true },
 { stack := [], valid := false }]

The Rust implementation produces identical traces. When we run the same programs through both systems and compare the output, every state matches exactly. This trace equivalence is what justifies transferring proofs from Lean to Rust.
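
One lightweight way to do this comparison is to render each trace in a canonical, line-oriented form and diff the outputs of the two implementations. A sketch on the Lean side (State.render and renderTrace are hypothetical helpers, not part of the repository):

def State.render (s : State) : String :=
  s!"{s.stack}|{s.valid}"

def renderTrace (program : List Op) : String :=
  String.intercalate "\n" ((trace program).map State.render)

-- One state per line, in a format the Rust harness can reproduce byte for byte.
#eval IO.println (renderTrace [.push 3, .push 4, .add])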

Bisimulation Proofs

Now we prove properties about specific programs. This is where the payoff arrives. These theorems verify that the Lean model produces the expected results:

theorem exec_push_add :
    run [.push 3, .push 4, .add] = { stack := [7], valid := true } := by native_decide

theorem exec_push_mul :
    run [.push 3, .push 4, .mul] = { stack := [12], valid := true } := by native_decide

theorem exec_dup_add :
    run [.push 5, .dup, .add] = { stack := [10], valid := true } := by native_decide

theorem exec_swap :
    run [.push 10, .push 3, .swap] = { stack := [10, 3], valid := true } := by native_decide

theorem exec_complex :
    run [.push 3, .push 4, .add, .push 2, .mul] = { stack := [14], valid := true } := by
  native_decide

theorem exec_underflow_pop :
    run [.pop] = { stack := [], valid := false } := by native_decide

theorem exec_underflow_add :
    run [.push 1, .add] = { stack := [1], valid := false } := by native_decide

theorem exec_underflow_propagates :
    run [.pop, .push 1, .push 2, .add] = { stack := [], valid := false } := by native_decide

Each theorem states the exact final state after running a program. The native_decide tactic evaluates the computation and confirms the equality. These are not tests that sample the behavior; they are proofs that the computation produces this exact result.

We also verify the traces themselves:

theorem trace_push_add :
    trace [.push 3, .push 4, .add] =
      [ { stack := [], valid := true },
        { stack := [3], valid := true },
        { stack := [4, 3], valid := true },
        { stack := [7], valid := true } ] := by native_decide

theorem trace_underflow :
    trace [.pop, .push 1] =
      [ { stack := [], valid := true },
        { stack := [], valid := false },
        { stack := [], valid := false } ] := by native_decide

The trace proofs verify the entire execution history, not just the final state. Every intermediate state is exactly what we claim.

Invariant Proofs

Beyond concrete program runs, we prove properties of the step function itself. The underflow detection works correctly:

theorem pop_empty_fails : (step { stack := [], valid := true } .pop).valid = false := by
  native_decide

theorem add_one_fails : (step { stack := [1], valid := true } .add).valid = false := by
  native_decide

theorem invalid_propagates (s : State) (op : Op) (h : !s.valid) :
    (step s op).valid = false := by
  simp [step, h]

The invalid_propagates theorem proves that once the machine enters an invalid state, it stays invalid. This is the critical safety property: underflow is detected and the machine does not produce garbage results.
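
The single-step theorem generalizes: once invalid, the machine stays invalid for an entire program, not just one step. That universal statement is no longer a finite computation, so it is proved by structural induction on the program. A sketch (runFrom_invalid is a hypothetical name; the state is quantified inside so the induction hypothesis applies at every step):

theorem runFrom_invalid (program : List Op) :
    ∀ s : State, s.valid = false → (runFrom s program).valid = false := by
  induction program with
  | nil =>
    intro s h
    simpa [runFrom] using h
  | cons op rest ih =>
    intro s h
    -- One step preserves invalidity; the inductive hypothesis handles the rest.
    have hstep : (step s op).valid = false := by simp [step, h]
    simpa [runFrom] using ih (step s op) hstep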

We also verify properties of the stack depth. The depth function is not shown in this excerpt; presumably it is just the length of the stack.
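
A minimal sketch of that definition, assuming the theorems below use it as the stack length:

def depth (s : State) : Nat := s.stack.length

The depth theorems then read: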

theorem push_increases_depth (s : State) (n : Int) (h : s.valid) :
    depth (step s (.push n)) = depth s + 1 := by
  simp [step, h, depth]

theorem add_decreases_depth_concrete :
    depth (step { stack := [4, 3], valid := true } .add) =
    depth { stack := [4, 3], valid := true } - 1 := by native_decide

theorem swap_preserves_depth_concrete :
    depth (step { stack := [4, 3], valid := true } .swap) =
    depth { stack := [4, 3], valid := true } := by native_decide

Push increases the stack depth by one. Add decreases it by one (consuming two elements, producing one). Swap preserves depth. These depth invariants ensure the machine behaves predictably.
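
The concrete versions are settled by computation; the fully general versions follow by case analysis on the stack shape. A sketch of the swap case, under the same assumption that depth is the stack length (swap_preserves_depth is a hypothetical name):

theorem swap_preserves_depth (a b : Int) (rest : List Int) :
    depth (step { stack := b :: a :: rest, valid := true } .swap) =
    depth { stack := b :: a :: rest, valid := true } := by
  simp [step, depth]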

Verified Programs

Finally, we verify complete programs:

def program1 : List Op := [.push 3, .push 4, .add]
def program2 : List Op := [.push 5, .dup, .add]
def program3 : List Op := [.push 3, .push 4, .add, .push 2, .mul]
def program4 : List Op := [.push 10, .push 3, .swap, .pop]

theorem program1_correct : run program1 = { stack := [7], valid := true } := by native_decide
theorem program2_correct : run program2 = { stack := [10], valid := true } := by native_decide
theorem program3_correct : run program3 = { stack := [14], valid := true } := by native_decide
theorem program4_correct : run program4 = { stack := [3], valid := true } := by native_decide

theorem program1_trace_length : (trace program1).length = 4 := by native_decide
theorem program2_trace_length : (trace program2).length = 4 := by native_decide

Each program computes the correct result. The Rust implementation passes the same tests. Because the Lean model is an exact transcription of the Rust code, and we have proven properties about the Lean model, those properties apply to the Rust implementation.

What We Proved

The stack machine demonstrates verification-guided development end to end. We have:

  1. A production Rust implementation with mutation, pattern matching, and control flow
  2. An exact Lean transcription that mirrors every decision
  3. Execution traces showing both systems produce identical state sequences
  4. Proofs that underflow is detected and propagates correctly
  5. Proofs that specific programs compute expected results

The transcription discipline is essential. We did not write an independent specification and hope it matches. We wrote the Rust logic in Lean syntax, preserving the structure exactly. Loops became recursion with the same termination condition. Mutable state became parameters threaded through recursive calls. The valid flag in both implementations serves the same purpose: tracking whether an underflow has occurred.

This technique scales to real systems. For any codebase with a functional core that can be isolated and transcribed, verification-guided development provides a path to proven correctness.

Closing Thoughts

Finding good verification examples is hard. The system must be small enough to specify cleanly yet complex enough to have non-trivial properties. Too simple and the exercise is pointless; too complex and the proofs become intractable. The stack machine threads this needle. Six operations, a validity flag, and stack depth create enough complexity for interesting proofs without overwhelming the verification machinery.

The native_decide tactic makes finite verification automatic. For any decidable property on a finite domain, Lean evaluates both sides and confirms equality. This is proof by exhaustive computation, not sampling. The limitation is that it only works for concrete inputs. Universal statements over infinite domains require structural induction.

The key insight is that we do not verify the Rust code directly. Rust’s ownership system, borrow checker, and imperative features make direct verification impractical. Instead, we carve out the functional core, transcribe it to Lean, prove properties there, and transfer the proofs back through trace equivalence. The verification gap closes not through tool support, but through disciplined transcription and differential testing.

The techniques scale far beyond toy examples. CompCert verifies a C compiler. seL4 verifies an operating system kernel. Financial systems are a particularly compelling domain: matching engines, auction mechanisms, and clearing systems where bugs can trigger flash crashes or expose participants to unbounded losses. Expressive bidding, where market orders encode complex preferences over combinations of instruments, requires matching algorithms with precise economic properties. These mechanisms must satisfy strategy-proofness and efficiency; the theorems exist in papers, and the implementations exist in production. Verification-guided development bridges them. Somewhere beneath global markets, proven-correct code is already executing trades.

Artificial Intelligence

In 2024, a computer solved one of the hardest problems at the International Mathematical Olympiad with a formally verified proof. In 2025, another hit gold-medal standard. The proofs were checked down to the axioms. No trust required. The interesting part is not that machines beat humans at competition math. The interesting part is that reason is becoming executable at scale, and that changes everything.

Mathlib now contains 1.9 million lines of formalized mathematics spanning algebra, analysis, topology, and number theory. It grows by thousands of theorems monthly. No single person understands all of it, and no single person needs to. The theorem you formalize today may be imported by a researcher fifty years from now working on problems we cannot imagine. The proof will still check. Meanwhile, neural networks have learned to propose proof steps that formal systems verify. The model guesses, the kernel checks. DeepSeek-Prover and LeanDojo make this practical today. PhysLean is formalizing physics itself: Maxwell’s equations, quantum mechanics, field theory. The tooling has matured faster than most expected.

We should be honest about limits. Higher-order logic is undecidable. Church and Turing settled this in 1936. Formalization is expensive: the Polynomial Freiman-Ruzsa conjecture required 20,000 lines of Lean for a 50-page paper. Some domains resist entirely. Physics says “for large N” and expects you to understand. But within scope, something remarkable becomes possible: certainty. Not high confidence. Certainty. The proof typechecks or it does not. seL4 runs in critical systems with mathematical proof it cannot crash. CompCert compiles C with proof the assembly matches the source. These are not research toys. They are deployed. The parent whose child depends on a medical device wants that device verified. The family crossing a bridge wants the calculations checked. Reason serves love. It always has.

This work gets funded because it pushes the frontiers of human knowledge. Science foundations fund theorem provers because they see infrastructure for the future of mathematics. Trading firms fund them because they need systems that actually work. Both are right. Knight Capital lost $440 million in 45 minutes from a deployment bug. The code did exactly what it was written to do. It was simply written wrong. For firms whose existence depends on algorithm correctness, formal methods are existential.

This is where it gets interesting, at least if you find the intersection of mechanism design, game theory, and formal methods interesting, which you should, because it is genuinely one of the most exciting open problems in theoretical computer science and also immediately practical. Markets are mathematical objects. Combinatorial auctions, where bidders express preferences over bundles rather than individual items, turn resource allocation into constraint satisfaction problems. (Shameless plug: OneChronos builds these for financial markets, and if you are the kind of person who reads articles like this for fun, we should talk.) The question of how to clear such markets optimally is NP-hard in general. Proving properties about these systems, that they are incentive-compatible, that they converge, that they allocate efficiently under stated assumptions, is exactly the kind of problem where formal verification shines. Every improvement in market mechanism design, every formally verified property of an auction protocol, translates into real systems allocating real resources. Better reasoning about markets means systems that allocate capital more efficiently, and efficient allocation is the difference between prosperity and stagnation.

The functional programming shops of Wall Street, the quant firms in London and Hong Kong, the AI labs in China, all contribute to this ecosystem. DeepSeek’s open-source theorem provers emerged from it. The competition is global but the infrastructure is shared. A trading firm in New York open-sources a proof automation library; researchers in Beijing build on it. An AI lab in Hangzhou releases trained models; mathematicians in Paris use them. Private incentive aligns with public good. The tools developed for trading algorithms can verify medical devices. The techniques refined for financial models can prove properties of cryptographic protocols.

There is a future taking shape. AI agents will increasingly act in markets: trading, lending, allocating, optimizing. This is already happening. The question is not whether but how. An AI agent can be constrained by rules, but only if those rules are precise enough to check. Natural language policies are suggestions. Formally verified constraints are guarantees. Imagine market infrastructure where agents must prove, before executing, that their actions satisfy regulatory constraints, risk limits, fairness properties. Not “we reviewed the code” but “the system verified the proof.” The agent that cannot demonstrate compliance cannot act. Formal constraints are not a limitation on AI autonomy. They are the condition that makes AI autonomy safe.

We are building that infrastructure now, whether we recognize it or not. Every verified auction protocol, every theorem about market equilibria, becomes a potential constraint on future autonomous systems. AI agents in markets are not a hypothetical. They are here, and more are coming. The practical question is not whether to allow them but how to make them work well. Formal verification offers something concrete: constraints that actually constrain, rules that cannot be silently violated, guarantees that hold regardless of what the model learned. Skip the existential risk debates. The point is engineering systems that do what they are supposed to do.

If you are reading this as a student or someone early in your career: this stuff is fun. Watching a proof come together, seeing the goal state shrink to nothing, getting that green checkmark from the compiler when everything finally clicks. It is like solving puzzles, except the puzzles are deep and the solutions last. The theorems you formalize will still be valid when you are gone. That is a strange thing to be able to say about your work. The field is small enough that you can make real contributions and growing fast enough that there is plenty to do.

The work is hard. The learning curve is real. There will be days when the goal state mocks you and nothing seems to work. This is normal. The difficulty is not a bug; it is the cost of building things that matter. Scientists have always endured frustration because progress depends on it. The stoic response is not to complain but to continue: one lemma at a time, one proof at a time, one small piece of certainty added to the edifice. The edifice grows. It has been growing for centuries, and you can be part of it.

Jump in.


Resources

Formalized Mathematics

  • Mathlib: The formalized mathematics library
  • PhysLean: Formalizing physics

Neural Theorem Proving

Tools

References

Official Documentation

Theorem Proving Games

University Courses (Lean 4)

University Courses (Lean 3)