Introduction
Welcome to From Zero to QED, an informal introduction to formality in Lean 4. This article series teaches the language from first principles. Lean is expressive but the learning resources remain scattered and incomplete. This series is a best effort to fill that gap.
Note
This is the beta release. There are bound to be typos, errors, and rough edges. If you spot something, send a PR on GitHub.
Tip
This article is itself a giant checkable theorem. Every code sample, every proof, every definition is extracted from source files that the Lean compiler typechecks on every build. If the article compiles, the theorems are valid. The full source lives in the GitHub repository.
What This Series Covers
The series divides into two arcs. The first arc treats Lean as a programming language. You will learn the syntax, type system, control flow, polymorphism, monads, and IO. By the end of this arc you can write real programs in Lean.
The second arc treats Lean as a theorem prover. You will learn to write proofs, understand type theory and dependent types, master tactics, and eventually prove classic mathematical results. The series concludes with the emerging intersection of theorem proving with artificial intelligence, and why formal methods may matter more in the coming decade than they have in the previous five.
No prior experience with theorem provers is assumed. Familiarity with a typed functional language like Haskell, OCaml, or Scala helps but is not strictly required.
Getting Started
There are several ways to follow along with the examples, from zero-install browser options to full local setup.
Option 1: Browser
Lean Live runs Lean 4 in your browser with no installation. Copy code snippets from the text and paste them into the editor. For compatibility with examples in this series, set the toolchain to leanprover/lean4:v4.26.0 and Mathlib to v4.26.0 in the settings. Some later articles require Mathlib, which Lean Live supports but loads slowly on first use.
Option 2: One-Click Cloud Environment
Launch a complete Lean 4 environment in your browser with no local setup:
- Open in GitHub Codespaces - Free for 120 core-hours/month
- Open in Gitpod - Free tier available
Both options provide VS Code in the browser with Lean 4, the language extension, and all dependencies pre-installed. The environment runs lake exe cache get automatically on startup to download prebuilt Mathlib artifacts.
Option 3: Dev Container (Docker + VS Code)
If you have Docker and VS Code installed locally, clone the repo and open it in VS Code:
git clone https://github.com/sdiehl/zero-to-qed
code zero-to-qed
VS Code will detect the .devcontainer configuration and prompt you to “Reopen in Container”. This builds the same environment locally, giving you cloud-like convenience with local performance.
Option 4: Local Installation
For the full experience, install Lean 4 with VS Code and the Lean 4 extension. Other editors work too (Zed, Emacs, Neovim all have Lean support) but VS Code is the best documented and most widely used. Clone the repository:
git clone https://github.com/sdiehl/zero-to-qed
cd zero-to-qed
lake exe cache get # Download prebuilt Mathlib (saves hours)
lake build
The lake exe cache get command downloads prebuilt artifacts for Mathlib, reducing the initial build from hours to minutes. Without it, Lake compiles Mathlib from source, which tests your patience more than your code.
You can also serve the documentation locally by running just serve if you have mdBook installed.
Reading Paths
Different readers come to this material with different goals. Here are suggested paths through the series:
Complete beginners to typed functional programming: Read linearly. Arc I builds the programming foundation you need. Do not skip ahead to proofs until you are comfortable with pattern matching, recursion, and type classes. The concepts in Polymorphism are essential for understanding how Lean’s type system works.
Systems programmers wanting verification: Read Arc I thoroughly since you will use these features in production code. In Arc II, focus on Proofs, Proof Strategy, Verified Programs, and Model Checking. The Type Theory article provides the foundation but can be revisited as needed.
AI researchers interested in theorem proving: After covering the basics, jump to Proofs, then Tactics Reference, and finally Artificial Intelligence. The intermediate articles on type theory and algebraic structures can wait until you need them for specific formalization tasks.
Mathematicians new to programming: Start with Basics and Control Flow to learn Lean as a language, then proceed linearly through Arc II. You may skim Effects and IO on first reading since they focus on computational side effects rather than proof.
Article dependencies: Most articles build on previous ones, but some can be read independently. Classic Proofs requires only Proofs and Proof Strategy. Algebraic Structures requires Type Classes. Mathlib requires familiarity with tactics from earlier articles but not deep type theory.
Repository Structure
Code samples are extracted from Lean source files. Each article corresponds to modules in the src/ directory:
| Article | Source File |
|---|---|
| Basics | src/ZeroToQED/Basics.lean |
| Data Structures | src/ZeroToQED/DataStructures.lean |
| Control Flow | src/ZeroToQED/ControlFlow.lean |
| Polymorphism | src/ZeroToQED/Polymorphism.lean |
| Effects | src/ZeroToQED/Effects.lean |
| IO | src/ZeroToQED/IO.lean |
| Proofs | src/ZeroToQED/Proving.lean |
| Type Theory | src/ZeroToQED/TypeTheory.lean |
| Tactics | src/ZeroToQED/Tactics.lean |
Larger examples live in src/Examples/:
| Example | Source File | Run Command |
|---|---|---|
| Magic: The Gathering | MagicTheGathering.lean | lake exe mtg |
| D&D Character Generator | DndCharacter.lean | lake exe dnd 42 |
| ATM Withdrawal | ATM.lean | lake exe atm |
| Parser Combinators | ParserCombinators.lean | lake exe parsers |
| Game of Life | GameOfLife.lean | lake exe life |
| Stack Machine | StackMachine.lean | lake exe stack |
| Circuit Breaker | CircuitBreaker.lean | cargo test -p circuit-breaker |
Open these files in VS Code to explore with full IDE support. The Infoview panel shows types and proof states as you navigate.
Additional learning resources are collected in the References appendix. This series is an informal introduction to formality. If you want the stuffy formal introduction to formality, see Theorem Proving in Lean 4, Functional Programming in Lean, Mathematics in Lean, or university courses from CMU, Imperial, and Brown. They are more rigorous.
Why?
This article covers motivation and context. If you already know why you want to learn Lean, skip to Theorem Provers or directly to Basics.
Software keeps getting more complex. Every year brings more dependencies, more attack surface, more emergent behavior no one designed or intended. We build systems by stacking abstractions, trusting that each layer does what it claims. And this mostly works, except when it doesn’t. Sometimes an algorithm trades against itself and the stock market loses a trillion dollars in thirty-six minutes. Sometimes a missing bounds check in a cryptographic library lets attackers read arbitrary memory from half the servers on Earth. The complexity is the natural consequence of building systems larger than any individual can hold in their head, maintained by teams that turn over faster than the code does.
And now we are increasingly automating the production of code itself. Large language models now generate plausible programs at unprecedented scale. They have read every GitHub repository, every Stack Overflow answer, every tutorial and textbook. They produce code that looks right, compiles often, and works sometimes. They are plausibility generators: statistical engines that have learned what code typically looks like without understanding what code actually means. The gap between “looks correct” and “is correct” has always mattered. It is about to matter more.
One response is to build richer formal systems that can constrain the chaos. Languages where the compiler verifies rather than merely translates. Type systems that encode invariants the programmer used to track mentally. Proof assistants that check whether code does what it claims, mechanically and exhaustively. The map must match the territory, and these tools force you to make the correspondence explicit. They also give us something to hold the output of language models against: a verifier that can check whether generated code satisfies its specification, turning plausibility into proof.
Lean sits at this intersection: a theorem prover backed by a million-line mathematical library and a general-purpose programming language fast enough for production use. You can prove theorems about prime numbers in the morning and write a web server in the afternoon, using the same language, the same tools, the same mental model. Write a claim, and the compiler tells you whether it holds. No trust required. No hand-waving accepted.
If Dwarf Fortress is the most complex simulation ever built and Factorio is crystallized obsession in game form, then Lean is what happens when you point that same energy at the foundations of mathematics. Like Dwarf Fortress, losing is fun, because even a failed proof attempt teaches you something about why it failed. Like Factorio, there is always one more lemma to optimize, one more proof to refactor, one more elegant solution lurking just beyond the current mess.
So why learn Lean then? Maybe you want to get a taste of what the future of mathematics looks like. Maybe you want to learn what the engineers at top quant firms and AI labs are building, because that seems like a reasonable bet on where things are heading. Maybe you want to build high-assurance software to model and trade markets. Maybe you have no practical reason at all and just want to learn something new. All of these work. There is no entrance exam.
There are also practical reasons. Start small: prove that a parser handles every edge case, that a compiler preserves meaning, that a refactor did not silently break an invariant. These are theorems about code you actually write. Scale up from there. The world runs on systems that allocate resources: power grids, packet routing, markets matching buyers and sellers. A bug in market infrastructure misallocates capital, distorts prices, rewards the wrong participants. The matching algorithm that assigns medical residents to hospitals, the slot allocation system that schedules aircraft across congested airspace, high-assurance trading systems: these are theorems waiting to be verified. The gap between “we tested it extensively” and “we proved it correct” is the gap between confidence and certainty.
Most people do not need dependent type theory or formal verification for their day jobs. But necessity is a boring criterion for what to learn. Humans decipher Linear B, collide particle jets to conjure baryons from vacuum, compose fugues, map the cosmic microwave background to the first seconds after time began, and verify QED’s prediction of the electron magnetic moment to twelve decimal places. None of these have direct economic outcomes, yet we do them anyway. Much of this text is about similar pursuits: still early, straddling the line between practical and theoretical. That borderland is where the future is always born.
But what makes this cluster of ideas so rewarding goes beyond any single trend: the feedback is absolute. When a proof type-checks, it is correct. The compiler does not care about your reputation or your confidence. It cares whether the logic holds. You discover what you actually understand versus what you thought you understood. A proof assistant is a bicycle for the mind: it amplifies your ability to think correctly.
Lean will not let you wave your hands. It will not accept “it is obvious that”, “because of abstract nonsense” or “this margin is too narrow” as justifications. The feedback loop is immediate and unforgiving. This can be frustrating, but it is also what makes the successes satisfying. When a proof finally compiles, you know it works.
There is a kind of peace in this. The world outside is full of claims that cannot be verified, arguments that cannot be resolved, systems that fail in ways no one predicted. Inside a proof assistant, the rules are clear. A theorem holds or it does not. The machine tells you which. You can build something that will still be correct in a hundred years, long after the context that motivated it has faded. The difficulty is part of the appeal. Easy things do not teach you much, and they do not last.
If this sounds interesting, keep reading. If not, there are plenty of other things worth learning.
Why Lean Specifically?
Among proof assistants, several mature options exist. Coq has the largest ecosystem and decades of industrial use. Isabelle offers powerful automation and a massive library of formalized mathematics. Agda provides elegant dependent types with a minimalist core. Each has devoted communities and hard-won expertise. The choice between them involves tradeoffs: Coq’s tactic language is battle-tested but showing its age; Isabelle’s automation can feel like magic until it fails mysteriously; Agda prioritizes purity over pragmatism.
Lean 4 occupies a distinctive position. The system emerged from Microsoft Research in 2013 and has evolved through four major versions. Lean 4, released in 2021, was a ground-up rewrite that reimagined the tool as both proof assistant and practical programming language. The implementation is largely written in Lean itself, a feat of bootstrapping that demonstrates the language’s capabilities. The mathematical library, Mathlib, contains over a million lines of formalized mathematics spanning algebra, analysis, topology, number theory, and beyond.
Lean has better tooling than other theorem provers. The error messages are informative. The IDE integration works. You can ask for hints, search the library, and see exactly what remains to prove.
This article series is my (admittedly flawed) attempt to take you from zero knowledge of Lean to writing your own proofs and programs, with enough depth to tackle real problems. Whether you finish it or abandon it halfway through, you will have spent your time on something worthwhile.
Theorem Provers
This article covers the history and landscape of theorem provers. If you do not care about context, skip to Basics.
The idea of mechanizing mathematical reasoning dates back centuries, but the modern era of theorem proving began in the 1960s and 1970s when researchers first attempted to implement formal logic on computers. These early systems were primitive by today’s standards, but they established the fundamental insight that proofs could be represented as data structures and verified by algorithms.
A note on terminology: automated theorem provers attempt to find proofs without human guidance. Proof assistants (also called interactive theorem provers) have humans construct proofs while the machine verifies each step. Lean is a proof assistant with enough automation that the line blurs. This article uses the terms loosely, as most practitioners do.
Early Systems
The first generation of theorem provers emerged from two distinct traditions. One tradition, exemplified by systems like Automath developed by Nicolaas de Bruijn in the late 1960s, focused on encoding existing mathematical proofs in a formal language that a computer could check. De Bruijn’s work introduced many concepts that remain central to modern systems, including the idea that types could depend on values and that propositions could be represented as types. The other tradition focused on automated theorem proving, attempting to have computers discover proofs on their own through search procedures. While fully automated proving remains intractable for most interesting mathematics, techniques from this tradition inform the automation tactics available in modern proof assistants.
The 1980s saw the development of several influential systems. The Calculus of Constructions, introduced by Thierry Coquand and Gérard Huet, provided a unified foundation combining dependent types with a hierarchy of universes. This calculus became the theoretical basis for Coq, which remains one of the most widely used proof assistants today. Coq pioneered many features now standard in the field, including tactic-based proof development, extraction of executable programs from proofs, and a module system for organizing large developments. Major verification efforts in Coq include the CompCert certified C compiler and the mathematical proof of the four color theorem.
The LCF Tradition
Around the same time, researchers in Edinburgh developed the LCF system and its descendants, which introduced the influential LCF architecture. In this design, there is a small trusted kernel that defines what constitutes a valid proof, and all proof construction must ultimately pass through this kernel. This approach provides strong guarantees because only the kernel needs to be trusted, while tactics and automation can be implemented in untrusted code. The HOL family of provers, including HOL4 and Isabelle/HOL, descend from this tradition. Isabelle in particular has been used for major verification efforts including the seL4 verified operating system kernel.
First-Order Theorem Proving
A parallel tradition built theorem provers on first-order logic rather than type theory. The Boyer-Moore family of provers, culminating in ACL2, used an untyped computational substrate based on Lisp with powerful automation heuristics for discovering proofs. ACL2 achieved notable industrial successes, including verification of AMD’s floating-point division after the Pentium FDIV bug made hardware correctness suddenly interesting to executives.
Despite these successes, first-order theorem proving has not been widely adopted outside specialized industrial applications. First-order logic imposes an expressiveness ceiling that makes formalizing modern mathematics awkward. Without dependent types, you cannot easily express properties like “a vector of length n” or “a sorted list.” These systems rely heavily on opaque automation heuristics rather than user-programmable tactics, which makes it harder to understand why proofs fail and how to fix them. Most importantly, there is no Curry-Howard correspondence linking proofs to programs, which means verified algorithms cannot be extracted into executable code.
The contrast is instructive. Type-theoretic systems grew ecosystems of thousands of users, million-line mathematical libraries, and active research communities. First-order provers remained specialized tools for specific classes of problems. The Curry-Howard insight that proofs are programs and types are propositions turned out to be generatively powerful in ways that first-order theorem proving was not. When you can express your specification, your implementation, and your correctness proof in the same language, each informs the others. This unity is what makes dependent type theory feel like mathematics rather than a checkbox.
Dependent Type Theory
The development of Martin-Löf type theory in the 1970s and 1980s provided another foundational framework that influenced systems like Agda and later Idris. Per Martin-Löf’s intensional type theory emphasized the computational content of proofs and introduced identity types as a way to reason about equality. Agda, developed primarily at Chalmers University, implements a variant of this theory with sophisticated support for dependent pattern matching. Its syntax influenced Lean’s design, and it remains popular for research in type theory and programming language semantics.
Idris took a different approach by prioritizing practical programming with dependent types rather than theorem proving per se. Idris demonstrated that dependent types could be integrated into a language designed for general-purpose programming, with features like implicit arguments and type-driven development making dependently typed code more accessible to working programmers. Many of these ergonomic innovations influenced Lean 4’s design.
The 2010s brought renewed interest in the foundations of mathematics through homotopy type theory, which reinterprets types as spaces and equality as paths. This perspective, developed by Vladimir Voevodsky and others, led to new proof assistants like Cubical Agda that implement univalent foundations. While Lean does not natively support cubical type theory, the mathematical insights from this research have influenced how the community thinks about equality and transport.
The Ecosystem Problem
Most theorem provers die the same death: they work, but nobody uses them. The software is correct, the theory is sound, the papers get cited, and then the maintainer graduates or retires and the codebase rots. This is not a failure of engineering. It is a failure of ecosystem.
A theorem prover without a library is a programming language without packages. You can write everything from scratch, but you will not. The activation energy is too high. Automath proved you could encode mathematics in the 1960s. Mizar built a large library but with a syntax that looked like it was designed to repel newcomers. The Boyer-Moore provers achieved industrial success at AMD but never grew a community. Each system had technical merit. None achieved escape velocity.
The systems that thrive today share a common pattern: a killer application that proved the approach could work at scale. Coq had CompCert. Isabelle had seL4. These existence proofs mattered. When someone asked “can you build anything real?” there was an answer.
Lean’s Position
Lean emerged from this rich history. The first version was developed by Leonardo de Moura at Microsoft Research starting in 2013, with the goal of building a system suitable for both interactive theorem proving and automated reasoning. Lean 2 and Lean 3 refined the system and built a substantial mathematical library. Lean 4, shipped in 2021, was a ground-up rewrite with several things done right.
Speed. Lean 4 compiles to C and runs fast. Not “fast for a theorem prover” but actually fast. You can write command-line tools, build systems, even games. This matters because you interact with a proof assistant through its editor, and editor responsiveness determines whether people finish their proofs or give up.
Metaprogramming. Lean 4’s tactic framework is written in Lean itself. You can inspect it, modify it, and write your own tactics without learning a separate metalanguage. In Coq, the tactic language (Ltac) is a different beast from the term language. In Lean, tactics are just programs.
Syntax. Lean looks like a normal programming language. Functions are functions. Pattern matching works how you expect. Unicode is optional. Lower friction means more users.
Mathlib. At 1.9 million lines, Mathlib is the largest coherent mathematical library ever created. When people ask “can I formalize real mathematics?” the answer is: probably someone already did, go look it up. Mathlib covers undergraduate and graduate-level material across algebra, analysis, topology, number theory, and other areas. It demonstrates that modern proof assistants can handle serious mathematics, not just toy examples.
Community momentum. Mathlib grows by thousands of theorems monthly. The Lean Zulip is active and welcoming. Kevin Buzzard teaches undergraduates at Imperial. Terence Tao formalizes his papers. When working mathematicians adopt your tool, the library grows faster.
AI integration. Lean is the default target for neural theorem proving research. DeepSeek-Prover, LeanDojo, and others chose Lean because the metaprogramming API makes tool integration tractable. This creates a flywheel: more AI tooling attracts more users attracts more AI tooling.
The Modern Landscape
Today’s theorem provers share many features despite their different foundations. Most support some form of dependent types, allowing types to depend on values and enabling precise specifications. Most provide tactic languages for interactive proof development alongside term-mode proof construction. Most include automation ranging from simple rewriting to sophisticated decision procedures.
The systems differ in their logical foundations, their approach to equality and computation, their support for classical versus constructive reasoning, and their emphasis on programming versus pure mathematics. Lean occupies a distinctive position by providing classical logic by default while maintaining strong computational properties, and by treating programming and proving as equally important activities.
Agda has elegant syntax and supports cubical type theory for people who care about homotopy, but has no substantial mathematical library. Idris pioneered practical dependent types for programming, but the community has not coalesced around a shared library. The HOL family uses simpler type theories without full dependent types, which makes automation easier but specifications harder. These systems have their niches but face the same ecosystem challenges that have always separated survivors from the graveyard.
Getting Started
If you are starting today, Lean is a reasonable choice. The syntax is approachable, the tooling is modern, the library is substantial, and the community is active. The alternatives are not wrong, but the path is less well-trodden.
If you want the theoretical foundations: Type Theory covers the core calculus. Dependent Types explains why types can mention values. Tactics and Proof Strategy cover how to actually get proofs done. Artificial Intelligence discusses where this is heading.
Basics
True to the title of this article series, we start from zero. Not “Hello, World!” but an actual zero: the natural number that forms the foundation of arithmetic.
Zero
-- From Zero to QED: let's start at the very beginning
def zero : Nat := 0
#eval zero -- Output: 0
-- The natural numbers are defined inductively:
-- Nat.zero is the base case
-- Nat.succ n is the successor of n
def one : Nat := Nat.succ Nat.zero
def two : Nat := Nat.succ (Nat.succ Nat.zero)
#eval one -- Output: 1
#eval two -- Output: 2
-- Of course, we can just write the literals directly
def fortyTwo : Nat := 42
-- The answer to life, the universe, and everything
theorem deep_thought : fortyTwo = 6 * 7 := rfl
This first example introduces three toplevel declarations that you will use constantly:
- def defines a named value or function. Here def zero : Nat := 0 declares that zero has type Nat (natural number) and equals 0. Every Lean program is built from def declarations.
- #eval evaluates an expression and prints the result. This command runs code immediately, useful for testing as you work. Commands starting with # are interactive queries that do not create permanent definitions.
- theorem declares a proposition to be proved. The name deep_thought labels the statement fortyTwo = 6 * 7, and rfl (reflexivity) proves it by computation: both sides reduce to 42. Unlike def, theorem proofs are opaque and never unfold during type checking.
The natural numbers are perhaps the most fundamental type in mathematics and programming. Lean represents them inductively: zero is the base case, and every other natural number is the successor of another. This simple construction gives us the entire infinite sequence 0, 1, 2, 3, and so on.
Natural Numbers
Natural numbers in Lean represent non-negative integers, defined inductively just as Peano intended in 1889. They support standard arithmetic operations, but subtraction truncates at zero since negative results would fall outside the type. This has caused approximately as many bugs as unsigned integers in C, which is to say: more than anyone wants to admit.
-- Natural numbers (Nat) are non-negative integers: 0, 1, 2, 3, ...
def myNat : Nat := 42
def anotherNat : Nat := 100
-- Basic arithmetic
#eval 3 + 5 -- 8
#eval 10 - 3 -- 7 (truncated subtraction: 3 - 10 = 0)
#eval 4 * 7 -- 28
#eval 17 / 5 -- 3 (integer division)
#eval 17 % 5 -- 2 (modulo)
-- Natural number subtraction truncates at zero
#eval 3 - 10 -- 0, not -7
-- Comparison returns Bool
#eval 5 < 10 -- true
#eval 5 ≤ 5 -- true
#eval 10 == 10 -- true
Lean has two equality operators. The == operator is decidable equality, returning a Bool for use in programs. The = operator is propositional equality, returning a Prop for use in proofs. For runtime computation, use ==. For stating theorems, use =. Both work with #eval because Lean can decide equality for natural numbers.
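A minimal sketch of the distinction (the theorem name and the use of decide are purely illustrative):

```lean
-- == computes a Bool, for use in programs
#eval 2 + 2 == 4             -- true

-- = states a proposition, for use in theorems
theorem two_plus_two : 2 + 2 = 4 := rfl

-- decide converts a decidable proposition into a Bool, bridging the two
#eval decide (2 + 2 = 4)     -- true
```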
Integers
When you need negative numbers, use Int. Integer arithmetic behaves as you would expect from standard mathematics, unburdened by the horrors of two’s complement overflow that have plagued systems programmers since the PDP-11.
-- Integers (Int) include negative numbers
def myInt : Int := -17
def posInt : Int := 42
-- Integer arithmetic handles negatives properly
-- Starting with a negative infers Int automatically
#eval -5 + 3 -- -2
#eval -4 * -3 -- 12
#eval -17 / 5 -- -3
#eval -17 % 5 -- -2
-- But positive minus larger needs annotation
#eval (3 : Int) - 10 -- -7
-- Converting between Nat and Int
#eval Int.ofNat 42 -- 42 as Int
#eval (42 : Int).toNat -- 42 as Nat
#eval (-5 : Int).toNat -- 0 (negative becomes 0)
#eval (42 : Int).natAbs -- 42 (absolute value as Nat)
#eval (-42 : Int).natAbs -- 42
Comments
Lean supports several comment styles. Single-line comments begin with -- and extend to the end of the line. Block comments are delimited by /- and -/ and can span multiple lines. Unlike C-style comments, Lean’s block comments nest properly, so you can comment out code that already contains comments.
Documentation comments are special. A comment starting with /-- attaches to the following declaration and is extracted by documentation tools. A comment starting with /-! provides module-level documentation, typically placed at the top of a file. Both support markdown formatting.
-- Single-line comments start with double dash
-- Everything after -- is ignored until end of line
/- Block comments are delimited by /- and -/
They can span multiple lines
and are useful for temporarily disabling code -/
/-- Documentation comments start with /-- and end with -/
They attach to the following declaration and support markdown.
Use these to document your API. -/
def documented (n : Nat) : Nat := n + 1
/-! Module-level documentation uses /-! and -/
Place these at the top of a file to describe the module's purpose.
Documentation tools extract these comments automatically. -/
-- Comments nest properly:
/- outer /- inner -/ still outer -/
-- The #check command is not affected by comments on the same line
#check Nat -- this comment doesn't interfere
Modules and Namespaces
Lean organizes code into modules and namespaces. This section covers the practical syntax; we revisit the underlying mechanics in Type Theory.
Files and Modules. Each .lean file defines a module. The file Foo/Bar/Baz.lean defines module Foo.Bar.Baz. To use definitions from another module, import it at the top of your file with import Mathlib.Data.Nat.Prime or import Mathlib for an entire library. Imports are transitive: if A imports B and B imports C, then A has access to C’s definitions. The Lake build system (covered in Build System) manages dependencies and ensures modules are compiled in the correct order.
Namespaces. Namespaces group related definitions under a common prefix. They prevent name collisions and organize large codebases:
namespace Geometry2
structure Point2 where
x : Float
y : Float
def theOrigin : Point2 := ⟨0.0, 0.0⟩
def dist (p q : Point2) : Float :=
let dx := p.x - q.x
let dy := p.y - q.y
Float.sqrt (dx * dx + dy * dy)
end Geometry2
-- Access with full path
#eval Geometry2.dist Geometry2.theOrigin ⟨3.0, 4.0⟩ -- 5.0
The angle brackets ⟨ and ⟩ are shorthand for structure constructors. These two lines are equivalent:
def explicit : Point2 := Point2.mk 3.0 4.0
def shorthand : Point2 := ⟨3.0, 4.0⟩
The open command brings namespace contents into scope, so you can write dist instead of Geometry2.dist:
-- Open brings namespace contents into scope
open Geometry2 in
#eval dist theOrigin ⟨3.0, 4.0⟩ -- 5.0
-- Open for a definition
open Geometry2 in
def unitCirclePoint (θ : Float) : Point2 := ⟨Float.cos θ, Float.sin θ⟩
Sections and Variables. The section command creates a scope for temporary declarations. Variables declared with variable inside a section are automatically added as parameters to definitions that use them:
section VectorOps
variable (α : Type) [Add α] [Mul α]
-- α and the instances are automatically added as parameters to definitions that use them
def doubleIt (x : α) : α := x + x
def squareIt (x : α) : α := x * x
end VectorOps
#eval doubleIt Nat 21 -- 42
#eval squareIt Nat 7 -- 49
The bracket notation deserves explanation. Round brackets mark explicit arguments you pass directly. Square brackets mark instance arguments that Lean finds automatically through type class resolution. Here, [Add α] means the type must have an Add instance, which provides the + operator. Curly braces mark implicit arguments that Lean infers from context.
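As a sketch, here are all three bracket kinds in one definition (addToSelf is an illustrative name, not part of the series source):

```lean
-- {α : Type}  implicit: inferred from the argument
-- [Add α]     instance: found by type class resolution
-- (x : α)     explicit: passed directly
def addToSelf {α : Type} [Add α] (x : α) : α := x + x

#eval addToSelf 21          -- 42 (α inferred as Nat)
#eval addToSelf (-3 : Int)  -- -6 (α inferred as Int)
```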
Scoping. Lean provides several mechanisms for limiting where names are visible. At the expression level, let bindings introduce local variables that exist only within the expression body. The where clause does the same but places definitions after their use, which some find more readable. Scopes nest, and inner bindings shadow outer ones with the same name.
-- let bindings create local scope within an expression
def hypotenuse (a b : Float) : Float :=
let aSquared := a * a
let bSquared := b * b
Float.sqrt (aSquared + bSquared)
-- aSquared and bSquared are not visible outside this def
-- where clauses provide the same scoping, but definitions come after use
def quadraticRoots (a b c : Float) : Float × Float :=
((-b + discriminant) / denom, (-b - discriminant) / denom)
where
discriminant := Float.sqrt (b * b - 4 * a * c)
denom := 2 * a
-- Scopes nest: inner let shadows outer
def shadowExample : Nat :=
let x := 1
let result :=
let x := 2 -- shadows outer x
x + 10 -- uses inner x: 12
result + x -- uses outer x: 12 + 1 = 13
#eval shadowExample -- 13
At the declaration level, sections scope variable declarations (as shown above), namespaces scope definitions under a prefix, and private restricts visibility to the current file. The general principle: introduce names in the narrowest scope that makes sense. Local computations belong in let or where. Shared helpers belong in a namespace. Implementation details belong behind private.
Visibility. By default, all definitions are public. Mark definitions as private to hide them outside the current file:
namespace Internal
private def helperVal : Nat := 42
def publicApi : Nat := helperVal * 2
end Internal
#eval Internal.publicApi -- 84
-- helperVal is not accessible outside this file
Export. The export command re-exports definitions from one namespace into another, making them available without opening the original:
namespace Math
def square (x : Nat) : Nat := x * x
def cube (x : Nat) : Nat := x * x * x
end Math
namespace Prelude
-- Re-export square from Math into Prelude
export Math (square)
end Prelude
-- Now square is available via Prelude without opening Math
#eval Prelude.square 5 -- 25
The Init Namespace. Every Lean file automatically imports the Init namespace, which provides foundational types and functions without explicit imports. This is Lean’s equivalent of Haskell’s Prelude or OCaml’s Stdlib, though the design differs.
| Category | Contents |
|---|---|
| Core types | Unit, Bool, Nat, Int, String, Char, Option, List, Array |
| Monads | Id, Option, Except, StateM, ReaderM, IO, plus transformers StateT, ReaderT, ExceptT, OptionT |
| Type classes | Monad, Functor, Applicative, ToString, Repr, Inhabited, BEq, Ord, Hashable |
| Proof primitives | Eq, And, Or, Not, True, False, Exists |
Haskell’s Prelude is imported unqualified by default, meaning all its names are directly available. You disable this with NoImplicitPrelude. OCaml takes the opposite approach: all modules are available qualified (you write List.map), and you must explicitly open List to use names unqualified.
Lean splits the difference. The Init namespace is always available without qualification. Unlike Haskell, there is no pragma to disable it, but you can shadow any definition with your own. The Init hierarchy is organized into submodules (Init.Prelude, Init.Data, Init.Control), but from the user’s perspective it appears as a unified set of defaults.
Functions
Functions are first-class values in Lean. You can define them in multiple ways and partially apply them to create new functions.
def add (x : Nat) (y : Nat) : Nat :=
x + y
def double : Nat → Nat :=
fun x => 2 * x
def addFive := add 5 -- Partially applied function
#eval add 3 4 -- Output: 7
#eval double 21 -- Output: 42
#eval addFive 10 -- Output: 15
When declaring multiple parameters of the same type, you can group them: (x y z : Nat) is identical to (x : Nat) (y : Nat) (z : Nat). Use whichever reads better.
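For example (sumThree is a throwaway name):

```lean
-- (x y z : Nat) is shorthand for (x : Nat) (y : Nat) (z : Nat)
def sumThree (x y z : Nat) : Nat := x + y + z

#eval sumThree 1 2 3  -- 6
```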
Pattern Matching
Pattern matching is a powerful feature for destructuring data and defining functions by cases.
def factorial : Nat → Nat
| 0 => 1
| n + 1 => (n + 1) * factorial n
def describe : Nat → String
| 0 => "zero"
| 1 => "one"
| 2 => "two"
| n => s!"many ({n})"
#eval factorial 5 -- Output: 120
#eval describe 0 -- Output: "zero"
#eval describe 100 -- Output: "many (100)"
The factorial definition uses n + 1 rather than the seemingly more natural | n => n * factorial (n - 1). This is not a style choice. Lean must verify that recursive calls terminate, and it does this by checking that arguments decrease structurally. The pattern n + 1 desugars to Nat.succ n, explicitly matching a successor. When you recurse on n, Lean sees that n is structurally smaller than Nat.succ n. With | n => ... factorial (n - 1), Lean cannot immediately see that n - 1 is smaller than n (subtraction is a function, not a constructor), so termination checking fails.
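To make the desugaring concrete, here is the same function with the constructors spelled out; it is accepted for the same reason (factorial' is an illustrative name):

```lean
-- The pattern n + 1 is sugar for Nat.succ n
def factorial' : Nat → Nat
  | Nat.zero   => 1
  | Nat.succ n => (n + 1) * factorial' n

#eval factorial' 5  -- 120
```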
The describe function uses string interpolation with s!"many ({n})". The s! prefix enables interpolation: expressions inside {...} are evaluated and converted to strings. Without the prefix, curly braces are literal characters.
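Two quick illustrations of the prefix at work:

```lean
#eval s!"2 + 2 = {2 + 2}"   -- "2 + 2 = 4"
#eval "2 + 2 = {2 + 2}"     -- "2 + 2 = {2 + 2}" (no s!, braces stay literal)
```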
More Declarations
Abbreviations are transparent definitions that unfold automatically during elaboration. Use them for type aliases:
-- abbrev creates a transparent abbreviation (always unfolded)
abbrev NatPair := Nat × Nat
abbrev Predicate' (α : Type) := α → Bool
def isEvenPred : Predicate' Nat := fun n => n % 2 == 0
def sumPair (p : NatPair) : Nat := p.1 + p.2
#eval isEvenPred 4 -- true
#eval sumPair (3, 7) -- 10
The interactive commands #check, #print, and #reduce help you explore code:
-- #check shows the type of an expression
#check (fun x : Nat => x + 1) -- Nat → Nat
#check @List.map -- shows full polymorphic type
-- #print shows information about a declaration
#print Nat.add
#print List
-- #reduce reduces an expression to normal form
#reduce (fun x => x + 1) 5 -- 6
#reduce List.map (· + 1) [1, 2, 3] -- [2, 3, 4]
A complete reference of all declarations appears in the appendix. Advanced declarations like axiom, opaque, universe, notation, and set_option are covered in later articles where they arise naturally.
Function Composition
Lean provides three operators for combining functions:
| Operator | Name | Direction | Meaning |
|---|---|---|---|
f ∘ g | compose | right-to-left | fun x => f (g x) |
x |> f | pipe | left-to-right | f x |
f <| x | apply | right-to-left | f x |
Composition builds new functions by chaining existing ones. The ∘ operator reads right-to-left: in f ∘ g, apply g first, then f. Pipelines with |> read left-to-right, which often matches how you think about data transformations: “take this, then do that, then do this.”
def twice (n : Nat) := n * 2
def square (n : Nat) := n * n
def inc (n : Nat) := n + 1
-- ∘ composes right-to-left: (f ∘ g) x = f (g x)
#eval (square ∘ twice) 3 -- 36: square(twice(3)) = square(6)
#eval (twice ∘ square) 3 -- 18: twice(square(3)) = twice(9)
#eval (inc ∘ square ∘ twice) 3 -- 37: inc(square(twice(3)))
-- |> pipes left-to-right: x |> f |> g = g (f x)
#eval 3 |> twice |> square |> inc -- 37: same computation, opposite order
#eval 10 |> twice -- 20
#eval [1, 2, 3] |> List.reverse -- [3, 2, 1]
-- <| is low-precedence application: f <| x = f x
-- useful to avoid parentheses
#eval String.length <| "hello" ++ " world" -- 11
#eval List.map twice <| [1, 2, 3] -- [2, 4, 6]
-- chaining with methods vs pipes
#eval [1, 2, 3].map twice |> List.reverse -- [6, 4, 2]
#eval ([1, 2, 3].map twice).reverse -- same
-- composition builds new functions without naming arguments
def processThenSquare := square ∘ twice ∘ inc
#eval processThenSquare 2 -- 36: square(twice(inc(2)))
The <| operator is just function application with low precedence. It lets you write f <| expensive computation instead of f (expensive computation). Some find this cleaner; others prefer explicit parentheses. Use whichever reads better.
Point-free style defines functions without naming their arguments: square ∘ twice rather than fun x => square (twice x). This can be elegant for simple compositions but obscure for complex ones. The goal is clarity, not cleverness.
Higher-Order Functions
Higher-order functions take functions as arguments. The classics: map transforms each element, filter keeps elements matching a predicate, foldl reduces a list to a single value by accumulating from the left.
And because Lean is also a theorem prover, we can prove properties by computation. The theorem add_comm_example states that 2 + 3 = 3 + 2, and rfl proves it because both sides reduce to 5. The examples at the end go further: reversing a list twice returns the original, list lengths add correctly, mapping a function produces the expected result.
What is the difference between #eval [1,2,3].reverse.reverse = [1,2,3] and example : [1,2,3].reverse.reverse = [1,2,3] := rfl? The #eval runs at runtime and prints true. The example is verified at compile time by the type checker, and if the equality did not hold, compilation would fail. Both check the same fact, but example catches the error before you ship. Note that this rfl proof verifies this specific list; proving it for all lists requires a theorem with induction. We cover proofs properly in Proofs.
-- Higher-order functions
#eval [1, 2, 3, 4, 5].map (· * 2) -- [2, 4, 6, 8, 10]
#eval [1, 2, 3, 4, 5].filter (· > 2) -- [3, 4, 5]
#eval [1, 2, 3, 4, 5].foldl (· + ·) 100 -- 115 (100 + 1 + 2 + 3 + 4 + 5)
-- Programs are proofs: addition is commutative
theorem add_comm_example : 2 + 3 = 3 + 2 := rfl
-- Computation IS proof: complex operations verified by evaluation
example : [1, 2, 3].reverse.reverse = [1, 2, 3] := rfl
example : [1, 2, 3].length + [4, 5].length = 5 := rfl
example : [1, 2, 3].map (· + 10) = [11, 12, 13] := rfl
From Values to Structure
For a complete reference of all toplevel declarations (def, theorem, inductive, structure, etc.) and interactive commands (#eval, #check, #print), see Appendix B.
You now have the building blocks: numbers, functions, modules, and the fundamental declarations. Next we cover the data structures that make programs useful: lists, arrays, maps, and user-defined types. After that, we explore control flow, polymorphism, effects, and IO. By the end of Arc I, you will have built a D&D character generator, which is either a useful demonstration of structured programming or an excuse to start a D&D campaign. Possibly both.
Lake Build System
This chapter covers project setup and the Lake build system. If you already have your environment configured, skip to Data Structures. Return here when you need multi-file projects or dependencies.
Every programming language eventually grows a build system, and that build system eventually grows into a small civilization with its own customs and territorial disputes. Lake is Lean’s entry in this tradition. It borrows good ideas from Cargo, is written in Lean itself, and mostly works. The documentation is sparse, the error messages occasionally cryptic, and there are two competing configuration formats that do almost but not quite the same thing. Welcome to the frontier.
That said, Lake gets the job done. Paired with Elan for version management, you get reproducible builds and workable dependency management. Code that compiles on your machine will usually compile on other machines. For a young ecosystem, this is not nothing.
Elan
Elan is the Lean version manager. It downloads, installs, and switches between different versions of Lean. Most users install Elan first and then let it manage their Lean installation. On Unix systems, installation is a single command that downloads and runs the installer script. On Windows, a dedicated installer is available.
Once installed, Elan reads a lean-toolchain file in your project directory to determine which Lean version to use. This file typically contains a single line specifying the version, such as leanprover/lean4:v4.3.0 or simply leanprover/lean4:stable for the latest stable release. When you enter a directory containing this file, Elan automatically activates the correct toolchain. If that version is not installed, Elan downloads it transparently.
This per-project versioning solves a common problem in software development. Different projects may require different Lean versions, and Elan lets them coexist without conflict. You can work on a project using Lean 4.2 in one terminal and a project using Lean 4.5 in another. The toolchain file checked into version control ensures all collaborators use the same Lean version.
Elan also manages additional toolchain components. The Lean installation includes the compiler, the language server for editor integration, and documentation tools. Updates happen through Elan with commands like elan update to fetch the latest versions.
Lake
Lake is the build system and package manager for Lean. The name combines “Lean” and “Make,” and every Lean project contains a lakefile.lean that describes its structure, dependencies, and build configuration. A minimal lakefile declares the package name and defines one or more build targets. The most common targets are libraries, which compile Lean source files into modules that other code can import, and executables, which produce standalone programs. Lake reads this configuration and orchestrates compilation, handling dependencies between modules automatically.
import Lake
open Lake DSL
package myproject where
version := v!"0.1.0"
lean_lib MyLib where
roots := #[`MyLib]
@[default_target]
lean_exe myapp where
root := `Main
This lakefile defines a package named myproject containing a library called MyLib and an executable called myapp. The library compiles all modules under the MyLib namespace, while the executable uses Main as its entry point. The @[default_target] attribute marks myapp as the target built when you run lake build without arguments.
Dependencies on external packages are declared in the lakefile using the require keyword. Lake fetches dependencies from Git repositories, and you can specify versions through tags, branches, or commit hashes. When you build your project, Lake first ensures all dependencies are available and up to date, then compiles them before your own code. Reservoir serves as the community package registry, indexing Lean packages and providing searchable documentation, dependency graphs, and build status for the ecosystem.
require mathlib from git
"https://github.com/leanprover-community/mathlib4" @ "v4.3.0"
require aesop from git
"https://github.com/leanprover-community/aesop" @ "master"
Lake maintains a lake-manifest.json file that records the exact versions of all dependencies. This lockfile ensures reproducible builds across different machines and times. When you run lake update, Lake fetches the latest versions matching your constraints and updates the manifest.
The build process produces artifacts in a .lake directory within your project. Compiled Lean files become .olean files containing serialized proof terms and compiled code. These intermediate files enable incremental compilation, where Lake only recompiles modules that have changed or whose dependencies have changed. For large projects like Mathlib, this incremental approach is essential for practical development.
Lake also supports downloading precompiled artifacts called caches. Mathlib maintains a cache of compiled artifacts for anyone who would rather not spend hours rebuilding from source. The lake exe cache get command fetches these artifacts, reducing initial setup from hours to minutes.
Project Structure
A typical Lean project follows a conventional directory layout. The lakefile sits at the project root alongside the lean-toolchain file. Source files live in directories matching their module namespaces. A module named MyLib.Data.List would be in the file MyLib/Data/List.lean. This correspondence between filesystem paths and module names makes navigation straightforward.
myproject/
  lakefile.lean
  lean-toolchain
  lake-manifest.json
  MyLib/
    Basic.lean
    Data/
      List.lean
      Vector.lean
  Main.lean
Test files typically live in a separate directory, often called Test or Tests, with their own library target in the lakefile. Documentation, examples, and scripts occupy other directories as needed. Lake does not enforce a particular structure beyond the lakefile requirements, but conventions have emerged from the community.
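For instance, a lakefile might declare a separate test library next to the main targets; the names below are illustrative rather than a required convention:

```lean
-- In lakefile.lean: a test library rooted at Tests/
lean_lib Tests where
  roots := #[`Tests]
```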
Common Commands
Building a project uses lake build, which compiles all default targets. You can build specific targets by name, like lake build MyLib or lake build myapp. For development, lake build after editing a file recompiles only what changed.
Running an executable uses lake exe followed by the executable name, like lake exe myapp. Arguments after the executable name pass through to the program. You can also use lake run with the executable target name.
Managing dependencies uses lake update to refresh the manifest with the latest matching versions. After modifying the lakefile to add or change dependencies, running lake update fetches and locks the new versions.
Cleaning build artifacts uses lake clean, which removes the .lake/build directory.
The lake env command prints environment variables that configure Lean to find your project’s modules. This is useful when running Lean directly or integrating with external tools.
Editor Integration
The Lean language server provides IDE features like error highlighting, go-to-definition, type information on hover, and code completion. Lake integrates with the language server by providing project configuration. When you open a Lean file in an editor with Lean support, the language server reads your lakefile to understand the project structure.
Visual Studio Code with the lean4 extension is the most popular editor setup. The extension automatically starts the language server and provides a panel showing proof states and messages. Other editors like Emacs and Neovim have community-maintained Lean integrations that communicate with the same language server.
For the language server to work correctly, it must know about your project configuration. Opening a Lean file outside a Lake project, or opening a file before dependencies are built, can cause errors. Building the project with lake build before editing ensures the language server has the information it needs.
Unicode Symbol Entry
Lean uses Unicode symbols extensively. Mathematical notation has evolved over centuries to be information-dense and readable. Writing ∀ n, n + 0 = n is clearer than forall n, n + 0 = n, and α → β is more familiar to mathematicians than alpha -> beta. Rather than inventing ASCII approximations, Lean embraces the notation that mathematicians already use.
In VS Code with the Lean extension, type a backslash followed by a name to produce symbols. The editor replaces the sequence as you type. Hover over any symbol in existing code to see how to type it.
| Input | Symbol | Input | Symbol | Input | Symbol |
|---|---|---|---|---|---|
\to | → | \and | ∧ | \alpha | α |
\le | ≤ | \or | ∨ | \beta | β |
\ge | ≥ | \not | ¬ | \N | ℕ |
\ne | ≠ | \forall | ∀ | \Z | ℤ |
\< | ⟨ | \exists | ∃ | \R | ℝ |
\> | ⟩ | \in | ∈ | \x | × |
\comp | ∘ | \sub | ⊂ | \l | λ |
The angle brackets ⟨ and ⟩ deserve special mention. They are shorthand for structure constructors. These two forms are equivalent:
def explicit : Point := Point.mk 3.0 4.0
def shorthand : Point := ⟨3.0, 4.0⟩
You will see angle brackets throughout Lean code wherever structures are constructed.
The Interactive Workflow
Lean development is fundamentally interactive. Unlike batch compilers where you write code, compile, and hope for the best, Lean provides continuous feedback as you type. This tight feedback loop is not a convenience feature but the primary way you develop in Lean.
The Infoview panel is your window into Lean’s reasoning. In VS Code, it appears on the right side when you open a Lean file. As you move your cursor through the code, the Infoview updates to show the state at that position. When writing proofs, it displays the current goal: what hypotheses you have available and what remains to be proved. When writing programs, it shows types and values. This panel is essential for understanding what Lean sees at every point in your code.
Consider a simple proof in progress:
theorem add_comm (n m : Nat) : n + m = m + n := by
induction n with
| zero => simp
| succ n ih => _
When your cursor is on the underscore, the Infoview shows:
case succ
m n : Nat
ih : n + m = m + n
⊢ n + 1 + m = m + (n + 1)
This goal state tells you everything: you are in the succ case of an induction, you have m and n as natural numbers, you have an induction hypothesis ih, and you must prove the equation shown after the turnstile ⊢. Without this feedback, tactic proving would be like navigating a maze blindfolded.
Running Single Files
Not every experiment needs a full project. For quick tests, you can run a single Lean file without creating a Lake project. Create a file like hello.lean:
#eval "Hello, world!"
Run it with:
lake env lean hello.lean
Each #eval command in the file prints its result. You do not need a main function or IO monad for simple output. This is the quickest path from idea to result.
For interactive development, the VS Code Infoview shows #eval results as you type, without running any commands. Place your cursor after an #eval line and the result appears in the panel. This feedback loop is often faster than switching to a terminal.
When your experiment grows beyond a single file, or when you need dependencies, create a proper Lake project. But for exploring syntax, testing small functions, or working through exercises, single files work well.
Evaluation Commands
Lean provides several commands that evaluate expressions and report results directly in the editor. These are invaluable for exploration and debugging.
The simplest demonstration:
#eval "Hello, world!" -- Hello, world!
#check displays the type of an expression:
#check 1 + 1 -- 1 + 1 : Nat
#check [1, 2, 3] -- [1, 2, 3] : List Nat
#check fun x => x -- fun x => x : ?m.1 → ?m.1
#eval evaluates an expression and shows its value:
#eval 2 + 2 -- 4
#eval [1, 2, 3].length -- 3
#eval "hello".toUpper -- "HELLO"
#print shows the definition of a constant, including theorems:
#print Nat.add_comm
-- theorem Nat.add_comm : ∀ (n m : Nat), n + m = m + n := ...
#reduce fully reduces an expression using definitional equality:
#reduce (fun x => x + 1) 5 -- 6
These commands appear as blue underlines in VS Code. Hover over them or check the Infoview to see results. They let you test ideas immediately without writing a full program or proof.
Reading Error Messages
Lean’s error messages are verbose but precise. They tell you exactly what went wrong, which is both a blessing and a curse for newcomers who may find the detail overwhelming.
A type mismatch error shows what was expected and what was provided:
type mismatch
h
has type
P
but is expected to have type
Q
This means you tried to use a proof of P where a proof of Q was needed. Look at the goal state to understand what type you actually need.
An unknown identifier error means a name is not in scope:
unknown identifier 'foo'
Check for typos, missing imports, or hypotheses you forgot to introduce.
An unsolved goals error at the end of a proof means you have not proved everything:
unsolved goals
⊢ P ∧ Q
Your proof is incomplete. Look at what remains and continue.
The habit of reading error messages carefully, rather than guessing at fixes, will save hours of confusion. Lean is trying to help; let it.
Mathlib Projects
Projects depending on Mathlib benefit from additional tooling. The cache executable bundled with Mathlib downloads prebuilt artifacts, avoiding the need to compile Mathlib yourself. After adding Mathlib as a dependency, running lake exe cache get fetches compiled files for your Lean version.
Mathlib projects often use a template that includes recommended configuration. The template sets up the toolchain file, lakefile, and auxiliary files for continuous integration and documentation generation. Starting from this template ensures compatibility with Mathlib’s infrastructure.
Because Mathlib updates frequently, projects must balance using new features against the cost of keeping up with changes. Pinning to specific Mathlib versions provides stability, while tracking recent versions provides access to new material. The Mathlib changelog documents breaking changes to help with updates.
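As a sketch, a minimal lakefile.lean that pins Mathlib might look like the following (the package name and the v4.26.0 tag are illustrative; match the tag to the version in your lean-toolchain file):
import Lake
open Lake DSL

package «my_project»

-- Pinning to a release tag keeps builds reproducible
require mathlib from git
  "https://github.com/leanprover-community/mathlib4.git" @ "v4.26.0"

lean_lib «MyProject»
After adding or changing the dependency, run lake update once to resolve it, then lake exe cache get to fetch the prebuilt artifacts.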
Command Reference
| Command | Description | Example |
|---|---|---|
| lake new | Create a new project | lake new myproject |
| lake init | Initialize project in current directory | lake init myproject |
| lake build | Build default targets | lake build |
| lake build <target> | Build specific target | lake build MyLib |
| lake clean | Remove build artifacts | lake clean |
| lake update | Update dependencies to latest versions | lake update |
| lake exe <name> | Run an executable | lake exe myapp --flag |
| lake env | Print environment variables | lake env |
| lake script run | Run a lakefile script | lake script run test |
| lake test | Run project tests | lake test |
| lake exe cache get | Download Mathlib cache | lake exe cache get |
| elan show | Show installed toolchains | elan show |
| elan update | Update all toolchains | elan update |
| elan default | Set default toolchain | elan default leanprover/lean4:stable |
| elan override | Set directory-specific toolchain | elan override set leanprover/lean4:v4.3.0 |
Compiler Backend and Runtime
Lean 4 compiles to C code, which is then compiled to native executables using a system C compiler (typically Clang or GCC). This compilation pipeline differs from most theorem provers, which either interpret code or extract to another language like OCaml or Haskell. The choice to target C provides portability and enables linking with existing C libraries.
The compilation process involves several stages. Lean source code is first type-checked and elaborated into the Lean kernel language. Proof terms are then erased since they have no computational content: proofs exist to satisfy the type checker, and once verified, they are deleted before the program runs. The remaining code is converted to an intermediate representation that resembles a simplified functional language. This intermediate form is then translated to C code that Lake compiles with your system’s C compiler. We explore the relationship between programming and proving in detail in Proofs.
Lean’s runtime uses reference counting rather than tracing garbage collection. Each heap-allocated object maintains a count of references to it. When the count drops to zero, the object is immediately freed. This approach has lower latency than tracing collectors since there are no garbage collection pauses. The Counting Immutable Beans paper describes the design in detail.
Reference counting enables a technique the Lean developers call Functional But In-Place. When you perform a functional update on a data structure and the original has a reference count of one, the runtime can reuse the memory in place rather than allocating new storage. This means that pure functional code operating on unshared data achieves performance comparable to imperative mutation. The Array type in Lean exploits this property: appending to an unshared array mutates it in place despite the pure functional semantics.
The runtime is strict, not lazy like Haskell. All function arguments are evaluated before the function body executes. This makes performance more predictable but requires different idioms for infinite data structures or expensive computations that might not be needed. Lean provides explicit thunks via the Thunk type when lazy evaluation is required.
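A small sketch of explicit laziness with Thunk (the names expensiveSum and useIfNeeded are made up for illustration):
def expensiveSum : Thunk Nat :=
  Thunk.mk fun _ => (List.range 1000).foldl (· + ·) 0

def useIfNeeded (b : Bool) (t : Thunk Nat) : Nat :=
  if b then t.get else 0  -- t.get forces the computation; otherwise it never runs

#eval useIfNeeded true expensiveSum -- 499500
#eval useIfNeeded false expensiveSum -- 0 (the sum is never computed)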
Caution
The ecosystem lacks mature libraries for common tasks like HTTP clients, database connectors, encryption, and async I/O. While the Axiomed project is building HTTP support and the community has created socket bindings, these are far less polished than equivalents in established languages. Linking against system libraries requires out-of-band setup that Lake cannot manage portably across operating systems. Parallelism is supported in the form of cooperative scheduling on multiple threads.
Binary sizes tend to be large because the generated executable links in the Lean runtime, and any Mathlib dependencies add substantially to it. Compile times for projects depending on Mathlib can be lengthy, though the cache system mitigates this for incremental builds. The compiler itself is under active development, with the Year 3 Roadmap promising improvements to code generation, smaller binaries, and better reference counting.
For production systems, Lean is best suited as a specification and verification tool rather than as the implementation language. A practical pattern is to write formal specifications in Lean, prove properties about algorithms, then implement the actual system in a production language while using Lean-generated tests or runtime checks to verify the implementation matches the specification. Alternatively, Lean excels for tools where correctness matters more than ecosystem maturity: proof automation, code generators, domain-specific languages, and programs where the type system’s expressiveness justifies the ecosystem tradeoffs.
Data Structures
Programs are nothing without data to manipulate. Here we cover the types that hold your data: from simple primitives like booleans and characters to collections like lists and arrays to user-defined structures and inductive types. By the end, you will have the vocabulary to represent any data your program needs.
Unit
The Unit type has exactly one value: (). It serves as a placeholder when a function has no meaningful return value, similar to void in C except that void is a lie and Unit is honest about being boring. Every function can return Unit because there is only one possible value to return.
-- Unit has exactly one value: ()
def nothing : Unit := ()
-- Often used for side-effecting functions
def printAndReturn : IO Unit := do
IO.println "Side effect!"
return ()
-- Unit in function types indicates "no meaningful return value"
def greetIO (name : String) : IO Unit :=
IO.println s!"Hello, {name}!"
Empty
The Empty type has no values at all. You can write a function from Empty to anything because you will never have to actually produce an output; there are no inputs to handle. Empty represents logical impossibility and marks unreachable code branches. If you somehow obtain a value of type Empty, you can derive anything from it, a principle the medievals called ex falso quodlibet: from falsehood, anything follows.
-- Empty has no values at all
-- It represents impossibility or unreachable code
-- If you have a value of type Empty, you can prove anything
def absurd' {α : Type} (e : Empty) : α :=
Empty.elim e
-- Empty is useful for marking impossible cases
inductive Void where -- Custom empty type (equivalent to Empty)
Note
For those with category theory background:
Unit is the terminal object (for any type \(A\), there exists exactly one function \(A \to \text{Unit}\)), and Empty is the initial object (for any type \(A\), there exists exactly one function \(\text{Empty} \to A\)). In logic, these correspond to \(\top\) (true) and \(\bot\) (false). You do not need this perspective to use these types effectively. The Proofs and Type Theory articles explain the deeper connections, including the Curry-Howard correspondence that links types to logic.
Booleans
Booleans represent truth values and form the basis of conditional logic. George Boole would be pleased, though he might find it curious that his algebra of logic became the foundation for arguments about whether 0 or 1 should represent truth.
-- Booleans: true and false
def myBool : Bool := true
def myFalse : Bool := false
-- Boolean operations
#eval true && false -- false (and)
#eval true || false -- true (or)
#eval !true -- false (not)
#eval true ^^ false -- true (xor)
-- Conditionals
def absInt (x : Int) : Int :=
if x < 0 then -x else x
#eval absInt (-5) -- 5
-- Boolean decision
#eval if true then "yes" else "no" -- "yes"
#eval if false then "yes" else "no" -- "no"
Option
The Option type represents values that may or may not exist. It is Lean’s safe alternative to null references, which Tony Hoare famously called his “billion dollar mistake.” With Option, absence is explicit in the type: you cannot forget to check because the compiler will not let you. The hollow log either contains honey or it does not, and you must handle both cases.
-- Option represents a value that may or may not exist
def someValue : Option Nat := some 42
def noValue : Option Nat := none
-- Pattern matching on Option
def getOrDefault (opt : Option Nat) (default : Nat) : Nat :=
match opt with
| some x => x
| none => default
#eval getOrDefault (some 10) 0 -- 10
#eval getOrDefault none 0 -- 0
-- Option combinators
#eval (some 5).map (· * 2) -- some 10
#eval (none : Option Nat).map (· * 2) -- none
#eval (some 5).getD 0 -- 5
#eval (none : Option Nat).getD 0 -- 0
#eval (some 5).isSome -- true
#eval (some 5).isNone -- false
-- Chaining Options
#eval (some 5).bind (fun x => some (x + 1)) -- some 6
Chars
Characters in Lean are Unicode scalar values, capable of representing any character from any human language, mathematical symbols, and bears.
-- Characters are Unicode scalar values
def letterA : Char := 'A'
def digit : Char := '7'
def unicode : Char := 'λ'
def bear : Char := '🐻'
-- Character properties
#eval 'A'.isAlpha -- true
#eval '7'.isDigit -- true
#eval ' '.isWhitespace -- true
#eval 'a'.isLower -- true
#eval 'A'.isUpper -- true
-- Character to/from Nat
#eval 'A'.toNat -- 65
#eval Char.ofNat 65 -- 'A'
-- Case conversion
#eval 'a'.toUpper -- 'A'
#eval 'Z'.toLower -- 'z'
Strings
Strings are sequences of characters with a rich set of operations for text processing. They are UTF-8 encoded, which means you have already won half the battle that consumed the first decade of web development.
-- Strings are sequences of characters
def greeting : String := "Hello, Lean!"
def multiline : String := "Line 1\nLine 2\nLine 3"
-- String operations
#eval "Hello".length -- 5
#eval "Hello".append " World" -- "Hello World"
#eval "Hello" ++ " " ++ "World" -- "Hello World"
#eval "Hello".toList -- ['H', 'e', 'l', 'l', 'o']
-- String interpolation
def shipName := "Mistake Not My Current State Of Alarm"
def shipClass := "GCU"
#eval s!"The {shipClass} {shipName} has entered the system."
-- Substring operations
#eval "Hello World".take 5 -- "Hello"
#eval "Hello World".drop 6 -- "World"
#eval "Hello".isPrefixOf "Hello World" -- true
-- Splitting and joining
#eval "a,b,c".splitOn "," -- ["a", "b", "c"]
#eval ",".intercalate ["a", "b", "c"] -- "a,b,c"
Fixed-Precision Integers
For performance-critical code or when interfacing with external systems, Lean provides fixed-precision integers that map directly to machine types.
-- Fixed-precision unsigned integers
def byte : UInt8 := 255
def word : UInt16 := 65535
def dword : UInt32 := 0xDEADBEEF
def qword : UInt64 := 0xCAFEBABE12345678
-- Overflow wraps around
#eval (255 : UInt8) + 1 -- 0
-- Size type for platform-dependent sizing
def platformSize : USize := 42
-- Signed fixed-precision integers
def signedByte : Int8 := -128
def signedWord : Int16 := -32768
Floats
Lean supports IEEE 754 double-precision floating-point numbers for scientific computing and applications that require real number approximations.
-- IEEE 754 double-precision floating-point
def myFloat : Float := 3.14159
def scientific : Float := 6.022e23
def negativeFloat : Float := -273.15
-- Floating-point arithmetic
#eval 3.14 + 2.86 -- 6.0
#eval 10.0 / 3.0 -- 3.333...
#eval Float.sqrt 2.0 -- 1.414...
#eval Float.sin 0.0 -- 0.0
#eval Float.cos 0.0 -- 1.0
-- Special values
#eval (1.0 / 0.0 : Float) -- inf
#eval (0.0 / 0.0 : Float) -- nan
Tuples
Tuples combine values of potentially different types into a single value. They are the basic building block for returning multiple values from functions.
-- Tuples combine values of different types
def pair : Nat × String := (42, "answer")
def triple : Nat × String × Bool := (1, "one", true)
-- Accessing tuple elements
#eval pair.1 -- 42
#eval pair.2 -- "answer"
#eval pair.fst -- 42
#eval pair.snd -- "answer"
-- Pattern matching on tuples
def swap {α β : Type} (p : α × β) : β × α :=
let (a, b) := p
(b, a)
#eval swap (1, "hello") -- ("hello", 1)
-- Nested tuples
def nested : (Nat × Nat) × String := ((1, 2), "pair")
#eval nested.1.1 -- 1
#eval nested.1.2 -- 2
Sum Types
Sum types represent a choice between two alternatives. The Except variant is commonly used for error handling.
-- Sum types represent a choice between types
def leftValue : Nat ⊕ String := Sum.inl 42
def rightValue : Nat ⊕ String := Sum.inr "hello"
-- Pattern matching on Sum
def describeSum (s : Nat ⊕ String) : String :=
match s with
| Sum.inl n => s!"A number: {n}"
| Sum.inr str => s!"A string: {str}"
#eval describeSum leftValue -- "A number: 42"
#eval describeSum rightValue -- "A string: hello"
-- Except is like Sum but for error handling
def divideExcept (x y : Nat) : Except String Nat :=
if y == 0 then
Except.error "Division by zero"
else
Except.ok (x / y)
#eval divideExcept 10 2 -- Except.ok 5
#eval divideExcept 10 0 -- Except.error "Division by zero"
Lists
Lists are singly-linked sequences of elements, the workhorse data structure of functional programming since LISP introduced them in 1958. They support pattern matching and have a rich set of higher-order operations. Prepending is \(O(1)\); appending is \(O(n)\). If this bothers you, wait until you meet Arrays.
-- Linked lists
def myList : List Nat := [1, 2, 3, 4, 5]
def emptyList : List Nat := []
-- List construction
def consExample := 0 :: [1, 2, 3] -- [0, 1, 2, 3]
def appendExample := [1, 2] ++ [3, 4] -- [1, 2, 3, 4]
-- Common operations
#eval [1, 2, 3].length -- 3
#eval [1, 2, 3].head? -- some 1
#eval [1, 2, 3].tail? -- some [2, 3]
#eval [1, 2, 3][1]? -- some 2
#eval [1, 2, 3].reverse -- [3, 2, 1]
-- Higher-order functions
#eval [1, 2, 3].map (· * 2) -- [2, 4, 6]
#eval [1, 2, 3, 4].filter (· > 2) -- [3, 4]
#eval [1, 2, 3, 4].foldl (· + ·) 0 -- 10
-- Cartesian product via flatMap
#eval [1, 2].flatMap (fun x => [10, 20].map (fun y => x + y)) -- [11, 21, 12, 22]
Arrays
Arrays provide \(O(1)\) random access and are the preferred choice when you need indexed access without the pointer-chasing of linked lists. Thanks to Lean’s reference counting, operations on unshared arrays mutate in place, giving you the performance of imperative code with the semantics of pure functions. Purity without the performance penalty. The trick is that “unshared” does a lot of work in that sentence.
-- Arrays provide O(1) random access
def myArray : Array Nat := #[1, 2, 3, 4, 5]
def emptyArray : Array Nat := #[]
-- Array operations
#eval #[1, 2, 3].size -- 3
#eval #[1, 2, 3][0]! -- 1 (panics if out of bounds)
#eval #[1, 2, 3][1]? -- some 2
#eval #[1, 2, 3][10]? -- none
-- Modifying arrays (creates new array)
#eval #[1, 2, 3].push 4 -- #[1, 2, 3, 4]
#eval #[1, 2, 3].pop -- #[1, 2]
#eval #[1, 2, 3].set! 1 99 -- #[1, 99, 3]
-- Conversion
#eval #[1, 2, 3].toList -- [1, 2, 3]
#eval [1, 2, 3].toArray -- #[1, 2, 3]
-- Higher-order functions
#eval #[1, 2, 3].map (· * 2) -- #[2, 4, 6]
#eval #[1, 2, 3, 4].filter (· % 2 == 0) -- #[2, 4]
ByteArrays
ByteArrays are efficient arrays of bytes, useful for binary data, file I/O, and network protocols.
-- ByteArray is an efficient array of bytes
def bytes : ByteArray := ByteArray.mk #[0x48, 0x65, 0x6C, 0x6C, 0x6F]
-- Operations
#eval bytes.size -- 5
#eval bytes.get! 0 -- 72 (0x48 = 'H')
#eval bytes.toList -- [72, 101, 108, 108, 111]
-- Convert to/from String (UTF-8)
#eval "Hello".toUTF8 -- ByteArray from string
Bitvectors
Bitvectors represent fixed-width binary data and support bitwise operations. They are essential for low-level programming, cryptography, and hardware verification.
-- BitVec n is an n-bit vector
def bits8 : BitVec 8 := 0xFF
def bits16 : BitVec 16 := 0xABCD
def bits32 : BitVec 32 := 0xDEADBEEF
-- Bitwise operations
#eval (0b1100 : BitVec 4) &&& 0b1010 -- 0b1000 (AND)
#eval (0b1100 : BitVec 4) ||| 0b1010 -- 0b1110 (OR)
#eval (0b1100 : BitVec 4) ^^^ 0b1010 -- 0b0110 (XOR)
#eval ~~~(0b1100 : BitVec 4) -- 0b0011 (NOT)
-- Shifts
#eval (0b0001 : BitVec 4) <<< 2 -- 0b0100 (left shift)
#eval (0b1000 : BitVec 4) >>> 2 -- 0b0010 (right shift)
Maps and Sets
Hash maps and hash sets provide efficient key-value storage and membership testing.
-- HashMap for key-value storage
open Std in
def myMap : HashMap String Nat :=
HashMap.emptyWithCapacity
|>.insert "one" 1
|>.insert "two" 2
|>.insert "three" 3
#eval myMap.get? "two" -- some 2
#eval myMap.get? "four" -- none
#eval myMap.contains "one" -- true
#eval myMap.size -- 3
-- HashSet for unique elements
open Std in
def mySet : HashSet Nat :=
HashSet.emptyWithCapacity
|>.insert 1
|>.insert 2
|>.insert 3
|>.insert 2 -- duplicate ignored
#eval mySet.contains 2 -- true
#eval mySet.contains 5 -- false
#eval mySet.size -- 3
Structures
Structures are Lean’s way of grouping related data with named fields.
structure Point where
x : Float
y : Float
deriving Repr
def origin : Point := ⟨0.0, 0.0⟩
def myPoint : Point := { x := 3.0, y := 4.0 }
def distance (p : Point) : Float :=
Float.sqrt (p.x * p.x + p.y * p.y)
#eval distance myPoint -- Output: 5.0
Inductive Types
Inductive types allow you to define custom data types by specifying their constructors. An inductive type with no arguments on its constructors represents a finite set of values, like an enumeration. The SpellSchool type below can take exactly eight values (abjuration, conjuration, etc.), and the schoolDanger function assigns a danger rating to each via pattern matching.
When constructors take arguments, the type becomes recursive and can represent unbounded data. MyList α is parameterized by a type α and has two constructors: nil for the empty list and cons for prepending an element. The length function recurses through the structure, counting elements.
inductive SpellSchool where
| abjuration -- Protective magic
| conjuration -- Summoning things from elsewhere
| divination -- Knowing things you shouldn't
| enchantment -- Making friends (involuntarily)
| evocation -- Fireballs, obviously
| illusion -- Lying, but with magic
| necromancy -- Asking corpses for favors
| transmutation -- Turning lead into gold (or frogs)
deriving Repr, DecidableEq
def schoolDanger : SpellSchool → Nat
| .abjuration => 1
| .divination => 2
| .illusion => 3
| .transmutation => 4
| .enchantment => 5
| .conjuration => 6
| .evocation => 8 -- Fireballs in enclosed spaces
| .necromancy => 9 -- Ethical concerns
inductive MyList (α : Type) where
| nil : MyList α
| cons : α → MyList α → MyList α
def MyList.length {α : Type} : MyList α → Nat
| MyList.nil => 0
| MyList.cons _ tail => 1 + tail.length
-- Creating values and using pattern-matching functions
def ev : SpellSchool := SpellSchool.evocation
#eval schoolDanger ev -- 8
#eval schoolDanger .necromancy -- 9 (dot notation shorthand)
def myNumbers : MyList Nat := .cons 1 (.cons 2 (.cons 3 .nil))
#eval myNumbers.length -- 3
Subtypes
Subtypes refine an existing type with a predicate. The value carries both the data and a proof that the predicate holds. This is where dependent types begin to flex: instead of checking at runtime whether a number is positive, you encode positivity in the type itself. The predicate becomes part of the contract, enforced at compile time.
-- Subtypes refine a type with a predicate
def Positive := { n : Nat // n > 0 }
def five : Positive := ⟨5, by decide⟩
-- Accessing subtype values
#eval five.val -- 5 (the underlying Nat)
-- five.property is a proof that 5 > 0
-- Common subtypes
def NonEmptyList (α : Type) := { xs : List α // xs ≠ [] }
def exampleNEL : NonEmptyList Nat :=
⟨[1, 2, 3], by decide⟩
-- Safe head function for non-empty lists
def safeHead {α : Type} [Inhabited α] (nel : NonEmptyList α) : α :=
match nel.val with
| x :: _ => x
| [] => default -- unreachable due to property, but needed for totality
#eval safeHead exampleNEL -- 1
Fin
Fin n represents natural numbers strictly less than n. The type carries a proof that its value is in bounds, making it useful for safe array indexing.
-- Fin n is the type of natural numbers less than n
def smallNum : Fin 5 := 3 -- 3 is less than 5
def anotherSmall : Fin 10 := 7 -- 7 is less than 10
-- Fin values carry a proof that they're in bounds
#eval (smallNum : Fin 5).val -- 3 (the underlying Nat)
-- Useful for array indexing with guaranteed bounds
def safeIndex {α : Type} (arr : Array α) (i : Fin arr.size) : α :=
arr[i]
-- Fin arithmetic wraps around
#eval (3 : Fin 5) + 4 -- 2 (wraps: 7 mod 5 = 2)
Tip
Notice that Fin n bundles a value with a proof about that value. This pattern appears everywhere in Lean: types can contain proofs. This is not a special feature but a consequence of Lean occupying λC, the most expressive corner of the lambda cube, where types can depend on values. The Curry-Howard correspondence makes propositions into types and proofs into values.
If you are worried about runtime overhead: proofs are erased at compile time. The compiled code for Fin n carries only the natural number, not the proof. Zero runtime cost. This is proof irrelevance in action: the type checker verifies the proof exists, then discards it. You get compile-time assurance with no runtime penalty.
Type Classes
Type classes provide a way to define generic interfaces that can be implemented for different types.
def showTwice {α : Type} [ToString α] (x : α) : String :=
s!"{x} {x}"
#eval showTwice 42 -- Output: "42 42"
#eval showTwice "hello" -- Output: "hello hello"
#eval showTwice true -- Output: "true true"
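You can also define your own classes. As a sketch, the Describable class and its instances below are illustrative, not part of the standard library:
class Describable (α : Type) where
  describe : α → String

instance : Describable Bool where
  describe b := if b then "affirmative" else "negative"

instance : Describable Nat where
  describe n := s!"the number {n}"

def announce {α : Type} [Describable α] (x : α) : String :=
  s!"Received: {Describable.describe x}"

#eval announce true -- "Received: affirmative"
#eval announce (42 : Nat) -- "Received: the number 42"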
Example: Magic The Gathering
We now have enough Lean to model something from the real world. Naturally, we choose Magic: The Gathering. The game has been proven Turing complete and as hard as the halting problem, so it makes a worthy adversary.
Mana System
Five colors of mana (white, blue, black, red, green) plus colorless. An inductive type captures the colors; a structure captures costs with default values of zero:
inductive ManaColor where
| white | blue | black | red | green | colorless
deriving Repr, DecidableEq
structure ManaCost where
white : Nat := 0
blue : Nat := 0
black : Nat := 0
red : Nat := 0
green : Nat := 0
colorless : Nat := 0
deriving Repr
def ManaCost.total (c : ManaCost) : Nat :=
c.white + c.blue + c.black + c.red + c.green + c.colorless
Players accumulate mana in a pool. The key question: can you afford a spell? The pay function returns Option ManaPool, giving back the remaining mana on success or none if you cannot afford the cost:
structure ManaPool where
white : Nat := 0
blue : Nat := 0
black : Nat := 0
red : Nat := 0
green : Nat := 0
colorless : Nat := 0
deriving Repr
def ManaPool.total (p : ManaPool) : Nat :=
p.white + p.blue + p.black + p.red + p.green + p.colorless
def ManaPool.canAfford (pool : ManaPool) (cost : ManaCost) : Bool :=
pool.white >= cost.white &&
pool.blue >= cost.blue &&
pool.black >= cost.black &&
pool.red >= cost.red &&
pool.green >= cost.green &&
pool.total >= cost.total
def ManaPool.pay (pool : ManaPool) (cost : ManaCost) : Option ManaPool :=
if pool.canAfford cost then
some { white := pool.white - cost.white
blue := pool.blue - cost.blue
black := pool.black - cost.black
red := pool.red - cost.red
green := pool.green - cost.green
colorless := pool.total - cost.total -
(pool.white - cost.white) - (pool.blue - cost.blue) -
(pool.black - cost.black) - (pool.red - cost.red) -
(pool.green - cost.green) }
else
none
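A quick sanity check with an illustrative pool (these values are not from any particular game state):
def examplePool : ManaPool := { red := 2, colorless := 1 }
#eval examplePool.canAfford { red := 1, colorless := 1 } -- true
#eval examplePool.canAfford { blue := 1 } -- false (no blue in the pool)
#eval (examplePool.pay { red := 1, colorless := 1 }).isSome -- true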
Card Types
Cards come in different types. Creatures have power and toughness; instants, sorceries, and artifacts do not. An inductive type with arguments captures this: the creature constructor carries two Nat values, while others carry nothing:
inductive CardType where
| creature (power : Nat) (toughness : Nat)
| instant
| sorcery
| enchantment
| artifact
deriving Repr
structure Card where
name : String
cost : ManaCost
cardType : CardType
deriving Repr
Some iconic cards to work with. Note how .creature 2 2 constructs a creature type inline, using the dot notation shorthand:
def goblinGuide : Card :=
{ name := "Goblin Guide"
cost := { red := 1 }
cardType := .creature 2 2 }
def searingSpear : Card :=
{ name := "Searing Spear"
cost := { red := 1, colorless := 1 }
cardType := .instant }
def dayOfJudgment : Card :=
{ name := "Day of Judgment"
cost := { white := 2, colorless := 2 }
cardType := .sorcery }
def swordOfFire : Card :=
{ name := "Sword of Fire and Ice"
cost := { colorless := 3 }
cardType := .artifact }
def graveTitan : Card :=
{ name := "Grave Titan"
cost := { black := 2, colorless := 4 }
cardType := .creature 6 6 }
Pattern matching extracts information from cards. Querying a non-creature for its power returns none:
def Card.isCreature (c : Card) : Bool :=
match c.cardType with
| .creature _ _ => true
| _ => false
def Card.power (c : Card) : Option Nat :=
match c.cardType with
| .creature p _ => some p
| _ => none
def Card.toughness (c : Card) : Option Nat :=
match c.cardType with
| .creature _ t => some t
| _ => none
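A few spot checks against the cards defined above:
#eval goblinGuide.isCreature -- true
#eval searingSpear.power -- none (instants have no power)
#eval graveTitan.power -- some 6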
Combat
Creatures on the battlefield can attack and block. A Creature structure tracks accumulated damage. When damage meets or exceeds toughness, the creature dies:
structure Creature where
name : String
power : Nat
toughness : Nat
damage : Nat := 0
deriving Repr
def Creature.fromCard (c : Card) : Option Creature :=
match c.cardType with
| .creature p t => some { name := c.name, power := p, toughness := t }
| _ => none
def Creature.isAlive (c : Creature) : Bool :=
c.damage < c.toughness
def Creature.takeDamage (c : Creature) (dmg : Nat) : Creature :=
{ c with damage := c.damage + dmg }
def Creature.canBlock (blocker attacker : Creature) : Bool :=
blocker.isAlive && attacker.isAlive
def combat (attacker blocker : Creature) : Creature × Creature :=
let attackerAfter := attacker.takeDamage blocker.power
let blockerAfter := blocker.takeDamage attacker.power
(attackerAfter, blockerAfter)
Combat is simultaneous: both creatures deal damage at once. Two 2/2 creatures blocking each other results in mutual destruction.
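A sketch of that scenario, with two hypothetical 2/2 creatures:
def bearA : Creature := { name := "Bear A", power := 2, toughness := 2 }
def bearB : Creature := { name := "Bear B", power := 2, toughness := 2 }
#eval (combat bearA bearB).1.isAlive -- false
#eval (combat bearA bearB).2.isAlive -- false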
Hand Management
A hand is a list of cards. List operations let us query it: which cards can we cast with our current mana? How many creatures do we have? What is the total mana cost?
abbrev Hand := List Card
def Hand.playable (hand : Hand) (pool : ManaPool) : List Card :=
hand.filter (fun c => pool.canAfford c.cost)
def Hand.creatures (hand : Hand) : List Card :=
hand.filter Card.isCreature
def Hand.totalCost (hand : Hand) : Nat :=
hand.foldl (fun acc c => acc + c.cost.total) 0
The filter function takes a predicate and keeps matching elements. The foldl function reduces a list to a single value. These are the workhorses of functional programming, and they compose naturally: hand.filter Card.isCreature gives all creatures, hand.playable pool gives everything castable.
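For example, with a small illustrative hand and pool:
def sampleHand : Hand := [goblinGuide, searingSpear, graveTitan]
def samplePool : ManaPool := { red := 1, colorless := 1 }
#eval (sampleHand.playable samplePool).map (·.name) -- ["Goblin Guide", "Searing Spear"]
#eval sampleHand.creatures.map (·.name) -- ["Goblin Guide", "Grave Titan"]
#eval sampleHand.totalCost -- 9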
Tip
Run from the repository: lake exe mtg. The full source is on GitHub.
Beyond the Basics
The data structures covered here handle most everyday programming. For specialized needs, Mathlib and Batteries provide a comprehensive collection. The Mathlib overview catalogs everything available.
- List-like structures
- Sets
- Maps
- Trees
From Data to Control
You now have the building blocks for representing any data your program needs. Next we cover how to work with that data: control flow, recursion, and the patterns that give programs their structure.
Control Flow and Structures
Most languages let you lie to the compiler. Lean does not. There are no statements, only expressions. Every if produces a value. Every match is exhaustive or the compiler complains. Every recursive function must terminate or you must convince the system otherwise. Where imperative languages let you wave your hands, Lean demands you show your work.
Conditionals
Lean’s if-then-else is an expression, not a statement. Every branch must produce a value of the same type, and the entire expression evaluates to that value. There is no void, no implicit fall-through, no forgetting to return. The ternary operator that other languages treat as a curiosity is simply how conditionals work here.
-- If-then-else expressions
def absolute (x : Int) : Int :=
if x < 0 then -x else x
#eval absolute (-5) -- 5
#eval absolute 3 -- 3
-- Nested conditionals
def classifyNumber (n : Int) : String :=
if n < 0 then "negative"
else if n == 0 then "zero"
else "positive"
#eval classifyNumber (-10) -- "negative"
#eval classifyNumber 0 -- "zero"
#eval classifyNumber 42 -- "positive"
-- Conditionals are expressions, not statements
def minValue (a b : Nat) : Nat :=
if a < b then a else b
#eval minValue 3 7 -- 3
Pattern Matching
Pattern matching with match expressions lets you destructure data and handle cases exhaustively. The compiler verifies you have covered every possibility, which eliminates an entire category of bugs that haunt languages where the default case is “do nothing and hope for the best.” You can match on constructors, literals, and multiple values simultaneously.
-- Pattern matching with match
def describeList {α : Type} (xs : List α) : String :=
match xs with
| [] => "empty"
| [_] => "singleton"
| [_, _] => "pair"
| _ => "many elements"
#eval describeList ([] : List Nat) -- "empty"
#eval describeList [1] -- "singleton"
#eval describeList [1, 2] -- "pair"
#eval describeList [1, 2, 3, 4] -- "many elements"
-- Matching on multiple values
def fizzbuzz (n : Nat) : String :=
match n % 3, n % 5 with
| 0, 0 => "FizzBuzz"
| 0, _ => "Fizz"
| _, 0 => "Buzz"
| _, _ => toString n
#eval (List.range 16).map fizzbuzz
-- ["FizzBuzz", "1", "2", "Fizz", "4", "Buzz", ...]
-- Guards in pattern matching
def classifyAge (age : Nat) : String :=
match age with
| 0 => "infant"
| n => if n < 13 then "child"
else if n < 20 then "teenager"
else "adult"
#eval classifyAge 5 -- "child"
#eval classifyAge 15 -- "teenager"
#eval classifyAge 30 -- "adult"
Simple Recursion
Recursive functions are fundamental to functional programming. A function processing a list works through elements one by one, patient and systematic, eventually reaching the empty case and returning upstream with its catch. In Lean, functions that call themselves must be shown to terminate. For simple structural recursion on inductive types like Nat or List, Lean can automatically verify termination.
-- Simple recursion on natural numbers
def factorial : Nat → Nat
| 0 => 1
| n + 1 => (n + 1) * factorial n
#eval factorial 5 -- 120
#eval factorial 10 -- 3628800
-- Recursion on lists
def sum : List Nat → Nat
| [] => 0
| x :: xs => x + sum xs
#eval sum [1, 2, 3, 4, 5] -- 15
-- Computing the length of a list
def length {α : Type} : List α → Nat
| [] => 0
| _ :: xs => 1 + length xs
#eval length [1, 2, 3] -- 3
Tail Recursion
Naive recursion can overflow the stack for large inputs because each recursive call adds a frame. Tail recursion solves this by restructuring the computation so the recursive call is the last operation, allowing the compiler to optimize it into a loop. Scheme mandated tail call optimization in 1975. Most other languages did not, which is why stack traces exist.
-- Naive recursion can overflow the stack for large inputs
-- Tail recursion uses an accumulator to avoid this
-- Tail-recursive factorial
def factorialTR (n : Nat) : Nat :=
let rec go (acc : Nat) : Nat → Nat
| 0 => acc
| k + 1 => go (acc * (k + 1)) k
go 1 n
#eval factorialTR 20 -- 2432902008176640000
-- Tail-recursive sum
def sumTR (xs : List Nat) : Nat :=
let rec go (acc : Nat) : List Nat → Nat
| [] => acc
| x :: rest => go (acc + x) rest
go 0 xs
#eval sumTR (List.range 1000) -- Sum of 0..999
-- Tail-recursive reverse
def reverseTR {α : Type} (xs : List α) : List α :=
let rec go (acc : List α) : List α → List α
| [] => acc
| x :: rest => go (x :: acc) rest
go [] xs
#eval reverseTR [1, 2, 3, 4, 5] -- [5, 4, 3, 2, 1]
Structural Recursion
Lean requires all recursive functions to terminate, which prevents you from accidentally writing infinite loops and passing them off as proofs. For simple cases where the recursive argument structurally decreases, Lean verifies termination automatically. For more complex cases, you must provide termination hints, essentially explaining to the compiler why your clever recursion scheme actually finishes. The termination checker is not easily fooled.
-- Lean requires recursion to be well-founded
-- Structural recursion on decreasing arguments is automatic
-- Merge two sorted lists into one sorted list
def merge : List Nat → List Nat → List Nat
| [], ys => ys
| xs, [] => xs
| x :: xs, y :: ys =>
if x ≤ y then x :: merge xs (y :: ys)
else y :: merge (x :: xs) ys
-- Full merge sort: split at midpoint, recurse, merge
def mergeSort (xs : List Nat) : List Nat :=
if h : xs.length < 2 then xs
else
let mid := xs.length / 2
let left := xs.take mid
let right := xs.drop mid
have hl : left.length < xs.length := by
have h1 : mid < xs.length := Nat.div_lt_self (by omega) (by omega)
have h2 : left.length ≤ mid := List.length_take_le mid xs
omega
have hr : right.length < xs.length := by
simp only [List.length_drop, right, mid]
have : mid > 0 := Nat.div_pos (by omega) (by omega)
omega
merge (mergeSort left) (mergeSort right)
termination_by xs.length
#eval mergeSort [3, 1, 4, 1, 5, 9, 2, 6] -- [1, 1, 2, 3, 4, 5, 6, 9]
The merge function is structurally recursive: each call operates on a smaller list. The mergeSort function is trickier. It splits the list at the midpoint and recurses on both halves. Lean cannot immediately see that take and drop produce shorter lists, so we provide have clauses that prove the lengths decrease. The termination_by xs.length annotation tells Lean to measure termination by list length rather than structural decrease.
Escape Hatches: partial and do
Sometimes you just want to compute something. The termination checker is a feature, not a prison. When proving termination would require more ceremony than the problem warrants, Lean provides escape hatches.
The partial keyword marks a function that might not terminate. Lean skips the termination proof and trusts you. The tradeoff: partial functions cannot be used in proofs since a non-terminating function could “prove” anything. For computation, this is often acceptable.
-- When termination is hard to prove, partial lets you skip the proof
partial def findFixpoint (f : Nat → Nat) (x : Nat) : Nat :=
let y := f x
if y == x then x else findFixpoint f y
#eval findFixpoint (· / 2 + 1) 100 -- 2
-- Sum even Fibonacci numbers below a bound (Project Euler #2)
partial def sumEvenFibsBelow (bound : Nat) : Nat := Id.run do
let mut a := 0
let mut b := 1
let mut sum := 0
while b < bound do
if b % 2 == 0 then
sum := sum + b
let next := a + b
a := b
b := next
return sum
#eval sumEvenFibsBelow 4000000 -- 4613732
The second example uses Id.run do to write imperative-looking code in a pure context. The Id monad is the identity monad, and Id.run extracts the final value. The mut keyword introduces mutable bindings; := reassigns them. Lean compiles this into pure functional operations. The resulting code is referentially transparent, but the syntax is familiar to programmers from imperative backgrounds.
This style shines for algorithms where the functional version would be contorted. Consider Project Euler Problem 2: sum even Fibonacci numbers below four million. The imperative version is direct. The pure functional version would thread accumulators through recursive calls, which is correct but harder to read.
Use partial when exploring, prototyping, or when the termination argument would distract from the actual logic. When you need to prove properties about the function, you will need to establish termination. But not everything needs to be a theorem. Sometimes you just want an answer.
Unless
The unless keyword is syntactic sugar for if not. When you find yourself writing if !condition then ..., the negation can obscure intent. With unless, the code reads closer to how you would describe it in English: “unless this is valid, bail out early.”
def validatePositive (n : Int) : IO (Option Int) := do
unless n > 0 do
return none
return some n
#eval validatePositive 5 -- some 5
#eval validatePositive (-3) -- none
def processIfValid (values : List Int) : IO Unit := do
for v in values do
unless v >= 0 do
continue
IO.println s!"Processing: {v}"
The unless keyword works in do blocks as a guard clause. Combined with continue, it provides a clean way to skip iterations that fail some condition without nesting the rest of the loop body inside an if.
For Loops
For loops iterate over anything with a ForIn instance. Lists, arrays, ranges, and custom types all work uniformly. The syntax for x in collection do binds x to each element in turn.
def sumList (xs : List Nat) : Nat := Id.run do
let mut total := 0
for x in xs do
total := total + x
return total
#eval sumList [1, 2, 3, 4, 5] -- 15
def sumRange (n : Nat) : Nat := Id.run do
let mut total := 0
for i in [0:n] do
total := total + i
return total
#eval sumRange 10 -- 45
def sumEvens (n : Nat) : Nat := Id.run do
let mut total := 0
for i in [0:n:2] do
total := total + i
return total
#eval sumEvens 10 -- 20
def findMax (arr : Array Int) : Option Int := Id.run do
if arr.isEmpty then return none
let mut maxVal := arr[0]!
for x in arr do
if x > maxVal then maxVal := x
return some maxVal
#eval findMax #[3, 1, 4, 1, 5, 9, 2, 6] -- some 9
The range syntax [start:stop] generates numbers from start up to but not including stop. Add a third component [start:stop:step] to control the increment. Like Python ranges, these are exclusive on the right. Unlike C loops, there is no opportunity to mess up the bounds.
While Loops
While loops repeat until their condition becomes false. They work within do blocks using Id.run do for pure computation or directly in IO for effectful operations.
def countdownFrom (n : Nat) : List Nat := Id.run do
let mut result : List Nat := []
let mut i := n
while i > 0 do
result := i :: result
i := i - 1
return result.reverse
#eval countdownFrom 5 -- [5, 4, 3, 2, 1]
def gcd (a b : Nat) : Nat := Id.run do
let mut x := a
let mut y := b
while y != 0 do
let temp := y
y := x % y
x := temp
return x
#eval gcd 48 18 -- 6
#eval gcd 17 13 -- 1
The while true do pattern with early return handles cases where the exit condition is easier to express as “stop when” rather than “continue while.” The GCD example uses the standard Euclidean algorithm, which terminates because the remainder strictly decreases.
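A small sketch of that pattern (the function name is made up for illustration):
def firstPowerOfTwoAbove (n : Nat) : Nat := Id.run do
  let mut p := 1
  while true do
    if p > n then return p
    p := p * 2
  return p
#eval firstPowerOfTwoAbove 10 -- 16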
Break and Continue
Lean’s continue skips to the next iteration; early return serves as break by exiting the entire function. There is no dedicated break keyword because the do notation’s early return provides the same control flow with clearer semantics.
def findFirst (xs : List Nat) (pred : Nat → Bool) : Option Nat := Id.run do
for x in xs do
if pred x then return some x
return none
#eval findFirst [1, 2, 3, 4, 5] (· > 3) -- some 4
def sumPositives (xs : List Int) : Int := Id.run do
let mut total : Int := 0
for x in xs do
if x <= 0 then continue
total := total + x
return total
#eval sumPositives [1, -2, 3, -4, 5] -- 9
def findInMatrix (m : List (List Nat)) (target : Nat) : Option (Nat × Nat) := Id.run do
let mut i := 0
for row in m do
let mut j := 0
for val in row do
if val == target then return some (i, j)
j := j + 1
i := i + 1
return none
#eval findInMatrix [[1,2,3], [4,5,6], [7,8,9]] 5 -- some (1, 1)
In nested loops, early return exits all the way out, which is usually what you want when searching. If you need to break only the inner loop while continuing the outer, restructure into separate functions.
Mutable State
The let mut syntax introduces mutable bindings within do blocks. Assignment uses :=, and the compiler tracks that mutations happen only within the do block’s scope. Under the hood, Lean transforms this into pure functional code by threading state through the computation.
def imperative_factorial (n : Nat) : Nat := Id.run do
let mut result := 1
let mut i := n
while i > 0 do
result := result * i
i := i - 1
return result
#eval imperative_factorial 5 -- 120
def fibPair (n : Nat) : Nat × Nat := Id.run do
let mut a := 0
let mut b := 1
for _ in [0:n] do
let temp := a + b
a := b
b := temp
return (a, b)
#eval fibPair 10 -- (55, 89)
def buildReversed {α : Type} (xs : List α) : List α := Id.run do
let mut result : List α := []
for x in xs do
result := x :: result
return result
#eval buildReversed [1, 2, 3, 4] -- [4, 3, 2, 1]
def demonstrate_assignment : Nat := Id.run do
let mut x := 10
x := x + 5
x := x * 2
return x
#eval demonstrate_assignment -- 30
The Id.run do wrapper extracts the pure result from what looks like imperative code. The Id monad is the identity monad: it adds no effects, just provides the syntax. This pattern shines when the algorithm is naturally stateful but the result is pure.
Structures
Structures group related data with named fields. If you have used records in ML, structs in Rust, or data classes in Kotlin, the concept is familiar. Unlike C structs, Lean structures come with automatically generated accessor functions, projection notation, and none of the memory layout anxiety.
-- Structures group related data with named fields
structure Point where
x : Float
y : Float
deriving Repr
-- Creating structure instances
def origin : Point := { x := 0.0, y := 0.0 }
def point1 : Point := Point.mk 3.0 4.0
def point2 : Point := ⟨1.0, 2.0⟩
-- Accessing fields
#eval point1.x -- 3.0
#eval point1.y -- 4.0
-- Functions on structures
def distance (p : Point) : Float :=
Float.sqrt (p.x * p.x + p.y * p.y)
#eval distance point1 -- 5.0
The deriving Repr clause automatically generates a Repr instance, which lets #eval display the structure’s contents. Without it, Lean would not know how to print a Point. Other commonly derived instances include BEq for equality comparison with ==, Hashable for use in hash maps, and DecidableEq for propositional equality that can be checked at runtime. You can derive multiple instances by listing them: deriving Repr, BEq, Hashable. The Polymorphism article covers this in more detail.
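As a sketch, a structure deriving several instances at once (GridCell is an illustrative name, not from the standard library):
structure GridCell where
  row : Nat
  col : Nat
  deriving Repr, BEq, Hashable

#eval ({ row := 1, col := 2 } : GridCell) == { row := 1, col := 2 } -- true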
Updating Structures
Rather than modifying structures in place, Lean provides the with syntax for creating new structures based on existing ones with some fields changed. This functional update pattern means you never have to wonder whether some other part of the code is holding a reference to your data and will be surprised by your mutations.
-- Updating structures with "with" syntax
def moveRight (p : Point) (dx : Float) : Point :=
{ p with x := p.x + dx }
def moveUp (p : Point) (dy : Float) : Point :=
{ p with y := p.y + dy }
#eval moveRight origin 5.0 -- { x := 5.0, y := 0.0 }
-- Multiple field updates
def translate (p : Point) (dx dy : Float) : Point :=
{ p with x := p.x + dx, y := p.y + dy }
#eval translate origin 3.0 4.0 -- { x := 3.0, y := 4.0 }
Nested Structures
Structures can contain other structures, allowing you to build complex data hierarchies. Lean’s projection notation makes accessing nested fields readable: person.address.city works as you would hope, without the verbose getter chains of enterprise Java.
-- Nested structures
structure Rectangle where
topLeft : Point
bottomRight : Point
deriving Repr
def myRect : Rectangle := {
topLeft := { x := 0.0, y := 10.0 },
bottomRight := { x := 10.0, y := 0.0 }
}
def width (r : Rectangle) : Float :=
r.bottomRight.x - r.topLeft.x
def height (r : Rectangle) : Float :=
r.topLeft.y - r.bottomRight.y
def area (r : Rectangle) : Float :=
width r * height r
#eval area myRect -- 100.0
Default Values
Structures can have default values for their fields, making it easy to create instances with sensible defaults while overriding only specific fields.
-- Structures with default values
structure Config where
host : String := "localhost"
port : Nat := 8080
debug : Bool := false
deriving Repr
-- Use defaults
def defaultConfig : Config := {}
-- Override some defaults
def prodConfig : Config := { host := "api.example.com", port := 443 }
#eval defaultConfig -- { host := "localhost", port := 8080, debug := false }
#eval prodConfig -- { host := "api.example.com", port := 443, debug := false }
Inductive Types: Enumerations
Inductive types let you define custom data types by specifying their constructors. This is the core mechanism that makes Lean’s type system expressive: natural numbers, lists, trees, and abstract syntax all emerge from the same primitive. Simple enumerations have constructors with no arguments; more complex types carry data in each variant. Like Starfleet’s ship classification system, each variant is distinct and the compiler ensures you handle them all.
-- Simple enumerations
inductive Direction where
| north
| south
| east
| west
deriving Repr, DecidableEq
def opposite : Direction → Direction
| .north => .south
| .south => .north
| .east => .west
| .west => .east
#eval opposite Direction.north -- Direction.south
-- Starfleet vessel classes
inductive StarshipClass where
| galaxy -- Galaxy-class (Enterprise-D)
| sovereign -- Sovereign-class (Enterprise-E)
| defiant -- Defiant-class (compact warship)
| intrepid -- Intrepid-class (Voyager)
| constitution -- Constitution-class (original Enterprise)
deriving Repr, DecidableEq
def crewComplement : StarshipClass → Nat
| .galaxy => 1014 -- Families welcome
| .sovereign => 855 -- More tactical
| .defiant => 50 -- Tough little ship
| .intrepid => 141 -- Long-range science
| .constitution => 430 -- The classic
#eval crewComplement StarshipClass.defiant -- 50
-- Enums with associated data (MTG spell types)
inductive Spell where
| creature (power : Nat) (toughness : Nat) (manaCost : Nat)
| instant (manaCost : Nat)
| sorcery (manaCost : Nat)
| enchantment (manaCost : Nat) (isAura : Bool)
deriving Repr
def manaCost : Spell → Nat
| .creature _ _ cost => cost
| .instant cost => cost
| .sorcery cost => cost
| .enchantment cost _ => cost
def canBlock : Spell → Bool
| .creature _ toughness _ => toughness > 0
| _ => false
#eval manaCost (Spell.creature 3 3 4) -- 4 (e.g., a 3/3 for 4 mana)
#eval manaCost (Spell.instant 2) -- 2 (e.g., Counterspell)
#eval canBlock (Spell.creature 2 1 1) -- true
#eval canBlock (Spell.enchantment 3 true) -- false
Recursive Inductive Types
Inductive types can be recursive, allowing you to define trees, linked lists, and other self-referential structures. This is where inductive types earn their name: you define larger values in terms of smaller ones, and the recursion has a base case that grounds the whole edifice.
-- Recursive inductive types
inductive BinaryTree (α : Type) where
| leaf : BinaryTree α
| node : α → BinaryTree α → BinaryTree α → BinaryTree α
deriving Repr
-- Building trees
def exampleTree : BinaryTree Nat :=
.node 1
(.node 2 .leaf .leaf)
(.node 3
(.node 4 .leaf .leaf)
.leaf)
-- Counting nodes
def BinaryTree.size {α : Type} : BinaryTree α → Nat
| .leaf => 0
| .node _ left right => 1 + left.size + right.size
#eval exampleTree.size -- 4
-- Computing depth
def BinaryTree.depth {α : Type} : BinaryTree α → Nat
| .leaf => 0
| .node _ left right => 1 + max left.depth right.depth
#eval exampleTree.depth -- 3
-- In-order traversal
def BinaryTree.inorder {α : Type} : BinaryTree α → List α
| .leaf => []
| .node x left right => left.inorder ++ [x] ++ right.inorder
#eval exampleTree.inorder -- [2, 1, 4, 3]
Reading constructor type signatures takes practice. The node constructor has type α → BinaryTree α → BinaryTree α → BinaryTree α. In any arrow chain A → B → C → D, the last type is the return type; everything before is an input. So node takes a value of type α, a left subtree, a right subtree, and produces a tree. The leaf constructor takes no arguments and represents an empty position where the tree ends.
Parameterized Types
Inductive types can be parameterized, making them generic over the types they contain. This is how you write a List α that works for any element type, or an expression tree parameterized by its literal type. One definition, infinitely many instantiations.
-- Expression trees parameterized by the literal type
inductive Expr (α : Type) where
| lit : α → Expr α
| add : Expr α → Expr α → Expr α
| mul : Expr α → Expr α → Expr α
deriving Repr
-- Evaluate for any type with Add and Mul instances
def Expr.eval {α : Type} [Add α] [Mul α] : Expr α → α
| .lit n => n
| .add e1 e2 => e1.eval + e2.eval
| .mul e1 e2 => e1.eval * e2.eval
-- Integer expression: (2 + 3) * 4
def intExpr : Expr Int := .mul (.add (.lit 2) (.lit 3)) (.lit 4)
#eval intExpr.eval -- 20
-- Float expression: (1.5 + 2.5) * 3.0
def floatExpr : Expr Float := .mul (.add (.lit 1.5) (.lit 2.5)) (.lit 3.0)
#eval floatExpr.eval -- 12.0
-- Map a function over all literals
def Expr.map {α β : Type} (f : α → β) : Expr α → Expr β
| .lit n => .lit (f n)
| .add e1 e2 => .add (e1.map f) (e2.map f)
| .mul e1 e2 => .mul (e1.map f) (e2.map f)
-- Convert int expression to float
def floatFromInt : Expr Float := intExpr.map (fun n => Float.ofInt n)
#eval floatFromInt.eval -- 20.0
Mutual Recursion
Sometimes you need multiple definitions that refer to each other, like even and odd functions that call each other, or a parser and its sub-parsers. Lean supports mutually recursive definitions within a mutual block. The termination checker handles these jointly, so your circular definitions must still demonstrably finish.
-- Mutually recursive definitions
mutual
def isEven : Nat → Bool
| 0 => true
| n + 1 => isOdd n
def isOdd : Nat → Bool
| 0 => false
| n + 1 => isEven n
end
#eval isEven 10 -- true
#eval isOdd 7 -- true
-- Mutually recursive types
mutual
inductive Tree (α : Type) where
| node : α → Forest α → Tree α
inductive Forest (α : Type) where
| nil : Forest α
| cons : Tree α → Forest α → Forest α
end
-- Example forest
def exampleForest : Forest Nat :=
.cons (.node 1 .nil)
(.cons (.node 2 (.cons (.node 3 .nil) .nil))
.nil)
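Functions over mutually recursive types are themselves defined mutually. As a sketch, counting the nodes in a forest (the size functions are illustrative, not from the standard library):
mutual
  def Tree.size {α : Type} : Tree α → Nat
    | .node _ f => 1 + Forest.size f
  def Forest.size {α : Type} : Forest α → Nat
    | .nil => 0
    | .cons t rest => Tree.size t + Forest.size rest
end

#eval Forest.size exampleForest -- 3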
FizzBuzz
FizzBuzz is the canonical “can you actually program” interview question, famous for filtering candidates who cannot write a loop. Pattern matching on multiple conditions makes it elegant: match on whether divisible by 3, whether divisible by 5, and the four cases fall out naturally.
def fizzbuzz' (n : Nat) : String :=
match n % 3 == 0, n % 5 == 0 with
| true, true => "FizzBuzz"
| true, false => "Fizz"
| false, true => "Buzz"
| false, false => toString n
def runFizzbuzz (limit : Nat) : List String :=
(List.range limit).map fun i => fizzbuzz' (i + 1)
#eval runFizzbuzz 15
Tip
Run from the repository:
lake exe fizzbuzz 20
The Collatz Conjecture
The Collatz conjecture states that repeatedly applying a simple rule (halve if even, triple and add one if odd) eventually reaches 1 for any positive starting integer. Proposed in 1937, it remains unproven. Mathematicians have verified it for numbers up to \(2^{68}\), yet no one can prove it always works. Erdős said “Mathematics is not yet ready for such problems.”
The recursion here needs fuel (a maximum step count) because we cannot prove termination. If we could, we would have solved a famous open problem.
/-- The Collatz conjecture: every positive integer eventually reaches 1.
Unproven since 1937, but we can at least watch the journey. -/
def collatzStep (n : Nat) : Nat :=
if n % 2 == 0 then n / 2 else 3 * n + 1
def collatzSequence (n : Nat) (fuel : Nat := 1000) : List Nat :=
match fuel with
| 0 => [n] -- give up, though Collatz would be disappointed
| fuel' + 1 =>
if n <= 1 then [n]
else n :: collatzSequence (collatzStep n) fuel'
def collatzLength (n : Nat) : Nat :=
(collatzSequence n).length
-- The famous 27: takes 111 steps and peaks at 9232
#eval collatzSequence 27
#eval collatzLength 27
-- Find the longest sequence for starting values 1 to n
def longestCollatz (n : Nat) : Nat × Nat :=
(List.range n).map (· + 1)
|>.map (fun k => (k, collatzLength k))
|>.foldl (fun acc pair => if pair.2 > acc.2 then pair else acc) (1, 1)
#eval longestCollatz 100 -- (97, 119)
Tip
Run from the repository:
lake exe collatz 27
Role Playing Game Example
The constructs above combine naturally in larger programs. What better way to demonstrate this than modeling the “real world”? We will build a Dungeons & Dragons character generator. D&D is a tabletop role-playing game where players create characters with ability scores, races, and classes, then embark on adventures guided by dice rolls and a referee called the Dungeon Master. The game has been running since 1974, which makes it older than most programming languages and considerably more fun than COBOL.
Structures hold character data, inductive types represent races and classes, pattern matching computes racial bonuses, and recursion drives the dice-rolling simulation.
We start by defining the data types that model our domain. The AbilityScores structure bundles the six core abilities. Inductive types enumerate races, classes, and backgrounds. The Character structure ties everything together:
structure AbilityScores where
strength : Nat
dexterity : Nat
constitution : Nat
intelligence : Nat
wisdom : Nat
charisma : Nat
deriving Repr
inductive Race where
| human
| elf
| dwarf
| halfling
| dragonborn
| tiefling
deriving Repr, DecidableEq
inductive CharClass where
| fighter -- d10 hit die
| wizard -- d6 hit die
| rogue -- d8 hit die
| cleric -- d8 hit die
| barbarian -- d12 hit die
| bard -- d8 hit die
deriving Repr, DecidableEq
inductive Background where
| soldier
| sage
| criminal
| noble
| hermit
| entertainer
deriving Repr, DecidableEq
structure Character where
name : String
race : Race
charClass : CharClass
background : Background
level : Nat
abilities : AbilityScores
hitPoints : Nat
deriving Repr
Racial bonuses modify ability scores based on the character’s race. Pattern matching maps each race to its specific bonuses. Humans get +1 to everything; elves favor dexterity and intelligence; dwarves are sturdy:
def racialBonuses (r : Race) : AbilityScores :=
match r with
| .human => { strength := 1, dexterity := 1, constitution := 1,
intelligence := 1, wisdom := 1, charisma := 1 }
| .elf => { strength := 0, dexterity := 2, constitution := 0,
intelligence := 1, wisdom := 0, charisma := 0 }
| .dwarf => { strength := 0, dexterity := 0, constitution := 2,
intelligence := 0, wisdom := 1, charisma := 0 }
| .halfling => { strength := 0, dexterity := 2, constitution := 0,
intelligence := 0, wisdom := 0, charisma := 1 }
| .dragonborn => { strength := 2, dexterity := 0, constitution := 0,
intelligence := 0, wisdom := 0, charisma := 1 }
| .tiefling => { strength := 0, dexterity := 0, constitution := 0,
intelligence := 1, wisdom := 0, charisma := 2 }
Each character class has a different hit die, representing how much health they gain per level. Wizards are fragile with a d6, while barbarians are tanks with a d12:
def hitDie (c : CharClass) : Nat :=
match c with
| .wizard => 6
| .rogue => 8
| .bard => 8
| .cleric => 8
| .fighter => 10
| .barbarian => 12
Random character generation needs randomness. A linear congruential generator provides deterministic pseudo-random numbers, which means the same seed produces the same character:
structure RNG where
state : Nat
deriving Repr
def RNG.next (rng : RNG) : RNG × Nat :=
let newState := (rng.state * 1103515245 + 12345) % (2^31)
(⟨newState⟩, newState)
def RNG.range (rng : RNG) (lo hi : Nat) : RNG × Nat :=
let (rng', n) := rng.next
let range := hi - lo + 1
(rng', lo + n % range)
D&D ability scores use 4d6-drop-lowest: roll four six-sided dice and discard the lowest. This generates scores between 3 and 18, heavily weighted toward the middle. We thread the RNG state through each dice roll to maintain determinism:
def roll4d6DropLowest (rng : RNG) : RNG × Nat :=
let (rng1, d1) := rng.range 1 6
let (rng2, d2) := rng1.range 1 6
let (rng3, d3) := rng2.range 1 6
let (rng4, d4) := rng3.range 1 6
let dice := [d1, d2, d3, d4]
let sorted := dice.toArray.qsort (· < ·) |>.toList
let top3 := sorted.drop 1
(rng4, top3.foldl (· + ·) 0)
def rollAbilityScores (rng : RNG) : RNG × AbilityScores :=
let (rng1, str) := roll4d6DropLowest rng
let (rng2, dex) := roll4d6DropLowest rng1
let (rng3, con) := roll4d6DropLowest rng2
let (rng4, int) := roll4d6DropLowest rng3
let (rng5, wis) := roll4d6DropLowest rng4
let (rng6, cha) := roll4d6DropLowest rng5
(rng6, { strength := str, dexterity := dex, constitution := con,
intelligence := int, wisdom := wis, charisma := cha })
Character generation threads the RNG through multiple operations: pick a name, race, class, and background, roll ability scores, apply racial bonuses, and calculate starting hit points:
def applyRacialBonuses (base : AbilityScores) (race : Race) : AbilityScores :=
let bonus := racialBonuses race
{ strength := base.strength + bonus.strength
dexterity := base.dexterity + bonus.dexterity
constitution := base.constitution + bonus.constitution
intelligence := base.intelligence + bonus.intelligence
wisdom := base.wisdom + bonus.wisdom
charisma := base.charisma + bonus.charisma }
def abilityModifier (score : Nat) : Int :=
(score : Int) / 2 - 5
def startingHP (con : Nat) (c : CharClass) : Nat :=
let conMod := abilityModifier con
let baseHP := hitDie c
if conMod < 0 && baseHP < conMod.natAbs then 1
else (baseHP : Int) + conMod |>.toNat
def pickRandom {α : Type} (rng : RNG) (xs : List α) (default : α) : RNG × α :=
match xs with
| [] => (rng, default)
| _ =>
let (rng', idx) := rng.range 0 (xs.length - 1)
(rng', xs[idx]?.getD default)
def allRaces : List Race := [.human, .elf, .dwarf, .halfling, .dragonborn, .tiefling]
def allClasses : List CharClass := [.fighter, .wizard, .rogue, .cleric, .barbarian, .bard]
def allBackgrounds : List Background := [.soldier, .sage, .criminal, .noble, .hermit, .entertainer]
-- Names that would make sense at both a D&D table and a PL theory conference
def fantasyNames : List String := [
"Alonzo the Untyped", -- Church, lambda calculus
"Haskell the Memory Guzzler", -- I didn't choose the thunk life, the thunk life chose me
"Dana the Continuous", -- Scott, domain theory
"Thorin Typechecker", -- Tolkien meets compilation
"Edsger the Structured", -- Dijkstra, structured programming
"Kurt the Incomplete" -- Gödel, incompleteness theorems
]
def generateCharacter (seed : Nat) : Character :=
let rng : RNG := ⟨seed⟩
let (rng1, name) := pickRandom rng fantasyNames "Adventurer"
let (rng2, race) := pickRandom rng1 allRaces .human
let (rng3, charClass) := pickRandom rng2 allClasses .fighter
let (rng4, background) := pickRandom rng3 allBackgrounds .soldier
let (_, baseAbilities) := rollAbilityScores rng4
let abilities := applyRacialBonuses baseAbilities race
let hp := startingHP abilities.constitution charClass
{ name := name
race := race
charClass := charClass
background := background
level := 1
abilities := abilities
hitPoints := hp }
Display functions convert internal representations to human-readable strings. The modifier calculation implements D&D’s (score / 2 - 5) formula:
def raceName : Race → String
| .human => "Human"
| .elf => "Elf"
| .dwarf => "Dwarf"
| .halfling => "Halfling"
| .dragonborn => "Dragonborn"
| .tiefling => "Tiefling"
def className : CharClass → String
| .fighter => "Fighter"
| .wizard => "Wizard"
| .rogue => "Rogue"
| .cleric => "Cleric"
| .barbarian => "Barbarian"
| .bard => "Bard"
def backgroundName : Background → String
| .soldier => "Soldier"
| .sage => "Sage"
| .criminal => "Criminal"
| .noble => "Noble"
| .hermit => "Hermit"
| .entertainer => "Entertainer"
def formatModifier (score : Nat) : String :=
let m := abilityModifier score
if m >= 0 then s!"+{m}" else s!"{m}"
def printCharacter (c : Character) : IO Unit := do
IO.println "======================================"
IO.println s!" {c.name}"
IO.println "======================================"
IO.println s!" Level {c.level} {raceName c.race} {className c.charClass}"
IO.println s!" Background: {backgroundName c.background}"
IO.println ""
IO.println " ABILITY SCORES"
IO.println " --------------"
IO.println s!" STR: {c.abilities.strength} ({formatModifier c.abilities.strength})"
IO.println s!" DEX: {c.abilities.dexterity} ({formatModifier c.abilities.dexterity})"
IO.println s!" CON: {c.abilities.constitution} ({formatModifier c.abilities.constitution})"
IO.println s!" INT: {c.abilities.intelligence} ({formatModifier c.abilities.intelligence})"
IO.println s!" WIS: {c.abilities.wisdom} ({formatModifier c.abilities.wisdom})"
IO.println s!" CHA: {c.abilities.charisma} ({formatModifier c.abilities.charisma})"
IO.println ""
IO.println s!" Hit Points: {c.hitPoints}"
IO.println s!" Hit Die: d{hitDie c.charClass}"
IO.println "======================================"
The main function reads a seed from command-line arguments (defaulting to 42), generates a character, and prints it in a formatted sheet:
def main (args : List String) : IO Unit := do
let seed := match args.head? >>= String.toNat? with
| some n => n
| none => 42
IO.println ""
IO.println " D&D CHARACTER GENERATOR"
IO.println " Type-Safe Adventures Await!"
IO.println ""
let character := generateCharacter seed
printCharacter character
IO.println ""
IO.println "Your adventure begins..."
IO.println "(Use a different seed for a different character: lake exe dnd <seed>)"
Try different seeds to generate different characters. The generator uses a deterministic pseudo-random number generator, so the same seed always produces the same character.
Tip
Run from the repository:
lake exe dnd 42. The full source is on GitHub.
Toward Abstraction
With structures and inductive types, you can model complex domains. But real programs need abstraction over types themselves. Polymorphism and type classes let you write code that works for any type satisfying certain constraints. This is how you build generic libraries without sacrificing type safety.
Standard Library and Batteries
Lean’s standard library (Std) provides essential data structures and utilities. The community Batteries package extends it further with additional data structures and algorithms. This chapter surveys the most useful parts of both.
Std Data Structures
The Std namespace contains hash-based and tree-based collections that cover most common use cases.
HashMap
Hash maps provide average \(O(1)\) insertion and lookup. Use them when key order doesn’t matter.
-- HashMap: O(1) average insert/lookup
def hashmapDemo : IO Unit := do
-- Create a HashMap from key-value pairs
let mut scores : Std.HashMap String Nat := {}
scores := scores.insert "alice" 95
scores := scores.insert "bob" 87
scores := scores.insert "carol" 92
-- Lookup with get?
let aliceScore := scores.get? "alice"
IO.println s!"alice: {aliceScore}" -- some 95
-- Check membership
IO.println s!"contains bob: {scores.contains "bob"}" -- true
-- Get with default
let daveScore := scores.getD "dave" 0
IO.println s!"dave (default 0): {daveScore}" -- 0
-- Iterate over entries
IO.println "All scores:"
for (name, score) in scores do
IO.println s!" {name}: {score}"
HashSet
Hash sets efficiently track membership without associated values.
-- HashSet: O(1) average membership testing
def hashsetDemo : IO Unit := do
-- Create a HashSet
let mut seen : Std.HashSet String := {}
seen := seen.insert "apple"
seen := seen.insert "banana"
seen := seen.insert "cherry"
IO.println s!"contains apple: {seen.contains "apple"}" -- true
IO.println s!"contains grape: {seen.contains "grape"}" -- false
IO.println s!"size: {seen.size}" -- 3
-- Set operations
let more : Std.HashSet String := Std.HashSet.ofList ["banana", "date", "elderberry"]
let combined := seen.union more
IO.println s!"union size: {combined.size}" -- 5
TreeMap
Tree maps maintain keys in sorted order with \(O(\log n)\) operations. Use when you need ordered iteration or range queries.
-- TreeMap: ordered map with O(log n) operations
def treemapDemo : IO Unit := do
-- TreeMap keeps keys sorted
let mut prices : Std.TreeMap String Nat := {}
prices := prices.insert "banana" 120
prices := prices.insert "apple" 100
prices := prices.insert "cherry" 300
-- Iteration is in sorted order
IO.println "Prices (sorted by name):"
for (fruit, price) in prices do
IO.println s!" {fruit}: {price}"
-- Size and membership
IO.println s!"size: {prices.size}"
IO.println s!"contains apple: {prices.contains "apple"}"
Time
Basic timing operations are available through IO.
-- Time: dates, times, and durations
def timeDemo : IO Unit := do
-- Get current time
let now ← IO.monoNanosNow
IO.println s!"Monotonic nanoseconds: {now}"
-- Durations
let oneSecond := 1000000000 -- nanoseconds
let elapsed := now % oneSecond
IO.println s!"Nanoseconds into current second: {elapsed}"
-- Heartbeats measure elaboration work, not wall-clock time (see Std.Time for dates)
let heartbeats ← IO.getNumHeartbeats
IO.println s!"Heartbeats: {heartbeats}"
For full date/time handling, the Std.Time module provides DateTime, Duration, and timezone support.
Parsec
The Std.Internal.Parsec module provides parser combinators for building parsers from smaller pieces. It includes character-level parsers (digit, asciiLetter), repetition combinators (many, many1), and string building functions.
-- Parsec: parser combinators for parsing text
open Std.Internal.Parsec String in
def parsecDemo : IO Unit := do
-- Parser for a natural number
let parseNat : Parser Nat := digits
-- Parser for a word (one or more ASCII letters)
let parseWord : Parser String := many1Chars asciiLetter
-- Parser for comma-separated numbers
let parseNumList : Parser (List Nat) := do
let first ← parseNat
let rest ← many (skipChar ',' *> parseNat)
pure (first :: rest.toList)
-- Run parsers
match parseNat.run "12345" with
| .ok n => IO.println s!"Number: {n}" -- 12345
| .error e => IO.println s!"Error: {e}"
match parseWord.run "hello123" with
| .ok s => IO.println s!"Word: {s}" -- "hello"
| .error e => IO.println s!"Error: {e}"
match parseNumList.run "1,23,456" with
| .ok ns => IO.println s!"List: {ns}" -- [1, 23, 456]
| .error e => IO.println s!"Error: {e}"
For more advanced parsing needs, community libraries like lean4-parser and Megaparsec.lean offer additional features.
Batteries
The Batteries package provides additional data structures beyond Std. Add it to your project in lakefile.lean:
require batteries from git
"https://github.com/leanprover-community/batteries" @ "main"
Or in lakefile.toml:
[[require]]
name = "batteries"
scope = "leanprover-community"
rev = "main"
With the dependency in place, the following data structures become available.
BinaryHeap
A priority queue with \(O(\log n)\) insertion and extraction. Useful for scheduling, graph algorithms, and any problem requiring repeated min/max extraction.
-- BinaryHeap: priority queue with O(log n) push/pop
def heapDemo : IO Unit := do
-- Max-heap (largest first with >)
let heap := Batteries.BinaryHeap.empty (α := Nat) (lt := (· > ·))
|>.insert 5
|>.insert 1
|>.insert 3
|>.insert 9
IO.println s!"heap max: {heap.max}" -- some 9
-- Pop the max element
let (maxVal, heap') := heap.extractMax
IO.println s!"extracted: {maxVal}" -- some 9
IO.println s!"new max: {heap'.max}" -- some 5
The comparator determines ordering: (· > ·) for max-heap, (· < ·) for min-heap.
RBMap and RBSet
Red-black tree maps and sets with \(O(\log n)\) operations and ordered iteration. Use when you need sorted keys or efficient range queries.
-- RBMap: ordered map with O(log n) operations
def rbmapDemo : IO Unit := do
-- Create a map with String keys, Nat values
let scores : Batteries.RBMap String Nat compare :=
Batteries.RBMap.empty
|>.insert "alice" 95
|>.insert "bob" 87
|>.insert "carol" 92
-- Lookup
let aliceScore := scores.find? "alice"
let daveScore := scores.find? "dave"
IO.println s!"alice: {aliceScore}" -- some 95
IO.println s!"dave: {daveScore}" -- none
-- Convert to list for display (sorted by key)
IO.println s!"all: {scores.toList}"
Unlike HashMap, iteration order is deterministic (sorted by key).
UnionFind
Disjoint set data structure with near-constant time union and find operations. Essential for Kruskal’s algorithm, connected components, and equivalence class problems.
-- UnionFind: disjoint set with near O(1) union/find
def unionFindDemo : IO Unit := do
-- Create structure and add 5 elements (indices 0-4)
let uf := Batteries.UnionFind.empty
|>.push |>.push |>.push |>.push |>.push
-- Union 0 with 1, and 2 with 3
let uf := uf.union! 0 1
let uf := uf.union! 2 3
-- Check if elements are in same set (same root)
let root0 := uf.rootD 0
let root1 := uf.rootD 1
let root2 := uf.rootD 2
let root4 := uf.rootD 4
IO.println s!"0 and 1 same set: {root0 == root1}" -- true
IO.println s!"0 and 2 same set: {root0 == root2}" -- false
IO.println s!"4 alone: {root4 == 4}" -- true
DList
Difference lists enable \(O(1)\) concatenation by representing lists as functions. Useful when building lists by repeated appending, which would be \(O(n^2)\) with regular lists.
-- DList: difference list with O(1) append
def dlistDemo : IO Unit := do
-- Build a list by repeated appending
-- With regular List, this would be O(n^2), with DList it's O(n)
let d1 : Batteries.DList Nat := Batteries.DList.singleton 1
let d2 := d1.push 2
let d3 := d2.push 3
let d4 := d3.append (Batteries.DList.ofList [4, 5, 6])
-- Convert back to List when done building
let result := d4.toList
IO.println s!"result: {result}" -- [1, 2, 3, 4, 5, 6]
Collection Extensions
Batteries extends List, Array, and String with additional operations.
-- Batteries extends List and Array with useful functions
def batteriesListDemo : IO Unit := do
let nums := [1, 2, 3, 4, 5]
-- Chunking
let chunks := nums.toChunks 2
IO.println s!"toChunks 2: {chunks}" -- [[1, 2], [3, 4], [5]]
-- Rotating
let rotL := nums.rotateLeft 2
let rotR := nums.rotateRight 2
IO.println s!"rotateLeft 2: {rotL}" -- [3, 4, 5, 1, 2]
IO.println s!"rotateRight 2: {rotR}" -- [4, 5, 1, 2, 3]
-- Partitioning
let (small, big) := nums.partition (· ≤ 3)
IO.println s!"partition: small={small}, big={big}"
-- Erasure (remove first occurrence)
let erased := nums.erase 3
IO.println s!"erase 3: {erased}" -- [1, 2, 4, 5]
Other useful additions include List.enum (pairs elements with indices), Array.swap (exchange two elements), and various String utilities.
IO Operations
The IO monad handles all side effects. The Effects chapter covers monads in depth; here we focus on practical operations.
-- Basic IO operations
def ioDemo : IO Unit := do
-- Print to stdout
IO.println "Hello, world!"
IO.print "No newline"
IO.println "" -- newline
-- Environment variables
let path ← IO.getEnv "PATH"
IO.println s!"PATH exists: {path.isSome}"
-- Monotonic clock in milliseconds (not wall-clock epoch time)
let now ← IO.monoMsNow
IO.println s!"Timestamp: {now}"
-- Random numbers
let rand ← IO.rand 1 100
IO.println s!"Random 1-100: {rand}"
Files and Directories
-- File operations
def fileDemo : IO Unit := do
let testFile := "test_output.txt"
-- Write to file
IO.FS.writeFile testFile "Line 1\nLine 2\nLine 3\n"
-- Read entire file
let contents ← IO.FS.readFile testFile
IO.println s!"Contents:\n{contents}"
-- Read as lines
let lines ← IO.FS.lines testFile
IO.println s!"Number of lines: {lines.size}"
for line in lines do
IO.println s!" > {line}"
-- Clean up
IO.FS.removeFile testFile
-- Directory operations
def directoryDemo : IO Unit := do
-- Current working directory
let cwd ← IO.currentDir
IO.println s!"Current directory: {cwd}"
-- List directory contents
let entries ← System.FilePath.readDir "."
IO.println s!"Files in current directory: {entries.size}"
-- Print first few entries
for entry in entries.toList.take 5 do
IO.println s!" {entry.fileName}"
External Processes
-- Running external processes
def processDemo : IO Unit := do
-- Run a command and capture output
let output ← IO.Process.output {
cmd := "echo"
args := #["Hello from subprocess"]
}
IO.println s!"Exit code: {output.exitCode}"
IO.println s!"stdout: {output.stdout}"
-- Check if command succeeded
if output.exitCode == 0 then
IO.println "Command succeeded"
else
IO.println s!"Command failed: {output.stderr}"
Finding Packages
Reservoir indexes the Lean package ecosystem. Notable packages:
- mathlib4: Comprehensive mathematics library
- aesop: Proof automation via best-first search
- lean4-cli: Command-line argument parsing
- Qq: Quoted expressions for metaprogramming
- ProofWidgets: Interactive proof visualization
Practical Example
A word frequency counter combining HashMap, String operations, and list processing:
-- Practical example: word frequency counter
def countWords (text : String) : Std.HashMap String Nat :=
let words := text.toLower.splitOn " "
|>.map (fun w => w.toList.filter Char.isAlpha |> String.ofList)
|>.filter (!·.isEmpty)
words.foldl (fun map word =>
map.insert word ((map.get? word).getD 0 + 1)
) {}
def wordFreqDemo : IO Unit := do
let text := "The quick brown fox jumps over the lazy dog The dog barks"
let freq := countWords text
IO.println "Word frequencies:"
let sorted := freq.toList.toArray
|>.qsort (fun a b => a.2 > b.2)
for (word, count) in sorted do
IO.println s!" {word}: {count}"
Polymorphism and Type Classes
On September 23, 1999, the Mars Climate Orbiter disintegrated in the Martian atmosphere because one piece of software produced thrust data in pound-force seconds while another expected newton-seconds. A 327 million dollar spacecraft, destroyed by a unit conversion error that any undergraduate physics student could catch. The software worked. The math was correct. The types simply failed to express the constraints that mattered.
This is what motivates the machinery in this article. Type classes and phantom types let you encode dimensional constraints directly in the type system. You cannot add meters to seconds because the compiler will not let you. You cannot pass thrust in the wrong units because the function signature forbids it. These constraints cost nothing at runtime, they compile away completely, but they catch at compile time the very class of error that destroyed a spacecraft. By the end of this article, you will see how to build such a system yourself.
But safety is only half the story. Polymorphism is also about not writing the same code twice. A sorting algorithm should not care whether it sorts integers, strings, or financial instruments. A data structure should not be rewritten for each type it might contain. The alternative is copying code and changing types by hand, which is how bugs are born and how programmers lose their minds. Polymorphism is the machinery that makes abstraction possible without sacrificing type safety.
In 1967, Christopher Strachey drew a distinction that would shape programming languages for decades: parametric polymorphism, where code works uniformly for all types, versus ad-hoc polymorphism, where the behavior changes based on the specific type involved. The first gives you reverse : List α → List α, a function blissfully ignorant of what the list contains. The second gives you +, which does quite different things to integers, floats, and matrices. Lean provides both, unified under a type class system that traces its lineage back to the 1989 paper How to Make Ad-Hoc Polymorphism Less Ad Hoc. The result is generic code that is simultaneously flexible and precise.
Parametric Polymorphism
Functions can take type parameters, allowing them to work with any type without knowing or caring what that type is. The function length : List α → Nat counts elements whether they are integers, strings, or proof terms. The curly braces indicate implicit arguments that Lean infers automatically, sparing you the tedium of writing length (α := Int) myList everywhere.
def identity {α : Type} (x : α) : α := x
#eval identity 42 -- 42
#eval identity "hello" -- "hello"
#eval identity [1, 2, 3] -- [1, 2, 3]
def compose {α β γ : Type} (g : β → γ) (f : α → β) : α → γ :=
fun x => g (f x)
def addOne (x : Nat) : Nat := x + 1
def double (x : Nat) : Nat := x * 2
#eval compose double addOne 5 -- 12
def flip {α β γ : Type} (f : α → β → γ) : β → α → γ :=
fun b a => f a b
#eval flip Nat.sub 3 10 -- 7
Polymorphic Data Types
Data types can also be parameterized, creating generic containers that work with any element type. You define List α once and get lists of integers, lists of strings, and lists of lists for free. The alternative, writing IntList, StringList, and ListOfLists separately, is how Java programmers spent the 1990s.
def Pair (α β : Type) := α × β
def makePair {α β : Type} (a : α) (b : β) : Pair α β := (a, b)
#eval makePair 1 "one" -- (1, "one")
inductive Either (α β : Type) where
| left : α → Either α β
| right : β → Either α β
deriving Repr
def mapEither {α β γ : Type} (f : β → γ) : Either α β → Either α γ
| .left a => .left a
| .right b => .right (f b)
#eval mapEither (· + 1) (Either.right 5 : Either String Nat)
Implicit Arguments
Implicit arguments in curly braces are inferred by Lean from context. When inference fails or you want to override it, the @ prefix makes everything explicit. This escape hatch is rarely needed, but when you need it, you really need it.
def listLength {α : Type} (xs : List α) : Nat :=
match xs with
| [] => 0
| _ :: rest => 1 + listLength rest
#eval listLength [1, 2, 3] -- 3
#eval listLength ["a", "b"] -- 2
#eval @listLength Nat [1, 2, 3] -- explicit type argument
def firstOrDefault {α : Type} (xs : List α) (default : α) : α :=
match xs with
| [] => default
| x :: _ => x
#eval firstOrDefault [1, 2, 3] 0 -- 1
#eval firstOrDefault ([] : List Nat) 0 -- 0
Instance Arguments
Square brackets denote instance arguments, resolved through type class inference. When you write [Add α], you are saying “give me any type that knows how to add.” The compiler finds the right implementation automatically. This is the mechanism that lets + work on integers, floats, vectors, and anything else with an Add instance.
def printTwice {α : Type} [ToString α] (x : α) : String :=
s!"{x} and {x}"
#eval printTwice 42 -- "42 and 42"
#eval printTwice true -- "true and true"
#eval printTwice "hi" -- "hi and hi"
def maximum {α : Type} [Ord α] (xs : List α) : Option α :=
xs.foldl (init := none) fun acc x =>
match acc with
| none => some x
| some m => if compare x m == .gt then some x else some m
#eval maximum [3, 1, 4, 1, 5, 9] -- some 9
#eval maximum ["b", "a", "c"] -- some "c"
Defining Type Classes
Type classes define interfaces that types can implement, but unlike object-oriented interfaces, the implementation is external to the type. You can add new capabilities to existing types without modifying them. This is how Lean can make + work for types defined in libraries you do not control.
class Printable (α : Type) where
format : α → String
instance : Printable Nat where
format n := s!"Nat({n})"
instance : Printable Bool where
format b := if b then "yes" else "no"
instance : Printable String where
format s := s!"\"{s}\""
def showValue {α : Type} [Printable α] (x : α) : String :=
Printable.format x
#eval showValue 42 -- "Nat(42)"
#eval showValue true -- "yes"
#eval showValue "test" -- "\"test\""
Tip
A type class constraint like [Add α] is a proof obligation. The compiler must find evidence that α supports addition. Instance resolution is proof search. This connection between “finding implementations” and “finding proofs” is not a metaphor; it is the same mechanism. When you reach the Proofs article, you will see the compiler searching for proofs exactly as it searches for instances here.
Polymorphic Instances
Instances themselves can be polymorphic, building complex instances from simpler ones. If you can print an α, you can print a List α. This compositionality is the quiet superpower of type classes: small building blocks assemble into sophisticated behavior without explicit wiring.
instance {α : Type} [Printable α] : Printable (List α) where
format xs :=
let items := xs.map Printable.format
"[" ++ ", ".intercalate items ++ "]"
instance {α β : Type} [Printable α] [Printable β] : Printable (α × β) where
format p := s!"({Printable.format p.1}, {Printable.format p.2})"
#eval showValue [1, 2, 3] -- "[Nat(1), Nat(2), Nat(3)]"
#eval showValue (42, true) -- "(Nat(42), yes)"
#eval showValue [(1, true), (2, false)]
Numeric Type Classes
Type classes excel at abstracting over numeric operations. Write your algorithm once against an abstract Mul and Add, and it works for integers, rationals, complex numbers, matrices, and polynomials. The abstraction costs nothing at runtime because instance resolution happens at compile time. The generic code specializes to concrete operations in the generated code.
class Addable (α : Type) where
add : α → α → α
zero : α
instance : Addable Nat where
add := Nat.add
zero := 0
instance : Addable Int where
add := Int.add
zero := 0
instance : Addable Float where
add := Float.add
zero := 0.0
def sumList {α : Type} [Addable α] (xs : List α) : α :=
xs.foldl Addable.add Addable.zero
#eval sumList [1, 2, 3, 4, 5] -- 15
#eval sumList [1.5, 2.5, 3.0] -- 7.0
#eval sumList ([-1, 2, -3] : List Int) -- -2
Extending Classes
Type classes can extend other classes, inheriting their operations while adding new ones. An Ord instance gives you compare, and from that you get <, ≤, >, ≥, min, and max for free. The hierarchy of algebraic structures in Mathlib, from magmas through groups to rings and fields, is built this way.
class Eq' (α : Type) where
eq : α → α → Bool
class Ord' (α : Type) extends Eq' α where
lt : α → α → Bool
instance : Eq' Nat where
eq := (· == ·)
instance : Ord' Nat where
eq := (· == ·)
lt := (· < ·)
def sortedInsert {α : Type} [Ord' α] (x : α) (xs : List α) : List α :=
match xs with
| [] => [x]
| y :: ys => if Ord'.lt x y then x :: y :: ys else y :: sortedInsert x ys
#eval sortedInsert 3 [1, 2, 4, 5] -- [1, 2, 3, 4, 5]
Functor
The Functor pattern captures the idea of mapping a function over a structure while preserving its shape. Lists, options, arrays, trees, and IO actions are all functors. Once you see the pattern, you see it everywhere: any context that wraps a value and lets you transform that value without escaping the context.
class Functor' (F : Type → Type) where
map : {α β : Type} → (α → β) → F α → F β
instance : Functor' List where
map := List.map
instance : Functor' Option where
map f
| none => none
| some x => some (f x)
def doubleAll {F : Type → Type} [Functor' F] (xs : F Nat) : F Nat :=
Functor'.map (· * 2) xs
#eval doubleAll [1, 2, 3] -- [2, 4, 6]
#eval doubleAll (some 21) -- some 42
#eval doubleAll (none : Option Nat) -- none
Note
For readers familiar with category theory:
Functor here is an endofunctor on the category of types, where objects are types and morphisms are functions. The map operation lifts a morphism \(f : A \to B\) to \(F(f) : F(A) \to F(B)\). The Yoneda lemma tells us that a functor is completely determined by how morphisms map into it. You do not need category theory to use functors effectively, but if you have the background, the connection is there.
Multiple Constraints
Functions can require multiple type class constraints, combining capabilities from different classes. A sorting function needs Ord; a function that sorts and prints needs [Ord α] [Repr α]. The constraints document exactly what your function requires, nothing more, nothing less.
def showCompare {α : Type} [ToString α] [Ord α] (x y : α) : String :=
let result := match compare x y with
| .lt => "less than"
| .eq => "equal to"
| .gt => "greater than"
s!"{x} is {result} {y}"
#eval showCompare 3 5 -- "3 is less than 5"
#eval showCompare "b" "a" -- "b is greater than a"
def sortAndShow {α : Type} [ToString α] [Ord α] (xs : List α) : String :=
let sorted := xs.toArray.qsort (compare · · == .lt) |>.toList
s!"{sorted}"
#eval sortAndShow [3, 1, 4, 1, 5] -- "[1, 1, 3, 4, 5]"
The sortAndShow function demonstrates a common pattern: convert to Array for efficient in-place sorting with qsort, then convert back to List. The predicate (compare · · == .lt) returns true when the first argument is less than the second, giving ascending order.
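To sort in descending order, flip the test. A minimal sketch reusing the same pattern:
#eval [3, 1, 4, 1, 5].toArray.qsort (compare · · == .gt) |>.toList -- [5, 4, 3, 1, 1]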
Deriving Instances
Many standard type classes can be automatically derived, saving you from writing boilerplate that follows predictable patterns. The deriving clause generates instances for Repr, BEq, Hashable, and others. Let the machine do the mechanical work; save your creativity for the parts that require thought.
structure Point' where
x : Nat
y : Nat
deriving Repr, BEq, Hashable
#eval Point'.mk 1 2 == Point'.mk 1 2 -- true
#eval Point'.mk 1 2 == Point'.mk 3 4 -- false
inductive Color where
| red | green | blue
deriving Repr, DecidableEq
#eval Color.red == Color.red -- true
#eval Color.red == Color.blue -- false
structure Person where
name : String
age : Nat
deriving Repr, BEq
def alice : Person := { name := "Alice", age := 30 }
def bob : Person := { name := "Bob", age := 25 }
#eval alice == bob -- false
#eval repr alice -- { name := "Alice", age := 30 }
You will notice both BEq and DecidableEq in deriving clauses. BEq provides the == operator for boolean equality tests. DecidableEq is stronger: it decides propositional equality (a = b), which works in dependent contexts like if expressions whose branches have different types, and in proofs. For simple comparisons, BEq suffices; for anything involving the type system or proofs, you want DecidableEq.
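A minimal sketch using the Color type above: the if condition below uses propositional equality (=, not ==), so it elaborates only because deriving DecidableEq produced a Decidable instance for it.
#eval if Color.red = Color.blue then "same" else "different" -- "different"
With only BEq you would have to write == and give up the ability to use the result in dependent positions or proofs.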
Attributes
Attributes tag declarations with metadata that affects how Lean processes them. The @[simp] attribute marks a lemma for use by the simp tactic. The @[instance] attribute registers a type class instance. Attributes are how you opt declarations into various compiler subsystems.
-- Attributes attach metadata to declarations
@[simp] theorem add_zero_right' (n : Nat) : n + 0 = n := Nat.add_zero n
-- The @[simp] attribute marks this for use by the simp tactic
-- Common attributes: simp, inline, reducible, instance, class
See Tactics for how simp uses attributed lemmas.
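As a small preview of what the attribute buys you (tactics are covered properly later), here is a goal that simp can close by rewriting with the attributed lemma, alongside its other defaults:
example (n : Nat) : (n + 0) + 0 = n := by simp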
Composable Spell Effects
Type classes shine when you have multiple types that share a common interface but differ in implementation. Consider a spell system where spells can deal damage, heal, apply buffs, or inflict status effects. Each effect type is different, but all effects can be described and have a potency:
structure Damage where
amount : Nat
element : Element
deriving Repr
structure Healing where
amount : Nat
deriving Repr
structure Buff where
stat : String
bonus : Int
duration : Nat
deriving Repr
structure StatusEffect where
name : String
duration : Nat
deriving Repr
The SpellEffect type class captures this abstraction. Any type with a SpellEffect instance can describe itself and report its potency:
class SpellEffect (ε : Type) where
describe : ε → String
potency : ε → Nat
instance : SpellEffect Damage where
describe d := s!"{d.amount} {d.element.name} damage"
potency d := d.amount
instance : SpellEffect Healing where
describe h := s!"restore {h.amount} HP"
potency h := h.amount
instance : SpellEffect Buff where
describe b := s!"+{b.bonus} {b.stat} for {b.duration} turns"
potency b := b.bonus.natAbs * b.duration
instance : SpellEffect StatusEffect where
describe s := s!"{s.name} for {s.duration} turns"
potency s := s.duration * 2
Combining Effects
The real payoff comes when effects can be combined. A spell that damages AND heals (like Drain Life) or damages AND poisons (like Venom Strike) requires representing compound effects. The Effect inductive wraps all effect types and adds a compound constructor for combinations:
inductive Effect where
| damage (d : Damage)
| healing (h : Healing)
| buff (b : Buff)
| status (s : StatusEffect)
| compound (fst snd : Effect)
deriving Repr
def Effect.describe : Effect → String
| .damage d => SpellEffect.describe d
| .healing h => SpellEffect.describe h
| .buff b => SpellEffect.describe b
| .status s => SpellEffect.describe s
| .compound fst snd => s!"{fst.describe} + {snd.describe}"
def Effect.potency : Effect → Nat
| .damage d => SpellEffect.potency d
| .healing h => SpellEffect.potency h
| .buff b => SpellEffect.potency b
| .status s => SpellEffect.potency s
| .compound fst snd => fst.potency + snd.potency
instance : SpellEffect Effect where
describe := Effect.describe
potency := Effect.potency
instance : Append Effect where
append := Effect.compound
The Append instance gives us the ++ operator, so you can write .damage ⟨6, .dark⟩ ++ .healing ⟨6⟩ to combine two effects. Effects form a semigroup: you can combine any two effects into a compound effect. The describe and potency functions recurse through the structure, building descriptions like “6 dark damage + restore 6 HP” and summing potencies.
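A quick check of that combination; a minimal sketch whose output assumes the dark element renders its name as "dark":
#eval (Effect.damage ⟨6, .dark⟩ ++ Effect.healing ⟨6⟩).describe -- "6 dark damage + restore 6 HP"
#eval (Effect.damage ⟨6, .dark⟩ ++ Effect.healing ⟨6⟩).potency -- 12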
Spells and Casting
The Spell type is parameterized by its effect type. A Spell Damage deals damage; a Spell Effect can do anything:
structure Spell (ε : Type) where
name : String
manaCost : Nat
effect : ε
def castSpell {ε : Type} [SpellEffect ε] (s : Spell ε) : String :=
let desc := SpellEffect.describe s.effect
let pot := SpellEffect.potency s.effect
s!"{s.name} (Cost: {s.manaCost} MP): {desc} [Potency: {pot}]"
Simple spells have a single effect type:
def fireball : Spell Damage := ⟨"Fireball", 3, ⟨8, .fire⟩⟩
def frostbolt : Spell Damage := ⟨"Frostbolt", 2, ⟨5, .ice⟩⟩
def heal : Spell Healing := ⟨"Heal", 4, ⟨20⟩⟩
def haste : Spell Buff := ⟨"Haste", 3, ⟨"Speed", 2, 5⟩⟩
Compound spells combine multiple effects with ++:
def drainLife : Spell Effect :=
⟨"Drain Life", 4,
.damage ⟨6, .dark⟩ ++ .healing ⟨6⟩⟩
def chaosStorm : Spell Effect :=
⟨"Chaos Storm", 8,
.damage ⟨5, .fire⟩ ++ .damage ⟨5, .ice⟩ ++ .damage ⟨5, .lightning⟩⟩
def holyWrath : Spell Effect :=
⟨"Holy Wrath", 6,
.damage ⟨10, .holy⟩ ++ .buff ⟨"Defense", 3, 3⟩⟩
def venomStrike : Spell Effect :=
⟨"Venom Strike", 3,
.damage ⟨4, .dark⟩ ++ .status ⟨"Poisoned", 4⟩⟩
def battleHymn : Spell Effect :=
⟨"Battle Hymn", 5,
.buff ⟨"Strength", 2, 4⟩ ++ .buff ⟨"Speed", 1, 4⟩ ++ .healing ⟨8⟩⟩
The castSpell function works uniformly over any spell whose effect type has a SpellEffect instance. Simple spells, compound spells, future effect types you have not invented yet: all handled by the same polymorphic function.
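For instance, the same function handles a Spell Damage and a Spell Effect; a sketch whose exact strings assume each element's name matches its constructor:
#eval castSpell fireball -- "Fireball (Cost: 3 MP): 8 fire damage [Potency: 8]"
#eval castSpell venomStrike -- "Venom Strike (Cost: 3 MP): 4 dark damage + Poisoned for 4 turns [Potency: 12]"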
Tip
Run from the repository:
lake exe spells
Semigroups and Monoids
Algebraic structures like semigroups and monoids capture patterns that recur across mathematics and programming. A semigroup has an associative operation; a monoid adds an identity element. String concatenation, list append, function composition, and integer addition are all monoids. Recognizing the common structure lets you write code once and apply it to all of them.
class Semigroup (α : Type) where
append : α → α → α
class Monoid' (α : Type) extends Semigroup α where
empty : α
def concat {α : Type} [Monoid' α] (xs : List α) : α :=
xs.foldl Semigroup.append Monoid'.empty
instance : Monoid' String where
append := String.append
empty := ""
instance {α : Type} : Monoid' (List α) where
append := List.append
empty := []
#eval concat ["Hello", " ", "World"] -- "Hello World"
#eval concat [[1, 2], [3], [4, 5]] -- [1, 2, 3, 4, 5]
Example: Type-safe Dimensional Analysis
Here is the payoff for all the type class machinery. We can prevent the Mars Orbiter bug entirely, not through runtime checks but through types that make the error inexpressible. Consider representing physical quantities with phantom types:
/-!
# Type-Safe Units of Measurement
A demonstration of using phantom types and type classes to enforce dimensional
correctness at compile time with zero runtime overhead. The types exist only
in the compiler's mind; the generated code manipulates raw floats.
-/
-- Phantom types representing physical dimensions
-- These have no constructors and no runtime representation
structure Meters
structure Seconds
structure Kilograms
structure MetersPerSecond
structure MetersPerSecondSquared
structure Newtons
structure NewtonSeconds
-- A quantity is a value tagged with its unit type
-- At runtime, this is just a Float
structure Quantity (unit : Type) where
val : Float
deriving Repr
-- Type class for multiplying quantities with compatible units
class UnitMul (u1 u2 result : Type) where
-- Type class for dividing quantities with compatible units
class UnitDiv (u1 u2 result : Type) where
-- Define the dimensional relationships
instance : UnitDiv Meters Seconds MetersPerSecond where
instance : UnitDiv MetersPerSecond Seconds MetersPerSecondSquared where
instance : UnitMul Kilograms MetersPerSecondSquared Newtons where
instance : UnitMul Newtons Seconds NewtonSeconds where
-- Arithmetic operations that preserve dimensional correctness
def Quantity.add {u : Type} (x y : Quantity u) : Quantity u :=
⟨x.val + y.val⟩
def Quantity.sub {u : Type} (x y : Quantity u) : Quantity u :=
⟨x.val - y.val⟩
def Quantity.mul {u1 u2 u3 : Type} [UnitMul u1 u2 u3] (x : Quantity u1) (y : Quantity u2) : Quantity u3 :=
⟨x.val * y.val⟩
def Quantity.div {u1 u2 u3 : Type} [UnitDiv u1 u2 u3] (x : Quantity u1) (y : Quantity u2) : Quantity u3 :=
⟨x.val / y.val⟩
def Quantity.scale {u : Type} (x : Quantity u) (s : Float) : Quantity u :=
⟨x.val * s⟩
-- Smart constructors for readability
def meters (v : Float) : Quantity Meters := ⟨v⟩
def seconds (v : Float) : Quantity Seconds := ⟨v⟩
def kilograms (v : Float) : Quantity Kilograms := ⟨v⟩
def newtons (v : Float) : Quantity Newtons := ⟨v⟩
def newtonSeconds (v : Float) : Quantity NewtonSeconds := ⟨v⟩
-- Example: Computing velocity from distance and time
-- This compiles because Meters / Seconds = MetersPerSecond
def computeVelocity (distance : Quantity Meters) (time : Quantity Seconds)
: Quantity MetersPerSecond :=
distance.div time
-- Example: Computing force from mass and acceleration
-- This compiles because Kilograms * MetersPerSecondSquared = Newtons
def computeForce (mass : Quantity Kilograms) (accel : Quantity MetersPerSecondSquared)
: Quantity Newtons :=
mass.mul accel
-- Example: Computing impulse from force and time
-- This compiles because Newtons * Seconds = NewtonSeconds
def computeImpulse (force : Quantity Newtons) (time : Quantity Seconds)
: Quantity NewtonSeconds :=
force.mul time
-- The Mars Climate Orbiter scenario:
-- One team computes impulse in Newton-seconds
-- Another team expects the same units
-- The type system ensures they match
def thrusterImpulse : Quantity NewtonSeconds :=
let force := newtons 450.0
let burnTime := seconds 10.0
computeImpulse force burnTime
-- This would NOT compile - you cannot add NewtonSeconds to Meters:
-- def nonsense := thrusterImpulse.add (meters 100.0)
-- This would NOT compile - you cannot pass Meters where Seconds expected:
-- def badVelocity := computeVelocity (meters 100.0) (meters 50.0)
-- The key insight: all the Quantity wrappers and phantom types vanish at runtime.
-- The generated code is just floating-point arithmetic.
-- The safety is free.
def main : IO Unit := do
let distance := meters 100.0
let time := seconds 9.58 -- Usain Bolt's 100m record
let velocity := computeVelocity distance time
IO.println s!"Distance: {distance.val} m"
IO.println s!"Time: {time.val} s"
IO.println s!"Velocity: {velocity.val} m/s"
let mass := kilograms 1000.0
let accel : Quantity MetersPerSecondSquared := ⟨9.81⟩
let force := computeForce mass accel
IO.println s!"Mass: {mass.val} kg"
IO.println s!"Acceleration: {accel.val} m/s²"
IO.println s!"Force: {force.val} N"
let impulse := computeImpulse force (seconds 5.0)
IO.println s!"Impulse: {impulse.val} N·s"
Tip
Run from the repository:
lake exe units
The Quantity type wraps a Float but carries a phantom type parameter representing its unit. You cannot add meters to seconds because Quantity.add requires both arguments to have the same unit type. You cannot pass thrust in the wrong units because the function signature encodes the dimensional requirements.
The crucial insight is that these phantom types vanish at runtime. The Meters and Seconds types have no constructors, no fields, no runtime representation whatsoever. The generated code manipulates raw floats with raw floating-point operations. The type checker enforces dimensional correctness; the compiled program pays no cost for it. This is the dream of static typing: safety that exists only in the compiler’s mind, free at runtime, catching at compile time the very class of error that destroyed a spacecraft.
There is a broader lesson here about the direction of software. The mathematics that physicists scribble on whiteboards, the dimensional analysis that engineers perform by hand, the invariants that programmers hold in their heads and document in comments: these are all pseudocode. They are precise enough for humans to follow but not precise enough for machines to verify. The project of programming language research, from Curry and Howard through ML and Haskell to Lean and dependent types, has been to formalize this pseudocode. To turn informal reasoning into machine-checked artifacts.
As code generation by large language models becomes routine, this formalization becomes essential. A neural network can produce syntactically correct code that passes tests yet harbors subtle unit errors, off-by-one mistakes, and violated invariants. The guardrails cannot be more tests, more code review, more human attention. The guardrails must be formalisms that make entire categories of errors unrepresentable. Type classes, phantom types, dependent types: these are not academic curiosities but safety controls for a future where most code is synthesized. The Mars Climate Orbiter was written by humans who made a human error. The code that replaces them must be held to a higher standard. (For more on this trajectory, see Artificial Intelligence.)
Side Effects Ahead
Type classes and phantom types give you abstraction and compile-time safety. But programs must eventually interact with the world: reading files, handling errors, managing state. Next up: monads. Yes, those monads. Do not worry, we will not explain them using burritos.
Effects
“A monad is just a monoid in the category of endofunctors, what’s the problem?” This infamous quip became the ur-meme of functional programming, spawning a thousand blog posts explaining monads via burritos, boxes, and space suits. The tragedy is that the concept is not hard. It just got wrapped in mystique before anyone explained it clearly.
Here is what matters: programs have effects. They might fail, consult state, perform IO, branch nondeterministically, or launch the missiles. In languages without effect tracking, any function call might do any of these things. You call getUsername() and hope it only reads from a database rather than initiating thermonuclear war. The type signature offers no guarantees. The question is how to represent effects in a way that lets us reason about composition and know, from the types alone, what a function might do. Monads are one answer. They capture a pattern for sequencing operations where each step produces both a result and some context. The bind operation chains these operations, threading the context automatically. Do notation makes the sequencing readable. The interface is minimal, the applications broad.
But monads are not the only answer, and treating them as sacred obscures the deeper point. Algebraic effect systems, linear types, graded monads, and effect handlers all attack the same problem from different angles. What they share is the conviction that effects should be visible in types and that composition should be governed by laws. The specific mechanism matters less than the principle: make the structure explicit so that humans and machines can reason about it.
Lean uses monads because they work well and the ecosystem inherited them from functional programming research of the 1990s. They are a good tool. But the goal is to capture effects algebraically, whatever form that takes. When you understand monads, you understand one particularly elegant solution to sequencing effectful computations. You also understand a template for how programming abstractions should work: a minimal interface, a set of laws, and the discipline to respect both.
The Option Monad
The simplest monad handles computations that might fail. You already understand this pattern: look something up, and if it exists, do something with it. If not, propagate the absence. Every programmer has written this code a hundred times. The monad just gives it a name and a uniform interface.
def safeDivide (x y : Nat) : Option Nat :=
if y == 0 then none else some (x / y)
def safeHead {α : Type} (xs : List α) : Option α :=
match xs with
| [] => none
| x :: _ => some x
#eval safeDivide 10 2 -- some 5
#eval safeDivide 10 0 -- none
#eval safeHead [1, 2] -- some 1
#eval safeHead ([] : List Nat) -- none
Chaining Without Monads
Without the abstraction, chaining fallible operations produces the pyramid of doom: nested conditionals, each handling failure explicitly, the actual logic buried under boilerplate. This is not hypothetical. This is what error handling looks like in languages without monadic structure. It is also what early JavaScript looked like before Promises, which are, of course, monads by another name.
def computation (xs : List Nat) : Option Nat :=
match safeHead xs with
| none => none
| some x =>
match safeDivide 100 x with
| none => none
| some y => some (y + 1)
#eval computation [5, 2, 3] -- some 21
#eval computation [0, 2, 3] -- none (division by zero)
#eval computation [] -- none (empty list)
The bind Operation
The bind operation (written >>=) is the heart of the monad. It takes a value in context and a function that produces a new value in context, and chains them together. For Option, this means: if the first computation succeeded, apply the function. If it failed, propagate the failure. The pattern generalizes far beyond failure, but failure is the clearest example.
def computation' (xs : List Nat) : Option Nat :=
safeHead xs >>= fun x =>
safeDivide 100 x >>= fun y =>
some (y + 1)
#eval computation' [5, 2, 3] -- some 21
#eval computation' [0, 2, 3] -- none
#eval computation' [] -- none
Do Notation
Do notation is syntactic sugar that makes monadic code look imperative. The left arrow ← desugars to bind, and the semicolon sequences operations. This is not a concession to programmers who cannot handle functional style. It is recognition that sequential composition is how humans think about processes, and fighting that serves no purpose. The abstraction remains while the syntax yields to ergonomics.
def computationDo (xs : List Nat) : Option Nat := do
let x ← safeHead xs
let y ← safeDivide 100 x
return y + 1
#eval computationDo [5, 2, 3] -- some 21
#eval computationDo [0, 2, 3] -- none
#eval computationDo [] -- none
def validateInput (name : String) (age : Nat) : Option (String × Nat) := do
if name.isEmpty then none
if age == 0 then none
return (name, age)
#eval validateInput "Alice" 30 -- some ("Alice", 30)
#eval validateInput "" 30 -- none
#eval validateInput "Bob" 0 -- none
In validateInput, the bare none on its own line short-circuits the computation. Within a do block for Option, writing none is equivalent to early return with failure. The remaining lines are not executed, and the whole expression evaluates to none.
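An equivalent version without do notation makes the short-circuiting explicit; a sketch, not the literal desugaring:
def validateInput' (name : String) (age : Nat) : Option (String × Nat) :=
  if name.isEmpty then none
  else if age == 0 then none
  else some (name, age)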
Do Notation Desugaring
Do notation is syntactic sugar for bind chains. The compiler transforms your imperative-looking code into applications of >>= and pure. Understanding the desugaring helps when types do not match or when you want to optimize.
The key distinction is between two forms of let:
- let x ← e performs a monadic bind: it unwraps the value from the monadic context. If e : Option Nat, then x : Nat. If e evaluates to none, the computation short-circuits immediately.
- let x := e is a pure let binding: no unwrapping occurs. The value is used exactly as-is. If e : Option Nat, then x : Option Nat.
The arrow ← is doing real work: it reaches into the monad and extracts the value, handling failure automatically. The walrus := just names a value.
A monadic bind extracts the value and passes it to the continuation:
do let x ← e1; es   ⟹   e1 >>= fun x => do es
A pure let binding has no monadic involvement:
do let x := e; es   ⟹   let x := e; do es
An action without binding discards the result:
do e1; es   ⟹   e1 >>= fun () => do es
Pattern matching with a fallback handles failure:
do let some x ← e1 | fallback; es   ⟹   e1 >>= fun | some x => do es | _ => fallback
The return keyword is just pure:
return e ⟹ pure e
The ← operator can appear as a prefix within expressions. Each occurrence is hoisted to a fresh binding, processed left-to-right, inside-to-outside:
do f (← e1) (← e2); es   ⟹   do let x ← e1; let y ← e2; f x y; es
This is not the same as f e1 e2. Consider the difference:
-- If e1 : Option Nat and e2 : Option Nat:
f e1 e2 -- f receives two Option Nat values
f (← e1) (← e2) -- f receives two Nat values (unwrapped)
Use ← when you want to extract the value from a monadic context within an expression. The arrow does the unwrapping. Without it, you pass the wrapped value.
Effects like early return, mutable state, and loops with break/continue transform the entire do block rather than desugaring locally, similar to monad transformers.
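Early return is the clearest example of such a whole-block transformation. A minimal sketch (the helper firstEven is ours, not standard library):
def firstEven (xs : List Nat) : Option Nat := do
  for x in xs do
    if x % 2 == 0 then
      return x -- exits the entire do block, not just the loop
  none
#eval firstEven [1, 3, 4, 5] -- some 4
#eval firstEven [1, 3, 5] -- none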
Note
Semicolons can replace newlines in do blocks:
do let x ← e1; let y ← e2; pure (x + y). This is rarely used since multiline format is “more readable.” Fifty years of programming language research and we still cannot agree on what makes syntax objectively good. Perhaps because syntax is more fashion and culture than science.
def withBind (xs : List Nat) : Option Nat :=
safeHead xs >>= fun x =>
safeDivide 100 x >>= fun y =>
pure (y + 1)
def withDoNotation (xs : List Nat) : Option Nat := do
let x ← safeHead xs
let y ← safeDivide 100 x
return y + 1
#eval withBind [5] -- some 21
#eval withDoNotation [5] -- some 21
def mixedBindings : Option Nat := do
let x ← some 10 -- monadic bind
let y := x + 5 -- pure let
let z ← some (y * 2) -- monadic bind
return z
#eval mixedBindings -- some 30
Mutable Variables in Do
The let mut syntax introduces mutable bindings that desugar to StateT, a monad transformer that adds mutable state to any monad. Assignment with := modifies the state. The compiler threads the state automatically, transforming imperative-looking code into pure functional operations. You do not need to understand StateT to use let mut because the desugaring is automatic.
def imperativeSum (xs : List Nat) : Nat := Id.run do
let mut total := 0
for x in xs do
total := total + x
return total
def functionalSum (xs : List Nat) : Nat :=
xs.foldl (· + ·) 0
#eval imperativeSum [1, 2, 3, 4, 5] -- 15
#eval functionalSum [1, 2, 3, 4, 5] -- 15
def countValid (xs : List Nat) : IO Nat := do
let mut count := 0
for x in xs do
if x > 0 then
count := count + 1
IO.println s!"Valid: {x}"
return count
When should you use Id.run do versus plain do?
- do alone works when you are already inside a monad like IO or Option. The do block produces a monadic value.
- Id.run do is needed when you want to use imperative syntax (let mut, for loops) but return a pure value. The Id monad is the “identity” monad: it adds no effects, just provides the machinery for state threading.
In imperativeSum, the return type is Nat, not IO Nat. Without Id.run, there would be no monad to thread the mutable state through. The Id monad provides exactly that scaffolding while adding nothing else. For IO operations, you work directly in the IO monad and the mutations interleave with side effects.
The Except Monad
Option tells you that something failed but not why. Except carries the reason. This is the difference between a function returning null and a function throwing an exception with a message. The monadic structure is identical, only the context changes. This uniformity is the point. Learn the pattern once, apply it to failure, to errors, to state, to nondeterminism, to parsing, to probability distributions. The shape is always the same.
inductive ValidationError where
| emptyName
| invalidAge (age : Nat)
| missingField (field : String)
deriving Repr
def validateName (name : String) : Except ValidationError String :=
if name.isEmpty then .error .emptyName
else .ok name
def validateAge (age : Nat) : Except ValidationError Nat :=
if age == 0 || age > 150 then .error (.invalidAge age)
else .ok age
def validatePerson (name : String) (age : Nat) : Except ValidationError (String × Nat) := do
let validName ← validateName name
let validAge ← validateAge age
return (validName, validAge)
#eval validatePerson "Alice" 30 -- Except.ok ("Alice", 30)
#eval validatePerson "" 30 -- Except.error ValidationError.emptyName
#eval validatePerson "Bob" 200 -- Except.error (ValidationError.invalidAge 200)
Combining Effects: Transformer Ordering
Real programs often need multiple effects at once: error handling and logging, state and failure. Monad transformers let you combine effects by stacking them. But the order of the stack matters: different orderings give different failure semantics.
Here is the minimal demonstration:
-- Two ways to combine State and Except - the order matters!
-- State outside Except: on error, state is LOST
abbrev Rollback := StateT Nat (Except Unit)
-- Except outside State: on error, state is PRESERVED
abbrev Audit := ExceptT Unit (StateM Nat)
def countThenFailRollback : Rollback Unit := do
modify (· + 1) -- count = 1
modify (· + 1) -- count = 2
throw () -- error!
modify (· + 1) -- never reached
def countThenFailAudit : Audit Unit := do
modify (· + 1) -- count = 1
modify (· + 1) -- count = 2
throw () -- error!
modify (· + 1) -- never reached
-- Rollback: error discards the state
#eval StateT.run countThenFailRollback 0
-- Except.error () ← count is gone!
-- Audit: error preserves the state
#eval StateT.run (ExceptT.run countThenFailAudit) 0
-- (Except.error (), 2) ← count = 2 preserved
With StateT on the outside (Rollback), an error discards the accumulated state. With ExceptT on the outside (Audit), the state persists even after failure. Same operations, different semantics.
To understand why, think about what each transformer does when you “run” it:
- StateT.run takes a computation and an initial state, and returns (result, finalState)
- ExceptT.run takes a computation and returns Except Error Result
The outer transformer determines what you get back. If Except is outer, you get Except Error (Result × State), so the state is inside, preserved regardless of success. If StateT is outer, you get State → Except Error (Result × State), so on error the state is never returned.
ATM Example
Consider an ATM withdrawal, a pipeline of fallible operations that must be logged for compliance. Check the balance. Verify the daily limit. Dispense cash. Update the account. Each step can fail, and each step should be recorded. ATMs are where functional programming meets the brutal reality of mechanical cash dispensers.
inductive ATMError where
| insufficientFunds (requested available : Nat)
| dailyLimitExceeded (requested limit : Nat)
| dispenserJam (dispensedBeforeJam : Nat)
| cardRetained
deriving Repr
def ATMError.describe : ATMError → String
| .insufficientFunds req avail =>
s!"Insufficient funds: requested €{req}, available €{avail}"
| .dailyLimitExceeded req limit =>
s!"Daily limit exceeded: requested €{req}, limit €{limit}"
| .dispenserJam dispensed =>
s!"Dispenser jam after dispensing €{dispensed}"
| .cardRetained => "Card retained by machine"
structure AuditEntry where
timestamp : Nat
message : String
deriving Repr
structure AuditLog where
entries : List AuditEntry := []
nextTimestamp : Nat := 0
deriving Repr
def AuditLog.add (log : AuditLog) (msg : String) : AuditLog :=
{ entries := log.entries ++ [⟨log.nextTimestamp, msg⟩]
nextTimestamp := log.nextTimestamp + 1 }
def AuditLog.show (log : AuditLog) : List String :=
log.entries.map fun e => s!"[{e.timestamp}] {e.message}"
structure Account where
holder : String
balance : Nat
dailyWithdrawn : Nat := 0
dailyLimit : Nat := 500
deriving Repr
The withdrawal amount uses a dependent type PosNat to ensure it is positive. You cannot withdraw zero euros (pointless) or negative euros (the bank frowns upon this):
def PosNat := { n : Nat // n > 0 }
def PosNat.mk? (n : Nat) : Option PosNat :=
if h : n > 0 then some ⟨n, h⟩ else none
instance : Repr PosNat where
reprPrec p _ := repr p.val
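A quick check of the smart constructor; the output uses the Repr instance above:
#eval PosNat.mk? 300 -- some 300
#eval PosNat.mk? 0 -- none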
Two Transformer Stacks
We define two stacks with different failure semantics:
abbrev RollbackATM := StateT AuditLog (Except ATMError)
abbrev AuditATM := ExceptT ATMError (StateM AuditLog)
The operations are identical in both stacks. Here is the audit version:
def logA (msg : String) : AuditATM Unit :=
modify (·.add msg)
def checkBalanceA (acct : Account) (amount : PosNat) : AuditATM Unit := do
logA s!"Balance check: €{acct.balance} available"
if amount.val > acct.balance then
logA s!"DENIED: Insufficient funds for €{amount.val}"
throw (.insufficientFunds amount.val acct.balance)
def checkLimitA (acct : Account) (amount : PosNat) : AuditATM Unit := do
let remaining := acct.dailyLimit - acct.dailyWithdrawn
logA s!"Daily limit check: €{remaining} remaining of €{acct.dailyLimit}"
if amount.val > remaining then
logA s!"DENIED: Would exceed daily limit"
throw (.dailyLimitExceeded amount.val acct.dailyLimit)
def dispenseCashA (amount : PosNat) : AuditATM Nat := do
logA s!"Dispensing €{amount.val}..."
if amount.val > 200 then
let dispensed := 100
logA s!"ERROR: Dispenser jam after €{dispensed} dispensed"
throw (.dispenserJam dispensed)
logA s!"Cash dispensed: €{amount.val}"
pure amount.val
def updateBalanceA (acct : Account) (amount : Nat) : AuditATM Account := do
let newBalance := acct.balance - amount
logA s!"Balance updated: €{acct.balance} → €{newBalance}"
pure { acct with balance := newBalance, dailyWithdrawn := acct.dailyWithdrawn + amount }
The complete withdrawal combines all steps:
def withdrawAudit (acct : Account) (amount : PosNat) : AuditATM Account := do
logA s!"=== Withdrawal started: {acct.holder} ==="
logA s!"Requested amount: €{amount.val}"
checkBalanceA acct amount
checkLimitA acct amount
let dispensed ← dispenseCashA amount
let newAcct ← updateBalanceA acct dispensed
logA s!"=== Withdrawal complete ==="
pure newAcct
Partial Failure
Consider what happens when the dispenser jams after partially dispensing cash. Alice requests €300. The machine gives her €100, then the dispenser jams.
def runAudit (acct : Account) (amount : PosNat)
: (Except ATMError Account) × AuditLog :=
let (result, log) :=
StateT.run (ExceptT.run (withdrawAudit acct amount)) {}
(result, log)
def runRollback (acct : Account) (amount : PosNat)
: Except ATMError (Account × AuditLog) :=
StateT.run (withdrawRollback acct amount) {}
With rollback semantics (RollbackATM), the audit log is lost. The bank’s records show nothing happened. But Alice has €100 in her hand, and there is no record of what occurred.
With audit semantics (AuditATM), the log is preserved:
[0] === Withdrawal started: Alice ===
[1] Requested amount: €300
[2] Balance check: €1000 available
[3] Daily limit check: €500 remaining of €500
[4] Dispensing €300...
[5] ERROR: Dispenser jam after €100 dispensed
Now compliance knows exactly what happened: Alice got €100, the machine jammed, and manual reconciliation is needed.
def demonstrateDifference : IO Unit := do
IO.println "=== Rollback Semantics (StateT outside Except) ==="
IO.println "On error, the audit log is LOST\n"
let amount : PosNat := ⟨300, by omega⟩
match runRollback alice amount with
| .ok (acct, log) =>
IO.println s!"Success! New balance: €{acct.balance}"
IO.println "Audit log:"
for entry in log.show do IO.println s!" {entry}"
| .error e =>
IO.println s!"FAILED: {e.describe}"
IO.println "Audit log: <LOST - we only have the error>"
IO.println "Compliance cannot determine what happened.\n"
IO.println "\n=== Audit Semantics (ExceptT outside StateM) ==="
IO.println "On error, the audit log is PRESERVED\n"
let (result, log) := runAudit alice amount
match result with
| .ok acct =>
IO.println s!"Success! New balance: €{acct.balance}"
| .error e =>
IO.println s!"FAILED: {e.describe}"
IO.println "Audit log (preserved!):"
for entry in log.show do IO.println s!" {entry}"
IO.println "\nCompliance can see exactly what happened."
Tip
Run from the repository:
lake exe atm
This is why banks use audit semantics for ATM transactions. Financial regulations require knowing what happened, including partial failures. The transformer ordering is a design decision with legal implications. Get it wrong and auditors will have questions. Get it right and the code is its own documentation.
The State Monad
The state monad threads mutable state through a pure computation. You get the ergonomics of mutation, the ability to read and write a value as you go, without actually mutating anything. Each computation takes a state and returns a new state alongside its result. The threading is automatic, hidden behind the monadic interface. This is not a trick. It is a different way of thinking about state: not as a mutable box but as a value that flows through your computation, transformed at each step.
Lean provides two related types: StateT σ m α is the general state transformer that adds state of type σ to any monad m. StateM σ α is defined as StateT σ Id α, state over the identity monad, for pure stateful computations. When you do not need other effects, StateM is simpler and sufficient. When you need state combined with IO, Option, or other monads, use StateT. They share the same interface (get, set, modify), and code written for one often works for the other.
Under the hood, a stateful computation is just a function σ → (α × σ). The following shows how you would build the primitives yourself:
namespace ManualState
abbrev State (σ α : Type) := σ → (α × σ)
def get {σ : Type} : State σ σ := fun s => (s, s)
def set {σ : Type} (newState : σ) : State σ Unit := fun _ => ((), newState)
def modify {σ : Type} (f : σ → σ) : State σ Unit := fun s => ((), f s)
def run {σ α : Type} (init : σ) (m : State σ α) : α × σ :=
m init
def counter : State Nat Nat := fun n => (n, n + 1)
#eval run 0 counter -- (0, 1)
#eval run 10 counter -- (10, 11)
end ManualState
The ManualState namespace isolates these definitions from the standard library. Inside, we use natural names: get returns the current state as the result, set ignores the old state and installs a new one, modify applies a function to transform the state.
StateM in Practice
Lean’s Init namespace (see Basics) provides StateM, StateT, ExceptT, and other monad transformers without explicit imports. As noted above, StateM σ α equals StateT σ Id α, the pure-state specialization. The operations get, set, and modify work exactly like our manual versions. Combined with do notation, stateful code looks almost identical to imperative code, except that the state is explicit in the type and the purity is preserved. You can run the same computation with different initial states and get reproducible results. You can reason about what the code does without worrying about hidden mutation elsewhere.
def tick : StateM Nat Unit := modify (· + 1)
def getTicks : StateM Nat Nat := get
def countOperations : StateM Nat Nat := do
tick
tick
tick
let count ← getTicks
return count
#eval countOperations.run 0 -- (3, 3)
#eval countOperations.run 10 -- (13, 13)
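When the state needs to live alongside IO, the same interface carries over to StateT. A minimal sketch (the names here are illustrative, not from the repository):
def tickAndLog : StateT Nat IO Unit := do
  modify (· + 1)
  let n ← get
  IO.println s!"tick {n}"  -- IO actions lift automatically into StateT Nat IO
def runTicks : IO Unit := do
  let (_, final) ← StateT.run (do tickAndLog; tickAndLog) 0
  IO.println s!"final count: {final}"  -- prints 2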
The List Monad
Lists as a monad represent nondeterministic computation: a value that could be many things at once. Bind explores all combinations, like nested loops but without the nesting. This is how you generate permutations, enumerate possibilities, or implement backtracking search. The abstraction is the same, only the interpretation differs. A monad does not care whether its context is failure, state, or multiplicity. It only knows how to sequence.
def pairs (xs : List Nat) (ys : List Nat) : List (Nat × Nat) :=
xs.flatMap fun x => ys.map fun y => (x, y)
#eval pairs [1, 2] [10, 20] -- [(1, 10), (1, 20), (2, 10), (2, 20)]
def pythagTriples (n : Nat) : List (Nat × Nat × Nat) :=
(List.range n).flatMap fun a =>
(List.range n).flatMap fun b =>
(List.range n).filterMap fun c =>
if a * a + b * b == c * c && a > 0 && b > 0 then
some (a, b, c)
else
none
#eval pythagTriples 15 -- [(3, 4, 5), (4, 3, 5), (5, 12, 13), ...]
Iteration Type Classes
The ForIn type class powers for loops. Any type with a ForIn instance can be iterated with for x in collection do. The mechanism is more flexible than it first appears: you can implement custom iteration patterns, control early exit, and work in any monad.
structure CountDown where
start : Nat
instance : ForIn Id CountDown Nat where
forIn cd init f := do
let mut acc := init
let mut i := cd.start
while i > 0 do
match ← f i acc with
| .done a => return a -- break
| .yield a => acc := a -- continue
i := i - 1
return acc
def sumCountDown (n : Nat) : Nat := Id.run do
let mut total := 0
for i in CountDown.mk n do
total := total + i
return total
#eval sumCountDown 5 -- 15 (5+4+3+2+1)
The ForInStep type controls loop flow. Returning .done value breaks out of the loop with the accumulated result. Returning .yield value continues to the next iteration. This desugars to monadic operations, so early return in a for loop is not a special case but an application of the general mechanism.
def printAll (xs : List String) : IO Unit := do
for x in xs do
IO.println x
def sumWithIndex (arr : Array Nat) : Nat := Id.run do
let mut total := 0
for h : i in [0:arr.size] do
total := total + arr[i]
return total
#eval sumWithIndex #[10, 20, 30] -- 60
def manualForIn (xs : List Nat) : Option Nat :=
ForIn.forIn xs 0 fun x acc =>
if x == 0 then some (.done acc) -- early exit
else some (.yield (acc + x)) -- continue
#eval manualForIn [1, 2, 3, 4] -- some 10
#eval manualForIn [1, 2, 0, 4] -- some 3 (stopped at 0)
The forIn function can be called directly when you need explicit control over the accumulator and continuation. The callback returns some (.done acc) to break or some (.yield acc) to continue. Returning none propagates failure in the Option monad. This is how Lean unifies iteration with monadic effects.
Collection Operations
Lists and arrays share a common vocabulary of operations. These functions compose naturally into data processing pipelines.
def langs : List String := ["Lean", "Haskell", "Rust", "OCaml"]
def types : List String := ["theorem", "lazy", "systems", "modules"]
-- zip: pair elements from two collections
#eval langs.zip types
-- [("Lean", "theorem"), ("Haskell", "lazy"), ...]
-- map: transform each element
#eval langs.map String.toUpper
-- ["LEAN", "HASKELL", "RUST", "OCAML"]
-- filter: keep elements matching predicate
#eval langs.filter (·.startsWith "R")
-- ["Rust"]
-- take/drop: slice prefix or suffix
#eval langs.take 2
-- ["Lean", "Haskell"]
#eval langs.drop 2
-- ["Rust", "OCaml"]
-- filterMap: filter and transform in one pass
#eval ["42", "bad", "17"].filterMap String.toNat?
-- [42, 17]
-- find?: first element matching predicate
#eval langs.find? (·.length > 4)
-- some "Haskell"
-- any/all: check predicates
#eval langs.any (·.startsWith "L") -- true
#eval langs.all (·.length > 3) -- true
-- zipIdx: pair with indices
#eval ["a", "b", "c"].zipIdx
-- [("a", 0), ("b", 1), ("c", 2)]
The operations compose cleanly: filter selects, map transforms, filterMap fuses both. find? returns Option because absence is a valid result, not an exception.
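Chained together, they read as a single pipeline. A small sketch with illustrative data:
#eval (["3", "4", "x", "10"].filterMap String.toNat?  -- [3, 4, 10]: parse, dropping failures
        |>.filter (· % 2 == 0)                        -- [4, 10]: keep the evens
        |>.map (· * 10))                              -- [40, 100]: scale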
Folds
Folds are the fundamental iteration pattern. Every loop, every accumulation, every reduction can be expressed as a fold. Understanding folds means understanding how computation flows through a data structure.
A left fold processes elements left-to-right, accumulating from the left:
\[ \text{foldl}(f, z, [a, b, c]) = f(f(f(z, a), b), c) = ((z \oplus a) \oplus b) \oplus c \]
A right fold processes elements right-to-left, accumulating from the right:
\[ \text{foldr}(f, z, [a, b, c]) = f(a, f(b, f(c, z))) = a \oplus (b \oplus (c \oplus z)) \]
Here \(\oplus\) is just \(f\) written infix: \(a \oplus b = f(a, b)\).
For associative operations like addition, both folds give the same result. For non-associative operations, the parenthesization matters:
-- foldl: left fold, accumulator on the left
#eval [1, 2, 3, 4].foldl (· + ·) 0 -- 10
#eval ["a","b","c"].foldl (· ++ ·) "" -- "abc"
-- foldr: right fold, accumulator on the right
#eval [1, 2, 3, 4].foldr (· + ·) 0 -- 10
#eval ["a","b","c"].foldr (· ++ ·) "" -- "abc"
-- the difference: subtraction is neither associative nor commutative
-- foldl f z [a,b,c] = f(f(f(z,a),b),c) = ((z⊕a)⊕b)⊕c
-- foldr f z [a,b,c] = f(a,f(b,f(c,z))) = a⊕(b⊕(c⊕z))
#eval ([1, 2, 3, 4] : List Int).foldl (· - ·) 0 -- -10: ((((0-1)-2)-3)-4)
#eval ([1, 2, 3, 4] : List Int).foldr (· - ·) 0 -- -2: (1-(2-(3-(4-0))))
-- foldl builds left-to-right (tail recursive)
#eval [1, 2, 3].foldl (fun acc x => x :: acc) [] -- [3, 2, 1]
-- foldr builds right-to-left (preserves structure)
#eval [1, 2, 3].foldr (fun x acc => x :: acc) [] -- [1, 2, 3]
-- practical uses
#eval [10, 25, 8, 42, 3].foldl max 0 -- 42
#eval [2, 3, 4].foldl (· * ·) 1 -- 24
#eval [1, 2, 3].foldl (fun acc x => acc + x * x) 0 -- 14
The cons example reveals the structural difference. Building a list with foldl reverses order because each new element is prepended to the growing accumulator. Building with foldr preserves order because the accumulator grows from the right. This is why map is typically defined using foldr: map f xs = foldr (fun x acc => f x :: acc) [] xs.
Left folds are tail-recursive and run in constant stack space. Right folds are not tail-recursive but can work with lazy data structures since they do not need to traverse to the end before producing output. In strict languages like Lean, prefer foldl for efficiency unless you need the structural properties of foldr.
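The foldr definition of map can be checked directly. A small sketch:
-- map as a right fold: the list is rebuilt as the fold unwinds, preserving order
def mapViaFoldr {α β : Type} (f : α → β) (xs : List α) : List β :=
  xs.foldr (fun x acc => f x :: acc) []
#eval mapViaFoldr (· * 2) [1, 2, 3]  -- [2, 4, 6]
#eval [1, 2, 3].map (· * 2)          -- [2, 4, 6], same result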
The Monad Type Class
Under the hood, all monads implement the same interface. pure lifts a plain value into the monadic context. bind sequences two computations, passing the result of the first to the second. That is the entire interface. Everything else, the do notation, the specialized operations, the ergonomic helpers, builds on these two primitives. The simplicity is deliberate. A minimal interface means maximal generality.
class Monad' (M : Type → Type) extends Functor M where
pure' : {α : Type} → α → M α
bind' : {α β : Type} → M α → (α → M β) → M β
instance : Monad' Option where
map f
| none => none
| some x => some (f x)
pure' := some
bind' m f := match m with
| none => none
| some x => f x
instance : Monad' List where
map := List.map
pure' x := [x]
bind' m f := m.flatMap f
Monad Laws
Here is where the algebra becomes essential. Monads must satisfy three laws: left identity, right identity, and associativity. These are not suggestions. They are the contract that makes generic programming possible.
In the traditional bind/return formulation:
| Law | Lean | Math |
|---|---|---|
| Left Identity | pure a >>= f = f a | $\eta(a) \star f = f(a)$ |
| Right Identity | m >>= pure = m | $m \star \eta = m$ |
| Associativity | (m >>= f) >>= g = m >>= (λ x => f x >>= g) | $(m \star f) \star g = m \star (\lambda x. f(x) \star g)$ |
Note
For those with category theory background: the same laws look cleaner in the Kleisli category, where we compose monadic functions directly. If $f : A \to M B$ and $g : B \to M C$, their Kleisli composition is $g \circ f : A \to M C$:
| Law | Lean | Math |
|---|---|---|
| Left Identity | pure >=> f = f | $f \circ \eta = f$ |
| Right Identity | f >=> pure = f | $\eta \circ f = f$ |
| Associativity | (f >=> g) >=> h = f >=> (g >=> h) | $(h \circ g) \circ f = h \circ (g \circ f)$ |

The Kleisli formulation reveals that monads give you a category where objects are types and morphisms are functions $A \to M B$. The laws say pure is the identity morphism and >=> is associative composition. A monad is a way of embedding effectful computation into the compositional structure of functions. You do not need this perspective to use monads effectively.
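Lean does provide Kleisli composition as the >=> operator, so the table above is not merely notation. A quick sketch with two fallible steps (the helper functions are illustrative):
def parseNat? (s : String) : Option Nat := s.toNat?
def half? (n : Nat) : Option Nat :=
  if n % 2 == 0 then some (n / 2) else none
-- Kleisli composition: run parseNat?, feed its result to half?
#eval (parseNat? >=> half?) "10"   -- some 5
#eval (parseNat? >=> half?) "7"    -- none (7 is odd)
#eval (parseNat? >=> half?) "abc"  -- none (the parse fails)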
-- Left identity: pure a >>= f = f a
example (f : Nat → Option Nat) (a : Nat) :
(pure a >>= f) = f a := rfl
-- Right identity: m >>= pure = m
example (m : Option Nat) : (m >>= pure) = m := by
cases m <;> rfl
-- Associativity: (m >>= f) >>= g = m >>= (fun x => f x >>= g)
example (m : Option Nat) (f : Nat → Option Nat) (g : Nat → Option Nat) :
((m >>= f) >>= g) = (m >>= fun x => f x >>= g) := by
cases m <;> rfl
Tip
In Haskell (and most other languages), you can claim your monad follows the laws but the compiler takes your word for it. In Lean, you can prove it. The laws become theorems, and proving them means constructing values of the corresponding types. This is the Curry-Howard correspondence at work. The Proofs article shows how.
At this point someone usually asks what a monad “really is.” The answers have become a genre: a burrito, a spacesuit, a programmable semicolon, a monoid in the category of endofunctors. These metaphors are not wrong, but they are not enlightening either. A monad is the three laws above and nothing else. Everything follows from the laws. The metaphors are for people who want to feel like they understand before they do the work of understanding.
Note
For those who want the category theory (colloquially known as “abstract nonsense,” the field’s own term of endearment): a monad is a monoid object in the monoidal category of endofunctors under composition. Equivalently, it is a lax 2-functor from the terminal 2-category to Cat. The Kleisli category is the category of free algebras of the monad.
some is the identity morphism in the Kleisli category of Option. In Haskell it is called Just, which humorously makes it Just an endomorphism in the Kleisli category of Option. If this clarified nothing, congratulations: you understood monads before and still do now. You do not need any of this to use monads effectively.
Early Return
Do notation supports early return, loops, and mutable references, all the imperative conveniences. Combined with monads, this gives you the syntax of imperative programming with the semantics of pure functions. You can write code that reads like Python and reasons like Haskell. This is not cheating. It is the whole point: capturing effects in types so that the compiler knows what your code might do, while letting you write in whatever style is clearest.
def findFirst {α : Type}
(p : α → Bool) (xs : List α) : Option α := do
for x in xs do
if p x then return x
none
#eval findFirst (· > 5) [1, 2, 3, 7, 4] -- some 7
#eval findFirst (· > 10) [1, 2, 3] -- none
def processUntilError (xs : List Nat) : Except String (List Nat) := do
let mut results := []
for x in xs do
if x == 0 then
throw "encountered zero"
results := results ++ [x * 2]
return results
#eval processUntilError [1, 2, 3] -- Except.ok [2, 4, 6]
#eval processUntilError [1, 0, 3] -- Except.error "encountered zero"
Combining Monadic Operations
Functions like mapM and filterMap combine monadic operations over collections. Map a fallible function over a list and either get all the results or the first failure. Filter a list with a predicate that consults external state. These combinators emerge naturally once you have the abstraction. They are not special cases but instances of a general pattern, composable because they respect the monad laws.
def mayFail (x : Nat) : Option Nat :=
if x == 0 then none else some (100 / x)
def processAll (xs : List Nat) : Option (List Nat) :=
xs.mapM mayFail
#eval processAll [1, 2, 4, 5] -- some [100, 50, 25, 20]
#eval processAll [1, 0, 4, 5] -- none
def filterValid (xs : List Nat) : List Nat :=
xs.filterMap mayFail
#eval filterValid [1, 0, 2, 0, 4] -- [100, 50, 25]
The Larger Pattern
Monads are one algebraic structure among many. Functors capture mapping. Applicatives capture independent combination. Monads capture dependent sequencing. Comonads capture context extraction. Arrows generalize computation graphs. Algebraic effects decompose monads into composable pieces. Each abstraction comes with laws, and those laws are the actual content. The specific names matter less than the discipline: identify a pattern, find its laws, and build an interface that enforces them.
The trajectory of programming language research has been toward making this structure explicit. Effects that C programmers handle with conventions, functional programmers handle with types. Invariants that documentation describes, dependent types enforce. Properties that tests sample, proofs establish. Each step reduces the burden on human memory and attention, encoding knowledge in artifacts that machines can check.
This matters because the economics of software are changing. When code is cheap to generate, correctness becomes the bottleneck. A language model can produce plausible implementations faster than any human, but “plausible” is not “correct.” The leverage shifts to whoever can specify precisely what correct means. Types, laws, contracts, proofs: these are the tools for specifying. Monads are a small example, one worked case of a pattern made precise. The concept itself was always simple. Sequencing with context. The value was never in the mystery but in the laws that let us reason compositionally about programs we increasingly do not write ourselves and cannot fully understand. (For more on where this is heading, see Artificial Intelligence.)
From Abstract to Concrete
Monads describe effects abstractly. The next article makes them concrete: actual file I/O, process management, environment variables, and the runtime machinery that connects your pure descriptions to the impure world. This completes the programming half of our journey.
IO and Concurrency
Programs must touch the world. They read files, open sockets, query databases, render pixels, and occasionally launch missiles. Pure functions, by definition, cannot do any of this. They map inputs to outputs, indifferent to the universe outside their arguments. And yet we write programs precisely to change that universe, to impose our will on entropy, to build something that persists after the process exits. This is the fundamental tension of typed functional programming, and IO is how we resolve it.
The insight is to describe impure computations as pure values. An IO String is not a string obtained through side effects; it is a pure, inert description of a computation that, when executed by the runtime, will interact with the world and produce a string. The description itself is referentially transparent. You can pass it around, store it in data structures, combine it with other descriptions, all without anything actually happening. Only when the runtime interprets the description do effects occur. The effect is separated from the specification of the effect, and that separation is everything.
When effects are explicit in types, the signature readFile : FilePath → IO String tells you something important: this function touches the filesystem. Compare this to a language where any function might secretly perform IO, where you cannot know without reading the implementation whether getUserName() is a pure lookup or a network call. The type becomes documentation that cannot lie. More than that, it becomes a constraint that tools can check. A function typed as pure cannot sneak in a database write. The compiler enforces the separation.
Effects come in varieties. Observable effects change the world in ways the program can later detect: writing a file, then reading it back. Unobservable effects change the world but leave no trace the program can see: logging to stderr, incrementing a metrics counter, writing to /dev/null. Then there are effects with international consequences: the missile launch, the funds transfer, the irreversible deletion. Code that launches missiles should look different from code that formats strings. When effects are tracked in types, it does. The engineer reading the signature knows what they are dealing with. The engineer writing the code must confront, at every function boundary, whether this operation belongs in IO or whether it can remain pure. A bear foraging in the forest leaves tracks; a program touching the world should leave type signatures. This is engineering discipline: knowing the consequences of your logic and designing systems where those consequences are visible, reviewable, and constrained.
This matters for the same reason everything in this series matters: we are building systems too complex to hold in our heads, increasingly written by machines that have no heads at all. Explicit effect tracking is not academic purity worship. It is engineering discipline for an era when the codebase is larger than any human can read and the authors include entities that optimize for plausibility rather than correctness. The type signature is the contract. The contract is the only thing we can trust.
IO Basics
The IO monad represents computations that can perform side effects. All effectful operations live inside IO, and you sequence them using do notation. The “Hello, World!” of functional programming is not a string; it is IO Unit, a description of an action that, when run, will emit bytes to a file descriptor and return nothing interesting. The Unit return type is honest about its triviality: we did not call this function for its result but for its consequences.
def greet : IO Unit := do
IO.println "What is your name?"
let stdin ← IO.getStdin
let name ← stdin.getLine
IO.println s!"Hello, {name.trim}!"
def printNumbers : IO Unit := do
for i in [1, 2, 3, 4, 5] do
IO.println s!"Number: {i}"
def getCurrentTime : IO Unit := do
let now ← IO.monoMsNow
IO.println s!"Milliseconds since start: {now}"
Pure Computations in IO
Not everything in IO needs to be effectful. Pure computations can be lifted into IO when you need to mix them with effects. The pure function wraps a value in an IO that does nothing but return that value. This is not contamination; it is embedding. A pure value inside IO is like a diplomat with immunity: present in foreign territory but untouched by local laws. You can always extract the pure part because it never actually did anything.
def pureComputation : IO Nat := do
let x := 10
let y := 20
return x + y
#eval pureComputation -- 30
def combineIO : IO String := do
let a ← pure "Hello"
let b ← pure "World"
return s!"{a} {b}"
#eval combineIO -- "Hello World"
File Operations
File operations live in the IO.FS namespace: read, write, append, delete, the usual POSIX inheritance. Files are where programs meet persistence, where the ephemeral computation leaves a mark that outlasts the process. The filesystem is the boundary between the volatile and the durable, between the RAM that vanishes when power fails and the spinning rust or flash cells that remember. Every configuration file, every log, every database is ultimately bytes on a filesystem, and the operations here are how you reach them.
def writeToFile (path : String) (content : String) : IO Unit := do
IO.FS.writeFile path content
def readFromFile (path : String) : IO String := do
IO.FS.readFile path
def appendToFile (path : String) (content : String) : IO Unit := do
let handle ← IO.FS.Handle.mk path .append
handle.putStrLn content
def copyFile (src dst : String) : IO Unit := do
let content ← IO.FS.readFile src
IO.FS.writeFile dst content
Processing Lines
Reading a file line by line is the fundamental pattern of Unix text processing, the reason awk and sed exist, the shape of a thousand shell pipelines. The newline character is civilization’s oldest API, a contract that dates to the teletype and has outlived every attempt to replace it. In Lean, you get the same streaming capability with type safety and without the quoting nightmares that have spawned more security vulnerabilities than any other single source.
def processLines (path : String) : IO (List String) := do
let content ← IO.FS.readFile path
return content.splitOn "\n"
def countLines (path : String) : IO Nat := do
let lines ← processLines path
return lines.length
def filterLines' (path : String) (pred : String → Bool) : IO (List String) := do
let lines ← processLines path
return lines.filter pred
Error Handling
IO operations can fail. Files go missing, networks drop packets, disks fill up, permissions change, the NFS mount goes stale, the USB drive gets yanked mid-write. The universe is hostile to running programs, and error handling is how we cope with entropy’s preference for chaos. Like a bear preparing for winter, a robust program anticipates scarcity: the resource that was there yesterday may not be there tomorrow. Lean provides try-catch blocks that should feel familiar to anyone who has written Java, Python, or JavaScript, except that here the error handling is explicit in the type. An IO action either succeeds or throws, and try/catch is how you intercept the failure before it propagates to somewhere you cannot recover.
def safeDivideIO (x y : Nat) : IO Nat := do
if y == 0 then
throw <| IO.userError "Division by zero"
return x / y
def trySafeDivide : IO Unit := do
try
let result ← safeDivideIO 10 0
IO.println s!"Result: {result}"
catch e =>
IO.println s!"Error: {e}"
def withDefault' {α : Type} (action : IO α) (default : α) : IO α := do
try
action
catch _ =>
return default
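A typical use of withDefault' is a read with a fallback. A small sketch (the filename is illustrative):
-- Returns the file contents, or a stub if the read throws for any reason.
def readConfigOrDefault : IO String :=
  withDefault' (IO.FS.readFile "config.toml") "# empty default config\n"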
Mutable References
Sometimes you need a mutable cell, a box you can read from and write to as the computation proceeds. IO.Ref provides this: a reference that lives in IO, supporting get, set, and modify. This is controlled mutation, explicit in the type system, not the invisible aliasing that makes imperative code a maze of spooky action at a distance. The reference is a value. You can see who has access to it. You can trace where it flows. The mutation is real, but the accounting is honest.
def counterExample : IO Nat := do
let counter ← IO.mkRef 0
for _ in List.range 10 do
counter.modify (· + 1)
counter.get
#eval counterExample -- 10
def accumulate (values : List Nat) : IO Nat := do
let sum ← IO.mkRef 0
for v in values do
sum.modify (· + v)
sum.get
#eval accumulate [1, 2, 3, 4, 5] -- 15
Monad Transformers: ExceptT
Real programs layer effects. You want IO for the filesystem, but you also want typed errors that propagate cleanly. Monad transformers stack these capabilities like geological strata, each layer adding something to the computation below. ExceptT wraps another monad and adds the ability to short-circuit with a typed error. The result is a computation that can both perform IO and fail with a specific error type, the signature making both capabilities explicit. The layering is architectural, not just convenient. The type tells you exactly what can go wrong and what side effects might occur.
abbrev AppM := ExceptT String IO
def validatePositive (n : Int) : AppM Int := do
if n <= 0 then throw "Must be positive"
return n
def validateRange (n : Int) (lo hi : Int) : AppM Int := do
if n < lo || n > hi then throw s!"Must be between {lo} and {hi}"
return n
def processNumber : AppM Int := do
let n ← validatePositive 42
let m ← validateRange n 0 100
return m * 2
def runApp : IO Unit := do
match ← processNumber.run with
| .ok result => IO.println s!"Success: {result}"
| .error msg => IO.println s!"Failed: {msg}"
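Because the stack wraps IO, ordinary IO actions lift into AppM automatically inside do blocks, so the same computation can touch the filesystem and still fail with a typed error. A minimal sketch (the path handling is illustrative):
-- Read a file, then fail with a typed error if it is empty.
-- IO.FS.readFile lifts from IO into AppM without ceremony.
def readNonEmpty (path : String) : AppM String := do
  let contents ← IO.FS.readFile path
  if contents.isEmpty then
    throw s!"{path} is empty"
  return contents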
Monad Transformers: ReaderT
ReaderT provides access to a read-only environment, an implicit parameter that threads through your computation without cluttering every function signature. This is the dependency injection pattern done right: instead of global variables that any code can mutate, you have a typed environment that flows lexically through your program. Configuration, database connections, logger handles, the parameters that everything needs but nothing should own. The environment is read-only, which means you can reason about it. No function can secretly change the configuration and break something downstream.
structure Config where
verbose : Bool
maxRetries : Nat
timeout : Nat
deriving Repr
abbrev ConfigM := ReaderT Config IO
def getVerbose : ConfigM Bool := do
let cfg ← read
return cfg.verbose
def logIfVerbose (msg : String) : ConfigM Unit := do
if ← getVerbose then
IO.println s!"[LOG] {msg}"
def runWithConfig : IO Unit := do
let config : Config := { verbose := true, maxRetries := 3, timeout := 5000 }
(logIfVerbose "Starting process").run config
Running External Processes
Lean can spawn external processes and capture their output, bridging the gap between the typed world inside your program and the untyped chaos of shell commands. This is where you call out to git, invoke compilers, run linters, orchestrate builds. The interface is necessarily stringly-typed at the boundary, but you can wrap it in whatever validation your domain requires. Just remember that every subprocess is a portal to undefined behavior: the external program can do anything, and your types cannot save you from what happens on the other side.
def runCommand (cmd : String) (args : List String) : IO String := do
let output ← IO.Process.output {
cmd := cmd
args := args.toArray
}
return output.stdout
def shellExample : IO Unit := do
let result ← runCommand "echo" ["Hello", "World"]
IO.println result
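The output structure also carries the exit code and stderr, which are worth checking before trusting stdout. A defensive wrapper might look like this (a sketch, not a prescribed interface):
def runChecked (cmd : String) (args : List String) : IO String := do
  let out ← IO.Process.output { cmd := cmd, args := args.toArray }
  if out.exitCode != 0 then
    throw <| IO.userError s!"{cmd} exited with code {out.exitCode}: {out.stderr}"
  return out.stdout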
Environment Variables
Environment variables are the original configuration mechanism, a key-value store that predates databases by decades. They are inherited from parent processes, invisible in your source code, and different on every machine. This is both their power and their peril. PATH, HOME, USER, the secret sauce that makes your program behave differently in development and production. Access them through IO because reading them is an effect: the same code will behave differently depending on what the shell exported before launch.
def getEnvVar (name : String) : IO (Option String) := do
IO.getEnv name
def printPath : IO Unit := do
match ← getEnvVar "PATH" with
| some path => IO.println s!"PATH: {path}"
| none => IO.println "PATH not set"
def getCwd' : IO System.FilePath := do
IO.currentDir
The Discipline of Effects
The history of programming is a history of discovering what we should not have been allowed to do. Unrestricted goto gave us spaghetti code; we constrained control flow with structured programming. Unrestricted memory access gave us buffer overflows; we constrained pointers with ownership and garbage collection. Unrestricted side effects gave us programs impossible to test, impossible to parallelize, impossible to reason about. We are still learning to constrain effects.
Every constraint is a gift. When the type system refuses to let you perform IO in a pure function, it is saving you from bugs that would surface only in production, only under load, only on the third Tuesday of months with an R in them. The constraint is leverage.
The IO monad is a template for how to think about the boundary between your program and the world. The world is messy, stateful, concurrent, and hostile. Your program can be clean, pure, sequential, and predictable. The boundary between them should be narrow, explicit, and scrutinized. That is engineering discipline for systems that must work when no one is watching.
The Second Arc
This concludes the programming arc. You can now write typed functions, define data structures, use type classes for abstraction, sequence effects with monads, and interact with the operating system. You have written FizzBuzz, parsed command-line arguments, read files, handled errors, and spawned processes. Congratulations, you are a Lean programmer.
Time to become a Lean prover.
Lean is both a programming language and a theorem prover. The same type system that checks your functions can verify mathematical claims. The same compiler that rejects type errors can reject invalid proofs. Everything you have learned carries forward, because in Lean, proofs are programs and propositions are types. This is not a metaphor. It is a mathematical theorem called the Curry-Howard correspondence, and it means the skills you just built are directly applicable to theorem proving. You have already seen hints of this: Fin n bundles a number with a proof it is in bounds; subtypes attach predicates to values. These were not special cases but instances of the general pattern.
Consider what you already know. You know that every expression has a type, and the type checker verifies that expressions have the types you claim. You know how to construct values: literals, function applications, pattern matching, recursion. You know that the compiler rejects programs that do not type-check. These facts carry over unchanged. A proposition is just a type. A proof is just a value of that type. When you prove \(P \to Q\), you are constructing a function from P to Q. When you prove \(P \land Q\), you are constructing a pair. When you prove \(\forall n, P(n)\), you are constructing a function that takes any n and returns a proof of P n. The vocabulary changes, but the mechanics are the same.
The second arc introduces new tools for constructing these values. Tactics are commands that manipulate an incomplete proof, filling in pieces step by step. Where term-mode proof requires you to write the entire proof term at once, tactic-mode lets you work interactively: state a goal, apply a tactic, see what remains, repeat until nothing remains. The Infoview panel that showed you types and values now shows you goals and hypotheses. The tight feedback loop remains.
You will also encounter new concepts. Induction is recursion wearing a different hat. Case analysis is pattern matching. Proof by contradiction is constructing a function from \(\neg P\) to False. The jargon can be intimidating, but the underlying structures are ones you already understand. When someone says “we proceed by induction on n,” they mean “we define the proof recursively, with a base case for zero and a step case that assumes the result for n and proves it for n + 1.” You have been writing recursive functions all along.
What changes is the goal. In Arc I, you wrote programs to compute outputs from inputs. The type system ensured your functions were well-formed. In Arc II, you write proofs to establish that propositions are true. The type system ensures your proofs are valid. A proof is a program whose return type happens to be a logical statement rather than a data structure. The compiler verifying your proof is the same compiler that verified your FizzBuzz implementation. It just happens to be checking a different claim.
This is the strange magic of dependent types. The wall between computation and logic dissolves. You can write a sorting function and prove it sorts. You can implement a parser and prove it handles all inputs. You can design a protocol and prove it never deadlocks. The same language, the same compiler, the same workflow. The boundary between “programming” and “proving” becomes a matter of which types you are constructing values for.
The articles ahead cover: the foundations of proof in Lean, type theory and the Curry-Howard correspondence in depth, dependent types and their power, strategies for approaching proofs, a comprehensive tactics reference, and applications from classic mathematics to software verification. By the end, you will be proving theorems about the programs you write, and understanding why that matters.
The transition may feel abrupt. Programming and mathematics have been taught as separate subjects for so long that their unification can seem like a category error. It is not. The historical accident is that they were separated. Lean reveals them as aspects of the same thing: constructing terms of specified types. What you learned in Arc I is directly applicable to Arc II. The rest is vocabulary.
Let us begin.
Proofs
You have written functions. You have defined types. You have pattern matched, recursed, and composed. But you have not yet proved anything.
The difference matters. When you write a function, the compiler checks that types align, but it does not verify that your code does what you claim. You say this function sorts? The compiler shrugs. In theorem proving, you make claims and then justify them. The compiler verifies that your justification actually establishes the claim. You cannot bluff your way through a proof.
A bear learns to fish by watching the stream, understanding where salmon pause, developing patience for the moment when motion becomes certainty. Proving is similar. You learn to read the goal state, understand where progress stalls, develop patience for the tactic that transforms confusion into clarity.
Programming and Proving
Lean unifies programming and theorem proving through type theory. The same language that lets you define a function also lets you state and prove properties about it. Understanding how these fit together is essential before writing your first proof.
-- A function computes values - it has computational content
def factorial : Nat → Nat
| 0 => 1
| n + 1 => (n + 1) * factorial n
-- A theorem proves properties - it has no computational content at runtime
theorem factorial_pos : ∀ n, 0 < factorial n := by
intro n
induction n with
| zero => simp [factorial]
| succ n ih => simp [factorial]; omega
The factorial function computes values. It has computational content because it produces output from input. At runtime, it runs and returns numbers.
The factorial_pos theorem proves that factorial always returns a positive number. This proof convinces the type checker that the property holds, but it does not compute anything useful at runtime. The proof exists only to satisfy Lean’s verification. Once the compiler confirms the proof is valid, the proof term itself can be discarded. Proofs are checked at compile time and deleted before the program runs.
The proof uses omega, a decision procedure for linear arithmetic that we cover later in this chapter. For now, just note that it automatically handles numeric inequalities.
The distinction between def and theorem reflects this. Both define named values, but theorem marks its body as opaque: Lean will never unfold it during type checking. This prevents proofs from slowing down type checking when they appear in types (since proofs are erased before runtime, they cannot affect execution speed). A def can be unfolded and computed with; a theorem cannot. If you need a lemma that Lean should simplify through, use def or mark the theorem with @[simp].
What about proofs that appear as function arguments?
-- Proofs as function arguments: checked at compile time, erased at runtime
def safeDiv (n : Nat) (d : Nat) (h : d ≠ 0) : Nat := n / d
-- The proof argument h vanishes in the compiled code
#eval safeDiv 10 3 (by decide) -- 3
The proof h ensures at compile time that you cannot call safeDiv with a zero divisor. But at runtime, h vanishes. The compiled code receives only n and d. This is the power of Lean’s type system: proofs enforce invariants during development, then disappear from the final executable.
Notation
Before we write our first proof, we need a shared language. The notation below bridges three worlds: the mathematical symbols you find in logic textbooks, the inference rules used in programming language theory (as in Pierce’s Types and Programming Languages and Harper’s Practical Foundations for Programming Languages), and the Lean syntax you will write. Learning to read all three simultaneously is the key to fluency.
| Symbol | Name | Meaning |
|---|---|---|
| $\vdash$ | turnstile | “proves” or “entails” |
| $\Gamma$ | Gamma | the context (hypotheses we can use) |
| $\to$ | arrow | implication or function type |
| $\forall$ | for all | universal quantification |
| $\exists$ | exists | existential quantification |
| $\land$ | and | conjunction |
| $\lor$ | or | disjunction |
| $\top$ | top | truth (trivially provable) |
| $\bot$ | bottom | falsehood (unprovable) |
A judgment $\Gamma \vdash P$ reads “from context $\Gamma$, we can prove $P$.” An inference rule shows how to derive new judgments from existing ones:
\[ \frac{\Gamma \vdash P \quad \Gamma \vdash Q}{\Gamma \vdash P \land Q} \text{(∧-intro)} \]
This rule says: if you can prove $P$ and you can prove $Q$, then you can prove $P \land Q$. The premises sit above the line; the conclusion below. The name on the right identifies the rule. Every tactic you learn corresponds to one or more such rules. The tactic is the mechanism; the rule is the justification.
Each logical connective and type former comes with two kinds of rules. Introduction rules tell you how to construct a proof or value: to prove P ∧ Q, prove both P and Q. Elimination rules tell you how to use a proof or value: from P ∧ Q, you can extract P or Q. This pattern is universal. For implication, introduction is fun h => ... (assume the premise), elimination is function application (use the implication). For existence, introduction provides a witness, elimination uses the witness. Once you internalize this pattern, you can work with any connective by asking: “How do I build one?” and “How do I use one?”
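In Lean terms, the introduction and elimination rules for conjunction and implication look like this:
-- Introduction: build a proof of P ∧ Q from a proof of each part
example (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q := And.intro hp hq
-- Elimination: take the parts back out
example (P Q : Prop) (h : P ∧ Q) : P := h.left
example (P Q : Prop) (h : P ∧ Q) : Q := h.right
-- Implication: introduction is a lambda, elimination is application
example (P Q : Prop) (hpq : P → Q) (hp : P) : Q := hpq hp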
Tactics as Proof-State Transformers
You may have repressed the trauma of high school algebra, but the core idea was sound: you start with $2x + 5 = 11$ and apply operations until you reach $x = 3$. Subtract 5, divide by 2, each step transforming the equation into something simpler. The tedium was doing it by hand, error-prone and joyless. But the method itself, symbolic manipulation through mechanical transformation, turns out to be extraordinarily powerful when the machine handles the bookkeeping.
Tactics work the same way. You start with a goal (what you want to prove) and a context (what you already know). Each tactic transforms the goal into simpler subgoals. You keep applying tactics until no goals remain. The proof is the sequence of transformations, not a single flash of insight.
Think of it as a game. Your current position is the proof state: the facts you hold and the destination you seek. Each tactic is a legal move that changes your position. Some moves split one goal into two (like constructor creating two subgoals). Some moves close a goal entirely (like rfl finishing with a checkmate). You win when the board is empty.
Formally, a proof state is a judgment $\Gamma \vdash G$: context $\Gamma$, goal $G$. A tactic transforms one proof state into zero or more new proof states. When no goals remain, the proof is complete. This table is your Rosetta Stone:
| Tactic | Before | After | Rule |
|---|---|---|---|
| intro h | $\Gamma \vdash P \to Q$ | $\Gamma, h:P \vdash Q$ | $\to$-intro |
| apply f | $\Gamma \vdash Q$ | $\Gamma \vdash P$ | $\to$-elim (given $f : P \to Q$) |
| exact h | $\Gamma, h:P \vdash P$ | $\square$ | assumption |
| rfl | $\Gamma \vdash t = t$ | $\square$ | refl |
| constructor | $\Gamma \vdash P \land Q$ | $\Gamma \vdash P$, $\Gamma \vdash Q$ | $\land$-intro |
| left | $\Gamma \vdash P \lor Q$ | $\Gamma \vdash P$ | $\lor$-intro₁ |
| right | $\Gamma \vdash P \lor Q$ | $\Gamma \vdash Q$ | $\lor$-intro₂ |
| cases h | $\Gamma, h:P \lor Q \vdash R$ | $\Gamma, h:P \vdash R$, $\Gamma, h:Q \vdash R$ | $\lor$-elim |
| induction n | $\Gamma \vdash \forall n,\ P(n)$ | $\Gamma \vdash P(0)$, $\Gamma, ih:P(k) \vdash P(k{+}1)$ | Nat-ind |
| rw [h] | $\Gamma, h:a=b \vdash P[a]$ | $\Gamma, h:a=b \vdash P[b]$ | subst |
| simp | $\Gamma \vdash G$ | $\Gamma \vdash G'$ | rewrite* |
| contradiction | $\Gamma, h:\bot \vdash P$ | $\square$ | $\bot$-elim |
The symbol $\square$ marks a completed goal. Multiple entries in the “After” column mean the tactic created that many subgoals. Read left to right: you have the state on the left, you apply the tactic, you must now prove everything on the right. This is the algebra of proof. Each tactic is a function from proof states to proof states, and a complete proof is a composition that maps your theorem to $\square$.
Reading the notation: In expressions like $\Gamma, h:a=b \vdash P[a]$, the comma separates hypotheses (the “extended context”), the colon separates a hypothesis name from its type, and the turnstile $\vdash$ separates what you have from what you must prove. Lean’s InfoView displays this vertically, one hypothesis per line:
h : a = b
⊢ P[a]
The horizontal notation packs the same information into table cells. Once you can read one, you can read the other.
If the table above looks like both logic and programming, that is not a coincidence.
Proving vs Programming
The surprising insight is that proving and programming are the same activity viewed differently. A proof is a program. A theorem is a type. When you prove $P \to Q$, you are writing a function that transforms evidence for $P$ into evidence for $Q$. This correspondence, the Curry-Howard isomorphism, means that logic and computation are two views of the same underlying structure:
| Logic | Programming |
|---|---|
| proposition | type |
| proof | program |
| $P \to Q$ | function from P to Q |
| $P \land Q$ | pair (P, Q) |
| $P \lor Q$ | either P or Q |
| $\top$ | unit type |
| $\bot$ | empty type |
Every function you have written so far was secretly a proof. Every proof you write from now on is secretly a program. Two cultures, mathematicians and programmers, spoke the same language for decades without knowing it.
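You can see the correspondence in code. Swapping the components of a pair and proving that conjunction commutes have the same shape:
-- A program: swap the components of a pair
def swapPair {α β : Type} : α × β → β × α :=
  fun (a, b) => (b, a)
-- A proof: conjunction commutes. Same structure, different universe.
theorem and_comm' (P Q : Prop) : P ∧ Q → Q ∧ P :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩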
What You Already Know
The concepts from Arc I are not prerequisites for Arc II. They are the same concepts in different clothing. If you understood programming in Lean, you already understand proving. The vocabulary changes; the structures do not.
| Arc I (Programming) | Arc II (Proving) | Why They Match |
|---|---|---|
| Pattern matching on Nat constructors | The cases tactic on natural numbers | Both examine which constructor built the value |
| Recursive function with base case | Proof by induction with base case | Both reduce a problem on \(n+1\) to the same problem on \(n\) |
| Function type signature α → β | Theorem statement P → Q | Both declare what goes in and what comes out |
| Function body (the implementation) | Proof term (the justification) | Both witness that the signature/statement is inhabited |
| Returning a value of type α | Providing a term of type P (a proof of P) | Both construct an inhabitant of the required type |
| match x with \| none => ... \| some a => ... | cases h with \| none => ... \| some a => ... | Both split on constructors and handle each possibility |
| Termination checking on recursive calls | Well-founded induction on decreasing measures | Both ensure the process ends |
| Type error: expected β, got γ | Proof error: expected Q, got R | Both mean you produced the wrong thing |
When you wrote match n with | 0 => ... | n + 1 => ... in the Control Flow article, you were doing case analysis. The cases n tactic does the same thing to a proof goal. When you wrote a recursive function that called itself on n to compute a result for n + 1, you were doing induction. The induction n tactic generates exactly that structure: a base case and a step that assumes the result for n.
The syntax differs because tactics operate on proof states rather than values directly. But the reasoning is identical. If you can write a recursive function over natural numbers, you can prove a theorem about natural numbers. You have been training for this.
Your First Proof
Let us prove something undeniably true: one plus one equals two.
-- Your very first proof: 1 + 1 = 2
theorem one_plus_one : 1 + 1 = 2 := by
rfl
-- Without tactics, just a direct proof term
theorem one_plus_one' : 1 + 1 = 2 := rfl
Whitehead and Russell famously required 362 pages of Principia Mathematica before reaching this result. We have done it in three characters. This is not because we are cleverer than Russell; it is because we inherited infrastructure. The Principia was an attempt to place all of mathematics on rigorous foundations, to banish the intuition and hand-waving that had allowed paradoxes to creep into set theory. It was a heroic, doomed effort: the notation was unreadable, the proofs were uncheckable by any human in finite time, and Gödel would soon prove that the program could never fully succeed. But the ambition was right. The ambition was to make mathematics a science of proof rather than a craft of persuasion.
A century later, the ambition survives in different form. We do not write proofs in Russell’s notation; we write them in languages that machines can check. The 362 pages compress to three characters not because the mathematics got simpler but because the verification got automated. What mathematicians have been writing all along was pseudocode: informal instructions meant for human execution, full of implicit steps and assumed context, correct only if the reader filled in the gaps charitably. We are finally compiling that pseudocode.
The keyword by enters tactic mode. Instead of writing a proof term directly, you give commands that build the proof incrementally. The tactic rfl (reflexivity) says “both sides of this equation compute to the same value, so they are equal.” Lean evaluates 1 + 1, gets 2, sees that 2 = 2, and accepts the proof. No faith required. No appeals to authority. The machine checked, and the machine does not lie.
Or does it? Ken Thompson’s Reflections on Trusting Trust demonstrated that a compiler can be trojaned to insert backdoors into code it compiles, including into future versions of itself. Turtles all the way down. At some point you trust the hardware, the firmware, the operating system, the compiler that compiled your proof assistant. We choose to stop the regress somewhere, not because the regress ends but because we must act in the world despite uncertainty. This is the stoic’s bargain: do the work carefully, verify what can be verified, and accept that perfection is not on offer. The alternative is paralysis, and paralysis builds nothing.
The Goal State
When you write proofs in Lean, the editor shows you the current goal state. This is your map, your honest accounting of where you stand. Unlike tests that can pass while bugs lurk, unlike documentation that drifts from reality, the goal state cannot lie. It tells you exactly what you have (hypotheses) and exactly what you need to prove (the goal). The gap between aspiration and achievement is always visible.
-- Demonstrating how the goal state changes
theorem add_zero (n : Nat) : n + 0 = n := by
rfl
theorem zero_add (n : Nat) : 0 + n = n := by
simp
When you place your cursor after by in add_zero, you see:
n : Nat
⊢ n + 0 = n
The line n : Nat is your context: the facts you know, the tools you have. The symbol ⊢ (turnstile) separates what you have from what you need. The goal n + 0 = n is your obligation. After applying rfl, the goal disappears. No goals means the proof is complete. You have caught your fish.
Reflexivity: rfl
The rfl tactic proves goals of the form $a = a$ where both sides are definitionally equal. In inference rule notation:
\[ \frac{}{\Gamma \vdash a = a} \text{(refl)} \]
No premises above the line means the rule is an axiom: equality is reflexive, always, unconditionally. “Definitionally equal” means Lean can compute both sides to the same value without any lemmas. This is equality by computation, the most basic form of truth: run the program on both sides and see if you get the same answer.
-- rfl works when both sides compute to the same value
theorem two_times_three : 2 * 3 = 6 := by rfl
theorem list_length : [1, 2, 3].length = 3 := by rfl
theorem string_append : "hello " ++ "world" = "hello world" := by rfl
theorem bool_and : true && false = false := by rfl
def double (n : Nat) : Nat := n + n
theorem double_two : double 2 = 4 := by rfl
When rfl works, it means the equality is “obvious” to Lean’s computation engine. When it fails, you need other tactics to transform the goal into something rfl can handle.
How does definitional equality relate to other equality types? Definitional equality is the strongest: if a and b are definitionally equal, rfl proves a = b with no computation. Decidable equality (via DecidableEq and decide, discussed in Polymorphism) handles cases where equality can be computed at runtime, like 5 = 5 or "hello" = "hello". Propositional equality (a = b as a Prop) is the most general: you may need lemmas and rewriting to prove it. All three describe the same = type, but they differ in how much work is required to establish the proof.
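A quick illustration of the three levels:
-- Definitional: both sides compute to the same value, rfl closes it
example : 2 + 2 = 4 := rfl
-- Decidable: the equality test is executed and returns true
example : (17 : Nat) = 17 := by decide
-- Propositional: needs a rewrite lemma, since 0 + n does not reduce on its own
example (n : Nat) : 0 + n = n := by simp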
Triviality: trivial
The trivial tactic handles goals that are straightforwardly true. It combines several simple tactics and works well for basic logical facts.
theorem true_is_true : True := by
trivial
theorem one_le_two : 1 ≤ 2 := by
trivial
theorem and_true : True ∧ True := by
trivial
Simplification: simp
The simp tactic is your workhorse. It applies a database of hundreds of rewrite rules, accumulated over years by the mathlib community, to simplify the goal. This is collective knowledge made executable: every time someone proved that x + 0 = x or list.reverse.reverse = list, they added to the arsenal that simp deploys on your behalf.
theorem add_zero_simp (n : Nat) : n + 0 = n := by
simp
theorem zero_add_simp (n : Nat) : 0 + n = n := by
simp
theorem silly_arithmetic : (1 + 0) * (2 + 0) + 0 = 2 := by
simp
theorem list_append_nil {α : Type*} (xs : List α) : xs ++ [] = xs := by
simp
theorem use_hypothesis (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
simp [h]
When simp alone does not suffice, you can give it additional lemmas: simp [lemma1, lemma2]. You can also tell it to use hypotheses from your context: simp [h].
Tip
When stuck, try simp first. It solves a surprising number of goals. If it does not solve the goal completely, look at what remains.
Using Hypotheses: exact
The simplest way to close a goal is to provide exactly what is needed. If your goal is P and you have a hypothesis h : P, then exact h finishes the proof.
-- exact: provide exactly the term needed to close the goal
theorem exact_demo (P : Prop) (h : P) : P := by
exact h
-- When the goal is P and you have h : P, exact h closes it
theorem exact_with_function (P Q : Prop) (h : P) (f : P → Q) : Q := by
exact f h
The exact tactic says “this term has exactly the type we need.” It works with any expression, not just hypothesis names. If f : P → Q and h : P, then exact f h proves Q.
Introducing Assumptions: intro
When your goal is an implication $P \to Q$, you assume $P$ and prove $Q$. This is the introduction rule for implication:
\[ \frac{\Gamma, P \vdash Q}{\Gamma \vdash P \to Q} \text{(→-intro)} \]
Read this bottom-up: to prove $P \to Q$ (below the line), it suffices to prove $Q$ while assuming $P$ (above the line). The intro tactic performs this transformation, moving the antecedent from goal to hypothesis.
-- intro: move the antecedent of an implication into the context
theorem imp_self (P : Prop) : P → P := by
intro hp -- Now hp : P is in context, goal is P
exact hp -- Goal matches hypothesis exactly
After intro hp, the goal changes from P → P to just P, and you gain hypothesis hp : P. Multiple assumptions can be introduced at once: intro h1 h2 h3.
The same tactic handles universal quantifiers. When your goal is ∀ n, P n, intro n introduces n as a variable in scope:
-- intro also handles universal quantifiers
theorem forall_self (P : Nat → Prop) : (∀ n, P n) → (∀ n, P n) := by
intro h n -- h : ∀ n, P n and n : Nat now in context
exact h n -- Apply h to n to get P n
Applying Lemmas: apply
The apply tactic performs backward reasoning. When your goal is $Q$ and you have $h : P \to Q$, applying $h$ transforms the goal to $P$. You reason backward from what you want to what you need. This is the elimination rule for implication:
\[ \frac{\Gamma \vdash P}{\Gamma, h : P \to Q \vdash Q} \text{(→-elim with apply)} \]
Read this as: if you can prove $P$, and you have $h : P \to Q$, then you can prove $Q$. The apply tactic inverts this: to prove $Q$, it suffices to prove $P$.
-- apply: use a lemma whose conclusion matches the goal (backward reasoning)
theorem imp_trans (P Q R : Prop) : (P → Q) → (Q → R) → P → R := by
intro hpq hqr hp -- hpq : P → Q, hqr : Q → R, hp : P
apply hqr -- Goal changes from R to Q (backward step)
apply hpq -- Goal changes from Q to P (backward step)
exact hp -- Goal P matches hypothesis hp
In imp_trans, we have three implications chained together: $(P \to Q) \to (Q \to R) \to P \to R$. This reads as “if P implies Q, and Q implies R, then P implies R.” The arrows associate to the right, so it parses as $(P \to Q) \to ((Q \to R) \to (P \to R))$. After introducing all hypotheses, the goal is R. We apply hqr : Q → R to reduce the goal to Q, then apply hpq : P → Q to reduce it to P, then exact hp closes it.
Intermediate Steps: have
Sometimes you want to prove a helper fact before using it. The have tactic introduces a new hypothesis with its own proof. This is how knowledge accumulates: you establish a stepping stone, name it, and build on it.
-- have: introduce intermediate results
theorem have_demo (a b c d : Nat) (h1 : a = b) (h2 : b = c) (h3 : c = d) : a = d := by
have ab_eq_c : a = c := by rw [h1, h2] -- Combine first two equalities
rw [ab_eq_c, h3] -- Use the intermediate result
theorem sum_square (n : Nat) : (n + 1) * (n + 1) = n * n + 2 * n + 1 := by
have expand : (n + 1) * (n + 1) = n * n + n + n + 1 := by ring
have simplify : n + n = 2 * n := by ring
omega
The pattern have name : type := proof adds name : type to your context.
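The type annotation can be dropped when Lean can infer it from the proof term. A small sketch:
-- have without an explicit type: Lean infers b = a from Eq.symm
example (a b : Nat) (h : a = b) : b = a := by
  have h' := h.symm
  exact h'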
Case Analysis: cases
When you have a value of an inductive type, cases splits the proof into one case per constructor. This is exhaustive reasoning: you consider every possible form the value could take, and you prove your claim holds in each. The compiler ensures you miss nothing. This is how careful decisions should be made: enumerate the possibilities, handle each one, leave no branch unexamined.
theorem bool_cases (b : Bool) : b = true ∨ b = false := by
cases b with
| true => left; rfl
| false => right; rfl
theorem nat_zero_or_succ (n : Nat) : n = 0 ∨ n ≥ 1 := by
cases n with
| zero => left; rfl
| succ m => right; simp
theorem option_destruct (o : Option Nat) : o = none ∨ ∃ n, o = some n := by
cases o with
| none => left; rfl
| some n => right; exact ⟨n, rfl⟩
For Bool, there are two cases: true and false. For Nat, there are two cases: zero and succ m. For Option, there are none and some n.
The syntax ⟨n, rfl⟩ in the last example is anonymous constructor notation. The goal ∃ n, o = some n requires a witness and a proof. The angle brackets ⟨...⟩ construct the existential: n is the witness, and rfl proves o = some n (since in this branch, o is definitionally some n). This is equivalent to writing Exists.intro n rfl.
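A quick sketch showing both spellings:
-- Anonymous constructor notation versus the explicit constructor
example : ∃ n : Nat, n + 1 = 3 := ⟨2, rfl⟩
example : ∃ n : Nat, n + 1 = 3 := Exists.intro 2 rfl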
Induction
For properties of natural numbers, mathematical induction is the fundamental principle:
\[ \frac{\Gamma \vdash P(0) \quad \Gamma, P(n) \vdash P(n+1)}{\Gamma \vdash \forall n,\ P(n)} \text{(Nat-ind)} \]
Prove the base case $P(0)$. Then prove the inductive step: assuming $P(n)$, show $P(n+1)$. From these two finite proofs, you derive a statement about infinitely many numbers. The induction tactic generates both proof obligations automatically. The principle dates to Pascal and Fermat, but the mechanization is new.
theorem sum_twice (n : Nat) : n + n = 2 * n := by
induction n with
| zero => rfl
| succ n ih => omega
theorem length_append {α : Type*} (xs ys : List α) :
(xs ++ ys).length = xs.length + ys.length := by
induction xs with
| nil => simp
| cons x xs ih => simp [ih]; ring
In the succ case, you get an induction hypothesis ih that assumes the property holds for n, and you must prove it holds for n + 1.
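In sum_twice the succ case hands everything to omega, and length_append folds ih into simp. Here is a sketch where the induction hypothesis is used by hand, relying on the core lemma Nat.add_succ:
-- 0 + n = n is not definitional, so the succ case genuinely needs ih
theorem zero_add_by_induction (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]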
Arithmetic: omega
For goals involving linear arithmetic over natural numbers or integers, omega is powerful. It implements a decision procedure for Presburger arithmetic, a fragment of number theory that is provably decidable. Within its domain, omega does not search or guess; it decides.
theorem omega_simple (n : Nat) (h : n < 10) : n < 100 := by
omega
theorem omega_transitive (a b c : Int) (h1 : a < b) (h2 : b < c) : a < c := by
omega
theorem omega_sum (x y : Nat) (h : x + y = 10) : x ≤ 10 := by
omega
If your goal involves only addition, subtraction, multiplication by constants, and comparisons, try omega.
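It is worth knowing that omega also models truncated Nat subtraction correctly, which surprises newcomers (3 - 5 = 0 in Nat). Two small examples:
example (n : Nat) : n - n = 0 := by
  omega
example (a b : Nat) (h : b ≤ a) : a - b + b = a := by
  omega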
Decision Procedures: decide
For decidable propositions, decide simply computes the answer. Is 7 less than 10? Run the comparison. Is this list empty? Check. Some questions have algorithms that answer them definitively, and decide invokes those algorithms. When it works, there is nothing to prove; the computation is the proof.
theorem three_lt_five : (3 : Nat) < 5 := by
decide
theorem bool_compute : (true && false) = false := by
decide
theorem list_membership : 3 ∈ [1, 2, 3, 4, 5] := by
decide
theorem fin_in_bounds : (2 : Fin 5).val < 5 := by
decide
Putting It Together
Real proofs combine multiple tactics. You introduce assumptions, simplify, split cases, apply lemmas, and close with computation. The art is knowing which tool fits which moment. With practice, patterns emerge: implications call for intro, equalities for rw or simp, inductive types for cases or induction. The goal state guides you.
theorem worked_example (n : Nat) : n + 0 = 0 + n := by
simp
theorem worked_example2 (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
rw [h]
theorem combined_proof (n : Nat) (h : n > 0) : n - 1 + 1 = n := by
omega
theorem list_nonempty (xs : List Nat) (h : xs ≠ []) : xs.length > 0 := by
cases xs with
| nil => contradiction
| cons x xs' => simp
The Tactics You Need
| Tactic | Purpose |
|---|---|
| rfl | Prove a = a when both sides compute to the same value |
| trivial | Prove obviously true goals |
| simp | Simplify using rewrite rules |
| intro | Assume hypotheses from implications and universals |
| apply | Use a lemma whose conclusion matches the goal |
| exact | Provide exactly the term needed |
| have | Introduce intermediate results |
| cases | Split on constructors of inductive types |
| induction | Prove by induction on recursive types |
| omega | Solve linear arithmetic |
| decide | Compute decidable propositions |
| rw | Rewrite using an equality |
These twelve tactics will carry you through most of what follows.
Exercises
The best way to learn tactics is to use them. These exercises progress from straightforward applications of single tactics to combinations that require reading the goal state carefully.
-- Exercise 1: Use rfl to prove this computation
theorem ex_rfl : 3 * 4 = 12 := by
rfl
-- Exercise 2: Use simp to simplify this expression
theorem ex_simp (n : Nat) : n * 1 + 0 = n := by
simp
-- Exercise 3: Use intro and exact
theorem ex_intro_exact (P Q : Prop) (h : P) (hpq : P → Q) : Q := by
exact hpq h
-- Exercise 4: Use cases to prove this about booleans
theorem ex_bool_not_not (b : Bool) : !!b = b := by
cases b <;> rfl
-- Exercise 5: Use induction to prove addition is commutative
theorem ex_add_comm (n m : Nat) : n + m = m + n := by
induction n with
| zero => simp
| succ n ih => omega
-- Exercise 6: Use omega to prove this inequality
theorem ex_omega (x y : Nat) (h1 : x ≤ 5) (h2 : y ≤ 3) : x + y ≤ 8 := by
omega
-- Exercise 7: Combine multiple tactics
theorem ex_combined (xs : List Nat) : ([] ++ xs).length = xs.length := by
simp
-- Exercise 8: Prove implication transitivity
theorem ex_imp_chain (A B C D : Prop) : (A → B) → (B → C) → (C → D) → A → D := by
intro hab hbc hcd ha
exact hcd (hbc (hab ha))
-- Exercise 9: Use cases on a natural number
theorem ex_nat_lt (n : Nat) : n = 0 ∨ 0 < n := by
cases n with
| zero => left; rfl
| succ m => right; omega
-- Exercise 10: Prove a property about list reversal
theorem ex_reverse_nil : ([] : List Nat).reverse = [] := by
rfl
If you get stuck, ask yourself: what is the shape of my goal? What tactic handles that shape? What hypotheses do I have available? The Infoview is your guide.
The Liar’s Trap
Try to prove something false:
-- Try to prove something false. Every tactic will fail.
theorem liar : 0 = 1 := by
sorry -- Try: rfl, simp, omega, decide. Nothing works.
-- The goal state shows: ⊢ 0 = 1
-- This goal is unprovable because it is false.
Every tactic fails. rfl cannot make 0 equal 1. simp finds nothing to simplify. omega knows arithmetic and refuses. decide computes the answer and it is false. The goal state sits there, immovable: ⊢ 0 = 1. You can stare at it, curse at it, try increasingly desperate combinations. Nothing works because nothing can work. The machine will not let you prove a falsehood.
This is the point. The compiler is not your collaborator; it is your adversary. It checks every step and rejects handwaving. When someone tells you their code is correct, you can ask: does it typecheck? When someone tells you their proof is valid, you can ask: did the machine accept it? The answers are not always the same, but when they are, you know something real.
Axioms and Escape Hatches
The axiom declaration asserts something without proof. It is the escape hatch from the proof system: you declare that something is true and Lean believes you. This is extremely dangerous. If you assert something false, you can prove anything at all, including False itself. The system becomes unsound.
-- axiom asserts something without proof
-- WARNING: Incorrect axioms make the entire system inconsistent!
axiom magicNumber : Nat
axiom magicNumber_positive : magicNumber > 0
-- Use axioms only for:
-- 1. Foundational assumptions (excluded middle, choice)
-- 2. FFI bindings where proofs are impossible
-- 3. Temporary placeholders during development (prefer sorry)
Warning
Axioms should be used only in narrow circumstances: foundational assumptions like the law of excluded middle or the axiom of choice (which Mathlib already provides), FFI bindings where proofs are impossible because the implementation is external, or as temporary placeholders during development (though sorry is preferred since it generates a warning). Before adding a custom axiom, ask whether you actually need it. Usually the answer is no.
Lean’s kernel accepts axioms unconditionally. The #print axioms command shows which axioms a theorem depends on, which is useful for verifying that your proofs rely only on the standard foundational axioms you expect.
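For instance (a sketch; the exact list and its order may vary between versions):
-- Inspect which axioms a proof depends on
theorem uses_classical (P : Prop) : P ∨ ¬P := Classical.em P
#print axioms uses_classical
-- 'uses_classical' depends on axioms: [propext, Classical.choice, Quot.sound]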
The opaque declaration hides a definition’s implementation from the type checker. Unlike axiom, an opaque definition must be provided, but Lean treats it as a black box during type checking. This is useful when you want to abstract implementation details while still having a concrete definition.
-- opaque hides the implementation (never unfolds)
opaque secretKey : Nat
-- The type checker cannot see any value for secretKey
-- This is useful for abstraction barriers
De Morgan’s Little Theorem
Augustus De Morgan formalized the laws that bear his name in the 1850s: the negation of a conjunction is the disjunction of negations, and vice versa. Every programmer knows these laws intuitively from boolean expressions. Let us prove one.
-- De Morgan's Law: ¬(P ∧ Q) → (¬P ∨ ¬Q)
theorem demorgan (P Q : Prop) (h : ¬(P ∧ Q)) : ¬P ∨ ¬Q := by
by_cases hp : P
· -- Case: P is true
right
intro hq
apply h
constructor
· exact hp
· exact hq
· -- Case: P is false
left
exact hp
The proof proceeds by case analysis. We have h : ¬(P ∧ Q), a proof that P ∧ Q is false. We must show ¬P ∨ ¬Q. The by_cases tactic splits on whether P holds:
- If P is true (the by_cases tactic names this hypothesis hp : P), we go right and prove ¬Q. Why? If Q were true, then P ∧ Q would be true, contradicting h. So ¬Q.
- If P is false (in this branch hp : ¬P), we go left and prove ¬P directly. We have it: hp.
Each branch uses tactics from this article: intro, apply, exact, left, right, constructor. The contradiction tactic spots when hypotheses conflict. Read the proof slowly, watch the goal state at each step, and trace how the logical structure maps to the tactic sequence. This is the texture of real mathematics: case splits, contradictions, and the steady narrowing of possibilities until only truth remains.
De Morgan died in 1871. His laws persist in every boolean expression, every logic gate, every conditional branch. If you want to test your understanding, try proving the other direction: from ¬P ∨ ¬Q to ¬(P ∧ Q). It is easier, and it needs no by_cases or classical reasoning at all, which tells you something about the asymmetry between the two directions.
The Theory Beneath
You can now prove things. The proofs have been simple, but the mental model is in place. You understand goals, hypotheses, and the tactic dance that connects them. Next we introduce type theory and dependent types, the language for stating claims worth proving.
Type Theory
Humans classify. We sort animals into species, books into genres, people into roles, and programmers into those who have mass-assigned any to silence the compiler and those who are lying about it. Classification is how finite minds manage infinite variety. Types are classification for computation: every value belongs to a type, and the type determines what operations make sense. You can add numbers but not strings. You can take the length of a list but not the length of an integer. The type system enforces these distinctions before your program runs, which sounds obvious until you remember that most of the world’s software runs on languages where the type system’s considered opinion is “looks plausible” right up until production catches fire.
This seems pedestrian until you push it. What if types could say not just “this is a list” but “this is a list of exactly five elements”? What if they could say not just “this function returns an integer” but “this function returns a positive integer”? What if the type of a function could express its complete specification, so that any implementation with that type is correct by construction?
Dependent type theory answers yes to all of these. It is the most expressive type system in common use, and it blurs the line between programming and mathematics. A type becomes a proposition. A program becomes a proof. The compiler becomes a theorem checker. This is not metaphor; it is the Curry-Howard correspondence that we met in the previous article, now unleashed to its full power.
(The correspondence runs deeper than logic and computation. Category theory provides a third vertex: types correspond to objects, functions to morphisms, and the equations governing programs to commutative diagrams. This three-way relationship, sometimes called computational trinitarianism or the Curry-Howard-Lambek correspondence, means that insights from any vertex illuminate the others. A categorical construction suggests a type former; a type-theoretic proof technique suggests a logical inference rule; a logical connective suggests a categorical limit. The triangle constitutes a precise mathematical isomorphism, providing a conceptual map for navigating modern type theory.)
| Logic | Type Theory | Category Theory |
|---|---|---|
| Proposition | Type | Object |
| Proof | Term / Program | Morphism |
| Implication $P \to Q$ | Function type A → B | Exponential object $B^A$ |
| Conjunction $P \land Q$ | Product type A × B | Product $A \times B$ |
| Disjunction $P \lor Q$ | Sum type A ⊕ B | Coproduct $A + B$ |
| True $\top$ | Unit type Unit | Terminal object $1$ |
| False $\bot$ | Empty type Empty | Initial object $0$ |
| Universal $\forall x. P(x)$ | Dependent product (x : A) → B x | Right adjoint to pullback |
| Existential $\exists x. P(x)$ | Dependent sum (x : A) × B x | Left adjoint to pullback |
The Ladder of Expressiveness
Type systems form a ladder. Each rung lets you say more.
Simple types (C, Java): Values have types. int, string, bool. You cannot add a string to an integer. This catches typos and category errors, but nothing deeper.
Polymorphic types (Haskell, OCaml): Types can be parameterized. List α works for any element type. You write one reverse function that works on lists of integers, strings, or custom objects. The type ∀ α. List α → List α says “for any type α, give me a list of α and I’ll return a list of α.”
Dependent types (Lean, Coq, Agda): Types can depend on values. Vector α n is a list of exactly n elements. The number n is a value that appears in the type. Now the type system can express array bounds, matrix dimensions, protocol states, and any property you can state precisely.
The jump from polymorphic to dependent types is where things get interesting. Consider matrix multiplication. Two matrices can only be multiplied if the columns of the first equal the rows of the second. With dependent types:
Matrix m n → Matrix n p → Matrix m p
The shared n enforces compatibility at compile time. Multiply a \(3 \times 4\) by a \(5 \times 2\)? Type error. The bug is caught before any code runs. Your linear algebra homework now has compile errors, which is somehow both better and worse.
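Here is a minimal sketch of the idea in Lean. Matrix' and mulSketch are illustrative names (not Mathlib's Matrix API), and List.finRange is assumed to be available from Batteries/Mathlib:
-- Dimensions live in the type, so only compatible shapes elaborate
abbrev Matrix' (m n : Nat) : Type := Fin m → Fin n → Int
def mulSketch {m n p : Nat} (A : Matrix' m n) (B : Matrix' n p) : Matrix' m p :=
  fun i k => (List.finRange n).foldl (fun acc j => acc + A i j * B j k) 0
-- Applying mulSketch to a 3×4 and a 5×2 matrix is rejected by the elaborator:
-- the shared inner dimension cannot be both 4 and 5.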
The lambda cube formalizes these distinctions. Starting from the simply typed lambda calculus at the origin, each axis adds a new kind of abstraction:
- x-axis (Polymorphism): Terms can abstract over types. Moving along this axis gives you generic functions like id : ∀ α, α → α that work uniformly across all types.
- y-axis (Type Operators): Types can abstract over types. This axis adds type-level functions like List or Map k that take types as arguments and produce new types.
- z-axis (Dependent Types): Types can depend on terms. Moving along this axis allows types like Vector α n where the type itself contains a value.
The eight vertices represent all possible combinations of these three features:
- λ→ (STLC): The simply typed lambda calculus, with only term-to-term functions. (C, Pascal)
- λ2 (System F): Adds polymorphism, where terms can abstract over types. (Haskell core, ML)
- λω̲: Adds type operators, where types can abstract over types. (Rarely implemented alone)
- λP (LF): Adds dependent types, where types can depend on terms. (Twelf, LF)
- λω (System Fω): Combines polymorphism and type operators. (Haskell with type families, OCaml modules)
- λP2: Combines polymorphism and dependent types. (Rare in practice)
- λPω̲: Combines dependent types and type operators. (Rare in practice)
- λC (CoC): The Calculus of Constructions, combining all three axes. (Lean, Coq, Agda)
Lean sits at λC, the corner where all three meet.
Types as First-Class Citizens
In simple type systems, types and values live in separate worlds. You cannot write a function that takes a type as an argument or returns a type as a result. The wall between them is absolute.
Dependent types tear down this wall. Types become values. You can compute with them, pass them to functions, store them in data structures. The function that constructs Vector Int n takes a number n and returns a type. This uniformity is what makes the whole system work: if types can depend on values, then types must be values.
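A small sketch of a type-returning function (Tuple is an illustrative name):
-- A function from a number to a type: the wall between values and types is gone
def Tuple (α : Type) : Nat → Type
  | 0 => Unit
  | n + 1 => α × Tuple α n
#reduce Tuple Nat 2   -- Nat × Nat × Unit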
The theoretical foundations trace through the 20th century: Church’s simply typed lambda calculus, Martin-Löf’s intuitionistic type theory that unified logic and computation, and various attempts to resolve paradoxes that plagued early formal systems. Lean implements a refinement called the Calculus of Inductive Constructions, which adds inductive types and a hierarchy of universes to keep everything consistent. Understanding why that hierarchy exists requires a detour into the history of mathematics.
The practical experience differs from conventional programming. Types become more informative but also more demanding. You must often provide proofs alongside your code, demonstrating that values satisfy required properties. The compiler becomes an adversary that checks your reasoning at every step, as we saw with tactics. When a program type-checks, you gain strong guarantees about its behavior. When it fails, the error messages guide you toward the gap between what you claimed and what you proved.
The Foundational Crisis
This section covers historical background and can be skipped without losing the thread.
By the late 19th century, mathematics faced a crisis of foundations. Mathematicians had built analysis on set theory, set theory on logic, and logic on intuition. The foundations kept shifting. Georg Cantor’s work on infinite sets produced results that seemed paradoxical. The question became urgent: could mathematics be placed on a foundation that was provably secure?
Russell’s Paradox
In 1901, Bertrand Russell sent a letter to Gottlob Frege, who had just completed his life’s work: a logical foundation for all of mathematics. Russell’s letter contained a single question. Consider the set $R$ of all sets that do not contain themselves. Does $R$ contain itself? If yes, then by definition it should not. If no, then by definition it should. Frege’s system was inconsistent. His life’s work collapsed. He wrote back: “Hardly anything more unfortunate can befall a scientific writer than to have one of the foundations of his edifice shaken after the work is finished.”
This is the danger of self-reference. A set that asks about its own membership. A sentence that asserts its own falsehood. A type that contains itself. These constructions look innocent but harbor contradictions. Mathematics needed walls to prevent them.
Hilbert’s Program
David Hilbert proposed an ambitious response. His program, articulated in the 1920s, aimed to formalize all of mathematics in a finite, complete, and consistent axiomatic system. Complete meant every true statement could be proved. Consistent meant no contradiction could be derived. The dream was a mechanical procedure that could, in principle, determine the truth of any mathematical claim. Mathematics would become a closed system, immune to further crisis.
Principia Mathematica, published by Russell and Whitehead between 1910 and 1913, was the most sustained attempt at this vision. Three volumes, nearly 2000 pages, laboriously deriving mathematics from logical axioms. The proof that $1 + 1 = 2$ appears on page 379 of the second volume. The work demonstrated that formalization was possible but also hinted at its costs. The notation was impenetrable, the proofs were tedious, and the system still required axioms whose consistency could not be established from within.
Gödel’s Incompleteness Theorems
Two decades after Principia, Kurt Gödel showed that the consistency problem was not a limitation of Russell’s system but an inescapable feature of mathematics itself. His incompleteness theorems of 1931 proved that any consistent formal system powerful enough to express arithmetic contains true statements that cannot be proved within the system. The first theorem says completeness is impossible: there will always be truths beyond the reach of your axioms. The second theorem is worse: such a system cannot prove its own consistency. The tools Hilbert wanted to use to secure mathematics are necessarily inadequate for the task. You cannot lift yourself by your own bootstraps.
What This Means for Lean
This might seem to doom the entire enterprise of formal verification. If mathematics cannot be complete, if consistency cannot be proved, what is the point of proof assistants?
The answer is that Lean is not attempting Hilbert’s program. Nobody believes Mathlib will eventually contain all mathematical truth or that its foundations can be proved consistent using only its own axioms. The goals are more modest and more practical. What Lean actually provides is mechanical verification of derivations, not philosophical certainty about foundations.
Lean’s kernel accepts a small set of axioms: the rules of the Calculus of Inductive Constructions, plus optional classical principles like the law of excluded middle and the axiom of choice. These axioms are not provably consistent from within the system. They are simply accepted, much as working mathematicians accept ZFC set theory without demanding a consistency proof that Gödel showed cannot exist. Given these axioms, is this proof valid? That question has a definite answer, and Lean provides it.
Yes, there exist true statements about natural numbers that Lean cannot prove. Yes, Lean cannot prove its own consistency. But these limitations do not prevent you from formalizing the theorems mathematicians actually care about. The prime number theorem, the fundamental theorem of calculus, the classification of finite simple groups: none of these bump against incompleteness. The unprovable statements Gödel constructs are specifically engineered to be unprovable. They are curiosities, not obstacles to mathematical practice.
You have not solved Hilbert’s problem. You have sidestepped it. The foundations rest on trust in a small kernel and a handful of axioms that the mathematical community has examined for decades without finding contradiction. This is not absolute certainty, but it is far more than hand-waving. Principia Mathematica failed because it tried to be a closed system answering every question from first principles. Mathlib succeeds because it tries to be a library: a growing collection of verified results that mathematicians can use, extend, and build upon. The goal is not to end mathematics but to record it in a form that machines can check. That turns out to be achievable, useful, and entirely compatible with Gödel’s theorems.
With the philosophical groundwork laid, we can examine how type theory actually prevents the paradoxes that plagued earlier systems.
Universe Stratification
This section covers advanced material on type universes. Feel free to skim or skip on first reading and return later when universe-related errors arise in practice.
Type theory builds walls against self-reference through stratification. Types are organized into a hierarchy of universes. In Lean, Prop sits at Sort 0, Type 0 sits at Sort 1, Type 1 sits at Sort 2, and so on. A type at level n can only mention types at levels below n. The type Type 0 itself has type Type 1, not Type 0. This breaks the self-reference that doomed Frege’s system. You cannot ask whether Type contains itself because Type is not a single thing; it is an infinite ladder, and each rung can only see the rungs below.
universe u v w
-- Universe polymorphism (explicit universe level)
def polyIdentity (α : Sort u) (a : α) : α := a
-- Universe level expressions
def maxLevel (α : Type u) (β : Type v) : Type (max u v) := α × β
-- Type 0 contains types, Type 1 contains Type 0, etc.
example : Type 0 = Sort 1 := rfl
example : Prop = Sort 0 := rfl
-- Impredicative Prop: functions into Prop stay in Prop
def propPredicate (P : Type u → Prop) : Prop := ∀ α, P α
-- Predicative Type: function types take maximum universe
def typePredicate (P : Type u → Type v) : Type (max (u+1) v) :=
∀ α, P α
When you write universe u v w in Lean, you are declaring universe level variables. The declaration lets you define functions that work at any universe level. When you write def polyIdentity (α : Sort u) (a : α) : α := a, you are defining a function that works across the entire hierarchy. The Sort u includes both Prop (when u = 0) and Type n (when u = n + 1). This universe polymorphism lets you write single definitions that work everywhere.
The .{u} syntax declares a universe parameter local to a single definition. For file-wide universe variables, use universe u v at the top level:
-- Types themselves have types, forming a hierarchy
#check (Nat : Type 0) -- Nat lives in Type 0
#check (Type 0 : Type 1) -- Type 0 lives in Type 1
#check (Type 1 : Type 2) -- and so on...
-- Universe variables let definitions work at any level
-- The .{u} syntax declares a universe parameter for this definition
def myId.{u} (α : Type u) (x : α) : α := x
-- myId works at any universe level
#check myId Nat 42 -- α = Nat (in Type 0)
#check myId (Type 0) Nat -- α = Type 0 (in Type 1)
Predicativity
Here is a rule that sounds obvious until you think about it: you cannot be in the photograph you are taking. The photographer stands outside the frame. A committee that selects its own members creates paradoxes of legitimacy. A definition that refers to a collection containing itself is suspect. This intuition, that the definer must stand apart from the defined, is called predicativity.
Imagine a monastery where knowledge is organized into concentric walls. Acolytes in the outer ring may study only texts from their own ring. Scholars who wish to reference the entire outer collection must do so from the second ring, looking inward. Those who would survey the second ring must stand in the third. And so on, each level permitted to see only what lies below. No scholar may cite a collection that includes their own work. The hierarchy prevents the serpent from eating its tail.
This is how predicative universes work. When you quantify over all types at level n, the resulting type lives at level n+1. The definition “for all types α in Type 0, the type α → α” must itself live in Type 1 because it speaks about the entirety of Type 0. You cannot make universal claims about a collection while remaining inside it. The quantification must ascend.
-- Predicative: quantifying over Type u produces Type (u+1)
-- The result must be "larger" than what we quantify over
def predicativeExample : Type 1 := ∀ (α : Type 0), α → α
-- Check the universe levels explicitly
#check (∀ (α : Type 0), α → α : Type 1) -- Lives in Type 1
#check (∀ (α : Type 1), α → α : Type 2) -- Lives in Type 2
-- Impredicative: quantifying over Prop still produces Prop
-- Prop can "talk about itself" without ascending universe levels
def impredicativeExample : Prop := ∀ (P : Prop), P → P
#check (∀ (P : Prop), P → P : Prop) -- Still in Prop, not Type 0!
-- Why this matters: classical logic requires impredicativity
-- "For all propositions P, P or not P" is itself a proposition
def excludedMiddleType : Prop := ∀ (P : Prop), P ∨ ¬P
-- If Prop were predicative, this would land in Type 0 instead of Prop, breaking classical reasoning
-- Impredicativity lets us define propositions that quantify over all propositions
-- The danger: unrestricted impredicativity leads to paradox
-- Girard's paradox shows Type : Type is inconsistent
-- Lean avoids this by making only Prop impredicative
Lean’s Type hierarchy is predicative: ∀ (α : Type 0), α → α has type Type 1, not Type 0. This prevents Girard’s paradox, a type-theoretic version of Russell’s paradox that arises when Type : Type. The infinite regress of universes is the price of consistency.
Non-Cumulativity
In a cumulative type theory, every type at universe level n is automatically also a type at level n+1 and all higher levels. Coq and Idris work this way: if you have Nat : Type 0, you can use Nat anywhere a Type 1 is expected. The type “flows upward” through the hierarchy without explicit intervention. This makes polymorphic code more convenient since you rarely need to think about universe levels.
Lean takes the opposite approach. Each type lives at exactly one universe level. Nat has type Type 0 and only Type 0. If a function expects a Type 1 argument, you cannot pass Nat directly. You must explicitly lift it using ULift or PLift, wrapper types that move values to higher universes.
-- In Lean, Nat has type Type 0, not Type 1
#check (Nat : Type 0) -- works
-- #check (Nat : Type 1) -- would fail: Nat is not in Type 1
-- A function expecting Type 1 cannot accept Nat directly
def wantsType1 (α : Type 1) : Type 1 := α
-- This fails: Nat lives in Type 0, not Type 1
-- def broken := wantsType1 Nat
-- Solution: explicitly lift Nat to Type 1
def works := wantsType1 (ULift Nat)
-- In a cumulative system (like Coq), this would work:
-- def coqStyle := wantsType1 Nat -- Coq allows this
-- Practical example: polymorphic container at higher universe
def Container (α : Type 1) := List α
-- Cannot directly use Nat
-- def natContainer := Container Nat -- fails
-- Must lift first
def natContainer := Container (ULift Nat)
-- Extracting values requires unlifting
def sumLifted (xs : List (ULift Nat)) : Nat :=
xs.foldl (fun acc x => acc + x.down) 0
Note
This explicit lifting makes universe structure visible in your code. You always know exactly which universe level you are working at. The tradeoff is verbosity: code that would “just work” in Coq requires explicit lifts in Lean. In practice, most Lean code stays within Type 0 and Prop, so non-cumulativity rarely causes friction.
The World of Prop
Lean’s universe hierarchy has a special member at the bottom: Prop, the universe of propositions. Unlike Type, which holds computational data, Prop holds logical statements. This distinction enables two features that would be dangerous elsewhere: impredicativity and proof irrelevance. Together, they make Prop a safe space for classical reasoning.
Impredicativity of Prop
Prop breaks the predicativity rule. While ∀ (α : Type 0), α → α must live in Type 1, the analogous ∀ (P : Prop), P → P has type Prop, staying at the same level despite quantifying over all propositions. The monastery has a secret inner sanctum where the old restrictions do not apply.
-- PLift: lifts any type by exactly one level
def liftedFalse : Type := PLift False
-- ULift: lifts types by any amount
def liftedNat : Type u := ULift.{u} Nat
-- Lifting and unlifting values
def liftExample : ULift.{1} Nat := ⟨42⟩
example : liftExample.down = 42 := rfl
-- Non-cumulativity: types exist at exactly one level
def needsLifting (α : Type 1) : Type 2 := ULift.{2} α
This matters for classical logic. The law of excluded middle, ∀ (P : Prop), P ∨ ¬P, quantifies over all propositions. If Prop were predicative, this would live in Type 0, making it a computational object rather than a logical axiom. But how is impredicativity safe here when it causes paradoxes elsewhere?
Proof Irrelevance
The answer is proof irrelevance. A bear catching a salmon does not care whether the fish swam upstream via the left channel or the right. Proof irrelevance applies this principle to mathematics: any two proofs of the same proposition are equal. If you have two proofs p1 and p2 that both establish proposition P, then p1 = p2 holds definitionally. We care that the theorem is true, not which path led there.
-- Basic logical connectives
theorem and_intro (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q := ⟨hp, hq⟩
theorem or_elim (P Q R : Prop) (h : P ∨ Q) (hp : P → R) (hq : Q → R) : R :=
h.elim hp hq
theorem iff_intro (P Q : Prop) (hpq : P → Q) (hqp : Q → P) : P ↔ Q :=
⟨hpq, hqp⟩
-- Proof irrelevance demonstration
theorem proof_irrel_demo (P : Prop) (p1 p2 : P) : p1 = p2 := rfl
-- Classical logic (via choice)
open Classical in
theorem excluded_middle (P : Prop) : P ∨ ¬P := Classical.em P
The technical foundation is that Prop is a subsingleton universe. A subsingleton is a type with at most one element. For any proposition P, there is at most one proof of P up to definitional equality. This contrasts with Type, where Bool has two distinct elements true and false, and Nat has infinitely many.
Proof irrelevance is what makes impredicativity safe. You cannot extract computational content from an impredicative definition over propositions because there is nothing to extract; all witnesses are indistinguishable. The dangerous circularity is defanged. The serpent may eat its tail here because the tail has no substance.
Computational Erasure
Proof irrelevance has profound computational implications. Because all proofs of a proposition are equal, the compiler can erase proofs at runtime. A function that takes a proof argument does not actually need to receive any runtime data for that argument. This erasure is essential for performance: without it, complex proofs would bloat compiled code with useless proof terms. Your elaborate justification for why the code is correct compiles down to nothing, much like comments but with mathematical guarantees.
Proof irrelevance also enables powerful automation. When a tactic constructs a proof term, the exact structure of that term does not matter. The tactic can use whatever construction is convenient, and the result will be equal to any other proof of the same statement. This freedom simplifies tactic implementation and allows for aggressive optimization of proof search.
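A sketch of what erasure looks like in practice (nthOfProof is an illustrative name). The proof argument below exists only for the type checker; the compiled code receives just the list and the index:
-- The proof h is erased: at runtime this is an ordinary indexed lookup
def nthOfProof (xs : List Nat) (i : Nat) (h : i < xs.length) : Nat :=
  xs.get ⟨i, h⟩
#eval nthOfProof [10, 20, 30] 1 (by decide)   -- 20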
Constructive and Classical Logic
Lean’s type theory is constructive at its core. A constructive proof of existence must provide a witness: to prove ∃ n, P n, you must exhibit a specific n and show P n holds. You cannot merely argue that non-existence leads to contradiction. This discipline has a profound consequence: constructive proofs are programs. A proof of ∃ n, n * n = 4 contains the number 2. You can extract it and compute with it. The categorical semantics of this intuitionistic logic is the theory of toposes, where every topos provides a model in which constructive reasoning holds.
-- Constructive proof: we provide an explicit witness
theorem constructive_exists : ∃ n : Nat, n * n = 4 :=
⟨2, rfl⟩ -- The witness is 2, and we can compute 2 * 2 = 4
-- Constructive: we can extract and run the witness
def constructive_even : { n : Nat // n % 2 = 0 } :=
⟨4, rfl⟩ -- The subtype bundles value with proof
#eval constructive_even.val -- Outputs: 4
-- The law of excluded middle itself (classical axiom)
theorem lem (P : Prop) : P ∨ ¬P := Classical.em P
-- Double negation elimination (classical, not constructive)
-- In constructive logic, ¬¬P does not imply P
theorem dne (P : Prop) : ¬¬P → P := Classical.byContradiction
-- Classical proof by contradiction: no even number is odd
theorem even_not_odd (n : Nat) : n % 2 = 0 → ¬(n % 2 = 1) := by
intro heven hodd
omega
Classical logic adds axioms that break this computational interpretation. The law of excluded middle (P ∨ ¬P for any proposition) lets you prove existence by contradiction without producing a witness. Double negation elimination (¬¬P → P) lets you escape a double negation without constructing a direct proof. These principles are mathematically sound but computationally empty. When you prove something exists using excluded middle, the proof does not contain the thing that exists.
Lean provides classical axioms through the Classical namespace. When you use Classical.em or tactics like by_contra, you are stepping outside constructive logic. Lean tracks this: definitions that depend on classical axioms are marked noncomputable, meaning they cannot be evaluated at runtime.
-- Classical choice: given existence, extract a witness
-- This is noncomputable because we cannot run it
noncomputable def classical_witness (P : Nat → Prop) (h : ∃ n, P n) : Nat :=
Classical.choose h
-- The witness satisfies the property (but we cannot compute what it is)
theorem classical_witness_spec (P : Nat → Prop) (h : ∃ n, P n) :
P (classical_witness P h) :=
Classical.choose_spec h
-- Contrast: decidable existence gives computable witnesses
def decidable_witness (p : Nat → Bool) (bound : Nat) : Nat :=
-- We can search by enumeration because the domain is finite
(List.range bound).find? (fun n => p n) |>.getD 0
-- The key insight: constructive proofs compute, classical proofs assert
Why does this matter? For pure mathematics, classical reasoning is often more convenient. Many standard proofs use contradiction freely. But for verified programming, constructive proofs have an advantage: they produce code. A constructive proof that a sorting algorithm returns a sorted list can be extracted into an actual sorting function. A classical proof merely asserts the sorted list exists.
The practical guidance: use constructive methods when you can, classical when you must. Lean supports both. When you see noncomputable on a definition, you know it relies on classical axioms and cannot be executed. When a definition lacks that marker, it is constructive and can run. The type system tracks the distinction so you always know which world you are in.
Type Equivalences
When are two types “the same”? Having functions in both directions is not enough. You can map Bool to Nat and back, but Nat has infinitely many values while Bool has two. The round-trip loses information.
An equivalence \(A \simeq B\) requires functions in both directions that are mutual inverses: composing them in either order gives back the original value. This captures the idea that \(A\) and \(B\) have the same structure, just with different names for their elements.
-- Two types are "the same" when they are equivalent
-- An equivalence requires functions in both directions that are mutual inverses
variable {α β γ : Type*}
-- Having functions both ways is NOT enough
def boolToNat : Bool → Nat
| true => 1
| false => 0
def natToBool : Nat → Bool
| 0 => false
| _ => true
-- These are NOT inverses: natToBool (boolToNat true) = true, but
-- boolToNat (natToBool 42) = 1 ≠ 42, so the Nat → Bool → Nat round trip loses information
-- A proper equivalence between Unit ⊕ Unit and Bool
-- Both have exactly two elements
def sumUnitEquivBool : Unit ⊕ Unit ≃ Bool where
toFun | .inl () => false | .inr () => true
invFun | false => .inl () | true => .inr ()
left_inv := by intro x; cases x <;> rfl
right_inv := by intro b; cases b <;> rfl
-- Equivalences compose
example : (α ≃ β) → (β ≃ γ) → (α ≃ γ) := Equiv.trans
-- Equivalences are symmetric
example : (α ≃ β) → (β ≃ α) := Equiv.symm
-- Every type is equivalent to itself
example : α ≃ α := Equiv.refl α
-- Product types commute (up to equivalence)
def prodComm : α × β ≃ β × α where
toFun p := (p.2, p.1)
invFun p := (p.2, p.1)
left_inv := by intro ⟨a, b⟩; rfl
right_inv := by intro ⟨b, a⟩; rfl
-- Currying is an equivalence
def curryEquiv : (α × β → γ) ≃ (α → β → γ) where
toFun f a b := f (a, b)
invFun g p := g p.1 p.2
left_inv := by intro f; ext ⟨a, b⟩; rfl
right_inv := by intro g; rfl
Equivalences form a well-behaved notion of sameness. They are reflexive (every type is equivalent to itself), symmetric (if \(A \simeq B\) then \(B \simeq A\)), and transitive (equivalences compose). This makes them an equivalence relation on types, which is exactly what you want for a notion of “sameness.”
The distinction matters in mathematics and programming alike. When you prove that two types are equivalent, you can transport theorems and constructions between them. If you prove a property about Bool, the equivalence \(\text{Unit} \oplus \text{Unit} \simeq \text{Bool}\) lets you conclude the same property holds for the sum type. Mathlib uses equivalences extensively to connect different representations of the same mathematical structure.
Quotients and Parametricity
Quotient types allow you to define new types by identifying elements of an existing type according to an equivalence relation. The integers modulo n, for example, identify natural numbers that have the same remainder when divided by n. Quotients are essential for constructing mathematical objects like rational numbers, real numbers, and algebraic structures.
-- Simple modulo equivalence relation
def ModRel (n : Nat) : Nat → Nat → Prop :=
fun a b => a % n = b % n
-- Prove it's an equivalence relation
theorem ModRel.refl (n : Nat) : ∀ x, ModRel n x x :=
fun _ => rfl
theorem ModRel_symm (n : Nat) : ∀ x y, ModRel n x y → ModRel n y x :=
fun _ _ h => h.symm
theorem ModRel.trans (n : Nat) : ∀ x y z, ModRel n x y → ModRel n y z → ModRel n x z :=
fun _ _ _ hxy hyz => Eq.trans hxy hyz
-- Create setoid instance
instance ModSetoid (n : Nat) : Setoid Nat where
r := ModRel n
iseqv := {
refl := ModRel.refl n
symm := @ModRel_symm n
trans := @ModRel.trans n
}
-- Define the quotient type (integers modulo n)
def ZMod (n : Nat) : Type := Quotient (ModSetoid n)
However, quotients break parametricity. Parametricity is the principle that polymorphic functions must treat their type arguments uniformly. A function of type ∀ α, α → α can only be the identity function because it has no way to inspect what α is. It must work the same way for Nat, String, and any other type. This uniformity yields powerful “free theorems” about polymorphic functions.
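A sketch of that intuition: the body below cannot inspect α, so returning the argument is the only thing a total function of this type can do.
-- The only total inhabitant of this type is the identity function
def onlyIdentity : ∀ α : Type, α → α := fun _ a => a
#eval onlyIdentity Nat 3        -- 3
#eval onlyIdentity String "hi"  -- "hi"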
Quotients violate this uniformity through the Quot.lift operation. When you lift a function to operate on a quotient type, you must prove that the function respects the equivalence relation. This proof obligation means that functions on quotients can behave differently depending on the specific equivalence relation, breaking the uniformity that parametricity requires.
namespace ZMod
-- Constructor respecting equivalence
def mk (n : Nat) (a : Nat) : ZMod n :=
Quotient.mk (ModSetoid n) a
-- Addition operation via lifting
def add {n : Nat} [NeZero n] : ZMod n → ZMod n → ZMod n :=
Quotient.lift₂
(fun a b => mk n ((a + b) % n))
(fun a₁ a₂ b₁ b₂ h₁ h₂ => by
apply Quotient.sound
simp only [ModSetoid] at h₁ h₂ ⊢
unfold ModRel at h₁ h₂ ⊢
rw [Nat.add_mod, h₁, h₂, ← Nat.add_mod]
rfl)
-- Quotient.sound: related elements are equal
theorem mk_eq_of_rel {n : Nat} (a b : Nat) (h : ModRel n a b) :
mk n a = mk n b :=
Quotient.sound h
-- Quotient induction principle
theorem ind_on {n : Nat} {P : ZMod n → Prop} (q : ZMod n)
(h : ∀ a, P (mk n a)) : P q :=
Quotient.ind h q
end ZMod
Why is this acceptable? The trade-off is deliberate. Quotients are necessary for mathematics: you cannot construct the integers, rationals, or reals without them. The loss of parametricity is confined to quotient types and does not affect ordinary polymorphic functions. Moreover, the requirement to prove that lifted functions respect equivalence ensures that quotient operations are well-defined. You cannot accidentally distinguish between equivalent elements.
Comparative Type Systems
Different languages make different design choices in their type systems. The following table summarizes key features across proof assistants and programming languages.
| Feature | Lean 4 | Coq | Agda | Idris 2 | Haskell | Rust |
|---|---|---|---|---|---|---|
| Dependent Types | Full | Full | Full | Full | Limited | No |
| Universe Hierarchy | Predicative | Predicative | Predicative | Predicative | None | None |
| Universe Cumulativity | No | Yes | No | Yes | N/A | N/A |
| Proof Irrelevance | Yes (Prop) | Yes (Prop) | Optional | Yes | N/A | N/A |
| Tactic Language | Lean DSL | Ltac | No | Elab | N/A | N/A |
| Type Inference | Partial | Partial | Partial | Partial | Full* | Full |
| Termination Checking | Required | Required | Required | Optional | No | No |
| Linear Types | No | No | No | QTT | Extension | Ownership |
| Effects System | Monad | Monad | Monad | Algebraic | Monad | Ownership |
| Code Generation | Native | OCaml/Haskell | Haskell | Native | Native | Native |
| Cubical Type Theory | No | No | Yes | No | No | No |
| Decidable Type Checking | No | No | No | No | Yes* | Yes |
Glossary:
- Ltac: Coq’s original tactic language, a dynamically-typed scripting language for proof automation
- QTT: Quantitative Type Theory, tracks how many times each variable is used to enable linear resource management
- Predicative: A universe is predicative if quantifying over types at level n produces a type at level n+1 or higher
- Cumulativity: Whether a type at level n is automatically also at level n+1
- *: Haskell 2010 has full type inference and decidable type checking, but enabling extensions (GADTs, TypeFamilies, RankNTypes, UndecidableInstances) may require type annotations or introduce undecidability
Lean and Coq provide full dependent types with rich proof automation, making them suitable for formal verification. Agda emphasizes explicit proof terms and supports cubical type theory for constructive equality, connecting to homotopy type theory and higher topos theory. Idris 2 uses quantitative type theory to track resource usage, bridging the gap between theorem proving and systems programming.
Haskell approaches dependent types through extensions like GADTs, DataKinds, and type families. Base Haskell maintains decidable type checking, but common extensions can introduce undecidability. The language proved enormously influential as a research vehicle, demonstrating what expressive type systems could achieve. However, approximating dependent types through dozens of extensions bolted onto a core calculus not designed for them creates inherent complexity. Type-level programming in Haskell requires navigating interactions between GADTs, type families, DataKinds, and numerous other extensions, each with its own quirks and limitations. For projects that need the full expressive power of dependent types, languages built on richer type theories from first principles offer a cleaner foundation.
Rust’s ownership system provides memory safety guarantees through affine types, with decidable checking and predictable compile times. However, Rust’s type system proves memory safety, not functional correctness. The borrow checker ensures you will not have use-after-free bugs, but it cannot verify that your sorting algorithm actually sorts or that your cryptographic protocol maintains confidentiality. Proving Rust code correct requires external tools like Prusti, Creusot, Kani, or Verus. These tools have achieved real successes verifying cryptographic primitives and isolated algorithms, but verifying the business logic of production systems remains an active research frontier. Extracting verified Rust from higher-level proven specifications, as CompCert does for C, is an open problem. A verified extraction path from Lean to Rust would bridge formal specifications with systems programming, but no such mechanism exists today. Even without extraction, useful techniques remain: differential testing between a proven specification and a Rust implementation, or reducing state spaces to tractable subproblems that admit complete verification. These approaches yield a spectrum of assurance, from probabilistic confidence through exhaustive testing to total correctness proofs over bounded domains. A fully verified pipeline to production Rust remains hard, but partial verification still delivers industrial value. The Rust Formal Methods Interest Group coordinates ongoing work.
A common critique of Lean is its lack of linear or affine types, which would enable compile-time guarantees about resource usage and in-place mutation. This is a deliberate architectural choice, not an oversight. Rust verifies resource management statically at compile time through ownership; Lean minimizes the performance cost of immutability dynamically at runtime through FBIP (functional but in-place) optimizations and reference counting. The trade-off: Lean’s simpler type system makes large-scale theorem proving more tractable, since proofs do not need to thread linearity constraints through every function. You can share data freely without borrow checker complexity, and FBIP recovers imperative performance for the common case where data has a single owner.
The table above looks like a feature comparison. It is actually a map of philosophical commitments. Each row is a question about the nature of computation; each column is an answer. The language you choose chooses what thoughts are easy to think.
The fundamental trade-off is expressiveness versus automation. Full dependent types let you express arbitrary properties but require manual proof effort. Decidable type systems like Rust and Haskell infer types automatically but cannot express many important invariants. Choose based on whether you need machine-checked proofs or just strong static guarantees.
In short: Lean and Coq make you prove everything is correct, Rust makes you prove memory is safe, Haskell makes you prove effects are tracked, and most other languages just trust you not to ship on Friday.
Compiler Options
The set_option command configures compiler behavior. Most options control elaboration and pretty-printing. You will rarely need these until you hit edge cases:
-- set_option configures compiler and elaborator behavior
-- Show implicit arguments that are normally hidden
set_option pp.explicit true in
#check @id Nat 5 -- shows: @id Nat 5 : Nat
-- maxRecDepth controls recursion during elaboration (type-checking),
-- not runtime. #reduce fully unfolds expressions at compile time:
-- #reduce (List.range 500).length -- ERROR without increased limit
set_option maxRecDepth 2000 in
#reduce (List.range 500).length -- 500
The @ prefix forces explicit argument mode: @id Nat 5 passes the type Nat explicitly instead of letting Lean infer it. Combined with set_option pp.explicit true, this shows all normally-hidden arguments. The maxRecDepth option increases how deeply Lean will recurse during elaboration.
Where Types Meet Values
Type theory provides the foundation. The next article explores dependent types in depth: how types can depend on values, how propositions become types, and how proofs become programs. This is where Lean’s power as a theorem prover emerges from its foundations as a programming language.
Dependent Types
Why bother with all this? The honest answer is that most working programmers will never need dependent types, the same way most drivers will never need to understand engine timing. The dishonest answer is that dependent types will make you more productive. The true answer is somewhere stranger: ordinary type systems cannot express the constraints that actually matter. You can say “this is a list” but not “this is a list of exactly five elements.” You can say “this function returns an integer” but not “this function returns a positive integer smaller than its input.” Every time you write a comment explaining what a function really does, every time you add a runtime check for an invariant the compiler cannot see, every time a bug slips through because the types were not precise enough, you are paying the cost of an insufficiently expressive type system. The comment is a wish. The runtime check is a prayer. Dependent types are a contract.
Dependent types are the solution. They let types talk about values. The type Vector α 5 denotes a list of exactly five elements. The type Fin n represents a natural number provably less than n. Array bounds checking happens at compile time. Protocol state machines live in the types. The invariants you used to hope were true become things the compiler verifies.
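As a preview, here is a sketch of a length-indexed list; Vec is an illustrative name, kept separate from Mathlib's vector types:
-- The length is part of the type, so taking the head of an empty Vec cannot typecheck
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons : {n : Nat} → α → Vec α n → Vec α (n + 1)
def Vec.head {α : Type} {n : Nat} : Vec α (n + 1) → α
  | .cons a _ => a
-- No case for nil is needed: a Vec α (n + 1) can never be empty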
Per Martin-Löf spent the 1970s in Sweden developing this type theory where types could depend on values, not just on other types. The idea seems simple enough: if List can be parameterized by an element type, why not also by its length? But this small step has profound consequences. Suddenly types become a specification language. A function returning Vector α n does not merely return a list; it returns a list of exactly n elements, verified at compile time. Polymorphic type systems like those in Haskell (built on System FC, an extension of System Fω with type equality coercions) and OCaml (an extended ML core with row polymorphism and first-class modules) stop at the water’s edge here. Dependent types let you wade in, expressing invariants that would otherwise live only in documentation or runtime checks. Lean’s type system, based on the Calculus of Inductive Constructions, provides the full machinery: types that compute, proofs that are programs, and specifications precise enough to replace testing with theorem proving.
Warning
This section is dense and notation-heavy. If the mathematical symbols start to blur together, that is not a personal failing. It is the appropriate response to notation that took logicians fifty years to stabilize and still varies between textbooks. Skip ahead and return later. Or don’t. Some people read this material linearly, some spiral in from the edges, some open to random pages and see what sticks. There is no wrong way to learn type theory except to believe you should have understood it faster. The notation table below is a reference, not a prerequisite.
Notation
| Concept | Mathematical Notation | Lean Syntax | Description |
|---|---|---|---|
| Function type | $\alpha \to \beta$ | α → β | Non-dependent function from α to β |
| Dependent function | $\Pi (x : \alpha), \beta(x)$ | (x : α) → β x | Function where return type depends on input |
| For all | $\forall x : \alpha, P(x)$ | ∀ x : α, P x | Universal quantification |
| Exists | $\exists x : \alpha, P(x)$ | ∃ x : α, P x | Existential quantification |
| Lambda abstraction | $\lambda x. t$ | fun x => t or λ x => t | Anonymous function |
| Equivalence | $a \equiv b$ | N/A | Definitional equality |
| Conjunction | $P \land Q$ | P ∧ Q | Logical AND |
| Disjunction | $P \lor Q$ | P ∨ Q | Logical OR |
| Negation | $\neg P$ | ¬P | Logical NOT |
| Type universe | $\text{Type}_n$ | Type n | Universe of types at level n |
| Proposition universe | $\text{Prop}$ | Prop | Universe of propositions |
| Sort hierarchy | $\text{Sort}_n$ | Sort n | Unified universe hierarchy |
| Quotient type | $\alpha/{\sim}$ | Quotient s | Type obtained by quotienting α by relation ∼ |
| Natural numbers | $\mathbb{N}$ | Nat or ℕ | Natural numbers type |
| Integers | $\mathbb{Z}$ | Int or ℤ | Integer type |
| Sigma type | $\Sigma (x : \alpha), \beta(x)$ | Σ x : α, β x | Dependent pair type |
| Product type | $\alpha \times \beta$ | α × β | Cartesian product |
Custom Notation
The notation command lets you extend Lean’s syntax with domain-specific operators. Lean supports four kinds of operators: prefix (before the operand), infix (between two operands), postfix (after the operand), and mixfix (multiple tokens with operands interspersed). The general notation command can define any of these, while specialized commands like prefix, infix, and postfix provide convenient shortcuts. The bodies below are deliberately arbitrary (√ and ! both just square their argument); the point is the surface syntax, not the semantics:
-- Prefix: operator before operand
prefix:max "√" => fun (n : Nat) => n * n
#eval √5 -- 25
-- Infix: operator between operands
infix:50 " ⊕ " => fun (a b : Nat) => a + b + 1
#eval 3 ⊕ 4 -- 8
-- Postfix: operator after operand
postfix:max "!" => fun (n : Nat) => n * n
#eval 5! -- 25
-- Mixfix: multiple tokens with operands interspersed
notation "⟪" a ", " b "⟫" => (a, b)
notation "if'" c "then'" t "else'" e => if c then t else e
#eval ⟪1, 2⟫ -- (1, 2)
#eval if' true then' 42 else' 0 -- 42
Type System Overview
Lean’s type system supports definitional equality through several reduction rules. Two terms are definitionally equal when one reduces to the other, and the type checker treats them as interchangeable without proof.
Beta reduction ($\beta$) is function application. When you apply $(\lambda x. t)$ to an argument $s$, the result is $t$ with $s$ substituted for $x$. This is the computational heart of the lambda calculus.
Delta reduction ($\delta$) unfolds definitions. When you define def f x := x + 1, any occurrence of f 3 can be replaced with 3 + 1. The type checker sees through your naming conventions.
Iota reduction ($\iota$) handles pattern matching on inductive types. When a recursor meets a constructor, the match fires and computation proceeds. This is how Nat.rec applied to Nat.succ n knows to take the successor branch.
Zeta reduction ($\zeta$) substitutes let-bound variables. The expression $\text{let } x := s \text{ in } t$ reduces to $t[s/x]$. Local definitions are just convenient names.
example : (fun x => x + 1) 2 = 3 := rfl -- β-reduction
def myConst := 42
example : myConst = 42 := rfl -- δ-reduction
example : let x := 5; x + x = 10 := rfl -- ζ-reduction
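-- ι-reduction is the one rule not witnessed above: a match applied directly
-- to a constructor computes away (a minimal sketch, nothing library-specific)
example : (match ([1, 2, 3] : List Nat) with | [] => 0 | x :: _ => x) = 1 := rfl -- ι-reduction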
Functions
Function types are a built-in feature of Lean. Functions map values from one type (the domain) to another type (the codomain). In Lean’s core language, all function types are dependent. Non-dependent function types are just a special case where the parameter does not appear in the codomain.
Non-Dependent vs Dependent Functions
Non-dependent functions have a fixed return type that does not vary based on the input value. These are the only kinds of functions available in languages like Haskell (not :: Bool -> Bool) and OCaml (let not : bool -> bool). In Lean, def not : Bool → Bool uses the arrow notation to indicate non-dependence. The return type is always Bool, regardless of which boolean you pass in.
Dependent functions have a return type that can depend on the actual value of the input. The type is written as $\Pi (x : \alpha), \beta(x)$ or (x : α) → β x in Lean syntax, where the parameter name x appears in the return type β x. This feature has no equivalent in Haskell or OCaml.
Key insight: Dependent functions can return values from completely different types based on their input! This is sometimes called a dependent product (or Pi type) because it corresponds to an indexed product of sets.
Tip
The name “dependent product” may seem backwards since we are building functions, not pairs. The terminology comes from set theory: a function $f : \Pi (x : A), B(x)$ assigns to each $x \in A$ an element of $B(x)$, which is precisely an element of the Cartesian product $\prod_{x \in A} B(x)$. The “product” is over all possible inputs.
Why Dependent Types Matter
Consider this function that cannot be typed in Haskell or OCaml:
-- Return type depends on runtime value - impossible in Haskell/OCaml
def dependentTwo (b : Bool) : if b then Unit × Unit else String :=
match b with
| true => ((), ()) -- Returns a pair when b is true
| false => "two" -- Returns a string when b is false
The return type literally changes based on the runtime value. Call dependentTwo true and you get a Unit × Unit. Call dependentTwo false and you get a String. This should feel slightly transgressive. A function that returns different types? In most languages, this is either impossible or requires erasing all type information and hoping for the best. Here, the type system tracks it precisely. The function is total, the types are known, the compiler is satisfied.
-- Dependent pattern matching: types change based on scrutinee
-- Return type depends on the boolean value
def boolToType (b : Bool) : Type :=
if b then Nat else String
def boolExample (b : Bool) : boolToType b :=
match b with
| true => (42 : Nat) -- returns Nat
| false => "hello" -- returns String
-- Pattern matching on Fin must handle all cases
def finToString : Fin 3 → String
| 0 => "zero"
| 1 => "one"
| 2 => "two"
This enables encoding invariants directly in types. For example, Vector α n encodes the length n in the type itself, making it impossible to write functions that violate length constraints. Your off-by-one errors become compile-time errors. The compiler catches at build time what you would otherwise discover in production, at 3am, with the on-call phone ringing.
Typing Rules for Functions
Before diving into the formal rules: the intuition is simple. When you call a function, the type of what you get back can mention the value you passed in. When you define a function, you assume you have an input and show how to produce an output. That is all the rules below are saying, just precisely enough for a compiler to check them.
Two rules govern how types flow through function definitions and applications. The first is application: when you apply a function to an argument, the return type can mention the argument itself. If $f : \Pi (x : \alpha), \beta(x)$ and you apply it to some $a : \alpha$, the result $f \, a$ has type $\beta(a)$. The type of the output depends on the value of the input. This is the essence of dependent typing. A function Vector.head : (n : Nat) → Vector α (n + 1) → α applied to 3 yields a function expecting a Vector α 4. The 3 propagates into the type.
The second rule is abstraction: to construct a function, you assume a variable of the input type and produce a term of the output type. If $t : \beta$ under the assumption $x : \alpha$, then $\lambda x : \alpha. \, t$ has type $\Pi (x : \alpha), \beta$. The abstraction binds the variable and packages the assumption into the function type. When $\beta$ does not mention $x$, this collapses to the familiar non-dependent arrow $\alpha \to \beta$.
Beyond application and abstraction, functions satisfy eta-reduction: wrapping a function in a lambda that immediately applies it produces the same function. Formally, $\lambda x. \, f \, x \equiv f$ when $x$ does not appear free in $f$. This goes beyond simplification; it expresses extensionality: a function is determined by what it does to its arguments, not by how it is written.
Examples: Dependent and Non-Dependent Functions
-- Non-dependent function: return type is fixed
-- Similar to Haskell: double :: Int -> Int
-- or OCaml: let double : int -> int
def double (n : Nat) : Nat := n * 2
-- Another non-dependent example (like Haskell's const)
def constantFive (_ : Nat) : Nat := 5
-- Dependent function: return type depends on the input value
-- This has NO equivalent in Haskell or OCaml!
def makeVec (n : Nat) : Fin (n + 1) := ⟨n, Nat.lt_succ_self n⟩
-- A more dramatic dependent function example
-- The return TYPE changes based on the input VALUE
def two (b : Bool) : if b then Unit × Unit else String :=
match b with
| true => ((), ()) -- Returns a pair of units
| false => "two" -- Returns a string
-- This function returns different TYPES based on input:
-- two true : Unit × Unit
-- two false : String
-- Dependent pairs: the second type depends on the first value
def dependentPair : (n : Nat) × Fin n :=
⟨5, 3⟩ -- Second component must be less than first
-- Compare with non-dependent versions:
-- Haskell: (Int, Int) -- no constraint between components
-- OCaml: int * int -- no constraint between components
-- Lean enforces the invariant in the type!
-- Function composition (non-dependent)
def comp {α β γ : Type} (f : β → γ) (g : α → β) : α → γ :=
fun x => f (g x)
-- This is like Haskell's (.); note that OCaml's (@@) is application, not composition
-- But Lean can also do DEPENDENT composition!
-- Dependent function composition
def depComp {α : Type} {β : α → Type} {γ : (x : α) → β x → Type}
(f : (x : α) → (y : β x) → γ x y)
(g : (x : α) → β x) :
(x : α) → γ x (g x) :=
fun x => f x (g x)
-- Multiple parameters via currying (named after Haskell B. Curry)
def curriedAdd : Nat → Nat → Nat := fun x y => x + y
-- Function extensionality: equal outputs for equal inputs
theorem funext_example (f g : Nat → Nat) (h : ∀ x, f x = g x) : f = g :=
funext h
Currying
Currying is the fundamental technique of transforming functions with multiple parameters into sequences of single-parameter functions. Named after logician Haskell Curry (yes, the programming language is also named after him), this approach is automatic in Lean. All multi-parameter functions are internally represented as curried functions. This enables partial application, where supplying fewer arguments than a function expects creates a new function waiting for the remaining arguments.
Note
The technique was actually discovered by Moses Schönfinkel in 1924, six years before Curry’s work. Academic naming conventions are not always fair. Schönfinkel’s life ended in obscurity in a Moscow hospital; Curry became a household name among programmers who have never heard of either.
The power of currying lies in its composability. You can create specialized functions by partially applying general ones, building up complex behavior from simple building blocks. While languages like Haskell make currying explicit, Lean handles it transparently, allowing you to work with multi-parameter functions naturally while still benefiting from the flexibility of curried representations.
/-!
### Currying
Currying is the transformation of functions with multiple parameters into
a sequence of functions, each taking a single parameter. In Lean, all
multi-parameter functions are automatically curried.
-/
-- Multi-parameter function (automatically curried)
def add3 (x y z : Nat) : Nat := x + y + z
-- Equivalent to nested lambdas
def add3' : Nat → Nat → Nat → Nat :=
fun x => fun y => fun z => x + y + z
-- Partial application creates new functions
def add10 : Nat → Nat → Nat := add3 10
def add10And5 : Nat → Nat := add3 10 5
example : add10 3 7 = 20 := rfl
example : add10And5 2 = 17 := rfl
-- Function.curry: Convert uncurried to curried
def uncurriedAdd : Nat × Nat → Nat := fun p => p.1 + p.2
def curriedVer := Function.curry uncurriedAdd
example : curriedVer 3 4 = 7 := rfl
-- Function.uncurry: Convert curried to uncurried
def addPair := Function.uncurry Nat.add
example : addPair (3, 4) = 7 := rfl
-- Currying with dependent types
def depCurry {α : Type} {β : α → Type} {γ : (a : α) → β a → Type}
(f : (p : (a : α) × β a) → γ p.1 p.2) :
(a : α) → (b : β a) → γ a b :=
fun a b => f ⟨a, b⟩
Function Extensionality
Function extensionality is a fundamental principle stating that two functions are equal if and only if they produce equal outputs for all equal inputs. This principle, while intuitively obvious, is not derivable from the core rules of dependent type theory; in Lean it is a theorem, proved from the quotient primitives rather than postulated as a separate axiom. Without extensionality, we could only prove functions equal when they are definitionally equal: reducible to the same normal form.
The funext tactic in Lean implements this principle, allowing us to prove function equality by considering their behavior pointwise. This is essential for mathematical reasoning, where we often want to show that two different definitions actually describe the same function. The principle extends to dependent functions as well, where the output type can vary with the input.
/-!
### Extensionality
Function extensionality states that two functions are equal if they
produce equal outputs for all inputs. It is not provable from the core
rules alone; in Lean it is derived from the quotient primitives.
-/
-- funext: Basic function extensionality
theorem my_funext {α β : Type} (f g : α → β) :
(∀ x, f x = g x) → f = g :=
funext
-- Example: Proving function equality
def double' (n : Nat) : Nat := 2 * n
def double'' (n : Nat) : Nat := n + n
theorem doubles_equal : double' = double'' := by
funext n
simp [double', double'']
omega
-- Dependent function extensionality
theorem dep_funext {α : Type} {β : α → Type}
(f g : (x : α) → β x) :
(∀ x, f x = g x) → f = g :=
funext
-- Eta reduction: λ x, f x = f
theorem eta_reduction (f : Nat → Nat) : (fun x => f x) = f :=
funext fun _ => rfl
-- Functions equal by behavior, not syntax
def addOne : Nat → Nat := fun x => x + 1
def succFunc : Nat → Nat := Nat.succ
theorem addOne_eq_succ : addOne = succFunc := by
funext x
simp [addOne, succFunc]
Totality and Termination
Important
All functions in Lean must be total, meaning they must be defined for every possible input of the correct type. This requirement ensures logical consistency: a function that could fail or loop forever would make Lean’s logic unsound. Partiality is the enemy. The function that hangs on edge cases, the recursion that never terminates, the match that forgot a constructor: these are not just bugs but logical contradictions waiting to invalidate your theorems.
To achieve totality while allowing recursion, Lean uses well-founded recursion based on decreasing measures.
For structural recursion on inductive types, Lean automatically proves termination by observing that recursive calls operate on structurally smaller arguments. For more complex recursion patterns, you can specify custom termination measures using termination_by and provide proofs that these measures decrease with decreasing_by. This approach allows expressing any computable function while maintaining logical soundness. If you have ever written while (true) and hoped for the best, this is the universe collecting on that debt.
/-!
### Totality and Termination
All functions in Lean must be total (defined for all inputs) and
terminating. Lean uses well-founded recursion to ensure termination.
-/
-- Total function: defined for all natural numbers
def safeDivide (n : Nat) (m : Nat) : Nat :=
if m = 0 then 0 else n / m -- Returns 0 for division by zero
-- Structural recursion (automatically proven terminating)
def fact : Nat → Nat
| 0 => 1
| n + 1 => (n + 1) * fact n
-- Well-founded recursion with explicit termination proof
def gcd (a b : Nat) : Nat :=
if h : b = 0 then
a
else
have : a % b < b := Nat.mod_lt _ (Nat.pos_of_ne_zero h)
gcd b (a % b)
termination_by b
-- Mutual recursion with termination
mutual
def isEvenMut : Nat → Bool
| 0 => true
| n + 1 => isOddMut n
def isOddMut : Nat → Bool
| 0 => false
| n + 1 => isEvenMut n
end
-- Using decreasing_by for custom termination proof
def ackermann : Nat → Nat → Nat
| 0, n => n + 1
| m + 1, 0 => ackermann m 1
| m + 1, n + 1 => ackermann m (ackermann (m + 1) n)
termination_by m n => (m, n)
For non-structural recursion, you must provide a termination measure that decreases on each recursive call. The classic examples appear above: the GCD algorithm, where the second argument strictly decreases, and the Ackermann function, where the lexicographic pair of arguments decreases.
Function API Reference
Lean’s standard library provides a rich collection of function combinators in the Function namespace. These combinators, familiar from functional programming, enable point-free style and function composition. The composition operator ∘ allows building complex functions from simpler ones, while combinators like const, flip, and id provide basic building blocks for function manipulation.
Function composition in Lean satisfies the expected mathematical properties: it is associative, and the identity function acts as a neutral element. These properties are not just theorems but computational facts. They hold definitionally, meaning Lean can verify them by pure computation without requiring proof steps.
/-!
### Function API Reference
Lean provides standard function combinators in the Function namespace.
-/
-- Function.comp: Function composition
def composed := Function.comp (· + 10) (· * 2)
example : composed 5 = 20 := rfl -- (5 * 2) + 10 = 20
-- Using ∘ notation for composition
open Function in
def composed' := (· + 10) ∘ (· * 2)
example : composed' 5 = 20 := rfl
-- Function.const: Constant function
def alwaysFive := Function.const Nat 5
example : alwaysFive 100 = 5 := rfl
example : alwaysFive 999 = 5 := rfl
-- id: Identity function
example : id 42 = 42 := rfl
example : (id ∘ id) = (id : Nat → Nat) := rfl
-- flip: Swap arguments
def subtract (a b : Nat) : Int := a - b
def flippedSubtract := flip subtract
example : subtract 10 3 = 7 := rfl
example : flippedSubtract 3 10 = 7 := rfl
-- Function composition laws
open Function in
theorem comp_assoc {α β γ δ : Type} (f : γ → δ) (g : β → γ) (h : α → β) :
(f ∘ g) ∘ h = f ∘ (g ∘ h) := rfl
open Function in
theorem id_comp {α β : Type} (f : α → β) :
id ∘ f = f := rfl
open Function in
theorem comp_id {α β : Type} (f : α → β) :
f ∘ id = f := rfl
Function Properties
Mathematical properties of functions (injectivity, surjectivity, and bijectivity) play crucial roles in both mathematics and computer science. An injective function maps distinct inputs to distinct outputs, a surjective function reaches every possible output, and a bijective function is both injective and surjective, establishing a one-to-one correspondence between domain and codomain.
These properties connect to the concept of inverses. A function has a left inverse if and only if it’s injective, a right inverse if and only if it’s surjective, and a two-sided inverse if and only if it’s bijective. Lean provides definitions and theorems for reasoning about these properties, enabling formal verification of mathematical and algorithmic correctness.
/-!
### Function Properties
Important mathematical properties of functions.
-/
-- Injective: Different inputs give different outputs
def isInjective {α β : Type} (f : α → β) : Prop :=
∀ x y, f x = f y → x = y
theorem double_injective : isInjective (fun n : Nat => 2 * n) := by
intro x y h
simp only [] at h
-- If 2*x = 2*y, then x = y
have : x * 2 = y * 2 := by rw [mul_comm x 2, mul_comm y 2]; exact h
exact Nat.eq_of_mul_eq_mul_right (by norm_num : 0 < 2) this
-- Using Lean's built-in Function.Injective
example : Function.Injective (fun n : Nat => 2 * n) := by
intro x y h
simp only [] at h
have : x * 2 = y * 2 := by rw [mul_comm x 2, mul_comm y 2]; exact h
exact Nat.eq_of_mul_eq_mul_right (by norm_num : 0 < 2) this
-- Surjective: Every output is reached by some input
def isSurjective {α β : Type} (f : α → β) : Prop :=
∀ b, ∃ a, f a = b
-- Not surjective: doubling doesn't produce odd numbers
theorem double_not_surjective :
¬Function.Surjective (fun n : Nat => 2 * n) := by
intro h
obtain ⟨n, hn⟩ := h 1
simp at hn
-- 2*n is always even, but 1 is odd
cases n with
| zero => simp at hn
| succ m => simp [Nat.mul_succ] at hn
-- Bijective: Both injective and surjective
def isBijective {α β : Type} (f : α → β) : Prop :=
Function.Injective f ∧ Function.Surjective f
-- Left inverse: g ∘ f = id
theorem has_left_inverse {α β : Type} (f : α → β) (g : β → α) :
Function.LeftInverse g f ↔ ∀ a, g (f a) = a := by
rfl
-- Right inverse: f ∘ g = id
theorem has_right_inverse {α β : Type} (f : α → β) (g : β → α) :
Function.RightInverse g f ↔ ∀ b, f (g b) = b := by
rfl
-- Example: Successor and predecessor
def succInt : Int → Int := (· + 1)
def predInt : Int → Int := (· - 1)
theorem succ_pred_inverse :
Function.LeftInverse predInt succInt ∧
Function.RightInverse predInt succInt := by
constructor <;> intro x <;> simp [succInt, predInt]
-- Inverse functions
structure IsInverse {α β : Type} (f : α → β) (g : β → α) : Prop where
left : Function.LeftInverse g f
right : Function.RightInverse g f
theorem inverse_bijective {α β : Type} (f : α → β) (g : β → α)
(h : IsInverse f g) : isBijective f := by
constructor
· -- Injective
intro x y hxy
have : g (f x) = g (f y) := by rw [hxy]
rw [h.left x, h.left y] at this
exact this
· -- Surjective
intro b
use g b
exact h.right b
Implicit and Auto Parameters
While not part of core type theory, Lean’s function types record how each parameter is supplied. Implicit and explicit function types are definitionally equal. Implicit parameters are inferred from context, strict implicit parameters are only inserted once a later explicit argument is supplied, instance-implicit parameters (written in square brackets) are filled by type class resolution, and auto parameters are discharged by running a default tactic at the call site.
-- Implicit parameters (inferred from usage)
def implicitId {α : Type} (x : α) : α := x
-- Strict implicit (only inserted when a later explicit argument is given)
def strictImplicit ⦃α : Type⦄ (x : α) : α := x
-- Instance-implicit parameter (filled by type class resolution)
def instImplicitDefault {α : Type} [Inhabited α] : α := default
-- Optional parameters with default values
def withDefault (n : Nat := 10) : Nat := n * 2
example : implicitId 5 = 5 := rfl
example : withDefault = 20 := rfl
example : withDefault 3 = 6 := rfl
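-- For contrast with the instance-implicit case above, a sketch of a genuine
-- auto parameter: the default `by omega` tactic runs at each call site
-- (the name requirePos is illustrative, not from the library)
def requirePos (n : Nat) (h : n > 0 := by omega) : Nat := n - 1
example : requirePos 5 = 4 := rfl -- omega silently proves 5 > 0
-- requirePos 0 would be rejected: omega cannot prove 0 > 0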
Propositions
Propositions (Prop) are types representing logical statements. They feature proof irrelevance: any two proofs of the same proposition are definitionally equal. This means the specific proof does not matter, only that one exists. We covered this in the Type Theory article.
The Curry-Howard Correspondence Revisited
The Curry-Howard correspondence we encountered in earlier articles now reveals its full depth. With dependent types, the correspondence extends beyond simple propositional logic. Universal quantification becomes dependent function types. Existential quantification becomes dependent pair types (sigma types). The slogan “propositions are types, proofs are programs” turns out to be a precise mathematical equivalence.
| Logic | Type Theory | Lean Syntax |
|---|---|---|
| $\forall x : \alpha, P(x)$ | Dependent function $\Pi (x : \alpha), P(x)$ | ∀ x : α, P x or (x : α) → P x |
| $\exists x : \alpha, P(x)$ | Dependent pair $\Sigma (x : \alpha), P(x)$ | ∃ x : α, P x or Σ x : α, P x |
| Induction principle | Recursor | Nat.rec, List.rec, etc. |
| Proof by cases | Pattern matching | match ... with |
The dependent versions unify what simpler type systems treat separately. A proof of “for all natural numbers n, P(n) holds” is literally a function that takes any n : Nat and returns a proof of P n. A proof of “there exists a natural number n such that P(n)” is literally a pair: the witness n together with a proof of P n. This unification is not philosophical hand-waving; it is the operational semantics of Lean.
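To make the slogan concrete, here is a minimal sketch (only a standard library lemma and omega are used): the universal proof is literally a function, and the existential proof is literally a pair.
-- ∀ as a dependent function: the proof takes n and returns a proof about n
example : ∀ n : Nat, n ≤ n + 1 := fun n => Nat.le_succ n
-- ∃ as a dependent pair: a witness packaged with evidence about that witness
example : ∃ n : Nat, n > 3 := ⟨4, by omega⟩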
-- Basic logical connectives
theorem and_intro (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q := ⟨hp, hq⟩
theorem or_elim (P Q R : Prop) (h : P ∨ Q) (hp : P → R) (hq : Q → R) : R :=
h.elim hp hq
theorem iff_intro (P Q : Prop) (hpq : P → Q) (hqp : Q → P) : P ↔ Q :=
⟨hpq, hqp⟩
-- Proof irrelevance demonstration
theorem proof_irrel_demo (P : Prop) (p1 p2 : P) : p1 = p2 := rfl
-- Classical logic (via choice)
open Classical in
theorem excluded_middle (P : Prop) : P ∨ ¬P := Classical.em P
Key Properties of Propositions
- Run-time irrelevance: Propositions are erased during compilation
- Impredicativity: Propositions can quantify over types from any universe
- Restricted elimination: With limited exceptions (subsingletons), propositions cannot eliminate to non-proposition types
- Extensionality: The propext axiom enables proving that logically equivalent propositions are equal (see the example below)
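A one-line illustration of the last point; propext takes an iff and yields an equality of propositions:
-- propext: logically equivalent propositions are equal
example (P Q : Prop) (h : P ↔ Q) : P = Q := propext h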
Decidability
Decidable propositions bridge logic and computation, allowing propositions to be computed. A proposition $P$ is decidable when an algorithm can produce either a proof of $P$ or a proof of $\neg P$. In Lean this is the inductive type Decidable P, an evidence-carrying analogue of the classical disjunction $P \lor \neg P$, with two constructors:
$$\text{isTrue} : P \to \text{Decidable}(P) \qquad\qquad \text{isFalse} : \neg P \to \text{Decidable}(P)$$
This connects to constructive mathematics where decidability provides computational content:
-- Custom decidable instance
instance decidableEven (n : Nat) : Decidable (n % 2 = 0) :=
if h : n % 2 = 0 then isTrue h else isFalse h
-- Using decidability for computation
def isEven (n : Nat) : Bool := decide (n % 2 = 0)
example : isEven 4 = true := rfl
example : isEven 5 = false := rfl
-- Subsingletons: types with at most one element
theorem subsingleton_prop (P : Prop) : Subsingleton P :=
⟨fun _ _ => rfl⟩
Inductive Types
Inductive types are Lean’s primary mechanism for introducing new types. Every type is either inductive or built from universes, functions, and inductive types.
Warning
The recursor that Lean generates for each inductive type is the induction principle in computational form. If you find yourself writing a proof by induction and wondering where the induction hypothesis comes from, the answer is: the recursor. Understanding recursors deeply is optional for using Lean but essential for understanding why Lean works.
Each inductive type has:
- A single type constructor (may take parameters)
- Any number of constructors introducing new values
- A derived recursor representing an induction principle
The general form looks intimidating but says something simple: an inductive type is defined by listing all the ways you can build a value of that type. Natural numbers have two constructors (zero and successor), lists have two (empty and cons), and so on. The formal notation below just makes this precise.
$$\begin{aligned} &\textbf{inductive } C\ (\vec{\alpha} : \vec{U}) : \Pi (\vec{\beta} : \vec{V}),\ s\ \textbf{where} \\ &\quad \mid\ c_1 : \Pi (\vec{x_1} : \vec{T_1}),\ C\ \vec{\alpha}\ \vec{t_1} \\ &\quad \mid\ c_2 : \Pi (\vec{x_2} : \vec{T_2}),\ C\ \vec{\alpha}\ \vec{t_2} \\ &\quad \vdots \end{aligned}$$
Where $\vec{\alpha}$ are parameters (fixed across all constructors) and $\vec{\beta}$ are indices (can vary between constructors).
-- Basic inductive with multiple constructors
inductive Color : Type where
| red | green | blue
deriving Repr, DecidableEq
-- Parameterized inductive type
inductive Result (ε α : Type) : Type where
| ok : α → Result ε α
| error : ε → Result ε α
-- Anonymous constructor syntax for single-constructor types
structure Point where
x : Float
y : Float
def origin : Point := ⟨0.0, 0.0⟩
Indexed Families
The distinction between parameters and indices is fundamental. Parameters are fixed across the entire definition: if you declare inductive Foo (α : Type), then every constructor must produce a Foo α with that same α. Indices can vary: each constructor can target a different index value. In Vector α n, the type α is a parameter (all elements have the same type) but n is an index (constructors produce vectors of different lengths). The nil constructor produces Vector α 0. The cons constructor takes a Vector α n and produces Vector α (n + 1). The index changes; the parameter does not.
This distinction affects how Lean generates recursors and what pattern matching can learn. When you match on a Vector α n, Lean learns the specific value of the index n in each branch. Matching on nil tells you n = 0. Matching on cons tells you n = m + 1 for some m. This index refinement is what makes length-indexed vectors useful: the type system tracks information that flows from pattern matching.
For vectors (length-indexed lists), the signature is: $$\text{Vector} : \text{Type} \to \mathbb{N} \to \text{Type}$$
The recursor for indexed families captures the dependency: $$\text{Vector.rec} : \Pi (P : \Pi n,\ \text{Vector}\ \alpha\ n \to \text{Sort } u),\ \ldots$$
-- Vector: length-indexed lists
inductive Vector (α : Type) : Nat → Type where
| nil : Vector α 0
| cons : ∀ {n}, α → Vector α n → Vector α (n + 1)
def vectorHead {α : Type} {n : Nat} : Vector α (n + 1) → α
| Vector.cons a _ => a
-- Even/odd indexed list (from manual)
inductive EvenOddList (α : Type) : Bool → Type where
| nil : EvenOddList α true
| cons : ∀ {isEven}, α → EvenOddList α isEven → EvenOddList α (!isEven)
def exampleEvenList : EvenOddList Nat true :=
EvenOddList.cons 1 (EvenOddList.cons 2 EvenOddList.nil)
The Fin n type represents natural numbers strictly less than n. It is perhaps the simplest useful indexed type: the index constrains which values can exist. A Fin 3 can only be 0, 1, or 2. Attempting to construct a Fin 3 with value 3 is a type error, not a runtime error.
-- Fin n: natural numbers less than n
-- Fin 3 has exactly three values: 0, 1, 2
example : Fin 3 := 0
example : Fin 3 := 1
example : Fin 3 := 2
-- example : Fin 3 := 3 -- Error: 3 is not less than 3
-- Fin carries a proof that the value is in bounds
def two : Fin 5 := ⟨2, by omega⟩
-- Safe array indexing using Fin
def safeIndex {α : Type} (arr : Array α) (i : Fin arr.size) : α :=
arr[i] -- No bounds check needed at runtime
Vectors generalize lists by tracking their length in the type. A Vec α n is a list of exactly n elements of type α. The head function can only be called on non-empty vectors because its type requires Vec α (n + 1). No runtime check needed; the type system enforces the precondition.
-- Vector: lists with length in the type
-- A vector is a list with its length tracked in the type
inductive Vec (α : Type) : Nat → Type where
| nil : Vec α 0
| cons {n : Nat} : α → Vec α n → Vec α (n + 1)
-- The length is known at compile time
def exampleVec : Vec Nat 3 := .cons 1 (.cons 2 (.cons 3 .nil))
-- Head is safe: can only call on non-empty vectors
def Vec.head {α : Type} {n : Nat} : Vec α (n + 1) → α
| .cons x _ => x
-- Tail preserves the length relationship
def Vec.tail {α : Type} {n : Nat} : Vec α (n + 1) → Vec α n
| .cons _ xs => xs
-- Map over a vector (preserves length)
def Vec.map {α β : Type} {n : Nat} (f : α → β) : Vec α n → Vec β n
| .nil => .nil
| .cons x xs => .cons (f x) (xs.map f)
Mutual and Nested Inductive Types
Multiple inductive types can be defined simultaneously when they reference each other. Nested inductive types are defined recursively through other type constructors.
mutual
inductive Tree (α : Type) : Type where
| leaf : α → Tree α
| node : Forest α → Tree α
inductive Forest (α : Type) : Type where
| nil : Forest α
| cons : Tree α → Forest α → Forest α
end
-- Nested inductive: recursive through other type constructors
inductive NestedTree (α : Type) : Type where
| leaf : α → NestedTree α
| node : List (NestedTree α) → NestedTree α
-- Recursors are automatically generated for pattern matching
def treeSize {α : Type} : Tree α → Nat
| Tree.leaf _ => 1
| Tree.node forest => forestSize forest
where forestSize : Forest α → Nat
| Forest.nil => 0
| Forest.cons t rest => treeSize t + forestSize rest
Structures
Structures are specialized single-constructor inductive types with no indices. They provide automatic projection functions, named-field syntax, update syntax, and inheritance:
structure Person where
name : String
age : Nat
email : Option String := none
deriving Repr
-- Inheritance
structure Student extends Person where
studentId : Nat
gpa : Float
-- Field access and update syntax
def alice : Person := { name := "Alice", age := 25 }
def olderAlice := { alice with age := 26 }
example : alice.name = "Alice" := rfl
example : olderAlice.age = 26 := rfl
Sigma Types and Subtypes
The intuition for sigma types is straightforward: a sigma type is a pair where the type of the second element depends on the value of the first. Think of a labeled box: the label tells you what is inside, and the contents match the label. If the label says “length 5”, the contents are a list with exactly 5 elements. The label and contents travel together, and the type system knows they are consistent.
Sigma types (dependent pairs) package a value with data that depends on it. The notation Σ x : α, β x describes pairs where the second component’s type depends on the first component’s value. This is the dependent version of the product type α × β.
-- Sigma types: dependent pairs
-- A dependent pair where the second type depends on the first value
-- Σ (n : Nat), Fin n means "a natural number n paired with a Fin n"
def dependentPair : Σ n : Nat, Fin n := ⟨5, 3⟩
-- The second component's type depends on the first
example : dependentPair.fst = 5 := rfl
example : dependentPair.snd < dependentPair.fst := by decide
-- Contrast with non-dependent product
def regularPair : Nat × Nat := (5, 3) -- both components are Nat
Subtypes refine existing types with predicates. The type { x : α // P x } contains values of type α that satisfy predicate P. Each element bundles a value with a proof that it satisfies the constraint. This is how you express “positive integers” or “sorted lists” at the type level.
-- Subtypes: values with proofs
-- { x : Nat // x > 0 } is the type of positive naturals
def posNat : { n : Nat // n > 0 } := ⟨5, by omega⟩
-- Access the value and proof separately
example : posNat.val = 5 := rfl
example : posNat.property = (by omega : 5 > 0) := rfl
-- Functions can require positive inputs
def safeDivide (a : Nat) (b : { n : Nat // n > 0 }) : Nat :=
a / b.val
Equality as a Type
The equality type a = b is itself a dependent type: it depends on the values a and b. The only constructor is rfl : a = a, which proves that any value equals itself. Proofs of equality can be used to substitute equal values, and equality satisfies the expected properties of symmetry and transitivity.
-- Equality as a dependent type
-- The equality type: a = b is a type that is inhabited iff a equals b
-- rfl : a = a is the only constructor
example : 2 + 2 = 4 := rfl
-- Proofs of equality can be used to substitute
theorem subst_example (a b : Nat) (h : a = b) (P : Nat → Prop) (pa : P a) : P b :=
h ▸ pa -- transport pa along h
-- Equality is symmetric and transitive
theorem eq_symm (a b : Nat) (h : a = b) : b = a := h.symm
theorem eq_trans (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := h1.trans h2
Quotient Types
Quotient types create new types by identifying elements via equivalence relations. Given a type $\alpha$ and an equivalence relation $\sim$ on $\alpha$, the quotient $\alpha/\sim$ is a type where $a = b$ in $\alpha/\sim$ whenever $a \sim b$. Elements related by the relation become equal in the quotient type. Equality is respected universally, and nothing in Lean’s logic can observe differences between equal terms.
Note
Mathematicians write $\mathbb{Z} = (\mathbb{N} \times \mathbb{N})/\!\sim$ and software engineers write
type Int = Quotient (Nat × Nat) equiv. Same idea, different notation. The integer $-3$ is not any particular pair of naturals but the equivalence class of all pairs $(a, b)$ where $a + 3 = b$: so $(0, 3)$, $(1, 4)$, $(2, 5)$, and infinitely many others. Two fields, one concept, a century of mutual incomprehension that turns out to be largely notational.
For example, the integers can be constructed as $\mathbb{Z} = (\mathbb{N} \times \mathbb{N})/\sim$ where $(a,b) \sim (c,d)$ iff $a + d = b + c$.
-- Simple modulo equivalence relation
def ModRel (n : Nat) : Nat → Nat → Prop :=
fun a b => a % n = b % n
-- Prove it's an equivalence relation
theorem ModRel.refl (n : Nat) : ∀ x, ModRel n x x :=
fun _ => rfl
theorem ModRel_symm (n : Nat) : ∀ x y, ModRel n x y → ModRel n y x :=
fun _ _ h => h.symm
theorem ModRel.trans (n : Nat) : ∀ x y z, ModRel n x y → ModRel n y z → ModRel n x z :=
fun _ _ _ hxy hyz => Eq.trans hxy hyz
-- Create setoid instance
instance ModSetoid (n : Nat) : Setoid Nat where
r := ModRel n
iseqv := {
refl := ModRel.refl n
symm := @ModRel_symm n
trans := @ModRel.trans n
}
-- Define the quotient type (integers modulo n)
def ZMod (n : Nat) : Type := Quotient (ModSetoid n)
Working with Quotients
Operations on quotients must respect the equivalence relation. The Quotient.lift functions ensure operations are well-defined, while Quotient.sound asserts equality of related elements.
The quotient axioms provide:
- Quotient.mk: $\alpha \to \alpha/{\sim}$ (constructor)
- Quotient.lift: If $f : \alpha \to \beta$ respects $\sim$, then $f$ lifts to $\alpha/{\sim} \to \beta$
- Quotient.sound: If $a \sim b$, then $[a] = [b]$ in $\alpha/{\sim}$
- Quotient.exact: If $[a] = [b]$ in $\alpha/{\sim}$, then $a \sim b$
namespace ZMod
-- Constructor respecting equivalence
def mk (n : Nat) (a : Nat) : ZMod n :=
Quotient.mk (ModSetoid n) a
-- Addition operation via lifting
def add {n : Nat} [NeZero n] : ZMod n → ZMod n → ZMod n :=
Quotient.lift₂
(fun a b => mk n ((a + b) % n))
(fun a₁ a₂ b₁ b₂ h₁ h₂ => by
apply Quotient.sound
simp only [ModSetoid] at h₁ h₂ ⊢
unfold ModRel at h₁ h₂ ⊢
rw [Nat.add_mod, h₁, h₂, ← Nat.add_mod]
rfl)
-- Quotient.sound: related elements are equal
theorem mk_eq_of_rel {n : Nat} (a b : Nat) (h : ModRel n a b) :
mk n a = mk n b :=
Quotient.sound h
-- Quotient induction principle
theorem ind_on {n : Nat} {P : ZMod n → Prop} (q : ZMod n)
(h : ∀ a, P (mk n a)) : P q :=
Quotient.ind h q
end ZMod
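-- Quotient.exact, the fourth primitive listed above, goes the other way:
-- equality of classes recovers the underlying relation (a small sketch
-- against the ModSetoid instance defined earlier)
theorem rel_of_mk_eq {n : Nat} (a b : Nat)
    (h : Quotient.mk (ModSetoid n) a = Quotient.mk (ModSetoid n) b) :
    ModRel n a b :=
  Quotient.exact h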
Combined Examples
The following examples combine dependent functions, indexed families, and proof terms. Each demonstrates how types can enforce invariants that would be invisible to simpler type systems:
-- Dependent pairs (Sigma types)
def DependentPair := Σ n : Nat, Fin n
def examplePair : DependentPair := ⟨3, 2⟩
-- Type families and dependent pattern matching
def typeFamily : Nat → Type
| 0 => Unit
| 1 => Bool
| _ => Nat
def familyValue : (n : Nat) → typeFamily n
| 0 => ()
| 1 => true
| n@(_ + 2) => n * 2
-- Structural recursion: termination is proven automatically
def factorial : Nat → Nat
| 0 => 1
| n + 1 => (n + 1) * factorial n
-- Recursion that steps down by two, still accepted automatically
def div2 : Nat → Nat
| 0 => 0
| 1 => 0
| n + 2 => 1 + div2 n
example : factorial 5 = 120 := rfl
example : div2 10 = 5 := rfl
The machinery presented here forms the foundation of everything that follows. Dependent types are why Lean can serve simultaneously as a programming language and a proof assistant. When you write a type signature like Vector α n → Vector α (n + 1), you are making a mathematical claim that the compiler will verify. Specifications that the machine enforces, invariants that cannot be violated, programs that are correct by construction.
Type-Indexed State Machines
State machines appear everywhere in software: network protocols, UI workflows, resource management, authentication flows. The traditional approach represents state as a runtime value and scatters checks throughout the code. “Is the connection open? Is the user logged in? Has the transaction started?” Each check is a potential bug: forget one and you have undefined behavior, check the wrong condition and you have a security hole.
Type-indexed state machines take a different approach. Instead of tracking state at runtime and checking it manually, we encode state in the type itself. The type checker then verifies that operations happen in the correct order. Invalid sequences become type errors, caught at compile time rather than runtime.
Consider a vending machine. The naive implementation tracks balance as a runtime value, checking at each operation whether funds suffice. Bugs lurk: what if someone calls vend before inserting coins? What if returnChange is called twice? These are not type errors in conventional languages. They are runtime failures waiting to happen.
inductive Product where
| HoneyComb
| SalmonJerky
| BerryMix
| GrubBar
| AcornCrunch
deriving Repr, DecidableEq
def Product.price : Product → Nat
| .HoneyComb => 150
| .SalmonJerky => 200
| .BerryMix => 100
| .GrubBar => 125
| .AcornCrunch => 175
def Product.name : Product → String
| .HoneyComb => "Honey Comb"
| .SalmonJerky => "Salmon Jerky"
| .BerryMix => "Berry Mix"
| .GrubBar => "Grub Bar"
| .AcornCrunch => "Acorn Crunch"
The Machine type is indexed by cents inserted. This index exists only in the type system. At runtime, Machine 0 and Machine 200 are identical unit values with no data. The number is a phantom type parameter that the compiler tracks but that costs nothing at runtime.
structure Machine (cents : Nat) where mk ::
def insertCoin (coin : Nat) {n : Nat} (_m : Machine n) : Machine (n + coin) := ⟨⟩
def insertDollar {n : Nat} (m : Machine n) : Machine (n + 100) := insertCoin 100 m
def vend (p : Product) {n : Nat} (_m : Machine n) (_h : n ≥ p.price) :
Product × Machine (n - p.price) := (p, ⟨⟩)
def returnChange {n : Nat} (_m : Machine n) : Nat × Machine 0 := (n, ⟨⟩)
def empty : Machine 0 := ⟨⟩
Study the type signatures carefully. insertCoin takes a Machine n and returns a Machine (n + coin). The balance increases by exactly the inserted amount. vend requires a proof $n \geq p.price$ and returns a Machine (n - p.price). You cannot call vend without providing this proof, and the compiler will reject any attempt to vend with insufficient funds. returnChange resets to Machine 0 regardless of the input balance, modeling the fact that all remaining money is returned.
The key insight is that each operation transforms the type index in a way that reflects its effect on the state. The compiler tracks these transformations and ensures they compose correctly. If you try to write code that vends without inserting money, the type checker will demand a proof of $0 \geq 100$ (or whatever the price is), which is unprovable because it is false.
def exampleTransaction : Product × Nat := Id.run do
let m := empty
let m := insertDollar m
let m := insertDollar m
let (snack, m) := vend .BerryMix m (by native_decide)
let (change, _) := returnChange m
return (snack, change)
The example shows a complete transaction. We start with an empty machine, insert two dollars (200 cents), vend a berry mix for 100 cents, and return the remaining 100 cents as change. At each step, the type system knows exactly how much money is in the machine. The by native_decide proof discharge works because $200 \geq 100$ is decidably true.
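For contrast, here is a sketch of the kind of call the type checker refuses. The definition stays commented out because the proof obligation it would generate is unprovable:
-- Rejected: vending from an empty machine would need a proof that 0 ≥ 150
-- def badTransaction := vend .HoneyComb empty (by decide) -- decide fails: 150 ≤ 0 is false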
This pattern scales to real systems. A file handle can be indexed by whether it is open or closed: read requires Handle Open and returns Handle Open, while close takes Handle Open and returns Handle Closed. Calling read on a closed handle becomes a type error. A network socket can track connection state: you cannot send on an unconnected socket because the types forbid it.
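A sketch of the file-handle version of this idea. The names (HandleState, Handle, openFile, readLine, closeFile) are hypothetical, chosen to mirror Machine above rather than taken from any real IO API:
-- The state lives only in the type index, exactly as with Machine
inductive HandleState where
  | opened | closed
structure Handle (s : HandleState) where mk ::
def openFile (_path : String) : Handle .opened := ⟨⟩
def readLine (_h : Handle .opened) : String × Handle .opened := ("line", ⟨⟩)
def closeFile (_h : Handle .opened) : Handle .closed := ⟨⟩
-- Reading after closing is a type error: no Handle .opened value remains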
Authentication systems benefit particularly. A session token can be indexed by authentication level: Session Guest, Session User, Session Admin. Functions that require admin privileges take Session Admin and the compiler ensures you cannot access admin functionality without proper authentication. Privilege escalation bugs become impossible because the type system enforces the security policy.
The tradeoff is complexity. Type-indexed state machines require careful API design and more sophisticated type signatures. The proof obligations can become burdensome for complex protocols. But for systems where correctness matters (financial transactions, security boundaries, safety-critical code), the guarantee that invalid states are unrepresentable is worth the investment.
Constraint Satisfaction: N-Queens
The N-Queens puzzle asks: place N queens on an $N \times N$ chessboard so that no two attack each other. Queens attack along rows, columns, and diagonals. The naive approach generates placements and filters invalid ones. The dependent type approach makes invalid placements unrepresentable.
A placement is a list of column positions, one per row. Two queens attack if they share a column or diagonal:
abbrev Placement := List Nat
def attacks (r1 c1 r2 c2 : Nat) : Bool :=
c1 == c2 || (if r1 ≥ r2 then r1 - r2 else r2 - r1) == (if c1 ≥ c2 then c1 - c2 else c2 - c1)
def isSafeAux (newRow col : Nat) : Placement → Nat → Bool
| [], _ => true
| qc :: rest, row => !attacks newRow col row qc && isSafeAux newRow col rest (row + 1)
def isSafe (col : Nat) (p : Placement) : Bool := isSafeAux p.length col p 0
A valid placement has the right length, all columns in bounds, and no attacking pairs:
def Valid (n : Nat) (p : Placement) : Prop :=
p.length = n ∧ p.all (· < n) ∧
∀ i j, i < j → j < p.length → !attacks i p[i]! j p[j]!
structure Board (n : Nat) where
placement : Placement
valid : Valid n placement
The Board n type bundles a placement with its validity proof. You cannot construct a Board 8 with queens that attack each other; the proof obligation cannot be discharged. The constraint is not checked at runtime but enforced at compile time.
theorem board_length {n : Nat} (b : Board n) : b.placement.length = n := b.valid.1
The theorem board_length extracts the length invariant from a valid board. The proof is trivial projection because the invariant is baked into the type. This is the dependent types payoff: properties that would require defensive runtime checks become facts the type system guarantees.
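As a small sketch of that payoff (the helper firstColumn is ours): code consuming a Board can discharge its bounds obligation from the stored invariant instead of re-checking at runtime.
-- The index is safe because the board's proof pins the placement length
def firstColumn {n : Nat} (b : Board (n + 1)) : Nat :=
  b.placement[0]'(by rw [board_length b]; omega)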
Most software is written fast, tested hopefully, and debugged frantically. Dependent types offer a different mode: slower to write, harder to learn, guarantees that survive contact with production. Whether the tradeoff makes sense depends on how much a bug costs. For most code, the answer is “not much.” For some code, the answer is “careers” or “lives.” Know which kind you are writing.
From Theory to Practice
You now understand the type-theoretic machinery. The next article turns to strategy: how to approach proofs systematically, read goal states, choose tactics, and develop the intuition for what technique applies where. Less “what does this mean” and more “how do I make this red squiggle go away.”
Proof Strategy
The previous articles taught you individual tactics. Now we learn how to think. A proof is not a random sequence of tactics that happens to work. It is a structured argument, and understanding that structure makes the difference between flailing and fluency. The gap between knowing the tactics and knowing how to prove things is the gap between knowing the rules of chess and knowing how to not lose immediately.
The Goal State
Every proof begins with a goal and ends with no goals. The goal state is your map. Learning to read it fluently is the most important skill in tactic-based proving.
case succ
n : Nat
ih : P n
⊢ P (n + 1)
This goal state tells you everything: you are in the succ case of an induction, you have a natural number n, you have an induction hypothesis ih stating that $P(n)$ holds, and you must prove $P(n + 1)$. The turnstile $\vdash$ separates what you have from what you need.
When a proof has multiple goals, they appear stacked. The first goal is your current focus. Tactics typically operate on the first goal, though combinators like all_goals and any_goals can target multiple goals simultaneously.
Goal State Evolution
Here is an induction proof showing how the goal state evolves at each step:
-- A typical induction proof showing goal state evolution
theorem double_sum (n : Nat) : 2 * n = n + n := by
induction n with
| zero =>
-- Goal state: ⊢ 2 * 0 = 0 + 0
rfl
| succ n ih =>
-- Goal state here:
-- n : Nat
-- ih : 2 * n = n + n
-- ⊢ 2 * (n + 1) = (n + 1) + (n + 1)
omega
The diagram below visualizes how intro transforms the goal state step by step. Each box shows the context (hypotheses above the line) and the goal (below the line). Watch how intro h moves P from the goal into the context:
Categories of Tactics
Tactics fall into natural categories based on what they do to the goal state. Understanding these categories helps you choose the right tool.
Introduction tactics move structure from the goal into the context. When your goal is $P \to Q$, the tactic intro h assumes $P$ (calling it h) and changes the goal to $Q$. When your goal is $\forall x, P(x)$, the tactic intro x introduces a fresh $x$ and changes the goal to $P(x)$. Introduction tactics make progress by moving assumptions into the context, leaving a smaller goal to prove.
-- Introduction tactics: moving from goal to context
-- intro moves implications into the context
theorem intro_example (P Q : Prop) : P → Q → P := by
intro hp -- assumes P, calling it hp
intro _ -- assumes Q (unused)
exact hp -- goal is now P, which we have
-- intro works with forall too
theorem forall_intro (P : Nat → Prop) (h : ∀ n, P n) : ∀ m, P m := by
intro m -- introduces arbitrary m
exact h m -- apply universal hypothesis
Elimination tactics use structure from the context to transform the goal. When you have h : P ∧ Q and need $P$, the tactic exact h.1 extracts the left component. When you have h : P ∨ Q, the tactic cases h splits into two goals, one assuming $P$ and one assuming $Q$. Elimination tactics make progress by using what you have.
-- Elimination tactics: using hypotheses to transform goals
-- cases eliminates disjunctions
theorem cases_example (P Q R : Prop) (h : P ∨ Q) (hp : P → R) (hq : Q → R) : R := by
cases h with
| inl p => exact hp p -- left case: we have P
| inr q => exact hq q -- right case: we have Q
-- And.left/right eliminate conjunctions
theorem and_elim (P Q : Prop) (h : P ∧ Q) : P := h.1
-- exists elimination via obtain
theorem exists_elim (P : Nat → Prop) (h : ∃ n, P n ∧ n > 0) : ∃ m, P m := by
obtain ⟨n, hn, _⟩ := h
exact ⟨n, hn⟩
Rewriting tactics transform the goal using equalities. The tactic rw [h] replaces occurrences of the left side of h with the right side. The tactic simp applies many such rewrites automatically. Rewriting makes progress by simplifying toward something obviously true.
-- Rewriting tactics: transforming goals with equalities
theorem rewrite_example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
rw [h1] -- goal becomes b = c
rw [h2] -- goal becomes c = c
-- or: rw [h1, h2] in one step
theorem rewrite_reverse (a b : Nat) (h : a = b) : b = a := by
rw [← h] -- rewrite right-to-left using ←
theorem simp_example (xs : List Nat) : (xs ++ []).length = xs.length := by
simp -- applies simp lemmas automatically
Automation tactics search for proofs. The tactic simp tries simplification lemmas. The tactic omega solves linear arithmetic. The tactic aesop performs general proof search. Automation makes progress by doing work you would rather not do by hand.
Structural tactics manipulate the proof state without making logical progress. The tactic swap reorders goals. The tactic rename changes hypothesis names. The tactic clear removes unused hypotheses. These tactics keep your proof organized.
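A small sketch of that housekeeping in action; clear and swap change nothing logically, they only tidy the state:
-- clear drops an unused hypothesis, swap reorders the open goals
theorem structural_example (P Q : Prop) (hp : P) (hq : Q) (unused : P) : P ∧ Q := by
  clear unused -- remove a hypothesis we never need
  constructor
  swap -- put the Q goal first
  · exact hq
  · exact hp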
Reading the Goal
Before applying any tactic, ask: what is the shape of my goal? The outermost connective determines your next move.
Goals that require building structure call for introduction tactics. If your goal is an implication $P \to Q$, use intro to assume $P$ and reduce the goal to $Q$. Universal statements $\forall x, P(x)$ work the same way: intro x gives you an arbitrary $x$ and asks you to prove $P(x)$. For conjunctions $P \land Q$, use constructor to split into two subgoals. For disjunctions $P \lor Q$, you must commit: left obligates you to prove $P$, while right obligates you to prove $Q$. Existentials $\exists x, P(x)$ require a witness: use t provides the term $t$ and leaves you to prove $P(t)$.
Goals that are equations or basic facts call for different tactics. For equality $a = b$, try rfl if the terms are definitionally equal, simp for simplification, rw with known equalities, or ring for algebraic identities. Negation $\neg P$ is secretly an implication: since $\neg P$ means $P \to \bot$, you use intro h to assume $P$ and then derive a contradiction. If your goal is $\bot$ itself, you need to find conflicting hypotheses.
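A compact sketch exercising several of these shapes in a single proof:
-- ∧ via constructor, ∨ via left, ∃ via an explicit witness
theorem goal_shapes (P Q : Prop) (hp : P) : (P ∨ Q) ∧ ∃ n : Nat, n > 2 := by
  constructor -- split the conjunction into two goals
  · left -- commit to the left disjunct
    exact hp
  · exact ⟨3, by omega⟩ -- witness 3, then prove 3 > 2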
Reading the Context
Your context contains hypotheses. Each one is a tool waiting to be used. The shape of a hypothesis determines what you can do with it.
Hypotheses that provide conditional information let you make progress when you can satisfy their conditions. An implication $h : P \to Q$ gives you $Q$ if you can prove $P$. When your goal is $Q$, use apply h to reduce it to proving $P$. A universal $h : \forall x, P(x)$ can be instantiated at any term: specialize h t replaces $h$ with $P(t)$, or have ht := h t keeps the original.
Hypotheses that package multiple facts can be taken apart. A conjunction $h : P \land Q$ gives you both pieces: access them with h.1 and h.2, or destructure with obtain ⟨hp, hq⟩ := h. An existential $h : \exists x, P(x)$ packages a witness and a proof: obtain ⟨x, hx⟩ := h extracts both. A disjunction $h : P \lor Q$ requires case analysis since you do not know which side holds: cases h splits your proof into two branches.
An equality $h : a = b$ lets you substitute. Use rw [h] to replace $a$ with $b$ in your goal, or rw [← h] to go the other direction.
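And the mirror image for hypotheses, a sketch that unpacks an existential and instantiates a universal:
-- obtain unpacks ∃; a ∀ hypothesis is applied like a function
theorem context_shapes (P : Nat → Prop) (h : ∀ n, P n → P (n + 1))
    (hex : ∃ n, P n) : ∃ n, P (n + 1) := by
  obtain ⟨n, hn⟩ := hex -- extract the witness and its proof
  exact ⟨n, h n hn⟩ -- instantiate h at n and apply it to hn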
Proof Patterns
Certain proof structures recur constantly. Recognizing them saves time.
Direct proof: Introduce assumptions, manipulate, conclude. Most proofs follow this pattern.
-- Direct proof: apply implication and provide hypothesis
theorem direct (P Q : Prop) (h : P → Q) (hp : P) : Q := by
apply h
exact hp
Proof by cases: When you have a disjunction or an inductive type, split and prove each case.
-- Case analysis: split into cases and handle each
theorem by_cases_template (n : Nat) : n = 0 ∨ n ≥ 1 := by
cases n with
| zero => left; rfl
| succ m => right; simp
Proof by induction: For properties of recursive types, prove the base case and the inductive step.
-- Induction: prove base case, then prove successor using IH
theorem by_induction (n : Nat) : 0 + n = n := by
induction n with
| zero => rfl
| succ n ih => simp [ih]
Proof by contradiction: Assume the negation and derive $\bot$.
-- Contradiction: assume negation, derive False
theorem by_contradiction (P : Prop) (h : ¬¬P) : P := by
by_contra hnp
exact h hnp
Proof by contraposition: To prove $P \to Q$, prove $\neg Q \to \neg P$ instead.
-- Contraposition: prove ¬Q → ¬P instead of P → Q
theorem by_contraposition (P Q : Prop) (h : ¬Q → ¬P) : P → Q := by
intro hp
by_contra hnq
exact h hnq hp
Backward and Forward Reasoning
Backward reasoning works from goal toward hypotheses. Forward reasoning builds from hypotheses toward the goal:
-- Backward reasoning: working from goal toward hypotheses
-- apply works backward from the goal
theorem backward_example (P Q R : Prop) (hpq : P → Q) (hqr : Q → R) (hp : P) : R := by
apply hqr -- goal becomes Q (we need to prove what hqr needs)
apply hpq -- goal becomes P
exact hp -- we have P
-- have introduces intermediate lemmas
theorem have_example (a b : Nat) (ha : a > 5) (hb : b < 10) : a + b > 5 := by
have h1 : a ≥ 6 := ha
have h2 : b ≥ 0 := Nat.zero_le b
omega
The diagram below shows backward reasoning in action. We start with goal R and work backwards through the implications. Each apply transforms the goal into what we need to establish the premise:
-- Forward reasoning: building from hypotheses toward goal
-- calc chains equational reasoning
theorem calc_example (a b c : Nat) (h1 : a = b + 1) (h2 : b = c + 1) : a = c + 2 := by
calc a = b + 1 := h1
_ = (c + 1) + 1 := by rw [h2]
_ = c + 2 := by ring
-- obtain destructs existentials and conjunctions
theorem obtain_example (h : ∃ n : Nat, n > 0 ∧ n < 10) : ∃ m, m < 10 := by
obtain ⟨n, _, hlt⟩ := h
exact ⟨n, hlt⟩
Induction Patterns
Induction is the workhorse for recursive types:
-- Induction patterns
-- Simple structural induction on Nat
theorem nat_induction (P : Nat → Prop) (base : P 0) (step : ∀ n, P n → P (n + 1))
: ∀ n, P n := by
intro n
induction n with
| zero => exact base
| succ n ih => exact step n ih
-- Strong induction when you need smaller cases
theorem strong_induction (n : Nat) (h : n > 0) : n ≥ 1 := by
omega -- trivial here, but pattern matters
-- Induction on lists
theorem list_induction (xs : List Nat) : xs.reverse.reverse = xs := by
induction xs with
| nil => rfl
| cons x xs ih => simp [ih]
Case Splitting
When the path forward depends on which case holds:
-- Case splitting strategies
-- split_ifs handles if-then-else
def abs (n : Int) : Int := if n < 0 then -n else n
theorem abs_nonneg (n : Int) : abs n ≥ 0 := by
unfold abs
split_ifs with h
· omega -- case: n < 0, so -n ≥ 0
· omega -- case: n ≥ 0
-- by_cases for arbitrary propositions
theorem by_cases_example (P : Prop) [Decidable P] : P ∨ ¬P := by
by_cases h : P
· left; exact h
· right; exact h
-- interval_cases (or omega) for bounded natural numbers; omega closes this one directly
theorem small_cases (n : Nat) (h : n < 3) : n = 0 ∨ n = 1 ∨ n = 2 := by
omega
Proof by Contradiction
When direct proof fails, assume the negation and derive absurdity:
-- Proof by contradiction
theorem contradiction_example (P : Prop) (h : P) (hn : ¬P) : False := by
exact hn h -- ¬P is P → False
theorem by_contra_example (n : Nat) (h : ¬(n = 0)) : n > 0 := by
by_contra h'
push_neg at h'
omega
Choosing Automation
Different automation tactics excel at different domains:
-- Choosing the right automation
-- omega for linear arithmetic
theorem omega_example (a b : Nat) (h : a + b = 10) (h2 : a ≤ 3) : b ≥ 7 := by
omega
-- ring for polynomial identities
theorem ring_example (x y : Int) : (x + y)^2 = x^2 + 2*x*y + y^2 := by
ring
-- simp for definitional simplification
theorem simp_list (xs ys : List Nat) : (xs ++ ys).length = xs.length + ys.length := by
simp
-- aesop for general proof search
theorem aesop_example (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by
aesop
When You Get Stuck
Every proof hits obstacles. Here is how to get unstuck.
Simplify first. Try simp or simp only [relevant_lemmas]. Often the goal simplifies to something obvious.
Check your hypotheses. Do you have what you need? Use have to derive intermediate facts. Use obtain to destructure complex hypotheses.
Try automation. For arithmetic, try omega or linarith. For algebraic identities, try ring or field_simp. For general goals, try aesop or decide.
Work backwards. What would make your goal obviously true? If you need $P \land Q$, you need to prove both $P$ and $Q$. What tactics produce those subgoals?
Work forwards. What can you derive from your hypotheses? If you have $h : P \to Q$ and $hp : P$, you can derive $Q$.
Split the problem. Use have to state and prove intermediate lemmas. Breaking a proof into steps often reveals the path.
Read the error. Lean’s error messages are verbose but precise. “Type mismatch” tells you what was expected and what you provided. “Unknown identifier” means a name is not in scope. “Unsolved goals” means you are not done.
Use the library. Mathlib contains thousands of lemmas. Use exact? to search for lemmas that close your goal. Use apply? to search for lemmas whose conclusion matches your goal.
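To make the advice concrete, here is a small sketch (the statement is illustrative, not from the series' source): name the fact you need with have, then hand the rest to a decision procedure.
-- Getting unstuck: derive an intermediate fact, then let automation finish
example (a b : Nat) (h : a + b = 10) (ha : a ≤ 3) : 2 * b ≥ 14 := by
  have hb : b ≥ 7 := by omega  -- "split the problem": state the fact you need
  omega                        -- "try automation": linear arithmetic closes the goal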
Tactic Decision Guide
When staring at a goal, ask: what is its outermost structure? This table maps goal shapes to tactics.
By Goal Shape
| Goal looks like… | First tactic to try | What it does |
|---|---|---|
| P → Q | intro h | Assume P, prove Q |
| ∀ x, P x | intro x | Introduce arbitrary x, prove P x |
| P ∧ Q | constructor | Split into two goals: prove P, prove Q |
| P ∨ Q | left or right | Commit to proving one side |
| ∃ x, P x | use t | Provide witness t, prove P t |
| ¬P (i.e., P → False) | intro h | Assume P, derive contradiction |
| a = b (definitionally equal) | rfl | Reflexivity closes it |
| a = b (needs rewriting) | simp or rw [h] | Simplify or rewrite using equalities |
| a = b (algebraic) | ring | Solves polynomial identities |
| a < b, a ≤ b (linear) | omega or linarith | Decision procedures for linear arithmetic |
| True | trivial | Trivially true |
| False | Look for contradiction | Need conflicting hypotheses |
| Decidable proposition | decide | Compute the answer |
By Hypothesis Shape
| Hypothesis looks like… | How to use it | What it does |
|---|---|---|
| h : P → Q | apply h (goal is Q) | Changes goal to P |
| h : P → Q | have := h hp | Get Q if you have hp : P |
| h : ∀ x, P x | specialize h t | Instantiate at specific t |
| h : P ∧ Q | obtain ⟨hp, hq⟩ := h | Extract both components |
| h : P ∧ Q | h.1, h.2 | Access components directly |
| h : P ∨ Q | cases h | Split into two cases |
| h : ∃ x, P x | obtain ⟨x, hx⟩ := h | Extract witness and proof |
| h : a = b | rw [h] | Replace a with b in goal |
| h : a = b | rw [← h] | Replace b with a in goal |
| h : False | contradiction | Closes any goal |
| h : a ≠ a or conflicting facts | contradiction | Derives False automatically |
Common Proof Templates
Implication: To prove P → Q:
intro h -- assume P, call it h
... -- work toward Q
exact ... -- provide Q
Universal: To prove ∀ x, P x:
intro x -- let x be arbitrary
... -- prove P x
Conjunction: To prove P ∧ Q:
constructor -- creates two goals
· ... -- prove P
· ... -- prove Q
Existential: To prove ∃ x, P x:
use t -- provide witness t
... -- prove P t
Case split: When you have h : P ∨ Q:
cases h with
| inl hp => ... -- case where P holds
| inr hq => ... -- case where Q holds
Induction: To prove ∀ n, P n by induction:
intro n
induction n with
| zero => ... -- base case: prove P 0
| succ n ih => ... -- inductive step: ih is P n, prove P (n+1)
Contradiction: To prove P by contradiction:
by_contra h -- assume ¬P
... -- derive False
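As a worked sketch (the statement is illustrative, and it assumes Mathlib's use tactic is in scope, as elsewhere in this article), the proof below strings several of these templates together.
-- Combining templates: ∀-intro, →-intro, ∧-split, and an ∃-witness
example : ∀ n : Nat, n > 0 → n ≥ 1 ∧ ∃ m, m + 1 = n := by
  intro n hn     -- universal and implication: introduce n and the hypothesis
  constructor    -- conjunction: two goals
  · omega        -- n ≥ 1 follows from n > 0
  · use n - 1    -- existential: provide the witness n - 1
    omega        -- (n - 1) + 1 = n because n > 0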
Tactic Composition
Tactics compose in several ways. Sequencing separates tactics with newlines or semicolons, each operating on the result of the previous one. Focusing uses · to work on a single goal, with indentation grouping tactics under that focus. Combinators like <;> apply a tactic to all goals produced by the previous tactic, first | t1 | t2 tries tactics in order, and repeat t applies a tactic until it fails.
-- Tactic composition: sequencing, focusing, and combinators
-- Sequencing: separate tactics with newlines or semicolons
theorem seq_demo (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by
constructor
exact h.2
exact h.1
-- Focusing: use · to focus on a single goal
theorem focus_demo (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by
constructor
· exact h.2
· exact h.1
-- Combinators: <;> applies to all goals, first tries alternatives
theorem combinator_demo (P : Prop) (h : P) : P ∧ P ∧ P := by
constructor <;> (try constructor) <;> exact h
Next-Generation Automation
The tactics described so far require you to think. You read the goal, choose a strategy, apply tactics step by step. This is how mathematicians have always worked, and there is value in understanding your proof at every stage. But a new generation of tactics is changing the calculus of what is worth formalizing.
Higher-order tactics like aesop, grind, and SMT integration lift proof development from low-level term manipulation to structured, automated search over rich proof states. Instead of specifying every proof step, you specify goals, rule sets, or search parameters, and these tactics synthesize proof terms that Lean’s kernel then checks. The soundness guarantee remains absolute since the kernel verifies everything, but the human cost drops dramatically. This decoupling of “what should be proved” from “how to construct the term” is what makes large-scale formalization feasible.
aesop implements white-box best-first proof search, exploring a tree of proof states guided by user-configurable rules. Unlike black-box automation, aesop lets you understand and tune the search: rules are indexed via discrimination trees for rapid retrieval, and you can register domain-specific lemmas to teach it new tricks. grind draws inspiration from modern SMT solvers, maintaining a shared workspace where congruence closure, E-matching, and forward chaining cooperate on a goal. It excels when many interacting equalities and logical facts are present, automatically deriving consequences that would be tedious to script by hand. For goals requiring industrial-strength decision procedures, SMT tactics send suitable fragments to proof-producing solvers like cvc5, then reconstruct proofs inside Lean so the kernel can verify them. This lets Lean leverage decades of solver engineering while preserving the LCF-style trust model where only the small kernel must be trusted.
The strategic question is when to reach for automation versus working by hand. The temptation is to try grind on everything and move on when it works. This is efficient but opaque: you learn nothing, and when automation fails on a similar goal later, you have no insight into why. A better approach is to use automation to explore, then understand what it found. Goals that would take an hour of tedious case analysis now take seconds. This frees you to tackle harder problems. But remember: when grind closes a goal, it has found a valid proof term. It has not gained insight. That remains your job.
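As one small, hedged illustration (the lemma is illustrative, and it assumes a toolchain where grind is available, as in the Classic Proofs article later in this series): congruence closure lets grind chain equalities through function applications without any explicit rewriting.
import Mathlib.Tactic
-- grind derives f a = f b from a = b by congruence, then finishes with h2
example (f : Nat → Nat) (a b : Nat) (h1 : a = b) (h2 : f b = 0) : f a = 0 := by
  grind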
The Tactics Reference
The following article is a reference. It documents every major tactic in Lean 4 and Mathlib, organized alphabetically. You do not need to memorize it. You need to know it exists, and you need to know how to find the tactic you need.
When you encounter a goal you do not know how to prove, return here. Ask: what is the shape of my goal? What is in my context? What pattern does this proof follow? The answer will point you to the right tactic, and the reference will tell you how to use it.
The strategies in this article apply beyond Lean. The structure of mathematical argument is universal. Direct proof, case analysis, induction, contradiction: these are the fundamental patterns of reason itself. Learning them in a proof assistant merely makes them explicit. You cannot handwave past a case you forgot to consider when the computer is watching.
Congruence and Subtyping
Every type system makes tradeoffs between precision and convenience. A function that takes Nat will accept any natural number, including zero, even when zero would cause a division error three stack frames later. A function that takes Int cannot directly accept a Nat without explicit conversion, even though every natural number is an integer. The constraints are either too loose or the syntax is too verbose. Pick your frustration.
Lean provides tools to fix both problems. Subtypes let you carve out precisely the values you want: positive numbers, non-empty lists, valid indices. The constraint travels with the value, enforced by the type system. Coercions let the compiler insert safe conversions automatically, so you can pass a Nat where an Int is expected without ceremony. These mechanisms together give you precise types with ergonomic syntax.
Types are sets with attitude. A Nat carries the natural numbers along with all their operations and laws. A subtype narrows this: the positive natural numbers are the naturals with an extra constraint, a proof obligation that travels with every value. This is refinement: taking a broad type and carving out the subset you actually need.
The other direction is coercion. When Lean expects an Int but you give it a Nat, something must convert between them. Explicit casts are tedious. Coercions make the compiler do the work, inserting conversions automatically where safe. The result is code that looks like it mixes types freely but maintains type safety underneath.
Congruence handles the third concern: propagating equality. If a = b, then f a = f b. This seems obvious, but the compiler needs to be told. The congr tactic applies this principle systematically, breaking equality goals into their components.
Subtypes
A subtype refines an existing type with a predicate. Values of a subtype carry both the data and a proof that the predicate holds.
def Positive := { n : Nat // n > 0 }
def five : Positive := ⟨5, by decide⟩
#eval five.val -- 5
def NonEmpty (α : Type) := { xs : List α // xs ≠ [] }
def singletonList : NonEmpty Nat := ⟨[42], by decide⟩
Working with Subtypes
Functions on subtypes can access the underlying value and use the proof to ensure operations are safe.
def doublePositive (p : Positive) : Positive :=
⟨p.val * 2, Nat.mul_pos p.property (by decide)⟩
#eval (doublePositive five).val -- 10
def addPositive (a b : Positive) : Positive :=
⟨a.val + b.val, Nat.add_pos_left a.property b.val⟩
def safeHead {α : Type} (xs : NonEmpty α) : α :=
match h : xs.val with
| x :: _ => x
| [] => absurd h xs.property
Refinement Types
Subtypes let you express precise invariants. Common patterns include bounded numbers, non-zero values, and values satisfying specific properties.
def Even := { n : Nat // n % 2 = 0 }
def Odd := { n : Nat // n % 2 = 1 }
def zero' : Even := ⟨0, rfl⟩
def two' : Even := ⟨2, rfl⟩
def one' : Odd := ⟨1, rfl⟩
def three' : Odd := ⟨3, rfl⟩
def BoundedNat (lo hi : Nat) := { n : Nat // lo ≤ n ∧ n < hi }
def inRange : BoundedNat 0 10 := ⟨5, by omega⟩
Basic Coercions
Coercions allow automatic type conversion. When a value of type A is expected but you provide type B, Lean looks for a coercion from B to A.
instance : Coe Positive Nat where
coe p := p.val
def useAsNat (p : Positive) : Nat :=
p + 10
#eval useAsNat five -- 15
instance {α : Type} : Coe (NonEmpty α) (List α) where
coe xs := xs.val
def listLength (xs : NonEmpty Nat) : Nat :=
xs.val.length
#eval listLength singletonList -- 1
Coercion Chains
Coercions can chain together. If there is a coercion from A to B and from B to C, Lean can automatically convert from A to C.
structure Meters where
val : Float
deriving Repr
structure Kilometers where
val : Float
deriving Repr
instance : Coe Meters Float where
coe m := m.val
instance : Coe Kilometers Meters where
coe km := ⟨km.val * 1000⟩
def addDistances (a : Meters) (b : Kilometers) : Meters :=
⟨a.val + (b : Meters).val⟩
#eval addDistances ⟨500⟩ ⟨1.5⟩ -- Meters.mk 2000.0
Function Coercions
CoeFun allows values to be used as functions. This is useful for callable objects and function-like structures.
instance : CoeFun Positive (fun _ => Nat → Nat) where
coe p := fun n => p.val + n
#eval five 10 -- 15
structure Adder where
amount : Nat
instance : CoeFun Adder (fun _ => Nat → Nat) where
coe a := fun n => n + a.amount
def addFive : Adder := ⟨5⟩
#eval addFive 10 -- 15
Sort Coercions
CoeSort coerces values to sorts (types), letting a structure stand in where a type is expected. Its function-valued counterpart, CoeFun, appears again here, making a predicate structure directly callable.
structure Predicate' (α : Type) where
test : α → Bool
instance {α : Type} : CoeFun (Predicate' α) (fun _ => α → Bool) where
coe p := p.test
def isEven' : Predicate' Nat := ⟨fun n => n % 2 == 0⟩
#eval isEven' 4 -- true
#eval isEven' 5 -- false
Congruence
The congr tactic applies congruence reasoning: if you need to prove f a = f b and you know a = b, congruence can close the goal.
example (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
congr
example (f : Nat → Nat) (a b : Nat) (h : a = b) : f a = f b := by
congr
example (a b c d : Nat) (h1 : a = b) (h2 : c = d) : a + c = b + d := by
congr <;> assumption
Congruence with Multiple Arguments
Congruence works with functions of multiple arguments, generating subgoals for each argument that differs.
example (f : Nat → Nat → Nat) (a b c d : Nat)
(h1 : a = c) (h2 : b = d) : f a b = f c d := by
rw [h1, h2]
example (xs ys : List Nat) (h : xs = ys) : xs.length = ys.length := by
rw [h]
Substitution and Rewriting
The subst tactic substitutes equal terms, and rw rewrites using equalities. These are fundamental tactics for equality reasoning.
example (a b : Nat) (h : a = b) : a * a = b * b := by
subst h
rfl
example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
rw [h1, h2]
example (a b : Nat) (h : a = b) (f : Nat → Nat) : f a = f b := by
rw [h]
Type Conversion
Lean provides automatic coercion between numeric types and explicit conversion functions.
example (n : Nat) : Int := n
def natToInt (n : Nat) : Int := n
#eval natToInt 42 -- 42
def stringToNat? (s : String) : Option Nat :=
s.toNat?
#eval stringToNat? "123" -- some 123
#eval stringToNat? "abc" -- none
Decidable Propositions
A proposition is decidable if there is an algorithm to determine its truth. This enables using propositions in if-expressions.
def isPositive (n : Int) : Decidable (n > 0) :=
if h : n > 0 then isTrue h else isFalse h
def checkPositive (n : Int) : String :=
if n > 0 then "positive" else "not positive"
#eval checkPositive 5 -- "positive"
#eval checkPositive (-3) -- "not positive"
def decideEqual (a b : Nat) : Decidable (a = b) :=
if h : a = b then isTrue h else isFalse h
Type Class Inheritance
Type classes can extend other classes, creating inheritance hierarchies. A type implementing a subclass automatically implements its parent classes.
class Animal (α : Type) where
speak : α → String
class Dog (α : Type) extends Animal α where
fetch : α → String
structure Labrador where
name : String
instance : Animal Labrador where
speak lab := s!"{lab.name} says woof!"
instance : Dog Labrador where
speak lab := s!"{lab.name} says woof!"
fetch lab := s!"{lab.name} fetches the ball!"
def makeSpeak {α : Type} [Animal α] (a : α) : String :=
Animal.speak a
def rex : Labrador := ⟨"Rex"⟩
#eval makeSpeak rex -- "Rex says woof!"
#eval Dog.fetch rex -- "Rex fetches the ball!"
Structure Extension
Structures can extend other structures, inheriting their fields while adding new ones.
structure Shape where
name : String
structure Circle extends Shape where
radius : Float
structure Rectangle extends Shape where
width : Float
height : Float
def myCircle : Circle := { name := "unit circle", radius := 1.0 }
def myRect : Rectangle := { name := "square", width := 2.0, height := 2.0 }
#eval myCircle.name -- "unit circle"
#eval myCircle.radius -- 1.0
Nominal vs Structural Typing
Lean uses nominal typing: two types with identical structures are still distinct types. This prevents accidental mixing of values with different semantics. A UserId and a ProductId might both be integers underneath, but you cannot accidentally pass one where the other is expected. The bug where you deleted user 47 because product 47 was out of stock becomes a compile error. Nominal typing is the formal version of “label your variables.”
structure Meters' where
val : Float
structure Seconds where
val : Float
def distance : Meters' := ⟨100.0⟩
def time : Seconds := ⟨10.0⟩
abbrev Speed := Float
def calcSpeed (d : Meters') (t : Seconds) : Speed :=
d.val / t.val
#eval calcSpeed distance time -- 10.0
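Here is a hedged sketch of the UserId/ProductId scenario from the paragraph above; the names are illustrative, not part of the series' source.
structure UserId where
  id : Nat
structure ProductId where
  id : Nat
def deleteUser (u : UserId) : String := s!"deleting user {u.id}"
#eval deleteUser ⟨47⟩ -- "deleting user 47"
-- deleteUser (ProductId.mk 47) -- rejected at compile time: expected UserId, got ProductId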
Classic Results
The machinery is in place. You understand types, proofs, tactics, and the refinements that make specifications precise. Next we put it all together: classic mathematical proofs formalized in Lean. Euclid's lemma, the infinitude of primes, the irrationality of root two. All the greatest hits.
Classic Proofs
This article presents proofs you likely encountered in undergraduate mathematics, now written in Lean. Each example shows the traditional proof and its formalization side by side. The goal is not to teach you these theorems; you already know them. The goal is to build intuition for how mathematical reasoning translates into Lean code. When you see a proof by contradiction in English, what tactic does that become? When a textbook says “by strong induction,” what does Lean require? The side-by-side format lets you map familiar reasoning patterns onto unfamiliar syntax.
Euclid’s proof of the infinitude of primes has survived for over two thousand years. It requires no calculus, no abstract algebra, only the observation that $n! + 1$ shares no prime factors with $n!$. Yet formalizing this argument reveals hidden assumptions: that every number greater than one has a prime divisor, that primes are well-defined, that contradiction is a valid proof technique. The proofs here are not difficult by mathematical standards, but they exercise the full machinery of dependent types, tactics, and theorem proving. If you can formalize theorems that have survived two millennia of scrutiny, how hard can proving your web app correctly validates email addresses really be?
Infinitude of Primes
Traditional Proof
Theorem. There exist infinitely many prime numbers.
The proof proceeds in two parts. First, we establish that every integer $n \geq 2$ has a prime divisor. If $n$ is prime, it divides itself. Otherwise, $n$ has a proper divisor $m$ with $1 < m < n$. By strong induction, $m$ has a prime divisor $p$, and since $p \mid m$ and $m \mid n$, we have $p \mid n$.
Second, we show that for any $n$, there exists a prime $p > n$. Consider $N = n! + 1$. Since $n! \geq 1$, we have $N \geq 2$, so $N$ has a prime divisor $p$. We claim $p > n$. Suppose for contradiction that $p \leq n$. Then $p \mid n!$ since $n!$ contains all factors from 1 to $n$. Since $p \mid N$ and $p \mid n!$, we have $p \mid (N - n!) = 1$. But $p \geq 2$, so $p \nmid 1$, a contradiction. Therefore $p > n$. QED
Lean Formalization
The Lean proof mirrors this structure exactly. The theorem exists_prime_factor establishes the first part by case analysis and strong induction (via termination_by). The main theorem InfinitudeOfPrimes constructs $n! + 1$, extracts a prime divisor, then derives a contradiction using dvd_factorial and Nat.dvd_add_right.
import Mathlib.Data.Nat.Prime.Basic
import Mathlib.Data.Nat.Factorial.Basic
import Mathlib.Tactic
namespace ZeroToQED.Proofs
open Nat
theorem exists_prime_factor (n : ℕ) (hn : 2 ≤ n) : ∃ p, Nat.Prime p ∧ p ∣ n := by
by_cases hp : Nat.Prime n
· exact ⟨n, hp, dvd_refl n⟩
· obtain ⟨m, hm_dvd, hm_ne_one, hm_ne_n⟩ := exists_dvd_of_not_prime hn hp
have hm_lt : m < n := lt_of_le_of_ne (Nat.le_of_dvd (by omega) hm_dvd) hm_ne_n
have hm_ge : 2 ≤ m := by
rcases m with _ | _ | m <;> simp_all
obtain ⟨p, hp_prime, hp_dvd⟩ := exists_prime_factor m hm_ge
exact ⟨p, hp_prime, dvd_trans hp_dvd hm_dvd⟩
termination_by n
theorem factorial_pos (n : ℕ) : 0 < n ! := Nat.factorial_pos n
theorem dvd_factorial {k n : ℕ} (hk : 0 < k) (hkn : k ≤ n) : k ∣ n ! :=
Nat.dvd_factorial hk hkn
theorem InfinitudeOfPrimes : ∀ n, ∃ p > n, Nat.Prime p := by
intro n
have hN : 2 ≤ n ! + 1 := by
have hfact : 0 < n ! := factorial_pos n
omega
obtain ⟨p, hp_prime, hp_dvd⟩ := exists_prime_factor (n ! + 1) hN
refine ⟨p, ?_, hp_prime⟩
by_contra hle
push_neg at hle
have hp_pos : 0 < p := hp_prime.pos
have hdvd_fact : p ∣ n ! := dvd_factorial hp_pos hle
have hdvd_one : p ∣ 1 := (Nat.dvd_add_right hdvd_fact).mp hp_dvd
have hp_le_one : p ≤ 1 := Nat.le_of_dvd one_pos hdvd_one
have hp_ge_two : 2 ≤ p := hp_prime.two_le
omega
end ZeroToQED.Proofs
Alternative: Proof by Grind
The same theorem admits a much shorter proof using Lean’s grind tactic. The proof below defines its own IsPrime predicate and factorial to remain self-contained. Notice how each theorem body collapses to one or two lines, with grind handling the case analysis and arithmetic that required explicit by_contra, push_neg, and omega calls in the manual version.
import Mathlib.Tactic
namespace ZeroToQED.Proofs.Grind
/-- A prime is a number larger than 1 with no trivial divisors -/
def IsPrime (n : Nat) := 1 < n ∧ ∀ k, 1 < k → k < n → ¬ k ∣ n
/-- Every number larger than 1 has a prime factor -/
theorem exists_prime_factor :
∀ n, 1 < n → ∃ k, IsPrime k ∧ k ∣ n := by
intro n h1
by_cases hprime : IsPrime n
· grind [Nat.dvd_refl]
· obtain ⟨k, _⟩ : ∃ k, 1 < k ∧ k < n ∧ k ∣ n := by
simp_all [IsPrime]
obtain ⟨p, _, _⟩ := exists_prime_factor k (by grind)
grind [Nat.dvd_trans]
/-- The factorial, defined recursively -/
def factorial : Nat → Nat
| 0 => 1
| n+1 => (n + 1) * factorial n
/-- Factorial postfix notation -/
notation:10000 n "!" => factorial n
/-- The factorial is always positive -/
theorem factorial_pos : ∀ n, 0 < n ! := by
intro n; induction n <;> grind [factorial]
/-- The factorial is divided by its constituent factors -/
theorem dvd_factorial : ∀ n, ∀ k ≤ n, 0 < k → k ∣ n ! := by
intro n; induction n <;>
grind [Nat.dvd_mul_right, Nat.dvd_mul_left_of_dvd, factorial]
/-- There are infinitely many primes: for any n, there exists p > n that is prime -/
theorem InfinitudeOfPrimes : ∀ n, ∃ p > n, IsPrime p := by
intro n
have : 1 < n ! + 1 := by grind [factorial_pos]
obtain ⟨p, hp, _⟩ := exists_prime_factor (n ! + 1) this
suffices ¬p ≤ n by grind
intro (_ : p ≤ n)
have : 1 < p := hp.1
have : p ∣ n ! := dvd_factorial n p ‹p ≤ n› (by grind)
have := Nat.dvd_sub ‹p ∣ n ! + 1› ‹p ∣ n !›
grind [Nat.add_sub_cancel_left, Nat.dvd_one]
end ZeroToQED.Proofs.Grind
Both proofs establish the same theorem. The explicit version teaches you the proof structure; the grind version shows what automation can handle once you understand the underlying mathematics.
Irrationality of $\sqrt{2}$
Traditional Proof
Theorem. $\sqrt{2}$ is irrational.
The key lemma is: if $n^2$ is even, then $n$ is even. We prove the contrapositive. Suppose $n$ is odd, so $n = 2k + 1$ for some integer $k$. Then $n^2 = (2k+1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1$, which is odd. Therefore, if $n^2$ is even, $n$ cannot be odd, so $n$ must be even.
Now suppose $\sqrt{2} = p/q$ where $p, q$ are integers with $q \neq 0$ and $\gcd(p,q) = 1$. Then $2q^2 = p^2$, so $p^2$ is even, hence $p$ is even. Write $p = 2k$. Then $2q^2 = 4k^2$, so $q^2 = 2k^2$, meaning $q^2$ is even, hence $q$ is even. But then $\gcd(p,q) \geq 2$, contradicting our assumption. QED
Lean Formalization
The Lean code proves the parity lemmas explicitly. The theorem sq_odd_of_odd shows that squaring an odd number yields an odd number by expanding $(2k+1)^2$. The theorem even_of_sq_even proves the contrapositive: assuming $n$ is odd leads to $n^2$ being odd, which contradicts $n^2$ being even. The final irrationality result follows from Mathlib’s irrational_sqrt_two, which uses this same parity argument internally.
import Mathlib.NumberTheory.Real.Irrational
import Mathlib.Tactic
namespace ZeroToQED.Proofs
theorem sq_even_of_even {n : ℤ} (h : Even n) : Even (n ^ 2) := by
obtain ⟨k, hk⟩ := h
exact ⟨2 * k ^ 2, by rw [hk]; ring⟩
theorem sq_odd_of_odd {n : ℤ} (h : Odd n) : Odd (n ^ 2) := by
obtain ⟨k, hk⟩ := h
exact ⟨2 * k ^ 2 + 2 * k, by rw [hk]; ring⟩
theorem even_of_sq_even {n : ℤ} (h : Even (n ^ 2)) : Even n := by
by_contra hodd
rw [Int.not_even_iff_odd] at hodd
have hsq_odd : Odd (n ^ 2) := sq_odd_of_odd hodd
obtain ⟨k, hk⟩ := hsq_odd
obtain ⟨m, hm⟩ := h
omega
theorem sqrt2_irrational : Irrational (Real.sqrt 2) := irrational_sqrt_two
end ZeroToQED.Proofs
Euclid’s Lemma
Traditional Proof
Theorem (Euclid’s Lemma). If a prime $p$ divides a product $ab$, then $p \mid a$ or $p \mid b$.
Let $p$ be prime with $p \mid ab$. Since $p$ is prime, the only divisors of $p$ are 1 and $p$. Therefore the greatest common divisor $\gcd(p, a)$ is either 1 or $p$.
Case 1: If $\gcd(p, a) = p$, then $p \mid a$ and we are done.
Case 2: If $\gcd(p, a) = 1$, we show $p \mid b$. Consider $\gcd(pb, ab)$. Since $p \mid pb$ and $p \mid ab$, we have $p \mid \gcd(pb, ab)$. By the property $\gcd(pb, ab) = b \cdot \gcd(p, a) = b \cdot 1 = b$, we conclude $p \mid b$. QED
Lean Formalization
The Lean proof follows this GCD-based argument directly. It case-splits on whether $\gcd(p, a) = 1$ or $\gcd(p, a) > 1$. In the coprime case, it uses Nat.gcd_mul_right to establish that $\gcd(pb, ab) = b$, then shows $p$ divides this GCD. In the non-coprime case, since $p$ is prime, $\gcd(p, a) = p$, so $p \mid a$.
import Mathlib.Data.Nat.Prime.Basic
import Mathlib.Data.Nat.GCD.Basic
import Mathlib.Tactic
namespace ZeroToQED.Proofs
open Nat
theorem euclid_lemma {a b p : ℕ} (hp : Nat.Prime p) (h : p ∣ a * b) :
p ∣ a ∨ p ∣ b := by
rcases Nat.eq_or_lt_of_le (Nat.one_le_iff_ne_zero.mpr (Nat.gcd_pos_of_pos_left a hp.pos).ne')
with hcop | hncop
· right
have key : p ∣ Nat.gcd (p * b) (a * b) := Nat.dvd_gcd (dvd_mul_right p b) h
rwa [Nat.gcd_mul_right, hcop.symm, one_mul] at key
· left
have hdvd := Nat.gcd_dvd_left p a
rcases hp.eq_one_or_self_of_dvd _ hdvd with h1 | hp_eq
· omega
· have : p ∣ a := by rw [← hp_eq]; exact Nat.gcd_dvd_right p a
exact this
theorem prime_divides_product_iff {p a b : ℕ} (hp : Nat.Prime p) :
p ∣ a * b ↔ p ∣ a ∨ p ∣ b :=
⟨euclid_lemma hp, fun h => h.elim (dvd_mul_of_dvd_left · b) (dvd_mul_of_dvd_right · a)⟩
end ZeroToQED.Proofs
Binomial Theorem
Traditional Proof
Theorem (Binomial Theorem). For real numbers $x, y$ and natural number $n$: \[(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}\]
The proof proceeds by induction. The base case $n = 0$ gives $(x+y)^0 = 1 = \binom{0}{0}x^0y^0$. For the inductive step, we expand $(x+y)^{n+1} = (x+y)(x+y)^n$, distribute, and apply Pascal’s identity $\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}$ to combine terms.
As concrete examples: $(x+1)^2 = x^2 + 2x + 1$ and $(x+1)^3 = x^3 + 3x^2 + 3x + 1$. QED
Lean Formalization
Mathlib provides add_pow, which establishes the binomial theorem via the same inductive argument. Our binomial_theorem reformulates this in the standard notation. The specific cases binomial_two and binomial_three are verified by the ring tactic, which normalizes polynomial expressions.
import Mathlib.Tactic
namespace ZeroToQED.Proofs
theorem binomial_theorem (x y : ℝ) (n : ℕ) :
(x + y) ^ n = (Finset.range (n + 1)).sum fun k => ↑(n.choose k) * x ^ k * y ^ (n - k) := by
rw [add_pow]
apply Finset.sum_congr rfl
intros k _
ring
theorem binomial_two (x : ℝ) : (x + 1) ^ 2 = x ^ 2 + 2 * x + 1 := by ring
theorem binomial_three (x : ℝ) : (x + 1) ^ 3 = x ^ 3 + 3 * x ^ 2 + 3 * x + 1 := by ring
example : (2 : ℝ) ^ 5 = 32 := by norm_num
end ZeroToQED.Proofs
Fibonacci Numbers
Traditional Proof
Definition. The Fibonacci sequence: $F_0 = 0$, $F_1 = 1$, $F_{n+2} = F_{n+1} + F_n$. The sequence that appears everywhere: rabbit populations, sunflower spirals, financial markets, bad interview questions.
Theorem. $\sum_{k=0}^{n-1} F_k + 1 = F_{n+1}$
Base case ($n = 0$): The empty sum equals 0, and $0 + 1 = 1 = F_1$.
Inductive step: Assume $\sum_{k=0}^{n-1} F_k + 1 = F_{n+1}$. Then: \[\sum_{k=0}^{n} F_k + 1 = \left(\sum_{k=0}^{n-1} F_k + 1\right) + F_n = F_{n+1} + F_n = F_{n+2}\] which equals $F_{(n+1)+1}$. QED
Lean Formalization
The Lean proof follows the same structure. The definition fib uses pattern matching on 0, 1, and $n+2$. The theorem fib_sum proceeds by induction: the base case simplifies directly, and the inductive step uses Finset.sum_range_succ to split off the last term, applies the inductive hypothesis, then uses the recurrence relation.
import Mathlib.Tactic
namespace ZeroToQED.Proofs
def fib : ℕ → ℕ
| 0 => 0
| 1 => 1
| n + 2 => fib (n + 1) + fib n
theorem fib_add_two (n : ℕ) : fib (n + 2) = fib (n + 1) + fib n := rfl
theorem fib_pos {n : ℕ} (h : 0 < n) : 0 < fib n := by
cases n with
| zero => contradiction
| succ n =>
cases n with
| zero => decide
| succ m =>
rw [fib_add_two]
exact Nat.add_pos_left (fib_pos (Nat.zero_lt_succ _)) _
theorem fib_sum (n : ℕ) : (Finset.range n).sum fib + 1 = fib (n + 1) := by
induction n with
| zero => simp [fib]
| succ n ih =>
rw [Finset.sum_range_succ, add_assoc, add_comm (fib n) 1, ← add_assoc, ih]
rfl
end ZeroToQED.Proofs
Pigeonhole Principle
Traditional Proof
Theorem (Pigeonhole Principle). Let $f : A \to B$ be a function between finite sets with $|A| > |B|$. Then $f$ is not injective: there exist distinct $a_1, a_2 \in A$ with $f(a_1) = f(a_2)$.
Suppose for contradiction that $f$ is injective, meaning $f(a_1) = f(a_2)$ implies $a_1 = a_2$. An injective function from $A$ to $B$ implies $|A| \leq |B|$, since distinct elements of $A$ map to distinct elements of $B$. But we assumed $|A| > |B|$, a contradiction. Therefore $f$ is not injective, so there exist distinct $a_1 \neq a_2$ with $f(a_1) = f(a_2)$. QED
Corollary. In any group of $n > 365$ people, at least two share a birthday.
Lean Formalization
The Lean proof mirrors this argument precisely. It assumes by contradiction (by_contra hinj) that no two distinct elements collide. The push_neg tactic transforms this into: for all $a_1, a_2$, if $a_1 \neq a_2$ then $f(a_1) \neq f(a_2)$. This is exactly injectivity. We then apply Fintype.card_le_of_injective, which states that an injective function implies $|A| \leq |B|$, contradicting our hypothesis $|B| < |A|$.
import Mathlib.Data.Fintype.Card
import Mathlib.Tactic
namespace ZeroToQED.Proofs
theorem pigeonhole {α β : Type*} [Fintype α] [Fintype β]
(f : α → β) (h : Fintype.card β < Fintype.card α) :
∃ a₁ a₂ : α, a₁ ≠ a₂ ∧ f a₁ = f a₂ := by
by_contra hinj
push_neg at hinj
have inj : Function.Injective f := fun a₁ a₂ heq =>
Classical.byContradiction fun hne => hinj a₁ a₂ hne heq
exact Nat.not_lt.mpr (Fintype.card_le_of_injective f inj) h
theorem birthday_pigeonhole {n : ℕ} (hn : 365 < n) (birthday : Fin n → Fin 365) :
∃ i j : Fin n, i ≠ j ∧ birthday i = birthday j := by
have hcard : Fintype.card (Fin 365) < Fintype.card (Fin n) := by simp [hn]
exact pigeonhole birthday hcard
end ZeroToQED.Proofs
Divisibility
Traditional Proof
Definition. We write $a \mid b$, read “$a$ divides $b$”, if there exists $k$ such that $b = ak$.
Theorem. Divisibility satisfies:
- $a \mid a$ (reflexivity)
- $a \mid b \land b \mid c \Rightarrow a \mid c$ (transitivity)
- $a \mid b \land a \mid c \Rightarrow a \mid (b + c)$
- $a \mid b \Rightarrow a \mid bc$
Proof. (1) $a = a \cdot 1$, so take $k = 1$.
(2) If $b = ak$ and $c = bm$, then $c = (ak)m = a(km)$.
(3) If $b = ak$ and $c = am$, then $b + c = ak + am = a(k + m)$.
(4) If $b = ak$, then $bc = (ak)c = a(kc)$. QED
Lean Formalization
Each Lean proof constructs the witness $k$ explicitly. The obtain tactic extracts the witnesses from divisibility hypotheses, then we provide the new witness as an anonymous constructor ⟨_, _⟩. The equality proofs use rw to substitute and mul_assoc or mul_add to rearrange.
import Mathlib.Tactic
namespace ZeroToQED.Proofs
example : 3 ∣ 12 := ⟨4, rfl⟩
example : ¬5 ∣ 12 := by decide
theorem dvd_refl' (n : ℕ) : n ∣ n := ⟨1, (mul_one n).symm⟩
theorem dvd_trans' {a b c : ℕ} (hab : a ∣ b) (hbc : b ∣ c) : a ∣ c := by
obtain ⟨k, hk⟩ := hab
obtain ⟨m, hm⟩ := hbc
exact ⟨k * m, by rw [hm, hk, mul_assoc]⟩
theorem dvd_add' {a b c : ℕ} (hab : a ∣ b) (hac : a ∣ c) : a ∣ b + c := by
obtain ⟨k, hk⟩ := hab
obtain ⟨m, hm⟩ := hac
exact ⟨k + m, by rw [hk, hm, mul_add]⟩
theorem dvd_mul_right' (a b : ℕ) : a ∣ a * b := ⟨b, rfl⟩
theorem dvd_mul_left' (a b : ℕ) : b ∣ a * b := ⟨a, (mul_comm b a).symm⟩
end ZeroToQED.Proofs
Generalized Riemann Hypothesis
The proofs above are solved problems. But what about the unsolved ones? The Generalized Riemann Hypothesis asserts that all non-trivial zeros of Dirichlet L-functions have real part $\frac{1}{2}$. It has resisted proof since 1859. The statement is precise enough to formalize:
/-- The **Generalized Riemann Hypothesis** asserts that all the non-trivial zeros of the
Dirichlet L-function L(χ, s) of a primitive Dirichlet character χ have real part 1/2. -/
theorem generalized_riemann_hypothesis (q : ℕ) [NeZero q] (χ : DirichletCharacter ℂ q)
(hχ : χ.IsPrimitive) (s : ℂ) (hs : χ.LFunction s = 0)
(hs_nontrivial : s ∉ Int.cast '' trivialZeros χ) :
s.re = 1 / 2 :=
sorry
That sorry is worth a million dollars from the Clay Mathematics Institute, a Fields Medal, arguably a Nobel Prize in Physics (for its implications in quantum chaos), a Turing Award if you use a computer to help, and mass adoration from strangers on the internet. The reward structure for closing that sorry is, by any reasonable measure, excessive.
Google DeepMind maintains a repository of open mathematical conjectures formalized in Lean, including the Generalized Riemann Hypothesis. The existence of this repository says something profound: the frontier of human mathematical knowledge can now be expressed as a list of sorry statements waiting to be filled. When someone eventually proves or disproves these conjectures, the proof will compile.
Algebraic Structures
Mathematics organizes operations into hierarchies. A group is more than a set with an operation: it is a semigroup with identity and inverses, and a semigroup is a set with an associative operation. These hierarchies matter because theorems proved at one level apply to all structures below it. Prove something about groups, and it holds for integers, permutations, and matrix transformations alike.
Lean captures these hierarchies through type classes. Each algebraic structure becomes a type class, and instances register specific types as members. The type class system then automatically provides the right theorems and operations wherever they apply. The notation is convenient, but the real value is the machinery underneath: generic mathematical code that works across any conforming type.
Semigroups
A semigroup is the simplest algebraic structure: a type with a binary operation that is associative. Nothing more. No identity element, no inverses, just the guarantee that $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
-- A semigroup has an associative binary operation
class Semigroup (α : Type) where
op : α → α → α
op_assoc : ∀ a b c : α, op (op a b) c = op a (op b c)
-- Notation for our operation
infixl:70 " ⋆ " => Semigroup.op
The op_assoc field is a proof that clients can use directly. Any theorem about semigroups can invoke this associativity without asking whether a particular type satisfies it. The type class instance guarantees it.
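To see the field in use, here is a small sketch (not from the original source) of a lemma that holds for any instance of the Semigroup class and notation just defined.
-- Any bracketing of four elements can be flattened using only op_assoc
theorem op_assoc4 {α : Type} [Semigroup α] (a b c d : α) :
    ((a ⋆ b) ⋆ c) ⋆ d = a ⋆ (b ⋆ (c ⋆ d)) := by
  simp only [Semigroup.op_assoc]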
Monoids
A monoid extends a semigroup with an identity element. The identity must satisfy two laws: left identity ($e \cdot a = a$) and right identity ($a \cdot e = a$). Natural numbers under addition form a monoid with identity 0. Natural numbers under multiplication form a different monoid with identity 1. Same type, different structures.
-- A monoid adds an identity element to a semigroup
class Monoid (α : Type) extends Semigroup α where
e : α
e_op : ∀ a : α, op e a = a
op_e : ∀ a : α, op a e = a
The extends Semigroup α clause means every monoid is automatically a semigroup. Lean’s type class inheritance handles this: any function expecting a semigroup accepts a monoid. The hierarchy is operational, not merely conceptual.
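As a hedged sketch (not part of the original source), here is the additive monoid on Nat registered against the classes above; the multiplicative monoid would use (· * ·), 1, Nat.mul_assoc, Nat.one_mul, and Nat.mul_one instead.
-- Nat under addition: identity 0, associativity from the standard library
instance : Monoid Nat where
  op := (· + ·)
  op_assoc := Nat.add_assoc
  e := 0
  e_op := Nat.zero_add
  op_e := Nat.add_zero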
Groups
A group adds inverses to a monoid. Every element $a$ has an inverse $a^{-1}$ such that $a^{-1} \cdot a = e$ and $a \cdot a^{-1} = e$. Integers under addition form a group. Positive rationals under multiplication form a group. Permutations under composition form a group. The examples proliferate across mathematics.
-- A group adds inverses to a monoid
class Group (α : Type) extends Monoid α where
inv : α → α
inv_op : ∀ a : α, op (inv a) a = e
op_inv : ∀ a : α, op a (inv a) = e
-- Notation for inverse
postfix:max "⁻¹" => Group.inv
With this definition, we can prove fundamental group theorems that apply to any group. These are not approximations or heuristics; they are mathematical facts verified by the type checker.
Group Theorems
From just the group axioms, many properties follow. Cancellation laws let us simplify equations. The identity is unique, as are inverses. The inverse of a product reverses the order. These theorems are mechanical consequences of the axioms, and Lean verifies each step.
-- Now we prove fundamental group theorems from the axioms
variable {G : Type} [Group G]
-- Left cancellation: if a ⋆ b = a ⋆ c then b = c
theorem op_left_cancel (a b c : G) (h : a ⋆ b = a ⋆ c) : b = c := by
have : a⁻¹ ⋆ (a ⋆ b) = a⁻¹ ⋆ (a ⋆ c) := by rw [h]
simp only [← Semigroup.op_assoc, Group.inv_op, Monoid.e_op] at this
exact this
-- Right cancellation: if b ⋆ a = c ⋆ a then b = c
theorem op_right_cancel (a b c : G) (h : b ⋆ a = c ⋆ a) : b = c := by
have : (b ⋆ a) ⋆ a⁻¹ = (c ⋆ a) ⋆ a⁻¹ := by rw [h]
simp only [Semigroup.op_assoc, Group.op_inv, Monoid.op_e] at this
exact this
-- The identity is unique
theorem e_unique (e' : G) (h : ∀ a : G, e' ⋆ a = a) : e' = Monoid.e := by
have : e' ⋆ Monoid.e = Monoid.e := h Monoid.e
rw [Monoid.op_e] at this
exact this
-- Inverses are unique
theorem inv_unique (a b : G) (h : b ⋆ a = Monoid.e) : b = a⁻¹ := by
have step1 : b ⋆ a ⋆ a⁻¹ = Monoid.e ⋆ a⁻¹ := by rw [h]
simp only [Semigroup.op_assoc, Group.op_inv, Monoid.op_e, Monoid.e_op] at step1
exact step1
-- Double inverse: (a⁻¹)⁻¹ = a
theorem inv_inv (a : G) : (a⁻¹)⁻¹ = a := by
symm
apply inv_unique
exact Group.op_inv a
-- Inverse of product: (a ⋆ b)⁻¹ = b⁻¹ ⋆ a⁻¹
theorem op_inv_rev (a b : G) : (a ⋆ b)⁻¹ = b⁻¹ ⋆ a⁻¹ := by
symm
apply inv_unique
calc b⁻¹ ⋆ a⁻¹ ⋆ (a ⋆ b)
= b⁻¹ ⋆ (a⁻¹ ⋆ (a ⋆ b)) := by rw [Semigroup.op_assoc]
_ = b⁻¹ ⋆ (a⁻¹ ⋆ a ⋆ b) := by rw [← Semigroup.op_assoc a⁻¹ a b]
_ = b⁻¹ ⋆ (Monoid.e ⋆ b) := by rw [Group.inv_op]
_ = b⁻¹ ⋆ b := by rw [Monoid.e_op]
_ = Monoid.e := Group.inv_op b
The theorem op_inv_rev shows that $(a \cdot b)^{-1} = b^{-1} \cdot a^{-1}$. The order reverses because we need to undo the operations in reverse sequence. The proof uses our inv_unique theorem: to show two things are equal to an inverse, show they act as that inverse.
Integers Mod 2
Theory without examples is suspect. Let us build a concrete group: integers modulo 2. This group has exactly two elements (zero and one) with addition wrapping around: $1 + 1 = 0$.
-- Example: Integers mod 2 form a group under addition
inductive Z2 : Type where
| zero : Z2
| one : Z2
deriving DecidableEq, Repr
def Z2.add : Z2 → Z2 → Z2
| .zero, a => a
| .one, .zero => .one
| .one, .one => .zero
def Z2.neg : Z2 → Z2
| a => a -- In Z2, every element is its own inverse
Every element is its own inverse ($0 + 0 = 0$ and $1 + 1 = 0$), which simplifies the structure. Now we register this as a group instance:
instance : Group Z2 where
op := Z2.add
op_assoc := by
intro a b c
cases a <;> cases b <;> cases c <;> rfl
e := Z2.zero
e_op := by
intro a
cases a <;> rfl
op_e := by
intro a
cases a <;> rfl
inv := Z2.neg
inv_op := by
intro a
cases a <;> rfl
op_inv := by
intro a
cases a <;> rfl
-- Test computation
#eval (Z2.one ⋆ Z2.one : Z2) -- zero
#eval (Z2.one ⋆ Z2.zero : Z2) -- one
Each proof obligation is discharged by case analysis. With only two elements, Lean can verify each law by exhaustively checking all combinations.
-- Verify our theorems work on the concrete example
example : (Z2.one)⁻¹ = Z2.one := rfl
example : Z2.one ⋆ Z2.one⁻¹ = Z2.zero := rfl
theorem z2_self_inverse (a : Z2) : a ⋆ a = Monoid.e := by
cases a <;> rfl
-- Z2 is commutative
theorem z2_comm (a b : Z2) : a ⋆ b = b ⋆ a := by
cases a <;> cases b <;> rfl
Because Z2 is now a Group, all our general theorems apply. The op_left_cancel and inv_unique theorems work on Z2 without modification. Generic mathematics, specific verification.
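A small check of that claim (the example itself is ours, not from the source): the generic cancellation theorem specializes to Z2 with no extra work.
-- op_left_cancel, proved for an arbitrary group, applied at the concrete type Z2
example (b c : Z2) (h : Z2.one ⋆ b = Z2.one ⋆ c) : b = c :=
  op_left_cancel Z2.one b c h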
Commutative Groups
Some groups satisfy an additional property: commutativity. In a commutative (or Abelian) group, $a \cdot b = b \cdot a$ for all elements. Integer addition is commutative; matrix multiplication is not.
-- Commutative (Abelian) groups
class CommGroup (α : Type) extends Group α where
op_comm : ∀ a b : α, Semigroup.op a b = Semigroup.op b a
instance : CommGroup Z2 where
op_comm := z2_comm
Vector Spaces
Groups appear everywhere, including in linear algebra. A vector space is an Abelian group (vectors under addition) equipped with scalar multiplication satisfying certain compatibility laws. Let us build a simple 2D vector space over the integers.
-- A simple 2D vector space over integers
structure Vec2 where
x : Int
y : Int
deriving DecidableEq, Repr
def Vec2.add (v w : Vec2) : Vec2 :=
⟨v.x + w.x, v.y + w.y⟩
def Vec2.neg (v : Vec2) : Vec2 :=
⟨-v.x, -v.y⟩
def Vec2.zero : Vec2 := ⟨0, 0⟩
def Vec2.smul (c : Int) (v : Vec2) : Vec2 :=
⟨c * v.x, c * v.y⟩
infixl:65 " +ᵥ " => Vec2.add
prefix:100 "-ᵥ" => Vec2.neg
infixl:70 " •ᵥ " => Vec2.smul
The vectors form a group under addition. Each vector $(x, y)$ has inverse $(-x, -y)$, and the zero vector is the identity.
-- Vec2 forms a group under addition
theorem vec2_inv_op (a : Vec2) : Vec2.add (Vec2.neg a) a = Vec2.zero := by
simp only [Vec2.add, Vec2.neg, Vec2.zero, Int.add_comm, Int.add_right_neg]
theorem vec2_op_inv (a : Vec2) : Vec2.add a (Vec2.neg a) = Vec2.zero := by
simp only [Vec2.add, Vec2.neg, Vec2.zero, Int.add_right_neg]
instance : Group Vec2 where
op := Vec2.add
op_assoc := by
intro a b c
simp only [Vec2.add, Int.add_assoc]
e := Vec2.zero
e_op := by
intro a
simp only [Vec2.add, Vec2.zero, Int.zero_add]
op_e := by
intro a
simp only [Vec2.add, Vec2.zero, Int.add_zero]
inv := Vec2.neg
inv_op := vec2_inv_op
op_inv := vec2_op_inv
-- Vec2 is commutative
theorem vec2_comm (a b : Vec2) : a ⋆ b = b ⋆ a := by
show Vec2.add a b = Vec2.add b a
simp only [Vec2.add, Int.add_comm]
instance : CommGroup Vec2 where
op_comm := vec2_comm
Scalar multiplication satisfies the expected laws. These are the axioms that make scalar multiplication “compatible” with the vector space structure.
-- Scalar multiplication properties
theorem smul_zero (c : Int) : c •ᵥ Vec2.zero = Vec2.zero := by
simp only [Vec2.smul, Vec2.zero, Int.mul_zero]
theorem zero_smul (v : Vec2) : (0 : Int) •ᵥ v = Vec2.zero := by
simp only [Vec2.smul, Vec2.zero, Int.zero_mul]
theorem one_smul (v : Vec2) : (1 : Int) •ᵥ v = v := by
simp only [Vec2.smul, Int.one_mul]
theorem smul_add (c : Int) (v w : Vec2) :
c •ᵥ (v +ᵥ w) = (c •ᵥ v) +ᵥ (c •ᵥ w) := by
simp only [Vec2.smul, Vec2.add, Int.mul_add]
theorem add_smul (c d : Int) (v : Vec2) :
(c + d) •ᵥ v = (c •ᵥ v) +ᵥ (d •ᵥ v) := by
simp only [Vec2.smul, Vec2.add, Int.add_mul]
theorem smul_smul (c d : Int) (v : Vec2) :
c •ᵥ (d •ᵥ v) = (c * d) •ᵥ v := by
simp only [Vec2.smul, Int.mul_assoc]
Concrete computations confirm the definitions work:
-- Concrete computations
def v1 : Vec2 := ⟨1, 2⟩
def v2 : Vec2 := ⟨3, 4⟩
#eval v1 +ᵥ v2 -- { x := 4, y := 6 }
#eval 2 •ᵥ v1 -- { x := 2, y := 4 }
#eval v1 +ᵥ (-ᵥv1) -- { x := 0, y := 0 }
-- The group inverse works correctly
example : v1 +ᵥ (-ᵥv1) = Vec2.zero := by rfl
Rings
A ring has two operations: addition forming an Abelian group, and multiplication forming a monoid. Distributivity connects them: $a \cdot (b + c) = a \cdot b + a \cdot c$. Integers, polynomials, and matrices all form rings.
-- A ring combines two structures: an abelian group under addition
-- and a monoid under multiplication, linked by distributivity
class Ring (α : Type) where
-- Additive abelian group
add : α → α → α
zero : α
neg : α → α
add_assoc : ∀ a b c, add (add a b) c = add a (add b c)
zero_add : ∀ a, add zero a = a
add_zero : ∀ a, add a zero = a
neg_add : ∀ a, add (neg a) a = zero
add_comm : ∀ a b, add a b = add b a
-- Multiplicative monoid
mul : α → α → α
one : α
mul_assoc : ∀ a b c, mul (mul a b) c = mul a (mul b c)
one_mul : ∀ a, mul one a = a
mul_one : ∀ a, mul a one = a
-- Distributivity connects the two structures
left_distrib : ∀ a b c, mul a (add b c) = add (mul a b) (mul a c)
right_distrib : ∀ a b c, mul (add a b) c = add (mul a c) (mul b c)
The integers satisfy all ring axioms:
-- The integers form a ring
instance intRing : Ring Int where
add := (· + ·)
zero := 0
neg := (- ·)
mul := (· * ·)
one := 1
add_assoc := Int.add_assoc
zero_add := Int.zero_add
add_zero := Int.add_zero
neg_add := Int.add_left_neg
add_comm := Int.add_comm
mul_assoc := Int.mul_assoc
one_mul := Int.one_mul
mul_one := Int.mul_one
left_distrib := Int.mul_add
right_distrib := Int.add_mul
From these axioms, one can prove that $0 \cdot a = 0$ for any ring element $a$. This follows from distributivity: $0 \cdot a + 0 \cdot a = (0 + 0) \cdot a = 0 \cdot a$, and cancellation gives $0 \cdot a = 0$. See ring_zero_mul and ring_mul_zero in AlgebraicStructures.lean for the full proofs.
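Here is a hedged sketch of that derivation written against the Ring class above; the version in AlgebraicStructures.lean may differ in detail.
-- 0 * a = 0: from distributivity, 0*a + 0*a = (0 + 0)*a = 0*a,
-- then cancel by adding the additive inverse of 0*a on the left
theorem ring_zero_mul' {α : Type} [Ring α] (a : α) :
    Ring.mul Ring.zero a = Ring.zero := by
  have h : Ring.add (Ring.mul Ring.zero a) (Ring.mul Ring.zero a)
      = Ring.add (Ring.mul Ring.zero a) Ring.zero := by
    rw [← Ring.right_distrib, Ring.zero_add, Ring.add_zero]
  have h2 := congrArg (Ring.add (Ring.neg (Ring.mul Ring.zero a))) h
  rwa [← Ring.add_assoc, ← Ring.add_assoc, Ring.neg_add,
    Ring.zero_add, Ring.zero_add] at h2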
The Hierarchy
The structures we have defined form a hierarchy. At the base sits Semigroup, requiring only an associative operation. Monoid extends Semigroup by adding an identity element. Group extends Monoid by adding inverses. From Group, two paths diverge: CommGroup adds commutativity, while Ring combines an abelian group (for addition) with a monoid (for multiplication) linked by distributivity.
Each extension relationship means theorems flow downward. Prove something about semigroups, and it applies to monoids, groups, and rings. Lean’s type class inheritance makes this operational: any function expecting a Semigroup instance automatically accepts a Monoid, Group, or Ring.
Mathlib takes this much further. The full algebraic hierarchy includes semirings, division rings, fields, modules, algebras, and dozens of ordered variants. Each structure captures a precise set of assumptions, and theorems are proved at exactly the level of generality where they hold.
First Principles to Mathlib
We built these structures from scratch to understand how they work. In practice, you would use Mathlib’s definitions, which are battle-tested and integrated with thousands of theorems. Our Group is Mathlib’s Group. Our Ring is Mathlib’s Ring. The concepts are identical; the implementations are industrial-strength.
The value of building from first principles is understanding. When Mathlib’s ring tactic solves a polynomial identity, it is applying theorems like our ring_zero_mul millions of times per second. When type class inference finds a CommGroup instance, it is navigating a hierarchy like the one we drew. The abstraction is real, and so is the machinery underneath.
Constraints Beget Structure
One of the beautiful facts in group theory is that strong constraints force unexpected structure. Consider a group where every element is its own inverse: $g^2 = e$ for all $g$. Such groups must be abelian. The proof is a gem of algebraic reasoning.
-- If every element is its own inverse, the group must be abelian.
-- The proof: ab = (ab)⁻¹ = b⁻¹a⁻¹ = ba. Constraints beget structure.
theorem involutive_imp_comm {G : Type} [Group G]
(h : ∀ g : G, g ⋆ g = Monoid.e) : ∀ a b : G, a ⋆ b = b ⋆ a := by
intro a b
-- Key insight: if g² = e then g = g⁻¹
have inv_self : ∀ g : G, g = g⁻¹ := fun g => by
have := h g
calc g = g ⋆ Monoid.e := (Monoid.op_e _).symm
_ = g ⋆ (g ⋆ g⁻¹) := by rw [Group.op_inv]
_ = (g ⋆ g) ⋆ g⁻¹ := (Semigroup.op_assoc _ _ _).symm
_ = Monoid.e ⋆ g⁻¹ := by rw [this]
_ = g⁻¹ := Monoid.e_op _
-- Now: ab = (ab)⁻¹ = b⁻¹a⁻¹ = ba
calc a ⋆ b
= (a ⋆ b)⁻¹ := inv_self (a ⋆ b)
_ = b⁻¹ ⋆ a⁻¹ := op_inv_rev a b
_ = b ⋆ a := by rw [← inv_self a, ← inv_self b]
The key insight is that $g^2 = e$ implies $g = g^{-1}$. From there, $ab = (ab)^{-1} = b^{-1}a^{-1} = ba$. The constraint on squares forces commutativity. Our Z2 group is an example: every element squares to zero, and the group is indeed abelian.
Squaring Distributes
A related result: if squaring distributes over the group operation, meaning $(ab)^2 = a^2 b^2$ for all elements, then the group must be abelian. This is left as an exercise.
-- Exercise: If squaring distributes, the group is abelian.
-- Hint: expand (ab)² = a²b² and cancel to get ab = ba.
theorem square_distrib_imp_comm {G : Type} [Group G]
(h : ∀ a b : G, (a ⋆ b) ⋆ (a ⋆ b) = (a ⋆ a) ⋆ (b ⋆ b)) :
∀ a b : G, a ⋆ b = b ⋆ a := by
sorry
The hint is to expand both sides. On the left, $(ab)^2 = abab$. On the right, $a^2 b^2 = aabb$. The equality $abab = aabb$ lets you cancel $a$ on the left and $b$ on the right, yielding $ba = ab$. The machinery we built here forms the foundation for the full algebraic hierarchy, including Galois theory in Mathlib.
Mathlib
Mathlib is the mathematical library for Lean 4. Over a million lines of formalized mathematics, from basic logic through graduate-level algebra, analysis, and number theory. Hundreds of contributors have poured years of work into this thing. When you import it, you inherit their labor. The triangle inequality is already proven. So is the fundamental theorem of algebra. You do not need to prove that primes are infinite; someone did that in 2017 and you can just use it. The community tracks progress against a list of 100 major theorems; most are done.
The library is organized hierarchically. At the foundation sit logic, sets, and basic data types. Above these rise algebraic structures, then topology and analysis, then specialized domains like combinatorics and number theory. Each layer builds on those below. Finding what you need in a million-line codebase used to be challenging, but the community has built excellent semantic search tools powered by AI, and the Mathlib documentation provides searchable API references for every declaration.
Core Foundations
These modules provide the logical and set-theoretic foundations that everything else depends on. The logic modules formalize propositional and predicate calculus, including both constructive reasoning and classical axioms like the law of excluded middle. Set theory in Mathlib is built on top of type theory rather than replacing it: Set α is defined as α → Prop, making sets predicates on types. This foundation supports finite sets with decidable membership, order theory including lattices and Galois connections, and the infrastructure for defining mathematical structures throughout the library.
| Module | Description |
|---|---|
Mathlib.Logic.Basic | Core logical connectives, And, Or, Not, Iff, basic lemmas |
Init.Classical | Classical axioms: Classical.em, Classical.choose, Classical.byContradiction |
Mathlib.Data.Set.Basic | Set operations: union, intersection, complement, membership |
Mathlib.Data.Finset.Basic | Finite sets with decidable membership |
Mathlib.Order.Basic | Partial orders, lattices, suprema and infima |
Algebraic Hierarchy
Mathlib builds algebra through a hierarchy of type classes. Each structure adds operations and axioms to those below it. The hierarchy begins with semigroups and monoids, progresses through groups and rings, and culminates in fields and modules. This design means that a theorem about groups automatically applies to rings, fields, and any other structure that extends groups. The library includes both additive and multiplicative variants of each structure, connected by the @[to_additive] attribute that automatically generates parallel theories. Key accomplishments include complete formalizations of Galois theory, the structure theorem for finitely generated modules over PIDs, and the Nullstellensatz.
| Module | Description |
|---|---|
Mathlib.Algebra.Group.Basic | Monoids, groups, abelian groups |
Mathlib.Algebra.Ring.Basic | Semirings, rings, commutative rings |
Mathlib.Algebra.Field.Basic | Division rings, fields |
Mathlib.Algebra.Module.Basic | Modules over rings, vector spaces |
Mathlib.Algebra.Module.LinearMap.Defs | Linear maps, submodules, quotients |
Mathlib.RingTheory.Ideal.Basic | Ideals, quotient rings |
Number Systems
The standard number types and their properties, constructed with mathematical rigor. Natural numbers come from Lean’s core, but Mathlib adds comprehensive libraries for divisibility, primality, and arithmetic functions. Integers are built as a quotient of pairs of naturals. Rationals are fractions in lowest terms. Real numbers are equivalence classes of Cauchy sequences of rationals. Complex numbers are pairs of reals. Each construction comes with the expected algebraic structure and interoperability lemmas. The library also provides modular arithmetic through ZMod n, which is a field when n is prime.
| Module | Description |
|---|---|
Mathlib.Data.Nat.Prime.Defs | Prime numbers, factorization |
Mathlib.Data.Int.Basic | Integers |
Mathlib.Data.Rat.Defs | Rational numbers |
Mathlib.Data.Real.Basic | Real numbers (Cauchy completion) |
Mathlib.Data.Complex.Basic | Complex numbers |
Mathlib.Data.ZMod.Defs | Integers modulo n |
Analysis and Topology
Continuous mathematics built on topological foundations. The topology library provides general topological spaces, filters, and nets as the foundation for limits and continuity. Metric spaces add distance functions with the expected triangle inequality and completeness properties. Analysis proper includes differentiation in arbitrary normed spaces, the Fréchet derivative for multivariable calculus, and integration via measure theory. Major formalizations include the Fundamental Theorem of Calculus, the Hahn-Banach theorem, the spectral theorem for compact self-adjoint operators, and the Central Limit Theorem. The library handles both real and complex analysis through a unified framework.
| Module | Description |
|---|---|
Mathlib.Topology.Basic | Topological spaces, open sets, continuity |
Mathlib.Topology.MetricSpace.Basic | Metric spaces, distances |
Mathlib.Analysis.Normed.Field.Basic | Normed fields |
Mathlib.Analysis.Calculus.Deriv.Basic | Derivatives |
Mathlib.MeasureTheory.Measure.MeasureSpace | Measure spaces, integration |
Category Theory and Combinatorics
Abstract structures and discrete mathematics form two largely independent branches of Mathlib. The category theory library provides a comprehensive framework for categorical reasoning: categories, functors, natural transformations, adjunctions, limits, colimits, and monads. This infrastructure supports both abstract mathematics and the categorical semantics of type theory. The combinatorics library covers graph theory with simple graphs and multigraphs, the pigeonhole principle, inclusion-exclusion, and generating functions. Notable formalizations include Szemerédi’s regularity lemma, the cap set problem bound, and significant progress toward the Polynomial Freiman-Ruzsa conjecture.
| Module | Description |
|---|---|
Mathlib.CategoryTheory.Category.Basic | Categories, functors, natural transformations |
Mathlib.CategoryTheory.Limits.IsLimit | Limits and colimits |
Mathlib.CategoryTheory.Monad.Basic | Monads in category theory |
Mathlib.Combinatorics.SimpleGraph.Basic | Graph theory |
Mathlib.Combinatorics.Pigeonhole | Pigeonhole principle |
Finding What You Need
Mathlib is large. You will spend more time searching for lemmas than proving theorems, at least at first. Accept this. The good news is that the lemma you need almost certainly exists. The bad news is that it might be named something you would never guess. The community has built an ecosystem of search tools, each with different strengths.
Moogle: Semantic search for Mathlib. Type “triangle inequality for norms” and find norm_add_le. Sometimes it even understands what you meant rather than what you typed. Moogle uses embeddings trained on mathematical text, so it handles synonyms and related concepts well. Start here when you know what you want but not what it is called.
Loogle: Type signature search. If you need a lemma involving List.map and List.length, search for List.map, List.length and find List.length_map. Loogle lets you search by the shape of the types involved, using wildcards and constraints. Precise queries get precise answers. This is the tool when you know the types but not the name.
LeanSearch: Another semantic search engine over Mathlib, focused on finding relevant theorems from natural language descriptions. It provides a different ranking algorithm than Moogle, so when one fails, try the other. Sometimes the theorem you need appears on page two of one engine and page one of another.
LeanExplore: Semantic search that indexes not just theorems but also metaprogramming, tactics, and attributes. It exposes an MCP (Model Context Protocol) interface, making it accessible to AI coding assistants. Useful when you need to find not just a lemma but how to use a particular tactic or attribute.
Lean Finder: Understands proof states and theorem statements at a deeper level than keyword matching. You can paste your current goal and it will suggest relevant lemmas. Particularly useful when you are mid-proof and need something that applies to your specific situation.
Mathlib4 Docs: The auto-generated API reference for every declaration in Mathlib. Not a search engine per se, but once you know the module, this is where you browse the available lemmas. Each declaration links to its source code and shows its type signature, docstring, and related definitions.
In-editor tactics: The exact? tactic searches for lemmas that exactly match your goal. The apply? tactic finds lemmas whose conclusion unifies with your goal. Slow but thorough since they search locally compiled dependencies. Use them when web-based search fails or when you need something from a non-Mathlib dependency.
Module structure: If you need facts about prime numbers, look in Mathlib.Data.Nat.Prime. If you need topology lemmas, start in Mathlib.Topology. The Mathematics in Mathlib overview provides a map of what has been formalized and where. When all else fails, grep the source code like everyone else does.
Importing Mathlib
Most projects import Mathlib wholesale:
import Mathlib
This works, but your compile times will make you reconsider your life choices. For faster iteration during development, import only what you need:
-- Import specific modules for faster compilation
import Mathlib.Data.Nat.Prime.Basic
import Mathlib.Data.Real.Basic
import Mathlib.Algebra.Group.Basic
import Mathlib.Tactic
The Mathlib documentation lists all available modules. When your proof needs a specific lemma, check which module provides it and add that import. Or just import everything and go make coffee while it compiles.
Working with Primes
Number theory in Mathlib is surprisingly pleasant. The basics are all there, and the proofs often look like what you would write on paper if paper could check your work:
-- Working with prime numbers from Mathlib
example : Nat.Prime 17 := by decide
example : ¬ Nat.Prime 15 := by decide
-- Every number > 1 has a prime factor
example (n : Nat) (h : n > 1) : ∃ p, Nat.Prime p ∧ p ∣ n := by
have hn : n ≠ 1 := by omega
exact Nat.exists_prime_and_dvd hn
-- Infinitely many primes: for any n, there's a prime ≥ n
example (n : Nat) : ∃ p, Nat.Prime p ∧ p ≥ n := by
obtain ⟨p, hn, hp⟩ := Nat.exists_infinite_primes n
exact ⟨p, hp, hn⟩
Algebraic Structures
Type classes do the heavy lifting here. Declare that your type is a group, and you get inverses, identity laws, and associativity for free. Declare it is a ring, and multiplication distributes over addition without you lifting a finger:
-- Using algebraic structures
-- Groups: every element has an inverse
example {G : Type*} [Group G] (a : G) : a * a⁻¹ = 1 :=
mul_inv_cancel a
-- Rings: distributivity comes for free
example {R : Type*} [Ring R] (a b c : R) : a * (b + c) = a * b + a * c :=
mul_add a b c
-- Commutativity in commutative rings
example {R : Type*} [CommRing R] (a b : R) : a * b = b * a :=
mul_comm a b
Real Numbers
The reals are constructed as equivalence classes of Cauchy sequences, which is mathematically clean but occasionally leaks through the abstraction when you least expect it. Most of the time you can pretend they are just numbers:
-- Real number analysis
-- Reals are a field
example (x : ℝ) (h : x ≠ 0) : x * x⁻¹ = 1 :=
mul_inv_cancel₀ h
-- Basic inequalities
example (x y : ℝ) (hx : 0 < x) (hy : 0 < y) : 0 < x + y :=
add_pos hx hy
-- The triangle inequality
example (x y : ℝ) : |x + y| ≤ |x| + |y| :=
abs_add_le x y
Mathlib Tactics
Mathlib ships tactics that know more mathematics than most undergraduates. ring closes polynomial identities. linarith handles linear arithmetic over ordered rings. positivity proves things are positive. These are not magic; they are carefully engineered decision procedures. But from the outside, they look like magic:
-- Mathlib tactics in action
-- ring solves polynomial identities
example (x y : ℤ) : (x - y) * (x + y) = x^2 - y^2 := by ring
-- linarith handles linear arithmetic
example (x y z : ℚ) (h1 : x < y) (h2 : y < z) : x < z := by linarith
-- field_simp clears denominators
example (x : ℚ) (h : x ≠ 0) : (1 / x) * x = 1 := by field_simp
-- positivity proves positivity goals
example (x : ℝ) : 0 ≤ x^2 := by positivity
-- gcongr for monotonic reasoning
example (a b c d : ℕ) (h1 : a ≤ b) (h2 : c ≤ d) : a + c ≤ b + d := by gcongr
Using Search Tools
When stuck, let the computer do the searching. exact? trawls through Mathlib looking for a lemma that exactly matches your goal. apply? finds lemmas whose conclusion fits. These tactics are slow, but they beat staring at the screen trying to remember if the lemma is called add_comm or comm_add:
-- Finding lemmas with exact? and apply?
-- When stuck, use exact? to find matching lemmas
example (n : Nat) : n + 0 = n := by
exact Nat.add_zero n -- exact? would suggest this
-- apply? finds lemmas whose conclusion matches goal
example (a b : Nat) (h : a ∣ b) (h2 : b ∣ a) : a = b := by
exact Nat.dvd_antisymm h h2 -- apply? would find this
The Fundamental Theorem of Calculus
The Fundamental Theorem of Calculus in Mathlib is due to Yury Kudryashov and Benjamin Davidson. It comes in two parts, as you might remember from analysis.
The first part says that integrating then differentiating recovers the original function. If $f$ is integrable and tends to $c$ at $b$, then the function $u \mapsto \int_a^u f(x)\,dx$ has derivative $c$ at $b$:
theorem integral_hasStrictDerivAt_of_tendsto_ae_right
(hf : IntervalIntegrable f volume a b)
(hmeas : StronglyMeasurableAtFilter f (nhds b) volume)
(hb : Tendsto f (nhds b ⊓ ae volume) (nhds c)) :
HasStrictDerivAt (fun u => ∫ x in a..u, f x) c b
The second part says that differentiating then integrating recovers the original function up to boundary values. If $f$ is continuous on $[a,b]$ and differentiable on $(a,b)$ with integrable derivative, then:
$$\int_a^b f'(x) \, dx = f(b) - f(a)$$
theorem integral_eq_sub_of_hasDeriv_right_of_le
(hab : a ≤ b)
(hcont : ContinuousOn f (Icc a b))
(hderiv : ∀ x ∈ Ioo a b, HasDerivWithinAt f (f' x) (Ioi x) x)
(hint : IntervalIntegrable f' volume a b) :
∫ x in a..b, f' x = f b - f a
The hypotheses handle the edge cases your calculus teacher glossed over: continuity on the closed interval, differentiability on the open interior, integrability of the derivative. Centuries of refinement, machine-checked.
Resources
- Moogle - Search Mathlib using natural language queries like “triangle inequality for norms”.
- Loogle - Search by type signature when you know the shape of the lemma you need.
- LeanSearch - Semantic search over Mathlib with a focus on finding relevant theorems.
- LeanExplore - Semantic search that also indexes metaprogramming and exposes an MCP interface for AI agents.
- Lean Finder - Understands proof states and theorem statements, not just keywords.
- Mathlib4 Docs - Auto-generated API reference for every declaration in Mathlib.
- Mathematics in Mathlib - Map of formalized mathematics organized by mathematical area.
- 100 Theorems - Tracks which of 100 major theorems have been formalized in Lean.
- Zulip Chat - Ask questions in the “Is there code for X?” stream when search engines fail.
- GitHub - Source code, issue tracker, and contribution guidelines.
Verified Programs
The promise of theorem provers extends beyond mathematics. We can verify that software does what we claim it does. This article demonstrates verification techniques where the verified code and the production code are the same: everything lives within Lean.
Intrinsically-Typed Interpreters
The standard approach to building interpreters involves two phases. First, parse text into an untyped abstract syntax tree. Second, run a type checker that rejects malformed programs. This works, but the interpreter must still handle the case where a program passes the type checker but evaluates to nonsense. The runtime carries the burden of the type system’s failure modes. It is like a bouncer who checks IDs at the door but still has to deal with troublemakers inside.
Intrinsically-typed interpreters refuse to play this game. The abstract syntax tree itself encodes typing judgments. An ill-typed program cannot be constructed. The type system statically excludes runtime type errors, not by checking them at runtime, but by making them unrepresentable. The bouncer is replaced by architecture: there is no door for troublemakers to enter.
Consider a small expression language with natural numbers, booleans, arithmetic, and conditionals. We start by defining the types our language supports and a denotation function that maps them to Lean types.
inductive Ty where
| nat : Ty
| bool : Ty
deriving Repr, DecidableEq
@[reducible] def Ty.denote : Ty → Type
| .nat => Nat
| .bool => Bool
The denote function is key. It interprets our object-level types (Ty) as meta-level types (Type). When our expression language says something has type nat, we mean it evaluates to a Lean Nat. When it says bool, we mean a Lean Bool. This type-level interpretation function is what makes the entire approach work.
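As a quick sanity check, purely illustrative and not from the repository, the denotations unfold definitionally:
-- Illustrative: the denotation function reduces to ordinary Lean types
example : Ty.nat.denote = Nat := rfl
example : Ty.bool.denote = Bool := rfl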
Expressions
The expression type indexes over the result type. Each constructor precisely constrains which expressions can be built and what types they produce.
inductive Expr : Ty → Type where
| nat : Nat → Expr .nat
| bool : Bool → Expr .bool
| add : Expr .nat → Expr .nat → Expr .nat
| mul : Expr .nat → Expr .nat → Expr .nat
| lt : Expr .nat → Expr .nat → Expr .bool
| eq : Expr .nat → Expr .nat → Expr .bool
| and : Expr .bool → Expr .bool → Expr .bool
| or : Expr .bool → Expr .bool → Expr .bool
| not : Expr .bool → Expr .bool
| ite : {t : Ty} → Expr .bool → Expr t → Expr t → Expr t
Every constructor documents its typing rule. The add constructor requires both arguments to be natural number expressions and produces a natural number expression. The ite constructor requires a boolean condition and two branches of matching type.
This encoding makes ill-typed expressions unrepresentable. You cannot write add (nat 1) (bool true) because the types do not align. The Lean type checker rejects such expressions before they exist.
/-
def bad : Expr .nat := .add (.nat 1) (.bool true)
-- Error: type mismatch
-/
Evaluation
The evaluator maps expressions to their denotations. Because expressions are intrinsically typed, the evaluator is total. It never fails, never throws exceptions, never encounters impossible cases. Every pattern match is exhaustive.
def Expr.eval : {t : Ty} → Expr t → t.denote
| _, .nat n => n
| _, .bool b => b
| _, .add e₁ e₂ => e₁.eval + e₂.eval
| _, .mul e₁ e₂ => e₁.eval * e₂.eval
| _, .lt e₁ e₂ => e₁.eval < e₂.eval
| _, .eq e₁ e₂ => e₁.eval == e₂.eval
| _, .and e₁ e₂ => e₁.eval && e₂.eval
| _, .or e₁ e₂ => e₁.eval || e₂.eval
| _, .not e => !e.eval
| _, .ite c t e => if c.eval then t.eval else e.eval
The return type t.denote varies with the expression’s type index. A natural number expression evaluates to Nat. A boolean expression evaluates to Bool. This dependent return type is what makes the evaluator type-safe by construction.
def ex1 : Expr .nat := .add (.nat 2) (.nat 3)
def ex2 : Expr .bool := .lt (.nat 2) (.nat 3)
def ex3 : Expr .nat := .ite (.lt (.nat 2) (.nat 3)) (.nat 10) (.nat 20)
def ex4 : Expr .nat := .mul (.add (.nat 2) (.nat 3)) (.nat 4)
#eval ex1.eval -- 5
#eval ex2.eval -- true
#eval ex3.eval -- 10
#eval ex4.eval -- 20
Verified Optimization
Interpreters become interesting when we transform programs. Compilers do this constantly: dead code elimination, loop unrolling, strength reduction. Each transformation promises to preserve meaning while improving performance. But how do we know the promise is kept? A constant folder simplifies expressions by evaluating constant subexpressions at compile time. Adding two literal numbers produces a literal. Conditionals with constant conditions eliminate the untaken branch.
def Expr.constFold : {t : Ty} → Expr t → Expr t
| _, .nat n => .nat n
| _, .bool b => .bool b
| _, .add e₁ e₂ =>
match e₁.constFold, e₂.constFold with
| .nat n, .nat m => .nat (n + m)
| e₁', e₂' => .add e₁' e₂'
| _, .mul e₁ e₂ =>
match e₁.constFold, e₂.constFold with
| .nat n, .nat m => .nat (n * m)
| e₁', e₂' => .mul e₁' e₂'
| _, .lt e₁ e₂ => .lt e₁.constFold e₂.constFold
| _, .eq e₁ e₂ => .eq e₁.constFold e₂.constFold
| _, .and e₁ e₂ => .and e₁.constFold e₂.constFold
| _, .or e₁ e₂ => .or e₁.constFold e₂.constFold
| _, .not e => .not e.constFold
| _, .ite c t e =>
match c.constFold with
| .bool true => t.constFold
| .bool false => e.constFold
| c' => .ite c' t.constFold e.constFold
The optimization preserves types. If e : Expr t, then e.constFold : Expr t. The type indices flow through unchanged. The type system enforces this statically.
But type preservation is a weak property. We want semantic preservation: the optimized program computes the same result as the original. This requires a proof.
theorem constFold_correct : ∀ {t : Ty} (e : Expr t), e.constFold.eval = e.eval := by
intro t e
induction e with
| nat n => rfl
| bool b => rfl
| add e₁ e₂ ih₁ ih₂ =>
simp only [Expr.constFold, Expr.eval]
cases he₁ : e₁.constFold <;> cases he₂ : e₂.constFold <;>
simp only [Expr.eval, ← ih₁, ← ih₂, he₁, he₂]
| mul e₁ e₂ ih₁ ih₂ =>
simp only [Expr.constFold, Expr.eval]
cases he₁ : e₁.constFold <;> cases he₂ : e₂.constFold <;>
simp only [Expr.eval, ← ih₁, ← ih₂, he₁, he₂]
| lt e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
| eq e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
| and e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
| or e₁ e₂ ih₁ ih₂ => simp only [Expr.constFold, Expr.eval, ih₁, ih₂]
| not e ih => simp only [Expr.constFold, Expr.eval, ih]
| ite c t e ihc iht ihe =>
simp only [Expr.constFold, Expr.eval]
cases hc : c.constFold <;> simp only [Expr.eval, ← ihc, ← iht, ← ihe, hc]
case bool b => cases b <;> rfl
The theorem states that for any expression, evaluating the constant-folded expression yields the same result as evaluating the original. The proof proceeds by structural induction on the expression. Most cases follow directly from the induction hypotheses.
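A concrete spot check, again illustrative rather than from the repository, shows nested constants collapsing and the theorem specializing to one of the earlier expressions:
-- Illustrative: nested constants collapse to a literal,
-- and the correctness theorem specializes to any concrete expression
example : (Expr.mul (.add (.nat 2) (.nat 3)) (.nat 4)).constFold = .nat 20 := rfl
example : ex4.constFold.eval = ex4.eval := constFold_correct ex4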
A Verified Compiler
The intrinsically-typed interpreter demonstrates type safety. But real systems compile to lower-level representations. Can we verify the compiler itself? The answer is yes, and it requires remarkably little code. In roughly 40 lines, we can define a source language, a target language, compilation, and prove the compiler correct. This is CompCert in miniature.
The source language is arithmetic expressions: literals, addition, and multiplication. The target language is a stack machine with push, add, and multiply instructions. The compilation strategy is straightforward: literals become pushes, binary operations compile their arguments and then emit the operator.
-- Source: arithmetic expressions
inductive Expr where
| lit : Nat → Expr
| add : Expr → Expr → Expr
| mul : Expr → Expr → Expr
deriving Repr
-- Target: stack machine instructions
inductive Instr where
| push : Nat → Instr
| add : Instr
| mul : Instr
deriving Repr
-- What an expression means: evaluate to a number
def eval : Expr → Nat
| .lit n => n
| .add a b => eval a + eval b
| .mul a b => eval a * eval b
-- Compile expression to stack code
def compile : Expr → List Instr
| .lit n => [.push n]
| .add a b => compile a ++ compile b ++ [.add]
| .mul a b => compile a ++ compile b ++ [.mul]
-- Execute stack code
def run : List Instr → List Nat → List Nat
| [], stack => stack
| .push n :: is, stack => run is (n :: stack)
| .add :: is, b :: a :: stack => run is ((a + b) :: stack)
| .mul :: is, b :: a :: stack => run is ((a * b) :: stack)
| _ :: is, stack => run is stack
-- Lemma: execution distributes over concatenation
theorem run_append (is js : List Instr) (s : List Nat) :
run (is ++ js) s = run js (run is s) := by
induction is generalizing s with
| nil => rfl
| cons i is ih =>
cases i with
| push n => exact ih _
| add => cases s with
| nil => exact ih _
| cons b s => cases s with
| nil => exact ih _
| cons a s => exact ih _
| mul => cases s with
| nil => exact ih _
| cons b s => cases s with
| nil => exact ih _
| cons a s => exact ih _
-- THE THEOREM: the compiler is correct
-- Running compiled code pushes exactly the evaluated result
theorem compile_correct (e : Expr) (s : List Nat) :
run (compile e) s = eval e :: s := by
induction e generalizing s with
| lit n => rfl
| add a b iha ihb => simp [compile, run_append, iha, ihb, run, eval]
| mul a b iha ihb => simp [compile, run_append, iha, ihb, run, eval]
The key insight is the run_append lemma: executing concatenated instruction sequences is equivalent to executing them in order. This lets us prove correctness compositionally. The main theorem, compile_correct, states that running compiled code pushes exactly the evaluated result onto the stack.
The proof proceeds by structural induction on expressions. Literal compilation is trivially correct. For binary operations, we use run_append to split the execution: first we run the compiled left argument, then the compiled right argument, then the operator. The induction hypotheses tell us each subexpression evaluates correctly. The operator instruction combines them as expected.
-- Try it: (2 + 3) * 4
def expr : Expr := .mul (.add (.lit 2) (.lit 3)) (.lit 4)
#eval eval expr -- 20
#eval compile expr -- [push 2, push 3, add, push 4, mul]
#eval run (compile expr) [] -- [20]
-- The theorem guarantees these always match. Not by testing. By proof.
This is verified compiler technology at its most distilled. The same principles scale to CompCert, which verifies a production C compiler. The gap between 40 lines and 100,000 lines is mostly the complexity of real languages and optimizations, not the verification methodology.
Proof-Carrying Parsers
The intrinsically-typed interpreter guarantees type safety. The verified compiler guarantees semantic preservation. But what about parsers? A parser takes untrusted input and produces structured data. The traditional approach is to hope the parser is correct and test extensively. The verified approach is to make the parser carry its own proof of correctness.
A proof-carrying parser returns both the parsed result and evidence that the result matches the grammar. Invalid parses become type errors rather than runtime errors. The proof is constructed during parsing and verified by the type checker.
We define a grammar as an inductive type with constructors for characters, sequencing, alternation, repetition, and the empty string:
inductive Grammar where
| char : Char → Grammar
| seq : Grammar → Grammar → Grammar
| alt : Grammar → Grammar → Grammar
| many : Grammar → Grammar
| eps : Grammar
inductive Matches : Grammar → List Char → Prop where
| char {c} : Matches (.char c) [c]
| eps : Matches .eps []
| seq {g₁ g₂ s₁ s₂} : Matches g₁ s₁ → Matches g₂ s₂ → Matches (.seq g₁ g₂) (s₁ ++ s₂)
| altL {g₁ g₂ s} : Matches g₁ s → Matches (.alt g₁ g₂) s
| altR {g₁ g₂ s} : Matches g₂ s → Matches (.alt g₁ g₂) s
| manyNil {g} : Matches (.many g) []
| manyCons {g s₁ s₂} : Matches g s₁ → Matches (.many g) s₂ → Matches (.many g) (s₁ ++ s₂)
The Matches relation defines when a string matches a grammar. Each constructor corresponds to a grammar production: a character matches itself, sequences match concatenations, alternatives match either branch, and repetition matches zero or more occurrences.
A parse result bundles the consumed input, remaining input, and a proof that the consumed portion matches the grammar:
structure ParseResult (g : Grammar) where
consumed : List Char
rest : List Char
proof : Matches g consumed
abbrev Parser (g : Grammar) := List Char → Option (ParseResult g)
The parser combinators construct these proof terms as they parse. When pchar 'a' succeeds, it returns a ParseResult containing proof that 'a' matches Grammar.char 'a'. When pseq combines two parsers, it combines their proofs using the Matches.seq constructor:
def pchar (c : Char) : Parser (.char c) := fun
| x :: xs => if h : x = c then some ⟨[c], xs, h ▸ .char⟩ else none
| [] => none
variable {g₁ g₂ g : Grammar}
def pseq (p₁ : Parser g₁) (p₂ : Parser g₂) : Parser (.seq g₁ g₂) := fun input =>
p₁ input |>.bind fun ⟨s₁, r, pf₁⟩ => p₂ r |>.map fun ⟨s₂, r', pf₂⟩ => ⟨s₁ ++ s₂, r', .seq pf₁ pf₂⟩
def palt (p₁ : Parser g₁) (p₂ : Parser g₂) : Parser (.alt g₁ g₂) := fun input =>
(p₁ input |>.map fun ⟨s, r, pf⟩ => ⟨s, r, .altL pf⟩) <|>
(p₂ input |>.map fun ⟨s, r, pf⟩ => ⟨s, r, .altR pf⟩)
partial def pmany (p : Parser g) : Parser (.many g) := fun input =>
match p input with
| none => some ⟨[], input, .manyNil⟩
| some ⟨s₁, r, pf₁⟩ =>
if s₁.isEmpty then some ⟨[], input, .manyNil⟩
else (pmany p r).map fun ⟨s₂, r', pf₂⟩ => ⟨s₁ ++ s₂, r', .manyCons pf₁ pf₂⟩
infixl:60 " *> " => pseq
infixl:50 " <+> " => palt
postfix:90 "⁺" => pmany
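A quick usage check, illustrative and not part of the article's code, exercises the combinators on concrete input. The result bundles a proof, so here we only ask whether the parse succeeded:
-- Illustrative: the sequenced parser accepts "ab" and rejects "ba"
#eval (pseq (pchar 'a') (pchar 'b') "ab".toList).isSome -- true
#eval (pseq (pchar 'a') (pchar 'b') "ba".toList).isSome -- false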
Soundness is trivial. Every successful parse carries its proof:
theorem soundness (p : Parser g) (input : List Char) (r : ParseResult g) :
p input = some r → Matches g r.consumed := fun _ => r.proof
The theorem says: if a parser returns a result, then the consumed input matches the grammar. The proof is the identity function, because the evidence is already in the result. Proof-carrying data constructs correctness alongside the computation rather than establishing it after the fact.
The Stack Machine
We continue with another Lean-only verification example: a stack machine, the fruit fly of computer science. Like the fruit fly in genetics, stack machines are simple enough to study exhaustively yet complex enough to exhibit interesting behavior. The machine has five operations: push a value, pop the top, add the top two values, multiply them, or duplicate the top.
inductive Op where
| push : Int → Op
| pop : Op
| add : Op
| mul : Op
| dup : Op
deriving Repr, DecidableEq
The run function executes a program against a stack:
def run : List Op → List Int → List Int
| [], stack => stack
| .push n :: ops, stack => run ops (n :: stack)
| .pop :: ops, _ :: stack => run ops stack
| .pop :: ops, [] => run ops []
| .add :: ops, b :: a :: stack => run ops ((a + b) :: stack)
| .add :: ops, stack => run ops stack
| .mul :: ops, b :: a :: stack => run ops ((a * b) :: stack)
| .mul :: ops, stack => run ops stack
| .dup :: ops, x :: stack => run ops (x :: x :: stack)
| .dup :: ops, [] => run ops []
#eval run [.push 3, .push 4, .add] []
#eval run [.push 3, .push 4, .mul] []
#eval run [.push 5, .dup, .mul] []
#eval run [.push 10, .push 3, .pop] []
Universal Properties
The power of theorem proving lies not in verifying specific programs but in proving properties about all programs. Consider the composition theorem: running two programs in sequence equals running their concatenation.
theorem run_append (p1 p2 : List Op) (s : List Int) :
run (p1 ++ p2) s = run p2 (run p1 s) := by
induction p1 generalizing s with
| nil => rfl
| cons op ops ih =>
cases op with
| push n => exact ih _
| pop => cases s <;> exact ih _
| add => match s with
| [] | [_] => exact ih _
| _ :: _ :: _ => exact ih _
| mul => match s with
| [] | [_] => exact ih _
| _ :: _ :: _ => exact ih _
| dup => cases s <;> exact ih _
This theorem quantifies over all programs p1 and p2 and all initial stacks s. The proof proceeds by induction on the first program, with case analysis on each operation and the stack state. The result is a guarantee that holds for the infinite space of all possible programs.
Stack Effects
Each operation has a predictable effect on stack depth. Push and dup add one element; pop, add, and mul remove one (add and mul consume two and produce one). We can compute the total effect of a program statically:
def effect : Op → Int
| .push _ => 1
| .pop => -1
| .add => -1
| .mul => -1
| .dup => 1
def totalEffect : List Op → Int
| [] => 0
| op :: ops => effect op + totalEffect ops
theorem effect_append (p1 p2 : List Op) :
totalEffect (p1 ++ p2) = totalEffect p1 + totalEffect p2 := by
induction p1 with
| nil => simp [totalEffect]
| cons op ops ih => simp [totalEffect, ih]; ring
The effect_append theorem proves that stack effects compose additively. If program p1 changes the stack depth by n and p2 changes it by m, then p1 ++ p2 changes it by n + m. This is another universal property, holding for all programs.
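For instance, both of the following programs have a net effect of one, computed without executing anything (illustrative evaluations, assuming the definitions above):
-- Illustrative: net stack-depth change, computed statically
#eval totalEffect [.push 3, .push 4, .add] -- 1
#eval totalEffect [.push 5, .dup, .mul] -- 1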
Program Equivalence
We can also prove that certain program transformations preserve semantics. Addition and multiplication are commutative, so swapping the order of pushes does not change the result:
theorem add_comm (n m : Int) (rest : List Op) (s : List Int) :
run (.push n :: .push m :: .add :: rest) s =
run (.push m :: .push n :: .add :: rest) s := by
simp [run, Int.add_comm]
theorem mul_comm (n m : Int) (rest : List Op) (s : List Int) :
run (.push n :: .push m :: .mul :: rest) s =
run (.push m :: .push n :: .mul :: rest) s := by
simp [run, Int.mul_comm]
theorem dup_add_eq_double (n : Int) (rest : List Op) (s : List Int) :
run (.push n :: .dup :: .add :: rest) s =
run (.push (n + n) :: rest) s := by
simp [run]
theorem dup_mul_eq_square (n : Int) (rest : List Op) (s : List Int) :
run (.push n :: .dup :: .mul :: rest) s =
run (.push (n * n) :: rest) s := by
simp [run]
These theorems justify program transformations. An optimizer that reorders pushes before adds is provably correct. The dup_add_eq_double and dup_mul_eq_square theorems show that push n; dup; add computes 2n and push n; dup; mul computes n². A compiler could use these equivalences for strength reduction.
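As a sketch of how such an optimizer might look (a hypothetical helper, not part of the article's codebase), a one-pass peephole rewriter applies the dup equivalences at the head of the instruction stream; a full correctness proof would proceed by induction, in the same spirit as constFold_correct:
-- Hypothetical peephole pass, justified by dup_add_eq_double and dup_mul_eq_square
def peephole : List Op → List Op
  | .push n :: .dup :: .add :: rest => .push (n + n) :: peephole rest
  | .push n :: .dup :: .mul :: rest => .push (n * n) :: peephole rest
  | op :: rest => op :: peephole rest
  | [] => []
#eval run (peephole [.push 5, .dup, .mul]) [] -- [25]
#eval run [.push 5, .dup, .mul] [] -- [25]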
What We Proved
The stack machine demonstrates verification of universal properties. We proved that running concatenated programs equals sequential execution (composition), that stack effects compose predictably (effect additivity), that push order does not affect addition or multiplication (commutativity), and that certain instruction sequences compute the same result (equivalences).
These theorems quantify over the entire space of programs, unlike tests of specific inputs. The composition theorem alone covers infinitely many cases that no test suite could enumerate. A passing test establishes an existential claim (“there exists an input where the program works”), while a theorem establishes a universal claim (“for all inputs, the program works”). Tests sample behavior, proofs characterize it completely.
The Verification Gap
Everything so far lives entirely within Lean. The interpreter is correct by construction. The compiler preserves semantics. The parser carries its proof. The stack machine obeys universal laws. These are real theorems about real programs. And yet they share a fundamental limitation: the verified code and the production code are the same code. There is no gap to bridge because there is no bridge to cross.
Real systems are not written in Lean. They are written in Rust, C, Go, or whatever language the team knows and the platform demands. The next article explores how to bridge the gap between a verified model and a production implementation.
Model Checking
The previous article demonstrated verification techniques where everything lives within Lean. But real systems are not written in Lean. They are written in Rust, C, Go, or whatever language the team knows and the platform demands. The gap between a verified model and a production implementation is where bugs hide. A correct specification means nothing if the implementation diverges from it.
This article explores how to bridge that gap using bounded model checking and verification-guided development.
Conway’s Game of Life
To see the verification gap in concrete terms, consider Conway’s Game of Life, a zero-player game that evolves on an infinite grid. Each cell is either alive or dead, and at each step every cell is updated according to simple rules based on its eight surrounding neighbors.
The rules: a live cell with two or three live neighbors survives, a dead cell with exactly three live neighbors becomes alive, and everything else dies. From these rules emerges startling complexity: oscillators, spaceships, and patterns that compute arbitrary functions.
The Game of Life is an excellent verification target because we can prove properties about specific patterns without worrying about the infinite grid. The challenge is that the true Game of Life lives on an unbounded plane, which we cannot represent directly. We need a finite approximation that preserves the local dynamics.
The standard solution is a toroidal grid. Imagine taking a rectangular grid and gluing the top edge to the bottom edge, forming a cylinder. Then glue the left edge to the right edge, forming a torus. Geometrically, this is the surface of a donut. A cell at the right edge has its eastern neighbor on the left edge. A cell at the top has its northern neighbor at the bottom. Every cell has exactly eight neighbors, with no special boundary cases.
This topology matters for verification. On a bounded grid with walls, edge cells would have fewer neighbors, changing their evolution rules. We would need separate logic for corners, edges, and interior cells. The toroidal topology eliminates this complexity: the neighbor-counting function is uniform across all cells. More importantly, patterns that fit within the grid and do not interact with their wrapped-around selves behave exactly as they would on the infinite plane. A 5x5 blinker on a 10x10 torus evolves identically to a blinker on the infinite grid, because the pattern never grows large enough to meet itself coming around the other side.
abbrev Grid := Array (Array Bool)
def Grid.mk (n m : Nat) (f : Fin n → Fin m → Bool) : Grid :=
Array.ofFn fun i => Array.ofFn fun j => f i j
def Grid.get (g : Grid) (i j : Nat) : Bool :=
if h₁ : i < g.size then
let row := g[i]
if h₂ : j < row.size then row[j] else false
else false
def Grid.dead (n m : Nat) : Grid :=
Array.replicate n (Array.replicate m false)
def Grid.rows (g : Grid) : Nat := g.size
def Grid.cols (g : Grid) : Nat := if h : 0 < g.size then g[0].size else 0
The grid representation uses arrays of arrays, with accessor functions that handle boundary conditions. The countNeighbors function implements toroidal wrapping by computing indices modulo the grid dimensions.
def Grid.countNeighbors (g : Grid) (i j : Nat) : Nat :=
let n := g.rows
let m := g.cols
let deltas : List (Int × Int) :=
[(-1, -1), (-1, 0), (-1, 1),
(0, -1), (0, 1),
(1, -1), (1, 0), (1, 1)]
deltas.foldl (fun acc (di, dj) =>
let ni := (((i : Int) + di + n) % n).toNat
let nj := (((j : Int) + dj + m) % m).toNat
if g.get ni nj then acc + 1 else acc) 0
The step function applies Conway’s rules to every cell. The pattern matching encodes the survival conditions directly: a live cell survives with 2 or 3 neighbors, a dead cell is born with exactly 3 neighbors.
def Grid.step (g : Grid) : Grid :=
let n := g.rows
let m := g.cols
Array.ofFn fun (i : Fin n) => Array.ofFn fun (j : Fin m) =>
let neighbors := g.countNeighbors i.val j.val
let alive := g.get i.val j.val
match alive, neighbors with
| true, 2 => true
| true, 3 => true
| false, 3 => true
| _, _ => false
def Grid.stepN (g : Grid) : Nat → Grid
| 0 => g
| k + 1 => (g.step).stepN k
Now for the fun part. We can define famous patterns and prove properties about them.
The blinker is a period-2 oscillator: three cells in a row that flip between horizontal and vertical orientations, then back again.
The block is a 2x2 square that never changes. Each live cell has exactly three neighbors, so all survive. No dead cell has exactly three live neighbors, so none are born.
The glider is the star of our show. It is a spaceship: a pattern that translates across the grid. After four generations, the glider has moved one cell diagonally.
After generation 4, the pattern is identical to generation 0, but shifted one cell down and one cell right. The glider crawls across the grid forever.
-- John Conway (1937-2020) invented this cellular automaton in 1970.
def blinker : Grid := Grid.mk 5 5 fun i j =>
(i.val, j.val) ∈ [(1, 2), (2, 2), (3, 2)]
def blinkerPhase2 : Grid := Grid.mk 5 5 fun i j =>
(i.val, j.val) ∈ [(2, 1), (2, 2), (2, 3)]
def glider : Grid := Grid.mk 6 6 fun i j =>
(i.val, j.val) ∈ [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
def gliderTranslated : Grid := Grid.mk 6 6 fun i j =>
(i.val, j.val) ∈ [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
def block : Grid := Grid.mk 4 4 fun i j =>
(i.val, j.val) ∈ [(1, 1), (1, 2), (2, 1), (2, 2)]
Here is where theorem proving earns its keep. We can prove that the blinker oscillates with period 2, that the block is stable, and that the glider translates after exactly four generations.
theorem blinker_oscillates : blinker.step = blinkerPhase2 := by native_decide
theorem blinker_period_2 : blinker.stepN 2 = blinker := by native_decide
theorem glider_translates : glider.stepN 4 = gliderTranslated := by native_decide
theorem block_is_stable : block.step = block := by native_decide
The native_decide tactic does exhaustive computation. Lean evaluates the grid evolution and confirms the equality. The proof covers every cell in the grid across the specified number of generations.
We have formally verified that a glider translates diagonally after four steps. Every cellular automaton enthusiast knows this empirically, having watched countless gliders march across their screens. But we have proven it. The glider must translate. It is not a bug that the pattern moves; it is a theorem. (Readers of Greg Egan’s Permutation City may appreciate that we are now proving theorems about the computational substrate in which his characters would live.)
We can also verify that the blinker conserves population, and observe that the glider does too:
def Grid.population (g : Grid) : Nat :=
g.foldl (fun acc row => row.foldl (fun acc cell => if cell then acc + 1 else acc) acc) 0
#eval blinker.population
#eval blinker.step.population
#eval glider.population
#eval glider.step.population
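Because population is computable, the conservation claims for these specific patterns can also be stated as machine-checked theorems (illustrative additions, following the same native_decide pattern as above):
-- Illustrative: population conservation for these specific patterns
theorem blinker_conserves_population : blinker.step.population = blinker.population := by
  native_decide
theorem glider_conserves_population : (glider.stepN 4).population = glider.population := by
  native_decide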
For visualization, we can print the grids:
def Grid.toString (g : Grid) : String :=
String.intercalate "\n" <|
g.toList.map fun row =>
String.mk <| row.toList.map fun cell =>
if cell then '#' else '.'
#eval IO.println blinker.toString
#eval IO.println blinker.step.toString
#eval IO.println glider.toString
#eval IO.println (glider.stepN 4).toString
The Gap Made Concrete
Here is the sobering reality. We have a beautiful proof that gliders translate. The Lean model captures Conway’s rules precisely. The theorems are watertight. And yet, if someone writes a Game of Life implementation in Rust, our proofs say nothing about it.
The Rust implementation in examples/game-of-life/ implements the same rules. It has the same step function, the same neighbor counting, the same pattern definitions. Run it and you will see blinkers blink and gliders glide. But the Lean proofs do not transfer automatically. The Rust code might have off-by-one errors in the wrap-around logic. It might use different integer semantics. It might have subtle bugs in edge cases that our finite grid proofs never exercise.
This is the central problem of software verification. Writing proofs about mathematical models is satisfying but insufficient. Real software runs on real hardware with real bugs. The gap matters most where the stakes are highest: matching engines that execute trades, auction mechanisms that allocate resources, systems where a subtle bug can cascade into market-wide failures.
How do we bridge the gap between a verified model and a production implementation?
Verification-Guided Development
The answer comes from verification-guided development. The approach has three components. First, write the production implementation in your target language. Second, transcribe the core logic into Lean as a pure functional program. Third, prove properties about the Lean model; the proofs transfer to the production code because the transcription is exact. This technique was developed by AWS for their Cedar policy language, and it applies wherever a functional core can be isolated from imperative scaffolding.
The transcription must be faithful. Every control flow decision in the Rust code must have a corresponding decision in the Lean model. Loops become recursion. Mutable state becomes accumulator parameters. Early returns become validity flags. When the transcription is exact, we can claim that the Lean proofs apply to the Rust implementation.
To verify this correspondence, both systems produce execution traces. A trace records the state after each operation. If the Rust implementation and the Lean model produce identical traces on all inputs, the proof transfers. For finite input spaces, we can verify this exhaustively. For infinite spaces, we can sometimes prove that bounded testing implies unbounded correctness, as we will see with the circuit breaker’s uniformity theorem.
Bounded Model Checking
Many real systems require state machines with complex transition rules: network protocols, payment processing, order lifecycles, and resilience patterns. How do we connect a verified Lean model to a production Rust implementation with strong guarantees?
The circuit breaker pattern prevents cascading failures in distributed systems. When a service starts failing, the circuit breaker “trips open” to block requests, giving the service time to recover. After a timeout, it allows a test request through. If the test succeeds, the circuit closes and normal operation resumes. If the test fails, the circuit stays open.
The key insight is that each state carries different data. A closed breaker tracks failure count. An open breaker tracks when it opened (for timeout calculation). A half-open breaker needs no extra data.
structure Config where
threshold : Nat
timeout : Nat
deriving DecidableEq, Repr
inductive State where
| closed (failures : Nat) : State
| opened (openedAt : Nat) : State
| halfOpen : State
deriving DecidableEq, Repr
Events trigger transitions between states:
inductive Event where
| success : Event
| failure (time : Nat) : Event
| tick (time : Nat) : Event
| probeSuccess (time : Nat) : Event
| probeFailure (time : Nat) : Event
deriving DecidableEq, Repr
The Step Function
The entire verification approach centers on one function: step. This single function defines all circuit breaker behavior. Both Lean proofs and Rust verification target this exact definition.
def step (cfg : Config) (s : State) (e : Event) : State :=
match s, e with
| .closed _, .success => .closed 0
| .closed failures, .failure time =>
if failures + 1 >= cfg.threshold then .opened time
else .closed (failures + 1)
| .opened openedAt, .tick time =>
if time - openedAt >= cfg.timeout then .halfOpen
else .opened openedAt
| .halfOpen, .probeSuccess _ => .closed 0
| .halfOpen, .probeFailure time => .opened time
| s, _ => s
This is the source of truth. Every property we prove, every test we run, every guarantee we claim flows from this definition. The function is pure, total, and deterministic.
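Because step is pure and total, replaying a scenario is just a fold over an event list. The helper below is an illustrative addition, not part of the article's verification pipeline:
-- Illustrative: replay a whole scenario through the verified step function
def runEvents (cfg : Config) (s : State) : List Event → State
  | [] => s
  | e :: es => runEvents cfg (step cfg s e) es
-- threshold 3, timeout 5: three failures trip the breaker open,
-- a tick past the timeout moves it to halfOpen, and a successful probe closes it
#eval runEvents ⟨3, 5⟩ (.closed 0) [.failure 1, .failure 2, .failure 3, .tick 9, .probeSuccess 10]
-- closed with 0 failures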
Proving Invariants
The state invariant says that closed circuits never accumulate failures beyond the threshold. Once failures reach the threshold, the circuit must trip open.
def Invariant (cfg : Config) : State → Prop
| .closed failures => failures < cfg.threshold
| .opened _ => True
| .halfOpen => True
theorem initial_invariant (cfg : Config) (h : cfg.threshold > 0) :
Invariant cfg initial := h
theorem step_preserves_invariant (cfg : Config) (s : State) (e : Event)
(hinv : Invariant cfg s) (hpos : cfg.threshold > 0) :
Invariant cfg (step cfg s e) := by
cases s with
| closed failures =>
cases e with
| success => simp [step, Invariant, hpos]
| failure time =>
simp only [step]
split
· simp [Invariant]
· simp [Invariant]; omega
| tick _ => simp [step, Invariant]; exact hinv
| probeSuccess _ => simp [step, Invariant]; exact hinv
| probeFailure _ => simp [step, Invariant]; exact hinv
| opened openedAt =>
cases e with
| tick time =>
simp only [step]
split <;> simp [Invariant]
| _ => simp [step, Invariant]
| halfOpen =>
cases e with
| probeSuccess _ => simp [step, Invariant, hpos]
| probeFailure _ => simp [step, Invariant]
| _ => simp [step, Invariant]
We prove specific transition properties too. Success resets failures. Reaching the threshold trips the circuit. The timeout transitions to half-open. These theorems are definitionally true, following directly from the structure of step:
theorem success_resets (cfg : Config) (f : Nat) :
step cfg (.closed f) .success = .closed 0 := rfl
theorem threshold_trips (cfg : Config) (f t : Nat) (h : f + 1 >= cfg.threshold) :
step cfg (.closed f) (.failure t) = .opened t := by simp [step, h]
theorem below_threshold_increments (cfg : Config) (f t : Nat) (h : f + 1 < cfg.threshold) :
step cfg (.closed f) (.failure t) = .closed (f + 1) := by
simp [step]; omega
theorem timeout_transitions (cfg : Config) (o t : Nat) (h : t - o >= cfg.timeout) :
step cfg (.opened o) (.tick t) = .halfOpen := by simp [step, h]
theorem probe_success_closes (cfg : Config) (t : Nat) :
step cfg .halfOpen (.probeSuccess t) = .closed 0 := rfl
theorem probe_failure_reopens (cfg : Config) (t : Nat) :
step cfg .halfOpen (.probeFailure t) = .opened t := rfl
Predicate-Determined State Machines
Before presenting the main theorem, we need to understand why bounded testing can work at all for this system. The answer lies in a structural property: the circuit breaker is predicate-determined.
Look carefully at the step function. It makes exactly two comparisons: failures + 1 >= threshold (should the circuit trip?) and time - openedAt >= timeout (has the timeout elapsed?). Everything else is pattern matching on constructors. The function does not compute with the numeric values beyond these two boolean tests. It does not add timeout to threshold. It does not multiply failure counts. It does not branch on whether a timestamp is even or odd. The values flow through the function, but only these two predicates determine the control flow.
Contrast this with a function that lacks this structure:
step(count, Increment) = count + 1
Here the output depends on the magnitude of count, not just on a comparison. Testing with count=0, 1, 2, 3 tells us nothing about count=1000000. The function performs arithmetic that directly affects the output, creating infinitely many distinct behaviors.
The circuit breaker avoids this trap. When it stores failures + 1 in the new state, that value flows through unchanged until the next comparison. The function never computes failures * 2 or threshold - failures. Values are compared and stored, never combined arithmetically.
This structure has a profound consequence: if two inputs produce the same boolean comparison results, they must produce the same output constructor. With threshold=3 and failures=2, the comparison failures + 1 >= threshold yields true. With threshold=1000000 and failures=999999, the same comparison also yields true. Both inputs take the same branch. Both produce an Open state. The actual magnitudes do not matter; only the boolean outcomes do.
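Two evaluations make the point concrete (illustrative, assuming the definitions above):
-- Illustrative: same comparison outcome, same output constructor
#eval step ⟨3, 5⟩ (.closed 2) (.failure 7) -- opened at time 7
#eval step ⟨1000000, 5⟩ (.closed 999999) (.failure 7) -- opened at time 7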
The Uniformity Theorem
The predicate-determined structure enables a remarkable theorem. We formalize the observation above as the uniformity theorem. In equational form:
\[ \text{kind}(s_1) = \text{kind}(s_2) \land \text{kind}(e_1) = \text{kind}(e_2) \land \text{cmp}(s_1, e_1) = \text{cmp}(s_2, e_2) \]
\[ \implies \text{kind}(\text{step}(s_1, e_1)) = \text{kind}(\text{step}(s_2, e_2)) \]
where \(\text{kind}\) extracts the constructor (Closed, Open, or HalfOpen) and \(\text{cmp}\) extracts the boolean comparison results. The theorem says: inputs that agree on structure and comparisons produce outputs that agree on structure.
Put simply: the function does not do math with the numbers, it just asks “is this bigger than that?” Once you have tested both “yes” and “no” for each question, you have tested everything.
Proof sketch: The proof proceeds in three steps. First, we case-split on the state constructors. If the two states have different constructors (say, one is Closed and one is Open), the hypothesis sameStateKind s₁ s₂ = true is false, giving an immediate contradiction. This eliminates all off-diagonal cases. Second, for each diagonal case (both Closed, both Open, or both HalfOpen), we case-split on event constructors. Again, mismatched events contradict sameEventKind. Third, we are left with only the cases where step actually branches: (Closed, Failure) which checks the threshold, and (Open, Tick) which checks the timeout. For these, we case-split on whether each comparison is true or false. The hypothesis hsame_cmp says the comparisons have the same boolean result, so if they disagree we have a contradiction. If they agree, both calls to step take the same branch and produce outputs with the same constructor.
def sameStateKind : State → State → Bool
| .closed _, .closed _ => true
| .opened _, .opened _ => true
| .halfOpen, .halfOpen => true
| _, _ => false
def sameEventKind : Event → Event → Bool
| .success, .success => true
| .failure _, .failure _ => true
| .tick _, .tick _ => true
| .probeSuccess _, .probeSuccess _ => true
| .probeFailure _, .probeFailure _ => true
| _, _ => false
structure ComparisonResults where
thresholdReached : Bool
timeoutElapsed : Bool
deriving DecidableEq, Repr
def getComparisons (cfg : Config) (s : State) (e : Event) : ComparisonResults :=
match s, e with
| .closed f, .failure _ => ⟨f + 1 >= cfg.threshold, false⟩
| .opened o, .tick t => ⟨false, t - o >= cfg.timeout⟩
| _, _ => ⟨false, false⟩
theorem uniformity (cfg₁ cfg₂ : Config) (s₁ s₂ : State) (e₁ e₂ : Event)
(hsame_state : sameStateKind s₁ s₂)
(hsame_event : sameEventKind e₁ e₂)
(hsame_cmp : getComparisons cfg₁ s₁ e₁ = getComparisons cfg₂ s₂ e₂) :
sameStateKind (step cfg₁ s₁ e₁) (step cfg₂ s₂ e₂) := by
-- Step 1: Case split on state constructors. Off-diagonal cases (e.g., closed vs opened)
-- are contradictions since hsame_state requires matching constructors.
cases s₁ <;> cases s₂ <;> simp_all [sameStateKind]
-- Step 2: For each diagonal case (closed/closed, opened/opened, halfOpen/halfOpen),
-- split on event constructors. Again, off-diagonal cases contradict hsame_event.
all_goals cases e₁ <;> cases e₂ <;> simp_all [sameEventKind, step, getComparisons]
-- Step 3: Two cases remain where step branches on comparisons:
-- (closed, failure) branches on threshold check
case closed.closed.failure.failure f₁ f₂ t₁ t₂ =>
-- hsame_cmp says both threshold comparisons have the same boolean result.
-- Case split on whether each threshold is reached.
by_cases h₁ : f₁ + 1 >= cfg₁.threshold <;>
by_cases h₂ : f₂ + 1 >= cfg₂.threshold <;>
-- When h₁ and h₂ disagree, hsame_cmp gives a contradiction.
-- When they agree, both step to the same constructor.
simp only [h₂, ↓reduceIte] <;> simp_all
-- (opened, tick) branches on timeout check
case opened.opened.tick.tick o₁ o₂ t₁ t₂ =>
-- Same logic: hsame_cmp forces timeout comparisons to agree.
by_cases h₁ : t₁ - o₁ >= cfg₁.timeout <;>
by_cases h₂ : t₂ - o₂ >= cfg₂.timeout <;>
simp only [h₂, ↓reduceIte] <;> simp_all
The theorem states: if two inputs have the same state kind (both Closed, both Open, or both HalfOpen), the same event kind, and the same comparison results, then the outputs have the same state kind. The proof proceeds by exhaustive case analysis on state and event constructors, then shows that matching comparison results force matching output constructors.
Bounded Verification
The comparison outcomes partition the infinite input space into equivalence classes. All inputs where failures + 1 >= threshold is true behave identically (modulo the specific values stored). All inputs where it is false behave identically. Since there are only two comparisons, each boolean, there are at most four equivalence classes per (state kind, event kind) pair.
To verify the implementation for all inputs, we only need to test representatives from each equivalence class. A threshold of 3 with 2 failures represents all cases where the threshold is reached. A threshold of 3 with 0 failures represents all cases where it is not. Testing both covers the infinite space of threshold/failure combinations.
The uniformity theorem provides a mathematical proof that the equivalence classes are complete, eliminating sampling and heuristics. If an implementation passes tests covering all equivalence classes, it is correct for all inputs. Bounded testing with small values that hit both true and false for each comparison proves correctness for all values.
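Concretely, two small representatives already cover both classes for the (closed, failure) branch, and the comparison record is all the step function can observe (illustrative evaluations):
-- Illustrative: representatives of the two (closed, failure) equivalence classes
#eval getComparisons ⟨3, 5⟩ (.closed 2) (.failure 7) -- thresholdReached := true
#eval getComparisons ⟨3, 5⟩ (.closed 0) (.failure 7) -- thresholdReached := false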
Where Bounded Model Checking Applies
Many real-world state machines share this predicate-determined structure. Protocol state machines like TCP transition based on flags and sequence number comparisons, not on packet payload arithmetic; a SYN-RECEIVED state becomes ESTABLISHED when ACK is set, regardless of sequence number magnitudes. Business rule engines for order lifecycles (pending, confirmed, shipped, delivered) transition on event types and threshold comparisons like “payment received” or “inventory available,” not on order total arithmetic. Access control systems depend on role membership and policy predicates, not on computing with user IDs. Rate limiters using token buckets transition on “tokens available >= cost” comparisons where the exact count matters only for that boolean test.
For any such system, bounded model checking can provide complete verification. The recipe is straightforward: identify all comparisons in the transition function, prove (or convince yourself) that behavior depends only on comparison outcomes, generate test cases covering all combinations of comparison outcomes, and verify the implementation against these cases.
Where It Does Not Apply
The uniformity property does not hold for systems where output depends on arithmetic over unbounded values. Counters and accumulators that sum transaction amounts cannot be verified by bounded testing; the sum of [1, 2, 3] tells us nothing about [1000000, 2000000]. Cryptographic functions like hashes and encryption depend intimately on bit-level arithmetic where small inputs reveal nothing about large ones. Numerical algorithms involving floating-point, matrix operations, or differential equations have behaviors that depend on magnitude, precision, and numerical stability. Recursive depth matters too: a function that changes behavior at depth 1000 cannot be verified by testing to depth 100. Overflow-sensitive code is particularly treacherous; if the implementation uses fixed-width integers that overflow, the Lean model (using mathematical naturals) diverges at the overflow boundary, and bounded testing might miss the case entirely.
The uniformity theorem gives us a criterion: can you factor the transition function into (1) comparisons that produce booleans, and (2) value shuffling that stores results without arithmetic? If yes, bounded model checking works. If no, you need different techniques.
The Deeper Principle
The uniformity theorem exemplifies a broader principle in verification: exploit structure to reduce infinite problems to finite ones.
Note
The key insight: The circuit breaker’s predicate-determined structure lets us collapse an infinite input space into finitely many equivalence classes. This is non-trivial and depends on the specific structure of this problem. Not all state machines admit such a reduction. The uniformity theorem is a precise statement of why this particular system has this property: because step branches only on boolean comparisons, never on arithmetic over values. Systems that compute with their inputs (counters, accumulators, cryptographic functions) do not have this structure and cannot be verified this way.
Other structures enable other reductions:
- Symmetry: If a function treats all elements of a set uniformly, test one representative
- Monotonicity: If a function is monotonic, test boundary cases
- Compositionality: If a function composes smaller functions, verify the pieces
The art of verification is recognizing which structures your system has and exploiting them appropriately. For predicate-determined state machines, bounded model checking provides complete verification, justified by mathematical proof.
Test Generation
The uniformity theorem justifies generating exhaustive test cases within bounds. We enumerate all states, events, and configurations:
structure Bounds where
maxThreshold : Nat := 4
maxTimeout : Nat := 10
maxTime : Nat := 20
structure TestCase where
threshold : Nat
timeout : Nat
state : State
event : Event
expected : State
deriving Repr
This generates
\[ \sum_{t=1}^{4} 10 \times (t + 22) \times 85 = 83{,}300 \]
test cases: for each threshold \(t\), we have 10 timeouts, \(t + 22\) states (\(t\) closed states plus 21 open states plus half-open), and 85 events. Each test case records the expected output state computed by Lean’s step function. These cases are exported to JSON for Rust consumption.
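The counts in the formula come from a straightforward enumeration. The sketch below is a hypothetical version of that enumerator; the repository's actual generator also pairs each case with its expected output and serializes to JSON, which is omitted here:
-- Hypothetical enumerators matching the counts in the formula above
def allEvents (maxTime : Nat) : List Event :=
  [.success]
    ++ (List.range (maxTime + 1)).map (fun t => Event.failure t)
    ++ (List.range (maxTime + 1)).map (fun t => Event.tick t)
    ++ (List.range (maxTime + 1)).map (fun t => Event.probeSuccess t)
    ++ (List.range (maxTime + 1)).map (fun t => Event.probeFailure t)
def allStates (threshold maxTime : Nat) : List State :=
  (List.range threshold).map (fun f => State.closed f)
    ++ (List.range (maxTime + 1)).map (fun o => State.opened o)
    ++ [State.halfOpen]
#eval (allEvents 20).length -- 85
#eval (allStates 3 20).length -- 25, that is, t + 22 with t = 3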
The Rust Implementation
The Rust step function must exactly match Lean’s semantics. This is the verified core:
#[allow(clippy::match_same_arms)]
pub fn step(threshold: u64, timeout: u64, state: State, event: &Event) -> State {
match (state, event) {
(State::Closed(_), Event::Success) => State::Closed(0),
(State::Closed(failures), Event::Failure(time)) => {
if failures + 1 >= threshold {
State::Open(*time)
} else {
State::Closed(failures + 1)
}
}
(State::Open(opened_at), Event::Tick(time)) => {
if time.saturating_sub(opened_at) >= timeout {
State::HalfOpen
} else {
State::Open(opened_at)
}
}
(State::HalfOpen, Event::ProbeSuccess(_)) => State::Closed(0),
(State::HalfOpen, Event::ProbeFailure(time)) => State::Open(*time),
(s, _) => s,
}
}
Note the use of saturating_sub for the timeout check. Lean’s natural number subtraction is saturating (returns 0 for negative results), so Rust must use the same semantics to match.
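The semantics being matched is visible directly in Lean:
-- Nat subtraction truncates at zero, matching Rust's saturating_sub
#eval (3 - 5 : Nat) -- 0
#eval (5 - 3 : Nat) -- 2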
The Typestate API
The Rust typestate pattern provides an ergonomic API with compile-time state transition safety. The key insight is that every method calls the verified step function internally:
pub fn record_failure(self, now: u64) -> Result<Self, CircuitBreaker<Open>> {
let new_state = step(
self.threshold,
self.timeout,
self.state,
&Event::Failure(now),
);
match new_state {
State::Closed(_) => Ok(Self {
state: new_state,
..self
}),
State::Open(_) => Err(CircuitBreaker {
threshold: self.threshold,
timeout: self.timeout,
state: new_state,
_marker: PhantomData,
}),
State::HalfOpen => unreachable!(),
}
}
Invalid transitions are compile errors. You cannot call record_failure on a CircuitBreaker<Open>. You cannot call check_timeout on a CircuitBreaker<Closed>. The type system enforces the state machine protocol at compile time.
Exhaustive Testing
The Rust test loads all 83,300 test cases and verifies exact correspondence:
#[test]
fn exhaustive_lean_equivalence() {
let compressed = include_bytes!(concat!(
env!("CARGO_MANIFEST_DIR"),
"/testdata/exhaustive_tests.json.gz"
));
let mut decoder = GzDecoder::new(&compressed[..]);
let mut json = String::new();
decoder.read_to_string(&mut json).expect("valid gzip");
let cases: Vec<ExhaustiveTestCase> =
serde_json::from_str(&json).expect("valid exhaustive test JSON");
for case in &cases {
let actual = step(case.threshold, case.timeout, case.state, &case.event);
assert_eq!(
actual, case.expected,
"threshold={}, timeout={}, state={:?}, event={:?}",
case.threshold, case.timeout, case.state, case.event
);
}
}
The test performs exhaustive verification within bounds, covering every combination of (threshold 1-4, timeout 1-10, state, event). The uniformity theorem guarantees that if all bounded cases pass, the unbounded implementation is correct. The full Rust source is available on GitHub.
Where Trust Lives
The verification pipeline has three stages, and each introduces its own risks. Understanding where trust lies is essential to assessing the strength of the overall guarantee.
Model and Transcription Risk
The Lean model must faithfully capture the intent of the Rust implementation. Unlike systems like CompCert or Coq’s extraction mechanism, there is no automatic verified extraction from Lean to Rust. The correspondence relies on manual transcription. If the programmer makes a mistake in the transcription, a correct Lean proof says nothing about the incorrect Rust code.
The typestate API adds another layer. The ergonomic wrapper around the verified step function is verified only through unit tests, not exhaustive model checking. A bug in how the wrapper invokes step would compromise the guarantee.
Execution Equivalence Risk
Rust and Lean have different runtime semantics. Rust’s saturating_sub matches Lean’s natural number subtraction, but this correspondence is verified by testing, not by formal proof. A different integer type or subtraction operation could break the equivalence silently.
Integer overflow is particularly treacherous. Lean uses unbounded natural numbers; Rust uses fixed-width integers. If the implementation overflows where the model does not, bounded testing might miss the divergence entirely. The circuit breaker avoids this by keeping all values small, but the risk remains for systems with larger numeric ranges.
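A two-line illustration of the asymmetry (UInt8 stands in here for Rust's fixed-width types; the circuit breaker itself never approaches these bounds):
#eval 2 ^ 64             -- 18446744073709551616: Nat never overflows
#eval (255 : UInt8) + 1  -- 0: fixed-width arithmetic wraps silently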
Testing Infrastructure Risk
The verification pipeline includes components that must simply be trusted: the JSON serialization layer that exports test cases from Lean, the serde deserialization that reads them in Rust, and the file I/O that moves data between systems. A bug in any of these components could cause false positives, reporting that tests pass when the implementations actually diverge.
Defense in Depth
Despite these risks, the approach provides strong guarantees through layered defenses. The Lean model is provably correct: invariant preservation and the uniformity theorem are machine-checked proofs. The Rust step function is verified against 83,300 exhaustive test cases. The typestate API prevents invalid transitions at compile time. No single layer is impenetrable, but an attacker (or a bug) would need to defeat multiple independent mechanisms to produce an incorrect result.
The conjunction of all guarantees is captured in a single metatheorem:
theorem correctness (cfg : Config) (hpos : cfg.threshold > 0) :
(Invariant cfg initial) ∧
(∀ s e, Invariant cfg s → Invariant cfg (step cfg s e)) ∧
(∀ cfg₂ s₁ s₂ e₁ e₂,
sameStateKind s₁ s₂ → sameEventKind e₁ e₂ →
getComparisons cfg s₁ e₁ = getComparisons cfg₂ s₂ e₂ →
sameStateKind (step cfg s₁ e₁) (step cfg₂ s₂ e₂)) :=
⟨initial_invariant cfg hpos,
fun s e hinv => step_preserves_invariant cfg s e hinv hpos,
fun _ _ _ _ _ hs he hc => uniformity cfg _ _ _ _ _ hs he hc⟩
This theorem is the “golden assertion” of the circuit breaker: the initial state is valid, every transition preserves validity, and behavior depends only on comparison outcomes. If this theorem compiles, the model is correct.
Closing Thoughts
Why do we prove properties rather than test for them? Rice's theorem, from his 1953 paper Classes of Recursively Enumerable Sets and Their Decision Problems, provides the fundamental answer: every non-trivial semantic property of programs is undecidable. You cannot write a program that decides whether other programs halt, are correct, never access null, or satisfy any interesting behavioral property. The proof is by reduction from the halting problem. Verification escapes this limitation by requiring human-provided proofs that the compiler can check, rather than trying to infer properties automatically.
The examples in this series form a hierarchy of verification strength, from weakest to strongest:
- Game of Life: native_decide exhaustively checks specific finite patterns (gliders glide, blinkers blink), but the guarantees cover only those patterns and only the Lean model.
- Proof-carrying parsers: Soundness by construction within Lean, with evidence built alongside computation, though again confined to the Lean model.
- Intrinsically-typed interpreter: Ill-typed programs are unrepresentable, a structural guarantee that eliminates entire classes of bugs but only within Lean’s type system.
- Verified compiler: Semantic preservation universally over all expressions; compiled code produces the same result as interpretation. A stronger claim that quantifies over infinite inputs but remains Lean-only.
- Stack machine: Universal theorems (composition, commutativity, effect additivity) quantify over infinite program spaces with no external transfer.
- Circuit breaker: The uniformity theorem mathematically justifies that bounded testing covers unbounded inputs, enabling Lean proofs to transfer to a Rust implementation via exhaustive model checking. Only this example bridges the verification gap to production code.
Each example illustrates a different verification technique. The Game of Life and verified compiler use native_decide for exhaustive finite computation: Lean evaluates both sides and confirms equality, proof by brute force rather than insight. The stack machine uses structural induction to prove universal properties over infinite program spaces. The circuit breaker combines both: structural induction proves the uniformity theorem, which then justifies exhaustive finite testing as a complete verification technique.
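A minimal side-by-side sketch of the two styles, using standalone examples rather than the chapters' own code:
-- Brute force: the proposition is closed and decidable, so just evaluate it
example : (List.range 100).foldl (· + ·) 0 = 4950 := by native_decide
-- Insight: structural induction proves a claim about every natural number
example (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]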
The circuit breaker also demonstrates verification-guided development: we do not verify the Rust code directly. Rust’s ownership system, borrow checker, and imperative features make direct verification impractical. Instead, we carve out the functional core, transcribe it to Lean, prove properties there, and transfer the proofs back through exhaustive testing. The verification gap closes through disciplined transcription and bounded model checking justified by mathematical proof.
The techniques scale far beyond toy examples. Financial systems are a particularly compelling domain: matching engines, order books, and clearing systems where bugs can trigger flash crashes or expose participants to unbounded losses. Trading systems are state machines at heart, and the state machines that move money tend to be predicate-determined in exactly the way that makes bounded model checking viable. The theorems exist in papers, and the implementations exist in production. Verification-guided development bridges them.
Artificial Intelligence
In 2024, a computer solved one of the hardest problems at the International Mathematical Olympiad with a formally verified proof. In 2025, another hit gold-medal standard. The proofs were checked down to the axioms. No trust required. The interesting part is not that machines beat humans at competition math. The interesting part is that reason is becoming executable at scale, and that will change the world. How quickly is an open question.
The Current State
Mathlib now contains 1.9 million lines of formalized mathematics spanning algebra, analysis, topology, and number theory. It grows by thousands of theorems monthly. No single person understands all of it, and no single person needs to. The theorem you formalize today may be imported by a researcher fifty years from now working on problems we cannot imagine. The proof will still check. Meanwhile, neural networks have learned to propose proof steps that formal systems verify. The model guesses, the kernel checks. DeepSeek-Prover and LeanDojo make this practical today. PhysLean is formalizing physics itself: Maxwell’s equations, quantum mechanics, field theory. The tooling has matured faster than most expected.
We should be honest about limits. Higher-order logic is undecidable. Church and Turing settled this in 1936. Formalization is expensive: the Polynomial Freiman-Ruzsa conjecture required 20,000 lines of Lean for a 50-page paper. Some domains resist entirely. Physics says “for large N” and expects you to understand. But within scope, something remarkable becomes possible: certainty. Not high confidence. Certainty. The proof typechecks or it does not.
The Stakes
This kind of work gets funded because it pushes the frontiers of human knowledge. Science foundations fund theorem provers because they see infrastructure for the future of mathematics. Trading firms fund them because they need systems that actually work. Both are right. Knight Capital lost $440 million in 45 minutes from a deployment bug. The code did exactly what it was written to do. It was simply written wrong. Formal methods address both failure modes: they verify that the code matches the specification, and they force you to make the specification precise enough to check in the first place. You cannot prove a theorem you do not understand. For firms whose existence depends on algorithm correctness, this discipline can be existential.
The Prover-Verifier Architecture
The breakthrough behind systems like DeepSeek-Prover-V2 is architectural. The key insight: wrap a neural network in a recursive loop where a formal system acts as judge.
The pipeline works as follows. A large model decomposes a theorem into lemmas. A smaller specialized model attempts to prove each lemma in Lean. The Lean compiler checks the proof. If it fails, the model retries with the error message as feedback. If it passes, the lemma is proven with certainty. The system synthesizes proven lemmas into a complete proof.
This architecture bounds the hallucination problem for formal domains. In standard RLHF, humans grade answers, which is noisy and expensive. In prover-verifier loops, the compiler provides a binary signal: the proof typechecks or it does not. This enables pure reinforcement learning. DeepSeek-R1-Zero learned to reason without any supervised fine-tuning, developing self-correction behaviors purely from trial and error against a verifier. The model discovered “aha moments” on its own.
The synthetic data flywheel accelerates this. DeepSeek generated millions of formal statements, verified them with Lean, and fed the correct proofs back into training. No human annotation required. The loop is self-reinforcing: better provers generate more training data, which trains better provers.
The search strategy matters. DeepSeek-Prover-V2 uses Monte Carlo Tree Search to explore proof paths, backtracking when a path fails. This is the same algorithmic family as AlphaGo, applied to theorem proving. Instead of evaluating board positions, the model evaluates partial proof states.
The broader hypothesis is inference-time scaling: rather than making models bigger (training-time compute), make them think longer (inference-time compute). Early results suggest that letting models reason for more tokens improves accuracy, at least on certain benchmarks. But the upper bound on this improvement remains an open question. Whether inference-time scaling continues to yield gains, or hits diminishing returns at some threshold, is something the next generation of models will determine empirically. Test-time search, parallel rollouts, and recursive self-correction all bet on this dynamic. The bet may pay off. It may not.
Open Problem: Honesty as Strategy
William Vickrey won the 1996 Nobel Prize in Economics for a deceptively simple idea. In an auction where the highest bidder wins but pays only the second-highest bid, lying about your value cannot help you. Truthful bidding is weakly dominant. It is a theorem, provable from first principles.
abbrev Bid := Nat
abbrev Value := Nat
/-- A sealed-bid auction between two bidders -/
structure Auction where
bid1 : Bid
bid2 : Bid
deriving DecidableEq, Repr
/-- Higher bid wins; ties favor bidder 1 -/
def winner (a : Auction) : Fin 2 :=
if a.bid1 ≥ a.bid2 then 0 else 1
/-- Second-price rule: winner pays the losing bid -/
def payment (a : Auction) : Nat :=
if a.bid1 ≥ a.bid2 then a.bid2 else a.bid1
The payoff structure captures the essence: you win if you outbid, but you pay what your opponent bid, not what you bid. Your bid determines whether you win. It does not determine what you pay.
/-- Payoff: value minus payment when winning, zero when losing -/
def payoff (value : Value) (a : Auction) : Int :=
if winner a = 0 then (value : Int) - (payment a : Int) else 0
/-- Bid truthfully: declare your actual value -/
def truthful (value : Value) (otherBid : Bid) : Auction :=
⟨value, otherBid⟩
/-- Bid strategically: declare something else -/
def strategic (altBid : Bid) (otherBid : Bid) : Auction :=
⟨altBid, otherBid⟩
We can already prove that honesty never loses money:
/-- Truthful bidding never loses money -/
theorem truthful_nonneg (value : Value) (otherBid : Bid) :
payoff value (truthful value otherBid) ≥ 0 := by
unfold payoff truthful winner payment
by_cases h : value ≥ otherBid
· simp [h, Int.ofNat_le.mpr h]
· simp [h]
The deeper theorem is that honesty is optimal. No strategic deviation improves your expected outcome:
/--
The weak dominance theorem: no strategic bid beats truthful bidding.
For any valuation, any alternative bid, and any opponent behavior,
telling the truth does at least as well as any lie.
-/
theorem weak_dominance (value : Value) (altBid : Bid) (otherBid : Bid) :
payoff value (truthful value otherBid) ≥
payoff value (strategic altBid otherBid) := by
sorry
Fill in the sorry. The proof is case analysis. Overbidding makes you win auctions you should lose, paying more than the item is worth. Underbidding makes you lose auctions you should win, missing profitable trades. Truthful bidding threads the needle: you win exactly when winning is profitable.
This two-bidder result is a toy, but the insight scales. Combinatorial auctions let participants bid on bundles of assets, expressing preferences like “I want A and B together, or neither.” The optimization becomes NP-hard, but the incentive properties generalize. The VCG mechanism extends Vickrey’s insight to arbitrary allocation problems. Markets that allocate spectrum, landing slots, and financial instruments all descend from these ideas.
OneChronos builds this infrastructure for financial markets. We run combinatorial auctions that match complex orders across multiple securities simultaneously. These kinds of theorems and verification matter because they guarantee properties that no amount of testing could verify: incentive compatibility, efficiency under stated assumptions, bounds on strategic manipulation. These are hard problems at the intersection of optimization, game theory, and formal methods. If that sounds interesting, we are hiring.
Modern Reasoning Models
Frontier models have become increasingly capable at writing Lean. As of December 2025, Gemini 3.5 Pro and Claude Opus 4.5 represent the state of the art for interactive theorem proving. Google reportedly has internal models that perform even better. Six months ago these models struggled with basic tactics; now they can complete non-trivial proofs with guidance. They are not yet autonomous mathematicians, but they are useful collaborators today.
The key to effective AI-assisted theorem proving is giving models access to the proof state. Without it, they generate tactics blind and hallucinate lemma names. With it, they can read the goal, search for relevant theorems, and build proofs interactively. The Model Context Protocol standardizes this interaction, letting AI assistants query external tools through a common interface.
The ML Infrastructure Stack
Progress in neural theorem proving is measured against standardized benchmarks. MiniF2F contains 488 Olympiad-level problems (IMO, AIME, AMC) formalized across multiple proof assistants; state-of-the-art models now exceed 88% on the test set. PutnamBench offers 1724 problems from the Putnam competition, substantially harder, where even the best systems solve under 10%. ProofNet covers 371 undergraduate textbook exercises. Models are evaluated using Pass@N: generate N proof attempts, succeed if any one verifies. Higher N (32, 128, 512) reveals a model’s coverage; Pass@1 measures single-shot accuracy.
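As a rough idealization (added here for intuition, not how the benchmarks report results): if each attempt verifies independently with probability $p$, then $\text{Pass@}N = 1 - (1 - p)^N$, which is why modest single-shot accuracy can still translate into high coverage at $N$ = 128 or 512.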
LeanDojo is the foundation. It wraps Lean 4 in a Python API, turning theorem proving into an RL environment. You send a tactic; it returns success or an error message. This is the bridge between neural networks and formal verification. Every serious research project in this space builds on it.
Lean Copilot brings this into VS Code. It suggests tactics in real-time as you write proofs, using a local or remote LLM. When you accept a suggestion, Lean immediately verifies it. The human provides high-level guidance; the model fills in tedious proof steps. This collaboration is more productive than either working alone.
llmstep is a lightweight alternative, model-agnostic and easy to integrate. It calls an LLM to suggest the next proof step, designed to work with any backend.
You can build a complete prover-verifier loop today without proprietary models. Use Claude Opus 4.5 or GPT-5.2 as the prover, LeanDojo as the environment, and Lean as the verifier. The stack: prompt the model with the goal state, receive a tactic, check it with Lean, feed errors back as context, repeat. This gives you the reasoning power of frontier models with the guarantees of formal verification.
Setting Up Claude Code
Claude Code is Anthropic’s command-line tool for AI-assisted development. To use it with Lean, you need to connect it to Lean’s language server via MCP. First, ensure you have the uv package manager installed. Then, from your Lean project root (after running lake build), register the MCP server:
claude mcp add lean-lsp uvx lean-lsp-mcp
This installs the lean-lsp-mcp server, which exposes Lean’s language server to Claude. The model can now read diagnostics, inspect goal states, query hover documentation, and search Mathlib using Loogle and LeanSearch.
For richer automation, you can install the lean4-skills plugin, which provides structured workflows for common theorem proving tasks:
/plugin marketplace add cameronfreer/lean4-skills
The plugin adds specialized agents for proof optimization, sorry filling, axiom checking, and compiler-guided repair. They encode patterns that experienced Lean users apply manually, saving time on routine tasks and helping beginners discover effective strategies.
Setting Up Cursor and VS Code
For Cursor or VS Code with an MCP-compatible extension, add the server to your MCP settings:
{ "mcpServers": { "lean-lsp": { "command": "uvx", "args": ["lean-lsp-mcp"] } } }
The workflow is similar: the model reads proof states through the language server and proposes tactics based on the current goal. The human reviews, accepts or rejects, and guides the search. This collaboration is more productive than either working alone.
What Comes Next
Markets are mathematical objects. Combinatorial auctions turn resource allocation into constraint satisfaction problems that reduce to weighted set packing: find the best non-overlapping selection from exponentially many candidates. NP-hard in general, but tractable instances exist, and the boundary between hard and easy is where the interesting mathematics lives. Proving properties about these systems requires exactly the kind of formal verification this series has been building toward: that mechanisms are incentive-compatible, that they converge, that they allocate efficiently under stated assumptions. Every improvement in market mechanism design, every formally verified property of an auction protocol, translates into real systems allocating real resources. Better reasoning about markets means systems that allocate capital more efficiently, and efficient allocation is the difference between prosperity and stagnation.
The European universities doing formal methods research, the quant firms in New York and London, the AI labs in China, all contribute to this ecosystem. DeepSeek’s open-source theorem provers emerged from it. The competition is global but the infrastructure is shared. A trading firm in New York open-sources a proof automation library; researchers in Beijing build on it. An AI lab in Hangzhou releases trained models; mathematicians in Paris use them. Private incentive aligns with public good. The tools developed for trading algorithms can verify medical devices. The techniques refined for financial models can prove properties of cryptographic protocols. And as AI infrastructure itself becomes tradeable, as markets emerge for GPU compute (The AI Boom Needs a Market for Compute), data center capacity, and model inference, the same auction theory applies. The resources that train the models are allocated by the mechanisms the models might one day verify.
AI agents will increasingly act in markets: trading, lending, allocating, optimizing. This is already happening today. The question is not whether but how. An AI agent can be constrained by rules, but only if those rules are precise enough to check. Natural language policies are suggestions. Formally verified constraints are guarantees. Imagine market infrastructure where agents must prove, before executing, that their actions satisfy regulatory constraints, risk limits, fairness properties. Not “we reviewed the code” but “the system verified the proof.” The agent that cannot demonstrate compliance cannot act. Formal constraints are not a limitation on AI autonomy. They can be the tools that make AI autonomy safe.
We are building that infrastructure now, whether we recognize it or not. Every verified auction protocol, every theorem about market equilibria, becomes a potential constraint on future autonomous systems. The practical question is not whether to allow AI agents in markets but how to make them work well. Formal verification offers something concrete: constraints that actually constrain, rules that cannot be silently violated, guarantees that hold regardless of what the model learned.
If these trends continue, we may be witnessing another industrial revolution. Before mechanization, most work in the physics sense of force times distance was done by muscles. The steam engine and its descendants changed that. Most thinking used to be done by brains. In the near or far future, that may change too. This is not inevitable. It presumes breakthroughs that are plausible but not guaranteed. But the trajectory is clear enough to take seriously.
Conclusion
If you are reading this as a student or someone early in your career: this stuff is fun. Watching a proof come together, seeing the goal state shrink to nothing, getting that green checkmark from the compiler when everything finally clicks. It is like solving puzzles, except the puzzles are deep and the solutions last. The theorems you formalize will still be valid when you are gone. That is a strange thing to be able to say about your work. The field is small enough that you can make real contributions and growing fast enough that there is plenty to do.
The work is hard. The learning curve is real. There will be days when the goal state mocks you and nothing seems to work. This is normal. One lemma at a time, one proof at a time. The field is growing fast, and you can be part of it.
Resources
Libraries & Tools
- Mathlib: The formalized mathematics library
- PhysLean: Formalizing physics in Lean
- LeanDojo: ML infrastructure for theorem proving
- Lean Copilot: Neural inference in Lean
- llmstep: Lightweight model-agnostic tactic suggestion
- lean-lsp-mcp: MCP server for Lean interaction
- LeanExplore: Semantic search across Mathlib
Models & Reproductions
- DeepSeek-Prover-V2-671B: 671B parameter model built on DeepSeek-V3-Base, achieves 88.9% on MiniF2F-test using recursive subgoal decomposition and RL with binary verification feedback
- DeepSeek-Prover-V1.5: Prover-verifier codebase
- DeepSeek-R1: Reasoning model weights and documentation
- TinyZero: Minimal reproduction of DeepSeek-R1-Zero reasoning
- Open-R1: HuggingFace’s open reproduction of the R1 pipeline
- Verifiers: Modular MCTS and search for LLM reasoning
Papers & Analysis
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs: The R1-Zero paper on pure RL reasoning
- DeepSeek-Prover-V2: Formal theorem proving with tree search
- Process-Driven Autoformalization (FormL4): Compiler-guided natural language to Lean translation
- AI and Formal Verification: Kleppmann on the convergence of LLMs and proof assistants
- Technical Deep Dive: DeepSeek: Raschka’s architectural analysis
References
Official Documentation
- Lean 4 Manual
- Theorem Proving in Lean 4
- Functional Programming in Lean
- Metaprogramming in Lean 4
- Mathematics in Lean
- Mathlib Documentation
- Lean Zulip Chat
Theorem Proving Games
- Natural Number Game - HHU Düsseldorf
- Real Analysis Game - Rutgers University
- Reintroduction to Proofs - A game introducing proofs, dependent type theory, and Lean prepared by Emily Riehl for a first year seminar at Johns Hopkins (Fall 2025). Covers types, functions, products, coproducts, quantifiers, and dependent types through interactive puzzles. Source
University Courses (Lean 4)
- Functional Programming and Theorem Proving - Stanford University
- Formal Proof and Verification - Brown University
- The Mechanics of Proof - Fordham University
- Formalising Mathematics - Imperial College London
- Formalized Mathematics in Lean - University of Bonn
- Interactive Theorem Proving - LMU Munich
- Proofs and Programs - Indian Institute of Science
- Theorem Proving with Lean - University of Warwick
- Logic and Mechanized Reasoning - Carnegie Mellon University
- Lean for Scientists and Engineers - University of Maryland
- An Introduction to Lean 4 - Universitat de València
- Interactive Theorem Proving in Lean - MPI Leipzig
- The Hitchhiker’s Guide to Logical Verification - Various institutions
- Formal Methods in Mathematics - Mastermath (Netherlands)
- Logique et démonstrations assistées - Université Paris-Saclay
- Semantics and Verification of Software - RWTH Aachen
- Formal Proof - Ohio State University
- Lean Community Course Catalog - Full listing
University Courses (Lean 3)
- Logic and Proof - Carnegie Mellon University
- Modern Mathematics with Lean - University of Exeter
- Graduate Introduction to Logic - University of Hawaii
- Introduction to Proofs with Lean - Johns Hopkins University
- Logic and Modelling - Vrije Universiteit Amsterdam
- Harvard MATH 161 - University course on theorem proving with Lean and Mathlib.
Appendix A: Syntax Comparison
For readers coming from other functional languages, these tables map familiar syntax to Lean equivalents. The goal is not completeness but orientation: enough to read Lean code without constantly consulting documentation.
Type Declarations
| Concept | Haskell | OCaml | Lean 4 |
|---|---|---|---|
| Type alias | type Name = String | type name = string | abbrev Name := String |
| Product type | data Point = Point Int Int | type point = { x: int; y: int } | structure Point where x : Int; y : Int |
| Sum type | data Maybe a = Nothing \| Just a | type 'a option = None \| Some of 'a | inductive Option (α : Type) where ... |
| Recursive type | data List a = Nil \| Cons a (List a) | type 'a list = Nil \| Cons of 'a * ... | inductive List (α : Type) where ... |
| Type class | class Eq a where (==) :: a -> a -> Bool | N/A (use modules) | class Eq (α : Type) where eq : α → α → Bool |
| Instance | instance Eq Int where ... | N/A | instance : Eq Int where ... |
Function Definitions
| Concept | Haskell | OCaml | Lean 4 |
|---|---|---|---|
| Named function | f x = x + 1 | let f x = x + 1 | def f (x : Nat) := x + 1 |
| Lambda | \x -> x + 1 | fun x -> x + 1 | fun x => x + 1 |
| Type signature | f :: Int -> Int | val f : int -> int | def f : Int → Int |
| Pattern matching | case x of { Just a -> ...; Nothing -> ... } | match x with Some a -> ... \| None -> ... | match x with \| some a => ... \| none => ... |
| Guards | f x \| x > 0 = ... \| otherwise = ... | N/A (use if) | if x > 0 then ... else ... |
| Where clause | f x = y + 1 where y = x * 2 | let f x = let y = x * 2 in y + 1 | def f x := let y := x * 2; y + 1 |
| Partial application | map (+1) | List.map ((+) 1) | List.map (· + 1) |
Monads and Effects
| Concept | Haskell | OCaml | Lean 4 |
|---|---|---|---|
| Bind | x >>= f or do { a <- x; f a } | N/A (use let*) | x >>= f or do let a ← x; f a |
| Return | return x or pure x | N/A | pure x |
| Monad transformer | StateT s m a | N/A | StateT σ m α |
| IO action | IO a | unit -> 'a | IO α |
putStrLn "hello" | print_endline "hello" | IO.println "hello" |
Common Operations
| Concept | Haskell | OCaml | Lean 4 |
|---|---|---|---|
| List literal | [1, 2, 3] | [1; 2; 3] | [1, 2, 3] |
| List cons | x : xs | x :: xs | x :: xs |
| List map | map f xs | List.map f xs | xs.map f or List.map f xs |
| List filter | filter p xs | List.filter p xs | xs.filter p |
| Function composition | f . g | N/A (use fun x -> f (g x)) | f ∘ g |
| String concat | s1 ++ s2 | s1 ^ s2 | s1 ++ s2 |
| Tuple | (a, b) | (a, b) | (a, b) |
| Tuple access | fst p, snd p | fst p, snd p | p.1, p.2 |
| If expression | if c then t else f | if c then t else f | if c then t else f |
Key Differences
Explicit types: Lean requires explicit type annotations more often than Haskell. Where Haskell infers id x = x has type a -> a, Lean prefers def id (x : α) : α := x.
Unicode: Lean uses unicode operators freely: → for function types, ∀ for universal quantification, ∧ for conjunction. ASCII alternatives exist (->, forall, /\) but idiomatic Lean uses unicode.
Termination: Every Lean function must terminate. Haskell allows infinite loops; Lean rejects them. Use partial for functions you cannot prove terminating.
Dependent types: Lean’s (n : Nat) → Vector α n has no Haskell equivalent. Letting types depend on values is what makes Lean a theorem prover.
Propositions vs Booleans: Lean distinguishes Prop (logical propositions, erased at runtime) from Bool (computational booleans). Haskell’s Bool is both.
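A few of these differences in one snippet (collatz is an illustrative definition, not from the series):
#check (2 + 2 = 4)   -- Prop: a statement you prove
#eval (2 + 2 == 4)   -- Bool: a value you compute (true)
#check (2 : Fin 3)   -- a type that depends on a value: Fin 3 has exactly 3 inhabitants
-- Termination is the default; `partial` opts out when no proof is available
partial def collatz (n : Nat) : Nat :=
  if n ≤ 1 then n
  else if n % 2 == 0 then collatz (n / 2)
  else collatz (3 * n + 1)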
Appendix B: Toplevel Declarations
Every Lean file is a sequence of toplevel declarations. These are the building blocks of every program and proof. This appendix provides a quick reference for all declaration types, with links to detailed explanations in the main text.
Definitions and Proofs
| Declaration | Purpose | Example |
|---|---|---|
| def | Define a value or function | Basics |
| theorem | State and prove a proposition (opaque) | Basics, Proving |
| lemma | Same as theorem | Proving |
| example | Anonymous proof (not saved) | Type Theory |
| abbrev | Transparent abbreviation | Basics |
| opaque | Hide implementation | Proofs |
| axiom | Unproven assumption | Proofs |
The distinction between def and theorem matters for performance. Lean marks theorem proofs as opaque, meaning they are never unfolded during type checking. This keeps proof terms from bloating computations. Use def for values you need to compute with and theorem for propositions you need to prove.
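A small illustration with throwaway names:
def double (n : Nat) : Nat := n + n   -- a def unfolds, so Lean can compute with it
#eval double 21                        -- 42
theorem double_eq (n : Nat) : double n = n + n := rfl
-- the proof term of double_eq is opaque; later type checking never unfolds it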
Type Declarations
| Declaration | Purpose | Example |
|---|---|---|
| inductive | Define type with constructors | Data Structures |
| structure | Single-constructor with fields | Data Structures |
| class | Type class interface | Polymorphism |
| instance | Type class implementation | Polymorphism |
| mutual | Mutually recursive definitions | Dependent Types |
Organization
| Declaration | Purpose | Example |
|---|---|---|
| import | Load another module | Basics |
| variable | Auto-add to definitions | Basics |
| namespace | Group under prefix | Basics |
| section | Scope for variables | Basics |
| open | Bring names into scope | Basics |
| universe | Declare universe levels | Type Theory |
| attribute | Attach metadata | Polymorphism |
| export | Re-export from namespace | Basics |
| notation | Custom syntax | Dependent Types |
| set_option | Configure compiler | Type Theory |
Interactive Commands
| Command | Purpose | Example |
|---|---|---|
| #eval | Evaluate and print | Basics |
| #check | Display type | Basics |
| #print | Print declaration info | Basics |
| #reduce | Reduce to normal form | Basics |
These commands are prefixed with # to distinguish them from regular declarations. They produce output but do not contribute to the compiled program. Use them liberally during development to inspect types and evaluate expressions.
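For example:
#eval 2 ^ 10                -- 1024
#check List.map             -- displays the type of List.map
#print Nat.add              -- prints the definition of Nat.add
#reduce (fun n => n + 1) 3  -- 4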
Tactics Reference
Robin Milner’s LCF system introduced a radical idea in the 1970s: let users extend the theorem prover with custom proof procedures, but channel all proof construction through a small trusted kernel. You could write arbitrarily clever automation, and if it produced a proof, that proof was guaranteed valid. Tactics are this idea fully realized. They are programs that build proofs, metaprograms that manipulate the proof state, search procedures that explore the space of possible arguments. When you write simp and Lean simplifies your goal through dozens of rewrite steps, you are invoking a sophisticated algorithm. When you write omega and Lean discharges a linear arithmetic obligation, you are running a decision procedure. The proof terms these tactics construct may be enormous, but they are checked by the kernel, and the kernel is small enough to trust. Think of tactics as the code you write, and the kernel as the one colleague who actually reads your pull requests.
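For a tiny concrete instance of that division of labor: the tactic script below builds an ordinary proof term, the kernel checks it, and #print shows what was built (names are illustrative):
theorem flip_eq (a b : Nat) (h : a = b) : b = a := by
  symm
  exact h
#print flip_eq  -- fun a b h => ... : the proof term the tactics constructed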
Table of Contents
The following covers all the major tactics in Lean 4 and Mathlib. Click on any tactic name to jump to its documentation and examples.
- abel - Prove equalities in abelian groups
- aesop - General automation tactic
- all_goals - Apply tactic to all current goals
- any_goals - Apply tactic to any applicable goal
- apply - Apply hypotheses or lemmas to solve goals
- apply_fun - Apply function to both sides of equality
- assumption - Use hypothesis matching goal
- bound - Prove inequalities from structure
- by_cases - Perform case splitting
- by_contra - Proof by contradiction
- calc - Chain equations and inequalities
- cases - Case analysis on inductive types
- choose - Extract choice function from forall-exists
- congr - Prove equality using congruence rules
- constructor - Break down conjunctions, existentials, and iff
- contradiction - Find contradictions in hypotheses
- conv - Targeted rewriting in specific parts
- convert - Prove by showing goal equals type of expression
- decide - Run decision procedures
- exact - Provide an exact proof term
- exfalso - Prove anything from False
- ext - Prove equality of functions extensionally
- field_simp - Simplify field expressions
- fin_cases - Split finite type into cases
- first - Try tactics until one succeeds
- focus - Limit tactics to first goal
- gcongr - Prove inequalities using congruence
- generalize - Replace expressions with variables
- grind - Proof search using congruence closure
- group - Prove equalities in groups
- have - Introduce new hypotheses
- hint - Get tactic suggestions
- induction - Perform inductive proofs
- interval_cases - Split bounded values into cases
- intro - Introduce assumptions from implications and quantifiers
- left - Choose left side of disjunction
- lift - Lift variable to higher type
- linarith - Prove linear inequalities
- linear_combination - Prove from linear combinations
- module - Prove equalities in modules
- nlinarith - Handle nonlinear inequalities
- noncomm_ring - Prove in non-commutative rings
- norm_cast - Simplify by moving casts outward
- norm_num - Simplify numerical expressions
- nth_rw - Rewrite only the nth occurrence
- obtain - Destructure existentials and structures
- omega - Solve linear arithmetic over Nat and Int
- pick_goal - Move specific goal to front
- positivity - Prove positivity goals
- push_cast - Push casts inward
- push_neg - Push negations inward
- qify - Shift to rationals
- refine - Apply with holes to fill later
- rename - Rename hypotheses for clarity
- repeat - Apply tactic repeatedly until fails
- revert - Move hypotheses back to the goal
- rfl - Prove by reflexivity
- right - Choose right side of disjunction
- ring - Prove equalities in commutative rings
- rw - Rewrite using equalities
- simp - Apply simplification lemmas
- simp_all - Simplify everything including hypotheses
- simp_rw - Rewrite with simplification at each step
- smt - Discharge goals to external SMT solvers
- sorry - Admit goal without proof
- specialize - Instantiate hypothesis with specific arguments
- split - Handle if-then-else and pattern matching
- split_ifs - Case on if-then-else expressions
- subst - Substitute variable with its value
- swap - Swap first two goals
- symm - Swap symmetric relations
- tauto - Prove logical tautologies
- trans - Split transitive relations
- trivial - Prove simple goals automatically
- try - Attempt tactic, continue if fails
- use - Provide witnesses for existential goals
- zify - Shift natural numbers to integers
Logical Connectives
intro
The intro tactic moves hypotheses from the goal into the local context. When your goal is ∀ x, P x or P → Q, using intro names the bound variable or assumption and makes it available for use in the proof.
theorem intro_apply : ∀ x : Nat, x = x → x + 0 = x := by
intro x h -- Introduce x and hypothesis h : x = x
simp -- Simplify x + 0 = x
constructor
The constructor tactic applies the first constructor of an inductive type to the goal. For And (conjunction), it splits the goal into two subgoals. For Exists, it expects you to provide a witness. For Iff, it creates subgoals for both directions.
theorem constructor_example : True ∧ True := by
constructor
· trivial -- Prove first True
· trivial -- Prove second True
left and right
The left and right tactics select which side of a disjunction to prove. When your goal is P ∨ Q, use left to commit to proving P or right to prove Q.
theorem or_example : 5 < 10 ∨ 10 < 5 := by
left
simp
use
The use tactic provides a concrete witness for an existential goal. When your goal is ∃ x, P x, using use t substitutes t for x and leaves you to prove P t.
theorem use_example : ∃ x : Nat, x * 2 = 10 := by
  use 5  -- the remaining goal 5 * 2 = 10 is discharged by rfl
obtain
The obtain tactic extracts components from existential statements and structures in hypotheses. It combines have and pattern matching, letting you name both the witness and the proof simultaneously.
theorem obtain_example (h : ∃ x : Nat, x > 5 ∧ x < 10) : ∃ y, y = 7 := by
obtain ⟨_x, _hgt, _hlt⟩ := h -- Destructure the existential
exact ⟨7, rfl⟩
Applying Lemmas
exact
The exact tactic closes a goal by providing a term whose type matches the goal exactly. It performs no additional unification or elaboration beyond what is necessary.
theorem exact_example (h : 2 + 2 = 4) : 4 = 2 + 2 := by
exact h.symm
apply
The apply tactic works backwards from the goal. Given a lemma h : A → B and a goal B, using apply h reduces the goal to proving A. It unifies the conclusion of the lemma with the current goal.
refine
The refine tactic is like exact but allows placeholders written as ?_ that become new goals. This lets you partially specify a proof term while deferring some parts.
/-- `refine` allows holes with ?_ -/
theorem refine_example : ∃ x : Nat, x > 5 := by
refine ⟨10, ?_⟩ -- Use 10, but leave proof as a hole
simp -- Fill the hole: prove 10 > 5
convert
The convert tactic applies a term to the goal even when the types do not match exactly, generating side goals for the mismatches. It is useful when you have a lemma that is almost but not quite what you need.
theorem convert_example (x y : Nat) (h : x = y) : Nat.succ x = Nat.succ y := by
convert rfl using 1
rw [h]
specialize
The specialize tactic instantiates a universally quantified hypothesis with concrete values, replacing the general statement with a specific instance in your context.
theorem specialize_example (h : ∀ x : Nat, x > 0 → x ≥ 1) : 5 ≥ 1 := by
specialize h 5 (by simp)
exact h
Context Manipulation
have
The have tactic introduces a new hypothesis into the context. You state what you want to prove as an intermediate step, prove it, and then it becomes available for the rest of the proof.
theorem have_example (x : Nat) : x + 0 = x := by
have h : x + 0 = x := by simp
exact h
rename
The rename tactic changes the name of a hypothesis in the local context, making proofs more readable when auto-generated names are unclear.
theorem rename_example (h : 1 = 1) : 1 = 1 := by
exact h
revert
The revert tactic is the inverse of intro. It moves a hypothesis from the context back into the goal as an implication or universal quantifier, which is useful before applying induction or certain lemmas.
theorem revert_example (x : Nat) (h : x = 5) : x = 5 := by
revert h x
intro x h
exact h
generalize
The generalize tactic replaces a specific expression in the goal with a fresh variable, abstracting over that value. This is useful when you need to perform induction on a compound expression.
theorem generalize_example : (2 + 3) * 4 = 20 := by
  generalize h : 2 + 3 = n  -- goal becomes n * 4 = 20, with h : 2 + 3 = n
  omega
Rewriting and Simplifying
rw (rewrite)
The rw tactic replaces occurrences of the left-hand side of an equality with the right-hand side. Use rw [←h] to rewrite in the reverse direction. Multiple rewrites can be chained in a single rw [h1, h2, h3].
theorem rw_example (a b : Nat) (h : a = b) : a + 2 = b + 2 := by
rw [h] -- Rewrite a to b using h
Tip
rw rewrites the first occurrence it finds. Use rw [h] at hyp to rewrite in a hypothesis instead of the goal. If rewriting fails due to dependent types or metavariables, try simp_rw, which handles these cases more gracefully. Use nth_rw n [h] to target a specific occurrence.
simp
The simp tactic repeatedly applies lemmas marked with @[simp] to simplify the goal. It handles common algebraic identities, list operations, and logical simplifications automatically.
theorem simp_example (x : Nat) : x + 0 = x ∧ 0 + x = x := by
simp -- Simplifies both sides
Tip
Use simp only [lemma1, lemma2] for reproducible proofs. Bare simp can break when new simp lemmas are added to the library. Use simp? to see which lemmas were applied, then replace with simp only [...] for stability. In Mathlib code reviews, bare simp at non-terminal positions is discouraged.
simp_all
The simp_all tactic simplifies both the goal and all hypotheses simultaneously, using each simplified hypothesis to help simplify the others.
theorem simp_all_example (x : Nat) (h : x = 0) : x + x = 0 := by
simp_all
simp_rw
The simp_rw tactic rewrites using the given lemmas but applies simplification at each step, which helps when rewrites would otherwise fail due to associativity or other issues.
theorem simp_rw_example (x y : Nat) : (x + y) + (y + x) = 2 * (x + y) := by
simp_rw [Nat.add_comm y x]
ring
nth_rw
The nth_rw tactic rewrites only a specific occurrence of a pattern, counting from 1. This gives precise control when an expression appears multiple times and you only want to change one instance.
theorem nth_rewrite_example (x : Nat) : x + x + x = 3 * x := by
nth_rw 2 [← Nat.add_zero x] -- Rewrite only the second occurrence of x
simp
ring
norm_num
The norm_num tactic evaluates and simplifies numeric expressions, proving goals like 2 + 2 = 4 or 7 < 10 by computation. It handles arithmetic in various number types.
theorem norm_num_example : 2 ^ 3 + 5 * 7 = 43 := by
norm_num
norm_cast
The norm_cast tactic normalizes expressions involving type coercions by pushing casts outward and combining them, making goals about mixed numeric types easier to prove.
theorem norm_cast_example (n : Nat) : (n : Int) + 1 = ((n + 1) : Int) := by
norm_cast
push_cast
The push_cast tactic pushes type coercions inward through operations, distributing a cast over addition, multiplication, and other operations.
set_option linter.unusedTactic false in
theorem push_cast_example (n m : Nat) : ((n + m) : Int) = (n : Int) + (m : Int) := by
push_cast
rfl
conv
The conv tactic enters a conversion mode that lets you navigate to specific subexpressions and rewrite only there. It is invaluable when rw affects the wrong occurrence or when you need surgical precision.
theorem conv_example (x y : Nat) : x + y = y + x := by
conv =>
lhs -- Focus on left-hand side
rw [Nat.add_comm]
Tip
Navigation commands in conv mode: lhs/rhs select sides of an equation, arg n selects the nth argument, ext introduces binders, and enter [1, 2] navigates by path. Use conv_lhs or conv_rhs as shortcuts when you only need to work on one side of an equation.
Reasoning with Relations
rfl (reflexivity)
The rfl tactic proves goals of the form a = a where both sides are definitionally equal. It works even when the equality is not syntactically obvious but follows from definitions.
theorem rfl_example (x : Nat) : x = x := by
rfl
symm
The symm tactic reverses a symmetric relation like equality. If your goal is a = b and you have h : b = a, using symm on h or the goal makes them match.
theorem symm_example (x y : Nat) (h : x = y) : y = x := by
symm
exact h
trans
The trans tactic splits a transitive goal like a = c into two subgoals a = b and b = c for a chosen intermediate value b. It works for any transitive relation.
theorem trans_example (a b c : Nat) (h1 : a ≤ b) (h2 : b ≤ c) : a ≤ c := by
trans b
· exact h1
· exact h2
subst
The subst tactic eliminates a variable by substituting it everywhere with an equal expression. Given h : x = e, using subst h replaces all occurrences of x with e and removes x from the context.
theorem subst_example (x y : Nat) (h : x = 5) : x + y = 5 + y := by
subst h
rfl
ext (extensionality)
The ext tactic proves equality of functions, sets, or structures by showing they agree on all inputs or components. It introduces the necessary variables and reduces the goal to pointwise equality.
theorem ext_example (f g : Nat → Nat)
(h : ∀ x, f x = g x) : f = g := by
ext x
exact h x
calc
The calc tactic provides a structured way to write chains of equalities or inequalities. Each step shows the current expression, the relation, and the justification, mirroring traditional mathematical proofs.
theorem calc_example (a b c : Nat)
(h1 : a = b) (h2 : b = c) : a = c := by
calc a = b := h1
_ = c := h2
apply_fun
The apply_fun tactic applies a function to both sides of an equality hypothesis. It automatically generates a side goal requiring the function to be injective when needed.
theorem apply_fun_example (x y : Nat) (h : x = y) : x + 2 = y + 2 := by
apply_fun (· + 2) at h
exact h
congr
The congr tactic reduces an equality goal f a = f b to proving a = b, applying congruence recursively. It handles nested function applications by breaking them into component equalities.
theorem congr_example (f : Nat → Nat) (x y : Nat) (h : x = y) : f x = f y := by
congr
gcongr
The gcongr tactic proves inequalities by applying monotonicity lemmas. It automatically finds and applies lemmas showing that operations preserve ordering, such as adding to both sides of an inequality.
theorem gcongr_example (x y a b : Nat) (h1 : x ≤ y) (h2 : a ≤ b) : x + a ≤ y + b := by
gcongr
linear_combination
The linear_combination tactic proves an equality by showing it follows from a linear combination of given hypotheses. You specify the coefficients, and it verifies the algebra.
theorem linear_combination_example (x y : ℚ) (h1 : 2*x + y = 4) (h2 : x + 2*y = 5) :
x + y = 3 := by
linear_combination (h1 + h2) / 3
positivity
The positivity tactic proves goals asserting that an expression is positive, nonnegative, or nonzero. It analyzes the structure of the expression and applies appropriate lemmas automatically.
set_option linter.unusedVariables false in
theorem positivity_example (x : ℚ) (h : 0 < x) : 0 < x^2 + x := by
positivity
bound
The bound tactic proves inequality goals by recursively analyzing expression structure and applying bounding lemmas. It is particularly effective for expressions built from well-behaved operations.
theorem bound_example (x y : ℕ) : x ≤ x + y := by
bound
Reasoning Techniques
cases
The cases tactic performs case analysis on an inductive type, creating separate subgoals for each constructor. For a natural number, it splits into the zero case and the successor case.
theorem cases_example (n : Nat) : n = 0 ∨ n > 0 := by
cases n with
| zero => left; rfl
| succ _m => right; exact Nat.succ_pos _
induction
The induction tactic sets up a proof by induction on an inductive type. It creates a base case for each non-recursive constructor and an inductive case with an induction hypothesis for each recursive constructor.
theorem induction_example (n : Nat) : n + 0 = n := by
induction n with
| zero => rfl
| succ n _ih => rfl
Tip
Use induction n with | zero => ... | succ n ih => ... for structured case syntax. If your induction hypothesis is too weak, try revert on additional variables before inducting, or use induction n generalizing x y to strengthen the hypothesis. For mutual or nested induction, consider induction ... using with a custom recursor.
split
The split tactic splits goals involving if-then-else expressions or pattern matching into separate cases. It creates subgoals for each branch with the appropriate condition as a hypothesis.
def abs (x : Int) : Nat :=
if x ≥ 0 then x.natAbs else (-x).natAbs
theorem split_example (x : Int) : abs x ≥ 0 := by
unfold abs
split <;> simp
split_ifs
The split_ifs tactic finds all if-then-else expressions in the goal and splits on their conditions, creating cases for each combination of true and false branches.
theorem split_ifs_example (p : Prop) [Decidable p] (x y : Nat) :
(if p then x else y) ≤ max x y := by
split_ifs
· exact le_max_left x y
· exact le_max_right x y
contradiction
The contradiction tactic closes the goal by finding contradictory hypotheses in the context, such as h1 : P and h2 : ¬P, or an assumption of False.
theorem contradiction_example (h1 : False) : 0 = 1 := by
contradiction
exfalso
The exfalso tactic changes any goal to False, applying the principle of explosion. Use this when you can derive a contradiction from your hypotheses.
theorem exfalso_example (h : 0 = 1) : 5 = 10 := by
exfalso -- Goal becomes False
simp at h
by_contra
The by_contra tactic starts a proof by contradiction. It adds the negation of the goal as a hypothesis and changes the goal to False, requiring you to derive a contradiction.
theorem by_contra_example : ∀ n : Nat, n = n := by
  intro n
  by_contra h  -- h : ¬n = n, goal : False
  exact h rfl
Proof of negation vs proof by contradiction: These are often confused but differ in an important way. A proof of negation proves ¬P by assuming P and deriving False. This is constructive since ¬P is defined as P → False. A proof by contradiction proves P by assuming ¬P and deriving False. This requires classical logic (double negation elimination) because you must go from ¬¬P to P. The by_contra tactic performs proof by contradiction and relies on Classical.byContradiction. If you are proving a negation, you can use intro h instead, which is constructive.
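To see the constructive side in isolation, here is a direct proof of a negation (a small added example): introduce the assumed equality and refute it, with no classical step needed for the intro.
example : ¬(1 = 2) := by
  intro h    -- h : 1 = 2, goal : False
  simp at h  -- the hypothesis simplifies to False, closing the goal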
push_neg
The push_neg tactic pushes negations through quantifiers and connectives using De Morgan’s laws. It transforms ¬∀ x, P x into ∃ x, ¬P x and similar patterns.
theorem push_neg_example : ¬(∀ x : Nat, ∃ y, x < y) ↔ ∃ x : Nat, ∀ y, ¬(x < y) := by
push_neg
rfl
by_cases
The by_cases tactic splits the proof into two cases based on whether a proposition is true or false, adding the proposition as a hypothesis in one branch and its negation in the other.
theorem by_cases_example (p : Prop) : p ∨ ¬p := by
by_cases h : p
· left; exact h
· right; exact h
choose
The choose tactic extracts a choice function from a hypothesis of the form ∀ x, ∃ y, P x y. It produces a function f and a proof that ∀ x, P x (f x).
theorem choose_example (h : ∀ x : Nat, ∃ y : Nat, x < y) :
∃ f : Nat → Nat, ∀ x, x < f x := by
choose f hf using h
exact ⟨f, hf⟩
lift
The lift tactic replaces a variable with one of a more specific type when you have a proof justifying the lift. For example, lifting an integer to a natural number given a proof it is nonnegative.
theorem lift_example (n : ℤ) (hn : 0 ≤ n) : ∃ m : ℕ, (m : ℤ) = n := by
lift n to ℕ using hn
exact ⟨n, rfl⟩
zify
The zify tactic converts a goal about natural numbers to one about integers, which often makes subtraction and other operations easier to handle since integers are closed under subtraction.
theorem zify_example (n m : ℕ) (_ : n ≥ m) : (n - m : ℤ) = n - m := by
zify
qify
The qify tactic converts a goal about integers or naturals to one about rationals, enabling division and making certain algebraic manipulations possible.
theorem qify_example (n m : ℕ) : (n : ℚ) / (m : ℚ) = (n / m : ℚ) := by
norm_cast
Searching
assumption
The assumption tactic closes the goal if there is a hypothesis in the context that exactly matches. It searches through all available hypotheses to find one with the right type.
theorem assumption_example (P Q : Prop) (h1 : P) (_h2 : Q) : P := by
assumption -- Finds h1
trivial
The trivial tactic tries a collection of simple tactics including rfl, assumption, and contradiction to close easy goals without you specifying which approach to use.
theorem trivial_example : True := by
trivial
decide
The decide tactic evaluates decidable propositions by computation. For finite checks like 2 < 5 or membership in a finite list, it simply computes the answer and closes the goal.
theorem decide_example : 3 < 5 := by
decide
Note
decide works in the kernel and produces small proof terms but can be slow. native_decide compiles to native code and runs faster, but its proofs simply assert the result by appealing to a trusted evaluation axiom. For quick checks use decide; for expensive computations like verifying grid states in our Game of Life proofs, native_decide is essential.
hint
The hint tactic suggests which tactics might make progress on the current goal. It is a discovery tool that helps when you are unsure how to proceed.
theorem hint_example : 2 + 2 = 4 := by
simp -- hint would suggest this
General Automation
omega
The omega tactic is a decision procedure for linear arithmetic over natural numbers and integers. It handles goals involving addition, subtraction, multiplication by constants, and comparisons.
theorem omega_example (x y : Nat) : x < y → x + 1 ≤ y := by
omega
Note
omega handles Nat and Int but not Rat or Real. It solves linear constraints but fails on nonlinear multiplication like x * y < z. For rationals, try linarith after qify. For nonlinear goals, try nlinarith or polyrith.
linarith
The linarith tactic proves goals that follow from linear arithmetic over ordered rings. It combines hypotheses about inequalities to derive the goal using Fourier-Motzkin elimination.
theorem linarith_example (x y z : ℚ) (h1 : x < y) (h2 : y < z) : x < z := by
linarith
nlinarith
The nlinarith tactic extends linarith to handle some nonlinear goals by first preprocessing with polynomial arithmetic before applying linear reasoning.
theorem nlinarith_example (x : ℚ) (h : x > 0) : x^2 > 0 := by
nlinarith
smt
The smt tactic discharges goals to an external SMT solver like Z3 or cvc5. SMT (Satisfiability Modulo Theories) solvers are battle-tested tools that combine SAT solving with decision procedures for arithmetic, arrays, bitvectors, and uninterpreted functions. When omega or linarith cannot handle your goal because it involves function symbols or complex quantifier patterns, an SMT solver often can.
The smt tactic translates your goal to SMT-LIB format, calls the solver, and if the solver returns “unsatisfiable” (meaning your goal is valid), it reconstructs a proof in Lean. This is not a trusted oracle; the proof is checked by Lean’s kernel.
example (x y : Int) (h1 : x < y) (h2 : y < x + 1) : False := by
smt [h1, h2]
example (a b c : Int) (h1 : a + b = c) (h2 : a = b) : 2 * b = c := by
smt [h1, h2]
example (p q r : Prop) : (p → q) → (q → r) → p → r := by
smt
SMT solvers excel at uninterpreted functions, reasoning about function applications without knowing what the functions compute:
example (f : Int → Int) (x y : Int) (h1 : x = y) : f x = f y := by
smt [h1]
example (f : Int → Int) (x y z : Int)
(h1 : x = y) (h2 : y = z) : f x = f z := by
smt [h1, h2]
They handle quantifiers through instantiation heuristics, though this can be unpredictable:
example (f : Int → Int) (h : ∀ x, f x = x + 1) : f 5 = 6 := by
smt [h]
example (h : ∀ x : Int, x < x + 1) : ∃ y : Int, 0 < y := by
smt [h]
The real power emerges when combining theories. Here the solver mixes arithmetic with uninterpreted functions:
example (f : Int → Int) (x : Int)
(h1 : f x > 0) (h2 : f x < 2) : f x = 1 := by
smt [h1, h2]
example (f : Int → Int → Int) (a b : Int)
(h1 : f a b = a + b)
(h2 : a = 3)
(h3 : b = 4) : f a b = 7 := by
smt [h1, h2, h3]
Note
The smt tactic requires setup. First, install an SMT solver:
- macOS: brew install z3
- Ubuntu: apt install z3
Then add the lean-smt library to your lakefile.lean: require smt from git "https://github.com/ufmg-smite/lean-smt.git" @ "main"
Import with import Smt. Check the lean-smt repository for compatible Lean versions and supported solvers (Z3 and cvc5). The examples above are standalone and not part of this book’s build; copy them to your own project to try them.
ring
The ring tactic proves polynomial equalities in commutative rings by normalizing both sides to a canonical form and checking if they match. It handles addition, multiplication, and powers.
theorem ring_example (x y : ℤ) : (x + y)^2 = x^2 + 2*x*y + y^2 := by
ring
noncomm_ring
The noncomm_ring tactic proves equalities in non-commutative rings where multiplication order matters, such as matrix rings or quaternions.
theorem noncomm_ring_example (x y z : ℤ) : x * (y + z) = x * y + x * z := by
noncomm_ring
field_simp
The field_simp tactic clears denominators in field expressions by multiplying through, reducing goals involving fractions to polynomial equalities that ring can handle.
theorem field_simp_example (x y : ℚ) (hy : y ≠ 0) : x / y + 1 = (x + y) / y := by
field_simp
abel
The abel tactic proves equalities in abelian groups by normalizing expressions involving addition, subtraction, and negation to a canonical form.
theorem abel_example (x y z : ℤ) : x + y + z = z + x + y := by
abel
group
The group tactic proves equalities in groups using the group axioms. It handles the group operation, inverses, and the identity element, normalizing expressions to compare them.
theorem group_example (x y : ℤ) : x + (-x + y) = y := by
group
module
The module tactic proves equalities in modules over a ring, handling scalar multiplication and vector addition to normalize expressions.
theorem module_example (x y : ℤ) (a : ℤ) : a • (x + y) = a • x + a • y := by
module
aesop
The aesop tactic is a general-purpose automation tactic that combines many strategies including simplification, introduction rules, and case splitting to solve goals automatically.
theorem aesop_example (p q r : Prop) : p → (p → q) → (q → r) → r := by
aesop
Tip
aesop is powerful but can be slow on complex goals. Use aesop? to see what it did, then extract a faster proof. Register custom lemmas with @[aesop safe] or @[aesop unsafe 50%] to extend its knowledge. The safe rules are always applied; unsafe rules are tried with backtracking weighted by percentage.
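A sketch of the attribute syntax (the lemma names here are hypothetical and this snippet is not part of the book's build):
-- A safe rule: aesop applies it whenever it matches, without backtracking
@[aesop safe apply]
theorem my_and_intro (p q : Prop) (hp : p) (hq : q) : p ∧ q := ⟨hp, hq⟩

-- An unsafe rule: aesop tries it with backtracking, weighted at 50%
@[aesop unsafe 50% apply]
theorem my_left_disjunct (p q : Prop) (hp : p) : p ∨ q := Or.inl hp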
grind
The grind tactic is one of Lean 4’s most sophisticated automation tools. Under the hood, it maintains an e-graph (equivalence graph), a data structure that efficiently represents equivalence classes of terms. When you assert a = b, the e-graph merges the equivalence classes containing a and b. The key insight is congruence: if a = b, then f a = f b for any function f. The e-graph propagates these consequences automatically.
The algorithm works in three phases. First, congruence closure processes all equalities and computes the transitive, symmetric, reflexive closure under function application. If you know $x = y$ and $f(x) = 10$, congruence closure deduces $f(y) = 10$ without explicit rewriting. Second, forward chaining applies implications: if you have $p \land q$ and $q \to r$, it extracts $q$ from the conjunction and fires the implication to derive $r$. Third, case splitting handles disjunctions and if-then-else expressions by exploring branches.
theorem grind_example1 (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
grind
theorem grind_example2 (f : Nat → Nat) (x y : Nat)
(h1 : x = y) (h2 : f x = 10) : f y = 10 := by
grind
theorem grind_example3 (p q r : Prop)
(h1 : p ∧ q) (h2 : q → r) : p ∧ r := by
grind
theorem grind_example4 (x y : Nat) :
(if x = y then x else y) = y ∨ x = y := by
grind
The power shows up when these mechanisms combine. Here grind chains four equalities through two functions to conclude f b = 42:
-- Nested function applications with chained equalities
theorem grind_chain (f g : Nat → Nat) (a b c d : Nat)
(h1 : a = b) (h2 : c = d) (h3 : f a = g c) (h4 : g d = 42) :
f b = 42 := by
grind
-- Existential witnesses from equality reasoning
theorem grind_exists (p : Nat → Prop) (a b : Nat)
(h1 : a = b) (h2 : p a) : ∃ x, p x ∧ x = b := by
grind
Tip
grind excels at “obvious” goals that would require tedious manual rewriting. If your goal involves chained equalities, function congruence, or propositional reasoning, try grind before writing out the steps by hand. For debugging, grind? shows the proof term it constructs.
tauto
The tauto tactic proves propositional tautologies involving $\land$, $\lor$, $\to$, $\leftrightarrow$, $\lnot$, True, and False. It reasons classically, so it can close goals that require excluded middle.
theorem tauto_example (p q : Prop) : p → (p → q) → q := by
tauto
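A slightly more interesting tautology, mixing negation and an iff (an added sample, not from the book's build):
example (p q : Prop) : ¬(p ∧ q) ↔ (p → ¬q) := by
  tauto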
Goal Operations
sorry
The sorry tactic closes any goal without actually proving it, leaving a hole in the proof. Use it as a placeholder during development, but never in finished proofs as it makes theorems unsound.
theorem incomplete_proof : ∀ P : Prop, P ∨ ¬P := by
sorry -- Proof left as exercise
Warning
Any theorem containing sorry depends on the sorryAx axiom, and anything built on top of it inherits that dependency. Use #print axioms myTheorem to check whether a theorem is sorry-free. Mathlib rejects all PRs containing sorry. During development, sorry is invaluable for sketching proofs top-down, but treat each one as a debt to be paid.
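To see this on the placeholder above, ask Lean which axioms the declaration depends on (a quick check you can run in any file containing the theorem):
#print axioms incomplete_proof
-- the output lists sorryAx among the axioms the theorem depends on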
swap
The swap tactic exchanges the first two goals in the goal list, letting you work on the second goal first when that is more convenient.
theorem swap_example : True ∧ True := by
constructor
swap
· trivial -- Proves second goal first
· trivial -- Then first goal
pick_goal
The pick_goal tactic moves a specific numbered goal to the front of the goal list, allowing you to address goals in any order you choose.
theorem pick_goal_example : True ∧ True := by
constructor
pick_goal 2 -- Move second goal to front
· trivial -- Prove second goal
· trivial -- Prove first goal
all_goals
The all_goals tactic applies a given tactic to every goal in the current goal list, which is useful when multiple goals can be solved the same way.
theorem all_goals_example : (1 = 1) ∧ (2 = 2) := by
constructor
all_goals rfl
any_goals
The any_goals tactic applies a given tactic to each goal where it succeeds, skipping goals where it fails. It succeeds if it makes progress on at least one goal.
theorem any_goals_example : (1 = 1) ∧ (True) := by
constructor
any_goals rfl -- closes 1 = 1; skips True, where rfl fails
trivial
focus
The focus tactic restricts attention to the first goal, hiding all other goals. This helps ensure you complete one goal before moving to the next.
theorem focus_example : True ∧ True := by
constructor
focus
  trivial -- only the first goal is visible here
trivial
try
The try tactic attempts to apply a tactic and succeeds regardless of whether the inner tactic succeeds or fails. It is useful for optional simplification steps.
theorem try_example (p q : Prop) (hp : p) : p ∨ q := by
try assumption -- assumption fails here (no hypothesis matches p ∨ q), but try succeeds anyway
exact Or.inl hp
first
The first tactic tries a list of tactics in order and uses the first one that succeeds. It fails only if all tactics fail.
set_option linter.unusedTactic false in
set_option linter.unreachableTactic false in
theorem first_example (x : Nat) : x = x := by
first | simp | rfl | sorry
repeat
The repeat tactic applies a given tactic repeatedly until it fails to make progress. It is useful for exhaustively applying a simplification or introduction rule.
set_option linter.unusedTactic false in
set_option linter.unreachableTactic false in
/-- `repeat` applies a tactic repeatedly -/
theorem repeat_example : True ∧ True ∧ True := by
repeat constructor
all_goals trivial
Tactic Combinators
The semicolon ; sequences tactics, while <;> applies the second tactic to all goals created by the first. These combinators help write concise proof scripts.
theorem combinator_example : (True ∧ True) ∧ (True ∧ True) := by
constructor <;> (constructor <;> trivial)
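The <;> matters here: constructor creates two goals, and <;> runs the follow-up tactic on both of them, whereas a plain newline or ; would run it only on the first remaining goal and leave the second unproved. A small illustration (an added sample, not from the book's build):
example (x y : Nat) : x + 0 = x ∧ 0 + y = y := by
  constructor <;> simp   -- simp runs on both goals created by constructor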
Domain-Specific Tactics
interval_cases
The interval_cases tactic performs case analysis when a variable is known to lie in a finite range. Given bounds on a natural number, it generates a case for each possible value.
theorem interval_cases_example (n : ℕ) (h : n ≤ 2) : n = 0 ∨ n = 1 ∨ n = 2 := by
interval_cases n
· left; rfl
· right; left; rfl
· right; right; rfl
fin_cases
The fin_cases tactic performs case analysis on elements of a finite type like Fin n or Bool, creating a subgoal for each possible value of the type.
theorem fin_cases_example (i : Fin 3) : i.val < 3 := by
fin_cases i <;> simp
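fin_cases also works on other finite types such as Bool; a small illustration (an added sample, not from the book's build):
example (b : Bool) : b = true ∨ b = false := by
  fin_cases b <;> simp   -- one goal per value of b, each closed by simp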
Working with Quantifiers
Existential Quantifiers
Existential statements claim that some witness exists satisfying a property. To prove one, use use to provide the witness. To use an existential hypothesis, use obtain to extract the witness and its property.
theorem exists_intro : ∃ n : Nat, n > 10 := by
exact ⟨42, by simp⟩
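The example above supplies the witness with the anonymous constructor ⟨witness, proof⟩. The use tactic mentioned in the prose does the same job; a minimal sketch (an added sample, not from the book's build — depending on the Mathlib version, use may already discharge the numeric side goal, so the follow-up is wrapped in all_goals to work either way):
example : ∃ n : Nat, n > 10 := by
  use 42            -- provide the witness; the remaining goal, if any, is 42 > 10
  all_goals omega   -- close whatever side goals remain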
theorem exists_elim (h : ∃ n : Nat, n > 10) : True := by
obtain ⟨n, hn⟩ := h
trivial
Universal Quantifiers
Universal statements claim a property holds for all values. To prove one, use intro to introduce an arbitrary value. To use a universal hypothesis, use specialize or simply apply it to a specific value.
theorem forall_intro : ∀ x : Nat, x + 0 = x := by
intro x
simp
theorem forall_elim (h : ∀ x : Nat, x + 0 = x) : 5 + 0 = 5 := by
exact h 5
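The specialize tactic mentioned above replaces a universal hypothesis with its instance at a particular value; a small illustration (an added sample, not from the book's build):
example (h : ∀ x : Nat, x + 0 = x) : 5 + 0 = 5 := by
  specialize h 5   -- h is now 5 + 0 = 5
  exact h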
Using This Reference
You do not need to memorize this article. Bookmark it. When you encounter a goal you cannot close, return here and ask: what shape is my goal? Implication, conjunction, existential, equality? Find the matching section. The tactic you need is there. Over time, the common ones become muscle memory. The obscure ones remain here for when you need them.