Theorem Provers

This article covers the history of theorem provers. If you do not care about history, skip to Basics.

The idea of mechanizing mathematical reasoning dates back centuries, but the modern era of theorem proving began in the 1960s and 1970s when researchers first attempted to implement formal logic on computers. These early systems were primitive by today’s standards, but they established the fundamental insight that proofs could be represented as data structures and verified by algorithms.

Early Systems

The first generation of theorem provers emerged from two distinct traditions. One tradition, exemplified by systems like Automath developed by Nicolaas de Bruijn in the late 1960s, focused on encoding existing mathematical proofs in a formal language that a computer could check. De Bruijn’s work introduced many concepts that remain central to modern systems, including the idea that types could depend on values and that propositions could be represented as types. The other tradition focused on automated theorem proving, attempting to have computers discover proofs on their own through search procedures. While fully automated proving remains intractable for most interesting mathematics, techniques from this tradition inform the automation tactics available in modern proof assistants.

The 1980s saw the development of several influential systems that shaped the field. The Calculus of Constructions, introduced by Thierry Coquand and Gérard Huet, provided a unified foundation combining dependent types with a hierarchy of universes. This calculus became the theoretical basis for Coq, which remains one of the most widely used proof assistants today. Coq pioneered many features now standard in the field, including tactic-based proof development, extraction of executable programs from proofs, and a module system for organizing large developments. Major verification efforts in Coq include the CompCert certified C compiler and the mathematical proof of the four color theorem.

First-Order Theorem Proving

A parallel tradition built theorem provers on first-order logic rather than type theory. The Boyer-Moore family of provers, culminating in ACL2, used an untyped computational substrate based on Lisp with powerful automation heuristics for discovering proofs. ACL2 achieved notable industrial successes, including verification of AMD’s floating-point division after the Pentium FDIV bug made hardware correctness suddenly interesting to executives.

Despite these successes, first-order theorem proving has not been widely adopted outside specialized industrial applications. First-order logic imposes an expressiveness ceiling that makes formalizing modern mathematics awkward. Without dependent types, you cannot easily express properties like “a vector of length n” or “a sorted list.” These systems rely heavily on opaque automation heuristics rather than user-programmable tactics, which makes it harder to understand why proofs fail and how to fix them. Most importantly, there is no Curry-Howard correspondence linking proofs to programs: ACL2 definitions are themselves executable Lisp, but the proofs carry no computational content of their own.
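To make the expressiveness point concrete, here is a minimal sketch in Lean of “a vector of length n.” The names `Vec` and `Vec.append` are illustrative, not part of Lean’s standard library (which has its own `Vector` type); the point is that the length lives in the type, so the type checker enforces length arithmetic at compile time.

```lean
-- A length-indexed vector: the type itself records how many elements it
-- holds, so "a vector of length n" is directly expressible.
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons : {n : Nat} → α → Vec α n → Vec α (n + 1)

-- Appending yields a vector whose length is the sum of the inputs'
-- lengths; the type checker verifies this, with no runtime check needed.
def Vec.append {α : Type} {m n : Nat} : Vec α m → Vec α n → Vec α (n + m)
  | .nil,       ys => ys
  | .cons x xs, ys => .cons x (xs.append ys)

-- A two-element vector; writing `.cons 1 .nil` here would be a type error.
def xs : Vec Nat 2 := .cons 1 (.cons 2 .nil)
```

A first-order system can state and prove a length lemma about an untyped list, but it cannot make ill-lengthed programs unrepresentable the way this type does.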

The contrast is instructive. Type-theoretic systems grew ecosystems of thousands of users, million-line mathematical libraries, and active research communities. The Boyer-Moore family and other first-order provers remained specialized tools for specific classes of problems, a less fruitful branch of the theorem proving tree. Unless you are maintaining legacy verification infrastructure, there is little reason to reach for these systems today when tools like Lean offer strictly more expressive power. The Curry-Howard insight that proofs are programs and types are propositions turned out to be generatively powerful in ways that first-order theorem proving was not. When you can express your specification, your implementation, and your correctness proof in the same language, each informs the others. This unity is what makes dependent type theory feel like mathematics rather than a checkbox.
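The Curry-Howard correspondence can be seen in a few lines of Lean: a proposition is a type, and a proof is a program of that type. In this sketch, the proof that conjunction commutes is literally the same program that swaps the components of a pair (the names `and_swap` and `pair_swap` are illustrative).

```lean
-- Proof as program: the proposition p ∧ q → q ∧ p is a type, and this
-- function is a term of that type, hence a proof.
theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩

-- The same program at the level of data: swapping the components of a pair.
def pair_swap (α β : Type) : α × β → β × α :=
  fun (a, b) => (b, a)
```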

The LCF Tradition

In the 1970s, researchers in Edinburgh led by Robin Milner developed the LCF system, whose architecture remains influential. In this design, a small trusted kernel defines what constitutes a valid proof, and all proof construction must ultimately pass through this kernel. This approach provides strong guarantees because only the kernel needs to be trusted, while tactics and automation can be implemented in untrusted code. The HOL family of provers, including HOL4 and Isabelle/HOL, descends from this tradition. Isabelle in particular has been used for major verification efforts including the seL4 verified operating system kernel.
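The LCF idea can be sketched in a few lines of Lean (a toy, not Lean’s actual kernel; all names here are illustrative). The essential move is that `Thm` values can only be produced by the kernel’s inference rules, so any value of type `Thm` is valid by construction; in a real LCF system, module privacy enforces what the comments here merely promise.

```lean
-- A toy object logic: propositional variables and implication.
inductive Formula where
  | var : String → Formula
  | imp : Formula → Formula → Formula
deriving BEq

-- The kernel's theorem type. In a real LCF-style prover the constructor
-- is hidden, so code outside the kernel can only obtain a Thm by
-- applying the rules below.
structure Thm where
  concl : Formula

namespace Kernel

-- Axiom scheme: ⊢ A → A.
def identity (a : Formula) : Thm :=
  ⟨.imp a a⟩

-- Modus ponens: from ⊢ A → B and ⊢ A, conclude ⊢ B.
-- Returns none if the premises do not fit, so invalid theorems
-- cannot be manufactured.
def mp (pab : Thm) (pa : Thm) : Option Thm :=
  match pab.concl with
  | .imp a b => if a == pa.concl then some ⟨b⟩ else none
  | _        => none

end Kernel
```

Tactics and search procedures, however elaborate, can only ever call `Kernel.identity` and `Kernel.mp`, so a bug in them can fail to find a proof but can never certify a false one.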

Dependent Type Theory

The development of Martin-Löf type theory in the 1970s and 1980s provided another foundational framework that influenced systems like Agda and later Idris. Per Martin-Löf’s intensional type theory emphasized the computational content of proofs and introduced identity types as a way to reason about equality. Agda, developed primarily at Chalmers University, implements a variant of this theory with sophisticated support for dependent pattern matching. Its syntax influenced Lean’s design, and it remains popular for research in type theory and programming language semantics.

Idris, created by Edwin Brady, took a different approach by prioritizing practical programming with dependent types rather than theorem proving per se. Idris demonstrated that dependent types could be integrated into a language designed for general-purpose programming, with features like implicit arguments and type-driven development making dependently typed code more accessible to working programmers. Many of these ergonomic innovations influenced Lean 4’s design.

The 2010s brought renewed interest in the foundations of mathematics through homotopy type theory, which reinterprets types as spaces and equality as paths. This perspective, developed by Vladimir Voevodsky and others, led to new proof assistants like Cubical Agda that implement univalent foundations. While Lean does not natively support cubical type theory, the mathematical insights from this research have influenced how the community thinks about equality and transport.

Lean’s Origins

Lean emerged from this rich history. The first version was developed by Leonardo de Moura at Microsoft Research starting in 2013, with the goal of building a system suitable for both interactive theorem proving and automated reasoning. Lean 2 and Lean 3 refined the system, and on Lean 3 the community built a substantial mathematical library, Mathlib. Lean 4 was a ground-up rewrite, largely self-hosting, with a powerful metaprogramming framework that allows users to extend the system.

Modern Landscape

Today’s theorem provers share many features despite their different foundations. Most support some form of dependent types, allowing types to depend on values and enabling precise specifications. Most provide tactic languages for interactive proof development alongside term-mode proof construction. Most include automation ranging from simple rewriting to sophisticated decision procedures. The systems differ in their logical foundations, their approach to equality and computation, their support for classical versus constructive reasoning, and their emphasis on programming versus pure mathematics. Lean occupies a distinctive position by providing classical logic by default while maintaining strong computational properties, and by treating programming and proving as equally important activities.

If you want to jump ahead to specific topics: Type Theory covers the theoretical foundations, Dependent Types explains how types can depend on values, Tactics covers interactive proof development, and Proof Strategy discusses automation including next-generation tactics like aesop and grind. The Artificial Intelligence article discusses where this is all heading.

The mathematical libraries built on these systems represent decades of formalization effort. Mathlib, Lean’s mathematical library, contains over a million lines of formalized mathematics covering undergraduate and graduate-level material across algebra, analysis, topology, number theory, and other areas. Similar libraries exist for other systems. These libraries demonstrate that modern proof assistants can handle serious mathematics, not just toy examples, and they provide the infrastructure needed for working mathematicians to formalize their own results.

The field continues to evolve rapidly. Large language models have recently shown promise in generating proof steps, raising the possibility of more powerful automation. Verification of software and hardware systems has become increasingly practical, with companies using theorem provers to ensure correctness of critical systems. The boundary between programming and proving continues to blur as dependent types become more accessible. Lean 4 represents the current state of the art in making these powerful tools usable for a broad audience of mathematicians and programmers.