System F Type Checker
While the simple lambda calculus with HM-style type schemes introduces safety and limited polymorphism, it is also very rigid and not very expressive. So now we move onto System F, also known as the polymorphic lambda calculus, smashes this limitation by introducing parametric polymorphism: the ability to write a single piece of code that is generic over types. This is the theoretical foundation for generics in languages like Rust, Java, and C++, and it represents a monumental leap in expressive power.
To achieve this, System F extends both the term and type languages. The most crucial addition to the type language is the universal quantifier, \(\forall\) (forall), which allows us to express the type of a polymorphic function.
\[ \begin{align*} \text{types} \quad \tau &::= \alpha \mid \tau_1 \to \tau_2 \mid \forall \alpha . \tau \mid \text{Int} \mid \text{Bool} \\ \text{expressions} \quad e &::= x \mid \lambda x : \tau . e \mid e_1 \ e_2 \mid \Lambda \alpha . e \mid e[\tau] \mid \dots \end{align*} \]
The type \(\forall \alpha . \tau\) is represented by Type::Forall(String, Box<Type>)
. The term language now includes two new constructs specifically for handling polymorphism:
- Type Abstraction (\(\Lambda \alpha . e\)): This creates a polymorphic function. The capital lambda (\(\Lambda\)) signifies that we are abstracting over a type variable \(\alpha\), not a term variable. This expression is represented by
Expr::TAbs(String, Box<Expr>)
. - Type Application (\(e[\tau]\)): This specializes a polymorphic function by applying it to a concrete type \(\tau\). It’s how we use a generic function. This is represented by
Expr::TApp(Box<Expr>, Box<Type>)
.
A key difference from the untyped lambda calculus is that our term-level abstractions (\(\lambda\)) are now explicitly annotated: \(\lambda x : \tau . e\). The programmer must declare the type of the function’s parameter. This is a hallmark of System F; its power comes at the cost of full type inference, necessitating these annotations.
The quintessential example of System F’s power is the polymorphic identity function, id
. We can now write a single identity function that works for any type \(\alpha\). We create it with a type abstraction and give its parameter x
the generic type \(\alpha\):
\[ \text{id} = \Lambda \alpha . \lambda x : \alpha . x \]
The type of this function is \(\forall \alpha . \alpha \to \alpha\). This type says: “For all types \(\alpha\), I am a function that takes an argument of type \(\alpha\) and returns a value of type \(\alpha\).”
To use this polymorphic function, we apply it to a type using type application. For example, to get an identity function specifically for integers, we apply id
to the type \(\text{Int}\):
\[ \text{id}[\text{Int}] \]
This is a computation that happens at the type-checking level. The result of this type application is a new, specialized expression, \(\lambda x : \text{Int} . x\), which has the corresponding specialized type \(\text{Int} \to \text{Int}\). We can then apply this specialized function to a value, like \((\lambda x : \text{Int} . x) \ 5\), which finally reduces to \(5\). This separation of type-level application (\(e[\tau]\)) and term-level application (\(e_1 \ e_2\)) is fundamental to System F. It provides a powerful and safe way to write generic code, but as we have seen, it forces the programmer to be more explicit with annotations, a tradeoff that directly motivates the bidirectional algorithms used in more modern language implementations.
Abstract Syntax Trees
Let’s look at the full abstract syntax tree for both the type language (enum Type
) and the term language (enum Expr
):
The Type Level
#![allow(unused)] fn main() { #[derive(Debug, Clone, PartialEq)] pub enum Expr { Var(String), App(Box<Expr>, Box<Expr>), Abs(String, Box<Type>, Box<Expr>), TApp(Box<Expr>, Box<Type>), TAbs(String, Box<Expr>), Ann(Box<Expr>, Box<Type>), // Type annotation: e : T LitInt(i64), LitBool(bool), // New constructs Let(String, Box<Expr>, Box<Expr>), // let x = e1 in e2 IfThenElse(Box<Expr>, Box<Expr>, Box<Expr>), // if e1 then e2 else e3 BinOp(BinOp, Box<Expr>, Box<Expr>), // Binary operations } #[derive(Debug, Clone, PartialEq)] pub enum BinOp { // Arithmetic Add, // + Sub, // - Mul, // * Div, // / // Boolean And, // && Or, // || // Comparison Eq, // == Ne, // != Lt, // < Le, // <= Gt, // > Ge, // >= } impl std::fmt::Display for Type { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { match self { Type::Var(name) => write!(f, "{}", name), Type::ETVar(name) => write!(f, "^{}", name), Type::Int => write!(f, "Int"), Type::Bool => write!(f, "Bool"), Type::Arrow(t1, t2) => { // Add parentheses around left side if it's an arrow to avoid ambiguity match t1.as_ref() { Type::Arrow(_, _) => write!(f, "({}) -> {}", t1, t2), _ => write!(f, "{} -> {}", t1, t2), } } Type::Forall(var, ty) => write!(f, "∀{}. {}", var, ty), } } } #[derive(Debug, Clone, PartialEq)] pub enum Type { Var(String), // α (ordinary type variable) ETVar(String), // ^α (existential type variable) Arrow(Box<Type>, Box<Type>), // A -> B Forall(String, Box<Type>), // ∀α. A Int, // Int Bool, // Bool } }
The Type
enum defines the grammar for all possible types in our language. It’s the vocabulary we use to describe our expressions.
-
Type::Var(String)
: This represents a simple type variable, like \(\alpha\) or \(\beta\). These are placeholders for types that will be specified later, often used in polymorphic functions. -
Type::ETVar(String)
: This represents an “existential type variable,” often written as \(^\alpha\). These are a special kind of variable used internally by the type checker, particularly in bidirectional algorithms. They act as placeholders for an unknown type that the algorithm needs to infer. -
Type::Arrow(Box<Type>, Box<Type>)
: This is the function type, \(\tau_1 \to \tau_2\). It represents a function that takes an argument of the first type and returns a result of the second type. -
Type::Forall(String, Box<Type>)
: This is the universal quantifier, \(\forall \alpha . \tau\). It is the cornerstone of System F and represents a polymorphic type. It reads: “for all types \(\alpha\), the following type \(\tau\) holds.” TheString
is the name of the type variable \(\alpha\) being bound. -
Type::Int
andType::Bool
: These are primitive, or base, types. They are concrete types that are built directly into the language, representing 64-bit integers and booleans, respectively.
The Value Level
The Expr
enum defines the grammar for all runnable expressions or terms. This is the code that performs computations.
#![allow(unused)] fn main() { #[derive(Debug, Clone, PartialEq)] pub enum BinOp { // Arithmetic Add, // + Sub, // - Mul, // * Div, // / // Boolean And, // && Or, // || // Comparison Eq, // == Ne, // != Lt, // < Le, // <= Gt, // > Ge, // >= } #[derive(Debug, Clone, PartialEq)] pub enum Type { Var(String), // α (ordinary type variable) ETVar(String), // ^α (existential type variable) Arrow(Box<Type>, Box<Type>), // A -> B Forall(String, Box<Type>), // ∀α. A Int, // Int Bool, // Bool } impl std::fmt::Display for Type { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { match self { Type::Var(name) => write!(f, "{}", name), Type::ETVar(name) => write!(f, "^{}", name), Type::Int => write!(f, "Int"), Type::Bool => write!(f, "Bool"), Type::Arrow(t1, t2) => { // Add parentheses around left side if it's an arrow to avoid ambiguity match t1.as_ref() { Type::Arrow(_, _) => write!(f, "({}) -> {}", t1, t2), _ => write!(f, "{} -> {}", t1, t2), } } Type::Forall(var, ty) => write!(f, "∀{}. {}", var, ty), } } } #[derive(Debug, Clone, PartialEq)] pub enum Expr { Var(String), App(Box<Expr>, Box<Expr>), Abs(String, Box<Type>, Box<Expr>), TApp(Box<Expr>, Box<Type>), TAbs(String, Box<Expr>), Ann(Box<Expr>, Box<Type>), // Type annotation: e : T LitInt(i64), LitBool(bool), // New constructs Let(String, Box<Expr>, Box<Expr>), // let x = e1 in e2 IfThenElse(Box<Expr>, Box<Expr>, Box<Expr>), // if e1 then e2 else e3 BinOp(BinOp, Box<Expr>, Box<Expr>), // Binary operations } }
-
Var(String)
: A term-level variable, likex
orf
. It refers to a value that is in scope, such as a function parameter or alet
-bound variable. -
App(Box<Expr>, Box<Expr>)
: This is function application, \(e_1 \ e_2\). It represents calling the function \(e_1\) with the argument \(e_2\). -
Abs(String, Box<Type>, Box<Expr>)
: This is a typed lambda abstraction, \(\lambda x : \tau . e\). It defines an anonymous function. Unlike in the pure lambda calculus, the parameter (String
) must have an explicit type annotation (Box<Type>
). The finalBox<Expr>
is the function’s body. -
TApp(Box<Expr>, Box<Type>)
: This is type application, \(e[\tau]\). This is the mechanism for specializing a polymorphic function. The expression \(e\) must have aForall
type, and this construct applies it to a specific type \(\tau\), effectively “filling in” the generic type parameter. -
TAbs(String, Box<Expr>)
: This is type abstraction, \(\Lambda \alpha . e\). This is how we create a polymorphic function. It introduces a new type variable (String
) that can be used within the expression body (Box<Expr>
). -
Ann(Box<Expr>, Box<Type>)
: This represents a type annotation, \(e : T\). It’s an explicit instruction to the type checker, asserting that the expression \(e\) should have the type \(T\). This is invaluable in a bidirectional system for guiding the inference process and resolving ambiguity. -
LitInt(i64)
andLitBool(bool)
: These are literal values. They represent concrete, primitive values that are “wired into” the language, corresponding to the base typesInt
andBool
. They are the simplest form of expression, representing a constant value.