LALRPOP
LALRPOP is an LR(1) parser generator for Rust that produces efficient, table-driven parsers from declarative grammars. Unlike parser combinators, LALRPOP generates parsers at compile time from grammar files, providing excellent parsing performance and compile-time validation of grammar correctness. The generated parsers handle left recursion naturally and provide precise error messages with conflict detection during grammar compilation.
For compiler development, LALRPOP offers the traditional parser generator experience familiar to users of yacc, bison, or ANTLR, but with Rust’s type safety and zero-cost abstractions. The generated code integrates seamlessly with Rust’s ownership system, producing typed ASTs without runtime overhead. LALRPOP excels at parsing programming languages where performance matters and grammar complexity requires the power of LR parsing.
Basic Calculator
A simple arithmetic expression parser demonstrates LALRPOP’s syntax:
#![allow(unused)] fn main() { use std::str::FromStr; grammar; pub Expr: i32 = { <l:Expr> "+" <r:Factor> => l + r, <l:Expr> "-" <r:Factor> => l - r, Factor, }; Factor: i32 = { <l:Factor> "*" <r:Term> => l * r, <l:Factor> "/" <r:Term> => l / r, Term, }; Term: i32 = { Number, "(" <Expr> ")", }; Number: i32 = { r"[0-9]+" => i32::from_str(<>).unwrap(), }; }
This grammar correctly handles operator precedence through rule stratification. Addition and subtraction are parsed at a lower precedence level than multiplication and division. The parser is left-associative, parsing 10 - 2 - 3
as (10 - 2) - 3 = 5
.
AST Construction
Building typed ASTs requires defining the types and constructing them in grammar actions:
#![allow(unused)] fn main() { impl Expr { /// Evaluate the expression pub fn eval(&self) -> f64 { match self { Expr::Number(n) => *n, Expr::Add(l, r) => l.eval() + r.eval(), Expr::Subtract(l, r) => l.eval() - r.eval(), Expr::Multiply(l, r) => l.eval() * r.eval(), Expr::Divide(l, r) => l.eval() / r.eval(), Expr::Negate(e) => -e.eval(), Expr::Variable(_) => panic!("Cannot evaluate variable without context"), Expr::Call(name, args) => match name.as_str() { "max" if args.len() == 2 => f64::max(args[0].eval(), args[1].eval()), "min" if args.len() == 2 => f64::min(args[0].eval(), args[1].eval()), "sqrt" if args.len() == 1 => args[0].eval().sqrt(), _ => panic!("Unknown function: {}", name), }, } } } /// Statement types for a simple language #[derive(Debug, Clone, PartialEq)] pub enum Statement { Expression(Expr), Assignment(String, Expr), Print(Expr), If(Expr, Vec<Statement>, Option<Vec<Statement>>), While(Expr, Vec<Statement>), } /// A complete program #[derive(Debug, Clone, PartialEq)] pub struct Program { pub statements: Vec<Statement>, } /// AST for a simple expression language #[derive(Debug, Clone, PartialEq)] pub enum Expr { Number(f64), Add(Box<Expr>, Box<Expr>), Subtract(Box<Expr>, Box<Expr>), Multiply(Box<Expr>, Box<Expr>), Divide(Box<Expr>, Box<Expr>), Negate(Box<Expr>), Variable(String), Call(String, Vec<Expr>), } }
The grammar constructs this AST:
#![allow(unused)] fn main() { use crate::ast::{Expr}; grammar; pub Expr: Expr = { <l:Expr> "+" <r:Term> => Expr::Add(Box::new(l), Box::new(r)), <l:Expr> "-" <r:Term> => Expr::Subtract(Box::new(l), Box::new(r)), Term, }; Term: Expr = { <l:Term> "*" <r:Factor> => Expr::Multiply(Box::new(l), Box::new(r)), <l:Term> "/" <r:Factor> => Expr::Divide(Box::new(l), Box::new(r)), Factor, }; Factor: Expr = { Primary, "-" <e:Factor> => Expr::Negate(Box::new(e)), }; Primary: Expr = { Number => Expr::Number(<>), Identifier => Expr::Variable(<>), "(" <Expr> ")", }; Number: f64 = { r"[0-9]+(\.[0-9]+)?" => f64::from_str(<>).unwrap(), }; Identifier: String = { r"[a-zA-Z][a-zA-Z0-9_]*" => <>.to_string(), }; }
Each production rule returns a value of the specified type. The angle bracket syntax <name:Rule>
binds matched values to variables used in the action code.
Statement Grammar
A more complete language with statements demonstrates complex grammar patterns:
#![allow(unused)] fn main() { /// AST for a simple expression language #[derive(Debug, Clone, PartialEq)] pub enum Expr { Number(f64), Add(Box<Expr>, Box<Expr>), Subtract(Box<Expr>, Box<Expr>), Multiply(Box<Expr>, Box<Expr>), Divide(Box<Expr>, Box<Expr>), Negate(Box<Expr>), Variable(String), Call(String, Vec<Expr>), } impl Expr { /// Evaluate the expression pub fn eval(&self) -> f64 { match self { Expr::Number(n) => *n, Expr::Add(l, r) => l.eval() + r.eval(), Expr::Subtract(l, r) => l.eval() - r.eval(), Expr::Multiply(l, r) => l.eval() * r.eval(), Expr::Divide(l, r) => l.eval() / r.eval(), Expr::Negate(e) => -e.eval(), Expr::Variable(_) => panic!("Cannot evaluate variable without context"), Expr::Call(name, args) => match name.as_str() { "max" if args.len() == 2 => f64::max(args[0].eval(), args[1].eval()), "min" if args.len() == 2 => f64::min(args[0].eval(), args[1].eval()), "sqrt" if args.len() == 1 => args[0].eval().sqrt(), _ => panic!("Unknown function: {}", name), }, } } } /// Statement types for a simple language #[derive(Debug, Clone, PartialEq)] pub enum Statement { Expression(Expr), Assignment(String, Expr), Print(Expr), If(Expr, Vec<Statement>, Option<Vec<Statement>>), While(Expr, Vec<Statement>), } /// A complete program #[derive(Debug, Clone, PartialEq)] pub struct Program { pub statements: Vec<Statement>, } }
#![allow(unused)] fn main() { /// AST for a simple expression language #[derive(Debug, Clone, PartialEq)] pub enum Expr { Number(f64), Add(Box<Expr>, Box<Expr>), Subtract(Box<Expr>, Box<Expr>), Multiply(Box<Expr>, Box<Expr>), Divide(Box<Expr>, Box<Expr>), Negate(Box<Expr>), Variable(String), Call(String, Vec<Expr>), } impl Expr { /// Evaluate the expression pub fn eval(&self) -> f64 { match self { Expr::Number(n) => *n, Expr::Add(l, r) => l.eval() + r.eval(), Expr::Subtract(l, r) => l.eval() - r.eval(), Expr::Multiply(l, r) => l.eval() * r.eval(), Expr::Divide(l, r) => l.eval() / r.eval(), Expr::Negate(e) => -e.eval(), Expr::Variable(_) => panic!("Cannot evaluate variable without context"), Expr::Call(name, args) => match name.as_str() { "max" if args.len() == 2 => f64::max(args[0].eval(), args[1].eval()), "min" if args.len() == 2 => f64::min(args[0].eval(), args[1].eval()), "sqrt" if args.len() == 1 => args[0].eval().sqrt(), _ => panic!("Unknown function: {}", name), }, } } } /// A complete program #[derive(Debug, Clone, PartialEq)] pub struct Program { pub statements: Vec<Statement>, } /// Statement types for a simple language #[derive(Debug, Clone, PartialEq)] pub enum Statement { Expression(Expr), Assignment(String, Expr), Print(Expr), If(Expr, Vec<Statement>, Option<Vec<Statement>>), While(Expr, Vec<Statement>), } }
The grammar for this language:
#![allow(unused)] fn main() { Statement: Statement = { <e:Expr> ";" => Statement::Expression(e), "let" <name:Identifier> "=" <e:Expr> ";" => Statement::Assignment(name, e), "print" <e:Expr> ";" => Statement::Print(e), "if" <cond:Expr> "{" <then:Statement*> "}" <els:("else" "{" <Statement*> "}")?> => { Statement::If(cond, then, els) }, "while" <cond:Expr> "{" <body:Statement*> "}" => Statement::While(cond, body), }; }
The *
operator creates lists of zero or more items. The ?
operator makes productions optional. Parentheses group sub-patterns for clarity.
Using External Lexers
LALRPOP can use external lexers like logos for improved performance and features:
#![allow(unused)] fn main() { use std::fmt; use logos::Logos; impl fmt::Display for Token { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { match self { Token::Plus => write!(f, "+"), Token::Minus => write!(f, "-"), Token::Star => write!(f, "*"), Token::Slash => write!(f, "/"), Token::Equals => write!(f, "="), Token::LeftParen => write!(f, "("), Token::RightParen => write!(f, ")"), Token::LeftBrace => write!(f, "{{"), Token::RightBrace => write!(f, "}}"), Token::Comma => write!(f, ","), Token::Semicolon => write!(f, ";"), Token::Let => write!(f, "let"), Token::If => write!(f, "if"), Token::Else => write!(f, "else"), Token::While => write!(f, "while"), Token::Print => write!(f, "print"), Token::True => write!(f, "true"), Token::False => write!(f, "false"), Token::Number(n) => write!(f, "{}", n), Token::Identifier(s) => write!(f, "{}", s), Token::StringLiteral(s) => write!(f, "\"{}\"", s), Token::Error => write!(f, "<error>"), } } } /// Token types for our language using logos #[derive(Logos, Debug, PartialEq, Clone)] pub enum Token { // Operators #[token("+")] Plus, #[token("-")] Minus, #[token("*")] Star, #[token("/")] Slash, #[token("=")] Equals, // Delimiters #[token("(")] LeftParen, #[token(")")] RightParen, #[token("{")] LeftBrace, #[token("}")] RightBrace, #[token(",")] Comma, #[token(";")] Semicolon, // Keywords #[token("let")] Let, #[token("if")] If, #[token("else")] Else, #[token("while")] While, #[token("print")] Print, #[token("true")] True, #[token("false")] False, // Literals #[regex(r"[0-9]+(\.[0-9]+)?", |lex| lex.slice().parse::<f64>().ok())] Number(f64), #[regex(r"[a-zA-Z][a-zA-Z0-9_]*", |lex| lex.slice().to_string())] Identifier(String), #[regex(r#""([^"\\]|\\.)*""#, |lex| { let s = lex.slice(); s[1..s.len()-1].to_string() })] StringLiteral(String), // Skip whitespace #[regex(r"[ \t\n\f]+", logos::skip)] #[regex(r"//[^\n]*", logos::skip)] Error, } }
The grammar declares the external token type:
#![allow(unused)] fn main() { extern { type Location = usize; type Error = String; enum Token { "+" => Token::Plus, "-" => Token::Minus, "*" => Token::Star, "/" => Token::Slash, "=" => Token::Equals, "(" => Token::LeftParen, ")" => Token::RightParen, "number" => Token::Number(<f64>), "identifier" => Token::Identifier(<String>), } } }
Terminal symbols in the grammar now refer to token variants. The angle brackets extract associated data from token variants.
Integration with Logos
Connecting logos lexer to LALRPOP parser:
#![allow(unused)] fn main() { pub mod ast; pub mod token; use lalrpop_util::lalrpop_mod; lalrpop_mod!(pub calculator_builtin); lalrpop_mod!(pub expression); lalrpop_mod!(pub expression_logos); lalrpop_mod!(pub left_recursion); use lalrpop_util::ParseError; use logos::Logos; /// Parse a simple calculator expression using built-in lexer pub fn parse_calculator(input: &str) -> Result<i32, String> { calculator_builtin::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Example of detailed error handling for parse errors pub fn parse_with_detailed_errors(input: &str) -> Result<i32, String> { let parser = calculator_builtin::ExprParser::new(); match parser.parse(input) { Ok(result) => Ok(result), Err(ParseError::InvalidToken { location }) => { Err(format!("Invalid token at position {}", location)) } Err(ParseError::UnrecognizedToken { token, expected }) => { let (start, _, end) = token; Err(format!( "Unexpected '{}' at position {}-{}, expected one of: {:?}", &input[start..end], start, end, expected )) } Err(ParseError::UnrecognizedEof { location, expected }) => Err(format!( "Unexpected end of input at position {}, expected: {:?}", location, expected )), Err(ParseError::ExtraToken { token }) => { let (start, _, end) = token; Err(format!( "Extra token '{}' at position {}-{} after valid input", &input[start..end], start, end )) } Err(ParseError::User { error }) => Err(format!("Parse error: {}", error)), } } /// Parse an expression language program using built-in lexer pub fn parse_expression(input: &str) -> Result<ast::Program, String> { expression::ProgramParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Example: Building a simple interpreter pub struct Interpreter { variables: std::collections::HashMap<String, f64>, } impl Interpreter { pub fn new() -> Self { Self { variables: std::collections::HashMap::new(), } } pub fn execute(&mut self, program: &ast::Program) -> Result<(), String> { for statement in &program.statements { self.execute_statement(statement)?; } Ok(()) } fn execute_statement(&mut self, stmt: &ast::Statement) -> Result<(), String> { match stmt { ast::Statement::Expression(expr) => { self.eval_expr(expr)?; Ok(()) } ast::Statement::Assignment(name, expr) => { let value = self.eval_expr(expr)?; self.variables.insert(name.clone(), value); Ok(()) } ast::Statement::Print(expr) => { let value = self.eval_expr(expr)?; println!("{}", value); Ok(()) } ast::Statement::If(cond, then_block, else_block) => { let cond_value = self.eval_expr(cond)?; if cond_value != 0.0 { for stmt in then_block { self.execute_statement(stmt)?; } } else if let Some(else_stmts) = else_block { for stmt in else_stmts { self.execute_statement(stmt)?; } } Ok(()) } ast::Statement::While(cond, body) => { while self.eval_expr(cond)? != 0.0 { for stmt in body { self.execute_statement(stmt)?; } } Ok(()) } } } fn eval_expr(&self, expr: &ast::Expr) -> Result<f64, String> { match expr { ast::Expr::Number(n) => Ok(*n), ast::Expr::Add(l, r) => Ok(self.eval_expr(l)? + self.eval_expr(r)?), ast::Expr::Subtract(l, r) => Ok(self.eval_expr(l)? - self.eval_expr(r)?), ast::Expr::Multiply(l, r) => Ok(self.eval_expr(l)? * self.eval_expr(r)?), ast::Expr::Divide(l, r) => { let divisor = self.eval_expr(r)?; if divisor == 0.0 { Err("Division by zero".to_string()) } else { Ok(self.eval_expr(l)? / divisor) } } ast::Expr::Negate(e) => Ok(-self.eval_expr(e)?), ast::Expr::Variable(name) => self .variables .get(name) .copied() .ok_or_else(|| format!("Undefined variable: {}", name)), ast::Expr::Call(name, args) => { let arg_values: Result<Vec<_>, _> = args.iter().map(|e| self.eval_expr(e)).collect(); let arg_values = arg_values?; match name.as_str() { "max" if arg_values.len() == 2 => Ok(f64::max(arg_values[0], arg_values[1])), "min" if arg_values.len() == 2 => Ok(f64::min(arg_values[0], arg_values[1])), "sqrt" if arg_values.len() == 1 => Ok(arg_values[0].sqrt()), "abs" if arg_values.len() == 1 => Ok(arg_values[0].abs()), _ => Err(format!("Unknown function: {}", name)), } } } } } impl Default for Interpreter { fn default() -> Self { Self::new() } } /// Demonstrate left vs right associativity parsing pub fn demonstrate_associativity(input: &str) -> (String, String) { let left = left_recursion::LeftAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); let right = left_recursion::RightAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); (left, right) } /// Parse a comma-separated list using left recursion pub fn parse_list_left(input: &str) -> Result<Vec<i32>, String> { left_recursion::CommaSeparatedLeftParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse field access chains like "obj.field1.field2" pub fn parse_field_access(input: &str) -> Result<String, String> { left_recursion::FieldAccessParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse method chains like "obj.method1().method2()" pub fn parse_method_chain(input: &str) -> Result<String, String> { left_recursion::MethodChainParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse expressions with full operator precedence pub fn parse_with_precedence(input: &str) -> Result<ast::Expr, String> { left_recursion::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } #[cfg(test)] mod tests { use super::*; #[test] fn test_calculator() { assert_eq!(parse_calculator("2 + 3 * 4").unwrap(), 14); assert_eq!(parse_calculator("(2 + 3) * 4").unwrap(), 20); assert_eq!(parse_calculator("10 - 2 - 3").unwrap(), 5); } #[test] fn test_expression_parser() { let program = parse_expression("let x = 10; let y = 20; print x + y;").unwrap(); assert_eq!(program.statements.len(), 3); } #[test] fn test_logos_parser() { let program = parse_with_logos("let x = 5; print x * 2;").unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_interpreter() { let mut interpreter = Interpreter::new(); let program = parse_expression( "let x = 10; let y = 20; let z = x + y; print z;", ) .unwrap(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("z").unwrap(), 30.0); } #[test] fn test_if_statement() { let program = parse_expression( "let x = 5; if x { let y = 10; }", ) .unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_function_calls() { let program = parse_expression( "let x = max(10, 20); let y = sqrt(16);", ) .unwrap(); let mut interpreter = Interpreter::new(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("x").unwrap(), 20.0); assert_eq!(*interpreter.variables.get("y").unwrap(), 4.0); } #[test] fn test_left_vs_right_associativity() { // Test that subtraction is left-associative // 10 - 5 - 2 should be (10 - 5) - 2 = 3 for left // and 10 - (5 - 2) = 7 for right let (left, right) = demonstrate_associativity("10 - 5 - 2"); assert!(left.contains("Subtract")); assert!(right.contains("Subtract")); } #[test] fn test_comma_separated_list() { let result = parse_list_left("1, 2, 3, 4, 5").unwrap(); assert_eq!(result, vec![1, 2, 3, 4, 5]); } #[test] fn test_field_access_chain() { let result = parse_field_access("obj.field1.field2.field3").unwrap(); assert_eq!(result, "obj.field1.field2.field3"); } #[test] fn test_method_chain() { let result = parse_method_chain("obj.method1().method2().method3()").unwrap(); assert_eq!(result, "obj.method1().method2().method3()"); } #[test] fn test_operator_precedence() { // Test that * has higher precedence than + // 2 + 3 * 4 should be 2 + (3 * 4) = 14 let expr = parse_with_precedence("2 + 3 * 4").unwrap(); assert_eq!(expr.eval(), 14.0); } } /// Parse using logos for lexing pub fn parse_with_logos(input: &str) -> Result<ast::Program, String> { let lexer = token::Token::lexer(input); let tokens: Result<Vec<_>, _> = lexer .spanned() .map(|(tok, span)| match tok { Ok(t) => Ok((span.start, t, span.end)), Err(_) => Err("Lexer error"), }) .collect(); match tokens { Ok(tokens) => expression_logos::ProgramParser::new() .parse(tokens) .map_err(|e| format!("Parse error: {:?}", e)), Err(e) => Err(e.to_string()), } } }
The lexer produces tokens with location information that LALRPOP uses for error reporting. This separation of concerns allows optimizing lexer and parser independently.
Helper Rules
Common patterns can be abstracted into parameterized rules:
#![allow(unused)] fn main() { Comma<T>: Vec<T> = { <mut v:(<T> ",")*> <e:T?> => match e { None => v, Some(e) => { v.push(e); v } } }; }
This generic rule parses comma-separated lists of any type. Use it like <args:Comma<Expr>>
to parse function arguments.
Build Configuration
LALRPOP requires a build script to generate parsers:
extern crate lalrpop; fn main() { lalrpop::process_root().unwrap(); }
The build script processes all .lalrpop
files in the source tree, generating corresponding Rust modules.
Using Generated Parsers
Import generated parsers with the lalrpop_mod macro and use them to parse input:
#![allow(unused)] fn main() { pub mod ast; pub mod token; use lalrpop_util::lalrpop_mod; lalrpop_mod!(pub calculator_builtin); lalrpop_mod!(pub expression); lalrpop_mod!(pub expression_logos); lalrpop_mod!(pub left_recursion); use lalrpop_util::ParseError; use logos::Logos; /// Example of detailed error handling for parse errors pub fn parse_with_detailed_errors(input: &str) -> Result<i32, String> { let parser = calculator_builtin::ExprParser::new(); match parser.parse(input) { Ok(result) => Ok(result), Err(ParseError::InvalidToken { location }) => { Err(format!("Invalid token at position {}", location)) } Err(ParseError::UnrecognizedToken { token, expected }) => { let (start, _, end) = token; Err(format!( "Unexpected '{}' at position {}-{}, expected one of: {:?}", &input[start..end], start, end, expected )) } Err(ParseError::UnrecognizedEof { location, expected }) => Err(format!( "Unexpected end of input at position {}, expected: {:?}", location, expected )), Err(ParseError::ExtraToken { token }) => { let (start, _, end) = token; Err(format!( "Extra token '{}' at position {}-{} after valid input", &input[start..end], start, end )) } Err(ParseError::User { error }) => Err(format!("Parse error: {}", error)), } } /// Parse an expression language program using built-in lexer pub fn parse_expression(input: &str) -> Result<ast::Program, String> { expression::ProgramParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse using logos for lexing pub fn parse_with_logos(input: &str) -> Result<ast::Program, String> { let lexer = token::Token::lexer(input); let tokens: Result<Vec<_>, _> = lexer .spanned() .map(|(tok, span)| match tok { Ok(t) => Ok((span.start, t, span.end)), Err(_) => Err("Lexer error"), }) .collect(); match tokens { Ok(tokens) => expression_logos::ProgramParser::new() .parse(tokens) .map_err(|e| format!("Parse error: {:?}", e)), Err(e) => Err(e.to_string()), } } /// Example: Building a simple interpreter pub struct Interpreter { variables: std::collections::HashMap<String, f64>, } impl Interpreter { pub fn new() -> Self { Self { variables: std::collections::HashMap::new(), } } pub fn execute(&mut self, program: &ast::Program) -> Result<(), String> { for statement in &program.statements { self.execute_statement(statement)?; } Ok(()) } fn execute_statement(&mut self, stmt: &ast::Statement) -> Result<(), String> { match stmt { ast::Statement::Expression(expr) => { self.eval_expr(expr)?; Ok(()) } ast::Statement::Assignment(name, expr) => { let value = self.eval_expr(expr)?; self.variables.insert(name.clone(), value); Ok(()) } ast::Statement::Print(expr) => { let value = self.eval_expr(expr)?; println!("{}", value); Ok(()) } ast::Statement::If(cond, then_block, else_block) => { let cond_value = self.eval_expr(cond)?; if cond_value != 0.0 { for stmt in then_block { self.execute_statement(stmt)?; } } else if let Some(else_stmts) = else_block { for stmt in else_stmts { self.execute_statement(stmt)?; } } Ok(()) } ast::Statement::While(cond, body) => { while self.eval_expr(cond)? != 0.0 { for stmt in body { self.execute_statement(stmt)?; } } Ok(()) } } } fn eval_expr(&self, expr: &ast::Expr) -> Result<f64, String> { match expr { ast::Expr::Number(n) => Ok(*n), ast::Expr::Add(l, r) => Ok(self.eval_expr(l)? + self.eval_expr(r)?), ast::Expr::Subtract(l, r) => Ok(self.eval_expr(l)? - self.eval_expr(r)?), ast::Expr::Multiply(l, r) => Ok(self.eval_expr(l)? * self.eval_expr(r)?), ast::Expr::Divide(l, r) => { let divisor = self.eval_expr(r)?; if divisor == 0.0 { Err("Division by zero".to_string()) } else { Ok(self.eval_expr(l)? / divisor) } } ast::Expr::Negate(e) => Ok(-self.eval_expr(e)?), ast::Expr::Variable(name) => self .variables .get(name) .copied() .ok_or_else(|| format!("Undefined variable: {}", name)), ast::Expr::Call(name, args) => { let arg_values: Result<Vec<_>, _> = args.iter().map(|e| self.eval_expr(e)).collect(); let arg_values = arg_values?; match name.as_str() { "max" if arg_values.len() == 2 => Ok(f64::max(arg_values[0], arg_values[1])), "min" if arg_values.len() == 2 => Ok(f64::min(arg_values[0], arg_values[1])), "sqrt" if arg_values.len() == 1 => Ok(arg_values[0].sqrt()), "abs" if arg_values.len() == 1 => Ok(arg_values[0].abs()), _ => Err(format!("Unknown function: {}", name)), } } } } } impl Default for Interpreter { fn default() -> Self { Self::new() } } /// Demonstrate left vs right associativity parsing pub fn demonstrate_associativity(input: &str) -> (String, String) { let left = left_recursion::LeftAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); let right = left_recursion::RightAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); (left, right) } /// Parse a comma-separated list using left recursion pub fn parse_list_left(input: &str) -> Result<Vec<i32>, String> { left_recursion::CommaSeparatedLeftParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse field access chains like "obj.field1.field2" pub fn parse_field_access(input: &str) -> Result<String, String> { left_recursion::FieldAccessParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse method chains like "obj.method1().method2()" pub fn parse_method_chain(input: &str) -> Result<String, String> { left_recursion::MethodChainParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse expressions with full operator precedence pub fn parse_with_precedence(input: &str) -> Result<ast::Expr, String> { left_recursion::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } #[cfg(test)] mod tests { use super::*; #[test] fn test_calculator() { assert_eq!(parse_calculator("2 + 3 * 4").unwrap(), 14); assert_eq!(parse_calculator("(2 + 3) * 4").unwrap(), 20); assert_eq!(parse_calculator("10 - 2 - 3").unwrap(), 5); } #[test] fn test_expression_parser() { let program = parse_expression("let x = 10; let y = 20; print x + y;").unwrap(); assert_eq!(program.statements.len(), 3); } #[test] fn test_logos_parser() { let program = parse_with_logos("let x = 5; print x * 2;").unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_interpreter() { let mut interpreter = Interpreter::new(); let program = parse_expression( "let x = 10; let y = 20; let z = x + y; print z;", ) .unwrap(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("z").unwrap(), 30.0); } #[test] fn test_if_statement() { let program = parse_expression( "let x = 5; if x { let y = 10; }", ) .unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_function_calls() { let program = parse_expression( "let x = max(10, 20); let y = sqrt(16);", ) .unwrap(); let mut interpreter = Interpreter::new(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("x").unwrap(), 20.0); assert_eq!(*interpreter.variables.get("y").unwrap(), 4.0); } #[test] fn test_left_vs_right_associativity() { // Test that subtraction is left-associative // 10 - 5 - 2 should be (10 - 5) - 2 = 3 for left // and 10 - (5 - 2) = 7 for right let (left, right) = demonstrate_associativity("10 - 5 - 2"); assert!(left.contains("Subtract")); assert!(right.contains("Subtract")); } #[test] fn test_comma_separated_list() { let result = parse_list_left("1, 2, 3, 4, 5").unwrap(); assert_eq!(result, vec![1, 2, 3, 4, 5]); } #[test] fn test_field_access_chain() { let result = parse_field_access("obj.field1.field2.field3").unwrap(); assert_eq!(result, "obj.field1.field2.field3"); } #[test] fn test_method_chain() { let result = parse_method_chain("obj.method1().method2().method3()").unwrap(); assert_eq!(result, "obj.method1().method2().method3()"); } #[test] fn test_operator_precedence() { // Test that * has higher precedence than + // 2 + 3 * 4 should be 2 + (3 * 4) = 14 let expr = parse_with_precedence("2 + 3 * 4").unwrap(); assert_eq!(expr.eval(), 14.0); } } /// Parse a simple calculator expression using built-in lexer pub fn parse_calculator(input: &str) -> Result<i32, String> { calculator_builtin::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } }
Each public rule in the grammar generates a corresponding parser struct with a parse
method.
Error Handling
LALRPOP provides detailed error information with location tracking and expected tokens:
#![allow(unused)] fn main() { pub mod ast; pub mod token; use lalrpop_util::lalrpop_mod; lalrpop_mod!(pub calculator_builtin); lalrpop_mod!(pub expression); lalrpop_mod!(pub expression_logos); lalrpop_mod!(pub left_recursion); use lalrpop_util::ParseError; use logos::Logos; /// Parse a simple calculator expression using built-in lexer pub fn parse_calculator(input: &str) -> Result<i32, String> { calculator_builtin::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse an expression language program using built-in lexer pub fn parse_expression(input: &str) -> Result<ast::Program, String> { expression::ProgramParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse using logos for lexing pub fn parse_with_logos(input: &str) -> Result<ast::Program, String> { let lexer = token::Token::lexer(input); let tokens: Result<Vec<_>, _> = lexer .spanned() .map(|(tok, span)| match tok { Ok(t) => Ok((span.start, t, span.end)), Err(_) => Err("Lexer error"), }) .collect(); match tokens { Ok(tokens) => expression_logos::ProgramParser::new() .parse(tokens) .map_err(|e| format!("Parse error: {:?}", e)), Err(e) => Err(e.to_string()), } } /// Example: Building a simple interpreter pub struct Interpreter { variables: std::collections::HashMap<String, f64>, } impl Interpreter { pub fn new() -> Self { Self { variables: std::collections::HashMap::new(), } } pub fn execute(&mut self, program: &ast::Program) -> Result<(), String> { for statement in &program.statements { self.execute_statement(statement)?; } Ok(()) } fn execute_statement(&mut self, stmt: &ast::Statement) -> Result<(), String> { match stmt { ast::Statement::Expression(expr) => { self.eval_expr(expr)?; Ok(()) } ast::Statement::Assignment(name, expr) => { let value = self.eval_expr(expr)?; self.variables.insert(name.clone(), value); Ok(()) } ast::Statement::Print(expr) => { let value = self.eval_expr(expr)?; println!("{}", value); Ok(()) } ast::Statement::If(cond, then_block, else_block) => { let cond_value = self.eval_expr(cond)?; if cond_value != 0.0 { for stmt in then_block { self.execute_statement(stmt)?; } } else if let Some(else_stmts) = else_block { for stmt in else_stmts { self.execute_statement(stmt)?; } } Ok(()) } ast::Statement::While(cond, body) => { while self.eval_expr(cond)? != 0.0 { for stmt in body { self.execute_statement(stmt)?; } } Ok(()) } } } fn eval_expr(&self, expr: &ast::Expr) -> Result<f64, String> { match expr { ast::Expr::Number(n) => Ok(*n), ast::Expr::Add(l, r) => Ok(self.eval_expr(l)? + self.eval_expr(r)?), ast::Expr::Subtract(l, r) => Ok(self.eval_expr(l)? - self.eval_expr(r)?), ast::Expr::Multiply(l, r) => Ok(self.eval_expr(l)? * self.eval_expr(r)?), ast::Expr::Divide(l, r) => { let divisor = self.eval_expr(r)?; if divisor == 0.0 { Err("Division by zero".to_string()) } else { Ok(self.eval_expr(l)? / divisor) } } ast::Expr::Negate(e) => Ok(-self.eval_expr(e)?), ast::Expr::Variable(name) => self .variables .get(name) .copied() .ok_or_else(|| format!("Undefined variable: {}", name)), ast::Expr::Call(name, args) => { let arg_values: Result<Vec<_>, _> = args.iter().map(|e| self.eval_expr(e)).collect(); let arg_values = arg_values?; match name.as_str() { "max" if arg_values.len() == 2 => Ok(f64::max(arg_values[0], arg_values[1])), "min" if arg_values.len() == 2 => Ok(f64::min(arg_values[0], arg_values[1])), "sqrt" if arg_values.len() == 1 => Ok(arg_values[0].sqrt()), "abs" if arg_values.len() == 1 => Ok(arg_values[0].abs()), _ => Err(format!("Unknown function: {}", name)), } } } } } impl Default for Interpreter { fn default() -> Self { Self::new() } } /// Demonstrate left vs right associativity parsing pub fn demonstrate_associativity(input: &str) -> (String, String) { let left = left_recursion::LeftAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); let right = left_recursion::RightAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); (left, right) } /// Parse a comma-separated list using left recursion pub fn parse_list_left(input: &str) -> Result<Vec<i32>, String> { left_recursion::CommaSeparatedLeftParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse field access chains like "obj.field1.field2" pub fn parse_field_access(input: &str) -> Result<String, String> { left_recursion::FieldAccessParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse method chains like "obj.method1().method2()" pub fn parse_method_chain(input: &str) -> Result<String, String> { left_recursion::MethodChainParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse expressions with full operator precedence pub fn parse_with_precedence(input: &str) -> Result<ast::Expr, String> { left_recursion::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } #[cfg(test)] mod tests { use super::*; #[test] fn test_calculator() { assert_eq!(parse_calculator("2 + 3 * 4").unwrap(), 14); assert_eq!(parse_calculator("(2 + 3) * 4").unwrap(), 20); assert_eq!(parse_calculator("10 - 2 - 3").unwrap(), 5); } #[test] fn test_expression_parser() { let program = parse_expression("let x = 10; let y = 20; print x + y;").unwrap(); assert_eq!(program.statements.len(), 3); } #[test] fn test_logos_parser() { let program = parse_with_logos("let x = 5; print x * 2;").unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_interpreter() { let mut interpreter = Interpreter::new(); let program = parse_expression( "let x = 10; let y = 20; let z = x + y; print z;", ) .unwrap(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("z").unwrap(), 30.0); } #[test] fn test_if_statement() { let program = parse_expression( "let x = 5; if x { let y = 10; }", ) .unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_function_calls() { let program = parse_expression( "let x = max(10, 20); let y = sqrt(16);", ) .unwrap(); let mut interpreter = Interpreter::new(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("x").unwrap(), 20.0); assert_eq!(*interpreter.variables.get("y").unwrap(), 4.0); } #[test] fn test_left_vs_right_associativity() { // Test that subtraction is left-associative // 10 - 5 - 2 should be (10 - 5) - 2 = 3 for left // and 10 - (5 - 2) = 7 for right let (left, right) = demonstrate_associativity("10 - 5 - 2"); assert!(left.contains("Subtract")); assert!(right.contains("Subtract")); } #[test] fn test_comma_separated_list() { let result = parse_list_left("1, 2, 3, 4, 5").unwrap(); assert_eq!(result, vec![1, 2, 3, 4, 5]); } #[test] fn test_field_access_chain() { let result = parse_field_access("obj.field1.field2.field3").unwrap(); assert_eq!(result, "obj.field1.field2.field3"); } #[test] fn test_method_chain() { let result = parse_method_chain("obj.method1().method2().method3()").unwrap(); assert_eq!(result, "obj.method1().method2().method3()"); } #[test] fn test_operator_precedence() { // Test that * has higher precedence than + // 2 + 3 * 4 should be 2 + (3 * 4) = 14 let expr = parse_with_precedence("2 + 3 * 4").unwrap(); assert_eq!(expr.eval(), 14.0); } } /// Example of detailed error handling for parse errors pub fn parse_with_detailed_errors(input: &str) -> Result<i32, String> { let parser = calculator_builtin::ExprParser::new(); match parser.parse(input) { Ok(result) => Ok(result), Err(ParseError::InvalidToken { location }) => { Err(format!("Invalid token at position {}", location)) } Err(ParseError::UnrecognizedToken { token, expected }) => { let (start, _, end) = token; Err(format!( "Unexpected '{}' at position {}-{}, expected one of: {:?}", &input[start..end], start, end, expected )) } Err(ParseError::UnrecognizedEof { location, expected }) => Err(format!( "Unexpected end of input at position {}, expected: {:?}", location, expected )), Err(ParseError::ExtraToken { token }) => { let (start, _, end) = token; Err(format!( "Extra token '{}' at position {}-{} after valid input", &input[start..end], start, end )) } Err(ParseError::User { error }) => Err(format!("Parse error: {}", error)), } } }
The error types include location information and expected tokens, enabling high-quality error messages. The parser tracks byte positions which can be converted to line and column numbers for user-friendly error reporting.
Precedence and Associativity
Operator precedence is controlled by grammar structure:
#![allow(unused)] fn main() { // Lower precedence Expr: Expr = { <l:Expr> "||" <r:AndExpr> => Expr::Or(Box::new(l), Box::new(r)), AndExpr, }; AndExpr: Expr = { <l:AndExpr> "&&" <r:CmpExpr> => Expr::And(Box::new(l), Box::new(r)), CmpExpr, }; // Higher precedence CmpExpr: Expr = { <l:CmpExpr> "==" <r:AddExpr> => Expr::Equal(Box::new(l), Box::new(r)), <l:CmpExpr> "!=" <r:AddExpr> => Expr::NotEqual(Box::new(l), Box::new(r)), AddExpr, }; }
Rules lower in the grammar hierarchy have higher precedence. Left recursion creates left associativity; right recursion creates right associativity.
Left Recursion
Unlike recursive descent parsers and PEG parsers, LALRPOP handles left recursion naturally and efficiently. This is a fundamental advantage of LR parsing that enables intuitive grammar definitions for left-associative operators and list constructions.
Consider the difference between left and right associative parsing for subtraction:
#![allow(unused)] fn main() { // LEFT RECURSIVE - parses "10 - 5 - 2" as (10 - 5) - 2 = 3 pub LeftAssociative: Expr = { <l:LeftAssociative> "-" <r:Term> => Expr::Subtract(Box::new(l), Box::new(r)), Term, }; // RIGHT RECURSIVE - parses "10 - 5 - 2" as 10 - (5 - 2) = 7 pub RightAssociative: Expr = { <l:Term> "-" <r:RightAssociative> => Expr::Subtract(Box::new(l), Box::new(r)), Term, }; }
The left recursive version correctly implements the standard mathematical interpretation where operations associate left to right. This natural expression of grammar rules is impossible in top-down parsers without transformation.
Left recursion excels at parsing lists that build incrementally:
#![allow(unused)] fn main() { // Builds list as items are encountered pub CommaSeparatedLeft: Vec<i32> = { <mut list:CommaSeparatedLeft> "," <item:Number> => { list.push(item); list }, <n:Number> => vec![n], }; }
Field access and method chaining naturally use left recursion:
#![allow(unused)] fn main() { // Parses "obj.field1.field2" correctly pub FieldAccess: String = { <obj:FieldAccess> "." <field:Identifier> => format!("{}.{}", obj, field), Identifier, }; // Parses "obj.method1().method2()" correctly pub MethodChain: String = { <obj:MethodChain> "." <method:Identifier> "(" ")" => format!("{}.{}()", obj, method), Identifier, }; }
These patterns appear frequently in programming languages where operations chain from left to right. The ability to express them directly as left recursive rules simplifies grammar development and improves parser performance.
Postfix operators also benefit from left recursion:
#![allow(unused)] fn main() { // Array indexing, function calls, and postfix increment PostfixExpr: Expr = { <e:PostfixExpr> "[" <index:Expr> "]" => Expr::Index(Box::new(e), Box::new(index)), <func:PostfixExpr> "(" <args:Arguments> ")" => Expr::Call(Box::new(func), args), <e:PostfixExpr> "++" => Expr::PostIncrement(Box::new(e)), PrimaryExpr, }; }
Testing associativity demonstrates the difference:
#![allow(unused)] fn main() { pub mod ast; pub mod token; use lalrpop_util::lalrpop_mod; lalrpop_mod!(pub calculator_builtin); lalrpop_mod!(pub expression); lalrpop_mod!(pub expression_logos); lalrpop_mod!(pub left_recursion); use lalrpop_util::ParseError; use logos::Logos; /// Parse a simple calculator expression using built-in lexer pub fn parse_calculator(input: &str) -> Result<i32, String> { calculator_builtin::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Example of detailed error handling for parse errors pub fn parse_with_detailed_errors(input: &str) -> Result<i32, String> { let parser = calculator_builtin::ExprParser::new(); match parser.parse(input) { Ok(result) => Ok(result), Err(ParseError::InvalidToken { location }) => { Err(format!("Invalid token at position {}", location)) } Err(ParseError::UnrecognizedToken { token, expected }) => { let (start, _, end) = token; Err(format!( "Unexpected '{}' at position {}-{}, expected one of: {:?}", &input[start..end], start, end, expected )) } Err(ParseError::UnrecognizedEof { location, expected }) => Err(format!( "Unexpected end of input at position {}, expected: {:?}", location, expected )), Err(ParseError::ExtraToken { token }) => { let (start, _, end) = token; Err(format!( "Extra token '{}' at position {}-{} after valid input", &input[start..end], start, end )) } Err(ParseError::User { error }) => Err(format!("Parse error: {}", error)), } } /// Parse an expression language program using built-in lexer pub fn parse_expression(input: &str) -> Result<ast::Program, String> { expression::ProgramParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse using logos for lexing pub fn parse_with_logos(input: &str) -> Result<ast::Program, String> { let lexer = token::Token::lexer(input); let tokens: Result<Vec<_>, _> = lexer .spanned() .map(|(tok, span)| match tok { Ok(t) => Ok((span.start, t, span.end)), Err(_) => Err("Lexer error"), }) .collect(); match tokens { Ok(tokens) => expression_logos::ProgramParser::new() .parse(tokens) .map_err(|e| format!("Parse error: {:?}", e)), Err(e) => Err(e.to_string()), } } /// Example: Building a simple interpreter pub struct Interpreter { variables: std::collections::HashMap<String, f64>, } impl Interpreter { pub fn new() -> Self { Self { variables: std::collections::HashMap::new(), } } pub fn execute(&mut self, program: &ast::Program) -> Result<(), String> { for statement in &program.statements { self.execute_statement(statement)?; } Ok(()) } fn execute_statement(&mut self, stmt: &ast::Statement) -> Result<(), String> { match stmt { ast::Statement::Expression(expr) => { self.eval_expr(expr)?; Ok(()) } ast::Statement::Assignment(name, expr) => { let value = self.eval_expr(expr)?; self.variables.insert(name.clone(), value); Ok(()) } ast::Statement::Print(expr) => { let value = self.eval_expr(expr)?; println!("{}", value); Ok(()) } ast::Statement::If(cond, then_block, else_block) => { let cond_value = self.eval_expr(cond)?; if cond_value != 0.0 { for stmt in then_block { self.execute_statement(stmt)?; } } else if let Some(else_stmts) = else_block { for stmt in else_stmts { self.execute_statement(stmt)?; } } Ok(()) } ast::Statement::While(cond, body) => { while self.eval_expr(cond)? != 0.0 { for stmt in body { self.execute_statement(stmt)?; } } Ok(()) } } } fn eval_expr(&self, expr: &ast::Expr) -> Result<f64, String> { match expr { ast::Expr::Number(n) => Ok(*n), ast::Expr::Add(l, r) => Ok(self.eval_expr(l)? + self.eval_expr(r)?), ast::Expr::Subtract(l, r) => Ok(self.eval_expr(l)? - self.eval_expr(r)?), ast::Expr::Multiply(l, r) => Ok(self.eval_expr(l)? * self.eval_expr(r)?), ast::Expr::Divide(l, r) => { let divisor = self.eval_expr(r)?; if divisor == 0.0 { Err("Division by zero".to_string()) } else { Ok(self.eval_expr(l)? / divisor) } } ast::Expr::Negate(e) => Ok(-self.eval_expr(e)?), ast::Expr::Variable(name) => self .variables .get(name) .copied() .ok_or_else(|| format!("Undefined variable: {}", name)), ast::Expr::Call(name, args) => { let arg_values: Result<Vec<_>, _> = args.iter().map(|e| self.eval_expr(e)).collect(); let arg_values = arg_values?; match name.as_str() { "max" if arg_values.len() == 2 => Ok(f64::max(arg_values[0], arg_values[1])), "min" if arg_values.len() == 2 => Ok(f64::min(arg_values[0], arg_values[1])), "sqrt" if arg_values.len() == 1 => Ok(arg_values[0].sqrt()), "abs" if arg_values.len() == 1 => Ok(arg_values[0].abs()), _ => Err(format!("Unknown function: {}", name)), } } } } } impl Default for Interpreter { fn default() -> Self { Self::new() } } /// Parse a comma-separated list using left recursion pub fn parse_list_left(input: &str) -> Result<Vec<i32>, String> { left_recursion::CommaSeparatedLeftParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse field access chains like "obj.field1.field2" pub fn parse_field_access(input: &str) -> Result<String, String> { left_recursion::FieldAccessParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse method chains like "obj.method1().method2()" pub fn parse_method_chain(input: &str) -> Result<String, String> { left_recursion::MethodChainParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse expressions with full operator precedence pub fn parse_with_precedence(input: &str) -> Result<ast::Expr, String> { left_recursion::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } #[cfg(test)] mod tests { use super::*; #[test] fn test_calculator() { assert_eq!(parse_calculator("2 + 3 * 4").unwrap(), 14); assert_eq!(parse_calculator("(2 + 3) * 4").unwrap(), 20); assert_eq!(parse_calculator("10 - 2 - 3").unwrap(), 5); } #[test] fn test_expression_parser() { let program = parse_expression("let x = 10; let y = 20; print x + y;").unwrap(); assert_eq!(program.statements.len(), 3); } #[test] fn test_logos_parser() { let program = parse_with_logos("let x = 5; print x * 2;").unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_interpreter() { let mut interpreter = Interpreter::new(); let program = parse_expression( "let x = 10; let y = 20; let z = x + y; print z;", ) .unwrap(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("z").unwrap(), 30.0); } #[test] fn test_if_statement() { let program = parse_expression( "let x = 5; if x { let y = 10; }", ) .unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_function_calls() { let program = parse_expression( "let x = max(10, 20); let y = sqrt(16);", ) .unwrap(); let mut interpreter = Interpreter::new(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("x").unwrap(), 20.0); assert_eq!(*interpreter.variables.get("y").unwrap(), 4.0); } #[test] fn test_left_vs_right_associativity() { // Test that subtraction is left-associative // 10 - 5 - 2 should be (10 - 5) - 2 = 3 for left // and 10 - (5 - 2) = 7 for right let (left, right) = demonstrate_associativity("10 - 5 - 2"); assert!(left.contains("Subtract")); assert!(right.contains("Subtract")); } #[test] fn test_comma_separated_list() { let result = parse_list_left("1, 2, 3, 4, 5").unwrap(); assert_eq!(result, vec![1, 2, 3, 4, 5]); } #[test] fn test_field_access_chain() { let result = parse_field_access("obj.field1.field2.field3").unwrap(); assert_eq!(result, "obj.field1.field2.field3"); } #[test] fn test_method_chain() { let result = parse_method_chain("obj.method1().method2().method3()").unwrap(); assert_eq!(result, "obj.method1().method2().method3()"); } #[test] fn test_operator_precedence() { // Test that * has higher precedence than + // 2 + 3 * 4 should be 2 + (3 * 4) = 14 let expr = parse_with_precedence("2 + 3 * 4").unwrap(); assert_eq!(expr.eval(), 14.0); } } /// Demonstrate left vs right associativity parsing pub fn demonstrate_associativity(input: &str) -> (String, String) { let left = left_recursion::LeftAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); let right = left_recursion::RightAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); (left, right) } }
The function parses the same input with both left and right associative grammars, revealing how the parse tree structure differs. For the expression “10 - 5 - 2”, left association produces 3 while right association produces 7.
Complex expressions with multiple precedence levels all use left recursion:
#![allow(unused)] fn main() { BinaryOp: Expr = { <l:BinaryOp> "||" <r:AndExpr> => Expr::Or(Box::new(l), Box::new(r)), AndExpr, }; AndExpr: Expr = { <l:AndExpr> "&&" <r:EqExpr> => Expr::And(Box::new(l), Box::new(r)), EqExpr, }; AddExpr: Expr = { <l:AddExpr> "+" <r:MulExpr> => Expr::Add(Box::new(l), Box::new(r)), <l:AddExpr> "-" <r:MulExpr> => Expr::Subtract(Box::new(l), Box::new(r)), MulExpr, }; }
Each level of the precedence hierarchy uses left recursion to ensure operators associate correctly. This pattern scales to arbitrarily complex expression grammars while maintaining readability and performance.
The LR parsing algorithm builds the parse tree bottom-up, naturally handling left recursion without stack overflow issues that plague recursive descent parsers. This fundamental difference makes LALRPOP ideal for parsing programming languages with complex expression syntax.
Building an Interpreter
LALRPOP-generated parsers integrate well with interpreters:
#![allow(unused)] fn main() { pub mod ast; pub mod token; use lalrpop_util::lalrpop_mod; lalrpop_mod!(pub calculator_builtin); lalrpop_mod!(pub expression); lalrpop_mod!(pub expression_logos); lalrpop_mod!(pub left_recursion); use lalrpop_util::ParseError; use logos::Logos; /// Parse a simple calculator expression using built-in lexer pub fn parse_calculator(input: &str) -> Result<i32, String> { calculator_builtin::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Example of detailed error handling for parse errors pub fn parse_with_detailed_errors(input: &str) -> Result<i32, String> { let parser = calculator_builtin::ExprParser::new(); match parser.parse(input) { Ok(result) => Ok(result), Err(ParseError::InvalidToken { location }) => { Err(format!("Invalid token at position {}", location)) } Err(ParseError::UnrecognizedToken { token, expected }) => { let (start, _, end) = token; Err(format!( "Unexpected '{}' at position {}-{}, expected one of: {:?}", &input[start..end], start, end, expected )) } Err(ParseError::UnrecognizedEof { location, expected }) => Err(format!( "Unexpected end of input at position {}, expected: {:?}", location, expected )), Err(ParseError::ExtraToken { token }) => { let (start, _, end) = token; Err(format!( "Extra token '{}' at position {}-{} after valid input", &input[start..end], start, end )) } Err(ParseError::User { error }) => Err(format!("Parse error: {}", error)), } } /// Parse an expression language program using built-in lexer pub fn parse_expression(input: &str) -> Result<ast::Program, String> { expression::ProgramParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse using logos for lexing pub fn parse_with_logos(input: &str) -> Result<ast::Program, String> { let lexer = token::Token::lexer(input); let tokens: Result<Vec<_>, _> = lexer .spanned() .map(|(tok, span)| match tok { Ok(t) => Ok((span.start, t, span.end)), Err(_) => Err("Lexer error"), }) .collect(); match tokens { Ok(tokens) => expression_logos::ProgramParser::new() .parse(tokens) .map_err(|e| format!("Parse error: {:?}", e)), Err(e) => Err(e.to_string()), } } impl Interpreter { pub fn new() -> Self { Self { variables: std::collections::HashMap::new(), } } pub fn execute(&mut self, program: &ast::Program) -> Result<(), String> { for statement in &program.statements { self.execute_statement(statement)?; } Ok(()) } fn execute_statement(&mut self, stmt: &ast::Statement) -> Result<(), String> { match stmt { ast::Statement::Expression(expr) => { self.eval_expr(expr)?; Ok(()) } ast::Statement::Assignment(name, expr) => { let value = self.eval_expr(expr)?; self.variables.insert(name.clone(), value); Ok(()) } ast::Statement::Print(expr) => { let value = self.eval_expr(expr)?; println!("{}", value); Ok(()) } ast::Statement::If(cond, then_block, else_block) => { let cond_value = self.eval_expr(cond)?; if cond_value != 0.0 { for stmt in then_block { self.execute_statement(stmt)?; } } else if let Some(else_stmts) = else_block { for stmt in else_stmts { self.execute_statement(stmt)?; } } Ok(()) } ast::Statement::While(cond, body) => { while self.eval_expr(cond)? != 0.0 { for stmt in body { self.execute_statement(stmt)?; } } Ok(()) } } } fn eval_expr(&self, expr: &ast::Expr) -> Result<f64, String> { match expr { ast::Expr::Number(n) => Ok(*n), ast::Expr::Add(l, r) => Ok(self.eval_expr(l)? + self.eval_expr(r)?), ast::Expr::Subtract(l, r) => Ok(self.eval_expr(l)? - self.eval_expr(r)?), ast::Expr::Multiply(l, r) => Ok(self.eval_expr(l)? * self.eval_expr(r)?), ast::Expr::Divide(l, r) => { let divisor = self.eval_expr(r)?; if divisor == 0.0 { Err("Division by zero".to_string()) } else { Ok(self.eval_expr(l)? / divisor) } } ast::Expr::Negate(e) => Ok(-self.eval_expr(e)?), ast::Expr::Variable(name) => self .variables .get(name) .copied() .ok_or_else(|| format!("Undefined variable: {}", name)), ast::Expr::Call(name, args) => { let arg_values: Result<Vec<_>, _> = args.iter().map(|e| self.eval_expr(e)).collect(); let arg_values = arg_values?; match name.as_str() { "max" if arg_values.len() == 2 => Ok(f64::max(arg_values[0], arg_values[1])), "min" if arg_values.len() == 2 => Ok(f64::min(arg_values[0], arg_values[1])), "sqrt" if arg_values.len() == 1 => Ok(arg_values[0].sqrt()), "abs" if arg_values.len() == 1 => Ok(arg_values[0].abs()), _ => Err(format!("Unknown function: {}", name)), } } } } } impl Default for Interpreter { fn default() -> Self { Self::new() } } /// Demonstrate left vs right associativity parsing pub fn demonstrate_associativity(input: &str) -> (String, String) { let left = left_recursion::LeftAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); let right = left_recursion::RightAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); (left, right) } /// Parse a comma-separated list using left recursion pub fn parse_list_left(input: &str) -> Result<Vec<i32>, String> { left_recursion::CommaSeparatedLeftParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse field access chains like "obj.field1.field2" pub fn parse_field_access(input: &str) -> Result<String, String> { left_recursion::FieldAccessParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse method chains like "obj.method1().method2()" pub fn parse_method_chain(input: &str) -> Result<String, String> { left_recursion::MethodChainParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse expressions with full operator precedence pub fn parse_with_precedence(input: &str) -> Result<ast::Expr, String> { left_recursion::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } #[cfg(test)] mod tests { use super::*; #[test] fn test_calculator() { assert_eq!(parse_calculator("2 + 3 * 4").unwrap(), 14); assert_eq!(parse_calculator("(2 + 3) * 4").unwrap(), 20); assert_eq!(parse_calculator("10 - 2 - 3").unwrap(), 5); } #[test] fn test_expression_parser() { let program = parse_expression("let x = 10; let y = 20; print x + y;").unwrap(); assert_eq!(program.statements.len(), 3); } #[test] fn test_logos_parser() { let program = parse_with_logos("let x = 5; print x * 2;").unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_interpreter() { let mut interpreter = Interpreter::new(); let program = parse_expression( "let x = 10; let y = 20; let z = x + y; print z;", ) .unwrap(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("z").unwrap(), 30.0); } #[test] fn test_if_statement() { let program = parse_expression( "let x = 5; if x { let y = 10; }", ) .unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_function_calls() { let program = parse_expression( "let x = max(10, 20); let y = sqrt(16);", ) .unwrap(); let mut interpreter = Interpreter::new(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("x").unwrap(), 20.0); assert_eq!(*interpreter.variables.get("y").unwrap(), 4.0); } #[test] fn test_left_vs_right_associativity() { // Test that subtraction is left-associative // 10 - 5 - 2 should be (10 - 5) - 2 = 3 for left // and 10 - (5 - 2) = 7 for right let (left, right) = demonstrate_associativity("10 - 5 - 2"); assert!(left.contains("Subtract")); assert!(right.contains("Subtract")); } #[test] fn test_comma_separated_list() { let result = parse_list_left("1, 2, 3, 4, 5").unwrap(); assert_eq!(result, vec![1, 2, 3, 4, 5]); } #[test] fn test_field_access_chain() { let result = parse_field_access("obj.field1.field2.field3").unwrap(); assert_eq!(result, "obj.field1.field2.field3"); } #[test] fn test_method_chain() { let result = parse_method_chain("obj.method1().method2().method3()").unwrap(); assert_eq!(result, "obj.method1().method2().method3()"); } #[test] fn test_operator_precedence() { // Test that * has higher precedence than + // 2 + 3 * 4 should be 2 + (3 * 4) = 14 let expr = parse_with_precedence("2 + 3 * 4").unwrap(); assert_eq!(expr.eval(), 14.0); } } /// Example: Building a simple interpreter pub struct Interpreter { variables: std::collections::HashMap<String, f64>, } }
#![allow(unused)] fn main() { pub mod ast; pub mod token; use lalrpop_util::lalrpop_mod; lalrpop_mod!(pub calculator_builtin); lalrpop_mod!(pub expression); lalrpop_mod!(pub expression_logos); lalrpop_mod!(pub left_recursion); use lalrpop_util::ParseError; use logos::Logos; /// Parse a simple calculator expression using built-in lexer pub fn parse_calculator(input: &str) -> Result<i32, String> { calculator_builtin::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Example of detailed error handling for parse errors pub fn parse_with_detailed_errors(input: &str) -> Result<i32, String> { let parser = calculator_builtin::ExprParser::new(); match parser.parse(input) { Ok(result) => Ok(result), Err(ParseError::InvalidToken { location }) => { Err(format!("Invalid token at position {}", location)) } Err(ParseError::UnrecognizedToken { token, expected }) => { let (start, _, end) = token; Err(format!( "Unexpected '{}' at position {}-{}, expected one of: {:?}", &input[start..end], start, end, expected )) } Err(ParseError::UnrecognizedEof { location, expected }) => Err(format!( "Unexpected end of input at position {}, expected: {:?}", location, expected )), Err(ParseError::ExtraToken { token }) => { let (start, _, end) = token; Err(format!( "Extra token '{}' at position {}-{} after valid input", &input[start..end], start, end )) } Err(ParseError::User { error }) => Err(format!("Parse error: {}", error)), } } /// Parse an expression language program using built-in lexer pub fn parse_expression(input: &str) -> Result<ast::Program, String> { expression::ProgramParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse using logos for lexing pub fn parse_with_logos(input: &str) -> Result<ast::Program, String> { let lexer = token::Token::lexer(input); let tokens: Result<Vec<_>, _> = lexer .spanned() .map(|(tok, span)| match tok { Ok(t) => Ok((span.start, t, span.end)), Err(_) => Err("Lexer error"), }) .collect(); match tokens { Ok(tokens) => expression_logos::ProgramParser::new() .parse(tokens) .map_err(|e| format!("Parse error: {:?}", e)), Err(e) => Err(e.to_string()), } } /// Example: Building a simple interpreter pub struct Interpreter { variables: std::collections::HashMap<String, f64>, } impl Default for Interpreter { fn default() -> Self { Self::new() } } /// Demonstrate left vs right associativity parsing pub fn demonstrate_associativity(input: &str) -> (String, String) { let left = left_recursion::LeftAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); let right = left_recursion::RightAssociativeParser::new() .parse(input) .map(|e| format!("{:?}", e)) .unwrap_or_else(|e| format!("Error: {:?}", e)); (left, right) } /// Parse a comma-separated list using left recursion pub fn parse_list_left(input: &str) -> Result<Vec<i32>, String> { left_recursion::CommaSeparatedLeftParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse field access chains like "obj.field1.field2" pub fn parse_field_access(input: &str) -> Result<String, String> { left_recursion::FieldAccessParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse method chains like "obj.method1().method2()" pub fn parse_method_chain(input: &str) -> Result<String, String> { left_recursion::MethodChainParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } /// Parse expressions with full operator precedence pub fn parse_with_precedence(input: &str) -> Result<ast::Expr, String> { left_recursion::ExprParser::new() .parse(input) .map_err(|e| format!("Parse error: {:?}", e)) } #[cfg(test)] mod tests { use super::*; #[test] fn test_calculator() { assert_eq!(parse_calculator("2 + 3 * 4").unwrap(), 14); assert_eq!(parse_calculator("(2 + 3) * 4").unwrap(), 20); assert_eq!(parse_calculator("10 - 2 - 3").unwrap(), 5); } #[test] fn test_expression_parser() { let program = parse_expression("let x = 10; let y = 20; print x + y;").unwrap(); assert_eq!(program.statements.len(), 3); } #[test] fn test_logos_parser() { let program = parse_with_logos("let x = 5; print x * 2;").unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_interpreter() { let mut interpreter = Interpreter::new(); let program = parse_expression( "let x = 10; let y = 20; let z = x + y; print z;", ) .unwrap(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("z").unwrap(), 30.0); } #[test] fn test_if_statement() { let program = parse_expression( "let x = 5; if x { let y = 10; }", ) .unwrap(); assert_eq!(program.statements.len(), 2); } #[test] fn test_function_calls() { let program = parse_expression( "let x = max(10, 20); let y = sqrt(16);", ) .unwrap(); let mut interpreter = Interpreter::new(); interpreter.execute(&program).unwrap(); assert_eq!(*interpreter.variables.get("x").unwrap(), 20.0); assert_eq!(*interpreter.variables.get("y").unwrap(), 4.0); } #[test] fn test_left_vs_right_associativity() { // Test that subtraction is left-associative // 10 - 5 - 2 should be (10 - 5) - 2 = 3 for left // and 10 - (5 - 2) = 7 for right let (left, right) = demonstrate_associativity("10 - 5 - 2"); assert!(left.contains("Subtract")); assert!(right.contains("Subtract")); } #[test] fn test_comma_separated_list() { let result = parse_list_left("1, 2, 3, 4, 5").unwrap(); assert_eq!(result, vec![1, 2, 3, 4, 5]); } #[test] fn test_field_access_chain() { let result = parse_field_access("obj.field1.field2.field3").unwrap(); assert_eq!(result, "obj.field1.field2.field3"); } #[test] fn test_method_chain() { let result = parse_method_chain("obj.method1().method2().method3()").unwrap(); assert_eq!(result, "obj.method1().method2().method3()"); } #[test] fn test_operator_precedence() { // Test that * has higher precedence than + // 2 + 3 * 4 should be 2 + (3 * 4) = 14 let expr = parse_with_precedence("2 + 3 * 4").unwrap(); assert_eq!(expr.eval(), 14.0); } } impl Interpreter { pub fn new() -> Self { Self { variables: std::collections::HashMap::new(), } } pub fn execute(&mut self, program: &ast::Program) -> Result<(), String> { for statement in &program.statements { self.execute_statement(statement)?; } Ok(()) } fn execute_statement(&mut self, stmt: &ast::Statement) -> Result<(), String> { match stmt { ast::Statement::Expression(expr) => { self.eval_expr(expr)?; Ok(()) } ast::Statement::Assignment(name, expr) => { let value = self.eval_expr(expr)?; self.variables.insert(name.clone(), value); Ok(()) } ast::Statement::Print(expr) => { let value = self.eval_expr(expr)?; println!("{}", value); Ok(()) } ast::Statement::If(cond, then_block, else_block) => { let cond_value = self.eval_expr(cond)?; if cond_value != 0.0 { for stmt in then_block { self.execute_statement(stmt)?; } } else if let Some(else_stmts) = else_block { for stmt in else_stmts { self.execute_statement(stmt)?; } } Ok(()) } ast::Statement::While(cond, body) => { while self.eval_expr(cond)? != 0.0 { for stmt in body { self.execute_statement(stmt)?; } } Ok(()) } } } fn eval_expr(&self, expr: &ast::Expr) -> Result<f64, String> { match expr { ast::Expr::Number(n) => Ok(*n), ast::Expr::Add(l, r) => Ok(self.eval_expr(l)? + self.eval_expr(r)?), ast::Expr::Subtract(l, r) => Ok(self.eval_expr(l)? - self.eval_expr(r)?), ast::Expr::Multiply(l, r) => Ok(self.eval_expr(l)? * self.eval_expr(r)?), ast::Expr::Divide(l, r) => { let divisor = self.eval_expr(r)?; if divisor == 0.0 { Err("Division by zero".to_string()) } else { Ok(self.eval_expr(l)? / divisor) } } ast::Expr::Negate(e) => Ok(-self.eval_expr(e)?), ast::Expr::Variable(name) => self .variables .get(name) .copied() .ok_or_else(|| format!("Undefined variable: {}", name)), ast::Expr::Call(name, args) => { let arg_values: Result<Vec<_>, _> = args.iter().map(|e| self.eval_expr(e)).collect(); let arg_values = arg_values?; match name.as_str() { "max" if arg_values.len() == 2 => Ok(f64::max(arg_values[0], arg_values[1])), "min" if arg_values.len() == 2 => Ok(f64::min(arg_values[0], arg_values[1])), "sqrt" if arg_values.len() == 1 => Ok(arg_values[0].sqrt()), "abs" if arg_values.len() == 1 => Ok(arg_values[0].abs()), _ => Err(format!("Unknown function: {}", name)), } } } } } }
The interpreter walks the AST, maintaining variable bindings and executing statements. This separation of parsing and execution allows optimization and analysis passes between parsing and execution.
Conflict Resolution
LALRPOP detects grammar conflicts at compile time:
error: ambiguity detected
The following symbols can be reduced in two ways:
Expr "+" Expr
They could be reduced like so:
Expr = Expr "+" Expr
Or they could be reduced like so:
Expr = Expr, "+" Expr
Resolve conflicts by restructuring the grammar or using precedence annotations. LALRPOP’s error messages pinpoint the exact productions causing conflicts.
Performance Optimization
LALRPOP generates table-driven parsers with excellent performance characteristics. The parsing algorithm is O(n) for valid input with no backtracking. Tables are computed at compile time, so runtime overhead is minimal.
For maximum performance, use external lexers like logos that produce tokens in a single pass. The combination of logos lexing and LALRPOP parsing can process millions of lines per second.
Best Practices
Structure grammars for clarity and maintainability. Group related productions together and use comments to explain complex patterns. Keep action code simple, delegating complex logic to separate functions.
Use typed ASTs to catch errors at compile time. The type system ensures grammar productions and AST construction remain synchronized. Changes to the AST that break the grammar are caught during compilation.
Test grammars thoroughly with both valid and invalid input. LALRPOP’s error reporting helps debug grammar issues, but comprehensive tests ensure the parser accepts the intended language.
Profile parser performance on realistic input. While LALRPOP generates efficient parsers, grammar structure affects performance. Minimize ambiguity and left-factorize common prefixes when performance matters.