chumsky
Chumsky is a parser combinator library that emphasizes error recovery, performance, and ease of use. Unlike traditional parser generators, chumsky builds parsers from small, composable functions that can be combined to parse complex grammars. The library excels at providing detailed error messages and recovering from parse errors to continue processing malformed input.
Parser combinators in chumsky follow a functional programming style where parsers are values that can be composed using combinator functions. Each parser is a value implementing the `Parser` trait: it consumes input and produces either a parsed value or an error. The library provides extensive built-in combinators for common patterns like repetition, choice, and sequencing.
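To make the idea of "parsers as composable values" concrete, here is a minimal standard-library sketch. It is not how chumsky is implemented: `ParseResult`, `ch`, `then`, `or`, and `many` are hypothetical names invented for this illustration, and errors are plain strings rather than rich error values.

```rust
// Hypothetical model: a parser is a function from input to
// (value, remaining input) or an error message. Not chumsky's real types.
pub type ParseResult<'a, T> = Result<(T, &'a str), String>;

/// Match exactly one expected character.
pub fn ch<'a>(expected: char) -> impl Fn(&'a str) -> ParseResult<'a, char> {
    move |input: &'a str| match input.chars().next() {
        Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
        Some(c) => Err(format!("expected '{}', found '{}'", expected, c)),
        None => Err(format!("expected '{}', found end of input", expected)),
    }
}

/// Sequencing: run `first`, then run `second` on the remaining input.
pub fn then<'a, A, B>(
    first: impl Fn(&'a str) -> ParseResult<'a, A>,
    second: impl Fn(&'a str) -> ParseResult<'a, B>,
) -> impl Fn(&'a str) -> ParseResult<'a, (A, B)> {
    move |input: &'a str| {
        let (a, rest) = first(input)?;
        let (b, rest) = second(rest)?;
        Ok(((a, b), rest))
    }
}

/// Choice: try `first`; if it fails, try `second` on the same input.
pub fn or<'a, T>(
    first: impl Fn(&'a str) -> ParseResult<'a, T>,
    second: impl Fn(&'a str) -> ParseResult<'a, T>,
) -> impl Fn(&'a str) -> ParseResult<'a, T> {
    move |input: &'a str| first(input).or_else(|_| second(input))
}

/// Repetition: apply `parser` zero or more times, collecting the outputs.
/// Assumes `parser` consumes input whenever it succeeds.
pub fn many<'a, T>(
    parser: impl Fn(&'a str) -> ParseResult<'a, T>,
) -> impl Fn(&'a str) -> ParseResult<'a, Vec<T>> {
    move |mut input: &'a str| {
        let mut items = Vec::new();
        while let Ok((item, rest)) = parser(input) {
            items.push(item);
            input = rest;
        }
        Ok((items, input))
    }
}
```

Each combinator takes parsers and returns a new parser, so `then(ch('a'), ch('b'))` is itself a value that can be passed to `many` or `or`. Chumsky's combinators follow the same compositional shape, with far richer error and input handling.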
Core Parser Types
```rust
/// AST node for expressions
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(f64),
    Identifier(String),
    Binary(BinOp, Box<Expr>, Box<Expr>),
    Unary(UnOp, Box<Expr>),
    Call(String, Vec<Expr>),
    Let(String, Box<Expr>, Box<Expr>),
}
```
The expression type represents the abstract syntax tree nodes that parsers produce. Chumsky parsers transform character streams into structured data like this AST.
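As a rough illustration of what "structured data" buys you, the sketch below builds by hand the tree a parser would produce for `2 + 3 * 4` and walks it. The enums are copied from this section; `sample_tree` and `eval` are hypothetical helpers that are not part of the original code, and `eval` covers only the arithmetic subset.

```rust
/// Binary operators (copied from this section).
#[derive(Debug, Clone, PartialEq)]
pub enum BinOp { Add, Sub, Mul, Div, Eq, Lt }

/// Unary operators (copied from this section).
#[derive(Debug, Clone, PartialEq)]
pub enum UnOp { Neg, Not }

/// AST node for expressions (copied from this section).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(f64),
    Identifier(String),
    Binary(BinOp, Box<Expr>, Box<Expr>),
    Unary(UnOp, Box<Expr>),
    Call(String, Vec<Expr>),
    Let(String, Box<Expr>, Box<Expr>),
}

/// Hypothetical helper: the tree for `2 + 3 * 4`, with `*` nested under `+`.
pub fn sample_tree() -> Expr {
    Expr::Binary(
        BinOp::Add,
        Box::new(Expr::Number(2.0)),
        Box::new(Expr::Binary(
            BinOp::Mul,
            Box::new(Expr::Number(3.0)),
            Box::new(Expr::Number(4.0)),
        )),
    )
}

/// Hypothetical evaluator for the purely arithmetic subset of `Expr`.
/// Returns `None` for node kinds it does not handle.
pub fn eval(expr: &Expr) -> Option<f64> {
    match expr {
        Expr::Number(n) => Some(*n),
        Expr::Unary(UnOp::Neg, inner) => eval(inner).map(|v| -v),
        Expr::Binary(op, left, right) => {
            let (l, r) = (eval(left)?, eval(right)?);
            match op {
                BinOp::Add => Some(l + r),
                BinOp::Sub => Some(l - r),
                BinOp::Mul => Some(l * r),
                BinOp::Div => Some(l / r),
                // Comparisons are outside this sketch's scope.
                BinOp::Eq | BinOp::Lt => None,
            }
        }
        // Identifiers, calls, and let-bindings would need an environment.
        _ => None,
    }
}
```

Because the multiplication sits in the right child of the addition, evaluating the tree needs no knowledge of precedence; the parser has already encoded it in the tree's shape.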
```rust
/// Binary operators
#[derive(Debug, Clone, PartialEq)]
pub enum BinOp {
    Add,
    Sub,
    Mul,
    Div,
    Eq,
    Lt,
}
```
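The binary operator type lists the operators an expression can contain. Precedence and associativity are not stored in the enum itself; they come from how the parsers for each operator are composed, which determines the shape of the resulting tree.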
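To see how associativity shows up in the tree rather than in the operator type, the sketch below renders two different trees for the same token sequence `8 - 3 - 2`. The `Expr` here is a reduced copy with only two variants, and `symbol`, `render`, `num`, `left_assoc`, and `right_assoc` are hypothetical helpers; the `==` spelling for `Eq` is an assumption, since the source does not say how that operator is written.

```rust
/// Binary operators (copied from this section).
#[derive(Debug, Clone, PartialEq)]
pub enum BinOp { Add, Sub, Mul, Div, Eq, Lt }

/// Reduced copy of `Expr` with only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(f64),
    Binary(BinOp, Box<Expr>, Box<Expr>),
}

/// Hypothetical helper: a printable symbol for each operator.
/// The spelling of `Eq` as `==` is an assumption.
pub fn symbol(op: &BinOp) -> &'static str {
    match op {
        BinOp::Add => "+",
        BinOp::Sub => "-",
        BinOp::Mul => "*",
        BinOp::Div => "/",
        BinOp::Eq => "==",
        BinOp::Lt => "<",
    }
}

/// Hypothetical helper: render a tree with every grouping made explicit.
pub fn render(expr: &Expr) -> String {
    match expr {
        Expr::Number(n) => n.to_string(),
        Expr::Binary(op, left, right) => {
            format!("({} {} {})", render(left), symbol(op), render(right))
        }
    }
}

/// Shorthand for a boxed number literal.
pub fn num(n: f64) -> Box<Expr> {
    Box::new(Expr::Number(n))
}

/// `8 - 3 - 2` grouped to the left, as a left-associative parser builds it.
pub fn left_assoc() -> Expr {
    Expr::Binary(
        BinOp::Sub,
        Box::new(Expr::Binary(BinOp::Sub, num(8.0), num(3.0))),
        num(2.0),
    )
}

/// The same tokens grouped to the right, which changes the meaning.
pub fn right_assoc() -> Expr {
    Expr::Binary(
        BinOp::Sub,
        num(8.0),
        Box::new(Expr::Binary(BinOp::Sub, num(3.0), num(2.0))),
    )
}
```

Both trees use the same `BinOp::Sub` values; only the nesting differs, and that nesting is decided entirely by the parser.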
Building Expression Parsers
```rust
use chumsky::error::Rich;
use chumsky::extra;
use chumsky::prelude::*;

/// Parse a simple expression language with operator precedence
pub fn expr_parser<'src>() -> impl Parser<'src, &'src str, Expr, extra::Err<Rich<'src, char>>> {
    // Take the slice before padding so names exclude surrounding whitespace.
    let ident = text::ident().to_slice().padded();

    // Integer part with an optional fractional part, parsed from the matched slice.
    let number = text::int(10)
        .then(just('.').then(text::digits(10)).or_not())
        .to_slice()
        .map(|s: &str| Expr::Number(s.parse().unwrap()))
        .padded();

    // Inside the closure, `expr` refers to the parser being defined (`atom`),
    // which is what allows atoms to nest.
    let atom = recursive(|expr| {
        let args = expr
            .clone()
            .separated_by(just(','))
            .allow_trailing()
            .collect()
            .delimited_by(just('('), just(')'));

        let call = ident
            .then(args)
            .map(|(name, args): (&str, Vec<Expr>)| Expr::Call(name.to_string(), args));

        let let_binding = text::keyword("let")
            .ignore_then(ident)
            .then_ignore(just('='))
            .then(expr.clone())
            .then_ignore(text::keyword("in"))
            .then(expr.clone())
            .map(|((name, value), body): ((&str, Expr), Expr)| {
                Expr::Let(name.to_string(), Box::new(value), Box::new(body))
            });

        choice((
            number,
            call,
            let_binding,
            ident.map(|s: &str| Expr::Identifier(s.to_string())),
            expr.delimited_by(just('('), just(')')),
        ))
    })
    .padded();

    // Zero or more leading minus signs, each wrapping the atom in a negation.
    let unary = just('-')
        .repeated()
        .collect::<Vec<_>>()
        .then(atom.clone())
        .map(|(ops, expr)| {
            ops.into_iter()
                .fold(expr, |expr, _| Expr::Unary(UnOp::Neg, Box::new(expr)))
        });

    // Multiplication and division bind more tightly than addition and subtraction.
    let product = unary.clone().foldl(
        choice((just('*').to(BinOp::Mul), just('/').to(BinOp::Div)))
            .then(unary)
            .repeated(),
        |left, (op, right)| Expr::Binary(op, Box::new(left), Box::new(right)),
    );

    let sum = product.clone().foldl(
        choice((just('+').to(BinOp::Add), just('-').to(BinOp::Sub)))
            .then(product)
            .repeated(),
        |left, (op, right)| Expr::Binary(op, Box::new(left), Box::new(right)),
    );

    // Require that the whole input is consumed.
    sum.then_ignore(end())
}
```
The expression parser showcases several key chumsky features. The `recursive` combinator enables parsing recursive structures like nested expressions. The `choice` combinator tries its alternatives in order until one succeeds. The `foldl` combinator builds left-associative binary operations by folding a sequence of (operator, operand) pairs into an initial operand.
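The folding step can be pictured with the standard library alone. The sketch below is not chumsky code: `fold_left` is a hypothetical stand-in that applies the same closure shape used with `foldl` in this section to an already-parsed first operand and a list of (operator, operand) pairs. The enums are reduced copies with only the variants needed here.

```rust
/// Reduced copy of `BinOp` for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum BinOp { Add, Sub }

/// Reduced copy of `Expr` for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(f64),
    Binary(BinOp, Box<Expr>, Box<Expr>),
}

/// Hypothetical stand-in for the folding step: start with the first operand
/// and wrap the accumulated tree as the LEFT child for each following pair.
pub fn fold_left(first: Expr, rest: Vec<(BinOp, Expr)>) -> Expr {
    rest.into_iter().fold(first, |left, (op, right)| {
        Expr::Binary(op, Box::new(left), Box::new(right))
    })
}
```

For `1 - 2 + 3`, the first operand is `1` and the pairs are `(-, 2)` and `(+, 3)`. The fold first builds `1 - 2`, then makes that tree the left child of the addition, giving `(1 - 2) + 3`. An empty pair list returns the first operand unchanged, which is why a lone number still parses as a valid sum.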
Operator precedence emerges naturally from parser structure. Parsers for higher-precedence operators like multiplication sit deeper in the combinator chain: `sum` is built from `product`, which is built from `unary`, so multiplication binds more tightly than addition or subtraction. The `then` combinator sequences parsers, while `map` transforms parsed values into AST nodes.
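The same layering can be seen in a hand-written recursive-descent evaluator, a different technique from chumsky's combinators but with an identical precedence structure. Everything in this sketch (`Cursor`, its methods, and `evaluate`) is hypothetical; it handles only unsigned integers and the four arithmetic operators, and it computes a value directly instead of building an AST.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Cursor over the input characters; a hypothetical type for this sketch.
pub struct Cursor<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Cursor<'a> {
    pub fn new(input: &'a str) -> Self {
        Cursor { chars: input.chars().peekable() }
    }

    /// Skip whitespace, then report the next character without consuming it.
    fn peek(&mut self) -> Option<char> {
        while self.chars.peek().map_or(false, |c| c.is_whitespace()) {
            self.chars.next();
        }
        self.chars.peek().copied()
    }

    /// Lowest level: an unsigned integer literal.
    fn atom(&mut self) -> Option<f64> {
        self.peek()?; // skips leading whitespace; fails at end of input
        let mut digits = String::new();
        while let Some(c) = self.chars.peek().copied().filter(|c| c.is_ascii_digit()) {
            digits.push(c);
            self.chars.next();
        }
        digits.parse().ok()
    }

    /// Middle level: atoms joined by `*` or `/`, grouped to the left.
    fn product(&mut self) -> Option<f64> {
        let mut value = self.atom()?;
        while let Some(op) = self.peek().filter(|c| *c == '*' || *c == '/') {
            self.chars.next();
            let rhs = self.atom()?;
            value = if op == '*' { value * rhs } else { value / rhs };
        }
        Some(value)
    }

    /// Top level: products joined by `+` or `-`, grouped to the left.
    fn sum(&mut self) -> Option<f64> {
        let mut value = self.product()?;
        while let Some(op) = self.peek().filter(|c| *c == '+' || *c == '-') {
            self.chars.next();
            let rhs = self.product()?;
            value = if op == '+' { value + rhs } else { value - rhs };
        }
        Some(value)
    }
}

/// Evaluate an arithmetic expression, rejecting trailing input
/// in the same spirit as `then_ignore(end())`.
pub fn evaluate(input: &str) -> Option<f64> {
    let mut cursor = Cursor::new(input);
    let value = cursor.sum()?;
    if cursor.peek().is_none() { Some(value) } else { None }
}
```

Because `sum` can only see whole products, `3 * 4` is always reduced before the surrounding addition is applied. The chumsky version expresses the same dependency by passing `product` into the definition of `sum`.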
Lexical Analysis
```rust
/// Token type for lexing
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(f64),
    Identifier(String),
    Keyword(String),
    Op(char),
    Delimiter(char),
}
```
While chumsky can parse character streams directly, separate lexical analysis often improves performance and error messages for complex languages.
use chumsky::error::Rich;
use chumsky::extra;
use chumsky::prelude::*;

/// Token type for lexing
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(f64),
    Identifier(String),
    Keyword(String),
    Op(char),
    Delimiter(char),
}

/// Lexer that produces tokens with spans
pub fn lexer<'src>(
) -> impl Parser<'src, &'src str, Vec<(Token, SimpleSpan)>, extra::Err<Rich<'src, char>>> {
    // Integer part with an optional fractional part, kept as a slice of the input.
    let number = text::int(10)
        .then(just('.').then(text::digits(10)).or_not())
        .to_slice()
        .map(|s: &str| Token::Number(s.parse().unwrap()));

    // Keywords are split from ordinary identifiers here, not in the grammar.
    let identifier = text::ident().to_slice().map(|s: &str| match s {
        "let" | "in" | "if" | "then" | "else" => Token::Keyword(s.to_string()),
        _ => Token::Identifier(s.to_string()),
    });

    let op = one_of("+-*/=<>!&|").map(Token::Op);
    let delimiter = one_of("(){}[],;").map(Token::Delimiter);

    let token = choice((number, identifier, op, delimiter)).padded_by(text::whitespace());

    // Pair every token with the span it was parsed from.
    token
        .map_with(|tok, e| (tok, e.span()))
        .repeated()
        .collect()
        .then_ignore(end())
}
The lexer demonstrates span tracking, which records the source location of each token. The map_with combinator passes a second argument to its closure, and calling e.span() on it yields the span of the value just parsed. Attaching this location information to parsed values enables precise error reporting. Keywords are distinguished from identifiers during lexing rather than parsing, simplifying the grammar.
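To make the idea of span tracking concrete without depending on chumsky, here is a minimal hand-written sketch that uses only the standard library. The names Span, Tok, and lex_with_spans are hypothetical and are not part of chumsky's API. The sketch only shows how byte offsets can be recorded next to each token and how keywords can be separated from identifiers while lexing.

```rust
/// Hypothetical half-open byte range into the source text (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical token type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Number(f64),
    Identifier(String),
    Keyword(String),
    Op(char),
}

/// Lex `src` into tokens, each paired with the span it was read from.
/// On failure, returns the byte offset of the first unrecognised character.
pub fn lex_with_spans(src: &str) -> Result<Vec<(Tok, Span)>, usize> {
    let bytes = src.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let c = bytes[i] as char;
        if c.is_ascii_whitespace() {
            i += 1;
            continue;
        }
        let start = i;
        let tok = if c.is_ascii_digit() {
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            // Optional fractional part: '.' followed by at least one digit.
            if i + 1 < bytes.len() && bytes[i] == b'.' && bytes[i + 1].is_ascii_digit() {
                i += 1;
                while i < bytes.len() && bytes[i].is_ascii_digit() {
                    i += 1;
                }
            }
            Tok::Number(src[start..i].parse::<f64>().map_err(|_| start)?)
        } else if c.is_ascii_alphabetic() || c == '_' {
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            let word = &src[start..i];
            // Keywords are recognised here, so the grammar never has to.
            match word {
                "let" | "in" | "if" | "then" | "else" => Tok::Keyword(word.to_string()),
                _ => Tok::Identifier(word.to_string()),
            }
        } else if "+-*/=<>!&|".contains(c) {
            i += 1;
            Tok::Op(c)
        } else {
            return Err(start);
        };
        out.push((tok, Span { start, end: i }));
    }
    Ok(out)
}
```

Because every token carries its own span, a later parsing stage can point at the exact bytes of an offending token instead of reporting only a token index.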
Error Recovery
use chumsky::error::Rich;
use chumsky::extra;
use chumsky::prelude::*;

/// Parser with error recovery
// `Expr` is the AST type defined earlier in this chapter.
pub fn robust_parser<'src>(
) -> impl Parser<'src, &'src str, Vec<Expr>, extra::Err<Rich<'src, char>>> {
    let ident = text::ident().padded().to_slice();

    let number = text::int(10)
        .then(just('.').then(text::digits(10)).or_not())
        .to_slice()
        .map(|s: &str| Expr::Number(s.parse().unwrap_or(0.0)))
        .padded();

    let expr = recursive(|expr| {
        let atom = choice((
            number,
            ident.map(|s: &str| Expr::Identifier(s.to_string())),
            // If a parenthesised expression fails to parse, skip the balanced
            // group and substitute a placeholder node.
            expr.clone()
                .delimited_by(just('('), just(')'))
                .recover_with(via_parser(nested_delimiters(
                    '(',
                    ')',
                    [('{', '}'), ('[', ']')],
                    |_| Expr::Number(0.0),
                ))),
        ));

        atom
    });

    expr.separated_by(just(';'))
        .allow_leading()
        .allow_trailing()
        .collect()
        .then_ignore(end())
}
Error recovery allows parsers to continue processing after encountering errors, producing partial results and multiple error messages. The recover_with combinator specifies recovery strategies for specific error conditions. The nested_delimiters recovery strategy handles mismatched parentheses by searching for the appropriate closing delimiter.
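The search for a matching closing delimiter can be sketched with the standard library alone. The function skip_balanced below is a hypothetical helper, not chumsky's implementation of nested_delimiters. It assumes the input is already a slice of characters and that the caller supplies the delimiter pairs to track.

```rust
/// Return the index just past the delimiter that closes the one opened at
/// `open_at`, or `None` if the input ends first or the nesting is inconsistent.
/// (Hypothetical helper for illustration; not a chumsky function.)
pub fn skip_balanced(src: &[char], open_at: usize, pairs: &[(char, char)]) -> Option<usize> {
    // The character at `open_at` must be one of the known opening delimiters.
    let &(_, first_close) = pairs
        .iter()
        .find(|(open, _)| src.get(open_at) == Some(open))?;

    // Stack of closing delimiters we still expect, innermost last.
    let mut stack = vec![first_close];
    let mut i = open_at + 1;
    while i < src.len() {
        let c = src[i];
        if let Some(&(_, close)) = pairs.iter().find(|(open, _)| *open == c) {
            stack.push(close);
        } else if pairs.iter().any(|(_, close)| *close == c) {
            // A closing delimiter must match the innermost open one.
            if stack.pop() != Some(c) {
                return None;
            }
            if stack.is_empty() {
                return Some(i + 1);
            }
        }
        i += 1;
    }
    None
}
```

A recovering parser built on this idea would record an error for the skipped region, substitute a placeholder node (the example above uses Expr::Number(0.0)), and resume parsing at the returned index.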
Recovery strategies help development tools provide better user experiences. IDEs can show multiple syntax errors simultaneously, and compilers can report more problems in a single run. The separated_by combinator with allow_trailing handles comma-separated lists gracefully, even with trailing commas.
Custom Combinators
use chumsky::error::Rich;
use chumsky::extra;
use chumsky::prelude::*;

/// Custom parser combinator for binary operators with precedence
// `Expr` and `BinOp` are the AST types defined earlier in this chapter.
pub fn binary_op_parser<'src>(
    ops: &[(&'src str, BinOp)],
    next: impl Parser<'src, &'src str, Expr, extra::Err<Rich<'src, char>>> + Clone + 'src,
) -> impl Parser<'src, &'src str, Expr, extra::Err<Rich<'src, char>>> + Clone + 'src {
    // One alternative per operator string, each producing its `BinOp`.
    let op = choice(
        ops.iter()
            .map(|(s, op)| just(*s).to(op.clone()))
            .collect::<Vec<_>>(),
    );

    // Parse `next (op next)*` and fold the pairs into a left-leaning tree.
    next.clone()
        .foldl(op.then(next).repeated(), |left, (op, right)| {
            Expr::Binary(op, Box::new(left), Box::new(right))
        })
}
Custom combinators encapsulate common parsing patterns for reuse across different parts of a grammar. This binary operator parser handles any set of operators at the same precedence level, building left-associative expressions. It accepts any list of operator strings paired with BinOp values, together with any parser for the operands, which is typically the parser for the next-higher precedence level.
Creating domain-specific combinators improves grammar readability and reduces duplication. Common patterns in a language can be abstracted into reusable components that compose naturally with built-in combinators.
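The left-associative fold at the heart of such a combinator can be shown in isolation. The sketch below uses only the standard library, and the names Ast, fold_left, and eval are hypothetical. It assumes the first operand and the following (operator, operand) pairs have already been parsed, which is the shape that a foldl over a repeated pair produces.

```rust
/// Hypothetical miniature AST for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Ast {
    Num(f64),
    Bin(char, Box<Ast>, Box<Ast>),
}

/// Fold a first operand and a list of `(operator, operand)` pairs into a
/// left-leaning tree, so that `a - b - c` becomes `(a - b) - c`.
pub fn fold_left(first: Ast, rest: Vec<(char, Ast)>) -> Ast {
    rest.into_iter().fold(first, |left, (op, right)| {
        Ast::Bin(op, Box::new(left), Box::new(right))
    })
}

/// Evaluate the tree, which makes the effect of associativity observable.
pub fn eval(ast: &Ast) -> f64 {
    match ast {
        Ast::Num(n) => *n,
        Ast::Bin('+', l, r) => eval(l) + eval(r),
        Ast::Bin('-', l, r) => eval(l) - eval(r),
        Ast::Bin('*', l, r) => eval(l) * eval(r),
        Ast::Bin('/', l, r) => eval(l) / eval(r),
        Ast::Bin(op, _, _) => panic!("unknown operator {op}"),
    }
}
```

Folding 10, then (-, 4) and (-, 3), yields a tree that evaluates as (10 - 4) - 3. A right-leaning tree would instead evaluate as 10 - (4 - 3), which is why the fold direction matters for subtraction and division.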
Validation and Semantic Analysis
use chumsky::error::Rich;
use chumsky::extra;
use chumsky::prelude::*;

// `Expr` is the AST type defined earlier in this chapter.
pub fn validated_parser<'src>(
) -> impl Parser<'src, &'src str, Expr, extra::Err<Rich<'src, char>>> {
    // Reject numeric literals that do not parse as `f64`, with a custom message.
    let number = text::int(10)
        .then(just('.').then(text::digits(10)).or_not())
        .to_slice()
        .try_map(|s: &str, span| {
            s.parse::<f64>()
                .map(Expr::Number)
                .map_err(|_| Rich::custom(span, format!("Invalid number: {}", s)))
        });

    // Reject identifiers longer than 100 bytes.
    let ident = text::ident().to_slice().try_map(|s: &str, span| {
        if s.len() > 100 {
            Err(Rich::custom(span, "Identifier too long"))
        } else {
            Ok(Expr::Identifier(s.to_string()))
        }
    });

    choice((number, ident)).then_ignore(end())
}
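The example uses the try_map combinator to perform semantic checks during parsing: its closure receives the parsed value and its span, and returns either a converted value or a custom error such as Rich::custom. The related validate combinator goes a step further, emitting errors for invalid constructs while still producing a value so that parsing continues. This enables reporting both syntactic and semantic errors in a single pass. Validation can check numeric ranges, identifier validity, or any other semantic constraint.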
Combining parsing and validation reduces the number of passes over the input and provides better error messages by retaining parse context. The error emission mechanism allows multiple errors from a single validation, supporting comprehensive error reporting.
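The difference between failing on the first problem and emitting errors while continuing can be sketched without chumsky. The names Diagnostic and validate_numbers below are hypothetical. The sketch assumes the input consists of numeric literals separated by single spaces and that an out-of-range value is replaced by the maximum as a placeholder. It illustrates the emit-and-continue style in general and does not reproduce chumsky's validate.

```rust
/// Hypothetical diagnostic carrying the byte range it refers to.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub start: usize,
    pub end: usize,
    pub message: String,
}

/// Check space-separated numeric literals, collecting every problem found
/// instead of stopping at the first one.
pub fn validate_numbers(src: &str, max: f64) -> (Vec<f64>, Vec<Diagnostic>) {
    let mut values = Vec::new();
    let mut errors = Vec::new();
    let mut offset = 0;
    for word in src.split(' ') {
        let (start, end) = (offset, offset + word.len());
        offset = end + 1; // step over the single space separator
        if word.is_empty() {
            continue;
        }
        match word.parse::<f64>() {
            Ok(n) if n <= max => values.push(n),
            Ok(n) => {
                // Semantic error: emit it, then keep going with a placeholder.
                errors.push(Diagnostic {
                    start,
                    end,
                    message: format!("{n} exceeds maximum {max}"),
                });
                values.push(max);
            }
            Err(_) => {
                // Malformed literal: emit an error and produce no value.
                errors.push(Diagnostic {
                    start,
                    end,
                    message: format!("invalid number: {word}"),
                });
            }
        }
    }
    (values, errors)
}
```

Returning both the values and the diagnostics lets the caller report every problem in one pass while still handing a usable result to later stages.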
Performance Considerations
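Chumsky parsers achieve good performance through several optimizations. The library supports zero-copy parsing where possible, so tokens and identifiers can be returned as slices of the input (as to_slice does in the examples) instead of freshly allocated strings. Combinators are ordinary generic types, which lets the Rust compiler monomorphize and inline them. Backtracking still occurs whenever an alternative fails, so the structure of the grammar affects how much work is repeated.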
Choice combinators try alternatives in order, so placing common cases first improves performance. The or combinator is a convenient shorthand when only two alternatives exist, while choice is generally the better fit for longer lists of alternatives. Memoization can be added to recursive parsers to avoid reparsing the same input multiple times.
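The effect of memoization on an ordered-choice grammar can be sketched with the standard library alone. The Memo type below is hypothetical and is not chumsky's memoization mechanism. It assumes a toy grammar of single digits, '+', and parentheses, in which the second alternative of sum re-parses the same atom that the first alternative already tried.

```rust
use std::collections::HashMap;

/// Hypothetical memoizing recursive-descent parser for the toy grammar
///   sum  := atom '+' sum | atom
///   atom := digit | '(' sum ')'
pub struct Memo<'a> {
    src: &'a [u8],
    /// Result of `atom` at each position already tried: value and next position.
    cache: HashMap<usize, Option<(i64, usize)>>,
    /// Number of times `atom` did real work (cache misses only).
    pub atom_calls: usize,
}

impl<'a> Memo<'a> {
    pub fn new(src: &'a str) -> Self {
        Memo { src: src.as_bytes(), cache: HashMap::new(), atom_calls: 0 }
    }

    pub fn sum(&mut self, pos: usize) -> Option<(i64, usize)> {
        // First alternative: atom '+' sum.
        if let Some((lhs, next)) = self.atom(pos) {
            if self.src.get(next) == Some(&b'+') {
                if let Some((rhs, end)) = self.sum(next + 1) {
                    return Some((lhs + rhs, end));
                }
            }
        }
        // Second alternative: a bare atom, answered from the cache.
        self.atom(pos)
    }

    pub fn atom(&mut self, pos: usize) -> Option<(i64, usize)> {
        if let Some(hit) = self.cache.get(&pos) {
            return *hit;
        }
        self.atom_calls += 1;
        let src = self.src;
        let result = match src.get(pos).copied() {
            Some(d) if d.is_ascii_digit() => Some((i64::from(d - b'0'), pos + 1)),
            Some(b'(') => match self.sum(pos + 1) {
                Some((v, next)) if src.get(next) == Some(&b')') => Some((v, next + 1)),
                _ => None,
            },
            _ => None,
        };
        self.cache.insert(pos, result);
        result
    }
}
```

Without the cache, each level of parentheses would parse its contents twice, so the work doubles with every level of nesting. With the cache, each position is parsed once.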
Integration Patterns
Chumsky integrates well with other compiler infrastructure. The span information works with error reporting libraries like ariadne or codespan-reporting to display beautiful error messages. AST nodes can implement visitor patterns or be processed by subsequent compiler passes.
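What a reporting library does with a span can be illustrated with a small standard-library sketch. The function render_diagnostic is hypothetical and far simpler than ariadne or codespan-reporting. It assumes a single-line ASCII source and a byte range such as the start and end offsets of a span.

```rust
/// Render a message with a row of carets under the byte range `start..end`.
/// Assumes `src` is a single line of ASCII text (illustrative only).
pub fn render_diagnostic(src: &str, start: usize, end: usize, message: &str) -> String {
    // Clamp the range so out-of-bounds spans cannot cause a panic.
    let start = start.min(src.len());
    let end = end.clamp(start, src.len());
    // Always draw at least one caret so zero-width spans stay visible.
    let width = (end - start).max(1);
    format!(
        "error: {message}\n  | {src}\n  | {}{}",
        " ".repeat(start),
        "^".repeat(width)
    )
}
```

The output has three lines: the message, the source line, and the carets aligned under the offending range. Real reporting libraries add colors, multi-line spans, and labels on top of the same span data.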
The streaming API supports parsing large files without loading them entirely into memory. Incremental parsing can be implemented by caching parse results for unchanged portions of input. The modular parser design allows testing individual components in isolation.
Best Practices
Structure parsers hierarchically, with each level handling one precedence level or syntactic category. Use meaningful names for intermediate parsers to improve readability. Keep individual parsers focused on a single responsibility.
Test parsers thoroughly with both valid and invalid input. Error recovery strategies should be tested to ensure they produce reasonable partial results. Use property-based testing to verify parser properties like round-tripping through pretty-printing.
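A round-trip property can be sketched without any testing crate. Everything below is hypothetical and independent of chumsky: Tree is a toy AST, print_tree and parse_tree are a matching printer and hand-written parser, and gen_tree uses a tiny linear congruential generator as a stand-in for a property-testing library. The property under test is that parsing the printed form of any tree gives back the same tree.

```rust
/// Hypothetical toy AST: a number or a parenthesised list of trees.
#[derive(Debug, Clone, PartialEq)]
pub enum Tree {
    Leaf(u32),
    Node(Vec<Tree>),
}

/// Pretty-print a tree as an s-expression, for example `(1 (2 3))`.
pub fn print_tree(t: &Tree) -> String {
    match t {
        Tree::Leaf(n) => n.to_string(),
        Tree::Node(kids) => {
            let parts: Vec<String> = kids.iter().map(print_tree).collect();
            format!("({})", parts.join(" "))
        }
    }
}

/// Parse one tree starting at `pos`; returns the tree and the next position.
fn parse_at(src: &[u8], mut pos: usize) -> Option<(Tree, usize)> {
    while src.get(pos) == Some(&b' ') {
        pos += 1;
    }
    match *src.get(pos)? {
        b'(' => {
            pos += 1;
            let mut kids = Vec::new();
            loop {
                while src.get(pos) == Some(&b' ') {
                    pos += 1;
                }
                if src.get(pos) == Some(&b')') {
                    return Some((Tree::Node(kids), pos + 1));
                }
                // Fails with `None` if the input ends before the ')'.
                let (kid, next) = parse_at(src, pos)?;
                kids.push(kid);
                pos = next;
            }
        }
        c if c.is_ascii_digit() => {
            let start = pos;
            while src.get(pos).map_or(false, |b| b.is_ascii_digit()) {
                pos += 1;
            }
            let text = std::str::from_utf8(&src[start..pos]).ok()?;
            Some((Tree::Leaf(text.parse::<u32>().ok()?), pos))
        }
        _ => None,
    }
}

/// Parse a whole string as exactly one tree.
pub fn parse_tree(src: &str) -> Option<Tree> {
    let (tree, end) = parse_at(src.as_bytes(), 0)?;
    (end == src.len()).then_some(tree)
}

/// Deterministic pseudo-random tree generator driven by a small LCG.
pub fn gen_tree(seed: &mut u64, depth: u32) -> Tree {
    *seed = seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    let r = (*seed >> 33) as u32;
    if depth == 0 || r % 3 == 0 {
        Tree::Leaf(r % 1000)
    } else {
        Tree::Node((0..r % 4).map(|_| gen_tree(seed, depth - 1)).collect())
    }
}
```

A test then generates a few hundred trees from a fixed seed and checks that parse_tree(&print_tree(&tree)) equals Some(tree) for each of them. A dedicated property-testing crate would add shrinking of failing cases, which this sketch does not attempt.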
Profile parser performance on realistic input to identify bottlenecks. Complex lookahead or backtracking can dramatically impact performance. Consider using a separate lexer for languages with complex tokenization rules.
Document grammar ambiguities and their resolution strategies. Explain why certain parser structures were chosen, especially for complex precedence hierarchies. Provide examples of valid and invalid syntax to clarify language rules.