nom
Nom is a parser combinator library focused on binary formats and text protocols, emphasizing zero-copy parsing and streaming capabilities. The library uses a functional programming approach where small parsers combine into larger ones through combinator functions. Nom is well suited to parsing network protocols, file formats, and configuration languages.
The core abstraction in nom is the IResult type, which represents the outcome of a parser. IResult<I, O, E> is an alias for Result<(I, O), nom::Err<E>>: on success a parser returns the remaining input together with the parsed value, and on failure it returns an error. This design lets parsers chain naturally, with each parser consuming part of the input and passing the remainder to the next parser.
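The chaining described above can be sketched without nom at all. The block below is a minimal, std-only illustration, not nom's implementation: `ParseResult`, `digits`, `literal`, and `version` are hypothetical names, and the error type is a plain `String` rather than `nom::Err`.

```rust
/// Hypothetical stand-in for nom's `IResult<&str, O>`: on success the
/// remaining input comes first, followed by the parsed value.
type ParseResult<'a, O> = Result<(&'a str, O), String>;

/// Consume one or more ASCII digits from the front of the input.
fn digits(input: &str) -> ParseResult<'_, &str> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err(format!("expected digits at {:?}", input));
    }
    Ok((&input[end..], &input[..end]))
}

/// Consume a fixed literal from the front of the input.
fn literal<'a>(expected: &str, input: &'a str) -> ParseResult<'a, &'a str> {
    match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?} at {:?}", expected, input)),
    }
}

/// Chain three parsers: each one receives the remainder left by the
/// previous one, and `?` stops at the first failure.
fn version(input: &str) -> ParseResult<'_, (u32, u32)> {
    let (rest, major) = digits(input)?;
    let (rest, _) = literal(".", rest)?;
    let (rest, minor) = digits(rest)?;
    let major = major.parse::<u32>().map_err(|e| e.to_string())?;
    let minor = minor.parse::<u32>().map_err(|e| e.to_string())?;
    Ok((rest, (major, minor)))
}
```

Calling `version("1.42 beta")` consumes `1.42` and hands back `" beta"` as the remainder, which is the same shape a nom parser returns through IResult.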
Core Types and Parsers
    use nom::branch::alt;
    use nom::bytes::complete::{escaped, tag, take_while1};
    use nom::character::complete::{alpha1, alphanumeric1, char, digit1, multispace0, one_of};
    use nom::combinator::{map, map_res, opt, recognize, value};
    use nom::multi::{fold_many0, many0, separated_list0};
    use nom::sequence::{delimited, pair, preceded};
    use nom::{IResult, Parser};

    #[derive(Debug, Clone, PartialEq)]
    pub enum BinOp {
        Add,
        Sub,
        Mul,
        Div,
    }

    /// Parse a floating-point number
    pub fn float(input: &str) -> IResult<&str, f64> {
        map_res(
            recognize((opt(char('-')), digit1, opt((char('.'), digit1)))),
            |s: &str| s.parse::<f64>(),
        )
        .parse(input)
    }

    /// Parse an integer
    pub fn integer(input: &str) -> IResult<&str, i64> {
        map_res(recognize(pair(opt(char('-')), digit1)), |s: &str| {
            s.parse::<i64>()
        })
        .parse(input)
    }

    /// Parse a string literal with escape sequences
    pub fn string_literal(input: &str) -> IResult<&str, String> {
        delimited(
            char('"'),
            map(
                escaped(
                    take_while1(|c: char| c != '"' && c != '\\'),
                    '\\',
                    one_of(r#""n\rt"#),
                ),
                |s: &str| s.to_string(),
            ),
            char('"'),
        )
        .parse(input)
    }

    /// Parse an identifier
    pub fn identifier(input: &str) -> IResult<&str, String> {
        map(
            recognize(pair(
                alt((alpha1, tag("_"))),
                many0(alt((alphanumeric1, tag("_")))),
            )),
            |s: &str| s.to_string(),
        )
        .parse(input)
    }

    /// Parse whitespace - wraps a parser with optional whitespace
    fn ws<'a, O, F>(mut inner: F) -> impl FnMut(&'a str) -> IResult<&'a str, O>
    where
        F: FnMut(&'a str) -> IResult<&'a str, O>,
    {
        move |input| {
            let (input, _) = multispace0.parse(input)?;
            let (input, result) = inner(input)?;
            let (input, _) = multispace0.parse(input)?;
            Ok((input, result))
        }
    }

    /// Parse a function call
    pub fn function_call(input: &str) -> IResult<&str, Expr> {
        map(
            (
                |i| identifier.parse(i),
                ws(|i| {
                    delimited(
                        char('('),
                        separated_list0(ws(|input| char(',').parse(input)), |i| expression.parse(i)),
                        char(')'),
                    )
                    .parse(i)
                }),
            ),
            |(name, args)| Expr::Call(name, args),
        )
        .parse(input)
    }

    /// Parse an array literal
    pub fn array(input: &str) -> IResult<&str, Expr> {
        map(
            delimited(
                ws(|input| char('[').parse(input)),
                separated_list0(ws(|input| char(',').parse(input)), |i| expression.parse(i)),
                ws(|input| char(']').parse(input)),
            ),
            Expr::Array,
        )
        .parse(input)
    }

    /// Parse a primary expression
    pub fn primary(input: &str) -> IResult<&str, Expr> {
        alt((
            map(|i| float.parse(i), Expr::Float),
            map(|i| integer.parse(i), Expr::Number),
            map(|i| string_literal.parse(i), Expr::String),
            |i| function_call.parse(i),
            |i| array.parse(i),
            map(|i| identifier.parse(i), Expr::Identifier),
            delimited(
                ws(|input| char('(').parse(input)),
                |i| expression.parse(i),
                ws(|input| char(')').parse(input)),
            ),
        ))
        .parse(input)
    }

    /// Parse a term (multiplication and division)
    pub fn term(input: &str) -> IResult<&str, Expr> {
        let (input, init) = primary.parse(input)?;
        fold_many0(
            pair(
                ws(|input| {
                    alt((value(BinOp::Mul, char('*')), value(BinOp::Div, char('/')))).parse(input)
                }),
                |i| primary.parse(i),
            ),
            move || init.clone(),
            |acc, (op, val)| Expr::Binary(op, Box::new(acc), Box::new(val)),
        )
        .parse(input)
    }

    /// Parse an expression (addition and subtraction)
    pub fn expression(input: &str) -> IResult<&str, Expr> {
        let (input, init) = term.parse(input)?;
        fold_many0(
            pair(
                ws(|input| {
                    alt((value(BinOp::Add, char('+')), value(BinOp::Sub, char('-')))).parse(input)
                }),
                |i| term.parse(i),
            ),
            move || init.clone(),
            |acc, (op, val)| Expr::Binary(op, Box::new(acc), Box::new(val)),
        )
        .parse(input)
    }

    /// Configuration file parser example
    #[derive(Debug, Clone, PartialEq)]
    pub struct Config {
        pub sections: Vec<Section>,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct Section {
        pub name: String,
        pub entries: Vec<(String, Value)>,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub enum Value {
        String(String),
        Number(f64),
        Boolean(bool),
        List(Vec<Value>),
    }

    /// Parse a configuration value
    pub fn config_value(input: &str) -> IResult<&str, Value> {
        alt((
            map(float, Value::Number),
            map(string_literal, Value::String),
            map(tag("true"), |_| Value::Boolean(true)),
            map(tag("false"), |_| Value::Boolean(false)),
            map(
                delimited(
                    ws(|input| char('[').parse(input)),
                    separated_list0(ws(|input| char(',').parse(input)), |i| {
                        config_value.parse(i)
                    }),
                    ws(|input| char(']').parse(input)),
                ),
                Value::List,
            ),
        ))
        .parse(input)
    }

    /// Parse a configuration entry
    pub fn config_entry(input: &str) -> IResult<&str, (String, Value)> {
        map(
            (
                ws(|input| identifier.parse(input)),
                ws(|input| char('=').parse(input)),
                ws(|input| config_value.parse(input)),
            ),
            |(key, _, value)| (key, value),
        )
        .parse(input)
    }

    /// Parse a configuration section
    pub fn config_section(input: &str) -> IResult<&str, Section> {
        map(
            (
                delimited(
                    ws(|input| char('[').parse(input)),
                    identifier,
                    ws(|input| char(']').parse(input)),
                ),
                many0(config_entry),
            ),
            |(name, entries)| Section { name, entries },
        )
        .parse(input)
    }

    /// Parse a complete configuration file
    pub fn parse_config(input: &str) -> IResult<&str, Config> {
        map(many0(ws(|input| config_section.parse(input))), |sections| {
            Config { sections }
        })
        .parse(input)
    }

    /// Custom error handling with context
    pub fn parse_with_context(input: &str) -> IResult<&str, Expr> {
        alt((
            map(|i| float.parse(i), Expr::Float),
            map(|i| identifier.parse(i), Expr::Identifier),
            delimited(
                |i| delimited(multispace0, char('('), multispace0).parse(i),
                |i| parse_with_context.parse(i),
                |i| delimited(multispace0, char(')'), multispace0).parse(i),
            ),
        ))
        .parse(input)
    }

    /// Streaming parser for large files
    pub fn streaming_parser(input: &str) -> IResult<&str, Vec<Expr>> {
        many0(delimited(
            |i| multispace0.parse(i),
            |i| expression.parse(i),
            |i| {
                alt((
                    map(char(';'), |_| ()),
                    map(|i2| multispace0.parse(i2), |_| ()),
                ))
                .parse(i)
            },
        ))
        .parse(input)
    }

    /// Binary format parser
    pub fn parse_binary_header(input: &[u8]) -> IResult<&[u8], (u32, u32)> {
        use nom::number::complete::{be_u32, le_u32};
        (preceded(tag(&b"MAGIC"[..]), le_u32), be_u32).parse(input)
    }

    /// Parser with custom error type
    #[derive(Debug, PartialEq)]
    pub enum CustomError {
        InvalidNumber,
        UnexpectedToken,
        MissingDelimiter,
    }

    pub fn custom_error_parser(input: &str) -> IResult<&str, Expr> {
        alt((
            map(
                |i| float.parse(i),
                |n| {
                    if n.is_finite() {
                        Expr::Float(n)
                    } else {
                        Expr::Float(0.0) // Return default value for invalid numbers
                    }
                },
            ),
            map(|i| identifier.parse(i), Expr::Identifier),
        ))
        .parse(input)
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn test_float_parser() {
            assert_eq!(float.parse("3.14"), Ok(("", 3.14)));
            assert_eq!(float.parse("-2.5"), Ok(("", -2.5)));
            assert_eq!(float.parse("42"), Ok(("", 42.0)));
        }

        #[test]
        fn test_expression_parser() {
            use nom::Parser;
            let result = expression.parse("2 + 3 * 4").unwrap();
            assert_eq!(
                result.1,
                Expr::Binary(
                    BinOp::Add,
                    Box::new(Expr::Float(2.0)),
                    Box::new(Expr::Binary(
                        BinOp::Mul,
                        Box::new(Expr::Float(3.0)),
                        Box::new(Expr::Float(4.0))
                    ))
                )
            );
        }

        #[test]
        fn test_function_call() {
            use nom::Parser;
            let result = function_call.parse("max(1, 2, 3)").unwrap();
            assert_eq!(
                result.1,
                Expr::Call(
                    "max".to_string(),
                    vec![Expr::Float(1.0), Expr::Float(2.0), Expr::Float(3.0)]
                )
            );
        }

        #[test]
        fn test_config_parser() {
            use nom::Parser;
            let config = "[database]\nhost = \"localhost\"\nport = 5432\n";
            let result = parse_config.parse(config).unwrap();
            assert_eq!(result.1.sections.len(), 1);
            assert_eq!(result.1.sections[0].name, "database");
            assert_eq!(result.1.sections[0].entries.len(), 2);
        }
    }

    /// AST for a simple expression language
    #[derive(Debug, Clone, PartialEq)]
    pub enum Expr {
        Number(i64),
        Float(f64),
        String(String),
        Identifier(String),
        Binary(BinOp, Box<Expr>, Box<Expr>),
        Call(String, Vec<Expr>),
        Array(Vec<Expr>),
    }
The expression type demonstrates a typical AST that nom parsers produce. Each variant represents a different syntactic construct that the parser recognizes.
Number Parsing
    use nom::character::complete::{char, digit1};
    use nom::combinator::{map_res, opt, recognize};
    use nom::{IResult, Parser};

    /// Parse a floating-point number
    pub fn float(input: &str) -> IResult<&str, f64> {
        map_res(
            recognize((opt(char('-')), digit1, opt((char('.'), digit1)))),
            |s: &str| s.parse::<f64>(),
        )
        .parse(input)
    }
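The float parser showcases nom’s approach to parsing numeric values. The recognize combinator discards the outputs of its inner parsers and instead returns the whole span of input they matched as a string slice, while map_res applies a fallible transformation (here str::parse::<f64>) and turns a conversion failure into a parse error. This pattern avoids allocation by working directly with input slices.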
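The two steps named above, recognizing a span and then converting it fallibly, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not nom's code: `ParseResult`, `digit_run`, and `recognize_float` are hypothetical helpers, and errors are plain strings.

```rust
/// Hypothetical stand-in for nom's `IResult<&str, O>`.
type ParseResult<'a, O> = Result<(&'a str, O), String>;

/// Length in bytes of the leading run of ASCII digits.
fn digit_run(s: &str) -> usize {
    s.bytes().take_while(|b| b.is_ascii_digit()).count()
}

/// The "recognize" step: find the span matching `-? digits ( . digits )?`
/// and return it as a slice of the input, without allocating.
fn recognize_float(input: &str) -> ParseResult<'_, &str> {
    let mut end = 0;
    if input.starts_with('-') {
        end += 1;
    }
    let int_len = digit_run(&input[end..]);
    if int_len == 0 {
        return Err(format!("expected digits at {:?}", &input[end..]));
    }
    end += int_len;
    // The fraction is optional: it is consumed only when the '.' is
    // followed by at least one digit.
    if input[end..].starts_with('.') {
        let frac_len = digit_run(&input[end + 1..]);
        if frac_len > 0 {
            end += 1 + frac_len;
        }
    }
    Ok((&input[end..], &input[..end]))
}

/// The "map_res" step: a fallible conversion whose failure becomes a
/// parse error instead of a panic.
fn float(input: &str) -> ParseResult<'_, f64> {
    let (rest, text) = recognize_float(input)?;
    let value = text.parse::<f64>().map_err(|e| e.to_string())?;
    Ok((rest, value))
}
```

Because the optional fraction requires a digit after the dot, an input such as `42.` yields `42.0` and leaves the trailing `.` unconsumed.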
    use nom::character::complete::{char, digit1};
    use nom::combinator::{map_res, opt, recognize};
    use nom::sequence::pair;
    use nom::{IResult, Parser};

    /// Parse an integer
    pub fn integer(input: &str) -> IResult<&str, i64> {
        map_res(recognize(pair(opt(char('-')), digit1)), |s: &str| {
            s.parse::<i64>()
        })
        .parse(input)
    }
Integer parsing follows the same pattern: an optional minus sign followed by digits, converted with str::parse::<i64>. The pair combinator sequences two parsers, and opt makes a parser optional, so both positive and negative numbers are accepted.
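To make the roles of pair and opt concrete, here is a hand-rolled, std-only sketch of the two combinators as higher-order functions. It is a simplification under stated assumptions: the names mirror nom's but the signatures are hypothetical, and this `opt` treats every error as "no match", whereas nom's `opt` only recovers from `Err::Error`.

```rust
/// Hypothetical stand-in for nom's `IResult<&str, O>`.
type ParseResult<'a, O> = Result<(&'a str, O), String>;

/// Make a parser optional: on failure, succeed with `None` and leave the
/// input untouched.
fn opt<'a, O>(
    parser: impl Fn(&'a str) -> ParseResult<'a, O>,
) -> impl Fn(&'a str) -> ParseResult<'a, Option<O>> {
    move |input| match parser(input) {
        Ok((rest, value)) => Ok((rest, Some(value))),
        Err(_) => Ok((input, None)),
    }
}

/// Run two parsers in sequence, feeding the remainder of the first into
/// the second, and return both outputs as a tuple.
fn pair<'a, A, B>(
    first: impl Fn(&'a str) -> ParseResult<'a, A>,
    second: impl Fn(&'a str) -> ParseResult<'a, B>,
) -> impl Fn(&'a str) -> ParseResult<'a, (A, B)> {
    move |input| {
        let (rest, a) = first(input)?;
        let (rest, b) = second(rest)?;
        Ok((rest, (a, b)))
    }
}

/// Match a single leading minus sign.
fn minus(input: &str) -> ParseResult<'_, char> {
    match input.strip_prefix('-') {
        Some(rest) => Ok((rest, '-')),
        None => Err(format!("expected '-' at {:?}", input)),
    }
}

/// Match one or more ASCII digits.
fn digits(input: &str) -> ParseResult<'_, &str> {
    let end = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    if end == 0 {
        return Err(format!("expected digits at {:?}", input));
    }
    Ok((&input[end..], &input[..end]))
}

/// Optional sign, then digits; the whole matched span is converted at once.
fn integer(input: &str) -> ParseResult<'_, i64> {
    let (rest, _) = pair(opt(minus), digits)(input)?;
    let matched = &input[..input.len() - rest.len()];
    let value = matched.parse::<i64>().map_err(|e| e.to_string())?;
    Ok((rest, value))
}
```

Converting the whole matched span, sign included, lets `i64::MIN` parse correctly; negating a separately parsed magnitude would overflow for that value.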
String and Identifier Parsing
    use nom::bytes::complete::{escaped, take_while1};
    use nom::character::complete::{char, one_of};
    use nom::combinator::map;
    use nom::sequence::delimited;
    use nom::{IResult, Parser};

    /// Parse a string literal with escape sequences
    pub fn string_literal(input: &str) -> IResult<&str, String> {
        delimited(
            char('"'),
            map(
                escaped(
                    take_while1(|c: char| c != '"' && c != '\\'),
                    '\\',
                    one_of(r#""n\rt"#),
                ),
                |s: &str| s.to_string(),
            ),
            char('"'),
        )
        .parse(input)
    }
String literal parsing demonstrates nom's handling of escape sequences. The `escaped` combinator recognizes backslash escapes within the string body, here `\"`, `\n`, `\\`, `\r`, and `\t`. It returns the matched slice unchanged, so the escape sequences are validated but not decoded. The `delimited` combinator discards the surrounding quotes and keeps the content between them.
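Because the parser above hands back the raw text between the quotes, a caller that wants the actual characters still has to decode the escapes. The following is a minimal, dependency-free sketch of that decoding step, assuming the same five escapes listed in `one_of`. The function name `unescape` is hypothetical and not part of nom; within nom itself, `escaped_transform` is the combinator intended for this job.

```rust
/// Decode the escapes accepted by the parser above: \" \n \\ \r \t.
/// Returns None for a trailing backslash or an unrecognized escape.
/// (Hypothetical helper, written with std only.)
fn unescape(raw: &str) -> Option<String> {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary characters are copied through unchanged.
            out.push(c);
            continue;
        }
        // A backslash must be followed by one of the recognized characters.
        match chars.next()? {
            '"' => out.push('"'),
            'n' => out.push('\n'),
            '\\' => out.push('\\'),
            'r' => out.push('\r'),
            't' => out.push('\t'),
            _ => return None,
        }
    }
    Some(out)
}
```

Applied to the body of a parsed literal, `unescape(r#"a\nb"#)` yields a three-character string containing a real newline.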
```rust
use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::character::complete::{alpha1, alphanumeric1};
use nom::combinator::{map, recognize};
use nom::multi::many0;
use nom::sequence::pair;
use nom::{IResult, Parser};

/// Parse an identifier
pub fn identifier(input: &str) -> IResult<&str, String> {
    map(
        recognize(pair(
            alt((alpha1, tag("_"))),
            many0(alt((alphanumeric1, tag("_")))),
        )),
        |s: &str| s.to_string(),
    )
    .parse(input)
}
```
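Identifier parsing shows how to build parsers for programming language tokens. The `recognize` combinator returns the matched input slice rather than the parsed components, so the pieces matched by `pair` and `many0` never need to be reassembled; the only string allocation is the final `to_string()` call. The `alt` combinator tries its alternatives in order and returns the result of the first one that succeeds.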
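To make the "matched slice" idea concrete, here is a hand-written, std-only sketch of the same identifier grammar. It is not nom code: the name `identifier_slice` and the `Option` return type are assumptions made for illustration. The point is that both the match and the remainder are borrowed views of the input, which is what `recognize` provides before `to_string()` copies the result.

```rust
/// Hand-written equivalent of the grammar [A-Za-z_][A-Za-z0-9_]*.
/// Returns (remaining, matched); both are slices of `input`, so nothing is copied.
/// (Hypothetical helper, written with std only.)
fn identifier_slice(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    // First character: a letter or an underscore (the two `alt` branches).
    let (_, first) = chars.next()?;
    if !(first.is_ascii_alphabetic() || first == '_') {
        return None;
    }
    // Following characters: letters, digits, or underscores.
    let end = chars
        .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_'))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    let (matched, rest) = input.split_at(end);
    Some((rest, matched))
}
```

The tuple order mirrors nom's convention of returning the remaining input first and the parsed value second.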
Expression Parsing
```rust
// `term`, `ws`, `Expr`, and `BinOp` are defined elsewhere in the same file.
use nom::branch::alt;
use nom::character::complete::char;
use nom::combinator::value;
use nom::multi::fold_many0;
use nom::sequence::pair;
use nom::{IResult, Parser};

/// Parse an expression (addition and subtraction)
pub fn expression(input: &str) -> IResult<&str, Expr> {
    let (input, init) = term.parse(input)?;
    fold_many0(
        pair(
            ws(|input| {
                alt((value(BinOp::Add, char('+')), value(BinOp::Sub, char('-')))).parse(input)
            }),
            |i| term.parse(i),
        ),
        move || init.clone(),
        |acc, (op, val)| Expr::Binary(op, Box::new(acc), Box::new(val)),
    )
    .parse(input)
}
```
Expression parsing demonstrates operator precedence through parser layering. The `fold_many0` combinator implements left-associative binary operators by folding a sequence of operator and operand pairs into a single tree. Higher-precedence operations such as multiplication are parsed in the `term` function, which `expression` calls for each of its operands.
The separation of `term` and `expression` functions creates the precedence hierarchy. Terms handle multiplication and division, while expressions handle addition and subtraction. Because every operand of `+` or `-` is a complete term, `2 + 3 * 4` parses as `2 + (3 * 4)` without explicit precedence declarations.
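The layering can be seen in isolation with a small std-only sketch. It is not the nom implementation: `left_fold`, `number`, and the parenthesized-string output are assumptions chosen so the resulting tree shape is easy to inspect. The sketch keeps the two properties described above: one function per precedence level, and a left-to-right fold inside each level.

```rust
/// number := [0-9]+ ; returns (remaining, rendered tree).
fn number(s: &str) -> Option<(&str, String)> {
    let s = s.trim_start();
    let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    if end == 0 {
        return None;
    }
    Some((&s[end..], s[..end].to_string()))
}

/// Parse `operand (op operand)*` and fold the results left to right.
/// Like a fold over zero or more pairs, it stops at the first pair that
/// does not match and returns what has been accumulated so far.
fn left_fold<'a>(
    s: &'a str,
    ops: &[char],
    operand: fn(&'a str) -> Option<(&'a str, String)>,
) -> Option<(&'a str, String)> {
    let (mut rest, mut acc) = operand(s)?;
    loop {
        let t = rest.trim_start();
        let op = match t.chars().next() {
            Some(c) if ops.contains(&c) => c,
            _ => return Some((rest, acc)),
        };
        match operand(&t[1..]) {
            // Each operator wraps the tree built so far: left associativity.
            Some((next, rhs)) => {
                acc = format!("({} {} {})", acc, op, rhs);
                rest = next;
            }
            None => return Some((rest, acc)),
        }
    }
}

/// Higher precedence level: multiplication and division over numbers.
fn term(s: &str) -> Option<(&str, String)> {
    left_fold(s, &['*', '/'], number)
}

/// Lower precedence level: addition and subtraction over terms.
fn expression(s: &str) -> Option<(&str, String)> {
    left_fold(s, &['+', '-'], term)
}
```

With these definitions, `expression("2 + 3 * 4")` renders as `(2 + (3 * 4))` and `expression("8 - 3 - 2")` as `((8 - 3) - 2)`, showing precedence and left associativity respectively.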
Function Calls and Arrays
```rust
// `identifier`, `expression`, `ws`, and `Expr` are defined elsewhere in the same file.
use nom::character::complete::char;
use nom::combinator::map;
use nom::multi::separated_list0;
use nom::sequence::delimited;
use nom::{IResult, Parser};

/// Parse a function call
pub fn function_call(input: &str) -> IResult<&str, Expr> {
    map(
        (
            |i| identifier.parse(i),
            ws(|i| {
                delimited(
                    char('('),
                    separated_list0(ws(|input| char(',').parse(input)), |i| expression.parse(i)),
                    char(')'),
                )
                .parse(i)
            }),
        ),
        |(name, args)| Expr::Call(name, args),
    )
    .parse(input)
}
```
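Function call parsing combines several nom features. A tuple of parsers runs them in sequence and captures all of their results (this code relies on the `Parser` implementation for tuples; older nom releases use the `tuple` combinator for the same purpose). The `separated_list0` combinator handles comma-separated argument lists, including the empty list, a common pattern in programming languages.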
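The behavior of a separated list can be sketched without nom. The version below is an illustration under stated assumptions, not the library's implementation: `separated_list`, `int_arg`, and `call_args` are hypothetical names, arguments are restricted to unsigned decimal integers, and failures are reported with `Option`. It follows the rule that an empty list is valid and that a separator with no item after it is left unconsumed.

```rust
/// Parse one decimal integer after optional whitespace; returns (remaining, value).
fn int_arg(s: &str) -> Option<(&str, i64)> {
    let s = s.trim_start();
    let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let value = s[..end].parse::<i64>().ok()?;
    Some((&s[end..], value))
}

/// Zero or more `item`s separated by `sep`.
/// A separator that is not followed by an item is left in the remaining input.
fn separated_list<'a, T>(
    mut s: &'a str,
    sep: char,
    item: fn(&'a str) -> Option<(&'a str, T)>,
) -> (&'a str, Vec<T>) {
    let mut items = Vec::new();
    // The first item is optional: an empty list is a valid result.
    match item(s) {
        Some((rest, v)) => {
            items.push(v);
            s = rest;
        }
        None => return (s, items),
    }
    loop {
        let after_sep = match s.trim_start().strip_prefix(sep) {
            Some(r) => r,
            None => return (s, items),
        };
        match item(after_sep) {
            Some((rest, v)) => {
                items.push(v);
                s = rest;
            }
            // Do not consume the dangling separator.
            None => return (s, items),
        }
    }
}

/// Parse a parenthesized argument list such as `(1, 2, 3)`.
fn call_args(s: &str) -> Option<(&str, Vec<i64>)> {
    let s = s.strip_prefix('(')?;
    let (s, args) = separated_list(s, ',', int_arg);
    let s = s.trim_start().strip_prefix(')')?;
    Some((s, args))
}
```

Because the dangling separator is not consumed, an input like `(1, )` fails at the closing parenthesis instead of being silently accepted.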
```rust
// `expression`, `ws`, and `Expr` are defined elsewhere in the same file.
use nom::character::complete::char;
use nom::combinator::map;
use nom::multi::separated_list0;
use nom::sequence::delimited;
use nom::{IResult, Parser};

/// Parse an array literal
pub fn array(input: &str) -> IResult<&str, Expr> {
    map(
        delimited(
            ws(|input| char('[').parse(input)),
            separated_list0(ws(|input| char(',').parse(input)), |i| expression.parse(i)),
            ws(|input| char(']').parse(input)),
        ),
        Expr::Array,
    )
    .parse(input)
}
```
Array parsing uses the same techniques with square brackets as delimiters. The `ws` helper function skips whitespace around tokens, a critical aspect of parsing human-readable formats.
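The whitespace wrapper is an ordinary higher-order function: it takes a parser and returns a new parser. The std-only sketch below shows that shape without nom. The `Option`-based parser signature and the `ch` helper are assumptions for illustration, and `trim_start` skips all Unicode whitespace, which is slightly broader than the spaces, tabs, and line endings that `multispace0` accepts.

```rust
/// Wrap `inner` so that whitespace before and after its match is discarded.
/// (Hypothetical std-only counterpart of the `ws` helper.)
fn ws<'a, T>(
    inner: impl Fn(&'a str) -> Option<(&'a str, T)>,
) -> impl Fn(&'a str) -> Option<(&'a str, T)> {
    move |input: &'a str| {
        // Skip leading whitespace, run the wrapped parser, skip trailing whitespace.
        let (rest, value) = inner(input.trim_start())?;
        Some((rest.trim_start(), value))
    }
}

/// Match a single expected character at the start of the input.
fn ch<'a>(expected: char) -> impl Fn(&'a str) -> Option<(&'a str, char)> {
    move |input: &'a str| input.strip_prefix(expected).map(|rest| (rest, expected))
}
```

A token parser built as `ws(ch(','))` accepts `"  ,  x"` and leaves `"x"` as the remaining input, so callers never have to handle spacing themselves.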
Configuration File Parsing
/// Configuration file parser example
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub sections: Vec<Section>,
}
#[derive(Debug, Clone, PartialEq)]
pub struct Section {
    pub name: String,
    pub entries: Vec<(String, Value)>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Number(f64),
    Boolean(bool),
    List(Vec<Value>),
}
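Configuration parsing demonstrates nom’s suitability for structured data formats. The types represent a typical configuration file structure: a Config holds a list of Section values, each Section has a name and a list of key-value entries, and Value covers strings, numbers, booleans, and nested lists.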
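To make the shape of these types concrete, the sketch below builds by hand the value one would expect for the input "[database]\nhost = \"localhost\"\nport = 5432\n". The type definitions are copied from the source; sample_config and lookup are hypothetical helpers added only for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Number(f64),
    Boolean(bool),
    List(Vec<Value>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Section {
    pub name: String,
    pub entries: Vec<(String, Value)>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub sections: Vec<Section>,
}

/// Hypothetical helper: the tree expected for the sample input.
/// Numbers are stored as f64 because config values go through the float parser.
pub fn sample_config() -> Config {
    Config {
        sections: vec![Section {
            name: "database".to_string(),
            entries: vec![
                ("host".to_string(), Value::String("localhost".to_string())),
                ("port".to_string(), Value::Number(5432.0)),
            ],
        }],
    }
}

/// Hypothetical helper: find the value stored under `key` in the named section.
pub fn lookup<'a>(config: &'a Config, section: &str, key: &str) -> Option<&'a Value> {
    config
        .sections
        .iter()
        .find(|s| s.name == section)?
        .entries
        .iter()
        .find(|(k, _)| k == key)
        .map(|(_, v)| v)
}
```

Keeping entries as an ordered Vec of pairs preserves the order of the file and allows duplicate keys; a map type would be an alternative if neither matters.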
/// Parse a configuration value
pub fn config_value(input: &str) -> IResult<&str, Value> {
    alt((
        map(float, Value::Number),
        map(string_literal, Value::String),
        map(tag("true"), |_| Value::Boolean(true)),
        map(tag("false"), |_| Value::Boolean(false)),
        map(
            delimited(
                ws(|input| char('[').parse(input)),
                separated_list0(ws(|input| char(',').parse(input)), |i| {
                    config_value.parse(i)
                }),
                ws(|input| char(']').parse(input)),
            ),
            Value::List,
        ),
    ))
    .parse(input)
}

/// Parse a configuration entry
pub fn config_entry(input: &str) -> IResult<&str, (String, Value)> {
    map(
        (
            ws(|input| identifier.parse(input)),
            ws(|input| char('=').parse(input)),
            ws(|input| config_value.parse(input)),
        ),
        |(key, _, value)| (key, value),
    )
    .parse(input)
}

/// Parse a configuration section
pub fn config_section(input: &str) -> IResult<&str, Section> {
    map(
        (
            delimited(
                ws(|input| char('[').parse(input)),
                identifier,
                ws(|input| char(']').parse(input)),
            ),
            many0(config_entry),
        ),
        |(name, entries)| Section { name, entries },
    )
    .parse(input)
}

/// Parse a complete configuration file
pub fn parse_config(input: &str) -> IResult<&str, Config> {
    map(many0(ws(|input| config_section.parse(input))), |sections| {
        Config { sections }
    })
    .parse(input)
}
The configuration parser builds up from smaller parsers for values, entries, and sections. Each parser handles one aspect of the format, and nom’s combinators compose them into the parser for the whole file. The many0 combinator applies a parser zero or more times and collects the results into a Vec, stopping at the first recoverable error.
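The repetition behaviour can be illustrated with a dependency-free sketch. This is a simplified model, not nom's source: PResult, number, and this many0 are hypothetical stand-ins, and the error type is a plain String.

```rust
/// Hypothetical stand-in for nom's IResult.
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Hypothetical item parser: a run of ASCII digits, followed by optional whitespace.
fn number(input: &str) -> PResult<'_, u32> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err("expected digit".to_string());
    }
    let value = input[..end].parse::<u32>().map_err(|e| e.to_string())?;
    Ok((input[end..].trim_start(), value))
}

/// Apply `item` until it fails, collecting every output into a Vec.
/// Zero matches is a success with an empty Vec.
fn many0<'a, O, F>(item: F) -> impl Fn(&'a str) -> PResult<'a, Vec<O>>
where
    F: Fn(&'a str) -> PResult<'a, O>,
{
    move |mut input: &'a str| {
        let mut out = Vec::new();
        loop {
            match item(input) {
                Ok((rest, value)) => {
                    // Guard against looping forever on a parser that consumes nothing.
                    if rest.len() == input.len() {
                        return Err("many0: inner parser consumed no input".to_string());
                    }
                    out.push(value);
                    input = rest;
                }
                // The first failure ends the repetition; input is left where it stood.
                Err(_) => return Ok((input, out)),
            }
        }
    }
}
```

One consequence worth noting: since failure of the inner parser simply ends the loop, a malformed entry in the middle of a file makes a many0-based parser stop early and succeed with leftover input, so callers usually check that the remaining input is empty.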
Error Handling
/// Custom error handling with context
pub fn parse_with_context(input: &str) -> IResult<&str, Expr> {
    alt((
        map(|i| float.parse(i), Expr::Float),
        map(|i| identifier.parse(i), Expr::Identifier),
        delimited(
            |i| delimited(multispace0, char('('), multispace0).parse(i),
            |i| parse_with_context.parse(i),
            |i| delimited(multispace0, char(')'), multispace0).parse(i),
        ),
    ))
    .parse(input)
}
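Context-aware parsing improves error messages by annotating parsers with descriptive labels. The context combinator (nom::error::context) wraps a parser and attaches a static label to any error it returns, while cut (nom::combinator::cut) turns a recoverable error into a failure, so an enclosing alt stops trying other branches once a partial match has committed to one of them. The parse_with_context function above shows only the recursive grammar with the default error type; context and cut are applied by wrapping individual sub-parsers, such as the closing parenthesis. Used together, they produce error messages that point at the place where parsing failed.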
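The VerboseError type collects detailed error information, including the error location and a trace of the contexts and parsers that were attempted. In nom 8, whose API the examples here use, it is provided by the companion nom-language crate rather than by nom::error. This information helps developers understand why parsing failed and where in the grammar the error occurred.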
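The interaction between recoverable errors, cut, and context labels can be modelled in a few lines of plain Rust. This is a simplified sketch under stated assumptions, not nom's implementation: PErr, PResult, tag, context, cut, alt2, parens, and expr are hypothetical stand-ins, and the error trace is just a list of labels.

```rust
/// Hypothetical error model: `Error` is recoverable, `Failure` is not.
#[derive(Debug, PartialEq)]
enum PErr {
    Error(Vec<&'static str>),
    Failure(Vec<&'static str>),
}

type PResult<'a, O> = Result<(&'a str, O), PErr>;

/// Match a literal prefix; on mismatch return a recoverable error.
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(PErr::Error(vec![expected])),
    }
}

/// Append a label to the trace of any error produced by `p`.
fn context<'a, O, F>(label: &'static str, p: F) -> impl Fn(&'a str) -> PResult<'a, O>
where
    F: Fn(&'a str) -> PResult<'a, O>,
{
    move |input: &'a str| {
        p(input).map_err(|e| match e {
            PErr::Error(mut t) => { t.push(label); PErr::Error(t) }
            PErr::Failure(mut t) => { t.push(label); PErr::Failure(t) }
        })
    }
}

/// Upgrade a recoverable error to a failure, which stops backtracking.
fn cut<'a, O, F>(p: F) -> impl Fn(&'a str) -> PResult<'a, O>
where
    F: Fn(&'a str) -> PResult<'a, O>,
{
    move |input: &'a str| {
        p(input).map_err(|e| match e {
            PErr::Error(t) => PErr::Failure(t),
            failure => failure,
        })
    }
}

/// Two-way choice: the second branch runs only after a recoverable error.
fn alt2<'a, O, A, B>(a: A, b: B) -> impl Fn(&'a str) -> PResult<'a, O>
where
    A: Fn(&'a str) -> PResult<'a, O>,
    B: Fn(&'a str) -> PResult<'a, O>,
{
    move |input: &'a str| match a(input) {
        Err(PErr::Error(_)) => b(input),
        other => other,
    }
}

/// "(x)": once "(" and "x" have matched, a missing ")" is a hard failure.
fn parens(input: &str) -> PResult<'_, &str> {
    let (rest, _) = tag("(")(input)?;
    let (rest, inner) = tag("x")(rest)?;
    let (rest, _) = cut(context("closing paren", tag(")")))(rest)?;
    Ok((rest, inner))
}

/// Either a parenthesized "x" or a bare "x".
fn expr(input: &str) -> PResult<'_, &str> {
    alt2(parens, tag("x"))(input)
}
```

In this model, parsing "(x" fails inside parens with a trace naming the closing parenthesis. Without cut, the choice would fall through to the bare "x" branch and report that "x" was expected at the opening parenthesis, which is the misleading kind of message that cut is meant to prevent.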
Streaming and Binary Parsing
    // Relies on `expression` and the imports from the earlier listings.

    /// Streaming parser for large files
    pub fn streaming_parser(input: &str) -> IResult<&str, Vec<Expr>> {
        many0(delimited(
            |i| multispace0.parse(i),
            |i| expression.parse(i),
            |i| {
                alt((
                    map(char(';'), |_| ()),
                    map(|i2| multispace0.parse(i2), |_| ()),
                ))
                .parse(i)
            },
        ))
        .parse(input)
    }
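Streaming parsing handles input that may not be completely available. A streaming parser processes the data it has, reports how much input it consumed, and signals when it needs more bytes before it can decide. In nom, the parsers in the streaming modules (for example nom::bytes::streaming) do this by returning Err::Incomplete, whereas the complete variants imported in this chapter treat the end of the slice as the end of the input. This approach works well for network protocols and large files that cannot fit in memory.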
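The difference between "failed" and "need more data" can be sketched without any dependency. The following is a minimal standard-library model, not nom's implementation: `StreamResult` and `stream_tag` are hypothetical names, and the `Incomplete` variant plays the role that `Err::Incomplete` plays in nom's streaming parsers.

```rust
/// Outcome of a streaming-style parser. Besides success and failure, the
/// parser may report that it cannot decide until more input arrives.
/// (Hypothetical type for illustration; nom uses `Err::Incomplete`.)
#[derive(Debug, PartialEq)]
pub enum StreamResult<'a, O> {
    Done { rest: &'a [u8], value: O },
    Incomplete { needed: usize },
    Error,
}

/// Match a fixed byte tag at the start of `input`.
pub fn stream_tag<'a>(tag: &[u8], input: &'a [u8]) -> StreamResult<'a, &'a [u8]> {
    // Compare only the bytes that are available so far.
    let n = tag.len().min(input.len());
    if input[..n] != tag[..n] {
        // A mismatch is a real error, no matter how much data follows.
        return StreamResult::Error;
    }
    if input.len() < tag.len() {
        // Everything seen so far matches, but the tag is not complete yet.
        return StreamResult::Incomplete {
            needed: tag.len() - input.len(),
        };
    }
    let (matched, rest) = input.split_at(tag.len());
    StreamResult::Done { rest, value: matched }
}
```

A caller would typically react to `Incomplete` by reading more bytes into its buffer and running the same parser again from the start of the unconsumed data.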
    // Relies on the imports from the earlier listings.

    /// Binary format parser
    pub fn parse_binary_header(input: &[u8]) -> IResult<&[u8], (u32, u32)> {
        use nom::number::complete::{be_u32, le_u32};
        (preceded(tag(&b"MAGIC"[..]), le_u32), be_u32).parse(input)
    }
Binary format parsing showcases nom’s byte-level parsing capabilities. The library provides parsers for fixed-width integers in little-endian and big-endian (network) byte order, as well as for fixed-size data. The take combinator extracts a specific number of bytes, while endian-specific parsers such as le_u32 and be_u32 handle byte order conversions.
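To make the byte-order handling concrete, here is a dependency-free sketch of what a header parser like the one above computes. The helper names `take`, `le_u32`, `be_u32`, and `parse_header` are local stand-ins written for this illustration; they mirror the roles of the nom parsers but are not nom's code.

```rust
/// Split off the first `n` bytes. Returns `(remaining, taken)`, or `None`
/// when the input is too short.
fn take(n: usize, input: &[u8]) -> Option<(&[u8], &[u8])> {
    if input.len() < n {
        return None;
    }
    let (taken, rest) = input.split_at(n);
    Some((rest, taken))
}

/// Read four bytes as a little-endian u32 (least significant byte first).
fn le_u32(input: &[u8]) -> Option<(&[u8], u32)> {
    let (rest, b) = take(4, input)?;
    Some((rest, u32::from_le_bytes([b[0], b[1], b[2], b[3]])))
}

/// Read four bytes as a big-endian u32 (network byte order).
fn be_u32(input: &[u8]) -> Option<(&[u8], u32)> {
    let (rest, b) = take(4, input)?;
    Some((rest, u32::from_be_bytes([b[0], b[1], b[2], b[3]])))
}

/// Parse `MAGIC`, then a little-endian u32, then a big-endian u32.
pub fn parse_header(input: &[u8]) -> Option<(&[u8], (u32, u32))> {
    if !input.starts_with(b"MAGIC") {
        return None;
    }
    let rest = &input[5..];
    let (rest, first) = le_u32(rest)?;
    let (rest, second) = be_u32(rest)?;
    Some((rest, (first, second)))
}
```

The same four bytes `01 00 00 00` decode to 1 when read little-endian and to 16777216 when read big-endian, which is why the parser for each field has to match the format's specification.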
Performance Optimization
Nom achieves excellent performance through zero-copy parsing. Parsers work directly with input slices, avoiding string allocation until necessary. The recognize combinator returns the matched input slice, and parsers hand borrowed subslices to one another rather than copying data.
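The zero-copy idea can be illustrated with the standard library alone. In this sketch, `digits` is a hypothetical helper that behaves like a recognizer for a run of ASCII digits: both returned values are borrowed views into the caller's buffer.

```rust
/// Recognize a leading run of ASCII digits.
/// Returns `(remaining, matched)`; both are subslices of `input`,
/// so no bytes are copied and nothing is allocated.
pub fn digits(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None; // no digit at the start of the input
    }
    let (matched, rest) = input.split_at(end);
    Some((rest, matched))
}
```

Because `matched` points into the original buffer, converting it with `matched.parse::<u64>()` needs no allocation either; a copy is made only when the caller asks for one, for example with `matched.to_string()`.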
Careful combinator choice impacts performance. The alt combinator tries alternatives sequentially, so placing common cases first reduces average parsing time. When only an aggregate of the parsed items is needed, the many0 and many1 combinators can be replaced with fold_many0 and fold_many1 to avoid allocating an intermediate vector.
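The folding approach can be sketched as follows. This is a simplified, dependency-free model: the local `fold_many0` takes the input as a fourth argument and uses `Option` instead of nom's `IResult`, and `number` and `sum_numbers` are hypothetical helpers for the example.

```rust
/// Apply `parser` repeatedly and combine each result into an accumulator,
/// without collecting the items into a `Vec` first.
pub fn fold_many0<'a, O, A, P, F>(
    mut parser: P,
    init: A,
    mut step: F,
    mut input: &'a str,
) -> (&'a str, A)
where
    P: FnMut(&'a str) -> Option<(&'a str, O)>,
    F: FnMut(A, O) -> A,
{
    let mut acc = init;
    while let Some((rest, item)) = parser(input) {
        // Stop if the parser succeeded without consuming anything;
        // otherwise the loop would never terminate.
        if rest.len() == input.len() {
            break;
        }
        acc = step(acc, item);
        input = rest;
    }
    (input, acc)
}

/// Parse an unsigned integer, skipping leading whitespace.
fn number(input: &str) -> Option<(&str, u64)> {
    let input = input.trim_start();
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let value = input[..end].parse::<u64>().ok()?;
    Some((&input[end..], value))
}

/// Sum a whitespace-separated list of numbers while parsing it.
pub fn sum_numbers(input: &str) -> (&str, u64) {
    fold_many0(number, 0u64, |acc, n| acc + n, input)
}
```

With a `many0`-style parser the same task would first build a `Vec<u64>` and then sum it; folding keeps only the running total.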
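Nom’s combinators are generic functions rather than macros (the macro-based API of early releases was removed in nom 7), so the compiler monomorphizes and inlines each parser combination, which removes most function call overhead. The resulting code often compiles to machine code comparable to hand-written parsers.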
Integration Patterns
Nom parsers integrate well with other Rust libraries. The &str and &[u8] input types work with standard library types, while the IResult type is an ordinary Result, so it works with the ? operator and with error handling libraries. Parsed ASTs can be processed by subsequent compiler passes or serialized to other formats.
For incremental parsing, the caller carries state between invocations: nom parsers themselves are stateless functions, and the remaining input returned by one parse becomes the starting point for the next. This enables parsing of streaming data or interactive input.
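A minimal sketch of that pattern, using only the standard library: the caller owns a buffer, appends each chunk as it arrives, parses as many complete records as possible, and keeps the unconsumed tail for the next call. The `key=value` record format and the names `record` and `feed` are assumptions made for this example.

```rust
/// Parse one `key=value\n` record from the front of `input`.
/// Returns the unconsumed remainder and the parsed pair, or `None` when no
/// complete record is available. (Simplification: a line without `=` is
/// also reported as `None` rather than as a distinct error.)
fn record(input: &str) -> Option<(&str, (&str, &str))> {
    let newline = input.find('\n')?;
    let (line, rest) = input.split_at(newline);
    let eq = line.find('=')?;
    // `rest` still starts with the newline, so skip it.
    Some((&rest[1..], (&line[..eq], &line[eq + 1..])))
}

/// Append `chunk` to `buffer`, parse every complete record, and leave the
/// unparsed tail in `buffer` for the next invocation.
pub fn feed(buffer: &mut String, chunk: &str) -> Vec<(String, String)> {
    buffer.push_str(chunk);
    let mut out = Vec::new();
    let mut consumed = 0;
    while let Some((rest, (key, value))) = record(&buffer[consumed..]) {
        out.push((key.to_string(), value.to_string()));
        // The remainder of this parse is the starting point of the next.
        consumed = buffer.len() - rest.len();
    }
    // Drop the consumed prefix; only the incomplete tail stays buffered.
    buffer.replace_range(..consumed, "");
    out
}
```

The parser itself never stores anything between calls; all state lives in the caller's buffer.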
Custom input types allow parsing from non-standard sources. Implementing nom’s input traits enables parsing from rope data structures, memory-mapped files, or network streams.
Best Practices
Structure parsers hierarchically with clear separation of concerns. Each parser should handle one grammatical construct, making the grammar evident from the code structure. Use descriptive names that match the grammar terminology.
Test parsers extensively with both valid and invalid input. Property-based testing verifies parser properties like consuming all valid input or rejecting invalid constructs. Fuzzing finds edge cases in parser implementations.
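As a small illustration of testing both sides, the sketch below checks a round-trip property (formatting an integer and parsing it back yields the same value with nothing left over) alongside a few inputs that must be rejected. It uses a hand-written, std-only `integer` parser as a stand-in; a real project would more likely drive the same property with a property-testing crate.

```rust
/// Parse an optionally negative decimal integer from the start of `input`.
pub fn integer(input: &str) -> Option<(&str, i64)> {
    let digits_start = if input.starts_with('-') { 1 } else { 0 };
    let end = input[digits_start..]
        .find(|c: char| !c.is_ascii_digit())
        .map(|i| i + digits_start)
        .unwrap_or(input.len());
    if end == digits_start {
        return None; // a sign alone, or no digits at all
    }
    // Values that do not fit in an i64 are rejected instead of wrapping.
    let value = input[..end].parse::<i64>().ok()?;
    Some((&input[end..], value))
}

/// Property: printing `n` and parsing the text back returns `n` and
/// consumes the whole input.
pub fn round_trips(n: i64) -> bool {
    let text = n.to_string();
    integer(&text) == Some(("", n))
}
```

Checking the property over a range plus the extreme values, and asserting that inputs such as `"-"` or an overflowing literal are rejected, covers both the valid and the invalid side.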
Profile parsers on representative input to identify performance bottlenecks. Long chains of alternatives and excessive backtracking are common culprits. Consider using peek to look ahead without consuming input when making parsing decisions.
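The lookahead idea can be sketched without nom: inspect the first character without consuming it, then commit to a single branch instead of trying each alternative in turn. `Token`, `take_while`, and `token` are hypothetical names for this example.

```rust
#[derive(Debug, PartialEq)]
pub enum Token<'a> {
    Number(&'a str),
    Word(&'a str),
}

/// Consume the longest prefix whose characters satisfy `pred`.
/// Returns `(remaining, matched)`.
fn take_while<F: Fn(char) -> bool>(input: &str, pred: F) -> (&str, &str) {
    let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
    let (matched, rest) = input.split_at(end);
    (rest, matched)
}

/// Decide which token to parse by peeking at the first character.
pub fn token<'a>(input: &'a str) -> Option<(&'a str, Token<'a>)> {
    // Look ahead: `input` itself is left untouched by this check.
    let first = input.chars().next()?;
    if first.is_ascii_digit() {
        let (rest, m) = take_while(input, |c| c.is_ascii_digit());
        Some((rest, Token::Number(m)))
    } else if first.is_alphabetic() || first == '_' {
        let (rest, m) = take_while(input, |c| c.is_alphanumeric() || c == '_');
        Some((rest, Token::Word(m)))
    } else {
        None
    }
}
```

Each input is scanned by exactly one branch, so no work is thrown away on alternatives that were never going to match.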
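Handle errors gracefully with appropriate error types. The VerboseError type aids development by recording the chain of parsers that failed, while custom error types provide a better user experience. Use context to label what was being parsed and cut to stop backtracking once a branch is committed; both improve error messages.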
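The effect of committing to a branch can be modeled in a few lines. In this dependency-free sketch, `ParseError::Recoverable` and `ParseError::Fatal` stand in for nom's `Err::Error` and `Err::Failure`, the local `cut` function upgrades one to the other, and the static message plays the role of a context label. All names here are assumptions for the example, not nom's API.

```rust
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// Another alternative may still succeed.
    Recoverable(&'static str),
    /// The parser was committed to this branch; stop trying alternatives.
    Fatal(&'static str),
}

type PResult<'a, O> = Result<(&'a str, O), ParseError>;

/// Turn a recoverable error into a fatal one.
fn cut<'a, O>(result: PResult<'a, O>) -> PResult<'a, O> {
    result.map_err(|e| match e {
        ParseError::Recoverable(msg) => ParseError::Fatal(msg),
        other => other,
    })
}

fn expect_char<'a>(input: &'a str, c: char, what: &'static str) -> PResult<'a, ()> {
    match input.strip_prefix(c) {
        Some(rest) => Ok((rest, ())),
        None => Err(ParseError::Recoverable(what)),
    }
}

/// `[` body `]`: once `[` has been seen, a missing `]` is a fatal error.
fn array<'a>(input: &'a str) -> PResult<'a, &'a str> {
    let (rest, _) = expect_char(input, '[', "expected '['")?;
    let end = rest.find(']').unwrap_or(rest.len());
    let (body, rest) = rest.split_at(end);
    let (rest, _) = cut(expect_char(rest, ']', "unterminated array, expected ']'"))?;
    Ok((rest, body))
}

fn word<'a>(input: &'a str) -> PResult<'a, &'a str> {
    let end = input
        .find(|c: char| !c.is_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        return Err(ParseError::Recoverable("expected a word"));
    }
    Ok((&input[end..], &input[..end]))
}

/// Try `array`, then fall back to `word` only on a recoverable error.
pub fn value<'a>(input: &'a str) -> PResult<'a, &'a str> {
    match array(input) {
        Err(ParseError::Recoverable(_)) => word(input),
        other => other,
    }
}
```

For the input `[1,2` the user sees the message about the unterminated array. Without the commit step the parser would backtrack, try `word`, and report the far less helpful "expected a word".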
Document the grammar alongside the parser implementation. Comments should explain the grammatical constructs being parsed and any deviations from standard grammar notation. Examples of valid input clarify the parser’s behavior.