marginalia
The marginalia crate is a trivia-preserving adapter that slots into the standard logos + lalrpop pipeline. Most parsers drop comments and blank lines at the lexer stage, which is fine for an interpreter but ruinous for a formatter, a refactoring tool, or any other consumer that has to round-trip back to source. marginalia solves this by wrapping a logos iterator in an iterator adapter that strips comment tokens from the stream the parser sees while recording them on the side, indexed by byte offset, so a later pass can re-attach them to AST nodes.
The crate is small on purpose. It does not parse, format, or know about your AST. It exposes one trait (Classify), one iterator (TriviaLexer), and a TriviaTable of events. A separate attach module re-anchors trivia onto AST spans; a separate pretty module is a Doc-style IR with trivia slots. Pick the pieces you need.
Token Classification
The user’s token enum implements Classify to tell marginalia which variants are trivia and which are not. Anything that returns Some(TriviaPiece) is recorded and stripped from the stream; everything else passes through unchanged.
```rust
use logos::Logos;
use marginalia::{BuiltinKind, Classify, Trivia, TriviaEvent, TriviaLexer, TriviaPiece};

/// A small calculator token enum. `LineComment` and `BlockComment` carry their
/// payload so the `Classify` impl can hand the text to marginalia.
#[derive(Clone, Debug, Logos, PartialEq, Eq)]
#[logos(skip r"[ \t\f\r\n]+")]
pub enum Tok {
    #[token("let")]
    Let,
    #[token("=")]
    Eq,
    #[token(";")]
    Semi,
    #[token("+")]
    Plus,
    #[token("*")]
    Star,
    #[regex(r"[0-9]+", |l| l.slice().parse::<i64>().ok())]
    Num(i64),
    #[regex(r"[A-Za-z_][A-Za-z0-9_]*", |l| l.slice().to_owned(), priority = 2)]
    Ident(String),
    #[regex(r"//[^\n]*", |l| l.slice().to_owned(), allow_greedy = true)]
    LineComment(String),
    // Standard C-comment pattern: the `\*+` runs also accept comments that end
    // in `**/`, which `/\*([^*]|\*[^/])*\*/` fails to match.
    #[regex(r"/\*([^*]|\*+[^*/])*\*+/", |l| l.slice().to_owned())]
    BlockComment(String),
}
```
The two comment variants carry their lexeme so the Classify impl can hand the text straight to marginalia:
```rust
/// Hand each comment variant to marginalia. Non-trivia tokens return `None` and
/// flow through the lexer unchanged.
impl Classify for Tok {
    fn trivia(&self) -> Option<TriviaPiece<'_>> {
        match self {
            Self::LineComment(s) => Some(TriviaPiece { kind: BuiltinKind::Line, text: s }),
            Self::BlockComment(s) => Some(TriviaPiece { kind: BuiltinKind::Block, text: s }),
            _ => None,
        }
    }
}
```
BuiltinKind::Line and BuiltinKind::Block are the only two comment kinds. Blank-line events are detected by the lexer itself, which counts the newlines between adjacent tokens, so the token enum does not have to model them.
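The blank-line rule can be illustrated without the crate. The following std-only sketch assumes that a gap holding two or more newlines counts as one blank-line run spanning the whole gap. The helper blank_line_span is hypothetical and is not part of marginalia. The crate's exact span boundaries and its handling of \r\n may differ.

```rust
/// Hypothetical helper, not marginalia's API: report the gap between two
/// adjacent tokens as a blank-line run when it contains at least two newlines,
/// i.e. at least one line with no token on it. Works on bytes, so offsets that
/// are not char boundaries cannot cause a panic.
pub fn blank_line_span(source: &str, prev_end: usize, next_start: usize) -> Option<(usize, usize)> {
    let gap = &source.as_bytes()[prev_end..next_start];
    let newlines = gap.iter().filter(|&&b| b == b'\n').count();
    if newlines >= 2 {
        Some((prev_end, next_start))
    } else {
        None
    }
}
```

Take "let x = 1;\n\nx + 2;" as an example. The ";" ends at byte 10 and "x" starts at byte 12. The gap between them holds two newlines, so the helper reports Some((10, 12)). A single newline is an ordinary line break and yields None.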
The Lexer Adapter
TriviaLexer<I, T, E> takes any Iterator<Item = Result<(usize, T, usize), E>> (the shape lalrpop expects) and yields the same shape, minus trivia. The constructor also borrows the source so it can detect blank-line runs by inspecting the bytes between tokens.
```rust
/// Lex `source`, returning the semantic tokens (comments stripped) and the
/// trivia table that the formatter or AST-attachment pass would consume next.
pub fn lex(source: &str) -> (Vec<(usize, Tok, usize)>, Vec<TriviaEvent>) {
    // Reshape logos' (Result<Tok, ()>, span) pairs into lalrpop's (lo, tok, hi) triples.
    let raw = Tok::lexer(source)
        .spanned()
        .map(|(res, span)| res.map(|tok| (span.start, tok, span.end)));
    let mut layer = TriviaLexer::new(raw, source);
    // Iterate through `&mut layer` so the adapter can still be consumed below.
    // Lexer errors are silently dropped in this example.
    let tokens: Vec<_> = (&mut layer).filter_map(Result::ok).collect();
    let table = layer.into_table();
    (tokens, table.events().to_vec())
}
```
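The returned Vec<TriviaEvent> is what a downstream formatter consumes. Each event carries a byte-offset span (Span { start, end }) and a trivia field holding either Trivia::Comment { kind, text }, where kind is BuiltinKind::Line or BuiltinKind::Block, or Trivia::BlankLine.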
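The std-only model below shows the adapter's contract without depending on the crate. It strips trivia from a lalrpop-shaped (lo, tok, hi) stream and records each piece by span. This is a simplified sketch, not marginalia's implementation. Classify, Event, and Strip are local stand-ins, blank-line detection is left out, and lexer errors are forwarded unchanged so the parser can still report them.

```rust
/// Local stand-in for marginalia's trait: return the comment text for trivia.
pub trait Classify {
    fn trivia(&self) -> Option<&str>;
}

/// One recorded comment: its byte span and text.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub start: usize,
    pub end: usize,
    pub text: String,
}

/// Wraps a lalrpop-shaped stream and hides trivia from whoever iterates it.
pub struct Strip<I> {
    inner: I,
    events: Vec<Event>,
}

impl<I> Strip<I> {
    pub fn new(inner: I) -> Self {
        Strip { inner, events: Vec::new() }
    }

    /// Consume the adapter and keep only the side table.
    pub fn into_events(self) -> Vec<Event> {
        self.events
    }
}

impl<I, T, E> Iterator for Strip<I>
where
    I: Iterator<Item = Result<(usize, T, usize), E>>,
    T: Classify,
{
    type Item = Result<(usize, T, usize), E>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.inner.next()? {
                Ok((lo, tok, hi)) => {
                    // Trivia is recorded by span and never reaches the caller.
                    if let Some(text) = tok.trivia() {
                        self.events.push(Event { start: lo, end: hi, text: text.to_owned() });
                        continue;
                    }
                    return Some(Ok((lo, tok, hi)));
                }
                // Lexer errors pass through so the parser can report them.
                Err(e) => return Some(Err(e)),
            }
        }
    }
}
```

Calling into_events() afterwards works only because the stream is driven through &mut strip rather than by value. The lex function relies on the same pattern when it iterates (&mut layer) before calling into_table().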
Recovering the Trivia Table
There are two ways to get the table back out: table() borrows it while the lexer is still in use, and into_table() consumes the lexer. The composition pattern in the README chains TriviaLexer inside offsides::LayoutLexer. Because the layout lexer exposes into_inner(), the trivia table can still be recovered after both adapters are done.
```rust
/// Project a trivia event to a short human-readable tag for snapshot tests.
pub fn describe(event: &TriviaEvent) -> String {
    let (start, end) = (event.span.start, event.span.end);
    match &event.trivia {
        Trivia::Comment { kind: BuiltinKind::Line, text } => format!("line@{start}..{end}: {text}"),
        Trivia::Comment { kind: BuiltinKind::Block, text } => format!("block@{start}..{end}: {text}"),
        Trivia::BlankLine => format!("blank@{start}..{end}"),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parser_never_sees_comments() {
        let src = "let x = 1; // tail\n\n/* between */ x + 2;";
        let (tokens, _) = lex(src);
        for (_, tok, _) in &tokens {
            assert!(!matches!(tok, Tok::LineComment(_) | Tok::BlockComment(_)));
        }
        assert!(tokens.iter().any(|(_, t, _)| matches!(t, Tok::Let)));
    }

    #[test]
    fn trivia_table_captures_comments_and_blank_lines() {
        let src = "let x = 1; // tail\n\n/* between */ x + 2;";
        let (_, events) = lex(src);
        let kinds: Vec<_> = events.iter().map(describe).collect();
        assert!(kinds.iter().any(|s| s.starts_with("line@")));
        assert!(kinds.iter().any(|s| s.starts_with("blank@")));
        assert!(kinds.iter().any(|s| s.starts_with("block@")));
    }
}
```
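The split between borrowing and consuming follows the usual Rust accessor pattern, and so does handing an inner adapter back through an outer one with into_inner(). The std-only sketch below models both. TriviaLayer and Outer are hypothetical stand-ins for TriviaLexer and LayoutLexer, not the crates' code. Treating any token that starts with "//" as trivia is a simplification.

```rust
/// Hypothetical inner adapter: hides tokens starting with "//" and records them.
pub struct TriviaLayer<I> {
    inner: I,
    table: Vec<String>,
}

impl<I> TriviaLayer<I> {
    pub fn new(inner: I) -> Self {
        TriviaLayer { inner, table: Vec::new() }
    }
    /// Shared borrow: readable between `next()` calls.
    pub fn table(&self) -> &[String] {
        &self.table
    }
    /// Consumes the adapter and moves the table out without cloning.
    pub fn into_table(self) -> Vec<String> {
        self.table
    }
}

impl<'a, I: Iterator<Item = &'a str>> Iterator for TriviaLayer<I> {
    type Item = &'a str;
    fn next(&mut self) -> Option<&'a str> {
        loop {
            let tok = self.inner.next()?;
            if tok.starts_with("//") {
                self.table.push(tok.to_owned());
            } else {
                return Some(tok);
            }
        }
    }
}

/// Hypothetical outer adapter standing in for a layout pass.
pub struct Outer<I> {
    inner: I,
}

impl<I> Outer<I> {
    pub fn new(inner: I) -> Self {
        Outer { inner }
    }
    /// Hands the wrapped adapter back once parsing is done.
    pub fn into_inner(self) -> I {
        self.inner
    }
}

impl<I: Iterator> Iterator for Outer<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        // A real layout pass would insert virtual tokens here.
        self.inner.next()
    }
}
```

While Outer owns the inner adapter, table() cannot be reached. Calling into_inner() gives the inner adapter back, so into_table() can then move the complete table out.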
Composition with offsides
marginalia and offsides are designed to compose. Stack TriviaLexer inside LayoutLexer so trivia is stripped before the layout algorithm starts measuring columns:
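```rust
// `MyTok`, `MyParser`, and `cfg` are placeholders for your token enum, your
// lalrpop-generated parser, and your offsides configuration. Run this inside a
// function whose error type the parser's error converts into, so `?` works.
let raw = MyTok::lexer(source)
    .spanned()
    .map(|(res, span)| res.map(|tok| (span.start, tok, span.end)));
let trivia = marginalia::TriviaLexer::new(raw, source);
let mut layout = offsides::LayoutLexer::new(trivia, source, cfg);
// Parse through `&mut layout` so both adapters survive the parse.
let ast = MyParser::new().parse(&mut layout)?;
// Unwrap the layout lexer to get the trivia table back for the attach pass.
let table = layout.into_inner().into_table();
```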
The grammar sees a stream that is both trivia-clean and braced-and-semicoloned, with neither concern leaking into the .lalrpop file.
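A toy version of the column bookkeeping an offside-rule pass performs shows why the order matters. The helper below is hypothetical and std-only. It is not the offsides algorithm. It records the line and byte column of the first token on each line. When a comment-only line is passed in, the comment's column looks like a dedent. When the comment is stripped first, the block stays intact.

```rust
/// Hypothetical helper: for each line that has a token on it, return
/// (line, column) of that line's first token. Columns are byte offsets from
/// the start of the line. An offside-rule layout pass compares these columns
/// to decide where to insert virtual `;` and `}` tokens.
pub fn first_token_columns(source: &str, spans: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut out: Vec<(usize, usize)> = Vec::new();
    for &(lo, _hi) in spans {
        let before = &source[..lo];
        let line = before.matches('\n').count();
        let line_start = before.rfind('\n').map_or(0, |i| i + 1);
        // Only the first token on a line sets that line's indentation.
        if out.last().map_or(true, |&(l, _)| l != line) {
            out.push((line, lo - line_start));
        }
    }
    out
}
```

The source "f =\n    a\n  // note\n    b" illustrates the difference. With the comment left in the stream, line 2 reports column 2, which a layout pass would read as a dedent closing the block opened at column 4. With the comment stripped, lines 1 and 3 agree at column 4.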
Best Practices
Keep Classify::trivia total over your enum. Every non-trivia variant must return None. If you list those variants explicitly instead of catching them with a _ arm, the compiler will flag any variant added later, such as a new comment kind that would otherwise reach the parser.
Capture comment text in the token variant. The Classify impl needs an &str, so the variant has to carry the lexeme. Logos’s |l| l.slice().to_owned() callback stores it as an owned String in one line.
Resolve trivia at AST-build time, not at parse time. lalrpop actions can capture a node's byte span through @L and @R (both usize). Pair those lo and hi offsets with TriviaTable::between(lo, hi) to attach leading or trailing trivia to the node being built.
For formatters, never re-tokenize. The original TriviaTable already has every comment with its byte span; the attach and pretty modules are designed around that fact.
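Attaching trivia at AST-build time comes down to a range query over byte offsets. The std-only sketch below shows what a between(lo, hi) lookup could look like. It assumes that events are sorted by start, that they never overlap, and that an event counts as "between" only when it lies entirely inside lo..hi. marginalia's TriviaTable::between may use different boundary rules. Event and Table here are hypothetical stand-ins.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub start: usize,
    pub end: usize,
    pub text: String,
}

/// Hypothetical table of trivia events, kept sorted by start offset.
pub struct Table {
    events: Vec<Event>,
}

impl Table {
    pub fn new(mut events: Vec<Event>) -> Self {
        events.sort_by_key(|e| e.start);
        Table { events }
    }

    /// Events lying entirely within `lo..hi`, found by binary search.
    /// Because events do not overlap, `end` is sorted too, which keeps the
    /// second `partition_point` predicate monotone.
    pub fn between(&self, lo: usize, hi: usize) -> &[Event] {
        let first = self.events.partition_point(|e| e.start < lo);
        let rest = &self.events[first..];
        let len = rest.partition_point(|e| e.end <= hi);
        &rest[..len]
    }
}
```

Consider "let x = 1; // tail\n\n/* between */ x + 2;". The first statement ends at byte 10 and x + 2; starts at byte 34. Querying between(10, 34) therefore returns both comments as candidate leading trivia for the second statement.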