smallvec
The smallvec
crate provides a vector type that stores a small number of elements inline, avoiding heap allocation for common cases. In compiler development, many data structures contain a small number of elements most of the time. For example, most expressions have only a few operands, most basic blocks have only a few instructions, and most functions have only a few parameters. Using SmallVec
for these cases can significantly reduce allocations and improve cache locality.
The key insight is that compilers often deal with collections that are usually small but occasionally large. Traditional vectors always allocate on the heap, even for a single element. SmallVec stores up to N elements inline within the struct itself, only spilling to heap allocation when the capacity is exceeded. This optimization is particularly effective in AST nodes, instruction operands, and symbol tables.
Basic Usage
SmallVec is used similarly to standard vectors but with an inline capacity specified in the type:
#![allow(unused)] fn main() { use smallvec::{smallvec, SmallVec}; #[derive(Debug, Clone, PartialEq)] pub struct Token { pub kind: TokenKind, pub span: Span, } #[derive(Debug, Clone, PartialEq)] pub enum TokenKind { Identifier(String), Number(i64), Operator(char), Keyword(String), } #[derive(Debug, Clone, PartialEq)] pub struct Span { pub start: usize, pub end: usize, } pub type TokenStream = SmallVec<[Token; 32]>; pub fn tokenize_expression(input: &str) -> TokenStream { let mut tokens = SmallVec::new(); let mut chars = input.char_indices().peekable(); while let Some((i, ch)) = chars.next() { match ch { ' ' | '\t' | '\n' => continue, '+' | '-' | '*' | '/' | '(' | ')' => { tokens.push(Token { kind: TokenKind::Operator(ch), span: Span { start: i, end: i + 1, }, }); } '0'..='9' => { let start = i; let mut value = ch.to_digit(10).unwrap() as i64; while let Some(&(_, ch)) = chars.peek() { if ch.is_ascii_digit() { value = value * 10 + ch.to_digit(10).unwrap() as i64; chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); tokens.push(Token { kind: TokenKind::Number(value), span: Span { start, end }, }); } 'a'..='z' | 'A'..='Z' | '_' => { let start = i; let mut ident = String::new(); ident.push(ch); while let Some(&(_, ch)) = chars.peek() { if ch.is_alphanumeric() || ch == '_' { ident.push(ch); chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); let kind = match ident.as_str() { "if" | "else" | "while" | "for" | "return" => TokenKind::Keyword(ident), _ => TokenKind::Identifier(ident), }; tokens.push(Token { kind, span: Span { start, end }, }); } _ => {} } } tokens } #[derive(Debug, Clone)] pub struct AstNode { pub kind: AstKind, pub children: SmallVec<[Box<AstNode>; 4]>, } #[derive(Debug, Clone)] pub enum AstKind { Program, Function(String), Block, Expression, Statement, Identifier(String), Number(i64), } pub fn build_simple_ast() -> AstNode { AstNode { kind: AstKind::Program, children: smallvec![Box::new(AstNode { kind: AstKind::Function("main".to_string()), children: smallvec![Box::new(AstNode { kind: AstKind::Block, children: smallvec![Box::new(AstNode { kind: AstKind::Expression, children: smallvec![Box::new(AstNode { kind: AstKind::Number(42), children: SmallVec::new(), })], })], })], })], } } #[derive(Debug, Clone)] pub struct Instruction { pub opcode: Opcode, pub operands: SmallVec<[Operand; 3]>, } #[derive(Debug, Clone)] pub enum Opcode { Load, Store, Add, Sub, Mul, Jmp, Ret, } #[derive(Debug, Clone)] pub enum Operand { Register(u8), Immediate(i32), Memory(u32), } pub fn create_instruction_sequence() -> Vec<Instruction> { vec![ Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(0), Operand::Memory(0x1000)], }, Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(1), Operand::Memory(0x1004)], }, Instruction { opcode: Opcode::Add, operands: smallvec![ Operand::Register(2), Operand::Register(0), Operand::Register(1) ], }, Instruction { opcode: Opcode::Store, operands: smallvec![Operand::Memory(0x1008), Operand::Register(2)], }, Instruction { opcode: Opcode::Ret, operands: SmallVec::new(), }, ] } pub struct SymbolTable { scopes: SmallVec<[Scope; 8]>, } pub struct Scope { symbols: SmallVec<[(String, SymbolInfo); 16]>, } #[derive(Debug, Clone)] pub struct SymbolInfo { pub kind: SymbolKind, pub offset: usize, } #[derive(Debug, Clone)] pub enum SymbolKind { Variable, Function, Parameter, } impl Default for SymbolTable { fn default() -> Self { Self::new() } } impl SymbolTable { pub fn new() -> Self { Self { scopes: smallvec![Scope { symbols: SmallVec::new(), }], } } pub fn push_scope(&mut self) { self.scopes.push(Scope { symbols: SmallVec::new(), }); } pub fn pop_scope(&mut self) { if self.scopes.len() > 1 { self.scopes.pop(); } } pub fn insert(&mut self, name: String, info: SymbolInfo) { if let Some(scope) = self.scopes.last_mut() { scope.symbols.push((name, info)); } } pub fn lookup(&self, name: &str) -> Option<&SymbolInfo> { for scope in self.scopes.iter().rev() { for (sym_name, info) in &scope.symbols { if sym_name == name { return Some(info); } } } None } } #[derive(Debug)] pub struct CompactError { pub messages: SmallVec<[String; 2]>, pub locations: SmallVec<[Location; 2]>, } #[derive(Debug)] pub struct Location { pub file: String, pub line: u32, pub column: u32, } impl CompactError { pub fn new(message: String, location: Location) -> Self { Self { messages: smallvec![message], locations: smallvec![location], } } pub fn add_context(&mut self, message: String, location: Location) { self.messages.push(message); self.locations.push(location); } } #[cfg(test)] mod tests { use super::*; #[test] fn test_tokenization() { let tokens = tokenize_expression("x + 42 * (y - 3)"); assert_eq!(tokens.len(), 9); assert!(matches!(tokens[0].kind, TokenKind::Identifier(_))); assert!(matches!(tokens[2].kind, TokenKind::Number(42))); } #[test] fn test_inline_capacity() { let vec: SmallVec<[i32; 8]> = smallvec![1, 2, 3, 4]; assert!(!vec.spilled()); assert_eq!(vec.len(), 4); assert_eq!(vec.capacity(), 8); } #[test] fn test_spill_to_heap() { let mut vec: SmallVec<[i32; 2]> = SmallVec::new(); vec.push(1); vec.push(2); assert!(!vec.spilled()); vec.push(3); assert!(vec.spilled()); assert!(vec.capacity() >= 3); } #[test] fn test_symbol_table() { let mut table = SymbolTable::new(); table.insert( "x".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 0, }, ); table.push_scope(); table.insert( "y".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 4, }, ); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_some()); table.pop_scope(); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_none()); } } pub fn demonstrate_capacity() { let mut vec: SmallVec<[i32; 4]> = SmallVec::new(); println!("Initial capacity: {}", vec.inline_size()); println!("Is heap allocated: {}", vec.spilled()); for i in 0..6 { vec.push(i); println!( "After pushing {}: capacity = {}, spilled = {}", i, vec.capacity(), vec.spilled() ); } } }
The type parameter [i32; 4]
specifies both the element type and inline capacity. The vector starts with space for 4 elements allocated inline. When a fifth element is added, it spills to the heap with a larger capacity.
Tokenization
A common use case in compilers is storing token streams. Most expressions contain a moderate number of tokens that fit well within inline storage:
#![allow(unused)] fn main() { use smallvec::{smallvec, SmallVec}; #[derive(Debug, Clone, PartialEq)] pub struct Token { pub kind: TokenKind, pub span: Span, } #[derive(Debug, Clone, PartialEq)] pub enum TokenKind { Identifier(String), Number(i64), Operator(char), Keyword(String), } #[derive(Debug, Clone, PartialEq)] pub struct Span { pub start: usize, pub end: usize, } pub type TokenStream = SmallVec<[Token; 32]>; #[derive(Debug, Clone)] pub struct AstNode { pub kind: AstKind, pub children: SmallVec<[Box<AstNode>; 4]>, } #[derive(Debug, Clone)] pub enum AstKind { Program, Function(String), Block, Expression, Statement, Identifier(String), Number(i64), } pub fn build_simple_ast() -> AstNode { AstNode { kind: AstKind::Program, children: smallvec![Box::new(AstNode { kind: AstKind::Function("main".to_string()), children: smallvec![Box::new(AstNode { kind: AstKind::Block, children: smallvec![Box::new(AstNode { kind: AstKind::Expression, children: smallvec![Box::new(AstNode { kind: AstKind::Number(42), children: SmallVec::new(), })], })], })], })], } } #[derive(Debug, Clone)] pub struct Instruction { pub opcode: Opcode, pub operands: SmallVec<[Operand; 3]>, } #[derive(Debug, Clone)] pub enum Opcode { Load, Store, Add, Sub, Mul, Jmp, Ret, } #[derive(Debug, Clone)] pub enum Operand { Register(u8), Immediate(i32), Memory(u32), } pub fn create_instruction_sequence() -> Vec<Instruction> { vec![ Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(0), Operand::Memory(0x1000)], }, Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(1), Operand::Memory(0x1004)], }, Instruction { opcode: Opcode::Add, operands: smallvec![ Operand::Register(2), Operand::Register(0), Operand::Register(1) ], }, Instruction { opcode: Opcode::Store, operands: smallvec![Operand::Memory(0x1008), Operand::Register(2)], }, Instruction { opcode: Opcode::Ret, operands: SmallVec::new(), }, ] } pub fn demonstrate_capacity() { let mut vec: SmallVec<[i32; 4]> = SmallVec::new(); println!("Initial capacity: {}", vec.inline_size()); println!("Is heap allocated: {}", vec.spilled()); for i in 0..6 { vec.push(i); println!( "After pushing {}: capacity = {}, spilled = {}", i, vec.capacity(), vec.spilled() ); } } pub struct SymbolTable { scopes: SmallVec<[Scope; 8]>, } pub struct Scope { symbols: SmallVec<[(String, SymbolInfo); 16]>, } #[derive(Debug, Clone)] pub struct SymbolInfo { pub kind: SymbolKind, pub offset: usize, } #[derive(Debug, Clone)] pub enum SymbolKind { Variable, Function, Parameter, } impl Default for SymbolTable { fn default() -> Self { Self::new() } } impl SymbolTable { pub fn new() -> Self { Self { scopes: smallvec![Scope { symbols: SmallVec::new(), }], } } pub fn push_scope(&mut self) { self.scopes.push(Scope { symbols: SmallVec::new(), }); } pub fn pop_scope(&mut self) { if self.scopes.len() > 1 { self.scopes.pop(); } } pub fn insert(&mut self, name: String, info: SymbolInfo) { if let Some(scope) = self.scopes.last_mut() { scope.symbols.push((name, info)); } } pub fn lookup(&self, name: &str) -> Option<&SymbolInfo> { for scope in self.scopes.iter().rev() { for (sym_name, info) in &scope.symbols { if sym_name == name { return Some(info); } } } None } } #[derive(Debug)] pub struct CompactError { pub messages: SmallVec<[String; 2]>, pub locations: SmallVec<[Location; 2]>, } #[derive(Debug)] pub struct Location { pub file: String, pub line: u32, pub column: u32, } impl CompactError { pub fn new(message: String, location: Location) -> Self { Self { messages: smallvec![message], locations: smallvec![location], } } pub fn add_context(&mut self, message: String, location: Location) { self.messages.push(message); self.locations.push(location); } } #[cfg(test)] mod tests { use super::*; #[test] fn test_tokenization() { let tokens = tokenize_expression("x + 42 * (y - 3)"); assert_eq!(tokens.len(), 9); assert!(matches!(tokens[0].kind, TokenKind::Identifier(_))); assert!(matches!(tokens[2].kind, TokenKind::Number(42))); } #[test] fn test_inline_capacity() { let vec: SmallVec<[i32; 8]> = smallvec![1, 2, 3, 4]; assert!(!vec.spilled()); assert_eq!(vec.len(), 4); assert_eq!(vec.capacity(), 8); } #[test] fn test_spill_to_heap() { let mut vec: SmallVec<[i32; 2]> = SmallVec::new(); vec.push(1); vec.push(2); assert!(!vec.spilled()); vec.push(3); assert!(vec.spilled()); assert!(vec.capacity() >= 3); } #[test] fn test_symbol_table() { let mut table = SymbolTable::new(); table.insert( "x".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 0, }, ); table.push_scope(); table.insert( "y".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 4, }, ); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_some()); table.pop_scope(); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_none()); } } pub fn tokenize_expression(input: &str) -> TokenStream { let mut tokens = SmallVec::new(); let mut chars = input.char_indices().peekable(); while let Some((i, ch)) = chars.next() { match ch { ' ' | '\t' | '\n' => continue, '+' | '-' | '*' | '/' | '(' | ')' => { tokens.push(Token { kind: TokenKind::Operator(ch), span: Span { start: i, end: i + 1, }, }); } '0'..='9' => { let start = i; let mut value = ch.to_digit(10).unwrap() as i64; while let Some(&(_, ch)) = chars.peek() { if ch.is_ascii_digit() { value = value * 10 + ch.to_digit(10).unwrap() as i64; chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); tokens.push(Token { kind: TokenKind::Number(value), span: Span { start, end }, }); } 'a'..='z' | 'A'..='Z' | '_' => { let start = i; let mut ident = String::new(); ident.push(ch); while let Some(&(_, ch)) = chars.peek() { if ch.is_alphanumeric() || ch == '_' { ident.push(ch); chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); let kind = match ident.as_str() { "if" | "else" | "while" | "for" | "return" => TokenKind::Keyword(ident), _ => TokenKind::Identifier(ident), }; tokens.push(Token { kind, span: Span { start, end }, }); } _ => {} } } tokens } }
The TokenStream
type alias uses a SmallVec with inline capacity for 32 tokens. This covers most expressions without heap allocation while still handling arbitrarily large inputs when needed.
AST Nodes
Abstract syntax tree nodes often have a small, variable number of children. SmallVec is ideal for storing these children:
#![allow(unused)] fn main() { use smallvec::{smallvec, SmallVec}; #[derive(Debug, Clone, PartialEq)] pub struct Token { pub kind: TokenKind, pub span: Span, } #[derive(Debug, Clone, PartialEq)] pub enum TokenKind { Identifier(String), Number(i64), Operator(char), Keyword(String), } #[derive(Debug, Clone, PartialEq)] pub struct Span { pub start: usize, pub end: usize, } pub type TokenStream = SmallVec<[Token; 32]>; pub fn tokenize_expression(input: &str) -> TokenStream { let mut tokens = SmallVec::new(); let mut chars = input.char_indices().peekable(); while let Some((i, ch)) = chars.next() { match ch { ' ' | '\t' | '\n' => continue, '+' | '-' | '*' | '/' | '(' | ')' => { tokens.push(Token { kind: TokenKind::Operator(ch), span: Span { start: i, end: i + 1, }, }); } '0'..='9' => { let start = i; let mut value = ch.to_digit(10).unwrap() as i64; while let Some(&(_, ch)) = chars.peek() { if ch.is_ascii_digit() { value = value * 10 + ch.to_digit(10).unwrap() as i64; chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); tokens.push(Token { kind: TokenKind::Number(value), span: Span { start, end }, }); } 'a'..='z' | 'A'..='Z' | '_' => { let start = i; let mut ident = String::new(); ident.push(ch); while let Some(&(_, ch)) = chars.peek() { if ch.is_alphanumeric() || ch == '_' { ident.push(ch); chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); let kind = match ident.as_str() { "if" | "else" | "while" | "for" | "return" => TokenKind::Keyword(ident), _ => TokenKind::Identifier(ident), }; tokens.push(Token { kind, span: Span { start, end }, }); } _ => {} } } tokens } #[derive(Debug, Clone)] pub enum AstKind { Program, Function(String), Block, Expression, Statement, Identifier(String), Number(i64), } pub fn build_simple_ast() -> AstNode { AstNode { kind: AstKind::Program, children: smallvec![Box::new(AstNode { kind: AstKind::Function("main".to_string()), children: smallvec![Box::new(AstNode { kind: AstKind::Block, children: smallvec![Box::new(AstNode { kind: AstKind::Expression, children: smallvec![Box::new(AstNode { kind: AstKind::Number(42), children: SmallVec::new(), })], })], })], })], } } #[derive(Debug, Clone)] pub struct Instruction { pub opcode: Opcode, pub operands: SmallVec<[Operand; 3]>, } #[derive(Debug, Clone)] pub enum Opcode { Load, Store, Add, Sub, Mul, Jmp, Ret, } #[derive(Debug, Clone)] pub enum Operand { Register(u8), Immediate(i32), Memory(u32), } pub fn create_instruction_sequence() -> Vec<Instruction> { vec![ Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(0), Operand::Memory(0x1000)], }, Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(1), Operand::Memory(0x1004)], }, Instruction { opcode: Opcode::Add, operands: smallvec![ Operand::Register(2), Operand::Register(0), Operand::Register(1) ], }, Instruction { opcode: Opcode::Store, operands: smallvec![Operand::Memory(0x1008), Operand::Register(2)], }, Instruction { opcode: Opcode::Ret, operands: SmallVec::new(), }, ] } pub fn demonstrate_capacity() { let mut vec: SmallVec<[i32; 4]> = SmallVec::new(); println!("Initial capacity: {}", vec.inline_size()); println!("Is heap allocated: {}", vec.spilled()); for i in 0..6 { vec.push(i); println!( "After pushing {}: capacity = {}, spilled = {}", i, vec.capacity(), vec.spilled() ); } } pub struct SymbolTable { scopes: SmallVec<[Scope; 8]>, } pub struct Scope { symbols: SmallVec<[(String, SymbolInfo); 16]>, } #[derive(Debug, Clone)] pub struct SymbolInfo { pub kind: SymbolKind, pub offset: usize, } #[derive(Debug, Clone)] pub enum SymbolKind { Variable, Function, Parameter, } impl Default for SymbolTable { fn default() -> Self { Self::new() } } impl SymbolTable { pub fn new() -> Self { Self { scopes: smallvec![Scope { symbols: SmallVec::new(), }], } } pub fn push_scope(&mut self) { self.scopes.push(Scope { symbols: SmallVec::new(), }); } pub fn pop_scope(&mut self) { if self.scopes.len() > 1 { self.scopes.pop(); } } pub fn insert(&mut self, name: String, info: SymbolInfo) { if let Some(scope) = self.scopes.last_mut() { scope.symbols.push((name, info)); } } pub fn lookup(&self, name: &str) -> Option<&SymbolInfo> { for scope in self.scopes.iter().rev() { for (sym_name, info) in &scope.symbols { if sym_name == name { return Some(info); } } } None } } #[derive(Debug)] pub struct CompactError { pub messages: SmallVec<[String; 2]>, pub locations: SmallVec<[Location; 2]>, } #[derive(Debug)] pub struct Location { pub file: String, pub line: u32, pub column: u32, } impl CompactError { pub fn new(message: String, location: Location) -> Self { Self { messages: smallvec![message], locations: smallvec![location], } } pub fn add_context(&mut self, message: String, location: Location) { self.messages.push(message); self.locations.push(location); } } #[cfg(test)] mod tests { use super::*; #[test] fn test_tokenization() { let tokens = tokenize_expression("x + 42 * (y - 3)"); assert_eq!(tokens.len(), 9); assert!(matches!(tokens[0].kind, TokenKind::Identifier(_))); assert!(matches!(tokens[2].kind, TokenKind::Number(42))); } #[test] fn test_inline_capacity() { let vec: SmallVec<[i32; 8]> = smallvec![1, 2, 3, 4]; assert!(!vec.spilled()); assert_eq!(vec.len(), 4); assert_eq!(vec.capacity(), 8); } #[test] fn test_spill_to_heap() { let mut vec: SmallVec<[i32; 2]> = SmallVec::new(); vec.push(1); vec.push(2); assert!(!vec.spilled()); vec.push(3); assert!(vec.spilled()); assert!(vec.capacity() >= 3); } #[test] fn test_symbol_table() { let mut table = SymbolTable::new(); table.insert( "x".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 0, }, ); table.push_scope(); table.insert( "y".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 4, }, ); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_some()); table.pop_scope(); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_none()); } } #[derive(Debug, Clone)] pub struct AstNode { pub kind: AstKind, pub children: SmallVec<[Box<AstNode>; 4]>, } }
#![allow(unused)] fn main() { use smallvec::{smallvec, SmallVec}; #[derive(Debug, Clone, PartialEq)] pub struct Token { pub kind: TokenKind, pub span: Span, } #[derive(Debug, Clone, PartialEq)] pub enum TokenKind { Identifier(String), Number(i64), Operator(char), Keyword(String), } #[derive(Debug, Clone, PartialEq)] pub struct Span { pub start: usize, pub end: usize, } pub type TokenStream = SmallVec<[Token; 32]>; pub fn tokenize_expression(input: &str) -> TokenStream { let mut tokens = SmallVec::new(); let mut chars = input.char_indices().peekable(); while let Some((i, ch)) = chars.next() { match ch { ' ' | '\t' | '\n' => continue, '+' | '-' | '*' | '/' | '(' | ')' => { tokens.push(Token { kind: TokenKind::Operator(ch), span: Span { start: i, end: i + 1, }, }); } '0'..='9' => { let start = i; let mut value = ch.to_digit(10).unwrap() as i64; while let Some(&(_, ch)) = chars.peek() { if ch.is_ascii_digit() { value = value * 10 + ch.to_digit(10).unwrap() as i64; chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); tokens.push(Token { kind: TokenKind::Number(value), span: Span { start, end }, }); } 'a'..='z' | 'A'..='Z' | '_' => { let start = i; let mut ident = String::new(); ident.push(ch); while let Some(&(_, ch)) = chars.peek() { if ch.is_alphanumeric() || ch == '_' { ident.push(ch); chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); let kind = match ident.as_str() { "if" | "else" | "while" | "for" | "return" => TokenKind::Keyword(ident), _ => TokenKind::Identifier(ident), }; tokens.push(Token { kind, span: Span { start, end }, }); } _ => {} } } tokens } #[derive(Debug, Clone)] pub struct AstNode { pub kind: AstKind, pub children: SmallVec<[Box<AstNode>; 4]>, } pub fn build_simple_ast() -> AstNode { AstNode { kind: AstKind::Program, children: smallvec![Box::new(AstNode { kind: AstKind::Function("main".to_string()), children: smallvec![Box::new(AstNode { kind: AstKind::Block, children: smallvec![Box::new(AstNode { kind: AstKind::Expression, children: smallvec![Box::new(AstNode { kind: AstKind::Number(42), children: SmallVec::new(), })], })], })], })], } } #[derive(Debug, Clone)] pub struct Instruction { pub opcode: Opcode, pub operands: SmallVec<[Operand; 3]>, } #[derive(Debug, Clone)] pub enum Opcode { Load, Store, Add, Sub, Mul, Jmp, Ret, } #[derive(Debug, Clone)] pub enum Operand { Register(u8), Immediate(i32), Memory(u32), } pub fn create_instruction_sequence() -> Vec<Instruction> { vec![ Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(0), Operand::Memory(0x1000)], }, Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(1), Operand::Memory(0x1004)], }, Instruction { opcode: Opcode::Add, operands: smallvec![ Operand::Register(2), Operand::Register(0), Operand::Register(1) ], }, Instruction { opcode: Opcode::Store, operands: smallvec![Operand::Memory(0x1008), Operand::Register(2)], }, Instruction { opcode: Opcode::Ret, operands: SmallVec::new(), }, ] } pub fn demonstrate_capacity() { let mut vec: SmallVec<[i32; 4]> = SmallVec::new(); println!("Initial capacity: {}", vec.inline_size()); println!("Is heap allocated: {}", vec.spilled()); for i in 0..6 { vec.push(i); println!( "After pushing {}: capacity = {}, spilled = {}", i, vec.capacity(), vec.spilled() ); } } pub struct SymbolTable { scopes: SmallVec<[Scope; 8]>, } pub struct Scope { symbols: SmallVec<[(String, SymbolInfo); 16]>, } #[derive(Debug, Clone)] pub struct SymbolInfo { pub kind: SymbolKind, pub offset: usize, } #[derive(Debug, Clone)] pub enum SymbolKind { Variable, Function, Parameter, } impl Default for SymbolTable { fn default() -> Self { Self::new() } } impl SymbolTable { pub fn new() -> Self { Self { scopes: smallvec![Scope { symbols: SmallVec::new(), }], } } pub fn push_scope(&mut self) { self.scopes.push(Scope { symbols: SmallVec::new(), }); } pub fn pop_scope(&mut self) { if self.scopes.len() > 1 { self.scopes.pop(); } } pub fn insert(&mut self, name: String, info: SymbolInfo) { if let Some(scope) = self.scopes.last_mut() { scope.symbols.push((name, info)); } } pub fn lookup(&self, name: &str) -> Option<&SymbolInfo> { for scope in self.scopes.iter().rev() { for (sym_name, info) in &scope.symbols { if sym_name == name { return Some(info); } } } None } } #[derive(Debug)] pub struct CompactError { pub messages: SmallVec<[String; 2]>, pub locations: SmallVec<[Location; 2]>, } #[derive(Debug)] pub struct Location { pub file: String, pub line: u32, pub column: u32, } impl CompactError { pub fn new(message: String, location: Location) -> Self { Self { messages: smallvec![message], locations: smallvec![location], } } pub fn add_context(&mut self, message: String, location: Location) { self.messages.push(message); self.locations.push(location); } } #[cfg(test)] mod tests { use super::*; #[test] fn test_tokenization() { let tokens = tokenize_expression("x + 42 * (y - 3)"); assert_eq!(tokens.len(), 9); assert!(matches!(tokens[0].kind, TokenKind::Identifier(_))); assert!(matches!(tokens[2].kind, TokenKind::Number(42))); } #[test] fn test_inline_capacity() { let vec: SmallVec<[i32; 8]> = smallvec![1, 2, 3, 4]; assert!(!vec.spilled()); assert_eq!(vec.len(), 4); assert_eq!(vec.capacity(), 8); } #[test] fn test_spill_to_heap() { let mut vec: SmallVec<[i32; 2]> = SmallVec::new(); vec.push(1); vec.push(2); assert!(!vec.spilled()); vec.push(3); assert!(vec.spilled()); assert!(vec.capacity() >= 3); } #[test] fn test_symbol_table() { let mut table = SymbolTable::new(); table.insert( "x".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 0, }, ); table.push_scope(); table.insert( "y".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 4, }, ); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_some()); table.pop_scope(); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_none()); } } #[derive(Debug, Clone)] pub enum AstKind { Program, Function(String), Block, Expression, Statement, Identifier(String), Number(i64), } }
Most AST nodes have fewer than 4 children, so this inline capacity avoids heap allocation for the common case. Function calls might have many arguments, but the vector seamlessly handles this by spilling to the heap.
Instruction Operands
Compiler intermediate representations often model instructions with a variable number of operands:
#![allow(unused)] fn main() { use smallvec::{smallvec, SmallVec}; #[derive(Debug, Clone, PartialEq)] pub struct Token { pub kind: TokenKind, pub span: Span, } #[derive(Debug, Clone, PartialEq)] pub enum TokenKind { Identifier(String), Number(i64), Operator(char), Keyword(String), } #[derive(Debug, Clone, PartialEq)] pub struct Span { pub start: usize, pub end: usize, } pub type TokenStream = SmallVec<[Token; 32]>; pub fn tokenize_expression(input: &str) -> TokenStream { let mut tokens = SmallVec::new(); let mut chars = input.char_indices().peekable(); while let Some((i, ch)) = chars.next() { match ch { ' ' | '\t' | '\n' => continue, '+' | '-' | '*' | '/' | '(' | ')' => { tokens.push(Token { kind: TokenKind::Operator(ch), span: Span { start: i, end: i + 1, }, }); } '0'..='9' => { let start = i; let mut value = ch.to_digit(10).unwrap() as i64; while let Some(&(_, ch)) = chars.peek() { if ch.is_ascii_digit() { value = value * 10 + ch.to_digit(10).unwrap() as i64; chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); tokens.push(Token { kind: TokenKind::Number(value), span: Span { start, end }, }); } 'a'..='z' | 'A'..='Z' | '_' => { let start = i; let mut ident = String::new(); ident.push(ch); while let Some(&(_, ch)) = chars.peek() { if ch.is_alphanumeric() || ch == '_' { ident.push(ch); chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); let kind = match ident.as_str() { "if" | "else" | "while" | "for" | "return" => TokenKind::Keyword(ident), _ => TokenKind::Identifier(ident), }; tokens.push(Token { kind, span: Span { start, end }, }); } _ => {} } } tokens } #[derive(Debug, Clone)] pub struct AstNode { pub kind: AstKind, pub children: SmallVec<[Box<AstNode>; 4]>, } #[derive(Debug, Clone)] pub enum AstKind { Program, Function(String), Block, Expression, Statement, Identifier(String), Number(i64), } pub fn build_simple_ast() -> AstNode { AstNode { kind: AstKind::Program, children: smallvec![Box::new(AstNode { kind: AstKind::Function("main".to_string()), children: smallvec![Box::new(AstNode { kind: AstKind::Block, children: smallvec![Box::new(AstNode { kind: AstKind::Expression, children: smallvec![Box::new(AstNode { kind: AstKind::Number(42), children: SmallVec::new(), })], })], })], })], } } #[derive(Debug, Clone)] pub enum Opcode { Load, Store, Add, Sub, Mul, Jmp, Ret, } #[derive(Debug, Clone)] pub enum Operand { Register(u8), Immediate(i32), Memory(u32), } pub fn create_instruction_sequence() -> Vec<Instruction> { vec![ Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(0), Operand::Memory(0x1000)], }, Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(1), Operand::Memory(0x1004)], }, Instruction { opcode: Opcode::Add, operands: smallvec![ Operand::Register(2), Operand::Register(0), Operand::Register(1) ], }, Instruction { opcode: Opcode::Store, operands: smallvec![Operand::Memory(0x1008), Operand::Register(2)], }, Instruction { opcode: Opcode::Ret, operands: SmallVec::new(), }, ] } pub fn demonstrate_capacity() { let mut vec: SmallVec<[i32; 4]> = SmallVec::new(); println!("Initial capacity: {}", vec.inline_size()); println!("Is heap allocated: {}", vec.spilled()); for i in 0..6 { vec.push(i); println!( "After pushing {}: capacity = {}, spilled = {}", i, vec.capacity(), vec.spilled() ); } } pub struct SymbolTable { scopes: SmallVec<[Scope; 8]>, } pub struct Scope { symbols: SmallVec<[(String, SymbolInfo); 16]>, } #[derive(Debug, Clone)] pub struct SymbolInfo { pub kind: SymbolKind, pub offset: usize, } #[derive(Debug, Clone)] pub enum SymbolKind { Variable, Function, Parameter, } impl Default for SymbolTable { fn default() -> Self { Self::new() } } impl SymbolTable { pub fn new() -> Self { Self { scopes: smallvec![Scope { symbols: SmallVec::new(), }], } } pub fn push_scope(&mut self) { self.scopes.push(Scope { symbols: SmallVec::new(), }); } pub fn pop_scope(&mut self) { if self.scopes.len() > 1 { self.scopes.pop(); } } pub fn insert(&mut self, name: String, info: SymbolInfo) { if let Some(scope) = self.scopes.last_mut() { scope.symbols.push((name, info)); } } pub fn lookup(&self, name: &str) -> Option<&SymbolInfo> { for scope in self.scopes.iter().rev() { for (sym_name, info) in &scope.symbols { if sym_name == name { return Some(info); } } } None } } #[derive(Debug)] pub struct CompactError { pub messages: SmallVec<[String; 2]>, pub locations: SmallVec<[Location; 2]>, } #[derive(Debug)] pub struct Location { pub file: String, pub line: u32, pub column: u32, } impl CompactError { pub fn new(message: String, location: Location) -> Self { Self { messages: smallvec![message], locations: smallvec![location], } } pub fn add_context(&mut self, message: String, location: Location) { self.messages.push(message); self.locations.push(location); } } #[cfg(test)] mod tests { use super::*; #[test] fn test_tokenization() { let tokens = tokenize_expression("x + 42 * (y - 3)"); assert_eq!(tokens.len(), 9); assert!(matches!(tokens[0].kind, TokenKind::Identifier(_))); assert!(matches!(tokens[2].kind, TokenKind::Number(42))); } #[test] fn test_inline_capacity() { let vec: SmallVec<[i32; 8]> = smallvec![1, 2, 3, 4]; assert!(!vec.spilled()); assert_eq!(vec.len(), 4); assert_eq!(vec.capacity(), 8); } #[test] fn test_spill_to_heap() { let mut vec: SmallVec<[i32; 2]> = SmallVec::new(); vec.push(1); vec.push(2); assert!(!vec.spilled()); vec.push(3); assert!(vec.spilled()); assert!(vec.capacity() >= 3); } #[test] fn test_symbol_table() { let mut table = SymbolTable::new(); table.insert( "x".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 0, }, ); table.push_scope(); table.insert( "y".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 4, }, ); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_some()); table.pop_scope(); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_none()); } } #[derive(Debug, Clone)] pub struct Instruction { pub opcode: Opcode, pub operands: SmallVec<[Operand; 3]>, } }
#![allow(unused)] fn main() { use smallvec::{smallvec, SmallVec}; #[derive(Debug, Clone, PartialEq)] pub struct Token { pub kind: TokenKind, pub span: Span, } #[derive(Debug, Clone, PartialEq)] pub enum TokenKind { Identifier(String), Number(i64), Operator(char), Keyword(String), } #[derive(Debug, Clone, PartialEq)] pub struct Span { pub start: usize, pub end: usize, } pub type TokenStream = SmallVec<[Token; 32]>; pub fn tokenize_expression(input: &str) -> TokenStream { let mut tokens = SmallVec::new(); let mut chars = input.char_indices().peekable(); while let Some((i, ch)) = chars.next() { match ch { ' ' | '\t' | '\n' => continue, '+' | '-' | '*' | '/' | '(' | ')' => { tokens.push(Token { kind: TokenKind::Operator(ch), span: Span { start: i, end: i + 1, }, }); } '0'..='9' => { let start = i; let mut value = ch.to_digit(10).unwrap() as i64; while let Some(&(_, ch)) = chars.peek() { if ch.is_ascii_digit() { value = value * 10 + ch.to_digit(10).unwrap() as i64; chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); tokens.push(Token { kind: TokenKind::Number(value), span: Span { start, end }, }); } 'a'..='z' | 'A'..='Z' | '_' => { let start = i; let mut ident = String::new(); ident.push(ch); while let Some(&(_, ch)) = chars.peek() { if ch.is_alphanumeric() || ch == '_' { ident.push(ch); chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); let kind = match ident.as_str() { "if" | "else" | "while" | "for" | "return" => TokenKind::Keyword(ident), _ => TokenKind::Identifier(ident), }; tokens.push(Token { kind, span: Span { start, end }, }); } _ => {} } } tokens } #[derive(Debug, Clone)] pub struct AstNode { pub kind: AstKind, pub children: SmallVec<[Box<AstNode>; 4]>, } #[derive(Debug, Clone)] pub enum AstKind { Program, Function(String), Block, Expression, Statement, Identifier(String), Number(i64), } pub fn build_simple_ast() -> AstNode { AstNode { kind: AstKind::Program, children: smallvec![Box::new(AstNode { kind: AstKind::Function("main".to_string()), children: smallvec![Box::new(AstNode { kind: AstKind::Block, children: smallvec![Box::new(AstNode { kind: AstKind::Expression, children: smallvec![Box::new(AstNode { kind: AstKind::Number(42), children: SmallVec::new(), })], })], })], })], } } #[derive(Debug, Clone)] pub struct Instruction { pub opcode: Opcode, pub operands: SmallVec<[Operand; 3]>, } #[derive(Debug, Clone)] pub enum Opcode { Load, Store, Add, Sub, Mul, Jmp, Ret, } #[derive(Debug, Clone)] pub enum Operand { Register(u8), Immediate(i32), Memory(u32), } pub fn demonstrate_capacity() { let mut vec: SmallVec<[i32; 4]> = SmallVec::new(); println!("Initial capacity: {}", vec.inline_size()); println!("Is heap allocated: {}", vec.spilled()); for i in 0..6 { vec.push(i); println!( "After pushing {}: capacity = {}, spilled = {}", i, vec.capacity(), vec.spilled() ); } } pub struct SymbolTable { scopes: SmallVec<[Scope; 8]>, } pub struct Scope { symbols: SmallVec<[(String, SymbolInfo); 16]>, } #[derive(Debug, Clone)] pub struct SymbolInfo { pub kind: SymbolKind, pub offset: usize, } #[derive(Debug, Clone)] pub enum SymbolKind { Variable, Function, Parameter, } impl Default for SymbolTable { fn default() -> Self { Self::new() } } impl SymbolTable { pub fn new() -> Self { Self { scopes: smallvec![Scope { symbols: SmallVec::new(), }], } } pub fn push_scope(&mut self) { self.scopes.push(Scope { symbols: SmallVec::new(), }); } pub fn pop_scope(&mut self) { if self.scopes.len() > 1 { self.scopes.pop(); } } pub fn insert(&mut self, name: String, info: SymbolInfo) { if let Some(scope) = self.scopes.last_mut() { scope.symbols.push((name, info)); } } pub fn lookup(&self, name: &str) -> Option<&SymbolInfo> { for scope in self.scopes.iter().rev() { for (sym_name, info) in &scope.symbols { if sym_name == name { return Some(info); } } } None } } #[derive(Debug)] pub struct CompactError { pub messages: SmallVec<[String; 2]>, pub locations: SmallVec<[Location; 2]>, } #[derive(Debug)] pub struct Location { pub file: String, pub line: u32, pub column: u32, } impl CompactError { pub fn new(message: String, location: Location) -> Self { Self { messages: smallvec![message], locations: smallvec![location], } } pub fn add_context(&mut self, message: String, location: Location) { self.messages.push(message); self.locations.push(location); } } #[cfg(test)] mod tests { use super::*; #[test] fn test_tokenization() { let tokens = tokenize_expression("x + 42 * (y - 3)"); assert_eq!(tokens.len(), 9); assert!(matches!(tokens[0].kind, TokenKind::Identifier(_))); assert!(matches!(tokens[2].kind, TokenKind::Number(42))); } #[test] fn test_inline_capacity() { let vec: SmallVec<[i32; 8]> = smallvec![1, 2, 3, 4]; assert!(!vec.spilled()); assert_eq!(vec.len(), 4); assert_eq!(vec.capacity(), 8); } #[test] fn test_spill_to_heap() { let mut vec: SmallVec<[i32; 2]> = SmallVec::new(); vec.push(1); vec.push(2); assert!(!vec.spilled()); vec.push(3); assert!(vec.spilled()); assert!(vec.capacity() >= 3); } #[test] fn test_symbol_table() { let mut table = SymbolTable::new(); table.insert( "x".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 0, }, ); table.push_scope(); table.insert( "y".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 4, }, ); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_some()); table.pop_scope(); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_none()); } } pub fn create_instruction_sequence() -> Vec<Instruction> { vec![ Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(0), Operand::Memory(0x1000)], }, Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(1), Operand::Memory(0x1004)], }, Instruction { opcode: Opcode::Add, operands: smallvec![ Operand::Register(2), Operand::Register(0), Operand::Register(1) ], }, Instruction { opcode: Opcode::Store, operands: smallvec![Operand::Memory(0x1008), Operand::Register(2)], }, Instruction { opcode: Opcode::Ret, operands: SmallVec::new(), }, ] } }
Most instructions have 0-3 operands, making SmallVec with inline capacity of 3 an excellent choice. This keeps instruction objects compact and cache-friendly.
Symbol Tables
Symbol tables benefit from SmallVec at multiple levels. Most scopes contain few symbols, and the scope stack itself is usually shallow:
#![allow(unused)] fn main() { use smallvec::{smallvec, SmallVec}; #[derive(Debug, Clone, PartialEq)] pub struct Token { pub kind: TokenKind, pub span: Span, } #[derive(Debug, Clone, PartialEq)] pub enum TokenKind { Identifier(String), Number(i64), Operator(char), Keyword(String), } #[derive(Debug, Clone, PartialEq)] pub struct Span { pub start: usize, pub end: usize, } pub type TokenStream = SmallVec<[Token; 32]>; pub fn tokenize_expression(input: &str) -> TokenStream { let mut tokens = SmallVec::new(); let mut chars = input.char_indices().peekable(); while let Some((i, ch)) = chars.next() { match ch { ' ' | '\t' | '\n' => continue, '+' | '-' | '*' | '/' | '(' | ')' => { tokens.push(Token { kind: TokenKind::Operator(ch), span: Span { start: i, end: i + 1, }, }); } '0'..='9' => { let start = i; let mut value = ch.to_digit(10).unwrap() as i64; while let Some(&(_, ch)) = chars.peek() { if ch.is_ascii_digit() { value = value * 10 + ch.to_digit(10).unwrap() as i64; chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); tokens.push(Token { kind: TokenKind::Number(value), span: Span { start, end }, }); } 'a'..='z' | 'A'..='Z' | '_' => { let start = i; let mut ident = String::new(); ident.push(ch); while let Some(&(_, ch)) = chars.peek() { if ch.is_alphanumeric() || ch == '_' { ident.push(ch); chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); let kind = match ident.as_str() { "if" | "else" | "while" | "for" | "return" => TokenKind::Keyword(ident), _ => TokenKind::Identifier(ident), }; tokens.push(Token { kind, span: Span { start, end }, }); } _ => {} } } tokens } #[derive(Debug, Clone)] pub struct AstNode { pub kind: AstKind, pub children: SmallVec<[Box<AstNode>; 4]>, } #[derive(Debug, Clone)] pub enum AstKind { Program, Function(String), Block, Expression, Statement, Identifier(String), Number(i64), } pub fn build_simple_ast() -> AstNode { AstNode { kind: AstKind::Program, children: smallvec![Box::new(AstNode { kind: AstKind::Function("main".to_string()), children: smallvec![Box::new(AstNode { kind: AstKind::Block, children: smallvec![Box::new(AstNode { kind: AstKind::Expression, children: smallvec![Box::new(AstNode { kind: AstKind::Number(42), children: SmallVec::new(), })], })], })], })], } } #[derive(Debug, Clone)] pub struct Instruction { pub opcode: Opcode, pub operands: SmallVec<[Operand; 3]>, } #[derive(Debug, Clone)] pub enum Opcode { Load, Store, Add, Sub, Mul, Jmp, Ret, } #[derive(Debug, Clone)] pub enum Operand { Register(u8), Immediate(i32), Memory(u32), } pub fn create_instruction_sequence() -> Vec<Instruction> { vec![ Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(0), Operand::Memory(0x1000)], }, Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(1), Operand::Memory(0x1004)], }, Instruction { opcode: Opcode::Add, operands: smallvec![ Operand::Register(2), Operand::Register(0), Operand::Register(1) ], }, Instruction { opcode: Opcode::Store, operands: smallvec![Operand::Memory(0x1008), Operand::Register(2)], }, Instruction { opcode: Opcode::Ret, operands: SmallVec::new(), }, ] } pub fn demonstrate_capacity() { let mut vec: SmallVec<[i32; 4]> = SmallVec::new(); println!("Initial capacity: {}", vec.inline_size()); println!("Is heap allocated: {}", vec.spilled()); for i in 0..6 { vec.push(i); println!( "After pushing {}: capacity = {}, spilled = {}", i, vec.capacity(), vec.spilled() ); } } pub struct Scope { symbols: SmallVec<[(String, SymbolInfo); 16]>, } #[derive(Debug, Clone)] pub struct SymbolInfo { pub kind: SymbolKind, pub offset: usize, } #[derive(Debug, Clone)] pub enum SymbolKind { Variable, Function, Parameter, } impl Default for SymbolTable { fn default() -> Self { Self::new() } } impl SymbolTable { pub fn new() -> Self { Self { scopes: smallvec![Scope { symbols: SmallVec::new(), }], } } pub fn push_scope(&mut self) { self.scopes.push(Scope { symbols: SmallVec::new(), }); } pub fn pop_scope(&mut self) { if self.scopes.len() > 1 { self.scopes.pop(); } } pub fn insert(&mut self, name: String, info: SymbolInfo) { if let Some(scope) = self.scopes.last_mut() { scope.symbols.push((name, info)); } } pub fn lookup(&self, name: &str) -> Option<&SymbolInfo> { for scope in self.scopes.iter().rev() { for (sym_name, info) in &scope.symbols { if sym_name == name { return Some(info); } } } None } } #[derive(Debug)] pub struct CompactError { pub messages: SmallVec<[String; 2]>, pub locations: SmallVec<[Location; 2]>, } #[derive(Debug)] pub struct Location { pub file: String, pub line: u32, pub column: u32, } impl CompactError { pub fn new(message: String, location: Location) -> Self { Self { messages: smallvec![message], locations: smallvec![location], } } pub fn add_context(&mut self, message: String, location: Location) { self.messages.push(message); self.locations.push(location); } } #[cfg(test)] mod tests { use super::*; #[test] fn test_tokenization() { let tokens = tokenize_expression("x + 42 * (y - 3)"); assert_eq!(tokens.len(), 9); assert!(matches!(tokens[0].kind, TokenKind::Identifier(_))); assert!(matches!(tokens[2].kind, TokenKind::Number(42))); } #[test] fn test_inline_capacity() { let vec: SmallVec<[i32; 8]> = smallvec![1, 2, 3, 4]; assert!(!vec.spilled()); assert_eq!(vec.len(), 4); assert_eq!(vec.capacity(), 8); } #[test] fn test_spill_to_heap() { let mut vec: SmallVec<[i32; 2]> = SmallVec::new(); vec.push(1); vec.push(2); assert!(!vec.spilled()); vec.push(3); assert!(vec.spilled()); assert!(vec.capacity() >= 3); } #[test] fn test_symbol_table() { let mut table = SymbolTable::new(); table.insert( "x".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 0, }, ); table.push_scope(); table.insert( "y".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 4, }, ); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_some()); table.pop_scope(); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_none()); } } pub struct SymbolTable { scopes: SmallVec<[Scope; 8]>, } }
#![allow(unused)] fn main() { use smallvec::{smallvec, SmallVec}; #[derive(Debug, Clone, PartialEq)] pub struct Token { pub kind: TokenKind, pub span: Span, } #[derive(Debug, Clone, PartialEq)] pub enum TokenKind { Identifier(String), Number(i64), Operator(char), Keyword(String), } #[derive(Debug, Clone, PartialEq)] pub struct Span { pub start: usize, pub end: usize, } pub type TokenStream = SmallVec<[Token; 32]>; pub fn tokenize_expression(input: &str) -> TokenStream { let mut tokens = SmallVec::new(); let mut chars = input.char_indices().peekable(); while let Some((i, ch)) = chars.next() { match ch { ' ' | '\t' | '\n' => continue, '+' | '-' | '*' | '/' | '(' | ')' => { tokens.push(Token { kind: TokenKind::Operator(ch), span: Span { start: i, end: i + 1, }, }); } '0'..='9' => { let start = i; let mut value = ch.to_digit(10).unwrap() as i64; while let Some(&(_, ch)) = chars.peek() { if ch.is_ascii_digit() { value = value * 10 + ch.to_digit(10).unwrap() as i64; chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); tokens.push(Token { kind: TokenKind::Number(value), span: Span { start, end }, }); } 'a'..='z' | 'A'..='Z' | '_' => { let start = i; let mut ident = String::new(); ident.push(ch); while let Some(&(_, ch)) = chars.peek() { if ch.is_alphanumeric() || ch == '_' { ident.push(ch); chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); let kind = match ident.as_str() { "if" | "else" | "while" | "for" | "return" => TokenKind::Keyword(ident), _ => TokenKind::Identifier(ident), }; tokens.push(Token { kind, span: Span { start, end }, }); } _ => {} } } tokens } #[derive(Debug, Clone)] pub struct AstNode { pub kind: AstKind, pub children: SmallVec<[Box<AstNode>; 4]>, } #[derive(Debug, Clone)] pub enum AstKind { Program, Function(String), Block, Expression, Statement, Identifier(String), Number(i64), } pub fn build_simple_ast() -> AstNode { AstNode { kind: AstKind::Program, children: smallvec![Box::new(AstNode { kind: AstKind::Function("main".to_string()), children: smallvec![Box::new(AstNode { kind: AstKind::Block, children: smallvec![Box::new(AstNode { kind: AstKind::Expression, children: smallvec![Box::new(AstNode { kind: AstKind::Number(42), children: SmallVec::new(), })], })], })], })], } } #[derive(Debug, Clone)] pub struct Instruction { pub opcode: Opcode, pub operands: SmallVec<[Operand; 3]>, } #[derive(Debug, Clone)] pub enum Opcode { Load, Store, Add, Sub, Mul, Jmp, Ret, } #[derive(Debug, Clone)] pub enum Operand { Register(u8), Immediate(i32), Memory(u32), } pub fn create_instruction_sequence() -> Vec<Instruction> { vec![ Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(0), Operand::Memory(0x1000)], }, Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(1), Operand::Memory(0x1004)], }, Instruction { opcode: Opcode::Add, operands: smallvec![ Operand::Register(2), Operand::Register(0), Operand::Register(1) ], }, Instruction { opcode: Opcode::Store, operands: smallvec![Operand::Memory(0x1008), Operand::Register(2)], }, Instruction { opcode: Opcode::Ret, operands: SmallVec::new(), }, ] } pub fn demonstrate_capacity() { let mut vec: SmallVec<[i32; 4]> = SmallVec::new(); println!("Initial capacity: {}", vec.inline_size()); println!("Is heap allocated: {}", vec.spilled()); for i in 0..6 { vec.push(i); println!( "After pushing {}: capacity = {}, spilled = {}", i, vec.capacity(), vec.spilled() ); } } pub struct SymbolTable { scopes: SmallVec<[Scope; 8]>, } #[derive(Debug, Clone)] pub struct SymbolInfo { pub kind: SymbolKind, pub offset: usize, } #[derive(Debug, Clone)] pub enum SymbolKind { Variable, Function, Parameter, } impl Default for SymbolTable { fn default() -> Self { Self::new() } } impl SymbolTable { pub fn new() -> Self { Self { scopes: smallvec![Scope { symbols: SmallVec::new(), }], } } pub fn push_scope(&mut self) { self.scopes.push(Scope { symbols: SmallVec::new(), }); } pub fn pop_scope(&mut self) { if self.scopes.len() > 1 { self.scopes.pop(); } } pub fn insert(&mut self, name: String, info: SymbolInfo) { if let Some(scope) = self.scopes.last_mut() { scope.symbols.push((name, info)); } } pub fn lookup(&self, name: &str) -> Option<&SymbolInfo> { for scope in self.scopes.iter().rev() { for (sym_name, info) in &scope.symbols { if sym_name == name { return Some(info); } } } None } } #[derive(Debug)] pub struct CompactError { pub messages: SmallVec<[String; 2]>, pub locations: SmallVec<[Location; 2]>, } #[derive(Debug)] pub struct Location { pub file: String, pub line: u32, pub column: u32, } impl CompactError { pub fn new(message: String, location: Location) -> Self { Self { messages: smallvec![message], locations: smallvec![location], } } pub fn add_context(&mut self, message: String, location: Location) { self.messages.push(message); self.locations.push(location); } } #[cfg(test)] mod tests { use super::*; #[test] fn test_tokenization() { let tokens = tokenize_expression("x + 42 * (y - 3)"); assert_eq!(tokens.len(), 9); assert!(matches!(tokens[0].kind, TokenKind::Identifier(_))); assert!(matches!(tokens[2].kind, TokenKind::Number(42))); } #[test] fn test_inline_capacity() { let vec: SmallVec<[i32; 8]> = smallvec![1, 2, 3, 4]; assert!(!vec.spilled()); assert_eq!(vec.len(), 4); assert_eq!(vec.capacity(), 8); } #[test] fn test_spill_to_heap() { let mut vec: SmallVec<[i32; 2]> = SmallVec::new(); vec.push(1); vec.push(2); assert!(!vec.spilled()); vec.push(3); assert!(vec.spilled()); assert!(vec.capacity() >= 3); } #[test] fn test_symbol_table() { let mut table = SymbolTable::new(); table.insert( "x".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 0, }, ); table.push_scope(); table.insert( "y".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 4, }, ); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_some()); table.pop_scope(); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_none()); } } pub struct Scope { symbols: SmallVec<[(String, SymbolInfo); 16]>, } }
#![allow(unused)] fn main() { use smallvec::{smallvec, SmallVec}; #[derive(Debug, Clone, PartialEq)] pub struct Token { pub kind: TokenKind, pub span: Span, } #[derive(Debug, Clone, PartialEq)] pub enum TokenKind { Identifier(String), Number(i64), Operator(char), Keyword(String), } #[derive(Debug, Clone, PartialEq)] pub struct Span { pub start: usize, pub end: usize, } pub type TokenStream = SmallVec<[Token; 32]>; pub fn tokenize_expression(input: &str) -> TokenStream { let mut tokens = SmallVec::new(); let mut chars = input.char_indices().peekable(); while let Some((i, ch)) = chars.next() { match ch { ' ' | '\t' | '\n' => continue, '+' | '-' | '*' | '/' | '(' | ')' => { tokens.push(Token { kind: TokenKind::Operator(ch), span: Span { start: i, end: i + 1, }, }); } '0'..='9' => { let start = i; let mut value = ch.to_digit(10).unwrap() as i64; while let Some(&(_, ch)) = chars.peek() { if ch.is_ascii_digit() { value = value * 10 + ch.to_digit(10).unwrap() as i64; chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); tokens.push(Token { kind: TokenKind::Number(value), span: Span { start, end }, }); } 'a'..='z' | 'A'..='Z' | '_' => { let start = i; let mut ident = String::new(); ident.push(ch); while let Some(&(_, ch)) = chars.peek() { if ch.is_alphanumeric() || ch == '_' { ident.push(ch); chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); let kind = match ident.as_str() { "if" | "else" | "while" | "for" | "return" => TokenKind::Keyword(ident), _ => TokenKind::Identifier(ident), }; tokens.push(Token { kind, span: Span { start, end }, }); } _ => {} } } tokens } #[derive(Debug, Clone)] pub struct AstNode { pub kind: AstKind, pub children: SmallVec<[Box<AstNode>; 4]>, } #[derive(Debug, Clone)] pub enum AstKind { Program, Function(String), Block, Expression, Statement, Identifier(String), Number(i64), } pub fn build_simple_ast() -> AstNode { AstNode { kind: AstKind::Program, children: smallvec![Box::new(AstNode { kind: AstKind::Function("main".to_string()), children: smallvec![Box::new(AstNode { kind: AstKind::Block, children: smallvec![Box::new(AstNode { kind: AstKind::Expression, children: smallvec![Box::new(AstNode { kind: AstKind::Number(42), children: SmallVec::new(), })], })], })], })], } } #[derive(Debug, Clone)] pub struct Instruction { pub opcode: Opcode, pub operands: SmallVec<[Operand; 3]>, } #[derive(Debug, Clone)] pub enum Opcode { Load, Store, Add, Sub, Mul, Jmp, Ret, } #[derive(Debug, Clone)] pub enum Operand { Register(u8), Immediate(i32), Memory(u32), } pub fn create_instruction_sequence() -> Vec<Instruction> { vec![ Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(0), Operand::Memory(0x1000)], }, Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(1), Operand::Memory(0x1004)], }, Instruction { opcode: Opcode::Add, operands: smallvec![ Operand::Register(2), Operand::Register(0), Operand::Register(1) ], }, Instruction { opcode: Opcode::Store, operands: smallvec![Operand::Memory(0x1008), Operand::Register(2)], }, Instruction { opcode: Opcode::Ret, operands: SmallVec::new(), }, ] } pub fn demonstrate_capacity() { let mut vec: SmallVec<[i32; 4]> = SmallVec::new(); println!("Initial capacity: {}", vec.inline_size()); println!("Is heap allocated: {}", vec.spilled()); for i in 0..6 { vec.push(i); println!( "After pushing {}: capacity = {}, spilled = {}", i, vec.capacity(), vec.spilled() ); } } pub struct SymbolTable { scopes: SmallVec<[Scope; 8]>, } pub struct Scope { symbols: SmallVec<[(String, SymbolInfo); 16]>, } #[derive(Debug, Clone)] pub struct SymbolInfo { pub kind: SymbolKind, pub offset: usize, } #[derive(Debug, Clone)] pub enum SymbolKind { Variable, Function, Parameter, } impl Default for SymbolTable { fn default() -> Self { Self::new() } } #[derive(Debug)] pub struct CompactError { pub messages: SmallVec<[String; 2]>, pub locations: SmallVec<[Location; 2]>, } #[derive(Debug)] pub struct Location { pub file: String, pub line: u32, pub column: u32, } impl CompactError { pub fn new(message: String, location: Location) -> Self { Self { messages: smallvec![message], locations: smallvec![location], } } pub fn add_context(&mut self, message: String, location: Location) { self.messages.push(message); self.locations.push(location); } } #[cfg(test)] mod tests { use super::*; #[test] fn test_tokenization() { let tokens = tokenize_expression("x + 42 * (y - 3)"); assert_eq!(tokens.len(), 9); assert!(matches!(tokens[0].kind, TokenKind::Identifier(_))); assert!(matches!(tokens[2].kind, TokenKind::Number(42))); } #[test] fn test_inline_capacity() { let vec: SmallVec<[i32; 8]> = smallvec![1, 2, 3, 4]; assert!(!vec.spilled()); assert_eq!(vec.len(), 4); assert_eq!(vec.capacity(), 8); } #[test] fn test_spill_to_heap() { let mut vec: SmallVec<[i32; 2]> = SmallVec::new(); vec.push(1); vec.push(2); assert!(!vec.spilled()); vec.push(3); assert!(vec.spilled()); assert!(vec.capacity() >= 3); } #[test] fn test_symbol_table() { let mut table = SymbolTable::new(); table.insert( "x".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 0, }, ); table.push_scope(); table.insert( "y".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 4, }, ); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_some()); table.pop_scope(); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_none()); } } impl SymbolTable { pub fn new() -> Self { Self { scopes: smallvec![Scope { symbols: SmallVec::new(), }], } } pub fn push_scope(&mut self) { self.scopes.push(Scope { symbols: SmallVec::new(), }); } pub fn pop_scope(&mut self) { if self.scopes.len() > 1 { self.scopes.pop(); } } pub fn insert(&mut self, name: String, info: SymbolInfo) { if let Some(scope) = self.scopes.last_mut() { scope.symbols.push((name, info)); } } pub fn lookup(&self, name: &str) -> Option<&SymbolInfo> { for scope in self.scopes.iter().rev() { for (sym_name, info) in &scope.symbols { if sym_name == name { return Some(info); } } } None } } }
This implementation uses SmallVec for both the scope stack (usually less than 8 deep) and the symbol list within each scope (often less than 16 symbols). This provides excellent performance for typical programs while gracefully handling edge cases.
Error Context
Compiler errors often need to track context through multiple levels. SmallVec efficiently stores this context:
#![allow(unused)] fn main() { use smallvec::{smallvec, SmallVec}; #[derive(Debug, Clone, PartialEq)] pub struct Token { pub kind: TokenKind, pub span: Span, } #[derive(Debug, Clone, PartialEq)] pub enum TokenKind { Identifier(String), Number(i64), Operator(char), Keyword(String), } #[derive(Debug, Clone, PartialEq)] pub struct Span { pub start: usize, pub end: usize, } pub type TokenStream = SmallVec<[Token; 32]>; pub fn tokenize_expression(input: &str) -> TokenStream { let mut tokens = SmallVec::new(); let mut chars = input.char_indices().peekable(); while let Some((i, ch)) = chars.next() { match ch { ' ' | '\t' | '\n' => continue, '+' | '-' | '*' | '/' | '(' | ')' => { tokens.push(Token { kind: TokenKind::Operator(ch), span: Span { start: i, end: i + 1, }, }); } '0'..='9' => { let start = i; let mut value = ch.to_digit(10).unwrap() as i64; while let Some(&(_, ch)) = chars.peek() { if ch.is_ascii_digit() { value = value * 10 + ch.to_digit(10).unwrap() as i64; chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); tokens.push(Token { kind: TokenKind::Number(value), span: Span { start, end }, }); } 'a'..='z' | 'A'..='Z' | '_' => { let start = i; let mut ident = String::new(); ident.push(ch); while let Some(&(_, ch)) = chars.peek() { if ch.is_alphanumeric() || ch == '_' { ident.push(ch); chars.next(); } else { break; } } let end = chars.peek().map(|&(i, _)| i).unwrap_or(input.len()); let kind = match ident.as_str() { "if" | "else" | "while" | "for" | "return" => TokenKind::Keyword(ident), _ => TokenKind::Identifier(ident), }; tokens.push(Token { kind, span: Span { start, end }, }); } _ => {} } } tokens } #[derive(Debug, Clone)] pub struct AstNode { pub kind: AstKind, pub children: SmallVec<[Box<AstNode>; 4]>, } #[derive(Debug, Clone)] pub enum AstKind { Program, Function(String), Block, Expression, Statement, Identifier(String), Number(i64), } pub fn build_simple_ast() -> AstNode { AstNode { kind: AstKind::Program, children: smallvec![Box::new(AstNode { kind: AstKind::Function("main".to_string()), children: smallvec![Box::new(AstNode { kind: AstKind::Block, children: smallvec![Box::new(AstNode { kind: AstKind::Expression, children: smallvec![Box::new(AstNode { kind: AstKind::Number(42), children: SmallVec::new(), })], })], })], })], } } #[derive(Debug, Clone)] pub struct Instruction { pub opcode: Opcode, pub operands: SmallVec<[Operand; 3]>, } #[derive(Debug, Clone)] pub enum Opcode { Load, Store, Add, Sub, Mul, Jmp, Ret, } #[derive(Debug, Clone)] pub enum Operand { Register(u8), Immediate(i32), Memory(u32), } pub fn create_instruction_sequence() -> Vec<Instruction> { vec![ Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(0), Operand::Memory(0x1000)], }, Instruction { opcode: Opcode::Load, operands: smallvec![Operand::Register(1), Operand::Memory(0x1004)], }, Instruction { opcode: Opcode::Add, operands: smallvec![ Operand::Register(2), Operand::Register(0), Operand::Register(1) ], }, Instruction { opcode: Opcode::Store, operands: smallvec![Operand::Memory(0x1008), Operand::Register(2)], }, Instruction { opcode: Opcode::Ret, operands: SmallVec::new(), }, ] } pub fn demonstrate_capacity() { let mut vec: SmallVec<[i32; 4]> = SmallVec::new(); println!("Initial capacity: {}", vec.inline_size()); println!("Is heap allocated: {}", vec.spilled()); for i in 0..6 { vec.push(i); println!( "After pushing {}: capacity = {}, spilled = {}", i, vec.capacity(), vec.spilled() ); } } pub struct SymbolTable { scopes: SmallVec<[Scope; 8]>, } pub struct Scope { symbols: SmallVec<[(String, SymbolInfo); 16]>, } #[derive(Debug, Clone)] pub struct SymbolInfo { pub kind: SymbolKind, pub offset: usize, } #[derive(Debug, Clone)] pub enum SymbolKind { Variable, Function, Parameter, } impl Default for SymbolTable { fn default() -> Self { Self::new() } } impl SymbolTable { pub fn new() -> Self { Self { scopes: smallvec![Scope { symbols: SmallVec::new(), }], } } pub fn push_scope(&mut self) { self.scopes.push(Scope { symbols: SmallVec::new(), }); } pub fn pop_scope(&mut self) { if self.scopes.len() > 1 { self.scopes.pop(); } } pub fn insert(&mut self, name: String, info: SymbolInfo) { if let Some(scope) = self.scopes.last_mut() { scope.symbols.push((name, info)); } } pub fn lookup(&self, name: &str) -> Option<&SymbolInfo> { for scope in self.scopes.iter().rev() { for (sym_name, info) in &scope.symbols { if sym_name == name { return Some(info); } } } None } } #[derive(Debug)] pub struct Location { pub file: String, pub line: u32, pub column: u32, } impl CompactError { pub fn new(message: String, location: Location) -> Self { Self { messages: smallvec![message], locations: smallvec![location], } } pub fn add_context(&mut self, message: String, location: Location) { self.messages.push(message); self.locations.push(location); } } #[cfg(test)] mod tests { use super::*; #[test] fn test_tokenization() { let tokens = tokenize_expression("x + 42 * (y - 3)"); assert_eq!(tokens.len(), 9); assert!(matches!(tokens[0].kind, TokenKind::Identifier(_))); assert!(matches!(tokens[2].kind, TokenKind::Number(42))); } #[test] fn test_inline_capacity() { let vec: SmallVec<[i32; 8]> = smallvec![1, 2, 3, 4]; assert!(!vec.spilled()); assert_eq!(vec.len(), 4); assert_eq!(vec.capacity(), 8); } #[test] fn test_spill_to_heap() { let mut vec: SmallVec<[i32; 2]> = SmallVec::new(); vec.push(1); vec.push(2); assert!(!vec.spilled()); vec.push(3); assert!(vec.spilled()); assert!(vec.capacity() >= 3); } #[test] fn test_symbol_table() { let mut table = SymbolTable::new(); table.insert( "x".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 0, }, ); table.push_scope(); table.insert( "y".to_string(), SymbolInfo { kind: SymbolKind::Variable, offset: 4, }, ); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_some()); table.pop_scope(); assert!(table.lookup("x").is_some()); assert!(table.lookup("y").is_none()); } } #[derive(Debug)] pub struct CompactError { pub messages: SmallVec<[String; 2]>, pub locations: SmallVec<[Location; 2]>, } }
Most errors have only one or two context levels, so inline storage of 2 elements covers the common case without allocation.
Best Practices
Choose inline capacity based on profiling and typical use cases. Too small wastes the optimization opportunity, while too large wastes stack space. Common sweet spots are 2-4 for AST children, 8-16 for local collections, and 32-64 for token buffers.
Be aware of the size implications. A SmallVec<[T; N]>
is approximately the size of N elements plus a discriminant and pointer. This can make structs larger, potentially affecting cache behavior. Measure the trade-offs in your specific use case.
Use type aliases to make code more readable and to centralize capacity decisions. This makes it easy to tune capacities based on profiling data.
Consider using SmallVec in hot paths where allocation overhead matters. Parser combinators, visitor patterns, and iterative algorithms often benefit significantly.
The smallvec!
macro provides convenient initialization similar to vec!
. Use it for clarity when creating SmallVecs with initial values.
For recursive structures like ASTs, SmallVec can dramatically reduce total allocations. A tree with depth D and branching factor B would normally require O(B^D) allocations, but with SmallVec, most nodes require zero heap allocations.