DocsOrderbook
Learning Path00 Start Here01 Big Picture
IntroductionWhat Is an Order BookBids and AsksSpread and Mid PriceLimit OrdersMarket OrdersDepth and Liquidity
Matching Engine Basics02 Core Domain03 Matching Flow05 Guided Exercises06 Visual Guide07 Why This Design08 Performance Bridge
ArchitectureTechnical ArchitectureUI Notes and Terminal Ideas
Engine Performance NotesPerformance
Milestones
Glossary
LearnTerminal
DocsPerformance

Performance

Benchmark commands, current baselines, and profiling guidance.

Performance

This document records the benchmark and profiling path for the engine.

For the structural explanation of the matching engine through a performance lens, read `docs/engine-performance.md`.

Benchmark command

Run:

  • cargo bench --bench throughput -- --sample-size 10
  • cargo bench --bench price_level_prototypes -- --sample-size 10

For a lightweight local profile wrapper:

  • ./scripts/profile_bench.sh

On macOS the helper uses /usr/bin/time -l; on Linux it uses /usr/bin/time -v.

Current baseline

The most recent local benchmark pass produced approximately:

  • submit_resting_limit_orders: 7.80 ms to 7.83 ms
  • cross_seeded_book: 395.8 µs to 396.9 µs
  • cancel_populated_book: 4.00 ms to 4.02 ms
  • sweep_single_price_fifo: 8.02 ms to 8.04 ms

The benchmark suite also includes lean-path comparisons:

  • submit_resting_limit_orders_minimal
  • submit_resting_limit_orders_minimal_no_invariants
  • cross_seeded_book_minimal
  • cross_seeded_book_minimal_no_invariants
  • cancel_populated_book_minimal
  • cancel_populated_book_minimal_no_invariants
  • cancel_deep_single_price_level
  • cancel_deep_single_price_level_minimal
  • cancel_deep_single_price_level_minimal_no_invariants

The repository also includes prototype-only price-level structure comparisons:

  • prototype_enqueue_vecdeque_1000
  • prototype_enqueue_slab_1000
  • prototype_enqueue_dense_1000
  • prototype_fifo_sweep_vecdeque_1000
  • prototype_fifo_sweep_slab_1000
  • prototype_fifo_sweep_dense_1000
  • prototype_deep_cancel_vecdeque_1000
  • prototype_deep_cancel_slab_1000
  • prototype_deep_cancel_dense_1000

Allocation profile

The most recent ./scripts/profile_bench.sh run on macOS reported approximately:

  • maximum resident set size: 57.1 MB
  • peak memory footprint: 38.7 MB
  • retired instructions: 895M
  • elapsed cycles: 275M
  • wall time for the full benchmark suite: 38.0 s

These are whole-suite numbers, not per-benchmark isolated allocations. They are useful for regression tracking, not for attributing one code path precisely.

These numbers are environment-specific and should only be used as local regression references, not absolute performance claims.

Profiling guidance

Use the benchmark suite before attempting engine optimizations.

Suggested order:

  1. Run ./scripts/profile_bench.sh.
  2. Compare against the stored Criterion baseline.
  3. Only optimize paths that show stable pressure across repeated runs.

Current likely hot paths:

  • resting order insertion
  • resting order insertion with rich result assembly
  • resting order insertion with lean summaries
  • post-mutation invariant walks
  • best-level matching loops
  • FIFO sweep execution
  • cancellation on a populated book
  • cancellation inside a deep single price level
  • mixed insert-and-cross churn

Current read on optimization pressure:

  • CPU hot spots appear more important than gross memory pressure at the current workload size.
  • The full suite footprint is moderate for a benchmark process and does not yet justify structural redesign.
  • If optimization continues, prioritize per-benchmark allocation tracing or CPU sampling around matching loops and event/top-of-book construction.
PreviousNext
On This Page4
Benchmark commandCurrent baselineAllocation profileProfiling guidance