07 Why This Design
This chapter exists for one reason:
to answer "why this design?" before your brain starts inventing five other designs and pulling you off track.
That instinct is good. It means you are thinking like an engineer. But early in learning, too many alternatives can make the current design feel less solid than it really is.
So this chapter is here to anchor you.
The core rule
The project does not try to be the most advanced matching engine possible.
It tries to be:
- correct
- deterministic
- understandable
- measurable
- good enough to teach real tradeoffs
That is why some structures are simpler than what a production HFT engine might eventually use.
Why `BTreeMap` for bids and asks?
Current choice:
- bids:
BTreeMap<Price, PriceLevel> - asks:
BTreeMap<Price, PriceLevel>
Why this was chosen:
- prices stay sorted automatically
- best bid and best ask are easy to find
- the relationship between price priority and the data structure is easy to see
- it is deterministic and readable
Why not a dense price array yet?
- a dense array assumes a bounded and practical price grid
- it is faster only when the market structure makes that worthwhile
- it adds another layer of representation to understand early
Why not a heap?
- heaps are good at finding a best value
- they are worse at full ordered traversal and level management
- they also do not naturally model grouped price levels the way a map does
Beginner takeaway:
BTreeMap is not the fastest possible structure. It is the clearest structure that still behaves like a real engine.
Why `PriceLevel` instead of storing all orders in one giant list?
Current choice:
- orders are grouped by price first
- then FIFO is handled inside each price level
Why:
- price-time priority is easier to reason about in two layers:
- best price first
- then oldest order at that price
Why not one giant sorted structure of all orders?
- it hides the mental model
- it makes price levels less obvious
- it makes “best price, then FIFO within that price” harder to see directly
Beginner takeaway:
Grouping by price is not just an implementation detail. It matches how the matching rule itself works.
Why slot-indexed linked storage inside a price level?
Current choice:
- each price level keeps FIFO order
- each order can also be removed later by a known slot
Why:
- matching wants cheap pop-from-front behavior
- cancellation wants cheap removal of a known resting order
- the current design tries to satisfy both
Why not just use VecDeque<Order>?
VecDequeis very clean for FIFO- but cancellation inside one deep level becomes expensive if you have to search
- earlier versions showed that deep cancel was a real weakness
Why not use a hash-heavy linked design?
- we tried that direction in prototype form
- it improved one microcase
- it made broader engine behavior worse
Why this current design instead?
- it keeps the cancel win
- it stays more compact than the rejected hash-heavy variant
- it still makes FIFO visible
Beginner takeaway:
This is an example of a compromise structure: not the simplest possible, not the fanciest possible, but one that exposes the right tradeoff for learning.
Why an `OrderLocator` index?
Current choice:
HashMap<OrderId, OrderLocator>
Why:
- cancel needs a direct path from
OrderIdto the book location - without it, you would have to scan the book
Why not just search all levels on cancel?
- because that teaches the wrong habit
- it is simple, but too wasteful for a serious engine
Why not store raw pointers everywhere?
- more dangerous
- harder to reason about
- easier to get wrong while learning
Beginner takeaway:
The locator index is a professional compromise: fast enough to matter, simple enough to understand.
Why keep matching logic separate from HTTP?
Why:
- matching rules should not depend on transport format
- tests should not need a server
- future wrappers should be replaceable
Why not just build everything inside the API layer?
- because then the engine becomes harder to test, reuse, and reason about
- transport code and matching code start contaminating each other
Beginner takeaway:
Separation is not “extra architecture.” It is what keeps the real logic clean.
Why keep invariants at all?
Why:
- invariants catch impossible states
- they help you trust the engine while learning
- they make refactors safer
Why not disable them permanently for speed?
- because then you lose an important correctness tool
- and you stop learning where safety costs actually come from
Why not always do full-book invariant walks?
- because measurements showed that was too expensive on the hot path
Why Local by default?
- it keeps checks on
- it limits the work to touched structures
- it preserves the learning value without paying the full old cost
Beginner takeaway:
This is a good example of engineering compromise: not maximum safety cost, not zero safety cost, but a measured middle ground.
Why not optimize everything right now?
Because optimization without understanding tends to make systems worse.
This repo already proved that:
- one attempted optimization looked good in a narrow case
- benchmarks showed it hurt the engine more broadly
- the right move was to reject it
That is real engineering.
The right way to use this chapter
When you catch yourself thinking:
- "Why not just use a vector?"
- "Why not use a heap?"
- "Why not use a database?"
- "Why not make it async?"
come back here first.
Most of the time, the answer is:
because the project is trying to teach the matching engine clearly before chasing every advanced possibility.