ML / AI · Visual reads · Est. 2023
Least Squares
Visual breakdowns of modern machine learning systems — architectures, system design, and ML design, drawn from first principles for engineers who build things.
What we cover
Three pillars.
One mission — make hard systems legible.
01 Architectures
Transformer internals, diffusion stacks, MoE routing — diagrammed from first principles.
02 System Design
Inference pipelines, vector DBs, distributed training — how production ML actually scales.
03 ML Design
End-to-end patterns for ranking, recommendations, RAG, and agentic workflows.
Visual reads
Architecture
Speculative Decoding in LLMs
How a cheap draft model and a fast verification pass can deliver full-quality output at a fraction of the latency.
System Design
Inference at scale
Batching strategies, KV caching, continuous batching, and the hard economics of serving large models to real users.
ML Design
The hidden costs of RAG in production
Latency trade-offs between dense retrieval, semantic caching, and long-context models.