Why do we need KV caching?
A visual explanation of why LLM inference caches key and value vectors, and why the trick saves compute during token-by-token generation.
9 min read
@yasuonguyen01
build · break · understand
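
The claim in the subtitle can be sketched in a few lines of pure Python. This is a minimal illustration, not the article's own code. It assumes a single attention head in a single layer, tiny random weights, and a fixed list of input embeddings standing in for the tokens produced one at a time. The sizes `d` and `n`, the helper names, and the weights are all hypothetical. Both loops compute attention only for the newest token's query at each step. The only difference is whether the key and value vectors of earlier tokens are re-projected (no cache) or reused from a growing list (cache). Query projections are the same in both loops, so the counters track only the K/V projections.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]


def generate_without_cache(xs, Wq, Wk, Wv):
    """At step t, re-project K and V for every token seen so far."""
    outputs, kv_projections = [], 0
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        kv_projections += 2 * len(prefix)
        q = matvec(Wq, xs[t])  # only the newest token's query is needed
        outputs.append(attend(q, keys, values))
    return outputs, kv_projections


def generate_with_cache(xs, Wq, Wk, Wv):
    """Project K and V once per token and append them to a cache."""
    outputs, kv_projections = [], 0
    k_cache, v_cache = [], []
    for x in xs:
        k_cache.append(matvec(Wk, x))  # earlier entries are reused unchanged
        v_cache.append(matvec(Wv, x))
        kv_projections += 2
        q = matvec(Wq, x)
        outputs.append(attend(q, k_cache, v_cache))
    return outputs, kv_projections


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, n = 4, 6  # hypothetical head size and sequence length
Wq, Wk, Wv = (random_matrix(rng, d, d) for _ in range(3))
xs = random_matrix(rng, n, d)  # stand-in embeddings for 6 tokens

out_naive, cost_naive = generate_without_cache(xs, Wq, Wk, Wv)
out_cached, cost_cached = generate_with_cache(xs, Wq, Wk, Wv)

print(cost_naive, cost_cached)  # 42 12
print(out_naive == out_cached)  # True
```

Over `n` tokens the uncached loop performs n(n+1) key/value projections, while the cached loop performs 2n. Both produce identical outputs. This works because, under causal attention, a token's key and value never change once later tokens arrive. Each step still computes one attention score per cached position. The cache therefore removes the repeated projection work at the cost of memory, but it does not make a step's attention constant-time.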

Robotics Engineer December 2025 – June 2026
AI Engineer Trainee July 2025 – December 2025
Made some good friends along the way.
Bachelor of Science in Artificial Intelligence October 2021 – October 2025
Best investment in myself I ever made.