Why do we need KV caching?
A visual explanation of why LLM inference caches key and value vectors, and why the trick saves compute during token-by-token generation.
9 min read
@nguyen le
build · break · understand

Robotics Engineer
Met great people here (my peers and seniors).
December 2025 - June 2026
AI Engineer Intern
Made some good friends along the way.
July 2025 - December 2025
B.Sc. in Artificial Intelligence
The best investment in myself I ever made.
Oct 2021 - Oct 2025