Why do we need KV caching?
A visual explanation of why LLM inference caches key and value vectors, and why the trick saves compute during token-by-token generation.
9 min read
@yasuonguyen01
build · break · understand
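The subtitle's claim can be checked with a small experiment. What follows is a minimal sketch, not the article's own code. It assumes a single attention head in a single layer, no positional encoding, and random vectors standing in for token embeddings and learned weights. These choices are hypothetical and made only for illustration, and `generate_without_cache` and `generate_with_cache` are names invented here. Both loops produce the same attention outputs. The difference is that the uncached loop re-projects every earlier token into keys and values at each step, while the cached loop projects each token once and appends the result.

```python
import math
import random

# Toy, hypothetical setup: one head, one layer, no positional encoding,
# random vectors in place of real token embeddings and trained weights.

def matvec(W, x):
    """Multiply a d x d matrix (list of rows) by a length-d vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over lists of keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]

def generate_without_cache(xs, Wq, Wk, Wv):
    """At every step, recompute K and V for the entire prefix."""
    outputs, projections = [], 0
    for t in range(len(xs)):
        prefix = xs[: t + 1]  # causal: token t sees tokens 0..t
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        q = matvec(Wq, xs[t])
        projections += 2 * len(prefix) + 1
        outputs.append(attend(q, keys, values))
    return outputs, projections

def generate_with_cache(xs, Wq, Wk, Wv):
    """Project only the newest token; reuse cached K and V for the prefix."""
    k_cache, v_cache = [], []
    outputs, projections = [], 0
    for x in xs:
        k_cache.append(matvec(Wk, x))
        v_cache.append(matvec(Wv, x))
        q = matvec(Wq, x)
        projections += 3
        outputs.append(attend(q, k_cache, v_cache))
    return outputs, projections

rng = random.Random(0)
d, T = 4, 8

def rand_matrix():
    return [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]

Wq, Wk, Wv = rand_matrix(), rand_matrix(), rand_matrix()
xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(T)]

out_plain, n_plain = generate_without_cache(xs, Wq, Wk, Wv)
out_cached, n_cached = generate_with_cache(xs, Wq, Wk, Wv)
print(n_plain, n_cached)  # → 80 24
```

For a sequence of length T, the uncached loop performs T + T(T + 1) projections (quadratic in T), while the cached loop performs 3T (linear in T). The cache trades memory for compute. The stored keys and values grow linearly with sequence length, and each step's attention still scans the whole prefix.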
