- Can a Single Direction Weaken Vivi's Refusal Behavior?
  A small activation-ablation experiment on refusal behavior in a Vietnamese chat model.
  June 8, 2026 · 5 min read
- Why do we need KV caching?
  A visual explanation of why LLM inference caches key and value vectors, and why the trick saves compute during token-by-token generation.
  May 17, 2026 · 9 min read
- Fantastic Directions and Where to Find Them: Dissecting the Lazy Mechanism Inside RMU
  March 15, 2026 · 25 min read
- A note on chapter 3 of Sutton & Barto: Finite Markov Decision Process (MDP)
  December 8, 2025 · 20 min read
- Residual Stream is Key to Transformer Interpretability
  June 21, 2025 · 18 min read