tags

en

6 entries

Why do we need KV caching?

9 min read


Why I want to learn to code again?

8 min read


A Little Bit About "Numbers" and "Mathematics"

January 3, 2026

16 min read


A note on chapter 3 of Sutton & Barto: Finite Markov Decision Process (MDP)

December 8, 2025

20 min read


Fantastic Directions and Where to Find Them: Dissecting the Lazy Mechanism Inside RMU

November 20, 2025

18 min read


Residual Stream is Key to Transformer Interpretability

18 min read
