Papers
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free [paper, neurips-2025, oral]
- 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities [paper, neurips-2025, oral]
- Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training [paper, neurips-2025, oral]
- Does RL Really Incentivize Reasoning Capacity in LLMs? [paper, neurips-2025, oral]
- Superposition Yields Robust Neural Scaling [paper, neurips-2025, oral]
- Learning Dynamics of LLM Finetuning [paper, iclr-2025, oral]
- Understanding Integer Overflow in C++ [paper, arxiv, system]
- Code Foundation Models to Agents: Survey [paper, arxiv, llm]
- Foundations of Diffusion Models [paper, arxiv, diffusion]
- Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models [paper, arxiv, interpretability]
- What Every Computer Scientist Should Know About Floating-Point Arithmetic [cs, systems, paper]
Blog
- What Would Non-Linear Features Actually Look Like? [blog, interpretability]
- Smol Training Playbook [tutorial, lm, training]
- Ultrascale Playbook [tutorial, llm, training, scaling]
- Understanding CUDA Compiler & PTX With a Top-K Kernel [blog, gpu, cuda, compiler]
- Deep Dive into Triton Internals (Part 1) [blog, gpu, triton]
- Understand Tinygrad [gpu, tinygrad, tutorial]
- Inside NVIDIA GPUs: Anatomy of high performance matmul kernels [blog, gpu, cuda]
- Triton Flash Attention Kernel Walkthrough [blog, gpu, triton, llm]
- Triton Linear Layout Concept [blog, gpu, triton]
- Field Notes on Scaling MoE Expert Parallelism with DeepEP [blog, gpu, moe, scaling]
- Flash Attention for 5090 in CUDA C++ [gpu, cuda, attention, blog]
- NVFP4 Pretraining: From Theory to Implementation (Part 1) [gpu, quantized, llm, blog]
- Algorithms for Modern Hardware [cs, algorithms, hpc, tutorial]
- CPython Internals [cs, python, internals, tutorial]
- Write Your Own Virtual Machine [cs, vm, systems, tutorial]
- How to Vulkan in 2026 [cs, graphics, vulkan, tutorial]
- Vulkan Guide [cs, graphics, vulkan, tutorial]
- Vulkan Guide (Khronos) [cs, graphics, vulkan, tutorial]
- No Graphics API [cs, graphics, systems, blog]
- Dive into Systems [cs, systems, tutorial]
- Explorations of RDMA in LLM Systems [llm, systems, blog]
- Inside Nvidia GPU: Discussing Blackwell's Limitations and Predicting Rubin's Microarchitecture [cuda, blog]
- Dissecting FlashInfer - A Systems Perspective on High-Performance LLM Inference [llm, systems, blog]
- the bug that taught me more about PyTorch than years of using it [pytorch, blog]
- Dummy Guide to LLM Sampling [llm, blog]
Course
- 15-440/640 Distributed Systems (CMU) [cs, distributed-systems, course]
- 6.5840: Distributed Systems (MIT) [cs, distributed-systems, course]
- Advanced Topics of Deep Generative Models [diffusion]
- Harvard CS127: Cryptography [cs, cryptography, course]
Mathematics
- Optimization for Machine Learning (Princeton) [math, optimization, ml, lecture-notes]
- Introduction to Online Convex Optimization [math, optimization, book]
- Introduction to Online Control [math, control-theory, book]
- An Infinite Descent into Pure Mathematics [math, pure-math, book]
- Introduction to Homotopy Type Theory [math, type-theory, book]
Quantitative Finance
- Quant Trading Guide [quant, trading]
- QUANT BIBLE (MIT Sloan Business) [quant, finance]
Cool Repositories
- simple-llm [repo, llm]
- mini-sglang [llm, inference, repo]
- Data Structures in Practice [data-structures, repo]
- nano-vllm [repo, llm]
- llm.q [repo, llm, quantized]
Robotics
- BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning [paper, arxiv, robotics]
- VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation [paper, robotics]
- BAAI Thor: Towards Human-level WhOle-body Reactions under Intense Contact-Rich Environments [paper, robotics]
- GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction [paper, robotics]