Papers
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free [neurips-2025, oral]
- 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities [neurips-2025, oral]
- Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training [neurips-2025, oral]
- Does RL Really Incentivize Reasoning Capacity in LLMs?: Beyond the Base Model [neurips-2025, oral]
- Superposition Yields Robust Neural Scaling [neurips-2025, oral]
- Learning Dynamics of LLM Finetuning [iclr-2025, oral]
- Understanding Integer Overflow in C++ [arxiv, system]
- Code Foundation Models to Agents: A Comprehensive Survey and Practical Guide to Code Intelligence [arxiv, llm]
- Foundations of Diffusion Models: A Self-Contained Introduction in General State Spaces [arxiv, diffusion]
- BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning [arxiv, robotics]
- Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models [interpretability, paper]
Blog
- What Would Non-Linear Features Actually Look Like? [interpretability, blog]
- Smol Training Playbook: Learn how to pre-train an LLM [llm, training]
- Ultrascale Playbook: Learn how to scale LLM training [llm, training, scaling]
- Understanding CUDA Compiler & PTX: With a Top-K Kernel [gpu, cuda, compiler, tutorial]
- Deep Dive into Triton Internals (Part 1) [gpu, triton, tutorial]
- Understand Tinygrad [gpu, tinygrad, tutorial]
- Inside NVIDIA GPUs: Anatomy of high performance matmul kernels [gpu, cuda]
- Triton Flash Attention Kernel Walkthrough [gpu, triton, llm]
- Triton Linear Layout Concept [gpu, triton]
- Field Notes on Scaling MoE Expert Parallelism with DeepEP [gpu, moe, scaling]
- Flash Attention for 5090 in CUDA C++: Writing Speed-of-Light Flash Attention [gpu, cuda, attention, tutorial]
- NVFP4 Pretraining: From Theory to Implementation (Part 1) [gpu, quantized, llm]
- Algorithms for Modern Hardware [cs, algorithms, hpc, tutorial]
- CPython Internals [cs, python, internals]
- Write Your Own Virtual Machine [cs, vm, systems, tutorial]
- How to Vulkan in 2026 [cs, graphics, vulkan, tutorial]
- Vulkan Guide [cs, graphics, vulkan, tutorial]
- Vulkan Guide (Khronos) [cs, graphics, vulkan, tutorial]
- No Graphics API [cs, graphics, systems]
- Dive into Systems [cs, systems, tutorial]
Course
- 15-440/640 Distributed Systems (CMU) [cs, distributed-systems, course]
- 6.5840: Distributed Systems (MIT) [cs, distributed-systems, course]
- Advanced Topics of Deep Generative Models [diffusion]
Mathematics
- Introduction to Modern Convex Geometry: An Elementary Introduction [math, geometry, theory]
- Optimization for Machine Learning (Princeton) [math, optimization, ml, lecture-notes]
- Introduction to Online Convex Optimization [math, optimization, online-learning]
- Introduction to Online Control [math, control-theory, online-learning]
- An Infinite Descent into Pure Mathematics [math, pure-math, tutorial]
- Introduction to Homotopy Type Theory [math, type-theory, foundations]
Quantitative Finance
- Quant Trading Guide [quant, trading, guide]
- QUANT BIBLE (MIT Sloan Business) [quant, finance, guide]
Cool Repositories
- simple-llm [repo, llm, code]
- mini-sglang [llm, inference, code]
- Data Structures in Practice [data-structures, code]