Papers
- 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities [reinforcement-learning, deep-learning, neurips-2025, oral]
- The Value Equivalence Principle for Model-Based Reinforcement Learning [reinforcement-learning, world-model, neurips-2020]
- Learning Awareness Models [reinforcement-learning, world-model, iclr-2018]
- Embedded Agency [reinforcement-learning, world-model, arxiv]
- Maximum Likelihood Reinforcement Learning [reinforcement-learning, arxiv, theory]
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free [llm, attention, neurips-2025, oral]
- Does RL Really Incentivize Reasoning Capacity in LLMs? [llm, reinforcement-learning, reasoning, neurips-2025, oral]
- Learning Dynamics of LLM Finetuning [llm, training, theory, iclr-2025, oral]
- Code Foundation Models to Agents: Survey [llm, code, agents, survey, arxiv]
- Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models [llm, interpretability, survey, arxiv]
- Bridging the Attention Gap: Complete Replacement Models for Complete Circuit Tracing [llm, interpretability, arxiv]
- Pretraining Large Language Models with NVFP4 [llm, training, arxiv]
- Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training [diffusion, theory, neurips-2025, oral]
- Foundations of Diffusion Models [diffusion, theory, survey, arxiv]
- Superposition Yields Robust Neural Scaling [interpretability, scaling, theory, neurips-2025, oral]
- Understanding Integer Overflow in C++ [systems, cpp, compiler]
- What Every Computer Scientist Should Know About Floating-Point Arithmetic [systems, numerical-computing, fundamentals]
Books
- Optimization for Machine Learning (Princeton) [mathematics, optimization, machine-learning]
- Introduction to Online Convex Optimization [mathematics, optimization, theory]
- Introduction to Online Control [mathematics, control-theory, optimization]
- Optimization for Machine Learning (Lecture Notes) [machine-learning, optimization]
- An Infinite Descent into Pure Mathematics [mathematics, pure-math, foundations]
- Introduction to Homotopy Type Theory [mathematics, type-theory, logic]
- Probabilities (Jean-Yves Ouvrard) [mathematics, probability]
- Quantum Theory, Groups and Representations: An Introduction [mathematics, quantum-mechanics]
- STAT 201A - Introduction to Probability at an advanced level (Berkeley) [mathematics, probability]
- Data Stream Algorithms (Lecture Notes) [algorithms]
- The Little Book of Semaphores [concurrent-programming, algorithms]
- Notes on Theory of Distributed Systems [distributed-systems]
- Is Parallel Programming Hard, And, If So, What Can You Do About It? [parallel-programming, algorithms]
- Algorithms for Modern Hardware [algorithms, systems, hpc, performance]
- Quant Trading Guide [quantitative-finance, trading, practical]
- QUANT BIBLE (MIT Sloan Business) [quantitative-finance, theory, comprehensive]
Tutorials
- Smol Training Playbook [llm, training, practical]
- Ultrascale Playbook [llm, training, scaling]
- Tiny LLM Serving in a week [llm, inference]
- Machine Learning Compilation [deep-learning, compiler]
- How to Scale Your Model [llm, training, scaling]
- Understand Tinygrad [gpu, compiler, deep-learning]
- torch.compile Manual [pytorch]
- CPython Internals [python, internals, systems]
- Write Your Own Virtual Machine [systems, vm, compiler, hands-on]
- Dive into Systems [systems, fundamentals, comprehensive]
- Software Optimization Resources [systems, performance]
- How to Vulkan in 2026 [graphics, vulkan, gpu]
- Vulkan Guide [graphics, vulkan, gpu]
- Vulkan Guide (Khronos) [graphics, vulkan, gpu]
- GPU Optimization for Game Dev [graphics, gpu]
Blog Posts
- What Would Non-Linear Features Actually Look Like? [interpretability, theory, llm]
- On neural scaling and the quanta hypothesis [interpretability, theory, llm]
- Dummy Guide to LLM Sampling [llm, inference, practical]
- Diffusion Language Models Deep Dive [llm, diffusion]
- Linear Attention: Does Attention Have a Softmax? [llm, attention]
- Understanding CUDA Compiler & PTX With a Top-K Kernel [gpu, cuda, compiler, kernels]
- Deep Dive into Triton Internals (Part 1) [gpu, triton, compiler]
- Inside NVIDIA GPUs: Anatomy of high performance matmul kernels [gpu, cuda, performance, kernels]
- Triton Flash Attention Kernel Walkthrough [gpu, triton, attention, llm]
- Triton Linear Layout Concept [gpu, triton, memory]
- Flash Attention for 5090 in CUDA C++ [gpu, cuda, attention, performance]
- Worklog: Optimising GEMM on NVIDIA H100 for cuBLAS-like Performance (WIP) [gpu, cuda, performance, kernels]
- CUTLASS CUTE tutorial [gpu, cuda, cutlass]
- NCCL from scratch: Writing my own communications library [gpu, distributed-systems]
- Field Notes on Scaling MoE Expert Parallelism with DeepEP [llm, moe, scaling, distributed-systems]
- NVFP4 Pretraining: From Theory to Implementation (Part 1) [llm, quantization, training, gpu]
- Explorations of RDMA in LLM Systems [llm, systems, networking, distributed-systems]
- Dissecting FlashInfer - A Systems Perspective on High-Performance LLM Inference [llm, inference, systems, performance]
- the bug that taught me more about PyTorch than years of using it [pytorch, deep-learning, debugging]
- PagedAttention from first principles [llm, inference, vllm]
- Understanding LLM Inference Engines: Inside Nano-vLLM [llm, inference, vllm]
- Inside vLLM: Anatomy of a High-Throughput LLM Inference System (accompanying notebook: https://modal.com/notebooks/modal-labs/charles-dev/nb-x2wXrLH7aqi7HGVQ8Fosh2) [llm, inference, vllm]
- Distributed GPT [llm, training, distributed-systems]
- Defeating Nondeterminism in LLM Inference [llm, inference]
- Demystifying Reasoning Models [llm, reasoning]
- No Graphics API [graphics, systems, architecture]
- Allocators from C to Zig [systems]
- Beginner's Guide to Linkers [systems]
Courses
- INF6953PE: Deep Learning Dynamics (Montreal) [deep-learning, theory]
- CSC2541: Topics in Machine Learning: Neural Net Training Dynamics (UofT) [deep-learning, theory]
- Advanced Topics of Deep Generative Models [diffusion, generative-models, deep-learning]
- Stanford CS 228 - Probabilistic Graphical Models [deep-learning, theory, probabilistic, stanford]
- Stanford CS 229M - Machine Learning Theory [deep-learning, theory, stanford]
- CMU 15-440/640 Distributed Systems [distributed-systems, systems, cmu]
- MIT 6.5840: Distributed Systems [distributed-systems, systems, mit]
- Harvard CS121: Introduction to TCS [theory, computer-science, harvard]
- Harvard CS127: Cryptography [cryptography, theory, harvard]
- MIT 6.8210: Underactuated Robotics [robotics, mit]
Videos
- Reinforcement Learning from the book [reinforcement-learning]
Repositories
- picotron [llm, educational, training, implementation]
- slime [llm, post-training, implementation]
- mini-sglang [llm, educational, inference, implementation]
- nano-vllm [llm, educational, inference, implementation]
- llm.q [llm, quantization, implementation]
- Course on Flash Attention in Triton [llm, attention, triton]
- CUTLASS tutorial [cuda, gpu, cutlass]
- MLIR Tutorial [compiler, mlir, tutorial]
- Prediction Market Analysis: A cool repo containing a huge dataset for market analysis [quantitative-finance, data]
- Data Structures in Practice [algorithms, data-structures, educational]
Theses
- Reinforcement Learning and Simulation-Based Search in Computer Go: David Silver [thesis, reinforcement-learning]
- Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control: Pieter Abbeel [thesis, reinforcement-learning]
- Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs: John Schulman [thesis, reinforcement-learning]
- On the Sample Complexity of Reinforcement Learning: Sham Machandranath Kakade [thesis, reinforcement-learning]
Robotics
- BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning [robotics, reinforcement-learning, humanoid]
- VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation [robotics, humanoid, simulation]
- BAAI Thor: Towards Human-level WhOle-body Reactions under Intense Contact-Rich Environments [robotics, humanoid, control]
- GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction [robotics, humanoid, manipulation]
- RoboStriker: Hierarchical Decision-Making for Autonomous Humanoid Boxing [robotics, humanoid, motion-planning]
- HumanX: Toward Agile and Generalizable Humanoid Interaction Skills from Human Videos [robotics, humanoid, motion-planning]
- RPL: Learning Robust Humanoid Perceptive Locomotion on Challenging Terrains [robotics, humanoid, perception]
- ExtremeControl: Low-Latency Humanoid Teleoperation with Direct Extremity Control [robotics, humanoid, teleoperation]
- InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions [robotics, humanoid, physics-simulation]
- Humanoid Locomotion as Next Token Prediction [robotics, humanoid, llm]
- Uncertainty-Aware Robotic World Model Makes Offline Model-Based Reinforcement Learning Work on Real Robots [robotics, world-model, reinforcement-learning]