Transformers

| Title | Topic | Comments |
| --- | --- | --- |
| Transformers Primer by Aman.AI (Chadha, 2020) | Transformers | Very comprehensive |
| The Illustrated Transformer by Jay Alammar (Chadha, 2020) | Transformers | Great illustrations |
| Attention in transformers, visually explained by 3Blue1Brown (3Blue1Brown, 2024) | Transformers | Great visuals and explanation |
| Some Intuition on Attention and the Transformer by Eugene Yan (Yan, 2023) | Transformers | Great visuals and explanation |
| The Transformer Family by Lilian Weng (Weng, 2020) | Advances in Transformers | Advanced enhancements made to the transformer after its introduction |
| The Transformer Family 2.0 by Lilian Weng (Weng, 2023) | Advances in Transformers | Update to (Weng, 2020); adds many further transformer developments, including modules not covered in (Weng, 2020), though parts are very detailed and niche |
| The Illustrated BERT (Alammar, 2018) | LMs | Good short overview |
| Generalized Language Models by Lilian Weng (Weng, 2019) | LMs | Great overview of BERT and its successors |
| Ten Noteworthy AI Research Papers of 2023 by Sebastian Raschka (Raschka, 2023) | LMs/Research | Decent sampler of 10 papers from 2023 |
| AI and Open Source in 2023 (Raschka, 2023) | LMs/Research | Decent sampler of 10 papers from 2023 |
| New LLM Pre-training and Post-training Paradigms (Raschka, 2024) | LMs/Training/Research | Detailed overview of pre-training pipelines |
| Multimodality and Large Multimodal Models (LMMs) by Chip Huyen (Huyen, 2023) | MMs | Great review of LMMs, with insights into CLIP and Flamingo |
| Generalized Visual Language Models by Lilian Weng (Weng, 2022) | MMs | Great overview of VLM techniques |
| Primers - Vision Language Models (Chadha, 2020) | MMs | Average read |
| RLHF: Reinforcement Learning from Human Feedback (Huyen, 2023) | Training | Great intro to pre-training, SFT, and reward models |
| LLM Training: RLHF and Its Alternatives (Raschka, 2023) | Training | Good overview of RLHF |
| LLM Alignment by Aman.AI (Chadha, 2023) | Training | Very thorough, though not all topics remain relevant today |
| Predictive Human Preference: From Model Ranking to Model Routing (Huyen, 2024) | Training | Basics of model evaluation, routing, and ranking; other items, such as the predictive human preference experiments, can be skipped |
| Aligning language models to follow instructions by OpenAI (OpenAI, 2022) | Training | |
| Reinforcement Learning for Language Models (missing reference) | Training | |
| Instruction Pretraining LLMs (Raschka, 2024) | Training | |
| Practical Tips for Finetuning LLMs Using LoRA (Low-Rank Adaptation) (missing reference) | PET Methods | Good overview of LoRA and practical tips for using it |
| Noteworthy AI Research Papers of 2024 (Part One) (Raschka, 2023) | PET Methods | Six research papers from the first half of 2024 |
| Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch (Raschka, 2023) | PET Methods | In-depth overview of DoRA |
| Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments (Raschka, 2023) | PET Methods | Deep dive into (missing reference) |
| The Scaling Hypothesis (Gwern, 2022) | Scaling Laws | Great, thought-provoking discussion and overview (long read) |
| Scaling Laws in Large Language Models (Mandliya, 2024) | Scaling Laws | Great quick overview |
| Model Merging, Mixtures of Experts, and Towards Smaller LLMs (Raschka, 2023) | MoE/Merging | |