A mixture of related papers, a blog post, and a lecture.
- The first MoE paper: Mixture of Experts
- The one that made it work at scale with Transformers: Switch Transformers (see the sketch after this list)
- HuggingFace’s overview: blog post
- Mixtral: Stanford cs25 lecture
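For orientation, here is a minimal sketch of the core idea the Switch Transformers paper scales up: a learned gate routes each token to a single feed-forward "expert", so only a fraction of the parameters is active per token. The class name, dimensions, and the omission of a load-balancing loss are simplifications for illustration, not details taken from the linked papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchMoE(nn.Module):
    """Illustrative top-1 ("switch") mixture-of-experts layer."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        # Router: scores each token against each expert.
        self.router = nn.Linear(d_model, num_experts)
        # Experts: independent feed-forward blocks of the usual Transformer FFN shape.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.shape[-1])
        probs = F.softmax(self.router(tokens), dim=-1)   # (num_tokens, num_experts)
        gate, expert_idx = probs.max(dim=-1)             # top-1 routing per token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Each token goes through exactly one expert, scaled by its gate value.
                out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = SwitchMoE(d_model=64, d_ff=256, num_experts=4)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

Mixtral follows the same pattern but routes each token to its top-2 experts and sums their gated outputs; the papers above cover the load-balancing and capacity tricks needed to make this train well at scale.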