
Mixture of Experts Overview

Jan 09, 2026 · 1 min read

A mixture of related papers, blog posts, and a lecture.

  • The first MoE paper: Mixture of Experts
  • The one that made it work at scale with Transformers: Switch Transformers (a minimal routing sketch follows this list)
  • HuggingFace’s overview: blog post
  • Mixtral: Stanford CS25 lecture
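
To make the routing idea concrete, below is a minimal PyTorch sketch of top-1 ("Switch"-style) routing: a gating network scores each token, the highest-scoring expert processes it, and the output is scaled by the gate probability. The class name, dimensions, and the plain per-expert loop are illustrative assumptions, not the Switch Transformers implementation (which adds expert capacity limits and a load-balancing loss).

```python
# Minimal sketch of top-1 ("Switch"-style) MoE routing.
# Class name, sizes, and the per-expert loop are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        # One feed-forward "expert" per slot; the router picks one per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # gating network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to a list of tokens
        tokens = x.reshape(-1, x.shape[-1])
        gate_probs = F.softmax(self.router(tokens), dim=-1)  # (tokens, experts)
        top_prob, top_idx = gate_probs.max(dim=-1)           # top-1 routing
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Scale each expert's output by its gate probability,
                # which keeps the router differentiable.
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = SwitchMoE(d_model=64, d_ff=256, num_experts=4)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

Because each token only runs through one expert, parameter count grows with the number of experts while per-token compute stays roughly constant, which is the scaling argument made in the Switch Transformers paper.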
