Yanda's Random Notes

Mixture of Experts Overview

Jan 09, 2026

A collection of related papers, blog posts, and lectures.

The first MoE paper: Mixture of Experts
The one that made it work at scale, with Transformers: Switch Transformers
Hugging Face’s overview: blog post
Mixtral: Stanford CS25 lecture
