Yanda's Random Notes

GLIE

Nov 02, 2025, 1 min read

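GLIE (Greedy in the Limit with Infinite Exploration) is a property of a sequence of policies $\pi_t$ that satisfies both of the following conditions.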

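All state-action pairs are explored infinitely many times, where $N_t(s, a)$ is the number of times action $a$ has been taken in state $s$ up to time $t$:
$\forall s, a: \quad \lim_{t \to \infty} N_t(s, a) = \infty$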

The policy converges to a greedy policy:
$\lim_{t \to \infty} \pi_t(a \mid s) = \mathbb{I}\left(a = \arg\max_{a'} q_t(s, a')\right)$
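The second condition can be made concrete with an $\epsilon$-greedy policy, which gives every action probability $\epsilon / m$ (with $m$ the number of actions) and gives the greedy action an extra $1 - \epsilon$. The sketch below computes $\pi(a \mid s)$ from one row of action values; the function name `epsilon_greedy_probs` and the example values are hypothetical, chosen only for illustration. As $\epsilon \to 0$ the returned vector approaches the indicator of the greedy action.

```python
from typing import List, Sequence


def epsilon_greedy_probs(q_values: Sequence[float], epsilon: float) -> List[float]:
    """Return pi(a | s) for an epsilon-greedy policy given the action values q(s, .).

    Every action receives epsilon / m; the greedy action also receives the
    remaining 1 - epsilon. Ties go to the lowest index.
    """
    m = len(q_values)
    greedy = max(range(m), key=lambda a: q_values[a])  # max() keeps the first maximiser
    return [epsilon / m + (1.0 - epsilon) * (a == greedy) for a in range(m)]


q = [0.2, 1.0, 0.5]  # hypothetical action values for a single state
for eps in (1.0, 0.1, 0.01, 0.0):
    print(eps, [round(p, 4) for p in epsilon_greedy_probs(q, eps)])
```

At $\epsilon = 0$ this returns `[0.0, 1.0, 0.0]`, i.e. $\mathbb{I}(a = \arg\max_{a'} q(s, a'))$. GLIE concerns the limit of a whole sequence of policies with changing $q_t$; here $q$ is held fixed only to isolate the effect of $\epsilon$.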

- For example, $\epsilon$-greedy with $\epsilon_k = \frac{1}{k}$ is GLIE.
- GLIE model-free control (e.g. GLIE Monte-Carlo control) converges to the optimal action-value function, $q_t \rightarrow q_*$.
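A rough way to see why $\epsilon_k = \frac{1}{k}$ works: $\epsilon_k \to 0$, so the policy becomes greedy in the limit, while each action is still picked by exploration with probability $\epsilon_k / m$, and $\sum_k \frac{1}{k}$ diverges (the harmonic series). The sketch below compares the expected number of exploratory picks of one action under $\frac{1}{k}$ with a hypothetical faster schedule $\frac{1}{k^2}$, whose sum converges to $\frac{\pi^2}{6}$. It assumes the state is visited at every step $k$ (as in a single-state bandit) and only counts expectations; turning this into an almost-sure statement needs a Borel-Cantelli-type argument. All function names are hypothetical.

```python
import math
from typing import Callable


def expected_exploratory_picks(
    schedule: Callable[[int], float], num_steps: int, num_actions: int
) -> float:
    """Expected number of times one particular action is chosen *by exploration*
    in steps k = 1..num_steps, when exploration happens with probability
    schedule(k) and then picks uniformly among num_actions actions."""
    return sum(schedule(k) for k in range(1, num_steps + 1)) / num_actions


def glie_schedule(k: int) -> float:
    """epsilon_k = 1/k, the schedule from the note."""
    return 1.0 / k


def fast_schedule(k: int) -> float:
    """Hypothetical contrast: epsilon_k = 1/k^2 decays too fast to guarantee infinite exploration."""
    return 1.0 / k**2


for K in (10, 1_000, 100_000):
    print(
        K,
        round(expected_exploratory_picks(glie_schedule, K, 3), 3),  # keeps growing, roughly ln(K) / 3
        round(expected_exploratory_picks(fast_schedule, K, 3), 3),  # levels off
    )
print("1/k^2 limit:", round(math.pi**2 / 18, 3))
```

The first column keeps increasing without bound, though only logarithmically, while the second settles near $\frac{\pi^2}{18} \approx 0.548$: under $\frac{1}{k^2}$ an action expects fewer than one exploratory pick in total, so infinite exploration is not guaranteed.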

Created with Quartz v4.5.2 © 2026