All state-action pairs are explored infinitely many times,
$$\forall s, a \quad \lim_{t \rightarrow \infty} N_t(s, a) = \infty$$
The policy converges to a greedy policy,
$$\lim_{t \rightarrow \infty} \pi_t(a \mid s) = \mathcal{I}\left(a = \arg\max_{a'} q_t(s, a')\right)$$
- For example, $\epsilon$-greedy is GLIE if $\epsilon$ decays to zero as $\epsilon_k = \frac{1}{k}$
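
As a rough illustration of the $\epsilon_k = \frac{1}{k}$ schedule, the sketch below computes the $\epsilon$-greedy action probabilities for a single state. The function names and the action values in `q` are hypothetical, chosen only for this example. Every action keeps non-zero probability at each finite $k$, while the probability mass concentrates on the greedy action as $k$ grows.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Return pi(a|s) for an epsilon-greedy policy over one state's action values."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])  # ties go to the lowest index
    probs = [epsilon / n] * n        # exploration mass spread uniformly over actions
    probs[greedy] += 1.0 - epsilon   # remaining mass goes to the greedy action
    return probs


def glie_epsilon(k):
    """Schedule epsilon_k = 1/k for k = 1, 2, ..."""
    return 1.0 / k


q = [0.2, 1.0, -0.5]  # hypothetical action values for a single state
for k in (1, 10, 1000):
    print(k, epsilon_greedy_probs(q, glie_epsilon(k)))
```

At $k = 1$ the policy is uniform; for large $k$ it is close to the indicator on the greedy action, matching the second condition above.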
GLIE Model-free control converges to the optimal action-value function, $q_{t}\rightarrow q_{*}$.
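
One way to see the pieces working together is a minimal sketch of GLIE control with Monte-Carlo evaluation. It makes several assumptions not stated above: a hypothetical one-state, one-step episodic task with deterministic returns, an incremental-mean update for $q$, and the function name `glie_mc_control`. It is meant as an illustration of the update loop, not a demonstration of the convergence result.

```python
import random


def glie_mc_control(rewards, episodes, seed=0):
    """GLIE Monte-Carlo control on a hypothetical one-state, one-step task.

    rewards[a] is the deterministic return G obtained by taking action a.
    """
    rng = random.Random(seed)
    n_actions = len(rewards)
    q = [0.0] * n_actions     # action-value estimates q(s, a)
    counts = [0] * n_actions  # visit counts N(s, a)

    for k in range(1, episodes + 1):
        epsilon = 1.0 / k  # GLIE schedule: epsilon_k = 1/k
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)                   # explore uniformly
        else:
            a = max(range(n_actions), key=lambda i: q[i])  # act greedily w.r.t. q
        g = rewards[a]                   # return of this (one-step) episode
        counts[a] += 1
        q[a] += (g - q[a]) / counts[a]   # incremental mean of observed returns
    return q, counts


q, counts = glie_mc_control(rewards=[0.0, 1.0], episodes=5000)
print(q, counts)
```

Because the returns here are deterministic, each estimate equals the true action value as soon as its action has been tried once; with noisy returns the incremental mean would instead average over the visits counted in `counts`.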