Greedy in the Limit with Infinite Exploration (GLIE)

  • All state-action pairs are explored infinitely many times, and

  • The policy converges to a greedy policy.

- For example, $\epsilon$-greedy is GLIE if $\epsilon$ decays to zero as $\epsilon_k = \frac{1}{k}$.

- GLIE model-free control converges to the optimal action-value function, $q_{t}\rightarrow q_{*}$.
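The two conditions can be checked numerically for the $\epsilon_k = \frac{1}{k}$ schedule. The following is a minimal sketch, not a full control algorithm: the function names `glie_epsilon` and `epsilon_greedy_probs` and the sample action values are assumptions made for illustration, and only a single state is considered.

```python
def glie_epsilon(k: int) -> float:
    """Exploration rate for episode k (k >= 1), using epsilon_k = 1/k."""
    if k < 1:
        raise ValueError("episode index k must be >= 1")
    return 1.0 / k


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy in one state.

    Each action gets epsilon / n; the greedy action gets the remaining
    1 - epsilon on top. Ties are broken in favour of the lowest index.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


if __name__ == "__main__":
    q = [0.2, 1.0, 0.5]  # hypothetical action values for a single state
    for k in (1, 10, 1000):
        print(k, epsilon_greedy_probs(q, glie_epsilon(k)))
```

At $k = 1$ the policy is uniform, and as $k$ grows the probability mass concentrates on the greedy action while every action keeps a nonzero probability $\epsilon_k / n$ at any finite $k$. Because $\sum_k \frac{1}{k}$ diverges, the exploration probability shrinks slowly enough that actions keep being tried in any state that is itself visited infinitely often.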