Now note here that $a'$ is chosen by the policy $\pi$ in state $s'$. We can take out the dependency on a specific policy by stating that the following holds for the optimal case:

$$Q^*(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, \max_{a'} Q^*(s', a')$$

For the Bellman equation, in the discrete case we can write it out in linear form. In fact we can just solve it given $\pi$, the same way we solve MDPs, by finding the stationary point:

$$V^\pi = R^\pi + \gamma P^\pi V^\pi \quad\Longrightarrow\quad V^\pi = (I - \gamma P^\pi)^{-1} R^\pi$$

where

$$R^\pi(s) = \sum_a \pi(a \mid s) \, R(s, a), \qquad P^\pi(s, s') = \sum_a \pi(a \mid s) \, P(s' \mid s, a).$$

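As a concrete illustration, here is a minimal numpy sketch of the direct solve. The 2-state, 2-action MDP below (`P`, `R`, `pi`) is made up purely for illustration, not taken from the text:

```python
import numpy as np

gamma = 0.9

# P[a, s, s'] = transition probability, R[s, a] = expected reward (toy numbers)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# pi[s, a] = probability of taking action a in state s (an arbitrary stochastic policy)
pi = np.array([[0.5, 0.5],
               [0.2, 0.8]])

# Collapse the action dimension: R^pi(s) and P^pi(s, s')
R_pi = np.einsum('sa,sa->s', pi, R)
P_pi = np.einsum('sa,asp->sp', pi, P)

# Direct solution of the linear system: V^pi = (I - gamma * P^pi)^{-1} R^pi
V_pi = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
print(V_pi)
```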
Solving it directly is expensive though: the matrix inversion is $O(n^3)$ in the number of states. The Bellman optimality equation, though, is non-linear: there's the max in there, so no such closed-form solution exists.
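Because of that max, the optimality equation is typically solved by iterating the backup to its fixed point rather than with linear algebra. A minimal value-iteration sketch on the same made-up toy MDP as above (again, all numbers are illustrative):

```python
import numpy as np

# Repeatedly apply the non-linear Bellman optimality backup
#   Q(s, a) <- R(s, a) + gamma * sum_s' P(s'|s, a) * max_a' Q(s', a')
# until it reaches its fixed point Q*.
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[a, s, s'], same toy MDP as above
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

Q = np.zeros((2, 2))                       # Q[s, a]
for _ in range(1000):
    V = Q.max(axis=1)                      # V(s') = max_a' Q(s', a')
    Q_new = R + gamma * np.einsum('asp,p->sa', P, V)
    if np.abs(Q_new - Q).max() < 1e-8:     # stop once the backup no longer changes Q
        break
    Q = Q_new

print(Q)                                   # approximates Q*
```

Iterating works because the backup is a $\gamma$-contraction in the max norm, so it converges to the unique fixed point $Q^*$.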