Now note here that $a'$ is chosen by the policy $\pi$ in state $s'$. We can take out the dependency on a specific policy by stating that the following holds for the optimal case:

$$Q^*(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, \max_{a'} Q^*(s', a')$$

For the Bellman equation, in the discrete case we can write it out in linear form. In fact we can just solve it given $\pi$, the same way we solve MDPs, by finding the stationary point:

$$V^\pi = R^\pi + \gamma P^\pi V^\pi \quad\Longrightarrow\quad V^\pi = (I - \gamma P^\pi)^{-1} R^\pi$$

where

$$R^\pi(s) = \sum_a \pi(a \mid s) \, R(s, a), \qquad P^\pi(s, s') = \sum_a \pi(a \mid s) \, P(s' \mid s, a).$$

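As a concrete illustration, here is a minimal numpy sketch of the direct solve. The 2-state, 2-action MDP below (`P`, `R`, `pi`) is made up purely for illustration, not taken from the text:

```python
import numpy as np

gamma = 0.9

# P[a, s, s'] = transition probability, R[s, a] = expected reward (toy numbers)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# pi[s, a] = probability of taking action a in state s (an arbitrary stochastic policy)
pi = np.array([[0.5, 0.5],
               [0.2, 0.8]])

# Collapse the action dimension: R^pi(s) and P^pi(s, s')
R_pi = np.einsum('sa,sa->s', pi, R)
P_pi = np.einsum('sa,asp->sp', pi, P)

# Direct solution of the linear system: V^pi = (I - gamma * P^pi)^{-1} R^pi
V_pi = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
print(V_pi)
```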
Solving it directly is expensive though: the matrix inversion is $O(n^3)$ in the number of states. The Bellman optimality equation, though, is non-linear: there's the max in there, so no such closed-form solution exists.
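Because of that max, the optimality equation is typically solved by iterating the backup to its fixed point rather than with linear algebra. A minimal value-iteration sketch on the same made-up toy MDP as above (again, all numbers are illustrative):

```python
import numpy as np

# Repeatedly apply the non-linear Bellman optimality backup
#   Q(s, a) <- R(s, a) + gamma * sum_s' P(s'|s, a) * max_a' Q(s', a')
# until it reaches its fixed point Q*.
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[a, s, s'], same toy MDP as above
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

Q = np.zeros((2, 2))                       # Q[s, a]
for _ in range(1000):
    V = Q.max(axis=1)                      # V(s') = max_a' Q(s', a')
    Q_new = R + gamma * np.einsum('asp,p->sa', P, V)
    if np.abs(Q_new - Q).max() < 1e-8:     # stop once the backup no longer changes Q
        break
    Q = Q_new

print(Q)                                   # approximates Q*
```

Iterating works because the backup is a $\gamma$-contraction in the max norm, so it converges to the unique fixed point $Q^*$.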