Fitted Value Iteration

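Value iteration, but with a neural network fitting the value function $V(s)$. The difference from the vanilla (tabular) version is what happens with new evidence: each backed-up value is added to a dataset as a regression target and the network is fit to that data, instead of the value being written directly into a table entry.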
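A minimal sketch of the loop, not the note's own code. It assumes a hypothetical 10-state chain MDP with known, deterministic dynamics (`step`), and it swaps in a quadratic least-squares fit (`features`, `least_squares`) as a stand-in for the neural network so the example needs only the standard library. Each iteration computes a backed-up target for every state (the max over actions of reward plus discounted predicted next value), then regresses $V$ onto those targets rather than overwriting table entries.

```python
# Hypothetical toy problem: a 1-D chain of N states with known, deterministic dynamics.
N = 10
ACTIONS = (-1, +1)  # move left or right
GAMMA = 0.9


def step(s, a):
    """Known model: return (next_state, reward); reward 1.0 for landing on the last state."""
    s_next = min(max(s + a, 0), N - 1)
    return s_next, (1.0 if s_next == N - 1 else 0.0)


def features(s):
    """Quadratic features of the normalized state (stand-in for a neural network)."""
    x = s / (N - 1)
    return [1.0, x, x * x]


def predict(w, s):
    """Approximate value V(s) = w . features(s)."""
    return sum(wi * fi for wi, fi in zip(w, features(s)))


def least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        p = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[p] = A[p], A[col]
        b[col], b[p] = b[p], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution on the upper-triangular system
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w


def fitted_value_iteration(num_iters=100):
    states = list(range(N))
    X = [features(s) for s in states]
    w = [0.0] * len(X[0])
    for _ in range(num_iters):
        # 1) Targets: the max over actions calls the model for every action from s.
        targets = []
        for s in states:
            backups = []
            for a in ACTIONS:
                s_next, r = step(s, a)
                backups.append(r + GAMMA * predict(w, s_next))
            targets.append(max(backups))
        # 2) Fit V to the targets by regression instead of overwriting table entries.
        w = least_squares(X, targets)
    return w


w = fitted_value_iteration()
print([round(predict(w, s), 2) for s in range(N)])
```

Unlike the tabular version, fitted value iteration with a function approximator has no general convergence guarantee; on this small, smooth chain the fitted values should simply increase toward the rewarding end.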

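Note that policy and value iteration (including this fitted version) assume we know the transition dynamics, or equivalently that we can reset the environment to any state and try every action from it. The target for a sampled state is $y_i = \max_{a_i}\left(r(s_i, a_i) + \gamma\,\mathbb{E}[V(s_i')]\right)$; without a model we only observe the outcome of the one action actually taken in $s_i$, so that $\max_{a_i}$ is hard to evaluate.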

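This motivates Fitted Q Iteration, which learns $Q(s, a)$ so that the max over actions is taken over the learned function instead of requiring the dynamics.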
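A minimal sketch of the target step of fitted Q iteration, assuming logged transitions `(s, a, r, s')` and some current estimate `q(s, a)`; the names `fqi_targets` and `Transition` are hypothetical. The max now ranges over the action input of `q` itself, so each target comes from one observed next state with no transition model.

```python
from typing import Callable, List, Sequence, Tuple

# A logged transition: (state, action, reward, next_state).
Transition = Tuple[int, int, float, int]


def fqi_targets(
    batch: Sequence[Transition],
    q: Callable[[int, int], float],
    actions: Sequence[int],
    gamma: float,
) -> List[Tuple[Tuple[int, int], float]]:
    """Build ((s, a), y) regression pairs for one fitted Q iteration step.

    The max over next actions only queries the current Q estimate, so no
    dynamics model is needed: each sample was produced by the environment.
    """
    pairs = []
    for s, a, r, s_next in batch:
        # y = r + gamma * max_{a'} Q(s', a')
        y = r + gamma * max(q(s_next, a2) for a2 in actions)
        pairs.append(((s, a), y))  # regress Q(s, a) onto y
    return pairs
```

To finish one iteration, fit any regressor (for example the network) to the returned pairs and use it as the next `q`. Using the single observed $s'$ gives a one-sample estimate of the expectation over next states.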

Yanda's Random Notes