Value Iteration, but with a NN to fit it.
The difference between this and the vanilla one is with new evidence, we add that to data instead of doing simple table value setting.
Note that for policy & value iteration, we assume that we can freely explore the environment. If we don’t know the transition dynamics, that is hard.
Thus we have Fitted Q Iteration