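Fitted Value Iteration is Value Iteration with a neural network (NN) approximating the value function. The difference from the vanilla (tabular) version is in how new evidence is used: instead of overwriting a table entry with the backed-up value, we add the (state, target) pair to a dataset and fit the network to it by regression.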
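A minimal sketch of that loop, under loud assumptions: the 5-state chain, the `step` model, and every name here are made up for illustration, and a least-squares linear fit on one-hot features stands in for the NN so the result is easy to check. With one-hot features the regression is exact, so this reduces to the tabular case; the point is only the shape of the loop (build targets, then fit).

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9  # hypothetical toy chain, last state is terminal

def step(s, a):
    """Known deterministic model (assumed): action 0 moves left, action 1 moves right."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    reward = 1.0 if done else 0.0
    return s_next, reward, done

def features(s):
    """One-hot features; a stand-in for whatever input the NN would take."""
    return np.eye(N_STATES)[s]

def fitted_value_iteration(n_iters=50):
    w = np.zeros(N_STATES)               # V(s) is approximated by features(s) @ w
    states = np.arange(N_STATES - 1)     # non-terminal states only
    X = features(states)                 # regression inputs, one row per state
    for _ in range(n_iters):
        y = np.empty(len(states))        # regression targets for this round
        for i, s in enumerate(states):
            backups = []
            for a in range(N_ACTIONS):   # needs the model: every action is tried from s
                s_next, r, done = step(s, a)
                v_next = 0.0 if done else features(s_next) @ w
                backups.append(r + GAMMA * v_next)
            y[i] = max(backups)
        # Fit the approximator to the (state, target) dataset instead of writing a table.
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w
```

Calling `fitted_value_iteration()` on this toy chain should give values of roughly 0.729, 0.81, 0.9 and 1.0 for states 0 to 3. Note that the inner loop calls `step` for every action from every state, which is exactly the access to the dynamics that these notes flag as an assumption.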

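Note that policy iteration and value iteration both assume we can freely explore the environment: the backup needs the outcome of every action from a given state. If we don't know the transition dynamics, that is hard, because a sampled transition only shows the one action that was actually taken.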

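Thus we have Fitted Q Iteration, which fits Q(s, a) instead of V(s), so the max over actions is taken over the learned Q-values and only sampled transitions (s, a, r, s') are needed.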
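A minimal sketch of Fitted Q Iteration on a fixed batch of transitions, again under stated assumptions: the toy chain, `env_step`, `collect_batch`, and `phi` are hypothetical names, and a least-squares fit on one-hot (state, action) features stands in for the NN. The learner only ever sees logged tuples (s, a, r, s', done); it never queries the dynamics.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9  # hypothetical toy chain, last state is terminal

def env_step(s, a):
    """Environment used only to log data; the learner never calls it."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

def collect_batch():
    """Log one transition per non-terminal (state, action) pair (an assumption)."""
    batch = []
    for s in range(N_STATES - 1):
        for a in range(N_ACTIONS):
            s_next, r, done = env_step(s, a)
            batch.append((s, a, r, s_next, done))
    return batch

def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def fitted_q_iteration(batch, n_iters=50):
    w = np.zeros(N_STATES * N_ACTIONS)   # Q(s, a) is approximated by phi(s, a) @ w
    X = np.array([phi(s, a) for (s, a, r, s_next, done) in batch])
    for _ in range(n_iters):
        targets = []
        for (s, a, r, s_next, done) in batch:
            # The max over actions uses the current Q estimate, not the model.
            q_next = 0.0 if done else max(phi(s_next, b) @ w for b in range(N_ACTIONS))
            targets.append(r + GAMMA * q_next)
        # Regress Q onto the targets computed from the logged transitions.
        w, *_ = np.linalg.lstsq(X, np.array(targets), rcond=None)
    return w.reshape(N_STATES, N_ACTIONS)
```

Usage: `Q = fitted_q_iteration(collect_batch())`, then `Q.argmax(axis=1)` gives a greedy policy. In this toy setup the batch happens to cover every state-action pair; with a real logged dataset the coverage would be partial and the quality of the fit would depend on it.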