Based on a conversation with ChatGPT 5.2.
Model-Based vs Model-Free RL
Core distinction
The difference is about access to a model, not about tabular vs. neural representations, or Monte Carlo vs. TD updates.
A method is model-based if it can generate hypothetical transitions: it can query a model or simulator for a reward and next state, s', r ~ p(·|s, a), at any state-action pair it chooses.
A method is model-free if it can only learn from experienced transitions: tuples (s, a, r, s') produced by actually acting in the environment, or logged by something that did.
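The access distinction can be sketched as two interfaces. The names and the discrete two-action setting below are assumptions for illustration, not a real library:

```python
# Hypothetical interfaces illustrating the access distinction.

def model_based_lookahead(model, state, actions):
    """Pick an action by querying the model at ANY (state, action) we choose."""
    # model.step(s, a) -> (reward, next_state), computed without touching
    # the real environment: a hypothetical transition.
    return max(actions, key=lambda a: model.step(state, a)[0])

def model_free_update(q, transition, actions=(0, 1), alpha=0.1, gamma=0.99):
    """One TD(0)-style update; consumes a transition that actually happened."""
    s, a, r, s_next = transition
    target = r + gamma * max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)
```

The model-based function can branch from any state it likes; the model-free update has no way to ask "what if?".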
Model-Free RL
- No access to the transition dynamics p(s'|s, a) or the reward function
- Cannot simulate alternative futures
- Learns only from real interaction or logged data
- Cannot branch from arbitrary states
Examples:
- Q-learning / DQN
- SARSA
- Policy-gradient methods (e.g. REINFORCE)
Key property: works with a fixed dataset (offline RL).
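Because everything is consumed as (s, a, r, s') tuples, a tabular Q-learner can be trained purely from logged data. A minimal sketch on a made-up two-state dataset (all names and numbers are illustrative):

```python
# Q-learning from a fixed, logged dataset -- no simulator needed.
# Logged transitions (state, action, reward, next_state) from a 2-state chain:
# action 1 in state 0 moves to state 1 with reward 0; action 1 in state 1
# ends the episode with reward 1 (next_state=None marks terminal).
dataset = [
    (0, 1, 0.0, 1),
    (1, 1, 1.0, None),
    (0, 0, 0.0, 0),
    (1, 0, 0.0, 0),
]

q = {}  # (state, action) -> value
alpha, gamma = 0.5, 0.9
actions = [0, 1]

for _ in range(200):  # sweep the same fixed dataset repeatedly
    for s, a, r, s_next in dataset:
        future = 0.0 if s_next is None else max(
            q.get((s_next, b), 0.0) for b in actions
        )
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (r + gamma * future - old)
```

The greedy policy this converges to takes action 1 in both states, learned without ever generating a new transition.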
Model-Based RL
- Has access to a generative model or simulator
- Can simulate transitions from arbitrary states
- Can evaluate hypothetical action sequences
- Can branch and build search trees
Examples:
- Value Iteration (with known model)
- MCTS
- MPC (with known or learned dynamics)
- Dyna-style methods
Key property: requires ability to generate new transitions.
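Value Iteration makes the dependence on a model explicit: it needs the full transition probabilities, which a model-free method never sees. A sketch on a made-up two-state MDP:

```python
# Value iteration on a tiny known MDP (illustrative 2-state example).
# P[s][a] is a list of (prob, next_state, reward) -- the "model" that a
# model-free method would not have access to.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
gamma = 0.9
V = {0: 0.0, 1: 0.0}

for _ in range(100):  # Bellman optimality backups until (near) convergence
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        for s in P
    }
```

Here V(1) converges to 1/(1 - 0.9) = 10 (keep collecting reward 1), and V(0) to 0.9 * 10 = 9.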
Planning vs Learning (orthogonal)
- Planning: uses a model to choose actions
- Learning: updates parameters from data
Examples:
- MCTS → model-based planning
- Q-learning → model-free learning
- Dyna → model-based learning + planning
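Dyna combines both roles: it learns values from real transitions, learns a model from the same transitions, and then plans by replaying simulated transitions from that model. A minimal Dyna-Q sketch (deterministic tabular case; function and variable names are mine):

```python
import random

def dyna_q_step(q, model, transition, actions, n_planning=5,
                alpha=0.5, gamma=0.9):
    """One real Q-update plus n_planning simulated updates (Dyna-Q)."""
    def update(s, a, r, s_next):
        future = max(q.get((s_next, b), 0.0) for b in actions)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (r + gamma * future - old)

    s, a, r, s_next = transition
    update(s, a, r, s_next)        # (1) learning: direct RL from real data
    model[(s, a)] = (r, s_next)    # (2) model learning: remember what happened
    for _ in range(n_planning):    # (3) planning: replay simulated transitions
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        update(ps, pa, pr, ps_next)
```

The planning loop is where Dyna generates new (simulated) transitions, which is exactly what makes it model-based by the definition above.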
Litmus Test
If given only a static dataset (no simulator):
- Still works → model-free
- Breaks → model-based
Open-Loop vs Closed-Loop Planning
In a deterministic environment with a known model, planning reduces to an optimization problem: solve once for the best action sequence over the horizon and execute it open-loop. In a stochastic environment, open-loop execution is a bad idea: once the realized outcomes diverge from the plan, the remaining actions are no longer optimal. You need closed-loop planning, i.e. replanning from the actual observed state at every step.
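The closed-loop idea can be sketched as receding-horizon MPC: plan an open-loop sequence with the model, execute only its first action, then replan from the state actually reached. The interfaces below are assumptions for illustration:

```python
import itertools

def plan_open_loop(model, state, actions, horizon):
    """Exhaustively pick the action sequence with the best simulated return."""
    best_seq, best_ret = None, float("-inf")
    for seq in itertools.product(actions, repeat=horizon):
        s, ret = state, 0.0
        for a in seq:
            r, s = model(s, a)  # model(s, a) -> (reward, next_state)
            ret += r
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq

def mpc_episode(model, env_step, state, actions, horizon, n_steps):
    """Closed loop: replan from the real observed state at every step."""
    total = 0.0
    for _ in range(n_steps):
        seq = plan_open_loop(model, state, actions, horizon)
        a = seq[0]                     # execute only the first planned action
        r, state = env_step(state, a)  # real (possibly stochastic) environment
        total += r
    return total
```

If the environment were deterministic and the model exact, executing the whole planned sequence open-loop would do just as well; the replanning step is what buys robustness when the environment can deviate from the model's prediction.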