Source: CS 285 at UC Berkeley
- Lecture Recording from 2023
- Lecture notes from 2026: behavior_cloning_1.pdf, behavior_cloning_2.pdf
Behavior cloning is supervised learning: fit a policy $\pi_\theta(\mathbf{a}_t \mid \mathbf{o}_t)$ to the expert's (observation, action) pairs.
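A minimal sketch of that view, assuming a continuous action space, an MSE regression loss, and a small MLP (these choices are mine, not from the lecture):

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 32, 7   # assumed sizes, for illustration only

# pi_theta(a_t | o_t): a deterministic MLP policy for this sketch
policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def bc_update(obs_batch, act_batch):
    """One supervised step: regress predicted actions onto the expert's actions."""
    pred = policy(obs_batch)                        # predicted actions
    loss = nn.functional.mse_loss(pred, act_batch)  # imitate the expert labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```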
Distribution shift
The vanilla version does not work because we train under $p_{\text{data}}(\mathbf{o}_t)$ but test under $p_{\pi_\theta}(\mathbf{o}_t)$, and these distributions differ.
It can be proved (under some modeling assumptions) that the expected error grows quadratically with the horizon, $O(\epsilon T^2)$. This analysis is from the DAgger paper. Why hasn't Drew published more papers recently?
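A rough version of the argument, with notation I'm assuming rather than copying from the slides: suppose the learned policy makes a mistake with probability at most $\epsilon$ on states from the training distribution. Then

$$p_\theta(\mathbf{s}_t) = (1-\epsilon)^t\, p_{\text{train}}(\mathbf{s}_t) + \bigl(1-(1-\epsilon)^t\bigr)\, p_{\text{mistake}}(\mathbf{s}_t)$$

$$\bigl|p_\theta(\mathbf{s}_t) - p_{\text{train}}(\mathbf{s}_t)\bigr| \le 2\bigl(1-(1-\epsilon)^t\bigr) \le 2\epsilon t$$

$$\sum_{t=1}^{T} \mathbb{E}_{p_\theta(\mathbf{s}_t)}[c_t] \le \sum_{t=1}^{T} \bigl(\epsilon + 2\epsilon t\bigr) = O(\epsilon T^2)$$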
A way to fix this distribution shift is, unsurprisingly, proposed in the DAgger paper: Dataset Aggregation.
- Train $\pi_\theta(\mathbf{a}_t \mid \mathbf{o}_t)$ from human data $\mathcal{D} = \{\mathbf{o}_1, \mathbf{a}_1, \dots, \mathbf{o}_N, \mathbf{a}_N\}$
- Run $\pi_\theta(\mathbf{a}_t \mid \mathbf{o}_t)$ to get dataset $\mathcal{D}_\pi = \{\mathbf{o}_1, \dots, \mathbf{o}_M\}$
- Ask a human to label $\mathcal{D}_\pi$ with actions $\mathbf{a}_t$
- Aggregate: $\mathcal{D} \leftarrow \mathcal{D} \cup \mathcal{D}_\pi$, then repeat
It's proved that this reduces the error to linear in the horizon, $O(\epsilon T)$.
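A minimal sketch of the loop, assuming the caller supplies the pieces (`train_supervised`, `rollout`, and `expert_label` are hypothetical placeholders, not functions from any library):

```python
def dagger(policy, env, init_obs, init_act,
           train_supervised, rollout, expert_label, n_iters=10):
    """DAgger control flow: train on D, roll out pi_theta, label visited states, aggregate.

    `train_supervised(policy, obs, act)` fits the policy to the current dataset,
    `rollout(policy, env)` returns the observations the policy visits, and
    `expert_label(obs)` asks the expert (e.g. a human) for the action at one
    observation. All three are placeholders supplied by the caller.
    """
    dataset_obs, dataset_act = list(init_obs), list(init_act)    # D from human demos
    for _ in range(n_iters):
        train_supervised(policy, dataset_obs, dataset_act)       # 1. train pi_theta on D
        visited_obs = rollout(policy, env)                       # 2. run pi_theta -> D_pi
        expert_actions = [expert_label(o) for o in visited_obs]  # 3. expert labels D_pi
        dataset_obs += visited_obs                               # 4. D <- D union D_pi
        dataset_act += expert_actions
    return policy
```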
Why might we fail to fit the expert?
- Non-Markovian behavior
  - The expert's actions may depend on the whole history of observations, not only on the single current state
- Causal confusion (the model may mix up cause and effect, e.g. learning that a brake-light indicator "causes" braking rather than the obstacle ahead)
- Multimodal behavior: multiple actions or trajectories are viable, so fitting a single "one true" mode (e.g. averaging them with a unimodal Gaussian) does not work. Possible fixes:
  - Discretize the continuous action space
  - Autoregressive discretization: the model outputs one action dimension at a time, and since each output is conditioned on all previous outputs, the factorization recovers the full joint distribution (see the sketch after this list)
  - Compare autoregressive robot action tokenization, e.g. FAST
  - Expressive continuous distributions (e.g. mixtures of Gaussians or diffusion models)
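A sketch of autoregressive discretization over action dimensions. The GRU-based architecture, bin count, and all names here are my own assumptions (this is not the FAST tokenizer); it only illustrates why the per-dimension factorization avoids an exponential number of output classes:

```python
import torch
import torch.nn as nn

class AutoregressiveDiscretePolicy(nn.Module):
    """Each action dimension is discretized into `n_bins` bins and predicted one at
    a time, conditioned on the observation and on all previously sampled dimensions,
    so p(a_1, ..., a_D | o) = prod_d p(a_d | o, a_<d) is modeled without needing
    n_bins**D joint classes."""
    def __init__(self, obs_dim, act_dim, n_bins=256, hidden=256):
        super().__init__()
        self.act_dim, self.n_bins = act_dim, n_bins
        self.obs_encoder = nn.Linear(obs_dim, hidden)
        self.bin_embed = nn.Embedding(n_bins, hidden)  # embeds previously chosen bins
        self.rnn = nn.GRUCell(hidden, hidden)          # carries context across dimensions
        self.head = nn.Linear(hidden, n_bins)          # logits over bins for the next dim

    def sample(self, obs):
        h = torch.tanh(self.obs_encoder(obs))          # init hidden state from observation
        prev = torch.zeros_like(h)                     # dummy input for the first dimension
        bins = []
        for _ in range(self.act_dim):
            h = self.rnn(prev, h)
            logits = self.head(h)
            b = torch.distributions.Categorical(logits=logits).sample()
            bins.append(b)
            prev = self.bin_embed(b)                   # condition the next dim on this choice
        return torch.stack(bins, dim=-1)               # (batch, act_dim) bin indices
```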
Data
Intentionally add mistakes and corrections to the data, possibly with data augmentation. This is also where the idea of pre-training and post-training comes up: first train on diverse data for broad coverage, then on narrow, high-quality data.
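One hypothetical way to collect such data, assuming a gym-style environment with the old `reset()`/`step()` API; the helper names and noise parameters are made up for illustration:

```python
import numpy as np

def collect_with_injected_mistakes(env, expert_action, n_steps=1000,
                                   noise_prob=0.1, noise_scale=0.3):
    """Occasionally perturb the executed action so the dataset contains small
    mistakes followed by the expert's corrections from the resulting states.
    Only the expert's unperturbed action is stored as the label."""
    obs_buf, act_buf = [], []
    obs = env.reset()
    for _ in range(n_steps):
        a_expert = expert_action(obs)
        obs_buf.append(obs)
        act_buf.append(a_expert)              # always label with the expert action
        a_exec = a_expert
        if np.random.rand() < noise_prob:     # inject an occasional mistake
            a_exec = a_expert + noise_scale * np.random.randn(*np.shape(a_expert))
        obs, _, done, _ = env.step(a_exec)    # expert then corrects from the new state
        if done:
            obs = env.reset()
    return np.array(obs_buf), np.array(act_buf)
```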
Multi-task learning
Maybe teaching the model to reach any goal $\mathbf{p}$ helps it learn better how to reach one particular goal (goal-conditioned behavior cloning).
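A minimal sketch of a goal-conditioned policy, assuming the goal is simply concatenated to the observation (class name and sizes are hypothetical):

```python
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """pi_theta(a | o, g): the same network is trained to reach any goal g seen in
    the data; reaching one particular goal is then a special case at test time."""
    def __init__(self, obs_dim, goal_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, goal):
        # Condition on the goal by concatenating it to the observation.
        return self.net(torch.cat([obs, goal], dim=-1))
```

A relabeling trick often paired with this: treat the state a demonstration actually reached as its goal, so every trajectory supervises some goal even if it failed at the original task.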