Pick some actions and let's roll! The simplest option is random shooting (which parallelizes very well); the alternatives are:
Derivative-based methods
Second-order methods tend to work better than first-order methods, as the chain of dynamics steps can be long and gradients easily vanish or explode.
Shooting vs. collocation: we can optimize over actions only (shooting), or over both actions and states subject to dynamics constraints (collocation), which is better conditioned.
In the linear case (linear dynamics, quadratic cost), we have a nice dynamic programming solution: LQR.
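
To make the contrast concrete, here is a minimal sketch comparing random shooting with the LQR backward recursion on a toy linear system. Everything specific in it is an assumption for illustration, not something from these notes: the double-integrator matrices `A` and `B`, the cost weights `Q` and `R` (with `Q` reused as the terminal cost), the horizon `T`, and the helper names `rollout_cost`, `random_shooting`, and `lqr`.

```python
import numpy as np

def rollout_cost(A, B, Q, R, x0, U):
    """Total quadratic cost of applying the action sequence U (shape (T, m)) from x0."""
    x, cost = x0, 0.0
    for u in U:
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost + x @ Q @ x  # terminal cost reuses Q (an assumption)

def random_shooting(A, B, Q, R, x0, T, n_samples=1000, scale=1.0, seed=0):
    """Sample action sequences, roll each one out, keep the cheapest."""
    rng = np.random.default_rng(seed)
    candidates = rng.normal(scale=scale, size=(n_samples, T, B.shape[1]))
    # Each rollout is independent, which is why this parallelizes so well.
    costs = np.array([rollout_cost(A, B, Q, R, x0, U) for U in candidates])
    best = np.argmin(costs)
    return candidates[best], costs[best]

def lqr(A, B, Q, R, x0, T):
    """Finite-horizon discrete-time LQR via the backward Riccati recursion."""
    P = Q  # value function at the final step: V_T(x) = x^T Q x
    gains = []
    for _ in range(T):  # dynamic programming, from the last step backwards
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains[t] is now the feedback gain for time t
    x, U = x0, []
    for K in gains:  # forward pass: apply u_t = -K_t x_t
        u = -K @ x
        U.append(u)
        x = A @ x + B @ u
    return np.array(U), float(x0 @ P @ x0)  # actions and optimal cost

# Toy double integrator (position, velocity) with a force input; values are made up.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
x0 = np.array([1.0, 0.0])
T = 20

U_rs, J_rs = random_shooting(A, B, Q, R, x0, T)
U_lqr, J_lqr = lqr(A, B, Q, R, x0, T)
print("random shooting cost:", J_rs)
print("LQR cost:", J_lqr)
```

Because the dynamics are deterministic and the cost is quadratic, the LQR sequence is optimal, so random shooting can only match or exceed its cost; the gap gives a rough sense of how much a fixed sampling budget leaves on the table.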