Pick some actions and let’s roll! We can simply do random shooting (sample many action sequences and keep the best one, which parallelizes very well), or use one of these:

CEM (cross-entropy method)

Monte Carlo tree search

Derivative-based methods

Second-order methods tend to work better than first-order methods, as the chain of dynamics steps can be long and it’s easy to get vanishing / exploding gradients.
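To make the sampling-based options above concrete, here is a minimal sketch of random shooting and CEM. Everything specific in it is an assumption for illustration and not from these notes: the toy 1-D dynamics `s' = s + a`, the cost, the action bounds of [-1, 1], and all function names and hyperparameters.

```python
import numpy as np

def rollout_cost(actions, s0):
    """Total cost of N action sequences under a toy model (hypothetical).

    actions has shape (N, H). Dynamics: s' = s + a. Step cost: s'^2 + 0.01 a^2.
    """
    s = np.full(actions.shape[0], s0, dtype=float)
    cost = np.zeros(actions.shape[0])
    for t in range(actions.shape[1]):
        s = s + actions[:, t]
        cost += s**2 + 0.01 * actions[:, t] ** 2
    return cost

def random_shooting(s0, horizon=10, n_samples=1000, rng=None):
    """Sample action sequences uniformly and keep the cheapest one."""
    rng = np.random.default_rng(0) if rng is None else rng
    actions = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))
    costs = rollout_cost(actions, s0)
    best = np.argmin(costs)
    return actions[best], costs[best]

def cem(s0, horizon=10, n_samples=200, n_elite=20, n_iters=10, rng=None):
    """Refit a diagonal Gaussian over action sequences to the elite samples."""
    rng = np.random.default_rng(0) if rng is None else rng
    mean, std = np.zeros(horizon), np.ones(horizon)
    for _ in range(n_iters):
        actions = rng.normal(mean, std, size=(n_samples, horizon))
        actions = np.clip(actions, -1.0, 1.0)  # assumed action bounds
        costs = rollout_cost(actions, s0)
        elite = actions[np.argsort(costs)[:n_elite]]  # lowest-cost samples
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean, rollout_cost(mean[None, :], s0)[0]
```

Random shooting evaluates all samples independently, which is why it parallelizes so easily. CEM spends the same kind of rollouts but concentrates later samples around sequences that already did well.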

Shooting vs. collocation methods: we can either optimize over actions only (shooting), or over both actions and states subject to dynamics constraints (collocation, which is better conditioned).
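Written out, the two formulations might look as follows. The notation is an assumption, not from these notes: deterministic dynamics $s_{t+1} = f(s_t, a_t)$, per-step cost $c(s_t, a_t)$, horizon $T$, and a given initial state $s_1$.

```latex
% Shooting: states are eliminated by unrolling the dynamics
\min_{a_1, \dots, a_T} \;
  c(s_1, a_1) + c\big(f(s_1, a_1), a_2\big) + \dots
  + c\big(f(f(\dots)\dots), a_T\big)

% Collocation: states are decision variables, tied together by constraints
\min_{a_1, \dots, a_T,\; s_2, \dots, s_T} \;
  \sum_{t=1}^{T} c(s_t, a_t)
\quad \text{s.t.} \quad s_{t+1} = f(s_t, a_t), \quad t = 1, \dots, T-1
```

In the shooting form an early action affects every later term through the nested $f$, which is where the poor conditioning comes from. In the collocation form each constraint only couples neighboring time steps.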

In the linear case (linear dynamics with quadratic cost), we have a nice dynamic programming solution: LQR.
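As a rough illustration of that dynamic programming recursion, here is a minimal finite-horizon, discrete-time LQR sketch. It assumes dynamics `x' = A x + B u` and cost `x^T Q x + u^T R u` per step, with `Q` also used as the terminal cost. The function name and the double-integrator example are hypothetical choices, not from these notes.

```python
import numpy as np

def lqr_backward(A, B, Q, R, horizon):
    """Backward Riccati recursion; returns gains K_t with u_t = -K_t x_t."""
    P = Q.copy()  # cost-to-go matrix at the final step (assumed terminal cost)
    gains = []
    for _ in range(horizon):
        # Minimize the quadratic Q-function over u in closed form.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Plug the optimal u back in to get the cost-to-go one step earlier.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # reorder so gains[0] applies at the first time step

# Example: double integrator (position, velocity) with a force input.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[0.1]])

gains = lqr_backward(A, B, Q, R, horizon=50)
x = np.array([1.0, 0.0])
for K in gains:
    x = A @ x + B @ (-K @ x)  # closed-loop rollout drives x toward zero
```

The backward pass computes time-varying feedback gains without any sampling or gradient steps, and the forward pass then simply applies them.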