Stochastic optimization

Pick some action and let’s roll!

$$
\begin{aligned}
a_1, \dots, a_T &= \arg\max_{a_1, \dots, a_T} \underbrace{J(a_1, \dots, a_T)}_{\text{don't care what this is}} \\
A &= \arg\max_{A} J(A)
\end{aligned}
$$

We can simply do random shooting (sample action sequences and keep the best one; this parallelizes very well), or use one of the following:
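A minimal sketch of random shooting, using only the Python standard library. The objective `rollout_return`, its toy dynamics, the uniform sampling range, and the sample count are all assumptions made for illustration; the method itself only needs some black-box $J(A)$ to evaluate.

```python
import random

def rollout_return(actions, x0=0.0, target=1.0):
    """Hypothetical black-box objective J(A) for a 1-D point mass."""
    x, total = x0, 0.0
    for a in actions:
        x = x + a  # assumed toy dynamics: the state moves by a_t each step
        total -= (x - target) ** 2 + 0.1 * a ** 2  # reward = negative cost
    return total

def random_shooting(J, horizon, num_samples, rng, low=-1.0, high=1.0):
    """Sample action sequences uniformly and keep the one with the highest J."""
    best_seq, best_val = None, float("-inf")
    for _ in range(num_samples):
        seq = [rng.uniform(low, high) for _ in range(horizon)]
        val = J(seq)  # each evaluation is independent, so this loop parallelizes
        if val > best_val:
            best_seq, best_val = seq, val
    return best_seq, best_val

rng = random.Random(0)  # fixed seed so the run is repeatable
best_seq, best_val = random_shooting(rollout_return, horizon=5,
                                     num_samples=2000, rng=rng)
```

Nothing here uses gradients or the structure of `rollout_return`, which is why $J$ can be treated as something we "don't care" about.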

Derivative-based methods

$$
\min_{u_1, \dots, u_T} \sum_{t=1}^{T} c(x_t, u_t) \quad \text{s.t.} \quad x_t = f(x_{t-1}, u_{t-1})
$$

$$
\min_{u_1, \dots, u_T} c(x_1, u_1) + c(f(x_1, u_1), u_2) + \dots + c(f(f(\dots) \dots), u_T)
$$

Second-order methods tend to work better than first-order methods here: the chain of nested dynamics can be long, so gradients easily vanish or explode.
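To see where this comes from, apply the chain rule to the last state with respect to the first action. The scalar linear system below is an assumption chosen for illustration; $\alpha$ and $\beta$ are not symbols from these notes.

$$
\frac{\partial x_T}{\partial u_1} = \left( \prod_{t=2}^{T-1} \frac{\partial f}{\partial x}(x_t, u_t) \right) \frac{\partial f}{\partial u}(x_1, u_1)
$$

$$
\text{If } f(x, u) = \alpha x + \beta u, \text{ then } \frac{\partial x_T}{\partial u_1} = \alpha^{T-2} \beta
$$

For $|\alpha| < 1$ this sensitivity shrinks geometrically with $T$, and for $|\alpha| > 1$ it grows geometrically, so early and late actions end up on very different scales.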

Shooting vs. collocation: we can optimize over the actions only (shooting), or over both actions and states subject to the dynamics constraints (collocation), which is better conditioned.

$$
\min_{u_1, \dots, u_T, x_1, \dots, x_T} \sum_{t=1}^{T} c(x_t, u_t) \quad \text{s.t.} \quad x_t = f(x_{t-1}, u_{t-1})
$$
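The difference between the two parameterizations can be sketched in a few lines of Python. The dynamics `f`, the cost `c`, and the numbers below are hypothetical stand-ins. The sketch only evaluates the two objectives and the constraint residuals; it does not run a constrained solver.

```python
def f(x, u):
    """Assumed toy dynamics."""
    return 0.9 * x + u

def c(x, u):
    """Assumed toy cost."""
    return x ** 2 + 0.1 * u ** 2

def shooting_cost(x1, us):
    """Shooting: states are not variables, they come from nesting f."""
    x, total = x1, 0.0
    for u in us:
        total += c(x, u)
        x = f(x, u)
    return total

def rollout_states(x1, us):
    """States x_1..x_T produced by applying u_1..u_{T-1} from x_1."""
    xs = [x1]
    for u in us[:-1]:
        xs.append(f(xs[-1], u))
    return xs

def collocation_cost(xs, us):
    """Collocation: states are free variables, so the cost is a plain sum."""
    return sum(c(x, u) for x, u in zip(xs, us))

def dynamics_residuals(xs, us):
    """Constraint violations x_t - f(x_{t-1}, u_{t-1}); all zero when feasible."""
    return [xs[t] - f(xs[t - 1], us[t - 1]) for t in range(1, len(xs))]

us = [0.5, -0.2, 0.1, 0.0]
xs = rollout_states(1.0, us)

# On a trajectory that satisfies the dynamics, the two objectives agree.
feasible_gap = max(abs(r) for r in dynamics_residuals(xs, us))
cost_gap = abs(shooting_cost(1.0, us) - collocation_cost(xs, us))

# A collocation iterate may violate the dynamics; the solver must drive this to zero.
xs_free = [1.0, 0.0, 0.0, 0.0]
infeasible_gap = max(abs(r) for r in dynamics_residuals(xs_free, us))
```

In the collocation form each cost term depends only on its own $(x_t, u_t)$, and the coupling between time steps lives entirely in the constraints rather than in a deep composition of `f`.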

In the linear case (linear dynamics, quadratic cost), there is a nice dynamic programming solution: LQR.
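As a rough illustration of that dynamic program, here is a scalar sketch in plain Python. It assumes dynamics $x_{t+1} = a x_t + b u_t$ and cost $c(x_t, u_t) = q x_t^2 + r u_t^2$ with $r > 0$; the symbols $a, b, q, r$, the function names, and the numbers are assumptions for illustration, and the general case uses matrices in place of these scalars. The backward pass keeps a quadratic cost-to-go $V_t(x) = p_t x^2$ and yields a linear feedback law $u_t = -k_t x_t$.

```python
def lqr_gains(a, b, q, r, horizon):
    """Backward recursion for scalar finite-horizon LQR (requires r > 0)."""
    p = 0.0  # cost-to-go coefficient after the final step
    gains = []
    for _ in range(horizon):
        k = p * a * b / (r + p * b * b)  # minimizer of r*u^2 + p*(a*x + b*u)^2
        p = q + p * a * a - (p * a * b) ** 2 / (r + p * b * b)
        gains.append(k)
    gains.reverse()  # the recursion runs from t = T down to t = 1
    return gains, p  # p is now p_1, so the optimal total cost is p * x_1^2

def rollout_cost(x1, us, a, b, q, r):
    """Total cost of applying the action sequence us from state x1."""
    x, total = x1, 0.0
    for u in us:
        total += q * x * x + r * u * u
        x = a * x + b * u
    return total

a, b, q, r, T = 1.0, 1.0, 1.0, 0.1, 10
gains, p1 = lqr_gains(a, b, q, r, T)

# Forward pass: apply the feedback law from x_1 = 1.
x, us = 1.0, []
for k in gains:
    u = -k * x
    us.append(u)
    x = a * x + b * u

optimal_cost = rollout_cost(1.0, us, a, b, q, r)
perturbed = [us[0] + 0.1] + us[1:]
perturbed_cost = rollout_cost(1.0, perturbed, a, b, q, r)
```

The cost of the rolled-out actions should match `p1 * 1.0 ** 2`, and nudging any action away from the feedback law should only increase the cost.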