Classic optimal control: the state transition is linear and the cost is quadratic:
\begin{align*}
\mathbf{x}_{t+1} &= \mathbf{F}_t \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix} + \mathbf{f}_t \\
c(\mathbf{x}_t, \mathbf{u}_t) &= \frac{1}{2} \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix}^T \mathbf{C}_t \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix} + \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix}^T \mathbf{c}_t
\end{align*}
The cost needs to be quadratic rather than linear: with a linear cost the whole objective would be one huge linear function of the actions, and its minimum would be unbounded (off to infinity).
LQR is a clever use of dynamic programming that solves this problem fast, in one backward sweep and one forward sweep. It goes as follows:
- Start from the last action $\mathbf{u}_T$, and suppose $\mathbf{x}_T$ is known. What's the best action there? We just solve $\min_{\mathbf{u}_T} Q(\mathbf{x}_T, \mathbf{u}_T)$ by setting the gradient with respect to $\mathbf{u}_T$ to zero. It turns out $\mathbf{u}_T = \mathbf{K}_T \mathbf{x}_T + \mathbf{k}_T$, i.e. linear in $\mathbf{x}_T$.
- Now we can just substitute $\mathbf{u}_T$ with $\mathbf{K}_T \mathbf{x}_T + \mathbf{k}_T$, and it turns out… it's still linear quadratic: $V(\mathbf{x}_T) = \text{const} + \frac{1}{2} \mathbf{x}_T^T \mathbf{V}_T \mathbf{x}_T + \mathbf{x}_T^T \mathbf{v}_T$. We are overloading notation a bit, but you know the meaning ($V$ is the cost-to-go that depends only on the state).
- Now take a step back to $t = T-1$. Consider $Q(\mathbf{x}_{T-1}, \mathbf{u}_{T-1}) = \text{const} + c(\mathbf{x}_{T-1}, \mathbf{u}_{T-1}) + V(\mathbf{x}_T)$. We know from the state transition that $\mathbf{x}_T = \mathbf{F}_{T-1} \begin{bmatrix} \mathbf{x}_{T-1} \\ \mathbf{u}_{T-1} \end{bmatrix} + \mathbf{f}_{T-1}$, so we can replace $\mathbf{x}_T$ in $V(\mathbf{x}_T)$ with an expression in $\mathbf{x}_{T-1}$ and $\mathbf{u}_{T-1}$ and use the same formula. In other words, we propagate the optimal value backward, and it stays linear quadratic at every step.
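Spelling out the last-step minimization in the $\mathbf{Q}_T$, $\mathbf{q}_T$ notation of the algorithm below:
\begin{align*}
Q(\mathbf{x}_T, \mathbf{u}_T) &= \text{const} + \frac{1}{2} \begin{bmatrix} \mathbf{x}_T \\ \mathbf{u}_T \end{bmatrix}^T \mathbf{Q}_T \begin{bmatrix} \mathbf{x}_T \\ \mathbf{u}_T \end{bmatrix} + \begin{bmatrix} \mathbf{x}_T \\ \mathbf{u}_T \end{bmatrix}^T \mathbf{q}_T \\
\nabla_{\mathbf{u}_T} Q(\mathbf{x}_T, \mathbf{u}_T) &= \mathbf{Q}_{\mathbf{u}_T, \mathbf{x}_T} \mathbf{x}_T + \mathbf{Q}_{\mathbf{u}_T, \mathbf{u}_T} \mathbf{u}_T + \mathbf{q}_{\mathbf{u}_T} = 0 \\
\Rightarrow \quad \mathbf{u}_T &= -\mathbf{Q}_{\mathbf{u}_T, \mathbf{u}_T}^{-1} \left( \mathbf{Q}_{\mathbf{u}_T, \mathbf{x}_T} \mathbf{x}_T + \mathbf{q}_{\mathbf{u}_T} \right) = \mathbf{K}_T \mathbf{x}_T + \mathbf{k}_T
\end{align*}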
\begin{algorithm}
\caption{Linear Quadratic Regulator (LQR)}
\begin{algorithmic}
\STATE \textbf{Backward recursion:}
\STATE Initialize $\mathbf{V}_{T+1} = \mathbf{0}$, $\mathbf{v}_{T+1} = \mathbf{0}$
\FOR{$t = T$ \TO $1$}
\STATE $\mathbf{Q}_t = \mathbf{C}_t + \mathbf{F}_t^T \mathbf{V}_{t+1} \mathbf{F}_t$
\STATE $\mathbf{q}_t = \mathbf{c}_t + \mathbf{F}_t^T \mathbf{V}_{t+1} \mathbf{f}_t + \mathbf{F}_t^T \mathbf{v}_{t+1}$
\STATE $Q(\mathbf{x}_t, \mathbf{u}_t) = \text{const} + \frac{1}{2} \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix}^T \mathbf{Q}_t \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix} + \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix}^T \mathbf{q}_t$
\STATE $\mathbf{u}_t \leftarrow \arg \min_{\mathbf{u}_t} Q(\mathbf{x}_t, \mathbf{u}_t) = \mathbf{K}_t \mathbf{x}_t + \mathbf{k}_t$
\STATE $\mathbf{K}_t = -\mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t}^{-1} \mathbf{Q}_{\mathbf{u}_t, \mathbf{x}_t}$
\STATE $\mathbf{k}_t = -\mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t}^{-1} \mathbf{q}_{\mathbf{u}_t}$
\STATE $\mathbf{V}_t = \mathbf{Q}_{\mathbf{x}_t, \mathbf{x}_t} + \mathbf{Q}_{\mathbf{x}_t, \mathbf{u}_t} \mathbf{K}_t + \mathbf{K}_t^T \mathbf{Q}_{\mathbf{u}_t, \mathbf{x}_t} + \mathbf{K}_t^T \mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t} \mathbf{K}_t$
\STATE $\mathbf{v}_t = \mathbf{q}_{\mathbf{x}_t} + \mathbf{Q}_{\mathbf{x}_t, \mathbf{u}_t} \mathbf{k}_t + \mathbf{K}_t^T \mathbf{q}_{\mathbf{u}_t} + \mathbf{K}_t^T \mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t} \mathbf{k}_t$
\STATE $V(\mathbf{x}_t) = \text{const} + \frac{1}{2} \mathbf{x}_t^T \mathbf{V}_t \mathbf{x}_t + \mathbf{x}_t^T \mathbf{v}_t$
\ENDFOR
\STATE \textbf{Forward recursion:}
\FOR{$t = 1$ \TO $T$}
\STATE $\mathbf{u}_t = \mathbf{K}_t \mathbf{x}_t + \mathbf{k}_t$
\STATE $\mathbf{x}_{t+1} = f(\mathbf{x}_t, \mathbf{u}_t)$
\ENDFOR
\end{algorithmic}
\end{algorithm}
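As a sanity check, here is a minimal NumPy sketch of the backward and forward passes above; the double-integrator example at the bottom is an illustrative assumption, not from the notes.

```python
import numpy as np

def lqr(F, f, C, c, x1, T, n, m):
    """LQR backward/forward recursion as in the algorithm above.
    F[t]: (n, n+m), f[t]: (n,), C[t]: (n+m, n+m), c[t]: (n+m,).
    Indices are 0-based here, t = 0, ..., T-1."""
    V, v = np.zeros((n, n)), np.zeros(n)    # V_{T+1} = 0, v_{T+1} = 0
    K, k = [None] * T, [None] * T
    for t in reversed(range(T)):            # backward recursion
        Q = C[t] + F[t].T @ V @ F[t]
        q = c[t] + F[t].T @ V @ f[t] + F[t].T @ v
        Qxu, Qux, Quu = Q[:n, n:], Q[n:, :n], Q[n:, n:]
        qx, qu = q[:n], q[n:]
        K[t] = -np.linalg.solve(Quu, Qux)   # K_t = -Quu^{-1} Qux
        k[t] = -np.linalg.solve(Quu, qu)    # k_t = -Quu^{-1} qu
        V = Q[:n, :n] + Qxu @ K[t] + K[t].T @ Qux + K[t].T @ Quu @ K[t]
        v = qx + Qxu @ k[t] + K[t].T @ qu + K[t].T @ Quu @ k[t]
    xs, us = [x1], []                       # forward recursion
    for t in range(T):
        us.append(K[t] @ xs[-1] + k[t])
        z = np.concatenate([xs[-1], us[-1]])
        xs.append(F[t] @ z + f[t])
    return xs, us

# Illustrative example: a double integrator with state (position, velocity)
# and acceleration control; the controller should drive the state to zero.
T, n, m = 50, 2, 1
Ft = np.array([[1.0, 1.0, 0.0],
               [0.0, 1.0, 1.0]])            # pos' = pos + vel, vel' = vel + u
Ct = np.diag([1.0, 1.0, 0.01])              # state cost I, small control cost
F, f = [Ft] * T, [np.zeros(n)] * T
C, c = [Ct] * T, [np.zeros(n + m)] * T
xs, us = lqr(F, f, C, c, np.array([1.0, 0.0]), T, n, m)
```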
This can handle stochastic dynamics with no change to the algorithm: if $p(\mathbf{x}_{t+1} \mid \mathbf{x}_t, \mathbf{u}_t) = \mathcal{N}\left(\mathbf{F}_t \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix} + \mathbf{f}_t, \Sigma_t\right)$, the noise is zero-mean and does not change the optimal controller. Gaussian is special in this sense (certainty equivalence).
The nonlinear case: iLQR / DDP
The obvious idea is: just like how the EKF extends the KF, we use a Taylor expansion here, around the current nominal trajectory $(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$:
\begin{align*}
f(\mathbf{x}_t, \mathbf{u}_t) &\approx f(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t) + \nabla_{\mathbf{x}_t, \mathbf{u}_t} f(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t) \begin{bmatrix} \delta \mathbf{x}_t \\ \delta \mathbf{u}_t \end{bmatrix} \\
c(\mathbf{x}_t, \mathbf{u}_t) &\approx c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t) + \nabla_{\mathbf{x}_t, \mathbf{u}_t} c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)^T \begin{bmatrix} \delta \mathbf{x}_t \\ \delta \mathbf{u}_t \end{bmatrix} + \frac{1}{2} \begin{bmatrix} \delta \mathbf{x}_t \\ \delta \mathbf{u}_t \end{bmatrix}^T \nabla^2_{\mathbf{x}_t, \mathbf{u}_t} c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t) \begin{bmatrix} \delta \mathbf{x}_t \\ \delta \mathbf{u}_t \end{bmatrix}
\end{align*}
Now you can see the $\nabla f$ and $\nabla^2 c$ parts play the roles of $\mathbf{F}_t$ and $\mathbf{C}_t$ from the previous case, and we can run LQR on the $\delta \mathbf{x}_t, \delta \mathbf{u}_t$ terms.
\begin{algorithm}
\caption{Iterative LQR (iLQR)}
\begin{algorithmic}
\REPEAT
\FOR{$t = 1$ \TO $T$}
\STATE $\mathbf{F}_t = \nabla_{\mathbf{x}_t, \mathbf{u}_t} f(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ \COMMENT{Linearize dynamics}
\STATE $\mathbf{c}_t = \nabla_{\mathbf{x}_t, \mathbf{u}_t} c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ \COMMENT{Quadratic cost approximation (gradient)}
\STATE $\mathbf{C}_t = \nabla_{\mathbf{x}_t, \mathbf{u}_t}^2 c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ \COMMENT{Quadratic cost approximation (Hessian)}
\ENDFOR
\STATE Run LQR backward pass on state $\delta \mathbf{x}_t = \mathbf{x}_t - \hat{\mathbf{x}}_t$ and action $\delta \mathbf{u}_t = \mathbf{u}_t - \hat{\mathbf{u}}_t$
\STATE Run forward pass with real nonlinear dynamics and $\mathbf{u}_t = \mathbf{K}_t(\mathbf{x}_t - \hat{\mathbf{x}}_t) + \mathbf{k}_t + \hat{\mathbf{u}}_t$
\STATE Update $\hat{\mathbf{x}}_t$ and $\hat{\mathbf{u}}_t$ based on states and actions in forward pass
\UNTIL{convergence}
\end{algorithmic}
\end{algorithm}
Note this can be bad, since (like Newton's method) the update can overshoot. We can add a step size $\alpha$ to the $\mathbf{k}_t$ term in the forward pass, i.e. $\mathbf{u}_t = \mathbf{K}_t(\mathbf{x}_t - \hat{\mathbf{x}}_t) + \alpha \mathbf{k}_t + \hat{\mathbf{u}}_t$, and line-search over $\alpha$, accepting a step only if the cost improves. This is the same idea as damped Newton's method; indeed iLQR is an approximation of Newton's method for solving $\min_{\mathbf{u}_1, \ldots, \mathbf{u}_T} \sum_t c(\mathbf{x}_t, \mathbf{u}_t)$.
\begin{algorithm}
\caption{Newton's Method for Optimization}
\begin{algorithmic}
\REPEAT
\STATE $\mathbf{g} = \nabla_{\mathbf{x}} g(\hat{\mathbf{x}})$ \COMMENT{Compute gradient}
\STATE $\mathbf{H} = \nabla_{\mathbf{x}}^2 g(\hat{\mathbf{x}})$ \COMMENT{Compute Hessian}
\STATE $\hat{\mathbf{x}} \leftarrow \arg \min_{\mathbf{x}} \frac{1}{2}(\mathbf{x} - \hat{\mathbf{x}})^T \mathbf{H} (\mathbf{x} - \hat{\mathbf{x}}) + \mathbf{g}^T (\mathbf{x} - \hat{\mathbf{x}})$
\UNTIL{convergence}
\end{algorithmic}
\end{algorithm}
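The loop above as a minimal NumPy sketch; the objective $g$ here is a hypothetical example chosen so the minimizer is known.

```python
import numpy as np

# Hypothetical smooth objective with minimum at (0, 1).
def g(x):
    return x[0] ** 4 + x[0] ** 2 + (x[1] - 1.0) ** 2

def grad_g(x):
    return np.array([4 * x[0] ** 3 + 2 * x[0], 2 * (x[1] - 1.0)])

def hess_g(x):
    return np.array([[12 * x[0] ** 2 + 2, 0.0],
                     [0.0, 2.0]])

x_hat = np.array([2.0, -1.0])
for _ in range(25):
    gvec = grad_g(x_hat)    # gradient at the current iterate
    H = hess_g(x_hat)       # Hessian at the current iterate
    # The argmin of the local quadratic model is the Newton step:
    x_hat = x_hat - np.linalg.solve(H, gvec)
```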
If you also use a second-order expansion of the dynamics $f$ (not just of the cost), the method becomes differential dynamic programming (DDP).