Classic optimal control: the state transition is linear and the cost is quadratic:
\begin{align*}
\mathbf{x}_{t+1} &= \mathbf{F}_t \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix} + \mathbf{f}_t \\
c(\mathbf{x}_t, \mathbf{u}_t) &= \frac{1}{2} \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix}^T \mathbf{C}_t \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix} + \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix}^T \mathbf{c}_t
\end{align*}
The cost needs to be quadratic rather than linear: with a linear cost the whole objective would be one huge linear function of the actions, and its minimum would be unbounded (off to infinity).
LQR is a clever use of dynamic programming that solves this problem fast, in one backward sweep and one forward sweep. It goes as follows:
- Start from the last action $\mathbf{u}_T$, and suppose $\mathbf{x}_T$ is known. What's the best action there? We just solve $\min_{\mathbf{u}_T} Q(\mathbf{x}_T, \mathbf{u}_T)$ by setting the gradient with respect to $\mathbf{u}_T$ to zero. It turns out $\mathbf{u}_T = \mathbf{K}_T \mathbf{x}_T + \mathbf{k}_T$, i.e. linear in $\mathbf{x}_T$.
- Now we can just substitute $\mathbf{u}_T$ with $\mathbf{K}_T \mathbf{x}_T + \mathbf{k}_T$, and it turns out… it's still linear quadratic: $V(\mathbf{x}_T) = \text{const} + \frac{1}{2} \mathbf{x}_T^T \mathbf{V}_T \mathbf{x}_T + \mathbf{x}_T^T \mathbf{v}_T$. We are overloading notation a bit, but you know the meaning ($V$ is the cost-to-go that depends only on the state).
- Now take a step back to $t = T-1$. Consider $Q(\mathbf{x}_{T-1}, \mathbf{u}_{T-1}) = \text{const} + c(\mathbf{x}_{T-1}, \mathbf{u}_{T-1}) + V(\mathbf{x}_T)$. We know from the state transition that $\mathbf{x}_T = \mathbf{F}_{T-1} \begin{bmatrix} \mathbf{x}_{T-1} \\ \mathbf{u}_{T-1} \end{bmatrix} + \mathbf{f}_{T-1}$, so we can replace $\mathbf{x}_T$ in $V(\mathbf{x}_T)$ with an expression in $\mathbf{x}_{T-1}$ and $\mathbf{u}_{T-1}$ and use the same formula. In other words, we propagate the optimal value backward, and it stays linear quadratic at every step.
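Spelling out the last-step minimization in the $\mathbf{Q}_T$, $\mathbf{q}_T$ notation of the algorithm below:
\begin{align*}
Q(\mathbf{x}_T, \mathbf{u}_T) &= \text{const} + \frac{1}{2} \begin{bmatrix} \mathbf{x}_T \\ \mathbf{u}_T \end{bmatrix}^T \mathbf{Q}_T \begin{bmatrix} \mathbf{x}_T \\ \mathbf{u}_T \end{bmatrix} + \begin{bmatrix} \mathbf{x}_T \\ \mathbf{u}_T \end{bmatrix}^T \mathbf{q}_T \\
\nabla_{\mathbf{u}_T} Q(\mathbf{x}_T, \mathbf{u}_T) &= \mathbf{Q}_{\mathbf{u}_T, \mathbf{x}_T} \mathbf{x}_T + \mathbf{Q}_{\mathbf{u}_T, \mathbf{u}_T} \mathbf{u}_T + \mathbf{q}_{\mathbf{u}_T} = 0 \\
\Rightarrow \quad \mathbf{u}_T &= -\mathbf{Q}_{\mathbf{u}_T, \mathbf{u}_T}^{-1} \left( \mathbf{Q}_{\mathbf{u}_T, \mathbf{x}_T} \mathbf{x}_T + \mathbf{q}_{\mathbf{u}_T} \right) = \mathbf{K}_T \mathbf{x}_T + \mathbf{k}_T
\end{align*}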
\begin{algorithm}
\caption{Linear Quadratic Regulator (LQR)}
\begin{algorithmic}
\STATE \textbf{Backward recursion:}
\STATE Initialize $\mathbf{V}_{T+1} = \mathbf{0}$, $\mathbf{v}_{T+1} = \mathbf{0}$
\FOR{$t = T$ \TO $1$}
\STATE $\mathbf{Q}_t = \mathbf{C}_t + \mathbf{F}_t^T \mathbf{V}_{t+1} \mathbf{F}_t$
\STATE $\mathbf{q}_t = \mathbf{c}_t + \mathbf{F}_t^T \mathbf{V}_{t+1} \mathbf{f}_t + \mathbf{F}_t^T \mathbf{v}_{t+1}$
\STATE $Q(\mathbf{x}_t, \mathbf{u}_t) = \text{const} + \frac{1}{2} \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix}^T \mathbf{Q}_t \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix} + \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix}^T \mathbf{q}_t$
\STATE $\mathbf{u}_t \leftarrow \arg \min_{\mathbf{u}_t} Q(\mathbf{x}_t, \mathbf{u}_t) = \mathbf{K}_t \mathbf{x}_t + \mathbf{k}_t$
\STATE $\mathbf{K}_t = -\mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t}^{-1} \mathbf{Q}_{\mathbf{u}_t, \mathbf{x}_t}$
\STATE $\mathbf{k}_t = -\mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t}^{-1} \mathbf{q}_{\mathbf{u}_t}$
\STATE $\mathbf{V}_t = \mathbf{Q}_{\mathbf{x}_t, \mathbf{x}_t} + \mathbf{Q}_{\mathbf{x}_t, \mathbf{u}_t} \mathbf{K}_t + \mathbf{K}_t^T \mathbf{Q}_{\mathbf{u}_t, \mathbf{x}_t} + \mathbf{K}_t^T \mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t} \mathbf{K}_t$
\STATE $\mathbf{v}_t = \mathbf{q}_{\mathbf{x}_t} + \mathbf{Q}_{\mathbf{x}_t, \mathbf{u}_t} \mathbf{k}_t + \mathbf{K}_t^T \mathbf{q}_{\mathbf{u}_t} + \mathbf{K}_t^T \mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t} \mathbf{k}_t$
\STATE $V(\mathbf{x}_t) = \text{const} + \frac{1}{2} \mathbf{x}_t^T \mathbf{V}_t \mathbf{x}_t + \mathbf{x}_t^T \mathbf{v}_t$
\ENDFOR
\STATE \textbf{Forward recursion:}
\FOR{$t = 1$ \TO $T$}
\STATE $\mathbf{u}_t = \mathbf{K}_t \mathbf{x}_t + \mathbf{k}_t$
\STATE $\mathbf{x}_{t+1} = f(\mathbf{x}_t, \mathbf{u}_t)$
\ENDFOR
\end{algorithmic}
\end{algorithm}
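As a sanity check, here is a minimal NumPy sketch of the backward and forward passes above; the double-integrator example at the bottom is an illustrative assumption, not from the notes.

```python
import numpy as np

def lqr(F, f, C, c, x1, T, n, m):
    """LQR backward/forward recursion as in the algorithm above.
    F[t]: (n, n+m), f[t]: (n,), C[t]: (n+m, n+m), c[t]: (n+m,).
    Indices are 0-based here, t = 0, ..., T-1."""
    V, v = np.zeros((n, n)), np.zeros(n)    # V_{T+1} = 0, v_{T+1} = 0
    K, k = [None] * T, [None] * T
    for t in reversed(range(T)):            # backward recursion
        Q = C[t] + F[t].T @ V @ F[t]
        q = c[t] + F[t].T @ V @ f[t] + F[t].T @ v
        Qxu, Qux, Quu = Q[:n, n:], Q[n:, :n], Q[n:, n:]
        qx, qu = q[:n], q[n:]
        K[t] = -np.linalg.solve(Quu, Qux)   # K_t = -Quu^{-1} Qux
        k[t] = -np.linalg.solve(Quu, qu)    # k_t = -Quu^{-1} qu
        V = Q[:n, :n] + Qxu @ K[t] + K[t].T @ Qux + K[t].T @ Quu @ K[t]
        v = qx + Qxu @ k[t] + K[t].T @ qu + K[t].T @ Quu @ k[t]
    xs, us = [x1], []                       # forward recursion
    for t in range(T):
        us.append(K[t] @ xs[-1] + k[t])
        z = np.concatenate([xs[-1], us[-1]])
        xs.append(F[t] @ z + f[t])
    return xs, us

# Illustrative example: a double integrator with state (position, velocity)
# and acceleration control; the controller should drive the state to zero.
T, n, m = 50, 2, 1
Ft = np.array([[1.0, 1.0, 0.0],
               [0.0, 1.0, 1.0]])            # pos' = pos + vel, vel' = vel + u
Ct = np.diag([1.0, 1.0, 0.01])              # state cost I, small control cost
F, f = [Ft] * T, [np.zeros(n)] * T
C, c = [Ct] * T, [np.zeros(n + m)] * T
xs, us = lqr(F, f, C, c, np.array([1.0, 0.0]), T, n, m)
```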
This can handle stochastic dynamics with no change to the algorithm: if $p(\mathbf{x}_{t+1} \mid \mathbf{x}_t, \mathbf{u}_t) = \mathcal{N}\left(\mathbf{F}_t \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix} + \mathbf{f}_t, \Sigma_t\right)$, the noise is zero-mean and does not change the optimal controller. Gaussian is special in this sense (certainty equivalence).
The nonlinear case: iLQR / DDP
The obvious idea is: just like how the EKF extends the KF, we use a Taylor expansion here, around the current nominal trajectory $(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$:
\begin{align*}
f(\mathbf{x}_t, \mathbf{u}_t) &\approx f(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t) + \nabla_{\mathbf{x}_t, \mathbf{u}_t} f(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t) \begin{bmatrix} \delta \mathbf{x}_t \\ \delta \mathbf{u}_t \end{bmatrix} \\
c(\mathbf{x}_t, \mathbf{u}_t) &\approx c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t) + \nabla_{\mathbf{x}_t, \mathbf{u}_t} c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)^T \begin{bmatrix} \delta \mathbf{x}_t \\ \delta \mathbf{u}_t \end{bmatrix} + \frac{1}{2} \begin{bmatrix} \delta \mathbf{x}_t \\ \delta \mathbf{u}_t \end{bmatrix}^T \nabla^2_{\mathbf{x}_t, \mathbf{u}_t} c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t) \begin{bmatrix} \delta \mathbf{x}_t \\ \delta \mathbf{u}_t \end{bmatrix}
\end{align*}
Now you can see the $\nabla f$ and $\nabla^2 c$ parts play the roles of $\mathbf{F}_t$ and $\mathbf{C}_t$ from the previous case, and we can run LQR on the $\delta \mathbf{x}_t, \delta \mathbf{u}_t$ terms.
\begin{algorithm}
\caption{Iterative LQR (iLQR)}
\begin{algorithmic}
\REPEAT
\FOR{$t = 1$ \TO $T$}
\STATE $\mathbf{F}_t = \nabla_{\mathbf{x}_t, \mathbf{u}_t} f(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ \COMMENT{Linearize dynamics}
\STATE $\mathbf{c}_t = \nabla_{\mathbf{x}_t, \mathbf{u}_t} c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ \COMMENT{Quadratic cost approximation (gradient)}
\STATE $\mathbf{C}_t = \nabla_{\mathbf{x}_t, \mathbf{u}_t}^2 c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ \COMMENT{Quadratic cost approximation (Hessian)}
\ENDFOR
\STATE Run LQR backward pass on state $\delta \mathbf{x}_t = \mathbf{x}_t - \hat{\mathbf{x}}_t$ and action $\delta \mathbf{u}_t = \mathbf{u}_t - \hat{\mathbf{u}}_t$
\STATE Run forward pass with real nonlinear dynamics and $\mathbf{u}_t = \mathbf{K}_t(\mathbf{x}_t - \hat{\mathbf{x}}_t) + \mathbf{k}_t + \hat{\mathbf{u}}_t$
\STATE Update $\hat{\mathbf{x}}_t$ and $\hat{\mathbf{u}}_t$ based on states and actions in forward pass
\UNTIL{convergence}
\end{algorithmic}
\end{algorithm}
Note this can be bad, since (like Newton's method) the update can overshoot. We can add a step size $\alpha$ to the $\mathbf{k}_t$ term in the forward pass, i.e. $\mathbf{u}_t = \mathbf{K}_t(\mathbf{x}_t - \hat{\mathbf{x}}_t) + \alpha \mathbf{k}_t + \hat{\mathbf{u}}_t$, and line-search over $\alpha$, accepting a step only if the cost improves. This is the same idea as damped Newton's method; indeed iLQR is an approximation of Newton's method for solving $\min_{\mathbf{u}_1, \ldots, \mathbf{u}_T} \sum_t c(\mathbf{x}_t, \mathbf{u}_t)$.
\begin{algorithm}
\caption{Newton's Method for Optimization}
\begin{algorithmic}
\REPEAT
\STATE $\mathbf{g} = \nabla_{\mathbf{x}} g(\hat{\mathbf{x}})$ \COMMENT{Compute gradient}
\STATE $\mathbf{H} = \nabla_{\mathbf{x}}^2 g(\hat{\mathbf{x}})$ \COMMENT{Compute Hessian}
\STATE $\hat{\mathbf{x}} \leftarrow \arg \min_{\mathbf{x}} \frac{1}{2}(\mathbf{x} - \hat{\mathbf{x}})^T \mathbf{H} (\mathbf{x} - \hat{\mathbf{x}}) + \mathbf{g}^T (\mathbf{x} - \hat{\mathbf{x}})$
\UNTIL{convergence}
\end{algorithmic}
\end{algorithm}
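The loop above as a minimal NumPy sketch; the objective $g$ here is a hypothetical example chosen so the minimizer is known.

```python
import numpy as np

# Hypothetical smooth objective with minimum at (0, 1).
def g(x):
    return x[0] ** 4 + x[0] ** 2 + (x[1] - 1.0) ** 2

def grad_g(x):
    return np.array([4 * x[0] ** 3 + 2 * x[0], 2 * (x[1] - 1.0)])

def hess_g(x):
    return np.array([[12 * x[0] ** 2 + 2, 0.0],
                     [0.0, 2.0]])

x_hat = np.array([2.0, -1.0])
for _ in range(25):
    gvec = grad_g(x_hat)    # gradient at the current iterate
    H = hess_g(x_hat)       # Hessian at the current iterate
    # The argmin of the local quadratic model is the Newton step:
    x_hat = x_hat - np.linalg.solve(H, gvec)
```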
If you also use a second-order expansion of the dynamics $f$ (not just of the cost), the method becomes differential dynamic programming (DDP).