Math Deep-Dive: Singular Value Decomposition

The Statement

Every real matrix \(A \in \mathbb{R}^{m \times n}\) can be factored as:

$$A = U \Sigma V^T$$
Where:
  • \(U \in \mathbb{R}^{m \times m}\) — orthogonal matrix, columns are the left singular vectors
  • \(\Sigma \in \mathbb{R}^{m \times n}\) — rectangular diagonal matrix; its diagonal entries \(\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0\) are the singular values
  • \(V \in \mathbb{R}^{n \times n}\) — orthogonal matrix, columns are the right singular vectors
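
To make the statement concrete, here is a minimal NumPy sketch (the 3×2 matrix is an arbitrary example) that computes the factorization with np.linalg.svd and checks that the three factors reproduce \(A\):

```python
import numpy as np

# An arbitrary 3x2 example matrix.
A = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)   # U is 3x3, s holds the singular values, Vt is V^T (2x2)

# Rebuild the m x n Sigma and check the factorization.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)
print(s)                      # singular values, largest first
```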

Geometric Intuition

Every linear transformation \(A\) can be decomposed into three simple geometric steps:

Step 1 — Rotate input space: \(V^T\)

\(V^T\) is a rotation (and possibly reflection) of the input \(\mathbb{R}^n\). It aligns the input along the right singular vectors — the natural axes of \(A\).

Step 2 — Scale along each axis: \(\Sigma\)

\(\Sigma\) stretches or compresses each axis independently. The \(i\)-th axis is scaled by \(\sigma_i\). Axes with \(\sigma_i = 0\) are collapsed entirely — this is where rank deficiency lives.

Step 3 — Rotate output space: \(U\)

\(U\) rotates the result into the output \(\mathbb{R}^m\), aligning the scaled axes with the left singular vectors.

Summary: Any matrix — no matter how complex — is just a rotation, a scaling, and another rotation.
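
The three steps can be watched in action. The sketch below (NumPy, using the same 2×2 matrix as the worked example at the end of this section) applies \(V^T\), then \(\Sigma\), then \(U\) to a vector and confirms the result matches \(Ax\):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])     # example matrix (the same one used in the worked example below)
x = np.array([0.6, 0.8])       # a unit vector in the input space

U, s, Vt = np.linalg.svd(A)

step1 = Vt @ x                 # rotate input coordinates onto the right singular vectors
step2 = s * step1              # scale the i-th coordinate by sigma_i
step3 = U @ step2              # rotate into the output space

assert np.allclose(step3, A @ x)   # identical to applying A directly
```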

How to Compute SVD

Step 1 — Form \(A^T A\)

Compute the \(n \times n\) matrix \(A^T A\). It is symmetric, so its eigendecomposition exists with real eigenvalues and an orthonormal set of eigenvectors; it is also positive semi-definite, so every eigenvalue is non-negative.

$$A^T A = V \Lambda V^T$$

where \(\Lambda = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\) with \(\lambda_1 \ge \lambda_2 \ge \cdots \ge 0\).

Step 2 — Extract singular values and right singular vectors

The columns of \(V\) (eigenvectors of \(A^T A\)) are the right singular vectors. The singular values are:

$$\sigma_i = \sqrt{\lambda_i}$$
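
Steps 1–2 are a few lines of NumPy (np.linalg.eigh is the natural choice because \(A^T A\) is symmetric; the 3×2 matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

# Step 1: form the symmetric positive semi-definite matrix A^T A and eigendecompose it.
eigvals, V = np.linalg.eigh(A.T @ A)     # eigh returns eigenvalues in ascending order

# Step 2: sort into descending order; singular values are the square roots.
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
sigma = np.sqrt(np.clip(eigvals, 0.0, None))   # clip guards against tiny negative round-off

print(sigma)                             # matches np.linalg.svd(A, compute_uv=False)
```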

Step 3 — Compute left singular vectors

For each non-zero singular value \(\sigma_i\), the corresponding left singular vector is:

$$u_i = \frac{1}{\sigma_i} A v_i$$

The remaining columns of \(U\) (corresponding to zero singular values) can be chosen as any orthonormal basis for the left null space of \(A\) (the null space of \(A^T\)); they are automatically orthogonal to the \(u_i\) already constructed, since those lie in the column space of \(A\).

Step 4 — Assemble the factorization

Stack the vectors and values:

$$U = [u_1 \mid u_2 \mid \cdots \mid u_m], \quad \Sigma = \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{pmatrix}, \quad V = [v_1 \mid v_2 \mid \cdots \mid v_n]$$

where \(r = \text{rank}(A)\). To make the dimensions match \(U\) and \(V\), \(\Sigma\) is the \(m \times n\) matrix whose top-left \(r \times r\) block is \(\text{diag}(\sigma_1, \ldots, \sigma_r)\) and whose remaining entries are zero.
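
Steps 3–4, continuing the same sketch (repeated here so it runs on its own). This example \(A\) has full column rank, so every \(\sigma_i\) is non-zero and the thin rank-\(r\) factors already reproduce \(A\); completing \(U\) to a full \(m \times m\) orthogonal matrix would not change the product.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
m, n = A.shape

# Steps 1-2 again, so this snippet is self-contained.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
sigma = np.sqrt(eigvals[order])
V = V[:, order]

# Step 3: u_i = A v_i / sigma_i (all sigma_i are non-zero for this A).
U_r = np.column_stack([A @ V[:, i] / sigma[i] for i in range(n)])

# Step 4 (thin form): with r = n, the rank-r factors already reproduce A.
# The missing m - r columns of the full U span the left null space and
# multiply against zero rows of Sigma, so they do not affect the product.
assert np.allclose(A, U_r @ np.diag(sigma) @ V.T)
```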

Key Properties

Rank

The rank of \(A\) equals the number of non-zero singular values:

$$\text{rank}(A) = r = \#\{\sigma_i > 0\}$$
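
In floating-point arithmetic, "non-zero" means "above a small tolerance". A sketch of a numerical rank check (the tolerance follows the common \(\max(m,n)\,\varepsilon\,\sigma_1\) convention, which is also what NumPy's matrix_rank uses by default):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # exact multiple of row 1, so the rank is 2
              [0.0, 1.0, 1.0]])

s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
rank = int(np.sum(s > tol))
print(rank, np.linalg.matrix_rank(A))   # both report 2
```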

Frobenius norm

$$\|A\|_F = \sqrt{\sum_{i=1}^{r} \sigma_i^2}$$

Spectral norm (largest singular value)

$$\|A\|_2 = \sigma_1$$

This is the maximum factor by which \(A\) can stretch a unit vector.
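
Both norm identities are easy to spot-check on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)

assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))  # Frobenius norm
assert np.isclose(np.linalg.norm(A, 2), s[0])                       # spectral norm
```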

Pseudoinverse

The Moore-Penrose pseudoinverse \(A^+\) is constructed directly from the SVD:

$$A^+ = V \Sigma^+ U^T$$

where \(\Sigma^+\) replaces each non-zero \(\sigma_i\) with \(1/\sigma_i\).
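
A sketch of this construction, compared against NumPy's built-in np.linalg.pinv (the matrix is a deliberately rank-deficient example):

```python
import numpy as np

# Rank-deficient 3x2 matrix: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
s_plus = np.array([1.0 / si if si > tol else 0.0 for si in s])  # invert only the non-zero sigma_i

A_plus = Vt.T @ np.diag(s_plus) @ U.T
assert np.allclose(A_plus, np.linalg.pinv(A))
```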

Low-Rank Approximation (Eckart–Young Theorem)

The best rank-\(k\) approximation to \(A\) in both the Frobenius and spectral norms is obtained by keeping only the top \(k\) singular values:

$$A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^T$$

The approximation error is:

$$\|A - A_k\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2$$
Practical implication: If the singular values drop off quickly, a low-rank \(A_k\) captures most of the structure of \(A\) with far fewer numbers. This is the basis of data compression, PCA, and latent semantic analysis.
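
A sketch of the truncation and the error identity, on a random matrix with \(k = 2\):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # keep only the top-k singular triplets

# Eckart-Young: squared Frobenius error equals the sum of the discarded sigma_i^2.
err = np.linalg.norm(A - A_k, 'fro') ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))
```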

Connection to Eigendecomposition

SVD generalizes eigendecomposition to non-square and non-symmetric matrices.

If \(A\) is square and symmetric positive semi-definite

Then \(A = Q \Lambda Q^T\) (eigendecomposition) and SVD gives \(U = V = Q\), \(\Sigma = \Lambda\). Singular values equal eigenvalues.

In general

Eigenvalues can be negative or complex; singular values are always real and non-negative. Singular values measure how much the transformation stretches along some direction; eigenvalues in general do not (a non-symmetric matrix can stretch vectors substantially even when every eigenvalue has magnitude 1).
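
A small illustration: a shear matrix has both eigenvalues equal to 1, yet its singular values reveal that it stretches some directions by more than a factor of 2.

```python
import numpy as np

# A shear matrix: both eigenvalues are 1, yet it clearly stretches some vectors.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])

print(np.linalg.eigvals(A))                 # [1. 1.]
print(np.linalg.svd(A, compute_uv=False))   # approx [2.414, 0.414]: max stretch is 1 + sqrt(2)
```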

Applications

  • PCA — principal components are the right singular vectors of the mean-centered data matrix
  • Least-squares solutions — solve \(Ax = b\) via \(x = A^+ b\) when \(A\) is not square or not full rank (see the sketch after this list)
  • Image compression — store a rank-\(k\) approximation instead of the full pixel matrix
  • Recommender systems — factorize user-item matrices to discover latent factors
  • Numerical stability — condition number \(\kappa(A) = \sigma_1 / \sigma_r\) quantifies sensitivity to perturbations
  • Natural language processing — Latent Semantic Analysis (LSA) applies SVD to term-document matrices
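
As an example of the least-squares item above, the sketch below fits an intercept and slope to four made-up data points via the pseudoinverse and checks the answer against np.linalg.lstsq:

```python
import numpy as np

# Overdetermined system: four equations, two unknowns, no exact solution.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

x = np.linalg.pinv(A) @ b                      # minimum-norm least-squares solution via the SVD
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference answer from lstsq
assert np.allclose(x, x_ref)
print(x)                                       # fitted [intercept, slope]
```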

Concrete Example

Matrix \(A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}\)

Step 1 — Compute \(A^T A = A^2\) (since \(A\) is symmetric):

$$A^T A = \begin{pmatrix} 10 & 6 \\ 6 & 10 \end{pmatrix}$$

Step 2 — Eigenvalues of \(A^T A\): \(\lambda_1 = 16\), \(\lambda_2 = 4\).

Step 3 — Singular values: \(\sigma_1 = 4\), \(\sigma_2 = 2\).

Step 4 — Eigenvectors (right singular vectors):

$$v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}, \quad v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}$$

Step 5 — Left singular vectors \(u_i = \frac{1}{\sigma_i}Av_i\) (equal to \(v_i\) here since \(A\) is symmetric positive definite).

Result:

$$A = \underbrace{\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}}_{U} \underbrace{\begin{pmatrix}4&0\\0&2\end{pmatrix}}_{\Sigma} \underbrace{\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}}_{V^T}$$

(Here \(V\) happens to be symmetric, so \(V^T = V\).)
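
A numerical check of this example (NumPy may return singular vectors with flipped signs, but the reassembled product is the same):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

U, s, Vt = np.linalg.svd(A)
print(s)                                  # [4. 2.]
assert np.allclose(A, U @ np.diag(s) @ Vt)

# The hand-computed factors from the text reproduce A as well.
Q = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)
assert np.allclose(A, Q @ np.diag([4.0, 2.0]) @ Q.T)
```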