Math Deep-Dive: Singular Value Decomposition

The Statement

Every real matrix \(A \in \mathbb{R}^{m \times n}\) can be factored as:

$$A = U \Sigma V^T$$
Where:
  • \(U \in \mathbb{R}^{m \times m}\) — orthogonal matrix, columns are the left singular vectors
  • \(\Sigma \in \mathbb{R}^{m \times n}\) — rectangular diagonal matrix; its diagonal entries \(\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0\) are the singular values
  • \(V \in \mathbb{R}^{n \times n}\) — orthogonal matrix, columns are the right singular vectors
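
To make the statement concrete, here is a minimal NumPy sketch (the 3×2 matrix is an arbitrary example) that computes the factorization with np.linalg.svd and checks that the three factors reproduce \(A\):

```python
import numpy as np

# An arbitrary 3x2 example matrix.
A = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)   # U is 3x3, s holds the singular values, Vt is V^T (2x2)

# Rebuild the m x n Sigma and check the factorization.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)
print(s)                      # singular values, largest first
```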

Geometric Intuition

Every linear transformation \(A\) can be decomposed into three simple geometric steps:

Step 1 — Rotate input space: \(V^T\)

\(V^T\) is a rotation (and possibly reflection) of the input \(\mathbb{R}^n\). It aligns the input along the right singular vectors — the natural axes of \(A\).

Step 2 — Scale along each axis: \(\Sigma\)

\(\Sigma\) stretches or compresses each axis independently. The \(i\)-th axis is scaled by \(\sigma_i\). Axes with \(\sigma_i = 0\) are collapsed entirely — this is where rank deficiency lives.

Step 3 — Rotate output space: \(U\)

\(U\) rotates the result into the output \(\mathbb{R}^m\), aligning the scaled axes with the left singular vectors.

Summary: Any matrix — no matter how complex — is just a rotation, a scaling, and another rotation.
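
The three steps can be watched in action. The sketch below (NumPy, using the same 2×2 matrix as the worked example at the end of this section) applies \(V^T\), then \(\Sigma\), then \(U\) to a vector and confirms the result matches \(Ax\):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])     # example matrix (the same one used in the worked example below)
x = np.array([0.6, 0.8])       # a unit vector in the input space

U, s, Vt = np.linalg.svd(A)

step1 = Vt @ x                 # rotate input coordinates onto the right singular vectors
step2 = s * step1              # scale the i-th coordinate by sigma_i
step3 = U @ step2              # rotate into the output space

assert np.allclose(step3, A @ x)   # identical to applying A directly
```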

How to Compute SVD

Step 1 — Form \(A^T A\)

Compute the \(n \times n\) matrix \(A^T A\). It is symmetric, so its eigendecomposition exists with real eigenvalues and an orthonormal set of eigenvectors; it is also positive semi-definite, so every eigenvalue is non-negative.

$$A^T A = V \Lambda V^T$$

where \(\Lambda = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\) with \(\lambda_1 \ge \lambda_2 \ge \cdots \ge 0\).

Step 2 — Extract singular values and right singular vectors

The columns of \(V\) (eigenvectors of \(A^T A\)) are the right singular vectors. The singular values are:

$$\sigma_i = \sqrt{\lambda_i}$$
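
Steps 1–2 are a few lines of NumPy (np.linalg.eigh is the natural choice because \(A^T A\) is symmetric; the 3×2 matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

# Step 1: form the symmetric positive semi-definite matrix A^T A and eigendecompose it.
eigvals, V = np.linalg.eigh(A.T @ A)     # eigh returns eigenvalues in ascending order

# Step 2: sort into descending order; singular values are the square roots.
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
sigma = np.sqrt(np.clip(eigvals, 0.0, None))   # clip guards against tiny negative round-off

print(sigma)                             # matches np.linalg.svd(A, compute_uv=False)
```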

Step 3 — Compute left singular vectors

For each non-zero singular value \(\sigma_i\), the corresponding left singular vector is:

$$u_i = \frac{1}{\sigma_i} A v_i$$

The remaining columns of \(U\) (corresponding to zero singular values) can be chosen as any orthonormal basis for the left null space of \(A\) (the null space of \(A^T\)); they are automatically orthogonal to the \(u_i\) already constructed, since those lie in the column space of \(A\).

Step 4 — Assemble the factorization

Stack the vectors and values:

$$U = [u_1 \mid u_2 \mid \cdots \mid u_m], \quad \Sigma = \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{pmatrix}, \quad V = [v_1 \mid v_2 \mid \cdots \mid v_n]$$

where \(r = \text{rank}(A)\). To make the dimensions match \(U\) and \(V\), \(\Sigma\) is the \(m \times n\) matrix whose top-left \(r \times r\) block is \(\text{diag}(\sigma_1, \ldots, \sigma_r)\) and whose remaining entries are zero.
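
Steps 3–4, continuing the same sketch (repeated here so it runs on its own). This example \(A\) has full column rank, so every \(\sigma_i\) is non-zero and the thin rank-\(r\) factors already reproduce \(A\); completing \(U\) to a full \(m \times m\) orthogonal matrix would not change the product.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
m, n = A.shape

# Steps 1-2 again, so this snippet is self-contained.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
sigma = np.sqrt(eigvals[order])
V = V[:, order]

# Step 3: u_i = A v_i / sigma_i (all sigma_i are non-zero for this A).
U_r = np.column_stack([A @ V[:, i] / sigma[i] for i in range(n)])

# Step 4 (thin form): with r = n, the rank-r factors already reproduce A.
# The missing m - r columns of the full U span the left null space and
# multiply against zero rows of Sigma, so they do not affect the product.
assert np.allclose(A, U_r @ np.diag(sigma) @ V.T)
```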

Key Properties

Rank

The rank of \(A\) equals the number of non-zero singular values:

$$\text{rank}(A) = r = \#\{\sigma_i > 0\}$$
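
In floating-point arithmetic, "non-zero" means "above a small tolerance". A sketch of a numerical rank check (the tolerance follows the common \(\max(m,n)\,\varepsilon\,\sigma_1\) convention, which is also what NumPy's matrix_rank uses by default):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # exact multiple of row 1, so the rank is 2
              [0.0, 1.0, 1.0]])

s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
rank = int(np.sum(s > tol))
print(rank, np.linalg.matrix_rank(A))   # both report 2
```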

Frobenius norm

$$\|A\|_F = \sqrt{\sum_{i=1}^{r} \sigma_i^2}$$

Spectral norm (largest singular value)

$$\|A\|_2 = \sigma_1$$

This is the maximum factor by which \(A\) can stretch a unit vector.
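
Both norm identities are easy to spot-check on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)

assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))  # Frobenius norm
assert np.isclose(np.linalg.norm(A, 2), s[0])                       # spectral norm
```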

Pseudoinverse

The Moore-Penrose pseudoinverse \(A^+\) is constructed directly from the SVD:

$$A^+ = V \Sigma^+ U^T$$

where \(\Sigma^+\) replaces each non-zero \(\sigma_i\) with \(1/\sigma_i\).
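
A sketch of this construction, compared against NumPy's built-in np.linalg.pinv (the matrix is a deliberately rank-deficient example):

```python
import numpy as np

# Rank-deficient 3x2 matrix: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
s_plus = np.array([1.0 / si if si > tol else 0.0 for si in s])  # invert only the non-zero sigma_i

A_plus = Vt.T @ np.diag(s_plus) @ U.T
assert np.allclose(A_plus, np.linalg.pinv(A))
```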

Low-Rank Approximation (Eckart–Young Theorem)

The best rank-\(k\) approximation to \(A\) in both the Frobenius and spectral norms is obtained by keeping only the top \(k\) singular values:

$$A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^T$$

The approximation error is:

$$\|A - A_k\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2$$
Practical implication: If the singular values drop off quickly, a low-rank \(A_k\) captures most of the structure of \(A\) with far fewer numbers. This is the basis of data compression, PCA, and latent semantic analysis.
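
A sketch of the truncation and the error identity, on a random matrix with \(k = 2\):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # keep only the top-k singular triplets

# Eckart-Young: squared Frobenius error equals the sum of the discarded sigma_i^2.
err = np.linalg.norm(A - A_k, 'fro') ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))
```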

Connection to Eigendecomposition

SVD generalizes eigendecomposition to non-square and non-symmetric matrices.

If \(A\) is square and symmetric positive semi-definite

Then \(A = Q \Lambda Q^T\) (eigendecomposition) and SVD gives \(U = V = Q\), \(\Sigma = \Lambda\). Singular values equal eigenvalues.

In general

Eigenvalues can be negative or complex; singular values are always real and non-negative. Singular values measure how much the transformation stretches along some direction; eigenvalues in general do not (a non-symmetric matrix can stretch vectors substantially even when every eigenvalue has magnitude 1).
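
A small illustration: a shear matrix has both eigenvalues equal to 1, yet its singular values reveal that it stretches some directions by more than a factor of 2.

```python
import numpy as np

# A shear matrix: both eigenvalues are 1, yet it clearly stretches some vectors.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])

print(np.linalg.eigvals(A))                 # [1. 1.]
print(np.linalg.svd(A, compute_uv=False))   # approx [2.414, 0.414]: max stretch is 1 + sqrt(2)
```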

Applications

  • PCA — principal components are the right singular vectors of the mean-centered data matrix
  • Least-squares solutions — solve \(Ax = b\) via \(x = A^+ b\) when \(A\) is not square or not full rank (see the sketch after this list)
  • Image compression — store a rank-\(k\) approximation instead of the full pixel matrix
  • Recommender systems — factorize user-item matrices to discover latent factors
  • Numerical stability — condition number \(\kappa(A) = \sigma_1 / \sigma_r\) quantifies sensitivity to perturbations
  • Natural language processing — Latent Semantic Analysis (LSA) applies SVD to term-document matrices
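
As an example of the least-squares item above, the sketch below fits an intercept and slope to four made-up data points via the pseudoinverse and checks the answer against np.linalg.lstsq:

```python
import numpy as np

# Overdetermined system: four equations, two unknowns, no exact solution.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

x = np.linalg.pinv(A) @ b                      # minimum-norm least-squares solution via the SVD
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference answer from lstsq
assert np.allclose(x, x_ref)
print(x)                                       # fitted [intercept, slope]
```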

Concrete Example

Matrix \(A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}\)

Step 1 — Compute \(A^T A = A^2\) (since \(A\) is symmetric):

$$A^T A = \begin{pmatrix} 10 & 6 \\ 6 & 10 \end{pmatrix}$$

Step 2 — Eigenvalues of \(A^T A\): \(\lambda_1 = 16\), \(\lambda_2 = 4\).

Step 3 — Singular values: \(\sigma_1 = 4\), \(\sigma_2 = 2\).

Step 4 — Eigenvectors (right singular vectors):

$$v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}, \quad v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}$$

Step 5 — Left singular vectors \(u_i = \frac{1}{\sigma_i}Av_i\) (equal to \(v_i\) here since \(A\) is symmetric positive definite).

Result:

$$A = \underbrace{\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}}_{U} \underbrace{\begin{pmatrix}4&0\\0&2\end{pmatrix}}_{\Sigma} \underbrace{\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}}_{V^T}$$

(Here \(V\) happens to be symmetric, so \(V^T = V\).)
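
A numerical check of this example (NumPy may return singular vectors with flipped signs, but the reassembled product is the same):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

U, s, Vt = np.linalg.svd(A)
print(s)                                  # [4. 2.]
assert np.allclose(A, U @ np.diag(s) @ Vt)

# The hand-computed factors from the text reproduce A as well.
Q = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)
assert np.allclose(A, Q @ np.diag([4.0, 2.0]) @ Q.T)
```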