Math Deep-Dive: Singular Value Decomposition
The Statement
Every real matrix \(A \in \mathbb{R}^{m \times n}\) can be factored as
\[A = U \Sigma V^T\]
where:
- \(U \in \mathbb{R}^{m \times m}\) — orthogonal matrix, columns are the left singular vectors
- \(\Sigma \in \mathbb{R}^{m \times n}\) — diagonal matrix, diagonal entries \(\sigma_1 \ge \sigma_2 \ge \cdots \ge 0\) are the singular values
- \(V \in \mathbb{R}^{n \times n}\) — orthogonal matrix, columns are the right singular vectors
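A quick numerical sanity check of the statement, using NumPy (the random 4×3 matrix is just a placeholder):

```python
import numpy as np

# Any real matrix works; a random 4x3 example stands in for A.
A = np.random.default_rng(0).standard_normal((4, 3))

# Full SVD: U is 4x4 orthogonal, Vt is 3x3 orthogonal, s holds the singular values (descending).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the rectangular diagonal Sigma and confirm A = U Sigma V^T.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)
assert np.allclose(A, U @ Sigma @ Vt)
assert np.allclose(U.T @ U, np.eye(4)) and np.allclose(Vt @ Vt.T, np.eye(3))
```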
Geometric Intuition
Every linear transformation \(A\) can be decomposed into three simple geometric steps:
Step 1 — Rotate input space: \(V^T\)
\(V^T\) is a rotation (and possibly reflection) of the input \(\mathbb{R}^n\). It aligns the input along the right singular vectors — the natural axes of \(A\).
Step 2 — Scale along each axis: \(\Sigma\)
\(\Sigma\) stretches or compresses each axis independently. The \(i\)-th axis is scaled by \(\sigma_i\). Axes with \(\sigma_i = 0\) are collapsed entirely — this is where rank deficiency lives.
Step 3 — Rotate output space: \(U\)
\(U\) rotates the result into the output \(\mathbb{R}^m\), aligning the scaled axes with the left singular vectors.
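The three steps can be checked directly on a small example (a sketch, with a random 2×2 matrix and vector standing in for \(A\) and \(x\)):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
x = rng.standard_normal(2)

U, s, Vt = np.linalg.svd(A)

y = Vt @ x   # Step 1: rotate/reflect x into the coordinates of the right singular vectors
y = s * y    # Step 2: stretch axis i by sigma_i
y = U @ y    # Step 3: rotate the scaled result into the output space

assert np.allclose(y, A @ x)
```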
How to Compute SVD
Step 1 — Form \(A^T A\)
Compute the \(n \times n\) matrix \(A^T A\). It is symmetric and positive semi-definite, so its eigendecomposition always exists and all eigenvalues are real and non-negative:
\[A^T A = V \Lambda V^T\]
where \(\Lambda = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\) with \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0\).
Step 2 — Extract singular values and right singular vectors
The columns of \(V\) (eigenvectors of \(A^T A\)) are the right singular vectors. The singular values are the square roots of the eigenvalues:
\[\sigma_i = \sqrt{\lambda_i}\]
Step 3 — Compute left singular vectors
For each non-zero singular value \(\sigma_i\), the corresponding left singular vector is:
\[u_i = \frac{1}{\sigma_i} A v_i\]
The remaining columns of \(U\) (corresponding to zero singular values) can be chosen as any orthonormal basis for the left null space.
Step 4 — Assemble the factorization
Stack the vectors and values:
\[A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T\]
where \(r = \text{rank}(A)\).
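The four steps translate directly into code. The sketch below (the helper name `svd_via_eig` and the tolerance are my own choices) follows the recipe literally; production SVD routines avoid forming \(A^T A\) explicitly because squaring worsens conditioning:

```python
import numpy as np

def svd_via_eig(A, tol=1e-12):
    """Illustrative SVD via the eigendecomposition of A^T A (full-rank case)."""
    m, n = A.shape
    # Step 1: A^T A is symmetric PSD; eigh returns eigenvalues in ascending order.
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]            # reorder so lambda_1 >= lambda_2 >= ...
    lam, V = lam[order], V[:, order]
    # Step 2: sigma_i = sqrt(lambda_i); clip tiny negatives caused by round-off.
    s = np.sqrt(np.clip(lam, 0.0, None))
    # Step 3: u_i = A v_i / sigma_i for each non-zero singular value.
    r = int(np.sum(s > tol))
    U = np.zeros((m, m))
    U[:, :r] = (A @ V[:, :r]) / s[:r]
    # Columns of U beyond the rank would need an orthonormal completion; omitted here.
    # Step 4: return the factors of A = U Sigma V^T.
    return U, s, V.T

A = np.array([[3.0, 1.0], [1.0, 3.0]])
U, s, Vt = svd_via_eig(A)
print(s)                                      # [4. 2.]
print(np.allclose(A, U @ np.diag(s) @ Vt))    # True
```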
Key Properties
Rank
The rank of \(A\) equals the number of non-zero singular values:
\[\text{rank}(A) = \#\{\, i : \sigma_i > 0 \,\}\]
Frobenius norm
\[\|A\|_F = \sqrt{\sum_{i} \sigma_i^2}\]
Spectral norm (largest singular value)
\[\|A\|_2 = \max_{\|x\| = 1} \|Ax\| = \sigma_1\]
This is the maximum factor by which \(A\) can stretch a unit vector.
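These properties are easy to verify numerically from the singular values alone (a sketch; the random 5×3 matrix and the 1e-12 rank tolerance are arbitrary choices):

```python
import numpy as np

A = np.random.default_rng(2).standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)        # singular values only, descending

rank = int(np.sum(s > 1e-12))                 # count of non-zero singular values
assert rank == np.linalg.matrix_rank(A)

assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))  # Frobenius norm
assert np.isclose(np.linalg.norm(A, 2), s[0])                        # spectral norm
```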
Pseudoinverse
The Moore-Penrose pseudoinverse \(A^+\) is constructed directly from the SVD:
\[A^+ = V \Sigma^+ U^T\]
where \(\Sigma^+\) is the \(n \times m\) matrix obtained by transposing \(\Sigma\) and replacing each non-zero \(\sigma_i\) with \(1/\sigma_i\).
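A minimal sketch of building \(A^+\) from the SVD and checking it against `np.linalg.pinv` (the tolerance for treating a singular value as zero is an assumption):

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD suffices for the pseudoinverse

# Sigma^+ : invert the non-zero singular values, leave (numerical) zeros at zero.
s_plus = np.zeros_like(s)
nonzero = s > 1e-12
s_plus[nonzero] = 1.0 / s[nonzero]

A_plus = Vt.T @ np.diag(s_plus) @ U.T              # A^+ = V Sigma^+ U^T
assert np.allclose(A_plus, np.linalg.pinv(A))
```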
Low-Rank Approximation (Eckart–Young Theorem)
The best rank-\(k\) approximation to \(A\) in both the Frobenius and spectral norms is obtained by keeping only the top \(k\) singular values:
\[A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T\]
The approximation error is:
\[\|A - A_k\|_2 = \sigma_{k+1}, \qquad \|A - A_k\|_F = \sqrt{\sum_{i > k} \sigma_i^2}\]
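A short check of both claims (the random 6×4 matrix and \(k = 2\) are arbitrary):

```python
import numpy as np

A = np.random.default_rng(4).standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # keep only the top-k singular triplets

# Eckart-Young: both error norms are determined by the discarded singular values.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))
```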
Connection to Eigendecomposition
SVD generalizes eigendecomposition to non-square and non-symmetric matrices.
If \(A\) is square and symmetric positive semi-definite
Then \(A = Q \Lambda Q^T\) (eigendecomposition) and SVD gives \(U = V = Q\), \(\Sigma = \Lambda\). Singular values equal eigenvalues.
In general
Eigenvalues can be negative or complex; singular values are always real and non-negative. Singular values measure how much the transformation stretches; eigenvalues do not (for non-symmetric matrices).
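A quick check of the symmetric positive semi-definite case (building such a matrix as \(B B^T\) is just one convenient construction):

```python
import numpy as np

B = np.random.default_rng(5).standard_normal((4, 4))
A = B @ B.T                                        # symmetric PSD by construction

eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]     # eigenvalues, sorted descending
s = np.linalg.svd(A, compute_uv=False)             # singular values, already descending
assert np.allclose(eigvals, s)                     # they coincide for symmetric PSD A
```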
Applications
- PCA — principal components are the right singular vectors of the mean-centered data matrix
- Least-squares solutions — solve \(Ax = b\) via \(x = A^+ b\) when \(A\) is not square or not full rank (see the sketch after this list)
- Image compression — store a rank-\(k\) approximation instead of the full pixel matrix
- Recommender systems — factorize user-item matrices to discover latent factors
- Numerical stability — condition number \(\kappa(A) = \sigma_1 / \sigma_r\) quantifies sensitivity to perturbations
- Natural language processing — Latent Semantic Analysis (LSA) applies SVD to term-document matrices
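To make the least-squares item concrete, here is a minimal sketch of solving an overdetermined system via \(x = A^+ b\) (random data; `np.linalg.lstsq` is used only as a reference):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((10, 3))                   # more equations than unknowns
b = rng.standard_normal(10)

x = np.linalg.pinv(A) @ b                          # x = A^+ b, with A^+ built from the SVD
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)      # reference least-squares solution
assert np.allclose(x, x_ref)
```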
Concrete Example
Matrix \(A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}\)
Step 1 — Compute \(A^T A = A^2\) (since \(A\) is symmetric):
\[A^T A = \begin{pmatrix} 10 & 6 \\ 6 & 10 \end{pmatrix}\]
Step 2 — Eigenvalues of \(A^T A\): \(\lambda_1 = 16\), \(\lambda_2 = 4\).
Step 3 — Singular values: \(\sigma_1 = 4\), \(\sigma_2 = 2\).
Step 4 — Eigenvectors of \(A^T A\) (right singular vectors):
\[v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}\]
Step 5 — Left singular vectors \(u_i = \frac{1}{\sigma_i}Av_i\) (equal to \(v_i\) here since \(A\) is symmetric positive definite).
Result:
\[A = U \Sigma V^T = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\]
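The same example checks out numerically (a quick verification with NumPy; the signs of the singular vectors may differ, which is fine since they are only determined up to sign):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)

print(s)                                           # [4. 2.]
print(np.allclose(A, U @ np.diag(s) @ Vt))         # True
```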