Diagonal vs. off-diagonal entries in PSD matrices

$\newcommand{\one}{\mathbf 1}$$\newcommand{\0}{\mathbf 0}$$\newcommand{\tr}{\operatorname{tr}}$$\newcommand{\e}{\varepsilon}$In this post I’m going to look at some relationships between the diagonal of a positive semidefinite (PSD) matrix and the off-diagonal elements. Throughout this I’ll let $A$ be a real-valued symmetric $n \times n$ matrix except when otherwise noted. I’ll use $\lambda_1\geq\dots\geq\lambda_n$ to denote its eigenvalues.

Result 1: if $A$ is PSD then $\tr(A) \geq \frac 1n \one^TA\one$.

Pf: by the spectral theorem I can factor $A$ as $A=Q\Lambda Q^T$ with $Q$ orthogonal and $\Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_n)$. This means
$$
\tr(A) = \tr(Q\Lambda Q^T) = \tr(\Lambda Q^TQ) = \sum_i \lambda_i
$$
and
$$
\frac 1n \one^TA\one = \frac 1n \one^TQ\Lambda Q^T\one \\
= (Q^T\one/\sqrt n)^T\Lambda (Q^T\one/\sqrt n) \\
= \sum_i \lambda_i \left\langle q_i, \frac 1{\sqrt n}\one\right\rangle^2
$$
where $q_i$ is the $i$th column of $Q$. Both $q_i$ and $\frac 1{\sqrt n} \one$ are unit vectors, so by the dot product formula
$$
\langle q_i, n^{-1/2}\one\rangle^2 = \left(\|q_i\|\cdot\|n^{-1/2}\one\| \cdot \cos \theta\right)^2 \\
= (\cos \theta)^2 \leq 1
$$
therefore since all $\lambda_i \geq 0$ I have
$$
\tr(A) = \sum_i \lambda_i \\
\geq \sum_i \lambda_i \langle q_i, n^{-1/2}\one\rangle^2 = \frac 1n \one^T A \one.
$$

$\square$

I can interpret this result by multiplying both sides by $n^{-1}$ to get $\frac 1n \tr(A) \geq \frac 1{n^2}\one^T A \one$, which says that for a PSD matrix the average of the diagonal entries is at least as big as the average of all the entries.
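
Result 1 is also easy to spot-check numerically. Here's a quick Python sketch (assuming numpy is available) that draws random PSD matrices of the form $BB^T$ and confirms the inequality:

```python
# Spot-check Result 1: tr(A) >= (1/n) 1^T A 1 for random PSD matrices.
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    n = int(rng.integers(2, 10))
    B = rng.standard_normal((n, n))
    A = B @ B.T                       # B B^T is always PSD
    ones = np.ones(n)
    assert np.trace(A) >= ones @ A @ ones / n - 1e-9  # tolerance for float error
print("Result 1 held for all 1000 random PSD matrices")
```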

Result 2: $\tr (A) \geq \frac 1n \one^T A \one$ does not imply $A$ is PSD.

Pf: I’ll construct an example where the inequality in Result 1 holds but $A$ is not positive semidefinite.

I’ll take $n=2$ so
$$
A = \begin{bmatrix}a & c \\ c & b \end{bmatrix}
$$
with $a,b,c\in\mathbb R$. By assumption I have $\tr A = a+b \geq \frac 12 \one^T A \one = \frac 12 (a+b+2c)$, which implies
$$
a+b \geq 2c.
$$

Now I need to get the eigenvalues of $A$ since I’ll need $\lambda_1 > 0 > \lambda_2$.
$$
\begin{aligned}\det(A - \lambda I) &= \left\vert\begin{array}{cc}a - \lambda & c \\ c & b-\lambda \end{array}\right\vert \\&
= (a-\lambda)(b-\lambda) - c^2 \\&
= \lambda^2 - (a+b)\lambda + (ab-c^2) \\&
= \lambda^2 - (\tr A)\lambda + \det A\end{aligned}
$$
so the eigenvalues are given by
$$
\lambda = \frac{\tr A \pm \sqrt{(\tr A)^2 - 4 \det A}}{2}.
$$
I already know that a symmetric real-valued matrix has real eigenvalues, but I can also see it algebraically here since the discriminant satisfies
$$
(\tr A)^2 - 4 \det A = (a+b)^2 - 4(ab-c^2) \\
= a^2-2ab+b^2 + 4c^2 \\
= (a-b)^2 + 4c^2 \geq 0.
$$
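
As a quick sanity check on the algebra (a sketch with arbitrary random entries), the quadratic-formula roots match numpy's eigenvalues:

```python
# Compare the closed-form 2x2 eigenvalues to numpy's eigvalsh.
import numpy as np

a, b, c = np.random.default_rng(1).standard_normal(3)
A = np.array([[a, c], [c, b]])
tr, det = a + b, a * b - c**2
disc = np.sqrt(tr**2 - 4 * det)       # = sqrt((a-b)^2 + 4c^2), always real
roots = np.sort([(tr - disc) / 2, (tr + disc) / 2])
assert np.allclose(roots, np.linalg.eigvalsh(A))
```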

I’ll take $a,b > 0$, which means $\lambda_1 > 0$ since then $\tr A > 0$ and the square root is nonnegative. Then I’ll find appropriate $a,b,c$ to make $\lambda_2 < 0$, which means I need
$$
\tr A - \sqrt{(\tr A)^2 - 4\det A} < 0 \\
\implies (\tr A)^2 < (\tr A)^2 - 4 \det A \\
\implies \det A < 0
$$
which in retrospect is an obvious condition, as $\det A = \lambda_1\lambda_2 < 0$ is equivalent to saying that $\lambda_1$ and $\lambda_2$ have different signs (and squaring in the second step above is valid since $\tr A > 0$). Thus I now need to satisfy the following inequalities:
$$
\begin{cases} a,b > 0 \\
a+b \geq 2c \\
ab < c^2 \end{cases}
$$
I’ll take $a=1$ and see if I can find a valid triple. I now have
$$
\begin{cases} b > 0 \\
b \geq 2c - 1 \\
b < c^2 \end{cases}
$$
so this means I need to find a positive $b$ such that
$$
c^2 > b \geq 2c-1.
$$
$c^2 - (2c - 1) = (c-1)^2 \geq 0$, so for any $c \neq 1$ I will have room to choose $b$. Taking $c=3$ I need $b$ between $5$ and $9$, so I’ll just take $b=6$. This means
$$
A = \begin{bmatrix}1 & 3 \\ 3 & 6\end{bmatrix}
$$
is my matrix.

Just to check that I’m correct here, I have $\tr A = 7$ and $\frac 12 \one^T A \one = \frac{13}{2} = 6.5$ so the inequality holds, but $\det A = 6-9 = -3$ so the eigenvalues have different signs and $A$ is therefore indefinite.

$\square$
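
Here's the same check in numpy, just to be safe:

```python
# Verify the counterexample: the trace inequality holds but A is indefinite.
import numpy as np

A = np.array([[1.0, 3.0], [3.0, 6.0]])
ones = np.ones(2)
print(np.trace(A), ones @ A @ ones / 2)  # 7.0 and 6.5: inequality holds
print(np.linalg.eigvalsh(A))             # approx [-0.405, 7.405]: indefinite
```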

This shows that if I want PSD matrices it’s not enough to have a diagonal that’s on average larger than the average of all the entries. But I can consider a stronger condition that will guarantee me PSD matrices (although it, in turn, turns out to be too strong).

Before that, I’ll prove the following theorem relating row sums to eigenvalues and diagonal entries. I’m pulling all of this from the Wikipedia article on the theorem:

Result 3 (Gershgorin circle theorem): Suppose $A \in \mathbb C^{n\times n}$ has entries denoted by $a_{ij}$, and for $z \in \mathbb C$ and $r \in [0,\infty)$ let $D(z, r)\subset \mathbb C$ be the closed disk of radius $r$ centered at $z$. For $i=1,\dots,n$ let $R_i = \sum_{j\neq i}|a_{ij}|$. Then every eigenvalue of $A$ is in some disk $D(a_{ii}, R_i)$.

Pf: let $(\lambda, x)$ be an eigenpair of $A$. Let $i = \operatorname{argmax}_j |x_j|$. By definition $x \neq \0$, which means $x_i \neq 0$, so I can divide by it. This means $\tilde x := x / x_i$ is well-defined, and by construction I’ll have $\tilde x_i = 1$ and $|\tilde x_j| = \frac{|x_j|}{|x_i|}\leq 1$ for $j\neq i$.

Now
$$
A\tilde x = (Ax) / x_i = (\lambda x) / x_i = \lambda \tilde x
$$
so
$$
(A\tilde x)_i = \sum_j a_{i j}\tilde x_j = \lambda \tilde x_i = \lambda.
$$
This means
$$
a_{ii} + \sum_{j\neq i} a_{ij}\tilde x_j = \lambda
$$
so
$$
|\lambda – a_{ii}| = \left\vert\sum_{j\neq i} a_{ij}\tilde x_j\right\vert \\
\leq \sum_{j\neq i}|a_{ij}||\tilde x_j| \\
\leq \sum_{j\neq i}|a_{ij}| = R_i
$$
which means that $\lambda$, an arbitrary eigenvalue, is within $R_i$ of $a_{ii}$ for some $i$.

$\square$

The Wikipedia article on this theorem gives an interpretation that I like: if a matrix has off-diagonal entries that are small relative to the diagonal, then the matrix is close to being diagonal, with its diagonal values close to its eigenvalues. Thus if the diagonal entries are large and positive, the matrix will be PD.
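
Here's a small numerical sketch of the theorem (numpy assumed): every eigenvalue of a random, not-necessarily-symmetric matrix should land in at least one Gershgorin disk.

```python
# Check that each eigenvalue lies in some disk D(a_ii, R_i).
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))                  # not symmetric in general
R = np.abs(A).sum(axis=1) - np.abs(np.diag(A))   # R_i = sum_{j != i} |a_ij|
for lam in np.linalg.eigvals(A):                 # possibly complex eigenvalues
    assert any(abs(lam - A[i, i]) <= R[i] + 1e-9 for i in range(n))
```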

I’ll now define a matrix to be diagonally dominant (DD) if for $i=1,\dots,n$
$$
|a_{ii}| \geq \sum_{j\neq i} |a_{ij}|.
$$
If instead all the inequalities are strict, I will call it strictly diagonally dominant (SDD).

The Gershgorin circle theorem gives me a couple of immediate corollaries. First, in an SDD matrix no eigenvalue can be zero, since $|a_{ii}| > R_i$ means $0$ lies outside every disk $D(a_{ii}, R_i)$, so SDD matrices are always invertible. Another corollary is that if $A$ is DD and all of the $a_{ii} \geq 0$ then $A$ is PSD: the eigenvalues are real (recall $A$ is symmetric) and every point of $D(a_{ii}, R_i)$ has real part at least $a_{ii} - R_i \geq 0$. If the dominance inequalities are strict then $A$ will be PD.
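
As an illustration of the second corollary, here's a sketch that manufactures a symmetric DD matrix with a nonnegative diagonal (one convenient recipe: overwrite the diagonal with the absolute row sums) and confirms it is PSD:

```python
# A symmetric DD matrix with nonnegative diagonal has no negative eigenvalues.
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                            # symmetrize
np.fill_diagonal(A, np.abs(A).sum(axis=1))   # new a_ii >= R_i and a_ii >= 0
R = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
assert np.all(np.diag(A) >= R)               # diagonally dominant
assert np.linalg.eigvalsh(A).min() >= -1e-9  # PSD (up to float error)
```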

Result 4: There are PD matrices that are not diagonally dominant.

Pf: for $n \geq 3$ consider
$$
A_n := \one_n\one_n^T + \e I_n
$$
where $\e > 0$ is small (the condition on $n$ is because $A_2$ is SDD).

Using the matrix determinant lemma I have
$$
\det (\one\one^T - \lambda I) = (-\lambda)^n (1 - \one^T\one / \lambda) \\
= (-1)^n \lambda^{n-1}(\lambda - n).
$$
This means $\one_n\one_n^T$ has a single eigenvalue of $n$ and $n-1$ eigenvalues of $0$, so $A_n$ has one eigenvalue of $n+\e$ and $n-1$ eigenvalues of $\e$. $n+\e$ and $\e$ are both strictly positive so this establishes that $A_n$ is PD.

But $R_i^{(n)} = n-1$ while $|a_{ii}^{(n)}| = 1+\e$, and for $n \geq 3$ (taking $\e < 1$, say) I have $1+\e < n-1$, which means that $A_n$ is PD but not DD.

$\square$
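
And a numeric confirmation of Result 4 (a sketch with $n=5$ and $\e = 0.01$):

```python
# A_n = 11^T + eps*I: positive definite but not diagonally dominant for n >= 3.
import numpy as np

n, eps = 5, 0.01
A = np.ones((n, n)) + eps * np.eye(n)
print(np.linalg.eigvalsh(A))                     # eps (x4) and n + eps, all > 0
R = np.abs(A).sum(axis=1) - np.abs(np.diag(A))   # R_i = n - 1 = 4
print(np.all(np.diag(A) >= R))                   # False: 1 + eps < 4
```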

Results 1 and 2 establish a necessary but not sufficient condition for a matrix to be PSD, namely that its diagonal is larger on average than the matrix as a whole. Results 3 and 4 give the opposite direction, a sufficient but not necessary condition for being PSD: that the diagonal entries are large relative to the off-diagonal entries. Together these give some intuition for what positive semidefinite matrices “look like”.

2 thoughts on “Diagonal vs. off-diagonal entries in PSD matrices”

  1. This is a great post. So a positive definite matrix has diagonal elements that dominate the off-diagonal entries. In a large-dimensional positive definite matrix (n >> 3), if there is a condition that allows one to safely ignore the off-diagonal elements in the analysis, the computational burden will be significantly reduced. I wonder whether the author knows of such conditions and could show them in a future post.

    1. Thanks for the comment! If A is a PD matrix then the best diagonal approximation in Frobenius norm, say D, is just diag(A) (since ||A-D||_F^2 is a sum of squares, and the diagonal terms are minimized, at 0, by D_{ii} = A_{ii}). So maybe the application can be related to ||A-diag(A)||_F^2 being sufficiently small? If I think of anything interesting I’ll post about it
