Diffusion on undirected and unweighted graphs

$\newcommand{\a}{\alpha}$$\newcommand{\L}{\mathcal L}$$\newcommand{\one}{\mathbf 1}$$\newcommand{\x}{\mathbf x}$$\newcommand{\0}{\mathbf 0}$Let $G = (V,E)$ be an undirected and unweighted graph on $n$ vertices. I’ll have $A\in\{0,1\}^{n\times n}$ be the adjacency matrix, so $A$ is symmetric, $A_{ii} = 0$ for $i=1,\dots,n$, and for $i, j \in V$, $A_{ij} = 1 \iff (i,j)\in E$. I’ll use $i\sim j$ as a shorthand for $(i,j)\in E$. For each $i$ define $d_i = \sum_{j\sim i} 1 = \sum_{j=1}^n A_{ij}$ to be the degree of node $i$ and collect these into $D = \text{diag}(d_1,\dots,d_n)$. Finally, take $L = D-A$ to be the unweighted Laplacian, $\L = D^{-1/2}LD^{-1/2}$ to be the symmetric normalized Laplacian, and $P = D^{-1}A$ to be the transition matrix of the simple random walk on $G$. I’ll be assuming that there are no isolated vertices so I don’t have to worry about the invertibility of $D$. I’ll also be appealing to results about $\L$ for a lot of what follows and the proofs for many of these follow chapter 1 of Fan Chung’s Spectral Graph Theory.
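
Everything here is concrete linear algebra, so it’s easy to sanity-check the results numerically as I go. Below is a minimal numpy sketch of this setup; numpy and the particular toy graph (a triangle on nodes $0,1,2$ with a pendant node $3$) are just my choices for illustration, and the later snippets hard-code the same graph so each one runs standalone.

```python
import numpy as np

# Small made-up example graph: a triangle on nodes 0, 1, 2 plus a pendant
# node 3 hanging off node 2. No isolated vertices, as assumed in the text.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0        # symmetric with a zero diagonal

d = A.sum(axis=1)                  # degrees d_i
D = np.diag(d)
L = D - A                          # unweighted Laplacian L = D - A
L_sym = np.diag(d**-0.5) @ L @ np.diag(d**-0.5)  # normalized Laplacian
P = A / d[:, None]                 # P = D^{-1} A; each row sums to 1
```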

Let $\a\in[0,1]$. I want to study the sequence $x^{(0)}, x^{(1)}, \dots \in \mathbb R^V$ formed by
$$
x^{(k)}_i = \a x_i^{(k-1)} + (1-\a)\frac 1{d_i}\sum_{j\sim i}x^{(k-1)}_j
$$
for some initial vector $x^{(0)}$. Thus if $x$ is my current vector, I’m iterating this procedure where I replace $x_i$ with a weighted average of $x_i$ and the average value of $x$ on node $i$’s neighbors.

This all can be written much more succinctly by noting that
$$
\begin{aligned}\frac 1{d_i}\sum_{j\sim i}x_j &= \frac 1{d_i}\sum_{j=1}^n A_{ij}x_j = d_i^{-1}A_i^Tx\\&
\implies x^{(k)} = (\a I + (1 - \a) D^{-1}A)x^{(k-1)} = (\a I + (1 - \a) P)^kx^{(0)}.\end{aligned}
$$
This makes it clear that what really matters is the matrix $R_\a := \a I + (1 - \a) P$, and in particular I need to investigate $R_\a^k$ for large $k$. Note that the rows of $R_\a$ sum to $1$ so it is a (right) stochastic matrix. That makes this effectively a DeGroot learning process (I say “effectively” because I’m not imposing any constraints on $x^{(0)}$ other than not being $\mathbf 0$). My ultimate goal here is to understand what properties of $G$ lead to $x^{(k)}$ converging on something, and what that something can be. The case for $\a = 1$ is not interesting as $R_1 = I$ so when necessary I will assume $\a < 1$ without making a big announcement about it.
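
As a quick sanity check, here’s a numpy sketch on the same made-up toy graph confirming that the rows of $R_\a$ sum to $1$ and that one multiplication by $R_\a$ matches the nodewise update:

```python
import numpy as np

# Toy graph from the setup sketch: triangle on 0, 1, 2 plus pendant node 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
P = A / d[:, None]

alpha = 0.5
R = alpha * np.eye(4) + (1 - alpha) * P
assert np.allclose(R.sum(axis=1), 1.0)     # R_alpha is right stochastic

x = np.array([1.0, -2.0, 0.5, 3.0])        # an arbitrary x^{(0)}
# Nodewise update: x_i <- alpha x_i + (1 - alpha) * (neighbor average)
x_next = np.array([alpha * x[i] + (1 - alpha) * (A[i] @ x) / d[i]
                   for i in range(4)])
assert np.allclose(x_next, R @ x)          # matches the matrix form
```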

I will use the eigendecomposition of $R_\a$ to study the behavior of $R_\a^k$ but $P$ and therefore $R_\a$ are not symmetric so it’s not immediately obvious to me that $R_\a$ is diagonalizable.

Result 1: $P$ and $R_\a$ are simultaneously diagonalizable.

Pf: $\L$ is real-valued and symmetric so by the standard spectral theorem it has an eigendecomposition as $\L = Q \Lambda Q^T$. Noting that
$$
\L = D^{-1/2}LD^{-1/2} = I - D^{-1/2}AD^{-1/2} = I - D^{1/2}PD^{-1/2}
$$
I can solve for $P$ to find
$$
P = I - D^{-1/2} Q \Lambda Q^TD^{1/2} = D^{-1/2} Q(I - \Lambda) Q^TD^{1/2}.
$$
Thus if $(\lambda, q)$ is an eigenpair of $\L$ then $(1-\lambda, D^{-1/2}q)$ is an eigenpair of $P$. Take $S = D^{-1/2} Q$, so that $S^{-1} = Q^TD^{1/2}$ and $P = S(I-\Lambda)S^{-1}$, and note that while $P$ is diagonalized here, it is not generally diagonalized by an orthogonal matrix.

Now for $R_\a$, I have
$$
R_\a = \a I + (1 - \a) P = S(\a I + (1 - \a) (I -\Lambda))S^{-1} \\
= S(I - (1-\a)\Lambda )S^{-1}
$$
so if $(1-\lambda, s)$ is an eigenpair of $P$ then $(1 - (1-\a)\lambda, s)$ is an eigenpair of $R_\a$ and both $R_\a$ and $P$ are diagonalized by $S$.

$\square$
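
Here’s a numeric spot-check of Result 1 on the toy graph (note that `np.linalg.eigh` returns the eigenvalues in ascending order):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
P = A / d[:, None]
L_sym = np.eye(4) - A / np.sqrt(np.outer(d, d))   # I - D^{-1/2} A D^{-1/2}

lam, Q = np.linalg.eigh(L_sym)    # L_sym is symmetric, so Q is orthogonal
S = np.diag(d**-0.5) @ Q          # S = D^{-1/2} Q
S_inv = Q.T @ np.diag(d**0.5)     # S^{-1} = Q^T D^{1/2}

alpha = 0.5
R = alpha * np.eye(4) + (1 - alpha) * P

# P and R_alpha are both diagonalized by S, with the predicted eigenvalues.
assert np.allclose(P, S @ np.diag(1 - lam) @ S_inv)
assert np.allclose(R, S @ np.diag(1 - (1 - alpha) * lam) @ S_inv)
```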

This means $R_\a^k = S(I - (1-\a)\Lambda)^kS^{-1}$ so the convergence of $R_\a^k$ is governed by its eigenvalues $\rho_1,\dots,\rho_n$, with $\rho_i = 1 - (1 - \a)\lambda_i$. I’ll order the eigenvalues of $\L$ as $\lambda_1 \leq \dots \leq \lambda_n$, so this means $\rho_1 \geq \dots \geq \rho_n$.

I can now prove some bounds on the $\lambda_i$ that will let me bound $\rho_i$ in a useful way, but before I do that I need the following lemmas.

Lemma 1: $x^T L x = \sum_{i\sim j}(x_i - x_j)^2$.

Pf:
$$
\begin{aligned}2 \cdot x^TL x &= 2\sum_i x_i^2 d_i - 2\sum_{ij}x_ix_jA_{ij} \\&
= \sum_i x_i^2 d_i - 2 \sum_{ij} x_ix_jA_{ij} + \sum_j x_j^2 d_j \\&
= \sum_{ij} A_{ij}(x_i^2 - 2 x_ix_j + x_j^2) \\&
= \sum_{ij} A_{ij}(x_i-x_j)^2 \\&
= 2 \sum_{i\sim j}(x_i - x_j)^2\end{aligned}
$$
where that $2$ appears because the sum over all $i$ and $j$ double counts each edge.

$\square$
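
Lemma 1 is easy to spot-check numerically with a random vector on the toy graph:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

x = rng.standard_normal(4)
edge_sum = sum((x[i] - x[j]) ** 2              # one term per edge i ~ j
               for i in range(4) for j in range(i + 1, 4) if A[i, j])
assert np.allclose(x @ L @ x, edge_sum)
```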

Lemma 2: $(a - b)^2 \leq 2(a^2 + b^2)$.

Pf:
$$
\begin{aligned}2(a^2 + b^2) - (a-b)^2 &= 2a^2 + 2b^2 - a^2 + 2ab - b^2 \\&
= a^2 + 2ab + b^2 = (a+b)^2 \geq 0.\end{aligned}
$$
$\square$

Now I can prove a bound on all the eigenvalues.

Result 2: $\lambda_i \in [0,2]$ for all $i$.

Pf: From the Courant-Fischer theorem I know
$$
\lambda_1 = \inf_{x \in \mathbb R^V \backslash \{\0\}} \frac{x^T \L x}{x^T x}.
$$
$$
\frac{x^T \L x}{x^T x} = \frac{x^T D^{-1/2} L D^{-1/2}x}{x^T D^{-1/2} D D^{-1/2} x} = \frac{y^T L y}{y^T D y} \\
= \frac{\sum_{i\sim j} (y_i - y_j)^2}{\sum_i y_i^2 d_i}
$$
where I’ve used the change of variables $y = D^{-1/2} x$. This means that $\frac{x^T \L x}{x^T x} \geq 0$ so $\L$ is positive semi-definite (PSD), but it is only PSD rather than PD: taking $y = \one$, i.e. $x = D^{1/2}\one$, gives $\frac{x^T \L x}{x^T x} = 0$. Thus $\lambda_1 = 0$ and $\lambda_i \geq 0$ for all $i$.

For the upper bound, again by Courant-Fischer I know
$$
\lambda_n = \sup_{x \in \mathbb R^V \backslash \{\0\}} \frac{x^T \L x}{x^T x} = \sup_{y \in \mathbb R^V \backslash \{\0\}} \frac{y^T L y}{y^T D y} \\
= \sup_y \frac{\sum_{i\sim j} (y_i - y_j)^2}{\sum_i y_i^2 d_i}.
$$
By Lemma 2 I know $(y_i - y_j)^2 \leq 2(y_i^2 + y_j^2)$ so
$$
\lambda_n \leq \sup_y 2 \frac{\sum_{i\sim j} (y_i^2 + y_j^2)}{\sum_i y_i^2 d_i}.
$$
$$
\sum_{i\sim j} (y_i^2 + y_j^2) = \sum_{ij} A_{ij} y_i^2 = \sum_i y_i^2 d_i = y^T D y
$$
where the first equality holds because the ordered sum $\sum_{ij}A_{ij}y_i^2$ picks up each edge twice, once from each endpoint, which is exactly $y_i^2 + y_j^2$ per edge. So all together
$$
\lambda_n \leq \sup_y 2 \frac{y^T D y}{y^T D y} = 2.
$$
Thus for each $i$ I know $\lambda_i\in[0,2]$.

$\square$
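
And a spot-check of Result 2 on a random graph, resampled until it has no isolated vertices per my standing assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
while True:
    A = (rng.random((n, n)) < 0.2).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                      # symmetric with a zero diagonal
    if A.sum(axis=1).min() > 0:      # resample if any vertex is isolated
        break

d = A.sum(axis=1)
L_sym = np.eye(n) - A / np.sqrt(np.outer(d, d))
lam = np.linalg.eigvalsh(L_sym)      # ascending order

assert abs(lam[0]) < 1e-10           # lambda_1 = 0
assert lam[-1] <= 2 + 1e-10          # lambda_n <= 2
```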

Corollary 1: $\rho_i \in [-1,1]$ for all $i$.

Pf: I already worked out that $\rho_i = 1 - (1 - \a)\lambda_i$. This means
$$
0 \leq \lambda_i \leq 2 \\
\implies 0 \geq -(1 - \a) \lambda_i \geq -2(1 - \a) \\
\implies 2\a - 1 \leq \rho_i \leq 1.
$$
Thus for each $i$ I have $\rho_i \in [2\a - 1, 1]\subseteq [-1, 1]$.

$\square$

This is good news for me: since $|\rho_i| \leq 1$ for all $i$ I don’t have to worry about $\|R_\a^kx\|\to\infty$ as $k\to\infty$. But convergence still isn’t guaranteed. Suppose $\rho_n = -1$. Then in $R_\a^k$ I’ll have $\rho_n^k = (-1)^k$, which doesn’t converge to anything, so in turn $R_\a^kx$ may not converge.

Here’s an explicit example of that happening for the star graph on $8$ nodes, $S_8$. I took $\a = 0$ and an initial vector $x^{(0)}$ with a value of $1$ on the center of the star and $-1$ on every other node.

What happens is that each application of $R_0$ just flips the sign of the current vector: every node is connected only to nodes holding the negative of its own value, so the neighbor averaging in $P$ never actually mixes anything.

I can see this in $R_0$:
$$
\begin{aligned}R_0 &= D^{-1}A = \left(\begin{array}{c|c}(n-1)^{-1} & \mathbf 0 \\ \hline \mathbf 0 & I_{n-1}\end{array}\right)\left(\begin{array}{c|c}0 & \one_{n-1}^T \\ \hline \one_{n-1} & \mathbf 0 \end{array}\right) \\&
= \left(\begin{array}{c|c}0 & (n-1)^{-1}\one^T \\ \hline \one & \mathbf 0 \end{array}\right).\end{aligned}
$$
Then
$$
\begin{aligned}R_0^3 &= \left(\begin{array}{c|c}0 & (n-1)^{-1}\one^T \\ \hline \one & \mathbf 0 \end{array}\right)^3 \\&
= \left(\begin{array}{c|c}1 & \mathbf 0\\ \hline \mathbf 0 & (n-1)^{-1}\one\one^T\end{array}\right)\left(\begin{array}{c|c}0 & (n-1)^{-1}\one^T \\ \hline \one & \mathbf 0 \end{array}\right) \\&
= \left(\begin{array}{c|c} 0 & (n-1)^{-1}\one^T\\ \hline \one & \mathbf 0 \end{array}\right) = R_0\end{aligned}
$$
Thus $R_0^3 = R_0$ is what causes these oscillations. $R_0$ is not invertible so this doesn’t mean $R_0^2 = I$, but it does mean that $R_0$ is its own pseudoinverse (the Moore-Penrose conditions only additionally need $R_0^2$ to be symmetric, which it is here).
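
Here’s the $S_8$ example in numpy, confirming both the sign-flipping and $R_0^3 = R_0$:

```python
import numpy as np

# Star graph S_8: node 0 is the center, nodes 1..7 are the leaves.
n = 8
A = np.zeros((n, n))
A[0, 1:] = A[1:, 0] = 1.0
d = A.sum(axis=1)
R0 = A / d[:, None]                        # alpha = 0, so R_0 = D^{-1} A

x = np.array([1.0] + [-1.0] * (n - 1))     # +1 at the center, -1 on leaves
assert np.allclose(R0 @ x, -x)             # one step flips every sign
assert np.allclose(np.linalg.matrix_power(R0, 3), R0)   # R_0^3 = R_0
```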

So when is this non-convergence possible? It turns out that both $\a$ and the graph structure matter.

For $\a$, I just need to note that $\lambda_n > 0$ (the $\lambda_i$ sum to $\text{tr}\,\L = n$, so they can’t all be zero), which means that if $\a > 0$ then I’ll have
$$
\rho_n = 1 - (1-\a)\lambda_n > 1 - \lambda_n \geq -1
$$
so it’s only possible to achieve $\rho_n = -1$ if $\a = 0$. Intuitively, this tells me that when there’s some damping to the averaging (as provided by $\a$) then I can’t get this oscillation that never settles.

But even if $\a = 0$, non-convergence isn’t automatic: with $\a = 0$ I have $\rho_n = -1 \iff \lambda_n = 2$, and not every graph has $\lambda_n = 2$, as the next result shows by example.

Result 3: The complete graph $K_n$ on $n$ vertices has $\lambda_n = \frac{n}{n-1}$ which is less than $2$ for $n > 2$.

Pf: I’ll prove something stronger, that the eigenvalues of $\L$ for $K_n$ are the multiset $\left\{0, \frac{n}{n-1}, \dots, \frac{n}{n-1}\right\}$.

Let $A$ be the adjacency matrix of a complete graph. Then $A = \one\one^T – I$ and $D = (n-1)I$ so
$$
\L = I - D^{-1/2}AD^{-1/2} = I - (n-1)^{-1}(\one\one^T - I) \\
= \frac{n}{n-1}I - \frac{1}{n-1}\one\one^T.
$$
By the matrix-determinant lemma I know that the characteristic polynomial is
$$
\begin{aligned}p_\L(\lambda) &= \det \left(\left(\frac{n}{n-1} - \lambda\right)I - \frac{1}{n-1}\one\one^T\right) \\&
= \left(1 -\frac{n}{n-1}\left(\frac{n}{n-1} - \lambda\right)^{-1}\right)\left(\frac{n}{n-1} - \lambda\right)^n.\end{aligned}
$$
Technically I need to assume $\lambda \neq \frac{n}{n-1}$ since otherwise $\left(\frac{n}{n-1} - \lambda\right)I$ is singular and the matrix-determinant lemma doesn’t apply. The fix is to compute $p_\L$ on $\mathbb R\backslash\left\{\frac{n}{n-1}\right\}$ and then, since $p_\L$ is guaranteed to be a polynomial, use continuity to fill in the value at $\lambda = \frac{n}{n-1}$. But since I know this works out I’m not going to bother with those extra steps.

Simplifying the left-hand factor in $p_\L$ I have
$$
\begin{aligned}1 -\frac{n}{n-1}\left(\frac{n}{n-1} - \lambda\right)^{-1} &= 1 -\frac{n}{n-1}\frac{1}{\frac{n}{n-1} - \lambda} \\&
= 1 - \frac{n}{n - (n-1)\lambda} = \frac{n - (n-1)\lambda - n}{n - (n-1)\lambda} \\&
= \frac{-\lambda}{\frac{n}{n-1} - \lambda}\end{aligned}
$$
so all together I have that the eigenvalues of $\L$ are the solutions of
$$
\lambda\left(\frac{n}{n-1} - \lambda\right)^{n-1} = 0
$$
so the eigenvalues are $0$ with multiplicity $1$ and $\frac{n}{n-1}$ with multiplicity $n-1$. So long as $n > 2$ I have $\frac{n}{n-1} < 2$, so this graph does not have $\lambda_n = 2$.

$\square$
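
Checking Result 3 numerically, with $n = 6$ as an arbitrary choice:

```python
import numpy as np

n = 6
A = np.ones((n, n)) - np.eye(n)            # adjacency matrix of K_n
d = A.sum(axis=1)                          # every degree is n - 1
L_sym = np.eye(n) - A / np.sqrt(np.outer(d, d))
lam = np.linalg.eigvalsh(L_sym)

# Spectrum should be {0, n/(n-1), ..., n/(n-1)}.
assert np.allclose(lam, [0.0] + [n / (n - 1)] * (n - 1))
```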

So it’s clear that some graphs allow for $\lambda_n = 2$ while others don’t. It turns out that it is possible to characterize exactly when $\lambda_n = 2$. I’ll need a lemma first:

Lemma 3: if $G$ has $K$ connected components then the spectrum of $\L$ is exactly the union (with multiplicity) of the spectra of the $\L_i$, the symmetric normalized Laplacians of the components.

Pf: Suppose $G$ has $K$ components. Then $A$ can be permuted so that it’s block diagonal, and in turn $L$ and $\L$ will be block diagonal (and this permutation does not change the eigenvalues) so I can write $\L = \text{diag}\left(\L_1,\dots,\L_K\right)$. For any block diagonal matrix $M = \text{diag}(M_1,\dots,M_k)$ I’ll have $\det M = \prod_{i=1}^k\det M_i$ so
$$
\det(\L - \lambda I) = \prod_{i=1}^K \det(\L_i - \lambda I) \\
\implies p_\L(\lambda) = \prod_{i=1}^K p_{\L_i}(\lambda)
$$
where for a square matrix $M$, $p_M(\lambda)$ is the characteristic polynomial of $M$. This means that the roots of $p_\L$ are exactly the roots of the individual $p_{\L_i}$ taken together, and the multiplicity of any one root $\lambda$ is the sum of the multiplicities of $\lambda$ as a root of each $p_{\L_i}$.

$\square$

Result 4: $\lambda_n = 2$ if and only if $G$ has a non-trivial bipartite connected component.

Pf: WLOG I can take $G$ to have a single connected component because, by Lemma 3, $2$ appears as an eigenvalue of $\L$ if and only if it appears as an eigenvalue of the symmetric Laplacian of a component of $G$. Furthermore I have been assuming throughout that I have no isolated vertices so I can’t have a trivial connected component as that’s exactly an isolated node. Thus it’s sufficient for me to show that for a graph $G$ with one component I have $\lambda_n = 2$ if and only if $G$ is bipartite.

Suppose $G$ is bipartite and let $V = B_1\cup B_2$ be the partition of $V$ into two non-empty pieces so that the only edges in $G$ are between $B_1$ and $B_2$. Consider $y \in \mathbb R^V$ defined by
$$
y_i = \begin{cases}1 & i \in B_1 \\ -1 & i \in B_2 \end{cases}.
$$
Then for every edge $(i,j)\in E$ I’ll have $(y_i - y_j)^2 = 4$ so
$$
\frac{y^T L y}{y^T D y} = \frac{4 \sum_{i\sim j} 1}{\sum_i d_i} = \frac{2\one^T A \one}{\one^T A \one} = 2
$$
which means $\lambda_n$ attains its maximal value of $2$.

Now instead suppose $\lambda_n = 2$ and let $x$ be an eigenvector of $2$ for $\L$. Take $y = D^{-1/2}x$ so I have $y^T L y / y^T D y = 2$. In the proof of Result 2 I showed $\lambda_n \leq 2$ by using the fact that $\sum_{i\sim j} (y_i - y_j)^2 \leq 2\sum_{i\sim j}(y_i^2 + y_j^2)$. Suppose there is an edge $(i,j)$ such that $(y_i - y_j)^2 < 2(y_i^2 + y_j^2)$. Then I have
$$
\frac{\sum_{i\sim j} (y_i - y_j)^2}{y^T D y} < \frac{2\sum_{i \sim j} (y_i^2 + y_j^2)}{y^T D y} = 2
$$
which is impossible. This means that I need $(y_i - y_j)^2 = 2(y_i^2 + y_j^2) \implies (y_i + y_j)^2 = 0$ for every edge $(i,j)$. In other words, $y$ changes sign over every edge.

Suppose $y$ has a zero element and WLOG take it to be $y_1$. Then for any $j\in V$ with $j\sim 1$ I need $(0 + y_j)^2 = y_j^2 = 0 \implies y_j = 0$. Since I’m currently assuming $G$ has only a single component, this propagates to every node and makes $y = \mathbf 0$. This is a contradiction so it must be that $y$ has no zero elements.

Now let $B_1 = \{i \in V: y_i > 0\}$ and $B_2 = \{i \in V : y_i < 0\}$. If one of these were empty then $y$ would have a constant sign and every $(y_i + y_j)^2$ would be positive, so both are non-empty. And by construction all edges are between them so $G$ is bipartite with $B_1$ and $B_2$ as the bipartite pieces.

$\square$
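
A numeric check of both directions of Result 4: the star $S_8$ is bipartite and hits $\lambda_n = 2$, with $y = D^{-1/2}x$ flipping sign across every edge as in the proof, while the triangle contains an odd cycle so it falls short of $2$:

```python
import numpy as np

def norm_lap(A):
    d = A.sum(axis=1)
    return np.eye(len(A)) - A / np.sqrt(np.outer(d, d))

# Star S_8 is bipartite (center vs. leaves) so lambda_n = 2 ...
star = np.zeros((8, 8))
star[0, 1:] = star[1:, 0] = 1.0
lam, Q = np.linalg.eigh(norm_lap(star))
assert np.isclose(lam[-1], 2.0)

# ... and y = D^{-1/2} x changes sign across every edge, as in the proof.
y = np.diag(star.sum(axis=1) ** -0.5) @ Q[:, -1]
assert all(y[i] * y[j] < 0
           for i in range(8) for j in range(8) if star[i, j])

# The triangle has an odd cycle, so it isn't bipartite: lambda_n < 2.
tri = np.ones((3, 3)) - np.eye(3)
assert np.linalg.eigvalsh(norm_lap(tri))[-1] < 2 - 1e-9
```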

So if $G$ has no non-trivial bipartite components then I am guaranteed that $\rho_n > -1$ and that means that the $x^{(k)}$ will converge to something. My example star graph $S_8$ was bipartite so it was an example of non-convergence due to $\rho_n = -1$, and in Result 3 $K_2$ is both complete and bipartite so that’s why it has $\lambda_n = 2 = \frac{n}{n-1}$.

At this point if $G$ has no trivial or bipartite components, or if $\a > 0$, then I know
$$
R_\a^k \to S\left(\begin{array}{c|c}I_m & \mathbf 0 \\ \hline \mathbf 0 & \mathbf 0 \end{array}\right)S^{-1} \\
= D^{-1/2}Q\left(\begin{array}{c|c}I_m & \mathbf 0 \\ \hline \mathbf 0 & \mathbf 0 \end{array}\right)Q^TD^{1/2}
$$
so $R_\a^k$ converges on a rank-$m$ matrix, where $m$ is the multiplicity of $1$ among the eigenvalues of $R_\a$. I already showed $\lambda_1=0$ no matter what, so $\rho_1 = 1-(1-\a)\lambda_1 = 1$ and therefore $m \geq 1$.

It turns out I can completely characterize $m$ too.

Result 5: if $G$ has a single connected component then $\lambda_2$, the second smallest eigenvalue of $\L$, is positive.

Pf: In the proof of Result 2 I saw that $(0, D^{1/2}\one)$ is an eigenpair of $\L$. Let $x$ be another eigenvector of $0$ for $\L$ and take $y = D^{-1/2}x$. Then
$$
L y = LD^{-1/2} x = D^{1/2}\L x = \mathbf 0
$$
so $(0, y)$ is an eigenpair of $L$. In particular this means $y^T L y = 0$, which by Lemma 1 gives me
$$
\sum_{i\sim j}(y_i - y_j)^2 = 0
$$
so for any $(i,j) \in E$ I have $y_i = y_j$. WLOG take $y_1 \neq 0$ (I can do this since $y = \0 \implies x = \0$, which contradicts $x$ being an eigenvector). Then for any $j \in V$ with $j \sim 1$ I’ll have $y_j = y_1$, and the same goes for the neighbors of each such $j$, and so on. Since $G$ is connected by assumption there is a path from node $1$ to every other node, so this process eventually hits every node, which means $y = y_1 \one \implies y \in \text{span }\one$.

Returning to $\L$, this means $x = D^{1/2}y \in \text{span }D^{1/2}\one$ so my arbitrary eigenvector $x$ is in the span of the one eigenvector of $0$ that I knew about. This means the geometric multiplicity of $0$ for $\L$ is $1$. And since $\L$ is diagonalizable this also means the algebraic multiplicity of $0$ is $1$, so $\lambda_2 \neq \lambda_1$. Since $\L$ is PSD this gives $\lambda_2 > \lambda_1 = 0$.

$\square$
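
A spot-check of Result 5 on a random connected graph. I test connectivity directly by powering the adjacency matrix, since every entry of $(I+A)^{n-1}$ is positive exactly when the graph is connected:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
while True:
    A = (rng.random((n, n)) < 0.2).astype(float)
    A = np.triu(A, 1)
    A = A + A.T
    # Connected iff all entries of (I + A)^(n-1) are strictly positive.
    if (np.linalg.matrix_power(np.eye(n) + A, n - 1) > 0).all():
        break

d = A.sum(axis=1)
lam = np.linalg.eigvalsh(np.eye(n) - A / np.sqrt(np.outer(d, d)))
assert abs(lam[0]) < 1e-10    # lambda_1 = 0 ...
assert lam[1] > 1e-10         # ... and lambda_2 > 0 since G is connected
```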

Corollary 2: $m$, the multiplicity of $1$ as an eigenvalue of $R_\a$, is equal to the number of connected components $K$.

Pf: From Lemma 3 I know that if $G$ has $K$ connected components with symmetric normalized Laplacians $\L_1, \dots, \L_K$ then the spectrum of $\L$ is the union with multiplicity of the spectra of each $\L_i$. Each $\L_i$ corresponds to a connected graph, so by Result 5 it contributes exactly one eigenvalue of $0$, and thus for $\L$ the multiplicity of $0$ is exactly $K$.

Then $R_\a = S(I - (1-\a)\Lambda )S^{-1}$ and I just showed that $\Lambda$ has exactly $K$ diagonal entries of $0$. Since $\a < 1$, those are exactly the places where the diagonal of $I - (1-\a)\Lambda$ equals $1$, so $m = K$.

$\square$
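
Numerically, gluing together a triangle and a single edge gives $K = 2$ components and, sure enough, the eigenvalue $1$ of $R_\a$ shows up exactly twice:

```python
import numpy as np

# Disjoint union of a triangle (nodes 0-2) and a single edge (nodes 3-4).
A = np.zeros((5, 5))
A[:3, :3] = np.ones((3, 3)) - np.eye(3)
A[3, 4] = A[4, 3] = 1.0

d = A.sum(axis=1)
alpha = 0.25
R = alpha * np.eye(5) + (1 - alpha) * A / d[:, None]

# The spectrum is real since R is similar to a real diagonal matrix (Result 1).
rho = np.linalg.eigvals(R).real
assert np.sum(np.isclose(rho, 1.0)) == 2    # m = K = 2
```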

At last I know everything I need to know to answer my final question. Let $V = C_1 \cup \dots \cup C_m$ be the partition of the nodes into their connected components, and define $\one_{C_i}\in \mathbb R^V$ by
$$
\one_{C_i}(j) = \begin{cases} 1 & j \in C_i \\ 0 & \text{o.w.}\end{cases}.
$$
Consider $R_\a\one_{C_i}$:
$$
R_\a\one_{C_i} = \a \one_{C_i} + (1-\a)D^{-1}A\one_{C_i} = \one_{C_i}
$$
so $\one_{C_i}$ is an eigenvector of $1$ (every neighbor of a node in $C_i$ is itself in $C_i$, so $D^{-1}A\one_{C_i} = \one_{C_i}$). The $\one_{C_i}$ are linearly independent and by Corollary 2 the eigenspace of $1$ has dimension $m$, so they span it. But I have to be careful here: $R_\a$ is not symmetric, so the limit above is the projection onto this eigenspace along the other eigenspaces, not the orthogonal projection, and I need to route the computation through $Q$. The eigenspace of $\L$ for $0$ is spanned by the mutually orthogonal vectors $D^{1/2}\one_{C_i}$, which have $\|D^{1/2}\one_{C_i}\|^2 = \sum_{j\in C_i} d_j =: \text{vol}(C_i)$, so I can take the first $m$ columns of $Q$ to be the $\text{vol}(C_i)^{-1/2}D^{1/2}\one_{C_i}$ (the projection they produce is the same for any orthonormal basis of the eigenspace). Substituting into the limit,
$$
R_\a^k \to D^{-1/2}\left(\sum_{i=1}^m \frac{D^{1/2}\one_{C_i}\one_{C_i}^TD^{1/2}}{\text{vol}(C_i)}\right)D^{1/2} = \sum_{i=1}^m \frac{\one_{C_i}\one_{C_i}^TD}{\text{vol}(C_i)}
$$
which is block diagonal according to the connected components of $G$, and then
$$
R_\a^k x \to \sum_{i=1}^m \bar x_{C_i}\one_{C_i}, \qquad \bar x_{C_i} := \frac{\sum_{j\in C_i}d_jx_j}{\sum_{j\in C_i}d_j},
$$
where $\bar x_{C_i}$ is the degree-weighted average of $x$ on component $C_i$ (on a single component this weighting is exactly the stationary distribution $\pi_j = d_j/\sum_l d_l$ of the random walk). Thus within each component an averaging still takes place, with high-degree nodes counting for more, and different components act independently. It’s interesting that $\a$ doesn’t affect the actual limit. Its only role is to ensure that convergence happens by preventing unending oscillations.

So that’s it: provided $\rho_n > -1$, which necessarily happens when $\a > 0$ or when no component of $G$ is bipartite, this process settles on the degree-weighted average within each connected component.

Here’s an example of this convergence happening in a graph with two connected components. I set $\a = .9$ so that the convergence would be on the slower side just to make the gif a little more interesting.
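
Here’s a numpy sketch of the same kind of experiment. The two-component graph is made up (a path plus a triangle with a pendant node), chosen so the degrees aren’t all equal and the degree-weighting actually shows:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two components: a path 0-1-2 and a triangle 3, 4, 5 with pendant node 6.
edges = [(0, 1), (1, 2), (3, 4), (3, 5), (4, 5), (5, 6)]
n = 7
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

d = A.sum(axis=1)
alpha = 0.9
R = alpha * np.eye(n) + (1 - alpha) * A / d[:, None]

x = rng.standard_normal(n)
xk = np.linalg.matrix_power(R, 2000) @ x     # x^{(2000)}, far enough along

# Within each component the limit is the degree-weighted average of x
# (not the plain mean: the path, for instance, has degrees 1, 2, 1).
for comp in [[0, 1, 2], [3, 4, 5, 6]]:
    xbar = d[comp] @ x[comp] / d[comp].sum()
    assert np.allclose(xk[comp], xbar)
```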
