Hodge theory and persistent homology

Persistent homology is a great way to get information about “approximate topology” of some space. The standard way to approach it is through filtrations of topological spaces. Say we have a simplicial complex $X$ with a filtration function $f: X \to \mathbb{R}$ . This filtration gives each simplex of $X$ a filtration level or weight. We then have a sequence of inclusions $i_a^b: X^a = f^{-1}((-\infty,a]) \to f^{-1}((-\infty, b]) = X^b$ for each pair of weight values $a \leq b$ in the filtration. This leads to a sequence of maps $H_k (i_a^b): H_k(X^a) \to H_k(X^b)$ , which turn out to decompose nicely in the sense that there exists a basis for each $H^k(X^a)$ so that for every $a \leq b$ the matrix representing $H_k i_a^b$ is diagonal: each basis element of $H^k(X^a)$ is either mapped to zero or to a basis element of $H^k(X^b)$ .

In essence, we can represent the way the homology of $X^a$ evolves by a collection of bars, where a bar represents a homology class that persists across a certain distance in the filtration of $X$ . Bars which persist for a long time then represent, perhaps, features of the filtered space which are close to being homology classes.

Another way to approach approximate topology of a weighted simplicial complex is with discrete Hodge theory. Here we start with a weighted simplicial complex $X$ ; the weights on $k$ -simplices give a natural inner product structure on $C_k(X;\mathbb{R})$ . This means that we can produce adjoints of the boundary maps $\partial_k$ . The Hodge Laplacian of $X$ is the graded operator $\Delta = (\partial + \partial^\ast)^2 = \partial \partial^\ast + \partial^\ast \partial$ . On $C_k$ , this restricts to $\Delta_k = \partial_{k+1}\partial_{k+1}^\ast + \partial_k^\ast\partial_k$ . The main theorem of discrete Hodge theory is that $\ker \Delta_k \simeq H_k(X;\mathbb{R})$ . The $k$ -cochains in $\ker \Delta_k$ are canonical representatives for homology classes of $X$ , and the eigenvectors of $\Delta_k$ with eigenvalues close to $0$ can be seen as representing $k$ -chains which are almost homology classes.

I’ve wanted a way to connect these two perspectives for a while now. I recently discovered a paper that provides a straightforward way of relating them. The paper is Persistent Homology and Floer-Novikov Theory by Michael Usher and Jun Zhang. On their way to building a connection between persistence and Floer theory, the authors develop a new way of looking at persistent homology that makes the analogy with Hodge theory much more evident.

The key is to express persistent homology in terms of a non-Archimedean singular value decomposition of the boundary operator. To see what this means, we’ll have to go through a little bit of background. Fortunately, we don’t need to go into all the generality that Usher and Zhang do.

Definition: A valuation on a field $K$ is a map $\nu: K \to \mathbb{R} \cup \{\infty\}$ such that

$\nu(x) = \infty$ iff $x = 0$ ,
$\nu(xy) = \nu(x) + \nu(y)$ , and
$\nu(x+y) \geq \min(\nu(x),\nu(y))$ .

These conditions make $N(x) = \exp(- \nu(x))$ into a (multiplicative) norm or absolute value on $K$ : $\exp(-\nu(x)) = 0$ iff $\nu(x) = \infty$ , $N(xy) = N(x)N(y)$ , and $N(x+y) \leq \max(N(x),N(y)) \leq N(x) + N(y)$ . Indeed, this is what you might call an ultrametric absolute value. Every field has the trivial valuation with $\nu(x) = \infty$ if $x = 0$ and $\nu(x) = 0$ otherwise, and this is the only valuation we will actually need.

Definition: A non-Archimedean normed vector space over $K$ is a vector space $C$ together with a filtration function $\ell: C \to \mathbb{R} \cup \{-\infty\}$ such that

$\ell(x) = -\infty$ iff $x = 0$ ,
$\ell(\lambda x) - \ell(x) - \nu(\lambda)$ for $\lambda \in K$ , and
$\ell(x + y) \leq \max(\ell(x),\ell(y))$ .

Taking $\|{v}\| = \exp( \ell(v))$ we get $\|x\| = 0$ iff $x = 0$ , $\|\lambda x\| = N(\lambda)\|x\|$ , and $\|x+y\| \leq \max(\|x\|,\|y\|)$ . Again, this is a non-Archimedean or ultrametric-type norm on $C$ .

A simplicial complex $X$ with a filtration function $f: X \to \mathbb{R}$ has a natural non-Archimedean norm on its space of $k$ -chains $C_k(X;K)$ , where $K$ has the trivial valuation. This norm is given by the filtration function $\ell(\sum_{i} a_i \sigma_i) = \max f(\sigma_i)$ for $i$ with $a_i \neq 0$ .

Definition: Let $C$ be a non-Archimedean normed vector space. Subspaces $V$ and $W$ of $C$ are orthogonal if $\ell(v+w) = \max(\ell(v),\ell(w))$ for all $v \in V$ , $w \in W$ . That is, $\|v + w\| = \max(\|v\|,\|w\|)$ . Compare this to orthogonality for an inner product space, where we have $\|v + w\|^2 = \|v\|^2 + \|w\|^2$ when $v$ and $w$ are orthogonal. Similarly, a tuple of vectors $(v_1,\ldots,v_r)$ is orthogonal if $\ell(\sum \lambda_i v_i) = \max_i (\lambda_i w_i)$ for all choices of $\lambda_1,\ldots,\lambda_r \in K$ .

Not all non-Archimedean normed vector spaces admit an orthogonal basis. However, if one exists, any basis can be orthogonalized via a sort of Gram-Schmidt algorithm. This orthogonalized basis is not unique.

We’re finally at the point where we can define a singular value decomposition. Let $C$ and $D$ be non-Archimedean normed vector spaces with orthogonal bases, and let $A: C \to D$ be a linear map of rank $r$ . A singular value decomposition of $A$ is a choice of orthogonal ordered bases $(y_1,\ldots,y_n)$ for $C$ , $(x_1,\ldots,x_n)$ for $D$ such that

$(y_r,\ldots,y_n)$ is an orthogonal ordered basis for $\ker A$
$(x_1,\ldots,x_r)$ is an orthogonal ordered basis for $\text{im } A$
$A y_i = x_i$ for $1 \leq i \leq r$
$\ell_C(y_1) - \ell_D(x_1) \geq \cdots \geq \ell_C(y_r)- \ell_D(x_r)$ .

The last condition is what determines the ordering for the singular values. If we write this in terms of norms, this says that $\|y_1\|/\|x_1\| \geq \cdots \geq \|y_r\|/\|x_r\|$ . The singular values should be the inverses of these, i.e., $\sigma_i = \|x_i\|/\|y_i\|$ .

Usher and Zhang show that these non-Archimedean SVDs exist, and can be computed via a Gaussian elimination-like algorithm. Now let’s interpret this when $A = \partial_{k+1}: C_{k+1}(X;K) \to Z_k(X;K) = \ker \partial_k$ , with the norm given by the simplicial filtration. We get bases $Y$ of $C_{k+1}(X;K)$ and $X$ of $Z_k(X;K)$ , and the basis $X$ has elements corresponding to $k$ -cycles which are either the boundary of some $(k+1)$ -chain, or are not. The $k$ -cycles which are never killed by a boundary are those $x_i$ for $i > r$ , and the others have $i \leq r$ , and hence are paired with basis elements $y_i$ of $C_{k+1}$ . A singular value $\sigma_i$ less than 1 corresponds to a pair of a $k$ -cycle $x_i$ and a $(k+1)$ -chain $y_i$ bounded by $x_i$ , where the $x_i$ is born at an earlier filtration level than $y_i$ . In persistent homology terms, we have a bar of length $\ell(y_i) - \ell(x_i)$ . The $x_i$ for $i > r$ correspond to bars starting at $\ell(x_i)$ and extending to infinity, because they are not the boundary of any $(k+1)$ -chain, no matter how large its norm.

So we’ve arrived at the remarkable fact that the non-Archimedean SVD of $\partial_{k+1}$ contains all the information in the barcode for the $k$ -dimensional persistent homology of the filtered complex $X$ . We’re nearly to the connection with the Hodge-theoretic version of approximate homology—all we need to do is recall the connection between the eigendecomposition of $\Delta_k$ and the singular value decomposition of $\partial_{k+1}$ .

Since $\ker \partial_k$ contains $\text{im } \partial_{k+1}$ , the eigenvalues of $\Delta_k$ split into three parts: the set of zero eigenvalues, the nonzero eigenvalues of $\partial_{k+1}\partial_{k+1}^*$ , and the nonzero eigenvalues of $\partial_{k}^*\partial_k$ . We define the up-Laplacian $\Delta_k^+ = \partial_{k+1}\partial_{k+1}^*$ and the down-Laplacian $\Delta_k^- = \partial_{k}^*\partial_k$ . The eigenvalues of $\Delta_k^+$ are the squares of the singular values of $\partial_{k+1}$ ; restricting to $\ker \partial_k = Z_k$ just means we ignore the subspace spanned by the nonzero eigenvalues of $\Delta_k^-$ . Returning to the interpretation of the SVD, we see that the eigenvectors of $\Delta_k^+$ with nonzero eigenvalues correspond to $k$ -cycles which are the boundaries of certain $(k+1)$ -chains; the eigenvalues tell us about the relative sizes of the cycles and the chains that they bound. In particular, an eigenvector with small eigenvalue corresponds to a $k$ -cycle which is only filled in by a large $(k-1)$ -chain.

What about the other nonzero eigenvalues, the ones that come from $\Delta_k^-$ ? I think the best way to look at these is to switch to cohomology. Eigenvectors of $\Delta_k^-$ form an orthogonal basis of $k$ -cocycles which are paired with an orthogonal basis of $(k-1)$ -cochains that they cobound. The eigenvalues are the relative norms of the cocycles and the cochains that kill them. A small eigenvalue represents a cocycle which almost fails to be a coboundary—it requires a lot of “energy” to produce it as a coboundary, perhaps.

The eigendecomposition of the full Hodge Laplacian $\Delta_k$ then contains representatives of the genuine homology classes of $X$ , as $\ker \Delta_k$ , as well as approximate homology (eigenvectors of $\Delta_k^+$ ) and approximate cohomology (eigenvectors of $\Delta_k^-$ ) of $X$ .

In any event, we have a formal analogy between discrete Hodge theory and persistent homology by way of two types of singular value decomposition. One might interpret the distinction between the two in terms of the way the size of chains are measured—Hodge theory explicitly works with $\ell^2$ norms to measure size, while persistent homology feels more like using an $\ell^\infty$ norm (although this is not exactly the case). But both approaches try to find “small” $k$ -cycles that are killed by “large” $(k+1)$ -chains. This relationship should be helpful in developing a deeper understanding and interpretation of both persistence and Hodge theory.