
Could Network Modeling be the New Fibonacci Spiral?

Posted on December 26, 2021

Keywords: graph theory, network science, complexity

Network models are being recognized everywhere, and increasingly more attention is being focused on understanding complex systems through the network paradigm. The literature that has emerged from such studies over the past several years has deservedly been dubbed Network Science. To an untrained eye, network science may at first glance appear to be a superficial extension of graph theory. However, sampling the theories of the two fields reveals that something different is going on. This is obvious to the experts, but it is still worth making the difference explicit, partly from an expository point of view and more particularly from a pedagogical standpoint, for the early initiate who may find this line of research relevant and even promising.

One look at the importance afforded to the topic of \(``\)Networks and Complex Systems\("\) over the past few years in the popular journal Physical Review E should be enough to drive home the point that quite a few serious researchers are devoting much of their careers to investigating the network hypothesis. Moreover, in an era of algorithm engineering, empirical algorithmics, and theoretical analysis of algorithms, the allure of network science only grows, and significantly so, even as avenues for more traditional academic research in STEM departments keep becoming less navigable.

The goal here is to pick up the most basic theory one can come across in graph theory and in network science, because even that little bit of understanding goes a long way in developing research perspectives in these areas while connecting them to real-world applications. For the purpose of this article, graph theory and graph algorithms can be treated under the same heading, say GT&A. By graph algorithms, I refer to all of the relevant combinatorial optimization algorithms. Network science, on the other hand, is the result of the statistical mechanician unleashing his toolset to understand complexity, but more on that later.

Skimming Across Graph Theory

The most basic proof we can find in graph theory pertains to Euler tours. Euler was studying the famous problem now known as the Seven Bridges of Königsberg, and the authoritative version of history calls this moment the inception of graph theory. This gave rise to the theory of Eulerian circuits.

But first, we will need some rudimentary definitions that appear in any standard1 graph theory book. A walk (of length \(k\)) in graph \(G=(V,E)\) is a non-empty alternating sequence \(v_{0} e_{0} v_{1} e_{1} \ldots e_{k-1} v_{k}\) of vertices and edges in \(G\) such that \(e_{i}=\left\{v_{i}, v_{i+1}\right\} \,\,\forall\ i < k\). If \(v_{0}=v_{k}\), the walk is closed. A trail is a walk without repeated edges. A path is a walk without repeated vertices. A closed walk in a graph is called an Euler tour if it traverses every edge of the graph exactly once. A graph is Eulerian if it admits an Euler tour.

Now Euler's theorem states that the necessary and sufficient condition for a connected graph to be Eulerian is for every vertex in the connected graph to have even degree. Stated in a slightly more mathematical sense, we are asserting that a connected graph is Eulerian iff every vertex in the connected graph has an even degree.
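
The theorem is also easy to check empirically; the sketch below assumes the networkx library and its is_eulerian and eulerian_circuit helpers, and is of course no substitute for the proof discussed below.

```python
# A quick empirical check of Euler's theorem, assuming a recent networkx release.
import networkx as nx

# A cycle on 5 vertices: connected, every degree equals 2 (even), hence Eulerian.
G = nx.cycle_graph(5)
print(nx.is_eulerian(G))                       # True
print(list(nx.eulerian_circuit(G, source=0)))  # one explicit Euler tour as an edge list

# A path on 4 vertices: its two endpoints have odd degree 1, hence not Eulerian.
H = nx.path_graph(4)
print(nx.is_eulerian(H))                       # False
```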

This theorem was already conceived as far back as \(1736\) by none other than Euler himself. Euler easily proved the necessity part by inspection, because a vertex appearing \(x\) times in an Euler tour must have degree \(2x\). It was the sufficiency part that eluded him, or a better way to say this would be that Euler did not consider it a serious problem in mathematics; otherwise he would surely have produced an elegant proof and graph theory would have started flourishing much earlier than it actually did. The constructive proof explained below, along with the resulting linear-time graph algorithm for finding Euler tours, was a stroke of genius when it finally appeared more than a century later, due to Carl Hierholzer's labours shortly before his death in \(1871\) (the proof was published posthumously in \(1873\)). What a tragedy!

The following proof for the if-part appears in many textbooks today2, but it is helpful to reproduce it here. This helps us quickly see in one place how graph theory and network science differ. So let us begin by invoking the strong principle of induction. This style of reasoning is a staple in graph theory, in fact in all of science and mathematics. Here it is apt to remember what Poincaré3 said: \(``\)Mathematicians proceed then by construction, constructing more and more complicated combinations of elements. Next, through the analysis of these combinations, of these sets so to speak, to their primitive elements, they perceive the relations between these elements and from them deduce the relations of the sets themselves. This is a purely analytic approach\(\ldots\)we have to work our way back from the particular to the general, climbing up the ladder. The analytic process by construction does not require us to delve down, but rather leaves us at the same level. We can ascend only by mathematical induction, for it alone can teach us something new. Without the help of this induction, different from physical induction in some respects but just as productive, construction would be incapable of creating science. In closing, let us note that induction is possible only if a given operation can be repeated indefinitely, which is why chess theory will never become a science, since the different moves of a given game do not resemble one another.\("\)

So let the proof begin. Suppose the degree of each vertex is even. We will show that there is an Euler tour by induction on the number of edges \(m\) of the graph. Note that here we simply state \(m\) to be our induction parameter, but deciding what the induction parameter should be is generally the most crucial step when giving a proof by induction.

For the base case we take the graph \(G\) consisting of \(2\) vertices joined by \(2\) parallel edges (multigraphs are allowed here, just as in the Königsberg problem itself). Verifying that this graph is Eulerian is easy and concludes our basis step for the sufficiency proof.

Now suppose we have a connected graph \(G\) with \(m>2\) edges in which every degree is even. Let us start at an arbitrary vertex \(v\) and follow edges, arbitrarily selecting one after another, under the constraint that each edge may be used only once. Now what guarantees that such a walk can keep being extended rather than getting stuck prematurely? The guarantee comes from the fact that if we enter a vertex (other than the starting vertex \(v\)) through one of its edges, there will be another edge of that vertex that has not yet been used (remember, every degree is even).

One can trace several such walks starting at vertex \(v\). Out of all such walks, focus on those which maximize the total length travelled, assuming each edge has unit weight. Denote one such maximal walk by \(W\), and let \(F\) be the set of its edges.

Let us now prove that \(W\) must return to \(v\); we argue by contradiction. So suppose \(W\) ends at a vertex \(v^{\prime}\neq v\). Then \(W\) must have traversed every edge incident to \(v^{\prime}\), for otherwise the walk could be extended further. But every intermediate visit of \(W\) to \(v^{\prime}\) uses two of its edges, and the final entry uses one more, so \(W\) uses an odd number of edges at \(v^{\prime}\); if these are all of its edges, then \(v^{\prime}\) has odd degree, which is the contradiction we were seeking. Hence, \(W\) must end at \(v\).

Now there can be two cases. The trivial case which we would like to occur is when \(F=E(G)\) because then \(W\) is an Euler tour, and we are done. However, in general, that will not be the case. So let us delete the edges in \(F\), thus obtaining the modified graph \(G^{\ \prime}:= G-F\).

The resulting graph will either remain a single component or decompose into multiple smaller components \(C_1,C_2,\cdots,C_k\). In each of these components there may be one or more vertices that were on the walk \(W\). Deleting the edges of \(W\) reduces the degree of each such vertex by an even number, because every vertex on the walk \(W\) is incident to an even number of edges of that walk. Since the original degree of each such vertex is even, its degree within its component also remains even. So each component satisfies the requirements of the induction hypothesis: (a) it is connected, (b) it has fewer than \(m\) edges, and (c) every degree in it is even; hence each component has an Euler tour \(\tau_1,\tau_2,\cdots,\tau_k\).

Now since the original graph \(G\) is connected, each component \(C_i\) must contain a vertex \(v_i\) that lies on the walk \(W\) (and, of course, on \(\tau_i\)). So to construct an Euler tour of the original graph, we simply traverse the walk \(W\) and, whenever some \(v_i\) is encountered for the first time, detour through the Euler tour \(\tau_i\) before resuming \(W\). Thus, by the principle of induction, we are done with the sufficiency proof.
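
To make the constructive character of the argument concrete, here is a minimal Python sketch in the spirit of Hierholzer's splicing idea: keep extending a closed walk while unused edges remain, and splice the sub-tours back into the main tour. The adjacency-list representation and the example graph are my own illustrative choices, and the code assumes, without checking, that the input graph is connected with every degree even.

```python
from collections import defaultdict

def euler_tour(edges):
    """Return an Euler tour (as a vertex sequence) of a connected graph in which
    every degree is even, given as a list of undirected edges. This is a sketch of
    Hierholzer's splicing idea; the preconditions are assumed, not validated."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    stack, tour = [edges[0][0]], []
    while stack:
        v = stack[-1]
        if adj[v]:                  # v still has an unused edge: extend the walk
            w = adj[v].pop()
            adj[w].remove(v)        # consume the reverse copy of the same edge
            stack.append(w)
        else:                       # v is exhausted: it becomes part of the tour
            tour.append(stack.pop())
    return tour[::-1]

# Two triangles sharing a vertex: every degree is even, so an Euler tour exists.
print(euler_tour([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]))
```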

Apart from such proofs by induction, graph theory is replete with famous duality theorems, and it would be remiss to skip mentioning these. The simplest one is due to Kőnig (note the name coincidence from before?), which states that the maximum cardinality of a matching in \(G\) equals the minimum cardinality of a vertex cover of its edges. Here we assume \(G\) to be a bipartite graph with bipartition \(A\) and \(B\), and we call a set \(M\) of independent (i.e., pairwise non-adjacent) edges a matching.

Its proof is not that long-winded, and at the same time it gives an essential glimpse into graph theory. The proof involves understanding what an augmenting path is, which in turn requires knowing what an alternating path is. An alternating path is a path that starts at an unmatched vertex and proceeds along one of the unmatched edges of that starting vertex. Thereafter, the path keeps alternating between matched and unmatched edges, like a snake slithering down a zigzag course in the matched bipartite graph. When such an alternating path terminates in an unmatched vertex, we call it an augmenting path, because switching its unmatched edges to matched edges and vice versa creates a matching whose cardinality is increased by one.

In order to prove Kőnig's theorem, begin by considering a matching \(M\) in \(G\) of maximum cardinality; such a matching must naturally exist. Since any vertex cover \(U^{\prime}\) of \(E\) must cover \(M\), there cannot be a vertex cover with fewer than \(|M|\) vertices, so \(|U^{\prime}|\geq |M|\). Therefore, to prove Kőnig's theorem all we need to do is construct a set \(U\) of \(|M|\) vertices covering the edge set \(E\): doing so produces a vertex cover of cardinality \(|M|\), which by the preceding observation is the minimum possible, and we already know that \(|M|\) is the maximum cardinality of a matching in \(G\), so we will be done with the proof.

To construct such a set \(U\), let us propose the following rule, which will be justified later: for every matched edge in \(M\), insert its end in \(B\) into \(U\) if some alternating path ends in that vertex; otherwise its end in \(A\) goes into \(U\). It follows that \(|U|=|M|\). As a direct consequence of this rule, also note that if an alternating path \(P\) ends in a vertex \(b \in B\), then \(b\) must be matched to some \(a \in A\), for otherwise \(P\) would be an augmenting path, which cannot happen because \(M\) is a maximum matching; hence \(b\in U\).

Now to prove that \(U\) thus constructed indeed covers \(E\), note that all matched edges are covered by construction. So consider an unmatched edge \(ab \in E\) with \(a\in A\) and \(b\in B\); we have to show that \(U\) covers \(ab\). Since \(M\) is a maximum matching, it must contain an edge with \(a\) or \(b\) as an end, for otherwise \(ab\) could simply be added to \(M\). This leaves us with an exhaustive case-by-case analysis. First, suppose \(a\) is unmatched, so that the matched edge meeting \(ab\) is at \(b\). Then \(ab\) itself is an alternating path ending in \(b\), and so by our rule the end chosen for \(U\) from the matched edge at \(b\) is \(b\) itself, and \(ab\) is covered. Second, suppose \(a\) is matched, say by the edge \(ab^{\prime} \in M\) with \(b^{\prime}\in B\). If \(a \in U\) we are done, so assume \(a \notin U\); then by our rule \(b^{\prime} \in U\) and some alternating path \(P\) ends in \(b^{\prime}\). But then there is also an alternating path ending in \(b\): either the initial portion of \(P\) up to \(b\), if \(b\) lies on \(P\), or \(P\) extended through \(b^{\prime}\), then \(a\), then \(b\). By the maximality of \(M\) this path cannot be augmenting, so \(b\) is matched, and since an alternating path ends in \(b\), our rule placed \(b\) in \(U\). In every case the edge \(ab\) is covered, which completes the proof of Kőnig's theorem.
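
Readers who prefer to see the duality numerically can try the sketch below, which leans on networkx's bipartite helpers (maximum_matching and to_vertex_cover, assumed to behave as documented in recent releases) and checks the equality of the two cardinalities on one random bipartite instance.

```python
# A numerical illustration of Kőnig's duality on a random bipartite graph,
# assuming networkx's bipartite helpers behave as documented.
import networkx as nx
from networkx.algorithms import bipartite

G = bipartite.random_graph(8, 10, 0.3, seed=42)   # parts of sizes 8 and 10, p = 0.3
top = {u for u, d in G.nodes(data=True) if d["bipartite"] == 0}

matching = bipartite.maximum_matching(G, top_nodes=top)   # contains both directions
cover = bipartite.to_vertex_cover(G, matching, top_nodes=top)

matching_size = len(matching) // 2   # each matched edge appears twice in the dict
print(matching_size, len(cover))     # Kőnig: the two numbers should coincide
assert matching_size == len(cover)
```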

There are two other famous graph-theoretic duality theorems, Menger's theorem and the max-flow min-cut theorem, which deserve at least a mention here for the sake of completeness, and also because there is more to this trinity of nested dualities than meets the eye.

Skimming Across Network Science

Now let us pivot to the domain of network science. There is a formidable literature of social network analysis in sociology but it is the physics point of view that I am interested in, and that is what most people essentially refer to as network science. The simplest theory one can encounter in network science has to be due to Erdős and Rényi. Their proposed graph ensemble model4 shall be denoted here by \(\mathcal{G}^{\ p}\), or to be more expansive we may also use \(\mathscr{G}(n,p)\).

Network science prefers to call vertices nodes and edges links, and though it allows for node and edge weights just like graph theory, most of the time it requires the assignment of more involved node and edge attributes to complete the modeling part, sometimes even requiring multiple layers of graphs overlaid on top of each other. The Erdős–Rényi ensemble refers to the set of all simple, undirected graphs on \(n\) nodes in which each of the \(\binom{n}{2}\) possible edges is decided by a coin toss: the coin lands heads with probability \(p\), in which case the edge is present in that particular instance of this random graph-generating mechanism. This should make it obvious where the word random comes from in the phrase \(``\)Erdős–Rényi random graph model.\("\) More specifically, the elements \(A_{ij}\) of the adjacency matrix corresponding to any particular graph in \(\mathscr{G}(n,p)\) are themselves scalar random variables; more succinctly, we can visualize the adjacency matrix itself as a random matrix variable \(\mathbf{A}\). Then it is not hard to see why the following holds

\[\LARGE p(\mathbf{A})=\prod\limits_{i < j}p(A_{ij})=\prod\limits_{i < j}\left[p \delta_{A_{i j}, 1}+(1-p) \delta_{A_{i j}, 0}\right]\]

Here we have employed the standard Kronecker delta notation which is as follows

\[\LARGE \delta_{xy} = \begin{cases} 0 &\text{if } x \neq y, \\ 1 &\text{if } x=y. \end{cases}\]

There is also an integral representation for the Kronecker delta (valid for integer arguments) that we will find handy. It can be derived using the methods of complex analysis.

\[\LARGE \delta_{x,y} = \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ (x-y)\varphi} \,d\varphi\]
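
As a quick numerical sanity check of this representation (purely illustrative, using numpy and a plain Riemann sum over a uniform grid), one can verify that the integral evaluates to \(1\) when the two integers coincide and to \(0\) otherwise.

```python
# Numerical check of the integral representation of the Kronecker delta
# for integer arguments, via a plain Riemann sum (illustrative only).
import numpy as np

def kronecker_delta(x, y, num_points=4096):
    phi = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    return np.real(np.exp(1j * (x - y) * phi).mean())

print(round(kronecker_delta(3, 3), 6))   # 1.0
print(round(kronecker_delta(3, 5), 6))   # 0.0 (up to numerical error)
```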

In an undirected graph we need only work with one triangle of the adjacency matrix, say the entries with \(i < j\), because the other half is automatically known by symmetry. Exploiting the fact that the \(A_{ij}\) are Boolean random variables, we can simplify the above expression using a common trick as follows

\[\LARGE p(\mathbf{A})=\prod\limits_{i < j}\left[p^{A_{i j}}(1-p)^{1-A_{i j}}\right]=p^{\sum\limits_{i < j} A_{i j}}(1-p)^{\frac{1}{2} n(n-1)-\sum\limits_{i < j} A_{i j}}\] \[\LARGE p(\mathbf{A})=p^{L(\mathbf{A})}(1-p)^{n(n-1) / 2-L(\mathbf{A})}\]

Here we are keeping a record of the number of links in the network with the symbol \(L(\mathbf{A})\), simply defined to be \(\sum\limits_{i < j}A_{ij}\). Thus all random graphs in \(\mathscr{G}(n,p)\) that share the same number of links have the same \(L(\mathbf{A})\), and hence are assigned the same probability \(p(\mathbf{A})\). The mean degree \(\overline{k}(\mathbf{A})\) of an individual graph in the Erdős–Rényi ensemble is itself a random variable, being a function of the random matrix \(\mathbf{A}\). Thus, averaging over the entire ensemble gives

\[\LARGE \langle\overline{k}(\mathbf{A})\rangle =\sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} p(\mathbf{A}) \overline{k}(\mathbf{A})\]

This follows from the usual definition of the expectation operator. Note that the summation is over all the graphs in the ensemble. Then, we invoke the usual definition of averaging to substitute for \(\overline{k}(\mathbf{A})\) as follows

\[\Large \langle\overline{k}(\mathbf{A})\rangle= \sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} p(\mathbf{A}) (\frac{1}{n} \sum\limits_{i} k_{i}(\mathbf{A}))=\frac{1}{n} \sum\limits_{i} \sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} p(\mathbf{A}) k_{i}(\mathbf{A})\]

Now noting that we assumed the graph to be simple, thereby ruling out self-loops, we have \(\large k_{i}(\mathbf{A})=\sum\limits_{j:j \neq i}A_{i j}\) from which it follows that

\[\Large \langle\overline{k}(\mathbf{A})\rangle=\frac{1}{n} \sum\limits_{i \neq j} \sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} p(\mathbf{A}) A_{i j}\]

where \(\large\sum\limits_{i \neq j}\) is actually the double summation \(\large\sum\limits_{i}\sum\limits_{j:j \neq i}\). Now we already saw that \(p(\mathbf{A})=\prod\limits_{q < \ell} p\left(A_{q \ell}\right)\), so substituting it gives

\[\Large \langle\overline{k}(\mathbf{A})\rangle =\frac{1}{n} \sum\limits_{i \neq j} \sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}}\left[\prod\limits_{q<\ell} p\left(A_{q \ell}\right)\right] A_{i j}=\frac{1}{n} \sum\limits_{i \neq j} \sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}}\left[ p\left(A_{ij}\right) \prod\limits_{\substack{q<\ell \\ (q,\ell)\neq (i,j)}}p\left(A_{q \ell}\right)\right] A_{i j}\] \[\Large =\frac{1}{n} \sum\limits_{i \neq j} \sum\limits_{A_{i j} \in\{0,1\}}\sum\limits_{A_{q\ell} \in\{0,1\}}\left[ p\left(A_{ij}\right) \prod\limits_{\substack{q<\ell \\ (q,\ell)\neq (i,j)}}p\left(A_{q \ell}\right)\right] A_{i j}\] \[\Large =\frac{1}{n} \sum\limits_{i \neq j} \sum\limits_{A_{i j} \in\{0,1\}} p\left(A_{i j}\right) A_{i j} \sum\limits_{A_{q\ell} \in\{0,1\}} \prod\limits_{\substack{q<\ell \\ (q,\ell)\neq (i,j)}}p\left(A_{q \ell}\right)\] \[\Large =\frac{1}{n} \sum\limits_{i \neq j}\left[\sum\limits_{A_{i j} \in\{0,1\}} p\left(A_{i j}\right) A_{i j}\right] \times \prod\limits_{\substack{q<\ell \\ (q,\ell)\neq (i,j)}}\left[\sum\limits_{A_{q \ell} \in\{0,1\}} p\left(A_{q \ell}\right)\right]\]

But we can see that

\[\Large \sum\limits_{A_{i j} \in\{0,1\}} p\left(A_{i j}\right) A_{i j}=p\left(A_{i j}=0\right) \cdot 0+p\left(A_{i j}=1\right) \cdot 1=p\]

Also we have

\[\Large \sum\limits_{A_{q \ell} \in\{0,1\}} p\left(A_{q \ell}\right)=p\left(A_{q \ell}=0\right)+p\left(A_{q \ell}=1\right)=(1-p)+p=1\]

So we can simplify the above cumbersome looking expression for mean connectivity as follows

\[\Large \langle\overline{k}(\mathbf{A})\rangle = \frac{1}{n} \sum\limits_{i \neq j}p \times \prod\limits_{\substack{q<\ell \\ (q,\ell)\neq (i,j)}} (1)=\frac{p}{n} \sum\limits_{i \neq j}1=\frac{p}{n}n(n-1)\]

Substituting for \(\langle\overline{k}(\mathbf{A})\rangle\) the shorter and more used notation \(\langle k\rangle\) we obtain a simple yet important insight as follows

\[\LARGE \langle k\rangle=p(n-1)\]

We could have arrived at this more simply by a less involved and more intuitive argument: a node can only be joined to the remaining \((n-1)\) nodes, and each such edge is present with probability \(p\), so a node will have \(p(n-1)\) edges on average. But the rigorous argument is favored for theory building. Now recall that

\[\LARGE p(\mathbf{A})=\prod\limits_{i < j}\left[p \delta_{A_{i j}, 1}+(1-p) \delta_{A_{i j}, 0}\right]\]

So if \(\langle k\rangle\) is known, then we get the following parametric version of the graph probability distribution in the Erdős–Rényi ensemble, which will come in handy later.

\[\LARGE p(\mathbf{A} \mid\langle k\rangle)=\prod\limits_{i < j}\left[\frac{\langle k\rangle}{n-1} \delta_{A_{i j}, 1}+\left(1-\frac{\langle k\rangle}{n-1}\right) \delta_{A_{i j}, 0}\right]\]
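
Before moving on, a short simulation can corroborate \(\langle k\rangle=p(n-1)\): toss a biased coin for each of the \(n(n-1)/2\) possible edges, build the adjacency matrix, and average the degrees over a few samples. The snippet below is a minimal sketch of that sampling mechanism; the parameter values are arbitrary.

```python
# Sample adjacency matrices from the Erdős–Rényi ensemble by tossing a biased coin
# for each of the n(n-1)/2 possible edges, then compare the empirical mean degree
# with the prediction <k> = p(n - 1). Parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, p, samples = 200, 0.05, 50

mean_degrees = []
for _ in range(samples):
    A = np.zeros((n, n), dtype=int)
    iu = np.triu_indices(n, k=1)              # one coin toss per pair i < j
    A[iu] = rng.random(len(iu[0])) < p
    A = A + A.T                               # symmetrize: undirected, no self-loops
    mean_degrees.append(A.sum(axis=1).mean())

print(np.mean(mean_degrees), p * (n - 1))     # the two numbers should be close
```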

Network science is not the study of graphs in general (that would be graph theory!), as it usually operates in the sparse regime where \(\langle k\rangle\) stays finite while \(n\to\infty\). So if we view the degree \(k\) of a vertex in such a random graph as a discrete random variable, then we are justified in asking what the degree distribution \(p(k)\) is for graphs belonging to such ensembles; throughout we will work in the limit of a large number of vertices, \(n\to\infty\). If we treat \(p(k)\) as a marginal distribution, then we can find the degree distribution by marginalizing out the random matrix \(\mathbf{A}\) as follows

\[\LARGE p(k) =\sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} p(k,\mathbf{A})=\sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} p(k\mid\mathbf{A})p(\mathbf{A})=\sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} p(k\mid\mathbf{A})p(\mathbf{A}\mid \langle k\rangle)\]

Note that \(p(k\mid\mathbf{A})\) is a parameterized version of the degree distribution; to simplify it further we again employ the Kronecker delta notation. Then it is not difficult to see that the parameterized degree distribution can be written as follows in terms of the random adjacency matrix.

\[\LARGE p(k \mid \mathbf{A})=\frac{1}{n} \sum\limits_{i=1}^{n} \delta_{k, k_{i}(\mathbf{A})}=\frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ (k- k_{i}(\mathbf{A}))\varphi} \,d\varphi\] \[\LARGE =\frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ k\varphi} \mathrm{e}^{-\iota\ k_{i}(\mathbf{A})\varphi} \,d\varphi\]

But we already saw that \(k_{i}(\mathbf{A})=\sum\limits_{j}A_{i j}\) (the restriction \(j \neq i\) is immaterial here since the graph has no self-loops), so the above expression becomes

\[\LARGE =\frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{-\iota\ \varphi\sum\limits_{j}A_{i j}} \,d\varphi\]

Now we use this expression to continue our derivation of the degree distribution as follows

\[\LARGE p(k) =\sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} \left[\frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{-\iota\ \varphi\sum\limits_{j}A_{i j}} \,d\varphi\right] p(\mathbf{A}\mid \langle k\rangle)\] \[\LARGE =\sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} p(\mathbf{A}\mid \langle k\rangle) \mathrm{e}^{-\iota\ \varphi\sum\limits_{j}A_{i j}} \,d\varphi\]

Again by the property of the Kronecker delta, we can see that \(A_{i j} = \sum\limits_{\ell} A_{\ell j}\delta_{i\ell}\), so we have \(\sum\limits_{j}A_{i j} = \sum\limits_{j} \sum\limits_{\ell} A_{\ell j}\delta_{i\ell}\). If we restrict the summation to one triangle, i.e. to the pairs with \(j<\ell\), we can still recover the entire sum: the missing term \(A_{j \ell}\delta_{i j}\) equals \(A_{\ell j}\delta_{ij}\) by symmetry, so it suffices to change the summand from \(A_{\ell j}\delta_{i\ell}\) to \(A_{\ell j}(\delta_{i\ell}+\delta_{ij})\); and since \(j\neq\ell\) throughout this index set, no term gets double-counted. Performing this substitution we obtain

\[\LARGE p(k)=\sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} p(\mathbf{A}\mid \langle k\rangle) \mathrm{e}^{-\iota\ \varphi\sum\limits_{j<\ell} A_{\ell j}(\delta_{i\ell}+\delta_{ij}) } \,d\varphi\] \[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} p(\mathbf{A}\mid \langle k\rangle) \mathrm{e}^{-\iota\ \varphi\sum\limits_{j<\ell} A_{\ell j}(\delta_{i\ell}+\delta_{ij}) } \,d\varphi\] \[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} p(\mathbf{A}\mid \langle k\rangle) \prod\limits_{j<\ell}\mathrm{e}^{-\iota\ \varphi A_{\ell j}(\delta_{i\ell}+\delta_{ij}) } \,d\varphi\]

The next step must be evident: replace the parameterized distribution of the adjacency matrices with the full expression we derived earlier. So recall that

\[\LARGE p(\mathbf{A} \mid\langle k\rangle)=\prod\limits_{r < s}\left[\frac{\langle k\rangle}{n-1} \delta_{A_{r s}, 1}+\left(1-\frac{\langle k\rangle}{n-1}\right) \delta_{A_{r s}, 0}\right]\]

and we substitute it as follows

\[\LARGE p(k)= \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} \prod\limits_{r < s}\left[\frac{\langle k\rangle}{n-1} \delta_{A_{r s}, 1}+\left(1-\frac{\langle k\rangle}{n-1}\right) \delta_{A_{r s}, 0}\right] \prod\limits_{j < \ell}\mathrm{e}^{-\iota\ \varphi A_{\ell j}(\delta_{i\ell}+\delta_{ij}) } \,d\varphi\]

Let us exploit the symmetry of the adjacency matrix to tweak the term \(A_{\ell j}\) at the end of the above expression.

\[\LARGE =\frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} \prod\limits_{r < s}\left[\frac{\langle k\rangle}{n-1} \delta_{A_{r s}, 1}+\left(1-\frac{\langle k\rangle}{n-1}\right) \delta_{A_{r s}, 0}\right] \prod\limits_{j < \ell}\mathrm{e}^{-\iota\ \varphi A_{j\ell}(\delta_{i\ell}+\delta_{ij}) } \,d\varphi\]

Now the two products are over the same index set, so it does not take much ingenuity to first multiply the two multiplicands and then take its product over the same indices

\[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \sum\limits_{\mathbf{A} \in \mathcal{G}^{\ p}} \prod\limits_{j<\ell}\left[\frac{\langle k\rangle}{n-1} \delta_{A_{j \ell}, 1}+\left(1-\frac{\langle k\rangle}{n-1}\right) \delta_{A_{j \ell}, 0}\right] \mathrm{e}^{-\iota\ \varphi A_{j\ell}(\delta_{i\ell}+\delta_{ij}) } \,d\varphi\]

Now observe that interchanging the product and the sum restricts the summation to the single entry \(A_{j\ell}\), because the product operation that appears first fixes the values of \(j\) and \(\ell\).

\[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \prod\limits_{j<\ell}\sum\limits_{A_{j\ell}\in \{0,1\}}\left[\frac{\langle k\rangle}{n-1} \delta_{A_{j \ell}, 1}+\left(1-\frac{\langle k\rangle}{n-1}\right) \delta_{A_{j \ell}, 0}\right] \mathrm{e}^{-\iota\ \varphi A_{j\ell}(\delta_{i\ell}+\delta_{ij}) } \,d\varphi\] \[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \prod\limits_{j<\ell}\left[\frac{\langle k\rangle}{n-1} \mathrm{e}^{-\iota\ \varphi (\delta_{i\ell}+\delta_{ij}) } +\left(1-\frac{\langle k\rangle}{n-1}\right) \right] \,d\varphi\] \[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \prod\limits_{j<\ell}\left[1+ \frac{\langle k\rangle}{n-1} (\mathrm{e}^{-\iota\ \varphi (\delta_{i\ell}+\delta_{ij}) }-1) \right] \,d\varphi\]

Now let \(\Large x= \frac{\langle k\rangle}{n-1} (\mathrm{e}^{-\iota\ \varphi (\delta_{i\ell}+\delta_{ij}) }-1)\). Since \(n\rightarrow\infty\), then \(x\rightarrow 0\). Recall the Maclaurin series expansion for \(\ln(1+x)\) which gives \(\Large x-\frac{x^2}{2}+\frac{x^3}{3}-\cdots\). Now when \(x\rightarrow 0\), we can comfortably ignore all the higher order terms, so say \(\Large\ln(1+x)=x-\frac{x^2}{2}+\mathcal{O}(x^3)\) or \(\Large 1+x=\mathrm{e}^{x-\frac{x^2}{2}+\mathcal{O}(x^3)}\). For our particular choice of \(x\), it is easy to see that \(x^3=\mathcal{O}(n^{-3})\). Therefore,

\[\LARGE \prod\limits_{j<\ell}\left[1+ \frac{\langle k\rangle}{n-1} (\mathrm{e}^{-\iota\ \varphi (\delta_{i\ell}+\delta_{ij}) }-1) \right] = \prod\limits_{j<\ell}\mathrm{e}^{\frac{\langle k\rangle}{n-1} (\mathrm{e}^{-\iota\ \varphi (\delta_{i\ell}+\delta_{ij}) }-1)-\frac{1}{2}\frac{\langle k\rangle^2}{(n-1)^2} (\mathrm{e}^{-\iota\ \varphi (\delta_{i\ell}+\delta_{ij}) }-1)^2+\mathcal{O}(n^{-3})}\]

Noting that \((n-1)^{2} \approx n^{2}\) for large \(n\), we have

\[\LARGE =\prod\limits_{j<\ell}\mathrm{e}^{\frac{\langle k\rangle}{n-1} (\mathrm{e}^{-\iota\ \varphi (\delta_{i\ell}+\delta_{ij}) }-1)-\frac{1}{2}\frac{\langle k\rangle^2}{n^2} (\mathrm{e}^{-\iota\ \varphi (\delta_{i\ell}+\delta_{ij}) }-1)^2+\mathcal{O}(n^{-3})}\]

Now note that in the indicated index set, \(j\) cannot be equal to \(\ell\). This implies that there are only two possibilities for \(\delta_{i\ell}+\delta_{ij}\). It can be either zero or one. Hence, one can readily check that for such a situation, \(\mathrm{e}^{-\iota\ \varphi (\delta_{i\ell}+\delta_{ij}) }-1=(\delta_{i\ell}+\delta_{ij}) \left[\mathrm{e}^{-\iota\ \varphi }-1\right]\). Using this trick, we can simplify the above expression as follows

\[\LARGE =\prod\limits_{j<\ell}\mathrm{e}^{\frac{\langle k\rangle}{n-1} (\delta_{i\ell}+\delta_{ij}) \left[\mathrm{e}^{-\iota\ \varphi }-1\right]-\frac{1}{2}\frac{\langle k\rangle^2}{n^2} (\delta_{i\ell}+\delta_{ij})^2 \left[\mathrm{e}^{-\iota\ \varphi }-1\right]^2+\mathcal{O}(n^{-3})}\]

Now a Boolean variable squared is the same as the Boolean variable itself. Using this trick, we simplify the above expression further

\[\LARGE = \prod\limits_{j<\ell}\mathrm{e}^{\frac{\langle k\rangle}{n-1} (\delta_{i\ell}+\delta_{ij}) \left[\mathrm{e}^{-\iota\ \varphi }-1\right]-\frac{1}{2}\frac{\langle k\rangle^2}{n^2} (\delta_{i\ell}+\delta_{ij}) \left[\mathrm{e}^{-\iota\ \varphi }-1\right]^2+\mathcal{O}(n^{-3})}\] \[\LARGE = \prod\limits_{j<\ell}\mathrm{e}^{(\delta_{i\ell}+\delta_{ij}) \left[\frac{\langle k\rangle}{n-1} (\mathrm{e}^{-\iota\ \varphi }-1)-\frac{1}{2}\frac{\langle k\rangle^2}{n^2} (\mathrm{e}^{-\iota\ \varphi }-1)^2\right]+\mathcal{O}(n^{-3})}\] \[\LARGE = \mathrm{e}^{\sum\limits_{j<\ell}(\delta_{i\ell}+\delta_{ij}) \left[\frac{\langle k\rangle}{n-1} (\mathrm{e}^{-\iota\ \varphi }-1)-\frac{1}{2}\frac{\langle k\rangle^2}{n^2} (\mathrm{e}^{-\iota\ \varphi }-1)^2\right]+\mathcal{O}(n^{-3})}\]

Let us expand the double summation over the Kronecker delta functions so that a more simplified expression can emerge.

\[\LARGE \sum\limits_{j<\ell}(\delta_{i\ell}+\delta_{ij})=\sum\limits_{\ell}\sum\limits_{j<\ell}(\delta_{i\ell}+\delta_{ij})\]

The double summation tells us that the diagonal as well as the upper triangular matrix is to be excluded. If while excluding the diagonal elements we still decide to bring back the upper triangular matrix elements, we note that this will exactly double the original sum because of the symmetry of the expression with respect to \(j\) and \(\ell\). Thus, the above expression gets further simplified as follows

\[\LARGE =\frac{1}{2}\sum\limits_{\ell}\sum\limits_{j\neq\ell}(\delta_{i\ell}+\delta_{ij})=\frac{1}{2}\sum\limits_{\ell}\sum\limits_{j\neq\ell}\delta_{i\ell}+\frac{1}{2}\sum\limits_{\ell}\sum\limits_{j\neq\ell}\delta_{ij}\] \[\LARGE =\frac{1}{2}\sum\limits_{\ell}\delta_{i\ell}\sum\limits_{j\neq\ell}1+\frac{1}{2}\sum\limits_{\ell \neq i}1\] \[\LARGE =\frac{1}{2}(n-1)+\frac{1}{2}(n-1)=n-1\]

In the second term, the inner sum \(\sum_{j\neq\ell}\delta_{ij}\) equals \(1\) whenever \(\ell\neq i\) and vanishes when \(\ell = i\), hence the factor \(n-1\).

Making this substitution while noting that \(n-1 \approx n\) for large \(n\), we get

\[\LARGE p(k) = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{\sum\limits_{j<\ell}(\delta_{i\ell}+\delta_{ij}) \left[\frac{\langle k\rangle}{n-1} (\mathrm{e}^{-\iota\ \varphi }-1)-\frac{1}{2}\frac{\langle k\rangle^2}{n^2} (\mathrm{e}^{-\iota\ \varphi }-1)^2\right]+\mathcal{O}(n^{-3})} \,d\varphi\] \[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{n \left[\frac{\langle k\rangle}{n-1} (\mathrm{e}^{-\iota\ \varphi }-1)-\frac{1}{2}\frac{\langle k\rangle^2}{n^2} (\mathrm{e}^{-\iota\ \varphi }-1)^2\right]+\mathcal{O}(n^{-3})} \,d\varphi\] \[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{ \left[\langle k\rangle (\mathrm{e}^{-\iota\ \varphi }-1)-\frac{1}{2}\frac{\langle k\rangle^2}{n} (\mathrm{e}^{-\iota\ \varphi }-1)^2+\mathcal{O}(n^{-3})\right]} \,d\varphi\] \[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{\langle k\rangle (\mathrm{e}^{-\iota\ \varphi }-1)+\mathcal{O}(n^{-1})} \,d\varphi\] \[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{\langle k\rangle (\mathrm{e}^{-\iota\ \varphi }-1)} \,d\varphi\] \[\LARGE = \frac{1}{n} \sum\limits_{i=1}^{n} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{\langle k\rangle \mathrm{e}^{-\iota\ \varphi }} \mathrm{e}^{-\langle k\rangle} \,d\varphi\] \[\LARGE = \left[\frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{\langle k\rangle \mathrm{e}^{-\iota\ \varphi }} \mathrm{e}^{-\langle k\rangle} \,d\varphi\right] \frac{1}{n} \sum\limits_{i=1}^{n} 1\] \[\LARGE = \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{\langle k\rangle \mathrm{e}^{-\iota\ \varphi }} \mathrm{e}^{-\langle k\rangle} \,d\varphi\] \[\LARGE =\mathrm{e}^{-\langle k\rangle} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \sum\limits_{f \geq 0} \frac{\langle k\rangle^{f}}{f !} \mathrm{e}^{-\iota\ f \varphi} \,d\varphi\]

In the last step, we performed Taylor series expansion of \(\Large \mathrm{e}^{\langle k\rangle \mathrm{e}^{-\iota\ \varphi}}\) and now we can spot the integral representation of the Kronecker delta symbol lurking behind somewhere by uncluttering as follows

\[\LARGE =\mathrm{e}^{-\langle k\rangle} \sum\limits_{f \geq 0} \frac{\langle k\rangle^{f}}{f !} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ \varphi k} \mathrm{e}^{-\iota\ f \varphi} \,d\varphi\] \[\LARGE =\mathrm{e}^{-\langle k\rangle} \sum\limits_{f \geq 0} \frac{\langle k\rangle^{f}}{f !} \frac1{2\pi} \int_0^{2\pi} \mathrm{e}^{\iota\ (k-f) \varphi} \,d\varphi\] \[\LARGE =\mathrm{e}^{-\langle k\rangle} \sum\limits_{f \geq 0} \frac{\langle k\rangle^{f}}{f !} \delta_{kf}\] \[\LARGE =\mathrm{e}^{-\langle k\rangle} \frac{\langle k\rangle^{k}}{k !}\]

Now we can clearly recognize the Poisson distribution. Could we have arrived at this result in a simpler, intuitive style? One way would be to think of all \(n\) nodes in a space with nothing else; if we let \(n\) be large and \(p\) be small, as is mostly the case, then the law of rare events makes it easy to see why we should have expected the degree distribution to be Poissonian. But admittedly, intuitive arguments tend to be hand-wavy and often lack academic rigor. This only reaffirms the need for rigorous thinking to avoid irrational arguments, and it also helps us see why rational thinking requires such an expenditure of time and energy, which in turn renders it less frequent inside any plausible biological reality.
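
A short simulation also makes the Poissonian limit tangible: sample one large sparse Erdős–Rényi graph and compare its empirical degree histogram with \(\mathrm{e}^{-\langle k\rangle}\langle k\rangle^{k}/k!\). The sketch assumes networkx's fast_gnp_random_graph implements the \(\mathscr{G}(n,p)\) ensemble, and the parameter values are arbitrary.

```python
# Compare the empirical degree distribution of a large sparse Erdős–Rényi graph
# with the Poisson prediction e^{-<k>} <k>^k / k!. Parameter values are arbitrary.
import math
from collections import Counter
import networkx as nx

n, mean_k = 10_000, 4.0
G = nx.fast_gnp_random_graph(n, mean_k / (n - 1), seed=1)
degree_counts = Counter(d for _, d in G.degree())

for k in range(10):
    empirical = degree_counts.get(k, 0) / n
    poisson = math.exp(-mean_k) * mean_k**k / math.factorial(k)
    print(f"k={k}: empirical {empirical:.4f}  vs  Poisson {poisson:.4f}")
```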

Another interesting avenue for exploration is the plethora of generative network models, which is beyond the scope of the present article. The configuration model is the first example that comes to mind when trying to generate networks with desired properties. Generative models have limitless applications and can be used as tools to probe the most unrelated problems; Stephen Wolfram has an interesting idea to simulate the universe, which I mention later. Generative network models derive such power by dint of being so open-ended. Suppose, for example, that in an empty space node \(\#1\) pops out of nowhere. Then node \(\#2\) appears and performs a coin toss to decide whether it should be joined to node \(\#1\). Then node \(\#3\) appears, and the process continues like this indefinitely. Then, by the logic of infinity, almost every node views the process of receiving links from the arriving nodes identically, because almost every node is preceded and succeeded by infinitely many nodes. Yes, infinity is weird, but then it is also arguably unreal! Banach–Tarski paradox, anyone? Moreover, it is also possible to bring in the memorylessness property that appears in elementary queueing theory. And just like that, we may continue going in very unexpected directions.
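
As a small taste of this open-endedness, the configuration model mentioned above can be sketched in a few lines: prescribe a degree sequence and let the stubs be wired together at random. The sketch assumes networkx's configuration_model helper; the Poisson degree sequence is just one illustrative choice.

```python
# Minimal sketch of a generative model: feed a prescribed degree sequence to the
# configuration model and obtain a random (multi)graph realizing it.
import numpy as np
import networkx as nx

rng = np.random.default_rng(7)
degrees = rng.poisson(lam=3.0, size=1000)
if degrees.sum() % 2:                 # the total number of edge stubs must be even
    degrees[0] += 1

G = nx.configuration_model(degrees.tolist(), seed=7)
G = nx.Graph(G)                       # collapse parallel edges for a simple-graph view
G.remove_edges_from(list(nx.selfloop_edges(G)))
print(G.number_of_nodes(), G.number_of_edges())
```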

Epiloguoid

Finally, having only sampled some preliminary theories, we already get the sense that graph theory is more like mathematical novel writing, whereas the network science approach is reminiscent of statistical mechanics; anyone who has done an introductory course in statmech will find this obvious. It is then not difficult to see that graph theory is trying to do mathematics whereas network science is trying to do physics. Most importantly, everything in graph theory is extremely precise and meticulous, and it even appears self-referential in many places once understood and assimilated well. Its vast landscape is invariably imbued with a sense of cold formal logic, carrying all the hallmarks of abstract conceptual mathematics. Much of this is at least partly missing in network science, where approximations abound, as they must, because network science is trying to reconcile probability and graph theory with real-world phenomena. So even though it does not appear so at first glance, network science is much more difficult than graph theory in terms of the ambitiousness of its end goal.

Although we invested heavily in discussing the Erdős–Rényi random graph model, here comes the bad news. Most real-world degree distributions are fat-tailed, meaning their tails are fatter than that of the normal distribution (Wikipedia endorses the term \(``\)heavy-tailed\("\) but at this point we are just quibbling over semantics). Though the log-normal distribution is considered the archetype of heavy-tailed probability distributions, it turns out that yet another distribution steals much of the theoreticians' attention. To be more precise, folklore wisdom suggests that most real-world degree distributions are scale-free (though it must be noted that this point is still heavily debated). The Pareto distribution, which symbolizes such scale-free networks, is the prime example of a long-tailed distribution.

A preferential-attachment-based generative process, which formalizes the \(``\)rich get richer\("\) phenomenon, has been shown to produce scale-free networks. This is where we get introduced to the Barabási–Albert network model. This model has received a lot of attention in the research community and may be regarded as single-handedly establishing the new field of network science, even though it is clear that the foundations lie elsewhere; at first glance, they appear to lie in the physics of spin glasses, or maybe even in the nonlinear-dynamics-based theory given by Yoshiki Kuramoto.
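
For concreteness, here is a small sketch that grows a Barabási–Albert graph with networkx and contrasts its maximum degree with that of an Erdős–Rényi graph of comparable size and density; the much larger hubs are a crude but telling signature of the fat tail. Parameter values are arbitrary.

```python
# Preferential attachment versus pure randomness: compare the largest hub of a
# Barabási–Albert graph with that of an Erdős–Rényi graph of similar density.
import networkx as nx

n, m = 10_000, 3
ba = nx.barabasi_albert_graph(n, m, seed=0)

avg_k = 2 * ba.number_of_edges() / n
er = nx.fast_gnp_random_graph(n, avg_k / (n - 1), seed=0)

print("BA max degree:", max(d for _, d in ba.degree()))   # typically a few hundred
print("ER max degree:", max(d for _, d in er.degree()))   # typically below about 20
```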

Against the backdrop of all this, it is clear that the Erdős–Rényi model becomes redundant pretty fast, but even then there is another random graph model, the Watts–Strogatz model, that is still being heavily investigated and with solid reason. Six degrees of separation is the classic example of the small-world phenomenon, and such phenomena can be modeled using small-world networks. The point to note is that neither scale-free nor small-world behavior has the last word on explaining complexity as a network phenomenon.
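
A similar sketch shows the small-world signature of the Watts–Strogatz model directly: even a little rewiring collapses the average shortest-path length while the clustering coefficient stays high. It assumes networkx's connected_watts_strogatz_graph helper, and the parameters are again illustrative.

```python
# The small-world signature of the Watts–Strogatz model: a little random rewiring
# shrinks the average shortest-path length drastically while clustering stays high.
import networkx as nx

n, k = 1000, 10
for p_rewire in (0.0, 0.01, 0.1):
    G = nx.connected_watts_strogatz_graph(n, k, p_rewire, seed=0)
    print(p_rewire,
          round(nx.average_shortest_path_length(G), 2),
          round(nx.average_clustering(G), 3))
```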

Complexity studies aim for much broader and far-reaching questions. How do properties emerge in a system that cannot be explained as the sum of its parts? How can local interactions spontaneously give rise to global order that is also sustained over a significant period of time? How can systems be attracted towards a critical point purely on the basis of simple interaction rules, and that too over a wide range of possible parameter values? How can standard diffusion processes and generative network models cause a sudden transition that is significantly perceivable at some macrolevel? How can interactions lead all the interacting systems into getting synchronized to each other in different ways? How can a whole host of inputs to a complex system be managed efficiently to produce a rich variety of outputs using a far smaller number of controlling protocols; for example, are a mere \(\sim 1,500\) transcription factors able to play a bow-tie role in regulating the \(20,000-25,000\) genes of the human genome, and if so, how? To sum it all up: for complexity studies, network science is a tool, and the future of Complexity and Network Science (CNS) looks quite promising. It is difficult to give a clear-cut definition of a complex system. A makeshift reference point would be the human body: only systems sharing some indubitable characteristics with the human body should be called complex systems. But then, would we call a spin glass a complex system? We could, but it would be more appropriate to call such things networked systems showing a significant degree of complexity, a measure that must be defined by keeping the human body as a reference.

Network science is essentially adopting a physicist's mindset to analyze complex adaptive systems. The network approach took off in the STEM community only after its success in condensed matter physics with the Ising models, so its origins in physics cannot be ignored even when it is taken up by researchers and practitioners in more applied engineering and technology domains. In fact, the physics connection may be an answer to the physics envy of anyone trying to get into this field. The power of network science is also being harnessed particularly well by the neuroscience community to understand how the brain works. Network Neuroscience is a new MIT Press open access journal, and the work it has published in just the last five years suggests that keys to understanding different aspects of the connectome may be within reach. But these are early days!

Approaching the end, it is apt to take brief note of the works of three towering individuals. Stephen Grossberg made seminal contributions5 to understanding the brain and mind. Grossberg modeled the brain as a network of neurons linked by the white matter, and then went on to develop quite sophisticated models to explain questions in cognitive science with the help of experimental data. There was a 2010 paper involving this year's Nobel laureate Giorgio Parisi in which the authors tried to reason out how scale-free network effects explain starling murmurations. Parisi has also pointed out elsewhere that non-equilibrium thermodynamics plays a strong role in understanding complex systems, a direction many theoretical biologists today vigorously pursue. Could that which Parisi views as complex systems (spin glasses, fragile glasses, colloids, granular materials) also be used to study society and large-scale public systems? Stephen Wolfram, for his part, is planning to apply the cellular automaton approach6 to understanding the entire physics of the universe. What a grand goal! Will it be possible to start with a finite number of space and time quanta interacting with each other (add a dash of energy quanta, whatever that may be) and create all the complexity that we know to have emerged several billion years later? If Team Wolfram succeeds in doing this, then the biggest physics revolution will have taken place. One wonders what will happen if the visions of these three Greats reach their climax (the E-L summit if you will). It is indeed an exciting time in the world of science right now! May netsci remain the gift to the STEM community that keeps on giving, and may this STEM-tree never run out of low-hanging fruits! The only worthwhile caution to be offered here is that just because the golden spiral started appearing everywhere, thinking about it did not produce the whole story; it turned out to be shadows in Plato's cave. As these popular sci-fi lines go, \(``\)the truth of our world is that it cannot be easily summed up with mathematics since there is no simple pattern. It is complex and chaotic! If one wants to find a mathematical pattern in the world he or she will be able to pull it out of anywhere. When your mind becomes obsessed with anything it will filter everything else out and find examples of that thing everywhere; you will find it everywhere in nature (similar thing has happened with Poissonian and Gaussian distributions). But as soon as you discard scientific rigor you are no longer a mathematician.\("\) This may not necessarily hold for network science, but there is no harm in pointing it out. Let us end by quoting Buckminster Fuller's words from his outdated yet amazingly insightful book \(``\)Critical Path.\("\)

\(``\)Stars are trying to tell humanity to awake and prosper and consciously assume the important cosmic responsibilities for which it was designed. Since realization and fulfillment of that responsibility involve evolutionary discovery by humanity of cosmic stature of mind and the inconsequentiality of its muscle, planting of humans on Earth may not bear fruit. So poor is the probability of self-discovery by humans of the infinite potential of the mind and the relative triviality of human muscle power (which is not even as capable as a grasshopper's) Great Nature must have planted a myriad of human-function-equivalent seedlings on a myriad of planets.\("\)

References

  1. Jay Yellen, Jonathan L. Gross, and Mark Anderson. Graph Theory and Its Applications. Chapman and Hall/CRC, 2018.

  2. Douglas West. Introduction to Graph Theory. Prentice Hall, 2001.

  3. Henri Poincaré. Science and Hypothesis. Science Press, 1905.

  4. Ton Coolen, Alessia Annibale, and Ekaterina Roberts. Generating Random Networks and Graphs. Oxford University Press, 2017.

  5. For example, one can sample this and this.

  6. For example, one can sample his three (and hopefully more) appearances in Lex Fridman's podcast. Wolfram (and quite possibly Strogatz) has to be to young STEM enthusiasts today what Steve Jobs and Bill Gates were to IT geeks back then.