The n-tangle method. (A) We randomly sample connected induced subgraphs of n nodes and calculate their normalized link density. (B) We construct the n-tangle histogram for a given value of n (the example shows the 8-tangle distribution for 4 animal social networks). (C) We calculate the distance between any two distributions I and J through, e.g., a Kolmogorov-Smirnov statistic D. These D values are used as the distance between the original networks and can be mapped to a minimum spanning tree (shown here for the four networks), a hierarchical tree, or a threshold-based network. (D) Variation of the distance between these 4 networks as a function of the subgraph size n.
To overcome this problem, we recently proposed the n-tangle (Topological Analysis of Network subGraph Link/Edge) density method. In practice, we relax the requirement of the motif/graphlet exact matching pattern. We still consider all possible induced subgraphs with n nodes, but instead of matching the pattern we calculate the link density in the subgraph. Consider for example an induced subgraph of 100 nodes. All patterns that include e.g. 1000 links are classified by n-tangle as being the same. In general, and for most practical applications, it makes little difference how these 1000 links are distributed in the 100 nodes, for a large number of properties. However, now we have one numerical index to characterize the structure of n nodes (n-tangle density).
To directly compare the structure of two networks, we calculate the distribution of n-tangle density in each structure. This distribution represents the fingerprint of the structure at this scale. To determine the degree of similarity we need to calculate this distribution for each of the network and compare these fingerprints, e.g. through a Kolmogorov-Smirnov index D(n). If the distributions are the same, then this index is D(n)=0. On the contrary if the distributions have no overlap, then D(n)=1, and intermediate values of D(n) determine the degree of network similarity. We can then calculate this index for different values of n, and find how the scale of observation influences the similarity. Two networks may be similar at small scales but different at larger scales, which means that, for instance, diffusion at short scales would look similar but we would start see deviations as the scale increases.
Notice that this measure is independent of the network size, as long as n remains much smaller than the network size. Therefore, we can compare two networks without the requirement of having the same or even similar size.
Networks of similar origin are detected to be closer to each other by the n-tangle method.
In our paper Revealing effective classifiers through network comparison we found that the n-tangle method works very well in all cases that we studied. The method nicely separated networks according to their origin, so that two networks representing metabolic networks were much closer to each other rather than with facebook friendship networks. More intriguingly, within the same family of networks we were able to discover consistent trends, providing external or internal factors that can be further evaluated on their importance. For example, we found that University friendship networks were much closer to each other if the two Universities were of similar size, compared to Universities of different size.