The code is written in plain C and should compile under Linux with gcc:
gcc ntangle.c -o ntangle
The program expects four command-line arguments:
./ntangle file1 file2 runs max_n
1) file1 and file2 are the names of the two files that contain the two networks to be compared. These must be ASCII files whose first line is a single number N, the number of nodes in the network. Each remaining line contains one link of the network in the form A B, where A and B are the two nodes joined by that link. Note that this code only works with undirected networks, so if the file contains A B it must not also contain B A. The nodes must be numbered consecutively from 0 to N-1, and the network must form one connected cluster.
2) The argument runs sets how many samples are considered for each distribution. A value of 10000 generally gives very good results, although the appropriate value also depends on the network size.
3) The final argument, max_n, is the maximum subgraph size to compare. The sizes are spaced in logarithmic fashion starting from n=3, i.e. if max_n=40, then the KS index will be calculated for n=3,4,5,6,7,8,9,10,20,30,40. If max_n is very large, e.g. max_n>10000, the program may become slow, but this again depends on the network size and density.
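The file format described in item 1) can be checked before starting a long run. The following is a minimal sketch of such a check, not part of ntangle.c: the function name check_network_file and the NET_* error codes are hypothetical, and the way ntangle itself reacts to a malformed file is not documented here. The sketch verifies that node labels lie in 0..N-1, that no link appears twice (in either order), and that the network is one connected cluster.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch (not defined by ntangle.c). */
enum { NET_OK = 0, NET_IO, NET_HEADER, NET_RANGE, NET_DUPLICATE, NET_DISCONNECTED };

typedef struct { int a, b; } Edge;

/* Order links by first node, then by second node. */
static int cmp_edge(const void *p, const void *q)
{
    const Edge *x = p, *y = q;
    if (x->a != y->a) return (x->a > y->a) - (x->a < y->a);
    return (x->b > y->b) - (x->b < y->b);
}

/* Union-find root lookup with path halving. */
static int find_root(int *parent, int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Returns NET_OK if the file follows the format, otherwise an error code.
   Reading stops at end of file or at the first token that does not parse
   as a pair of integers. */
int check_network_file(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (!fp) return NET_IO;

    int n, a, b, status = NET_OK;
    if (fscanf(fp, "%d", &n) != 1 || n < 1) { fclose(fp); return NET_HEADER; }

    size_t cap = 1024, m = 0;
    Edge *edges = malloc(cap * sizeof *edges);
    int *parent = malloc((size_t)n * sizeof *parent);
    if (!edges || !parent) { free(edges); free(parent); fclose(fp); return NET_IO; }
    for (int i = 0; i < n; i++) parent[i] = i;
    int components = n;   /* every node starts as its own cluster */

    while (fscanf(fp, "%d %d", &a, &b) == 2) {
        if (a < 0 || a >= n || b < 0 || b >= n) { status = NET_RANGE; break; }
        if (m == cap) {   /* grow the link array */
            Edge *tmp = realloc(edges, 2 * cap * sizeof *edges);
            if (!tmp) { status = NET_IO; break; }
            edges = tmp;
            cap *= 2;
        }
        /* Store each link as (smaller, larger) so A B and B A compare equal. */
        edges[m].a = a < b ? a : b;
        edges[m].b = a < b ? b : a;
        m++;
        int ra = find_root(parent, a), rb = find_root(parent, b);
        if (ra != rb) { parent[ra] = rb; components--; }
    }
    fclose(fp);

    if (status == NET_OK) {
        /* After sorting, a repeated link shows up as two equal neighbours. */
        qsort(edges, m, sizeof *edges, cmp_edge);
        for (size_t i = 1; i < m; i++)
            if (cmp_edge(&edges[i - 1], &edges[i]) == 0) { status = NET_DUPLICATE; break; }
    }
    if (status == NET_OK && components != 1) status = NET_DISCONNECTED;

    free(edges);
    free(parent);
    return status;
}
```

Calling check_network_file("file1") from a small main and printing the returned code is enough to catch the most common formatting mistakes. Links from a node to itself are not flagged, since the format description above does not say whether they are allowed.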
The results are written to the standard output, typically the screen. To redirect the output to a file, use:
./ntangle file1 file2 10000 100 > results.dat
The results have two columns, n and KS(n). The first column, n, is the subgraph size, and the second column shows the KS index of the two networks for this subgraph size (where KS=0 indicates identical networks and KS=1 completely different networks).
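To make the meaning of the KS column concrete, the sketch below computes a standard two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the empirical cumulative distributions of two samples. This assumes the KS index reported by the program is this standard statistic; the exact quantity sampled for each network and the implementation inside ntangle.c may differ. The function name ks_statistic is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of doubles. */
static int cmp_double(const void *p, const void *q)
{
    double x = *(const double *)p, y = *(const double *)q;
    return (x > y) - (x < y);
}

/* Two-sample KS statistic: the maximum absolute difference between the
   empirical CDFs of samples a and b. Both arrays are sorted in place.
   Hypothetical helper for illustration, not taken from ntangle.c. */
double ks_statistic(double *a, size_t na, double *b, size_t nb)
{
    assert(na > 0 && nb > 0);
    qsort(a, na, sizeof *a, cmp_double);
    qsort(b, nb, sizeof *b, cmp_double);

    size_t i = 0, j = 0;
    double d = 0.0;
    while (i < na && j < nb) {
        /* Next distinct value in the merged sample. */
        double x = (a[i] <= b[j]) ? a[i] : b[j];
        /* Step both CDFs past every entry equal to x (handles ties). */
        while (i < na && a[i] <= x) i++;
        while (j < nb && b[j] <= x) j++;
        double diff = (double)i / na - (double)j / nb;
        if (diff < 0) diff = -diff;
        if (diff > d) d = diff;
    }
    /* Once one sample is exhausted the gap can only shrink, so d is final. */
    return d;
}
```

With this definition, two identical samples give 0 and two samples with no overlap in their ranges give 1, which matches the interpretation of the two limiting values given above.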
For any questions, please email me at lgallos@gmail.com, and if you use the code please cite:
L.K. Gallos and N.H. Fefferman, "Revealing effective classifiers through network comparison", EPL 108, 38001 (2014).