.\" generated with Ronn/v0.7.3
.\" http://github.com/rtomayko/ronn/tree/0.7.3
.
.TH "FITMLE" "1" "September 2017" "www.complex-networks.net" "www.complex-networks.net"
.
.SH "NAME"
\fBfitmle\fR \- Fit a set of values with a power\-law distribution
.
.SH "SYNOPSIS"
\fBfitmle\fR \fIdata_in\fR [\fItol\fR [TEST [\fInum_test\fR]]]
.
.SH "DESCRIPTION"
\fBfitmle\fR fits the data points contained in the file \fIdata_in\fR with a power\-law function P(k) ~ k^(\-gamma), using the Maximum\-Likelihood Estimator (MLE)\. In particular, \fBfitmle\fR finds the exponent \fBgamma\fR and the smallest of the input values from which the power\-law behaviour holds\. The second (optional) argument \fItol\fR sets the acceptable statistical error on the estimate of the exponent\.
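.
.P
For a fixed lower cutoff, the maximum\-likelihood estimate of the exponent has a closed form in the continuous case (see Clauset et al\. in REFERENCES)\. The following is only a minimal sketch under that assumption: the function name \fBmle_exponent\fR is hypothetical, it does not search for \fBk_min\fR, it does not use \fItol\fR, and it does not reproduce the discrete fit that \fBfitmle\fR may select for integer data\.
.
.IP "" 4
.
.nf

```python
import math

def mle_exponent(values, k_min):
    """Continuous power-law MLE for a fixed k_min (hypothetical helper).

    Returns (gamma, sigma), where sigma is the leading-order
    standard error (gamma - 1) / sqrt(n) on the estimate.
    """
    # Only the values in the tail (>= k_min) enter the estimate.
    tail = [v for v in values if v >= k_min]
    n = len(tail)
    log_sum = sum(math.log(v / k_min) for v in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("need at least one value strictly above k_min")
    gamma = 1.0 + n / log_sum
    sigma = (gamma - 1.0) / math.sqrt(n)
    return gamma, sigma
```
.
.fi
.
.IP "" 0
.
.P
The statistical error shrinks as more values fall in the tail, so a larger \fBk_min\fR generally gives a larger error on \fBgamma\fR\.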
.
.P
If \fBTEST\fR is provided, the program associates a p\-value with the goodness of the fit, based on the Kolmogorov\-Smirnov statistic computed on \fInum_test\fR distributions sampled from the theoretical power\-law function\. If \fInum_test\fR is not provided, the test is based on 100 sampled distributions\.
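.
.P
The p\-value can be read as the fraction of synthetic samples whose Kolmogorov\-Smirnov statistic is larger than that of the best fit\. A minimal sketch of that last step, assuming the statistics of the synthetic samples have already been computed (the name \fBks_p_value\fR is hypothetical and is not part of \fBfitmle\fR):
.
.IP "" 4
.
.nf

```python
def ks_p_value(ks_observed, ks_synthetic):
    """Fraction of synthetic KS statistics larger than the observed one.

    Hypothetical helper: ks_synthetic holds one KS value per
    synthetic sample (num_test values in total).
    """
    if not ks_synthetic:
        raise ValueError("need at least one synthetic sample")
    larger = sum(1 for ks in ks_synthetic if ks > ks_observed)
    return larger / len(ks_synthetic)
```
.
.fi
.
.IP "" 0
.
.P
With 100 synthetic samples the p\-value can only take values that are multiples of 0\.01, so a larger \fInum_test\fR gives a finer estimate\.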
.
.SH "PARAMETERS"
.
.TP
\fIdata_in\fR
Set of values to fit\. If \fIdata_in\fR is equal to \fB\-\fR (dash), the set is read from STDIN\.
.
.TP
\fItol\fR
The acceptable statistical error on the estimation of the exponent\. If omitted, it is set to 0\.1\.
.
.TP
TEST
If the third parameter is \fBTEST\fR, the program computes an estimate of the p\-value associated with the best fit, based on \fInum_test\fR synthetic samples of the same size as the input set\.
.
.TP
\fInum_test\fR
Number of synthetic samples to use for the estimation of the p\-value of the best fit\.
.
.SH "OUTPUT"
If \fBfitmle\fR is given fewer than three parameters (i\.e\., if \fBTEST\fR is not specified), the output is a line in the format:
.
.IP "" 4
.
.nf

    gamma k_min ks
.
.fi
.
.IP "" 0
.
.P
where \fBgamma\fR is the estimate of the exponent, \fBk_min\fR is the smallest of the input values for which the power\-law behaviour holds, and \fBks\fR is the value of the Kolmogorov\-Smirnov statistic of the best fit\.
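.
.P
The Kolmogorov\-Smirnov statistic is the largest distance between the empirical cumulative distribution of the tail and that of the fitted model\. A minimal sketch for the continuous case, where the model distribution is 1 \- (k/k_min)^(1\-gamma); the name \fBks_statistic\fR is hypothetical, and the discrete procedure that \fBfitmle\fR may use is not covered:
.
.IP "" 4
.
.nf

```python
def ks_statistic(values, k_min, gamma):
    """KS distance between the tail of values and a continuous power law.

    Hypothetical helper, continuous case only.
    """
    tail = sorted(v for v in values if v >= k_min)
    n = len(tail)
    if n == 0:
        raise ValueError("no values at or above k_min")
    d = 0.0
    for i, v in enumerate(tail):
        # Model CDF of a continuous power law with lower cutoff k_min.
        model = 1.0 - (v / k_min) ** (1.0 - gamma)
        # Compare against the empirical CDF just after and just before v.
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d
```
.
.fi
.
.IP "" 0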
.
.P
If \fBTEST\fR is specified, the output line also contains the estimate of the p\-value of the best fit:
.
.IP "" 4
.
.nf

    gamma k_min ks p\-value
.
.fi
.
.IP "" 0
.
.P
where \fBp\-value\fR is based on \fInum_test\fR samples (or just 100, if \fInum_test\fR is not specified) of the same size as the input, obtained from the theoretical power\-law function computed as a best fit\.
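.
.P
Since the output is a single whitespace\-separated line with three or four fields, it is easy to post\-process\. A minimal sketch of a parser for a line that has already been captured from STDOUT (the name \fBparse_fitmle_line\fR and the dictionary keys are hypothetical choices, not part of \fBfitmle\fR):
.
.IP "" 4
.
.nf

```python
def parse_fitmle_line(line):
    """Split a 'gamma k_min ks [p-value]' line into named fields.

    Hypothetical helper; the keys are chosen for illustration only.
    """
    fields = line.split()
    if len(fields) not in (3, 4):
        raise ValueError("expected 3 or 4 fields, got %d" % len(fields))
    result = {
        "gamma": float(fields[0]),
        "k_min": float(fields[1]),
        "ks": float(fields[2]),
    }
    # The fourth field is present only when TEST was requested.
    if len(fields) == 4:
        result["p_value"] = float(fields[3])
    return result
```
.
.fi
.
.IP "" 0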
.
.SH "EXAMPLES"
Let us assume that the file \fBAS\-20010316\.net_degs\fR contains the degree sequence of the data set \fBAS\-20010316\.net\fR (the graph of the Internet at the AS level in March 2001)\. The exponent of the best\-fit power\-law distribution can be obtained by using:
.
.IP "" 4
.
.nf

    $ fitmle AS\-20010316\.net_degs
    Using discrete fit
    2\.06165 6 0\.031626
    $
.
.fi
.
.IP "" 0
.
.P
where \fB2\.06165\fR is the estimated value of the exponent \fBgamma\fR, \fB6\fR is the minimum degree value for which the power\-law behaviour holds, and \fB0\.031626\fR is the value of the Kolmogorov\-Smirnov statistic of the best fit\. The program also reports that it used the discrete fitting procedure, since all the values in \fBAS\-20010316\.net_degs\fR are integers\. This message is printed to STDERR\.
.
.P
It is possible to compute the p\-value of the estimate by running:
.
.IP "" 4
.
.nf

    $ fitmle AS\-20010316\.net_degs 0\.1 TEST
    Using discrete fit
    2\.06165 6 0\.031626 0\.17
    $
.
.fi
.
.IP "" 0
.
.P
which provides a p\-value equal to 0\.17, meaning that 17% of the synthetic samples showed a value of the KS statistic larger than that of the best fit\. The estimate of the p\-value here is based on 100 synthetic samples, since \fInum_test\fR was not provided\. If we allow a slightly larger statistical error on the estimate of the exponent \fBgamma\fR, we obtain different values of \fBgamma\fR and \fBk_min\fR, and a much higher p\-value:
.
.IP "" 4
.
.nf

    $ fitmle AS\-20010316\.net_degs 0\.15 TEST 1000
    Using discrete fit
    2\.0585 19 0\.0253754 0\.924
    $
.
.fi
.
.IP "" 0
.
.P
Notice that in this case, the p\-value of the estimate is equal to 0\.924, and is based on 1000 synthetic samples\.
.
.SH "SEE ALSO"
deg_seq(1), power_law(1)
.
.SH "REFERENCES"
.
.IP "\(bu" 4
A\. Clauset, C\. R\. Shalizi, and M\. E\. J\. Newman\. "Power\-law distributions in empirical data"\. SIAM Rev\. 51 (2009), 661\-703\.
.
.IP "\(bu" 4
V\. Latora, V\. Nicosia, and G\. Russo\. "Complex Networks: Principles, Methods and Applications", Chapter 5\. Cambridge University Press (2017)\.
.
.IP "" 0
.
.SH "AUTHORS"
(c) Vincenzo \'KatolaZ\' Nicosia 2009\-2017 \fB<v\.nicosia@qmul\.ac\.uk>\fR\.