\documentclass[reprint,onecolumn,aip,byrevtex,showpacs]{revtex4-1} |
||||
%\documentclass[aps,pre,onecolumn,byrevtex,footinbib,longbibliography,bibnotes,superscriptaddress,showpacs]{revtex4-1} |
||||
%\documentclass[11pt,aps,pre,a4paper,byrevtex,showpacs,showkeys,longbibliography,notitlepage,nofootinbib]{revtex4-1} |
||||
|
||||
\usepackage{booktabs} |
||||
\usepackage{boldline} |
||||
\usepackage{caption} |
||||
\usepackage{grffile} |
||||
\usepackage{verbatim} |
||||
\usepackage{microtype} |
||||
\usepackage{multirow} |
||||
%\usepackage{listings} |
||||
\usepackage{enumitem} |
||||
\usepackage{amsmath} |
||||
\usepackage{amssymb} |
||||
\usepackage{amsthm} |
||||
\usepackage{mathtools} |
||||
\usepackage{graphicx} |
||||
\usepackage{subfig} |
||||
%\usepackage{times}\let\Bbbk\relax |
||||
%\usepackage{mtpro2} |
||||
\usepackage{color} |
||||
\definecolor{myblue}{rgb}{0.153,0.322,0.706} |
||||
\usepackage[colorlinks,linkcolor=myblue,urlcolor=myblue,citecolor=myblue]{hyperref} |
||||
\usepackage{geometry} |
||||
%\geometry{a4paper, total={170mm,257mm}, left=20mm, top=10mm, right=20mm, bottom=10mm } |
||||
\usepackage[export]{adjustbox} |
||||
\usepackage{lmodern}% http://ctan.org/pkg/lm |
||||
|
||||
|
||||
\setlength{\parskip}{0pt} |
||||
\renewcommand{\baselinestretch}{1.0} |
||||
|
||||
\newcommand{\be}{\begin{equation}} |
||||
\newcommand{\ee}{\end{equation}} |
||||
\newcommand{\ra}{\rightarrow} |
||||
\newcommand{\mD}{\mathcal{D}} |
||||
\newcommand{\mL}{\mathcal{L}} |
||||
\newcommand{\om}{\omega} |
||||
%\newcommand{\ave}[1]{E[#1]} |
||||
\newcommand{\ave}[1]{\langle #1\rangle} |
||||
\newcommand{\reals}{\mathbb{R}} |
||||
\newcommand{\ER}{Erd\"os-R\'enyi~} |
||||
\def\bc{\begin{center}} |
||||
\def\ec{\end{center}} |
||||
\def\bea{\begin{eqnarray}} |
||||
\def\eea{\end{eqnarray}} |
||||
\newcommand{\avg}[1]{\langle{#1}\rangle} |
||||
\newcommand{\Avg}[1]{\left\langle{#1}\right\rangle} |
||||
\newcommand{\ars}[1]{\renewcommand{\arraystretch}{#1}} |
||||
\newcommand{\latinword}[1]{\textsf{\itshape #1}}% |
||||
|
||||
\graphicspath{{C:/Users/fcoghi/Documents/LTCC/Algorithms on graphs/Project/Results latex/Plots/}} |
||||
\captionsetup{font=scriptsize,labelfont=scriptsize} |
||||
|
||||
|
||||
|
||||
% RevTeX 4.1: * in front of citation keys to merge them |
||||
|
||||
\begin{document} |
||||
|
||||
\title{Small subgraphs: node clustering coefficient and reciprocated arcs} |
||||
|
||||
\author{Francesco Coghi} |
||||
\email{f.coghi@qmul.ac.uk} |
||||
\affiliation{School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, UK} |
||||
|
||||
\date{\today} |
||||
|
||||
\begin{abstract} |
||||
|
||||
\end{abstract} |
||||
|
||||
|
||||
\maketitle |
||||
|
||||
\tableofcontents |
||||
|
||||
\newpage |
||||
|
||||
\section{Node clustering coefficient on undirected random regular graphs} |
||||
|
||||
\subsection{Background} |
||||
|
||||
Here we focus on the presence of triangles in undirected random regular graphs $G = (V,E)$. There are two alternative measures to quantify the presence of triangles in a graph (cite book Vito Vincenzo).
||||
|
||||
The first measure is called \textbf{graph transitivity} $T$ and it is defined as follows: |
||||
|
||||
\be |
||||
T = \frac{3(\text{number of triangles})}{(\text{number of triads})} = \frac{3 n_{\bigtriangleup}}{n_{\wedge}} |
||||
\ee |
||||
|
||||
where $n_{\bigtriangleup}$ is the number of triangles and $n_{\wedge}$ is the number of triads (trees of three nodes) in the graph. The factor 3 accounts for the fact that each triangle contains three triads.
||||
|
||||
The second measure is the \textbf{clustering coefficient} and it averages local information coming from the node clustering coefficients |
||||
|
||||
\be |
||||
c_{i} = \left\{ |
||||
\begin{array}{@{}ll@{}} |
||||
\frac{K[G_i]}{{k_i \choose 2}} & \text{for}\ k_i \geq 2 \\ |
||||
0 & \text{for}\ k_i = 0,1 |
||||
\end{array}\right. |
||||
\label{for:nod_clust_coeff} |
||||
\ee |
||||
|
||||
where $K[G_i]$ is the number of links in $G_i$, the subgraph induced by the neighbours of $i$, and ${k_i \choose 2}$ is its theoretical maximum. The graph clustering coefficient $C$ is then obtained by averaging over all the nodes:
||||
|
||||
\be |
||||
C = \left\langle c \right\rangle = \frac{1}{N} \sum_{i \in V} c_i . |
||||
\label{for:clust_coeff} |
||||
\ee |
||||
|
||||
$C$ and $T$ are similar measures in principle; nevertheless, on several graphs they can give quite different results. This is due to the fact that
||||
|
||||
\be |
||||
T = \frac{\sum_i n_i^{\wedge} c_i}{\sum_i n_i^{\wedge}} , |
||||
\ee |
||||
|
||||
and so, in words, while the graph clustering coefficient $C$ is the arithmetic average of the clustering coefficients of all the nodes, the transitivity $T$ is the average of the clustering coefficients weighted by the number of triads $n_i^{\wedge} = {k_i \choose 2}$ centred on each node $i$.
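
As a small worked example (an illustration added here, not taken from the simulations), consider a triangle on nodes $1,2,3$ with one extra node $4$ attached to node $1$. The degrees are $k_1=3$, $k_2=k_3=2$, $k_4=1$, so the numbers of triads centred on each node are $n_1^{\wedge}=3$, $n_2^{\wedge}=n_3^{\wedge}=1$, $n_4^{\wedge}=0$, and the node clustering coefficients are $c_1 = \frac{1}{3}$, $c_2=c_3=1$, $c_4=0$. Then

\begin{equation}
\begin{split}
C &= \frac{1}{4}\left(\frac{1}{3} + 1 + 1 + 0\right) = \frac{7}{12} \approx 0.58 , \\
T &= \frac{3 \cdot 1}{5} = \frac{3 \cdot \frac{1}{3} + 1 \cdot 1 + 1 \cdot 1 + 0 \cdot 0}{3+1+1+0} = \frac{3}{5} = 0.6 ,
\end{split}
\end{equation}

so the two measures already differ on this four-node graph.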
||||
|
||||
In the following we will focus on the clustering coefficient $C$. |
||||
|
||||
\subsection{Calculation of the clustering coefficient for some graphs} |
||||
|
||||
\subsubsection{\ER random graphs} |
||||
|
||||
Let us consider a graph drawn from the canonical ensemble $G(p,N)$, composed of graphs with a fixed number of nodes $N$ and a probability $p$ of having an edge between any two nodes. We want to compute the expected clustering coefficient $E[C]$ of such a graph (averaging over the ensemble) according to Eq.~\eqref{for:clust_coeff}. Consider a node $i$ with degree $k \geq 2$: since edges are placed independently, the subgraph $G_i$ induced by its neighbours contains on average $E[K[G_i]] = p {k \choose 2}$ edges. Hence $E[c_i] = p$ for such a node and, neglecting the nodes with $k < 2$ (for which $c_i = 0$), the clustering coefficient is
||||
|
||||
\be |
||||
C=p . |
||||
\label{for:clust_ER} |
||||
\ee |
||||
\subsubsection{Random regular graphs} |
||||
|
||||
Let us consider a random $d$-regular graph built according to the configuration model (cite bollobas paper). We want to calculate the expected clustering coefficient $E[C]$ of such a graph (averaging over the ensemble). We are interested in
||||
|
||||
\be |
||||
E[C] = \left\langle E[c] \right\rangle = \frac{1}{N} \sum_{i \in V} E[c_i] , |
||||
\ee |
||||
|
||||
so the main problem is calculating $E[c_i] = \frac{E[K(G_i)]}{{d \choose 2}}$:
||||
|
||||
\begin{equation} |
||||
\begin{split} |
||||
E[K(G_i)] &= \sum_{\left\langle jk \right\rangle} P(e_{jk} \in E, j \in \mathcal{N}(i), k \in \mathcal{N}(i), j \neq k) \\ |
||||
& = \frac{1}{2} \sum_{j \in V \setminus i} P(j \in \mathcal{N}(i)) \sum_{k \in V \setminus i} P(k \in \mathcal{N}(i), k \neq j) P(e_{jk} \in E | j \in \mathcal{N}(i), k \in \mathcal{N}(i)) \\ |
||||
& = \frac{1}{2} \sum_{j \in V \setminus i} P(j \in \mathcal{N}(i)) \sum_{k \in V \setminus i} P(k \in \mathcal{N}(i)| k \neq j) P(k \neq j) P(e_{jk} \in E | j \in \mathcal{N}(i), k \in \mathcal{N}(i)) \\ |
||||
& = \frac{1}{2} \sum_{j \in V \setminus i} \frac{d}{N-1} \sum_{k \in V \setminus i} \frac{N-2}{N-1} \frac{d-1}{N-2} \frac{d-1}{N-1}\\ |
||||
& = \frac{1}{2} \frac{d(d-1)^2}{N-1} |
||||
\end{split} |
||||
\end{equation} |
||||
|
||||
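where in the first line we sum over all the pairs $\left\langle jk \right\rangle$ of distinct nodes, and in the following lines we rewrite this as a double sum over nodes (easier), with the factor $\frac{1}{2}$ correcting for the fact that each pair is counted twice. In the fourth line we used $P(j \in \mathcal{N}(i)) = \frac{d}{N-1}$, $P(k \neq j) = \frac{N-2}{N-1}$, $P(k \in \mathcal{N}(i) | k \neq j) = \frac{d-1}{N-2}$ and $P(e_{jk} \in E | j \in \mathcal{N}(i), k \in \mathcal{N}(i)) = \frac{d-1}{N-1}$. Finally we get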
||||
|
||||
\be |
||||
E[C] = \frac{E[K(G_i)]}{{d \choose 2}} = \frac{d-1}{N-1} |
||||
\ee |
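
As a quick numerical illustration (the values are chosen here only as an example), for $d=4$ and $N=100$ the expression above gives

\be
E[C] = \frac{4-1}{100-1} = \frac{3}{99} \approx 0.030 ,
\ee

to be compared with $C = p = \frac{4}{99} \approx 0.040$ for an \ER graph with the same mean degree, i.e.\ with $p = \frac{d}{N-1}$.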
||||
|
||||
\subsection{Numerical simulations to confirm the results} |
||||
|
||||
In the following we refer to the code contained in the file \underline{Clustering th vs num.ipynb}. |
||||
|
||||
\subsubsection{\ER random graphs} |
||||
|
||||
We consider \ER random graphs drawn from the usual canonical ensemble. We implemented a naive algorithm to sample a graph from this ensemble. Its time complexity is $O(N^2)$, as can be seen immediately from the function \latinword{def ERG(p,N)}.
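
The notebook code is not reproduced in this text. A minimal sketch of what such a naive sampler might look like is given below; the name \texttt{erg\_naive}, the optional \texttt{rng} argument and the edge-list output are assumptions made for illustration, not the original implementation. It performs one Bernoulli trial per node pair, which is where the $O(N^2)$ cost comes from.

```python
import random
from itertools import combinations


def erg_naive(p, n, rng=None):
    """Sample an edge list from G(p, N) by testing every node pair once.

    Hypothetical sketch: names and output format are illustrative only.
    """
    rng = rng or random.Random()
    # N(N-1)/2 independent Bernoulli(p) trials, hence O(N^2) time.
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


if __name__ == "__main__":
    edges = erg_naive(0.4, 10, random.Random(0))
    print(len(edges), "edges sampled out of", 10 * 9 // 2, "possible pairs")
```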
||||
|
||||
We run simulations for 6 different graph sizes ($N=10,50,100,500,1000,5000$) with average degree $d=4$, and for each size we average the clustering coefficient $C$ over $n = 10^2$ different graphs. We plot these values in fig. .... with their statistical errors (standard deviations from the mean value). On the same fig. .... we plot the theoretical results obtained from Eq.~\eqref{for:clust_ER}.
We compute the clustering coefficient with the function \latinword{def C numerical(edge list)}. The time complexity of this function is $O(Nd(d-1)) = O(2K_{tot}(d-1))$, where $K_{tot}$ is the total number of edges in the graph.
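
The notebook function is not reproduced in this text. A minimal sketch of how the clustering coefficient of Eq.~\eqref{for:clust_coeff} can be computed from an edge list is given below; the name \texttt{clustering\_from\_edges}, the explicit argument \texttt{n} and the adjacency-set representation are assumptions made for illustration. Each node inspects its ${k_i \choose 2}$ pairs of neighbours, which for degree $d$ gives a cost of order $Nd(d-1)$.

```python
from collections import defaultdict
from itertools import combinations


def clustering_from_edges(edge_list, n):
    """Graph clustering coefficient C of an undirected simple graph.

    Hypothetical sketch: nodes are assumed to be labelled 0..n-1.
    """
    adj = defaultdict(set)
    for i, j in edge_list:
        adj[i].add(j)
        adj[j].add(i)

    total = 0.0
    for i in range(n):
        k = len(adj[i])
        if k < 2:
            continue  # c_i = 0 for k_i = 0, 1
        # K[G_i]: number of links among the neighbours of i.
        links = sum(1 for a, b in combinations(adj[i], 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / n  # arithmetic average over all nodes
```

For a triangle with one pendant node, \texttt{clustering\_from\_edges([(0, 1), (1, 2), (0, 2), (0, 3)], 4)} evaluates the average of $\frac{1}{3}, 1, 1, 0$.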
||||
|
||||
As we can see, there is good agreement between theory and numerical simulations. Nevertheless, the theory always seems to overestimate the numerical value (why?).
||||
|
||||
\subsubsection{Random regular graphs} |
||||
|
||||
Here we consider random $d$-regular graphs drawn from the configuration model (cite bollobas).
The function we wrote to sample them is \latinword{def RRG direct(d,N)}. The algorithm follows this sequence of steps:
||||
|
||||
\begin{enumerate}[label=(\roman*)] |
||||
\item we begin with a set of $N$ nodes;
\item we create a set of $Nd$ points and distribute them across $N$ buckets, such that each bucket contains $d$ points;
\item we pick a point at random and pair it with another randomly chosen unpaired point, repeating until $\frac{Nd}{2}$ pairs are obtained;
\item we collapse the points, so that each bucket maps onto a single node of the original graph, and we take every pair of points as an edge between the corresponding nodes;
\item we check whether the resulting graph is simple (no self-loops or multi-edges). If it is not, we restart.
||||
\end{enumerate} |
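
The steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the notebook function: the name \texttt{rrg\_pairing}, the \texttt{max\_tries} cap and the use of a single shuffle to draw the random pairing are choices made here. It also collapses points on the fly, so each attempt costs $O(Nd)$, which may differ from the implementation discussed in the text.

```python
import random


def rrg_pairing(d, n, rng=None, max_tries=10_000):
    """Sample a simple d-regular graph on n nodes by pairing and rejection.

    Hypothetical sketch of steps (i)-(v); returns a sorted edge list.
    """
    if (n * d) % 2 or d >= n:
        raise ValueError("need n*d even and d < n")
    rng = rng or random.Random()
    for _ in range(max_tries):
        # (ii) n*d points; point q sits in bucket q // d.
        points = list(range(n * d))
        # (iii) shuffling and pairing consecutive points gives a
        # uniformly random perfect matching of the points.
        rng.shuffle(points)
        edges = set()
        simple = True
        for a, b in zip(points[0::2], points[1::2]):
            u, v = a // d, b // d  # (iv) collapse buckets onto nodes
            edge = (min(u, v), max(u, v))
            if u == v or edge in edges:  # (v) self-loop or multi-edge
                simple = False
                break
            edges.add(edge)
        if simple:
            return sorted(edges)
    raise RuntimeError("no simple graph found within max_tries")
```

Rejecting and restarting, rather than repairing the offending pairs, is what step (v) prescribes; the number of restarts grows with $d$.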
||||
|
||||
The time complexity of the algorithm to create the graph is $O(N^2)$, dominated by step (iv), while checking that the graph is simple in step (v) costs $O(Nd^2)$.
||||
|
||||
Here too we run simulations for 6 different graph sizes ($N=10,50,100,500,1000,5000$) with degree $d=4$, and for each size we average the clustering coefficient $C$ over $n = 10^2$ different graphs. We plot these values in fig. .... with their statistical errors (standard deviations from the mean value). On the same fig. .... we plot the theoretical result $E[C] = \frac{d-1}{N-1}$ derived above for random regular graphs.
||||
|
||||
As we can see, there is good agreement between theory and numerical simulations. Nevertheless, the theory always seems to overestimate the numerical value (why?).
||||
|
||||
%\begin{figure}[h!] |
||||
%\centering |
||||
%\includegraphics[scale=0.5]{Imp_nodes_UA} |
||||
%\label{} |
||||
%\caption{We report a study about optimal percolation on the United-American airline duplex ($N=73$). The figure (a) highlights nodes that at $p_c=0.4$ are important to be safeguarded from the initial damage in order to maintain an extensive MCGC in the duplex. The measure we consider to define such nodes is those discussed in the text $\Delta s(i)$. Figure (b) and (c) analyse variations in the MCGC size distribution $\pi (R)$ in case (b) we sequentially knock-out the important nodes spotted in (a), and, on the other way around, in case (c) we safeguard them from the initial damage.} |
||||
%\label{fig:Imp_nodes_UA} |
||||
%\end{figure} |
||||
|
||||
\section{Reciprocated arcs in directed regular random graphs} |
||||
|
||||
In a directed graph $G(V,E)$ we say that the connection between nodes $i$ and $j$ is reciprocated if both $(i,j) \in E$ and $(j,i) \in E$.
||||
|
||||
\subsection{Calculation of the expected fraction of reciprocated arcs in a directed random regular graph} |
||||
|
||||
We consider directed random $d$-regular graphs $G(V,E)$ built according to a directed variant of the configuration model presented in the previous section.

We are interested in finding an expression for the expected fraction of reciprocated arcs $r$ in a graph drawn from such an ensemble.

The maximum number of reciprocated arcs is $R_{max} = \frac{dN}{2}$ (i.e.\ the number of edges in an undirected random $d$-regular graph). We define $r$ as follows:
||||
|
||||
\be |
||||
r = \left\langle \frac{R}{R_{max}} \right\rangle |
||||
\ee |
||||
|
||||
where $R$ is the number of reciprocated arcs in a directed random $d$-regular graph and $\left\langle \cdot \right\rangle$ denotes the average over the ensemble.

Finding $r$ then reduces to calculating $\left\langle R \right\rangle$, which we do as follows:
||||
|
||||
\begin{equation} |
||||
\begin{split} |
||||
\left\langle R \right\rangle & = \frac{1}{2} \sum_{i \in V} \sum_{j \in V \setminus i} P(i \ra j, j \ra i) \\ |
||||
& = \frac{1}{2} \sum_{i \in V} \sum_{j \in V \setminus i} P(i \ra j) P(j \ra i) \\ |
||||
& = \frac{1}{2} \sum_{i \in V} \sum_{j \in V \setminus i} \frac{d}{N-1} \frac{d}{N-1} \\ |
||||
& = \frac{1}{2} N (N-1) \frac{d^2}{(N-1)^2} \\ |
||||
& = \frac{Nd^2}{2(N-1)} , |
||||
\end{split} |
||||
\end{equation} |
||||
|
||||
where we treated arcs as independent random variables, $P(i \ra j, j \ra i) = P(i \ra j) P(j \ra i)$, that are also identically distributed, $P(i \ra j) = P(j \ra i) = \frac{d}{N-1}$. The factor $\frac{1}{2}$ accounts for each reciprocated pair being counted twice in the double sum.
||||
|
||||
We finally get that the expected fraction of reciprocated arcs is
||||
|
||||
\be |
||||
r = \left\langle \frac{R}{R_{max}} \right\rangle = \frac{d}{N-1} |
||||
\ee |
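
A numerical check of this expression could be set up along the following lines. This is only a sketch under stated assumptions: the names \texttt{directed\_pairing} and \texttt{count\_reciprocated} are hypothetical, each node is given $d$ out-stubs and $d$ in-stubs, and self-loops and multi-arcs are kept rather than rejected, so only approximate agreement with $\frac{d}{N-1}$ should be expected.

```python
import random


def directed_pairing(d, n, rng=None):
    """Match the n*d out-stubs to a uniformly shuffled list of in-stubs.

    Hypothetical sketch: self-loops and multi-arcs are NOT rejected.
    """
    rng = rng or random.Random()
    out_stubs = [i for i in range(n) for _ in range(d)]
    in_stubs = out_stubs[:]
    rng.shuffle(in_stubs)
    return list(zip(out_stubs, in_stubs))  # arc (i, j) means i -> j


def count_reciprocated(arcs):
    """Number R of node pairs {i, j}, i != j, with both i -> j and j -> i."""
    arc_set = {(i, j) for i, j in arcs if i != j}
    return sum(1 for i, j in arc_set if i < j and (j, i) in arc_set)


if __name__ == "__main__":
    d, n, samples = 4, 100, 200
    rng = random.Random(0)
    r_max = d * n / 2
    r = sum(count_reciprocated(directed_pairing(d, n, rng)) / r_max
            for _ in range(samples)) / samples
    print("numerical r:", r, " theory d/(N-1):", d / (n - 1))
```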
||||
|
||||
\subsection{Numerical simulations to confirm the results} |
||||
|
||||
\bibliography{mybib}{} |
||||
\bibliographystyle{plain} |
||||
|
||||
\end{document} |