Difference between revisions of "Networks Basic Statistics"

From OPOSSEM

Revision as of 12:55, 7 July 2011


Objectives

  • Compute the degree distribution for undirected and directed networks
  • Quantify the local density of connectivity in the network
  • Understand why motif distributions must be estimated in large networks

Introduction

Fundamentally, a network is a body of data. As with other data sets, our first task is to identify significant features of that data. Just as the mean and standard deviation are easy to compute yet very valuable summary statistics, degree, clustering, and motif statistics are easy-to-compute attributes that can reveal a great deal about the structure of a network. We discuss these basic statistics here.

Degree-based Statistics

Given that networks are composed of nodes with relationships, an obvious first question to ask is how these relationships are distributed among the nodes in the network. The number of edges that are incident to a node is called the node's degree. In this section we discuss some ways in which the degree of nodes can help us understand the structure of the network and the mechanisms that formed it.

Degree Density

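A very easy-to-compute attribute is the average degree of the nodes in the network: how many edges does a typical node have? For an undirected network with node set <math>V</math> and edge set <math>E</math>, this is given by the ratio <math>\frac{2|E|}{|V|}</math>, the average number of edges incident to a node. The factor of 2 appears because each edge is incident to two nodes and so is counted once in the degree of each.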
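
As a minimal sketch of this ratio, the Python snippet below computes the average degree from an edge list. The function name `average_degree` and the toy edge list are illustrative assumptions, not part of the original text. The sketch assumes an undirected network with no duplicate edges or self-loops, and it counts only nodes that appear in at least one edge, so isolated nodes would need to be added separately.

```python
def average_degree(edges):
    """Return 2|E| / |V| for an undirected network given as (u, v) pairs.

    Assumes no duplicate edges or self-loops; nodes that appear in no
    edge are not counted (hypothetical simplification for this sketch).
    """
    nodes = set()
    for u, v in edges:
        nodes.add(u)
        nodes.add(v)
    if not nodes:
        raise ValueError("the network has no nodes")
    return 2 * len(edges) / len(nodes)


# Toy network: a triangle a-b-c with one extra node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
toy_average = average_degree(toy_edges)
print(toy_average)
```

Here there are four edges and four nodes, so the ratio is 2 * 4 / 4.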

Degree Distribution

The problem with using average degree is that, as with any summary statistic, the same value can be produced by a wide array of distributions. In fact, the distribution of edges can tell us much more than the average alone about the distribution of power and resources within the network. Consider, as an example, the campaign contribution network discussed earlier. A network in which all candidates have a similar number of contributions (edges) would indicate a very different world from a network in which most contributions (edges) go to a handful of candidates, with most candidates receiving very few. Both could have the same average degree, but the former network would indicate that resources are fairly evenly distributed among candidates; the latter would suggest that a handful of candidates receive a disproportionate share of resources. Determining which situation exists in the real world would help us formulate hypotheses about how the campaign system works.
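
The point that one average can hide very different structures can be illustrated with a toy comparison. The sketch below is not the campaign contribution data; it builds two hypothetical six-node networks, a path and a star, that have the same number of edges and therefore the same average degree. The helper name `node_degrees` is an assumption made for this example.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many edges are incident to each node (undirected)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


n = 6
# Path: 0-1-2-3-4-5, edges spread almost evenly across nodes.
path_edges = [(i, i + 1) for i in range(n - 1)]
# Star: node 0 is linked to every other node, edges concentrated on one hub.
star_edges = [(0, i) for i in range(1, n)]

path_deg = node_degrees(path_edges)
star_deg = node_degrees(star_edges)

path_avg = sum(path_deg.values()) / len(path_deg)
star_avg = sum(star_deg.values()) / len(star_deg)

print("average degree:", path_avg, star_avg)
print("path degrees:", sorted(path_deg.values()))
print("star degrees:", sorted(star_deg.values()))
```

Both networks have five edges on six nodes, so their averages agree, yet in the path no node has more than two edges while in the star a single hub holds all five.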

To study how edges are spread across nodes, we can examine the degree distribution. The degree distribution, <math>P(k)</math>, is the fraction of nodes with degree <math>k</math>. Often the cumulative degree distribution is also considered, which is the fraction of nodes with degree less than or equal to <math>k</math>.
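
These two definitions can be sketched in a few lines of Python. The function names `degree_distribution` and `cumulative_distribution` and the toy edge list are assumptions made for illustration. As before, the sketch treats the network as undirected and counts only nodes that appear in at least one edge.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes with degree k."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)
    counts = Counter(deg.values())  # number of nodes having each degree
    return {k: counts[k] / n for k in sorted(counts)}


def cumulative_distribution(p):
    """Return {k: fraction of nodes with degree <= k} from {k: P(k)}."""
    running = 0.0
    cumulative = {}
    for k in sorted(p):
        running += p[k]
        cumulative[k] = running
    return cumulative


# Toy network: a triangle a-b-c with one extra node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
p = degree_distribution(toy_edges)
c = cumulative_distribution(p)
print(p)
print(c)
```

In this toy network one node has degree 1, two have degree 2, and one has degree 3, so <math>P(1)</math>, <math>P(2)</math>, and <math>P(3)</math> are one quarter, one half, and one quarter, and the cumulative distribution reaches 1 at the largest degree.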

In and Out Degrees

Scale-free Distributions

Clustering Coefficient

Transitive Triples

Motif Distributions

Conclusion

References

<references group=""></references>

Discussion questions

Problems

Glossary

  • [[Def: ]]
  • [[Def: ]]
  • [[Def: ]]