# Objectives

• Compute the degree distribution for undirected and directed networks
• Quantify the local density of connectivity in the network
• Understand why motif distributions must be estimated in large networks

# Introduction

Fundamentally, a network is a body of data. As with other data sets, our first task it to identify significant features of that data. In much the same way as the mean and standard deviation are easy to compute but very valuable summary statistics, degree, clustering, and motif statistics are easy to compute attributes that can reveal a great deal about the structure of a network. We discuss these different basic statistics here.

# Degree-based Statistics

Given that networks are composed of nodes with relationships, an obvious first question to ask is how these relationships are distributed among the nodes in the network. The number of edges that are incident to a node is called the node's degree. In this section we discuss some ways in which the degree of nodes can help us understand the structure of the network and the mechanisms that formed it.

## Degree Density

A very easy-to-compute attribute we can consider is the average degree of nodes in the network: how many edges does each node have? This can be calculated using the ratio $\frac{2|E|}{|V|}$ which indicates the number of edges on average that are incident to a node.

## Degree Distribution

The problem with using average degree is that, as with all statistical distributions, it can be produced by a wide array of distributions. In fact, the distribution of edges can tell us much more than just the average about the distribution of power and resources within the network. Consider, as an example, the campaign contribution network discussed earlier. A network in which all candidates have similar number of contributions (edges) would indicate a very different world from a network in which most contributions (edges) go to a handful of candidates with most receiving very few. Both could have the same average degree, but the former network would indicate that resources are fairly distributed among candidates; the latter would suggest that a handful of candidate receive a disproportionate number of resources. Determining what situation exists in the real world would help us in formulating hypotheses about how the campaign system works.

In order to study this distribution, we could look at the degree distribution. The degree distribution, $P(k)$ is the fraction of nodes with degree $k$. Often the cumulative degree distribution is also considered, which is the fraction of nodes with degree less than or equal to $k$.

### In and Out Degrees

In directed networks, where edges point from one node to another, there are two distributions: the in-degree $P^+(k)$ which is the fraction of nodes with in-degree of $k$ and the out-degree $P^-(k)$ which is the fraction of nodes with out-degree of $k$.

### Scale-free Distributions

One distribution which has been observed in a wide array of networks is called the scale-free distribution. This distribution follows a power-law, $P(k) \tilde k^{-\gamma}$. On a qualitative level, in a network with such a distribution, the majority of the nodes have very low degree, but with a small, but non-trivial number of nodes have very high degree. The high degree nodes are called 'hubs' and are often found to hold positions of power in the network proportional to their degree.