## Network Centrality

### From OPOSSEM


# Objectives

- Learn the different ways of measuring node centrality
- Understand the reasons for choosing one measure over another
- Learn the computational obstacles involved in some measures of centrality

# Introduction

We know from experience that different people have different positions within a group, community, or organization. Furthermore, some positions are more essential to aspects of the organization and functioning of a community than others. The concept of *centrality* provides a way of quantifying the importance of individual nodes in a network. While these measures use only the connectivity of the network (the patterns of edges), in many cases centrality can be an enlightening approximation for the actual importance of an individual to the group to which they belong. Because importance can be difficult to measure directly, centrality has become a valuable and objective way of assessing node importance.

Assessing importance is made harder by the fact that importance can mean many things, and this problem remains when we turn to network centrality. The fundamental idea is to label as highly central those nodes that are deeply embedded in the network, that is, nodes that matter to its overall connectivity. This, however, is still too vague to be operationalized. In the following sections, we will examine a number of different ways of defining and implementing "embeddedness" in the network. Depending on the kind of importance you are interested in, one measure will often be more valuable than another.

There are two related but distinct camps to which centrality measures belong. Measures in the first camp use only information about a node's immediate neighborhood to determine its importance; they implement the idea that a node's local connectivity tells us about its embeddedness. The other camp treats centrality as a matter of proximity to other parts of the network: in this way of thinking, a node's distances to the rest of the network reveal how important it is.

# Neighborhood-based Methods

## Degree Centrality

Degree centrality is the simplest of all centrality measures. It implements the quite intuitive idea that more important people know more people (i.e., have more connections to other people). Thus, the degree centrality measure is defined as

<math>C_D(v) = \frac{k_v}{\sum_{u\in V} k_u}</math>

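where <math>k_v</math> is the degree (number of edges) of node v and the sum in the denominator runs over all nodes. <math>C_D(v)</math> ranges between 0 and 1, being larger for nodes that have higher degree. In a simple undirected network, every edge contributes to the degree of both of its endpoints, so the denominator equals twice the number of edges and <math>C_D(v)</math> never exceeds 1/2.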

In addition to being simple, degree centrality is very fast to compute: it only requires visiting each node and counting its edges, which takes time linear in the size of the network. We'll see that the other measures described here are all significantly more computationally expensive. This explains why, particularly for large networks, degree centrality is a favored measure.
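As a minimal sketch of the definition above, the following Python computes <math>C_D(v)</math> in a single pass over the nodes. It assumes, purely for illustration, an undirected graph stored as a dictionary mapping each node name (a string) to the set of its neighbors; the function name `degree_centrality` is hypothetical and not taken from any particular library.

```python
from typing import Dict, Set

# Undirected graph as an adjacency dict: node -> set of neighbors (illustrative choice).
Graph = Dict[str, Set[str]]


def degree_centrality(graph: Graph) -> Dict[str, float]:
    """C_D(v) = k_v / sum_u k_u, computed in one pass over the nodes."""
    degrees = {v: len(nbrs) for v, nbrs in graph.items()}  # k_v for each node
    total = sum(degrees.values())  # equals 2 * (number of edges) when undirected
    if total == 0:
        return {v: 0.0 for v in graph}  # no edges at all: every score is zero
    return {v: k / total for v, k in degrees.items()}


# A star: "hub" is connected to three leaves.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
scores = degree_centrality(star)
print(scores["hub"])  # 0.5
```

The hub's score of 0.5 is the largest value possible in a simple undirected network, because every edge is counted in the degrees of both of its endpoints.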

## Eigenvector Centrality

While degree centrality is intuitively appealing, it has a glaring flaw: it assumes that all connections are equally valuable. Consider a connection to a poorly connected person (a homeless beggar) versus a connection to a well-connected person (a powerful CEO). Are these connections of equal utility? Under most circumstances, no. We'd like to modify degree centrality to take this into account by assigning greater importance to connections that lead to other well-connected individuals.

This modification is called eigenvector centrality. A variant of it, PageRank, is the basis for the algorithm that Google originally developed to rank the importance of web pages. We don't go into the full math for this measure here; however, it can be a very useful measure.
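Although the full mathematics is beyond this section, the core idea can be stated in one line. With <math>A</math> the adjacency matrix (<math>A_{vu} = 1</math> if v and u are connected, 0 otherwise), eigenvector centrality gives each node a score <math>x_v</math> proportional to the sum of its neighbors' scores:

<math>x_v = \frac{1}{\lambda} \sum_{u \in V} A_{vu} x_u</math>

where <math>\lambda</math> is the largest eigenvalue of <math>A</math>, so the scores form the corresponding eigenvector (for a connected network, this eigenvector can be chosen with all entries positive). The sketch below approximates it by power iteration. It assumes an undirected graph stored as an adjacency dictionary, and the function name and the `max_iter`/`tol` parameters are hypothetical choices for illustration. It multiplies by <math>A + I</math> rather than <math>A</math>, a common trick that leaves the eigenvectors unchanged but prevents oscillation on bipartite graphs.

```python
import math
from typing import Dict, Set

# Undirected graph as an adjacency dict: node -> set of neighbors (illustrative choice).
Graph = Dict[str, Set[str]]


def eigenvector_centrality(graph: Graph, max_iter: int = 1000,
                           tol: float = 1e-10) -> Dict[str, float]:
    """Approximate the leading eigenvector of the adjacency matrix by power iteration."""
    x = {v: 1.0 for v in graph}  # start from a uniform, all-positive vector
    for _ in range(max_iter):
        # Multiply by (A + I): own score plus the sum of the neighbors' scores.
        # A + I has the same eigenvectors as A but avoids the oscillation that
        # plain power iteration can show on bipartite graphs.
        nxt = {v: x[v] + sum(x[u] for u in graph[v]) for v in graph}
        norm = math.sqrt(sum(s * s for s in nxt.values())) or 1.0
        nxt = {v: s / norm for v, s in nxt.items()}  # rescale to unit length
        if sum(abs(nxt[v] - x[v]) for v in graph) < tol:
            return nxt
        x = nxt
    return x  # max_iter reached; the result may not be fully converged


# "p" and "q" both have degree 1, but p's only neighbor is the best-connected node.
graph = {
    "hub": {"a", "b", "c", "p"},
    "a": {"hub"}, "b": {"hub"}, "p": {"hub"},
    "c": {"hub", "d"},
    "d": {"c", "q"},
    "q": {"d"},
}
scores = eigenvector_centrality(graph)
print(scores["p"] > scores["q"])  # True
```

Degree centrality would rank p and q equally; eigenvector centrality separates them because p's connection leads to a well-connected node, which is exactly the beggar-versus-CEO distinction above.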

# Distance-based Methods

## Closeness Centrality

If we consider a node's importance to be proportional to its ability to spread information (or disease) quickly through a network, then we must take a different view of centrality. In such a situation, the speed with which a node can spread information has more to do with its average distance from all nodes in the network than the number of edges it has in its neighborhood.

In a network, the distance between two nodes is the smallest number of edges that must be traversed to travel from one node to the other. A path achieving this minimum is called a *shortest path*, and we denote its length (the distance between u and v) by <math>\sigma_{u,v}</math>. In a weighted network, the length of a path is the sum of the weights of the edges along it rather than the number of edges. Thus, the shortest path in a weighted network may not contain the fewest edges.
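To see concretely how a weighted shortest path can use more edges than a direct connection, here is a minimal sketch of Dijkstra's algorithm, a standard method for shortest paths when all edge weights are non-negative. It assumes a weighted, undirected graph stored as a dictionary mapping each node to a dictionary of `{neighbor: weight}`; the function name `dijkstra` and this representation are illustrative choices.

```python
import heapq
import math
from typing import Dict

# Weighted undirected graph: node -> {neighbor: edge weight}; weights must be non-negative.
WeightedGraph = Dict[str, Dict[str, float]]


def dijkstra(graph: WeightedGraph, source: str) -> Dict[str, float]:
    """Weighted shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]  # (tentative distance, node), smallest distance first
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter route to u was already settled
        for v, w in graph[u].items():
            nd = d + w  # path length = sum of edge weights, not edge count
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# The direct edge a-d is heavy; the three-edge route a-b-c-d is shorter.
g = {
    "a": {"b": 1, "d": 10},
    "b": {"a": 1, "c": 1},
    "c": {"b": 1, "d": 1},
    "d": {"a": 10, "c": 1},
}
print(dijkstra(g, "a")["d"])  # 3.0
```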

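Shortest-path distances can be computed efficiently for every pair of nodes (for example, with breadth-first search in an unweighted network, or with Dijkstra's algorithm when edge weights are non-negative). Using them, we can formulate a definition of centrality based on closeness: the smaller a node's total distance to all other nodes, the more central it is.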

<math>C_C(u) = \frac{1}{\sum_{v \in V, v \neq u} \sigma_{u,v}}</math>

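A drawback to this approach is that the resulting closeness score is not normalized: its scale depends on the size of the network, so scores from networks of different sizes cannot be compared directly. Furthermore, when there is no path from one vertex to another, the distance between them is infinite, so the sum of distances is infinite and the measure breaks down. One way of fixing this second issue is to take the reciprocal of each distance before summing. When there is no path to a vertex, its distance is infinite and the reciprocal is zero, so that vertex simply contributes nothing. This variant is often called *harmonic centrality*.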

<math>C_{C2}(u) = \sum_{v \in V, v \neq u} \frac{1}{\sigma_{u,v}}</math>
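The following minimal sketch contrasts the two measures on an unweighted network. It assumes an undirected graph stored as an adjacency dictionary and uses breadth-first search for the distances. For plain closeness it uses the common convention of taking the reciprocal of the summed distances (so that larger means more central) and returns 0.0 when some node is unreachable; that convention is an assumption of this sketch, and other treatments of disconnected networks exist. The harmonic version is the reciprocal-distance sum given just above. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, Set

# Undirected, unweighted graph as an adjacency dict: node -> set of neighbors.
Graph = Dict[str, Set[str]]


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Shortest-path distances (in edges) from source; unreachable nodes are omitted."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:  # first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(graph: Graph, u: str) -> float:
    """1 / (sum of distances to all other nodes); 0.0 if any node is unreachable."""
    dist = bfs_distances(graph, u)
    if len(dist) < len(graph):
        return 0.0  # some distance is infinite, so the reciprocal of the sum is 0
    total = sum(dist.values())  # dist[u] == 0 adds nothing
    return 1.0 / total if total > 0 else 0.0


def harmonic(graph: Graph, u: str) -> float:
    """Sum of 1 / distance over all v != u; unreachable nodes contribute 0."""
    dist = bfs_distances(graph, u)
    return sum(1.0 / d for v, d in dist.items() if v != u)


# Two components: the path a-b-c and the separate edge d-e.
g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}}
print(closeness(g, "b"), harmonic(g, "b"), harmonic(g, "a"))  # 0.0 2.0 1.5
```

In the disconnected example, plain closeness collapses to 0.0 for every node, while the harmonic version still ranks b (the middle of the path) above a.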

## Betweenness Centrality

Betweenness centrality appeals to the notion that an embedded node will often lie on the most direct route, that is, the shortest path, between other pairs of nodes. The betweenness of a node measures the share of the shortest paths between all pairs of other nodes that pass through it: the more shortest paths pass through a node, the higher its centrality.
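The standard way to make "share of shortest paths" precise is as follows. Write <math>g_{st}</math> for the number of shortest paths between nodes s and t, and <math>g_{st}(v)</math> for the number of those that pass through v (these counts are often written with σ elsewhere, but here σ already denotes distance). Then, summing over pairs of distinct nodes s and t other than v:

<math>C_B(v) = \sum_{s \neq v \neq t} \frac{g_{st}(v)}{g_{st}}</math>

In an undirected network, a sum over ordered pairs counts each pair {s, t} twice, so the result is often halved.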

Betweenness has been shown to correspond to a wide range of real-world measures of importance in systems ranging from biochemical systems to trade networks. Its main downside is cost: exact betweenness requires the shortest paths between all pairs of nodes, and the best-known exact algorithms take time roughly proportional to the number of nodes times the number of edges. On very large networks this becomes prohibitively expensive, so betweenness is often approximated by computing shortest paths from only a random sample of source nodes.
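A widely used exact method is Brandes' algorithm: one breadth-first search per source node counts the shortest paths to every other node, and a backward pass then accumulates each node's share of them. The sketch below assumes an unweighted, undirected graph stored as an adjacency dictionary (a weighted network would need Dijkstra's algorithm in the first phase) and returns unnormalized scores with each unordered pair of endpoints counted once. The function name is hypothetical. Because it runs one search per node, its cost grows with the number of nodes times the number of edges, which illustrates the scaling problem described above.

```python
from collections import deque
from typing import Dict, List, Set

# Undirected, unweighted graph as an adjacency dict: node -> set of neighbors.
Graph = Dict[str, Set[str]]


def betweenness(graph: Graph) -> Dict[str, float]:
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    cb = {v: 0.0 for v in graph}
    for s in graph:
        # Phase 1: BFS from s, counting shortest paths g_sv and recording predecessors.
        order: List[str] = []  # nodes in non-decreasing distance from s
        preds: Dict[str, List[str]] = {v: [] for v in graph}
        g = {v: 0 for v in graph}
        g[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:  # w reached for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # edge v-w lies on a shortest path to w
                    g[w] += g[v]
                    preds[w].append(v)
        # Phase 2: walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += g[v] / g[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair {s, t} was counted twice (once from s, once from t).
    return {v: c / 2.0 for v, c in cb.items()}


# A star: every shortest path between two leaves goes through the hub.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(betweenness(star)["hub"])  # 3.0
```

The hub lies on the unique shortest path between each of the three pairs of leaves, so its score is 3; the leaves lie on no shortest paths between other nodes and score 0.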

# Conclusion

Existing work in network science has established that centrality is a very powerful and useful tool for understanding the relative importance of different nodes to the overall structure of the network. As outlined in the sections above, using centrality requires choosing a measure that maps well to the kind of importance you are interested in as well as selecting a measure that is not too expensive to be evaluated on your network.
