## Difference between revisions of "Networks Centrality"

### From OPOSSEM

Derek Ruths (talk | contribs) (Created page with "<!-- add any hidden notes here --> =Objectives= * Learn the different ways of measuring node centrality * Understand the reasons for choosing one measure over another * Learn t...") |
Derek Ruths (talk | contribs) (→Closeness Centrality) |
||

Line 34: | Line 34: | ||

==Closeness Centrality== | ==Closeness Centrality== | ||

+ | If we consider a node's importance to be proportional to its ability to spread information (or disease) quickly through a network, then we must take a different view of centrality. In such a situation, the speed with which a node can spread information has more to do with its average distance from all nodes in the network than the number of edges it has in its neighborhood. | ||

+ | |||

+ | In a network, the distance between two nodes is measured in terms of the smallest number of edges that must be traversed to travel from one node to the other node. This is called the ''shortest path'' between the two points and is often denoted as <math>/sigma_{u,v}</math>. In a weighted network, the length of a path is the sum of the weights of the edges along it (not just the number of edges). Thus, the shortest path in a weighted network may not contain the fewest edges. | ||

+ | |||

+ | If we can compute the shortest path between any two nodes (which we can), then we can formulate a definition of centrality based on closeness: | ||

+ | |||

+ | <math>C_C(u) = \sum_{v \in V} \sigma_{u,v}</math> | ||

+ | |||

+ | A drawback to this approach is that the resulting closeness score is not normalized. Furthermore, in situations where there is no path from one vertex to another, the measure is actually undefined. One way of fixing this second issue is to reciprocate each distance. In situations where there is no path to a vertex, this distance is infinite and the reciprocal is zero. | ||

+ | |||

+ | <math>C_{C2}(u) = \sum_{v \in V} \frac{1}{\sigma_{u,v}}</math> | ||

==Betweenness Centrality== | ==Betweenness Centrality== |

## Revision as of 11:01, 8 July 2011

## Contents

# Objectives

- Learn the different ways of measuring node centrality
- Understand the reasons for choosing one measure over another
- Learn the computational obstacles involved in some measures of centrality

# Introduction

We know from experience that different people have different positions within a group, community, or organization. Furthermore, some positions are more essential to aspects of the organization and functioning of a community than others. The concept of *centrality* provides us a way of quantifying the importance of individual nodes in a network. While these measures use only the connectivity of the network (the patterns of edges), in many cases centrality can be an enlightening approximation for the actual importance of an individual to the group to which they belong. Because importance can be difficult to directly measure, centrality has become a valuable way of assessing node importance in an objective way.

The problem of assessing importance is made harder by the fact that importance can mean many things. This problem remains when we consider network centrality. The fundamental idea is to identify nodes that are deeply embedded in the network, important to the overall connectivity of the network, as highly central. This, however, is still too vague to be operationalized. In the following sections, we will examine a number of different ways of defining and implementing "embeddedness" in the network. Depending on the kind of importance you are interested in, one measure will often be more valuable than another.

There are two related, but distinct camps to which centrality measures belong. Some use only information about the immediate neighborhood of a node to determine it's importance. These methods implement the idea that a node's local connectivity tells us about its embeddedness. The other camp considers centrality to be an issue of proximity to other parts of the network. In this way of thinking, distances to various parts of the network reveal how important a node is to the network.

# Neighborhood-based Methods

## Degree Centrality

Degree centrality is the simplest of all centrality measures. It implements the quite intuitive idea that more important people know more people (i.e., have more connections to other people). Thus, the degree centrality measure is defined as

<math>C_D(v) = \frac{k_v}{\sum_{u\in V} k_u}</math>

where <math>k_v</math> is the degree (number of edges) of node v. <math>C_D(v)</math> will range between 0 and 1 - being larger for nodes that have higher degree.

In addition to being simple, degree centrality is also very fast to compute since it only requires visiting each node and counting the number of edges it has. We'll see that all other measures of centrality are significantly more computationally expensive than this one. This explains why, particularly for large networks, degree centrality is a favored measure.

## Eigenvector Centrality

While degree centrality is intuitively appealing, it has a glaring flaw: it assumes that connections to different people are equally valuable. Consider the utility of a poorly connected person (a homeless beggar) versus a well-connected person (a powerful CEO). Are these connections of equal utility? Under most circumstances, no. We'd like to modify degree centrality to take this into account - assigning greater importance to connections that lead to other well-connected individuals.

This modification is called eigenvector centrality and is the basis for, among other things, the algorithm that Google uses to rank the importance of web pages. We don't go into the math for this measure here. However, it can be a very useful measure to use.

# Distance-based Methods

## Closeness Centrality

If we consider a node's importance to be proportional to its ability to spread information (or disease) quickly through a network, then we must take a different view of centrality. In such a situation, the speed with which a node can spread information has more to do with its average distance from all nodes in the network than the number of edges it has in its neighborhood.

In a network, the distance between two nodes is measured in terms of the smallest number of edges that must be traversed to travel from one node to the other node. This is called the *shortest path* between the two points and is often denoted as <math>/sigma_{u,v}</math>. In a weighted network, the length of a path is the sum of the weights of the edges along it (not just the number of edges). Thus, the shortest path in a weighted network may not contain the fewest edges.

If we can compute the shortest path between any two nodes (which we can), then we can formulate a definition of centrality based on closeness:

<math>C_C(u) = \sum_{v \in V} \sigma_{u,v}</math>

A drawback to this approach is that the resulting closeness score is not normalized. Furthermore, in situations where there is no path from one vertex to another, the measure is actually undefined. One way of fixing this second issue is to reciprocate each distance. In situations where there is no path to a vertex, this distance is infinite and the reciprocal is zero.

<math>C_{C2}(u) = \sum_{v \in V} \frac{1}{\sigma_{u,v}}</math>

## Betweenness Centrality

# Conclusion

# References

# Discussion questions

# Problems

# Glossary

- [[Def: ]]
- [[Def: ]]
- [[Def: ]]