Nothing new to install this time! Make sure, however, to restart your R session (“Session” menu > “Restart R”). Network visualization is hard work for your machine, so it’s good to start clean.

# (1) the data sandwich, AGAIN

```
library(dataculture)
library(igraph)
library(ggraph)
```

The `ggraph` package incorporates, as example data, a variable `whigs`. Enter `whigs` on its own, and `help(whigs)` to see the documentation. As it happens, the Rutgers library has digital access to David Hackett Fischer’s *Paul Revere’s Ride* (Oxford: Oxford University Press, 1995).

Look at Fischer’s Appendix D (301–7) at https://hdl-handle-net.proxy.libraries.rutgers.edu/2027/heb31559.0001.001.

- How were these pages used to create `whigs`, and by whom?
- How did Fischer create these pages? (Hint: scholarship has citations.)
Han discusses five groups in his analysis of Fischer’s data.

- How many groups are represented in `whigs`?
- What’s the reason for the difference? (Hint: “Data Sources and Methods.”)

To follow Han, you have to drop some columns, then drop the rows corresponding to people who belong to none of the remaining groups. The following incantation^{1} does this:

```
# remove columns 5 and 7
whig_han <- whigs[ , -c(5, 7)]
# keep rows whose sum is strictly positive
whig_han <- whig_han[rowSums(whig_han) > 0, ]
```
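To see what the two steps do, here is a minimal sketch on a made-up affiliation matrix (`toy`, `p1` to `p4`, and `g1` to `g7` are hypothetical names, not part of `whigs`). Dropping columns can leave someone with no remaining memberships; the `rowSums` filter then removes exactly those rows.

```r
# A hypothetical 4-person x 7-group affiliation matrix (1 = member)
toy <- matrix(c(1, 0, 0, 0, 1, 0, 0,   # p1: groups 1 and 5
                0, 0, 0, 0, 1, 0, 1,   # p2: groups 5 and 7 only
                0, 1, 1, 0, 0, 0, 0,   # p3: groups 2 and 3
                0, 0, 0, 1, 0, 0, 1),  # p4: groups 4 and 7
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("p", 1:4), paste0("g", 1:7)))

# Negative indices drop columns: g5 and g7 disappear
toy_han <- toy[ , -c(5, 7)]

# p2 now belongs to none of the remaining groups, so its row sum is 0
rowSums(toy_han)

# Keep only the people with at least one remaining membership
toy_han <- toy_han[rowSums(toy_han) > 0, ]
dim(toy_han)   # 3 rows (p1, p3, p4), 5 columns
```

One caveat: if only a single row survived, `[` would quietly simplify the result to a plain vector; adding `drop = FALSE` inside the brackets keeps it a matrix.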

The following line creates a variable with the bipartite graph of Boston revolutionaries and their groups. `igraph` calls an affiliation matrix an “incidence matrix.”

`whig_net <- graph_from_incidence_matrix(whig_han)`
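For a sense of what this function builds, here is a sketch on a hypothetical five-person, three-group matrix (`aff` and all its names are made up). Rows become one kind of vertex, columns the other, and each nonzero entry becomes an edge; `igraph` records which side a vertex is on in a logical vertex attribute called `type`. Recent versions of `igraph` (1.6 and later) rename the function `graph_from_biadjacency_matrix` and may warn about the old name, so the sketch uses whichever one is installed.

```r
library(igraph)

# Hypothetical affiliation matrix: rows are people, columns are groups
aff <- matrix(c(1, 1, 1,    # Ann
                1, 1, 0,    # Ben
                0, 1, 0,    # Cal
                0, 0, 1,    # Dee
                0, 0, 1),   # Eve
              nrow = 5, byrow = TRUE,
              dimnames = list(c("Ann", "Ben", "Cal", "Dee", "Eve"),
                              c("Lodge", "Caucus", "Club")))

# igraph >= 1.6 renamed graph_from_incidence_matrix() to
# graph_from_biadjacency_matrix(); use whichever this installation has
from_aff <- if (exists("graph_from_biadjacency_matrix")) {
  graph_from_biadjacency_matrix
} else {
  graph_from_incidence_matrix
}
g <- from_aff(aff)

# 5 people + 3 groups = 8 vertices; one edge per 1 in the matrix
c(vertices = vcount(g), edges = ecount(g))

# "type" is FALSE for row vertices (people) and TRUE for column vertices (groups)
data.frame(name = V(g)$name, type = V(g)$type)
```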

# Ties by eye

Make the hairball using `ggraph`. Use a `color` aesthetic with `geom_node_point` to distinguish persons and groups. Work with your group members to recall the syntax from the previous lab.

Look carefully at the plot. Where are the three *people* who seem to play an especially important role tying others together? Identify the people by putting in the node labels (recall the previous lab).
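If recalling the syntax stalls, here is one possible shape for the plot, sketched on a hypothetical toy matrix rather than `whig_net`. It relies on the `type` vertex attribute that `igraph` adds to bipartite graphs; the `"fr"` layout, the seed, and the `repel = TRUE` labels are choices, not requirements.

```r
library(igraph)
library(ggraph)

# Hypothetical affiliation matrix: rows are people, columns are groups
aff <- matrix(c(1, 1, 1,  1, 1, 0,  0, 1, 0,  0, 0, 1,  0, 0, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(c("Ann", "Ben", "Cal", "Dee", "Eve"),
                              c("Lodge", "Caucus", "Club")))

# Use the newer function name if available (igraph >= 1.6)
from_aff <- if (exists("graph_from_biadjacency_matrix")) {
  graph_from_biadjacency_matrix
} else {
  graph_from_incidence_matrix
}
g <- from_aff(aff)

set.seed(1)  # force-directed layouts are random; a fixed seed reproduces the picture
p <- ggraph(g, layout = "fr") +
  geom_edge_link(alpha = 0.4) +
  # map the logical "type" attribute to color: people vs. groups
  geom_node_point(aes(color = type), size = 3) +
  # vertex names as labels, pushed off the points
  geom_node_text(aes(label = name), repel = TRUE)
p
```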

Han of course has already told you who these people are. In the previous lab, you learned how to cut vertices out of a graph. To cut more than one, you can use something like `c("Hamlet", "Claudius")` where you would put just `"Hamlet"`.

- Cut them out of the network, make plots again, and describe what happens.
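One way to do the cutting, which may or may not match what the previous lab used, is `igraph`'s `delete_vertices()`: it takes a vector of vertex names and removes those vertices along with every edge touching them. Here is a sketch on a hypothetical toy bipartite graph, using `components()` to count how many pieces remain.

```r
library(igraph)

# Hypothetical affiliation matrix: rows are people, columns are groups
aff <- matrix(c(1, 1, 1,  1, 1, 0,  0, 1, 0,  0, 0, 1,  0, 0, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(c("Ann", "Ben", "Cal", "Dee", "Eve"),
                              c("Lodge", "Caucus", "Club")))
from_aff <- if (exists("graph_from_biadjacency_matrix")) {
  graph_from_biadjacency_matrix
} else {
  graph_from_incidence_matrix
}
g <- from_aff(aff)

# Remove two people (and all their edges) in one call
g_cut <- delete_vertices(g, c("Ann", "Ben"))
V(g_cut)$name

# Number of connected pieces before and after the cut
c(before = components(g)$no, after = components(g_cut)$no)
```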

# The strength of weak ties

Now try the following:

```
P <- whig_han %*% t(whig_han)
G <- t(whig_han) %*% whig_han
wp <- graph_from_adjacency_matrix(P, weighted="strength",
mode="undirected")
wg <- graph_from_adjacency_matrix(G, weighted="strength",
mode="undirected")
```

What are these four variables? Without needing to know what `%*%` and `t` do, a little printing out and perhaps plotting ought to be enough to let you guess.

`weighted="strength"` creates a new edge attribute, `strength`. Both `wp` and `wg` have extra information attached to each edge, a `strength` number. What is this? If you look at `G` you should be able to figure it out.
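If the full `P` and `G` are too big to take in, one option is to run the same steps on a matrix small enough to check by hand. This sketch uses a hypothetical `aff` matrix and passes `diag = FALSE`, a departure from the call above: with the default `diag = TRUE`, nonzero diagonal entries also become self-loops. `igraph::as_data_frame()` then lists each edge with its `strength`, ready to compare against the printed matrices.

```r
library(igraph)

# Hypothetical affiliation matrix: rows are people, columns are groups
aff <- matrix(c(1, 1, 1,  1, 1, 0,  0, 1, 0,  0, 0, 1,  0, 0, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(c("Ann", "Ben", "Cal", "Dee", "Eve"),
                              c("Lodge", "Caucus", "Club")))

# The same two products as in the lab, applied to the toy matrix
P_toy <- aff %*% t(aff)   # 5 x 5: one row and one column per person
G_toy <- t(aff) %*% aff   # 3 x 3: one row and one column per group
P_toy
G_toy

# diag = FALSE skips the diagonal, so no self-loops are created
wp_toy <- graph_from_adjacency_matrix(P_toy, weighted = "strength",
                                      mode = "undirected", diag = FALSE)
wg_toy <- graph_from_adjacency_matrix(G_toy, weighted = "strength",
                                      mode = "undirected", diag = FALSE)

# Edge lists with the strength attribute; compare each row with the matrices
igraph::as_data_frame(wg_toy, what = "edges")
igraph::as_data_frame(wp_toy, what = "edges")
```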

Plot `wg`. Try `geom_edge_fan(aes(alpha=strength))`.

Plot `wp`, again visualizing the edge `strength`.

- Explain in network terms what seems notable about the three “central” figures.
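Here is a sketch of the `alpha = strength` idea on a hypothetical toy projection (not the real `wp`), built from a person-by-group matrix in the same way as `wp`, but with `diag = FALSE` so no self-loops are drawn.

```r
library(igraph)
library(ggraph)

# Hypothetical affiliation matrix: rows are people, columns are groups
aff <- matrix(c(1, 1, 1,  1, 1, 0,  0, 1, 0,  0, 0, 1,  0, 0, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(c("Ann", "Ben", "Cal", "Dee", "Eve"),
                              c("Lodge", "Caucus", "Club")))
P_toy <- aff %*% t(aff)
wp_toy <- graph_from_adjacency_matrix(P_toy, weighted = "strength",
                                      mode = "undirected", diag = FALSE)

set.seed(1)
p <- ggraph(wp_toy, layout = "fr") +
  # edges with a larger strength value are drawn more opaque
  geom_edge_fan(aes(alpha = strength)) +
  geom_node_point(size = 3) +
  geom_node_text(aes(label = name), repel = TRUE)
p
```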

Han uses some quantitative measures of bridging to support his argument. Measures of bridging usually depend on the idea of a “geodesic” or shortest path between nodes.
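To make “geodesic” concrete: `igraph`'s `distances()` gives the length of the shortest path between every pair of nodes, and `shortest_paths()` returns one such path. A sketch on a hypothetical toy network of persons:

```r
library(igraph)

# Hypothetical affiliation matrix and its person-by-person projection
aff <- matrix(c(1, 1, 1,  1, 1, 0,  0, 1, 0,  0, 0, 1,  0, 0, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(c("Ann", "Ben", "Cal", "Dee", "Eve"),
                              c("Lodge", "Caucus", "Club")))
wp_toy <- graph_from_adjacency_matrix(aff %*% t(aff), weighted = "strength",
                                      mode = "undirected", diag = FALSE)

# Shortest-path length (number of edges) between every pair of people;
# "strength" is not used, since igraph only looks for a "weight" attribute
d <- distances(wp_toy)
d

# One geodesic from Ben to Dee, as a vector of vertex names
sp <- shortest_paths(wp_toy, from = "Ben", to = "Dee")
as_ids(sp$vpath[[1]])
```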

The **betweenness** of an edge adds up, over all pairs of nodes, the fraction of each pair’s geodesics that include that edge; roughly, it counts how many shortest paths run through the edge. This can be calculated with the `igraph` function `edge_betweenness`. The following code creates a data frame with a row for each edge of `wp` and columns for edge betweenness and `strength`:

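```
# one row per edge of wp
wp_edge_stats <- tibble(
    # edge_betweenness() only uses an edge attribute named "weight" by
    # default, so "strength" does not affect this calculation
    eb = edge_betweenness(wp),
    strength = edge_attr(wp, "strength"))
```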

- Make a scatterplot (**not** a network diagram) of edge betweenness against strength (use jitter, since there are only a few possible values). What does this reveal about weak ties and connectivity?
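A sketch of the scatterplot on a hypothetical toy network of persons (`toy_edge_stats` mirrors `wp_edge_stats`). Jittering only horizontally (`height = 0`) spreads out the few distinct strength values without distorting the betweenness values on the y-axis.

```r
library(igraph)
library(ggplot2)
library(tibble)

# Hypothetical affiliation matrix and its person-by-person projection
aff <- matrix(c(1, 1, 1,  1, 1, 0,  0, 1, 0,  0, 0, 1,  0, 0, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(c("Ann", "Ben", "Cal", "Dee", "Eve"),
                              c("Lodge", "Caucus", "Club")))
wp_toy <- graph_from_adjacency_matrix(aff %*% t(aff), weighted = "strength",
                                      mode = "undirected", diag = FALSE)

# One row per edge, as in the lab
toy_edge_stats <- tibble(
  eb = edge_betweenness(wp_toy),
  strength = edge_attr(wp_toy, "strength"))

set.seed(1)
p <- ggplot(toy_edge_stats, aes(x = strength, y = eb)) +
  geom_jitter(width = 0.1, height = 0) +
  labs(x = "edge strength", y = "edge betweenness")
p
```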

A similar measure exists for nodes. The **betweenness centrality** of a node adds up, over all pairs of *other* nodes, the fraction of each pair’s geodesics that pass through that node.
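Written out for an undirected graph like `wp`, let $\sigma_{st}$ be the number of geodesics between nodes $s$ and $t$, $\sigma_{st}(e)$ the number of those that use edge $e$, and $\sigma_{st}(v)$ the number that pass through node $v$. The standard definitions, which `igraph`'s `edge_betweenness()` and `betweenness()` report without normalization by default, are

$$
\mathrm{eb}(e) = \sum_{\{s,t\}} \frac{\sigma_{st}(e)}{\sigma_{st}},
\qquad
\mathrm{bc}(v) = \sum_{\{s,t\}:\ v \notin \{s,t\}} \frac{\sigma_{st}(v)}{\sigma_{st}},
$$

where each sum runs over unordered pairs of distinct nodes. For example, in a “bowtie” of two triangles sharing a single node, every shortest path between the two sides passes through the shared node, so its betweenness centrality is $2 \times 2 = 4$ and every other node’s is $0$.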

The special function `centrality_betweenness()` (nothing inside the parentheses) can be used inside an `aes` specification for a `geom_node_*`. (It sometimes generates silly warnings about `nobigint`, which you can ignore.)^{2}

- Make a plot of the network of persons `wp` in which the `size` of a node is proportional to its betweenness centrality. Who stands out?
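A sketch on a hypothetical toy network of persons, assuming `tidygraph` (which supplies `centrality_betweenness()`) is installed; if your setup already makes the function available, the `library(tidygraph)` line is harmless. `igraph`'s own `betweenness()` computes the same unnormalized scores, which is handy for checking the picture.

```r
library(igraph)
library(ggraph)
library(tidygraph)   # provides centrality_betweenness()

# Hypothetical affiliation matrix and its person-by-person projection
aff <- matrix(c(1, 1, 1,  1, 1, 0,  0, 1, 0,  0, 0, 1,  0, 0, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(c("Ann", "Ben", "Cal", "Dee", "Eve"),
                              c("Lodge", "Caucus", "Club")))
wp_toy <- graph_from_adjacency_matrix(aff %*% t(aff), weighted = "strength",
                                      mode = "undirected", diag = FALSE)

set.seed(1)
p <- ggraph(wp_toy, layout = "fr") +
  geom_edge_link(alpha = 0.3) +
  # node size follows betweenness centrality, computed inside aes()
  geom_node_point(aes(size = centrality_betweenness())) +
  geom_node_text(aes(label = name), repel = TRUE)
p

# The same scores straight from igraph, largest first
sort(betweenness(wp_toy), decreasing = TRUE)
```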

The `transitivity` function calculates the clustering coefficient, but getting this data into a comparable visualization is slightly annoying. Here is some code to help you get there by adding a `transitivity` attribute to the nodes of `wp`.

`V(wp)$transitivity <- transitivity(wp, type="local")`

- Now make a network plot of `wp` in which the `size` is proportional to `transitivity`, and compare to the previous plot.
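A sketch on a hypothetical toy network of persons. Nodes with fewer than two neighbors would get `NaN` for local transitivity, which `ggplot2` drops from the plot with a warning; `transitivity(..., isolates = "zero")` reports 0 for them instead. (This toy has no such nodes.)

```r
library(igraph)
library(ggraph)

# Hypothetical affiliation matrix and its person-by-person projection
aff <- matrix(c(1, 1, 1,  1, 1, 0,  0, 1, 0,  0, 0, 1,  0, 0, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(c("Ann", "Ben", "Cal", "Dee", "Eve"),
                              c("Lodge", "Caucus", "Club")))
wp_toy <- graph_from_adjacency_matrix(aff %*% t(aff), weighted = "strength",
                                      mode = "undirected", diag = FALSE)

# Local clustering coefficient of each node, stored as a vertex attribute
V(wp_toy)$transitivity <- transitivity(wp_toy, type = "local")
round(V(wp_toy)$transitivity, 2)

set.seed(1)
p <- ggraph(wp_toy, layout = "fr") +
  geom_edge_link(alpha = 0.3) +
  geom_node_point(aes(size = transitivity)) +
  geom_node_text(aes(label = name), repel = TRUE)
p
```

Comparing this with the betweenness plot for the same toy shows the node with the largest betweenness getting the smallest clustering coefficient.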

---

1. The syntax for manipulating matrices is unfortunately not as readable as the `filter` syntax we used with data frames. `A[u, ]` means “select rows of `A` according to `u`” and `A[ , v]` means “select columns of `A` according to `v`”. `rowSums` does what it says.
2. There is a corresponding edge-betweenness function you can use in `geom_edge_*`, `centrality_edge_betweenness()`.