At Indigo Research, we do a lot of work with networks and graphs (the vertex-edge kind), and we figured it would be interesting to plot and visualize the social network of Marvel Cinematic Universe characters. Hopefully, this project helps explain a few things that happen in Infinity War, and maybe even in Endgame.
No spoilers will be present in this article.
Before we get into the MCU social network, we first have to explain what a network is. Graphs or networks are mathematical tools used to model relationships using vertices and edges.
Objects are represented as vertices, and relationships are modelled with edges. In the basic network above, vertex 1 has a relationship with 4 and 5. Vertex 3 only has an edge to 1, but not to any other vertex. In our network, we used vertices to represent MCU characters, and interactions in their movies to represent edges. We'll go into the details of how we figured that out in a bit.
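Here is a minimal Python sketch of one common way to store such a network: an adjacency list that maps each vertex to the set of vertices it shares an edge with. The edge list is a hypothetical toy graph chosen to agree with what the text says about the example network; the figure's actual edges may differ. `build_adjacency` is an illustrative helper, not a library function.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each vertex to the set of vertices it shares an edge with.

    Edges are undirected, so each (u, v) pair is recorded in both directions.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical toy edge list, consistent with the text but not taken from the figure.
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (4, 5)]
graph = build_adjacency(edges)
print(sorted(graph[1]))  # [2, 3, 4, 5]
print(sorted(graph[3]))  # [1]
print(sorted(graph[4]))  # [1, 5]
```

Storing neighbours in sets turns "is there an edge between these two?" into a constant-time lookup and quietly ignores duplicate edges.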
A path is a sequence of edges that connects vertices. A path exists between two vertices when you can get from one to the other by traversing edges. The graph above is a connected graph because every vertex is reachable from every other vertex. The shortest path from 4 to 1 is shorter than the one from 3 to 5, and the shortest path from 2 to 4 has the same length as the one from 5 to 3.
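Because every edge counts as one step, breadth-first search (BFS) finds shortest paths in a network like this. The sketch below reuses the hypothetical toy graph, stored as a vertex-to-neighbours dictionary, to compute shortest path lengths and to check whether the graph is connected. `shortest_path_length` and `is_connected` are illustrative names.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Number of edges on a shortest path from source to target, found by BFS.

    Returns None when target cannot be reached from source.
    """
    if source == target:
        return 0
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                if neighbour == target:
                    return distance[neighbour]
                queue.append(neighbour)
    return None


def is_connected(graph):
    """True when every vertex is reachable from an arbitrary start vertex.

    Assumes every vertex appears as a key of `graph`.
    """
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(graph)


# Hypothetical toy graph (vertex -> set of neighbours).
graph = {1: {2, 3, 4, 5}, 2: {1}, 3: {1}, 4: {1, 5}, 5: {1, 4}}
print(shortest_path_length(graph, 4, 1))  # 1
print(shortest_path_length(graph, 3, 5))  # 2
print(shortest_path_length(graph, 2, 4))  # 2
print(shortest_path_length(graph, 5, 3))  # 2
print(is_connected(graph))                # True
```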
The degree of a vertex is the number of edges connected to it. Vertex 1 has a degree of 4, while vertex 3 only has a degree of 1.
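Degree can be read straight off an edge list, because each undirected edge adds one to the degree of both of its endpoints. A minimal sketch on the same hypothetical toy graph (`degrees` is an illustrative helper):

```python
from collections import Counter


def degrees(edges):
    """Degree of each vertex: the number of edges touching it.

    Each undirected edge (u, v) adds one to both endpoints.
    Assumes no duplicate edges; a repeated pair would be counted twice.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


# Hypothetical toy edge list, consistent with the text but not taken from the figure.
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (4, 5)]
deg = degrees(edges)
print(deg[1], deg[3])  # 4 1
```

A handy sanity check: the degrees always add up to twice the number of edges.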
How we did it.
We used this project as a valid excuse to marathon the entire MCU movie list, save for Captain Marvel. The movies were watched in the sequence shown in the BBC image above (yes, we also watched The Incredible Hulk).
We identified as vertices the heroes, support characters, murderbots, strong AIs (unfortunately, we didn't include Tony's robot arms Dum-E and U, because they are relatively weak AIs), love interests, SpaceX CEOs, Stan Lees, and any other characters that were either named or appeared more than once.
We placed an edge between two characters whenever they appeared in a scene together; it's as simple as that. J.A.R.V.I.S. talking with Tony created an edge between them, while the battle at Leipzig airport created edges between everyone who was present. Scenes like Tony looking at a picture of Howard Stark do not create an edge between them, but flashbacks where they actually interact do.
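The edge rule translates almost directly into code: for every scene, connect every pair of characters in it. The sketch below assumes scenes have already been annotated by hand as lists of character names. Deciding who counts as "in" a scene (for example, leaving out someone who only appears in a photo) happens at that annotation step. The scene data is made up for illustration, and the sketch treats the network as unweighted, so repeated co-appearances add no extra edges; the post doesn't say whether edges were weighted.

```python
from itertools import combinations


def edges_from_scenes(scenes):
    """Connect every pair of characters who appear in the same scene.

    `scenes` is a list of collections of character names. Pairs are stored as
    sorted tuples in a set, so two characters sharing many scenes get one edge.
    """
    edges = set()
    for scene in scenes:
        for a, b in combinations(sorted(set(scene)), 2):
            edges.add((a, b))
    return edges


# Hypothetical scene annotations, for illustration only.
scenes = [
    ["Tony Stark", "J.A.R.V.I.S."],
    ["Tony Stark", "Steve Rogers", "Peter Parker", "Scott Lang"],  # e.g. a big battle scene
    ["Tony Stark", "J.A.R.V.I.S."],  # a repeat pairing adds no new edge
]
print(len(edges_from_scenes(scenes)))  # 7
```

A four-character scene alone produces six edges, which is why large ensemble battles add so many connections at once.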
Connecting vertices with edges is fun, but we can get more information out of the graph by looking at the clusters the network forms and at the various centrality scores of the characters.
Centrality scores are measures of the relative importance of a node in a network. We'll show two types of centrality scores: degree and betweenness. The degree centrality of a node is the number of connections it has. In the first network shown in this post, node A has degree 2, while node C has degree 3. Betweenness centrality, on the other hand, considers the number of shortest paths a node lies on. A shortest path between two nodes is a path that uses the smallest possible number of jumps (edges). The betweenness centrality of a node v is computed using the following formula:

$$C_B(v) = \sum_{s \neq v \neq t} \frac{p_{st}(v)}{p_{st}}$$

where $p_{st}$ is the number of shortest paths between nodes s and t, and $p_{st}(v)$ is the number of those shortest paths that pass through node v. Nodes with high betweenness lie on many shortest paths in the network, which makes them well placed to spread information to other parts of the network.
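The post doesn't say which software computed these scores, so here is a from-scratch sketch instead. Degree centrality is just the neighbour count. Betweenness uses Brandes' algorithm, a standard way to evaluate betweenness without listing every path: one BFS per source node counts shortest paths, then each node's share is accumulated backwards. Assumptions: the graph is unweighted and undirected, scores are unnormalised, each unordered pair {s, t} is counted once, and the five-node graph (a square with one extra vertex hanging off it) is hypothetical.

```python
from collections import deque


def degree_centrality(graph):
    """Raw (unnormalised) degree centrality: the number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def betweenness_centrality(graph):
    """Unnormalised betweenness for an unweighted, undirected graph (Brandes' algorithm).

    For each node v, sums p_st(v) / p_st over unordered pairs {s, t} with s != v != t.
    """
    centrality = dict.fromkeys(graph, 0.0)
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack = []
        predecessors = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        distance = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, passing each node's share to its predecessors.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each unordered pair {s, t} was visited once from s and once from t.
    return {v: c / 2 for v, c in centrality.items()}


# Hypothetical graph: the square 1-2-3-4 plus vertex 5 attached to 4.
graph = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3, 5}, 5: {4}}
print(degree_centrality(graph))       # {1: 2, 2: 2, 3: 2, 4: 3, 5: 1}
print(betweenness_centrality(graph))  # {1: 1.0, 2: 0.5, 3: 1.0, 4: 3.5, 5: 0.0}
```

Vertex 4 scores highest because every route to vertex 5 goes through it. Between 1 and 3 there are two equally short routes (via 2 and via 4), so each middle vertex gets half the credit for that pair; the fractions in the output come from pairs like this.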
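Community partitions also help us understand the network better. We can divide the network into groups of nodes that are more densely connected to each other than to the rest of the network. To divide our network, we used the Louvain Method for community detection. This method works by maximising the modularity score of the network, given by

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

where m is the number of edges in the network, $A_{ij}$ is 1 if there is an edge between nodes i and j and 0 otherwise, $k_i$ and $k_j$ are the degrees of nodes i and j, $c_i$ and $c_j$ are the communities the two nodes are assigned to, and $\delta(c_i, c_j)$ is a delta function that gives 1 when nodes i and j are in the same community and 0 otherwise.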
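Here is a minimal Python sketch of the modularity score for a given partition. It assumes an unweighted, undirected graph stored as a node-to-neighbours dictionary and a node-to-community dictionary. It follows the double sum over node pairs literally, which is fine for small examples but grows quadratically with the number of nodes. The example graph, two triangles joined by a single edge, is made up for illustration.

```python
def modularity(graph, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    graph: node -> set of neighbours; community: node -> community label.
    Q = (1 / 2m) * sum over all pairs (i, j) of [A_ij - k_i * k_j / 2m] * delta(c_i, c_j).
    """
    degree = {node: len(neighbours) for node, neighbours in graph.items()}
    two_m = sum(degree.values())  # every edge is counted from both ends, giving 2m
    q = 0.0
    for i in graph:
        for j in graph:
            if community[i] != community[j]:
                continue  # delta(c_i, c_j) = 0, so the term vanishes
            a_ij = 1 if j in graph[i] else 0
            q += a_ij - degree[i] * degree[j] / two_m
    return q / two_m


# Hypothetical graph: triangle 1-2-3 and triangle 4-5-6, joined by the edge 3-4.
graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
print(round(modularity(graph, split), 4))  # 0.3571
```

Splitting this graph into its two triangles scores 5/14 (about 0.357), while putting every node into a single community scores 0. That gap is what a modularity-maximising method like Louvain is chasing.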
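The Louvain method is an iterative process that begins by assigning each node in the network to its own community. Then, for each node, we compute the change in modularity from moving it into the community of each of its neighbours. The node moves to the community that gives the largest positive change; if no move increases modularity, it stays where it is. This is repeated until no single move improves modularity, after which each community is merged into a single node and the process starts again on this smaller network. In our visualisation, we show the result of community detection using the Louvain Method by assigning a color to each community.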
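The local-moving phase of the Louvain Method can be sketched as follows. This version is deliberately naive and rests on several assumptions. It recomputes modularity from scratch for every candidate move instead of using the closed-form modularity gain that real implementations use. It moves a node only when that strictly increases modularity. It also stops after the first phase and omits the step where communities are merged into single nodes. `modularity` and `louvain_local_moving` are illustrative names, and the two-triangle graph is hypothetical.

```python
def modularity(graph, community):
    """Modularity Q of a partition of an unweighted, undirected graph."""
    degree = {node: len(neighbours) for node, neighbours in graph.items()}
    two_m = sum(degree.values())
    q = 0.0
    for i in graph:
        for j in graph:
            if community[i] == community[j]:
                q += (1 if j in graph[i] else 0) - degree[i] * degree[j] / two_m
    return q / two_m


def louvain_local_moving(graph):
    """First (local-moving) phase of the Louvain Method, naive sketch.

    Every node starts in its own community. Nodes are then moved, one at a time,
    into the neighbouring community that raises modularity the most, until a
    full pass over the nodes makes no move. Aggregation is not implemented.
    """
    community = {node: node for node in graph}  # each node starts alone
    improved = True
    while improved:
        improved = False
        for node in graph:
            best_label = community[node]
            best_q = modularity(graph, community)
            for label in {community[n] for n in graph[node]}:
                if label == community[node]:
                    continue
                trial = dict(community)
                trial[node] = label
                q = modularity(graph, trial)
                if q > best_q + 1e-12:  # require a strict improvement
                    best_label, best_q = label, q
            if best_label != community[node]:
                community[node] = best_label
                improved = True
    return community


# Hypothetical graph: triangle 1-2-3 and triangle 4-5-6, joined by the edge 3-4.
graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
partition = louvain_local_moving(graph)
groups = {}
for node, label in partition.items():
    groups.setdefault(label, set()).add(node)
print(sorted(sorted(g) for g in groups.values()))  # [[1, 2, 3], [4, 5, 6]]
print(round(modularity(graph, partition), 4))      # 0.3571
```

On this small graph, one phase already recovers the two triangles. On larger networks, the omitted aggregation step matters, because it lets whole communities merge into bigger ones.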
Now for some results! Below is a list of the top 10 characters with the highest degree.
As you can see, Thor is the character with the highest degree in the network. This may be due to his adventures with some members of the Guardians of the Galaxy, and maybe also because he's the only one whose weapons (Mjolnir and Stormbreaker) we treated as separate characters. Iron Man being on the list comes as no surprise, because he's literally the first character introduced in the Marvel Cinematic Universe, and he's involved with so many other characters throughout the films. Aside from the other Avengers on the list, it's interesting to see the Reality Stone up there. We'd have thought the Space Stone (as the Tesseract) would be the Infinity Stone with the highest degree, but maybe the Reality Stone's journey across Earth, through the hands of the Dark Elves, to Asgard, into The Collector's collection on Knowhere, and finally into Thanos' Infinity Gauntlet is what gave it its importance in our network.
Next is a list of the characters with the highest betweenness centrality:
As we said earlier, it's hard to deny Tony Stark's importance to the MCU. He's pretty central in the network, meaning many parts of the network are connected through Tony. Peter Parker also seems to play an important role in keeping the network connected, maybe because he's connected to Principal Morita, the grandson of Jim Morita, one of the Howling Commandos, Captain America's elite combat unit during WWII.
To see how the nodes are divided into communities, head over to https://avengers.indigoresearch.xyz/ and see how we colored the nodes! Each color represents what we call a modularity class, which is a partition given by the Louvain Method we discussed earlier. Do these partitions look familiar? It seems to us that some partitions represent the movie universes the characters belong to! For one, Captain America's friends from the 1940s belong to one community; Peter Parker's friends, teachers, and Aunt May form their own community; and the Asgardians form another. What other communities can you spot?
In conclusion, we created a social network of heroes, villains, artifacts, and robots, and showed that network science can be a ton of fun, and a great excuse to watch 20 movies in a row.