Hello. Welcome to a new episode. Today’s topic is Network Analysis part 1. Networks & Communities. One of the objectives is to understand what a network is. We will introduce the so-called adjacency matrix to represent the network. We will see how a simple adjacency matrix is built. We will try to understand what the modularity is and how we can get the community structure of a network. Today’s episode is mostly about the basic concepts. A network graph is composed of nodes and edges. A node can be an observation, a person or a case. There is a great flexibility in representing a node, as we can see some examples here. An edge represents the close relationship between two nodes. As we can see, an edge is usually drawn as a line connecting the nodes pairwise. Some of the applications of the network are: social network analysis, clustering analysis among others. Here, we can see an example of the network graph. We have the nodes which are connected by undirected edges. And there is no direction in the connections. The closely related nodes may form a community or a cluster. Here, the different communities are shown in different colors. So far so Good. Now, we can use a matrix to represent mathematically a network. This matrix is called a simple adjacency matrix. It is a symmetric matrix of 1s and 0s. In order to illustrate how a matrix can represent the network, let me show you step by step. Suppose that we have two nodes labeled 1 and 2 which are connected by an edge. It would be the simplest network we can think of. A 2 x 2 matrix can represent this network. The matrix element at the first row and the second column is 1. and the element at the second row and the first column is also 1. Imagine we flip this matrix around the diagonal, it would look exactly the same. So, this matrix is symmetric. Now, we connect the node 3 to the node 2. The matrix elements in blue represent the new edge. This matrix is also symmetric around the diagonal. We continue on. The node 4 is now connected to the node 2. Again, the matrix elements in blue represent the new edge. And, the remaining nodes are added to the network as we can see here. So, are you convinced that matrix on the left represents the network on the right? Adjacency matrix elements. In order to build an adjacency matrix, we would need a measure of being related. So we need to be able to express the distance or similarity between the nodes. In this case, we will use the so-called “cosine similarity”. To help you understand the cosine similarity, let me compare it with a commonly used distance metric called “Euclidean distance”. Then, we will discuss how to convert the cosine similarity into 1s and 0s of the adjacency matrix. For that purpose, we are going to apply an appropriate cutoff. Suppose that we have two nodes. The node positions are represented by the vectors X1 and X2. The difference between two vectors is another vector as we can see here. So, the Euclidean distance is the magnitude of X1 – X2. If the magnitudes of the vectors X1 and X2 increase, the Euclidean distance also increases. It seems that the Euclidean distance really measures how close the nodes are. The cosine similarity is the cosine of the angle theta between the vectors X1 and X2. What is interesting is that, even when the lengths of the vectors increase, the angle theta remains the same and the cosine similarity also remains the same. The cosine similarity has its maximum value 1 when the vectors align. The cosine similarity is 0 when the vectors are perpendicular to each other. The cosine similarity has its minimum value -1 when the vectors anti-align. The cosine similarity can be any value between -1 and +1. So, the question is How do we convert it into 1s and 0s of the adjacency matrix? Well, the answer is quite simple. When the cosine similarity is larger than a certain cutoff or threshold, we assign a matrix element 1. Otherwise, we assign a matrix element 0. Modularity. Modularity is an important concept. It is a measure of the network structure, that is, how the communities or the clusters are formed. The principle of the modularity is as following: It rewards putting the connected nodes together into the same community. At the same time, it penalizes for the probabilities of the connections being made at random. The optimal community structure corresponds to the maximum modularity. The modularity can be expressed by this formula where S-tilde and B-tilde are matrices. At first look, this formula is rather hard to understand. So, we will try to make sense of it bit by bit. This part means taking the trace of the matrix within the parenthesis. Trace is the sum of the diagonal elements. The matrix S has as many rows as the number of nodes and has as many columns as the total number of communities. The matrix S has the information about the community structure. The matrix S to the power t is the transposed matrix of S. Now, the matrix B is a square matrix of size equal to the number of nodes. The elements of the matrix B obey this formula. This term is the “reward” associated with putting the connected nodes into the same community. This term is the “penalty” associated with the random connections. So the goal is to get the matrix S that maximizes the modularity. We can repeat the maximization after changing the total number of communities: 2, 3, 4 and so-on. Then, we can get the optimal number of communities which corresponds to the global maximum of the modularity. Finally, a summary. Today, we saw how to represent a network graph with the simple adjacency matrix. We tried to answer to the question of how to tell when the nodes are close to each other. We discussed about the modularity maximization and the formation of the communities. Today’s episode was a conceptual introduction. In the next episode, we are going to see an actual example. We will think of the network analysis as a clustering method. So, we will compare it with other clustering methods such as k-means. Great! I hope that you enjoyed watching this video. Thank you so much. Until next time, Bye!