Working With NodeGoat


Original art by blog author

I’ve already mentioned networks in a previous post, where I briefly touched upon the value of connections in the study of the humanities. My argument was that thinkers, art pieces, texts, and so on are connected: each can be influenced by other people, ideas, and creations across space and time. These connections lend themselves to network analysis. As Scott B. Weingart states in his article “Demystifying Networks, Parts I & II”, “representing information as a network implicitly suggests not only that connections matter, but that they are required to understand whatever’s going on.”

In class, we were introduced to NodeGoat, a “web-based data management, network analysis and visualization environment”. A thought before talking about my experience using NodeGoat: its homepage features updates (blog posts!) about projects made with NodeGoat and workshops run by “the NodeGoat people”, among other things. A quick look at the latest posts shows the versatility of digital tools (specifically, of this one) and the richness and diversity of topics in the digital humanities. The projects include network visualizations of the letters of 19th-century intellectuals, interactive visualizations of how members of the U.S. House of Representatives moved across the United States, and a worldwide Geography of Violence based on military conflicts. These blog posts also show the interconnectedness that characterizes the Internet, both in space (workshops and meetings will take place in Ghent, Belgium; Washington, D.C., U.S.A.; Mons, Belgium; Düsseldorf, Germany; and Lyon, France) and in time (the posts are dated, and they refer to past and future events). It’s interesting to consider, then, that an announcement page such as this one is in itself a network.

Our professor showed us the various features of the program, which range from the basic step, building up a dataset, to displaying geographical, social, and chronological visualizations (networks) based on the data. Our task was to add objects to an already set-up data table, meaning that the information to be associated with each object was already defined. The topic of our networks project was Egyptian cinema, which, as we discussed in class, is a very good topic indeed.

The Egyptian film industry is the largest in the Arab world. According to Wikipedia, it began in the early 20th century, and today more than three quarters of the films that have been “made in Arabic-speaking countries” are Egyptian. So, A) there’s plenty of information about Egyptian films, the product of many years of a “flourishing” industry, B) these films are famous enough that information about them can be found online, and C) they are relevant to our context (Abu Dhabi), given that Egyptian films have influenced the cultural and artistic scene of the Arab world for over a century.

The objects of our dataset (called types in NodeGoat), then, were films, people (actor/actress/director), and cities (primarily the birth and death places of people). However, I’ll refer only to the visualizations in which films and people were the nodes of the networks. In the data table on NodeGoat, each object was connected in the following way: each film had a title, a release date, and three people associated with it. Some entries from the data table appear in Figure 1.


Figure 1. Screenshot of data compiled for Egyptian Cinema Networks project

To compile the data (each student had to add 10-15 films to the dataset), I used online databases: the Contemporary Arab Cinema database, made by the University of Manchester, the Internet Movie Database (IMDb), and Wikipedia (particularly the data found in infoboxes). The following are the main obstacles I encountered and the lessons I learned during this process.

The Thing About Spelling

Writing Arabic names in English requires transliterating them from one writing system to another, which is not an easy task. I first realized this posed a problem for our dataset when I noticed that one actor’s name is spelled differently in Wikipedia and in IMDb. If I entered this person as “Emah Hamdy” and one of my classmates entered him as “Imad Hamdi”, or vice versa, our network would divide one person into two. If a network, like a balanced ecosystem, depends on the connections among all of its parts, then altering those connections alters the network. We discussed this problem in class. Software like NodeGoat cannot tell that two different spellings refer to the same person; it simply groups identical entries together. Therefore (as with a scientific experiment), human error can greatly affect the interpretation of the data displayed in a network.
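To make the spelling problem concrete, here is a minimal Python sketch. It is not how NodeGoat works internally; it only assumes that people are matched by their exact name string. The film titles are placeholders, `group_people` is a hypothetical helper, and the alias table is a hypothetical fix the class could agree on (which spelling becomes the shared one is an arbitrary choice here).

```python
from collections import defaultdict

# Hypothetical entries from two contributors: (film, person).
# Film titles are placeholders; the two spellings are the ones discussed above.
entries = [
    ("Film A", "Emah Hamdy"),
    ("Film B", "Imad Hamdi"),
]

def group_people(entries, aliases=None):
    """Group films by person. Two names count as the same person only if their
    strings match exactly, after an optional lookup in an alias table."""
    aliases = aliases or {}
    films_by_person = defaultdict(set)
    for film, name in entries:
        key = aliases.get(name, name)  # replace a known variant with the shared spelling
        films_by_person[key].add(film)
    return dict(films_by_person)

# Without any agreement on spelling, one actor becomes two separate nodes.
split = group_people(entries)
print(split)

# Hypothetical alias table mapping a variant spelling to one agreed-upon form.
aliases = {"Emah Hamdy": "Imad Hamdi"}
merged = group_people(entries, aliases)
print(merged)
```

Note that automatic clean-up such as lowercasing or trimming spaces would not merge these two spellings; only a human decision, recorded here as the alias table, connects them.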

The Importance of the “What Am I Interested In?” Question

Throughout the semester, we’ve come to understand that digital humanities projects are often more about the research process than about the results. Therefore, a hypothesis doesn’t have to be proven for a project to be successful. However, structure is always necessary (especially given the amount of data on Egyptian cinema out there). This is why the data table was set up to include specific information.

Nevertheless, I realized that even though a hypothesis about the connections in the network was not necessary, defining the focus of the project is essential. Otherwise, interpreting the resulting network becomes overly complicated, or even incorrect.

The social visualization that NodeGoat created for our data is shown in Figure 2. In these networks, the nodes are either films or people. An edge can be read as “was part of” when going from a person to a film, and as “was made by” (implying that actors and actresses also make the film, not just the director) when going from a film to a person. The edges, then, are symmetric: they work the same in both directions. As Figure 3, a zoom-in on a section of one of the networks, shows, films (red) are only connected to people (blue), and vice versa.


Figure 2. Social visualization of the Egyptian Cinema Networks project


Figure 3. Zoom-in of a section of Figure 2. Red nodes are films, while blue nodes are people.
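The structure described above, where every edge joins a film to a person and reads the same in both directions, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NodeGoat’s data model: the film and person names are placeholders, film names are assumed never to coincide with person names, and `build_edges` and `is_bipartite_by_type` are hypothetical helpers.

```python
# Hypothetical records: each film lists the people associated with it.
films = {
    "Film A": ["Director 1", "Actor 1", "Actor 2"],
    "Film B": ["Director 2", "Actor 2", "Actor 3"],
}

def build_edges(films):
    """Return one undirected edge per (film, person) pair.
    A frozenset has no order, so {film, person} == {person, film}."""
    return {frozenset((film, person)) for film, people in films.items() for person in people}

def is_bipartite_by_type(edges, films):
    """Check that every edge joins exactly one film and one person."""
    film_ids = set(films)
    return all(len(edge & film_ids) == 1 for edge in edges)

edges = build_edges(films)
print(len(edges))                          # 6
print(is_bipartite_by_type(edges, films))  # True
```

Because this model only has film-to-person edges, a person-to-person relation such as a marriage has nowhere to appear as an edge, which fits what I noticed about spouses.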

One of the first questions I had when compiling data had to do with adding people to the network. When adding a new person to the dataset, part of the information related to this person had to do with current and past spouses (marriages and divorces). Some of the actors, actresses, and directors that I added were related to each other in this way. However, some of them had married people outside the industry. I wondered if these “outsider” spouses should be included in the data, to appear as nodes in the network. What purpose would that serve? Are we interested in people who are not actors/actresses/directors? I chose to leave them out, and later realized that they wouldn’t have appeared in the social visualization either way (as people nodes are not connected to other people nodes).

Another question had to do with the people assigned to each movie in the dataset. I chose to prioritize the director; she/he should always be one of the three people (did my classmates make the same decision? I didn’t know then). But then, who are the remaining two? If three or more actors/actresses from the dataset appear in the same movie, but we can only associate two of them with it, then, according to the social visualization, any actor or actress left out did not appear in that movie. Say that, instead of any actor/actress in the movie, only the leading roles are connected to the film. But what if a movie has only one leading role, or three? For some movies, even deciding which characters are the main ones might be subjective.

In Figure 2, there are more than four independent networks; without context, a viewer might think that each of these networks is unrelated to the others. If the connections were limited to directors and leading roles, for example, then they might be. But if any two actors/actresses can be chosen for each film, then some connections between these networks may be invisible, altering the “ecosystem”. After this exercise, we learned (as explained by the developers of NodeGoat) that we could have assigned as many people as desired to each film. But then, would we include every single actor/actress who participated in the film? Would everyone’s information be online? There’s still a need to specify. At a more general level, we know that Egyptian cinema is incredibly broad. Would we include every single film? If not, which ones would we choose?
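The “hidden connections” problem can be illustrated with a small Python sketch that counts how many separate networks (connected components) appear with and without a limit on people per film. The casts are placeholders, the director is listed first to mirror the choice I made, and `components` is a hypothetical helper; this is an illustration of the reasoning, not a reproduction of our dataset or of NodeGoat’s visualization.

```python
from collections import defaultdict

# Hypothetical full casts, director listed first, using placeholder names.
full_casts = {
    "Film A": ["Director 1", "Actor 1", "Actor 2", "Actor 3"],
    "Film B": ["Director 2", "Actor 4", "Actor 5", "Actor 3"],
}

def components(casts, max_people=None):
    """Count connected components of the film-person network, keeping only the
    first max_people people of each film when a limit is given."""
    adj = defaultdict(set)
    for film, people in casts.items():
        kept = people if max_people is None else people[:max_people]
        adj[film].update(kept)  # also makes sure every film is a node
        for person in kept:
            adj[person].add(film)
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1  # a new, previously unreached network
        stack = [start]
        while stack:  # depth-first walk over everything reachable from start
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node] - seen)
    return count

print(components(full_casts))                # 1: Actor 3 links the two films
print(components(full_casts, max_people=3))  # 2: Actor 3 is cut, so the films look unrelated
```

With the full casts, one shared actor ties the two films into a single network; with the director-plus-two limit, that actor is dropped from both films and the visualization would show two apparently unrelated networks.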
