The networks session at CAA2011 in Beijing was a success! We had some great papers and a fascinating discussion. Read the summaries of the papers, the questions and answers, as well as the discussion here. Read more about the session, including the abstracts and the introduction, on the dedicated page.
- Maximilian Schich and Michele Coscia
- Diego Jimenez
- Johannes Preiser-Kapeller
- Mihailo Popovic
- Ladislav Smejda
- Tom Brughmans
- Leif Isaksen
The first presentation of the day was by Maximilian Schich and Michele Coscia talking about ‘Untangling the Complex Overlap of Subject Themes in Classical Archaeology’.
Maximilian and Michele used the Archäologische Bibliographie, a library database of over 450.000 titles, 45.000 classifications and 670.000 classification links. They looked at the co-occurrence of classifications, creating networks in which two classifications are connected if they appear in the same book, as well as networks in which classifications are connected when the same author writes about them. Any database software lets you explore the local level of this massive dataset, but that was not what interested the authors. Instead, Max and Michele looked at the bigger picture. They devised a method that allowed them to explore the dataset on three different scales: the local level (database level), the meso-level and the global level. On the global level they were able to identify academic communities, but also clusters of communities (communities of communities). They also looked in detail at how these communities evolved over time. On the meso-level they thresholded the data based on co-occurrence and significance, which produced interesting results. Max and Michele concluded that this approach to academic literature allows us to look at the fine-grained structure of how archaeology actually works. Their three-level method, using hierarchical link clustering and association rule mining, made it blatantly clear that complex overlaps are everywhere in academia!
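The basic step behind these networks, linking two classifications whenever they are assigned to the same book, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Max and Michele’s actual workflow: the input format (a mapping from hypothetical book IDs to sets of classification labels), the toy labels and the `min_count` threshold are all invented for the example. The `confidence` function shows the basic quantity behind association rules (the share of books carrying classification *a* that also carry *b*); hierarchical link clustering itself is not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(books, min_count=1):
    """Build a weighted co-occurrence network from a mapping of
    (hypothetical) book IDs to sets of classification labels.

    Two classifications are linked if they are assigned to the same book;
    the edge weight is the number of books they share. Edges with a weight
    below min_count are dropped (a crude stand-in for thresholding).
    Keying the mapping by author instead of book gives the author-based
    variant of the network.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for labels in books.values():
        labels = sorted(set(labels))  # one canonical order per pair
        node_counts.update(labels)
        edge_counts.update(combinations(labels, 2))
    edges = {pair: n for pair, n in edge_counts.items() if n >= min_count}
    return node_counts, edges


def confidence(node_counts, edges, a, b):
    """Confidence of the association rule a -> b: the share of books
    classified as a that are also classified as b. Pass the unthresholded
    edges, otherwise pairs below the threshold count as zero."""
    if not node_counts[a]:
        return 0.0
    pair = tuple(sorted((a, b)))
    return edges.get(pair, 0) / node_counts[a]


# Toy input: invented labels, not taken from the Archäologische Bibliographie.
books = {
    "b1": {"Pottery", "Athens", "Vase painting"},
    "b2": {"Pottery", "Athens"},
    "b3": {"Sculpture", "Athens"},
}

nodes, edges = cooccurrence_network(books)
_, strong_edges = cooccurrence_network(books, min_count=2)
print(strong_edges)                                   # {('Athens', 'Pottery'): 2}
print(confidence(nodes, edges, "Pottery", "Athens"))  # 1.0
```

Note that the rule is asymmetric: every book on pottery here is also about Athens, but only two of the three books on Athens are about pottery.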
Questions: Guus Lange asked what type of clustering was applied, to which Max responded that the clustering was performed not on the nodes but on the links. Graeme Earl asked how the classifications were derived from the database and whether they had thought about exploring how the classifications themselves grew and transformed. Max replied that there is no limit to the number of books per classification, but there is a sharp limit to the number of classifications per book. What is interesting, he said, is that we nevertheless get this big picture. Tom Brughmans wrapped up with a final question about how long the work had taken. Michele and Max said it took them one year, but that once the workflow is engineered it could be done in two weeks’ time.
Diego Jimenez was our next speaker. He presented on ‘Relative Neighborhood Networks for Archaeological Analysis’.
Diego is interested in archaeological attempts to find meaningful spatial structure in archaeological point data. He relies on graph theory to find structure based purely on the spatial distribution of points, and suggests objective ways of analysing the connections between them. In his talk Diego focused mainly on the methodology rather than on specific applications. Rather than nearest neighbour approaches, he suggested the relative neighbourhood concept as the basis for his method: two points are relative neighbours if the region of influence drawn around the pair contains no other points. Graphs can be constructed using this concept. Most interestingly, Diego mentioned that a parameter beta can be included to change the regions of influence. This allows a series of graphs to be created with different levels of connectivity. Diego suggested some space syntax approaches to analysing these graphs, including graph symmetry, relative asymmetry and distributedness.
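To make the role of beta concrete, here is a brute-force sketch of the lune-based beta-skeleton (Kirkpatrick and Radke), a standard family of proximity graphs that contains the relative neighbourhood graph as the case beta = 2. The talk did not spell out the exact formulation, so treating Diego’s beta as this parameter is an assumption; the function name and the three example points are invented for illustration. The check is O(n³), which is fine for small site plans but not for large point clouds.

```python
import math
from itertools import combinations


def beta_skeleton(points, beta=2.0):
    """Return the edges (i, j) of the lune-based beta-skeleton of 2D points.

    Points i and j are linked if no other point lies strictly inside their
    region of influence: the intersection of two discs of radius
    beta * d / 2, where d is the distance between i and j.
    beta = 2 gives the relative neighbourhood graph, beta = 1 the Gabriel
    graph; a larger beta means larger regions and therefore fewer edges.
    """
    if beta < 1:
        raise ValueError("this sketch only covers the lune-based case beta >= 1")
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (px, py), (qx, qy) = points[i], points[j]
        radius = beta * math.dist(points[i], points[j]) / 2
        a, b = 1 - beta / 2, beta / 2
        c1 = (a * px + b * qx, a * py + b * qy)  # disc centre shifted towards j
        c2 = (b * px + a * qx, b * py + a * qy)  # disc centre shifted towards i
        blocked = any(
            math.dist(points[k], c1) < radius and math.dist(points[k], c2) < radius
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges


# Three invented points: the third sits above the midpoint of the first two.
pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.5)]
print(beta_skeleton(pts, beta=1.0))  # [(0, 1), (0, 2), (1, 2)]
print(beta_skeleton(pts, beta=2.0))  # [(0, 2), (1, 2)]
```

Raising beta from 1 to 2 removes the long edge between points 0 and 1, which is the kind of graded change in connectivity described above.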
Questions: Maximilian Schich was interested in how control was defined in Diego’s analysis of the graphs and mentioned that peripheral nodes might often have a high level of control in a network. Diego mentioned that these are indeed important patterns that need to be acknowledged by archaeologists and his method would be a way to be sensitive to them.
After Diego we had the honour of listening to a historian’s experiences with network analysis. Johannes Preiser-Kapeller talked about ‘Networks of border zones – multiplex relations of power, religion and economy in South-eastern Europe, 1250-1453 CE’.
Johannes’ paper made it very clear that, although archaeologists can rarely obtain datasets of the quantity or quality available in other disciplines, we still have sources that inform us of different types of relationships, for which a network approach can lead to highly interesting results. He constructed five distinct networks from different data types (streets, coastal sea routes, church administration, state administration and the participants of the 1380 synod), some of which were compared for three different moments in time (1210, 1324, 1380). First, general measures such as average distance, clustering coefficient and density were used to explore the topology of individual networks and to compare networks derived from different sources. Secondly, the overlap of groups of related nodes was identified to explore the correlation between different networks. Johannes then merged all these networks to create what he considers a multiplex representation of the frameworks of past human interaction. Thirdly, he explored the combined effects of this multiplex network on the topology of social interaction, as illustrated by the participants in the 1380 synod. He concluded that the framework emerging from these different sources might be more than merely the sum of its parts. In short, even though we are dealing with fragmentary and limited datasets, explicitly applying a network perspective might still lead us to highly interesting and surprising results.
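The three whole-network measures mentioned above can be computed with a breadth-first search and some counting. The sketch below is a plain-Python illustration on a hypothetical four-node route network, not Johannes’ data or software; it assumes unweighted, undirected links and, for average distance, skips pairs of nodes that cannot reach each other (one common convention among several).

```python
from collections import deque
from itertools import combinations


def density(adj):
    """Share of all possible undirected links that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += linked / (k * (k - 1) / 2)
    return total / len(adj)


def average_distance(adj):
    """Mean shortest-path length (in steps) over all ordered pairs of
    distinct nodes that can reach each other; unreachable pairs are skipped."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())  # distance from source to itself is 0
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def from_edges(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# A hypothetical four-node route network, not Johannes' data.
routes = from_edges([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(round(density(routes), 3))             # 0.667
print(round(average_clustering(routes), 3))  # 0.583
print(round(average_distance(routes), 3))    # 1.333
```

Because each measure is a single number per network, the same functions can be run on networks built from different sources (roads, sea routes, church administration) to compare them side by side.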
Mihailo Popovic presented the final paper before lunch. His talk titled ‘Networks of border zones – a case study on the historical region of Macedonia in the 14th century AD’ was strongly related to that of his colleague and fellow historian Johannes.
Mihailo’s paper explored the border zone between the Byzantine Empire and the emerging Serbian state in the 14th century AD. His case study focused on the area of the city of Stip and the valley of the river Strumica. Four central places were identified in the valley on the basis of written medieval sources: the towns of Stip, Konce, Strumica and Melnik. Mihailo is interested in understanding how these places interacted with each other. For example, can an exclusive relationship between the central places and the surrounding smaller settlements be assumed? Or did all settlements interact equally with each other? Mihailo stressed the importance of evaluating the landscape on the ground to explore how it might have influenced urban interactions. Based on medieval written sources that identify the larger settlements as religious, administrative and economic centres, he argued for an exclusive relationship between the larger towns and the smaller ones. This leads to star-shaped (astral) networks. Mihailo’s analysis shows that Strumica has the highest closeness centrality value, whilst Stip has the highest betweenness centrality value. To conclude, he stressed the wider questions that his network approach leaves open: is the settlement pattern complete? Is the network realistic in view of the landscape? Is the star shape of the networks justifiable, or did the villages also interact with each other? May we assume interactions between other villages? How can human behaviour be integrated?
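For readers unfamiliar with the two centrality measures, the sketch below computes closeness and (unnormalised) betweenness centrality for an unweighted, undirected network, using Brandes’ algorithm for betweenness. The example network of two linked star-shaped clusters (hubs `A` and `B` with their villages) is hypothetical and is not Mihailo’s Strumica valley data; in his actual network different towns score highest on each measure, which depends on structure not reproduced here. Closeness is taken as (n − 1) divided by the sum of distances, assuming the network is connected.

```python
from collections import deque


def from_edges(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness(adj):
    """Closeness centrality: (n - 1) / sum of shortest-path distances.
    Assumes a connected, unweighted network with at least two nodes."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[s] = (len(adj) - 1) / sum(dist.values())
    return result


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm) for an
    unweighted, undirected network."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                     # nodes in order of discovery
        preds = {v: [] for v in adj}   # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):      # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each undirected pair counted twice


# Hypothetical settlements: hub A with three villages, hub B with two.
settlements = from_edges([("A", "a1"), ("A", "a2"), ("A", "a3"),
                          ("B", "b1"), ("B", "b2"), ("A", "B")])
cl = closeness(settlements)
bc = betweenness(settlements)
print(round(cl["A"], 3), round(cl["B"], 3))  # 0.75 0.667
print(bc["A"], bc["B"], bc["a1"])            # 12.0 9.0 0.0
```

In a purely star-shaped pattern the villages have zero betweenness, since no shortest path runs through them; this is why the assumption of exclusive town-village relationships matters so much for the results.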
We reconvened in the afternoon to listen to Ladislav Smejda talking about ‘Of graphs and graves: towards a critical approach’.
Ladislav discussed the artefact distributions from a cemetery dated to around 200 BC. He explored eleven attributes consisting of grave dimensions and the presence or absence of categories of grave goods, which can appear in many combinations. Ladislav limited the relationships of co-presence of grave goods to statistically significant correlations, which resulted in a graph representing his eleven attributes and the positive and negative correlations between them. He then divided the graph into two substructures. Substructure A is defined by correlations between ornaments (faience beads, bone beads, hair ornaments) and grave depth. Substructure B includes stone artefacts, cattle ribs and grave length. These two sets seem to show strongly different patterns, which can be explored as networks. Simple networks were created based on the presence or absence of artefacts significant to either substructure A or B, showing different structures. Secondly, Ladislav introduced the concept of the hypergraph, in which edges are more like areas that can include more than one node. Ladislav concluded that graph theory and network analysis are useful to handle, visualise and explore the structure of archaeological datasets, whilst leaving plenty of options open to take the analysis further with different tools (like GIS).
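One way to build a graph of significant positive and negative correlations between presence/absence attributes is sketched below. The talk did not specify which test was used, so the choices here are assumptions: the phi coefficient as the correlation measure, a chi-square test at the 0.05 level (for a 2×2 table the uncorrected chi-square statistic equals n·phi²), and grave depth reduced to a deep/shallow flag. The ten graves are invented; with samples this small, an exact test such as Fisher’s would be a more cautious choice.

```python
import math
from itertools import combinations

CHI2_CRITICAL_1DF = 3.841  # chi-square critical value for 1 df at alpha = 0.05


def phi(x, y):
    """Phi coefficient between two presence/absence (0/1) sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if b and not a)
    n00 = len(x) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def correlation_graph(table, critical=CHI2_CRITICAL_1DF):
    """Link two attributes if their association is significant.

    table maps attribute name -> 0/1 values, one per grave. An edge is kept
    when n * phi**2 (the 2x2 chi-square statistic) exceeds the critical
    value; its label records whether the correlation is positive or negative.
    """
    n = len(next(iter(table.values())))
    edges = {}
    for a, b in combinations(sorted(table), 2):
        r = phi(table[a], table[b])
        if n * r * r > critical:
            edges[(a, b)] = "+" if r > 0 else "-"
    return edges


# Ten invented graves; grave depth is reduced to deep (1) / shallow (0).
graves = {
    "deep":          [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "bone_beads":    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "faience_beads": [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    "stone_tools":   [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}
print(correlation_graph(graves))
```

In this toy table bone beads are positively linked to deep graves and faience beads negatively, while the stone tools, which are spread evenly, end up with no edges at all.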
Questions: Ladislav’s presentation sparked many questions, partly because we had plenty of time in the afternoon due to serious changes in the conference schedule. So I decided to transcribe the questions as a simple dialogue.
Leif Isaksen: what does the negative correlation mean? That the attributes don’t occur together?
Ladislav: they don’t appear together with statistical significance.
Maximilian Schich: what’s the negative correlation with grave depth and faience beads?
Ladislav: deep graves have bone beads and shallow graves tend to have faience beads.
Leif Isaksen: how has the grave depth been recorded?
Ladislav: the data was taken from excavation reports. There is no specification of how they measured it. The whole site was excavated by a single person. Possibly grave depth was measured from the topsoil downwards.
Maximilian Schich: You could compare every link in this diagram, maybe as an XY diagram where you have bone beads vs grave depth for example. Do you know how many bone beads there are? How many graves? Are these measured just as presence/absence or as real counts?
Ladislav: There are 70 graves with bone beads, and 470 graves in total. I tried both approaches but presence/absence is better because in many cases it was impossible to count precise numbers. I don’t think it is important to know how many bone beads they had exactly.
Maximilian Schich: so you could draw an XY diagram. If you only have 470 graves it’s very easy to draw a histogram. And instead of the correlation you could give us all the data points.
Ladislav: I did all these things. At the moment I have so many outputs from this data that they could not be presented in a 15-minute paper. Clearly there is much more you could do with this data.
Maximilian Schich: how can you assign grave depth to a region where there is no grave?
Ladislav: the grey background is just an interpolation of the grave points. The crucial thing this shows is that there are no deep graves on one end of the matrix and no shallow graves on the other.
Diego Jimenez: is there any significance in the distribution of objects within each grave, and is that relevant for the analysis?
Ladislav: it’s recorded, I tried to follow this up but not with graph theory.
Diego Jimenez: this is what sparked my interest in using graphs, as I used them to understand the spatial distribution of artefacts within graves. The spatial arrangement might have a symbolic importance.
Tom Brughmans: it’s a good example of a network within a network as well.
Leif Isaksen: it would be great to see these graves’ locations projected in geographical space, did you pursue a geographical approach as well?
Ladislav: yes, but that is the topic of another presentation.
Tom Brughmans: I am not sure if the statistics used to explore correlations are necessary, because these correlations might just emerge when exploring the co-presence of different types of artefacts as a network.
Ladislav: the presence/absence is exactly what is represented, so it is a different way of achieving the same thing.
Maximilian Schich: you have enough data, but not so much that it would prevent a real network visualisation. There is no need to reduce your data to a few nodes and links. All your data can be shown on one graph and a few histograms.
Ladislav: I did not do this because I am looking for the simplest possible structure, in the simplest possible representation.
Due to the changes in the conference schedule the afternoon also saw two unscheduled presentations by Leif Isaksen and myself being added to the network analysis session.
Tom Brughmans presented a paper titled ‘Facebooking the Past: a critical social network analysis approach for archaeology’.
I started out with a short fiction about how Cicero became consul of Rome thanks to Facebook and Twitter. Obviously, that is not the story we will find in the history books. But by drawing an analogy between modern ideas of social networks and past social processes, it becomes clear what we are actually doing when using social network analysis. I argued that there are three issues related to the archaeological (and indeed historical) use of social network analysis. Firstly, the full complexity of past social interactions is not reflected in the archaeological record, and social network analysis does not succeed in representing this complexity. Secondly, the use of social network analysis as an explanatory tool is limited, and it carries the danger that the network as a social phenomenon and the network as an analytical tool are confused. Thirdly, human actions are based on local knowledge of social networks, which makes the task of deriving entire past social networks from particular material remains problematic. To confront these issues I argued for turning the network from the form of analysis into the focus of analysis and back again, in an integrated analytical process drawing upon ego-networks, complex real-world network models and affiliation network approaches.
Discussion: the questions about this paper turned into a fascinating discussion about the nature of archaeological and historical data and how this influences our use of network techniques.
Maximilian Schich: I think that data from today is indeed different from data from the past, but only because more is different. In a sense I think it cannot be justified to say that we should not look for social networks because the data is incomplete. Modern-day data, like mobile phone records for example, are also incomplete. Facebook does not cover all social interactions. One topic that has been mentioned a lot today is multiplex networks. There is a conceptual danger here, because it assumes that we can make discrete distinctions between different types of networks, whilst in reality that is not possible. When collecting data there is one thing that is definitely different from data like mobile phone networks, and that is the multiplicity of opinion. If you collect something and I collect something, the data will look completely different. All these things are complicated and a lot of time needs to be invested in them; I agree that we have to work with what we have. But we should not capitulate in front of this problem, saying that it’s perfectly fine to just bullshit theoretically because the data is unavailable.
Tom Brughmans: I agree that archaeological data is not necessarily any different from the data sociologists or physicists use, like mobile networks for example. Another example is e-mail communication. A sample of this type of social interaction might be limited because some people were out of office whilst you were taking the sample, and it is also an indirect reflection of social relationships, as we observe the e-mails directly but not the people. So our data might not be different. But what possibly makes archaeology (and other historical disciplines) different is that all our theory is geared towards this issue. We are very aware that we are dealing with indirect, fragmentary samples to explore dynamic processes in the past. Whilst scholars in other disciplines might oversimplify this issue, in the historical disciplines we are very aware of it and cannot avoid it. Another difference might be, as you said, that when different people excavate the same thing, different data will emerge. But more crucial, I think, is that after collection the data is actually destroyed; it is not a repeatable test. The data only lives on in a structure that makes sense to the person who collected it. So given these two issues, I think archaeological applications of social network analysis can be different from those in other disciplines.
Yasuhisa Kondo: Just a comment. I believe that social networking like Facebook and Twitter is also changing archaeologists’ behaviour. When I was in Oman a few months ago, for example, the Middle East crisis was picking up and I was informed about the situation of Egyptian heritage through social networks. Secondly, in Japan we use Facebook to collect data. So it is interesting to see that it is not only useful to think about present social relationships between archaeologists but also about past social networks.
Johannes Preiser-Kapeller: when comparing modern complex network analysis in physics with historical network analysis: in physics, scholars don’t just want to analyse, they also want to explain, to understand the mechanism that makes the network function. They generate ideas about how such networks actually work, like preferential attachment for example. We do not know if networks in the past actually worked in the same way, or if such mechanisms can be imposed on historical networks. Our data sometimes isn’t even large enough to identify degree distributions that reveal power laws, for example.
Tom Brughmans: I am glad that you bring this up, because I have been struggling with a similar issue. Do these emergent properties of real-world networks actually explain anything? Aren’t they just a description of a complex network structure and of how it evolves, rather than an explanation of the network? The descriptive aspects of such models can easily be applied to historical data, when we accept the assumption that the whole is greater than the sum of its parts and that complexity arises from local interactions. But they do not really explain much, do they?
Johannes Preiser-Kapeller: modern complex network models assume that they are not merely descriptive but they are laws that explain how things like social relationships functioned. It’s more than description, they are looking for mechanisms. The question is if we can also identify such mechanisms for past networks which can help us to explain how social interaction worked.
Maximilian Schich: concerning the power-law thing, preferential attachment is only one of thousands of mechanisms which can result in a power law. And in some cases it cannot even be proven that the power law is there, because of a lack of data. So we cannot blame the people who came up with the idea of preferential attachment in the first place, as if they assumed that it explained all power laws. It is not their fault that they got cited 60.000 times. We should acknowledge that this is just one model that actually works, and it explains a lot, just like the small-world model. But both of them are incomplete. Concerning historical networks: I think it is a big mistake of historians or other scholars in the humanities to think that we are special, because we are actually not. Of course we have different documentation and different numbers. But the underlying approach of hypothesis testing and of saying “let’s look at what structure the data has”, that is the approach complex network scientists have. They do not assume a universal law. This is the same approach taken in the humanities.
Mihailo Popovic: many people are not aware of the exact historical situation. Take 14th-century Byzantium, for example: 90% of the population lived in villages, and the flow of information did not exist on an international level; it was a local thing.
Maximilian Schich: are you sure?
Mihailo Popovic: I am sure, based on the sources we have. Thirdly, there are slaves in the villages whose movement is restricted. Finally, illiteracy is immense. To come to my point: we have written sources that were written by 5% of the population, if even that. And of those perhaps 20% survive. So what do we do? We cannot just assume that a dataset of six million people communicating over the internet and a historical dataset like the one I described can be compared using the same approach. We have to face the reality of the historical period. It took us a lot of time and effort to collect these relatively small and still fragmentary datasets.
Maximilian Schich: but we can agree that things are being spread between people, even if they are not aware of it. Information can spread in the same way electricity spreads for example, electrons push other electrons along, not every electron goes all the way from Europe to China. We have such a situation where we can assume that some information was spread for most periods in the past. So to say that there are individuals who are immobile and construct sampling boundaries based on that, I don’t think such a strict limitation can work.
Johannes Preiser-Kapeller: of course, there was some kind of globalization already in the 14th century, there was some connection which even reached villages. It would be perfect if we could paint a picture of such a global system. We can do it on a superficial level, but we do not have the necessary sources to go in more depth. A prosopographic database of the Byzantine period, for example, contains 30.000 people. Of those, 80% were clerics and not more than 200 were farmers. We can see what is going on for the top 5% of the people, and we can see the mechanisms like preferential attachment working on this level. But we are still struggling with the artificial border created by our data, as you mentioned. We do not have the entire system. This sample problem will always be there in the historical discipline.
Maximilian Schich: that’s exactly the same problem as we have in any other discipline. It is not a history or non-history problem but a percolation problem. Physicists working on percolation have to come up with a solution, and then we can make an educated guess of how much of the system we have.
Johannes Preiser-Kapeller: let me give you another example. When I showed my work to Stefan Thurner in Vienna, who worked on a massive dataset of 300.000 individuals interacting through a computer game, he said my dataset of only 200 aristocrats is not enough: if you do not have at least 1000 individuals you cannot identify any mechanism, because you need statistical significance. So this is a limit imposed on historical disciplines in applying interesting mechanisms identified in complex real-world networks.
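Preferential attachment, the mechanism debated above, is easy to simulate, which also makes the sample-size concern tangible. The sketch below grows a network in which each new node links to one existing node chosen with probability proportional to its degree (a one-link-per-step variant of the Barabási-Albert model); the function names, the seed and the network sizes are arbitrary choices for illustration. As Maximilian pointed out, a heavy-tailed degree distribution produced this way does not prove that the same mechanism operated in any observed network, since many other mechanisms yield similar distributions.

```python
import random
from collections import Counter


def preferential_attachment(n, seed=None):
    """Grow a network of n >= 2 nodes in which each new node links to one
    existing node chosen with probability proportional to its degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]  # every node appears once per unit of degree
    for new in range(2, n):
        target = rng.choice(endpoints)  # degree-proportional choice
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


def degree_distribution(edges):
    """Return each node's degree and a count of nodes per degree value."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree, Counter(degree.values())


edges = preferential_attachment(1000, seed=42)
degree, dist = degree_distribution(edges)
for k in sorted(dist)[:5]:
    print(k, dist[k])  # number of nodes with degree k
print("largest degree:", max(degree.values()))
```

Re-running with n = 200 and different seeds shows how noisy the tail of the distribution becomes at the size of the aristocratic dataset mentioned above, which is one way to see why a fitted power law on so few nodes is hard to defend.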
After the discussion we still had the pleasure of listening to Leif Isaksen talking about ‘Lines, Damned Lines and Statistics: Unearthing Structure in Ptolemy’s Geographia’. Sadly my tape recorder died at this point, so here is Leif’s abstract rather than a review.
Ever since the rediscovery of Ptolemy’s Geographia in 1295, scholars have noted that it is troublingly inconsistent both internally and with the environment in which it was supposedly compiled. The problem for analysts to overcome is that the catalogue has been corrupted, amended and embellished throughout its history. It is therefore imperative to find more robust means to look for structural trends. Recent publications of the theoretical chapters and a digital catalogue of coordinates provide a variety of new possibilities. We are not alone in advocating computational procedures but will discuss two techniques that do not appear to have been considered in the literature so far and the conclusions they appear to give rise to.
First, statistical analysis of the coordinates assigned to localities demonstrates clearly that ostensible precision (whether to the nearest 1/12, 1/6, 1/4, 1/3 or 1/2 degree) varies considerably by region and feature type and is locally heterogeneous. In other words, the composite nature of the data cannot only be confirmed, but we can build a clearer picture of how the sources varied by area. Secondly, while many studies have addressed either the point data or the finished maps, simple linear interpolation between coordinates following the catalogue provides a unique insight into the ‘invisible hand’ of the author(s). The unmistakable stylistic families that emerge, and the occasionally arbitrary limits imposed on them, provide further important evidence about the catalogue’s internal structure.