Doctor of Philosophy (Ph.D.)
Degree Granting Department
Geography, Environment and Planning
Joni Downs (Firat), Ph.D.
Steven Reader, Ph.D.
Lori Collins, Ph.D.
Somayeh Dodge, Ph.D.
Twitter, health, influenza, hierarchical clustering, time geography
The spread of infectious diseases can be described in terms of three interrelated components: interaction, movement, and scale. Transmission between individuals requires some form of interaction, the nature of which depends on the pathogen. Diseases spread through the movement of their hosts, and they do so across many spatial scales, from local neighborhoods to countries, and across temporal scales ranging from days to years or recurring periodic intervals. Prior research into the spread of disease has either examined diffusion processes retrospectively at regional or country levels or developed differential equation or simulation models of the dynamics of disease transmission. While some of the more recent models incorporate all three components, they are limited in how they represent where interactions occur: the focus has been on home or work, including contact with family members or coworkers. These models reflect a lack of knowledge about how transmission occurs at specific locations in time, so-called nodes of transmission; that is, how individuals’ intersections in time and space function in disease transmission.
This project sought to use the three factors of interaction, movement, and scale to better understand the spread of disease in terms of the place where interaction occurs, called the node of transmission. The overarching research question was: how can nodes of transmission be identified through individual activity spaces in a way that incorporates interaction, movement, and scale, the three factors of infectious disease spread?
This objective fed into three main sub-objectives: defining nodes of transmission, developing an appropriate methodology for identifying them, and applying that methodology to geotagged social media data from Twitter. To develop an appropriate framework, this research drew on time geography and traditional disease diffusion concepts. In particular, it relied on the idea of bundling to create the nodes and on a nesting effect to integrate scale.
The data used to identify nodes of transmission were collected from Twitter for the Los Angeles County, USA, area from October 2015 to February 2016. Automated text classification was used to identify messages in which users self-reported an influenza-like illness. Several groupings that combined both the syndrome and the symptoms of influenza were created and applied in the automated classification. The use of Twitter for small-area health analysis was also evaluated, along with different text classification methodologies.
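The abstract does not name the classifier used to flag influenza-like-illness tweets. As one illustration of what "automated text classification" can mean, the sketch below swaps in a multinomial naive Bayes classifier with add-one smoothing, written from scratch. The function names and the tiny training set are hypothetical toy examples, not the thesis's data, labels, or method; label 1 stands for a self-reported illness message and 0 for anything else.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (assumption: no stemming or stop-word removal)
    return text.lower().split()

def train_nb(docs, labels):
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for doc, lab in zip(docs, labels):
        word_counts[lab].update(tokenize(doc))
    vocab = {w for c in classes for w in word_counts[c]}
    return classes, word_counts, doc_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log-posterior; unseen words are ignored."""
    classes, word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log(doc_counts[c] / total_docs)  # log prior
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in tokenize(text):
            if w in vocab:
                score += math.log((word_counts[c][w] + 1) / denom)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy training data (1 = self-reported illness, 0 = other)
TRAIN_DOCS = [
    "i have the flu", "fever and chills all night", "sore throat and cough",
    "great day at the beach", "watching the game tonight", "new phone arrived today",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]

model = train_nb(TRAIN_DOCS, TRAIN_LABELS)
print(predict_nb(model, "flu fever and cough"))   # 1
print(predict_nb(model, "at the beach tonight"))  # 0
```

A real pipeline would need far more labeled tweets and would likely compare several classifiers, as the abstract indicates the thesis did.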
A space-time hierarchical clustering technique was adapted to the Twitter data, both to identify nodes of transmission and to identify spatiotemporal contact networks. The clustering was applied to the classified Twitter data to examine where interactions between the classified users occurred. This pointed to six nodes, typically densely populated areas where large groups of people converged in Los Angeles (e.g., Disneyland and Hollywood Boulevard). The movement of these individuals was also examined by using an edit distance to compare their visits to different clusters and nodes.
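The abstract does not give the clustering parameters or the exact edit distance. The sketch below assumes single-linkage clustering in which two posts are linked when they fall within a spatial threshold (metres, on projected coordinates) and a temporal threshold (hours); each cut's clusters are the connected components of those links. Re-running at larger thresholds only merges clusters, so the cuts are nested, a simple stand-in for the nesting idea. Each user's time-ordered cluster visits are then compared with a Levenshtein edit distance. All names, thresholds, and the five toy posts are hypothetical illustrations, not the thesis's implementation or data.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical record: (user_id, x_metres, y_metres, t_hours) on projected coordinates
Post = Tuple[str, float, float, float]

def _find(parent: List[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def space_time_clusters(posts: List[Post], max_dist: float, max_hours: float) -> List[int]:
    """Single-linkage clusters at one space-time cut (O(n^2) pairwise comparisons).
    Two posts are linked when within max_dist metres AND max_hours hours."""
    n = len(posts)
    parent = list(range(n))
    for i in range(n):
        _, xi, yi, ti = posts[i]
        for j in range(i + 1, n):
            _, xj, yj, tj = posts[j]
            if math.hypot(xi - xj, yi - yj) <= max_dist and abs(ti - tj) <= max_hours:
                ri, rj = _find(parent, i), _find(parent, j)
                if ri != rj:
                    parent[ri] = rj
    # Relabel roots as 0, 1, 2, ... in order of first appearance
    relabel: Dict[int, int] = {}
    return [relabel.setdefault(_find(parent, i), len(relabel)) for i in range(n)]

def visit_sequence(posts: List[Post], labels: List[int], user: str) -> List[int]:
    """Time-ordered clusters visited by one user, collapsing consecutive repeats."""
    seq: List[int] = []
    for (u, _, _, _), lab in sorted(zip(posts, labels), key=lambda p: p[0][3]):
        if u == user and (not seq or seq[-1] != lab):
            seq.append(lab)
    return seq

def edit_distance(a: List[int], b: List[int]) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

# Hypothetical toy posts from three users
posts: List[Post] = [
    ("a", 0, 0, 0.0),
    ("b", 50, 0, 0.5),    # within 100 m and 1 h of a's first post
    ("a", 5000, 0, 3.0),
    ("c", 5050, 0, 3.2),  # within 100 m and 1 h of a's second post
    ("b", 9000, 0, 6.0),
]
fine = space_time_clusters(posts, max_dist=100, max_hours=1)
coarse = space_time_clusters(posts, max_dist=10000, max_hours=10)
print(fine)    # [0, 0, 1, 1, 2]
print(coarse)  # [0, 0, 0, 0, 0]
print(edit_distance(visit_sequence(posts, fine, "a"), visit_sequence(posts, fine, "b")))  # 1
```

Clusters shared by several classified users would be candidate interaction sites under this sketch; the pairwise loop is only practical for small samples.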
Scholar Commons Citation
Lamb, David Sebastian, "Identifying Nodes of Transmission in Disease Diffusion Through Social Media" (2017). Graduate Theses and Dissertations.