Document Type




Degree Granting Department

Mathematics and Statistics

Major Professor

Nataša Jonoska

Co-Major Professor

Masahico Saito


Assembly Graph, Homologous Recombination, Mapper, Topological Data Analysis


Homologous DNA recombination and rearrangement have been modeled with a class of four-regular rigid vertex graphs called assembly graphs, which can also be represented by double occurrence words. Various invariants have been suggested for these graphs, some based on the structure of the graphs and some biologically motivated.

In this thesis we use a novel method of data analysis, based on a technique known as partial-clustering analysis and an algorithm known as Mapper, to examine the relationships between these invariants. We introduce some of the basic machinery of topological data analysis, including the construction of simplicial complexes on a data set, clustering analysis, and the workings of the Mapper algorithm. We define assembly graphs and three specific invariants of these graphs: assembly number, nesting index, and genus range. We apply Mapper to the set of all assembly graphs with up to 6 vertices and compare the relationships between these three invariants. We make several observations based on the results of this analysis and conclude with suggestions for further research.