Graduation Year


Document Type




Degree Granting Department

Mathematics and Statistics

Major Professor

Natasha Jonoska, Ph.D.

Co-Major Professor

Masahiko Saito, Ph.D.


Keywords

Assembly graph, Assembly number, Assembly word, DNA recombination model, Polygonal path


Motivated by genome rearrangements that take place in some species of ciliates, we introduce a combinatorial model for these processes based on spatial graphs. This model builds on two earlier models for pointer-guided DNA recombination (the intramolecular and intermolecular models) and is influenced by a molecular model for RNA-guided DNA recombination. Despite their differences, the intermolecular and intramolecular models both formalize recombination events through rewriting operations applied to formal words, and both predict the same set of molecules as the result of a correct rearrangement. Here, we give an algorithm that, given a scrambled gene structure as input, outputs a set of strings representing the set of molecules expected after complete assembly.

Moreover, we prove that both the set of all realistic words (words that model a possible gene structure) and the set of all nonrealistic words are closed under the rewriting operations of the intramolecular model. We investigate spatial graphs that consist of 4-valent rigid vertices, called assembly graphs. An assembly graph can be seen as a representation of a DNA molecule during certain recombination processes, in which the 4-valent vertices represent molecular alignment of the recombination sites. We introduce the notion of a polygonal path in an assembly graph as a model for a single gene. Polygonal paths are defined as paths that make a "90° turn" at each vertex of the assembly graph, and they define a smoothing of the vertices they visit. Such vertex smoothing models homologous DNA recombination. We investigate the minimal number of polygonal paths that visit all vertices of a given graph exactly once, called the assembly number.

We prove that for every positive integer n there is an assembly graph with assembly number n. We also study the relationship between the number of vertices in an assembly graph and its assembly number. One of the results is that every assembly graph with assembly number n has at least 3n - 2 vertices. In addition, we show that each assembly graph with a given set of polygonal paths has an embedding in three-dimensional space such that smoothing the vertices with respect to the polygonal paths results in unlinked circles. We study recombination strategies in terms of subsets of vertices. Such a subset is called a successful set if smoothing all vertices of the set with respect to a polygonal path results in a graph that contains the polygonal path in a single component. We characterize the successful sets in a given assembly graph using the notion of a complementary polygonal path.

Furthermore, we define a smoothing strategy in an assembly graph relative to a polygonal path as a sequence of successful sets that models successive DNA recombinations for correct gene assembly. Recent experimental results suggest that there might be different pathways for unscrambling a gene. These results lead to a mathematical model for gene recombination that builds upon the intermolecular model. We introduce assembly words as a formalization of a set of linear and circular DNA molecules. Assembly words are partially ordered, so that any linearly ordered subset models a pathway for gene rearrangement. We suggest two different pathways for unscrambling the actin I gene in O. trifallax, and we prove that they are the only theoretically possible pathways.