Graduation Year

2020

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Nataša Jonoska, Ph.D.

Co-Major Professor

Masahico Saito, Ph.D.

Committee Member

Brendan T. Nagle, Ph.D.

Committee Member

Margaret A. Park, Ph.D.

Committee Member

Dmytro Savchuk, Ph.D.

Keywords

Ciliate genomics, Sequence alignment algorithms, Edge labellings, Ordered graphs

Abstract

This work introduces language and tools for modeling the many-to-many mappings that comprise DNA rearrangements in nature. Existing theoretical models and data processing methods depend on the premise that DNA segments in the rearrangement precursor are in a clear one-to-one correspondence with their destinations in the recombined product. However, ambiguities in the rearrangement maps obtained from the ciliate species Oxytricha trifallax violate this assumption, demonstrating the need to adapt both theory and practice.

To account for the ambiguities in the rearrangement maps, generalizations of existing recombination models are proposed. Edges in an ordered graph model the relative positions of precursor DNA segments, and their labels indicate the orientations and destinations of those segments in the product genome. Properties of these structures are introduced to narrow the space of rearrangements that the model can describe to only the types of complexity that appear in nature. The subspaces of rearrangements defined by these properties are explored through a series of combinatorial counting results. The new model is applied to sequencing data from O. trifallax to assess the extent to which these properties describe the rearrangements this organism undergoes.

To reduce the filtering of data, an algorithm that annotates the rearranging segments in precursor and product genomes without discarding ambiguities is presented. Furthermore, a generalization of the notion of scrambling that applies to such ambiguous rearrangement maps is defined, together with an algorithm that detects this generalized scrambling property. A computational tool implementing the two algorithms is then introduced and tested. The annotation algorithm includes a step in which gapped sequence alignments are obtained from ungapped sequence alignments in an efficient and controlled manner. This method of combining ungapped alignments gives rise to a further algorithm for the more general problem of efficiently detecting gapped sequence alignments, which is also implemented and tested in this work.
