Graduation Year

2015

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Computer Engineering

Degree Granting Department

Computer Science and Engineering

Major Professor

Yi-Cheng Tu, Ph.D.

Committee Member

Sagar Pandit, Ph.D.

Committee Member

Yao Liu, Ph.D.

Committee Member

Michael Weng, Ph.D.

Committee Member

Wen-Xiu Ma, Ph.D.

Keywords

Big Data, Molecular Simulations, Push-Based, SDH, Streaming

Abstract

Thanks to advances in modern computer simulation systems, many scientific applications generate, and require manipulation of, large volumes of data. Scientific exploration relies substantially on effective and accurate data analysis. The sheer size of the generated data, however, poses major challenges for analyzing the simulated system. In this dissertation, we propose novel techniques and apply some known designs in novel ways to improve scientific data analysis.

We develop an efficient method to compute an analytical query called the spatial distance histogram (SDH). Special heuristics are exploited to process SDH efficiently and accurately. We further develop a mathematical model to analyze the mechanism that leads to errors. This gives rise to a new approximate algorithm with an improved time/accuracy tradeoff.
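For readers unfamiliar with the query, an SDH is a histogram of the distances between all pairs of points in a dataset. The sketch below is only a brute-force baseline for illustration, not the heuristic or approximate algorithms developed in the dissertation. The function name `brute_force_sdh` and its parameters `bucket_width` and `num_buckets` are assumptions introduced here.

```python
import math
from itertools import combinations


def brute_force_sdh(points, bucket_width, num_buckets):
    """Count all pairwise distances into fixed-width buckets.

    Illustrative O(n^2) baseline only; names and parameters are hypothetical.
    """
    histogram = [0] * num_buckets
    for p, q in combinations(points, 2):
        distance = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        # Clamp so that distances beyond the last boundary fall in the last bucket.
        index = min(int(distance // bucket_width), num_buckets - 1)
        histogram[index] += 1
    return histogram


# Three points give three pairwise distances: 1, 2, and sqrt(5).
example_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
example_histogram = brute_force_sdh(example_points, bucket_width=1.0, num_buckets=3)
```

The quadratic cost of this baseline is what makes efficient and approximate SDH processing worthwhile on large simulation outputs.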

Known molecular simulation (MS) analysis systems follow a pull-based design, in which each executing query requests the data it needs. Such a design introduces redundant, heavy I/O traffic as well as CPU/data latency. To remedy these issues, we design and implement a push-based system, which uses a sequential scan-based I/O framework that pushes the loaded data to a number of pre-programmed queries.
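The push-based idea can be sketched as a single sequential pass that hands each loaded frame to every registered query, so the data is read once regardless of how many queries run. This is a minimal sketch under stated assumptions, not the system described in the dissertation: the class names `PushBasedScanner`, `AtomCountQuery`, and `MaxXQuery`, the `consume` method, and the representation of a frame as a list of coordinate tuples are all hypothetical.

```python
class PushBasedScanner:
    """Hypothetical scan loop: read each frame once, push it to all queries."""

    def __init__(self):
        self._queries = []

    def register(self, query):
        # A query is any object with a consume(frame) method (an assumption).
        self._queries.append(query)

    def run(self, frames):
        frames_read = 0
        for frame in frames:  # single sequential scan over the data
            frames_read += 1
            for query in self._queries:
                query.consume(frame)  # push the loaded frame to each query
        return frames_read


class AtomCountQuery:
    """Toy query: total number of atoms seen across all frames."""

    def __init__(self):
        self.total = 0

    def consume(self, frame):
        self.total += len(frame)


class MaxXQuery:
    """Toy query: largest x coordinate seen across all frames."""

    def __init__(self):
        self.max_x = float("-inf")

    def consume(self, frame):
        for x, _y, _z in frame:
            self.max_x = max(self.max_x, x)


# Two toy frames, each a list of (x, y, z) atom coordinates.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.5, 1.0, 0.0), (2.5, 0.0, 1.0), (1.0, 1.0, 1.0)],
]
scanner = PushBasedScanner()
atom_count = AtomCountQuery()
max_x = MaxXQuery()
scanner.register(atom_count)
scanner.register(max_x)
frames_read = scanner.run(iter(toy_frames))
```

In a pull-based design each of the two toy queries would issue its own reads, so the same frames would be loaded twice; here the iterator is consumed exactly once.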

The efficiency of the proposed system and of the approximate SDH algorithms is supported by the results of extensive experiments on MS-generated data.
