Graduation Year

2018

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Yi-Cheng Tu, Ph.D.

Committee Member

Ken Christensen, Ph.D.

Committee Member

Adriana Iamnitchi, Ph.D.

Committee Member

Bo Zeng, Ph.D.

Committee Member

Ming Ji, Ph.D.

Keywords

Database Management System (DBMS), Data Consolidation, Data Stream Management System (DSMS), Dynamic Power Management (DPM), Stream (Window) Join

Abstract

Energy consumption has become a first-class optimization goal in design and implementation of data-intensive computing systems. This is particularly true in the design of database management systems (DBMS), which is one of the most important servers in software stack of modern data centers. Data storage system is one of the essential components of database and has been under many research efforts aiming at reducing its energy consumption. In previous work, dynamic power management (DPM) techniques that make real-time decisions to transition the disks to low-power modes are normally used to save energy in storage systems. In this research, we tackle the limitations of DPM proposals in previous contributions and design a dynamic energy-aware disk storage system in database servers. We introduce a DPM optimization model integrated with model predictive control (MPC) strategy to minimize power consumption of the disk-based storage system while satisfying given performance requirements. It dynamically determines the state of disks and plans for inter-disk data fragment migration to achieve desirable balance between power consumption and query response time. Furthermore, via analyzing our optimization model to identify structural properties of optimal solutions, a fast-solution heuristic DPM algorithm is proposed that can be integrated in large-scale disk storage systems, where finding the most optimal solution might be long, to achieve near-optimal power saving solution within short periods of computational time. The proposed ideas are evaluated through running simulations using extensive set of synthetic workloads. The results show that our solution achieves up to 1.65 times more energy saving while providing up to 1.67 times shorter response time compared to the best existing algorithm in literature.

Stream join is a dynamic and expensive database operation that performs join operation in real-time fashion on continuous data streams. Stream joins, also known as window joins, impose high computational time and potentially higher energy consumption compared to other database operations, and thus we also tackle energy-efficiency of stream join processing in this research. Given that there is a strong linear correlation between energy-efficiency and performance of in-memory parallel join algorithms in database servers, we study parallelization of stream join algorithms on multicore processors to achieve energy efficiency and high performance. Equi-join is the most frequent type of join in query workloads and symmetric hash join (SHJ) algorithm is the most effective algorithm to evaluate equi-joins in data streams. To best of our knowledge, we are the first to propose a shared-memory parallel symmetric hash join algorithm on multi-core CPUs. Furthermore, we introduce a novel parallel hash-based stream join algorithm called chunk-based pairing hash join that aims at elevating data throughput and scalability. We also tackle parallel processing of multi-way stream joins where there are more than two input data streams involved in the join operation. To best of our knowledge, we are also the first to propose an in-memory parallel multi-way hash-based stream join on multicore processors. Experimental evaluation on our proposed parallel algorithms demonstrates high throughput, significant scalability, and low latency while reducing the energy consumption. Our parallel symmetric hash join and chunk-based pairing hash join achieve up to 11 times and 12.5 times more throughput, respectively, compared to that of state-of-the-art parallel stream join algorithm. Also, these two algorithms provide up to around 22 times and 24.5 times more throughput, respectively, compared to that of non-parallel (sequential) stream join computation where there is one processing thread.

Share

COinS