Graduation Year

2020

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Yicheng Tu, Ph.D.

Committee Member

Sriram Chellappan, Ph.D.

Committee Member

Xinming Ou, Ph.D.

Committee Member

Mingyang Li, Ph.D.

Committee Member

Ming Ji, Ph.D.

Keywords

Batch processing, Query optimization, Equivalence rules, Batch partitioning

Abstract

Techniques based on sharing data and computation among queries have been an active research topic in database systems. While work in this area developed algorithms and systems that are shown to be effective, there is a lack of logical foundation for query processing and optimization. In this paper, we present PsiDB, a system model for processing a large number of database queries in a batch. The key idea is to generate a single query expression that returns a global relation containing all the data needed for individual queries. For that, we propose the use of a type of relational operators called ψ-operators in combining the individual queries into the global expression. We tackle the algebraic optimization problem in PsiDB by developing equivalence rules to transform concurrent queries with the purpose of revealing query optimization opportunities. Centering around the ψ-operator, our rules not only covered many optimization techniques adopted in existing batch processing systems, but also revealed new optimization opportunities. Experiments conducted on an early prototype of PsiDB show a performance improvement of up to 50X over a mainstream commercial RDBMS.

Following the development of the system and showing how effective they are in handling large number of queries, there is a lack of rigorous modeling and theoretical study for query batching. Query batching in database systems has strong resemblance to order batching in the warehousing context. While the latter is a well-studied problem, the literature on optimization techniques for query batching problem is quite scarce/nonexistent. In following of PsiDB framework, we develop a Mixed Binary Quadratic Program (MBQP) to optimize query-batching in a database system. This model uses the coefficients of a linear regression on the query retrieval time, trained by a large

set of randomly generated query sets. We also propose two heuristics, for solving the proposed MBQP. To demonstrate the effectiveness of our proposed techniques, we conduct a comprehensive computational study over two well-known database benchmarks. The computational results show that when the proposed MBQP is solved using the designed heuristics, an improvement of up to 45% in retrieving time over a single batch is achieved.

Share

COinS