Doctor of Philosophy (Ph.D.)
Degree Granting Department
Computer Science and Engineering
Yicheng Tu, Ph.D.
Sriram Chellappan, Ph.D.
Xinming Ou, Ph.D.
Mingyang Li, Ph.D.
Ming Ji, Ph.D.
Batch processing, Query optimization, Equivalence rules, Batch partitioning
Techniques based on sharing data and computation among queries have been an active research topic in database systems. While work in this area developed algorithms and systems that are shown to be effective, there is a lack of logical foundation for query processing and optimization. In this paper, we present PsiDB, a system model for processing a large number of database queries in a batch. The key idea is to generate a single query expression that returns a global relation containing all the data needed for individual queries. For that, we propose the use of a type of relational operators called ψ-operators in combining the individual queries into the global expression. We tackle the algebraic optimization problem in PsiDB by developing equivalence rules to transform concurrent queries with the purpose of revealing query optimization opportunities. Centering around the ψ-operator, our rules not only covered many optimization techniques adopted in existing batch processing systems, but also revealed new optimization opportunities. Experiments conducted on an early prototype of PsiDB show a performance improvement of up to 50X over a mainstream commercial RDBMS.
Following the development of the system and showing how effective they are in handling large number of queries, there is a lack of rigorous modeling and theoretical study for query batching. Query batching in database systems has strong resemblance to order batching in the warehousing context. While the latter is a well-studied problem, the literature on optimization techniques for query batching problem is quite scarce/nonexistent. In following of PsiDB framework, we develop a Mixed Binary Quadratic Program (MBQP) to optimize query-batching in a database system. This model uses the coefficients of a linear regression on the query retrieval time, trained by a large
set of randomly generated query sets. We also propose two heuristics, for solving the proposed MBQP. To demonstrate the effectiveness of our proposed techniques, we conduct a comprehensive computational study over two well-known database benchmarks. The computational results show that when the proposed MBQP is solved using the designed heuristics, an improvement of up to 45% in retrieving time over a single batch is achieved.
Scholar Commons Citation
Eslami, Mehrad, "PsiDB: A Framework for Batched Query Processing and Optimization" (2020). Graduate Theses and Dissertations.