Graduation Year

2007

Document Type

Thesis

Degree

M.S.C.S.

Degree Granting Department

Computer Science

Major Professor

Adriana Iamnitchi, Ph.D.

Committee Member

Gabriele Garzoglio, Ph.D.

Committee Member

Ken Christensen, Ph.D.

Keywords

Caching, Data management, File grouping, Grid, Scientific computing, Workload characterization

Abstract

Grids provide an infrastructure for seamless, secure access to a globally distributed set of shared computing resources. Grid computing has reached the stage where deployments run in production mode. In the most active Grid community, the scientific community, jobs are data- and compute-intensive. Scientific Grid deployments offer the opportunity to revisit, and perhaps update, traditional beliefs about workload models, and hence to re-evaluate traditional resource management techniques.

In this thesis, we study usage patterns from a large-scale scientific Grid collaboration in high-energy physics. We focus mainly on data usage, since data is the major resource for this class of applications. Our detailed workload characterization led us to propose a new data abstraction, the filecule, which groups correlated files. We characterize filecules and show that they are an appropriate data granularity for resource management.
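To make the idea of grouping correlated files concrete, the sketch below treats a filecule as a set of files that are always requested together, i.e. files requested by exactly the same set of jobs. This is a minimal illustration under that assumption; the trace format (a mapping from a job identifier to the files it reads) and the function name identify_filecules are hypothetical and not taken from the thesis.

```python
from collections import defaultdict


def identify_filecules(jobs):
    """Group files into filecules: files requested by exactly the same jobs.

    Assumption: a filecule is a maximal set of files that always appear
    together in job requests. `jobs` is a hypothetical trace format mapping
    job_id -> iterable of file names.
    Returns a list of frozensets, one per filecule, in a deterministic order.
    """
    # For each file, record the set of jobs that requested it.
    requesters = defaultdict(set)
    for job_id, files in jobs.items():
        for f in files:
            requesters[f].add(job_id)

    # Files with identical requester sets are always used together,
    # so they fall into the same filecule.
    groups = defaultdict(set)
    for f, job_set in requesters.items():
        groups[frozenset(job_set)].add(f)

    return sorted((frozenset(g) for g in groups.values()), key=lambda g: sorted(g))


if __name__ == "__main__":
    trace = {"j1": ["a", "b", "c"], "j2": ["a", "b"], "j3": ["c", "d"]}
    print(identify_filecules(trace))
```

For the small trace in the usage example, files a and b are always requested together and form one filecule, while c and d each form their own, since c is shared with different jobs than d.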

In scientific applications, job scheduling and data staging are tightly coupled. The only algorithm previously proposed for this class of applications, Greedy Request Value (GRV), combines cache replacement with job reordering, using a function that assigns a relative value to each job. We wrote a cache simulator that uses the same approach of combining cache replacement with job reordering to evaluate and quantitatively compare a set of alternative solutions. These solutions are combinations of Least Recently Used (LRU) and GRV from the cache replacement space with First-Come First-Served (FCFS) and the GRV-specific job reordering from the scheduling space. Using a real workload from the DZero Experiment at Fermi National Accelerator Laboratory, we measure and compare performance based on byte hit rate, cache change, job waiting time, job waiting queue length, and scheduling overhead.
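As an illustration of one point in this design space, the sketch below replays jobs in arrival order (FCFS) through an LRU cache and reports the byte hit rate, the fraction of requested bytes served from the cache. It is a simplified sketch, not the thesis simulator: the function name, the input format, and the choice to ignore the requirement that all of a job's files be resident at the same time are assumptions made for brevity.

```python
from collections import OrderedDict


def simulate_lru_fcfs(jobs, sizes, capacity):
    """Replay jobs in arrival order (FCFS) through an LRU cache.

    Hypothetical inputs: `jobs` is a list of file-name lists in arrival order,
    `sizes` maps file name -> size in bytes, `capacity` is the cache size in bytes.
    Simplification: files are cached one at a time, so a job's own files may
    evict each other. Returns the byte hit rate.
    """
    cache = OrderedDict()  # file -> size, least recently used first
    used = 0
    hit_bytes = total_bytes = 0
    for files in jobs:
        for f in files:
            size = sizes[f]
            total_bytes += size
            if f in cache:
                hit_bytes += size
                cache.move_to_end(f)  # mark as most recently used
                continue
            if size > capacity:
                continue  # can never fit: counted as a transfer, not cached
            # Evict least recently used files until the new file fits.
            while used + size > capacity:
                _, evicted_size = cache.popitem(last=False)
                used -= evicted_size
            cache[f] = size
            used += size
    return hit_bytes / total_bytes if total_bytes else 0.0


if __name__ == "__main__":
    sizes = {"a": 4, "b": 4, "c": 4}
    jobs = [["a", "b"], ["a", "c"], ["a", "b"]]
    print(simulate_lru_fcfs(jobs, sizes, capacity=8))
```

In the usage example, file a stays resident because it is touched by every job, so 8 of the 24 requested bytes are cache hits.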

Based on our experimental investigations, we propose a new technique that combines LRU for cache replacement with job scheduling based on the relative request value. This technique incurs lower data transfer costs than the GRV algorithm and shorter job processing delays than FCFS. We also propose using filecules for data management to further improve on the results obtained with this combination of LRU and GRV.
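The scheduling half of this combination can be sketched as picking, from the waiting queue, the job with the highest relative request value, after which its files would be staged through an LRU-managed cache. The value function below is a hypothetical stand-in, not the exact GRV formula: it scores a job by the fraction of its bytes already in the cache, so jobs that need little new data run first, and ties fall back to arrival order.

```python
def pick_next_job(queue, cached, sizes):
    """Choose the waiting job with the highest relative request value.

    Hypothetical value function (NOT the exact GRV formula): the fraction of
    a job's bytes that are already cached. Ties fall back to FCFS order
    because max() returns the first maximal element it encounters.
    `queue` is a list of (job_id, files) in arrival order, `cached` is a set
    of resident file names, and `sizes` maps file name -> size in bytes.
    """
    def value(entry):
        _, files = entry
        total = sum(sizes[f] for f in files)
        resident = sum(sizes[f] for f in files if f in cached)
        return resident / total if total else 1.0

    return max(queue, key=value)


if __name__ == "__main__":
    sizes = {"a": 4, "b": 4, "c": 4}
    queue = [("j1", ["c"]), ("j2", ["a", "c"]), ("j3", ["a", "b"])]
    print(pick_next_job(queue, cached={"a", "b"}, sizes=sizes))
```

Here j3 is chosen because all of its data is already resident, even though it arrived last; under plain FCFS, j1 would run first and trigger a transfer.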

We show that filecules can be identified in practical situations and demonstrate how the accuracy of filecule identification influences caching performance.
