Graduation Year

2007

Document Type

Thesis

Degree

M.S.C.S.

Degree Granting Department

Computer Science

Major Professor

Adriana Iamnitchi, Ph.D.

Committee Member

Gabriele Garzoglio, Ph.D.

Committee Member

Ken Christensen, Ph.D.

Keywords

Caching, Data management, File grouping, Grid, Scientific computing, Workload characterization

Abstract

Grids provide an infrastructure for seamless, secure access to a globally distributed set of shared computing resources. Grid computing has reached the stage where deployments are run in production mode. In the most active Grid community, the scientific community, jobs are data and compute intensive. Scientific Grid deployments offer the opportunity to revisit, and perhaps update, traditional beliefs about workload models, and hence to reevaluate traditional resource management techniques.

In this thesis, we study usage patterns from a large-scale scientific Grid collaboration in high-energy physics. We focus mainly on data usage, since data is the major resource for this class of applications. We perform a detailed workload characterization, which leads us to propose a new data abstraction, the filecule, that groups correlated files. We characterize filecules and show that they are an appropriate data granularity for resource management.
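
As an illustration only, the grouping idea can be sketched as follows. The sketch assumes that "correlated" means "requested by exactly the same set of jobs"; that reading, the function name `find_filecules`, and the toy trace are assumptions of this sketch, not definitions taken from the thesis.

```python
from collections import defaultdict


def find_filecules(job_requests):
    """Group files that are requested by exactly the same set of jobs.

    job_requests maps a job id to the iterable of file names it reads.
    Assumption: two files belong together when every job that reads one
    of them also reads the other.
    """
    # Step 1: for each file, collect the set of jobs that requested it.
    jobs_per_file = defaultdict(set)
    for job, files in job_requests.items():
        for name in files:
            jobs_per_file[name].add(job)

    # Step 2: files with identical job sets fall into the same group.
    groups = defaultdict(set)
    for name, jobs in jobs_per_file.items():
        groups[frozenset(jobs)].add(name)

    # Return the groups in a deterministic order for easy inspection.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical trace: three jobs reading four files.
trace = {
    "job1": ["a", "b", "c"],
    "job2": ["a", "b"],
    "job3": ["c", "d"],
}
filecules = find_filecules(trace)
```

In this toy trace, files `a` and `b` are always requested together and form one group, while `c` and `d` each form a group of their own.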

In scientific applications, job scheduling and data staging are tightly coupled. The only algorithm previously proposed for this class of applications, Greedy Request Value (GRV), uses a function that assigns a relative value to a job. We wrote a cache simulator that uses the same technique of combining cache replacement with job reordering to evaluate and quantitatively compare a set of alternative solutions. These solutions are combinations of Least Recently Used (LRU) and GRV from the cache replacement space with First-Come First-Served (FCFS) and the GRV-specific job reordering from the scheduling space. Using a real workload from the DZero Experiment at Fermi National Accelerator Laboratory, we measure and compare performance based on byte hit rate, cache change, job waiting time, job waiting queue length, and scheduling overhead.
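
To make one of these metrics concrete, the following is a minimal sketch of LRU replacement with a byte hit rate measurement. It is not the simulator described in the thesis. It assumes that byte hit rate is the fraction of requested bytes served from the cache; the function name `lru_byte_hit_rate`, the file sizes, and the request sequence are hypothetical. GRV and job reordering are left out because the abstract does not specify the value function.

```python
from collections import OrderedDict


def lru_byte_hit_rate(requests, sizes, capacity):
    """Replay file requests against an LRU cache and return the byte hit rate.

    requests: sequence of file names in request order
    sizes:    mapping from file name to size in bytes
    capacity: cache capacity in bytes
    """
    cache = OrderedDict()  # file name -> size, least recently used first
    used = 0
    hit_bytes = 0
    total_bytes = 0

    for name in requests:
        size = sizes[name]
        total_bytes += size

        if name in cache:
            # Hit: count the bytes and mark the file as most recently used.
            hit_bytes += size
            cache.move_to_end(name)
            continue

        if size > capacity:
            # The file can never fit, so it bypasses the cache.
            continue

        # Miss: evict least recently used files until the new file fits.
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[name] = size
        used += size

    return hit_bytes / total_bytes if total_bytes else 0.0


# Hypothetical workload: three 4-byte files and an 8-byte cache.
sizes = {"a": 4, "b": 4, "c": 4}
rate = lru_byte_hit_rate(["a", "b", "a", "c", "b"], sizes, capacity=8)
```

Only the second request for `a` is a hit here, so 4 of the 20 requested bytes come from the cache.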

Based on our experimental investigations, we propose a new technique that combines LRU for cache replacement with job scheduling based on the relative request value. This technique incurs lower data transfer costs than the GRV algorithm and shorter job processing delays than FCFS. We also propose using filecules for data management to further improve the results obtained from the above LRU and GRV combination.

We show that filecules can be identified in practical situations and demonstrate how the accuracy of filecule identification influences caching performance.

Scholar Commons Citation

Doraimani, Shyamala, "Filecules: A New Granularity for Resource Management in Grids" (2007). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/697

Included in

American Studies Commons

USF Tampa Graduate Theses and Dissertations

Filecules: A New Granularity for Resource Management in Grids
