Articles

HaRD: a Heterogeneity-aware Replica Deletion for HDFS

Hilmi Egemen Ciritoglu, University College Dublin
John Murphy, University College Dublin
Christina Thorpe, Technological University Dublin

Document Type

Article

Disciplines

Computer Sciences

Publication Details

Journal of Big Data

Abstract

The Hadoop distributed file system (HDFS) is responsible for storing very large datasets reliably on clusters of commodity machines. HDFS takes advantage of replication to serve data requested by clients with high throughput. Data replication is a trade-off between better data availability and higher disk usage. Recent studies propose different data replication management frameworks that alter the replication factor of files dynamically in response to the popularity of the data, keeping more replicas for in-demand data to enhance the overall performance of the system. When data becomes less popular, these schemes reduce the replication factor, which changes the data distribution and leaves it unbalanced. Such an unbalanced data distribution causes hot spots, low data locality and excessive network usage in the cluster. In this work, we first confirm that reducing the replication factor causes unbalanced data distribution when using Hadoop’s default replica deletion scheme. Then, we show that even keeping a balanced data distribution using WBRD (the data-distribution-aware replica deletion scheme that we proposed in previous work) performs sub-optimally on heterogeneous clusters. To overcome this issue, we propose a heterogeneity-aware replica deletion scheme (HaRD). HaRD considers the nodes’ processing capabilities when deleting replicas; hence it stores more replicas on the more powerful nodes. We implemented HaRD on top of HDFS and conducted a performance evaluation on a 23-node dedicated heterogeneous cluster. Our results show that HaRD reduced execution time by up to 60% and 17% when compared to Hadoop and WBRD, respectively.
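
The idea of deleting replicas with node capability in mind can be illustrated with a minimal sketch. This is not the algorithm from the article: it assumes each node has a single numeric capacity weight, and it greedily deletes the surplus replica held by the node with the most stored blocks per unit of capacity. The names `pick_replica_to_delete`, `reduce_replication` and the `capacity` mapping are hypothetical and chosen only for illustration.

```python
def pick_replica_to_delete(holders, block_counts, capacity):
    """Pick the holder with the highest blocks-per-capacity ratio.

    Ties are broken by node name so the result is deterministic.
    The capacity weights are an assumption of this sketch.
    """
    return max(holders, key=lambda n: (block_counts[n] / capacity[n], n))


def reduce_replication(block_locations, capacity, target):
    """Lower every block to `target` replicas.

    block_locations maps block id -> list of nodes holding a replica.
    Returns (new locations, per-node block counts after deletion).
    """
    # Count how many replicas each node currently stores.
    counts = {node: 0 for node in capacity}
    for holders in block_locations.values():
        for node in holders:
            counts[node] += 1

    result = {}
    for block, holders in block_locations.items():
        remaining = list(holders)
        # Remove surplus replicas one at a time, always from the node
        # that is most loaded relative to its capacity.
        while len(remaining) > target:
            victim = pick_replica_to_delete(remaining, counts, capacity)
            remaining.remove(victim)
            counts[victim] -= 1
        result[block] = remaining
    return result, counts


if __name__ == "__main__":
    # Hypothetical cluster: node "a" is twice as capable as "b" and "c".
    capacity = {"a": 2, "b": 1, "c": 1}
    blocks = {i: ["a", "b", "c"] for i in range(8)}
    locations, counts = reduce_replication(blocks, capacity, target=1)
    print(counts)
```

In this toy setting the more capable node ends up holding more of the surviving replicas than either weaker node. The greedy choice is order-dependent, so the final distribution only tends toward being proportional to capacity rather than matching it exactly.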

DOI

https://doi.org/10.1186/s40537-019-0256-6

Recommended Citation

Ciritoglu, H., Murphy, J. & Thorpe, C. (2019). HaRD: a heterogeneity-aware replica deletion for HDFS. Journal of Big Data, 6(94). doi:10.1186/s40537-019-0256-6

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Included in

Computer Sciences Commons
