Designing and evaluating hybrid storage for high-performance cloud computing
ABSTRACT
Reliable and fast storage systems are increasingly critical in fields such as artificial intelligence and data analytics. This paper proposes a new architecture for large-scale data storage systems and compares the performance of software and hardware storage technologies that reduce computational latency and improve performance. The main contributions are: (i) showing that combining Shingled Magnetic Recording (SMR) drives for data with Solid-State Drives (SSDs) for metadata is a viable way to implement large data storage systems; and (ii) showing that combining Conventional Magnetic Recording (CMR) drives for data with SSDs for metadata gives the highest performance for high-performance computing. Our experiments, carried out in multiple settings, demonstrate that the proposed architecture improves performance for sequential and random reads and writes. The prototypes are evaluated with a set of workloads, showing the superiority of the proposed data storage configurations. This work opens new opportunities for efficiently processing and storing data and metadata in large-scale data analysis systems.
EXISTING SYSTEM :
Cloud computing is an ever-growing paradigm shift that gives users commodity access to compute and storage services. As such, cloud computing is a promising emerging approach for High Performance Computing (HPC) application development. The automated resource provisioning offered by cloud computing makes it easier for eScience programmers to use computing and storage resources. Many commercial compute, storage, network, and other services are currently offered by large companies. However, these services typically come without performance guarantees, so users' applications can suffer unexpected performance degradation that may appear random to them. To avoid this, a user must be well versed in the tools and technologies that drive cloud computing. One state-of-the-art option is the bare-metal cloud, which provides bare-metal server instances on demand. Unlike traditional cloud servers, bare-metal cloud servers are free from virtualization overhead and thus promise to be more suitable for HPC applications.
PROPOSED SYSTEM :
In our proposed data storage architecture, keys or metadata, which tend to be small objects, are stored on the shared SSD to provide high input/output operations per second (IOPS), while values or data, which tend to be larger files, are stored on high-capacity CMR or SMR drives. Two storage architectures are proposed for large-scale data storage systems with high input/output performance. They are implemented using the Ceph file system [14] on storage nodes built from different combinations of CMR, SMR, and SSD drives. The evaluation results show the superiority of the proposed data storage configurations.
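The key/value split described above can be illustrated with a minimal in-memory sketch. It is not the authors' implementation and not how Ceph works internally: the `Device` and `HybridStore` classes are hypothetical stand-ins, where each write puts the large value on the capacity tier (CMR or SMR) and a small metadata record on the SSD tier, and each read first consults the SSD record to locate the value.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Device:
    """Hypothetical in-memory stand-in for one storage device."""
    name: str
    blocks: Dict[str, object] = field(default_factory=dict)


@dataclass
class HybridStore:
    """Sketch: keys/metadata on a fast SSD, values on a high-capacity CMR or SMR drive."""
    ssd: Device  # high-IOPS tier for small metadata records
    hdd: Device  # high-capacity tier for large values

    def put(self, key: str, value: bytes) -> None:
        # The large value goes to the capacity tier...
        self.hdd.blocks[key] = value
        # ...and a small record (device name, value length) goes to the SSD tier.
        self.ssd.blocks[key] = (self.hdd.name, len(value))

    def get(self, key: str) -> bytes:
        # A small, fast lookup on the SSD says where the value lives and how long it is.
        device_name, length = self.ssd.blocks[key]
        if device_name != self.hdd.name:
            raise KeyError(f"{key!r} is not on {self.hdd.name}")
        value = self.hdd.blocks[key]
        return value[:length]


if __name__ == "__main__":
    store = HybridStore(ssd=Device("ssd0"), hdd=Device("smr0"))
    store.put("sample-001", b"x" * 1_000_000)
    print(store.ssd.blocks["sample-001"], len(store.get("sample-001")))
```

The design choice mirrors the text: metadata accesses are small and frequent, so they benefit from SSD IOPS, while bulk values benefit from cheap capacity. In a real Ceph deployment, a comparable split is commonly achieved with the BlueStore backend by placing its metadata database (`block.db`) on an SSD while object data stays on the HDD.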
CONCLUSIONS :
In this paper, we propose and evaluate cloud-based big data storage architectures built on the Ceph file system. The choice of a particular Ceph server architecture should be based on the tradeoff between speed (IOPS) and areal density (Tb/in²). The results show that the combination of CMR + Metadata on SSD gives the best read/write performance, but CMR drives have comparatively low areal density. This CMR + Metadata on SSD architecture is best suited to users who need fast response times but not high storage density. The combination of SMR + Metadata on SSD gives very high areal density, although its sequential read speed is moderately lower than that of the CMR + Metadata on SSD server. The SMR + Metadata on SSD architecture is best suited to users who need high storage capacity for big data applications. The performance of Swift increases gradually with object size, because each operation reads or writes more data: with larger objects, Swift moves a larger volume of data using fewer input/output operations. This implies that Swift can be used to handle huge data sets for big data analytics.
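The object-size effect reported for Swift can be made concrete with a small back-of-the-envelope sketch. It assumes a simple, hypothetical cost model in which every request pays a fixed overhead and then streams at a constant bandwidth; the overhead and bandwidth values below are illustrative assumptions, not measurements from the paper. Under this model, larger objects need fewer requests to move the same volume, so the fixed per-request cost is amortized and effective throughput rises toward the streaming bandwidth.

```python
KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3


def operations_needed(total_bytes: int, object_size: int) -> int:
    """Requests needed to move total_bytes in object_size pieces (ceiling division)."""
    return -(-total_bytes // object_size)


def effective_throughput(object_size: int, overhead_s: float, bandwidth: float) -> float:
    """Bytes/s under an assumed model: fixed per-request overhead, then streaming at bandwidth."""
    return object_size / (overhead_s + object_size / bandwidth)


if __name__ == "__main__":
    # Hypothetical parameters, not measurements from the paper.
    overhead_s = 0.005         # assumed 5 ms fixed cost per request
    bandwidth = 200 * 10 ** 6  # assumed 200 MB/s streaming rate
    for size in (4 * KIB, 1 * MIB, 64 * MIB):
        ops = operations_needed(GIB, size)
        mbps = effective_throughput(size, overhead_s, bandwidth) / 10 ** 6
        print(f"{size // KIB:>6} KiB objects: {ops:>7} requests per GiB, ~{mbps:.1f} MB/s")
```

For 1 GiB of data, 4 KiB objects require 262,144 requests while 64 MiB objects require only 16, which is the "fewer input/output operations" effect in the text; the exact throughput curve of a real Swift deployment will depend on its hardware and configuration.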