Comparison of Cluster File Systems

 

 


Csaba Gere

research associate

Department of Internet Technologies and Applications

MTA-SZTAKI

H-1132 Budapest, Victor Hugo u. 18-22.

phone: (+36 1) 2796027

email: gcsaba@sztaki.hu

 

 

Péter Stefán

research associate

Supercomputing Centre

NIIFI

H-1132 Budapest, Victor Hugo u. 18-22.

phone: (+36 1) 4503076

email: stefan@niif.hu

 

 

 


Abstract

 

Deploying Internet services in fault-tolerant environments has become increasingly popular. The goal of a fault-tolerant service is to keep the availability of the given service above a well-defined threshold, such as 99.9%, by installing it on a redundant, distributed system architecture that enables the service to keep running, even if at reduced performance, under extreme conditions such as hardware or software failure. Fault-tolerant services can be implemented in numerous ways and at numerous levels; the lowest of these levels is the use of fault-tolerant cluster file systems.
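
To make such a threshold concrete: 99.9% availability allows roughly 8.8 hours of downtime per year. The following back-of-the-envelope sketch computes this budget (the function name downtime_budget is ours, purely for illustration):

    # Downtime budget implied by an availability target.
    HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

    def downtime_budget(availability: float) -> float:
        """Maximum tolerable downtime per year, in hours."""
        return HOURS_PER_YEAR * (1.0 - availability)

    for target in (0.999, 0.9999, 0.99999):
        print(f"{target:.5f} -> {downtime_budget(target):.2f} h/year")

For 99.9% this yields 8.76 hours per year; each additional "nine" cuts the budget tenfold.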

Cluster file systems were developed on the foundations of the Network File System (NFS), as continuously growing demand revealed NFS's many shortcomings and called for new features.

The key requirements that a cluster file system should meet are as follows: fault-tolerant behavior (handling distributed data and failover), load leveling, utilization of high network bandwidth, scalability, and effective resource utilization (addition and removal of disk areas, merging, striping, and mirroring).
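
To illustrate the last two notions, here is a minimal sketch of how striping and mirroring place logical blocks on disks (the placement functions are ours, purely for illustration; real cluster file systems use far more elaborate allocation policies):

    # Striping spreads consecutive blocks across disks for bandwidth;
    # mirroring duplicates each block on every disk for fault tolerance.

    def stripe(block: int, num_disks: int) -> int:
        """RAID-0 style placement: block i lands on disk i mod num_disks."""
        return block % num_disks

    def mirror(block: int, num_disks: int) -> list[int]:
        """RAID-1 style placement: every block is written to all disks."""
        return list(range(num_disks))

    for b in range(4):
        print(f"block {b}: striped -> disk {stripe(b, 3)}, "
              f"mirrored -> disks {mirror(b, 2)}")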

Cluster file systems can be used in three fundamental ways: as a local file system, like the ordinary Unix File System (UFS) or the Reiser File System (ReiserFS); as a network file system with improved NFS functionality; or integrated with a Storage Area Network (SAN).

Using cluster file systems as local file systems is very rare, since most of their relevant features either cannot be used at all or can only be used in a restricted manner, and only at reduced performance.

Using cluster file systems as network file systems means that specific machines in the distributed environment have exclusive access to the storage area and provide file service to the other machines over a common network (e.g., an Internet Protocol network). In this layout there is a master server, which provides the file service under normal operation, and several backup servers, which can take over the role of the master, transparently to the clients, if it fails. The important feature of such a setup is the appropriate "failover" and "failback" of the file service.
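
A common way to drive such failover is heartbeat monitoring; the minimal sketch below captures the idea (the class, method, and threshold names are ours, purely for illustration; production cluster managers additionally handle fencing, split-brain avoidance, and negotiated failback):

    import time

    HEARTBEAT_TIMEOUT = 5.0  # seconds of master silence before takeover

    class BackupServer:
        def __init__(self):
            self.last_heartbeat = time.monotonic()
            self.is_master = False

        def on_heartbeat(self):
            """Called whenever a heartbeat arrives from the master."""
            self.last_heartbeat = time.monotonic()
            if self.is_master:
                self.is_master = False  # failback: master has recovered

        def check(self):
            """Called periodically; promotes this node if the master is silent."""
            silence = time.monotonic() - self.last_heartbeat
            if not self.is_master and silence > HEARTBEAT_TIMEOUT:
                self.is_master = True  # failover: take over the file service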

In the SAN architecture all nodes of the cluster access the same disk area via a high-speed Fibre Channel Arbitrated Loop (FC-AL) or a Fibre Channel switch (FC-SW). In this setup, the task of the file system is to provide efficient read and write locking mechanisms so that the high throughput can be utilized effectively.
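
The essence of such locking is to admit many concurrent readers while granting writers exclusive access. Below is a minimal single-host sketch of these semantics (the class is ours, purely for illustration; GFS and GPFS realize the same semantics with distributed lock managers spanning all nodes):

    import threading

    class ReadWriteLock:
        """Shared-read / exclusive-write lock; writers may starve here."""

        def __init__(self):
            self._cond = threading.Condition()
            self._readers = 0
            self._writer = False

        def acquire_read(self):
            with self._cond:
                while self._writer:          # wait out any active writer
                    self._cond.wait()
                self._readers += 1

        def release_read(self):
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()  # wake a waiting writer

        def acquire_write(self):
            with self._cond:
                while self._writer or self._readers > 0:
                    self._cond.wait()
                self._writer = True          # exclusive access granted

        def release_write(self):
            with self._cond:
                self._writer = False
                self._cond.notify_all()      # wake readers and writers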

In our presentation we describe our experiences with creating fault-tolerant file systems on the EMC storage installed at NIIFI, and we provide configuration examples and performance analysis results for two cluster file systems: Sistina's GFS and IBM's GPFS.