Abstract

Grid computing is attracting more and more interest from academia, industry and the wider public. Several ambitious projects are underway to build ever-larger systems that provide a wide-area computing platform with vast performance. In this paper we argue that a new, service-oriented viewpoint is needed for future grid systems and propose that they should be built on top of a Java/Jini grid middleware. The paper presents our arguments for this model and describes the main requirements of such a system. Finally, various design issues are discussed in the hope of leading to the design of better quality grid systems.

1. Introduction

The past few years have been generous for grid system researchers. The media is full of press releases about the importance of the grid, and the launch of new grid applications and systems. The big players of the industry, Microsoft, IBM, Sun, and HP are all lined up behind the grid as the next "big thing" in the history of computing. Daily newspapers write about the new world where computing is a common utility like tap water, gas or electricity.

It is not all that easy, though, to live up to expectations. We have learnt that hype does not help research. The history of artificial intelligence research should warn us that hype and early promises without timely delivery of results can lead to enormous disappointment and serious cutbacks in funding, as well as damage to the reputation of the profession.

The early years of grid development (mid to late 90s) can be characterised as the era of competing grid development efforts. Several important systems (then not yet called grids) appeared, mainly from US research labs. The most notable representatives are Legion [1], Globus [2], Javelin [3], and NetSolve [4]. By the end of the century, Globus had emerged - not necessarily due to its technical merit - as a de facto grid system standard. This is mainly because it was the system that built on existing tools and components, and consequently required the least training and modification from existing high-performance sites and their users.

During the past five years, other developments in the IT sector became apparent and dominant. Electronic commerce has become commonplace. E-stores and e-banks operate in many countries of the world. Companies perform an enormous number of B2B business transactions routinely every day. With the developments in mobile technology and wireless networking, computer-class mobile phones and PDAs are being used as Internet terminals.

With these developments, the Internet world could be described as hundreds of millions of ordinary users on one side, relying on web servers and browser-based interaction with a limited amount of resources in use at a time, and on the other side a relatively small number of scientific users demanding an infinite amount of resources for number crunching. To close the gap between the two user communities and the technologies they use, a service-oriented grid architecture, the Open Grid Service Architecture (OGSA) [5], has been proposed as the successor of the Globus system. This has been a welcome paradigm shift but, as the authors of this paper believe, not a substantial enough one.

In this paper we contrast current recommendations for grid system development with those that future grid systems are likely to demand, and show that the creation of grid systems that appeal to the widest possible user base requires a more radical approach, i.e. to design systems with the user in mind and learn from other large distributed system development efforts. We propose that a grid infrastructure based on Java and Jini technologies is more suited to the creation of large-scale grid systems.

In the next section of the paper, we overview the current requirements for grid systems and show why they are not fully adequate for widely usable grids. We also explain our interpretation of the word 'grid' and show how Jini can provide improved interaction and work experience for users. In Section 3 we introduce our proposed Service Grid approach and systematically outline and discuss the requirements and design issues of such a system. In Section 4 we outline the major design issues that should be resolved for successful grid systems. We also discuss how Jini technology and good software engineering can help in the practical design and implementation of such systems.

2. Grid requirements

The Globus system has defined its requirements with scientific (computational science) users in mind. The aim of the system has been to connect remote computing, storage and visualisation facilities and specialised instruments, creating a virtual supercomputer that can carry out computationally intensive experiments and process and store scientific data. Initially, the intended participant sites were supercomputing centres; later, high-end clusters of Unix workstations were added. As a result, the Globus toolkit has the following main components [2]:

• Resource management (GRAM) - This module provides resource allocation and process management.

• Communication (Nexus/MPI) - Communication subsystem, earlier using Nexus, now relying on the MPI standard.

• Security (GSI) - Module performing authentication and related security services.

• Information (MDS) - Global resource information module providing distributed access to structure and state information.

• Health and status (HBM) - Module to monitor the health and status of system components.

• Remote data access (GASS) - Remote access to data via sequential and parallel interfaces.

• Executable management (GEM) - Management module providing support for the construction, caching, and location of executables.

These components are implemented over the following collection of fundamental modules: module activation/deactivation, portable thread library (POSIX subset), thread-safe and portable libc wrappers, timed and periodic callbacks, data object and error object management, and modules to manipulate lists, FIFOs and URLs [6]. The complete system consists of approximately 12 MBytes of source code and 23 MBytes of configuration scripts [6]. One of the greatest problems with the Globus approach is that it is based on the C language, which (i) does not support the development of distributed systems, and (ii) offers very limited code reuse features in the form of shared libraries. Consequently, the Globus development team had to develop support routines for many system-level problems for all possible target platforms, e.g. remote process creation, thread support, file access, communication, security, etc. The large number of platforms and the size of the effort bring about serious software management issues. As the platform base expands (remember mobile devices!), version control and maintenance might become a crucial problem.

To help in the standardisation of grid systems, an independent international forum - the Global Grid Forum (GGF) - has been established bringing together grid researchers from all over the world. GGF operates as a collection of Working Groups, each concentrating on particular areas of grid research, e.g. security, scheduling, etc.

The role of the Grid Protocol Architecture Working Group is to define the functionality of grid systems. In a recent proposal they define the fundamental modules (core functionality) of grid systems as follows [7]:

• persistent state, registry, and resource discovery

• resource scheduling

• uniform computing access

  - runtime / hosting environment for Unix style process interaction

  - runtime / hosting environment for OGSI style process interaction

  - means to establish application runtime environments

• uniform data access

• asynchronous information sources (events and monitoring)

• authentication, delegation, and secure communication

• authorization

• identity certificate management

• system management and access

This list demonstrates that the primary focus of grid systems is resource access and usage, i.e. running C/Fortran/etc. programs on remote computers ideally providing at the same time fair utilisation of these devices. The key elements are resource discovery, scheduling, uniform computing and data access, and security.

We argue that it is the system administrators who worry about resource access and utilisation. End users - scientific or other - just want to perform a task (get the job done) as simply and quickly as possible. Coupled with this is the fact that millions of Internet (and potential grid) users do not want to run number crunching applications, which implies that a different, higher-level grid abstraction is needed. This abstraction is service orientation, the central motif of Jini technology [8]. The concept has also emerged within the Open Grid Service Architecture initiative.

Service-orientation moves the focus from device (resource) to service (application) level and eventually concentrates on what an entity provides to other components in a system. As an end user, one can concentrate on the functionality (e.g. printing, paying for a book, etc.) not on which printer is available and how to send the letter to it. Once the grid is a collection of services instead of resources (but of course services can represent resources too), it is immediately open to millions of users, who can move from text-oriented web browsers to the more direct interaction with services. Now, the grid is not just computing and data any more, but a software infrastructure providing universal access to services to anyone, anytime, anywhere. This, however, introduces new requirements, which we explore in the next section.

3. Requirements of a Service Grid

To establish requirements, it is assumed that future grids will be dynamic collections of services. The grid is not "only" a computational platform, but a service delivery architecture. In this type of grid, anything can be a service: a bank for personal account management, video-on-demand providers, a supercomputer, a meeting place, or your favourite corner shop.

We envisage the following sample grid usage scenario (in contrast to a traditional remote shell/script oriented one).

On the way to work, John would like to listen to Bach's cello suite. He has left the CD at home, so he uses his PDA in the car to ask his grid broker to find a music service that can play Suite No 1. Within seconds the music starts and John forgets about the traffic jam he is in. At work, John leads a meteorological research project that involves the use of many remote sensors and demanding computations. From his workstation he instructs the grid broker to start collecting data and, using computational resources from the parts of the world he specified in his preferences, to perform the experiment and send the data to the visualisation room. In the afternoon he holds a meeting where he retrieves the stored data from the grid using the grid broker and displays it on the active board. After a long day at work he decides to have dinner with his wife and friends. The grid broker (based on John's preferences) quickly locates a new French restaurant and John, using his PDA again, books a table for four.

One could question whether this is a grid or not. We believe that service-based systems should be viewed as grids, since a data source → processing → visualisation computational grid pipeline is similar to a media database → processing → video display one. Any service-based system would just as well perform processing, store and retrieve data, use specific application-dependent hardware elements, and coordinate and share resources in order to perform its task.

Choosing a user-centred design approach, we concentrate on usability requirements first. From the sample scenario the following usability requirements can be derived.

- Universal access - Users should be able to access grid services from any type of device (from supercomputers down to mobile phones). They should also be able to discover previously unknown services.

- No administration - Users should not install software in order to use a service. Every grid service should provide its user interface dynamically on an on-demand basis to the user taking both device (e.g. screen size, media capabilities, etc.) and user characteristics (e.g. type of disability, personal preferences) into consideration. The interface will be downloaded from the service.

- Helper services - Users should be able to use special helper (broker) services that off-load work from them and carry out tasks (e.g. finding a music service, setting up connections between sensors and computers) on their behalf.

- Security - Users' data, programs and execution activities should be protected from stealing, modification and unauthorised monitoring as well as from dishonest services.

- Performance - Users will require constant availability of the required services (provided that they, or a substitute service, exist) regardless of time and location.

From the service providers' point of view these requirements translate to the following:

- Universal access - Services should be accessible to users of any device having any disability. The behaviour of the service should not be altered by the particular device and interface used.

- No administration - Services should be able to change their implementation whenever they choose without causing disruption to end-users. Services should be able to join and leave the grid at their own will without causing disruption in the rest of the grid system.

- Helper services - Services should be able to cooperate with helper and other services to carry out tasks as a team. Service providers will need to offer a variety of payment schemes, hence be able to co-operate with third party payment service providers.

- Security - Services (their data, programs and execution activities) should be protected from stealing, modification and unauthorised monitoring, as well as from dishonest clients and their attacks.

- Performance - Services should operate providing maximum availability regardless of time and location.

The most fundamental differences between this model and resource-oriented grids are the following. It is expected that services will be found and used by other programs, unlike before, where the system returned a list of resources for manual user selection. Services will be used on demand with interfaces downloaded from service providers. Services will not be free; hence, online payment mechanisms are essential. Hardware as well as service implementations will change over time. The system will have to live with and survive these changes as well as system failures.

We believe that these usability requirements should drive the functionality and specification of grid systems and that the Java programming language, due to its platform independence and facility for mobile code, can provide the necessary "wiring" system for service grids. The missing dynamic distributed features then can be added by Jini technology.

4. The Jini Service Grid

In a service oriented environment, users and service providers expect always-on and fault-resilient behaviour, full-blown security, powerful global service discovery, minimal administration and no software installation, device and user-adaptive user interfaces as well as personalisation. It is largely the responsibility of the service grid infrastructure/middleware to provide these features.

The authors have developed a Jini-based computational grid prototype system to field-test these ideas [9]. The structure of the minimal system is shown in Figure 1. The system consists of Compute, Lookup and Broker services as well as clients. Compute services abstract out processors that are willing to execute programs for clients. The lookup service is a key component of any Jini community; it holds proxy objects of registered services and makes them available to clients. Broker services connect user requests to compute services, managing resource selection and task execution on behalf of the client. In this system, service discovery is performed with the help of the lookup service. In large systems, where there are too many services to register with only one lookup service, a broker hierarchy can be used to provide wide-area search for services [10]. In this base system the client creates and executes Java programs by sending them to Compute Services via the Broker.
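The interaction between the client, Broker, Lookup and Compute services described above can be illustrated with a small, self-contained sketch. It is plain Java and deliberately does not use the Jini API: the names ComputeService, LocalComputeService, LookupRegistry and Broker are hypothetical stand-ins chosen for this illustration, everything runs in one JVM, and the prototype in [9] may be organised quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical service interface: a compute service runs a task for a client.
interface ComputeService {
    <T> T execute(Supplier<T> task);
}

// Simplest possible compute service: runs the task in the calling thread.
class LocalComputeService implements ComputeService {
    @Override
    public <T> T execute(Supplier<T> task) {
        return task.get();
    }
}

// Stand-in for a lookup service: holds registered service objects ("proxies")
// and returns the first one that implements the requested interface type.
class LookupRegistry {
    private final List<Object> proxies = new ArrayList<>();

    void register(Object proxy) {
        proxies.add(proxy);
    }

    <S> Optional<S> lookup(Class<S> type) {
        for (Object proxy : proxies) {
            if (type.isInstance(proxy)) {
                return Optional.of(type.cast(proxy));
            }
        }
        return Optional.empty();
    }
}

// Broker: selects a compute service on behalf of the client and runs the task there.
class Broker {
    private final LookupRegistry registry;

    Broker(LookupRegistry registry) {
        this.registry = registry;
    }

    <T> T submit(Supplier<T> task) {
        ComputeService service = registry.lookup(ComputeService.class)
                .orElseThrow(() -> new IllegalStateException("no compute service registered"));
        return service.execute(task);
    }
}
```

A client would register a LocalComputeService with a LookupRegistry, create a Broker over that registry, and call submit with its task; the client never refers to a concrete compute service itself.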

Figure 1. The minimal Jini Grid system.

This minimal research prototype system only has compute services and lacks many features that are essential in a production grid. However, it has already demonstrated its suitability by providing effective service discovery, failure management as well as ease of use and maintenance. The following subsystems (functionalities) are yet to be added to our system:

• Flexible, dynamic and configurable security systems that can authenticate and authorise users, enforce data and program integrity, and provide trust management and delegation.

• An accounting subsystem that logs all necessary system and user activities, service and resource usage that are stored in persistent records for billing and analysis.

• Secure payment system offering a variety of payment options (free, per use, pre-paid, discount, package, etc.)

• User management system enabling personalisation

• Interoperability subsystem to allow it to work with other Java or non-Java based programs and grid systems

• Ability to introduce intelligent agent based services that can self-optimise their behaviour, learn usage patterns in order to increase performance and/or profit.

These requirements determine the base functionality of every service in the system. Due to the object-oriented design we follow, it is a straightforward process to encapsulate these functionalities in a ready-to-use Grid service base class. Service developers, thus, can concentrate on the specific service logic and functionality.
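As a rough illustration of such a base class, the sketch below puts two of the cross-cutting functions (authorisation and accounting) in an abstract class and leaves only the service logic to the subclass. The names GridServiceBase, invoke, handle and ReverseService are hypothetical, the user check is a simple set lookup rather than a real security subsystem, and the accounting log is kept in memory; it is a minimal sketch of the idea, not the authors' design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical base class: cross-cutting concerns live here, service logic in subclasses.
abstract class GridServiceBase<Q, R> {
    private final Set<String> authorisedUsers;
    private final List<String> accountingLog = new ArrayList<>();

    protected GridServiceBase(Set<String> authorisedUsers) {
        this.authorisedUsers = authorisedUsers;
    }

    // Template method: authorise the caller, record the usage, then run the service logic.
    public final R invoke(String user, Q request) {
        if (!authorisedUsers.contains(user)) {
            throw new SecurityException("user not authorised: " + user);
        }
        accountingLog.add(user + " used " + getClass().getSimpleName());
        return handle(request);
    }

    // The only method a service developer has to write.
    protected abstract R handle(Q request);

    // Returns a copy so callers cannot alter the accounting records.
    public List<String> accountingRecords() {
        return new ArrayList<>(accountingLog);
    }
}

// Example specialised service: reverses a string.
class ReverseService extends GridServiceBase<String, String> {
    ReverseService(Set<String> authorisedUsers) {
        super(authorisedUsers);
    }

    @Override
    protected String handle(String request) {
        return new StringBuilder(request).reverse().toString();
    }
}
```

Because invoke is final, a derived service cannot bypass the authorisation and accounting steps; payment, personalisation and the other subsystems listed above could be added to the base class in the same way.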

4.1. Generic service architecture

Figure 2 illustrates the modules that constitute a generic grid service in our proposed system. The relationships and connections among modules have been omitted for clarity. It is evident that most of these modules will be in use when a client requests a particular function of the specialised service derived from this base class.

Our emphasis is on software engineering issues at the language and system level. What language should be used for the implementation of this generic service architecture? Are all features required provided by the language? Any large system is destined to change during its lifetime. How can we design for change? What are the software models that enable fault-resilience, self-healing as well as dynamic behaviour? Are there successful patterns and/or systems we can use as examples of good design?

Figure 2. The modules of the generic Jini Grid service.

4.1.1 Language choice

The Java programming language has been used in our prototype and it has proved to be a good choice. First of all, Java is a mature object-oriented programming language widely accepted in the software industry. Java has become especially strong in Web server systems, which suggests its suitability for grid services too. Secondly, with the arrival of version 1.4, it now offers all the features one would need for large concurrent and distributed systems: support for threads, networking (unicast, multicast, HTTP, secure transmission), logging, a flexible and powerful security model and architecture, strong typing and support for interfaces, various device-specific implementations (J2ME/SE/EE), support for interoperability (Java Native Interface, XML, Web Service and SOAP support, database access), and a range of application-specific packages for multimedia, graphical user interfaces, sound, speech and accessibility, etc. This provides a solid base for implementation.

4.1.2 Design for change

The most challenging design problem is to design a system for change. In any distributed system that is a collection of independently developed entities, it is inevitable that components will in the future change in their functionality and implementation, and at the same time the behaviour of the system is changing dynamically as components fail or are added to the system. The following techniques provide solutions for these problems.

The most important rule is to separate specification from implementation (a form of separation of concerns). The system components need to be decoupled from implementation, so that recompilation in the system is not required if a component implementation changes. Among the mainstream languages, Java has the purest notion of the interface, which specifies only what a component is responsible for instead of how it is done. Using interfaces throughout the system to specify components/services enables us to change implementation at runtime.
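A minimal sketch of this rule follows, using hypothetical names (StorageService, DiskStorage, ReplicatedStorage, StorageClient) that do not come from the prototype. The client is compiled against the interface only, so the implementation behind it can be replaced while the client is running.

```java
import java.util.concurrent.atomic.AtomicReference;

// Specification only: what a storage service does, not how it does it.
interface StorageService {
    String describe(String key);
}

// First implementation.
class DiskStorage implements StorageService {
    @Override
    public String describe(String key) {
        return "disk:" + key;
    }
}

// A later, independently developed implementation of the same specification.
class ReplicatedStorage implements StorageService {
    @Override
    public String describe(String key) {
        return "replicated:" + key;
    }
}

// The client depends on the interface only; no recompilation is needed
// when the implementation it is bound to changes.
class StorageClient {
    private final AtomicReference<StorageService> service;

    StorageClient(StorageService initial) {
        this.service = new AtomicReference<>(initial);
    }

    // Swap the implementation at runtime.
    void rebind(StorageService replacement) {
        service.set(replacement);
    }

    String fetch(String key) {
        return service.get().describe(key);
    }
}
```

In a Jini setting the rebinding would typically happen when the client obtains a new proxy from the lookup service; here it is reduced to a direct method call to keep the example self-contained.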

Hardware and software independence are also crucial for evolving systems. The Java platform shields end users from the underlying hardware, providing a single "hardware image". Although in our system we rely on Java for setting up connections among clients and services, the proxy pattern of Jini guarantees communication and service implementation independence (i.e. any language can be used for implementation).

Soft state management and event notification are provided by the Jini remote event mechanism. Used together with the Lease mechanism, this means that failed resources are automatically removed from the system and the state change is communicated to interested parties, which leads to graceful degradation and failure resilience.
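The combination of leases and events can be sketched in a few lines of plain Java. This is not the Jini Lease or remote event API: LeasedRegistry, renew, onRemoval and expire are hypothetical names, listeners are local callbacks rather than remote ones, and time is passed in explicitly so that the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical lease-based registry: a service stays registered only while it keeps renewing.
class LeasedRegistry {
    private final Map<String, Long> expiryByService = new HashMap<>();
    private final List<Consumer<String>> removalListeners = new ArrayList<>();

    // Register a service, or renew its lease, until nowMillis + durationMillis.
    void renew(String serviceId, long nowMillis, long durationMillis) {
        expiryByService.put(serviceId, nowMillis + durationMillis);
    }

    // Interested parties subscribe to be told when a service disappears.
    void onRemoval(Consumer<String> listener) {
        removalListeners.add(listener);
    }

    // Remove every service whose lease has run out and notify the listeners.
    void expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : expiryByService.entrySet()) {
            if (entry.getValue() <= nowMillis) {
                expired.add(entry.getKey());
            }
        }
        for (String serviceId : expired) {
            expiryByService.remove(serviceId);
            for (Consumer<String> listener : removalListeners) {
                listener.accept(serviceId);
            }
        }
    }

    boolean isRegistered(String serviceId) {
        return expiryByService.containsKey(serviceId);
    }
}
```

A crashed service simply stops calling renew; no explicit deregistration or failure detector is needed, and the next call to expire removes it and informs the subscribers.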

Finally, services are discovered by their type (interface) and additional attributes (e.g. location), which is the best way to ensure the integrity of service description and semantics. Interfaces should be standardised and not changed, but even if a change is unavoidable, new versions of the interface can be created with inheritance. In this case old clients can continue to use the old interface, while new clients can make use of the improved new interface as well.
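Interface evolution through inheritance can be illustrated as follows. The interfaces PrintService and PrintServiceV2 and the classes DuplexPrinter and PrintClients are hypothetical examples invented for this sketch; the point is only that a service implementing the new interface is still found and used by clients that know the old one.

```java
// Version 1 of a hypothetical printing interface.
interface PrintService {
    String print(String document);
}

// Version 2 extends version 1, so every V2 service is still a valid V1 service.
interface PrintServiceV2 extends PrintService {
    String printDuplex(String document);
}

// A service that implements the new interface version.
class DuplexPrinter implements PrintServiceV2 {
    @Override
    public String print(String document) {
        return "printed " + document;
    }

    @Override
    public String printDuplex(String document) {
        return "printed " + document + " on both sides";
    }
}

class PrintClients {
    // Old client: knows only the original interface.
    static String oldClient(PrintService service) {
        return service.print("report");
    }

    // New client: uses the improved interface when it is offered, otherwise falls back.
    static String newClient(PrintService service) {
        if (service instanceof PrintServiceV2) {
            return ((PrintServiceV2) service).printDuplex("report");
        }
        return service.print("report");
    }
}
```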

4.1.3 Software Engineering

Creating grid systems is challenging. Global grid systems are candidates for becoming the largest and most complex pieces of software ever created and used. These require both sound software engineering methods and reliable building blocks.

Java has demonstrated its reliability and robustness both as a programming language and as a computing platform. Jini builds on these results to create an elegant abstraction for dynamic distributed systems. Software engineers can in turn rely on design patterns so as not to "reinvent the wheel", and use and build upon tried-and-tested software architectures. We highlight only a few of the patterns that can be used in grid systems:

• The Proxy pattern to connect clients and services in an implementation- and protocol-independent way.

• The Adapter pattern to provide automatic translation among grid services and clients who do not necessarily understand each other's interfaces. This can be provided within the on-demand user interface of a service.

• The Composite pattern to enable components and collections of components to be treated uniformly.

• Various Factory patterns to enable object creation without specifying the concrete class or to defer object creation to subclasses.

• The Façade pattern to provide new services based on interface composition.

• The Mediator pattern to control the interaction of grid entities by third parties, e.g. using a broker between the client and the service.

• The Observer pattern for monitoring service changes or any other interesting state modification.
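To make the first item in the list concrete, here is a minimal sketch of the Proxy pattern in plain Java. The interface TemperatureService, the class TemperatureProxy and the text request format "GET_TEMP <station>" are all invented for this illustration, and the transport is modelled as a simple function from request text to reply text; a real Jini proxy would be downloaded from the lookup service and could speak any protocol to its back end.

```java
import java.util.function.Function;

// What the client sees: a typed service interface.
interface TemperatureService {
    double celsiusAt(String station);
}

// Proxy: implements the service interface and hides the wire protocol behind it.
// The transport is a hypothetical stand-in for whatever protocol the service uses.
class TemperatureProxy implements TemperatureService {
    private final Function<String, String> transport;

    TemperatureProxy(Function<String, String> transport) {
        this.transport = transport;
    }

    @Override
    public double celsiusAt(String station) {
        // Encode the call in the service's private request format ...
        String reply = transport.apply("GET_TEMP " + station);
        // ... and decode the reply into the type promised by the interface.
        return Double.parseDouble(reply);
    }
}
```

The client calls celsiusAt and never sees the request format, so the service provider can change the protocol, or the language of the back end, by shipping a different proxy.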

Not only does the use of design patterns make the design of the grid easier, it also makes the design accessible and understandable to outside programmers who will need to develop applications for this grid or who may have to maintain it in the future.

In addition to patterns, the world of distributed computing provides us with many examples of large-scale systems from which we can learn. Worldwide payment and transaction architectures used in the financial world, the global network of ATMs (Automated Teller Machines), and wired and wireless telecommunication networks and their value-added services are all to be studied and, if possible, re-used in order to create grid systems. If grids are to succeed, it is not only computational performance they should focus on. Robustness, reliability, high availability, security and maintainability are equally important aims.

One final issue that needs discussing is security and dynamic system configurability. For space reasons, these are the topics of another paper, but nonetheless, it should be said that the next release of Jini technology will provide an extensive set of mechanisms to cater for the most stringent security and configuration needs.

5. Conclusions

This paper focused on the design issues of large-scale grid systems. We briefly looked at the history of grid computing and discussed the generally accepted requirements for grid systems. We argued that a service-centric approach is more adequate for large grid systems, and defined the top-level requirements for such systems. We also proposed that a Java/Jini based middleware is the most suitable choice as it provides the best programming language and software engineering support available to date. As well as offering platform independence for free, it is also the most suited implementation platform for a design based on design patterns. The paper has only scratched the surface of the design problems, but we hope this is the right start to creating a highly usable, global-scale service-oriented grid system.

References

[1] A. Grimshaw, W. Wulf et al., The Legion Vision of a Worldwide Virtual Computer, Communications of the ACM, vol. 40(1), January 1997.

[2] I. Foster and C. Kesselman, The Globus project: a status report, Future Generation Computer Systems 15 (1999) pp 607-621.

[3] M.O. Neary, B.O. Christiansen, P. Cappello and K.E. Schauser, Javelin: Parallel Computing on the Internet, Future Generation Computer Systems 15 (1999) pp 659-674.

[4] Henri Casanova and Jack Dongarra, "NetSolve: A Network Server for Solving Computational Science Problems", The International Journal of Supercomputer Applications and High Performance Computing, Volume 11, Number 3, pp 212-223, Fall 1997.

[5] I. Foster, C. Kesselman, J. Nick, S. Tuecke, "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration", Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002, http://www.gridforum.org.

[6] The Globus Toolkit: Basics. http://www.globus.org/training/toolkit-internals/

[7] W. Johnston and J. Brooke, "Core Grid Functions: A Minimal Architecture for Grids", Working Draft, Version 3.1, Global Grid Forum, Grid Protocol Architecture Working Group, http://www-itg.lbl.gov/GPA/

[8] Sun Microsystems, "The Jini Specification", http://www.sun.com/jini

[9] Zoltan Juhasz, Arpad Andics, and Szabolcs Pota, "JM: A Jini Framework for Global Computing", in Proc 2nd International Workshop on Global and Peer-to-Peer Computing on Large Scale Distributed Systems at IEEE International Symposium on Cluster Computing and the Grid (CCGrid'2002) May 21 - 24, 2002 Berlin, Germany. pp. 395-400.

[10] Zoltan Juhasz, Arpad Andics, and Szabolcs Pota, "Towards A Robust And Fault-Tolerant Multicast Discovery Architecture For Global Computing Grids", in P. Kacsuk, D. Kranzlmüller, Zs. Nemeth, J. Volkert (Eds.): Distributed and Parallel Systems - Cluster and Grid Computing Proc. 4th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Kluwer Academic Publishers, The Kluwer International Series in Engineering and Computer Science, Vol. 706, Linz, Austria, September 29-October 2, 2002, pp. 74-81.

[1] This project is supported by the Hungarian Ministry of Education under the IKTA programme.