P-GRADE: Developing and Running Parallel Programs on Supercomputers, Clusters and Grid Systems

 

Peter Kacsuk

MTA SZTAKI

kacsuk@sztaki.hu

 

 

The P-GRADE system was originally designed by the Laboratory of Parallel and Distributed Systems of MTA SZTAKI to support the development of parallel programs. Developing parallel programs is substantially more difficult than creating sequential ones. For this reason we constructed a graphical programming environment with which even end-users who are not IT specialists, such as meteorologists and biologists, can develop supercomputer and cluster programs. In its original form P-GRADE provided a graphical language, a graphical editor, a pre-compiler, PVM library support, a distributed graphical debugger, a monitoring system, and a performance and execution visualization tool. In the framework of the IKTA-3 project "Cluster Programming Technology and its Usage in Meteorology", we extended P-GRADE with new tools that significantly increase the efficiency and reliability of P-GRADE programs executed on clusters.

 

Now that Grid systems have become a reality, a parallel program can be executed simultaneously on several clusters and/or supercomputers. In order to exploit these new possibilities, P-GRADE was further developed towards the Grid. The aim is the same whether P-GRADE is used for supercomputers, clusters or the Grid: to hide the details of the parallel/distributed execution environment from users so that they can concentrate on the problem to be solved. Another aim was to enable the use of the same P-GRADE system regardless of whether the developed parallel program will run on a supercomputer, on a cluster or in a Grid.

 

This extension of P-GRADE is being carried out in several stages in the framework of the IKTA-4 project "Hungarian Supercomputing Grid". In the first step a new execution mode, the job execution mode, was introduced into P-GRADE. The job execution mode is indispensable in the Grid, but it is also very useful on supercomputers and clusters. In order to introduce the job execution mode, P-GRADE was integrated with the Condor job management system. The advantage of this integration is that parallel programs developed in P-GRADE are automatically transformed into Condor jobs, and Condor then takes care of running them either in a single cluster or on several "friendly clusters" using the "Condor flocking" technique. We demonstrated this integrated system at the Grid Demo workshop of the CCGrid'2002 conference in Berlin. The P-GRADE/Condor system is a good candidate for use in the Hungarian Cluster Grid, which is also based on Condor.

 

The drawback of the P-GRADE/Condor system is that, due to the restrictions of Condor, it cannot support file staging, application monitoring, on-line visualization and parallel program check-pointing. The lack of parallel program check-pointing prevents the temporary suspension of Grid programs and their resumption at a later time. Such a feature would be very important in the Hungarian Cluster Grid, where the clusters serve as Grid resources only at night and at weekends. In order to solve these problems we developed a new Grid layer called PERL-GRID. The new system, called TotalGrid, combines P-GRADE and PERL-GRID; it enables the execution of parallel programs on arbitrary Grid resources and supports file staging, application monitoring, on-line visualization and parallel program check-pointing. The TotalGrid system can be used not only in scientific Grids but also to build company Grids. The TotalGrid system was demonstrated at the 5th EU DataGrid conference in September 2002 at Piliscsaba using the MEANDER ultra-short weather forecast program package of the Hungarian Meteorology Service. A new workflow execution mode for P-GRADE programs is under development in the framework of the SuperGrid project. The workflow execution mode will enable the Grid execution of very complex problems consisting of several jobs whose dependencies are described by the workflow graph of P-GRADE.
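
The idea of a workflow graph can be illustrated with a minimal sketch. It is not P-GRADE code: the job names and the dictionary representation of the graph are assumptions made purely for illustration. Each job maps to the set of jobs it depends on, and the function groups the jobs into stages so that every job in a stage has all of its prerequisites in earlier stages and could therefore be submitted concurrently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_stages(dependencies):
    """Group jobs into stages; jobs in one stage have no pending prerequisites."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose prerequisites are all done
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


# Hypothetical workflow: job name -> set of jobs it depends on.
workflow = {
    "interpolate": set(),
    "forecast_a": {"interpolate"},
    "forecast_b": {"interpolate"},
    "visualize": {"forecast_a", "forecast_b"},
}

if __name__ == "__main__":
    for number, stage in enumerate(execution_stages(workflow), start=1):
        print(number, stage)
```

In this sketch the two forecast jobs land in the same stage because they depend only on the interpolation job, while the visualization job waits for both. A real workflow manager would additionally handle file transfer between jobs and job failures, which are omitted here.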

 

The talk will explain, compare and evaluate the P-GRADE execution modes described above for both clusters and Grid systems.