Managing MPICH-G2 jobs with WebCom-G

Research output: Chapter in Book/Report/Conference proceedings › Chapter › peer-review

Abstract

This paper discusses the use of WebCom-G to handle the management and scheduling of MPICH-G2 (MPI) jobs. Users can submit their MPI applications to a WebCom-G portal via a web interface. WebCom-G then selects the machines on which to execute the application, depending on the machines available to it and the number of machines requested by the user. WebCom-G automatically and dynamically constructs an RSL script with the selected machines and schedules the job for execution on these machines. Once the MPI application has finished executing, results are stored on the portal server, where the user can collect them. A main advantage of this system is fault survival: if any of the machines fail during the execution of a job, WebCom-G can handle the failure automatically. Following a machine failure, WebCom-G can create a new RSL script with the failed machines removed, incorporate new machines (if they are available) to replace the failed ones, and re-launch the job without any intervention from the user. The probability of failures in a Grid environment is high, so fault survival becomes an important issue.
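
The select, generate, and re-launch cycle described in the abstract can be illustrated with a minimal Python sketch. It is not WebCom-G's implementation: the function names `select_machines` and `build_rsl`, the host names, and the paths are all hypothetical, and the RSL layout is an assumption modelled on the multi-request form commonly used with MPICH-G2 (one subjob per machine, introduced by a leading `+`).

```python
from typing import Iterable, List


def select_machines(available: List[str], requested: int,
                    failed: Iterable[str] = ()) -> List[str]:
    """Pick up to `requested` machines, skipping any known to have failed.

    Hypothetical helper: the abstract does not say how WebCom-G ranks machines,
    so this simply takes the first usable ones in list order.
    """
    failed_set = set(failed)
    usable = [m for m in available if m not in failed_set]
    return usable[:requested]


def build_rsl(hosts: List[str], executable: str, directory: str) -> str:
    """Build a multi-request RSL string with one subjob per host.

    The attribute set below is an assumed, simplified layout, not the exact
    script WebCom-G generates.
    """
    subjobs = []
    for index, host in enumerate(hosts):
        subjobs.append(
            f'( &(resourceManagerContact="{host}")\n'
            f'   (count=1)\n'
            f'   (label="subjob {index}")\n'
            f'   (environment=(GLOBUS_DUROC_SUBJOB_INDEX {index}))\n'
            f'   (directory="{directory}")\n'
            f'   (executable="{executable}")\n'
            f')'
        )
    return "+\n" + "\n".join(subjobs)


if __name__ == "__main__":
    # Hypothetical machine pool and job; none of these names come from the paper.
    pool = ["node1.example.org", "node2.example.org", "node3.example.org"]

    # Initial launch: the user asked for two machines.
    hosts = select_machines(pool, requested=2)
    print(build_rsl(hosts, "/home/user/app", "/home/user"))

    # node1 fails mid-run: drop it, pull in a replacement, and regenerate the RSL.
    hosts = select_machines(pool, requested=2, failed={"node1.example.org"})
    print(build_rsl(hosts, "/home/user/app", "/home/user"))
```

Regenerating the whole script from the current machine list, rather than patching the old one, keeps the subjob indices contiguous after a failure; whether WebCom-G does this internally is not stated in the abstract.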

Original language: English
Title of host publication: ISPDC 2005
Subtitle of host publication: 4th International Symposium on Parallel and Distributed Computing
Pages: 258-264
Number of pages: 7
Publication status: Published - 2005
Event: ISPDC 2005: 4th International Symposium on Parallel and Distributed Computing - Lille, France
Duration: 4 Jul 2005 → 6 Jul 2005

Publication series

Name: ISPDC 2005: 4th International Symposium on Parallel and Distributed Computing
Volume: 2005

Conference

Conference: ISPDC 2005: 4th International Symposium on Parallel and Distributed Computing
Country/Territory: France
City: Lille
Period: 4/07/05 → 6/07/05

Keywords

  • Globus
  • Grid portals
  • MPI
  • MPICH-G2
  • Scheduling and fault survival
  • WebCom-G
