Fault tolerance in the WebCom metacomputer

Research output: Chapter in Book/Report/Conference proceedings, Chapter, peer-review

Abstract

This paper addresses fault tolerance in the WebCom metacomputer. WebCom's computation platform is dynamically reconfigurable and volunteer-based. Since its constituent machines may join and leave unpredictably, fault survival and efficient fault recovery are of paramount importance. A fault tolerance mechanism is outlined that relies on a fast and efficient processor replacement procedure. It is shown that the characteristics of this procedure, together with the hierarchical and referentially transparent nature of WebCom executions, can be used to limit the effect of a fault to its immediate neighbourhood.

Original language: English
Title of host publication: Proceedings - International Conference on Parallel Processing Workshops, ICPPW 2001
Editors: Timothy Mark Pinkston
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 245-250
Number of pages: 6
ISBN (Electronic): 0769512607
Publication status: Published - 2001
Event: International Conference on Parallel Processing Workshops, ICPPW 2001 - Valencia, Spain
Duration: 3 Sep 2001 to 7 Sep 2001

Publication series

Name: Proceedings of the International Conference on Parallel Processing Workshops
Volume: 2001-January
ISSN (Print): 1530-2016

Conference

Conference: International Conference on Parallel Processing Workshops, ICPPW 2001
Country/Territory: Spain
City: Valencia
Period: 3/09/01 to 7/09/01

Keywords

  • Character generation
  • Computer science
  • Costs
  • Distributed computing
  • Fault tolerance
  • Hardware
  • Internet
  • Redundancy
  • Safety
  • Wire
