- Title
Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems.
- Authors
Chung, Jinsuk; Lee, Ikhwan; Sullivan, Michael; Ryoo, Jee Ho; Kim, Dong Wan; Yoon, Doe Hyun; Kaplan, Larry; Erez, Mattan
- Abstract
This paper describes and evaluates a scalable and efficient resilience scheme based on the concept of containment domains. Containment domains are a programming construct that enables applications to express resilience needs and to interact with the system to tune and specialize error detection, state preservation and restoration, and recovery schemes. Containment domains have weak transactional semantics and are nested to take advantage of the machine and application hierarchies and to enable hierarchical state preservation, restoration and recovery. We evaluate the scalability and efficiency of containment domains using generalized trace-driven simulation and analytical analysis and show that containment domains are superior to both checkpoint restart and redundant execution approaches.
- Subjects
FAULT-tolerant computing; ERROR detection (Information theory); SEMANTIC computing; SEMANTIC networks (Information theory); SEMANTIC integration (Computer systems)
- Publication
Scientific Programming, 2013, Vol 21, Issue 3, p197
- ISSN
1058-9244
- Publication type
Article
- DOI
10.1155/2013/473915