
The digest of current topics on Continuous Processing Architectures. More than Business Continuity Planning.

BCP tells you how to recover from the effects of downtime.

CPA tells you how to avoid the effects of downtime.



In this issue:


   Availability Topics

      Fault Tolerance for Virtual Environments - Part 2

   Recommended Reading

      Aberdeen's 2008 Business Continuity Survey

   Product Reviews

      Parallel Sysplex - Fault Tolerance from IBM

   The Geek Corner

      Is Parallel Repair Really Better?


Complete articles may be found at

Attend the Active/Active Preconference Seminar at HPTF this June


I will be presenting a full-day seminar on the theory and practice of active/active systems at the 2008 HP Technology Forum, to be held in Las Vegas, Nevada, USA, this June. The seminar will be held on Monday, June 16, starting at 8:30 AM and will cover a variety of subjects, including:

  • Why active/active systems provide such high availability.

  • The architecture of active/active systems.

  • Descriptions of data replication engines, with examples.

  • How active/active systems compare to clusters.

  • Other benefits of multinode active/active architectures.

  • Several case studies.

This seminar is available for Encompass or ITUG attendees. You can register at  or


I hope to see you there.


Dr. Bill Highleyman, Managing Editor


Availability Topics


Fault Tolerance for Virtual Environments – Part 2

In Part 1 of this series, we described how virtualization can significantly reduce the capital and operational costs of a data center. It does this by creating several virtual machines on a single physical server. Each virtual machine runs its own operating system, known as a guest operating system, such as Linux, Windows, or Unix. Each operating system thinks that it is running in its own virtual server.

The physical-layer requests made by the guest operating systems are adjudicated by an intervening layer, the hypervisor. The hypervisor, in effect, multiplexes the requests from the guest operating systems and allows only one request at a time to be passed to the physical server.
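The multiplexing role described above can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not a description of how any real hypervisor is built: the names Hypervisor, physical_request, guest_os, and demo are hypothetical, and a simple lock stands in for the hypervisor's adjudication of access to the physical server.

```python
import threading


class Hypervisor:
    """Toy hypervisor: lets only one guest request reach the hardware at a time."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the hypervisor's arbitration
        self.in_flight = 0             # requests currently on the "physical server"
        self.max_in_flight = 0         # highest concurrency ever observed
        self.served = []               # (guest, request) pairs in the order served

    def physical_request(self, guest, request):
        # Every guest request must pass through the lock, so requests are
        # handed to the physical layer strictly one at a time.
        with self._lock:
            self.in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self.in_flight)
            self.served.append((guest, request))
            self.in_flight -= 1


def guest_os(hypervisor, name, count):
    """A guest operating system issuing `count` physical-layer requests."""
    for i in range(count):
        hypervisor.physical_request(name, f"io-{i}")


def demo():
    """Run three guests concurrently against one hypervisor and return it."""
    hv = Hypervisor()
    guests = [
        threading.Thread(target=guest_os, args=(hv, name, 100))
        for name in ("linux", "windows", "unix")
    ]
    for g in guests:
        g.start()
    for g in guests:
        g.join()
    return hv
```

Each guest behaves as if it had the server to itself, yet the recorded max_in_flight never exceeds one, which is the "one request at a time" property in miniature.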


Consequently, the number of physical servers required by a data center can be reduced, in some cases dramatically. This reduction in physical hardware is accompanied by associated reductions in capital and operating costs. Fewer machines mean less space, less air conditioning, less power, less power backup, less maintenance, and less system administration.


In Part 2 of this series, we look at the various architectures that are being used to provide virtualization.





Recommended Reading


Aberdeen’s 2008 Business Continuity Survey


The Aberdeen Group, a major industry analysis firm, has published its most recent survey on the state of business continuity planning and implementation across a broad spectrum of small to large companies. This March 2008 report, entitled Business Continuity: Implementing Disaster Recovery Strategies and Technologies, is publicly available on its web site.


Aberdeen notes that 62% of the companies surveyed experienced between one and five business interruption events in the last year. Even so, 34% of all companies surveyed have yet to implement a business continuity solution. It seems that there remains a disconnect between reality and action in the marketplace when it comes to protecting a company’s IT assets from failures and disasters.





Product Reviews

Parallel Sysplex – Fault Tolerance from IBM

IBM’s Parallel Sysplex, HP’s NonStop server, and Stratus’ ftServer are today the primary industry fault-tolerant offerings that can tolerate any single failure, thus leading to very high levels of availability. The Stratus line of fault-tolerant computers is aimed at seamlessly protecting industry-standard servers running operating systems such as Windows, Unix, and Linux. As a result, Stratus does not compete with the other two systems because Parallel Sysplex and NonStop systems compete exclusively in the large enterprise marketplace.


IBM’s Parallel Sysplex systems are multiprocessor clusters that can support from two to thirty-two mainframe nodes. The nodes in a Parallel Sysplex system interact as an active/active architecture. The system allows direct, concurrent read/write access to shared data from all processing nodes without sacrificing data integrity. Furthermore, work requests associated with a single transaction or database query can be dynamically distributed for parallel execution on the nodes in the Parallel Sysplex cluster based on available processor capacity.
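To make the idea of capacity-based distribution concrete, here is a minimal sketch. It is not IBM's algorithm: the function name distribute, the node names, and the greedy "most spare capacity first" rule are all assumptions chosen only to illustrate the principle of routing work according to available processor capacity.

```python
import heapq


def distribute(work_units, spare_capacity):
    """Greedily assign each work unit to the node with the most spare capacity.

    work_units:     list of (name, cost) pairs (hypothetical units of work)
    spare_capacity: dict mapping node name to its currently spare capacity
    Returns a dict mapping node name to the list of work-unit names it received.
    """
    # heapq is a min-heap, so store negated capacity to pop the largest first.
    heap = [(-spare, node) for node, spare in spare_capacity.items()]
    heapq.heapify(heap)
    assignment = {node: [] for node in spare_capacity}

    for name, cost in work_units:
        neg_spare, node = heapq.heappop(heap)
        assignment[node].append(name)
        # The chosen node now has `cost` less spare capacity.
        heapq.heappush(heap, (neg_spare + cost, node))
    return assignment


# Example: three equal-cost pieces of one query spread over two nodes.
plan = distribute([("q1", 3), ("q2", 3), ("q3", 3)], {"A": 6, "B": 5})
```

In this example the first piece goes to node A, the second to node B (which by then has more headroom), and the third back to A. A production scheduler would also weigh priorities, data affinity, and service goals, none of which this sketch attempts.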





The Geek Corner

Is Parallel Repair Really Better Than Sequential Repair?

We recently had a thought-provoking challenge from one of our readers questioning our reasoning on parallel repair versus sequential repair. In previous articles in the Availability Digest, we had argued that parallel repair would restore a downed system faster than sequential repair.


Our reader correctly points out that this observation depends upon the distribution of repair times. For instance, if repair time were absolutely constant (say, each repair took exactly four hours, no matter what), then parallel repair would have no advantage over sequential repair.


In this article, we reprint that conversation and then summarize the results of a detailed analysis of how repair time distribution affects system restore time under parallel repair.
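The reader's point can be explored with a small simulation. The sketch below rests on a simplified model that is our assumption here, not the analysis in the full article: a two-node system is down only while both nodes are down, and it is restored as soon as either node returns to service. The function name mean_restore_times and the mtbf parameter are hypothetical.

```python
import random


def mean_restore_times(draw_repair, mtbf=10.0, trials=20000, seed=1):
    """Estimate mean system restore time under sequential and parallel repair.

    Assumed model: node 1 fails at time 0 and its repair starts at once.
    Node 2 fails after an exponentially distributed time with mean `mtbf`.
    An outage occurs only if node 2 fails while node 1 is still in repair.
    draw_repair(rng) returns one repair time drawn from the chosen distribution.
    Returns (sequential_mean, parallel_mean) over `trials` outages.
    """
    rng = random.Random(seed)
    seq_total = par_total = 0.0
    outages = 0
    while outages < trials:
        r1 = draw_repair(rng)             # repair time of node 1
        t2 = rng.expovariate(1.0 / mtbf)  # time at which node 2 fails
        if t2 >= r1:
            continue                      # node 1 was back first: no outage
        r2 = draw_repair(rng)             # repair time of node 2
        remaining = r1 - t2               # work left on node 1 when outage begins
        seq_total += remaining            # sequential: node 2 waits its turn
        par_total += min(remaining, r2)   # parallel: first repair to finish restores
        outages += 1
    return seq_total / trials, par_total / trials


# Constant four-hour repairs versus exponentially distributed repairs (mean 4 h).
constant = mean_restore_times(lambda rng: 4.0)
exponential = mean_restore_times(lambda rng: rng.expovariate(1.0 / 4.0))
```

With constant repair times, the repair that started first always finishes first, so the two strategies give identical restore times in this model. With exponentially distributed repair times, the parallel estimate comes out at roughly half the sequential one, because either repair may finish first.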






Would You Like to Sign Up for the Free Digest by Fax?


Simply print out the following form, fill it in, and fax it to:

Availability Digest

+1 908 459 5543




Email Address:



Telephone No.:










The Availability Digest may be distributed freely. Please pass it on to an associate.

To be a reporter, visit

Managing Editor - Dr. Bill Highleyman

© 2008 Sombers Associates, Inc., and W. H. Highleyman