The digest of current topics on Continuous Processing Architectures. More than Business Continuity Planning.
BCP tells you how to recover from the effects of downtime.
CPA tells you how to avoid the effects of downtime.
In this issue:
Complete articles may be found at https://availabilitydigest.com/articles.
Ignore History, and You Are Doomed to Repeat It
During a meeting at the recent HP Technical Forum (HPTF), an executive of a major bank stated that the bank did not have a backup for a critical NonStop system because the system had not failed in seven years. I can see a great Never Again story coming!
In my years of building mission-critical systems, I have continually urged my clients to incorporate backup systems into their IT planning and to exercise their backup facilities frequently. Yet, as our Never Again stories show, this message still has not been widely accepted. Again and again, we hear of critical systems that went down because backup was incomplete or because the redundant systems didn't work.
This month, we summarize over two dozen outages experienced in the first six months of this year. We hope that these will help drive home our message of availability to IT staff and executives alike.
Dr. Bill Highleyman, Managing Editor
We continue our tradition of reviewing a small sample of the computer failures (and a couple of other interesting failures) that occurred in the first six months of 2008. In our previous review, published six months ago, we noted that one-third of all problems were power-related. Power remains a leading cause: seven (25%) of the 28 stories summarized here involve power failures of various kinds. In addition, five incidents were caused by upgrades, usually with no fallback procedure in place.
Thousands of small to medium businesses (and some large ones as well) use Amazon.com's services to implement their online presences. Amazon recently extended these services significantly with AWS, the Amazon Web Services, which opens Amazon's massive infrastructure to its customers. Delivering computing resources as a service over the Internet in this way is known as cloud computing.
If these services go down, thousands of businesses are effectively out of business for the duration of the outage. That is exactly what happened this year: Amazon has racked up hours of downtime in several incidents during the past six months.
Many experts believe that cloud computing will be the dominant IT delivery model of the future. However, based on current experience, this technology has a long way to go.
Can cloud computing be trusted? What can a business do to protect itself from outages over which it has no control? Does cloud computing have a future? We explore these questions in this article.
HP has ported its fault-tolerant NonStop server to its HP c-Class BladeSystem. Called NSMA (NonStop Multicore Architecture), the bladed NonStop server can contain up to sixteen processors, as many as HP's largest contemporary NonStop servers. Based on dual-core Integrity processors, an NSMA system delivers twice the processing power of the HP NS16000, until recently HP's largest NonStop server, in half the footprint.
Existing applications can be ported seamlessly to the new bladed system. Using standard NonStop management facilities, NSMA nodes can be added to existing NonStop clusters comprising other Integrity and S-series NonStop servers.
Perhaps equally important, NSMA leverages existing HP technology. Except for the ServerNet fabric that ties a NonStop system together, all of the hardware in a NonStop BladeSystem is standard hardware used in other HP products. This includes the c7000 processor blades and enclosures, the ProLiant I/O servers, and the SAS disk arrays.
In the first three parts of this series, we showed how to calculate the availability of complex systems comprising serial and parallel combinations of subsystems with varying availabilities. We considered not only system downtime due to multiple system failures but also system downtime due to failover times and failover faults.
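The basic serial and parallel rules behind such calculations can be sketched as follows. This is a minimal illustration, assuming that subsystems fail independently and ignoring the failover times and failover faults that the series also accounts for. The helper names (`series`, `parallel`, `nines`) and the availability values are hypothetical, chosen only for illustration.

```python
import math


def series(*avail: float) -> float:
    """Serial combination: every subsystem must be up, so A = A1 * A2 * ... * An."""
    return math.prod(avail)


def parallel(*avail: float) -> float:
    """Parallel combination: the system is down only when all subsystems are down,
    so A = 1 - (1 - A1)(1 - A2)...(1 - An)."""
    return 1.0 - math.prod(1.0 - a for a in avail)


def nines(avail: float) -> float:
    """Availability expressed as a count of nines, e.g. 0.999 -> about 3.0."""
    return -math.log10(1.0 - avail)


if __name__ == "__main__":
    # Hypothetical values: a node is a processor and a disk in series.
    node = series(0.9995, 0.9999)
    # Two such nodes in parallel; the pair is down only if both nodes are down.
    pair = parallel(node, node)
    print(f"node: {nines(node):.2f} nines, pair: {nines(pair):.2f} nines")
```

Serial combinations can only lower availability below that of the weakest subsystem, while parallel combinations roughly add the nines of their members, which is why redundancy is the main lever for high availability.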
In this final Part 4, we demonstrate the use of these results to calculate the availability of an active/active system backed up by a standby system that takes over only in the event of the failure of the entire active/active system.
The example demonstrates that system failure intervals measured in centuries can be achieved with today's technology used in reasonable system configurations.
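A rough sketch of this kind of calculation follows, under simplifying assumptions that are ours, not the article's: the active/active system stays up as long as any one of its nodes is up, the standby is treated as one more parallel element behind it, failover time and failover faults are ignored, and the combined system is restored in a hypothetical 4 hours. The mean time between failures then follows from A = MTBF / (MTBF + MTR). All input values and function names are illustrative assumptions.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days x 24 hours


def parallel(*avail: float) -> float:
    """System is down only when every element is down."""
    return 1.0 - math.prod(1.0 - a for a in avail)


def active_active_with_standby(node_avail: float, nodes: int, standby_avail: float) -> float:
    """Assumes the active/active system survives while any node is up and the
    standby is one more parallel element behind it. Failover time and failover
    faults are ignored (a simplification)."""
    active_active = parallel(*([node_avail] * nodes))
    return parallel(active_active, standby_avail)


def mtbf_hours(avail: float, mtr_hours: float) -> float:
    """From A = MTBF / (MTBF + MTR), so MTBF = MTR * A / (1 - A)."""
    return mtr_hours * avail / (1.0 - avail)


if __name__ == "__main__":
    # Hypothetical inputs: two 0.999 nodes, a 0.99 standby, 4-hour system restore.
    a = active_active_with_standby(node_avail=0.999, nodes=2, standby_avail=0.99)
    years = mtbf_hours(a, mtr_hours=4.0) / HOURS_PER_YEAR
    print(f"availability = {a:.10f}, MTBF ~ {years:,.0f} years")
```

Because failover time and failover faults are left out, the result is an optimistic upper bound rather than a prediction; including them, as the series does, lowers the figure.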
Would You Like to Sign Up for the Free Digest by Fax?
Simply print out the following form, fill it in, and fax it to:
+1 908 459 5543
The Availability Digest may be distributed freely. Please pass it on to an associate.
To be a reporter, visit https://availabilitydigest.com/reporter.htm.
Managing Editor - Dr. Bill Highleyman email@example.com.
© 2008 Sombers Associates, Inc., and W. H. Highleyman