The digest of current topics on Continuous Processing Architectures. More than Business Continuity Planning.
BCP tells you how to recover from the effects of downtime.
CPA tells you how to avoid the effects of downtime.
In this issue:
Complete articles may be found at https://availabilitydigest.com/articles.
Join me for my Active/Active Seminar at Community Connect Europe 2008 in Mannheim
I will be presenting a four-hour seminar on “Active/Active Systems: Theory and Practice” at the upcoming Community Connect Europe 2008 conference, which will be held from November 10 to 12 in Mannheim, Germany. As you probably know, active/active systems can achieve nearly continuous availability.
The seminar will describe why active/active systems provide such high availability and how to build them. The superiority of active/active systems as a high-availability solution will be demonstrated by comparing them to clusters. The many other advantages that active/active systems bring will also be detailed.
You will learn the desirable characteristics of the data-replication engines so important to these systems and will receive an overview of many of the replication-engine products available today. Finally, a series of real case studies will be presented showing active/active in action.
So please join me in Mannheim for this highly educational seminar, and learn how you can virtually eliminate the cost of downtime.
Dr. Bill Highleyman, Managing Editor
NHSBT: UK National Health Service – Blood and Transplant
The UK National Blood Service (NBS) manages the human blood supply throughout England and North Wales. It tracks the blood supply from when it is first donated, through its testing and separation into various products, and finally to its dispatch to hospitals.
If the National Blood Service is not operational, lives can be lost. This is particularly true following a major catastrophe. Therefore, the availability of the computing infrastructure used by the NBS is of paramount importance.
To ensure that data-processing services will not be interrupted, the NBS has established dual processing centers using split-site OpenVMS clusters that provide multiple levels of redundancy. The NBS recently completed a major upgrade to this system with minimal disruption to services. This article describes the upgraded NBS system.
London Stock Exchange PC-Trading System Down for a Day
On Monday, September 8, 2008, the trading system of the London Stock Exchange (LSE), the third-largest stock exchange in the world, was down for most of the day. The outage struck on what turned out to be one of the most hectic trading days of the year and resulted in hundreds of millions of pounds in lost commission revenue.
The LSE had recently moved its trading system from Tandem fault-tolerant computers (now HP NonStop) to a massive distributed PC-based system called TradElect. Was the crash a failure in the new PC network? Was it due to trading volume? Was it an upgrade gone wrong? Was it caused by a network failure? No one outside the Exchange knows, because the Exchange remains silent on the cause.
It seems that the London Stock Exchange has not yet learned the value of frequent and accurate communications with its clients during crises of this sort.
VRRP – Virtual Router Redundancy Protocol
The quest for continuous availability is not confined to the computer room. Mission-critical systems are unavailable if users cannot reach them. Therefore, the networks connecting users to their servers must be equally reliable.
This means, of course, that all network paths must be redundant. Furthermore, failover to a backup path should be very fast so that users are not inconvenienced. In large IP networks, high reliability is typically achieved by dynamic routing around failure points.
However, dynamic routing protocols are complex and impose a significant burden on the network routing components. As a result, it is impractical to carry these techniques all the way back to the LANs supporting the users’ laptops and desktops.
To solve this problem, the Virtual Router Redundancy Protocol (VRRP) groups multiple physical routers into a virtual router that presents a single common IP address, so that first-hop routing survives the failure of a physical router. It does so with complete transparency to the client systems that the virtual router supports. In addition, the physical routers comprising a virtual-router group can load-share the network traffic routed to the virtual router.
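The election and failover behavior described above can be sketched as follows. This is a minimal Python model, not an implementation of the protocol: it assumes the VRRPv2 rules of RFC 3768, under which the live router with the highest priority becomes master, ties go to the router with the higher primary IP address, and a backup takes over after a Master_Down_Interval of three advertisement intervals plus a priority-dependent skew time. The names `Router`, `elect_master`, `master_down_interval`, and `virtual_mac` are hypothetical, and the advertisement exchange and state machine are not modeled. Load sharing is usually achieved by configuring two or more virtual routers (different VRIDs) on the same physical routers, with each physical router acting as master for one of them; that configuration is not shown here.

```python
import ipaddress
from dataclasses import dataclass

ADVERTISEMENT_INTERVAL = 1.0  # seconds; the VRRPv2 default


@dataclass
class Router:
    """A physical router in a virtual-router group (hypothetical model)."""
    name: str
    primary_ip: str
    priority: int       # 1-254 for backups; 255 for the owner of the virtual IP
    alive: bool = True  # False simulates a failed router that sends no advertisements


def elect_master(routers):
    """Return the live router with the highest priority; ties go to the higher primary IP."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, ipaddress.ip_address(r.primary_ip)))


def master_down_interval(priority, adv_interval=ADVERTISEMENT_INTERVAL):
    """Seconds a backup waits without hearing the master before taking over (VRRPv2)."""
    skew_time = (256 - priority) / 256  # higher-priority backups take over sooner
    return 3 * adv_interval + skew_time


def virtual_mac(vrid):
    """Virtual MAC address that all routers in the group answer for (IPv4 VRRP)."""
    return f"00:00:5e:00:01:{vrid:02x}"


if __name__ == "__main__":
    group = [Router("R1", "192.168.1.2", 200), Router("R2", "192.168.1.3", 100)]
    print(elect_master(group).name)   # R1
    group[0].alive = False            # R1 fails; R2 stops hearing advertisements
    print(elect_master(group).name)   # R2
    print(master_down_interval(100))  # 3.609375
    print(virtual_mac(1))             # 00:00:5e:00:01:01
```

With the default one-second advertisements and a backup priority of 100, the backup takes over roughly 3.6 seconds after the master falls silent. Because the virtual IP and MAC addresses move with the master role, the client hosts keep using the same default gateway and never notice the change.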
In several previous articles, we have talked about availability being the poor cousin to performance when it comes to benchmarks. For over two decades, the IT industry has depended upon performance benchmarks to make informed decisions concerning system purchases. However, in this era of 24/7 requirements for mission-critical systems, availability and the cost of downtime are becoming predominant considerations.
But how do we measure system availability when today’s failure intervals are months to years? In our most recent article on this subject, entitled Adding Availability to Performance Benchmarks (September, 2007), we suggested that recovery time is a useful, measurable metric suitable for an availability benchmark. Measuring recovery time allows the cost of downtime to be factored into the system choice. After all, in many applications, the cost of downtime is the predominant cost over the life of the system.
How can management use a recovery-time measure to aid in a purchase decision? We suggest a formal method for doing so in this article.
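The trade-off between purchase price and downtime cost can be illustrated with a simple cost model. This is a hypothetical sketch, not the formal method presented in the article: it assumes that each candidate system has an estimated mean time between failures (MTBF), a benchmarked recovery time (MTR), and a known cost of downtime per hour, and it charges each system its purchase cost plus the expected cost of downtime over its life. Availability is taken as MTBF / (MTBF + MTR). All names (`Candidate`, `lifetime_cost`, `cheapest`) and figures are invented for illustration.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class Candidate:
    """A system under evaluation; all figures are hypothetical inputs."""
    name: str
    purchase_cost: float   # acquisition cost (currency units)
    mtbf_hours: float      # estimated mean time between failures
    recovery_hours: float  # benchmarked mean time to recover from a failure


def availability(c: Candidate) -> float:
    # Fraction of time the system is up: MTBF / (MTBF + MTR).
    return c.mtbf_hours / (c.mtbf_hours + c.recovery_hours)


def expected_downtime_hours(c: Candidate, life_years: float) -> float:
    # Each failure cycle lasts MTBF + MTR hours, of which MTR hours are downtime.
    life_hours = life_years * HOURS_PER_YEAR
    return life_hours * c.recovery_hours / (c.mtbf_hours + c.recovery_hours)


def lifetime_cost(c: Candidate, downtime_cost_per_hour: float, life_years: float) -> float:
    # Purchase cost plus the expected cost of downtime over the system's life.
    return c.purchase_cost + expected_downtime_hours(c, life_years) * downtime_cost_per_hour


def cheapest(candidates, downtime_cost_per_hour, life_years):
    # Pick the candidate with the lowest total lifetime cost.
    return min(candidates, key=lambda c: lifetime_cost(c, downtime_cost_per_hour, life_years))


if __name__ == "__main__":
    a = Candidate("A", 1_000_000, 4376, 4)           # cheaper, slow recovery
    b = Candidate("B", 1_500_000, 4379.875, 0.125)   # pricier, fast recovery
    for c in (a, b):
        print(c.name, lifetime_cost(c, 100_000, 5))  # A 5000000.0, then B 1625000.0
    print(cheapest([a, b], 100_000, 5).name)         # B
    print(cheapest([a, b], 1_000, 5).name)           # A
```

Both hypothetical systems fail about twice a year. At a downtime cost of 100,000 per hour, the faster-recovering system B wins despite its higher price; at 1,000 per hour, the cheaper system A wins. A recovery-time benchmark is what supplies the MTR figure that makes this comparison possible.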
Would You Like to Sign Up for the Free Digest by Fax?
Simply print out the following form, fill it in, and fax it to:
+1 908 459 5543
The Availability Digest may be distributed freely. Please pass it on to an associate.
To be a reporter, visit https://availabilitydigest.com/reporter.htm.
Managing Editor - Dr. Bill Highleyman
© 2008 Sombers Associates, Inc., and W. H. Highleyman