
The digest of current topics on Continuous Availability. More than Business Continuity Planning.

BCP tells you how to recover from the effects of downtime.

CA tells you how to avoid the effects of downtime.

www.availabilitydigest.com

Thanks to This Month's Availability Digest Sponsor

FileSync automatically replicates and synchronizes applications and files between NonStop servers.

OPTA2000 simulates multiple virtual clocks and time zones on a single HP NonStop server.

The Enscribe-2-SQL Toolkit converts Enscribe calls to NonStop SQL/MP and SQL/MX statements.

Our TMF-Audit Toolkit easily converts non-Audited TMF files to Audited TMF files.

In this issue:

Case Studies

Bank Beats Earthquakes with A/A

Never Again

RBS Offline for Two Weeks

Best Practices

Avoiding Capacity Exhaustion

Recommended Reading

Tandem Computers Unplugged

Browse through our Useful Links.

Check our article archive for complete articles.

Join us on our Continuous Availability Forum.

Check out our seminars.

Check out our technical writing services.

Move to Continuous Availability One Step at a Time

In this issue’s “Ring-of-Fire” article, we describe how a bank, following a major earthquake, decided to move to geographically distributed continuous availability by taking a series of controlled steps toward an active/active environment. The bank is now partway through its journey, and its progress demonstrates that moving to an active/active configuration does not have to be one major project.

If you are already running a disaster-recovery center, you have already made the major investments required for active/active operation. You have two (or more) physical facilities. You have two copies of your IT environment, each capable of handling your entire processing load. You have networking in place. What you need is a good data-replication product, perhaps some program modifications, and lots of testing.

As the bank found, the controlled steps involve moving from a cold-standby environment to virtual tape, then to a hot-standby system, then to a sizzling-hot standby system, and finally to active/active.

These are the topics we discuss in our seminars on high- and continuous availability. You can also find a great deal of information in our article archive.

Dr. Bill Highleyman, Managing Editor

Case Studies

Ring-of-Fire Bank Beats Earthquakes with Active/Active

Not many banks avoided exposure to the recent subprime crisis and speculative real-estate mortgage meltdown. One bank that did, due to its rational credit policies, remained the number one lender in its area while other financial institutions severely restricted credit to their customers.

However, the bank was not as well prepared for Mother Nature. The bank is located on the Pacific Rim Ring of Fire and lost its data center for several hours during an earthquake. When the failover to the backup system did not go as planned, the bank said, “Never again!” It started on the path to move its data center operations to a continuously available active/active environment.

The success that this bank has achieved in moving in a controlled fashion towards continuous availability teaches an important lesson: do not give up if you think moving to higher-availability architectures is too hard. Each architectural step, from magnetic-tape backup to virtual-tape backup to active/passive to sizzling-hot standby to active/active, moves you closer to continuous availability. The migration is a process that can be managed and controlled to ensure success on your schedule.


Never Again

Royal Bank of Scotland Offline for Two Weeks

On Tuesday, June 19, 2012, operations at the Royal Bank of Scotland, NatWest, and Ulster Bank came to a halt. Millions of bank customers were affected. They could not receive their salaries or pension payments. They could not pay their bills or use the banks’ online services. The outage spilled over to customers of other banks when expected payments could not be made.

The problem was a software upgrade that had gone terribly wrong in the data center that serviced the banks. It was two weeks before operations returned to normal.

One common cause of data-center outages is that upgrades are attempted with no fallback plan in place. If an upgrade fails, the application is down. However, in this case, the banks did have a fallback plan. The problem was that it didn’t work. Was it a problem with proper documentation? Proper training? Proper testing? We don’t know, but we’re sure that the banks now know.

One major lesson learned from this experience is the importance of a flexible customer-service model. The fact that the banks could immediately engineer the opening of 1,200 branch offices for extended hours and could rapidly double their call-center capacities certainly helped in serving customers during this time of crisis.


Best Practices

Avoiding Capacity Exhaustion

Is your business growing at a rapid rate? Are your IT systems getting close to their capacity limits? Is there enough of a capacity margin to get you through the peaks of the coming year? What will a CPU failure mean in terms of overloading your systems during peak traffic times?

At the recent BITUG (British Isles Tandem User Group) meeting in London in December 2011, Damian Ward, NonStop Solutions Architect at VocaLink, presented an in-depth analysis of the capacity planning used by VocaLink, the provider of the Faster Payments Service (FPS) and LINK Scheme (LS) service to UK banks. Damian has been immersed in the IT industry for over twenty years and was Vice Chairman of BITUG at the time of his presentation. He is now Chairman.

The technique he describes is based on past statistics and future projections of system traffic. It produces a remarkably simple graphic that shows the daily “hot spots” predicted for the coming year, along with the probability that a CPU failure during a particular hot spot will cause an overload condition. Damian shows how this analysis was recently used to select an LS service upgrade strategy from several options.
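To make the idea concrete, here is a minimal Python sketch of one way a hot-spot calculation could be approached. It is not the method presented at BITUG, whose details are in the full article. It assumes, purely for illustration, that each day's peak load is normally distributed around a forecast mean, and that capacity scales linearly with the number of CPUs. All names and figures (overload_probability, hot_spots, tps_per_cpu, the 5% threshold) are hypothetical.

```python
from statistics import NormalDist


def overload_probability(mean_peak_tps, stdev_tps, cpus, tps_per_cpu, failed_cpus=1):
    """Probability that peak demand exceeds the capacity left after CPU failures.

    Assumption: peak demand is normally distributed (illustrative only).
    """
    remaining_capacity = (cpus - failed_cpus) * tps_per_cpu
    if stdev_tps == 0:
        # NormalDist.cdf is undefined for zero spread, so compare directly.
        return 1.0 if mean_peak_tps > remaining_capacity else 0.0
    return 1.0 - NormalDist(mean_peak_tps, stdev_tps).cdf(remaining_capacity)


def hot_spots(daily_forecast, cpus, tps_per_cpu, threshold=0.05):
    """Return (day, probability) pairs whose overload risk meets the threshold.

    daily_forecast maps a day label to (forecast mean peak, standard deviation).
    """
    risky = []
    for day, (mean_peak, stdev) in daily_forecast.items():
        p = overload_probability(mean_peak, stdev, cpus, tps_per_cpu)
        if p >= threshold:
            risky.append((day, p))
    return risky


if __name__ == "__main__":
    # Hypothetical forecast: a quiet day and a busy payday on a 4-CPU system.
    forecast = {"quiet day": (100.0, 10.0), "payday": (300.0, 50.0)}
    for day, p in hot_spots(forecast, cpus=4, tps_per_cpu=100.0):
        print(f"{day}: {p:.0%} chance of overload if one CPU fails")
```

Plotting these probabilities on a calendar would give a picture similar in spirit to the hot-spot graphic described above. A real analysis would replace the normal-distribution assumption with measured traffic statistics.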


Tandem Computers Unplugged: A People’s History

In her book, Tandem Computers Unplugged: A People’s History, author Gaye I. Clemson paints a poignant picture of the history of Tandem Computers, Inc., from Jimmy Treybig’s concept for a fault-tolerant computer in 1974 to Tandem’s acquisition by Compaq in 1997 and Compaq’s subsequent merger with HP in 2002. Having worked at Tandem for eight years from 1984 to 1992, moving there from Bell Canada’s Computer Communication Group, Gaye is privy to a great deal of Tandem’s history and culture.

The author has spiced up her book with hundreds of photos of Tandem people and early computers as well as documents that include marketing brochures, advertisements, and the original Mackie diagram that illustrated simply how Tandem achieved its fault tolerance. That diagram was on the back of every employee’s badge.

If you are a Tandem person from the early era, you will probably find references and quotes related to many of the people you knew. Tandem Computers Unplugged is a must-read for any Tandemite.


Sign up for your free subscription at https://availabilitydigest.com/signups.htm

Would You Like to Sign Up for the Free Digest by Fax?

Simply print out the following form, fill it in, and fax it to:

Availability Digest

+1 908 459 5543

Name:

Email Address:

Company:

Title:

Telephone No.:

Address:

____________________________________

The Availability Digest is published monthly. It may be distributed freely. Please pass it on to an associate.

Managing Editor: Dr. Bill Highleyman, editor@availabilitydigest.com