
The digest of current topics on Continuous Availability. More than Business Continuity Planning.

BCP tells you how to recover from the effects of downtime.

CA tells you how to avoid the effects of downtime.

www.availabilitydigest.com


Thanks to This Month's Availability Digest Sponsor

Connect: HP's largest and most engaged IT professional user community.

We hope you joined us in Las Vegas in June at HP Discover 2011, HP's showcase technology event. Check out the presentations, which have been posted online.

 

 

In this issue:

 

   Never Again

      Verizon 4G Network Down for Two Days

      Mizuho Bank Down for Ten Days

   Best Practices

     The Value of Availability

  The Geek Corner

     Simplifying Failover Analysis - Part 2

 

 

Browse through our Useful Links.

Check our article archive for complete articles.

Sign up for your free subscription.

Join us on our Continuous Availability Forum.

 

How Many Nines Are Good Enough?

 

We have started a very active thread on our Continuous Availability Forum on LinkedIn. It is entitled “How Many Nines Are Good Enough?” It raises the issue of whether the fault-tolerant market is going away. In today’s IT climate, has “good enough” become the enemy of “great”? Are “mostly go” systems the architectures of choice for industries that supposedly clamor for high or even continuous availability? If that’s the case, what does the future hold for platforms such as NonStop and Stratus?

 

Many comments have been posted on the thread, most agreeing that life must somehow be breathed back into the fault-tolerant marketplace. It was noted that most young IT professionals have never heard of Stratus or NonStop. Interestingly, I am currently involved in structuring a course in fault tolerance for a major university. Perhaps what we need to do is reach the students who are entering the IT marketplace.

 

If you have any insights on this issue, I’d appreciate your adding your own comments to those that are already there. Thanks.

 

Dr. Bill Highleyman, Managing Editor

 

 


 

  Never Again 

 

Verizon 4G Network Down for Two Days

 

In December, 2010, Verizon Wireless inaugurated its much-touted 4G wireless service, beating AT&T and other providers to be the first major wireless carrier in the U.S. to introduce a 4G network.

 

Happy customers raved about the significantly increased speeds of download and upload. Verizon bragged about its “always reliable” network.

 

Then disaster struck. Verizon’s revered 4G network went down for almost two days. Verizon shared precious little information about its efforts to restore its nationwide service. Nothing was posted to its web site, no press releases were issued, and no information was given to the press. Google searches on the outage found only blog complaints. It wasn’t until a month later that Verizon partially explained the cause of the outage.

 

Many other companies have learned the lesson of crisis communication the hard way and have set up digital dashboards that show the current status of all of their services. Verizon might do well to pursue this avenue aggressively.

 


 

Mizuho Bank Down for Ten Days

This is a story of poor Business Contingency Planning. Mizuho Bank, a Japanese bank, shut down its ATM network and stopped making salary transfers to its customers in mid-March 2011. It was ten days before the bank was back in full operation.

 

What caused this disaster? A nice thing, actually: a flood of donations made via mobile phones in response to the devastating Japanese earthquake and ensuing tsunamis the week prior. However, the massive load created by these donations hobbled Mizuho’s batch processing of money-transfer transactions. Well-intentioned as the donations were, that was little solace to the millions of Japanese who for days could not get their salaries paid or withdraw necessary funds from their accounts via ATMs.

 

An internal investigation following the outage placed the blame directly on the bank’s failure to prepare and to audit a comprehensive Business Contingency Plan.

 


 


 

Best Practices

 

The Value of Availability

 

In an earlier issue, we reviewed Blueprints for High Availability: Designing Resilient Distributed Systems, a classic book on high availability. The authors, Evan Marcus and Hal Stern, have since published a second edition. Their new book focuses less on cluster technology (their core experience) and extends to the broader issues of high availability.

 

In this review, we pass on the authors’ insights into the financial justification for highly available systems as presented in Chapter 3 of their book, a chapter entitled The Value of Availability. We summarize the techniques put forth by Marcus and Stern to evaluate the financial savings and return on investment that an availability solution might bring to a company.

 

High availability is a business decision. Whether to adopt a particular high-availability solution depends upon many factors. From a cost viewpoint, it is a comparison between the cost of the solution and the cost of the downtime that it will save. Indirect costs, such as customer satisfaction and company reputation, must also be factored in.
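The cost comparison described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the method of Marcus and Stern: the function name, the one-year horizon, and all of the figures in the example are hypothetical, and indirect costs such as reputation are left out because they are hard to quantify.

```python
def availability_roi(solution_cost, hours_down_before, hours_down_after, cost_per_hour):
    """Compare the cost of an availability solution with the downtime cost it saves.

    All inputs cover the same period (assumed here to be one year):
      solution_cost      -- cost of the availability solution
      hours_down_before  -- expected downtime hours without the solution
      hours_down_after   -- expected downtime hours with the solution
      cost_per_hour      -- direct cost of one hour of downtime
    Returns (savings, net_benefit, roi), where roi is net benefit per unit spent.
    """
    savings = (hours_down_before - hours_down_after) * cost_per_hour
    net_benefit = savings - solution_cost
    roi = net_benefit / solution_cost
    return savings, net_benefit, roi


if __name__ == "__main__":
    # Hypothetical figures: a $100,000 solution that cuts yearly downtime
    # from 20 hours to 2 hours, with downtime costing $10,000 per hour.
    savings, net_benefit, roi = availability_roi(100_000, 20, 2, 10_000)
    print(f"savings={savings}, net benefit={net_benefit}, ROI={roi:.0%}")
```

A positive net benefit suggests that the solution pays for itself on direct costs alone. A negative one does not settle the question, since the indirect costs still have to be weighed.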

 


 


 

The Geek Corner

 

Simplifying Failover Analysis – Part 2

 

In our article, Simplifying Failover Analysis – Part 1, we discussed the impact of failover time and failover faults on redundant systems. In a two-node redundant system, users are down if:

 

  • both nodes fail, or

  • one node fails, and the users are in the process of being failed over, or

  • one node fails, and a failover fault occurs.

 

We showed in Part 1 that failover can be modeled as a two-node redundant system with the availability of one node being reduced by the effects of failover time and failover faults.
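As a rough illustration of this idea, the Python sketch below treats an active/backup pair of identical nodes and folds failover time and failover faults into a reduced availability for one node. It is a first-order approximation written for this summary, not the derivation from Part 1. The parameter names (mtbf, mtr, mtfo, p_fault) and the example figures are assumptions, and the model ignores second-order effects such as a failure occurring during a failover.

```python
def node_availability(mtbf, mtr):
    """Availability of a single node from its mean time between failures
    (mtbf) and mean time to repair (mtr), both in the same time units."""
    return mtbf / (mtbf + mtr)


def reduced_availability(a, mtfo, mtr, p_fault):
    """Availability of one node after charging it for failover effects.

    mtfo    -- mean time to fail over (assumed much smaller than mtr)
    p_fault -- probability that a failover attempt fails (failover fault)
    """
    return a - mtfo / mtr - p_fault


def system_unavailability(mtbf, mtr, mtfo, p_fault):
    """Approximate probability that users of a two-node system are down.

    Expanding the product gives the three cases listed above:
      (1 - a)**2             both nodes are down
      (1 - a) * mtfo / mtr   one node has failed and failover is in progress
      (1 - a) * p_fault      one node has failed and the failover faulted
    """
    a = node_availability(mtbf, mtr)
    a_reduced = reduced_availability(a, mtfo, mtr, p_fault)
    return (1 - a) * (1 - a_reduced)


if __name__ == "__main__":
    # Hypothetical figures: 4,000-hour MTBF, 4-hour repair time,
    # 3-minute (0.05-hour) failover, 1% chance of a failover fault.
    u = system_unavailability(mtbf=4000, mtr=4, mtfo=0.05, p_fault=0.01)
    print(f"system unavailability ~ {u:.3e}")
```

With figures like these, the failover terms dominate the dual-node-failure term, which is why failover time and failover faults deserve attention when modeling redundant systems.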

 

In this article, we extend the results of Part 1 to accommodate two additional complexities when modeling failover:

 

  • What if the redundant nodes are different with different availability characteristics?

  • How do we handle a redundant production node that is in the process of failing over internally? Even though it is technically down, it will not fail over to its backup node.


 


 

 

 

Sign up for your free subscription at http://www.availabilitydigest.com/signups.htm

 

Would You Like to Sign Up for the Free Digest by Fax?

 

Simply print out the following form, fill it in, and fax it to:

Availability Digest

+1 908 459 5543

 

 

Name:

Email Address:

Company:

Title:

Telephone No.:

Address:

____________________________________

____________________________________

____________________________________

____________________________________

____________________________________

____________________________________

____________________________________

____________________________________

The Availability Digest is published monthly. It may be distributed freely. Please pass it on to an associate.

Managing Editor - Dr. Bill Highleyman, editor@availabilitydigest.com

© 2011 Sombers Associates, Inc., and W. H. Highleyman