In this issue:
Software Sanity Checks
Complex software makes many decisions for us in real time. These decisions range from flying airplanes to running businesses. In most cases, the result of a computerized decision is to notify a person of suggested actions to take.
For instance, a flight computer might detect that the airplane’s speed is decreasing and will alert the pilot to pick up speed to avoid a stall. But what if the flight computer issued an order to lower flaps without pilot authorization while the plane was cruising at 500 knots? The wings would surely be pulled off the plane.
Similar situations exist in business environments where a computer is allowed to take actions directly without human involvement. In such cases, there must be built-in sanity checks to ensure that the actions are within acceptable bounds.
As reported in this issue of the Availability Digest, such a case occurred on Amazon UK this past holiday season. Many online sellers found that their products were selling for just a British penny because of a malfunction in an automatic price-setting utility. The sellers had not used the sanity checks that were available from the utility and from Amazon. The result was massive losses and potential bankruptcy for many.
Dr. Bill Highleyman, Managing Editor
A large, privately held grocery-store chain operates 300 supermarkets. It had been using a BASE24 financial-transaction switch from ACI on an HP NonStop server to route credit-card and debit-card payments from its point-of-sale (POS) terminals for authorization to the banks that issued the cards.
Furthermore, its POS terminals provided additional customer services such as cell-phone topping, bank deposits, and bill payments. The chain’s POS terminals were connected to the financial switch via in-store servers that offered this additional functionality.
The BASE24 financial-transaction switch ran on a single HP NonStop server. Should the switch fail, the grocery stores were limited to cash sales only and were effectively out of business temporarily. Following ACI’s announcement that it would no longer support BASE24 on HP NonStop servers, the grocery chain opted to move to the OmniPayments financial-transaction switch from Opsol Integrators Inc. Via its replication engine OmniReplicator, OmniPayments implemented an active/active system that guaranteed continuous availability for the chain’s stores. This packaged solution saved the grocery chain a significant amount of money.
Want to buy a set of headphones for a penny? How about a mattress? Or a dress? If you were quick on your feet between 7 PM and 8 PM U.K. time on Friday, December 12th, you could have done just that. Where? On Amazon UK, of all places.
During that hour, the prices for thousands of items fell to one penny (about one U.S. cent). The cause: a software glitch in a third-party application used by thousands of small Amazon sellers. The application provides automatic pricing adjustments to ensure that the seller’s products are priced competitively. The third-party software firm is a U.K. company, RepricerExpress.
Word of the “great deals” spread rapidly via Twitter. Buyers were delighted. Sellers were appalled. The error cost many small family-owned businesses thousands of British pounds. Many face bankruptcy.
RepricerExpress boasts on its web site that it offers “the ridiculously simple way to increase your Amazon sales.” It certainly proved that statement, though in a disastrous way.
Neither Amazon nor RepricerExpress has offered any financial support for impacted sellers. Moreover, although several lawsuits have already been initiated against RepricerExpress, sellers are concerned that the small firm will collapse under the volume of claims.
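To make the idea of a pricing sanity check concrete, here is a minimal sketch in Python of the kind of guard a seller might place between an automatic repricer and the marketplace. It is purely illustrative: the function name, the per-item floor, and the 20% maximum-drop threshold are assumptions made for this example and do not describe how RepricerExpress or Amazon actually implement their checks.

```python
from decimal import Decimal

def sanity_check_price(proposed, current, floor, max_drop=Decimal("0.20")):
    """Hypothetical guard for an automatic repricer.

    Returns (price_to_publish, accepted). A proposed price is rejected,
    and the current price kept, if it falls below the seller's floor or
    drops by more than max_drop (a fraction of the current price) in a
    single step.
    """
    if proposed < floor:
        return current, False
    if current > 0 and (current - proposed) / current > max_drop:
        return current, False
    return proposed, True

# A one-penny proposal for a 25-pound item with a 15-pound floor is refused.
price, accepted = sanity_check_price(
    Decimal("0.01"), Decimal("25.00"), Decimal("15.00")
)
print(price, accepted)
```

A rejected proposal leaves the current price in place and can be flagged for a person to review, which keeps a human in the loop for exactly the cases in which the automation is most likely to be wrong.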
We have written frequently about whether public clouds are suitable for corporate critical applications. We have published many Never Again stories about massive failures in popular clouds such as Amazon, Google, Azure, and Rackspace. However, these are really vignettes – snapshots in time. What are the real availability statistics for them and other clouds over a long period of time?
The web-monitoring site CloudHarmony gives us an insight into this data as well as much more information about cloud performance, cloud pricing, and cloud capabilities for dozens of public clouds. Furthermore, several web services monitor clouds in real time and provide information on the current status of these clouds. We review such services in this article.
Public clouds still have a ways to go to achieve carrier-grade availabilities. Some are nearly there now. Amazon’s EC2 compute service and S3 storage service have shown average availabilities of about five 9s (five minutes of downtime per year), which is probably better than the availability achieved by most data centers. However, public clouds like Microsoft’s Azure, with an average availability of a little more than three 9s, have a lot of maturing to do before they can become serious candidates for hosting mission-critical applications.
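The relationship between a count of 9s and yearly downtime is simple arithmetic. The short Python sketch below works it out, assuming an average year of 365.25 days; the function name is chosen only for this illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes in an average year

def downtime_minutes_per_year(nines):
    """Expected yearly downtime for an availability of `nines` 9s.

    Five 9s means 99.999% availability, that is, an unavailability
    of 10 ** -5.
    """
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability

for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes per year")
```

Under this assumption, five 9s corresponds to a little over five minutes of downtime per year, whereas three 9s corresponds to roughly 526 minutes, or almost nine hours.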
“If you think high availability is expensive, try downtime.” With these words, Dr. Terry Critchley paints an exhaustive picture in his book “High Availability IT Services,” explaining how we can protect our critical applications effectively and economically from the plethora of faults that can take them down.
Terry’s style is refreshingly informal and conversational. Even his most difficult topics are described in easily understood terms with frequent vignettes relating his personal experiences that have spanned over four decades in the IT industry. Moreover, Terry adds significant depth to each topic with frequent references to more detailed works, complete with URLs for easy access to these resources.
Terry’s book covers the entire gamut of high-availability topics, from hardware resilience to software reliability to Service Level Agreements (SLAs) and even to the worst offender, the human fat finger. The book focuses not on hardware or software reliability but rather on service reliability. A service is a business-support function that depends upon people, products, and processes. The book analyzes each of these service components in great detail.
A challenge every issue for the Availability Digest is to determine which of the many availability topics out there win coveted status as Digest articles. We always regret not focusing our attention on the topics we bypass.
Now with our Twitter presence, we don’t have to feel guilty. This article highlights some of the @availabilitydig tweets that made headlines in recent days.
Sign up for your free subscription at https://availabilitydigest.com/signups.htm
Would You Like to Sign Up for the Free Digest by Fax?
Simply print out the following form, fill it in, and fax it to:
+1 908 459 5543
The Availability Digest is published monthly. It may be distributed freely. Please pass it on to an associate.
Managing Editor - Dr. Bill Highleyman firstname.lastname@example.org.
© 2015 Sombers Associates, Inc., and W. H. Highleyman