In this issue:
When Cost Trumps Reliability
In my years of involvement with mission-critical systems, I have seen organizations time and again choose to save money rather than ensure that their mission-critical systems are always available. They prefer to rely on faith and hope rather than on a solid system architecture. This short-sighted approach often leads to far more expensive outages, or even to regulatory actions or lawsuits.
An excellent example is the approach to 911 systems by many communities, as discussed in our article “911 Systems Are Failing Too Often.” 911 is the nationwide U.S. emergency number that will get a caller immediate emergency assistance. If the 911 system is down, property may be damaged or lives may be lost. Nothing can be more mission-critical.
However, 911 systems fail at an alarming rate. Some of these failures are caused by equipment outages. Others are caused by calls not being rerouted to backup facilities when a primary call center goes down.
Why are communities not investing in highly available call systems such as active/active systems or NonStop systems (there are several of both installed)? Why are call rerouting procedures not periodically tested? Every community should review its 911 systems and ensure that their capabilities adequately serve the people and property they are meant to protect.
Dr. Bill Highleyman, Managing Editor
Hewlett Packard Enterprise has over three decades of experience migrating many types of complex workloads. Through its diverse experience in migrating systems, including IBM Power Systems, to open-standard platforms, HPE has learned what it takes to implement a successful migration and how to manage the inherent risks.
HPE employs established processes and unique tools for migrating IBM Power Systems to open systems. It has developed proven approaches to maximize the ability of the target environment to deliver better results for the line of business while reducing costs. HPE has demonstrated that the vast majority of such migrations result in a significantly less expensive operating environment, often by more than 50%. At the same time, the new HPE open environments match or exceed the performance and availability attributes of the original Power Systems.
By migrating Power Systems to open systems, HPE also positions IT for the cloud and brings greater agility to the original applications.
It should go without saying that U.S. 911 services are among the most mission-critical of all computer applications. “911” is the nationwide U.S. emergency number that immediately connects the caller to assistance. If a 911 service fails, lives and property are put at risk. A crisis call for an ambulance, the police, or the fire department may go unanswered or be seriously delayed.
Yet a simple Google search for “911 failures” yields pages of references to 911 system failures across the country. In some cases, the failure is the result of inadequate backup systems. In others, it is a consequence of the failure of an otherwise adequate backup system. In most cases, the failure results from the inability to reroute emergency calls from a failed emergency response center to a backup center.
In this article, we describe some of the recent 911 system failures.
The threat of cyber attacks is relatively new and was unheard of in the mainframe days of the 1970s and 1980s. It is here today and is increasing rapidly, both in the number of attacks and in their potential damage and intensity.
Traditional thinking says that a cyber attack is a nuisance but does not constitute an outage. This is not borne out by experience, and numerous published papers describe outages caused by cyber attacks. The papers include a report to the IT advisory committee of the President of the United States. One of the papers indicated a 20-fold increase in cyber attacks between 1995 and 2005 (in the thousands), so you can imagine the numbers today. Targets for these attacks cover a wide spectrum and include:
- personal: your PC/tablet and mine.
- social media, such as dating sites, with a view to extortion or blackmail. Twitter has suffered from multiple attacks.
- finance systems with a view to obtaining money in some way.
- military and other government sites for espionage or sabotage purposes.
Cyber attacks will almost certainly cause outages whose durations vary and cannot be forecast.
At a recent fault-tolerance symposium, I presented an overview of HPE NonStop systems. I stressed the immense scalability of these systems as well as their ability to survive any single fault (and in some cases, multiple faults).
I was struck by the interest shown by the audience in the NonStop process monitor, NonStop Pathway. I came to realize that Pathway is the foundation for application fault tolerance, scalability, and load balancing in NonStop systems. Pathway relieves the application programmer of concern for these important attributes and implements them ‘under the covers.’
In this article, we review the architecture of NonStop systems and explain how Pathway provides applications with fault tolerance, scalability, and load balancing with no effort on the part of the application programmer.
Now with our Twitter presence, we don’t have to feel guilty. This article highlights some of the @availabilitydig tweets that made headlines in recent days.
Sign up for your free subscription at https://availabilitydigest.com/signups.htm
Would You Like to Sign Up for the Free Digest by Fax?
Simply print out the following form, fill it in, and fax it to:
+1 908 459 5543
Managing Editor - Dr. Bill Highleyman email@example.com.
© 2016 Sombers Associates, Inc., and W. H. Highleyman