Thanks to This Month's Availability Digest Sponsor
Will We Ever Learn?
After developing mission-critical systems for several decades, I am well aware of the need for good specifications, a well-documented design, code reviews, unit testing, and final testing before releasing a system to production. If the system is tremendously complex, it should be rolled out slowly, first to a test group and then to additional users as the system proves itself. In this way, developers can be confident not only that the system is operating properly but also that it will be able to handle the load that will come its way.
It seems that these guidelines have become lost in the implementation of perhaps one of the most complex systems of our time – the healthcare.gov website to support the Affordable Care Act. From the very beginning of its deployment, it was a disaster. It could not handle the load that was imposed upon it by people shopping for insurance. It was fraught with bugs, and it was unable to communicate with many of the insurance company sites that were necessary to get quotes.
Rolling the website out slowly rather than as a Big Bang would certainly have exposed the capacity problems and the bugs (to date, the claim is that over 400 bugs have been fixed!). Let’s hope that the website is, in fact, repairable in time to be useful.
Dr. Bill Highleyman, Managing Editor
Clouds are expected to be highly redundant and resilient to any single failure. There is always another component that can take over in the event of a failure. Right?
Wrong! The Microsoft Azure cloud has a single point of failure, and this component failed in October 2013. The failure caused a worldwide partial compute outage. While the glitch did not prevent cloud applications from running, it took down certain cloud-management functions for a day and a half. Specifically, new applications could not be placed into service.
Although this outage did not affect existing production applications, it certainly was irritating to heavy users. Regardless of whom it affected, a worldwide outage can certainly damage confidence in Microsoft's ability to manage a large distributed network.
It was just last year that the entire Azure cloud went down for over thirty hours, compute capacity and all. This problem was due to a software bug in the way that Microsoft developers calculated Leap Day.
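Leap-day bugs of this sort are easy to write. A minimal illustration (my own toy sketch, not Azure's actual code): naively adding one to the year of a date fails when that date is February 29, because the resulting date does not exist.

```python
from datetime import date

def naive_one_year_later(d: date) -> date:
    # Naively bumping the year works 1,460 days out of 1,461 --
    # but raises ValueError on Feb 29, since Feb 29 of the next
    # year usually does not exist.
    return d.replace(year=d.year + 1)

leap_day = date(2012, 2, 29)
try:
    naive_one_year_later(leap_day)
except ValueError as e:
    # e.g. "day is out of range for month"
    print("leap-day bug:", e)
```

A common defensive fix is to fall back to February 28 (or March 1) when the incremented date is invalid, and to include leap days in test suites for any code that does date arithmetic.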
These two outages lead to an interesting observation. There is a single point of failure, and that is software. A software bug that is allowed to go into production can infect every system in the cloud.
Ransomware is a class of malware that locks up a computer and demands a ransom from the computer’s owner to unlock it. Most ransomware only freezes a computer, and the computer can often be restored by an anti-virus service provider. PCs and Android phones have been common victims of ransomware.
CryptoLocker is a variant of ransomware and is much more dangerous. It does not simply freeze a computer. It encrypts all of the files on the computer. Though the computer still runs, it cannot do anything because all of the files to which it needs access are encrypted with a key that is not available to the user. No private or government agency has yet been able to break the encryption.
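The mechanics can be illustrated with a toy sketch. Here a simple XOR keystream stands in for CryptoLocker's real cryptography (which is far stronger); the point is only that once data is encrypted under a key the victim never possesses, the files are unreadable until that key is handed over. All names here are illustrative.

```python
import hashlib
import os

def xor_stream(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR data with a keystream derived from key.

    NOT secure -- for illustration only.
    """
    stream = hashlib.sha256(key).digest()
    # Extend the keystream until it covers the data.
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

# The attacker generates a random key the victim never sees...
attacker_key = os.urandom(32)
plaintext = b"contents of an important document"
ciphertext = xor_stream(plaintext, attacker_key)

# ...so the file is unreadable, yet fully recoverable by whoever
# holds the key (XOR with the same keystream decrypts).
assert ciphertext != plaintext
assert xor_stream(ciphertext, attacker_key) == plaintext
```

Because the same operation encrypts and decrypts, the attacker can restore the files instantly once paid; without the key, the victim has nothing to attack but the cipher itself.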
CryptoLocker will only release the files if a ransom of a few hundred dollars is paid within a specified time period.
The good news, if there is any, is that the hackers have proven to be honest. Once the ransom has been paid, they have decrypted files and have not reinfected the computer. However, if the ransom is not paid, be prepared for further attacks. Security companies have yet to come up with any protection against CryptoLocker.
Ponemon Institute conducts empirical studies on critical issues affecting the management and security of sensitive information about people and organizations. It has recently completed a study on the value of live cyberthreat intelligence for combating cyberattacks. Live cyberthreat intelligence refers to intelligence data about actual cyberattacks happening now. It is delivered with no delay, as compared to delays ranging from minutes to days and even weeks for many cyberthreat-monitoring facilities.
The Ponemon study was based on a survey of 708 users across more than fourteen industry segments. It shows that the average cost to large organizations of defending against cyberattacks is about USD 10 million per year. The organizations estimate that with access to live threat information, they could save 40% of that cost, or USD 4 million per year.
The Ponemon study demonstrates the importance of having timely intelligence to stop a cyberattack. However, the majority of respondents agree that it is hard to stop an attack on enterprise systems because the threat intelligence is out-of-date. Furthermore, the high rate of false positives deters staff from pursuing the real threats and attacks.
DDoS attacks are on the rise. A DDoS attack launches a massive amount of traffic against a company’s website to overwhelm it to the point that the website no longer can function.
A particularly sensitive system in a company's web infrastructure is its DNS server. The DNS server responds to requests to convert domain names to IP addresses so that messages can be routed to target systems over the Internet. Without its DNS server, a company cannot communicate with the outside world.
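The lookup a DNS server performs can be sketched with the Python standard library, which hands the query to the system's configured resolver (a minimal illustration; the hostname is an example):

```python
import socket

def resolve(hostname: str) -> list[str]:
    """Ask the system's resolver to map a hostname to its IP addresses."""
    # getaddrinfo consults the configured name service, just as a browser
    # must do before it can open a connection to the target system.
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result tuple carries the resolved address in its last element.
    return sorted({info[4][0] for info in results})

# If no name server answers, this call fails and no connection can be
# made at all -- which is why DNS is such an attractive DDoS target.
print(resolve("localhost"))
```

Every outbound connection by name depends on this step succeeding, so taking a company's DNS server offline silences its entire web presence even though the web servers themselves are healthy.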
Secure64’s DNS Authority is a dedicated DNS name server that is designed to be self-protecting. It identifies and blocks attack traffic while continuing to respond to DNS queries from legitimate sources. DNS Authority can reduce the need to overprovision server resources, and it eliminates the need to protect DNS servers with network security devices.
The DNS Authority server uses multiple defenses to mitigate DDoS attacks, including protocol exploits, TCP SYN floods, reflected DNS attacks, and UDP and TCP data floods. Testing showed that DNS Authority survived without incident all but the UDP and TCP floods. In those cases, DNS Authority continued to service most legitimate requests, though some requests were dropped and had to be repeated.
A challenge every issue for the Availability Digest is to determine which of the many availability topics out there win coveted status as Digest articles. We always regret not focusing our attention on the topics we bypass.
With our new Twitter presence, we don’t have to feel guilty. This article highlights some of the @availabilitydig tweets that made headlines in recent days.
Sign up for your free subscription at http://www.availabilitydigest.com/signups.htm
Would You Like to Sign Up for the Free Digest by Fax?
Simply print out the following form, fill it in, and fax it to:
+1 908 459 5543
The Availability Digest is published monthly. It may be distributed freely. Please pass it on to an associate.
Managing Editor - Dr. Bill Highleyman firstname.lastname@example.org.
© 2013 Sombers Associates, Inc., and W. H. Highleyman