Thanks to This Month's Availability Digest Sponsor
Will We Ever Learn?
After developing mission-critical systems for several decades, I am well aware of the need for good specifications, a well-documented design, code reviews, unit testing, and final testing before releasing a system to production. If the system is tremendously complex, it should be rolled out slowly, first to a test group and then to additional users as the system proves itself. In this way, developers can be confident not only that the system is operating properly but also that it will be able to handle the load that will come its way.
It seems that these guidelines have become lost in the implementation of perhaps one of the most complex systems of our time – the healthcare.gov website to support the Affordable Care Act. From the very beginning of its deployment, it was a disaster. It could not handle the load that was imposed upon it by people shopping for insurance. It was fraught with bugs, and it was unable to communicate with many of the insurance company sites that were necessary to get quotes.
Rolling the website out slowly rather than as a Big Bang would certainly have exposed the capacity problems and the bugs (to date, the claim is that over 400 bugs have been fixed!). Let’s hope that the website is, in fact, repairable in time to be useful.
Dr. Bill Highleyman, Managing Editor
Clouds are expected to be highly redundant and resilient to any single failure. There is always another component that can take over in the event of a failure. Right?
Wrong! The Microsoft Azure cloud has a single point of failure, and this component failed in October 2013. The failure caused a worldwide partial compute outage. While the glitch did not prevent cloud applications from running, it took down certain cloud-management functions for a day and a half. Specifically, new applications could not be placed into service.
Although this outage did not affect existing production applications, it certainly was irritating to heavy users. Regardless of whom it affected, a worldwide outage can certainly damage confidence in Microsoft's ability to manage a large distributed network.
It was just last year that the entire Azure cloud went down for over thirty hours, compute capacity and all. This problem was due to a software bug in the way that Microsoft developers calculated Leap Day.
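Leap-day bugs of this sort are easy to write. A minimal illustration (my own toy sketch, not Azure's actual code): naively adding one to the year of a date fails when that date is February 29, because the resulting date does not exist.

```python
from datetime import date

def naive_one_year_later(d: date) -> date:
    # Naively bumping the year works 1,460 days out of 1,461 --
    # but raises ValueError on Feb 29, since Feb 29 of the next
    # year usually does not exist.
    return d.replace(year=d.year + 1)

leap_day = date(2012, 2, 29)
try:
    naive_one_year_later(leap_day)
except ValueError as e:
    # e.g. "day is out of range for month"
    print("leap-day bug:", e)
```

A common defensive fix is to fall back to February 28 (or March 1) when the incremented date is invalid, and to include leap days in test suites for any code that does date arithmetic.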
These two outages lead to an interesting observation. There is a single point of failure, and that is software. A software bug that is allowed to go into production can infect every system in the cloud.
Ransomware is a class of malware that locks up a computer and demands a ransom from the computer’s owner to unlock it. Most ransomware only freezes a computer, and the computer can often be restored by an anti-virus service provider. PCs and Android phones have been common victims of ransomware.
CryptoLocker is a variant of ransomware and is much more dangerous. It does not simply freeze a computer. It encrypts all of the files on the computer. Though the computer still runs, it cannot do anything because all of the files to which it needs access are encrypted with a key that is not available to the user. No private or government agency has yet been able to break the encryption.
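The mechanics can be illustrated with a toy sketch. Here a simple XOR keystream stands in for CryptoLocker's real cryptography (which is far stronger); the point is only that once data is encrypted under a key the victim never possesses, the files are unreadable until that key is handed over. All names here are illustrative.

```python
import hashlib
import os

def xor_stream(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR data with a keystream derived from key.

    NOT secure -- for illustration only.
    """
    stream = hashlib.sha256(key).digest()
    # Extend the keystream until it covers the data.
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

# The attacker generates a random key the victim never sees...
attacker_key = os.urandom(32)
plaintext = b"contents of an important document"
ciphertext = xor_stream(plaintext, attacker_key)

# ...so the file is unreadable, yet fully recoverable by whoever
# holds the key (XOR with the same keystream decrypts).
assert ciphertext != plaintext
assert xor_stream(ciphertext, attacker_key) == plaintext
```

Because the same operation encrypts and decrypts, the attacker can restore the files instantly once paid; without the key, the victim has nothing to attack but the cipher itself.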
CryptoLocker will only release the files if a ransom of a few hundred dollars is paid within a specified time period.
The good news, if there is any, is that the hackers have proven to be honest. Once the ransom has been paid, they have decrypted files and have not reinfected the computer. However, if the ransom is not paid, be prepared for further attacks. Security companies have yet to come up with any protection against CryptoLocker.
Ponemon Institute conducts empirical studies on critical issues affecting the management and security of sensitive information about people and organizations. It has recently completed a study on the value of live cyberthreat intelligence for combating cyberattacks. Live cyberthreat intelligence refers to intelligence data about actual cyberattacks happening now. It is delivered with no delay, as compared to delays ranging from minutes to days and even weeks for many cyberthreat-monitoring facilities.
The Ponemon study was based on a survey of 708 users across more than fourteen industry segments. It shows that the average cost to large organizations of defending against cyberattacks is about USD 10 million per year. The organizations estimate that with access to live threat information, they could save 40% of that cost, or USD 4 million per year.
The Ponemon study demonstrates the importance of having timely intelligence to stop a cyberattack. However, the majority of respondents agree that it is hard to stop an attack on enterprise systems because the threat intelligence is out-of-date. Furthermore, the high rate of false positives deters staff from pursuing the real threats and attacks.
DDoS attacks are on the rise. A DDoS attack launches a massive amount of traffic against a company’s website to overwhelm it to the point that the website no longer can function.
A particularly sensitive system in a company's web infrastructure is its DNS server. The DNS server responds to requests to convert domain names to IP addresses so that messages can be routed to target systems over the Internet. Without its DNS server, a company cannot communicate with the outside world.
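The lookup a DNS server performs can be sketched with the Python standard library, which hands the query to the system's configured resolver (a minimal illustration; the hostname is an example):

```python
import socket

def resolve(hostname: str) -> list[str]:
    """Ask the system's resolver to map a hostname to its IP addresses."""
    # getaddrinfo consults the configured name service, just as a browser
    # must do before it can open a connection to the target system.
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result tuple carries the resolved address in its last element.
    return sorted({info[4][0] for info in results})

# If no name server answers, this call fails and no connection can be
# made at all -- which is why DNS is such an attractive DDoS target.
print(resolve("localhost"))
```

Every outbound connection by name depends on this step succeeding, so taking a company's DNS server offline silences its entire web presence even though the web servers themselves are healthy.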
Secure64’s DNS Authority is a dedicated DNS name server that is designed to be self-protecting. It identifies and blocks attack traffic while continuing to respond to DNS queries from legitimate sources. DNS Authority can reduce the need to overprovision server resources, and it eliminates the need to protect DNS servers with network security devices.
The DNS Authority server uses multiple defenses to mitigate DDoS attacks, including protocol exploits, TCP SYN floods, reflected DNS attacks, and UDP and TCP data floods. Testing showed that DNS Authority survived without incident all but the UDP and TCP floods. In those cases, DNS Authority continued to service most legitimate requests, though some requests were dropped and had to be repeated.
A challenge every issue for the Availability Digest is to determine which of the many availability topics out there win coveted status as Digest articles. We always regret not focusing our attention on the topics we bypass.
With our new Twitter presence, we don’t have to feel guilty. This article highlights some of the @availabilitydig tweets that made headlines in recent days.
Sign up for your free subscription at http://www.availabilitydigest.com/signups.htm
Would You Like to Sign Up for the Free Digest by Fax?
Simply print out the following form, fill it in, and fax it to:
+1 908 459 5543
The Availability Digest is published monthly. It may be distributed freely. Please pass it on to an associate.
Managing Editor - Dr. Bill Highleyman firstname.lastname@example.org.
© 2013 Sombers Associates, Inc., and W. H. Highleyman