Read the Digest in PDF. You need the free Adobe Reader.

The digest of current topics on Continuous Availability. More than Business Continuity Planning.

BCP tells you how to recover from the effects of downtime.

CA tells you how to avoid the effects of downtime.

www.availabilitydigest.com

Follow us

@availabilitydig

Thanks to This Month's Availability Digest Sponsor

FileSync automatically replicates, synchs apps & files between NonStop servers. FileSync Deduplication streams only changed data to reduce bandwidth, backup size. Command Stream Replicator automatically replicates DDL structure changes, HP utility operations to backup systems. FileSync & CSR prevent configuration drift. Processing environments on backup systems match those on production systems.

In this issue:

Never Again

Mt. Gox, Largest Bitcoin Exchange, Fails

Best Practices

911 Systems Show Unacceptable Availability

Availability Topics

Leslie Lamport Wins Turing Award

Product Reviews

Introduction to HP Serviceguard Clusters

Tweets

The Twitter Feed of Outages

Browse through our useful links.

See our article archive for complete articles.

Visit our Continuous Availability Forum.

Check out our seminars.

Check out our writing services.

Can Data Theft Affect Availability?

We often think of availability issues as being caused by unanticipated events such as equipment failures, weather, power outages, user errors, and failover faults. Malware, on the other hand, is typically associated with the theft of data such as personal information, and many of us don’t consider data theft a mechanism for taking down a system.

Think again. In this issue’s article entitled “Mt. Gox, Largest Bitcoin Exchange, Goes Belly Up,” we describe how hackers gained access to Mt. Gox’s treasury and transferred over 850,000 bitcoins, worth half a billion U.S. dollars, into their own digital wallets. Most of these bitcoins were being stored digitally for Mt. Gox’s customers.

Even more disturbing, the theft took place over a period of several years, a few bitcoins at a time. Mt. Gox’s auditing procedures were so weak that by the time it detected the theft, it was bankrupt. It shut down its website and closed its doors.

Yes, undetected data theft can be catastrophic enough to take down a system and the entire company with it. In our corporate seminars on high availability, the potential impact of malware on system availability is becoming an increasingly important topic.

Dr. Bill Highleyman, Managing Editor

Never Again

Mt. Gox, Largest Bitcoin Exchange, Goes Belly Up

Bitcoins are a digital currency. You hold your bitcoins in your bitcoin wallet. A bitcoin wallet is essentially a public/private key pair: the public key identifies the wallet, and the private key authorizes spending from it. You can use bitcoins to buy and sell merchandise, or you can hold them for investment.

What an investment! If you had purchased $1,000 of bitcoins in early 2011, they would be worth $2,000,000 now, a 2,000:1 increase in value. During this period, bitcoins appreciated from $0.30 to $600 USD each (with a peak price breaking $1,000). You would have made a bundle, provided you didn’t lose it to hackers.

That is what happened to thousands of bitcoin investors when the world’s largest bitcoin exchange, Mt. Gox, lost almost all of its bitcoins. As it filed for bankruptcy in Japan and then in the U.S. in February 2014, Mt. Gox admitted that, over a period of years, hackers had stolen 755,000 bitcoins that it was storing digitally for its customers and another 100,000 bitcoins that it owned. At the going price of $600, this amounts to a theft of over $500 million USD.
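The arithmetic behind these figures can be checked with a short sketch. It uses only the prices and bitcoin counts quoted above; the constant and function names are hypothetical and chosen for illustration.

```python
# Figures quoted in the article; names are illustrative only.
PRICE_EARLY_2011 = 0.30   # USD per bitcoin in early 2011
PRICE_NOW = 600.0         # USD per bitcoin at the time of writing
CUSTOMER_COINS = 755_000  # bitcoins held for customers
OWN_COINS = 100_000       # bitcoins owned by Mt. Gox itself


def growth_ratio(old_price, new_price):
    """Return how many times the price has multiplied."""
    return new_price / old_price


def value_usd(coins, price):
    """Return the USD value of a number of bitcoins at a given price."""
    return coins * price


ratio = growth_ratio(PRICE_EARLY_2011, PRICE_NOW)
stolen = CUSTOMER_COINS + OWN_COINS
loss = value_usd(stolen, PRICE_NOW)

print(f"Appreciation: {ratio:,.0f}:1")
print(f"Value of $1,000 invested in early 2011: ${1000 * ratio:,.0f}")
print(f"Stolen: {stolen:,} bitcoins, about ${loss / 1e6:,.0f} million")
```

At $600 per bitcoin, the 855,000 stolen bitcoins come to about $513 million, consistent with the "over $500 million" figure.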

--more--

Best Practices

911 Systems Experience Unacceptable Availability

Around the world, we count on emergency numbers for critical police, fire, and medical support. Most emergency call systems (certainly the larger ones) are dependent upon computer-aided dispatch (CAD) systems. They can be large server farms providing multiple applications. If any one of the applications should fail, the dispatching of emergency services can be severely hampered, with potential loss of life or property.

Unfortunately, there is much evidence that these systems are not meeting the necessary availability requirements, often by a large margin. We look at some of that evidence in this article. Statistics show that the availability of these systems is, at best, dismal. Classic approaches to high availability such as clusters fall well short, as do virtualized systems and cloud deployments. All of these approaches result in hours of downtime per year, during which life and property are in great danger. Only fault-tolerant systems can provide the availability needed: no more than a few minutes of downtime per year.

What is the value of life? Is it worth the investment in fault-tolerant systems? This is a critical decision that each town, city, county, or state must make.
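To put "hours of downtime per year" and "a few minutes of downtime per year" in perspective, the sketch below converts an availability percentage into expected downtime per year. The availability levels shown are illustrative assumptions, not measurements from the article, and the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability):
    """Expected downtime in minutes per year for a fractional availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Illustrative availability levels (assumptions, not figures from the article).
levels = {
    "three 9s (99.9%)": 0.999,
    "four 9s (99.99%)": 0.9999,
    "five 9s (99.999%)": 0.99999,
}

for label, availability in levels.items():
    minutes = downtime_minutes_per_year(availability)
    print(f"{label}: {minutes / 60:.2f} hours ({minutes:.1f} minutes) per year")
```

Under these assumptions, a system at three 9s is down for nearly nine hours per year, while a system at five 9s is down for only about five minutes per year.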

--more--

Availability Topics

Leslie Lamport Wins Turing Award for Distributed Computing

Leslie Lamport, a Principal Researcher at Microsoft Research, has been named the recipient of the 2013 ACM (Association for Computing Machinery) A. M. Turing Award for his contributions to the reliability of distributed computing systems. He contributed to the theory and practice of building distributed computing systems that work as intended.

The A.M. Turing Award, the ACM's most prestigious technical award, has been awarded annually since 1966 for major contributions of lasting importance to computing. The award is accompanied by a cash prize of $250,000, which in recent years has been underwritten by Intel Corporation and Google, Inc.

Lamport’s award citation reads in part as follows:

For fundamental contributions to the theory and practice of distributed and concurrent systems, notably the invention of concepts such as causality and logical clocks, safety and liveness, replicated state machines, and sequential consistency. Along with others, he invented the notion of Byzantine failure and algorithms for reaching agreement despite such failures.

The theory and practice of concurrent distributed systems have been significantly advanced by Lamport’s work. These systems, once impractical, are now common and are working well.

--more--

Product Reviews

Introduction to HP Serviceguard Clusters

Clusters of computers can provide both high availability and high performance. IT infrastructures are susceptible to planned and unplanned downtime, and a key requirement of IT is to minimize or eliminate the resulting disruptions to the business. HP’s clustering technology, HP Serviceguard, addresses this requirement by proactively protecting IT infrastructures from failures and service interruptions, thereby providing high availability of IT services.

In addition to high availability, science and engineering functions depend upon a high-performance infrastructure to solve large problems in their fields. Weather forecasting, DNA analysis, market prediction, oil and gas exploration, and advanced scientific research are a few examples. Typical stand-alone computers cannot meet the requirements of high-performance computing; these requirements are met instead by aggregating computers to deliver higher computing capacity. This aggregation is also achieved through clusters of computers managed by Serviceguard.

Performance means different things in the two cases. In failover clusters, it refers to how quickly a workload fails over, thus providing high availability. In compute clusters, it refers to the rate at which a job gets processed.

--more--

Tweets

@availabilitydig – The Twitter Feed of Outages

A challenge in every issue of the Availability Digest is determining which of the many availability topics out there win coveted status as Digest articles. We always regret not focusing our attention on the topics we bypass.

With our new Twitter presence, we don’t have to feel guilty. This article highlights some of the @availabilitydig tweets that made headlines in recent days.

--more--

Sign up for your free subscription at https://availabilitydigest.com/signups.htm

Would You Like to Sign Up for the Free Digest by Fax?

Simply print out the following form, fill it in, and fax it to:

Availability Digest

+1 908 459 5543

Name:

Email Address:

Company:

Title:

Telephone No.:

Address:

____________________________________

The Availability Digest is published monthly. It may be distributed freely. Please pass it on to an associate.

Managing Editor - Dr. Bill Highleyman editor@availabilitydigest.com.