
The digest of current topics on Continuous Availability. More than Business Continuity Planning.

BCP tells you how to recover from the effects of downtime.

CA tells you how to avoid the effects of downtime.

www.availabilitydigest.com

Are you looking to modernize your NonStop System? TIC Software is the leading provider of software products & services that help NonStop customers modernize their applications. We help companies integrate their NonStop environment with technologies such as SOA, XML, Business Intelligence, Report Distribution, and Printer & Output Content Management. Visit us at http://www.ticsoftware.com.

In this issue:

Best Practices

Data Center Monitoring with Open-Source Nagios

Availability Topics

Help! My Data Center is Down! - Storage

Amazon's Availability Zones

Product Reviews

CSR Synchronizes NonStop Systems

Browse through our Useful Links.

Check our article archive for complete articles.

Join us on our Continuous Availability Forum.

Join Us At BITUG’s Big SIG

The British Isles HP NonStop User Group (BITUG) is hosting its Big SIG on December 7th and 8th (www.bitug.com). Promoted as the largest dedicated NonStop event in the world in 2011, the Big SIG will be held in London at the historic Trinity House next to the Tower of London. Wednesday, December 7th, is dedicated to an education day. Thursday, December 8th, is the meeting proper with two dozen technical presentations.

I am honored to be giving Thursday’s keynote address, entitled “Help! My Data Center is Down!” In this presentation, I describe several spectacular data-center failures that were caused by unimaginable events. These experiences show that no matter what steps you take to protect your data center, something out there is lurking to take you down. Even your critical applications running on NonStop servers are not immune.

I encourage you to attend this major NonStop event and see what lessons you can take back to your data center based on the disaster stories we will discuss.

Dr. Bill Highleyman, Managing Editor

Best Practices

Data Center Monitoring with Open-Source Nagios

A primary requirement for achieving high availability is to be able to act proactively, not reactively, to problems as they arise. Problems should be detected at the earliest possible moment so that automated or manual actions can be taken to correct the situation. In order to accomplish this, a monitoring system that integrates all systems into a single data center-wide view must be in place.

BV Zahlungssysteme, or BV Payment Systems in English, provides services for card-based payment transactions and electronic banking for German banks. Its credit-card, debit-card, and online banking services must always be available, as their failure can bring German retail commerce to a halt.

Four HP NonStop servers comprise the heart of the company’s financial-service processing architecture. Supporting the NonStop servers are many Unix, Linux, and Windows servers. To keep this complex operational, it is imperative to be able to monitor all of the servers and the other data-center components with a single system monitor.

Unfortunately, currently available monitors that keep tabs on commodity servers do not support NonStop servers. BV Payment Systems undertook a project to extend the open-source Nagios monitor to NonStop servers so that the company can monitor its two data centers via a “single pane of glass.”
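As a rough illustration of what extending Nagios to a new platform involves, the sketch below follows the standard Nagios plugin convention: a check reports its result as an exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus a one-line status message, optionally followed by performance data after a "|". The function name, the CPU-busy metric, and the thresholds are assumptions chosen for illustration; they do not describe how BV Payment Systems built its NonStop integration.

```python
# Nagios plugin convention: the exit code carries the check result.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_cpu_busy(percent_busy, warn=80.0, crit=95.0):
    """Map a CPU-busy percentage to an (exit_code, status_line) pair.

    The metric and the default thresholds are hypothetical examples.
    """
    # A missing or out-of-range reading is reported as UNKNOWN.
    if percent_busy is None or not (0.0 <= percent_busy <= 100.0):
        return UNKNOWN, "CPU UNKNOWN - no valid reading"

    if percent_busy >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif percent_busy >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"

    # Text after '|' is performance data: label=value;warn;crit
    line = (
        f"CPU {label} - {percent_busy:.1f}% busy"
        f" | cpu_busy={percent_busy:.1f}%;{warn:.0f};{crit:.0f}"
    )
    return code, line
```

A wrapper script would obtain the reading from the monitored server, print the status line, and exit with the returned code so that the monitor can display the result alongside its other checks.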

--more--

Availability Topics

Help! My Data Center is Down! Part 2: Storage Outages

Increasingly, the data center has become part of the lifeblood of a company. If the data center goes down, so do many of the services that a company provides to its customers, vendors, and employees.

In our previous article in this series, we discussed several unimaginable, power-related events that took out data centers, with outages lasting hours and even days. These ranged from a truck driver’s heart attack and a battery-room explosion to the simple act of plugging in a coffee pot. The failure to keep a tree trimmed triggered the great Northeast Blackout of 2003.

In this article, we look at some spectacular storage-system failures. Corporate data is one of the most prized assets of a company. Companies do everything they can to protect the integrity of their data, from maintaining real-time remote backups to long-term offsite storage. Unfortunately, as we shall see, the media is replete with horror stories of companies that have lost their data for long periods of time or forever.

--more--

Amazon’s Availability Zones

A major step forward in achieving high availability in the cloud is Amazon’s Availability Zones. Availability Zones allow a company to run multiple instances of its critical applications in different data centers so that the applications can survive even a data-center failure.

There have been several spectacular cloud failures recently, with outages lasting from hours to days, due to a wide variety of causes – power, storage, networks, and people. These outages cut across all cloud-service providers, large and small – Amazon and Google have both contributed their share. A lesson to be learned from such outages is that the root cause of the next cloud failure is probably unimaginable.

Amazon’s Availability Zones provide a powerful approach to guarantee survivability of critical applications even if an entire Availability Zone should fail. Each Availability Zone is an independent data center that is fault-isolated from other Availability Zones. Application instances can be run in two or more Availability Zones either as multiple operational instances or as active/backup pairs. Should an Availability Zone fail, an instance in another Availability Zone can take over the processing of the application instance in the failed Availability Zone.
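The active/backup arrangement described above can be sketched in a few lines. This is a simplified model only: it makes no calls to any Amazon API, and the Instance class, the pick_active function, and the zone names are hypothetical. It assumes that some external health check has already marked each instance as healthy or not.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    zone: str             # hypothetical zone name, e.g. "zone-a"
    healthy: bool = True  # assumed to be set by an external health check


def pick_active(instances, preferred_zone):
    """Return the instance that should serve traffic, or None if all are down.

    The instance in the preferred zone is used while it is healthy;
    otherwise the first healthy instance in another zone takes over.
    """
    healthy = [inst for inst in instances if inst.healthy]
    if not healthy:
        return None
    for inst in healthy:
        if inst.zone == preferred_zone:
            return inst
    return healthy[0]
```

With instances in "zone-a" and "zone-b" and "zone-a" preferred, marking the "zone-a" instance unhealthy causes pick_active to return the "zone-b" instance, which is the takeover behavior the article describes.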

--more--

Product Reviews

FileSync and CSR Synchronize NonStop Systems: Part 2 – Command Stream Replicator

Failover to a backup system often fails because the backup system’s software configuration is different from that being run by the production system. We call this configuration drift.

For HP NonStop systems, NonStop RDF and third-party data replication engines synchronize database contents. FileSync from TANDsoft synchronizes files. What remains unsynchronized, however, are the configuration changes entered via a variety of utilities.

Command Stream Replicator (CSR) from TANDsoft fills in the last piece of the configuration-synchronization puzzle. CSR replicates specified operator commands entered on the production system to the backup system or to other target systems in order to keep the system configurations synchronized.

Command Stream Replicator replicates everything that other replicators don’t. It requires no application modifications, nor does it require access to the utility source code. It simply intercepts commands as they are entered and sends them to a target system for execution on that system.

CSR improves failover reliability to a backup system by ensuring that the production and backup systems are uniformly configured. It supports the replication of configuration changes to all systems in an active/active configuration. The results are reliable failovers and a significant simplification of NonStop system administration procedures in a multisystem environment.

--more--

Sign up for your free subscription at https://availabilitydigest.com/signups.htm

Would You Like to Sign Up for the Free Digest by Fax?

Simply print out the following form, fill it in, and fax it to:

Availability Digest

+1 908 459 5543

Name:

Email Address:

Company:

Title:

Telephone No.:

Address:

____________________________________

The Availability Digest is published monthly. It may be distributed freely. Please pass it on to an associate.

Managing Editor - Dr. Bill Highleyman editor@availabilitydigest.com.