Read the Digest in PDF. You need the free Adobe Reader.

The digest of current topics on Continuous Processing Architectures. More than Business Continuity Planning.

BCP tells you how to recover from the effects of downtime.

CPA tells you how to avoid the effects of downtime.

www.availabilitydigest.com

Thanks to This Month's Availability Digest Sponsor

HP NonStop Security & Encryption Solutions from XYPRO. From Access Control, User Authentication, Authorization and Password Quality to Automated Compliance and Audit Reporting, Integrity Checking and FIPS Validated Encryption, XYPRO has been exceeding your NonStop security needs since 1983.

In this issue:

Never Again

Fire Suppression Suppresses WestHost

Best Practices

Data Center Cooling Nature's Way

Availability Topics

Anti-Virus - A Single Point of Failure?

Product Reviews

HP's Reliable Transaction Router

Browse through our Useful Links

Check our article archive for complete articles.

Join us on our Continuous Availability Forum.

Full-Day Continuous Availability Seminar at HPTF

Listen to our Sneak-Peek Webinar on May 24th

It’s that time again. The HP Technology Forum (HPTF) takes place at the Mandalay Bay in Las Vegas from June 21st to June 24th. Dr. Bill Highleyman, Managing Editor of the Availability Digest, will be giving a full-day preconference seminar (YOU-Conns) entitled “Achieving Continuous Availability with Active/Active Networks” on Monday, June 21st. It will cover a range of important topics, including:

Concepts in availability, with application to active/active networks.
Why continuous availability and disaster tolerance are different from high availability and disaster recovery.
Never Again horror stories from those who ignore disaster tolerance.
Active/active network architectures.
How active/active networks eliminate both unplanned and planned downtimes.
Data-replication engines – the heart of active/active networks.
How active/active networks compare to other failure-tolerant architectures.
Other advantages of active/active networks.
Case studies of successful production deployments of active/active networks.

Register now for this informative full-day class by clicking on the HPTF logo on Connect’s home page at www.connect-community.org (Connect is the HP Business Technology Community).

If you want to know more about what will be presented, register for our sneak-peek webinar on Monday, May 24th, from 11 AM to 12 noon EDT (15:00 GMT). We’ll “see” you then.

Never Again

Fire Suppression Suppresses WestHost for Days

It’s not a good idea to test a fire-suppression system by triggering it. But that’s what happened to WestHost, a major web-hosting provider. The accidental release of a blast of fire-suppressant gas severely damaged most of its servers and data stores.

Not only were there no offsite backup servers to which to fail over, but backup data stores were located onsite and were themselves damaged. Between repairing hardware servers and disk arrays and restoring data from damaged backup storage systems, WestHost’s customers were out of business for up to six days. Some backup data was unrecoverable.

This disaster emphasizes the fact that the one ultimately responsible for the continuity of your web services and the protection of your data is you. You must have contingency plans to ensure that you can at least remain in contact with your customers and your employees during a hosting data-center outage and that you can restore your data should your hosting center lose it.

In addition, if your web services are critical to the survival of your company (for instance, if you run an online store), you must have a way to switch over to a temporary backup web site.

Best Practices

Data Center Cooling Nature’s Way

How would you like an annual electric bill of $7,000,000? That is about what a typical large data center drawing ten megawatts of power pays, even at a negotiated rate of eight cents per kilowatt hour (about half the residential rate in many areas of the country).
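The arithmetic behind that figure can be checked with a short sketch. It assumes the ten-megawatt load is drawn constantly all year and ignores leap years; the function name is ours, chosen only for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_electric_cost(load_megawatts: float, dollars_per_kwh: float) -> float:
    """Return the yearly cost of a constant electrical load (an assumption)."""
    load_kilowatts = load_megawatts * 1000
    return load_kilowatts * HOURS_PER_YEAR * dollars_per_kwh


# Ten megawatts at eight cents per kilowatt hour, as in the text above.
cost = annual_electric_cost(10, 0.08)
print(f"${cost:,.0f}")  # roughly seven million dollars
```

A real bill would also reflect demand charges and a load that varies with the season, so treat this as a rough estimate only.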

Heat is a data center’s worst enemy. In many data centers, half or more of the consumed energy is used simply to cool the IT equipment – servers, network devices, storage arrays, consoles, and so forth. Cooling all this equipment not only costs a lot of money, but it also has a significant impact on our environment.

No wonder so many companies are aggressively looking at ways to reduce cooling costs. One successful approach that has been taken is to locate a data center in a cold climate and to simply use the outside air, unconditioned, to cool the data center. Intel has taken this approach to an extreme by locating an experimental data center in the desert – with amazing results.

In this article, we look at the Intel experiment and at several production data centers using Mother Nature’s own cooling.

Availability Topics

Anti-Virus – A Single Point of Failure?

What do active/active systems, clusters, fault-tolerant systems, and standby systems have in common? They all avoid a single point of failure. True, fault-tolerant systems and clusters will not survive a site failure; and standby systems have been known not to come up when needed. But active/active systems are immune, right?

On April 21st, McAfee, one of the leading anti-virus vendors, proved this conjecture to be wrong. It sent out an anti-virus update that immediately took down hundreds of thousands – maybe millions – of computers worldwide. This one bad update could have stopped every node in an active/active system, and our “indestructible system” would have been destroyed – a single point of failure.

Worse still, the bad update required manual intervention on every individual computer to restore it to service, taking those data centers with thousands of Windows servers offline for hours and, in some cases, for days.

This was a nightmare scenario: an automatic update that wiped out a crucial system file and that could be repaired only manually. What should we do to protect ourselves from such a problem? Don’t rush to apply any kind of update automatically. Test updates first in a safe environment, roll them out slowly one system at a time, or wait to see whether others report problems.
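As a rough illustration of rolling an update out one system at a time, here is a minimal sketch. It is not any vendor's tool: the host names and the `apply_update` and `is_healthy` callbacks are hypothetical stand-ins for whatever deployment and monitoring mechanisms you actually use.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    hosts: Iterable[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update hosts one at a time and stop at the first unhealthy host.

    Returns the hosts that were updated and passed the health check.
    """
    updated: List[str] = []
    for host in hosts:
        apply_update(host)
        if not is_healthy(host):
            # Halt here so the remaining hosts keep running the old version.
            break
        updated.append(host)
    return updated


# Hypothetical demonstration: the update breaks "node-b".
touched: List[str] = []


def fake_apply(host: str) -> None:
    touched.append(host)


def fake_health(host: str) -> bool:
    return host != "node-b"


result = staged_rollout(["test-box", "node-a", "node-b", "node-c"], fake_apply, fake_health)
print("updated and healthy:", result)
print("never touched:", [h for h in ["test-box", "node-a", "node-b", "node-c"] if h not in touched])
```

Putting a test machine first in the list means a bad update is caught before it reaches any production node, and stopping at the first failure limits the damage to a single system.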

Product Reviews

HP’s Reliable Transaction Router

HP’s Reliable Transaction Router (RTR) provides reliable transaction-messaging services between multivendor clients and servers. The clients and servers in an RTR application network can be any mix of HP’s OpenVMS, HP-UX, Linux, and Windows servers. Therefore, RTR allows heterogeneous systems to be consolidated into a single, highly redundant, reliable, and scalable application network.

Transaction integrity is provided via the two-phase commit protocol. Recovery from a node or network failure is immediate and transparent to the users. In-flight transactions are preserved; and no data is lost, allowing RPOs (Recovery Point Objectives) of zero to be met.
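For readers unfamiliar with two-phase commit, the following is a generic textbook sketch of the idea. It is not RTR's API or implementation; the `Participant` class and `two_phase_commit` function are hypothetical names, and the sketch leaves out the timeouts, durable logging, and coordinator recovery that a real product needs.

```python
from typing import List


class Participant:
    """Hypothetical resource manager that votes on, then finishes, a transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit everywhere only if every participant votes yes; otherwise abort everywhere."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: apply the unanimous decision to all participants.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

The point of the two phases is that no participant commits until all of them have promised that they can, so a transaction is either applied everywhere or nowhere.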

Planned downtime can be eliminated by rolling upgrades through the network. The network can be scaled by adding nodes with no application changes. Extensive management facilities are provided via intuitive web-browser interfaces.

To a large extent, RTR is based on the same technology as HP’s OpenVMS active/active split-site clusters. RTR is an important adjunct to OpenVMS clusters since these clusters execute commands at the database read/write/update level and do not support transaction processing. RTR running in an OpenVMS cluster brings transaction processing to these active/active clusters.

Sign up for your free subscription at https://availabilitydigest.com/signups.htm

Would You Like to Sign Up for the Free Digest by Fax?

Simply print out the following form, fill it in, and fax it to:

Availability Digest

+1 908 459 5543

Name:

Email Address:

Company:

Title:

Telephone No.:

Address:

____________________________________

The Availability Digest may be distributed freely. Please pass it on to an associate.

Managing Editor - Dr. Bill Highleyman editor@availabilitydigest.com.