A Full Service Global Management Consulting and Executive Training Firm
by Kent F. Moors, Ph.D.
Executive Managing Partner:
Risk Management Associates, International, LLP
The crash of a critical legacy system at a regional carrier is a classic risk management mistake, one that cost the airline $20 million and badly damaged its reputation. The failure followed a multi-year struggle to replace an aging legacy system used to manage flight crews. The application was one of the oldest in the company. It was written in Fortran, a language in which no one in the airline's IT department was fluent, and it was the only system left running on the airline's old IBM AIX platform (all other applications ran on HP Unix).
A well-known software provider came in to pitch a new crew management system, but its software turned out to have frustrating application problems. The consensus among airline staff was that, if they had to bear the expense of replacing the old crew management system, they should wait for a more satisfactory substitute to come along. And wait they did. Replacing the aging crew management system was floated repeatedly over the next four years, but nothing happened. In the meantime, the airline was purchased by a major carrier, was grounded by a pilot strike, and suffered through the downturn that ravaged the airline industry after the September 11, 2001 attacks.
A replacement system was eventually approved, but the switch did not happen soon enough. Over a major holiday period, the legacy system failed, bringing down the entire airline, canceling or delaying 3,900 flights, and stranding nearly 200,000 passengers. The system crash cost the regional airline and its new parent company $20 million, significantly damaged the airline's reputation, and prompted an investigation by the Department of Transportation.
In all likelihood, the whole mess could have been avoided if either the regional carrier or its parent, the major airline, had done a comprehensive analysis of the risk this critical system posed to daily operations and had taken steps to mitigate that risk. But senior executives did not consider a replacement system an urgent priority, and IT did little to disrupt that sense of complacency. Everyone seemed to know that the aging applications and architecture supporting the growing regional carrier had to be dealt with, and the company even created a five-year strategic plan for just that purpose. Still, a lack of urgency prevailed.
Additionally, after the acquisition, the regional's IT heads did not do the kind of thorough management analysis that might have persuaded the parent airline to invest in a replacement system before it was too late. In fact, the acquiring company saw a strong regional service provider, and the regional carrier did nothing to shed light on problem areas. The parent company's attitude was: why fix success? Even IT looked fine to them. The regional had project timelines and sound budgets, so there was no mandate to put the IT division in order.
Instead, the parent kept a lid on capital expenditures at the regional, with unfortunate consequences. The failure of the more than 20-year-old scheduling system not only saddled the parent company with a mountain of customer service and financial headaches it could ill afford, but also provides a cautionary tale for any company that thinks it can run on its management systems for "just one more day." Unfortunately, no one can see a crew management system age the way you can see an airplane rust. But such systems do age.
Finally, in late 2004, the regional carrier received permission from its parent to replace the aging legacy system, with the changeover scheduled for 2005. That would prove too late. No one at the regional realized that the now more than two-decade-old crew management system could process a maximum of 32,000 changes per month before shutting down. And that is exactly what happened. On Christmas Eve 2004, the flood of rescheduling forced by bad weather pushed the system past that limit, and it crashed. As a result, the regional had to cancel all 1,100 of its flights on Christmas Day, stranding tens of thousands of passengers heading home for the holidays. It had to cancel nearly 90 percent of its flights on December 26, stranding more. There was no backup system. It took a full day for the vendor to fix the software, and the regional carrier was not able to operate a full schedule until December 29.
The lesson here is the absolute necessity of risk oversight for management systems. This is a classic case study in operational risk, and both the regional carrier and its parent company share responsibility. Executives at the regional should have done the kind of risk management analysis that would have alerted the new parent company to the dangers of not updating the legacy system sooner. And IT should have repeatedly brought that analysis to the attention of parent company officials until a replacement system was funded. Simple cost-benefit analyses would have driven the point home.
Similarly, executives at the parent should have insisted on scrutinizing regional operations, including spending some time and money on their own analysis of the carrier's risks. Anything that can damage a parent company's brand or reputation has to be managed. A risk assessment of worst-case scenarios at the regional was in the parent company's best interest and should have been conducted.
A larger problem is that, at many companies, operational risks are not factored into day-to-day decision making the way more technical concerns, such as aircraft mechanics and daily operations, are. And executive responsibility at the regional airline in question still appears wanting. As late as March 2005, the regional's CEO was still blaming the service collapse on bad weather. And what of the defective, aging legacy system at the center of this debacle? As of July 31, 2005, it was still the only crew management system in use. Plans remain in the works to replace it at some later date.