Let's blame Indian service providers for BA computer crash!

29th May 2017

BA: getting back in the air after computer crash

Hardly surprising that the Western tabloid media find a soft and entirely unfounded scapegoat for BA's system snafu in Indian service providers. But the crash is part of a disturbing pattern of technology gone wrong.
IndiaTechOnline Opinion
Bangalore, May 29 2017: Within minutes of the news of the massive computer failure that grounded over 1000 British Airways services and virtually shut down its operations at two of London's airports, some wire agencies were putting out stories that were carried world-wide (and, to their shame, even in some Indian print media) with headlines like: "Outsourcing IT jobs to India caused technical failure: British Airways".
This was good old English xenophobia in full play. Never mind that BA said nothing of the kind. It was easier to quote instead a 'national officer for aviation' at the leading trade union of the aerospace industry, whose deep knowledge and expertise led him to instantly conclude: "This could all have been avoided" if only the airline had not laid off hundreds of its own skilled IT staff and outsourced the work to India.
Then BA came out with its official explanation, announced by CEO Alex Cruz: "We believe the root cause was a power-supply issue..." So clearly no failure on the part of nameless, defenceless Indian engineers. But why let facts get in the way of a good story? The UK tabloid The Sun found some India-bashing opportunities even in a power outage in London. It wrote on Day 2: "British Airways IT crash was caused because inexperienced staff in India didn’t know how to kick-start the airline’s back-up system." Who said so? Why, that old fallback: "sources had claimed".
These nameless sources, in all likelihood, couldn't tell their backsides from a backup system. Backup systems (also called hot standby systems) are used in all but the very smallest computer-based operational systems to provide redundancy if the primary system fails. These 'stepneys' kick in when the main system fails, much like a UPS takes over when the mains supply to your PC fails. The switch is near-instantaneous -- usually within a fraction of a second to 3 seconds -- and these standby systems reside away from the primary location, but in the same geography, for multiple reasons. If the main system is struck by an earthquake, say, you don't want the standby to be knocked out as well -- so you locate it physically apart, at some minimum distance. When New York had its last big blackout, many of the Wall Street computers failed to switch to their backups because these were located in New Jersey -- too close to New York and subject to the same blackout.
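For readers who want the mechanics spelt out, here is a minimal sketch in Python of the failover idea described above. It is purely illustrative and makes no claim about how BA or any airline actually builds its systems: the names Site, pick_active and shares_failure_domain, and the 'grid' labels, are our own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A hypothetical data centre; all fields are illustrative assumptions."""
    name: str
    region: str           # the power grid / geography the site depends on
    healthy: bool = True  # in a real system this would come from health checks


def pick_active(primary: Site, standby: Site) -> Site:
    """Return the site that should serve traffic right now."""
    if primary.healthy:
        return primary
    if standby.healthy:
        return standby  # automatic failover: no human needs to intervene
    raise RuntimeError("both primary and standby are down")


def shares_failure_domain(primary: Site, standby: Site) -> bool:
    """True if one outage (say, a regional blackout) could take out both sites."""
    return primary.region == standby.region


# Example: the primary loses power, the standby on another grid takes over.
primary = Site("primary-dc", region="grid-A")
standby = Site("standby-dc", region="grid-B")
primary.healthy = False
active = pick_active(primary, standby)
print(active.name)
```

The point of the second function is the New York lesson: if both sites hang off the same grid, the standby is of little use in a blackout, however quickly the switch is made.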
Theoretically you can locate the standby on another continent because this is the age of virtualization: computer servers and memory can be in one place and still appear, virtually, to be somewhere else. It's called cloud computing. But in practice, large corporates -- and BA is no exception -- like to have their critical assets physically located in the same country, for regulatory and legal reasons. We're guessing here because no one will go on record -- but even if a large Western corporation uses an Indian outsourcing partner to remotely keep its systems up and working, it won't shift its physical assets to India. So if there was a massive power outage at BA last week, they will be looking under the hood of systems in the UK or US or EU or wherever the airline keeps its backup. And no, you can't blame inexperienced engineers in India for the backup not kicking in to take over in London. It is an almost automatic process. Some manual interventions are possible -- but they will need to be done at the BA end of the system, logically in Heathrow or wherever in the UK BA houses its system controls. Not in the Bangalore or Mumbai offices of TCS, which is known to be a prime supplier of IT services (but not the only one) to BA.
That said, it is technologically troubling that a power outage can have such a cascading, catastrophic and system-wide impact. It seems everything was down -- all the passenger-facing systems like check-in, website services and mobile apps, as well as operational aspects like flight routing, baggage tracing etc. Aviation computer applications are built in separate silos and are usually insulated from each other so that a breakdown in one doesn't affect other operations.
But there is a disturbing pattern here: in August last year it was another power failure that affected Delta Air Lines and caused it to cancel 450 flights. In October 2016 United Airlines suffered its own computer breakdown that disrupted its global operations. In each instance -- and massively so in the case of BA last week -- the trauma and inconvenience caused to hapless customers is immeasurable and cannot be neutralized by some grudging monetary and other compensation.
Goldfinger famously said to James Bond: "Once is happenstance. Twice is coincidence. Three times is enemy action." It is now three strikes, in less than a year. It is time for the civil aviation industry to submit its operational computer systems to an egoless outside audit, to accept the findings and fill the technology cracks, no matter what the expense. Otherwise lay airline customers may be pardoned if they sense some enemy action at play -- and conclude that airlines are the common enemy.