On September 9, 1947, a team working on a computer at Harvard University found that the machine was consistently giving errors. They scratched their heads and tried to find out what the problem was. After much analysis and ‘debugging’, they reported what is often called the world’s first computer bug: when they opened up the hardware, they found a moth literally trapped inside. Since that day, when an actual bug was discovered inside a computer, the software industry has recorded many more failures caused not by helpless insects buzzing around but by errors in lines of code or faulty programming logic. Let’s look at some well-known software glitches from the past few decades and try to understand how a single line of code can affect people’s lives in ways more complex than a simple technological failure.
The infamous Y2K bug
I remember this quite well. It was New Year’s Eve, December 31, 1999. Many major computer programs of that era followed the accepted practice of storing the year as two digits, because it was cheaper and saved computer memory. The ‘19’ was left out, so 1999 was stored as 99. The fear was that computer programs all over the world would read January 1, 2000, as January 1, 1900: they would take the stored ‘00’ and prefix it with 19, as had been the practice until then. I remember people in the small town I grew up in talking about it being the end of the world. Bank interest calculations would be thrown back 100 years; airlines, stock markets and companies’ bottom lines would all be affected, and there would be chaos. Y2K was both a software and a hardware problem. Engineers and computer companies raced against time to reprogram the date logic. They fixed it by expanding the year to its four-digit format, along with the required hardware fixes. In the end, there were very few problems, and the world was able to usher in the New Year in its full glory and grandeur.
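The date logic at the heart of the problem can be sketched in a few lines. This is only an illustration of the two-digit convention described above, not code from any real system; the function names and the account example are assumptions made for the sketch.

```python
# Illustrative only: the function names below are hypothetical,
# not taken from any real legacy system.

def legacy_year(stored_yy: int) -> int:
    """Expand a two-digit year the pre-2000 way: always prefix '19'."""
    return 1900 + stored_yy


def account_age_legacy(opened_yy: int, today_yy: int) -> int:
    """Age in years computed from two-digit years, as legacy code would."""
    return legacy_year(today_yy) - legacy_year(opened_yy)


def account_age_four_digit(opened_year: int, today_year: int) -> int:
    """Age in years once both dates carry the full four-digit year."""
    return today_year - opened_year


if __name__ == "__main__":
    # Account opened in 1995, checked on January 1, 2000.
    print(account_age_legacy(95, 0))           # two-digit storage
    print(account_age_four_digit(1995, 2000))  # four-digit storage
```

With two-digit storage, an account opened in 1995 appears to be -95 years old on January 1, 2000; with four-digit years it is correctly 5 years old. Any interest or billing calculation built on the first result would go wrong in the same way.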
Let’s hammer out the inconsistencies
During the 2012 London Olympics, the women’s hammer throw took an interesting turn when the results were wrongly interpreted by the scoring software. First, the Russian athlete Lysenko made a throw of 77.12 meters, which the Electronic Distance Measurement machine correctly recorded under her name. Next, the German athlete Heidler made a throw of exactly 77.12 meters, which the machine did not recognize. It was not programmed to accept successive throws of exactly the same distance as separate values and automatically treated them as mistakes. So it threw out Heidler’s result and accepted only Lysenko’s. It calculated the next best throw to be the 76.34 meters of China’s Zhang Wenxiu, and the bronze medal was promptly awarded to her. Only after Zhang had taken a victory lap around the Olympic stadium with her flag was the technical glitch discovered. The results were re-examined and Heidler was awarded the bronze medal. A coding error in processing the results led to a wrong final tally, loss of reputation and much mental anguish for the athletes.
Vancouver Stock Exchange crash
In 1982, the Vancouver Stock Exchange set up the VSE index to track the performance of its stock market. When stock prices go up, the value of the index should go up too. However, contrary to economists’ expectations, the VSE index steadily declined and almost halved in value over a two-year period. On paper, this looked like a market crash, as though listed companies’ net worth had eroded. An index is the weighted average of a group of stocks, and when the VSE index was set up, its value was 1000.000. After each trade, the index value was updated and recorded to three decimal places. A value of 999.876457 would be recorded as 999.876, because the remaining digits after the third decimal place were truncated instead of being rounded. Truncation can only ever lower the recorded value, never raise it. As the index was updated thousands of times a day, by the time the error was discovered these tiny reductions had compounded into a drop of almost 50% over a two-year time frame. The faulty index badly misrepresented the market and shook investors’ confidence.
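The effect of truncating instead of rounding is easy to reproduce. The sketch below is not the exchange’s actual index calculation: it assumes a hypothetical stream of tiny random price changes that average out to zero, and it applies the same changes twice, once storing the index with truncation and once with rounding. The names `store` and `simulate`, the size of the changes and the number of updates are all assumptions made for illustration.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN
import random

THREE_PLACES = Decimal("0.001")


def store(value: Decimal, rounding: str) -> Decimal:
    """Record an index value to three decimal places using the given mode."""
    return value.quantize(THREE_PLACES, rounding=rounding)


def simulate(updates: int, rounding: str, seed: int = 0) -> Decimal:
    """Apply `updates` tiny zero-mean changes, storing the index after each."""
    rng = random.Random(seed)  # same seed => same sequence of changes
    index = Decimal("1000.000")
    for _ in range(updates):
        # Hypothetical change between -0.005 and +0.005, six decimal places.
        change = Decimal(rng.randint(-5000, 5000)) / Decimal(1_000_000)
        index = store(index + change, rounding)
    return index


if __name__ == "__main__":
    n = 100_000  # assumed number of updates, chosen only for the demo
    print("truncated:", simulate(n, ROUND_DOWN))
    print("rounded:  ", simulate(n, ROUND_HALF_EVEN))
```

For a single value, `store(Decimal("999.876957"), ROUND_DOWN)` gives 999.876 while `ROUND_HALF_EVEN` gives 999.877. Each truncation discards about 0.0005 on average, so after 100,000 updates the truncated index in this sketch ends roughly 50 points below the rounded one, even though both saw exactly the same price changes.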
The end of the world
During the Cold War, on 26 September 1983, Lieutenant Colonel Stanislav Petrov of the Soviet Air Defence Forces was on duty monitoring the Soviet Union’s satellite-based early-warning system, which detects and reports nuclear missiles fired towards the country. That day, the system reported the launch of missiles from bases in the United States. Under strict protocol, this should have been treated as a direct nuclear attack, leading to retaliatory measures from the Soviet Union. However, Petrov decided to wait for corroborating evidence and used his judgement to investigate further instead of immediately raising the alarm. This element of human judgement prevented a potential escalation into nuclear war. Investigation of the satellite warning system later determined that it had indeed malfunctioned: its software had incorrectly reported the Sun’s rays reflected off certain clouds as missile launches.
Mariner 1 rocket for Venus diverts course
Mariner 1 was a NASA spacecraft designed to perform a planetary flyby of Earth’s close neighbour, Venus. Shortly after liftoff on July 22, 1962, communication between the rocket and its ground-based guidance system began to fail. The rocket veered off course, and Mission Control destroyed it 293 seconds after liftoff. A programmer had incorrectly entered a formula into the computer code, missing a single superscript bar. Without the smoothing function indicated by the bar, the software treated normal variations in velocity as if they were serious. This caused faulty corrections and sent the rocket off course. The error cost $18.5 million. Science-fiction author Arthur C. Clarke described it as “the most expensive hyphen in history.”
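To see why a missing smoothing step matters, consider the toy example below. It is not the Mariner guidance formula: a simple moving average stands in for whatever smoothing the bar denoted, and the velocity readings, expected value and tolerance are made-up numbers chosen only to show the effect.

```python
def moving_average(samples, window):
    """Average of the last `window` samples at each step (fewer at the start)."""
    smoothed = []
    for i in range(len(samples)):
        recent = samples[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def corrections_triggered(velocities, expected, tolerance):
    """Count readings that deviate enough to trigger a course correction."""
    return sum(1 for v in velocities if abs(v - expected) > tolerance)


# Hypothetical noisy readings around an expected velocity of 100.0.
RAW = [100.0, 103.0, 97.0, 104.0, 96.0, 102.0, 98.0, 101.0]

if __name__ == "__main__":
    print(corrections_triggered(RAW, 100.0, 2.5))                     # raw values
    print(corrections_triggered(moving_average(RAW, 4), 100.0, 2.5))  # smoothed
```

Fed the raw readings, the check fires four times on what is only noise; fed the smoothed readings, it never fires. A guidance system acting on each of those false alarms would keep steering away from a course that was fine to begin with.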
Do these examples leave you stumped, wondering how something so obvious could have been missed? Can a few lines of code really cause financial loss, reputational damage, and loss of life and property on this scale? One of the basic principles of software testing is that exhaustive testing is impossible: except in trivial cases, it is not feasible to test every combination of inputs and conditions. Software is therefore released in the knowledge that it could contain latent defects or bugs. The cost of a defect rises the later in the software development life cycle it is found. And human error can creep into even the best-laid-out framework and cause the software to behave differently from what was intended.
Conclusion
A robust testing process improves the reliability of the software being developed. Intensive hands-on training in testing tools and processes, knowledge of regulatory and compliance standards, and well-maintained continuous integration and deployment all help strengthen the overall testing process. It is also true that there are countless success stories in which marvels of software development have propelled humanity into a phase of rapid growth. Learning from the lessons of the past, working with cross-functional teams in an agile manner, and moving towards a shift-left approach are all signs that we are striving for the best-quality output from a highly trained, dedicated and motivated team.