Excel Pitfalls

ExcelExcel is the most common and popular data analysis tool in use today, being used by over a billion people . It has become the go-to tool for quick calculations, analysis, and visualizations because of its ease of use and interactive, intuitive nature. It has also become the foundation of many complex workflows and data models, especially in the financial industry. However, its ubiquity and popularity have also made it a dangerous tool, causing nightmares for all involved as well as resulting in large volumes of unsearchable homeless data and the creation of hard to search information silos. Here are some of the most common issues with Excel.

Spreadsheets are error prone

It is very easy to make mistakes in Excel and it is estimated that almost 90% of spreadsheets have mistakes. Some very experienced researchers have made some very common mistakes resulting in catastrophic results. Here are some of the most common mistakes, and some famous cases of their occurrence:

  • Not selecting all cells when performing a calculation.

The most famous example of this is the Reinhart and Rogoff study that concluded economic growth slows when a country’s debt reaches 90% of GDP. This became the foundation of many policy debates and was the cornerstone of  Paul Ryan’s proposed 2013 budget entitled The Path to Prosperity: A Blueprint for American Renewal. It turned out that Reinhart and Rogoff omitted 5 of the 20 countries in their calculations. What was calculated to be a 0.1% decline actually worked out to be a 2.2% average increase in growth.

  • Hiding cells instead of deleting them

The most prominent example of this error it the mistake made by an attorney during the purchase of Lehman Brothers assets by Barclay’s during 2008 financial crisis. In reformatting a spreadsheet containing Lehman Brothers trading contracts to be acquired by Barclay’s, the attorney did not realize that 179 trading contracts were contained in hidden cells. This cost Barclay’s millions as they had to absorb the additional contracts.

  • Cut/Copy paste errors

One of the most used features of spreadsheets is also responsible for most of the errors and has cost organizations millions of dollars. The TransAlta debacle cost the firm $24 million USD when a cut-paste error resulted in the purchase of energy transmission contracts at higher prices that it should have. A similar error cost JP Morgan $6 billion USD in the London Whale incident when data were copied and pasted incorrectly from other spreadsheets to the spreadsheet used to create the risk model.

Other errors that are commonly found in spreadsheets are miscalculations due to leading spaces in front of a number, inconsistent and incorrect data types, and missing data. Excel does not have a robust error message system to help users with these types of errors. Many of these issues get amplified when spreadsheets are shared across groups and teams work independently and then try to merge into a master spreadsheet. It is very difficult, and sometimes impossible, to find these errors.

Spreadsheets have repeatability issues and lack transparency

Spreadsheets do not lend themselves to repeatable work and auditable calculations or information. Writing formulas and incorporating logic into a spreadsheet often results in difficult to read, brace and comma heavy scripts in a linear, across the page format. As formulas become more complex, errors become more difficult to trace. It is also difficult to understand the formula itself and to make sense of what a researcher was trying to accomplish since Excel does not allow commenting. Excel does have a tool to allow one to step through and equation or formula, but this still is not as effective as a simple script written in Python to accomplish the same function.

They multiply and create homeless data

Look in any directory on a server and you will probably find loads and loads of spreadsheets created by various users. These often don’t have any project association and lack any kind of documentation. As employees move through an organization these files multiply like rabbits and forgotten. It is possible that these contain valuable information but with no context or defined relationships it is difficult to know. As mentioned before, searching one spreadsheet is easy; it is impossible to search multiple spreadsheets for a value, keyword, or commonality. This can get expensive as they occupy more and more server space. Deleting them could lead to problems if any of them are linked to other spreadsheets.

Final Thoughts

These are just some of the issues that can arise from using Excel. Initially, its use is innocent enough. You perform some quick calculations, copy in some data to create a scatterplot, and arrive at a conclusion. You pass along the spreadsheet by email and others contribute and send it back. Now you have multiple versions and you made more calculations while others were working with your original version. All of a sudden you have multiple workbooks and you work to combine them, but all versions are retained, by you and your colleagues. At some point, the spreadsheet becomes a tool for your team and grows. Errors lurk and fester undetected. This happens all the time. There are better tools, such as Python and SQL for storing and working with your data. These tools create repeatable, transparent, and easily auditable workflows that will improve your data integrity and your productivity.

Do you have any bad experiences with Excel that you would like to share? Please leave a comment or follow me on Twitter and let me know.

Image courtesy of www.science.edu