I have briefly touched on the concept of data, our interactions with data, the concept of data literacy, and the debate over the age at which data literacy should be taught. This is a necessary debate, and I am of the opinion that data literacy and numeracy education should begin as early as possible, at the same time as basic reading skills, since data generation will only increase as the data economy develops further. The earlier we can develop a comfort level with data and develop critical thinking skills, the better off we, and society, will likely be.
In November 2016, Oxford Dictionaries proclaimed “post-truth” as the 2016 word of the year. Post-truth is defined as:
“Relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief.”
To me this is disturbing, and it underlines the importance of data literacy in our society. Being comfortable with data and deriving information from it can help us override our emotions, beliefs, and biases, or at least call them into question, so that we can make informed and hopefully unbiased decisions. In the previous post, using the definition provided by Ridsdale et al., data literacy was defined as:
“The ability to collect, manage, evaluate, and apply data in a critical manner.”
The ability to systematically, accurately, and precisely collect data is the cornerstone of research. It allows hypotheses to be tested, outcomes to be evaluated, and questions to be accurately and honestly answered. If data are not accurately collected in a manner that can be repeated by others, the integrity and validity of the research will be called into question. Improperly collected data can result in the inability to accurately answer questions, the inability to repeat and validate the study, distorted findings, and poor policy decisions, and could result in harm to others. As the old saying goes, “garbage in equals garbage out”. In evaluating information, we should ask how, where, and why the data were collected. We should also consider whether the answers to these questions make sense and how they compare with best practices and other similar studies. The same questions and considerations can be used for polls, financial analyses, health information, and scientific reports.
The management of data involves quality assurance and quality control, both of which are integral to the data collection process. Quality assurance precedes data collection and is the documentation that outlines the policies and procedures for how the data will be collected, validated, and stored to ensure their integrity. Quality control involves inspecting the data to ensure they are accurate and that they represent their intended source. These processes establish whether the data are the right data, whether they are valid, and whether they are reliable. Storage of the data, including ensuring the correct data types and data integrity, is also an important part of data management. These are all important considerations when performing a study, and we should ask how these steps were performed and managed when we are given information to evaluate.
The evaluation of data involves analytical techniques. This is where mathematics and statistics are used to make observations about the data and predictions using the data. This is often the most intimidating part to some people because of a general dislike for statistics. Mark Twain is famous for saying:
“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
Despite this popular position on statistics, most common descriptive and predictive techniques are not difficult to understand and apply. Many of us are quite familiar with average, median, mode, range, standard deviation, simple linear regression, and correlation. However, there is some nuance in the correct use and application of these: we must be vigilant about when to use the average versus the median, and we must not be too quick to infer causation when we observe a correlation. As Charles Wheelan demonstrates in Naked Statistics: Stripping the Dread from the Data, the average salary of bar patrons increases once Bill Gates has a seat at the bar, but no one makes any more money.
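To make the bar example concrete, here is a minimal sketch in Python. The incomes are numbers I made up purely for illustration (they are not from Wheelan's book), and the helper name summarize is my own. It shows how one very large value drags the average upward while the median hardly moves.

```python
from statistics import mean, median

# Hypothetical annual incomes (in dollars) of ten bar patrons.
# These figures are invented for illustration only.
patrons = [28_000, 31_000, 33_000, 35_000, 35_000,
           36_000, 38_000, 40_000, 42_000, 45_000]

# Assumed income for one extremely wealthy newcomer (also invented).
newcomer = 1_000_000_000


def summarize(incomes):
    """Return the (average, median) of a list of incomes."""
    return mean(incomes), median(incomes)


avg_before, med_before = summarize(patrons)
avg_after, med_after = summarize(patrons + [newcomer])

print(f"Before: average = {avg_before:,.0f}, median = {med_before:,.0f}")
print(f"After:  average = {avg_after:,.0f}, median = {med_after:,.0f}")
```

With these made-up numbers, the average jumps from tens of thousands to tens of millions of dollars, while the median moves by only a few hundred dollars. That is why the median is usually the better summary when a few extreme values are present.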
The final component of data literacy is the application of the information in a critical manner. What are we going to do with all of this? We have diligently collected our raw data. We have managed it well to ensure it is accurate, precise, and properly preserved. We have made inferences from the data. Now what?
For one of my clients I used copper and gold assay data to predict the amount of gold as a function of copper. If we have compiled information on baseball player batting averages, we may use this information to create our fantasy baseball league. If we have compiled information on home features and square footage, we could apply this knowledge to find out whether a house we want to make an offer on is overvalued or undervalued. If we are analyzing weather data, we could use the information to decide whether we want to go long or short on cocoa futures. Maybe you have used information to make some of these decisions already. Maybe these examples will give you ideas as to how you can use data to make future decisions. If so, please leave me a comment to let me know how you have used data to make important informed decisions in your life.
In future posts we will look at some readily available sources and tools to help in collecting and analyzing data so that you can derive information and apply it critically to important areas in your life. Please leave me a comment to let me know if you would like to look at something specific or if I can help you make information out of your data.