Book Review: Back to Basics. Data Analysis – From Preparing and Scrubbing Your Data to Using Data Visualization Design

The book is written by Lillian Pierson, Meta S. Brown and Paul McFedries. I decided to go through this book to see whether it would offer additional, useful ways to approach the broader vision of data analysis from a business perspective, which is what I teach in the Master’s in Business and Master’s in Management classes that I usually lecture.

The seventeen main takeaways that I got out of this book are presented below. In addition to these takeaways, the book includes well-structured content that walks the reader through some of the tools that are relevant in this field.

  • For quite some time now, everyone has been absolutely deluged by data. It’s coming from every computer, every mobile device, every camera, and every imaginable sensor – and now it’s even coming from watches and other wearable technologies.
  • Data is generated in every social media interaction we make, every file we save, every picture we take, and every query we submit; it’s even generated when we do something as simple as ask a favorite search engine for directions to the closest ice-cream shop… We may have wondered, “What’s the point of all this data? Why use valuable resources to generate and collect it?” Although even a single decade ago, no one was in a position to make much use of most of the data that’s generated, the tides today have definitely turned.
  • Specialists known as data engineers are constantly finding innovative and powerful new ways to capture, collate, and condense unimaginably massive volumes of data, and other specialists, known as data analysts, are leading change by deriving valuable and actionable insights from that data.
  • In its truest form, data analysis represents the optimization of processes and resources. Data analysis produces data insights – actionable, data-informed conclusions or predictions that you can use to understand and improve your business, your investments, your health, and even your lifestyle and social life. Using data analysis insights is like being able to see in the dark. For any goal or pursuit you can imagine, you can find data analysis methods to help you predict the most direct route from where you are to where you want to be – and to anticipate every pothole in the road between both places.
  • Understanding the components of data analysis – to practice data analysis, in the true meaning of the term, you need the analytical know-how of math and statistics, the coding skills necessary to work with data, and an area of subject matter expertise. Without subject matter expertise, you might as well call yourself a mathematician or a statistician. Similarly, a software programmer without subject matter expertise and analytical know-how might better be considered a software engineer or developer, but not a data analyst.
  • Because the demand for data insights is increasing exponentially, every area is forced to adopt data analysis. As such, different flavors of data analysis have emerged. The following are just a few titles under which experts of every discipline are using data analysis: ad tech data analyst, director of banking digital analyst, clinical data analyst, geoengineer data analyst, geospatial analytics data analyst, political analyst, retail personalization data analyst, and clinical informatics analyst in pharmacometrics.
  • Collect, query and consume data – data engineers have the job of capturing and collating large volumes of structured, unstructured, and semi-structured big data – data that exceeds the processing capacity of conventional database systems because it’s too big, it moves too fast, or it doesn’t fit the structural requirements of traditional database architectures.
  • Again, data engineering tasks are separate from the work that’s performed in data analysis, which focuses more on insight, prediction, and visualization. Despite this distinction, whenever data analysts collect, query, and consume data during the analysis process, they perform work similar to that of the data engineer.
  • Although valuable insights can be generated from a single data source, often the combination of several relevant sources delivers the contextual information required to drive better data-informed decisions. A data analyst can work from several datasets that are stored in a single database, or even in several different data warehouses. At other times, source data is stored and processed on a cloud-based platform that’s been built by software and data engineers.
  • No matter how the data is combined or where it’s stored, if you’re a data analyst, you almost always have to query data – write commands to extract relevant datasets from data storage systems, in other words. Most of the time, you use Structured Query Language (SQL) to query data.
  • Apply mathematical modeling to data analysis tasks – data analysis relies heavily on a practitioner’s math skills (and statistics skills) precisely because these are the skills needed to understand your data and its significance. These skills are also valuable in data analysis because you can use them to carry out predictive forecasting, decision modeling, and hypothesis testing.
  • Mathematics uses deterministic methods to form a quantitative (or numerical) description of the world; statistics is a form of science that’s derived from mathematics, but it focuses on using a stochastic (probabilistic) approach and inferential methods to form a quantitative description of the world.
  • Data analysts use mathematical methods to build decision models, generate approximations, and make predictions about the future. The book indeed presents many complex applied mathematical approaches that are useful when working in data analysis.
  • Derive insights from statistical methods – in data analysis, statistical methods are useful for better understanding your data’s significance, for validating hypotheses, for simulating scenarios, and for making predictive forecasts of future events. Advanced statistical skills are somewhat rare, even among quantitative analysts, engineers, and scientists. If you want to go places in data analysis, though, take some time to get up to speed in a few basic statistical methods, like linear and logistic regression, naive Bayes classification, and time series analysis.
  • Play the coding game – coding is unavoidable when you’re working in data analysis. You need to be able to write code so that you can instruct the computer how you want it to manipulate, analyze, and visualize your data. Programming languages such as R and Python are important for writing scripts for data manipulation, analysis, and visualization, and SQL is useful for data querying. The JavaScript library D3.js is a hot new option for making cool, custom, and interactive web-based data visualizations.
  • Although coding is a requirement for data analysis, it doesn’t have to be this big scary thing that people make it out to be. Your coding can be as fancy and complex as you want it to be, but you can also take a rather simple approach. Although these skills are paramount to success, you can fairly easily learn enough coding to practice high-level data analysis.
  • Apply data analysis to a subject area – Statisticians have exhibited some measure of obstinacy in accepting the significance of data analysis. Many statisticians have cried out, “Data analysis is nothing new! It’s just another name for what we’ve been doing all along.” However, the main point of distinction between statistics and data analysis is the need for subject matter expertise… Because statisticians usually have only a limited amount of expertise in fields outside of statistics, they’re almost always forced to consult with a subject matter expert to verify exactly what their findings mean and to decide the best direction in which to proceed. Data analysts, on the other hand, are required to have a strong subject matter expertise in the area in which they’re working.
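
To make the takeaways on querying, combining sources, and basic statistical methods a little more concrete, here is a minimal sketch of my own (it is not taken from the book). It assumes two made-up tables, ad_spend and sales, held in an in-memory SQLite database, joins them with SQL, and then fits a simple linear regression by ordinary least squares. The table names, the numbers, and the fit_line helper are all hypothetical, and the data is deliberately perfectly linear so the result is easy to check by hand.

```python
import sqlite3

# Hypothetical data: two small tables standing in for two separate sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ad_spend (month INTEGER PRIMARY KEY, spend REAL);
    CREATE TABLE sales    (month INTEGER PRIMARY KEY, revenue REAL);
    INSERT INTO ad_spend VALUES (1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0);
    INSERT INTO sales    VALUES (1, 25.0), (2, 45.0), (3, 65.0), (4, 85.0);
""")

# Query the data: a JOIN combines both sources on their shared month key.
rows = conn.execute("""
    SELECT a.spend, s.revenue
    FROM ad_spend AS a
    JOIN sales AS s ON s.month = a.month
    ORDER BY a.month
""").fetchall()
conn.close()


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Fit the model on the joined rows and forecast revenue for a new spend level.
slope, intercept = fit_line(rows)
forecast = intercept + slope * 50.0
print(f"revenue = {intercept:.1f} + {slope:.1f} * spend; forecast at spend=50: {forecast:.1f}")
# prints: revenue = 5.0 + 2.0 * spend; forecast at spend=50: 105.0
```

Real data would of course be noisy, and the subject matter expertise the authors insist on is what tells you whether a straight line between spend and revenue is even a sensible model to begin with.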
