Data Analytics Process

Paramita P
2 min read · Jun 5, 2022

We can use a data analytics process to interpret data. I learnt this process when I first started studying data analytics for strategic decision makers, and I would like to share my understanding of it.

The data analytics process has five steps: question, data, analysis, visualisation, and insight.

Drawn using Keynote icons and the writing tools in Goodnote

Data-driven decisions start with the question. What do we want to know about this topic, and why? Who will benefit from the answer?

Then we look for data that can answer the questions and check it against the data quality dimensions. Is my data relevant to the questions I want to explore? Is the quality of the data acceptable? The root causes of data quality problems often lie in the data collection process. If the data quality is too poor, the results can be misleading and lead us to make the wrong decision.

  • accuracy -> does the data reflect reality?
  • completeness -> are there any missing values? Is the number of missing values low or high?
  • consistency -> does the same information match wherever it is stored?
  • timeliness -> is the information available when it is needed?
  • validity -> is the data collected in the same format, and does it follow the business rules?
  • uniqueness -> are there any duplicates?
https://www.precisely.com/blog/data-quality/data-quality-dimensions-measure
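A few of these checks can be sketched in plain Python. This is only a minimal illustration, not a full data quality tool: the tracking records, the field names (`id`, `timestamp`, `speed_kmh`), the expected timestamp format, and the "speed must not be negative" business rule are all assumptions made up for the example.

```python
from datetime import datetime

# Hypothetical tracking records; every field name and value is invented for illustration.
records = [
    {"id": 1, "timestamp": "2022-06-01 10:00", "speed_kmh": 12.5},
    {"id": 2, "timestamp": "2022-06-01 10:05", "speed_kmh": None},  # sensor dropout
    {"id": 2, "timestamp": "2022-06-01 10:05", "speed_kmh": None},  # duplicate row
    {"id": 3, "timestamp": "01/06/2022 10:10", "speed_kmh": 14.0},  # different date format
    {"id": 4, "timestamp": "2022-06-01 10:15", "speed_kmh": -3.0},  # breaks the assumed rule
]


def is_valid_timestamp(text, fmt="%Y-%m-%d %H:%M"):
    """Validity: does the value follow the one format we expect?"""
    try:
        datetime.strptime(text, fmt)
        return True
    except (TypeError, ValueError):
        return False


def quality_report(rows):
    """Count simple completeness, uniqueness, and validity problems."""
    total = len(rows)

    # Completeness: how many speed readings are missing?
    missing = sum(1 for r in rows if r["speed_kmh"] is None)

    # Uniqueness: how many rows repeat an earlier row exactly?
    seen = set()
    duplicates = 0
    for r in rows:
        key = (r["id"], r["timestamp"], r["speed_kmh"])
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Validity: format check plus the assumed business rule (speed >= 0).
    bad_timestamp = sum(1 for r in rows if not is_valid_timestamp(r["timestamp"]))
    negative_speed = sum(
        1 for r in rows if r["speed_kmh"] is not None and r["speed_kmh"] < 0
    )

    return {
        "rows": total,
        "missing_speed": missing,
        "completeness": round((total - missing) / total, 2),
        "duplicate_rows": duplicates,
        "bad_timestamp": bad_timestamp,
        "negative_speed": negative_speed,
    }


report = quality_report(records)
for name, value in report.items():
    print(f"{name}: {value}")
```

Accuracy and timeliness are left out on purpose, because they usually need something outside the dataset to compare against, such as a trusted reference or the time the data was actually needed.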

In reality, data is not perfect and may have some issues. For example, when we use tracking data, the sensor may sometimes stop working, which leaves gaps in the tracking.

If the data quality is good or acceptable, we clean the data for the next step, analysis.

For the analysis step, many kinds of analysis can be used to answer our questions, including statistical models and machine learning. For example, if I want to find the factors that have an impact on something, I can use regression, random forest, or other models.

Examples of models that can be used:

  • Classification (yes/no or several categories) -> random forest or logistic regression
  • Prediction (continuous dependent variable) -> linear regression
  • Grouping (segmentation) -> clustering based on similarity
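To make the prediction case concrete, here is a minimal sketch of simple linear regression with one explanatory variable, fitted by ordinary least squares using only the Python standard library. The variable names and the numbers (`hours_tracked`, `distance_km`) are made up for illustration; in practice a statistics or machine learning library would normally be used instead of hand-written formulas.

```python
# Hypothetical data: x is the explanatory variable, y is the continuous outcome.
hours_tracked = [1, 2, 3, 4, 5]
distance_km = [2.1, 4.0, 6.2, 7.9, 10.1]


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must contain at least two different values")

    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new, slope, intercept):
    """Predict the outcome for a new value of the explanatory variable."""
    return slope * x_new + intercept


slope, intercept = fit_simple_linear_regression(hours_tracked, distance_km)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted value at x=6: {predict(6, slope, intercept):.2f}")
```

The slope is the part that speaks to the "which factors have an impact" question: it estimates how much the outcome changes when the explanatory variable increases by one unit, under the assumption that the relationship is roughly linear.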

For visualisation, the visuals should relate to the questions and support our analysis.

  • line chart to show a trend over time
  • bar chart to show the count in different groups or how a value differs between groups
  • scatter plot to show the correlation between two variables
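Behind each chart there is a small calculation. The sketch below does not draw anything; it only prepares the numbers that a bar chart and a scatter plot would show, using the Python standard library. The group labels and the two variables are invented for the example, and the correlation here is the Pearson correlation coefficient, which is one common choice and only captures linear relationships.

```python
from collections import Counter
from math import sqrt

# Hypothetical data for illustration only.
groups = ["A", "B", "A", "C", "A", "B"]
variable_x = [1, 2, 3, 4, 5]
variable_y = [2, 4, 5, 4, 5]

# Bar chart: the height of each bar is the count of records in that group.
group_counts = Counter(groups)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / sqrt(s_xx * s_yy)


# Scatter plot: each (x, y) pair is one point; r summarises how linear the cloud is.
r = pearson_correlation(variable_x, variable_y)

for group, count in sorted(group_counts.items()):
    print(f"{group}: {'#' * count} ({count})")
print(f"correlation between x and y: {r:.2f}")
```

The printed `#` bars are only a rough text stand-in for a real bar chart; a plotting library would take the same counts and points as input.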

I also look for more beautiful visuals on this website.

For insight, we bring the analysis and visualisation together to summarise the findings and provide insights that answer the questions, along with the reasons why the results turned out this way.
