The Art of Data Science

Digital transformation remains a hot topic, and data is its raw material. The pandemic has accelerated the adoption of digital technologies.

CIOL Bureau

Digital transformation remains a hot topic, and data is its raw material. The pandemic has accelerated the adoption of digital technologies by several years. ‘About 80% of the customer interactions are now digital,’ according to a survey by McKinsey & Company. Experimentation with and investment in digital technologies have played a key role in helping companies navigate the crisis successfully.


Investment in data science and related technologies has had an impact across sectors. In the public sector, the benefits include improvements in citizen wellness, reductions in public spending, and increases in employment. In the private sector, they include revenue growth, greater consumer engagement, and capital reduction.

The primary goal of data science and analytics is to solve real-world problems using data. The science of analytics assumes that the data used in the process has certain characteristics that support the development of models for decision-making and forecasting. The problem is that these abstract assumptions about data characteristics are never exactly true, and in many cases they are incorrect.

Collecting more data to solve a problem can make things worse. More data is not always better, because some data are unrelated to the problem or worthless for solving it. For instance, a bank trying to develop an algorithm to automatically detect fraudulent credit card transactions may have approximately 10 million observations in a month, of which only 1% are fraudulent.

So the effective number of data points for the event of interest is only about 100,000 (1% of 10 million), not 10 million, and the fraudulent cases are heavily outnumbered by legitimate ones. Such an imbalance often makes it hard to develop a reliable algorithm. This is known as the rare event problem, and it is difficult to manage.
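One way to see why rare events are hard to manage is to look at what plain accuracy rewards. The sketch below is only an illustration: it assumes a hypothetical sample of 10,000 transactions with the 1% fraud rate mentioned above, and a trivial "model" that never flags fraud. The function name and the sample size are assumptions, not something taken from a real bank's system.

```python
def evaluate_never_fraud(n_total=10_000, fraud_rate=0.01):
    """Score a baseline that labels every transaction as legitimate."""
    n_fraud = round(n_total * fraud_rate)
    # 1 = fraudulent, 0 = legitimate (hypothetical labels)
    labels = [1] * n_fraud + [0] * (n_total - n_fraud)
    predictions = [0] * n_total  # the baseline never predicts fraud

    correct = sum(p == y for p, y in zip(predictions, labels))
    caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))

    accuracy = correct / n_total   # share of all transactions labelled correctly
    recall = caught / n_fraud      # share of fraudulent transactions detected
    return n_fraud, accuracy, recall


n_fraud, accuracy, recall = evaluate_never_fraud()
print(f"fraud cases: {n_fraud}, accuracy: {accuracy:.2%}, recall: {recall:.2%}")
```

The baseline looks excellent on accuracy while detecting no fraud at all, which is why measures such as recall on the rare class tend to be more informative for this kind of problem.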

The art of analytics involves selecting and applying the best available models for decision-making and forecasting. Data pre-processing, that is, the selection and preparation of data, is both very important and the most difficult part of the work. It involves identifying outliers, estimating missing values, and identifying meaningful attributes in the existing data. This whole exercise of pre-processing the data takes up about 80%-90% of the time and effort.
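As a rough illustration of two of the pre-processing tasks mentioned above, the following sketch fills missing values with the median and flags outliers with the common 1.5 × IQR rule. Both choices are assumptions made for the example, and the transaction amounts are invented; real projects often need domain-specific rules instead.

```python
import statistics


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def find_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical transaction amounts with two missing entries and one extreme value.
amounts = [12.0, 15.0, None, 14.0, 13.0, 500.0, None, 16.0]
filled = impute_missing(amounts)
outliers = find_outliers(filled)
print(filled)
print(outliers)
```

The median is used here rather than the mean because a single extreme value, such as the 500.0 above, would otherwise distort the imputed figures.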

The Data Mining process involves the following steps:

  1. Sample – gathering data;
  2. Explore – looking for data issues and relationships among the attributes;
  3. Modify – transforming data, creating new attributes, and estimating the missing values; partitioning data into training and assessment sets;
  4. Model – predictions or classifications;
  5. Assess – selecting the best model based on its performance on the assessment partition.
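The steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the data are synthetic, the 70/30 split is arbitrary, and the two candidate models (a constant mean and a straight line) and all function names are hypothetical choices made for illustration.

```python
import random


def partition(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and assessment sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_mean(train):
    """Candidate 1: always predict the training mean of y."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Candidate 2: least-squares straight line y = alpha + beta * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return lambda x: alpha + beta * x


def mse(model, rows):
    """Mean squared error of a model on a set of (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# Sample: synthetic (x, y) observations, roughly y = 3 + 2x plus noise.
rng = random.Random(42)
rows = [(x, 3.0 + 2.0 * x + rng.gauss(0, 1)) for x in range(50)]

# Modify: partition into training and assessment sets.
train, assess = partition(rows)

# Model: fit each candidate on the training partition only.
candidates = {"mean": fit_mean(train), "line": fit_line(train)}

# Assess: pick the candidate with the lowest error on the assessment partition.
scores = {name: mse(model, assess) for name, model in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Keeping the assessment partition out of the fitting step is the point of the exercise: the comparison then reflects how each model handles data it has not seen.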

Regression analysis is a well-known statistical tool that helps evaluate the relationship between a dependent variable and one or more independent variables. The major outputs to look at are R-squared, the intercept, and the beta coefficients. R-squared is the proportion of the variation in the dependent variable that the model explains. Cluster analysis, decision trees, and random forests are some of the other techniques used for prediction and segmentation.
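For a single independent variable, the three outputs can be computed directly from the ordinary least squares formulas. The sketch below uses a small invented data set; the function name and the numbers are assumptions for illustration only.

```python
def simple_linear_regression(xs, ys):
    """Fit y = intercept + beta * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    beta = sxy / sxx                   # slope (beta coefficient)
    intercept = mean_y - beta * mean_x

    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return intercept, beta, r_squared


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, beta, r_squared = simple_linear_regression(xs, ys)
print(f"intercept={intercept:.2f}, beta={beta:.2f}, R-squared={r_squared:.2f}")
```

For these five points the fitted line is y = 2.2 + 0.6x with an R-squared of 0.6, meaning the line accounts for 60% of the variation in y.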

Data analytics, big data, artificial intelligence, and data science are trending keywords today. Big data and data analytics are becoming an inherent part of every enterprise, regardless of industry. Some of the latest trends in data science are:

  • Big Data on Cloud – A lot of enterprises have moved their big data to cloud platforms, for storage, processing, and distribution;
  • Data as a Service – Data exchange in the marketplace for analytics and insights;
  • Augmented Analytics – AI, machine learning, and natural language processing are used to automate the analysis of huge volumes of data;
  • Hyper-Automation – combining automation with AI, machine learning, and smart business processes can unlock a higher level of digital transformation;
  • Quantum Computing – integrating and comparing data sets for faster analysis, which helps in understanding relationships between two or more models.

Organizations should understand what data science is and what it isn’t. It improves your decision-making incrementally, but you are unlikely to see returns immediately. It is not a silver bullet, but if done correctly, it will improve your business outcomes. Before pursuing data science further, audit your data fluency and evaluate where your company stands.

Data science depends on quality data, so you should devote resources to improving your data strategy and data quality. If you are new to data science, identify a small project and give it a try using internal resources or an outside consultant. This is a great way to expose your organization to the concept and get a quick win without having to hire an entire team. Create a data-driven culture; it will help set your company up for growth going forward.

Mr. Krishnan Jayaraman
