Data Science and the question of looking into the future. How can data be used to make predictions and what are the possible applications?
"We have to hurry up!" Angela Merkel used these words in the spring of 2016 in connection with big data, while also describing data as the raw material of the 21st century. Unlike other raw materials, however, data is not being used up: according to a study by IDC (International Data Corporation), the amount of data doubles every two years, which means it is growing exponentially. The current coronavirus crisis could accelerate digitization even further. It is therefore high time to get to grips with data science so as not to fall behind.
Depending on their industry and size, companies ask themselves specific questions:
- What analyses can I perform with my data?
- How can I bundle and prepare my existing data, which may be stored in various databases at different locations?
- How can I access external data?
- How can I combine my internal data with external data?
- How do I manage to implement this complex topic with scarce resources?
- How can I build up know-how in this area?
- Which evaluation methods are suitable?
Mapping the current state or looking into the future?
When using your data, you first need to distinguish between describing the current state and making a prediction (predictive analytics). Your existing data can describe the current state, for example by summarizing which customers cancelled their contracts last year. A prediction adds a look into the future: Which customers are likely to churn next year? Which factors drive this? Answering these questions requires statistical models, for example from the field of machine learning.
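To make the difference between description and prediction concrete, here is a minimal Python sketch. It assumes hypothetical customer records with a single feature, the number of support complaints; the record layout, the function names `churned_in`, `fit_logistic` and `churn_probability`, and the training parameters are all illustrative assumptions. The descriptive part simply lists who cancelled in a given year, while the predictive part fits a one-feature logistic regression by gradient ascent and estimates a churn probability for a customer.

```python
import math
from typing import List, Tuple

# Hypothetical records: (customer_id, year, support_complaints, churned)
RECORDS: List[Tuple[str, int, int, bool]] = [
    ("c1", 2020, 0, False), ("c2", 2020, 0, False), ("c3", 2020, 1, False),
    ("c4", 2020, 1, True), ("c5", 2020, 2, False), ("c6", 2020, 3, True),
    ("c7", 2020, 4, True), ("c8", 2020, 5, True),
]

def churned_in(records, year):
    """Descriptive: which customers cancelled in the given year?"""
    return [cid for cid, y, _, churned in records if y == year and churned]

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Predictive: one-feature logistic regression, fitted by batch gradient
    ascent on the average log-likelihood (a toy stand-in for a real library)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (y - p) * x
            grad_b += y - p
        w += lr * grad_w / n
        b += lr * grad_b / n
    return w, b

def churn_probability(w, b, complaints):
    """Estimated probability that a customer with this many complaints churns."""
    return 1.0 / (1.0 + math.exp(-(w * complaints + b)))

print(churned_in(RECORDS, 2020))  # ['c4', 'c6', 'c7', 'c8']
xs = [c for _, _, c, _ in RECORDS]
ys = [1.0 if ch else 0.0 for _, _, _, ch in RECORDS]
w, b = fit_logistic(xs, ys)
print(f"P(churn | 0 complaints) = {churn_probability(w, b, 0):.2f}")
print(f"P(churn | 5 complaints) = {churn_probability(w, b, 5):.2f}")
```

A real churn model would use many features, a proper train/test split and an established library; the single feature here only keeps the mechanics visible.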
Challenge of different data sources and data types
A particularly interesting challenge in many predictions is that different data sources (internal versus external) and data types (images, text, numbers, sound) often have to be combined into a single multi-source estimate. In addition, the data was often not created with analysis in mind and must therefore be prepared first. This is typically the case with images or texts from social media, for example.
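The following Python sketch illustrates both steps on a small scale: preparing raw social media text and joining it with internal data. The dictionaries `INTERNAL` and `EXTERNAL`, their field names, and the assumption that each post already carries a `customer_id` are hypothetical; in practice, matching external data to internal records is often the hardest part. The join is an inner join, so posts that cannot be matched are dropped.

```python
import re
from typing import Dict, List

# Hypothetical internal data: CRM records keyed by customer ID.
INTERNAL: Dict[str, dict] = {
    "c1": {"name": "Anna", "segment": "premium"},
    "c2": {"name": "Ben", "segment": "basic"},
}

# Hypothetical external data: social media posts already linked to a customer ID.
EXTERNAL: List[dict] = [
    {"customer_id": "c1", "text": "LOVE the new app!!! https://example.com/app"},
    {"customer_id": "c2", "text": "Support   never answers... #fail"},
    {"customer_id": "c9", "text": "Who is this company?"},
]

URL_RE = re.compile(r"https?://\S+")
NON_WORD_RE = re.compile(r"[^a-z0-9#\s]")  # keeps letters, digits, hashtags, whitespace

def clean_text(text: str) -> List[str]:
    """Prepare raw post text: drop URLs, lowercase, strip punctuation, tokenize."""
    text = URL_RE.sub(" ", text).lower()
    text = NON_WORD_RE.sub(" ", text)
    return text.split()

def combine(internal: Dict[str, dict], external: List[dict]) -> List[dict]:
    """Inner join of external posts with internal records on customer_id."""
    rows = []
    for post in external:
        record = internal.get(post["customer_id"])
        if record is None:
            continue  # unmatched external data; in practice, log or handle separately
        rows.append({**record, "customer_id": post["customer_id"],
                     "tokens": clean_text(post["text"])})
    return rows

for row in combine(INTERNAL, EXTERNAL):
    print(row)
```

Note that this simple character filter also strips non-ASCII letters such as umlauts; real text preparation would need language-aware tokenization.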
Predictions by means of statistical models
For a prediction, once suitable data have been selected and processed, the next step is to apply statistical models. It is important to select suitable models and to avoid common errors such as overfitting, where a model fits the training data too closely and its predictive power on new data therefore decreases. The best model depends strongly on the problem. Broadly, three types of problems can be distinguished:
- Unsupervised learning: learning without knowing the target in advance. A typical application is clustering, such as customer segmentation. Well-known methods: k-means, neural networks, hidden Markov models, Gaussian mixture models.
- Supervised learning: learning where the target is known in advance. Typical applications are classification (such as predicting whether a customer will cancel, or sentiment analysis) and regression (predicting a numeric value). Well-known methods: regression models (linear models, generalized linear models, logistic regression), tree-based methods (random forest, XGBoost), support vector machines, and neural networks/deep learning.
- Reinforcement learning: autonomous learning through rewards. Typical applications can be found, for example, in game-playing AI. Well-known methods: Monte Carlo methods and temporal difference learning (such as deep Q-learning).
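Overfitting can be made visible by evaluating a model on held-out data it has not seen during training. The deliberately extreme Python sketch below assumes hypothetical data whose true relationship is y = 2x, with training labels disturbed by alternating noise of plus or minus 0.5. It compares a simple least-squares line with a degree-5 polynomial that passes exactly through all six training points (Lagrange interpolation), i.e. a model that memorizes the noise. The helper names `fit_line`, `fit_interpolating_polynomial` and `mse` are illustrative.

```python
from typing import Callable, List

Model = Callable[[float], float]

def fit_line(xs: List[float], ys: List[float]) -> Model:
    """Ordinary least squares for y = intercept + slope * x (a simple model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_interpolating_polynomial(xs: List[float], ys: List[float]) -> Model:
    """Lagrange polynomial through every training point: zero training error."""
    def predict(x: float) -> float:
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict

def mse(model: Model, xs: List[float], ys: List[float]) -> float:
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: true relationship y = 2x, training labels with +/-0.5 noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2 * x + (0.5 if int(x) % 2 == 0 else -0.5) for x in train_x]
test_x = [0.5, 2.5, 4.5, 6.0]  # held-out points, one just outside the training range
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)
for name, model in [("line", line), ("degree-5 polynomial", poly)]:
    print(f"{name}: train MSE {mse(model, train_x, train_y):.3f}, "
          f"test MSE {mse(model, test_x, test_y):.3f}")
```

The polynomial wins on the training data but loses badly on the held-out points, especially at x = 6, where it predicts -19.5 instead of 12. This gap between training and test error is the typical symptom of overfitting.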
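For unsupervised learning, customer segmentation with k-means can be sketched in plain Python as follows. The customer points (for example, visits per month and average basket value) are hypothetical, and instead of random initialization the sketch uses a deterministic farthest-first choice of starting centroids so that the result is reproducible. The algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points until the assignments stop changing.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points: List[Point], k: int) -> List[Point]:
    """Deterministic initialization: start with the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def k_means(points: List[Point], k: int, max_iter: int = 100):
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers: (visits per month, average basket value)
customers = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = k_means(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The number of clusters k has to be chosen in advance; with real data, features usually also need to be scaled so that no single feature dominates the distance.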
Possible applications in online marketing
In online marketing in particular, the first step is often to map the current state and look at past data. On the one hand, companies want to know how their brand is perceived on the internet, how users rate their products on social media, or whether their social engagement is picked up by the press. On the other hand, they want to know how last month's online marketing activities performed: What reach did they achieve, and how did users react?
This is where big data comes into play. All relevant data is collected in a database, processed, and visualized in an online marketing report. To answer these questions with confidence, companies need to tap into the relevant online data, process it, and analyze it.
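The processing step behind such a report is often a simple aggregation of raw tracking events. The Python sketch below assumes hypothetical events of the form (campaign, user_id, event) and uses illustrative KPI definitions: reach is the number of unique users with an impression, reactions are counted as clicks, and the conversion rate is unique purchasers divided by reach. Real reports may define these metrics differently, and the aggregation would typically run in the database itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical raw tracking events: (campaign, user_id, event)
EVENTS: List[Tuple[str, str, str]] = [
    ("newsletter", "u1", "impression"), ("newsletter", "u2", "impression"),
    ("newsletter", "u1", "click"), ("newsletter", "u1", "purchase"),
    ("social_ad", "u3", "impression"), ("social_ad", "u3", "impression"),
    ("social_ad", "u4", "impression"), ("social_ad", "u4", "click"),
]

def campaign_report(events: List[Tuple[str, str, str]]) -> Dict[str, dict]:
    """Aggregate raw events into per-campaign KPIs (illustrative definitions)."""
    reached = defaultdict(set)   # campaign -> unique users with an impression
    clicks = defaultdict(int)    # campaign -> number of clicks (reactions)
    buyers = defaultdict(set)    # campaign -> unique purchasing users
    for campaign, user, event in events:
        if event == "impression":
            reached[campaign].add(user)
        elif event == "click":
            clicks[campaign] += 1
        elif event == "purchase":
            buyers[campaign].add(user)
    report = {}
    for campaign, users in reached.items():  # campaigns without impressions are skipped
        report[campaign] = {
            "reach": len(users),
            "clicks": clicks[campaign],
            "conversion_rate": len(buyers[campaign]) / len(users),
        }
    return report

for campaign, kpis in sorted(campaign_report(EVENTS).items()):
    print(campaign, kpis)
```

The resulting dictionary is the kind of table that would then be fed into the visual report.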
Ideally, however, online marketing includes not only a mapping of the current state but also predictions. Here, the machine learning methods mentioned above can offer a decisive advantage: campaigns can be planned more efficiently, ads can be personalized, and SEO measures can be optimized. The resulting increase in automation can reduce online marketing costs in the long term and raise the conversion rate. Predictive analytics also helps the company work more efficiently and increase sales over time, for example by using data from users' previous online purchases or by comparing their preferences with those of other users.
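Comparing a user's preferences with those of other users is the idea behind collaborative filtering. The Python sketch below shows one common variant, user-based collaborative filtering with cosine similarity; it is an illustration, not necessarily the approach any particular shop uses. The `RATINGS` dictionary and product names are hypothetical. Each product the target user has not yet rated is scored by the ratings of other users, weighted by how similar those users are to the target user.

```python
import math
from collections import defaultdict
from typing import Dict, List

# Hypothetical preference data: user -> {product: rating}
RATINGS: Dict[str, Dict[str, float]] = {
    "alice": {"shoes": 5, "jacket": 4, "hat": 1},
    "bob":   {"shoes": 4, "jacket": 5, "scarf": 4},
    "carol": {"hat": 5, "gloves": 4},
}

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user: str, ratings: Dict[str, Dict[str, float]], top_n: int = 3) -> List[str]:
    """Score unseen products by similarity-weighted ratings of other users."""
    scores = defaultdict(float)
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        if sim <= 0:
            continue  # ignore users with no overlapping preferences
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] += sim * rating
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

print(recommend("alice", RATINGS))  # ['scarf', 'gloves']
```

Alice's ratings are much closer to Bob's than to Carol's, so Bob's scarf ranks above Carol's gloves.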
Visualization and communication
However, success depends not only on the choice of data and the appropriate model, but also on targeted communication with the customer and on clear visualization. Only then can you ensure that the right conclusions are drawn.