Last month, I was finishing a report for one of our customers. The analysis included drivers of liking, which measures the impact of specific product attributes on the overall liking score. The fieldwork was run for two products, each with a different group of respondents.
For one of the products, the results didn’t match the customer’s expectations. Some highly scored attributes that weren’t expected to influence liking turned out to have a large impact. To check the validity of the results, I did a deep dive into the regression calculations. I varied some of the assumptions (parameters) and also modelled subsets of the data. We were able to confirm the earlier results and to highlight some dependencies that affected the overall product liking score.
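For readers who want to see what such a check can look like, here is a minimal sketch in Python. It is not the analysis from the actual project: the data is synthetic, and the attribute names and the helper `driver_coefficients` are hypothetical. It assumes a plain least-squares regression of liking on standardised attribute scores, and refits the model on two subsets to see whether the ranking of the drivers holds.

```python
import numpy as np

def driver_coefficients(attributes, liking):
    """Standardised least-squares coefficients of liking on attribute scores."""
    # Standardise so coefficients are comparable across attributes.
    X = (attributes - attributes.mean(axis=0)) / attributes.std(axis=0)
    y = (liking - liking.mean()) / liking.std()
    # Add an intercept column, then solve the least-squares problem.
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]  # drop the intercept

# Synthetic example data (an assumption, not real survey results).
rng = np.random.default_rng(0)
n = 200
names = ["sweetness", "crunchiness", "colour"]  # hypothetical attributes
attributes = rng.integers(1, 10, size=(n, 3)).astype(float)  # 1-9 scale
# Liking is built to depend mostly on the first attribute.
liking = 0.6 * attributes[:, 0] + 0.1 * attributes[:, 1] + rng.normal(0, 1, n)

full = driver_coefficients(attributes, liking)
first_half = driver_coefficients(attributes[: n // 2], liking[: n // 2])
second_half = driver_coefficients(attributes[n // 2 :], liking[n // 2 :])

for label, coefs in [("all", full), ("subset 1", first_half), ("subset 2", second_half)]:
    print(label, dict(zip(names, np.round(coefs, 2))))
```

If the same attributes come out on top in the full sample and in the subsets, that supports the original result. Large swings between subsets would suggest the drivers depend on who was asked.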
Market research work has dried up since then because of the Covid-19 lockdowns. This gives me an opportunity to train in other areas of interest. One of these is Machine Learning (ML), and I am working on a project in this area.
Working on this, I realized how much overlap there is between the data analysis I have been doing for market research projects and the data science behind ML. In product testing, the analysis results in an opinion about a product and its attributes. In ML, we want to use data to predict what the opinion will be or what will happen.
In both cases, however, using the correct data sets is key to reaching correct results. It is not only about large data sets. You need to consider the quality, scope and quantity of the data. All data used needs to be meaningful, relevant to the project and of sufficient size to give a valid result.