Predicting Consumer Choices with Python
Predicting consumer choice is one of my favorite topics, because choice is part of everyday life and fundamental to marketing data science. We choose jobs, clubs, friends, diet and exercise, everything from breakfast cereal to the people we spend time with; these are the vicissitudes of choice. And many of the choices we make are known to others, a record of our lives stored away in corporate databases.
To practice choice modeling, I will use data from the Sydney Transportation Study. Commuters in Sydney travel to the city by automobile or by train. Because the response is binary (automobile or train), we can use logistic regression. The explanatory variables in the data are the time and cost of travel by automobile and by train.
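Because the analytical file is password-protected, here is a minimal sketch that builds a synthetic stand-in with the same shape as the study: 333 commuters, four explanatory variables and a string response. The column names `cartime`, `carcost`, `traintime`, `traincost` and `choice`, the value ranges, and the `CAR`/`TRAIN` labels are all assumptions for illustration. The generated numbers are not the Sydney data.

```python
import numpy as np
import pandas as pd


def make_synthetic_commuters(n=333, seed=0):
    """Build a synthetic stand-in for the Sydney commuter data.

    Column names, units and value ranges are assumptions for illustration;
    they are not taken from the real analytical file.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "cartime": rng.uniform(10, 80, n),    # assumed: minutes by automobile
        "carcost": rng.uniform(1, 12, n),     # assumed: dollars by automobile
        "traintime": rng.uniform(15, 90, n),  # assumed: minutes by train
        "traincost": rng.uniform(1, 8, n),    # assumed: dollars by train
    })
    # Assumed data-generating process: the train becomes more attractive
    # as the automobile gets slower or more expensive relative to the train.
    utility = (0.05 * (df["cartime"] - df["traintime"])
               + 0.3 * (df["carcost"] - df["traincost"]))
    p_train = 1.0 / (1.0 + np.exp(-utility))
    df["choice"] = np.where(rng.uniform(size=n) < p_train, "TRAIN", "CAR")
    return df


commuters = make_synthetic_commuters()
print(commuters.head())
print(commuters["choice"].value_counts())
```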
As the data show, we already know the actual choices of these 333 commuters. Now it’s time to build a model and see how well it works on the training data. We’ll use a linear combination of the four explanatory variables to predict consumer choice, and at the end we’ll add code to evaluate the predictive accuracy of the model.
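Written out, a logistic regression on a linear combination of the four variables models the log-odds of taking the train for commuter i. In this sketch y_i = 1 means the train and y_i = 0 means the automobile, the variable names are assumed labels, and β0 through β4 are coefficients estimated from the data:

$$
\log\frac{P(y_i = 1)}{1 - P(y_i = 1)} = \beta_0 + \beta_1\,\mathit{cartime}_i + \beta_2\,\mathit{carcost}_i + \beta_3\,\mathit{traintime}_i + \beta_4\,\mathit{traincost}_i
$$

Equivalently, the predicted probability is the inverse logit of the linear predictor:

$$
P(y_i = 1) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_1\,\mathit{cartime}_i + \beta_2\,\mathit{carcost}_i + \beta_3\,\mathit{traintime}_i + \beta_4\,\mathit{traincost}_i\right)\right]}
$$

Because the logit of 0.5 is 0, a 0.5 probability cut-off is the same as predicting the train whenever the linear predictor is positive.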
First, do the usual setup, and then convert the string response into a binary integer. The next step is to build the design matrix for the linear predictor. After that, we can finally run the logistic regression. Don’t forget to create a function that converts the predicted probabilities back into choice predictions.
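The fitting steps can be sketched with NumPy alone. This is not necessarily how the original analysis was run (a statistics package such as statsmodels would be the usual choice). Here, logistic regression is fitted by maximum likelihood using Newton-Raphson, also known as iteratively reweighted least squares. The helper names `choice_to_binary`, `design_matrix`, `predict_proba` and `fit_logistic`, along with the `TRAIN`/`CAR` labels, are hypothetical, and the usage example runs on synthetic data rather than the Sydney file.

```python
import numpy as np


def choice_to_binary(choice):
    """Convert the string response to an integer: 1 for "TRAIN", 0 otherwise (labels assumed)."""
    return (np.asarray(choice) == "TRAIN").astype(int)


def design_matrix(*columns):
    """Stack the explanatory variables as columns and prepend an intercept column of ones."""
    X = np.column_stack(columns).astype(float)
    return np.column_stack([np.ones(X.shape[0]), X])


def predict_proba(X, beta):
    """Inverse logit of the linear predictor X @ beta: predicted probability of the train."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = predict_proba(X, beta)
        score = X.T @ (y - p)                                # gradient of the log-likelihood
        information = X.T @ (X * (p * (1.0 - p))[:, None])   # negative Hessian, X' W X
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Usage on synthetic stand-in data (hypothetical values, not the Sydney data).
rng = np.random.default_rng(42)
n = 333
cartime, traintime = rng.uniform(10, 80, n), rng.uniform(15, 90, n)
carcost, traincost = rng.uniform(1, 12, n), rng.uniform(1, 8, n)
utility = 0.05 * (cartime - traintime) + 0.3 * (carcost - traincost)
choice = np.where(rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-utility)), "TRAIN", "CAR")

y = choice_to_binary(choice)
X = design_matrix(cartime, carcost, traintime, traincost)
beta_hat = fit_logistic(X, y)
print(np.round(beta_hat, 3))  # intercept, cartime, carcost, traintime, traincost
```

At the maximum-likelihood estimate the score X'(y - p) is zero, which is a quick way to check that the fit converged.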
To obtain an automobile or train prediction for each commuter, we have to set a cut-off for the predicted probability. Let’s classify commuters with a 0.5 cut-off: if the predicted probability of taking the train is greater than 0.5, we predict that the person will take the train; otherwise we predict the automobile.
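The cut-off rule can be sketched as a small function. The name `classify`, the `TRAIN`/`CAR` labels, and the probabilities in the example are hypothetical and chosen for illustration. Note the strict "greater than", matching the rule above, so a probability of exactly 0.5 is classified as the automobile.

```python
import numpy as np


def classify(p_train, cutoff=0.5):
    """Predict "TRAIN" when P(train) exceeds the cutoff, otherwise "CAR"."""
    return np.where(np.asarray(p_train) > cutoff, "TRAIN", "CAR")


probs = np.array([0.12, 0.49, 0.50, 0.51, 0.93])  # illustrative predicted probabilities
print(classify(probs))              # ['CAR' 'CAR' 'CAR' 'TRAIN' 'TRAIN']
print(classify(probs, cutoff=0.3))  # ['CAR' 'TRAIN' 'TRAIN' 'TRAIN' 'TRAIN']
```

Lowering the cut-off predicts the train for more commuters, so the cut-off trades errors on one mode against errors on the other.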
Now it’s time to see how accurate our predictive model is. Running the code produces a four-fold (2 × 2) table. This confusion matrix shows that the model correctly predicted commute choice 82.6% of the time.
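A four-fold table and its accuracy can be sketched as follows. Rows are actual choices and columns are predicted choices. Accuracy is the sum of the diagonal (correct predictions) divided by the total count. The function names and the short label lists are hypothetical and for illustration only. As a worked check, if the 82.6% figure was computed over all 333 commuters, the diagonal of the table would sum to 275, since 275 / 333 ≈ 0.826.

```python
import numpy as np


def confusion_matrix(actual, predicted, labels=("CAR", "TRAIN")):
    """Four-fold (2 x 2) table: rows are actual choices, columns are predicted choices."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    table = np.zeros((len(labels), len(labels)), dtype=int)
    for i, a in enumerate(labels):
        for j, p in enumerate(labels):
            table[i, j] = np.sum((actual == a) & (predicted == p))
    return table


def accuracy(table):
    """Share of cases on the diagonal, i.e. correctly predicted choices."""
    return np.trace(table) / table.sum()


actual = ["CAR", "CAR", "TRAIN", "TRAIN", "TRAIN"]     # illustrative labels only
predicted = ["CAR", "TRAIN", "TRAIN", "TRAIN", "CAR"]
table = confusion_matrix(actual, predicted)
print(table)            # [[1 1]
                        #  [1 2]]
print(accuracy(table))  # 0.6
```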
If you are interested in playing with the Sydney data set, feel free to message me for the password to the analytical file. If you have other interesting predictive methods, feel free to share them with me.