Chris is a business analyst who likes to practice data modeling in her free time.

She particularly enjoys building analytical models to achieve marketing objectives, for example clustering models for automatic segmentation, propensity models for customer lifetime value prediction, and attribution models for channel evaluation.

This is a blog for Chris to practice her analytical skills and connect with like-minded people.

Predicting Consumer Choices with Python

Predicting consumer choice is one of my favorite topics, because choice is part of everyday life and fundamental to marketing data science. We choose jobs, clubs, friends, diet and exercise, everything from breakfast cereal to who we spend time with. These are the vicissitudes of choice. And many of the choices we make are known to others, a record of our lives stored away in corporate databases.

Screen Shot 2019-02-13 at 12.01.51 AM.png

To practice the choice method, I will use data from the Sydney Transportation Study. Commuters in Sydney travel to the city by automobile or train. The response is binary (automobile or train), so we can use logistic regression. The data are shown below; the explanatory variables are the time and cost of travel by automobile and by train.

As we can see from the data, we already know the actual choices of these 333 commuters. Now it’s time to build a model and see how well it reproduces those choices on the training data. Today we’ll use a linear combination of the four explanatory variables to predict consumer choice. We will also add some code at the end to evaluate the predictive accuracy of our model.
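Written out, a logistic regression on four predictors has the standard form below. The symbols are illustrative shorthand for the four time and cost variables, not column names taken from the data file, and coding train as the "success" outcome is an assumption.

```latex
\eta_i = \beta_0
       + \beta_1\,\mathrm{cartime}_i
       + \beta_2\,\mathrm{carcost}_i
       + \beta_3\,\mathrm{traintime}_i
       + \beta_4\,\mathrm{traincost}_i,
\qquad
\Pr(\mathrm{train}_i) = \frac{1}{1 + e^{-\eta_i}}
```

Here the linear predictor \eta_i is the "linear combination" of the explanatory variables, and the logistic function maps it to a probability between 0 and 1.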

Screen Shot 2019-02-13 at 12.18.09 AM.png

First, do the setup as usual, and then convert the choice variable from a string to a binary integer. The next step is to build the design matrix for the linear predictor. After that we can finally run the logistic regression. Don’t forget to create a function that converts the predicted probability back into a choice prediction.
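For readers who want to follow these steps without the screenshot, here is a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the rows are invented (they are not the Sydney data), the labels "TRAIN" and "CAR" and the column order are hypothetical, and the fit uses plain gradient ascent on standardized predictors rather than whatever routine the original code calls. In practice a statistics library would normally do the fitting.

```python
import math

# Invented rows for illustration only, NOT the Sydney data.
# Hypothetical column order: car time, car cost, train time, train cost, choice.
rows = [
    (70, 50, 64, 39, "TRAIN"), (50, 230, 60, 32, "TRAIN"),
    (50, 70, 63, 40, "CAR"),   (60, 108, 63, 48, "CAR"),
    (70, 60, 64, 48, "TRAIN"), (40, 40, 70, 50, "CAR"),
    (80, 90, 55, 30, "TRAIN"), (45, 55, 75, 60, "CAR"),
    (65, 120, 58, 35, "TRAIN"), (55, 45, 66, 45, "CAR"),
]

# Step 1: convert the string response to a binary integer (1 = train).
y = [1 if r[4] == "TRAIN" else 0 for r in rows]

# Step 2: design matrix = intercept column plus standardized predictors.
def standardize(col):
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return [(v - mean) / sd for v in col]

cols = [standardize([r[j] for r in rows]) for j in range(4)]
X = [[1.0] + [c[i] for c in cols] for i in range(len(rows))]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_predictor(beta, xi):
    return sum(b * x for b, x in zip(beta, xi))

def log_likelihood(beta, X, y):
    # Numerically stable form: sum of y*z - log(1 + exp(z)).
    return sum(yi * z - math.log1p(math.exp(z))
               for yi, z in ((yi, linear_predictor(beta, xi))
                             for xi, yi in zip(X, y)))

# Step 3: fit by gradient ascent on the mean log-likelihood.
def fit_logit(X, y, lr=0.5, n_iter=300):
    beta = [0.0] * len(X[0])
    n = len(y)
    for _ in range(n_iter):
        p = [sigmoid(linear_predictor(beta, xi)) for xi in X]
        grad = [sum((yi - pi) * xi[j] for xi, yi, pi in zip(X, y, p)) / n
                for j in range(len(beta))]
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta

beta = fit_logit(X, y)

# Predicted probability of taking the train for each row.
probs = [sigmoid(linear_predictor(beta, xi)) for xi in X]
```

Standardizing the predictors keeps the fixed step size stable; the fitted coefficients are therefore on the standardized scale, not in minutes or dollars.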

Screen Shot 2019-02-13 at 12.22.48 AM.png

To obtain an automobile or train prediction for each commuter, we’ll have to set a predicted-probability cut-off. Let’s go ahead and classify commuters with a 0.5 cut-off: if the predicted probability of taking the train is greater than 0.5, we predict that the person will take the train; otherwise we predict automobile.
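The cut-off rule can be sketched as a small helper. The function name `prob_to_choice`, the labels, and the example probabilities are all made up for illustration.

```python
def prob_to_choice(prob_train, cutoff=0.5):
    """Predict 'TRAIN' when the probability is strictly above the cut-off, else 'CAR'."""
    return "TRAIN" if prob_train > cutoff else "CAR"

# Invented probabilities, just to show the rule at work.
example_probs = [0.91, 0.50, 0.12, 0.67]
predicted = [prob_to_choice(p) for p in example_probs]
print(predicted)  # ['TRAIN', 'CAR', 'CAR', 'TRAIN']
```

Note that a probability of exactly 0.5 falls on the automobile side, because the rule says "greater than 0.5".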

Now it’s time to see how accurate our predictive model is. Run the code and we get a four-fold table. This confusion matrix shows that the model correctly predicted commute choice 82.6% of the time.
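A four-fold table and its accuracy can be computed along these lines. This is only a sketch: the actual and predicted labels below are invented, so the numbers it prints have nothing to do with the 82.6% reported for the Sydney data, and the helper names are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels=("CAR", "TRAIN")):
    """Rows are actual choices, columns are predicted choices, in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(actual, predicted):
    """Share of commuters whose predicted choice matches the actual choice."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Invented labels for illustration only.
actual = ["CAR", "CAR", "TRAIN", "TRAIN", "TRAIN", "CAR"]
predicted = ["CAR", "TRAIN", "TRAIN", "TRAIN", "CAR", "CAR"]

table = confusion_matrix(actual, predicted)
print(table)                                   # [[2, 1], [1, 2]]
print(round(accuracy(actual, predicted), 3))   # 0.667
```

The accuracy is simply the sum of the diagonal cells divided by the total number of commuters.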

Screen Shot 2019-02-12 at 11.43.41 PM.png

If you are interested in playing with the Sydney data set, feel free to message me for the password to the analytical file. If you have any other interesting predictive methods, feel free to share them with me.

Multiplicative Model: Trend and Seasonality with Excel

Calculating Customer Lifetime Value with Excel
