Predictive Modelling of Customer Membership Purchases to Minimize Marketing Costs
Author(s)
Liu, Ying
Advisor
Almaatouq, Abdullah
Abstract
This thesis develops and evaluates a series of predictive models to improve the efficiency of marketing resource allocation in an outbound campaign for a premium membership product. The central objective is to identify the customers most likely to respond positively to a membership offer, thereby minimizing outreach costs and maximizing return on investment. The study uses a dataset from a large retail superstore that includes customer demographics, transactional behavior, and campaign response history. Data preprocessing involved engineering features such as age and tenure groupings and converting categorical variables into factor types suitable for classification algorithms. Three modeling approaches were applied: logistic regression, classification and regression trees (CART), and random forest. Logistic regression yielded strong predictive performance with an AUC of 0.851 and identified several statistically significant predictors, including spending on wine and meat products, recent purchase behavior, and tenure length. Its primary limitation is that it cannot accommodate cost asymmetries, because it lacks the capacity to incorporate a loss matrix that assigns different penalties to false positives and false negatives. The CART model addressed this limitation by introducing a customized loss matrix that reflects the asymmetric cost structure of marketing misclassifications, assigning a higher penalty to false negatives than to false positives. While this cost-sensitive structure aligned better with business objectives, the CART model achieved only a moderate AUC of 0.767, reflecting limited classification accuracy and robustness. To overcome these limitations, a random forest model was implemented, combining the strengths of ensemble learning with cost-sensitive training. It achieved the highest AUC of the three models, 0.864, and allowed a loss matrix to be integrated during training.
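To make the role of an asymmetric loss matrix concrete, the sketch below shows one common way such costs can enter a decision rule: contact a customer only when the expected cost of skipping them exceeds the expected cost of contacting them. This is an illustrative assumption, not the thesis's method (the abstract states that the loss matrix was integrated during training). The function names and the example probabilities are hypothetical; the $2 and $10 costs are the values quoted in the abstract's economic evaluation.

```python
def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which contacting a customer has lower expected cost.

    Contacting a non-responder costs cost_fp; skipping a responder costs cost_fn.
    Contact when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def decide(probabilities, cost_fp: float, cost_fn: float):
    """Return True (contact) or False (skip) for each predicted response probability."""
    threshold = cost_threshold(cost_fp, cost_fn)
    return [p > threshold for p in probabilities]


# Hypothetical predicted response probabilities, for illustration only.
scores = [0.05, 0.10, 0.20, 0.60]
threshold = cost_threshold(cost_fp=2.0, cost_fn=10.0)
decisions = decide(scores, cost_fp=2.0, cost_fn=10.0)
print(threshold, decisions)
```

Under these costs the break-even probability is 2 / (2 + 10), so even customers with a fairly low predicted response probability are worth contacting, which is the practical effect of penalizing false negatives more heavily.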
Feature importance analysis revealed that the number of days since the last purchase, the amount spent on meat products, and the length of a customer's enrollment with the company were among the most influential predictors of customer response. The model not only improved classification performance but also supported strategic targeting through interpretable outputs. An economic evaluation demonstrated the practical value of the predictive model. Under a loss matrix in which a false positive costs $2 and a false negative costs $10, the random forest model reduced total campaign costs by approximately 30% compared to a non-targeted approach. This saving translates into a meaningful economic impact, particularly in large-scale campaigns. Overall, the findings support cost-sensitive random forest as a superior modeling framework for marketing applications. By aligning machine learning with real-world cost structures, this approach offers both statistical rigor and economic relevance for data-driven decision-making in customer acquisition.
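The economic evaluation described above amounts to weighting misclassification counts by their costs. The following is a minimal sketch of that arithmetic under stated assumptions: the labels and predictions are made-up toy data, `campaign_cost` is a hypothetical helper, and the "non-targeted" baseline is assumed here to mean contacting every customer. The resulting figures are illustrative only and are unrelated to the roughly 30% reduction reported in the thesis.

```python
def campaign_cost(y_true, y_pred, cost_fp: float = 2.0, cost_fn: float = 10.0) -> float:
    """Total misclassification cost: contacted non-responders plus missed responders."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p and not t)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return false_pos * cost_fp + false_neg * cost_fn


# Toy data (hypothetical): 1 = responds / contacted, 0 = does not respond / skipped.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
targeted = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # model-driven contact list
contact_all = [1] * len(y_true)             # assumed non-targeted baseline

baseline_cost = campaign_cost(y_true, contact_all)
model_cost = campaign_cost(y_true, targeted)
relative_saving = 1 - model_cost / baseline_cost
print(baseline_cost, model_cost, relative_saving)
```

The same function can be applied to a real confusion matrix; only the cost values and the definition of the baseline need to match the campaign being evaluated.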
Date issued
2025-05
Department
Sloan School of Management
Publisher
Massachusetts Institute of Technology