Predictive Analytics in Retail Marketing: Power, Ethics, and Security of Big Data

by Parita Shah


In today’s customer-driven market, the survival of a retail business depends on its ability to anticipate demand and respond to it before the need is even realized. The retail sector comprises businesses such as grocery, fashion, entertainment, insurance, mortgage, banking, and transportation, which sell products and services to end users. Like all other industries, retailers face problems related to cost control, reliable supply-chain management, and ethical practice. Market complexity has increased with the emergence of global production and supply chains and “omni-channel retailing” (i.e., retailers serving both online and brick-and-mortar stores seamlessly). To face this intense competition, more and more retailers are turning to data-driven analytics to gain market share. A survey conducted in 2013 by MIT Sloan Management Review in partnership with SAS Institute showed that, of the 2,037 managers interviewed, more than half (58%) strongly favored implementing data analytics to enhance decision-making in their business (Kiron, D., Kirk Prentice, P., Boucher Ferguson, R., 2014, p. 30).


The success story of Stage Stores, one of many real-world examples, explains why retail managers are eager to embrace data-driven processes for marketing and management. In 2010, Stage Stores, a $2 billion department store chain, launched a six-month pilot using the “SAS change-management program”. The pilot compared predictions from data-driven analytics with expert-based predictions from a control group in making pricing decisions. The results were 90% in favor of the decisions based on data analytics. Through efficient and diverse applications of data analytics, Stage Stores could compete against giants like Macy’s with fewer human resources (Meek, T., 2015, p. 2).


In a world of digital data and a complex consumer market, data analytics has become an integral and valuable part of retail marketing for retailers and consumers alike. Retailers gain better insight into the consumer market, and customers receive more personalized promotions and superior service. However, it is important to consider that the ongoing practice of retail data analytics requires close monitoring of customers’ lifestyle and behavior, with or without their knowledge or consent. Moreover, the big data collected for such analysis contains consumers’ personal information, and storing and securing it for extended periods is a huge liability for businesses. Big data applications can also be flawed and generate incorrect results. The sheer size of the data and the thousands (sometimes even millions) of variables used in an algorithm make it unwieldy, and tracking down an error becomes almost impossible. The erroneous reporting of Google Flu Trends (GFT) in 2011-2012 is an example of such a failure (Lazer, D., Kennedy, R., King, G., Vespignani, A., 2014, p. 1). Predictive analytics is an important part of data analytics. The purpose of this literature review is to understand the role of predictive analytics applications in retail marketing and to explore the ethical and practical concerns of practicing big data analytics.


Data Analytics Process in Retail: Data Collection and Analysis


In retail, the first stage of data analytics is to collect real-time customer data, including consumers’ demographic and psychographic information. Retailers use web tracking scripts, social network feeds, facial recognition technologies, closed-circuit cameras, customer service phone calls, and credit card information to learn how frequently customers visit the store or website, how much they spend, which departments they visit most, which products they buy together, and at what times of the year they purchase the most or least. This valuable information is then leveraged in decisions related to product promotions, placement, and staffing (Gandomi, A., Haider, M., 2015, p. 138-142).


The data collected through various media (text, audio, and video) is preprocessed to prepare it for mining and analysis. Predictive analytics applications are one part of big data analytics. As the name suggests, these applications make predictions: they are used to predict pharmaceutical drug effects, to test engine designs, and to forecast the performance of sports teams and players for the coming season, to name a few. Compared to performing real experiments and trials, using a software application that forecasts the outcome from massive live data saves a lot of time and money. Furthermore, because such applications draw on large volumes of data, their output can be more accurate. These applications are based on a model that uses “mathematical functions to be able to learn the mapping between a set of input data variables, and a response or target variables” (Guazzelli, A., 2012, p. 3). Using sample data, the predictive model is trained to identify hidden patterns. Back-propagation neural networks, support vector machines, decision trees, and clustering are some of the commonly used mining techniques and statistical models for this purpose (Guazzelli, A., 2012, p. 3).
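
To make the idea of training a model on sample data concrete, the following is a minimal sketch of the simplest possible decision tree, a one-level "stump", written in Python. It is not taken from any of the cited sources. The training data, the single input variable (monthly store visits), the target (whether the customer responded to a promotion), and the function names are all assumptions made for illustration.

```python
# Hypothetical training sample (an assumption, not real data):
# (monthly store visits, 1 if the customer responded to the last promotion else 0)
SAMPLE = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (8, 1), (2, 0), (7, 1)]


def train_stump(sample):
    """Learn the visit threshold that misclassifies the fewest training rows.

    Assumes responders sit above the threshold, which holds for this toy data.
    """
    values = sorted({x for x, _ in sample})
    # Candidate thresholds lie midway between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(sample) + 1
    for threshold in candidates:
        errors = sum((x > threshold) != bool(y) for x, y in sample)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold, visits):
    """Predict 1 (likely responder) when visits exceed the learned threshold."""
    return int(visits > threshold)


threshold = train_stump(SAMPLE)
print("learned threshold:", threshold)
print("prediction for 5 visits:", predict(threshold, 5))
```

The stump learns a mapping from one input variable to the target variable. The techniques named above learn far richer mappings over many variables, but the train-then-predict pattern is the same.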


Role of Predictive Analytics in Retail Marketing: Demand Prediction, Customer Segmentation, and Behavior Analysis


In recent years, retail marketing has taken a more scientific, research-driven approach than ever before. In retail marketing, predictive analytics is mainly used for demand prediction, customer segmentation, and customer behavior analysis.


Demand prediction applications identify the factors affecting a product’s demand. These factors vary with the type of product, and retailers design sales and promotions to take advantage of them. For example, grocery stores offer weekly discounts on perishables with a shorter shelf life, whereas clothing retailers want to design promotions that increase the sales of “sponsored brands or seasonal collections” (Katsov, I., 2015, p. 15).


Customer segments are groups of customers that share common demographic and psychographic characteristics. Identifying segments with distinct and correlated characteristics is one of the most important tasks of retail marketing, because understanding customer segments enables retailers to design efficient and personalized products and promotions (Chen, M., Chiu, A., Chang, H., 2005, p. 775). Predictive analytics uses k-means clustering and decision tree induction to identify meaningful customer segments (Chen, D., Sain, S. L., Guo, K., 2012, p. 197). For example, one of the segments for a retail clothing store could be “female between age 20-40, buying sports wear once every two months, spending quarterly $100-250”. Such insight informs the design of promotions and the placement of selected products inside the store or on the website to give them more exposure (Chen, D., Sain, S. L., Guo, K., 2012, p. 200, 206). Another useful application, “response modeling”, maps the groups of customers that respond to a specific promotion. This helps retailers avoid targeting ineffective segments and focus on valuable customers (Katsov, I., 2015, p. 6, 9).
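
As an illustration of how k-means clustering can separate customers into segments, here is a minimal sketch in plain Python. It is not the procedure used in the cited studies. The six customers, the two features, the rescaling of spend, and the naive choice of the first k points as starting centroids are all assumptions made for this example.

```python
import math

# Hypothetical customers (an assumption, not real data):
# (purchases per quarter, quarterly spend in hundreds of dollars)
CUSTOMERS = [(1, 0.5), (2, 0.8), (1, 0.6), (8, 2.2), (9, 2.5), (7, 2.0)]


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [tuple(p) for p in points[:k]]  # naive deterministic start
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(CUSTOMERS, k=2)
print("segment of each customer:", labels)
print("segment centers:", centroids)
```

On this toy data the two segments correspond to occasional low spenders and frequent higher spenders. Real applications would normally scale the features and choose the starting centroids and the number of segments more carefully.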


Customer behavior analysis has become increasingly intricate, with numerous dependent and independent variables. “Propensity models predict customers’ future behaviors” by identifying segments that are likely or unlikely to change their buying behavior under certain influences, such as incentives and promotions (Katsov, I., 2015, p. 6). Nearest-neighbor and neural-network models are based on artificial intelligence algorithms. They can generate correlation patterns by grouping customers who are likely to buy products from similar brands, price ranges, or departments. This information is used to design and promote new brands and cross-category sales (Guazzelli, A., 2012, p. 3). Announcing kitchenware promotions in the clothing department and offering discount coupons for competitor brands at the point of sale are some of the marketing strategies developed using behavior analysis (Katsov, I., 2015, p. 10). Association rules and the recency, frequency, and monetary (RFM) model are used to uncover changes in buying behavior over a period of time, which is valuable for designing the marketing strategies of the business (Chen, M., Chiu, A., Chang, H., 2005, p. 774).
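
The three RFM values can be computed directly from a transaction log. The sketch below, in plain Python, shows one way to do so. The transactions, the customer identifiers, the reference date, and the function name `rfm_table` are hypothetical, and the cited studies may define or bin the three values differently.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log (an assumption, not real data):
# (customer id, purchase date, amount in dollars)
TRANSACTIONS = [
    ("A", date(2015, 1, 5), 40.0),
    ("A", date(2015, 3, 20), 60.0),
    ("B", date(2014, 11, 2), 200.0),
    ("C", date(2015, 3, 28), 15.0),
    ("C", date(2015, 2, 10), 25.0),
    ("C", date(2015, 1, 15), 20.0),
]


def rfm_table(transactions, as_of):
    """Return recency (days), frequency (count), and monetary (sum) per customer."""
    grouped = defaultdict(list)
    for customer, day, amount in transactions:
        grouped[customer].append((day, amount))
    table = {}
    for customer, rows in grouped.items():
        last_purchase = max(day for day, _ in rows)
        table[customer] = {
            "recency_days": (as_of - last_purchase).days,
            "frequency": len(rows),
            "monetary": sum(amount for _, amount in rows),
        }
    return table


rfm = rfm_table(TRANSACTIONS, as_of=date(2015, 4, 1))
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

Building the same table for two different periods and comparing the rows for each customer is one simple way to see how buying behavior has changed over time.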


Traditionally, a retail store or department manager would look at the quarterly or annual sales and revenue numbers and use his or her experience and instincts to make marketing decisions. In the era of Internet-based shopping and global chain stores, this approach cannot account for the numerous factors affecting product demand and the consumer market, even with massive human resources and financial investment. Reviewing the predictive analytics applications in retail marketing, it is apparent that data analytics applications empower retailers to process massive data and produce meaningful output that would otherwise be impossible to achieve with expert knowledge alone (Kiron, D., Kirk Prentice, P., Boucher Ferguson, R., 2014, p. 30).



Data analytics applications in retail marketing have proven very useful in helping businesses design efficient products and strategies. Customers are also more informed in the Internet era, and they want better products and services tailored to their budget and lifestyle. Data analytics has proven to be a necessary tool in today’s data-driven world, serving both retailers and customers. However, in the rush to generate big data, some fundamental concerns regarding ethical practice and the security of personal information are left unanswered.

One of the most pressing issues affecting a modern business’s public image is its ethical practice. Even when a business follows every law, it could be falling short morally by undermining its customers’ right to privacy and freedom of choice. In the retail market, customers shopping online or in-store are rarely aware that their every movement is being watched to understand their buying behavior. When they pay by credit card, their demographic information (age, gender, geographic location, etc.), spending capacity, and purchase frequency are used to generate discount coupons, sales, and promotions. Not all customers want to be watched every time they shop or even stop to look at a product. Such close watch may border on digital stalking. Closed-circuit cameras, fraud detection software, and quality assurance applications are meant for purposes other than profiling customers. As Tavani (1999, 2013)[1] proposes, even though retailers have the legal right to collect such information, its unintended use without the direct consent of consumers is unethical and can have serious consequences for privacy. It is important to understand that, for the sake of personalized services and promotions, consumers are exposed to possible identity theft and other financial fraud (LaBrie, R. C., Cazier, J. A., Stienke, G. H., 2014, p. 2).


The expense of implementing big data analytics applications is another concern that needs attention. Even with affordable computer hardware, upgrading software and hiring technically skilled staff is very expensive (Gandomi, A., Haider, M., 2015, p. 139). Furthermore, the time-consuming preprocessing of the collected big data adds expenses that new entrants and small businesses cannot afford (Chen, D., Sain, S. L., Guo, K., 2012, p. 198).


Amid the current hype around big data, predictive analytics algorithms are being developed to work more efficiently with big data. In fact, an insufficient amount of data can produce badly misleading output by overfitting the model (Guazzelli, A., 2012, p. 1-2). Yet even with the impressive power of predictive analytics, the blind chase for ever bigger data cannot go on. Data analytics applications combine knowledge from the fields of data mining and statistics. Although these are closely related fields that solve problems using data-driven knowledge, there is a lack of enthusiasm for combining their strengths to come up with better solutions (Friedman, J. H., 1998). Relying mostly on larger storage and processing power, data miners argue that existing statistical methods, originally designed for structured sample data, are not efficient for analyzing big unstructured data (Gandomi, A., Haider, M., 2015). Statisticians, in turn, are too focused on the traditional tools of their field and claim that problems can often be solved with sufficient accuracy using less than the entire data set. Pointing at this weak link, Friedman (1998) proposes two options. The first is to use a well-designed statistical sampling methodology to select a data set of effective size instead of the entire big data, thus reducing the significant computational requirements. The second is to develop improved, more powerful algorithms that, applied to sample data, are more likely to provide superior accuracy than less robust ones applied to the big data. It is important to redirect efforts toward developing new “Smart Data” analytics (Lazer, D., Kennedy, R., King, G., Vespignani, A., 2014). By collecting meaningful data that is free of redundant and excess information, time and expense can be saved in storage and preprocessing. Such clean data is more likely to produce higher accuracy. With more focus on quality than quantity, ethical and security concerns are also more likely to be addressed.
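
The first of Friedman’s options, working from a well-chosen sample instead of the full data set, can be illustrated with a small simulation in Python. This is only a sketch under stated assumptions: the "big" data set is simulated, the basket values, the sample size of 1,000, and the fixed seed are invented for the example, and simple random sampling stands in for whatever sampling design a real study would use.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical "big" data set: 50,000 simulated basket values in dollars.
population = [rng.gauss(50.0, 15.0) for _ in range(50_000)]

# Simple random sample of 1,000 records (2% of the data).
sample = rng.sample(population, 1_000)

full_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# Approximate standard error of the sample mean: s / sqrt(n).
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"mean of full data: {full_mean:.2f}")
print(f"mean of sample:    {sample_mean:.2f} (standard error about {standard_error:.2f})")
```

The sample estimate typically lands within a few standard errors of the full-data value while touching only a fraction of the records, which is the computational saving the sampling argument relies on.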




Chen, D., Sain, S. L., Guo, K. (2012). Data Mining for the online retail industry: A Case study of RFM model-based customer segmentation using data mining. Database Marketing & Customer Strategy Management, 19, 3, 197-208. Retrieved from www.palgrave-journals.com/dbm/


Chen, M., Chiu, A., Chang, H. (2005). Mining changes in customer behavior in retail marketing. Expert Systems with Applications, 28, 773-781. Retrieved from www.elsevier.com/locate/eswa


Friedman, J. H. (1998). Data Mining and Statistics: What’s the Connection? Retrieved from http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf


Gandomi, A., Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35, 137-144. Retrieved from www.elsevier.com/locate/ijinfomgt


Guazzelli, A. (2012, May 29). Predicting the future, Part 1: What is predictive analytics? Retrieved from http://www.ibm.com/developerworks/library/ba-predictive-analytics1/


Katsov, I. (2015). Data Mining Problems in Retail. Retrieved from https://highlyscalable.wordpress.com/whitepapers/


Kiron, D., Kirk Prentice, P., Boucher Ferguson, R. (Winter, 2014). Raising the Bar With Analytics. MIT Sloan Management Review, 55, 2, 29-33. Retrieved from http://sloanreview.mit.edu


LaBrie, R. C., Cazier, J. A., Stienke, G. H. (2014, December). Big Data Ethics: A Longitudinal Study of Consumer, Business, And Societal Perceptions. Journal of Management Systems, 24, 2. Retrieved from https://www.researchgate.net/publication/277324187_Big_Data_Ethics_Longitudinal_Study_of_Consumer_Business_and_Societal_Perceptions


Lazer, D., Kennedy, R., King, G., Vespignani, A. (2014). The Parable of Google Flu: Traps in Big Data Analysis. Retrieved from http://gking.harvard.edu/files/gking/files/0314policyforumff.pdf


Meek, T. (2015, February 18). Big Data In Retail: How To Win With Predictive Analytics. Retrieved from http://www.forbes.com/sites/netapp/2015/02/18/big-data-in-retail/#46301ffa34ec


[1] Tavani (1999), source cited in LaBrie, R. C., Cazier, J. A., Stienke, G. H., 2014, p. 2.