Ask less - Scale Market Research without Annoying Your Customers
Venkatesh Umaashankar, Girish Shanmugam S

TL;DR
This paper introduces a Bayesian network-based method to reduce survey questions in market research, enabling scalable customer segmentation without causing customer annoyance.
Contribution
It presents a novel Bayesian network approach to minimize survey questions while maintaining segmentation accuracy, facilitating large-scale market research.
Findings
Survey length cut from 22 questions to 10 in the reported case study
Four segments of broadband (ISP) customers predicted with an average F-score above 0.70
Bayesian networks proposed as a simpler alternative to factor analysis
Abstract
Market research is generally performed by surveying a representative sample of customers with questions that cover dimensions such as psychographics, demographics, attitudes, and product preferences. Survey responses are used to segment the customers into groups that are useful for targeted marketing and communication. Reducing the number of questions asked of each customer helps businesses scale market research to a large number of customers. In this work, we model this task using Bayesian networks. We demonstrate the effectiveness of our approach using an example market segmentation of broadband customers.
Figure 1
Figure 2
Figure 3

| Occasional Pirates | Disengaged | Dishonest Enthusiasts | Honest Enthusiasts |
|---|---|---|---|
| Higher conversion potential | Moderate conversion Potential | Low conversion | High potential |
| Pirating family / action & Low risk to pirate | Low risk to pirate | Action preferred genre | Across all genres |
| females 25-34 with young family | 45+ family | younger male students | 35-44 with family |

| Segment | Precision | Recall | F-Score |
|---|---|---|---|
| S1 | 0.87 | 0.90 | 0.89 |
| S2 | 0.68 | 0.79 | 0.73 |
| S3 | 0.85 | 0.86 | 0.86 |
| S4 | 0.96 | 0.81 | 0.88 |
| Average | 0.86 | 0.85 | 0.85 |

| Segment | Precision | Recall | F-Score |
|---|---|---|---|
| S1 | 0.77 | 0.87 | 0.81 |
| S2 | 0.52 | 0.64 | 0.57 |
| S3 | 0.76 | 0.65 | 0.70 |
| S4 | 0.91 | 0.76 | 0.83 |
| Average | 0.75 | 0.74 | 0.74 |

| Segment | Precision | Recall | F-Score |
|---|---|---|---|
| S1 | 0.67 | 0.76 | 0.71 |
| S2 | 0.38 | 0.52 | 0.44 |
| S3 | 0.65 | 0.53 | 0.59 |
| S4 | 0.79 | 0.63 | 0.70 |
| Average | 0.65 | 0.62 | 0.63 |

| Id | ABBR | Expansion | Actual Question |
|---|---|---|---|
| 1 | AGP | Age group | |
| 2 | MAR | Marital Status | |
| 3 | PAM | Perception About Mobile | I find new technology exciting and want to have a mobile phone with the latest services and features. |
| 4 | AIE | Access Internet Everywhere | It’s important for me to be able to access the Internet wherever I am |
| 5 | MAP | Most Advanced Products | I’m constantly looking for the most technologically advanced products available |
| 6 | DUT | Difficulty in Using Technology | For me to use a new technology product, somebody has to show me how to use it |
| 7 | TA | Technology Avert | I feel that I am able to manage without many of the technology products that other people find essential |
| 8 | FVP | Features Vs Price | The features are more important than the price |
| 9 | U2D | Up To Date | It is important to be up to date on major news |
| 10 | TFS | Technology For Showoff | Carrying the latest technology products makes a good impression |
| 11 | U2P | Unwilling To Pay | Even when I can afford them, I’m not willing to pay much for new technology products or services |
| 12 | DNB | Dont Need Mobile | I do not need a mobile phone |
| 13 | MBROW | Mobile Browsing | Mobile Browsing of the Internet |
| 14 | MEMAIL | Mobile Email | Send and Receive Email via the mobile phone |
| 15 | MBANK | Mobile Banking | Mobile Banking via the mobile phone. |
| 16 | MVID | Mobile Video | Watching videos on your mobile phone |
| 17 | GPS | Global Position Tracking | Mapping, navigation or positioning service (like gps) via the mobile phone |
| 18 | GAM | Gaming | Playing video games is one of my favourite activities |
| 19 | SMP | Small Payments | Small Payment service via the mobile phone |
| 20 | TFF | Time For Family | I spend a lot of time with my family |
| 21 | RURB | Rural or Urban | |
| 22 | ELS | Life Stage | |
| 23 | DIS | Diverse Internet Services | Derived Attribute |
| 24 | SGV2 | Segment Labels | |
Venkatesh Umaashankar is with Ericsson Research, Chennai, India (corresponding author). Girish Shanmugam S is a Machine Learning Consultant, E3, Jains Green Acres, Chennai, India.
Index Terms:
Market Research, Market Segmentation, Bayesian Networks, Graphical Models, Dimensionality Reduction, Survey
I Introduction
A key step in developing successful business strategies in business-to-consumer (B2C) companies is to build a good understanding of the market and of customer behavior. Market research and segmentation play an important role in framing business and marketing strategies, which help organizations to improve the efficiency of their marketing and conversion. Market segmentation can be defined as the process of breaking down the market for a particular product or service into segments of customers that differ in their response to marketing strategies [1].
Market segmentation comprises two major steps. (1) Consumer survey: A survey questionnaire covering dimensions such as psychographics, demographics, attitudes, product usage, and preferences is meticulously designed. Psychographic questions are useful for understanding the preferences and behavior of customers [2]. The carefully planned survey is then rolled out to a representative sample of customers. (2) Segment generation: The survey responses are analyzed to create a segmentation model, which could be rule-based, algorithm-based, or factor-analysis-based. This model can be abstractly defined as a function that assigns a segment to every surveyed customer. The process of market segmentation is discussed in detail in [3].
The market segments are summarized by profiles and are given descriptive names. Consider the example of market segmentation of movie consumers [4] shown in Figure 1. This market research was done for a studio to understand the level of piracy among movie consumers. They report four clear market segments classified based on consumption level and tendency to consume pirated material.
The heterogeneity among the segments is emphasized in the descriptions shown in Table I. Segment descriptions help to build an intuition about the nature and behavior of each segment. Market segmentation for a product or service is usually executed by expert market research companies (Ipsos and TNS are well-known market research experts in the industry). The key outcomes of the market research are the segmentation model, the target segments, and presentations and workshops to spread awareness within the organization. Market segmentation has been battle-tested in many consumer-facing businesses, and it clearly helps to build an intuition about the big picture. Still, scaling market research to millions of customers remains an open challenge. It is not practical to ask every customer a long list of questions: doing so would be time-consuming and would also annoy the customers.
Factor analysis is a well-known method for estimating latent traits from question-level survey data and for reducing the number of questions [5]. However, it has also been the subject of no small amount of criticism among market researchers [6]. The major problem with factor analysis is that we lose the diversity in the collected information and are left with only minimal information. A factor analysis carried out on one half of the data might give different results from one carried out on the other half, which makes the reliability of the results questionable. Yet another limitation is that it does not give a unique solution. Finally, a factor analysis involving a large number of variables, say 50, is cumbersome, costly, and time-consuming [7]. Due to these limitations, we avoided factor analysis and opted for a much simpler alternative.
A Bayesian framework that systematically addresses the challenges of estimating the future value of customers from survey data was proposed in [8]. A method for building effective Bayesian network (BN) models for medical decision support from complex, unstructured, and incomplete patient questionnaires and interviews was developed in [9]; it also challenges decision scientists to reason about which information is really required for inference when building models.
The closest to our work is [10], where Bayesian network modeling was used instead of factor analysis to determine key factors from an online survey questionnaire, with the aim of finding the most accurate representation of a complex system and identifying key variables for understanding the subsequent effects of blast exposure. To the best of our knowledge, no other work has explored the use of Bayesian networks for scaling market research or for reducing the number of questions in a market research survey.
In this work, we propose a novel way to use Bayesian Networks to reduce the number of questions that a customer needs to be asked. In addition to that, we demonstrate the effectiveness of our approach by evaluating the segment assigned by the Bayesian Network model when fewer questions are asked in the survey. Finally, we summarize the advantages of our approach and discuss our conclusions.
II Proposed Approach
Inspired by the success of using Bayesian networks to understand and analyze survey data [11], we propose a Bayesian Network based approach for reducing the number of questions in a market research survey. The outline of our approach is shown in Figure 2. Our approach consists of two phases: (1) Preparatory Phase and (2) Scaling Phase.
II-A Preparatory phase
In the Preparatory phase, the survey questionnaire is designed and the survey is rolled out to a representative sample of customers, typically 2-5% of the total customer base. Customers selected for this phase are usually sampled in a stratified fashion across the different levels of value that they add to the business. The survey responses are analyzed, and a segmentation model is built to divide the customers into segments. A segmentation model can be defined as a function that takes survey responses as input and returns a customer segment as output, as expressed in Equation 1. Note that the questions have to be carefully designed, keeping in mind the type of segments the business would benefit from. Also note that the survey responses may all turn out to be similar, so that customers cannot be differentiated based on them. In such cases, one has to iterate again to identify suitable questions and customers.
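A minimal way to write this definition, which is presumably what Equation 1 expresses, is given below. The symbols are assumed notation rather than the authors' own: $q_1, \ldots, q_n$ are the responses to the $n$ survey questions, $f$ is the segmentation model, and $s$ is the assigned segment out of $m$ possible segments.

```latex
s = f(q_1, q_2, \ldots, q_n), \qquad s \in \{S_1, S_2, \ldots, S_m\}
```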
The next key step is to learn a Bayesian network model that approximates the segmentation model. All the questions in the survey questionnaire and the segment are represented as nodes in this Bayesian network. Learning a Bayesian network model involves two steps. (1) Structure learning: A Bayesian network is represented by a directed acyclic graph (DAG). The DAG structure can be learnt with either a score-based or a constraint-based approach. The score-based approach first defines a criterion that evaluates how well the Bayesian network fits the data, e.g., the BIC score, and then searches the space of DAGs for a structure with the maximum score [12] [13]. The constraint-based approach uses conditional independence tests to identify a set of edge constraints for the graph and then finds the best DAG that satisfies those constraints [14] [15]. (2) Parameter learning: This involves learning the parameters that define the conditional probability table of each node in the Bayesian network. These parameters are typically learned through expectation maximization, maximum likelihood, or gradient-based approaches. We use 70% of the survey data to learn the Bayesian network model.
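To make the parameter-learning step concrete, the sketch below estimates one conditional probability table by maximum likelihood, i.e. by relative frequencies, for a structure that is already fixed. It is an illustration only: the function name `fit_cpt` and the five toy rows are made up, and the PAM to AIE edge is borrowed from the observations reported in the Results section.

```python
from collections import Counter, defaultdict


def fit_cpt(rows, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from complete data."""
    joint = Counter()          # counts of (parent values, child value)
    parent_totals = Counter()  # counts of parent values alone
    for row in rows:
        key = tuple(row[p] for p in parents)
        joint[(key, row[child])] += 1
        parent_totals[key] += 1
    cpt = defaultdict(dict)
    for (key, value), n in joint.items():
        cpt[key][value] = n / parent_totals[key]
    return dict(cpt)


# Toy responses on a 1-5 scale (fabricated for illustration, not survey data)
rows = [
    {"PAM": 5, "AIE": 5},
    {"PAM": 5, "AIE": 4},
    {"PAM": 5, "AIE": 5},
    {"PAM": 1, "AIE": 2},
    {"PAM": 1, "AIE": 1},
]
cpt_aie = fit_cpt(rows, child="AIE", parents=["PAM"])
print(cpt_aie[(5,)])  # distribution of AIE among customers who answered PAM = 5
```

Each row of the resulting table sums to one. Parent configurations that never occur in the data get no entry, which is one reason smoothed or Bayesian estimates are sometimes preferred on small samples.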
A key advantage of a Bayesian network model is its ability to handle partial information at inference time, i.e., the same Bayesian network model can be used for segment assignment even when fewer questions are asked. The main novelty of our approach is to exploit this property of Bayesian networks to reduce the number of questions in the survey. We find an optimal hyperparameter k, the number of randomly chosen questions to ask a customer such that the responses, when fed to the Bayesian network model, yield an average F-score above a configured threshold, for example 0.70. In simple terms, k answers the question: how few questions can we ask without compromising too much on the performance of the Bayesian network segmentation model? The algorithm that we used to identify this minimal number of questions is shown in Algorithm 1.
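The search for k described above can be sketched as follows. This is one plausible reading of the procedure, not the authors' Algorithm 1. Several things are assumptions: a naive Bayes classifier stands in for the learned Bayesian network (it is the special case where the segment node is the only parent of every question node), the data is synthetic, the average F-score is taken as an unweighted mean over segments, and the names `make_customer`, `fit`, `predict`, `macro_f1`, and `find_k` are hypothetical.

```python
import math
import random
from collections import Counter

SEGMENTS = ["S1", "S2", "S3", "S4"]
N_QUESTIONS = 12          # toy questionnaire size, not the paper's 22
SCALE = [1, 2, 3, 4, 5]   # Likert-style responses


def make_customer(rng):
    """Synthetic customer: each segment favours one answer per question."""
    seg = rng.choice(SEGMENTS)
    answers = {}
    for q in range(N_QUESTIONS):
        favourite = SCALE[(SEGMENTS.index(seg) + q) % len(SCALE)]
        answers[q] = favourite if rng.random() < 0.6 else rng.choice(SCALE)
    return answers, seg


def fit(train):
    """Count segments and (segment, question, answer) triples."""
    prior = Counter(seg for _, seg in train)
    counts = Counter()
    for answers, seg in train:
        for q, a in answers.items():
            counts[(seg, q, a)] += 1
    return prior, counts


def predict(model, evidence):
    """Most probable segment given answers to any subset of the questions."""
    prior, counts = model
    total = sum(prior.values())
    best, best_score = None, -math.inf
    for seg in SEGMENTS:
        score = math.log((prior[seg] + 1) / (total + len(SEGMENTS)))
        for q, a in evidence.items():
            # Laplace smoothing keeps unseen answers from zeroing the product
            score += math.log((counts[(seg, q, a)] + 1) / (prior[seg] + len(SCALE)))
        if score > best_score:
            best, best_score = seg, score
    return best


def macro_f1(truth, predicted):
    """Unweighted mean of the per-segment F-scores."""
    scores = []
    for seg in SEGMENTS:
        tp = sum(t == seg and p == seg for t, p in zip(truth, predicted))
        fp = sum(t != seg and p == seg for t, p in zip(truth, predicted))
        fn = sum(t == seg and p != seg for t, p in zip(truth, predicted))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


def find_k(model, test, candidates, threshold, rng):
    """Return the smallest candidate k whose average F-score meets the threshold."""
    truth = [seg for _, seg in test]
    for k in sorted(candidates):
        predicted = []
        for answers, _ in test:
            asked = rng.sample(sorted(answers), k)  # k random questions
            predicted.append(predict(model, {q: answers[q] for q in asked}))
        f1 = macro_f1(truth, predicted)
        if f1 >= threshold:
            return k, f1
    return None, None


rng = random.Random(0)
data = [make_customer(rng) for _ in range(3000)]
split = int(0.7 * len(data))  # 70/30 split, mirroring the text
model = fit(data[:split])
best_k, best_f1 = find_k(model, data[split:], [2, 4, 8, 12], 0.70, rng)
print(best_k, round(best_f1, 2))
```

Because the questions are drawn at random for each customer, the measured score varies with the random seed; averaging over several draws per value of k would give a more stable estimate.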
II-B Scaling Phase
Once the optimal value of k has been identified as explained in the previous section, the scaling phase becomes very simple. Each customer is asked only k random questions instead of going through the whole questionnaire. The responses to these k questions are passed through the Bayesian network model, and the segment is assigned. This approach also makes it possible to update the segment assignment incrementally as new information becomes available. For example, a customer can be questioned in multiple parts, and the segment assigned to that customer can be updated based on the additional responses.
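The incremental update mentioned above can be illustrated with a small sketch. It assumes, for simplicity, that answers are conditionally independent given the segment, so each new answer multiplies the current belief by a likelihood term; in a general Bayesian network one would instead re-run inference with the enlarged evidence set. The function `update` and all probabilities below are hypothetical.

```python
def update(posterior, likelihoods):
    """One Bayes step: multiply by P(answer | segment) and renormalise."""
    unnorm = {seg: posterior[seg] * likelihoods[seg] for seg in posterior}
    total = sum(unnorm.values())
    return {seg: value / total for seg, value in unnorm.items()}


# Start from a uniform belief over the four segments
posterior = {"S1": 0.25, "S2": 0.25, "S3": 0.25, "S4": 0.25}

# Hypothetical P(observed answer | segment) for two answers arriving in turn
first_answer = {"S1": 0.6, "S2": 0.2, "S3": 0.1, "S4": 0.1}
second_answer = {"S1": 0.5, "S2": 0.1, "S3": 0.3, "S4": 0.1}

for likelihoods in (first_answer, second_answer):
    posterior = update(posterior, likelihoods)

assigned = max(posterior, key=posterior.get)
print(assigned, posterior)
```

The assigned segment can change after any step, which is what allows a customer to be questioned in several short sessions.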
III Results
We implemented our proposed approach to scale the market research that was performed for an Internet Service Provider (ISP) business. A total of 100,000 customers participated in the survey. The survey participants were sampled from the full customer base in a stratified manner, based on their plan and lifetime value. Most of the survey questions are scale-based (1 to 5): a response of 1 means that the participant strongly disagrees with the statement in the question, whereas a response of 5 means that the participant strongly agrees with it. A complete list of survey questions is shown in Table V. The survey responses were analyzed, a segmentation model combining rules and algorithms was built, and four customer segments (S1, S2, S3, and S4) were identified.
In the Preparatory phase (Section II-A), we learnt the structure of the Bayesian network using hill-climbing (hc) greedy search over the space of directed acyclic graphs, with the Akaike Information Criterion (AIC) as the scoring criterion. We used maximum-likelihood estimates to fit the parameters of the Bayesian network. For both structure learning and parameter fitting, we used the implementation available in the bnlearn R package [16]. Figure 3 shows the structure of our Bayesian network model. We used 70% (70,000) of the survey responses to learn the Bayesian network model. Note that the nodes in the model are the responses to survey questions and the corresponding segment assignment for the customer (SGV2). The learned network structure was validated with domain experts, and we list a few interesting observations: (1) A person's perception about mobile (PAM) influences whether they want to access the Internet everywhere (AIE). (2) The final segment assigned to a customer depends on whether that customer uses diverse Internet services (DIS). (3) The gender of the customer (GEN) could influence the customer's perception about mobile (PAM) and their urge to access the Internet everywhere (AIE). (4) The value a customer places on the features of a product (FVP) determines whether they want to use the product to show off (TFS).
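For reference, the two scoring criteria named in this paper, AIC here and BIC in Section II-A, have the classic definitions below. The symbols are assumed notation, not the authors': $\hat{L}$ is the maximized likelihood of the data under a candidate DAG $G$, $d$ is the number of free parameters of $G$, and $n$ is the number of samples. In this form lower values are better; software that maximizes a score typically works with a negated, rescaled version.

```latex
\mathrm{AIC}(G) = 2d - 2\ln\hat{L}, \qquad
\mathrm{BIC}(G) = d\,\ln n - 2\ln\hat{L}
```

Both criteria trade goodness of fit against model size. BIC penalizes each extra parameter more heavily than AIC once $n$ exceeds about 7, so it tends to select sparser graphs on large samples.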
We used the line search shown in Algorithm 1 to identify the optimal hyperparameter k, the number of random questions to ask a customer such that the average F-score stays above 0.70. We used the remaining 30% of the survey responses (30,000) for this purpose. The survey has a total of 22 questions. We ran the Find_k algorithm with k taking the values [5, 10, 20]. The segment classification performance of the Bayesian network model for each value of k is shown in Table II, Table III, and Table IV. We use the cpquery function of bnlearn to supply the partial evidence, i.e., the responses to the randomly selected questions, run a conditional probability query, and predict the segment assignment. We found that the optimal value of k is 10 in this case. This means that our approach lets us cut the questionnaire from 22 questions to 10, a reduction of roughly 50%. Figure 4 shows the comparison of scores for the various values of k.
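As a quick check on how the per-segment numbers in the results tables relate to each other, the sketch below recomputes the F-score as the harmonic mean of precision and recall. It assumes the reported F-score is the usual F1 measure; the function name `f_score` is hypothetical, and the two precision/recall pairs are copied from the S2 and S4 rows of the results table whose average F-score is 0.85.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


s2 = f_score(0.68, 0.79)  # table reports 0.73 for this row
s4 = f_score(0.96, 0.81)  # table reports 0.88 for this row
print(round(s2, 2), round(s4, 2))
```

Since the tables list precision and recall rounded to two decimals, an F-score recomputed this way can differ from a reported value by 0.01 for some rows.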
In the Scaling phase (Section II-B), we integrate our Bayesian network model with the survey tool, which randomly selects k (10) questions and collects the customers' responses to them. These responses are passed as evidence to the Bayesian network model, and segments are assigned.
Figure 4: F-score per segment (S1 to S4) for k = 5, k = 10, and k = 20.
IV Conclusion
In this paper, we propose a simpler way to reduce the number of questions in a market research survey using Bayesian networks. We evaluated the effectiveness of our approach in a real-world setting and observed that it can reduce the number of questions by up to 50% with a minor dip in classification performance. Our work shows that Bayesian networks can serve as a simpler alternative to factor analysis for reducing the number of questions in a survey, without compromising the ability to collect information about various topics.
Acknowledgments
We thank Prasad Garigipati, Henrik Palson, Andreas Timglas and Roy Ollila for their help and support. Both authors were introduced to the area of market research during their tenure at Xoanon Analytics. The authors recognized the value of asking fewer questions in a market research survey through their practical experience.
- [1] Y. Wind and S. P. Douglas, “International market segmentation,” European Journal of Marketing, vol. 6, no. 1, pp. 17–25, 1972.
- [2] N. M. Bradburn, S. Sudman, and B. Wansink, Asking questions: the definitive guide to questionnaire design-for market research, political polls, and social and health questionnaires. John Wiley & Sons, 2004.
- [3] L. Cremonezi, “High definition customers - a powerful segmentation,” Ipsos MORI, White Paper, 2016.
- [4] Z. Andrew and D. Peter, “A Guide to getting the best out of your Segmentation Analyses,” 2011.
- [5] R. D. F. Jr, W. W. Kulzy, and J. A. Appleget, “From data to information: Using factor analysis with survey data,” Phalanx, vol. 45, no. 4, pp. 30–34, 2012.
- [6] A. S. C. Ehrenberg, G. J. Goodhardt, and S. I. Marketing, Factor analysis: limitations and alternatives. Marketing Science Institute.
- [7] G. C. Beri, Marketing research. Tata McGraw-Hill Education, 2007.
- [8] J. Karvanen, A. Rantanen, and L. Luoma, “Survey data and Bayesian analysis: a cost-efficient way to estimate customer equity,” Quantitative Marketing and Economics, vol. 12, no. 3, pp. 305–329, 2014.
