Starbucks Customers Segmentation

With data overflowing in every direction, businesses are rushing to use it to make data-driven decisions that create business value.

Here’s a high-level overview of the workflow followed in this project:

The company is assumed to currently promote products to customers without any prior knowledge of their segments. It is therefore possible to analyse the data to find patterns in customer behavior. This helps the company target and tailor its marketing efforts and resources to consumers who exhibit similar characteristics and are considered most likely to opt in to the business’s offerings. This form of target marketing is important because it helps the company maximize revenue while keeping the cost of promotional campaigns low. To do this, an unsupervised learning technique will be used to cluster customers into groups that can be investigated separately to better understand their qualities and engage them accordingly.

When it comes to measurement criteria, two categories come to mind: business metrics, which assist in making decisions from a business point of view, and technical metrics, which assist in assessing the implementation of the algorithm used to cluster the data.

All features will be investigated as much as possible, but there are two main themes used to drive decisions made in this regard:

2 Identify which customers respond favorably to which types of offers. To measure this, offer view and completion rates will be used, where the cardinality of the corresponding set is divided by the cardinality of the relevant set.
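A minimal sketch of rates computed as ratios of set cardinalities follows. The (customer, offer) pairs and the helper name `rate` are hypothetical and only illustrate the idea; the view rate is taken relative to offers received and the completion rate relative to offers viewed.

```python
# Hypothetical (customer_id, offer_id) pairs; not taken from the real dataset.
received = {("c1", "o1"), ("c1", "o2"), ("c2", "o1"), ("c3", "o2")}
viewed = {("c1", "o1"), ("c2", "o1"), ("c3", "o2")}
# ("c1", "o2") was completed without ever being viewed, so it must not count.
completed = {("c1", "o1"), ("c3", "o2"), ("c1", "o2")}


def rate(numerator: set, denominator: set) -> float:
    """Return |numerator ∩ denominator| / |denominator| (0.0 if denominator is empty)."""
    if not denominator:
        return 0.0
    return len(numerator & denominator) / len(denominator)


view_rate = rate(viewed, received)         # share of received offers that were viewed
completion_rate = rate(completed, viewed)  # share of viewed offers that were completed
```

Intersecting with the denominator set keeps each rate between 0 and 1 even when an event appears in the numerator without its prerequisite.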

For the algorithm chosen in this problem, K-Means, two methods will be used to find the optimal number of clusters 𝑘 and to assess the model implementation: silhouette coefficient (SC) analysis and within-cluster Sum of Squared Errors (SSE) analysis.
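As a rough illustration of the second measure, within-cluster SSE sums the squared Euclidean distance from every point to the mean of its cluster. The sketch below uses a hypothetical helper `within_cluster_sse` and toy points, not the project's data.

```python
def within_cluster_sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    sse = 0.0
    for members in clusters.values():
        dims = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        sse += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))
    return sse


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
sse_two_clusters = within_cluster_sse(points, [0, 0, 1, 1])
sse_one_cluster = within_cluster_sse(points, [0, 0, 0, 0])
```

SSE always shrinks as 𝑘 grows, which is why the elbow method looks for the point where the decrease levels off rather than for a minimum.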

Mathematically, for one data point 𝑖, where 𝑎(𝑖) represents the mean distance from point 𝑖 to all other points in the cluster it was assigned to and 𝑏(𝑖) represents the mean distance from point 𝑖 to all points in its closest neighboring cluster, the silhouette is defined as:

𝑠(𝑖) = (𝑏(𝑖) − 𝑎(𝑖)) / max{𝑎(𝑖), 𝑏(𝑖)}

Therefore, it is clear from the above equation that

−1 ≤ 𝑠(𝑖) ≤ 1

Then, for one value representing the silhouette score used to assess clustering outcomes, where 𝑠¯(𝑘) represents the mean 𝑠(𝑖) over all data points in the dataset for a specific number of clusters 𝑘, the silhouette coefficient is defined as:

SC = max over 𝑘 of 𝑠¯(𝑘)
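To make the quantities 𝑎(𝑖), 𝑏(𝑖), and the mean silhouette concrete, here is a small pure-Python sketch using the standard definition 𝑠(𝑖) = (𝑏(𝑖) − 𝑎(𝑖)) / max(𝑎(𝑖), 𝑏(𝑖)). The function names and the toy points are assumptions made for illustration; it expects at least two clusters.

```python
import math


def silhouette_values(points, labels):
    """Return s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every point.

    Assumes at least two distinct cluster labels.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            values.append(0.0)  # common convention for a single-point cluster
            continue
        # a(i): mean distance to the other points in the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the points of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels):
    values = silhouette_values(points, labels)
    return sum(values) / len(values)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = mean_silhouette(points, [0, 0, 1, 1])  # tight, well-separated clusters
bad = mean_silhouette(points, [0, 1, 0, 1])   # each cluster straddles both groups
```

A labeling that matches the natural groups yields a mean close to 1, while a labeling that mixes them yields a negative mean.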

The published dataset contains simulated data that mimics customer behavior on the Starbucks Rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an informational advertisement or an actual offer such as a discount or buy-one-get-one-free. This dataset is a simplified version of the real Starbucks app because the underlying simulator only has one product, whereas Starbucks actually sells dozens of products.

Data is contained in three files: portfolio, profile, and transcript.

This dataset contains metadata about the promotions offered by the company to its customer base, as per the following schema:

It is clear that every offer has a minimum amount, represented by difficulty, that a customer must spend to complete the offer and therefore be eligible for the reward. Also, every offer has a validity period, represented by duration, after which it expires.

The above is true for offers of type “bogo” and “discount”. However, while “informational” offers have a duration, their difficulty and reward are zero, since these offers merely provide information about a product. This means they are not tracked for completion in transcript.

This dataset contains demographic data about the customers enrolled in the reward program via the mobile app as per the following schema:

This dataset contains records for transactions, offers received, offers viewed, and offers completed by customers as per the following schema:

This section covers the process of merging and aggregating previously cleaned datasets. This section is quite long, but an overview is as follows.

After preprocessing, the final dataset has 39 features (columns) and 14,608 customer profiles (rows).

PCA With Different Number of Components for 55%, 75%, and 95% Explained Variance

The aim here was to retain enough explained variance to capture the majority of the variability in the data while keeping the number of principal components reasonably small. In this particular case, since the input dataset has only 39 features, which is already fairly small, 95% explained variance was used.

Then, before model implementation, two steps are performed: finding the optimal number of clusters using the silhouette and elbow methods, and performing silhouette analysis, in which the focus is on few misclassified points, a high number of points above the silhouette average, and uniform cluster widths.
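The trade-off between explained variance and component count can be sketched by accumulating per-component variance ratios until a target is reached. The ratios below are hypothetical, and `components_for_variance` is an illustrative helper rather than the project's code; scikit-learn's `PCA` can make a similar selection when `n_components` is given as a fraction between 0 and 1.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, target):
    """Smallest number of leading components whose cumulative ratio reaches `target`."""
    for count, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= target - 1e-12:  # small tolerance for floating-point round-off
            return count
    raise ValueError("target exceeds the total explained variance")


# Hypothetical per-component ratios, in descending order as PCA reports them.
ratios = [0.40, 0.20, 0.15, 0.10, 0.08, 0.04, 0.03]

n_55 = components_for_variance(ratios, 0.55)
n_75 = components_for_variance(ratios, 0.75)
n_95 = components_for_variance(ratios, 0.95)
```

Because the ratios decrease, each additional block of explained variance costs more components than the previous one.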

Finding Optimal Number of Clusters Using Silhouette and Elbow

First and foremost, the silhouette score was rather low, indicating overlapping clusters. The “elbow” method was not helpful in this case since there is no clear “elbow”. The silhouette method can either be followed rigorously, making 2 the optimal number of clusters, or a compromise between the two methods can be made, leading to an optimal number of 3–6 clusters.

Performing Silhouette Analysis With 2 Clusters
Performing Silhouette Analysis With 3 Clusters
Performing Silhouette Analysis With 4 Clusters

As can be seen from the graphs above, 2, 6, and to a lesser extent 3 clusters seem to be the most reasonable choices. The decision here was to go with 6 clusters for the reasons described above in this section’s intro.

By now, the model had been fitted on the data and the cluster labels predicted. The average silhouette score was 0.11, lower than one would have hoped, again indicating overlapping clusters or that the data is not clusterable.

Nonetheless, the resulting clusters were investigated, and here’s a look at the most important features grouped by cluster:

Gender Distribution

Both genders were equally represented in the clusters except for clusters 1 and 5, where males are overrepresented. Overall, the model does a decent job of avoiding cluster formation based on gender.

In terms of seniority, all clusters seem to have similar characteristics except for clusters 1 and 5, which are about 200-300 days lower in seniority.

Overall Average Transaction Value
Promo Average Transaction Value
Non-Promo Average Transaction Value

Looking at average transaction value in different periods, clusters tend to show comparable numbers in overall and promo periods, except for clusters 1 and 5 with very low numbers in both measures.

Numbers tend to decrease for all clusters in non-promo periods, except for cluster 3, which stays relatively high, and to a lesser extent cluster 0.

In non-promotional periods, cluster 3 was most valuable to the business with a mean RFM score of 2.83 and a median of 3.00, followed by cluster 1 with a mean RFM score of 2.58 and a median of 3.00. Clusters 4 and 5 had the least value during non-promotional periods, with mean RFM scores of 0.89 and 0.56 respectively and both medians at 0.

It is clear that promotional periods boost activity for all clusters, with the lowest mean RFM score being 2.58 and a median of 2.67.

Considering RFM scores in non-promotional periods as “initial” values and RFM scores in promotional periods as “final” values, the next table looks at the percentage change between the two to put them into context.

For example, cluster 3, with a relatively high RFM score in non-promotional periods, displays an increase of 17% and 11% in mean and median respectively during promotional periods. This indicates that customers in this cluster are active spenders even when not prompted by promotions. Clusters 4 and 5, on the other hand, exhibit a big change of about 226% and 360% in mean respectively. The median is "inf" because the initial value (the denominator) is zero, yet the big change is easy to notice. This indicates that customers in these two clusters should be the prime target for promotions. Another cluster to look at is cluster 1, whose mean and median RFM scores are either the same or almost the same between the two periods. This indicates that customers in this cluster are indifferent to promotions.
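The percentage change between an “initial” and a “final” score, including the infinite result for a zero initial value, can be sketched as below. The helper `pct_change` and the example scores are hypothetical and chosen only to exercise both branches.

```python
import math


def pct_change(initial: float, final: float) -> float:
    """Percentage change from `initial` to `final`; infinite when `initial` is zero."""
    if initial == 0:
        if final == 0:
            return math.nan  # 0 -> 0 is undefined rather than infinite
        return math.copysign(math.inf, final)
    return (final - initial) / abs(initial) * 100.0


# Hypothetical scores, not values from the analysis.
steady = pct_change(2.0, 2.5)     # modest increase from a non-zero baseline
from_zero = pct_change(0.0, 3.0)  # zero baseline, so the change is reported as inf
```

An infinite value carries no magnitude, so for groups with a zero baseline it is more informative to read the raw promotional score alongside it.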

Investigating income supports the narrative above to some extent. Cluster 3 has the highest mean income of $72,920, which explains its high activity during non-promotional periods. It is worth noting that cluster 4 is the second-highest income group, but these customers were among the least active during non-promotional periods and only active when promotions were running. In contrast, while cluster 1's mean income is low, its customers are moderately active in both periods, as discussed above.

View Rate for All Offers Received

Looking at view rates as a percentage of all offers received, it can be seen that all clusters have moderate to high view rates, with the lowest being 59% for cluster 4 and the highest being 85% for cluster 2.

As for completion rates for all offers as a percentage of offers viewed, clusters 0, 2, 3, and 4 completed around 70-80% of all offers they viewed, while clusters 1 and 5 completed only about 20%.

Looking at the response score, which again measures how fast a customer reacts to an offer with 1 being the fastest, it can be seen that cluster 2 is the fastest with a mean (and median) response score of 0.52, followed by cluster 4 with a mean response score of 0.47. It is worth noting that clusters 1 and 5 exhibit very low mean response scores of 0.08 and 0.07 respectively.

All clusters have relatively high view rates for “bogo” offers except cluster 0, which has the lowest rate at 43%. A similar argument can be made regarding cluster 1, with a 58% view rate.

Completion Rate for “bogo” Offers Viewed

Completion rates for “bogo” offers, however, vary: clusters 2, 3, and 4 completed 60–78% of all “bogo” offers they viewed, indicating that these clusters respond favorably to “bogo” offers. Clusters 0, 1, and 5, in contrast, completed only 5-15% of all “bogo” offers they viewed.

In a similar manner, for “disc” offers, clusters 0, 2, and 4 have view rates around 70-81%. Clusters 1, 3, and 5 have view rates around 33–58%.

Completion Rate for “disc” Offers Viewed

Looking at completion rates for “disc” offers as a percentage of offers viewed, clusters 0, 2, and 4 completed around 68-83% of all “disc” offers they viewed. It is worth noting that cluster 0, while previously not interested in “bogo” offers, now leads “disc” completion rates. Clusters 1, 3, and 5 completed only 6-24% of all “disc” offers they viewed. It is also worth noting that cluster 3 was leading completion rates for “bogo” offers. Cluster 5 deserves special attention: despite its 58% view rate above, its completion rate is only 22%, suggesting no interest in “disc” offers.

Understandably, reactivity to “info” offers is not the same as to “bogo” or “disc” offers. However, it is worth noting cluster 2 at a 90% view rate.

Looking at completion rates for “info” offers as a percentage of offers viewed, cluster 2 is again off the charts, completing 90% of all “info” offers it viewed. The remaining clusters sit at about 12–24% completion, except cluster 4 with a 0% completion rate.

Now that a better understanding of the clusters has been formed, it is a good time to reflect on the project as a whole. This project started with a lot of data preprocessing on the raw datasets (portfolio, profile, and transcript) to produce the final dataset, coe, for modeling. PowerTransformer and PCA transformations were then performed on coe, which was then fed into a K-Means clustering model after deciding on 6 clusters.

Technically, the silhouette score (coefficient) was disappointingly low at 0.11, indicating overlapping clusters or data that is not clusterable. However, after evaluating and investigating the clusters formed, they were found to be intuitive. Further discussion on improving the silhouette score follows later.
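The modeling steps named above (PowerTransformer, PCA at 95% explained variance, then K-Means with 6 clusters) might be chained roughly as follows with scikit-learn. This is a sketch under stated assumptions: `X` is synthetic stand-in data rather than coe, and `n_init` and `random_state` are illustrative choices, not settings confirmed by the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic stand-in for the final dataset: 300 rows, 8 skewed positive features.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(300, 8))

preprocess = make_pipeline(
    PowerTransformer(),                         # Yeo-Johnson by default, standardized output
    PCA(n_components=0.95, svd_solver="full"),  # keep enough components for 95% variance
)
X_reduced = preprocess.fit_transform(X)

# n_init and random_state are illustrative assumptions.
model = KMeans(n_clusters=6, n_init=10, random_state=0)
labels = model.fit_predict(X_reduced)

# Mean silhouette over all points, in the range [-1, 1].
score = silhouette_score(X_reduced, labels)
```

On unstructured data like this stand-in, a low silhouette score is expected, since there are no natural groups for K-Means to recover.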

Where:

Clusters Rank Markings Feature-Wise

With the help of the plot above and previously collected data, a summary of each cluster is as follows:

Cluster 0: This cluster represents good business value to the company. Their income is fairly high, they are senior members of the program, and they respond fairly quickly to offers. They spend an average of $11.60 and $17.25 in non-promo and promo periods respectively, exhibiting an increase in RFM score of 46% between the two periods. This cluster of customers is most interested in "disc" offers, with the best completion rate of all clusters for this type of offer. They respond fairly well to "info" offers, but not so to "bogo" offers.

Cluster 1: This cluster seems to be formed of low-income, low-spending, recent members who do not seem to be tempted by any offer type. Their average spend in both non-promo and promo periods is almost $7.50, exhibiting an increase in RFM score of only 6% between the two periods.

Cluster 2: This cluster of customers responds very well to all offer types, and in a good way too, with an average spend of $8.92 and $17.20 in non-promo and promo periods respectively, exhibiting an increase in RFM score of 81% between the two periods. They are the most senior members of the program and the fastest to respond to offers. It is worth noting in particular that this cluster of customers responds extremely well to "info" offers.

Cluster 3: This cluster of customers arguably represents the best business value for the company in non-promo periods. They are the highest spenders in non-promo periods with an average spend of $16.67, miles ahead of most clusters. Their average spend increases to the second highest, $20.34, in promo periods. They have the highest income of all clusters and they respond to offers fairly fast. They are most interested in "bogo" and to a lesser extent "disc" offers, but not so much in "info" offers.

Cluster 4: This cluster of customers seems to be active only during promo periods. Their average spend in non-promo periods is low at $5.66, but that jumps to the highest average spend among clusters in promo periods, $21.23, exhibiting an increase in RFM score of 226% between the two periods. They are equally highly interested in "bogo" and "disc" offers but not tempted by "info" offers at all. They are high-income, senior members who respond fast to offers.

Cluster 5: This cluster seems to be formed of customers who are completely inactive during non-promo periods, with the lowest average spend of $1.23. That increases to a still-lowest $6.48 average spend in promo periods, hence an increase in RFM score of 360%, but that is due to the very low average spend in non-promo periods to begin with. Their view and completion rates are among the lowest, if not the lowest. This is reflected in their response score, which is also among the lowest. They also do not exhibit a pattern by which it can be said they favor a specific offer type. This cluster of customers seems to be a prime target for further, separate investigation.

Communication Channels: One portion of the dataset that was unfortunately not considered properly during data preprocessing is the communication channels (web, email, mobile, and social) and their effect on view and completion rates. The way communication channels were aggregated is quite naive, since an offer can be communicated via multiple channels. This made it difficult to construct sensible plots during the analysis of clusters. A more careful approach to data preprocessing could produce useful results in this regard.
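One possible direction, offered only as a sketch and not as what the project did, is to encode each offer's channels as one 0/1 indicator per channel, so an offer sent through several channels is not collapsed into a single category. The offer identifiers, column prefix, and helper name below are hypothetical.

```python
CHANNELS = ("web", "email", "mobile", "social")


def encode_channels(offer_channels):
    """Map a list of channel names to one 0/1 indicator per known channel."""
    present = set(offer_channels)
    unknown = present - set(CHANNELS)
    if unknown:
        raise ValueError(f"unknown channels: {sorted(unknown)}")
    return {f"channel_{name}": int(name in present) for name in CHANNELS}


# Hypothetical offers; identifiers and channel lists are illustrative only.
offers = {
    "offer_a": ["web", "email"],
    "offer_b": ["email", "mobile", "social"],
}
encoded = {offer_id: encode_channels(ch) for offer_id, ch in offers.items()}
```

With indicators like these, view and completion rates could be grouped by each channel separately instead of by a combined channel label.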

Trials for Finding Optimal Number of Clusters

The silhouette score is inconsistent between runs, so multiple plots must be observed before making any assessment with t-SNE. After doing that, a value of 7 is proposed as a possible optimal number of clusters.

Performing Silhouette Analysis With 6 Clusters
Performing Silhouette Analysis With 7 Clusters

It can be seen here that the clusters are more distinct. Again, when the formed clusters are investigated, they do seem intuitive and more or less in line with the results obtained by the K-Means model. However, since the soundness of this process is uncertain, it will be left at this.

In this project, I started by exploring the provided datasets to form a better understanding of them. The data cleaning process took quite some time, during which I experimented with many choices in terms of both decision-making and coding. The cleaned datasets were then prepared to form the final dataset grouped by customer profiles. This dataset then went through a few steps of transformation before modeling.

I do hope that all decisions made are correct and justified, although I am sure I have made plenty of mistakes along the way, which I definitely intend to revisit. While doing the project and searching online material, I read about some subjects for the first time, so expanding my knowledge in those subjects is at the top of my to-do list.

Overall, I am happy with how this project turned out. I am happy with the number of clusters, as 6 clusters are practical from a business perspective. The clusters formed seem intuitive after investigation. However, considerable room for improvement exists.
