Customer Segmentation Model

INTRODUCTION

In today’s world, the telecom domain is volatile because customer needs and usage patterns change rapidly. A change in customer preferences implies a need for a change in business strategy.

Customer segmentation is therefore a top priority. It helps to increase sales, develop new business plans and marketing strategies, and design promotions for a target audience, which in turn increases revenue and helps the organization stay ahead of its active business rivals. Being able to distinguish between different categories of customers is the need of the hour.

Many studies have reviewed the application of data mining technology to customer segmentation and reported sound results. In most cases, however, segmentation is performed on customer data from a single system rather than through a systematic method that considers all verticals. For example, most telecommunications carriers cluster their customers using billing system data alone, and the accuracy of the resulting segmentation is not up to the mark.

PROOF OF CONCEPT

The idea proposed in this article is a proof of concept in which customer segmentation is performed using data from multiple systems, such as EAI, Credit Control, Billing, Siebel CRM, Mediation and Loyalty. Each system captures a different dimension of the customer data and contributes to the segmentation.

We take into account customer revenue and usage in the previous months, customer demographics, loyalty points, vanity, credit limit, dunning cycle, active promotions and add-ons, and formulate relevant features that have the least correlation with one another.

DATA CLEANING & ML MODEL

We use sampling techniques to choose the customer base for initial development of the model.
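As a rough illustration of the sampling step, the sketch below draws a stratified sample with pandas so that each plan type keeps its share of the customer base. The data frame, its column names (`customer_id`, `plan_type`, `monthly_revenue`) and the 10% fraction are assumptions made up for this example; they are not taken from the actual data set.

```python
import numpy as np
import pandas as pd

# Hypothetical customer base; column names and values are illustrative only.
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "customer_id": np.arange(10_000),
    "plan_type": rng.choice(["prepaid", "postpaid"], size=10_000, p=[0.7, 0.3]),
    "monthly_revenue": rng.gamma(shape=2.0, scale=15.0, size=10_000),
})

# Stratified sample: draw 10% from each plan type so the mix is preserved.
sample = customers.groupby("plan_type").sample(frac=0.1, random_state=42)

print(len(sample))
print(sample["plan_type"].value_counts(normalize=True))
```

Sampling per group rather than from the whole table keeps small segments from being under-represented in the development sample.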

EDA (Exploratory Data Analysis) is carried out on the sample data.

Outlier and multicollinearity checks are done on the feature set.
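One common way to run an outlier check is the interquartile range (IQR) rule. The article does not say which rule was used, so the sketch below is only an assumption, and the helper `iqr_outlier_mask` and the toy columns are hypothetical names.

```python
import pandas as pd

def iqr_outlier_mask(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Return a boolean frame that is True where a value lies outside
    the range [Q1 - k*IQR, Q3 + k*IQR] of its column."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    return (df < q1 - k * iqr) | (df > q3 + k * iqr)

# Toy usage data: one customer has an extreme number of voice minutes.
usage = pd.DataFrame({
    "voice_minutes": [100, 120, 110, 105, 5000],
    "data_gb": [2.0, 2.4, 1.8, 2.2, 2.1],
})

mask = iqr_outlier_mask(usage)
print(mask.sum())  # number of flagged values per column
```

Flagged values can then be capped, transformed or reviewed by hand, depending on whether they are data errors or genuine heavy users.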

We use the K-Means clustering algorithm (unsupervised).

MODEL OBJECTIVE

This article describes a model developed as a technology demonstration: sampled data (1000 samples in this article) covering voice, data, prepaid and postpaid customers is picked and fed to the model for segmentation.

The same approach can be extended to the entire data set to develop a full-fledged customer segmentation model.

This model aims to give a more accurate customer segmentation than the one most telecom companies follow at present.

Apart from performing clustering, the model picks up new customers after a minimal active period and places them in the already defined segments. This enables the business to offer relevant promotions to these customers right from the initial period of the customer life cycle.
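Placing new customers into existing segments can be sketched with the `predict` method of a fitted `KMeans` model. The data below is synthetic and the two-cluster setup is an assumption chosen to keep the example small; the important detail is that new customers are transformed with the scaler that was fitted on the original customers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for existing customers: two well-separated groups.
existing = np.vstack([
    rng.normal(0, 1, size=(100, 3)),
    rng.normal(8, 1, size=(100, 3)),
])

# Fit the scaler and the clustering model once, on the existing customers.
scaler = StandardScaler().fit(existing)
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(scaler.transform(existing))

# New customers are scaled with the SAME scaler, then assigned to the
# nearest existing centroid. The model is not refitted.
new_customers = np.array([[0.2, -0.1, 0.3], [7.9, 8.2, 8.1]])
new_segments = model.predict(scaler.transform(new_customers))
print(new_segments)
```

Because the centroids stay fixed, segment definitions remain stable while new customers arrive. The model would still need periodic refitting as usage patterns drift.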

Now let us dive into the code.

Importing all the required libraries:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score, adjusted_rand_score
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler

The input data set looks something like this. Here, we considered 238 features from the raw data set.

Correlation Matrix

test.corr(method='pearson',min_periods=2)
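A correlation matrix like the one above can be used to prune features that are nearly duplicates of each other. The sketch below is one possible approach, not necessarily the one used here; the helper `drop_correlated`, the 0.9 threshold and the demo columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9):
    """Drop one feature from every pair whose absolute Pearson
    correlation exceeds the threshold."""
    corr = df.corr(method="pearson").abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "revenue": base,
    # Almost a linear copy of "revenue", so it carries no new information.
    "revenue_with_tax": base * 1.18 + rng.normal(scale=0.01, size=200),
    "loyalty_points": rng.normal(size=200),
})

reduced, dropped = drop_correlated(demo)
print(dropped)
```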

There is no definitive answer for finding the right number of clusters, as it depends on (a) the shape of the distribution, (b) the scale of the data set and (c) the clustering resolution required by the user.
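One way to compare candidate cluster counts is the silhouette score, which `sklearn.metrics` provides as `silhouette_score`. The sketch below runs it on synthetic blobs with four known centres; the data and the range of `k` are assumptions for illustration and say nothing about the right value for the telecom data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with four clearly separated centres.
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On real customer data the peak is usually far less sharp, so the score is best read alongside the business need for coarser or finer segments.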

We use 6 clusters with the K-Means method of unsupervised learning.

km=KMeans(n_clusters=6,random_state=None,init='k-means++',max_iter=300)

Filling Missing values with Mean
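A minimal sketch of mean imputation with pandas, assuming all features are numeric. The tiny data frame and its column names are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy feature table with one missing value per column.
features = pd.DataFrame({
    "monthly_revenue": [20.0, np.nan, 40.0],
    "loyalty_points": [100.0, 300.0, np.nan],
})

# Replace each missing value with the mean of its own column.
filled = features.fillna(features.mean(numeric_only=True))
print(filled)
```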

Standardization

We use standard scaling to bring all the features onto one common scale, so that features with large numeric ranges do not dominate the distance calculation in K-Means.
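Standard scaling can be sketched with `StandardScaler` from scikit-learn, which rescales each column to zero mean and unit variance. The small array below and the name `demo_scaled` are placeholders for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two toy features on very different scales (e.g. revenue and loyalty points).
features = np.array([
    [20.0, 100.0],
    [30.0, 300.0],
    [40.0, 200.0],
])

# Each column is centred on 0 and divided by its standard deviation.
demo_scaled = StandardScaler().fit_transform(features)

print(demo_scaled.mean(axis=0))
print(demo_scaled.std(axis=0))
```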

out=km.fit_predict(scaled_features)

The output above contains numbers from 0 to 5, each of which assigns a customer to one of the six clusters.

Dimensionality Reduction: Principal Component Analysis

pca=PCA()
data=pca.fit_transform(scaled_features)
explained_variance=pca.explained_variance_ratio_
# one point per principal component, numbered from 1
components=range(1,len(explained_variance)+1)
plt.scatter(components,explained_variance)
plt.plot(components,explained_variance)
plt.title('screeplot')
plt.grid()
plt.show()
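A scree plot is often read together with the cumulative explained variance, which gives a simple rule for how many components to keep. The sketch below uses synthetic data driven by three hidden factors; the 95% threshold is an assumption for this example, not a value from the analysis above.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 10 observed features generated from 3 hidden factors plus noise.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + rng.normal(scale=0.05, size=(500, 10))

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 95%.
n_components = int(np.searchsorted(cumulative, 0.95) + 1)
print(n_components)
```

scikit-learn can apply the same rule directly: passing a float between 0 and 1, as in `PCA(n_components=0.95)`, keeps just enough components to reach that share of the variance.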

Elbow Method:

sse=[]
for i in np.arange(1,21):
    # fit K-Means for each candidate cluster count and record the SSE (inertia)
    km2=KMeans(n_clusters=i,random_state=None,init='k-means++',max_iter=300)
    km2.fit_predict(scaled_features)
    sse.append(km2.inertia_)
plt.scatter(range(1,21),sse)
plt.plot(range(1,21),sse)
plt.title('ELBOW PLOT')
plt.xticks(np.arange(1,21,step=2))
plt.xlabel("No. of clusters")
plt.ylabel("SSE")
plt.grid()
plt.show()

Finally, we add a new column to our original data set to show the cluster number of each customer.

This shows that cluster 3 has 463 customers, cluster 1 has 225 customers, and so on. In this way, customer segmentation can be performed in any domain by taking all the verticals into account, making it a hybrid segmentation model that does not depend on a single parameter.
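The cluster column and the per-cluster counts described above can be sketched as follows. The five-row data frame and the hard-coded labels are stand-ins for the real data set and for the output of `km.fit_predict`, so the counts here do not match the figures quoted above.

```python
import numpy as np
import pandas as pd

# Stand-ins for the original data set and the K-Means output.
customers = pd.DataFrame({"customer_id": [101, 102, 103, 104, 105]})
out = np.array([3, 1, 3, 0, 3])

# Attach the cluster label to every customer row.
customers["cluster"] = out

# Number of customers in each cluster, largest first.
cluster_sizes = customers["cluster"].value_counts()
print(cluster_sizes)
```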

Please comment below with any clarifications or suggestions. Thank you.