Predicting and Preventing Customer Churn using Modelbit and Hightouch

Learn how you can predict churn scores in Snowflake with Modelbit and sync them to tools like Salesforce using Hightouch.

Made by: Modelbit

7 minutes

With modern data science and machine learning, it’s easier than ever to predict whether a customer is going to churn. With the right training data and modeling libraries, we can quickly train a model that scores a customer’s likelihood of churning.

Getting this information into the hands of our teammates who can save the account is the most critical piece of the puzzle.

In this playbook, you’ll learn how to create a model that predicts the likelihood that a customer will churn. Then you’ll use Modelbit to deploy that model to Snowflake, where it will run those predictions on every customer, every single day. Finally, you’ll use Hightouch to sync those predictions to Salesforce (and other tools), so that every Customer Success Manager can easily see the health of each of their accounts, and take appropriate action.

Prerequisites

Step 1: Predict Customer Churn

You can begin by training a model to predict customer churn. If you already have your own models, skip to Step 2, where you’ll deploy your model to Snowflake.

We’ll start by getting a DataFrame of our accounts. This particular example uses a sample dataset from Modelbit. In practice, this data can come from wherever you store your customer data.
This dataset includes client firmographic information (like industry and number of employees) as well as behavioral data (like the percentage of the last thirty days they were active in the product, and the number of days they were late in paying their bill). Finally, of course, it records whether each customer churned or not.
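As a minimal sketch of what such a DataFrame might look like and how it splits into features and a label, consider the following. The rows and column names (`industry`, `num_employees`, `pct_days_active_30d`, `days_late_paying`, `churned`) are hypothetical stand-ins, not the actual Modelbit sample schema.

```python
import pandas as pd

# Hypothetical accounts data; the column names and values are illustrative only.
accounts = pd.DataFrame({
    "industry": ["software", "retail", "finance", "retail"],
    "num_employees": [120, 45, 800, 15],
    "pct_days_active_30d": [0.9, 0.2, 0.6, 0.1],
    "days_late_paying": [0, 14, 3, 30],
    "churned": [0, 1, 0, 1],
})

# Features are every column except the label; the label is the churn flag.
X = accounts.drop(columns=["churned"])
y = accounts["churned"]
```

With real data, you would replace the hand-written rows with a query against your warehouse or a file load, and keep the same split into `X` and `y`.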

Next, you need to build a predictive model from your data. To build a model based on the sample data, you can use an XGBoost classifier with a one-hot encoder. The model specifics will vary based on what makes the most sense for your data.

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

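# One-hot encode the input columns, then fit a gradient-boosted classifier.
pipeline = Pipeline([
    ('encoder', OneHotEncoder(handle_unknown='ignore')),
    ('classifier', XGBClassifier())
])

# X holds the feature columns and y the churn label from the accounts DataFrame.
X_train, X_test, y_train, y_test = train_test_split(X, y)

pipeline.fit(X_train, y_train)

# Score the held-out test set; in a notebook, the last line displays the accuracy.
y_pred = pipeline.predict(X_test)
accuracy_score(y_test, y_pred)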

Finally, you need to evaluate your model’s accuracy and iterate. The last line of code in this block outputs the model’s accuracy score, which is the fraction of test-set classifications it got correct. How high a score is good enough is a decision you’ll want to make in partnership with business stakeholders. If you decide you need to improve your score, it’s worth looking at what other data you have, including it as features in the model, and trying other modeling methods.
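One way to structure that iteration is to compare candidate models against a trivial baseline using cross-validation, and to look at recall on the churn class alongside accuracy. The sketch below is illustrative only: it uses synthetic data from scikit-learn's `make_classification` as a stand-in for real accounts, and the candidate models are arbitrary choices rather than a recommendation. The XGBoost pipeline from above could be added to the same dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for account data: roughly 20% of rows are churned (label 1).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Candidate models, including a baseline that always predicts "not churned".
candidates = {
    "baseline": DummyClassifier(strategy="most_frequent"),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in candidates.items():
    # 5-fold cross-validation, scoring both overall accuracy and churn-class recall.
    scores = cross_validate(
        model, X_demo, y_demo, cv=5, scoring=["accuracy", "recall"]
    )
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "recall": scores["test_recall"].mean(),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.2f}, "
          f"recall={results[name]['recall']:.2f}")
```

The baseline is a useful sanity check: when most customers do not churn, a model that never predicts churn can still post a high accuracy while catching no at-risk accounts, which is why recall is reported next to it.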