Decision intelligence from historical observations for optimal marketing resource use
In this article we analyze direct marketing data and prototype a decision model to optimize future marketing uplift.
- Summary
- Fetch the data
- Load Python libraries
- Prepare data
- High-level model: more is better
- Nuanced model: more isn't always better and there are always tradeoffs
Summary
Marketing is a key success and revenue driver in B2C markets: An appropriate message placed at the appropriate time with a prospective customer will increase your business success.
However, marketing is also a major cost driver for businesses: Marketing efforts that are too broad, target the wrong audience, or convey the wrong message waste resources.
In the case of direct marketing via phone conversations a key cost factor is the amount of time a sales call agent spends with the prospective customer on the phone.
In this article we explore, rudimentarily, direct marketing data of a Portuguese financial institution.
We explore the relationship between call duration and success (purchase of offered financial product), and show that consideration of customer-specific factors influences how you should allocate your marketing resources.
Our prototypical analysis can be usueful in devising data-driven marketing and sales strategies that offer decision intelligence for your call agents.
Fetch the data
For our prototype we use the openly accessible Bank Marketing Data Set from the UC Irvine Machine Learning Repository.
!wget --quiet https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip
!unzip -oqq bank.zip
import graphviz
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder
np.random.seed(42)
#collapse
df = pd.read_csv('bank.csv', delimiter=';')
df['success'] = df['y']
del df['y']
df['success'] = df['success'].replace('no', 0)
df['success'] = df['success'].replace('yes', 1)
del df['education']
del df['default']
del df['housing']
del df['loan']
del df['contact']
del df['day']
del df['month']
del df['campaign']
del df['pdays']
del df['previous']
del df['poutcome']
Our tabular data set now looks as follows: Each prospective (and in some cases eventual) customer whom a call agent conversed with fills a row. On each row we have numerical variables (age, account balance, duration of sales interaction) and categorical variables (job / employment status and marital status). Our data set contains 4,521 sales interactions.
df
To test our model, we discretize the duration of our interaction with the customer into six duration buckets: bucket 1 holds the shortest interactions while bucket 6 holds the longest interactions.
no_buckets = 6
df['duration_bucket'] = pd.qcut(df['duration'], no_buckets, labels=[f'bucket {b + 1}' for b in range(no_buckets)])
df.groupby('duration_bucket').agg({'success': 'mean'})
Looking at the average success rate in each duration bucket shows us that there is positive correlation between the duration of a sales interaction and our success rate - just as our model predicted.
Hence, more marketing spend appears to lead to greater success in general.
From a data perspective this is a pretty disappointing result as we expect to glean more intelligent insights from all the data we collected.
Nuanced model: more isn't always better and there are always tradeoffs
Let's dig deeper into what is going on here: Yes, the duration of the interaction between call agent and prospective customer likely influences our success rate.
However, call agents also probably choose to spend more time on the phone with customers whose account balance is higher - hoping for a greater chance of a sale. That same account balance also likely influences how affine the customer is for spending more money on financial products.
Both present job status and marital status are also likely candidates for influencing an affinity for financial products.
And age of the customer probably influences both their job and marital status.
Since a customer's account balance probably influences both how much time we spend with them on the phone and their likelihood of purchasing another financial product we will control for account balance.
We control for account balance by training a cluster algorithm that segments our data set into three groups of similar account balance.
df['job'] = LabelEncoder().fit_transform(df['job'])
df['marital'] = LabelEncoder().fit_transform(df['marital'])
segmenter = KMeans(n_clusters=3, random_state=42)
df['segment'] = segmenter.fit_predict(df[['balance']])
df['segment'] = df['segment'].replace({0: 'low balance', 1: 'high balance', 2: 'medium balance'})
Looking at both the average account balance and age in our three segments, we notice that our clustering algorithm picked out low, medium, and high balance segments.
We also notice that average age correlates with average balance in these three segments hence our intuition codified in our above model seems valid.
df.groupby('segment').agg({'age': 'mean', 'balance': 'mean'})
Now, what about the effectiveness of our marketing resources in each segment?
Visualizing our rate of success in the six duration buckets broken down by account balance segment we see a more nuanced picture:
- Customers with low account balances really need to be worked on and only show success rates greater than 20% in the highest duration bucket 6,
- customers with medium balances already show a greater than 20% purchase likelihood in duration bucket 4, and
- customers with high balances actually max out in duration bucket 5 and drop below a 20% success rate in bucket 6.
success_rates = df.groupby(['segment', 'duration_bucket']).agg({'success': 'mean'}).reset_index()
sns.set(rc={'figure.figsize': (10,6)})
sns.barplot(
x='duration_bucket',
y='success',
hue='segment',
data=success_rates,
hue_order=['low balance', 'medium balance', 'high balance']
);
Our more nuanced model and analysis provide us with data-driven insights that provide actionable and testable advice:
- We should probably re-evaluate whether low balance individuals are sensible targets for our marketing campaigns given how resource-intensive they are,
- compute the profit and loss tradeoff between spending bucket 4 and bucket 6 resources on medium balance individuals, and
- ensure that we do not overdo it with our calls for high balance individuals.