Causality modelling in Python for data scientists
Data science is increasingly commonplace in industry and the enterprise. Industrial data scientists have a vast toolbox of descriptive and predictive analyses at their disposal. However, data science tools for decision-making in industry and the enterprise are less well established. Here we survey Python packages that can help industrial data scientists facilitate intelligent decision-making through causality modelling.
- The need for causality modelling
- How to do causality modelling
- Use case: The impact of direct marketing on customer behavior
The need for causality modelling
Intelligent planning and decision-making lie at the heart of most business success.
The decisions our business needs to evaluate range from relatively low-effort ones that we take potentially thousands or millions of times a day to high-effort ones that are taken only every couple of months:
- What will happen if I show an advertising banner to a particular user?
- What will happen if I change the retail prices for certain products in my shop?
- What will happen if I alter my manufacturing process?
- What will happen if I swap out a particular mechanical piece in a vehicle I develop?
- What will happen if I invest in new property, machinery, or processes?
- What will happen if I hire this applicant?
- What will happen if I increase the remuneration of my workforce?
As industrial data scientists we are oftentimes called upon to evaluate these proposed business decisions using analytics, machine learning methodologies, and past data.
What we may end up doing for the above proposed business decisions is:
- Compute and rank past click-through rates for given pairs of ad banner and user,
- Correlate past demand with set retail prices for product groups of interest,
- Correlate past manufacturing parameters with achieved output quality,
- Correlate the mechanical behavior of my vehicles with the mechanical parts used in them,
- Use past data to forecast the development of real estate prices,
- Use past data to correlate and predict the productivity of my team given e.g. its size or makeup, and
- Use past data to correlate productivity and remuneration levels.
The way I formulated these is already pretty suggestive: some of our common approaches to evaluating business decisions do not compare our business outcomes with and without said business decisions; rather, they look at our data outside the context of decision-making.
Put another way, we oftentimes analyze past data without considering the state our business or customer was in when those data were generated. The short simulation after the list below illustrates how this can mislead us.
So when tasked with evaluating the proposed business decisions above, we should instead think in terms of questions akin to the following:
- How would the user of interest behave differently if we didn't show them (and pay for) a banner now?
- For each Euro we shave off a price tag, how much higher will our revenue be because more customers are inclined to place an order?
- ...and so on for each of the remaining decisions above.
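Here is a small, self-contained simulation of the pitfall described above; all variable names and numbers are made up for illustration. A hidden confounder drives both the decision and the outcome, so the naive comparison of past data points in the wrong direction even though the decision's true effect is positive.
import numpy as np
rng = np.random.default_rng(0)
n = 100_000
# Hidden confounder, e.g. customer affluence: affluent customers are less
# likely to receive a discount but spend more regardless.
affluence = rng.normal(size=n)
discount = (rng.normal(size=n) - affluence > 0).astype(float)
# The true effect of granting a discount on revenue is +5 per customer.
revenue = 5.0 * discount + 10.0 * affluence + rng.normal(size=n)
# Naive "analyze past data" answer: compare average revenue with and
# without a discount. The confounder makes the discount look harmful.
naive = revenue[discount == 1].mean() - revenue[discount == 0].mean()
# Adjusted answer: control for the confounder with a linear regression.
X = np.column_stack([np.ones(n), discount, affluence])
beta, *_ = np.linalg.lstsq(X, revenue, rcond=None)
print(f'naive difference:  {naive:+.2f}')    # strongly negative, misleading
print(f'adjusted estimate: {beta[1]:+.2f}')  # close to the true +5
Of course, in this toy example we could only recover the true effect because we had measured the confounder; which adjustments are valid and sufficient is exactly what causality modelling makes precise.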
How to do causality modelling
The authors Hünermund and Bareinboim (https://arxiv.org/abs/1912.09104) propose a methodology they call the data-fusion process.
The data-fusion process maps out the individual steps necessary for evaluating the impact of past and potential future decisions.
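To make these steps concrete, here is a minimal sketch of such a workflow using the DoWhy library (https://github.com/py-why/dowhy), one of the Python packages built for causality modelling. The simulated data, the column names, and the choice of estimator are illustrative assumptions on my part, not prescriptions from the paper:
import numpy as np
import pandas as pd
from dowhy import CausalModel
# Simulate a toy data set with a single confounder.
rng = np.random.default_rng(0)
confounder = rng.normal(size=1_000)
treatment = (confounder + rng.normal(size=1_000) > 0).astype(int)
outcome = 2.0 * treatment + confounder + rng.normal(size=1_000)
data = pd.DataFrame({'treatment': treatment, 'outcome': outcome, 'confounder': confounder})
# Step 1: encode our causal assumptions in a model.
model = CausalModel(data=data, treatment='treatment', outcome='outcome', common_causes=['confounder'])
# Step 2: check whether the causal effect is identifiable at all.
estimand = model.identify_effect()
# Step 3: estimate the effect, here via a simple backdoor adjustment.
estimate = model.estimate_effect(estimand, method_name='backdoor.linear_regression')
print(estimate.value)  # should be close to the true effect of 2.0
# Step 4: probe how robust the estimate is to violated assumptions.
refutation = model.refute_estimate(estimand, estimate, method_name='placebo_treatment_refuter')
print(refutation)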
Use case: The impact of direct marketing on customer behavior
We'll use a data set provided by UCI (https://archive.ics.uci.edu/ml/datasets/Bank+Marketing) that lets us study the potential impact of direct marketing on customer behavior.
Let's dive right in: download the data set and see what we are working with.
!wget --quiet https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip
!unzip -oqq bank.zip
import pandas as pd
df = pd.read_csv('bank.csv', delimiter=';')
# Encode the target: 'y' records whether the marketing contact succeeded.
df['success'] = df['y'].map({'no': 0, 'yes': 1})
del df['y']
# Drop 'duration': the call duration is only known after a contact has
# taken place, so it cannot inform the decision to contact a customer.
del df['duration']
# Rename 'campaign' to the more descriptive 'no_contacts' (the number of
# contacts performed during this campaign).
df['no_contacts'] = df['campaign']
del df['campaign']
df.head()
Our tabular marketing and sales data contains a number of features we observe about a given customer and our interaction with them:
- The customer's age, job, marital status, education, current account balance, and whether or not they have taken out a loan are recorded,
- Our direct marketing interaction with a given customer is also recorded, for instance, how often we have contacted them so far.
A more detailed description of the features in our data can be found on the UCI data set page linked above.
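Since we will shortly need to distinguish numerical from categorical features, it is worth a quick look at the column types pandas inferred:
# Columns read in as 'object' hold categorical strings; the rest are numeric.
df.dtypes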
import lightgbm as lgb
from sklearn.preprocessing import OrdinalEncoder
target = 'success'
features = [column for column in df.columns if column != target]
X, y = df[features], df[target]
# Gradient-boosted trees need numeric inputs, so we ordinally encode the
# categorical features and keep the numerical ones as they are.
numerical_features = ['age', 'balance', 'no_contacts', 'previous', 'pdays']
categorical_features = [feature for feature in features if feature not in numerical_features]
encoder = OrdinalEncoder(dtype=int)
X_numeric = pd.concat(
    [
        X[numerical_features],
        pd.DataFrame(
            data=encoder.fit_transform(X[categorical_features]),
            columns=categorical_features,
            index=X.index,  # keep the original row index so concat aligns rows
        ),
    ],
    axis=1
)
X_numeric.head()
# Train a gradient-boosting classifier to predict marketing success.
model = lgb.LGBMClassifier()
model.fit(X_numeric, y)
%matplotlib inline
lgb.plot_importance(model);
There are numerous ways to compute feature importance; the one plotted by default in the LightGBM library measures the number of times a given feature is used as a split in the constructed trees:
https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.plot_importance.html
In general, feature importance gives us a measure of how well a given measured variable correlates with the target (marketing success in our case).
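For a second perspective, we can pull the raw importance values from the underlying booster and compare the default split counts with the total gain each feature contributed; a small sketch using the model trained above:
# 'split' counts how often a feature is used in the trees, while 'gain'
# sums the loss reduction achieved across all of its splits.
importances = pd.DataFrame({
    'feature': model.booster_.feature_name(),
    'split': model.booster_.feature_importance(importance_type='split'),
    'gain': model.booster_.feature_importance(importance_type='gain'),
}).sort_values('gain', ascending=False)
importances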
The question here is: How can we use our trained success predictor and our feature importances to aid intelligent planning and decision-making in our business?