The Machine Learning Toolbox

Communities
- ML Collective
- Kaggle
Concepts
- Degenerate feedback loops
  - Further reading
- Data distribution shift
  - Further reading
Frameworks
- Pandas
  - When (not) to use it
  - Further reading

Communities

ML Collective

The ML Collective is a nonprofit organization that connects you with other machine learning practitioners.

Link to ML Collective

Kaggle

The primary competition platform for supervised machine learning.

Link to Kaggle

Concepts

Degenerate feedback loops

Degenerate feedback loops arise when a model's predictions influence the feedback from which its labels are extracted (e.g. recommender systems that propose popular items based on how often they are clicked: the items shown most get clicked most, which makes them look even more popular).
- Detect degenerate feedback loops using aggregate diversity or the average coverage of long-tail items.
- Introduce randomization into recommendations / predictions to gather more realistic feedback (downside: a worse user experience).
- Capture features of popularity (e.g. position in the recommendation list) as inputs to the prediction model.
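
The detection and randomization ideas above can be sketched in plain Python. This is a minimal illustration under stated assumptions: all function names are hypothetical, "long tail" is taken to mean every catalog item outside the `head_size` most-clicked items, and "average coverage of long-tail items" is computed as the average share of long-tail items per recommendation list (definitions of both metrics vary between papers). Randomization is shown as simple epsilon-greedy replacement of individual recommendation slots.

```python
import random


def aggregate_diversity(rec_lists, catalog):
    """Share of the catalog that appears in at least one recommendation list."""
    recommended = {item for recs in rec_lists for item in recs}
    return len(recommended & set(catalog)) / len(catalog)


def long_tail_items(clicks, catalog, head_size):
    """Hypothetical definition: every item outside the `head_size` most-clicked items."""
    ranked = sorted(catalog, key=lambda item: clicks.get(item, 0), reverse=True)
    return set(ranked[head_size:])


def long_tail_coverage(rec_lists, tail):
    """Average share of long-tail items per non-empty recommendation list."""
    shares = [sum(item in tail for item in recs) / len(recs) for recs in rec_lists if recs]
    return sum(shares) / len(shares) if shares else 0.0


def explore(recs, catalog, epsilon, rng):
    """Epsilon-greedy randomization: each slot is replaced by a random catalog item
    with probability `epsilon` (duplicates are possible in this simple sketch)."""
    return [rng.choice(catalog) if rng.random() < epsilon else item for item in recs]


# Made-up toy data: the model keeps recommending the already popular items "a" and "b".
catalog = ["a", "b", "c", "d", "e"]
clicks = {"a": 100, "b": 50, "c": 5, "d": 1}
rec_lists = [["a", "b"], ["a", "c"], ["a", "b"]]

tail = long_tail_items(clicks, catalog, head_size=2)
print(aggregate_diversity(rec_lists, catalog))  # 0.6
print(long_tail_coverage(rec_lists, tail))
print(explore(["a", "b"], catalog, epsilon=0.2, rng=random.Random(0)))
```

A single reading of these metrics says little on its own; tracking them across model retrains is what would reveal recommendations narrowing over time.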

Data distribution shift

For our machine learning model we call the inputs $X$ and the outputs $Y$. In supervised learning, the training data is a sample from the (unknown) joint distribution $P(X, Y)$. In machine learning we usually model $P(Y \mid X)$, i.e. the conditional probability of the output given some observed input.

$$P(X, Y) = P(X \mid Y)\,P(Y) = P(Y \mid X)\,P(X)$$
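
As a sanity check of this factorization, the sketch below builds a small, made-up joint distribution over a binary input $X$ and a binary output $Y$ (the probabilities are arbitrary illustrative values), derives the marginals and conditionals from it, and verifies both decompositions with exact fractions.

```python
from fractions import Fraction as F

# Made-up joint distribution P(X, Y) over binary X and Y; the entries sum to 1.
joint = {
    (0, 0): F(3, 10), (0, 1): F(1, 10),
    (1, 0): F(2, 10), (1, 1): F(4, 10),
}
xs = ys = (0, 1)

# Marginals P(X) and P(Y), obtained by summing out the other variable.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Conditionals P(Y | X) = P(X, Y) / P(X) and P(X | Y) = P(X, Y) / P(Y).
p_y_given_x = {(x, y): joint[(x, y)] / p_x[x] for x in xs for y in ys}
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for x in xs for y in ys}

# Both factorizations recover the joint distribution exactly.
for x in xs:
    for y in ys:
        assert joint[(x, y)] == p_x_given_y[(x, y)] * p_y[y]
        assert joint[(x, y)] == p_y_given_x[(x, y)] * p_x[x]

print(p_y_given_x[(1, 1)])  # 2/3
```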

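Covariate shift: $P(X)$ changes while $P(Y \mid X)$ is unchanged (the distribution of the input changes, but the distribution of the output given the input does not)
Label shift: $P(Y)$ changes while $P(X \mid Y)$ is unchanged (the distribution of the output changes, but the distribution of the input given the output does not)
Concept drift: $P(Y \mid X)$ changes while $P(X)$ is unchanged (the same input now maps to a different distribution of outputs)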
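
Because covariate shift is a change in $P(X)$, one practical check is to compare a feature's training distribution with its production distribution. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) in plain Python; this is one common choice of test, not the only one, and the helper `shift_alert` and its `threshold` are hypothetical illustrative values. It only compares input marginals one feature at a time, so it cannot detect concept drift, where $P(X)$ stays the same by definition.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)| over all x."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both empirical CDFs are step functions that only jump at observed values,
    # so evaluating them at every observed value finds the largest gap.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def shift_alert(train_feature, prod_feature, threshold=0.1):
    """Hypothetical helper: flag a possible shift in P(X) for one feature.
    The threshold is an arbitrary illustrative value, not a recommendation."""
    return ks_statistic(train_feature, prod_feature) > threshold


train = [i / 100 for i in range(100)]       # feature values seen during training
prod = [0.5 + i / 200 for i in range(100)]  # production values have moved upward
print(ks_statistic(train, prod))
print(shift_alert(train, prod))             # True
```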

Frameworks

Pandas

Pandas is the primary data manipulation framework for data scientists in Python. It is built around two core data structures: the Series, one-dimensional labeled data such as a single table column, and the DataFrame, two-dimensional labeled data akin to a table.
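
A minimal sketch of the two data structures; the column names and values are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional labeled data, e.g. a single table column.
ages = pd.Series([34, 28, 45], name="age")

# A DataFrame: two-dimensional labeled data, like a table.
people = pd.DataFrame({"name": ["Ada", "Ben", "Cleo"], "age": [34, 28, 45]})

# Selecting a single column of a DataFrame returns a Series.
age_column = people["age"]
print(type(age_column).__name__)  # Series
print(people.shape)               # (3, 2)
print(age_column.sum())           # 107
print(ages.max())                 # 45
```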

When (not) to use it

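Use it when the data you’re manipulating fits in memory: Pandas holds each DataFrame entirely in RAM, so it is a poor fit for datasets larger than the available memory.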
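
One way to judge whether data will fit is to load a sample and measure it. The sketch below uses `DataFrame.memory_usage(deep=True)`, which also counts the string payloads held in `object` columns. The toy DataFrame and the extrapolation from a sample to a full row count (`full_rows`, a hypothetical number assumed known in advance) are illustrative assumptions; the real footprint also depends on dtypes and on intermediate copies made during processing.

```python
import pandas as pd

# Made-up stand-in for a sample of a larger dataset.
sample = pd.DataFrame({
    "user_id": range(1_000),
    "country": ["DE", "US", "FR", "JP"] * 250,
})

# Bytes per column (plus the index); deep=True includes string payloads.
per_column = sample.memory_usage(deep=True)
total_bytes = int(per_column.sum())
print(per_column)
print(f"{total_bytes / 1_000:.1f} kB for {len(sample)} rows")

# Hypothetical extrapolation to the full dataset's row count.
full_rows = 50_000_000
estimated_gb = total_bytes / len(sample) * full_rows / 1e9
print(f"~{estimated_gb:.1f} GB estimated for {full_rows:,} rows")
```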
