Communities

ML Collective

The ML Collective is a nonprofit organization that connects you with other machine learning practicioners.

Link to ML Collective

Kaggle

The primary supervised learning machine learning competition platform

Link to Kaggle

Concepts

Degenerate feedback loops

  • Predictions influence feedback, where the feedback is used to extract labels (e.g. recommender systems that propose popular items based on how often they’re clicked),
  • Detect degenerate feedback loops using aggregate diversity or average coverage of long tail items,
  • Introduce randomization into recommendations / predictions to gather more realistic feedback (downside user experience),
  • Capture features of popularity (e.g. position in recommendation list) for prediction model,

Further reading

Data distribution shift

For our machine learning model we call the inputs X and the outouts Y. The training data in supervised learning is a sample of the (unknown) joint distribution P(X, Y). In machine learning we usually model P(Y|X) - i.e. the conditional probability of the output given some observed input.

P(X, Y) = P(X Y) P(Y) = P(Y X) P(X)
  • Covariate shift: P(X) changes while P(Y X) is unchanged (distribution of the input changes but the distribution of the output given the input is unchanged)
  • Label shift: P(Y) changes while P(X Y) is unchanged
  • Concept drift: P(Y X) changes while P(X) is unchanged

Further reading

Frameworks

Pandas

Pandas is the primary data manipulation framework for data scientists in Python. It entails and operates on two primary data models: Series, one-dimensional data / table columns, and dataframes, two-dimensional data akin to tables.

When (not) to use it

  • Use when the data you’re manipulating fit in memory

Further reading