Machine Learning

Machine learning is the field of study that gives computers the ability to learn from data without being explicitly programmed.

Types of Machine Learning

Supervised Learning

  • Supervised learning is typically what people first think of when they hear "machine learning".
  • We give the computer examples with known answers and ask it to make predictions for new data.
  • Examples:
    • Predict the price of a home based on the number of bedrooms, square footage, number of floors, ...
    • Predict which class an image belongs to (e.g., cat or dog).
    • Predict next week's weather.
    • Predict the sentiment of a piece of text, like a tweet or a product review.
    • Classify whether an incoming email is spam or not.
    • Voice recognition: recognize and understand speech.

Labeled Data

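  • In supervised learning, each training example is paired with a label: the correct answer we want the computer to predict (e.g., a home's sale price, or "spam"/"not spam" for an email). These labels allow the computer to "study" the relationship between the given information and the correct labels.

  • The computer learns these relationships; then, when we give it new, unlabeled information, it applies what it learned from the labeled dataset to make its predictions.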
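Learning from labeled data can be sketched with about the simplest model there is: a straight line fitted by ordinary least squares. The house sizes and prices below are made-up toy values, and `fit_line`/`predict` are hypothetical helper names; in practice you would normally reach for a library such as scikit-learn rather than hand-written code.

```python
# Toy labeled dataset (hypothetical values, for illustration only):
# each example pairs an input (square footage) with its label (sale price).
labeled_data = [
    (1000, 200_000),
    (1500, 275_000),
    (2000, 350_000),
    (2500, 425_000),
]


def fit_line(examples):
    """Learn price = slope * sqft + intercept by ordinary least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    var_x = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, sqft):
    """Apply the learned relationship to a new, unlabeled input."""
    slope, intercept = model
    return slope * sqft + intercept


model = fit_line(labeled_data)  # "study" the labeled examples
print(predict(model, 1800))     # → 320000.0
```

The 1,800 sq ft house never appeared in the training data; the prediction comes entirely from the relationship learned from the labeled examples.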

Unsupervised Learning

  • In unsupervised learning, we do not provide labels; the computer has to find structure or patterns in the data on its own.
  • Examples:
    • Give the computer information about houses and ask it to create groups of similar homes based on price, bedrooms, square feet, floors, waterfront, and year built.
    • Group articles from various news outlets that cover the same story.
    • Anomaly detection: identify data points, events, and/or observations that deviate from a dataset's normal behavior.
    • Customer segmentation based on buying behavior.
    • Recommendation systems, such as those used by Amazon and Facebook.
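One common way to build the groups in the house example is k-means clustering; the notes don't prescribe an algorithm, so k-means is an assumption here. The sketch uses just two hypothetical features, bedrooms and price in hundreds of thousands of dollars, and no labels at all: it alternates between assigning each house to its nearest centroid and moving each centroid to the mean of its houses.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means: group points into k clusters without using any labels."""
    # Start from the first k points (simple and deterministic; libraries
    # usually use random or k-means++ initialization instead).
    centroids = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical houses: (bedrooms, price in $100k). No labels are given.
houses = [(2, 2.0), (5, 8.0), (2, 2.5), (3, 2.2), (5, 7.5), (4, 8.2)]
labels, centers = kmeans(houses, k=2)
print(labels)  # → [0, 1, 0, 0, 1, 1]
```

The small, cheap houses end up in one group and the large, expensive ones in the other. The cluster numbers themselves carry no meaning; only the grouping does.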

Types of supervised learning

  • Regression tasks predict a continuous value (e.g., the price of a home).
  • Classification tasks predict a categorical value, i.e., a class label (e.g., spam or not spam).
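To make the regression/classification distinction concrete, here is a minimal k-nearest-neighbors sketch (k-NN is chosen only for illustration; the notes don't mention it). Both functions find the same nearest training examples; regression averages their continuous labels, while classification takes a majority vote over their categorical labels. All data and features are hypothetical.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k training examples whose features are closest to `query`."""
    return sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]


def knn_regress(train, query, k=3):
    """Regression: predict a continuous value by averaging the neighbors' labels."""
    neighbors = k_nearest(train, query, k)
    return sum(label for _, label in neighbors) / len(neighbors)


def knn_classify(train, query, k=3):
    """Classification: predict a category by majority vote among the neighbors."""
    neighbors = k_nearest(train, query, k)
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


# Hypothetical data as (features, label) pairs.
house_prices = [((1000,), 210_000), ((1500,), 270_000),
                ((2000,), 330_000), ((3000,), 500_000)]   # (sqft,) -> price
emails = [((0, 0), "not spam"), ((1, 0), "not spam"),
          ((8, 5), "spam"), ((9, 7), "spam"), ((7, 6), "spam")]  # (links, "$" signs)

print(knn_regress(house_prices, (1600,)))  # → 270000.0 (a number)
print(knn_classify(emails, (8, 6)))        # → spam (a category)
```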

Model validation

  • Model validation is the process of evaluating a trained model on a test dataset that was not used during training.
  • The dataset is first split into training and test sets.
  • The training set is used to train the model.
  • The test set is used to measure the model's performance (e.g., its accuracy) on data it has not seen.
  • A typical split is 75% for training and 25% for testing.

Training and test split
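The split-train-evaluate process described under Model validation can be sketched in plain Python. The helper names (`train_test_split`, `train_threshold_classifier`, `accuracy`) and the one-feature email data are hypothetical; scikit-learn provides a real `sklearn.model_selection.train_test_split`, whose default test size is also 25%. Shuffling before splitting matters because datasets are often stored in some order (here, sorted by feature value), and an unshuffled split would put only one kind of example in the test set.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle, then hold out `test_fraction` of the examples for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def train_threshold_classifier(train):
    """Toy 'training': put a threshold halfway between the two classes."""
    largest_not_spam = max(x for x, y in train if y == "not spam")
    smallest_spam = min(x for x, y in train if y == "spam")
    threshold = (largest_not_spam + smallest_spam) / 2
    return lambda x: "spam" if x > threshold else "not spam"


def accuracy(model, test_set):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(1 for x, y in test_set if model(x) == y)
    return correct / len(test_set)


# Hypothetical data, stored in sorted order: (number of links, label).
data = [(x, "spam" if x >= 10 else "not spam") for x in range(20)]

train, test = train_test_split(data)       # 15 training, 5 test examples
model = train_threshold_classifier(train)  # the model only ever sees `train`
print(f"test accuracy: {accuracy(model, test):.2f}")
```

The exact accuracy depends on which examples land in the test set, which is why the seed is fixed: it makes the split reproducible from run to run.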

Data Leakage

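  • Data leakage occurs when information from outside the training dataset is used to create the model.
  • It invalidates the final evaluation of the model: because the model has effectively already seen information from the test data, its test score overestimates how well it will perform on genuinely new data.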
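A common way leakage sneaks in is through preprocessing. The sketch below uses standardization (subtract the mean, divide by the standard deviation) as an assumed example; the data and helper names are hypothetical. Computing the statistics on the full dataset lets the test values shape how the training data is transformed. The safe approach computes them on the training set only and reuses them unchanged for the test set.

```python
import statistics


def fit_scaler(values):
    """Learn the mean and (population) standard deviation for standardization."""
    return statistics.mean(values), statistics.pstdev(values)


def transform(values, scaler):
    """Standardize values using previously learned statistics."""
    mean, std = scaler
    return [(v - mean) / std for v in values]


# Hypothetical one-feature data, already split.
train_x = [1.0, 2.0, 3.0, 4.0]
test_x = [10.0, 12.0]

# Leaky: statistics computed on ALL the data, so the test values
# influence how the training data is preprocessed.
leaky_scaler = fit_scaler(train_x + test_x)

# Correct: statistics learned from the training set only, then reused
# unchanged for the test set.
scaler = fit_scaler(train_x)
train_scaled = transform(train_x, scaler)
test_scaled = transform(test_x, scaler)

print(leaky_scaler[0], scaler[0])  # the leaky mean is pulled toward the test values
```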