The data we'll be using comes from the Sendy Logistics Challenge on Zindi. You'll need to register on Zindi to join the competition and access the data, which comes as 5 separate files:

  • Train.csv - the dataset you will use to train your model.
  • Test.csv - the dataset to which you will apply your model.
  • Riders.csv - contains unique rider IDs, number of orders, age, rating and number of ratings
  • VariableDefinitions.csv - definitions of the variables in the Train, Test and Riders files
  • SampleSubmission.csv - Shows the submission format (we'll look at this later, in lesson 4)
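
Once the files are downloaded, a quick way to check that everything is in place is to read each one and print its shape. The snippet below is a minimal sketch, assuming pandas is installed and the CSVs are saved in the current working directory. The helper name load_sendy_files and its data_dir parameter are made up for this example and are not part of the competition materials.

```python
from pathlib import Path

import pandas as pd

# File names as listed above (without the .csv extension).
FILES = ["Train", "Test", "Riders", "VariableDefinitions", "SampleSubmission"]


def load_sendy_files(data_dir="."):
    """Read whichever of the five competition CSVs exist in data_dir.

    Hypothetical helper: returns a dict mapping file name to DataFrame.
    """
    frames = {}
    for name in FILES:
        path = Path(data_dir) / f"{name}.csv"
        if path.exists():
            frames[name] = pd.read_csv(path)
        else:
            # Report missing files instead of failing, so a partial download still loads.
            print(f"Missing: {path}")
    return frames


# Assumption: the files sit in the current directory; change "." if they do not.
frames = load_sendy_files(".")
for name, df in frames.items():
    print(name, df.shape)
```

Keeping the frames in a dict makes it easy to loop over them for quick checks, but separate variables such as train and test work just as well.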

Colab: https://colab.research.google.com/github/johnowhitaker/ds_zero/blob/master/04_Practice_On_Zindi.ipynb

Load the data

Explore

Baseline Model

Feature Engineering

A better model

Making a Submission

Comparing Models

Additional features from 'Riders'
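
Since Riders.csv holds one row per unique rider, its columns can be attached to each order with a left join on the rider identifier. The sketch below uses small made-up tables rather than the real data. The key column name "Rider Id", the other column names and all values are assumptions for illustration, so check VariableDefinitions.csv for the actual names. The helper add_rider_features is hypothetical.

```python
import pandas as pd


def add_rider_features(orders, riders, key="Rider Id"):
    """Attach rider-level columns to each order row.

    how="left" keeps every order even if its rider is missing from riders;
    validate="many_to_one" raises an error if a rider appears twice in riders.
    """
    return orders.merge(riders, on=key, how="left", validate="many_to_one")


# Toy data only: these values are invented, not taken from the competition files.
orders = pd.DataFrame({"Rider Id": ["R1", "R2", "R1"], "Distance": [4, 16, 3]})
riders = pd.DataFrame({"Rider Id": ["R1", "R2"], "Age": [1200, 300]})

merged = add_rider_features(orders, riders)
print(merged)
```

The same call would be applied to both the training and test tables so that the model sees identical columns in each.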