A Lightning Introduction to some basic Data Science tools and concepts.

I'm working with some cool people at the moment, and we've decided to try and learn some Data Science together. This project is going to end up being some combination of:

  • A short 'course' with some learning material and lots of exercises
  • A collection of links to other great resources
  • One or more 'suggested learning paths' - routes we think will help others who want to learn but aren't sure where to start

We're all pretty busy, so I'm going to try and keep things as minimal as possible.

This whole thing is written in Jupyter Notebooks and built with NBDev. It should be up at https://johnowhitaker.github.io/ds_zero/.

What are we learning?

I tend to define 'Data Science' as the process of taking data and turning it into something useful. This can be summarizing data in a way that generates insight. It can be using existing data to derive or infer something new. It can be learning how to make better decisions from records of the past. Such a broad definition means that data science comes into play in all sorts of fields. Indeed, I believe that the techniques and, more importantly, the mindset of data science will be incredibly useful in any career or situation, not just in the relatively small world of what we might term 'corporate' data science.

With that context in mind, the obvious question is: how would one learn Data Science? Where would you even begin? This course is designed as a partial answer to that question. We'll look at a few of the most common tools, solve some common problems and, hopefully, provide enough of a starting point that you can begin working on your own projects.

The Plan

I've split the content into individual 'lessons', which you can access from the left-hand menu. As you can see, we're only one lesson in! All feedback appreciated :)

Lesson 1 - Meet the Tools

In this lesson, we do a lightning overview of some of the libraries we'll use in the next three or four lessons. You'll load your first dataset with pandas, make some pretty plots with matplotlib and get an early preview of machine learning in action.

At the bottom of the notebook are some exercises - finish those and you'll be good and ready for lesson 2.

Lesson 2 - Data Cleaning & Visualization

At the end of lesson 1 we saw a machine learning model being created, trained and used to make predictions... in three lines of code. We don't understand that code yet, but it hints at a secret: the machine learning bit is often the easy part. So for this lesson, we look more closely at the steps that come before the modelling: data cleaning and exploration.
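
For a feel of what that 'create, train, predict' pattern can look like, here is a minimal sketch. It is not the code from lesson 1: it assumes scikit-learn (a library not named above) and uses a tiny made-up dataset purely for illustration.

```python
from sklearn.linear_model import LinearRegression

# Toy data, invented for this sketch: hours studied -> exam score.
X = [[1], [2], [3], [4]]   # one feature per row
y = [52, 54, 56, 58]       # target values

model = LinearRegression()               # 1. create the model
model.fit(X, y)                          # 2. train it on known examples
predictions = model.predict([[5], [6]])  # 3. predict for unseen inputs

print(predictions)
```

Notice that everything above the three numbered lines is about getting data into the right shape, which is where most of the real work tends to go.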

Lesson 3 - Supervised Learning

With some basics covered, let's revisit the machine learning example from lesson one and build on that, exploring some of the different kinds of tasks you may encounter, playing with different methods and learning how to choose the best tool for the job.

Lesson 4 - Entering Competitions (Practice)

Putting the lessons into practice, and learning some extra tips to make our models better.

Lesson 5 - Deep Learning: Image Classification

Image Classification is one of the most popular applications of deep learning. This lesson shows how easy it can be, and gives a starting point for going deeper into this topic.

Lesson 6 - Where Next?

There are many topics not covered here - where would one go to learn more? This lesson is a collection of resources and suggestions, as well as some final project ideas.

Resources & References

We'll keep this section updated as we add more content.