Hi Reader,

Last week, I released a 3-hour video, My top 50 scikit-learn tips.

I also finished Chapter 10 of my next ML course, which I'll tell you about once all 20 chapters are done 😅

Anyway, let’s get to today’s tip!


👉 Tip #13: Use resample() with time series data

For fun, I’ve been building an interactive dashboard using pandas, Plotly Express, and Shiny for Python. (Check out a screenshot here.)

The goal is to help me analyze sales of my online courses. And since I’m working with sales data, I’m reminded of how much I love the pandas resample() method!

Let’s see an example of how to resample 😉

Pretend you have a DataFrame of sales data that looks like this:
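Here’s a small made-up example you can build yourself. The Product and Sale column names come from this tip, but the variable name `sales`, the dates, and the amounts are just assumptions for illustration:

```python
import pandas as pd

# Hypothetical sales data: the dates and amounts are invented for this sketch.
# The index is a DatetimeIndex, which is what resample() expects by default.
sales = pd.DataFrame(
    {
        "Product": ["A", "B", "A", "B", "A", "B"],
        "Sale": [10, 20, 30, 40, 50, 60],
    },
    index=pd.to_datetime(
        [
            "2023-03-29",
            "2023-03-30",
            "2023-03-30",
            "2023-04-01",  # note: nothing was sold on 2023-03-31
            "2023-04-01",
            "2023-04-02",
        ]
    ),
)
sales.index.name = "Date"

print(sales)
```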

You might ask: What are my total sales for each product?

In that case, you would use groupby:
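As a rough sketch (using invented data, with `sales` as a hypothetical variable name), the groupby might look like this:

```python
import pandas as pd

# Hypothetical sales data, invented for this sketch
sales = pd.DataFrame(
    {
        "Product": ["A", "B", "A", "B", "A", "B"],
        "Sale": [10, 20, 30, 40, 50, 60],
    },
    index=pd.to_datetime(
        ["2023-03-29", "2023-03-30", "2023-03-30",
         "2023-04-01", "2023-04-01", "2023-04-02"]
    ),
)

# Group the rows by Product, then add up the Sale column within each group
totals_by_product = sales.groupby("Product")["Sale"].sum()

print(totals_by_product)
# Product A totals 90 and Product B totals 120
```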

You can read that code as: For each Product, this is the sum of the Sale column.

Another similar question you might ask is: What are my total sales for each day?

Instead of groupby, you would use resample, which I think of as “groupby for time series data”:
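Here’s a minimal sketch of a daily resample, again on invented data (the `sales` name and the values are assumptions, chosen so that no sale falls on 2023-03-31):

```python
import pandas as pd

# Hypothetical sales data with a DatetimeIndex, invented for this sketch
sales = pd.DataFrame(
    {
        "Product": ["A", "B", "A", "B", "A", "B"],
        "Sale": [10, 20, 30, 40, 50, 60],
    },
    index=pd.to_datetime(
        ["2023-03-29", "2023-03-30", "2023-03-30",
         "2023-04-01", "2023-04-01", "2023-04-02"]
    ),
)

# 'D' means "one bin per calendar day"; sum the Sale column within each bin
daily_totals = sales.resample("D")["Sale"].sum()

print(daily_totals)
# Five rows, 2023-03-29 through 2023-04-02: 10, 50, 0, 90, 60
```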

You can read that code as: For each day, this is the sum of the Sale column.

(Notice that it inserted 2023-03-31 with a value of 0, since there were no sales on that day.)

By changing the 'D' to an 'M', you can resample by month instead:
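A hedged sketch of the monthly version, on invented data. One caveat: newer pandas releases (2.2 and later) prefer the spelling 'ME' over 'M', so this sketch passes the equivalent month-end offset object, `pd.offsets.MonthEnd()`, which behaves the same way across versions. That swap is my choice for portability, not something from the original tip:

```python
import pandas as pd

# Hypothetical sales data with a DatetimeIndex, invented for this sketch
sales = pd.DataFrame(
    {
        "Product": ["A", "B", "A", "B", "A", "B"],
        "Sale": [10, 20, 30, 40, 50, 60],
    },
    index=pd.to_datetime(
        ["2023-03-29", "2023-03-30", "2023-03-30",
         "2023-04-01", "2023-04-01", "2023-04-02"]
    ),
)

# Equivalent to resample('M') in older pandas and resample('ME') in newer pandas:
# one bin per calendar month, labeled with the last day of that month
monthly_totals = sales.resample(pd.offsets.MonthEnd())["Sale"].sum()

print(monthly_totals)
# March (labeled 2023-03-31) totals 60 and April (labeled 2023-04-30) totals 150
```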

'D' and 'M' are known as “offset aliases”, and there are many other offset aliases you can use. (In pandas 2.2 and later, 'M' is deprecated in favor of 'ME', which stands for month end.)

Finally, let’s say that the index is not a datetime column:
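For example, the dates might live in an ordinary column while the index is just the default row numbers. Here’s a sketch of that situation (the `sales` name, the Date column name, and the values are assumptions), along with what happens if you call resample anyway:

```python
import pandas as pd

# Hypothetical sales data: Date is a regular column, and the index is 0, 1, 2, ...
sales = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2023-03-29", "2023-03-30", "2023-03-30",
             "2023-04-01", "2023-04-01", "2023-04-02"]
        ),
        "Product": ["A", "B", "A", "B", "A", "B"],
        "Sale": [10, 20, 30, 40, 50, 60],
    }
)

print(sales)

# Without a datetime-like index, resample has nothing to bin on and raises a TypeError
error = None
try:
    sales.resample("D")["Sale"].sum()
except TypeError as err:
    error = err
    print(f"resample failed: {err}")
```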

In that case, you need to use the 'on' parameter to specify the datetime column:
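A minimal sketch, assuming the datetime column is called Date (that name and the data are invented for illustration):

```python
import pandas as pd

# Hypothetical sales data: Date is a regular column, not the index
sales = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2023-03-29", "2023-03-30", "2023-03-30",
             "2023-04-01", "2023-04-01", "2023-04-02"]
        ),
        "Product": ["A", "B", "A", "B", "A", "B"],
        "Sale": [10, 20, 30, 40, 50, 60],
    }
)

# on='Date' tells resample which column holds the datetimes to bin by
daily_totals = sales.resample("D", on="Date")["Sale"].sum()

print(daily_totals)
# Same daily totals as with a datetime index: 10, 50, 0, 90, 60
```

Another option is to call set_index('Date') first and then resample as usual. Which one reads better is mostly a matter of taste.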

If you work with time series data, I bet you’ll find a use for resample!


If you enjoyed this week’s tip, please forward it to a friend! Takes only a few seconds, and it really helps me out! 🙌

See you next Tuesday!

- Kevin

P.S. What’s the worst volume control interface? (my favorite is #12)

Did someone awesome forward you this email? Sign up here to receive data science tips every week!
