Hi Reader,
Last week, I released a 3-hour video, My top 50 scikit-learn tips.
I also finished Chapter 10 of my next ML course, which I'll tell you about once all 20 chapters are done 😅
Anyway, let’s get to today’s tip!
👉 Tip #13: Use resample() with time series data
For fun, I’ve been building an interactive dashboard using pandas, Plotly Express, and Shiny for Python. (Check out a screenshot here.)
The goal is to help me analyze sales of my online courses. And since I’m working with sales data, I’m reminded of how much I love the pandas resample method!
Let’s see an example of how to resample 😉
Pretend you have a DataFrame of sales data that looks like this:
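The original screenshot of the DataFrame isn't available, so here's a made-up DataFrame in the same spirit. The dates, products, and amounts are all hypothetical. What matters is the shape: one row per sale, a Product column, a Sale column, and a DatetimeIndex named Date. There are deliberately no sales on 2023-03-31.

```python
import pandas as pd

# Hypothetical sales data (made up for illustration): one row per sale,
# indexed by the date of the sale. Note: no sales on 2023-03-31.
df = pd.DataFrame(
    {"Product": ["A", "B", "A", "C", "B", "A", "C"],
     "Sale": [10, 25, 10, 40, 25, 10, 40]},
    index=pd.DatetimeIndex(
        ["2023-03-29", "2023-03-29", "2023-03-30", "2023-04-01",
         "2023-04-01", "2023-04-02", "2023-04-05"], name="Date"),
)
print(df)
```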
You might ask: What are my total sales for each product?
In that case, you would use groupby:
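Here's a sketch of that groupby using the same hypothetical sales data. I've used bracket selection for the Sale column, which works the same way as attribute access here.

```python
import pandas as pd

# Hypothetical sales data (made up for illustration), indexed by date
df = pd.DataFrame(
    {"Product": ["A", "B", "A", "C", "B", "A", "C"],
     "Sale": [10, 25, 10, 40, 25, 10, 40]},
    index=pd.DatetimeIndex(
        ["2023-03-29", "2023-03-29", "2023-03-30", "2023-04-01",
         "2023-04-01", "2023-04-02", "2023-04-05"], name="Date"),
)

# For each Product, sum the Sale column
totals = df.groupby("Product")["Sale"].sum()
print(totals)  # A: 30, B: 50, C: 80
```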
You can read that code as: For each Product, this is the sum of the Sale column.
Another similar question you might ask is: What are my total sales for each day?
Instead of groupby, you would use resample, which I think of as “groupby for time series data”:
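Here's the resample version on the same made-up data. Because the index is a DatetimeIndex, resample can bin the rows by calendar day. Days with no sales (like 2023-03-31 in this hypothetical data) still get a row, and summing an empty bin gives 0.

```python
import pandas as pd

# Hypothetical sales data (made up for illustration), indexed by date
df = pd.DataFrame(
    {"Product": ["A", "B", "A", "C", "B", "A", "C"],
     "Sale": [10, 25, 10, 40, 25, 10, 40]},
    index=pd.DatetimeIndex(
        ["2023-03-29", "2023-03-29", "2023-03-30", "2023-04-01",
         "2023-04-01", "2023-04-02", "2023-04-05"], name="Date"),
)

# For each day, sum the Sale column ("D" = calendar day)
daily = df.resample("D")["Sale"].sum()
print(daily)  # 2023-03-31, 2023-04-03, and 2023-04-04 show up with 0
```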
You can read that code as: For each day, this is the sum of the Sale column.
(Notice that it inserted 2023-03-31 with a value of 0, since there were no sales on that day.)
By changing the 'D' to an 'M', you can resample by month instead (in pandas 2.2 and later, the month-end alias is spelled 'ME', and 'M' is deprecated):
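Here's a monthly version on the same hypothetical data. Since the month-end alias changed from 'M' to 'ME' in pandas 2.2, this sketch picks the alias based on the installed pandas version. That version check is my own addition, not something you need if you know which pandas you're running.

```python
import pandas as pd

# Hypothetical sales data (made up for illustration), indexed by date
df = pd.DataFrame(
    {"Product": ["A", "B", "A", "C", "B", "A", "C"],
     "Sale": [10, 25, 10, 40, 25, 10, 40]},
    index=pd.DatetimeIndex(
        ["2023-03-29", "2023-03-29", "2023-03-30", "2023-04-01",
         "2023-04-01", "2023-04-02", "2023-04-05"], name="Date"),
)

# Month-end alias: "ME" in pandas 2.2+, "M" in older versions
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
month_alias = "ME" if (major, minor) >= (2, 2) else "M"

# For each month, sum the Sale column (bins are labeled by the month's last day)
monthly = df.resample(month_alias)["Sale"].sum()
print(monthly)  # March: 45, April: 115
```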
'D' and 'M' are known as “offset aliases,” and pandas supports many others, such as 'W' for weekly.
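To give a feel for other aliases, here's a sketch on the same made-up data using 'W' (weeks ending on Sunday, labeled by that Sunday) and '2D' (a multiple of the daily alias, giving 2-day bins that start on the first day in the data).

```python
import pandas as pd

# Hypothetical sales data (made up for illustration), indexed by date
df = pd.DataFrame(
    {"Product": ["A", "B", "A", "C", "B", "A", "C"],
     "Sale": [10, 25, 10, 40, 25, 10, 40]},
    index=pd.DatetimeIndex(
        ["2023-03-29", "2023-03-29", "2023-03-30", "2023-04-01",
         "2023-04-01", "2023-04-02", "2023-04-05"], name="Date"),
)

# "W": weekly bins ending on Sunday (labels are 2023-04-02 and 2023-04-09)
weekly = df.resample("W")["Sale"].sum()

# "2D": 2-day bins starting at 2023-03-29
every_two_days = df.resample("2D")["Sale"].sum()

print(weekly)          # 120, then 40
print(every_two_days)  # 45, 65, 10, 40
```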
Finally, let’s say that the dates are stored in a regular column rather than in the index:
In that case, you need to use the 'on' parameter to specify the datetime column:
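Here's a sketch of that, using the same hypothetical sales data but with Date as an ordinary column and the default integer index. The on parameter tells resample which column holds the datetimes, so you don't have to call set_index first.

```python
import pandas as pd

# Hypothetical sales data (made up for illustration):
# "Date" is a regular column, and the index is the default RangeIndex
df = pd.DataFrame(
    {"Date": pd.to_datetime(
        ["2023-03-29", "2023-03-29", "2023-03-30", "2023-04-01",
         "2023-04-01", "2023-04-02", "2023-04-05"]),
     "Product": ["A", "B", "A", "C", "B", "A", "C"],
     "Sale": [10, 25, 10, 40, 25, 10, 40]},
)

# For each day, sum the Sale column, binning on the "Date" column
daily = df.resample("D", on="Date")["Sale"].sum()
print(daily)
```

The result matches resampling a DataFrame that has Date as its index.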
If you work with time series data, I bet you’ll find a use for resample!
If you enjoyed this week’s tip, please forward it to a friend! Takes only a few seconds, and it really helps me out! 🙌
See you next Tuesday!
- Kevin
P.S. What’s the worst volume control interface? (my favorite is #12)
Did someone awesome forward you this email? Sign up here to receive data science tips every week!