
Learn Data Science from Data School 📊

Tuesday Tip #7: Time zones and Daylight Savings Time in pandas ☀️

Published about 1 year ago • 2 min read

Hi Reader,

I hope you’ve been enjoying these Tuesday Tips!

If you ever need to reference a past tip, you can find them all at tuesday.tips. (Yes, that’s a real URL!)


👉 Tip #7: Daylight Savings Time in pandas

In most of the US (plus a few other places in North America), Daylight Savings Time began on Sunday at 2:00am.

So what is Daylight Savings Time, why should you care about it, and how is it handled by pandas? Let’s find out!

To start, we need to create some example data. We’ll use the date_range function to create 6 times starting on March 12, 2023 at 4:00am with an hourly frequency (abbreviated as “H”, or “h” in pandas 2.2 and later), and then convert it to a pandas Series.
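Here’s a minimal sketch of that step. The variable name ser is just my choice for illustration, and I’m assuming a reasonably recent pandas version, where the hourly alias is written in lowercase:

```python
import pandas as pd

# Six hourly timestamps starting on March 12, 2023 at 4:00am.
# freq="h" is the hourly alias; older pandas versions spell it "H".
ser = pd.Series(pd.date_range(start="2023-03-12 04:00:00", periods=6, freq="h"))

print(ser)
```

The printed times run from 04:00 through 09:00, with no time zone information attached.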

You might notice that nowhere in the data is the time zone specified! This is known as “timezone-naive” data.

If you were collecting sales data for a local coffee shop, using timezone-naive data would likely be fine since it’s all from the same location and it’s never being collected overnight.

But if you were collecting rainfall data across a continent, it would be critical to specify the time zone of your data!

To specify the time zone for our existing Series, we’ll use the tz_localize method and set it to “UTC”.
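As a rough sketch (again, ser and ser_utc are illustrative names of my own), localizing looks something like this. Note that for a Series, the method is reached through the .dt accessor:

```python
import pandas as pd

# Timezone-naive example data (same as before).
ser = pd.Series(pd.date_range(start="2023-03-12 04:00:00", periods=6, freq="h"))

# Attach the UTC time zone. The clock times stay the same;
# they are simply labeled as UTC from now on.
ser_utc = ser.dt.tz_localize("UTC")

print(ser.dt.tz)      # None, because the original data is timezone-naive
print(ser_utc.dt.tz)  # UTC
print(ser_utc)
```

tz_localize doesn’t shift any of the times. It only declares which time zone the existing times are in.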

UTC isn’t actually a time zone. Rather, it’s the standard on which all time zones worldwide are based. UTC doesn’t change based on Daylight Savings Time, which is why it’s often used internally for data storage.

Our new Series is considered “timezone-aware” data, which is why “+00:00” has been appended to all of the times. That’s called the “UTC offset”, which is the difference between a given time and UTC. But since we’ve set the time zone to UTC, the offset is always zero.

To convert our Series to US Eastern Time (which is officially known as “America/New_York”), we’ll use the tz_convert method.
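A minimal sketch of the conversion, assuming the same example data as above (the variable names are mine):

```python
import pandas as pd

# Timezone-aware example data in UTC.
ser = pd.Series(pd.date_range(start="2023-03-12 04:00:00", periods=6, freq="h"))
ser_utc = ser.dt.tz_localize("UTC")

# Convert to US Eastern Time. Unlike tz_localize, this changes the
# displayed clock times, because each moment is re-expressed in local time.
ser_eastern = ser_utc.dt.tz_convert("America/New_York")

print(ser_eastern)

# Collect the UTC offset of each converted timestamp.
offsets = [ts.utcoffset() for ts in ser_eastern]
print(offsets)
```

The underlying moments in time are unchanged. Only the local clock time and the UTC offset shown for each one are different.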

Notice that the first three times have an offset of -05:00, and the last three times have an offset of -04:00.

That’s because on March 12 at 2:00am (when Daylight Savings Time started), the US Eastern Time Zone shifted from Eastern Standard Time (known as “EST” or “UTC-5”) to Eastern Daylight Time (known as “EDT” or “UTC-4”).

Thus, there’s no 2:00am local time in US Eastern Time on March 12, 2023.
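If you want to see this for yourself, here’s one way to check it. This goes a bit beyond the steps above: I’m using the nonexistent parameter of tz_localize, which controls what happens when a local time falls in the skipped hour. By default, pandas raises an error instead.

```python
import pandas as pd

# 2:00am on March 12, 2023 does not exist on the clock in US Eastern Time.
naive = pd.Timestamp("2023-03-12 02:00:00")

# Ask pandas to return NaT (a missing value) for a nonexistent local time.
missing = naive.tz_localize("America/New_York", nonexistent="NaT")

# Or shift forward to the first valid time after the gap.
shifted = naive.tz_localize("America/New_York", nonexistent="shift_forward")

print(missing)
print(shifted)
```

Shifting forward lands on 3:00am with an offset of -04:00, which is the first local time that exists after the clocks change.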

That also means that there will be two instances of 1:00am on November 5, 2023, which is when Daylight Savings Time ends in the US.
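Here’s a sketch of how you might see the repeated hour. The starting time of 4:00am UTC and the variable names are my own choices for illustration:

```python
import pandas as pd

# Four hourly UTC timestamps spanning the end of Daylight Savings Time.
fall_utc = pd.Series(
    pd.date_range(start="2023-11-05 04:00:00", periods=4, freq="h", tz="UTC")
)

# Convert to US Eastern Time.
fall_eastern = fall_utc.dt.tz_convert("America/New_York")

print(fall_eastern)

# Local hour and UTC offset for each timestamp.
local_hours = fall_eastern.dt.hour.tolist()
fall_offsets = [ts.utcoffset() for ts in fall_eastern]
```

The local hour 1 appears twice: first with an offset of -04:00 (EDT), and then again with an offset of -05:00 (EST).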

Thus, from mid-March to early November every year, US Eastern Time is 4 hours behind UTC, and from early November to mid-March, it is 5 hours behind UTC.
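As a quick sanity check, you could compare the offset on a summer date and a winter date. The two dates below are arbitrary examples I picked:

```python
import pandas as pd

# One timestamp during Daylight Savings Time, one outside of it.
summer = pd.Timestamp("2023-07-01 12:00:00", tz="America/New_York")
winter = pd.Timestamp("2023-01-15 12:00:00", tz="America/New_York")

# utcoffset() returns the difference between local time and UTC.
print(summer.utcoffset())
print(winter.utcoffset())
```

The summer timestamp is 4 hours behind UTC, and the winter timestamp is 5 hours behind.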

Keep in mind that only some countries observe Daylight Savings Time, and they also start and end DST on different dates. 🤦‍♂️

As such, we can be grateful that DST is handled by pandas automatically… all thanks to the one guy in California who maintains the time zone database used by basically every computer system in the world!

If you work with datetime data in pandas, hopefully this has given you some insights about how to work with time zones. (Here’s the code from this tip, which you can play around with!)

Otherwise, I hope this has at least given you a useful introduction to UTC, time zones, and Daylight Savings Time!



If you enjoyed this issue, please forward it to a friend! Takes only a few seconds, and it really helps me out 🙏

See you next Tuesday!

- Kevin

P.S. All modern digital infrastructure

Kevin Markham
