I hope you’ve been enjoying these Tuesday Tips!
If you ever need to reference a past tip, you can find them all at tuesday.tips. (Yes, that’s a real URL!)
👉 Tip #7: Daylight Savings Time in pandas
In most of the US (plus a few other places in North America), Daylight Savings Time began on Sunday, March 12, at 2:00am.
So what is Daylight Savings Time, why should you care about it, and how is it handled by pandas? Let’s find out!
To start, we need to create some example data. We’ll use the date_range function to create 6 times starting on March 12 at 4:00am with an hourly frequency (abbreviated as “H”), and then convert it to a pandas Series:
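Here's a minimal sketch of that step. The variable names are my own, and I'm assuming the year is 2023. I've also written the frequency as lowercase "h", since recent pandas versions warn about the uppercase "H" alias:

```python
import pandas as pd

# Six hourly timestamps starting March 12, 2023 at 4:00am (no time zone attached)
times = pd.date_range(start="2023-03-12 04:00", periods=6, freq="h")

# Convert the DatetimeIndex to a Series
ts = pd.Series(times)
print(ts)

# The Series has no time zone, so this prints None
print(ts.dt.tz)
```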
You might notice that nowhere in the data is the time zone specified! This is known as “timezone-naive” data.
If you were collecting sales data for a local coffee shop, using timezone-naive data would likely be fine since it’s all from the same location and it’s never being collected overnight.
But if you were collecting rainfall data across a continent, it would be critical to specify the time zone of your data!
To specify the time zone for our existing Series, we’ll use the tz_localize method and set it to “UTC”:
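As a rough sketch (again using my own variable names and rebuilding the example Series so the snippet runs on its own), localizing to UTC might look like this:

```python
import pandas as pd

# Rebuild the timezone-naive example Series
ts = pd.Series(pd.date_range(start="2023-03-12 04:00", periods=6, freq="h"))

# Attach the UTC time zone: the clock times stay the same,
# but the data is now timezone-aware
ts_utc = ts.dt.tz_localize("UTC")
print(ts_utc)
```

Note that tz_localize doesn't shift the times. It only declares which time zone the existing values are in.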
UTC isn’t actually a time zone. Rather, it’s the standard around which all time zones worldwide are based. UTC doesn’t change based on Daylight Savings Time, which is why it’s often used internally for data storage.
Our new Series is considered “timezone-aware” data, which is why “+00:00” has been appended to all of the times. That’s called the “UTC offset”, which is the difference between a given time and UTC. But since we’ve set the time zone to UTC, the offset is always zero.
Once the Series is converted from UTC to US Eastern Time (using the tz_convert method), notice that the first three times have an offset of -05:00, and the last three times have an offset of -04:00.
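Here's a sketch of how those offsets can be reproduced. I'm assuming the "America/New_York" zone name for US Eastern Time, and the variable names are illustrative:

```python
import pandas as pd

# Timezone-aware example Series in UTC
ts = pd.Series(pd.date_range(start="2023-03-12 04:00", periods=6, freq="h"))
ts_utc = ts.dt.tz_localize("UTC")

# Convert to US Eastern Time: the moments in time are unchanged,
# but the displayed clock times and UTC offsets are adjusted
ts_eastern = ts_utc.dt.tz_convert("America/New_York")
print(ts_eastern)

# UTC offset of each time, in hours
offset_hours = [t.utcoffset().total_seconds() / 3600 for t in ts_eastern]
print(offset_hours)
```

In the printed Series, the local clock jumps straight from 1:00am to 3:00am.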
That’s because on March 12 at 2:00am (when Daylight Savings Time started), the US Eastern Time Zone shifted from Eastern Standard Time (known as “EST” or “UTC-5”) to Eastern Daylight Time (known as “EDT” or “UTC-4”).
Thus, there’s no 2:00am local time in US Eastern Time on March 12, 2023.
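If you're curious what happens when you try to use that missing time, here's a small sketch. It assumes the "America/New_York" zone name and uses the nonexistent parameter of tz_localize. I catch a generic Exception because the exact error type can vary by pandas version:

```python
import pandas as pd

# 2:00am on March 12, 2023 does not exist in US Eastern Time
naive = pd.Timestamp("2023-03-12 02:00")

# By default, localizing a nonexistent time raises an error
try:
    naive.tz_localize("America/New_York")
    raised = False
except Exception as err:
    raised = True
    print(type(err).__name__)

# Alternatively, ask pandas to shift forward to the next valid time
shifted = naive.tz_localize("America/New_York", nonexistent="shift_forward")
print(shifted)  # 2023-03-12 03:00:00-04:00
```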
That also means that there will be two instances of 1:00am on November 5, 2023, which is when Daylight Savings Time ends in the US:
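A sketch of one way to see this, assuming the "America/New_York" zone name and starting from four hourly UTC times on that date:

```python
import pandas as pd

# Four hourly UTC times that span the end of Daylight Savings Time
nov_utc = pd.Series(
    pd.date_range(start="2023-11-05 04:00", periods=4, freq="h", tz="UTC")
)

# Convert to US Eastern Time: 1:00am appears twice,
# first with an offset of -04:00 and then with -05:00
nov_eastern = nov_utc.dt.tz_convert("America/New_York")
print(nov_eastern)

# Local hour and UTC offset (in hours) for each time
local_hours = nov_eastern.dt.hour.tolist()
nov_offsets = [t.utcoffset().total_seconds() / 3600 for t in nov_eastern]
```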
Thus from mid-March to early November every year, US Eastern Time is 4 hours behind UTC, and from early November to mid-March, US Eastern Time is 5 hours behind UTC.
Keep in mind that only some countries observe Daylight Savings Time, and they also start and end DST on different dates. 🤦‍♂️
As such, we can be grateful that DST is handled by pandas automatically… all thanks to the one guy in California who maintains the time zone database used by basically every computer system in the world!
If you work with datetime data in pandas, hopefully this has given you some insights about how to work with time zones. (Here’s the code from this tip, which you can play around with!)
Otherwise, I hope this has at least given you a useful introduction to UTC, time zones, and Daylight Savings Time!
If you enjoyed this issue, please forward it to a friend! Takes only a few seconds, and it really helps me out 🙏
See you next Tuesday!