The incomplete guide for migrating from pytz to Python’s standard library.
Published Nov 9, 2022 · 7 min read
In the last few years, the Python community has made great progress on standard time zone support. Django recognized this too, and deprecated the pytz library in its major version 4.0.
At Back Market, we embraced this advancement and migrated to the Python standard library.
In this blog post, we attempt to give explanations and a comprehensive, but surely incomplete, guide about this migration, as time zones are not always easy to reason about.
In previous roles, I encountered teams that plainly doubled the estimation of any task related to time zones. They acknowledged that it usually takes a while to wrap one’s head around a time zone issue.
Thanks to two Python Enhancement Proposals (PEPs), namely PEP 495 and PEP 615, Python provides through its standard library everything a developer needs to handle time zones and ambiguous moments.
PEP 495, "Local Time Disambiguation", introduces the fold attribute to the datetime.time and datetime.datetime classes. The attribute comes in handy during events where time is turned back, often by an hour for daylight saving time. During these events, the same wall-clock time happens twice, and is therefore ambiguous without the information the fold attribute brings.
As if the timeline were folded, fold=1 indicates that we are referencing the later instance of the ambiguous time, whereas fold=0 is the earlier one.
In France, on October 30, 2022, clocks were turned back one hour (Sweet! One hour more sleep! 😴). At 3:00, it was actually 2:00.
Imagine trying to get a datetime object for October 30, at 2:30:
# Before Python 3.6
>>> from datetime import datetime
>>> from dateutil import tz  # from the python-dateutil library
>>> PARIS = tz.gettz("Europe/Paris")
>>> print(datetime(2022, 10, 30, 2, 30, tzinfo=PARIS))
2022-10-30 02:30:00+02:00
The datetime is ambiguous, and we have no way of selecting the offset that applies after the clocks were turned back. With the fold parameter:
# Since Python 3.6
>>> print(datetime(2022, 10, 30, 2, 30, tzinfo=PARIS, fold=0))
2022-10-30 02:30:00+02:00
>>> print(datetime(2022, 10, 30, 2, 30, tzinfo=PARIS, fold=1))
2022-10-30 02:30:00+01:00
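One way to put the fold attribute to work is to detect whether a given wall-clock time is ambiguous at all. Here is a minimal sketch, assuming Python 3.9+ and available time zone data; the is_ambiguous helper is a hypothetical name of ours, not something the standard library provides:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

PARIS = ZoneInfo("Europe/Paris")


def is_ambiguous(naive: datetime, tz: ZoneInfo) -> bool:
    """Return True if the naive wall time happens twice in tz (hypothetical helper)."""
    first = naive.replace(tzinfo=tz, fold=0)
    second = naive.replace(tzinfo=tz, fold=1)
    # During a fold, the earlier occurrence has the larger UTC offset
    # (+02:00 before the change, +01:00 after it in Paris).
    # For regular times both offsets are equal; for skipped times
    # (spring forward) the fold=0 offset is the smaller one.
    return first.utcoffset() > second.utcoffset()


print(is_ambiguous(datetime(2022, 10, 30, 2, 30), PARIS))  # True
print(is_ambiguous(datetime(2022, 10, 30, 3, 30), PARIS))  # False
```

Comparing the two offsets rather than hard-coding transition dates keeps the check valid for any zone the database knows about.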
PEP 615, "Support for the IANA Time Zone Database in the Standard Library", makes time zones available and usable out of the box for developers, without requiring any third-party packages.
PEP 615 relies on the system’s time zone data whenever possible. If not available, the PyPI package tzdata can serve as fallback.
In Python 3.9, the zoneinfo.ZoneInfo class was introduced. It can be passed as the tzinfo parameter of the datetime.datetime constructor to specify the time zone.
>>> from datetime import datetime
>>> from zoneinfo import ZoneInfo
>>> dt = datetime(2022, 10, 20, 8, tzinfo=ZoneInfo("Europe/Paris"))
>>> print(dt)
2022-10-20 08:00:00+02:00
>>> print(dt.astimezone(ZoneInfo("America/New_York")))
2022-10-20 02:00:00-04:00
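Since ZoneInfo depends on the system's time zone data or on the tzdata package, a lookup can fail on a stripped-down machine or with a mistyped key. A small sketch of how this could be surfaced, where load_zone is a hypothetical wrapper of ours and the error message is only a suggestion:

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def load_zone(key: str) -> ZoneInfo:
    """Load a time zone, with a more actionable error (hypothetical helper)."""
    try:
        return ZoneInfo(key)
    except ZoneInfoNotFoundError as exc:
        # Raised when neither the system database nor the tzdata
        # package contains the requested key.
        raise RuntimeError(
            f"No time zone data found for {key!r}: "
            "check the key, or install the tzdata package"
        ) from exc


print(load_zone("Europe/Paris").key)  # Europe/Paris
```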
If you are using pytz, it might be worth considering migrating away from it. At Back Market, we made the move and never looked back.
Let’s discuss why getting rid of pytz can be a good idea.
Pitfalls
The migration is not only motivated by using Python built-ins and being cool kids (even though we undoubtedly are 😎), but also by the fair number of pitfalls that come with pytz.
To summarize a few examples that Paul Ganssle, great guru of time zones in the Python community, explains in an awesome blog post:
Using the tzinfo parameter of the standard datetime constructors “does not work” with pytz for many time zones:
>>> from datetime import datetime
>>> import pytz
>>> PARIS = pytz.timezone("Europe/Paris")
>>> print(datetime(2022, 10, 30, 2, 30, tzinfo=PARIS))
2022-10-30 02:30:00+00:09
Go home pytz, you’re drunk! 🍻
Instead, one should do the following:
>>> print(PARIS.localize(datetime(2022, 10, 30, 2, 30)))
2022-10-30 02:30:00+02:00
Similar issues arise with datetime arithmetic operations, but they are a bit trickier to grasp. We will dive into them later in the post, in the Gotcha section.
The 2038 bug
The pytz library suffers from the 2038 bug. This means that, one second after 03:14:07 UTC on 19 January 2038, a signed 32-bit epoch integer overflows, resulting in faulty behavior for later dates. That is because pytz uses, at the time of writing, an outdated version of the IANA Time Zone Database that is affected by this bug.
Example shamelessly copied and adapted from a GitHub issue:
>>> from datetime import datetime, timezone
>>> from zoneinfo import ZoneInfo
>>> import pytz
>>> dt_2022 = datetime(2022, 5, 31, 22, tzinfo=timezone.utc)
>>> dt_2040 = datetime(2040, 5, 31, 22, tzinfo=timezone.utc)
### With pytz, in 2022
>>> print(dt_2022.astimezone(pytz.timezone("Europe/Paris")))
2022-06-01 00:00:00+02:00
### With pytz, in 2040
>>> print(dt_2040.astimezone(pytz.timezone("Europe/Paris")))
2040-05-31 23:00:00+01:00
### With ZoneInfo, in 2022
>>> print(dt_2022.astimezone(ZoneInfo("Europe/Paris")))
2022-06-01 00:00:00+02:00
### With ZoneInfo, in 2040
>>> print(dt_2040.astimezone(ZoneInfo("Europe/Paris")))
2040-06-01 00:00:00+02:00
Speeeeed
When using Python, performance at this level of granularity rarely matters, but it’s still good to know that ZoneInfo outperforms pytz in many cases.
One, very empirical, example measured with the IPython magic command timeit:
In [1]: from zoneinfo import ZoneInfo
In [2]: import pytz
In [3]: %timeit ZoneInfo("Europe/Paris")
46.2 ns ± 0.119 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [4]: %timeit pytz.timezone("Europe/Paris")
471 ns ± 1.44 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
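If you want to reproduce this kind of measurement outside IPython, the standard timeit module is enough. The sketch below only covers the ZoneInfo side so that it runs without pytz installed, and the numbers will differ from ours depending on your machine. Keep in mind that the ZoneInfo constructor caches its instances, so a loop like this mostly measures a cache lookup rather than the parsing of time zone files:

```python
import timeit
from zoneinfo import ZoneInfo

# The primary constructor caches instances: asking twice for the
# same key returns the very same object.
same_object = ZoneInfo("Europe/Paris") is ZoneInfo("Europe/Paris")

calls = 100_000
seconds = timeit.timeit(
    'ZoneInfo("Europe/Paris")',
    globals={"ZoneInfo": ZoneInfo},
    number=calls,
)
print(f"cached: {same_object}, about {seconds / calls * 1e9:.0f} ns per call")
```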
More metrics are available in the slides of this talk, still from the one and only Paul Ganssle.
So, convinced? Let’s move on, then!
Removing direct pytz usages
As you probably expect, finding and removing pytz imports and usages will be the crux of the migration. The ultimate change will be the removal of pytz from your dependency requirements.
Here is a list of typical changes necessary for the migration:
- Use datetime.timezone.utc instead of pytz.UTC.
- Use the datetime constructor parameter tzinfo or .replace() instead of .localize().
- Remove .normalize() calls. If the datetime is timezone-aware, then the “wall time” arithmetic should work even across daylight saving time clock changes. More on that in the Gotcha section below!
- Use zoneinfo.ZoneInfo instead of pytz.timezone to get a tzinfo object.
See the pytz-deprecation-shim package, which can help with the transition if you cannot make all the changes right away. It also contains all the necessary information for the migration itself.
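To make the typical changes listed above more concrete, here is a rough before/after sketch. The pytz calls appear only in comments so the snippet runs without pytz; the variable names are ours, and mapping is_dst=False to fold=1 assumes the usual case of an ambiguous time where the later occurrence is the standard-time one:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Before: PARIS = pytz.timezone("Europe/Paris")
PARIS = ZoneInfo("Europe/Paris")

# Before: UTC = pytz.UTC
UTC = timezone.utc

# Before: local = PARIS.localize(datetime(2022, 10, 30, 1, 30))
local = datetime(2022, 10, 30, 1, 30, tzinfo=PARIS)

# Before: later = PARIS.localize(naive, is_dst=False)
# The fold attribute now selects the second occurrence of 02:30.
naive = datetime(2022, 10, 30, 2, 30)
later = naive.replace(tzinfo=PARIS, fold=1)

print(local)  # 2022-10-30 01:30:00+02:00
print(later)  # 2022-10-30 02:30:00+01:00
```

Calls to .normalize() are deliberately left out of this sketch: removing them switches the arithmetic from “absolute time” to “wall time”, so each one deserves a look of its own.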
Django specifics
In Django 4.0, zoneinfo became the default time zone implementation. This major release is an opportunity to upgrade how your codebase handles time zones before bumping the Django version.
For Django specifically, some typical changes are:
- Ensure that the USE_DEPRECATED_PYTZ setting is not overridden to True.
- Stop specifying the is_dst parameter of the django.utils.timezone.make_aware function. It has no effect anymore.
- Use datetime.timezone.utc instead of django.utils.timezone.utc.
With all these changes, you should be covered for most cases and be able to leave pytz in the past!
Of course, we cannot guarantee that you don’t have a tricky edge case. But the community is here to help you!
When using timezone-aware datetime objects, the semantics of arithmetic operations can be a bit misleading.
It boils down to the concepts of “wall time” and “absolute time” arithmetic, as, again, Paul Ganssle puts it.
The differences show well when trying to add “one day” to a datetime:
>>> from datetime import datetime, timedelta
>>> import zoneinfo
>>> PARIS = zoneinfo.ZoneInfo("Europe/Paris")
>>> dt = datetime(2022, 10, 30, 2, 0, 0, tzinfo=PARIS)
>>> print(dt)
2022-10-30 02:00:00+02:00
>>> print(dt + timedelta(days=1))
2022-10-31 02:00:00+01:00
The same time, but one day later. This is a “wall time” operation. And in our team, we intuitively agree that this makes sense and is somehow what we expect, even if there was a daylight saving time change.
But, technically, 25 hours elapsed between the input and the result.
That’s where the “absolute time” comes into play. The “absolute time” represents exactly 24 hours elapsed between both datetime objects, following the strict definition of “1 day = 24 hours”. “Wall time”, on the other hand, relates here to the perceived elapsed time of “one day = the same time one day later”.
To do “absolute time” arithmetic operations, ensure that your datetime objects are using the UTC time zone.
>>> from datetime import timezone
>>> absolute_dt = dt.astimezone(timezone.utc) + timedelta(days=1)
>>> print(absolute_dt.astimezone(PARIS))
2022-10-31 01:00:00+01:00
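To keep both behaviours explicit in a codebase, the two kinds of addition can be wrapped in small helpers. This is only a sketch, and the names add_wall_time and add_absolute_time are hypothetical, not part of any library:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def add_wall_time(dt: datetime, delta: timedelta) -> datetime:
    """Same clock reading shifted by delta, whatever the offset change."""
    return dt + delta


def add_absolute_time(dt: datetime, delta: timedelta) -> datetime:
    """Exactly delta of elapsed time, computed in UTC, shown in dt's zone."""
    return (dt.astimezone(timezone.utc) + delta).astimezone(dt.tzinfo)


PARIS = ZoneInfo("Europe/Paris")
start = datetime(2022, 10, 30, 2, 0, tzinfo=PARIS)

wall = add_wall_time(start, timedelta(days=1))
absolute = add_absolute_time(start, timedelta(days=1))
print(wall)      # 2022-10-31 02:00:00+01:00
print(absolute)  # 2022-10-31 01:00:00+01:00
```

Naming the intent at the call site makes it much easier to review code that crosses a clock change.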
As a rule of thumb, you can remember:
- With UTC datetimes, “wall time” and “absolute time” are equivalent. Therefore the arithmetic can be used to do “absolute time” operations.
- When doing other time zone arithmetic operations, “wall time” is used.
- Avoid doing arithmetic operations between two datetime objects with different time zones: that’s where the sh*tshow begins.
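The rule of thumb above also applies to subtraction. When both datetime objects share the same tzinfo, Python ignores the offsets and returns a “wall time” difference; when the tzinfo objects differ, both sides are first converted to UTC. A short illustration with dates chosen by us around the same clock change:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

PARIS = ZoneInfo("Europe/Paris")
NEW_YORK = ZoneInfo("America/New_York")

before = datetime(2022, 10, 30, 0, 0, tzinfo=PARIS)  # +02:00
after = datetime(2022, 10, 31, 0, 0, tzinfo=PARIS)   # +01:00

# Same tzinfo on both sides: offsets are ignored ("wall time").
wall_diff = after - before

# Both sides in UTC: real elapsed time ("absolute time").
absolute_diff = after.astimezone(timezone.utc) - before.astimezone(timezone.utc)

# Different tzinfo objects: Python converts to UTC behind the scenes.
mixed_diff = after.astimezone(NEW_YORK) - before

print(wall_diff)      # 1 day, 0:00:00
print(absolute_diff)  # 1 day, 1:00:00
print(mixed_diff)     # 1 day, 1:00:00
```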
This gotcha shows that the general advice “use UTC everywhere” cannot be applied in all the cases. The “wall time” semantics are often more intuitive to users.
A common example of these semantics is batch jobs that run in a timezone-aware context and are user-facing. At Back Market, if our sellers expect to receive invoices every morning at 7 am, this time shouldn’t shift one hour back and forth according to daylight saving time.
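As an illustration of such a job, here is a minimal sketch that computes the next 7 am run in the seller's local time. The next_local_run function is a hypothetical example and not our actual scheduler; it assumes that now is timezone-aware:

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def next_local_run(now: datetime, tz: ZoneInfo, at: time = time(7, 0)) -> datetime:
    """Return the next occurrence of the wall time `at` in tz (hypothetical helper)."""
    local_now = now.astimezone(tz)
    candidate = datetime.combine(local_now.date(), at, tzinfo=tz)
    if candidate <= local_now:
        # Today's slot has already passed: schedule for tomorrow.
        candidate = datetime.combine(
            local_now.date() + timedelta(days=1), at, tzinfo=tz
        )
    return candidate


PARIS = ZoneInfo("Europe/Paris")
run_before = next_local_run(datetime(2022, 10, 29, 6, 0, tzinfo=PARIS), PARIS)
run_after = next_local_run(datetime(2022, 10, 30, 6, 0, tzinfo=PARIS), PARIS)
print(run_before)  # 2022-10-29 07:00:00+02:00
print(run_after)   # 2022-10-30 07:00:00+01:00
```

The local hour stays at 7 on both days, while the corresponding UTC time moves from 05:00 to 06:00: exactly the shift that a “UTC everywhere” schedule would have pushed onto the sellers.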
That’s all for this gotcha and this article! We hope you learned something!
Thank you very much for reading, you are now all set to conquer the world in a timely manner!
If you want to join our Bureau of Technology or any other Back Market department, take a look here, we’re hiring! 🦄