Part of my Python FAQ, which is doomed to never be finished.
Maybe you have a Python 2 codebase. Maybe you’d like to make it work with Python 3. Maybe you really wish someone would write a comically long article on how to make that happen.
I have good news! You’re already reading one.
(And if you’re not sure why you’d want to use Python 3 in the first place, perhaps you’d be interested in the companion article which delves into exactly that question?)
Don't be intimidated
This article is quite long, but don’t take that as a sign that this is necessarily a Herculean task. I’m trying to cover every issue I can ever recall running across, which means a lot of small gotchas.
I’ve ported several codebases from Python 2 to Python 2+3, and most of them have gone pretty smoothly. If you have modern Python 2 code that handles Unicode responsibly, you’re already halfway there.
However… if you still haven’t ported by now, almost eight years after Python 3.0 was first released, chances are you have either a lumbering giant of an app or ancient and weird 2.2-era code. Or, perish the thought, a lumbering giant consisting largely of weird 2.2-era code. In that case, you’ll want to clean up the more obvious issues one at a time, then go back and start worrying about actually running parts of your code on Python 3.
On the other hand, if your Python 2 code is pretty small and you’ve just never gotten around to porting, good news! It’s not that bad, and much of the work can be done automatically. Python 3 is ultimately the same language as Python 2, just with some sharp bits filed off.
Making some tough decisions
We say “porting from 2 to 3”, but what we usually mean is “porting code from 2 to both 2 and 3”. That ends up being more difficult (and ugly), since rather than writing either 2 or 3, you have to write the common subset of 2 and 3. As nifty as some of the features in 3 are, you can’t actually use any of them if you have to remain compatible with Python 2.
The first thing you need to do, then, is decide exactly which versions of Python you’re targeting. For 2, your options are:
-
Python 2.5+ is possible, but very difficult, and this post doesn’t really discuss it. Even something as simple as exception handling becomes painful, because the only syntax that works in Python 3 was first introduced in Python 2.6. I wouldn’t recommend doing this.
-
Python 2.6+ used to be fairly common, and is well-tread ground. However, Python 2.6 reached end-of-life in 2013, and some common libraries have been dropping support for it. If you want to preserve Python 2.6 compatibility for the sake of making a library more widely-available, well, I’d urge you to reconsider. If you want to preserve Python 2.6 compatibility because you’re running a proprietary app on it, you should stop reading this right now and go upgrade to 2.7 already.
-
Python 2.7 is the last release of the Python 2 series, but is guaranteed to be supported until at least 2020. The major focus of the release was backporting a lot of minor Python 3 features, making it the best possible target for code that’s meant to run on both 2 and 3.
-
There is, of course, also the choice of dropping Python 2 support, in which case this process will be much easier. Python 2 is still very widely-used, though, so library authors probably won’t want to do this. App authors do have the option, but unless your app is trivial, it’s much easier to maintain Python 2 support during the port — that way you can port iteratively, and the app will still function on Python 2 in the interim, rather than being a 2/3 hybrid that can’t run on either.
Most of this post assumes you’re targeting Python 2.7, though there are mentions of 2.6 as well.
You also have to decide which version of Python 3 to target.
-
Python 3.0 and 3.1 are forgettable. Python 3 was still stabilizing for its first couple minor versions, and from what I hear, compatibility with both 2.7 and 3.0 is a huge pain. Both versions are also past end-of-life.
-
Python 3.2 and 3.3 are a common minimum version to target. Python 3.3 reinstated support for
u'...'
literals (redundant in Python 3, where normal strings are already Unicode), which makes supporting both 2 and 3 much easier. I bundle it with Python 3.2 because the latest version that stable PyPy supports is 3.2, but it also supportsu'...'
literals. You’ll support the biggest surface area by targeting that, a sort of 3.2½. (There’s an alpha PyPy supporting 3.3, but as of this writing it’s not released as stable yet.) -
Python 3.4 and 3.5 add shiny new features, but you can only really use them if you’re dropping support for Python 2. Again, I’d suggest targeting Python 2.7 + Python 3.2½ first, then dropping the Python 2 support and adding whatever later Python 3 trinkets you want.
Another consideration is what attitude you want your final code to take. Do you want Python 2 code with enough band-aids that it also works on Python 3, or Python 3 code that’s carefully written so it still works on Python 2? The differences are subtle! Consider code like x = map(a, b)
. map
returns a list in Python 2, but a lazy iterable in Python 3. Which way do you want to port this code?
1# Python 2 style: force eager evaluation, even on Python 3
2x = list(map(a, b))
3
4# Python 3 style: use lazy evaluation, even on Python 2
5try:
6 from future_builtins import map
7except ImportError:
8 pass
9x = map(a, b)
The answer may depend on which Python you primarily use for development, your target audience, or even case-by-case based on how x
is used.
Personally, I’d err on the side of preserving Python 3 semantics and porting them to Python 2 when possible. I’m pretty used to Python 3, though, and you or your team might be thrown for a loop by changing Python 2’s behavior.
At the very least, prefer if PY2
to if not PY3
. The former stresses that Python 2 is the special case, which is increasingly true going forward. Eventually there’ll be a Python 4, and perhaps even a Python 5, and those future versions will want the “Python 3” behavior.
Some helpful tools
The good news is that you don’t have to do all of this manually.
2to3
is a standard library module (since 2.6) that automatically modifies Python 2 source code to change some common Python 2 constructs to the Python 3 equivalent. (It also doubles as a framework for making arbitrary changes to Python code.)
Unfortunately, it ports 2 to 3, not 2 to 2+3. For libraries, it’s possible to rig 2to3
to run automatically on your code just before it’s installed on Python 3, so you can keep writing Python 2 code — but 2to3
isn’t perfect, and this makes it impossible to develop with your library on Python 3, so Python 3 ends up as a second-class citizen. I wouldn’t recommend it.
The more common approach is to use something like six
, a library that wraps many of the runtime differences between 2 and 3, so you can run the same codebase on both 2 and 3.
Of course, that still leaves you making the changes yourself. A more recent innovation is the python-future project, which combines both of the above. It has a future
library of renames and backports of Python 3 functionality that goes further than six
and is designed to let you write Python 3-esque code that still runs on Python 2. It also includes a futurize
script, based on the 2to3
plumbing, that rewrites your code to target 2+3 (using python-future’s library) rather than just 3.
The nice thing about python-future is that it explicitly takes the stance of writing code against Python 3 semantics and backporting them to Python 2. It’s very dedicated to this: it has a future.builtins
module that includes not only easy cases like map
, but also entire pure-Python reimplementations of types like bytes
. (Naturally, this adds some significant overhead as well.) I do like the overall attitude, but I’m not totally sold on all the changes, and you might want to leaf through them to see which ones you like.
futurize
isn’t perfect, but it’s probably the best starting point. The 2to3
design splits the various edits into a variety of “fixers” that each make a single style of change, and futurize
works the same way, inheriting many of the fixers from 2to3
. The nice thing about futurize
is that it groups the fixers into “stages”, where stage 1 (futurize --stage1
) only makes fairly straightforward changes, like fixing the except
syntax. More importantly, it doesn’t add any dependencies on the future
library, so it’s useful for making the easy changes even if you’d prefer to use six
. You’re also free to choose individual fixes to apply, if you discover that some particular change breaks your code.
Another advantage of this approach is that you can tackle the porting piecemeal, which is great for very large projects. Run one fixer at a time, starting with the very simple ones like updating to except ... as ...
syntax, and convince yourself that everything is fine before you do the next one. You can make some serious strides towards 3 compatibility just by eliminating behavior that already has cromulent alternatives in Python 2.
If you expect your Python 3 port to take a very long time — say, if you have a large project with numerous developers and a frantic release schedule — then you might want to prevent older syntax from creeping in with a tool like autopep8, which can automatically fix some deprecated features with a much lighter touch. If you’d like to automatically enforce that, say, from __future__ import absolute_import
is at the top of every Python file, that’s a bit beyond the scope of this article, but I’ve had pre-commit + reorder_python_imports
thrust upon me in the past to fairly good effect.
Anyway! For each of the issues below, I’ll mention whether futurize
can fix it, the name of the responsible fixer, and whether six
has anything relevant. If the name of the fixer begins with lib2to3
, that means it’s part of the standard library, and you can use it with 2to3
without installing python-future.
Here we go!
Things you shouldn't even be doing
These are ancient, ancient practices, and even Python 2 programmers may be surprised by them. Some of them are arguably outright bugs in the language; others are just old and forgotten. They generally have equivalents that work even in older versions of Python 2.
Old-style classes
1class Foo:
2 ...
In Python 3, this code creates a class that inherits from object
. In Python 2, it creates a completely different kind of thing entirely: an “old-style” class, which worked a little differently from built-in types. The differences are generally subtle:
-
Old-style classes don’t support
__getattribute__
,__slots__
-
Old-style classes don’t correctly support data descriptors, i.e. the assignment behavior of
@property
. -
Old-style classes had a
__coerce__
method, which would attempt to turn a value into a built-in numeric type before performing a math operation. -
Old-style classes didn’t use the C3 MRO, so in the case of diamond inheritance, a class could be skipped entirely by
super()
. -
Old-style instances check the instance for a special method name; new-style instances check the type. Additionally, if a special method isn’t found on an old-style instance, the lookup falls back to
__getattr__
; this is not the case for new-style classes (which makes proxying more complicated).
That last one is the only thing old-style classes can do that new-style classes cannot, and if you’re relying on it, you have a bit of refactoring to do. (The really curious thing is that there doesn’t seem to be a particularly good reason for the limitation on new-style classes, and it doesn’t even make things faster. Maybe that’ll be fixed in Python 4?)
If you have no idea what any of that means or why you should care, chances are you’re either not using old-style classes at all, or you’re only using them because you forgot to write (object)
somewhere. In that case, futurize --stage2
will happily change class Foo:
to class Foo(object):
for you, using the libpasteurize.fixes.fix_newstyle
fixer. (Strictly speaking, this is a Python 2 compatibility issue, since the old syntax still works fine in Python 3 — it just means something else now.)
cmp
Python 2 originally used the C approach for sorting. Given two things A
and B
, a comparison would produce a negative number if A < B
, zero if A == B
, and a positive number if A > B
. This was the only way to customize sorting; there’s a cmp()
built-in function, a __cmp__
special method, and cmp
arguments to list.sort()
and sorted()
.
This is a little cumbersome, as you may have noticed if you’ve ever tried to do custom sorting in Perl or JavaScript. Even a case-insensitive sort involves repeating yourself. Most custom sorts will have the same basic structure of cmp(op(a), op(b))
, when the only thing you really care about is op
.
1names.sort(cmp=lambda a, b: cmp(a.lower(), b.lower()))
But more importantly, the C approach is flat-out wrong for some types. Consider sets, which use comparison to indicate subsets versus supersets:
1{1, 2} < {1, 2, 3} # True
2{1, 2, 3} > {1, 2} # True
3{1, 2} < {1, 2} # False
4{1, 2} <= {1, 2} # True
So what to do with {1, 2} < {3, 4}
, where none of the three possible answers is correct?
Early versions of Python 2 added “rich comparisons”, which introduced methods for all six possible comparisons: __eq__
, __ne__
, __lt__
, __le__
, __gt__
, and __ge__
. You’re free to return False
for all six, or even True
for all six, or return NotImplemented
to allow deferring to the other operand. The cmp
argument became key
instead, which allows mapping the original values to a different item to use for comparison:
1names.sort(key=lambda a: a.lower())
(This is faster, too, since there are fewer calls to the lambda, fewer calls to .lower()
, and no calls to cmp
.)
So, fixing all this. Luckily, Python 2 supports all of the new stuff, so you don’t need compatibility hacks.
To replace simple implementations of __cmp__
, you need only write the appropriate rich comparison methods. You could even do this the obvious way:
1class Foo(object):
2 def __cmp__(self, other):
3 return cmp(self.prop, other.prop)
4
5 def __eq__(self, other):
6 return self.__cmp__(other) == 0
7
8 def __ne__(self, other):
9 return self.__cmp__(other) != 0
10
11 def __lt__(self, other):
12 return self.__cmp__(other) < 0
13
14 ...
You would also have to change the use of cmp
to a manual if
tree, since cmp
is gone in Python 3. I don’t recommend this.
A lazier alternative would be to use functools.total_ordering
(backported from 3.0 into 2.7), which generates four of the comparison methods, given a class that implements __eq__
and one other:
1@functools.total_ordering
2class Foo(object):
3 def __eq__(self, other):
4 return self.prop == other.prop
5
6 def __lt__(self, other):
7 return self.prop < other.prop
There are a couple problems with this code. For one, it’s still pretty repetitive, accessing .prop
four times (and imagine if you wanted to compare several properties). For another, it’ll either cause an error or do entirely the wrong thing if you happen to compare with an object of a different type. You should return NotImplemented
in this case, but total_ordering
doesn’t handle that correctly until Python 3.4. If those bother you, you might enjoy my own classtools.keyed_ordering
, which uses a __key__
method (much like the key
argument) to generate all six methods:
1@classtools.keyed_ordering
2class Foo(object):
3 def __key__(self):
4 return self.prop
Replacing uses of key
arguments should be straightforward: a cmp
argument of cmp(op(a), op(b))
becomes a key
argument of op
. If you’re doing something more elaborate, there’s a functools.cmp_to_key
function (also backported from 3.0 to 2.7), which converts a cmp
function to one usable as a key
. (The implementation is much like the first Foo
example above: it involves a class that calls the wrapped function from its comparison methods, and returns True
or False
depending on the return value.)
Finally, if you’re using cmp
directly, don’t do that. If you really, really need it for something other than Python’s own sorting, just use an if
.
The only help futurize
offers is in futurize --stage2
, via libfuturize.fixes.fix_cmp
, which adds an import of past.builtins.cmp
if it detects you’re using the cmp
function anywhere.
Comparing incompatible types
Python 2’s use of C-style ordering also means that any two objects, of any types, must be either equal or occur in some defined order. Python’s answer to this problem is to sort on the names of the types. So None < 3 < "1"
, because "NoneType" < "int" < "str"
.
Python 3 removes this fallback rule; if two values don’t know how to compare against each other (i.e. both return NotImplemented
), you just get a TypeError
.
This might affect you in subtle ways, such as if you’re sorting a list of objects that may contain None
s and expecting it to silently work. The fix depends entirely on the type of data you have, and no automated tool can handle that for you. Most likely, you didn’t mean to be sorting a heterogenous list in the first place.
Of course, you could always sort on type(x).__name__
, but I don’t know why you would do that.
The sets
module
Python 2.3 introduced its set types as Set
and ImmutableSet
in the sets
module. Since Python 2.4, they’ve been built-in types, set
and frozenset
. The sets
module is gone in Python 3, so just use the built-in names.
Creating exceptions
Python 2 allows you to do this:
1raise RuntimeError, "an error happened at runtime!!"
There’s not really any good reason to do this, since you can just as well do:
1raise RuntimeError("an error happened at runtime!!")
futurize --stage1
will rewrite the two-arg form to a regular object creation via the libfuturize.fixes.fix_raise
fixer. It’ll also fix this alternative way of specifying an exception type, which is so bizarre and obscure that I did not know about it until I read the fixer’s source code:
1raise (((A, B), C), ...) # equivalent to `raise A` (?!)
Additionally, exceptions act like sequences in Python 2, but not in Python 3. You can just operate on the .args
sequence directly, in either version. Alas, there’s no automated way to fix this.
Backticks
Did you know that `x`
is equivalent to repr(x)
in Python 2? Yeah, most people don’t. It’s super weird. futurize --stage1
will fix this with the lib2to3.fixes.fix_repr
fixer.
has_key
Very old code may still be using somedict.has_key("foo")
. "foo" in somedict
has worked since Python 2.2. What are you doing. futurize --stage1
will fix this with the lib2to3.fixes.fix_has_key
fixer.
<>
<>
is equivalent to !=
in Python 2! This is an ancient, ancient holdover, and there’s no reason to still be using it. futurize --stage1
will fix this with the lib2to3.fixes.fix_ne
fixer.
(You could also use from __future__ import barry_as_FLUFL
, which restores <>
in Python 3. It’s an easter egg. I’m joking. Please don’t actually do this.)
Things with easy Python 2 equivalents
These aren’t necessarily ancient, but they have an alternative you can just as well express in Python 2, so there’s no need to juggle 2 and 3.
Other ancient builtins
apply()
is gone. Use the built-in syntax, f(*args, **kwargs)
.
callable()
was briefly gone, but then came back in Python 3.2.
coerce()
is gone; it was only used for old-style classes.
execfile()
is gone. Read the file and pass its contents to exec()
instead.
file()
is gone; Python 3 has multiple file types, and a hierarchy of interfaces defined in the io
module. Occasionally, code uses this as a synonym for open()
, but you should really be using open()
anyway.
intern()
has been moved into the sys
module, though I have no earthly idea why you’d be using it.
raw_input()
has been renamed to input()
, and the old ludicrous input()
is gone. If you really need input()
, please stop.
reduce()
has been moved into the functools
module, but it’s there in Python 2.6 as well.
reload()
has been moved into the imp
module. It’s unreliable garbage and you shouldn’t be using it anyway.
futurize --stage1
can fix several of these:
apply
, vialib2to3.fixes.fix_apply
intern
, vialib2to3.fixes.fix_intern
reduce
, vialib2to3.fixes.fix_reduce
futurize --stage2
can also fix execfile
via the libfuturize.fixes.fix_execfile
fixer, which imports past.builtins.execfile
. The 2to3 fixer uses an open()
call, but the true correct fix is to use a with
block.
futurize --stage2
has a couple of fixers for raw_input
, but you can just as well import future.builtins.input
or six.moves.input
.
Nothing can fix coerce
, which has no equivalent. Curiously, I don’t see a fixer for file
, which is trivially fixed by replacing it with open
. Nothing for reload
, either.
Catching exceptions
Historically, the way to say “if there’s a ValueError
, store it in e
and run some code” was:
1try:
2 ...
3except ValueError, e:
4 ...
Unfortunately, that’s very easy to confuse with the syntax for catching two different types of exception:
1except (ValueError, TypeError):
2 ...
If you forget the parentheses, you’ll only catch ValueError
, and the exception will be assigned to a variable called, er, TypeError
. Whoops!
Python 3.0 introduced clearer syntax, which was also backported to Python 2.6:
1except ValueError as e:
2 ...
Python 3.0 finally removed the old syntax, so you must use the as
form. futurize --stage1
will fix this with the lib2to3.fixes.fix_except
fixer.
As an additional wrinkle, the extra variable e
is deleted at the end of the block in Python 3, but not in Python 2. If you really need to refer to it after the block, just assign it to a different name.
(The reason for this is that captured exceptions contain a traceback in Python 3, and tracebacks contain the locals for the current frame, and those locals will contain the captured exception. The resulting cycle would keep all local variables alive until the cycle detector dealt with it, at least in CPython. Scrapping the exception as soon as it’s been dealt with was a simple way to keep this from accidentally happening all over the place. It usually doesn’t make sense to refer to a captured exception after the except
block, anyway, since the variable may or may not even exist, and that’s generally weird and bad in Python.)
Octals
It’s not uncommon for a new programmer to try to zero-pad a set of numbers:
1a = 07
2b = 08
3c = 09
4d = 10
Of course, this will have the rather bizarre result that 08
is a SyntaxError
, even though 07
works fine — because numbers starting with a 0
are parsed as octal.
This is a holdover from C, and it’s fairly surprising, since there’s virtually no reason to ever use octal. The only time I can ever remember using it is for passing file modes to chmod
.
Python 3.0 requires octal literals to be prefixed with 0o
, in line with 0x
for hex and 0b
for binary; literal integers starting with only a 0
are a syntax error. Python 2.6 supports both forms.
futurize --stage1
will fix this with the lib2to3.fixes.fix_numliterals
fixer.
pickle
If you’re using the pickle
module (which you shouldn’t be), and you intend to pass pickles back and forth between Python 2 and Python 3, there’s a small issue to be aware of. pickle
has several different “protocol” versions, and the default version used in Python 3 is protocol 3, which Python 2 cannot read.
The fix is simple: just find where you’re calling pickle.dump()
or pickle.dumps()
, and pass a protocol
argument of 2. Protocol 2 is the highest version supported by Python 2, and you probably want to be using it anyway, since it’s much more compact and faster to read/write than Python 2’s default, protocol 0.
You may be already using HIGHEST_PROTOCOL
, but you’ll have the same problem: the highest protocol supported in any version of Python 3 is unreadable by Python 2.
A somewhat bigger problem is that if you pickle an instance of a user-defined class on Python 2, the pickle will record all its attributes as bytestrings, because that’s what they are in Python 2. Python 3 will then dutifully load the pickle and populate your object’s __dict__
with keys like b'foo'
. obj.foo
will then not actually exist, because obj.foo
looks for the string 'foo'
, and 'foo' != b'foo'
in Python 3.
Don’t use pickle, kids.
It’s possible to fix this, but also a huge pain in the ass. If you don’t know how, you definitely shouldn’t be using pickle.
Things that have a __future__
import
Occasionally, the syntax changed in an incompatible way, but the new syntax was still backported and hidden behind a __future__
import — Python’s mechanism for opting into syntax changes. You have to put such an import at the top of the file, optionally after a docstring, like this:
1"""My super important module."""
2from __future__ import with_statement
print
is now a function
Ugh! Parentheses! Why, Guido, why?
The reason is that the print
statement has incredibly goofy syntax, unlike anything else in the language:
1print >>a, b, c,
You might not even recognize the >>
bit, but it lets you print to a file other than sys.stdout
. It’s baked specifically into the print
syntax. Python 3 replaces this with a straightforward built-in function with a couple extra bells and whistles. The above would be written:
1print(b, c, end='', file=a)
It’s slightly more verbose, but it’s also easier to tell what’s going on, and that teeny little comma at the end is now a more obvious keyword argument.
from __future__ import print_function
will forget about the print
statement for the rest of the file, and make the builtin print
function available instead. futurize --stage1
will fix all uses of print
and add the __future__
import, with the libfuturize.fixes.fix_print_with_import
fixer. (There’s also a 2to3
fixer, but it doesn’t add the __future__
import, since it’s unnecessary in Python 3.)
A word of warning: do not just use print
with parentheses without adding the __future__
import. This may appear to work in stock Python 2:
1print("See, what's the problem? This works fine!")
However, that’s parsed as the print
statement followed by an expression in parentheses. It becomes more obvious if you try to print two values:
1print("The answer is:", 3)
2# ("The answer is:", 3)
Now you have a comma inside parentheses, which is a tuple, so the old print
statement prints its repr
.
Division always produces a float
Quick, what’s the answer here?
15 / 2
If you’re a normal human being, you’ll say 2.5 or 2½. Unfortunately, if you’re like Python and have been afflicted by C, you might say the answer is 2, because this is “integer division” — a bizarre and alien concept probably invented because CPUs didn’t have FPUs when C was first invented.
Python 3.0 decided that maybe contorting fundamental arithmetic to match the inadequacies of 1970s hardware is not the best idea, and so it changed division to always produce a float.
Since Python 2.6, from __future__ import division
will alter the division operator to always do true division. If you want to do floor division, there’s a separate //
operator, which has existed for ages; you can use it in Python 2 with or without the __future__
import.
Note that true division always produces a float, even if the result is integral: 6 / 3
is 2.0. On the other hand, floor division uses the same typing rules as C-style division: 5 // 2
is 2, but 5 // 2.0
is 2.0.
futurize --stage2
will “fix” this with the libfuturize.fixes.fix_division
fixer, but unfortunately that just adds the __future__
import. With the --conservative
option, it uses the libfuturize.fixes.fix_division_safe
fixer instead, which imports past.utils.old_div
, a forward-port of Python 2’s division operator.
The trouble here is that the new /
always produces a float, and the new //
always floors, but the old /
sometimes did one and sometimes did the other. futurize
can’t just replace all uses of /
with //
, because 5/2.0
is 2.5 but 5//2.0
is 2.0, and it can’t generally know what types the operands are.
You might be best off fixing this one manually — perhaps using fix_division_safe
to find all the places you do division, then changing them to use the right operator.
Of course, the __div__
magic method is gone in Python 3, replaced explicitly by __floordiv__
(//
) and __truediv__
(/
). Both of those methods already exist in Python 2, and __truediv__
is even called when you use /
in the presence of the future
import, so being compatible is a simple matter of implementing all three and deferring to one of the others from __div__
.
Relative imports
In Python 2, if you’re in the module foo.bar
and say import quux
, Python will look for a foo.quux
before it looks for a top-level quux
. The former behavior is called a relative import, though it might be more clearly called a sibling import. It’s troublesome for several reasons.
-
If you have a sibling called
quux
, and there’s also a top-level or standard library module calledquux
, you can’t import the latter. (There used to be apy.std
module for providing indirect access to the standard library, for this very reason!) -
If you import the top-level
quux
module, and then later add afoo.quux
module, you’ll suddenly be importing a different module. -
When reading the source code, it’s not clear which imports are siblings and which are top-level. In fact, the modules you get depend on the module you’re in, so moving or renaming a file may change its imports in non-obvious ways.
Python 3 eliminates this behavior: import quux
always means the top-level module. It also adds syntax for “explicit relative” or “absolute relative” (yikes) imports: from . import quux
or from .quux import somefunc
explicitly means to look for a sibling named quux
. (You can also use ..quux
to look in the parent package, three dots to look in the grandparent, etc.)
The explicit syntax is supported since Python 2.5. The old sibling behavior can be disabled since Python 2.5 with from __future__ import absolute_import
.
futurize --stage1
has a libfuturize.fixes.fix_absolute_import
fixer, which attempts to detect sibling imports and convert them to explicit relative imports. If it finds any sibling imports, it’ll also add the __future__
line, though honestly you should make an effort to to put that line in all of your Python 2 code.
It’s possible for the futurize
fixer to guess wrong about a sibling import, but in general it works pretty well.
(There is one case I’ve run across where simply replacing import sibling
with from . import sibling
didn’t work. Unfortunately, it was Yelp code that I no longer have access to, and I can’t remember the precise details. It involved having several sibling imports inside a __init__.py
, where the siblings also imported from each other in complex ways. The sibling imports worked, but the explicit relative imports failed, for some really obscure timing reason. It’s even possible this was a 2.6 bug that’s been fixed in 2.7. If you see it, please let me know!)
Things that require some effort
These problems are a little more obscure, but many of them are also more difficult to fix automatically. If you have a massive codebase, these are where the problems start to appear.
The grand module shuffle
A whole bunch of modules were deleted, merged, or removed. A full list is in PEP 3108, but you’ll never have heard of most of them. Here are the ones that might affect you.
-
__builtin__
has been renamed tobuiltins
. Note that this is a module, not the__builtins__
attribute of modules, which is exactly why it was renamed. Incidentally, you should be using thebuiltins
module rather than__builtins__
anyway. Or, wait, no, just don’t use either, please don’t mess with the built-in scope. -
ConfigParser
has been renamed toconfigparser
. -
Queue
has been renamed toqueue
. -
SocketServer
has been renamed tosocketserver
. -
cStringIO
andStringIO
are gone; instead, useStringIO
orBytesIO
from theio
module. Note that these also exist in Python 2, but are pure-Python rather than the C versions in current Python 3. -
cPickle
is gone. Importingpickle
in Python 3 now gives you the C implementation automatically. -
cProfile
is gone. Importingprofile
in Python 3 gives you the C implementation automatically. -
copy_reg
has been renamed tocopyreg
. -
anydbm
,dbhash
,dbm
,dumbdm
,gdbm
, andwhichdb
have all been merged into adbm
package. -
dummy_thread
has become_dummy_thread
. It’s an implementation of the_thread
module that doesn’t actually do any threading. You should be usingdummy_threading
instead, I guess? -
httplib
has becomehttp.client
.BaseHTTPServer
,CGIHTTPServer
, andSimpleHTTPServer
have been merged into a singlehttp.server
module.Cookie
has becomehttp.cookies
.cookielib
has becomehttp.cookiejar
. -
repr
has been renamed toreprlib
. (The module, not the built-in function.) -
thread
has been renamed to_thread
, and you should really be using thethreading
module instead. -
A whole mess of top-level Tk modules have been combined into a
tkinter
package. -
The contents of
urllib
,urllib2
, andurlparse
have been consolidated and then split intourllib.error
,urllib.parse
, andurllib.request
. -
xmlrpclib
has becomexmlrpc.client
.DocXMLRPCServer
andSimpleXMLRPCServer
have been merged intoxmlrpc.server
.
futurize --stage2
will fix this with the somewhat invasive libfuturize.fixes.fix_future_standard_library
fixer, which uses a mechanism from future
that adds aliases to Python 2 to make all the Python 3 standard library names work. It’s an interesting idea, but it didn’t actually work for all cases when I tried it (though now I can’t recall what was broken), so YMMV.
Alternative, you could manually replace any affected imports with imports from six.moves
, which provides aliases that work on either version.
Or as a last resort, you can just sprinkle try ... except ImportError
around.
Built-in iterators are now lazy
filter
, map
, range
, and zip
are all lazy in Python 3. You can still iterate over their return values (once), but if you have code that expects to be able to index them or traverse them more than once, it’ll break in Python 3. (Well, not range
, that’s fine.) The lazy equivalents — xrange
and the functions in itertools
— are of course gone in Python 3.
In either case, the easiest thing to do is force eager evaluation by wrapping the call in list()
or tuple()
, which you’ll occasionally need to do in Python 3 regardless.
For the sake of consistency, you may want to import the lazy versions from the standard library future_builtins
module. It only exists in Python 2, so be sure to wrap the import in a try
.
futurize --stage2
tries to address this with several of lib2to3
‘s fixers, but the results aren’t particularly pleasing: calls to all four are unconditionally wrapped in list()
, even in an obviously safe case like a for
block. I’d just look through your uses of them manually.
A more subtle point: if you pass a string or tuple to Python 2’s filter
, the return value will be the same type. Blindly wrapping the call in list()
will of course change the behavior. Filtering a string is not a particularly common thing to do, but I’ve seen someone complain about it before, so take note.
Also, Python 3’s map
stops at the shortest input sequence, whereas Python 2 extends shorter sequences with None
s. You can fix this with itertools.zip_longest
(which in Python 2 is izip_longest
!), but honestly, I’ve never even seen anyone pass multiple sequences to map
.
Relatedly, dict.iteritems
(plus its friends, iterkeys
and itervalues
) is gone in Python 3, as the plain items
(plus keys
and values
) is already lazy. The dict.view*
methods are also gone, as they were only backports of Python 3’s normal behavior.
Both six
and future.utils
contain functions called iteritems
, etc., which provide a lazy iterator in both Python 2 and 3. They also offer view*
functions, which are closer to the Python 3 behavior, though I can’t say I’ve ever seen anyone actually use dict.viewitems
in real code.
Of course, if you explicitly want a list of dictionary keys (or items or values), list(d)
and list(d.items())
do the same thing in both versions.
buffer
is gone
The buffer
type has been replaced by memoryview
(also in Python 2.7), which is similar but not identical. If you’ve even heard of either of these types, you probably know more about the subtleties involved than I do. There’s a lib2to3.fixes.fix_buffer
fixer that blindly replaces buffer
with memoryview
, but futurize
doesn’t use it in either stage.
Several special methods were renamed
Where Python 2 has __str__
and __unicode__
, Python 3 has __bytes__
and __str__
. The trick is that __str__
should return the native str
type for each version: a bytestring for Python 2, but a Unicode string for Python 3. Also, you almost certainly don’t want a __bytes__
method in Python 3, where bytes
is no longer used for text.
Both six and python-future have a python_2_unicode_compatible
class decorator that tries to do the right thing. You write only a single __str__
method that returns a Unicode string. In Python 3, that’s all you need, so the decorator does nothing; in Python 2, the decorator will rename your method to __unicode__
and add a __str__
that returns the same value encoded as UTF-8. If you need different behavior, you’ll have to roll it yourself with if PY2
.
Python 2’s next
method is more appropriately __next__
in Python 3. The easy way to address this is to call your method __next__
, then alias it with next = __next__
. Be sure you never call it directly as a method, only with the built-in next()
function.
Alternatively, future.builtins
contains an alternative next
which always calls __next__
, but on Python 2, it falls back to trying next
if __next__
doesn’t exist.
futurize --stage1
changes all use of obj.next()
to next(obj)
via the libfuturize.fixes.fix_next_call
fixer. futurize --stage2
renames next
methods to __next__
via the lib2to3.fixes.fix_next
fixer (which also fixes calls). Note that there’s a remote chance of false positives, if for some reason you happened to use next
as a regular method name.
Python 2’s __nonzero__
is Python 3’s __bool__
. Again, you can just alias it manually. Or futurize --stage2
will rename it with the lib2to3.fixes.fix_nonzero
fixer.
Renaming it will of course break it in Python 2, but futurize --stage2
also has a libfuturize.fixes.fix_object
fixer that imports python-future’s own builtins.object
. The replacement object
class has a few methods for making Python 3’s __str__
, __next__
, and __bool__
work on Python 2.
This is one of the mildly invasive things python-future does, and it may or may not sit well. Up to you.
__long__
is completely gone, as there is no long
type in Python 3.
__getslice__
, __setslice__
, and __delslice__
are gone. Instead, slice objects are passed to __getitem__
and friends. On the off chance you use these, you’ll have to do something clever in the item
methods to defer to your slice logic on Python 3.
__oct__
and __hex__
are gone; oct()
and hex()
now consult __index__
. I seriously doubt this will impact anyone.
__div__
is gone, as mentioned previously.
Unbound methods are gone; function attributes renamed
Say you have this useless class.
1class Foo(object):
2 def bar(self):
3 pass
In Python 2, Foo.bar
is an “unbound method”, a type that’s generally unseen and unexposed other than as types.MethodType
. In Python 3, Foo.bar
is just a regular function.
Offhand, I can only think of one time this would matter: if you want to get at attributes on the function, perhaps for the sake of a method decorator. In Python 2, you have to go through the unbound method’s .im_func
attribute to get the original function, but in Python 3, you already have the original function and can get the attributes directly.
If you’re doing this anywhere, an easy way to make it work in both versions is:
1method = Foo.bar
2method = getattr(method, 'im_func', method)
As for bound methods (the objects you get from accessing methods but not calling them, like [].append
), the im_self
and im_func
attributes have been renamed to __self__
and __func__
. Happily, these names also work in Python 2.6, so no compatibility hacks are necessary.
im_class
is completely gone in Python 3. Methods have no interest in which class they’re attached to. They can’t, since the same function could easily be attached to more than one class. If you’re relying on im_class
somehow, for some reason… well, don’t do that, maybe.
Relatedly, the func_*
function attributes have been renamed to dunder names in Python 3, since assigning function attributes is a fairly common practice and Python doesn’t like to clog namespaces with its own builtin names. func_closure
, func_code
, func_defaults
, func_dict
, func_doc
, func_globals
, and func_name
are now __closure__
, __code__
, etc. (Note that func_doc
and func_name
were already aliases for __doc__
and __name__
, and func_defaults
is much more easily inspected with the inspect
module.) The new names are not available in Python 2, so you’ll need to do a getattr
dance, or use the get_function_*
functions from six
.
Metaclass syntax has changed
In Python 2, a metaclass is declared by assigning to a special name in the class body:
1class Foo(object):
2 __metaclass__ = FooMeta
3 ...
Admittedly, this doesn’t make a lot of sense. The metaclass affects how a class is created, and the class body is evaluated as part of that creation, so this is sort of a goofy hack.
Python 3 changed this, opening the door to a few new neat tricks in the process, which you can find out about in the companion article.
1class Foo(object, metaclass=FooMeta):
2 ...
The catch is finding a way to express this idea in both Python 2 and Python 3 — the old syntax is ignored in Python 3, and the new syntax is a syntax error in Python 2.
It’s a bit of a pain, but the class
statement is really just a lot of sugar for calling the type()
constructor; after all, Python classes are just instances of type
. All you have to do is manually create an instance of your metaclass, rather than of type
.
Fortunately, other people have already made this work for you. futurize --stage2
will fix this using the libfuturize.fixes.fix_metaclass
fixer, which imports future.utils.with_metaclass
and produces the following:
1from future.utils import with_metaclass
2
3class Foo(with_metaclass(object)):
4 ...
This creates an intermediate dummy class with the right metaclass, which you then inherit from. Classes use the same metaclass as their parents, so this works fine in any Python.
If you don’t want to depend on python-future, the same function exists in the six
module.
Re-raising exceptions has different syntax
raise
with no arguments does the same thing in Python 2 and Python 3: it re-raises the exception currently being handled, preserving the original traceback.
The problem comes in with the three-argument form of raise
, which is for preserving the traceback while raising a different exception. It might look like this:
1try:
2 some_fragile_function()
3except Exception as e:
4 raise MyLibraryError, MyLibraryError("Failed to do a thing: " + str(e)), sys.exc_info()[2]
sys.exc_info()[2]
is, of course, the only way to get the current traceback in Python 2. You may have noticed that the three arguments to raise
are the same three things that sys.exc_info()
returns: the type, the value, and the traceback.
Python 3 introduces exception chaining. If something raises an exception from within an except
block, Python will remember the original exception, attach it to the new one, and show both exceptions when printing a traceback — including both exceptions’ types, messages, and where they happened. So to wrap and rethrow an exception, you don’t need to do anything special at all.
1try:
2 some_fragile_function()
3except Exception:
4 raise MyLibraryError("Failed to do a thing")
For more complicated handling, you can also explicitly say raise new_exception from old_exception
. Exceptions contain their associated tracebacks as a __traceback__
attribute in Python 3, so there’s no need to muck around getting the traceback manually. If you really want to give an explicit traceback, you can use the .with_traceback()
method, which just assigns to __traceback__
and then returns self
.
1raise MyLibraryError("Failed to do a thing").with_traceback(some_traceback)
It’s hard to say what it even means to write code that works “equivalently” in both versions, because Python 3 handles this problem largely automatically, and Python 2 code tends to have a variety of ad-hoc solutions. Note that you cannot simply do this:
1if PY3:
2 raise MyLibraryError("Beep boop") from exc
3else:
4 raise MyLibraryError, MyLibraryError("Beep boop"), sys.exc_info()[2]
The first raise
is a syntax error in Python 2, and the second is a syntax error in Python 3. if
won’t protect you from parse errors. (On the other hand, you can hide .with_traceback()
behind an if
, since that’s just a regular method call and will parse with no issues.)
six
has a reraise
function that will smooth out the differences for you (probably by using exec
). The drawback is that it’s of course Python 2-oriented syntax, and on Python 3 the final traceback will include more context than expected.
Alternatively, there’s a six.raise_from
, which is designed around the raise X from Y
syntax of Python 3. The drawback is that Python 2 has no obvious equivalent, so you just get raise X
, losing the old exception and its traceback.
There’s no clear right approach here; it depends on how you’re handling re-raising. Code that just blindly raises new exceptions doesn’t need any changes, and will get exception chaining for free on Python 3. Code that does more elaborate things, like implementing its own form of chaining or storing exc_info
tuples to be re-raised later, may need a little more care.
Bytestrings are sequences of integers
In Python 2, bytes
is a synonym for str
, the default string type. Iterating or indexing a bytes
/str
produces 1-character str
s.
1list(b'hello') # ['h', 'e', 'l', 'l', 'o']
2b'hello'[0:4] # 'hell'
3b'hello'[0] # 'h'
4b'hello'[0][0][0][0][0] # 'h' -- it's turtles all the way down
In Python 3, bytes
is a specialized type for handling binary data, not text. As such, iterating or indexing a bytes
produces integers.
1list(b'hello') # [104, 101, 108, 108, 111]
2b'hello'[0:4] # b'hell'
3b'hello'[0] # 104
4b'hello'[0][0][0][0] # TypeError, since you can't index 104
If you have explicitly binary data that want to be bytes
in Python 3, this may pose a bit of a problem. Aside from just checking the version explicitly and making heavy use of chr
/ord
, there are two approaches.
One is to use bytearray
instead. This is like bytes
, but mutable. More importantly, since it was introduced as a new type in Python 2.6 — after Python 3.0 came out — it has the same iterating and indexing behavior as Python 3’s bytes
, even in Python 2.
1bytearray(b'hello')[0] # 104, on either Python 2 or 3
The other is to slice rather than index, since slicing always produces a new iterable of the same type. If you want to extract a single character from a bytes
, just take a one-element slice.
1b'hello'[0] # 104
2b'hello'[0:1] # b'h'
Things that are just a royal pain in the ass
Unicode
Saving the best for last, almost!
Honestly, if your Python 2 code is already careful with Unicode — working with unicode
internally, and encoding/decoding only at the “boundaries” of your code — then you shouldn’t have too many problems. If your code is not so careful, you should really try to make it a little more careful before you worry about Python 3, since Python 3’s whole jam is to force you to be careful.
See, in Python 2, you can combine bytestrings (str
) and text strings (unicode
) more or less freely. Python will automatically try to convert between the two using the “default encoding”, which is generally ascii
. Python 3 makes text strings the default string type, demotes bytestrings, and forbids ever converting between them.
Most obviously, Python 2’s str
and unicode
have been renamed to bytes
and str
in Python 3. If you happen to be using the names anywhere, you’ll probably need to change them! six
offers text_type
and binary_type
, though you can just use bytes
to mean the same thing in either version. python-future also has backports for both Python 3’s bytes
and str
types, which seems like an extreme approach to me. Changing str
to mean a text type even in Python 2 might be a good idea, though.
b''
and u''
work the same way in either Python 2 or 3, but unadorned strings like ''
are always the str
type, which has different behavior. There is a from __future__ import unicode_literals
, which will cause unadorned strings to be unicode
in Python 2, and this might work for you. However, this prevents you from writing literal “native” strings — strings of the same type Python uses for names, keyword arguments, etc. Usually this won’t matter, since Python 2 will silently convert between bytes and text, but it’s caused me the occasional problem.
The right thing to do is just explicitly mark every single string with either a b
or u
sigil as necessary. That just, you know, sucks. But you should be doing it even if you’re not porting to Python 3.
basestring
is completely gone in Python 3. str
and bytes
have no common base type, and their semantics are different enough that it rarely makes sense to treat them the same way. If you’re using basestring
in Python 2, it’s probably to allow code to work on either form of “text”, and you’ll only want to use str
in Python 3 (where bytes
are completely unsuitable for text). six.string_types
provides exactly this. futurize --stage2
also runs the lib2to3.fixes.fix_basestring
fixer, but this replaces basestring
with str
, which will almost certainly break your code in Python. If you intend to use stage 2, definitely audit your uses of basestring
first.
As mentioned above, bytestrings are sequences of integers, which may affect code trying to work with explicitly binary data.
Python 2 has both .decode()
and .encode()
on both bytes and text; if you try to encode bytes or decode text, Python will try to implicitly convert to the right type first. In Python 3, only text has an .encode()
and only bytes have a .decode()
.
Relatedly, Python 2 allows you to do some cute tricks with “encodings” that aren’t really encodings; for example, "hi".encode('hex')
produces '6869'
. In Python 3, encoding must produce bytes, and decoding must produce text, so these sorts of text-to-text or bytes-to-bytes translations aren’t allowed. You can still do them explicitly with the codecs
module, e.g. codecs.encode(b'hi', 'hex')
, which also works in Python 2, despite being undocumented. (Note that Python 3 specifically requires bytes for the hex codec, alas. If it’s any consolation, there’s a bytes.hex()
method to do this directly, which you can’t use anyway if you’re targeting Python 2.)
Python 3’s open
decodes as UTF-8 by default (a vast oversimplification, but usually), so if you’re manually decoding after reading, you’ll get an error in Python 3. You could explicitly open the file in binary mode (preserving the Python 2 behavior), or you could use codecs.open
to decode transparently on read (preserving the Python 3 behavior). The same goes for writing.
sys.stdin
, sys.stdout
, and sys.stderr
are all text streams in Python 3, so they have the same caveats as above, with the additional wrinkle that you didn’t actually open them yourself. Their .buffer
attribute gives a handle opened in binary mode (Python 2 behavior), or you can adapt them to transcode transparently (Python 3 behavior):
1if six.PY2:
2 sys.stdin = codecs.getreader('utf-8')(sys.stdin)
3 sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
4 sys.stderr = codecs.getwriter('utf-8')(sys.stderr)
A text-mode file’s .tell()
in Python 3 still returns a number that can be passed back to .seek()
, but the number is not necessarily meaningful, and in particular can’t be used to estimate progress through a file. (Python uses a few very high bits as flags to indicate the state of the decoder; if you mask them off, what’s left is probably the byte position in the file as you’d expect, but this is pretty definitively a hack.)
Python 3 likes to treat filenames as text, but most of the functions in os
and os.path
will accept either text or bytes as their arguments (and return a value of the same type), so you should be okay there.
os.environ
has text keys and values in Python 3. If you direly need bytes, you can use os.environb
(and os.getenvb()
).
I think that covers most of the obvious basics. This is a whole sprawling topic that I can’t hope to cover off the top of my head. I’ve seen it be both fairly painful and completely pain-free, depending entirely on the state of the Python 2 codebase.
Oh, one final note: there’s a module for Python 2 called unicode-nazi (sorry, I didn’t name it) that will produce a warning anytime a bytestring is implicitly converted to a text string, or vice versa. It might help you root out places you’re accidentally slopping types back and forth, which will certainly break in Python 3. I’ve only tried it on a comically large project where it found thousands of violations, including plenty in surprising places in the standard library, so it may or may not be of any practical help.
Things that are not actually gone
String formatting with %
There’s a widespread belief that str % ...
is deprecated, since there’s a newer and shinier str.format()
method.
Well, it’s not. It’s not gone; it’s not deprecated; it still works just fine. I don’t like to use it, myself, since it’s easy to make accidentally ambiguous — "%s" % foo
can crash if foo
is a tuple! — but it’s not going anywhere. In fact, as of Python 3.5, bytes
and bytearray
support %
but not .format
.
optparse
argparse
is certainly better, but the optparse
module still exists in Python 3. It has been deprecated since Python 3.2, though.
Things that are preposterously obscure but that I have seen cause problems nonetheless
Tuple unpacking
A little-used feature of Python 2 is tuple unpacking in function arguments:
1def foo(a, (b, c)):
2 print a, b, c
3
4x = (2, 3)
5foo(1, x)
This syntax is gone in Python 3. I’ve rarely seen anyone use it, except in two cases. One was a parsing library that relied pretty critically on using it in every parsing function you wrote; whoops.
The other is when sorting a dict’s items:
1sorted(d.items(), key=lambda (k, v): k + v)
In Python 3, you have to write that as lambda kv: kv[0] + kv[1]
. Boo.
long
is gone
Python 3 merged its long
type with int
, so now there’s only one integral type, called int
.
Python 2 promotes int
to long
pretty much transparently, and long
s aren’t very common in the first place, so it’s fairly unlikely that this will make a difference. On the off chance you’re type-checking for integers with isinstance(x, (int, long))
(and really, why are you doing that), you can just use six.integer_types
instead.
Note that futurize --stage2
applies the lib2to3.fixes.fix_long
fixer, which blindly renames long
to int
, leaving you with inappropriate code like isinstance(x, (int, int))
.
However…
I have seen some very obscure cases where a hand-rolled binary protocol would encode int
s and long
s differently. My advice would be to not do that.
Oh, and a little-known feature of Python 2’s syntax is that you can have long
literals by suffixing them with an L
:
1123 # int
2123L # long
You can write 1267650600228229401496703205376
directly in Python 2 code, and it’ll automatically create a long
, so the only reason to do this is if you explicitly need a long
with a small value like 1. If that’s the case, something has gone catastrophically wrong.
repr
changes
These should really only affect you if you’re using repr
s as expected test output (or, god forbid, as cache keys or something). Some notable changes:
-
Unicode strings have a
u
prefix in Python 2. In Python 3, of course, Unicode strings are just strings, so there’s no prefix. -
Conversely, bytestrings have a
b
prefix in Python 3, but not in Python 2 (though theb
prefix is allowed in source code). -
Python 2 escapes all non-ASCII characters, even in the
repr
of a Unicode string. Python 3 only escapes control characters and codepoints considered non-printing. -
Large integers and explicit
long
s have anL
suffix in Python 2, but not in Python 3, where there is no separatelong
type. -
A
set
becomesset([1, 2, 3])
in Python 2, but{1, 2, 3}
in Python 3. The set literal syntax is allowed in source code in Python 2.7, but therepr
wasn’t changed until 3.0. -
float
s stringify to the shortest possible representation that has the same underlying value — e.g.,str(1.1)
is'1.1'
rather than'1.1000000000000001'
. This change was backported to Python 2.7 as well, but I have seen it break tests.
Hash randomization
Python has traditionally had a predictable hashing mechanism: repr(dict(a=1, b=2, c=3))
will always produce the same string. (On the same platform with the same Python version, at least.) Unfortunately this opens the door to an obscure DoS exploit that was known to Perl long ago: if you know a web application is written in Python, you can construct a query string that will become a dict whose keys all go in the same hash bucket. If your query string is long enough and you send enough requests, you can tie up all the Python processes in dealing with hash collisions.
The fix is hash randomization, which seeds the hashing algorithm in such a way that items are bucketed differently every time Python runs. It’s available in Python 2.7 via an environment variable or the -R
argument, but it wasn’t turned on by default until Python 3.3.
The fear was that it might break things. Naturally, it has broken things. Mostly, repr
s in tests. But it also changes the iteration order of dicts between Python runs. I have seen code using dicts whose keys happened to always be sorted in alphabetical or insertion order before, but with hash randomization, the keys were of course in a different order every time the code ran. The author assumed that Python had somehow broken dict sorting (which it has never had).
nonlocal
Python 3 introduces the nonlocal
keyword, which is like global
except it looks through all outer scopes in the expected order. It fixes this mild annoyance:
1def make_function():
2 counter = 0
3 def function():
4 nonlocal counter
5 counter += 1 # without 'nonlocal', this declares a new local!
6 print("I've been called", counter, "times!")
7 return function
The problem is that any use of assignment within a function automatically creates a new local, and locals are known statically for the entire body of the function. (They actually affect how functions are compiled, in CPython.) So without nonlocal
, the above code would see counter += 1
, but counter
is a new local that has never been assigned a value, so Python cannot possibly add 1 to it, and you get an UnboundLocalError
.
nonlocal
tells Python that when it sees an assignment of a name that exists in some outer scope, it should reuse that outer variable rather than shadowing it. Great, right? Purely a new feature. No problem.
Unfortunately, I’ve worked on a codebase that needed this feature in Python 2, and decided to fake it with a class… named nonlocal
.
1def make_function():
2 class nonlocal:
3 counter = 0
4 def function():
5 nonlocal.counter += 1 # this alters an outer value in-place, so it's fine
6 print("I've been called", counter, "times!")
7 return function
The class here is used purely as a dummy container. Assigning to an attribute doesn’t create any locals, because it’s equivalent to a method call, so the operand must already exist. This is a slightly quirky approach, but it works fine.
Except that, of course, nonlocal
is a keyword in Python 3, so this becomes complete gibberish. It’s such gibberish that (if I remember correctly) 2to3
actually cannot parse it, even though it’s perfectly valid Python 2 code.
I don’t have a magical fix for this one. Just, uh, don’t name things nonlocal
.
List comprehensions no longer leak
Python 2 has the slightly inconsistent behavior that loop variables in a generator expression ((...)
) are scoped to the generator expression, but loop variables in a list comprehension ([...]
) belong to the enclosing scope.
The only reason is in implementation details: a list comprehension acts like a for
loop, which has the same behavior, whereas a generator expression actually creates a generator internally.
Python 3 brings these cases into line: loop variables in list comprehensions (or dict or set comprehensions) are also scoped to the comprehension itself.
I cannot imagine any possible reason why this would affect you negatively, and yet, I can swear I’ve seen it happen. I wish I could remember where, because I’m sure it’s an exciting story.
cStringIO.h
is gone
cStringIO.h
is a private and undocumented C interface to Python 2’s cStringIO.StringIO
type. It was removed in Python 3, or at least is somewhere I can’t find it.
This was one of the reasons Thrift’s Python 3 port took almost 3 years: Thrift has a “fast” C module that makes use of this private interface, and it’s not obvious how to replace it. I think they ended up just having the module not exist on Python 3, so Python 3 will just be mysteriously slower.
Some troublesome libraries
MySQLdb is some ancient, clunky, noncompliant, underdocumented trash, much like the database it connects to. It’s nigh abandoned, though it still promises Python 3 support in the MySQLdb 2.0 vaporware. I would suggest not using MySQL, but barring that, try mysqlclient, a fork of MySQLdb that continues development and adds Python 3 support. (The same people also maintain an earlier project, pymysql, which strives to be a pure-Python drop-in replacement for MySQLdb — it’s not quite perfect, but its existence is interesting and it’s sure easier to read than MySQLdb.)
At a glance, Thrift still hasn’t had a release since it merged Python 3 support, eight months ago. It’s some enterprise nightmare, anyway, and bizarrely does code generation for a bunch of dynamic languages. Might I suggest just using the pure-Python thriftpy, which parses Thrift definitions on the fly?
Twisted is, ah, large and complex. Parts of it now support Python 3; parts of it do not. If you need the parts that don’t, well, maybe you could give them a hand?
M2Crypto is working on it, though I’m pretty sure most Python crypto nerds would advise you to use cryptography
instead.
And so on
You may find any number of other obscure compatibility problems, just as you might when upgrading from 2.6 to 2.7. The Python community has a lot of clever people willing to help you out, though, and they’ve probably even seen your super duper niche problem before.
Don’t let that, or this list of gotchas in general, dissaude you! Better to start now than later; even fixing an integer division gets you one step closer to having your code run on Python 3 as well.