Part of my Python FAQ.
What does `is` do? Should I use `is` or `==`?
These operators tend to confuse Python newcomers, perhaps because `is` doesn’t have a clear equivalent in many other languages. Some quirks of Python’s canonical implementation also make it difficult to figure out by experimentation.
The simple answer is:
- `==` tests whether two objects have the same value.
- `is` tests whether two objects are the same object.
What does “the same value” mean? It really depends on the objects’ types; usually it means that both objects will respond the same way to the same operations, but ultimately the author of a class gets to decide.
On the other hand, no matter how similar two objects may look or act, `is` can always tell them apart. Did you call `SomeClass()` twice? Then you have two objects, and `a is b` will be `False`.
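A small sketch of the difference. Only the name `SomeClass` comes from the text; the class body, with its value-based `__eq__`, is an assumption made up for illustration.

```python
class SomeClass:
    """Hypothetical class whose instances compare equal by value."""

    def __init__(self, value=0):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, SomeClass):
            return NotImplemented
        return self.value == other.value


a = SomeClass()
b = SomeClass()

same_value = (a == b)     # True: both hold the value 0
same_object = (a is b)    # False: two separate calls, two objects
alias_is_same = (a is a)  # True: an object is always itself
```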
Overloading
There’s another critical, but subtle, difference: `==` can be overloaded, but `is` cannot. The `__eq__` special method (and, in Python 2 only, `__cmp__`) allows a class to decide for itself what equality means.
Because those methods are regular Python code, they can do anything. An object might not be equal to itself. It might be equal to everything. It might randomly decide whether to be equal or not. It might return `True` for both `==` and `!=`.
Hopefully no real code would do such things, but the point is that it can happen. `==` on an arbitrary object may be unreliable; `is` never will be. More on why you might care about this below.
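To make that concrete, here is a sketch with two deliberately badly behaved classes. `AlwaysEqual` and `NeverEqual` are hypothetical names invented for this example, not anything from a real library.

```python
class AlwaysEqual:
    """Hypothetical class that claims to be equal to everything."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return True  # contradicts __eq__, but nothing stops it


class NeverEqual:
    """Hypothetical class that is not even equal to itself."""

    def __eq__(self, other):
        return False


liar = AlwaysEqual()
loner = NeverEqual()

claims_equal_none = (liar == None)      # True, because the overload says so
claims_unequal_none = (liar != None)    # also True
equal_to_itself = (loner == loner)      # False
identical_to_itself = (loner is loner)  # True: `is` cannot be overridden
is_really_none = (liar is None)         # False: identity is not fooled
```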
When Python sees `a == b`, it tries the following.
- If `type(b)` is a new-style class, and `type(b)` is a subclass of `type(a)`, and `type(b)` has overridden `__eq__`, then the result is `b.__eq__(a)`.
- If `type(a)` has overridden `__eq__` (that is, `type(a).__eq__` isn’t `object.__eq__`), then the result is `a.__eq__(b)`.
- If `type(b)` has overridden `__eq__`, then the result is `b.__eq__(a)`.
- If none of the above are the case, Python 2 repeats the process looking for `__cmp__`. If it exists, the objects are equal iff it returns zero. (Python 3 has no `__cmp__`, so this step does not apply there.)
- As a final fallback, Python calls `object.__eq__(a, b)`, which is `True` iff `a` and `b` are the same object.
If any of the special methods return `NotImplemented`, Python acts as though the method didn’t exist.
Note that last step carefully: if neither `a` nor `b` overloads `==`, then `a == b` is the same as `a is b`.
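The lookup order can be observed with a rough sketch like the following, run on Python 3 (where the `__cmp__` step does not exist). The classes `Base`, `Derived`, and `Plain` are hypothetical. Each `__eq__` records that it was called and then returns `NotImplemented`, so Python keeps going until it reaches the identity fallback.

```python
calls = []


class Base:
    def __eq__(self, other):
        calls.append("Base.__eq__")
        return NotImplemented


class Derived(Base):
    def __eq__(self, other):
        calls.append("Derived.__eq__")
        return NotImplemented


class Plain:
    """No __eq__ of its own, so == falls back to identity."""


# The right-hand operand is an instance of a subclass that overrides
# __eq__, so it is asked first; the left-hand operand is asked second.
result = (Base() == Derived())
order = list(calls)

p = Plain()
q = Plain()
same_plain = (p == p)       # True: same object
different_plain = (p == q)  # False: distinct objects, nothing overloaded
```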
When to use which
There are actually very few cases where you want to use `is`. The most common by far is for setting default arguments:
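A minimal sketch of that pattern. The function name `foo`, the list default, and the `"item"` value are assumptions made for illustration; only `arg` and the `None` default are named in the text.

```python
def foo(arg=None):
    # None stands in for "the caller passed nothing".
    if arg is None:
        arg = []  # build a fresh list on every call
    arg.append("item")
    return arg


first = foo()
second = foo()
explicit = foo(["given"])
```

Each call without an argument gets its own new list, while a list passed in by the caller is used as given.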
Why use `is` here? It does read more like English, and `None` is guaranteed to be a singleton object. A better reason is slightly more insidious: operator overloading! If `arg` happened to overload equality, it might claim to be equal to `None`. That would be some egregious misbehavior, sure, but no reason not to be correct when you can.
Sometimes `None` might already have a special meaning to your function, perhaps to mean `null` in JSON or SQL. If you wrote such a function the way I did above, nobody could pass `None` to it; it would get replaced by your default. How can you make an argument optional if `None` is a real value? `is` can help here, too.
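One way this might look, as a sketch. The name `unspecified` is the one the text uses; the function `foo` and its `"default"` fallback value are assumptions for illustration.

```python
# A unique object that no caller can produce by accident.
unspecified = object()


def foo(arg=unspecified):
    if arg is unspecified:
        arg = "default"  # hypothetical fallback value
    return arg


no_argument = foo()     # falls back to "default"
passed_none = foo(None) # None is preserved as a real value
```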
Here `unspecified` is just a dummy object containing no data and having no behavior. The only useful property it has is that, if `arg is unspecified`, then you know `arg` must be that exact same object. It has no meaning, so it’s a perfectly safe default; it won’t prevent the caller from passing in some particular object you wanted to use as a sentinel.
`==` would work the same way, of course, but it has the same caveat as `arg == None`: bad overloading. Using `is` also better expresses your intention, which is that you want to test for this particular object and no other.
In general, you want `==` most of the time. `is` is only useful when you are absolutely sure you want to check that you have the same object with two different names.
`is` and builtins
A common pitfall is to pull out the Python REPL and try something like the following:
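An illustrative experiment along those lines, written as a short script. The specific values are chosen for illustration. On CPython every entry in `results` typically comes out `True`; other implementations may differ.

```python
a = 1
b = 1
s = "abc"
t = "abc"
x = int("2")
y = int("2")

# Identity checks that look like they "should" be False.
results = {
    "ints": a is b,
    "strs": s is t,
    "int() calls": x is y,
}
print(results)
```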
Hang on, what’s going on here? Those are separate numbers and separate strings, and even separate calls to `int()`; why are they claiming to be the same object?
There are a lot of strings in any given Python program containing, say, `__init__`. (One for every constructor, in fact!) There are also a lot of small numbers, like `0` and `-1`. Strictly speaking, every time one of these appears, Python would need to create a new object, and that eats a lot of memory. Finding a method on a class would require comparing strings byte-by-byte, and that eats a lot of time.
So CPython (the canonical Python interpreter, written in C) has a behind-the-scenes optimization called interning. Small integers and some strings are cached: the integer `2` will always refer to the same object, no matter how it comes into existence.
Interning is not strictly part of the language, and other Python implementations may or may not do it. The language allows for any immutable object to be interned, but otherwise says nothing. For this reason, absolutely do not use `is` on the built-in immutable types. The results are basically meaningless because of interning!
One last wrinkle. When CPython compiles a chunk of code (a “compilation unit”), it has to create objects to represent literals it sees. (Literals are objects that have native Python syntax: numbers, strings, lists that use `[]`, etc.) In the case of numbers and strings, literals with the same value become the same object, whether interned or not.
With that in mind, the REPL’s treatment of `is` should make more sense:
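A condensed sketch of the behaviour involved, written as a script rather than a REPL session. The specific values are chosen for illustration, and the comments describe what CPython does; other implementations may give different answers.

```python
# Integers created at run time: cached only inside the small-int range.
small_a = int("256")
small_b = int("256")
big_a = int("257")
big_b = int("257")

small_shared = small_a is small_b  # CPython: True (256 is cached)
big_shared = big_a is big_b        # CPython: False (257 is not)

# Equal literals inside one compilation unit share a single object,
# even though 1000 is outside the cached range.
lit_a = 1000
lit_b = 1000
same_unit = lit_a is lit_b         # CPython: True in a script like this

# Strings built at run time are separate objects with equal values.
built_a = "".join(["hello", " ", "world"])
built_b = "".join(["hello", " ", "world"])
built_shared = built_a is built_b  # CPython: False
values_equal = built_a == built_b  # True everywhere
```

In the REPL each statement is compiled on its own, which is why the same two literal assignments typed on separate lines can give a different answer than they do here.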
(By the way, if you really must know: CPython interns all `int`s between `-5` and `256`, inclusive.)
Conclusion
- Most of the time, you want `==`.
- Use `arg is None` when you have a function with an argument defaulting to `None`. That’s okay, because there’s only one `None`.
- For testing whether two classes, functions, or modules are the same object, `is` is okay. Stylistic choice.
- Never use `is` with `str`, `int`, `float`, `complex`, or any other core immutable value type! Interning makes the response worthless!
- Other valid uses of `is` are fairly rare and obscure, for example:
  - If I have a large tree structure and want to find the location of a subtree, `==` will recursively compare values (potentially very slow) but `is` will tell me if I’ve found the exact same node.
  - A caching mechanism may want to treat all objects as distinct, without having to care about or rely on how they implement `==`. `is` can be appropriate here.
  - Demonstrating to newbies that interning exists is only possible with `is` :)
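The tree case might look something like this sketch. `Node` and `find_path` are hypothetical names invented for the example; `Node` compares by value, recursively, while the search relies on identity alone.

```python
class Node:
    """Hypothetical tree node that compares by value, recursively."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def __eq__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        return self.value == other.value and self.children == other.children


def find_path(root, target):
    """Return the child indexes leading to the exact node `target`, or None."""
    if root is target:  # identity check: no recursive comparison needed
        return []
    for index, child in enumerate(root.children):
        path = find_path(child, target)
        if path is not None:
            return [index] + path
    return None


leaf_a = Node("x")
leaf_b = Node("x")  # equal to leaf_a, but a different node
tree = Node("root", [Node("mid", [leaf_a]), leaf_b])

path_a = find_path(tree, leaf_a)  # [0, 0]
path_b = find_path(tree, leaf_b)  # [1]
```

Had the search used `==`, looking for `leaf_b` would have stopped at `leaf_a`, since the two nodes are equal by value.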
To summarize even further: don’t use `is` unless you’re comparing with `None` or you really, really mean it. And you don’t.
Further reading
- The Python Language Reference has a data model section which documents the possibility of caching immutable values, how `__eq__` works, and how operator overloading works in general.
- The Python C API is the only documentation of what ints are interned.