Part of my Python FAQ.
What does is do? Should I use is or ==?
These operators tend to confuse Python newcomers, perhaps because is has no clear equivalent in most other languages. Some quirks of Python’s canonical implementation also make it difficult to figure out by experimentation.
The simple answer is:
- == tests whether two objects have the same value.
- is tests whether two objects are the same object.
What does “the same value” mean? It really depends on the objects’ types; usually it means that both objects will respond the same way to the same operations, but ultimately the author of a class gets to decide.
On the other hand, no matter how similar two objects may look or act, is can always tell them apart. Did you call SomeClass() twice? Then you have two objects, and a is b will be False.
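Here is a minimal sketch of that difference. The Point class is a made-up example of mine, assumed to define equality by comparing coordinates; any class with a value-based __eq__ would behave the same way.

```python
class Point:
    """Hypothetical value class: two points are equal when coordinates match."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Python 3 drops the default __hash__ when __eq__ is defined,
        # so provide one that agrees with __eq__.
        return hash((self.x, self.y))


a = Point(1, 2)
b = Point(1, 2)  # a second call creates a second object
c = a            # a second name for the first object

print(a == b)  # True: same value
print(a is b)  # False: two distinct objects
print(a is c)  # True: one object, two names
```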
Overloading
There’s another critical, but subtle, difference: == can be overloaded, but is cannot. Both the __eq__ and __cmp__ special methods (the latter exists only in Python 2) allow a class to decide for itself what equality means.
Because those methods are regular Python code, they can do anything. An object might not be equal to itself. It might be equal to everything. It might randomly decide whether to be equal or not. It might return True for both == and !=.
Hopefully no real code would do such things, but the point is that it can happen. == on an arbitrary object may be unreliable; is never will be. More on why you might care about this below.
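To make those misbehaviors concrete, here is a small sketch. All four class names are invented for illustration; the point is only that == reports whatever the class author wrote, while is ignores all of it.

```python
import random


class AlwaysEqual:
    def __eq__(self, other):
        return True  # claims to be equal to everything


class NeverEqual:
    def __eq__(self, other):
        return False  # not even equal to itself


class Moody:
    def __eq__(self, other):
        return random.choice([True, False])  # decides at random


class Contradictory:
    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return True  # "equal" and "not equal" at the same time


everything = AlwaysEqual()
nothing = NeverEqual()
both = Contradictory()

print(everything == 42)          # True
print(nothing == nothing)        # False
print(both == 1 and both != 1)   # True
print(nothing is nothing)        # True: is cannot be fooled
print(everything is 42)          # False
```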
When Python sees a == b, it tries the following.
- If type(b) is a new-style class, and type(b) is a subclass of type(a), and type(b) has overridden __eq__, then the result is b.__eq__(a).
- If type(a) has overridden __eq__ (that is, type(a).__eq__ isn’t object.__eq__), then the result is a.__eq__(b).
- If type(b) has overridden __eq__, then the result is b.__eq__(a).
- If none of the above are the case, Python repeats the process looking for __cmp__. If it exists, the objects are equal iff it returns zero.
- As a final fallback, Python calls object.__eq__(a, b), which is True iff a and b are the same object.
If any of the special methods return NotImplemented, Python acts as though the method didn’t exist.
Note that last step carefully: if neither a nor b overloads ==, then a == b is the same as a is b.
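The lookup order above can be watched in action with a rough sketch like the following. Base, Child, Plain, and the calls list are names I made up for the demonstration, and it assumes Python 3, where every class is new-style and the __cmp__ step no longer exists.

```python
calls = []  # records which __eq__ methods run, in order


class Base:
    def __eq__(self, other):
        calls.append("Base.__eq__")
        return NotImplemented  # "I don't know how to compare these"


class Child(Base):
    def __eq__(self, other):
        calls.append("Child.__eq__")
        return NotImplemented


class Plain:
    """Overloads nothing, so == falls through to identity."""


base, child = Base(), Child()
subclass_result = (base == child)
print(calls)            # ['Child.__eq__', 'Base.__eq__']
print(subclass_result)  # False

p, q = Plain(), Plain()
print(p == q, p is q)   # False False
print(p == p, p is p)   # True True
```

The subclass gets the first try even though it is on the right-hand side. Since both methods return NotImplemented, Python falls back to the identity check, and base and child are different objects.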
When to use which
There are actually very few cases where you want to use is. The most common by far is for setting default arguments:
def foo(arg=None):
    if arg is None:
        arg = []

    # ...
Why use is here? It does read more like English, and None is guaranteed to be a singleton object. A better reason is slightly more insidious: operator overloading! If arg happened to overload equality, it might claim to be equal to None. That would be some egregious misbehavior, sure, but that’s no reason not to be correct when you can.
Sometimes None might already have a special meaning to your function, perhaps to mean null in JSON or SQL. If you wrote such a function the way I did above, nobody could pass None to it; it would get replaced by your default. How can you make an argument optional if None is a real value? is can help here, too.
unspecified = object()

def foo2(arg=unspecified):
    if arg is unspecified:
        arg = make_default_object()

    # ...
Here unspecified is just a dummy object containing no data and having no behavior. The only useful property it has is that, if arg is unspecified, then you know arg must be that exact same object. It has no meaning, so it’s a perfectly safe default; it won’t prevent the caller from passing in some particular object you wanted to use as a sentinel.
== would work the same way, of course, but it has the same caveat as arg == None: bad overloading. Using is also better expresses your intention, which is that you want to test for this particular object and no other.
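For a slightly fuller picture, here is a runnable sketch of the sentinel pattern in a case where None is a real value. The set_field function and the _UNSPECIFIED name are hypothetical, invented for this example; the JSON angle just follows the null case mentioned above.

```python
import json

_UNSPECIFIED = object()  # hypothetical module-private sentinel


def set_field(record, key, value=_UNSPECIFIED):
    """Set record[key] to value; if no value is passed, remove the key.

    None is a legitimate value here (it serializes to JSON null), so it
    cannot double as the "no argument given" marker.
    """
    if value is _UNSPECIFIED:
        record.pop(key, None)
    else:
        record[key] = value
    return record


record = {"name": "alice", "email": "alice@example.com"}

set_field(record, "email", None)  # an explicit None is stored as-is
print(json.dumps(record))         # {"name": "alice", "email": null}

set_field(record, "email")        # an omitted argument removes the key
print(json.dumps(record))         # {"name": "alice"}
```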
In general, you want == most of the time. is is only useful when you are absolutely sure you want to check that you have the same object with two different names.
is and builtins
A common pitfall is to pull out the Python REPL and try something like the following:
>>> 2 == 2
True
>>> 2 is 2
True
>>> "x" == "x"
True
>>> "x" is "x"
True
>>> int("133") is int("133")
True
Hang on, what’s going on here? Those are separate numbers and separate strings, and even separate calls to int(); why are they claiming to be the same object?
There are a lot of strings in any given Python program containing, say, __init__. (One for every constructor, in fact!) There are also a lot of small numbers, like 0 and -1. Strictly speaking, every time one of these appears, Python would need to create a new object, and that eats a lot of memory. Finding a method on a class would require comparing strings byte-by-byte, and that eats a lot of time.
So CPython (the canonical Python interpreter, written in C) has a behind-the-scenes optimization called interning. Small integers and some strings are cached: the integer 2 will always refer to the same object, no matter how it comes into existence.
Interning is not strictly part of the language, and other Python implementations may or may not do it. The language allows for any immutable object to be interned, but otherwise says nothing. For this reason, absolutely do not use is on the built-in immutable types. The results are basically meaningless because of interning!
One last wrinkle. When CPython compiles a chunk of code (a “compilation unit”), it has to create objects to represent literals it sees. (Literals are objects that have native Python syntax: numbers, strings, lists that use [], etc.) In the case of numbers and strings, literals with the same value become the same object, whether interned or not.
With that in mind, the REPL’s treatment of is should make more sense:
# Interned ints
>>> 100 is 100
True
# Non-interned ints, but compiled together, so still the same object
>>> 99999 is 99999
True
# Non-interned ints, compiled /separately/, so different objects
>>> a = 99999
>>> b = 99999
>>> a is b
False
# Interned ints are the same object no matter where they appear
>>> a = 3
>>> b = 3
>>> c = 6 // 2
>>> a is b
True
>>> a is c
True
# Floats are never interned, but these are compiled together, so are still the
# same object
>>> 1.5 is 1.5
True
# Strings are similar to ints
>>> "foo" is "foo"
True
>>> a = "foo"
>>> b = "foo"
>>> a is b
True
>>> "the rain in spain falls mainly on the plain" is "the rain in spain falls mainly on the plain"
True
>>> a = "the rain in spain falls mainly on the plain"
>>> b = "the rain in spain falls mainly on the plain"
>>> a is b
False
# Two different lists; they're mutable so they can't be the same object
>>> [] is []
False
# Two different dicts; same story
>>> {} is {}
False
# Tuples are immutable, but their contents can be mutable, so they don't get
# the optimization either
>>> (1, 2, 3) is (1, 2, 3)
False
(By the way, if you really must know: CPython interns all ints between -5 and 256, inclusive.)
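If you want to poke at this outside the REPL, a sketch like the one below sidesteps the compile-time sharing of literals by building each number at runtime with int(). The variable names are mine, and the is results are CPython implementation details, so treat them as observations rather than guarantees.

```python
# Build the numbers at runtime so the compiler cannot share literal objects.
small_a = int("133")
small_b = int("133")
big_a = int("99999")
big_b = int("99999")

# Value comparison behaves the same on every Python implementation.
print(small_a == small_b)  # True
print(big_a == big_b)      # True

# Identity depends on the interpreter's caching. On CPython, 133 falls in
# the cached range, so the first line usually prints True and the second
# usually prints False; neither result is promised by the language.
print(small_a is small_b)
print(big_a is big_b)
```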
Conclusion
- Most of the time, you want ==.
- Use arg is None when you have a function with an argument defaulting to None. That’s okay, because there’s only one None.
- For testing whether two classes, functions, or modules are the same object, is is okay. Stylistic choice.
- Never use is with str, int, float, complex, or any other core immutable value type! Interning makes the response worthless!
- Other valid uses of is are fairly rare and obscure, for example:
  - If I have a large tree structure and want to find the location of a subtree, == will recursively compare values (potentially very slow) but is will tell me if I’ve found the exact same node.
  - A caching mechanism may want to treat all objects as distinct, without having to care about or rely on how they implement ==. is can be appropriate here.
  - Demonstrating to newbies that interning exists is only possible with is :)
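The tree case from the list above might look something like this sketch. Node and find_path are hypothetical names of mine, and the tree shape is an assumption; the idea is only that is finds the exact node without ever running the recursive ==.

```python
class Node:
    """Hypothetical tree node whose == compares whole subtrees by value."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def __eq__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        # List comparison recurses into every child's __eq__.
        return self.value == other.value and self.children == other.children


def find_path(root, target):
    """Return the child indexes leading from root to target, or None.

    Uses is, so only the exact target object matches.
    """
    if root is target:
        return []
    for index, child in enumerate(root.children):
        path = find_path(child, target)
        if path is not None:
            return [index] + path
    return None


leaf_a = Node("leaf")
leaf_b = Node("leaf")  # equal to leaf_a, but a different object
tree = Node("root", [Node("branch", [leaf_a]), leaf_b])

print(leaf_a == leaf_b)         # True
print(find_path(tree, leaf_a))  # [0, 0]
print(find_path(tree, leaf_b))  # [1]
```

Had find_path used == instead, both lookups would have stopped at the first equal-looking leaf and returned [0, 0].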
To summarize even further: don’t use is unless you’re comparing with None or you really, really mean it. And you don’t.
Further reading
- The Python Language Reference has a data model section which documents the possibility of caching immutable values, how __eq__ works, and how operator overloading works in general.
- The Python C API is the only documentation of what ints are interned.