This is a story of how nothing in this story is my fault.
More than once he’s lamented that Python lacks a
frozendict—a dictionary type that can’t be changed. Dictionaries tend to crop up a lot in Python, and in a very large codebase, it’s very easy to end up with this scenario:
- Some function somewhere generates a dict that’s only used by one caller. The interface is obvious since the keys and values are created right there in one place. No problem.
- Other code comes along, notices this handy function, and starts using its return value. Some of this code may pass the ad-hoc dict up to callers, too.
- Some of that other code needs more things added to the dict, but computing the extra data is expensive, so arguments are added to the function that optionally turn on certain keys.
- Some code needs even more things added to the dict that are outside the purview of the original function, so they add helper functions that take the ad-hoc dict and add more things to it.
- Since this has all now happened multiple times throughout your codebase, someone addresses the problem by writing adapter code that infers the original dict from some other object describing its structure, thus saving everyone from writing all these functions that return dicts.
A lovely spaghetti dinner. It’s now nigh impossible to trace what the dict contains or where half of it came from.
Returning an object in the first place would have avoided much of this, but when you’re sitting at step 1, that seems like a lot of effort just to return half a dozen things from a function you wrote to another function you wrote. Swapping out
frozendict() is easy.
My opinion on
frozendict had never grown stronger than “I guess that would be cool”, so I never sat down and wrote the class, and there must be enough subtleties that nobody else at Yelp has either.
Then today, PEP 416 came to my attention. This PEP proposed adding a
frozendict type, but was rejected last year as being largely unnecessary. What’s interesting about it is that the rejection ends with almost a footnote suggesting that perhaps
dictproxy ought to be exposed to Python-land, instead. And indeed this was done, and it exists in Python 3.3.
dictproxy is the type of an object’s
__dict__, sometimes. Usually for truly immutable objects, i.e., ones implemented in C. It’s been part of Python since 2.2, albeit largely unnoticed, because who ever checks things like
Rather than being an immutable dict,
dictproxy is an immutable wrapper around an existing dict. Changes to the underlying dict are still visible, but code that only has a
dictproxy can neither change the underlying dict nor touch it directly in any way. Perfect.
Except that in Pythons before 3.3,
dictproxy doesn’t have a Python-land constructor, so Python code can’t actually use it. That leaves us 2.x suckers out in the cold.
Or does it? “Python can’t do this” is often secret code for “Python can only do this with some awful
ctypes hackery”. Let’s google up some awful hackery, then.
The only promising result I bumbled across was this ActiveState recipe.
I was extremely skeptical about this, because in my experience, and everyone else’s experience, code posted on ActiveState is some of the most bug-ridden hacked-together sludge on the planet. I think this example from my actual browser history makes that point well enough. (Look at the comments.) But this seemed simple enough. It was a quick wrapper around the C API, it had tests, it worked when I tried it, and the meat of it was only three lines. What could possibly go wrong?
All I took was the
1 2 3 4 5 6 7 8
I expanded on it considerably: I used the real type in 3.3+, I wrote a metaclass that would fool
isinstance, I made
dictproxy into a type rather than having a factory function, and I added fallback based on the
Mapping ABC in the event that
ctypes failed (as it does on PyPy). It worked beautifully on every Python incarnation I had on hand, and some people cleverer than I gave nods of approval.
I thought this useful enough that someone else might want to use it, so I tossed it on PyPI. Then I ran it again against Python 2.7, just for kicks.
The next hour produced some fascinating symptoms.
- It wouldn’t segfault within gdb.
- It wouldn’t segfault in another
- It wouldn’t segfault in another terminal window.
- It wouldn’t segfault in a subshell.
I couldn’t get a core dump (later revealed to be caused by
systemd, but not useful anyway). I could only track it down as far as the
PyObj_FromPtr call, but had no idea what the difference could be between shells. The environments were identical in every way that matters.
- The addresses of Python objects just before the segfault were very large; around 12 hex digits.
- The pointer being returned from
PyDictProxy_Newjust before the segfault was much smaller; around 6 hex digits. Rarely, they were negative.
- Non-segfaulting runs had only small addresses.
Look at the ActiveState code again and see if you can figure it out from this.
Okay, here’s the problem.
ctypes function objects don’t have a
rettype attribute. That’s supposed to be
Without a return type assigned,
ctypes assumes everything returns a 32-bit signed int, so it was truncating the pointer being returned. This worked sometimes, when allocations happened to be in the 32-bit range, but the shell I was using just happened to have won the memory allocation lottery, and it (and only its Python children, for reasons I cannot explain) was allocating rather high addresses.
In fact, this bug is the only reason
PyObj_FromPtr is necessary at all. A return type of
py_object means to convert to a Python object automatically! The correct code is much simpler:
1 2 3 4 5 6 7
Interesting trivia: the original code failed on PyPy because
PyObj_FromPtr doesn’t exist. (PyPy has no
PyObject* pointers floating around, after all.) The fixed code actually works on PyPy, until the last line, where it complains that
dict is not a
py_object. (Fixing that wouldn’t work, because… well, no pointers, etc.) I had to make the code fail early so PyPy would fall back to the dumb approach. Hopefully it gets the 3.3 stdlib soon.
That’s what I get for daring to assume that an ActiveState code sample could actually work correctly. Don’t make the same poor choices I did, kids; just say no to
code.activestate.com. Don’t even click once.
I was also surprised that assigning to a bogus attribute on a
ctypes object worked fine; C types tend to have fixed slots that yell at you when you make typos like this. Guess not, in this case.
ctypes developers, what the hell?
restype? What is that even short for? Resurn type?
Glad I bumbled into this early, though; I would’ve had a hell of a time debugging it inside a larger program. Or, worse, someone else’s larger program.