Don’t use pickle. Don’t use pickle. Don’t use pickle.
The problems with Python’s pickle module are extensively documented (and repeated). It’s unsafe by default: untrusted pickles can execute arbitrary Python code. Its automatic, magical behavior shackles you to the internals of your classes in non-obvious ways. You can’t even easily tell which classes are baked forever into your pickles. Once a pickle breaks, figuring out why and where and how to fix it is an utter nightmare.
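In case “execute arbitrary Python code” sounds abstract, here’s a minimal sketch of how little it takes. The class name Innocent is made up for illustration, and os.getcwd is a deliberately harmless stand-in for whatever an attacker would actually pick (say, os.system).

```python
import os
import pickle


class Innocent:
    """Hypothetical class; only its __reduce__ matters here."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call os.getcwd()".
        # Any importable callable with any arguments could go here instead.
        return (os.getcwd, ())


payload = pickle.dumps(Innocent())

# Loading the bytes calls the function chosen by whoever wrote the pickle.
# The loader never gets a say, and doesn't even get an Innocent back.
result = pickle.loads(payload)
```

After this runs, result is the current working directory as a string, not an Innocent instance: pickle.loads called the function named in the payload and handed back whatever it returned.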
Don’t use pickle.
So we keep saying. But people keep using pickle. Because we don’t offer any real alternatives. Oops.
You can fix pickle, of course, by writing a bunch of __setstate__ and __reduce_ex__ methods, and maybe using the copyreg module that you didn’t know existed, and oops that didn’t work, and it’s trial and error figuring out which types you actually need to write this code for, and all you have to do is overlook one type and all your rigor was for nothing.
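For a sense of what that rigor looks like on a single class, here’s a rough sketch using __getstate__ and __setstate__. The Account class, its fields, and the “version 1 stored a float balance in dollars” backstory are all invented for the example; the point is only that every class needs its own hand-written version of this.

```python
import pickle


class Account:
    """Hypothetical class with an explicit, versioned pickle state."""

    STATE_VERSION = 2

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Write a deliberate, versioned dict instead of leaking __dict__.
        return {
            "version": self.STATE_VERSION,
            "owner": self.owner,
            "balance_cents": self.balance_cents,
        }

    def __setstate__(self, state):
        version = state.get("version", 1)
        if version == 1:
            # Assumed older layout: a float "balance" in dollars.
            self.owner = state["owner"]
            self.balance_cents = round(state["balance"] * 100)
        elif version == 2:
            self.owner = state["owner"]
            self.balance_cents = state["balance_cents"]
        else:
            raise pickle.UnpicklingError(
                f"unknown Account state version: {version!r}"
            )


restored = pickle.loads(pickle.dumps(Account("ada", 1250)))
```

This only protects Account. Any other type that ends up inside the pickle without the same treatment goes back to dumping its raw internals, which is exactly the “overlook one type” problem.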
What about PyYAML? Oops, same problems: it’s dangerous by default, it shackles you to your class internals, it’s possible to be rigorous but hard to enforce it.
Okay, how about that thing Alex Gaynor told me to do at PyCon, where I write custom load and dump methods on my classes that just spit out JSON? Sure, you can do that. But if you want to serialize a nested object, then you have to manually call dump on it, and it has to not do the JSON dumping itself. There’s also the slight disadvantage that all the knowledge about what the data means is locked in your application, in code: if all you have to look at is the JSON itself, there’s no metadata besides “version”. You can’t even tell if your codebase can still load a document without, well, just trying to load it. We’re really talking about rolling ad-hoc data formats here, so I think that’s a shame.
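Here’s roughly what that approach looks like, as a sketch rather than anyone’s canonical recipe. Point and Line are hypothetical classes; note that dump returns plain dicts and only the outermost caller touches the json module, which is the “it has to not do the JSON dumping itself” part.

```python
import json


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def dump(self):
        # Returns JSON-compatible data, not a JSON string.
        return {"version": 1, "x": self.x, "y": self.y}

    @classmethod
    def load(cls, data):
        if data.get("version") != 1:
            raise ValueError(f"can't load Point version {data.get('version')!r}")
        return cls(data["x"], data["y"])


class Line:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def dump(self):
        # Nested objects are dumped by hand, one attribute at a time.
        return {
            "version": 1,
            "start": self.start.dump(),
            "end": self.end.dump(),
        }

    @classmethod
    def load(cls, data):
        if data.get("version") != 1:
            raise ValueError(f"can't load Line version {data.get('version')!r}")
        return cls(Point.load(data["start"]), Point.load(data["end"]))


# Only the outermost layer converts to and from actual JSON text.
text = json.dumps(Line(Point(0, 0), Point(3, 4)).dump())
line = Line.load(json.loads(text))
```

Looking at text alone, nothing says that "start" is a Point or what version 1 of a Point is supposed to contain. That knowledge lives only in the load methods.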
But I have good news: I have solved all of your problems.