Don’t use pickle
The problems with Python’s pickle module are extensively documented (and repeated). It’s unsafe by default: untrusted pickles can execute arbitrary Python code. Its automatic, magical behavior shackles you to the internals of your classes in non-obvious ways. You can’t even easily tell which classes are baked forever into your pickles. Once a pickle breaks, figuring out why and where and how to fix it is an utter nightmare.
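To make “execute arbitrary code” concrete, here’s a minimal sketch. The class name Innocent is made up, and eval("6 * 7") is a harmless stand-in for whatever an attacker would really call (os.system, say). The mechanism is the documented __reduce__ hook: it returns a callable plus arguments, pickle stores both, and pickle.loads calls the callable while loading.

```python
import pickle


class Innocent:
    """Hypothetical class whose pickle runs code when it is loaded."""

    def __reduce__(self):
        # pickle.dumps asks __reduce__ how to rebuild this object. Whatever
        # callable and arguments it returns are written into the pickle,
        # and pickle.loads will call them. eval("6 * 7") is a harmless
        # stand-in for something like os.system(...).
        return (eval, ("6 * 7",))


payload = pickle.dumps(Innocent())

# The payload never mentions Innocent; it only says "call builtins.eval".
result = pickle.loads(payload)
print(result)  # 42, because eval ran during loading
```

Anyone who can hand you bytes to unpickle gets to pick the function.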
So we keep saying. But people keep using pickle. Because we don’t offer any real alternatives. Oops.
You can fix pickle, of course, by writing a bunch of __reduce_ex__ methods, and maybe using the copyreg module that you didn’t know existed, and oops, that didn’t work, and it’s trial and error figuring out which types you actually need to write this code for, and all you have to do is overlook one type and all your rigor was for nothing.
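Here’s a minimal sketch of the copyreg route, using two hypothetical classes, Point and Label. copyreg.pickle(cls, reducer) registers a function that tells pickle how to rebuild instances of cls; here a Point is rebuilt by calling Point(x, y). Label is deliberately left unregistered to show the failure mode: pickle raises no error and silently falls back to its default behavior, copying Label’s attribute dict straight into the pickle.

```python
import copyreg
import pickle


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Label:
    # Hypothetical second type that we "forgot" to register.
    def __init__(self, text):
        self.text = text


def reduce_point(p):
    # Rebuild a Point by calling Point(x, y), rather than letting pickle
    # copy its __dict__ wholesale.
    return (Point, (p.x, p.y))


# Register the reducer in copyreg's global dispatch table, which pickle
# consults before falling back to __reduce_ex__.
copyreg.pickle(Point, reduce_point)

data = pickle.dumps([Point(1, 2), Label("origin")])
point, label = pickle.loads(data)
print(point.x, point.y, label.text)  # 1 2 origin

# Label's internal attribute name "text" is now baked into `data`,
# and nothing warned us about it.
```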
What about PyYAML? Oops, same problems: it’s dangerous by default, it shackles you to your class internals, it’s possible to be rigorous but hard to enforce it.
Okay, how about that thing Alex Gaynor told me to do at PyCon, where I write custom dump methods on my classes that just spit out JSON? Sure, you can do that. But if you want to serialize a nested object, then you have to manually call dump on it, and its dump has to return plain data instead of doing the JSON encoding itself. There’s also the slight disadvantage that all the knowledge about what the data means is locked in your application, in code. If all you have to look at is the JSON itself, there’s no metadata besides “version”. You can’t even tell whether your codebase can still load a document without, well, just trying to load it. We’re really talking about rolling ad-hoc data formats here, so I think that’s a shame.
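Roughly, that pattern looks like the sketch below. Book and Author are hypothetical classes, and the dump/load method names and the “version” key are conventions assumed for illustration, not a prescribed API. The nested Author has to be dumped and loaded by hand, and only the outermost call does the actual JSON encoding.

```python
import json


class Author:
    def __init__(self, name):
        self.name = name

    def dump(self):
        # Return plain data rather than a JSON string, so that a parent
        # object can embed it before anything is encoded.
        return {"version": 1, "name": self.name}

    @classmethod
    def load(cls, data):
        return cls(data["name"])


class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author

    def dump(self):
        # The nested Author has to be dumped by hand.
        return {"version": 1, "title": self.title, "author": self.author.dump()}

    @classmethod
    def load(cls, data):
        # ...and loaded by hand. The fact that "author" holds an Author
        # lives only here, in code; the JSON itself doesn't say so.
        return cls(data["title"], Author.load(data["author"]))


# Only the outermost object does the actual JSON encoding.
doc = json.dumps(Book("Dune", Author("Frank Herbert")).dump())
print(doc)
# {"version": 1, "title": "Dune", "author": {"version": 1, "name": "Frank Herbert"}}
```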
But I have good news: I have solved all of your problems.