I really like using Python's generators. Watching and reading David Beazley's talk on generators has made me try to use them wherever practical.
Recently I've had to deal with streaming and capturing massive amounts of data; in this case, recording all traffic from a high-baud-rate CAN bus. I have packets of unknown type and length coming in, and I want to store the sequence in such a way that I can recreate the full Python objects on another computer.
A quick solution is to serialize the data and store it in a file; the built-in module for this is pickle. Unfortunately, even with 12GB of RAM, my work PC couldn't hold the entire sequence in memory while waiting to do a single serialization dump with pickle. After a bit of research I found a Python 2 implementation of streaming-pickle by Philip Guo.
His solution didn't support bytearray objects and also suffered from a content boundary problem: newlines within the pickled data could be mistaken for record delimiters. I upgraded that solution to Python 3, added support for all the binary packing I required, and base64-encoded the pickled data to get around the content boundary problem.
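To illustrate the boundary problem and why base64 fixes it, here is a small sketch. The helper names encode_record and decode_record are hypothetical (they are not taken from the gist), and the packet bytes are made-up example data that happen to contain newline bytes.

```python
import base64
import pickle
from typing import Any


def encode_record(obj: Any) -> bytes:
    """Pickle an object and base64-encode it as a single newline-terminated line."""
    # base64 output only uses A-Z, a-z, 0-9, '+', '/' and '=', so it can never
    # contain the b"\n" that marks the end of a record.
    return base64.b64encode(pickle.dumps(obj)) + b"\n"


def decode_record(line: bytes) -> Any:
    """Reverse encode_record: strip the delimiter, base64-decode, then unpickle."""
    return pickle.loads(base64.b64decode(line.rstrip(b"\n")))


# Hypothetical packet payload that happens to contain newline bytes.
packet = bytearray(b"\x01\n\x02\n\x03")

raw = pickle.dumps(packet)
# The raw pickle carries the payload bytes verbatim, newlines included, so
# splitting a file of raw pickles on b"\n" would break this record apart.
print(b"\n" in raw)  # True

line = encode_record(packet)
print(line.count(b"\n"))  # 1: exactly one delimiter, at the end
print(decode_record(line) == packet)  # True
```

The cost of this choice is size: base64 inflates each pickle by roughly a third, in exchange for a trivially splittable, line-oriented file.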
Usage is slightly different from the standard library pickle: you can either dump an iterable in one hit, or pass single elements to s_dump_elt (streaming dump element), which will pickle, encode, then append each element to a file.
I imagine I'll refer back to this myself someday, but hopefully it is useful for someone else as well.
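For reference, the streaming interface described above might look roughly like the following. This is a minimal sketch, not the code from the gist: only s_dump_elt is named in this post, while the s_dump and s_load names and all three signatures are assumptions. Each element is stored as one base64-encoded pickle per line, and reading back is done with a generator so records are unpickled lazily.

```python
import base64
import io
import pickle
from typing import Any, BinaryIO, Iterable, Iterator


def s_dump_elt(elt: Any, file_obj: BinaryIO) -> None:
    """Pickle, base64-encode, then append a single element as one line."""
    file_obj.write(base64.b64encode(pickle.dumps(elt)) + b"\n")


def s_dump(iterable: Iterable[Any], file_obj: BinaryIO) -> None:
    """Dump a whole iterable one element at a time, never building a full list."""
    for elt in iterable:
        s_dump_elt(elt, file_obj)


def s_load(file_obj: BinaryIO) -> Iterator[Any]:
    """Generator that lazily yields each stored element back from the file."""
    # Note: unpickling can execute arbitrary code; only load trusted files.
    for line in file_obj:
        line = line.rstrip(b"\n")
        if line:  # skip any blank lines
            yield pickle.loads(base64.b64decode(line))


if __name__ == "__main__":
    buf = io.BytesIO()  # stands in for a file opened in "ab"/"rb" mode
    # Dump a generator of hypothetical packets in one hit...
    s_dump((bytearray([i, 10, i]) for i in range(3)), buf)
    # ...then append one more element on its own.
    s_dump_elt({"id": 0x123, "data": b"\n\n"}, buf)
    buf.seek(0)
    for obj in s_load(buf):
        print(obj)
```

Because s_load is a generator, the reader holds only one record in memory at a time, mirroring the reason for streaming on the writing side.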
gist.github.com/hardbyte/5955010