Pickling items that don't fit in memory

I really like using Python's generators. Watching and reading David Beazley's talk on generators has made me try to use them wherever practical.

Recently I've had to deal with streaming and capturing massive amounts of data (recording all traffic from a high baud rate CAN bus, incidentally). I have packets of unknown type and length coming in, and I want to store the sequence in such a way that I can recreate the full Python objects on another computer.

A quick solution is to serialize the data and store it in a file. The built-in module for this is pickle. Unfortunately, even with 12GB of RAM, my work PC couldn't hold the entire sequence in memory while waiting to do a single serialization dump with pickle. After a bit of research I found a Python 2 implementation of streaming-pickle by Philip Guo.

His solution didn't support bytearray objects and also suffered from a content boundary problem: multiple newlines within the pickled data could be picked up as record delimiters. I ported that solution to Python 3 and added support for all the binary packing that I required, as well as base64 encoding of the pickled data to get around the content boundary problem.
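To make the boundary problem concrete, here is a small sketch. The payload is made up for illustration, and it assumes the default pickle protocol on Python 3, which stores binary data verbatim. It checks whether a blank line appears inside the raw pickle, whether any newline survives base64 encoding, and whether the object round-trips.

```python
import base64
import pickle

# Hypothetical payload: a packet body that happens to contain a blank line.
payload = bytearray(b"header\n\nbody")

raw = pickle.dumps(payload)
encoded = base64.b64encode(raw)

# The raw pickle carries the payload bytes verbatim, newlines included,
# so a reader that splits records on blank lines would cut it in half.
print(b"\n\n" in raw)

# base64.b64encode returns a single line with no newline characters,
# which leaves "\n" free to act as the record delimiter.
print(b"\n" in encoded)

# Decoding and unpickling restores the original object.
print(pickle.loads(base64.b64decode(encoded)) == payload)
```

The cost of this approach is size: base64 output is roughly a third larger than the pickled bytes it encodes.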

Usage is slightly different from the standard library pickle: you can either dump an iterable in one hit, or pass single elements to s_dump_elt (streaming dump element), which will pickle, encode, then append the element to a file.
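The idea can be sketched roughly as follows. This is a minimal illustration, not the code from the gist: only the name s_dump_elt comes from the description above, while s_dump, s_load, the one-record-per-line layout and the use of a binary file object are my assumptions here.

```python
import base64
import io
import pickle


def s_dump_elt(element, file_obj):
    """Pickle one element, base64-encode it, and append it as one line."""
    file_obj.write(base64.b64encode(pickle.dumps(element)) + b"\n")


def s_dump(iterable, file_obj):
    """Dump an iterable one element at a time (assumed helper name)."""
    for element in iterable:
        s_dump_elt(element, file_obj)


def s_load(file_obj):
    """Lazily yield the stored elements back (assumed helper name)."""
    for line in file_obj:
        line = line.strip()
        if line:
            # Only unpickle data you trust: pickle can run arbitrary code.
            yield pickle.loads(base64.b64decode(line))


# Usage: an in-memory buffer stands in for a file opened in binary mode.
packets = [bytearray(b"\x01\n\n\x02"), {"id": 0x16, "data": b"\x00\xff"}, "text"]
buffer = io.BytesIO()
s_dump((p for p in packets), buffer)  # a generator works as the source
buffer.seek(0)
restored = list(s_load(buffer))
```

Because both the dump and the load work element by element, only one record needs to be in memory at a time, which is what makes this approach fit with generators.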

I imagine I'll refer back to this myself someday, but hopefully it is useful for someone else as well.

gist.github.com/hardbyte/5955010
