Skip to main content

Software Challenge 2013


Following on from a hardware challenge sent out at work last month, I posed a Software Engineering focused problem.

As a successful global company, we have many customers around the world. While marketing are gallivanting around, visiting customers, it might be handy to have a program that tells them which of our customers are closest based upon which customer they are currently visiting. Being an engineer who is interested in writing software that is useful to everyone, you decide to write a general solution to the quandary. Each of our customers lives at a unique location. For the purposes of this program, the world is flat, and the latitude and longitude are for all intents and purposes Cartesian coordinates on a flat plane. For example, in our flat world, lat 45, long -179 is not as close to lat 45, long 179 when compared to lat 45, long 170.

The task was to write a program that takes a single argument on the command line. This argument is a file name which contains the input data. The program should output the nearest three other points for each customer in the list. Note at least one test will require having to deal with all of our (virtual) customers.

Entry

Final timing will be carried out on my 64bit ubuntu machine which has support for the following languages/compilers:
  • python 2.7, 3.2, 3.3
  • pypy 1.9 (python 2.7.2)
  • gcc 4.7.2
  • clang 3.0
  • php 5.4.6
  • ruby 1.9.3
  • nodejs 0.6.19
  • ghc 7.4.2
  • go 1.0.2
  • mono 2.10.8 (.NET 4.0)
  • octave 3.6.2
  • lua 5.2
  • luajit-2.0.1
  • java (OpenJDK Runtime Environment (IcedTea7 2.3.6))
More esoteric environments may be added if you really want. If the program failed to terminate in five minutes for any single verification test, or if the program fails to compile, or terminates early I discarded that program as an incorrect entry and sent an email with the reason.

Example Input

1 40.71 -74.01
2 25.04 121.53
3 -4.33 15.32
4 -12.05 -77.05
5 30.05 31.25
6 39.91 116.40
7 51.51 -0.13

Example Output

1 4,7,3
2 6,5,3
3 5,7,4
4 1,3,7
5 3,7,6
6 2,5,3
7 5,3,1

Results!

Many people made very successful brute force implementations. Some used squared distances for comparison to avoid computing square roots. Around 1kB of source was enough for bruteforce solutions in lua and python. Smallest was around 40 lines of Python code.

For really small N, brute force was quicker than the "smarter solutions". But there was still quite a spread in the max N each solution could solve:


KDTree

These are the data structures I was expecting, I created one in pure python that was reasonable but couldn't compete with the compiled solutions over a certain input size. One engineer submitted a Python + Scipy solution that simply imported scipy.spatial. This was reasonably fast, and very easy! The same engineer also wrote a brute force solver in cython that was much faster for the smaller tests.

Nontree?

The winning solution was a bit of a new take on a quadtree - with nine children instead of four and splitting occurred at 40. It easily solved fifty million customers in around two minutes. When all my resources were dedicated it could often solve one hundred million as well. I had a few occasions where my system locked up at that size though!

Timings


Summary

A single bitcoin was the main prize, although finding one has proven tricky! I thought it was a fun challenge, and I look forward to the next one!

Popular posts from this blog

My setup for downloading & streaming movies and tv

I recently signed up for Netflix and am retiring my headless home media pc. This blog will have to serve as its obituary. The box spent about half of its life running FreeNAS, and half running Archlinux. I’ll briefly talk about my experience with FreeNAS, the migration, and then I’ll get to the robust setup I ended up with.

The machine itself cost around $1000 in 2014. Powered by an AMD A4-7300 3.8GHz cpu with 8GB of memory. A SilverStone DS380 case is both functional, quiet and looks great. The hard drives have been updated over the last two years until it had a full compliment of 6 WD Green 4TiB drives - all spinning bits of metal though.

Initially I had the BSD based FreeNAS operating system installed. I had a single hard drive in its own ZFS pool for TV and Movies, and a second ZFS pool comprised of 5 hard drives for documents and photos.

FreeNAS is straight forward to use and setup, provided you only want to do things supported out of the box or by plugins. Each plugin is install…

Driveby contribution to Python Cryptography

While at PyConAU 2016 I attended the Monday sprints and spent some time looking at a proposed feature I hoped would soon be part of cryptography. As most readers of this blog will know, cryptography is a very respected project within the Python ecosystem and it was an interesting experience to see how such a prominent open source project handles contributions and reviews.

The feature in question is the Diffie-Hellman Key Exchange algorithm used in many cryptography applications. Diffie-Helman Key Exchange is a way of generating a shared secret between two parties where the secret can't be determined by an eavesdropper observing the communication. DHE is extremely common - it is one of the primary methods used to provide "perfect forward secrecy" every time you initiate a TLS connection to an HTTPS website. Mathematically it is extremely elegant and the inventors were the recipients of the 2015 Turing award.

I wanted to write about this particular contribution because man…

Python, Virtualenv and Docker

Unsurprisingly I use some very popular Scientific Python packages like Numpy, Scipy and Scikit Learn. These packages don't get on that well with virtualenv and pip as they take a lot of external dependencies to build. These dependencies can be optional libraries like libblas and libatlas which if present will make Numpy run faster, or required dependencies like a fortran compiler.

Back in the good old days you wouldn't pin all your dependency versions down and you'd end up with a precarious mix of apt-get installed and pip installed packages. Working with other developers, especially on different operating system update schedules could be a pain. It was time to update your project when it breaks because of a dependency upgraded by the operating system.

Does virtualenv fully solve this? No, not when you have hard requirements on the binaries that must be installed at a system level.



Docker being at a lower level gives you much more control without adding too much extra comp…