Thoughts On OST’s Python Programming Certificate

I completed the four-course Python Programming Certificate program at O’Reilly School of Technology a few weeks ago. Now that I’ve had a while to think about it, I thought I’d jot down a couple of paragraphs for anyone else considering the program, especially since when I was first looking at it I couldn’t find anything from people who had gone through the whole thing.

First, a word about where I’m coming from. I’d written some Python code at/for the day job for a year or two beforehand, and felt fairly comfortable with the language before I started the program. I wasn’t really looking for a certificate à la Java certification; I was mainly looking for something that would give me more Python experience and make me feel comfortable saying I “knew” Python.

All in all I quite enjoyed the program. There are plenty of details online about how OST courses work, but in a nutshell you read a lesson and complete exercises as you go. Then you take a quiz or two with three or four questions and turn in a programming assignment. The assignment is usually 100-odd lines of code (not including unit tests), and the final assignment of each course is a little more involved. Each course has 15 or so lessons, and OST says you should expect to spend about 40 hours total on each course. I didn’t time my progress, but I don’t think it took me that long; your mileage may vary.

Of the two, I much preferred the assignments. I found the quizzes a little too “copy-paste” from the lesson, and in a lot of cases it seemed as though your answer had to be worded fairly precisely to be marked correct. The wording of the questions themselves was occasionally a little fuzzy too, so I sometimes had a tough time figuring out exactly what was being asked.

The assignments on the other hand were pretty enjoyable for the most part – you’re given a task to code and some criteria to meet, then you’re more or less left to your own devices.  The instructors were pretty good about pointing out places your solution could be improved, alternate ways of doing things, and so on.

After you’re taught the basics of Python, the rest of the coursework is primarily looking at the use of specific modules in the standard library.  Some modules get more of a workout than others, so for example there are several lessons on both Tkinter and MySQL.

By far my favorite aspect of the program was the enforced unit testing and test-driven development (TDD) process. You’re introduced to unittest almost at the very beginning, and once the preliminaries are out of the way you’re expected to use TDD for all the assignments. If I remember correctly, you’re even told at one point that your assignments won’t get a passing grade without unit tests. I found this to be the most valuable part of the coursework. By the time I was done, unittest and TDD were completely second nature to me, and that’s probably the most important thing I picked up from the program.
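For anyone who hasn’t worked this way, here’s a minimal sketch of the test-first loop with unittest. You write the tests, watch them fail, and then write just enough code to make them pass. The `word_count` function is a hypothetical example of my own, not one of the course assignments.

```python
import unittest


def word_count(text):
    """Return the number of whitespace-separated words in text (hypothetical example)."""
    # Written *after* the tests below, with just enough logic to make them pass.
    return len(text.split())


class WordCountTest(unittest.TestCase):
    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)

    def test_counts_words_separated_by_any_whitespace(self):
        self.assertEqual(word_count("spam  eggs\nham"), 3)
```

Save it as, say, `test_wordcount.py` and run `python -m unittest test_wordcount` to see the results.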

If you’re already a hard-core Pythonista I don’t think there’s much in this program you won’t already know.  On the other hand, if you’re looking to get a little workout with Python or coming to it fresh I can definitely recommend it.  And in case you’re wondering, if you complete all four courses you do indeed get a certificate, suitable for framing.  :)

Delivering wxPython Applications With PyInstaller

Questions like “What GUI should I use with Python?” and “What’s the best way to make a standalone Python app?” come up on Reddit and Stack Overflow pretty frequently.  Having put together a standalone Python application a couple of months ago, I thought I’d post something about my experiences.

PODBrowser is a simple desktop application I put together as part of a larger project to revamp a handbook of Probability Of Detection (POD) curves. The hows and whys of POD don’t really concern us here. All that matters is that I was writing a program that lets the user easily preview and plot technical data from 500+ Excel files. The files are organized by folder: the root data folder has a sub-folder for every technique used, and inside each technique folder are one or more sub-folders for every sample that was inspected by that technique.
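Indexing that folder layout takes only a few lines with `os.walk`. This is a sketch of the idea rather than PODBrowser’s actual code. `index_data_folder()` is a hypothetical helper, and the `.xls`/`.xlsx` extension filter is an assumption.

```python
import os
from collections import defaultdict


def index_data_folder(root, extensions=(".xls", ".xlsx")):
    """Map technique -> sample -> list of spreadsheet paths (hypothetical helper).

    Assumes the layout root/<technique>/<sample>/<files>; anything nested
    deeper than a sample folder is filed under that sample.
    """
    index = defaultdict(lambda: defaultdict(list))
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        if len(parts) < 2:
            continue  # root or technique level: no sample folder yet
        technique, sample = parts[0], parts[1]
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                index[technique][sample].append(os.path.join(dirpath, name))
    return index
```

Techniques and samples that contain no spreadsheets never show up in the index, because entries are only created when a matching file is found.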

PODBrowser searches through its data folder and organizes the data by technique and sample. When the user selects an Excel file, a new window pops up previewing the first sheet in the file that isn’t a chart (PODBrowser skips the “Chart1” sheets in these data files). The user can then select any of the sheets in the spreadsheet to preview, and plot the POD curve for that sheet. The end goal was an application we could distribute in two ways. One was a CD in the back of the hardcopy POD Handbook we were contracted to update. The other was promotional USB flash drives given away at conferences and trade shows.
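Picking the default sheet is a small, self-contained piece of logic. The sketch below is my illustration, not PODBrowser’s actual code. It assumes chart sheets can be recognized by a name starting with “Chart”. It then uses xlrd’s `open_workbook()`, `sheet_names()` and `sheet_by_index()` to pull the first few rows of the chosen sheet. `first_preview_sheet()` and `preview_rows()` are hypothetical names.

```python
def first_preview_sheet(sheet_names, skip_prefix="Chart"):
    """Return the index of the first sheet whose name doesn't start with skip_prefix.

    Returns None if every sheet looks like a chart sheet (name-based check is an assumption).
    """
    for index, name in enumerate(sheet_names):
        if not name.startswith(skip_prefix):
            return index
    return None


def preview_rows(path, max_rows=20):
    """Return up to max_rows rows (as lists of cell values) from the default preview sheet."""
    import xlrd  # third-party; imported here so first_preview_sheet() works without it

    book = xlrd.open_workbook(path)
    index = first_preview_sheet(book.sheet_names())
    if index is None:
        return []
    sheet = book.sheet_by_index(index)
    return [sheet.row_values(row) for row in range(min(max_rows, sheet.nrows))]
```

Keeping the sheet-selection rule separate from the xlrd calls makes it easy to unit test without any spreadsheet on hand.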

I used three third-party Python libs to write PODBrowser: wxPython as the UI toolkit, xlrd to read the Excel files, and matplotlib to produce the plots. All three work great with PyInstaller, my tool of choice for building standalone Python apps. I went with PyInstaller over py2exe for two reasons. It handles the whole DLL mess that otherwise crops up when bundling Python 2.6+ applications, and it knows how to handle matplotlib without any tinkering on my part.

Once you’ve got PyInstaller installed, the next step is to create the spec file for your application. PyInstaller’s docs are pretty good for getting going on this step. The only options I used were -w to keep the cmd console hidden and --icon to specify the icon I’d made for the final product. Once the spec file works, there’s not much need to revisit it unless your project changes or you want to change the final product. I got in the habit of thinking of it as a Makefile of sorts, and of running

python Build.py specfile

as my build step. I actually had more trouble putting together a decent icon for PODBrowser under Windows than I ever did building the standalone with PyInstaller.
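The Build.py workflow belongs to the PyInstaller releases of that era. In current PyInstaller, the same options go to the `pyinstaller` command, or from Python via `PyInstaller.__main__.run()`. Here’s a small sketch of the latter, assuming a modern PyInstaller install. `pyinstaller_args()` and `build()` are hypothetical helpers, and the script and icon names are placeholders.

```python
def pyinstaller_args(script, icon=None, windowed=True):
    """Build a PyInstaller argument list mirroring the -w and --icon options."""
    args = [script]
    if windowed:
        args.append("--windowed")  # long form of -w: don't open a console window
    if icon:
        args.append("--icon=" + icon)
    return args


def build(script, icon=None):
    """Run PyInstaller in-process (assumes PyInstaller 3.x or later is installed)."""
    import PyInstaller.__main__

    PyInstaller.__main__.run(pyinstaller_args(script, icon))
```

For example, `build("podbrowser.py", icon="podbrowser.ico")` runs a build with both options. To rebuild from an existing spec file, pass just its path, e.g. `PyInstaller.__main__.run(["podbrowser.spec"])`, much like the old Build.py step.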

I went with wxPython over, say, PyQt or Tkinter mainly because I’m more familiar with it, and because I like that wxWidgets uses the underlying native toolkit where possible. You can substitute your own UI toolkit of choice for wxPython; PyInstaller works with most of them. Here’s what the final product looked like on a few platforms:

PODBrowser Ubuntu Linux I

PODBrowser Ubuntu Linux II

PODBrowser Windows 7 I

PODBrowser Windows 7 II

PODBrowser XP I

PODBrowser XP II

(Those parentheses are actually part of the original Excel filenames – I have no idea what the reasoning was behind that.)

The final distribution came to around 45MB. That includes some basic documentation but not the Excel data files, which added another 130MB, give or take. Of the 45MB, about 8MB was matplotlib, another 4MB was matplotlib fonts and toolbar icons, 10-15MB was wxPython, and only about 200kB was xlrd. So as a ballpark, without matplotlib and xlrd you’re looking at around a 30MB distributable. You can use UPX to bring the final size down. For this application, though, it only had to fit on a CD or a thumb drive handed out at conferences, so 45MB was fine. I also put together a setup.msi Windows Installer package with Advanced Installer; including some brief installation instructions, it came to 43MB.

If you’d like to see the final result in action, here’s a link to an archive of the source code. Also included in there is a copy of the PyInstaller spec file I used to make it a standalone application under Windows.

Font Errors: PyInstaller + matplotlib

If you ever want to deliver a standalone Python-based app that doesn’t require a full-blown Python install, have a look at PyInstaller. It’s just about the most painless way I know of to do it, and it supports a ton of Python libs right out of the box. If you don’t see your package listed there, go ahead and try it anyway since there’s a good chance it’ll work. I found that the Excel spreadsheet lib xlrd works without a hitch, for example.

One thing I discovered this morning about matplotlib, though: on Windows at least (I haven’t confirmed other platforms), matplotlib keeps a font cache under the .matplotlib folder that points to font installations. I’d put together a wxPython + matplotlib + xlrd app that worked fine with PyInstaller, but I couldn’t figure out why it wouldn’t plot the data it pulled from the Excel spreadsheets. In the debug messages, matplotlib seemed to be complaining that it couldn’t find the Vera font. PyInstaller knows to include the fonts with the frozen app, so it looked like matplotlib wasn’t looking in the right place.

It turned out matplotlib’s font cache was pointing to a non-existent folder. Once I deleted the cache, the app worked as advertised on the next startup. In my case it hit me because I used to have matplotlib installed as part of my Python installation and later removed it. You might also run into it if you’ve run and removed other matplotlib-based applications. I’m sure there’s a better fix than the one I’ve outlined, but I thought I’d post my findings in case anyone else comes across the same problem.
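If you hit the same problem, deleting the cache can be scripted. Here’s a minimal sketch, assuming you point it at matplotlib’s cache directory, which `matplotlib.get_cachedir()` returns. The file-name patterns are assumptions, since the cache file’s name has changed across matplotlib versions. `clear_font_cache()` is a hypothetical helper.

```python
import glob
import os


def clear_font_cache(cache_dir):
    """Delete matplotlib font-cache files in cache_dir and return the paths removed.

    Pattern assumptions: older matplotlib releases wrote fontList.cache
    (or fontList.py3k.cache); newer ones write fontlist-vNNN.json.
    """
    removed = []
    for pattern in ("fontList*.cache", "fontlist-v*.json"):
        for path in glob.glob(os.path.join(cache_dir, pattern)):
            os.remove(path)
            removed.append(path)
    return removed
```

Typical use is `import matplotlib; clear_font_cache(matplotlib.get_cachedir())`. matplotlib rebuilds the cache the next time it needs the font list.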

SWIG and Python

I originally wrote this up earlier this year as part of a how-to on creating plugins for NDIToolbox.  I thought it might be worth posting it on its own in case anyone’s looking for a quick intro to using SWIG with Python and C++.  Among other things, SWIG is great if you started with a pure Python program but find yourself needing a bit of a speed boost:  just take the number-crunching or other expensive code and replace it with C or C++.

Pure Python Application

Structure of Helmholtz coils: two identical coils carrying the same current

To demonstrate how to build and structure a Python-only toolkit for NDIToolbox, we’ll build a Helmholtz coils calculator. It computes the magnetic field produced by a pair of identical coils of radius a, each with N turns and carrying a current I. When both coils carry their current in the same direction, their magnetic fields add to give a strong, nearly uniform field around the center.
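For reference, here is the on-axis field the HelmholtzCoils class computes. It is the standard single-loop result summed over both coils, with the coils at x_L = -a/2 and x_R = a/2, so the separation equals the radius. The code then scales B by 1000 to report mT.

```latex
H(x) = \frac{NI}{2a}\left[\left(1 + \frac{(x - x_L)^2}{a^2}\right)^{-3/2}
     + \left(1 + \frac{(x - x_R)^2}{a^2}\right)^{-3/2}\right],
\qquad x_L = -\frac{a}{2},\quad x_R = \frac{a}{2}

B(x) = \mu_0 H(x), \qquad \mu_0 = 4\pi \times 10^{-7}\ \mathrm{H/m}

H(0) = \frac{NI}{2a}\cdot 2\left(\frac{5}{4}\right)^{-3/2}
     = \left(\frac{4}{5}\right)^{3/2}\frac{NI}{a} \approx 0.7155\,\frac{NI}{a}
```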

Organization

Following our recommended approach, we’re going to develop our main logic and user interface separately from the toolkit interface. All the calculations live in one Python file, hhcoils.py, inside a class HelmholtzCoils. We’ve decided on a simple wxPython user interface, hhcoils_ui.py, that’s responsible for taking user input and reporting the calculator’s results back to the user. Finally, our toolkit interface with NDIToolbox will be in a third Python file, hhcoils_toolkit.py.

(Note: I’m skipping the toolkit for this post, and the UI has been replaced with a simple script that demonstrates the calculator.)

Logic

Here’s hhcoils.py, which handles all the calculations. By itself it’s not very interesting, but it illustrates a simple way of structuring your application so it works both as a standalone and as a toolkit.

#!/usr/bin/env python

''' HelmholtzCoils - simple magnetic field calculator (originally used to demo plugin development in NDIToolbox)

Chris Coughlin (TRI/Austin, Inc.)
'''

import math

class HelmholtzCoils(object):
    '''Calculates the on-axis magnetic field produced by a pair of Helmholtz coils carrying current in the same direction'''
    mu_0 = 4.*math.pi*1e-7

    def __init__(self, turns_per_coil, current_per_coil, coil_radius):
        self.N = int(turns_per_coil) # Turns per coil
        self.I = float(current_per_coil) # Current (A) per coil
        self.a = float(coil_radius) # Common radius (m) of coils
        self.lhcoil_position = -self.a/2. # Position of left coil (m), defined as -a/2
        self.rhcoil_position = self.a/2. # Position of right coil (m), defined as a/2
    
    def geometry_correction(self, coil_position, position):
        '''Geometry correction for magnetic field calcs'''
        return math.pow(1 + math.pow(coil_position - position, 2) / math.pow(self.a, 2), -1.5)

    def H(self, position):
        '''Magnetic field at position (m) in A/m'''
        lh_geometry_factor = self.geometry_correction(self.lhcoil_position, position)
        rh_geometry_factor = self.geometry_correction(self.rhcoil_position, position)
        total_geometry_factor = lh_geometry_factor + rh_geometry_factor
        return ((self.N * self.I)/(2 * self.a)) * total_geometry_factor
        
    def centerH(self):
        '''Magnetic field at dead center of coils'''
        return self.H(0)
        
    def B(self, position):
        '''Flux density at position (m) in mT'''
        return HelmholtzCoils.mu_0 * self.H(position)*1000
        
    def centerB(self):
        '''Flux density at dead center of coils'''
        return self.B(0)
        
    def B_mG(self, position):
        '''Flux density at position (m) in mG'''
        return self.B(position) * 1e4
        
    def wirelength(self):
        '''Length of wire (m) to make N turns of radius a'''
        coil_circumference = math.pi * 2. * self.a
        wire_per_coil = float(self.N) * coil_circumference
        return 2.*wire_per_coil
        
    def awg_recommendation(self):
        '''Returns a recommendation for AWG wire gauge to use for the coils for currents between
0.0125 and 15 Amps. Conservative estimate, returns 0 as recommendation for currents
outside this range.'''
        gauge_current = {10:15, 11:10, 14:5, 17:2.5, 20:1.5, 21:1.0, 22:0.75, 24:0.5,
            27:0.25, 30:0.125, 31:0.100, 32:0.075, 34:0.050, 37:0.025, 40:0.0125}
        delta = float('inf')
        epsilon = 0.
        awg_rec = 0
        if 0.0125 <= self.I <= 15:
            for gauge, current in gauge_current.items():
                epsilon = current - self.I
                if epsilon >= 0 and epsilon < delta:
                    delta = epsilon
                    awg_rec = gauge
        return awg_rec
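The AWG table is just a list of current ratings. So the “smallest rating that still covers I” search in awg_recommendation() could also be done on a sorted copy with the standard library’s bisect module. This is an alternative I’m suggesting, not part of the original engine; `AWG_TABLE` and `awg_for_current()` are hypothetical names.

```python
import bisect

# (current rating in A, AWG gauge) pairs from awg_recommendation(), sorted by current
AWG_TABLE = sorted([(15, 10), (10, 11), (5, 14), (2.5, 17), (1.5, 20), (1.0, 21),
                    (0.75, 22), (0.5, 24), (0.25, 27), (0.125, 30), (0.100, 31),
                    (0.075, 32), (0.050, 34), (0.025, 37), (0.0125, 40)])
RATINGS = [current for current, _gauge in AWG_TABLE]


def awg_for_current(current):
    """Thinnest listed gauge rated for at least `current`; 0 outside 0.0125-15 A."""
    if not 0.0125 <= current <= 15:
        return 0
    index = bisect.bisect_left(RATINGS, current)  # first rating >= current
    return AWG_TABLE[index][1]
```

This returns the same gauge as the loop in awg_recommendation(), but the lookup no longer depends on dictionary iteration order.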

To use the calculator in our user interface, we’ll create a HelmholtzCoils instance with the user’s choices for the number of turns N per coil, the current per coil I, and the coil radius a (which is also the coil separation distance). So by way of demonstration, let’s pretend the following is your UI…

import hhcoils

if __name__ == "__main__":
    coils = hhcoils.HelmholtzCoils(100, 0.6, 0.8382)
    print("Field at centre (A/m) = {0}".format(coils.centerH()))
    print("Recommend {0} metres of {1} AWG wire.".format(coils.wirelength(),
        coils.awg_recommendation()))
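As a sanity check on the number the demo prints, the centre field of an ideal Helmholtz pair has a closed form, H(0) = (4/5)^(3/2) NI/a. The sketch below re-implements the same sum as HelmholtzCoils.H() as a standalone function and compares the two. `helmholtz_H` is a name I’ve made up for this check.

```python
import math


def helmholtz_H(N, I, a, x):
    """On-axis field (A/m) at x (m): the same two-coil sum HelmholtzCoils.H() computes."""
    def single(coil_x):
        # Geometry factor for one coil centred at coil_x
        return (1 + (coil_x - x) ** 2 / a ** 2) ** -1.5
    return (N * I) / (2 * a) * (single(-a / 2) + single(a / 2))


N, I, a = 100, 0.6, 0.8382  # the demo's inputs
closed_form = (4 / 5) ** 1.5 * N * I / a
print(helmholtz_H(N, I, a, 0), closed_form)
```

Both values come out at roughly 51 A/m for the demo’s inputs. Moving a little off-centre in either direction lowers the field slightly and symmetrically.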

Switching To C++

As an example of how to use more than one language in developing your toolkit, suppose we’ve decided to swap out our Python Helmholtz coils engine for a new high-performance C++ engine. We’re happy with the user interface as-is and the toolkit interface works well; since we kept our layers separate we can simply swap the backends without any other changes.

Logic

We’ve decided that we need a high-performance C++ backend to increase the speed of our calculations. After investigating our options for interfacing C++ and Python, we’ve decided to use SWIG. First we’ll write our C++ Helmholtz coils calculator code. Once we’re satisfied with it, we’ll write a simple interface file and let SWIG create a Python wrapper for us.

We’ve decided to have a single simple class HelmholtzCoils, defined in a header hhcoils.h and an implementation file hhcoils.cxx. Notice that we’re using exactly the same method names and arguments as our original Python engine – by doing this the changes are transparent to the GUI and we won’t have to make changes to hhcoils_ui.py. Since our GUI won’t be altered we also won’t have to update our toolkit interface, meaning that we can just drop the new engine into the hhcoils folder and the upgrade is complete. Current customers of our toolkit can do the same with their installations to take advantage of the new engine, or we could write a simple installer to automate the process for them.

/* HelmholtzCoils - demonstrating separation of analysis and toolkit interface in NDIToolbox by creating a
simple magnetic field calculator

Chris Coughlin (TRI/Austin, Inc.)
*/
#include "hhcoils.h"

const double HelmholtzCoils::H(double position) const{
    double lh_geometry_factor = geometry_correction(lhcoil_position,position);
    double rh_geometry_factor = geometry_correction(rhcoil_position,position);
    double total_geometry_factor = lh_geometry_factor + rh_geometry_factor;
    return ((N*I)/(2*a))*total_geometry_factor;
}

const double HelmholtzCoils::geometry_correction(const double coil_position, const double position) const{
    return pow(1 + pow(coil_position - position,2) / pow(a,2), -1.5);
}

const double HelmholtzCoils::wirelength(void) const {
    double coil_circumference = M_PI*2*a;
    double wire_per_coil = N*coil_circumference;
    return 2*wire_per_coil;
}

const double HelmholtzCoils::awg_recommendation(void) const {
    double wire_gauges[] = {10,11,14,17,20,21,22,24,27,30,31,32,34,37,40};
    double currentrecs_awg[] = {15,10,5,2.5,1.5,1.0,0.75,0.5,0.25,0.125,0.100,0.075,0.050,0.025,0.0125};
    double delta = DBL_MAX;
    double epsilon = 0;
    double awg_rec = 0;
    if(I>=0.0125 && I<=15) {
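        for(int iter=0; iter<static_cast<int>(sizeof(wire_gauges)/sizeof(wire_gauges[0])); iter++) { // visit every entry of the 15-element tables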
            epsilon = currentrecs_awg[iter] - I;
            if(epsilon>=0 && epsilon<delta) {
                delta = epsilon;
                awg_rec = wire_gauges[iter];
            }
        }
    }
    return awg_rec;
}
/* HelmholtzCoils - demonstrating separation of analysis and toolkit interface in NDIToolbox by creating a
simple magnetic field calculator

Chris Coughlin (TRI/Austin, Inc.)
*/

#ifndef HELMHOLTZCOILS_H_
#define HELMHOLTZCOILS_H_

#define _USE_MATH_DEFINES // Required on some platforms to get mathematical constants
#include <cmath>
#include <cfloat>

static const double mu_0 = 4*M_PI*1e-7;

class HelmholtzCoils {
public:
    HelmholtzCoils(int turns_per_coil, double current_per_coil, double coil_radius):
        N(turns_per_coil), I(current_per_coil), a(coil_radius), lhcoil_position(-a/2), rhcoil_position(a/2) { }
    
    const double H(const double position) const; // Magnetic field at position (m) in A/m
    const double centerH(void) const { return H(0); } // Magnetic field at dead center of coils
    const double B(const double position) const { return mu_0*H(position)*1000; } // Flux density at position (m) in mT
    const double centerB(void) const { return B(0); } // Flux density at dead center of coils
    const double B_mG(const double position) const { return B(position) * 1e4; } // Flux density at position (m) in mG
    
    const double wirelength(void) const; // Length of wire (m) to make N turns of radius a
    const double awg_recommendation(void) const; // Lookup table to make AWG recommendations based on current I
        
private:
    int N; // Turns per coil
    double I; // Current (A) per coil
    double a; // Common radius (m) of coils
    double lhcoil_position; // Position of left coil (m), defined as -a/2
    double rhcoil_position; // Position of right coil (m), defined as a/2
    // Geometry correction for magnetic field calcs
    const double geometry_correction(const double coil_position, const double position) const;
};

#endif // HELMHOLTZCOILS_H_
%module hhcoils
%{
/* Includes the header in the wrapper code */
#include "hhcoils.h"
%}
/* Include various STL interfaces - not really needed here but included as a demo*/
%include "std_string.i"
%include "std_vector.i"
/* For vectors it's necessary to specify what type of vector(s) you'll be using-here we're
using the Python "vector_float" as an alias for vector<float>*/
namespace std {
   %template(vector_float) vector<float>;
   %template(vector_vector_float) vector<std::vector<float> >;
};

/* Parse the header file to generate wrappers */
%include "hhcoils.h"

SWIG’s role in our engine swap is to generate the wrapper code that lets Python call our compiled C++ class, along with a new Python module that loads it. Our engine code is simple enough that we don’t require much in the way of configuration. Even so, the interface file adds handlers for strings and vectors just to illustrate their use.

How you build the library depends on your C++ compiler, your operating system, your Python version and so on. Essentially, it entails three steps. First, run SWIG on your interface file. Second, compile your C++ code and the wrapper code SWIG generates. Finally, link the compiled code into a shared library (.so on Linux, .pyd on Windows, etc.). For example, using SWIG and GCC on Linux to build our toolkit engine:

swig -c++ -python hhcoils.i
c++ -fpic -c hhcoils.cxx hhcoils_wrap.cxx -I /usr/include/python2.7/
c++ -shared hhcoils_wrap.o hhcoils.o -o _hhcoils.so

This creates a new Python wrapper, hhcoils.py. That file plus the _hhcoils.so shared library make up the new C++ Helmholtz coils calculation engine. We can then copy these two files into our pre-existing structure, overwriting our previous Python-only hhcoils.py engine, and the engine upgrade is complete. Our toolkit interface is unchanged and loads our GUI. The GUI already imports hhcoils, and since we kept the same method names and signatures we don’t have to change it either:

#!/usr/bin/env python
import hhcoils

if __name__ == "__main__":
    coils = hhcoils.HelmholtzCoils(100, 0.6, 0.8382)
    print("Field at centre (A/m) = {0}".format(coils.centerH()))
    print("Recommend {0} metres of {1} AWG wire.".format(coils.wirelength(),
        coils.awg_recommendation()))
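One variation, which is my suggestion and not what the post does, is to keep both engines under different names instead of overwriting the pure-Python one. You then choose at import time, so the UI still runs on machines where the compiled extension isn’t available. `load_engine()` is a hypothetical helper.

```python
import importlib


def load_engine(preferred, fallback):
    """Import the preferred engine module, falling back if it can't be imported."""
    try:
        return importlib.import_module(preferred)
    except ImportError:
        # e.g. the compiled _hhcoils extension is missing or built for another platform
        return importlib.import_module(fallback)
```

The UI would then do something like `hhcoils = load_engine("hhcoils_cxx", "hhcoils_py")`, where both module names are hypothetical. Since the two engines share method names and signatures, the rest of the code doesn’t care which one it gets.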

Wrapping Up

The nice thing about developing our application this way is that we get the best of both worlds. Python is generally a more productive language than C++, which means we get our product to the end user faster by developing in Python. If we later find that we need a speed boost, we can swap out the slow bits for a little C++ (or another language of our choosing) with little or no trouble.

There are many other ways to speed up a Python application. The Cython project takes a lot of the heavy lifting out of writing C extensions, PyPy uses a JIT compiler to speed up your code, and so on. Generally speaking, if you’re worried about performance, first make your Python code as efficient as you can before you start thinking about these other options. Still, it’s nice to know they’re there if you need ’em.
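The standard library’s cProfile is the usual first step toward finding out which bits are actually slow. Here’s a minimal sketch. `slow_field_sum()` is a hypothetical stand-in for your own code, and `profile_report()` is a helper name I made up.

```python
import cProfile
import io
import pstats


def slow_field_sum(n):
    """Hypothetical stand-in for an expensive pure-Python calculation."""
    total = 0.0
    for i in range(n):
        total += (1 + (i / float(n)) ** 2) ** -1.5
    return total


def profile_report(func, args=(), limit=5):
    """Run func(*args) under cProfile and return the top entries, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()
```

For example, `print(profile_report(slow_field_sum, (100000,)))` lists the call counts and timings. Only the functions that dominate that list are worth rewriting in C++.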

AA&S 2011 Presentation Available

I hadn’t noticed until now, but Dave Forsyth’s presentation at the 2011 Aircraft Airworthiness & Sustainment (AA&S) Conference is available (mirrored here). Dave was kind enough to add me to the list of authors. I don’t do much in the way of Probability Of Detection (POD), but I did write the Python front end that runs the models (originally written in R).

Developing the front end (PODToolkit) was interesting for me because it was one of the first times I’ve really gone through a formal mock-up process. Ordinarily I’ll sketch a few ideas for the user interface on paper or on the whiteboard. This time around, though, I needed to make sure my ideas closely matched the end user’s. After a few phone calls, emails, and one big fax, we ended up with this for our mock-up:

Which after a while ended up looking like this:

All in all, I’m pretty pleased with how it turned out. There’s a lot of information that has to be presented when you’re working with POD in Nondestructive Evaluation (NDE) and I like to think we did a good job of putting it together. Hopefully we’ll get the chance to do some more work on it in the future.