24 July 2011

Scientists "vs." programmers: It's all about reproducibility

I was troubled by Dr. John Cook's recent post on the different programming styles of scientists and programmers. It felt like an exaggerated dichotomy to me: the "cowboy" scientist who tosses off throw-away scripts, and the cube-dwelling programmer implementing a serious product. Dr. Cook was trying hard to think of this dichotomy as a cultural distance, based on his experience working with and trying to understand the needs of both groups. However, I think he missed the point that joins scientists and programmers: reproducibility.

As the article mentions, programmers understand that their code needs to produce predictable results.  They set up tests to verify this.  Entire fields of computer science research revolve around proving code correctness.  As a developer of numerical algorithms, I think of my code as an experimental apparatus for testing my hypothesis about the superiority of one algorithm or optimization over another.  I can't say I've always done my best to make my code work that way, but I'm working on it.

I think many scientists haven't yet learned that code is really just experimental apparatus, and part of the reason may be that modern scientists are not rewarded for reproducing "known" results. Programmers often are -- why else did Google+ follow Facebook, which in turn followed MySpace? Now, a lot of scientific code really is throw-away analysis scripts. This is probably what Dr. Cook means by the following: "Programmers need to understand that sometimes a program really only needs to run once, on one set of input, with expert supervision." However, simulations are not like this. A simulation code is an experimental apparatus. It need not be permanent, but it should be constructed well enough to make its results trustworthy. Just as importantly, it should be described and documented sufficiently well that other scientists can replicate its results.
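To make "apparatus that others can replicate" a little more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the toy Monte Carlo estimate of pi, the function names `estimate_pi` and `run_experiment`, and the seed value are all hypothetical stand-ins, not anything from Dr. Cook's post or from a real simulation code. The idea is simply that the run records every input that determines its output, so the same record always reproduces the same number.

```python
import json
import random


def estimate_pi(n_samples, seed):
    """Toy 'simulation': Monte Carlo estimate of pi from a seeded generator."""
    rng = random.Random(seed)  # private generator, so no hidden global state
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def run_experiment(n_samples=100_000, seed=2011):
    """Return the result together with the inputs needed to reproduce it."""
    return {
        "n_samples": n_samples,
        "seed": seed,  # hypothetical seed, chosen arbitrarily
        "pi_estimate": estimate_pi(n_samples, seed),
    }


if __name__ == "__main__":
    # Publishing this record alongside the code lets someone else rerun it.
    print(json.dumps(run_experiment()))
```

Running `run_experiment()` twice with the same arguments should give identical records, which is the property a reader needs in order to check the result independently.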

The real problem is not a cultural difference between scientists and programmers: the real problem is reproducibility.

21 July 2011

A mathematical chauvinist?

An afternoon discussion at work among some of our interns revolved around a question that one (A) posed to another (B): "How many molecules of Abraham Lincoln are in a glass of water?" After sufficient simplifying assumptions, B (a mathematician) was able to solve the problem with a simple computation. The process resulted in a debate between A and B over the virtues of being able to derive formulae and constants, rather than looking them up with a search engine. I sided with B, of course, since having and exercising the ability to carry out that process is much more useful than demanding that a search engine do all the work. Simple unit and order-of-magnitude calculations are things my PhD advisor does very rapidly, and when I was his student, I always found myself lagging behind and feeling not so intelligent. Exercising that skill at least saved me a little embarrassment!
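For the curious, an estimate of this kind can be sketched in a few lines of Python. The figures below are illustrative assumptions, not the ones B actually used: roughly 40 kg of water in a human body, a 250 g glass of water, about 1.4e21 kg of water on Earth, and perfect mixing of Lincoln's water into all of it. The function name `lincoln_molecules_in_glass` is likewise made up for this sketch.

```python
AVOGADRO = 6.022e23          # molecules per mole
WATER_MOLAR_MASS_G = 18.0    # grams per mole of H2O


def water_molecules(mass_g):
    """Number of water molecules in a given mass of water (grams)."""
    return mass_g / WATER_MOLAR_MASS_G * AVOGADRO


def lincoln_molecules_in_glass(
    body_water_g=40e3,     # assumed: ~40 kg of water in one person
    glass_g=250.0,         # assumed: a 250 g glass of water
    earth_water_g=1.4e24,  # assumed: ~1.4e21 kg of water on Earth
):
    """Expected count, assuming the body's water is mixed evenly worldwide."""
    fraction_from_lincoln = (
        water_molecules(body_water_g) / water_molecules(earth_water_g)
    )
    return water_molecules(glass_g) * fraction_from_lincoln


if __name__ == "__main__":
    print(f"{lincoln_molecules_in_glass():.1e} molecules")
```

Under these assumptions the answer comes out on the order of a few hundred thousand molecules. Changing any input by a factor of two moves the result by the same factor, which is exactly why the order of magnitude matters more than the digits.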

Later that evening, A was explaining to me how one writes applications for a cell phone. I found myself feeling perfectly happy that I didn't have to do all that work. Cell phones are things I want to use to make phone calls, and maybe look up directions in an emergency; I struggled to see why I would want to write my own app. Then I realized I was doing the same thing that A had done earlier: valuing the answer over the process. Learning something about writing a cell phone application might serve me well at some future time, even though I can't imagine the value of this time investment now. I suppose I was being a mathematical chauvinist!