24 June 2010

"The limits of my language define the limits of my world"

John Rose, speaking of programming languages, quotes the Logisch-philosophische Abhandlung of
Ludwig Wittgenstein:  "The limits of my language define the limits of my world" (Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt).  John nicely links to the German version of Wittgenstein's work, which gives me an excuse to practice my rusty German!

The phrase recalled a question a coworker asked me today:  "If you could choose the programming language you use for your work, what would it be?"  I answered that it depends on the work I want to do:  if I need to have a dialogue with the operating system and with low-level hardware details, C(++) is the tool of choice (mainly for ill); for numerical computations, some midway point between (modern) Fortran and Matlab would be nice.  C is more expressive than Fortran, I went on to say.  "Doesn't that mean C is better than Fortran?" he asked.  No, it doesn't mean that.  Fortran's restrictiveness (esp. its rules about pointer declarations and aliasing) makes it easier for compilers to generate efficient code.  C's expressiveness makes it easier to control the computer at a low level.
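
To make the aliasing point concrete, here is a small sketch (the functions and names are mine, not from any particular codebase).  In C or C++, the compiler must assume the two pointers below may refer to the same storage; Fortran forbids that for dummy arguments, and the nonstandard but widely supported __restrict__ qualifier makes a similar promise:

    // In C(++), the compiler must assume a and b may alias, so it
    // reloads *b on every iteration of the loop.
    void scale(double* a, const double* b, int n) {
      for (int i = 0; i < n; ++i)
        a[i] = 2.0 * (*b);   // a write to a[i] might have changed *b
    }

    // Fortran dummy arguments are assumed not to alias.  The
    // __restrict__ extension (GCC/Clang) makes the same promise here,
    // letting the compiler hoist the load of *b out of the loop.
    void scale_restrict(double* __restrict__ a,
                        const double* __restrict__ b, int n) {
      for (int i = 0; i < n; ++i)
        a[i] = 2.0 * (*b);
    }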

Wittgenstein's observation applies here:  in C, everything looks like a pointer.  In Fortran, everything looks like an array (C doesn't have real multidimensional arrays, remember!).  In Smalltalk, everything looks like an object.  Lisp programmers fear writing domain-specific languages much less than Java programmers.  SISAL programmers fear returning an (apparently) freshly created array much less than C programmers.  SQL programmers fear complicated searches over tables of data much less than programmers of other languages, etc.  By choosing the language, I choose the way in which I attack programming problems.

If the language suggests a programming model (or a small set of them), then when a new programming model suggests itself, the rational response is to create a new language, or modify an existing one, so that it naturally leads the programmer to work within that model.  Programming models ultimately reduce to our assumptions about computer hardware.  Where are these assumptions going?  It's not quite a Turing machine anymore.  We have to extend our hardware assumptions to include unreliability:  segfaults, kernel panics, nodes of a cluster going down and up, the occasional bad bit in memory or on disk.

"Cloud" programming models ("everything is a reduction pass from disk to CPU and back to disk again") already handle nodes crashing to some extent, but it doesn't look like programming models for handling unexpected bit flips have gotten past the discussion phase.  In part this is because the hardware hasn't presented us with a programming model other than "compute and pray": even fixed addresses in read-only code pages aren't invulnerable, as the above link explains.  It's fair to assume, though, that the hardware will let us distinguish between "accurate" and "inaccurate" data or computations, just like it lets us distinguish between IEEE 754 float and IEEE 754 double. 

This distinction between accurate and possibly inaccurate computations and data will enter the programming model.  Languages have type systems for that sort of thing; it doesn't seem much different from a "const" annotation in C++, or a "synchronized" annotation in Java.  This attribute needs to be orthogonal to the actual datatype; the typical "oh well, floating-point numbers are inexact anyway" viewpoint will lead to disaster in a LAPACK LWORK query, for example, where the "floating-point" result is really an integer workspace size, so a flipped bit corrupts an allocation rather than a low-order digit.  The programming model will have to include promotions and demotions; these will have costs, of which the programmer must be made aware (just like with Java's "synchronized").
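
As a sketch only -- the wrapper name and its semantics here are hypothetical, not any existing library's -- such an annotation might look like a C++ type orthogonal to the underlying datatype, with demotion cheap and promotion explicit and visibly costly:

    // Hypothetical "inaccurate" qualifier as a C++ wrapper type.
    // It works for double, int, or anything else; the qualifier, not
    // the value type, is what tracks (in)accuracy.
    template <class T>
    class approx {
    public:
      approx(const T& x) : value_(x) {}   // demotion: cheap
      T certify() const {                 // promotion: explicit, costed
        // ... verify, recompute, or copy back to reliable storage ...
        return value_;
      }
    private:
      T value_;  // would live in "fast but unreliable" storage
    };

    // Usage: fine for bulk numerical data, but never for an LWORK-style
    // workspace size, an index, or a pointer.
    approx<double> partial_sum = 0.0;
    double trusted = partial_sum.certify();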

What are the implications of such a model?  First, programmers will have yet another type annotation to learn.  Just like with C's "volatile," it will lead to endless confusion, especially as hardware evolves.  Compilers can help with type deduction, but programmers will be tempted to use the "inaccurate" annotation as a magic "go faster" switch.  If computer science students can't wrap their heads around tail recursion in Scheme, how can we expect them to understand yet another low-level annotation? 

Second, programmers' view of the hardware memory space will grow even more fragmented.  We already have "GPU memory" and "CPU memory" (for OpenCL and CUDA programmers), "local" and "remote" memory (for those working in a Partitioned Global Address Space (PGAS) model), caches and local stores and "shared memory" (which is really a cache) and vector registers and...  In the future, we may have this times two: for each, an "accurate but slow" and an "inaccurate but fast" version.  Going from one to the other will require copying -- which may be implicit (as in a PGAS-like model), but still costs something.  Relaxing bit accuracy is a hardware performance optimization, so if you bother not to declare everything "accurate," you are presumably performance-conscious, and you will therefore also worry about the cost of copying data between those spaces.  That means reasoning about all of those fragmented memory spaces.
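
A hypothetical sketch of what that doubling might feel like to the programmer -- the buffer types and the promote function below are made up for illustration, loosely modeled on host/device copies in CUDA:

    #include <cstddef>
    #include <cstring>

    // Every memory space comes in an "accurate but slow" and an
    // "inaccurate but fast" flavor; moving data between them is an
    // explicit, costed copy.
    struct accurate_buffer   { double* p; std::size_t n; };  // protected, slow
    struct inaccurate_buffer { double* p; std::size_t n; };  // unprotected, fast

    // Promotion is where the cost lives.  A PGAS-like model might make
    // the copy implicit, but it still has to happen.
    void promote(accurate_buffer dst, const inaccurate_buffer& src) {
      std::memcpy(dst.p, src.p, src.n * sizeof(double));
      // ... plus whatever verification the hardware or runtime offers ...
    }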

Given this likely state of hardware, I can rephrase my colleague's question: "What programming model would I like to use if the hardware behaves in this way?"

First, the PGAS model seems most natural: it lets me reason about different memory spaces if I need to, or program as if there is only one memory space (at potentially higher cost) if it helps me get my work done.

Second, the model should help programmers distinguish between "data" and "metadata."  Metadata (indices, permutations, pointers) should always be bit accurate.  (That doesn't mean integer types always have to be!  Integers get used a lot as data in signal processing and other bit-oriented domains.)

Third, programmers should have to handle metadata as little as possible.  The less metadata you handle explicitly, the less likely you are to "pollute" it with inexactness, and the more freedom the compiler has to perform optimizations on the metadata.  That means the language should support all kinds of collections.  Operations on collections should be fast; otherwise programmers won't want to use them.

Fourth, by extension of the third point, the language should prefer sequence operations ("map f over the array v") that let programmers avoid naming variables that don't really matter, such as index variables and intermediate quantities in loops.  (If you name it, the compiler has to keep it around unless it is clever enough to reason it away.  Storage necessitates annotation of its accuracy, which necessitates conservatism in optimizations, in the same way that C aliasing rules force the C compiler to be conservative.)
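
On the fourth point, the C++ standard library already has something with the right flavor; a minimal sketch:

    #include <algorithm>
    #include <vector>

    double twice(double x) { return 2.0 * x; }

    // "Map f over the array v": no index variable, no named
    // intermediates, hence nothing extra whose accuracy must be
    // declared and nothing extra the compiler must conservatively
    // keep around.
    std::vector<double> map_twice(const std::vector<double>& v) {
      std::vector<double> w(v.size());
      std::transform(v.begin(), v.end(), w.begin(), twice);
      return w;
    }

Whether std::transform actually buys today's compilers anything is a separate question; the point is what the programmer does not have to name.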

This post rambles, but I'm thinking out loud, and I'm curious to hear what others have to say about programming models for this new world of unexpected flipping bits.  The one thing that encourages me is the intelligence of those thinking about the problem.  As Shakespeare has Miranda say in The Tempest, "How beauteous mankind is! O brave new world, That has such people in't!"