## 21 November 2011

### Nobody has solved the fault resilience problem

1. Nobody has solved the programming models program. This has never stopped anyone from writing unmaintainable but functional scientific codes, however.

2. Nobody has solved the fault resilience problem. This actually terrifies everyone who has thought about it, because we have never had to deal with this problem before. Furthermore, most of us are ignorant of the huge body of existing research and practice on fault-resilient computing in highly fault-prone environments with a huge penalty of failure (space-based computing, sometimes with human payloads). Most of us balk at the price of fault resilience (triple modular redundancy) that this experience recommends. Nevertheless, we're being asked by policymakers to do computations to help them make high-consequence decisions (asteroid-earth collisions? climate change? whether The Bomb might go off by mistake if unfortunate techs drop it on their big toe?).

The #2 situation is comparable to that of floating-point arithmetic in the awful, awful days before industry and academia converged on a standard. The lack of a sensible standard required immense floating-point expertise in order to have some confidence in the result of a computation. These days, the lack of reliability of data and computations might require the same level of expertise in order to prove correctness of existing algorithms. Around six years ago, Prof. William Kahan (UC Berkeley) suggested that perhaps only one PhD expert in floating-point computation graduates per year. While his standards are exceptionally high, I have no doubt that the lack of training in numerical analysis is the bottleneck for making use of computers whose data and computations are likely to experience undetected errors.

## 01 October 2011

### Fastest Krylov method for symmetric indefinite problems?

BiCG reduces to a redundantly computed CG if the matrix is symmetric positive definite. Thus, CG should always be faster than BiCG in that case. I haven't thought about this question before for the case of a symmetric indefinite problem, but my guess is that MINRES would be faster than BiCG (or, say, QMR) in most cases. MINRES promises monotonic convergence of the residual norm; BiCG doesn't. Their storage requirements are comparable.

MINRES is a special case of GMRES for symmetric indefinite problems. MINRES should be faster than GMRES, though MINRES is a bit less robust.

LSQR works a little bit like BiCG (it builds up two Krylov bases). My guess is that MINRES would be faster than LSQR.

If the linear system is singular or very ill conditioned, one may have more success with LSQR than with GMRES, more success with GMRES than with MINRES, and more success with MINRES than SYMMLQ. LSQR is able to solve rank-deficient problems in a least-squares sense. GMRES and MINRES (depending on the implementation) may be able to solve rank-deficient problems in a least-squares sense.

On the other hand, "LSQR with preconditioning" means something rather different than users might think.

Alas, I want to write more, but time constrains me...

## 13 September 2011

### The ghosts of departed quantities

My friend and old colleague John Cassel responds: "Which is why we refer to Cauchy today as having invented 'the calculus of the undead.'"

Newton: "BLAAAAARGHARARRR FLUUUUXIONSSSSSS..."

## 09 September 2011

### Commenting code may lead to category theory

Not that category theory is hazardous to your health, of course!

This topic came up when i was thinking about how to justify the deficiencies of a particular C++ interface i had written. These functions take the name of a file containing some kind of matrix data. They then construct and return the corresponding matrix object, possibly distributing that object over some number of MPI processes. The trick is that a file could contain all kinds of different data -- both different kinds of matrices (sparse or dense, symmetric or not) and different kinds of data (integer, real, or complex). The file begins with a header containing all of this "metadata" that describes its contents, but you have to open up the file and start reading it in order to find that out.

The issue is that the natural interface calls for each of these functions to return an object of a particular type. The type of object returned by the function is therefore fixed, even though the contents of the file are unknown. There are two reasons for this. First, different kinds of matrices (e.g., sparse or dense) require different kinds of metadata as input, before reading the file at all, in order to interpret the file's contents. (For example, large square sparse matrices may be distributed in a 2-D fashion over processors, so they need both row and column distribution information. Large, dense vectors may only be distributed by rows over processors, so they only need row distribution information.) Second, even if we fix what kind of matrix we want in the interface, we still don't know the type of data *in* the matrix. It could contain integers, real numbers of various precisions, complex numbers, or even more complicated objects (quotients of polynomials with coefficients in a finite field, for example).

Concretely, in C++ terms, i might have a SparseMatrix or DenseMatrix type. Each of these is "templated" (parametrized) on the type of entries in the matrix: SparseMatrix<double> for a sparse matrix with double-precision real values, SparseMatrix<int> for a sparse matrix with 32-bit signed integer values, etc. This lets us implement (for example) SparseMatrix once for entries of any type. It also lets me write the file reader function once for any parametrization of SparseMatrix. All these are good. The problem is that the user has to pick a specific parametrization of SparseMatrix (e.g., SparseMatrix<double>)

*before*calling the function.

Now, there are ways of implementing types that could be any of various types. Really, what you want are "algebraic data types" that are a disjoint union of different types (for example, "int or double or complex

Suppose, though, that C++ did have algebraic data types. I will represent the disjoint union here nonstandardly with a plus sign: e.g., "int + double + complex<double>." What i might like the file reader to do is to return a sparse matrix, whose entries are int + double + complex<double>. In our notation, this would be called a "SparseMatrix<int + double + complex<double> >." However, C++ makes this undesirable. This is because in C++, the bitwise arrangement of data in memory matters a lot. Implementing a "variant" type in C++ that can hold any of int, double, or complex<double> would require at least as much space as that required by a complex<double> -- twice as much as a double, and twice as much again as an int. Furthermore, i really don't want a sparse matrix for which different entries might have different types -- all of the entries should have the same type. Yet, i'm forced to do this if the function returns a SparseMatrix<int + double + complex<double> >.

The function might instead return a SparseMatrix<int> + SparseMatrix<double> + SparseMatrix<complex<double> >. However, now when i get the return value, i have to guess what type the object is, trying each of the possibilities until i find out. This is easy enough for the three possibilities here. However, suppose that SparseMatrix takes several type parameters (in our case, it takes five). Do i have to guess and check for all possible combinations?

It seems natural that SparseMatrix<int> + SparseMatrix<double> should be the same type as SparseMatrix<int + double>. However, i was explaining this problem to a colleague tonight, and it occurred to me that templated classes in C++ are just functions between types. For example, SparseMatrix<T> defines a function from the entry type T to the specific sparse matrix type SparseMatrix<T>. If f is a function, why should i expect that f(x + y) = f(x) + f(y)? That would make the function f only one step away from linear -- a very special function indeed! Similarly, i should expect that a SparseMatrix<int> + SparseMatrix<double> would be a very different sort of object than a SparseMatrix<int + double>. This makes sense if you think about possible representations of each of these objects in memory: the former might be a pointer to some kind of matrix, whose arrays of matrix entries are either arrays of int or arrays of doubles, while the latter would be a matrix whose arrays of matrix entries are arrays of a variant type.

I'm not a category theorist, but thinking about this problem made me think about type systems and algebras over type systems. So, i suppose it's a little bit of category theory, no? All i have to do is not embarrass myself too much in front of a real category theorist!

## 24 July 2011

### Scientists "vs." programmers: It's all about reproducibility

*reproducibility*.

As the article mentions, programmers understand that their code needs to produce predictable results. They set up tests to verify this. Entire fields of computer science research revolve around proving code correctness. As a developer of numerical algorithms, I think of my code as an experimental apparatus for testing my hypothesis about the superiority of one algorithm or optimization over another. I can't say I've always done my best to

*make*my code work that way, but I'm working on it.

I think many scientists haven't yet learned that code is really just experimental apparatus, and the reason for this in part may be that modern scientists are not rewarded for reproducing "known" results. Programmers often are -- why else did Google+ follow Facebook, which in turn followed MySpace? Now, a lot of code really is throw-away analysis scripts. This is probably what Dr. Cook means by the following: "Programmers need to understand that sometimes a program really only needs to run once, on one set of input, with expert supervision." However, simulations are

*not*like this. A simulation code is an experimental apparatus. It need not be permanent, but it should be constructed well enough to make its results trustworthy. Just as importantly, it should be described and documented sufficiently well that other scientists can replicate its results.

The real problem is not a cultural difference between scientists and programmers: the real problem is reproducibility.

## 21 July 2011

### A mathematical chauvinist?

Later that evening, A was explaining to me how one writes applications for a cell phone. I found myself feeling perfectly happy that I didn't have to do all that work. Cell phones are things I want to use to make phone calls, and maybe look up directions in an emergency; I struggled to consider why I would want to write my own app. However, then I realized I was doing the same thing that A had done above: not valuing the process over the answer. Learning something about writing a cell phone application might serve me well at some future time, even though I can't imagine the value of this time investment now. I suppose I was being a mathematical chauvinist!