<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-2572826743364240863</id><updated>2011-11-21T21:49:21.221-08:00</updated><category term='GPU'/><category term='Python'/><category term='templates'/><category term='math'/><category term='LAPACK'/><category term='business'/><category term='Microsoft'/><category term='numerical analysis'/><category term='Krylov'/><category term='engineering'/><category term='C'/><category term='security'/><category term='programming'/><category term='alchemy'/><category term='metaprogramming'/><category term='multicore'/><category term='hermetic'/><category term='HPC'/><category term='Lisp'/><category term='horror'/><category term='freedom'/><category term='C++'/><category term='relativity'/><category term='preconditioner'/><category term='Fortran'/><category term='pedagogy'/><category term='nerdy'/><category term='git'/><category term='hacks'/><category term='linear algebra'/><category term='history'/><category term='ScaLAPACK'/><category term='religion'/><category term='parallel'/><category term='performance'/><category term='scientific computing'/><category term='code'/><category term='MPI'/><category term='science'/><category term='typesetting'/><title type='text'>The nerdiest of the nerds</title><subtitle type='html'>The "nerdiest of the nerds" is of course the numerical analyst -- a superposition of mathematician and computer scientist, who partakes of the lowest-level attributes of each, was scorned as a slide-rule-toting curmudgeon for decades, and only now enters the light as savior of the computer industry ;-)  I stick to "nerdy" topics here:  science, math, programming, and the like.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>42</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-6709686391434920788</id><published>2011-11-21T21:37:00.001-08:00</published><updated>2011-11-21T21:49:21.230-08:00</updated><title type='text'>Nobody has solved the fault resilience problem</title><content type='html'>Lessons from Supercomputing 2011:&lt;br /&gt;&lt;br /&gt;1. Nobody has solved the programming models program.&amp;nbsp; This has never stopped anyone from writing unmaintainable but functional scientific codes, however.&lt;br /&gt;&lt;br /&gt;2. Nobody has solved the fault resilience problem.&amp;nbsp; This actually terrifies everyone who has thought about it, because we have never had to deal with this problem before.&amp;nbsp; Furthermore, most of us are ignorant of the huge body of existing research and practice on fault-resilient computing in highly fault-prone environments with a huge penalty of failure (space-based computing, sometimes with human payloads).&amp;nbsp; Most of us balk at the price of fault resilience (triple modular redundancy) that this experience recommends.&amp;nbsp; Nevertheless, we're being asked by policymakers to do computations to help them make high-consequence decisions (asteroid-earth collisions? climate change? whether The Bomb might go off by mistake if unfortunate techs drop it on their big toe?).&amp;nbsp;&lt;br /&gt;&lt;br /&gt;The #2 situation is comparable to that of floating-point arithmetic in the awful, awful days before industry and academia converged on a standard.&amp;nbsp; The lack of a sensible standard required immense floating-point expertise in order to have some confidence in the result of a computation.&amp;nbsp; These days, the lack of reliability of data and computations might require the same level of expertise in order to prove correctness of existing algorithms.&amp;nbsp; Around six years ago, Prof. William Kahan (UC Berkeley) suggested that perhaps only one PhD expert in floating-point computation graduates per year.&amp;nbsp; While his standards are exceptionally high, I have no doubt that the lack of training in numerical analysis is the bottleneck for making use of computers whose data and computations are likely to experience undetected errors.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-6709686391434920788?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/6709686391434920788/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=6709686391434920788' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6709686391434920788'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6709686391434920788'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2011/11/nobody-has-solved-fault-resilience.html' title='Nobody has solved the fault resilience problem'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-7192009751699547418</id><published>2011-10-01T12:32:00.000-07:00</published><updated>2011-10-01T12:32:01.606-07:00</updated><title type='text'>Fastest Krylov method for symmetric indefinite problems?</title><content type='html'>One of my engineer correspondents asked what Krylov subspace method would be fastest for solving a symmetric indefinite problem: GMRES, a least-squares solver such as LSQR, or BiCG.&amp;nbsp; I've thought about this a lot for symmetric positive definite problems, but not so much for the indefinite case.&amp;nbsp; Here are my thoughts; i welcome correction or further suggestions.&lt;br /&gt;&lt;br /&gt;BiCG reduces to a redundantly computed CG if the matrix is symmetric positive definite.&amp;nbsp; Thus, CG should always be faster than BiCG in that case.&amp;nbsp; I haven't thought about this question before for the case of a symmetric indefinite problem, but my guess is that MINRES would be faster than BiCG (or, say, QMR) in most cases.&amp;nbsp; MINRES promises monotonic convergence of the residual norm; BiCG doesn't.&amp;nbsp; Their storage requirements are comparable.&lt;br /&gt;&lt;br /&gt;MINRES is a special case of GMRES for symmetric indefinite problems.&amp;nbsp; MINRES should be faster than GMRES, though MINRES is a bit less robust.&lt;br /&gt;&lt;br /&gt;LSQR works a little bit like BiCG (it builds up two Krylov bases).&amp;nbsp; My guess is that MINRES would be faster than LSQR.&lt;br /&gt;&lt;br /&gt;If the linear system is singular or very ill conditioned, one may have more success with LSQR than with GMRES, more success with GMRES than with MINRES, and more success with MINRES than SYMMLQ.&amp;nbsp; LSQR is able to solve rank-deficient problems in a least-squares sense.&amp;nbsp; GMRES and MINRES (depending on the implementation) may be able to solve rank-deficient problems in a least-squares sense.&lt;br /&gt;&lt;br /&gt;On the other hand, "LSQR with preconditioning" means something rather different than users might think.&lt;br /&gt;&lt;br /&gt;Alas, I want to write more, but time constrains me... &lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-7192009751699547418?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/7192009751699547418/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=7192009751699547418' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7192009751699547418'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7192009751699547418'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2011/10/fastest-krylov-method-for-symmetric.html' title='Fastest Krylov method for symmetric indefinite problems?'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-7817236412509792469</id><published>2011-09-13T06:32:00.000-07:00</published><updated>2011-09-13T06:32:22.633-07:00</updated><title type='text'>The ghosts of departed quantities</title><content type='html'>&lt;span class="commentBody" data-jsid="text"&gt;Yea, and Cauchy did gaze upon the paper, saying: "Behold, &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Limit_%28mathematics%29"&gt;I have reduced the infinite to the finite&lt;/a&gt;; the &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Ghosts_of_departed_quantities"&gt;ghosts of departed quantities&lt;/a&gt; I have constrained with sigils and brought again to life."&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="commentBody" data-jsid="text"&gt;My friend and old colleague John Cassel responds:&amp;nbsp; "&lt;/span&gt;&lt;span class="commentBody" data-jsid="text"&gt;Which is why we refer to Cauchy today as having invented 'the calculus of the undead.'"&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="commentBody" data-jsid="text"&gt;Newton:&amp;nbsp;&lt;/span&gt;&lt;span class="commentBody" data-jsid="text"&gt; "BLAAAAARGHARARRR FLUUUUXIONSSSSSS..."&lt;/span&gt;&lt;span class="commentBody" data-jsid="text"&gt;&amp;nbsp;&lt;/span&gt;&lt;span class="commentBody" data-jsid="text"&gt; &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-7817236412509792469?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/7817236412509792469/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=7817236412509792469' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7817236412509792469'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7817236412509792469'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2011/09/ghosts-of-departed-quantities.html' title='The ghosts of departed quantities'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-760688739431497617</id><published>2011-09-09T22:17:00.000-07:00</published><updated>2011-09-09T22:17:14.674-07:00</updated><title type='text'>Commenting code may lead to category theory</title><content type='html'>&lt;p&gt;Not that category theory is hazardous to your health, of course!&lt;/p&gt;&lt;p&gt;This topic came up when i was thinking about how to justify the deficiencies of a particular C++ interface i had written.  These functions take the name of a file containing some kind of matrix data.  They then construct and return the corresponding matrix object, possibly distributing that object over some number of MPI processes.  The trick is that a file could contain all kinds of different data -- both different kinds of matrices (sparse or dense, symmetric or not) and different kinds of data (integer, real, or complex).  The file begins with a header containing all of this "metadata" that describes its contents, but you have to open up the file and start reading it in order to find that out.&lt;/p&gt;&lt;p&gt;The issue is that the natural interface calls for each of these functions to return an object of a particular type.  The type of object returned by the function is therefore fixed, even though the contents of the file are unknown.  There are two reasons for this.  First, different kinds of matrices (e.g., sparse or dense) require different kinds of metadata as input, before reading the file at all, in order to interpret the file's contents.  (For example, large square sparse matrices may be distributed in a 2-D fashion over processors, so they need both row and column distribution information.  Large, dense vectors may only be distributed by rows over processors, so they only need row distribution information.)  Second, even if we fix what kind of matrix we want in the interface, we still don't know the type of data &lt;i&gt;in&lt;/i&gt; the matrix.  It could contain integers, real numbers of various precisions, complex numbers, or even more complicated objects (quotients of polynomials with coefficients in a finite field, for example).&lt;p&gt;&lt;/p&gt;Concretely, in C++ terms, i might have a SparseMatrix or DenseMatrix type.  Each of these is "templated" (parametrized) on the type of entries in the matrix:  SparseMatrix&amp;lt;double&amp;gt; for a sparse matrix with double-precision real values, SparseMatrix&amp;lt;int&amp;gt; for a sparse matrix with 32-bit signed integer values, etc.  This lets us implement (for example) SparseMatrix once for entries of any type.  It also lets me write the file reader function once for any parametrization of SparseMatrix.  All these are good.  The problem is that the user has to pick a specific parametrization of SparseMatrix (e.g., SparseMatrix&amp;lt;double&amp;gt;) &lt;i&gt;before&lt;/i&gt; calling the function.&lt;/p&gt;&lt;p&gt;Now, there are ways of implementing types that could be any of various types.  Really, what you want are &lt;a href="http://en.wikipedia.org/wiki/Algebraic_data_type"&gt;"algebraic data types"&lt;/a&gt; that are a disjoint union of different types (for example, "int or double or complex&lt;double&gt;").  C++ doesn't have those, strictly speaking, although you can fake them with so-called "variant" types.  Other languages, like Haskell, do have algebraic data types.  &lt;/p&gt;&lt;p&gt;Suppose, though, that C++ did have algebraic data types.  I will represent the disjoint union here nonstandardly with a plus sign: e.g., "int + double + complex&amp;lt;double&amp;gt;."  What i might like the file reader to do is to return a sparse matrix, whose entries are int + double + complex&amp;lt;double&amp;gt;.  In our notation, this would be called a "SparseMatrix&amp;lt;int + double + complex&amp;lt;double&amp;gt; &amp;gt;."  However, C++ makes this undesirable.  This is because in C++, the bitwise arrangement of data in memory matters a lot.  Implementing a "variant" type in C++ that can hold any of int, double, or complex&amp;lt;double&amp;gt; would require at least as much space as that required by a complex&amp;lt;double&amp;gt; -- twice as much as a double, and twice as much again as an int.  Furthermore, i really don't want a sparse matrix for which different entries might have different types -- all of the entries should have the same type.  Yet, i'm forced to do this if the function returns a SparseMatrix&amp;lt;int + double + complex&amp;lt;double&amp;gt; &amp;gt;. &lt;/p&gt;&lt;p&gt;  The function might instead return a SparseMatrix&amp;lt;int&amp;gt; + SparseMatrix&amp;lt;double&amp;gt; + SparseMatrix&amp;lt;complex&amp;lt;double&amp;gt; &amp;gt;.  However, now when i get the return value, i have to guess what type the object is, trying each of the possibilities until i find out.  This is easy enough for the three possibilities here.  However, suppose that SparseMatrix takes several type parameters (in our case, it takes five).  Do i have to guess and check for all possible combinations?&lt;/p&gt;&lt;p&gt;It seems natural that SparseMatrix&amp;lt;int&amp;gt; + SparseMatrix&amp;lt;double&amp;gt; should be the same type as SparseMatrix&amp;lt;int + double&amp;gt;.  However, i was explaining this problem to a colleague tonight, and it occurred to me that templated classes in C++ are just functions between types.  For example, SparseMatrix&amp;lt;T&amp;gt; defines a function from the entry type T to the specific sparse matrix type SparseMatrix&amp;lt;T&amp;gt;.  If f is a function, why should i expect that f(x + y) = f(x) + f(y)?  That would make the function f only one step away from linear -- a very special function indeed!  Similarly, i should expect that a SparseMatrix&amp;lt;int&amp;gt; + SparseMatrix&amp;lt;double&amp;gt; would be a very different sort of object than a SparseMatrix&amp;lt;int + double&amp;gt;.  This makes sense if you think about possible representations of each of these objects in memory:  the former might be a pointer to some kind of matrix, whose arrays of matrix entries are either arrays of int or arrays of doubles, while the latter would be a matrix whose arrays of matrix entries are arrays of a variant type.&lt;/p&gt;&lt;p&gt;I'm not a category theorist, but thinking about this problem made me think about type systems and algebras over type systems.  So, i suppose it's a little bit of category theory, no?  All i have to do is not embarrass myself too much in front of a real category theorist!&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-760688739431497617?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/760688739431497617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=760688739431497617' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/760688739431497617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/760688739431497617'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2011/09/commenting-code-may-lead-to-category.html' title='Commenting code may lead to category theory'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-1960708028715383969</id><published>2011-07-24T13:55:00.000-07:00</published><updated>2011-07-24T13:55:35.613-07:00</updated><title type='text'>Scientists "vs." programmers: It's all about reproducibility</title><content type='html'>I was troubled by Dr. John Cook's &lt;a href="http://www.johndcook.com/blog/2011/07/21/software-exoskeletons/"&gt;recent post&lt;/a&gt; on the different programming styles of scientists and programmers.&amp;nbsp; It felt like an exaggerated dichotomy to me: the "cowboy" scientist who tosses off throw-away scripts, and the cube-dwelling programmer implementing a serious product. Dr. Cook was trying hard to think of this dichotomy as a cultural distance, based on his experience working with and trying to understand the needs of both groups.&amp;nbsp; However, I think he missed the point that joins scientists and programmers:&amp;nbsp; &lt;i&gt;reproducibility&lt;/i&gt;.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;As the article mentions, programmers understand that their code needs to produce predictable results.&amp;nbsp; They set up tests to verify this.&amp;nbsp; Entire fields of computer science research revolve around proving code correctness.&amp;nbsp; As a developer of numerical algorithms, I think of my code as an experimental apparatus for testing my hypothesis about the superiority of one algorithm or optimization over another.&amp;nbsp; I can't say I've always done my best to &lt;i&gt;make&lt;/i&gt; my code work that way, but I'm working on it.&lt;br /&gt;&lt;br /&gt;I think many scientists haven't yet learned that code is really just experimental apparatus, and the reason for this in part may be that modern scientists are not rewarded for reproducing "known" results.&amp;nbsp; Programmers often are -- why else did Google+ follow Facebook, which in turn followed MySpace?&amp;nbsp; Now, a lot of code really is throw-away analysis scripts.&amp;nbsp; This is probably what Dr. Cook means by the following:&amp;nbsp; "Programmers need to understand that sometimes a program really only needs to run once, on one set of input, with expert supervision." However, simulations are &lt;i&gt;not&lt;/i&gt; like this.&amp;nbsp; A simulation code is an experimental apparatus.&amp;nbsp;  It need not be permanent, but it should be constructed well enough to make its results trustworthy.&amp;nbsp;  Just as importantly, it should be described and documented sufficiently well that other scientists can replicate its results.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The real problem is not a cultural difference between scientists and programmers: the real problem is reproducibility.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-1960708028715383969?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/1960708028715383969/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=1960708028715383969' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/1960708028715383969'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/1960708028715383969'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2011/07/scientists-vs-programmers-its-all-about.html' title='Scientists &quot;vs.&quot; programmers: It&apos;s all about reproducibility'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-6197026726875147595</id><published>2011-07-21T06:59:00.000-07:00</published><updated>2011-07-21T06:59:57.636-07:00</updated><title type='text'>A mathematical chauvinist?</title><content type='html'>An afternoon discussion at work among some of our interns revolved around a question that one (A) posed to another (B): "How many molecules of Abraham Lincoln are in a glass of water?"&amp;nbsp; After sufficient simplifying assumptions, B (a mathematician) was able to solve the problem with a simple computation.&amp;nbsp; The process resulted in a debate between A and B over the virtues of being able to derive formulae and constants, rather than looking them up with a search engine.&amp;nbsp; I sided with B, of course! as having and exercising the ability to carry out that process is much more useful than demanding that a search engine do all the work.&amp;nbsp; Simple units and order-of-magnitude calculations are things my PhD advisor does very rapidly, and when I was his student, I always found myself lagging behind and feeling not so intelligent.&amp;nbsp; Exercising that skill at least saved me a little embarrassment!&lt;br /&gt;&lt;br /&gt;Later that evening, A was explaining to me how one writes applications for a cell phone.&amp;nbsp; I found myself feeling perfectly happy that I didn't have to do all that work.&amp;nbsp; Cell phones are things I want to use to make phone calls, and maybe look up directions in an emergency; I struggled to consider why I would want to write my own app.&amp;nbsp; However, then I realized I was doing the same thing that A had done above: not valuing the process over the answer.&amp;nbsp; Learning something about writing a cell phone application might serve me well at some future time, even though I can't imagine the value of this time investment now.&amp;nbsp; I suppose I was being a mathematical chauvinist!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-6197026726875147595?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/6197026726875147595/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=6197026726875147595' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6197026726875147595'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6197026726875147595'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2011/07/mathematical-chauvinist.html' title='A mathematical chauvinist?'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-8585815634523100521</id><published>2010-10-11T22:37:00.000-07:00</published><updated>2010-10-11T22:37:46.942-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='git'/><title type='text'>Git: recovering from an incorrect e-mail address</title><content type='html'>Our big software project uses &lt;a href="http://git-scm.com/"&gt;Git&lt;/a&gt; (the version control system) in a CVS mode, with a single central repository to which we push commits (after they pass the check-in test suite).&amp;nbsp; This can be good, because it forces us to fix small but annoying things -- like setting the correct e-mail address!&amp;nbsp; I was testing on a different machine than my usual development workstation; I made some commits there, and then pushed to my workstation's local repository (Git shines for stuff like this).&amp;nbsp; Then, I activated the check-in tests, but the push failed: my e-mail address wasn't set!&amp;nbsp; (The tests check for that -- handy!)&lt;br /&gt;&lt;br /&gt;While I had set my e-mail address for Git correctly on my workstation, I hadn't set it on the other machine.&amp;nbsp; There were seven commits sitting on my workstation (where I run the check-in test suite) with the wrong e-mail address.&amp;nbsp; Here's how I started the fix.&amp;nbsp; First, I ran&lt;br /&gt;&lt;br /&gt;$ git rebase -i HEAD~7&lt;br /&gt;&lt;br /&gt;which let me fix up the last 7 local commits.&amp;nbsp; "-i" means "interactive," so Git fired up a text editor and let me decide which commits to change (and how).&amp;nbsp; I then repeated the following three tasks seven times:&lt;br /&gt;&lt;br /&gt;$ git commit --amend --reset-author&lt;br /&gt;(Edit the commit message -- no author e-mail address here, but it gets fixed due to "--reset-author")&lt;br /&gt;$ git rebase --continue&lt;br /&gt;&lt;br /&gt;That fixed it!&amp;nbsp; There's probably an easier way, but I was pleased.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-8585815634523100521?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/8585815634523100521/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=8585815634523100521' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/8585815634523100521'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/8585815634523100521'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2010/10/git-recovering-from-incorrect-e-mail.html' title='Git: recovering from an incorrect e-mail address'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-6654730228216652311</id><published>2010-10-04T20:59:00.000-07:00</published><updated>2010-10-04T20:59:51.873-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='programming'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><title type='text'>Larval sparse matrices?</title><content type='html'>John Rose's post on &lt;a href="http://blogs.sun.com/jrose/entry/larval_objects_in_the_vm"&gt;"larval objects in the JVM"&lt;/a&gt; suggests a type distinction between "larval" and "adult" objects.&amp;nbsp; Larval objects are mutable, suitable for in-place updates and for use as scratch computation, but are not safe for sharing by multiple threads in the same shared-memory space.&amp;nbsp; Adult objects are immutable and safe to share in parallel, but "changing" them requires copying, which can be expensive.&amp;nbsp; His examples mainly involve small objects, like complex numbers or dates.&amp;nbsp; However, he does suggest a "large-scale" example of a database, where the default state is immutable ("adult"), but a mutable API can be made available by request.&lt;br /&gt;&lt;br /&gt;This pattern of a "larval" initialization stage -- too complex and incremental to handle in a constructor -- followed by a "publish" command (we call it "&lt;a href="http://trilinos.sandia.gov/packages/docs/dev/packages/tpetra/doc/html/classTpetra_1_1CrsMatrix.html#a77f87853df66fa7afdf2a246d8689ebf"&gt;fillComplete&lt;/a&gt;") that transforms the data structure into its "adult" state -- is more or less how sparse matrices are used in practice.&amp;nbsp; The data for the sparse matrix is processed in "element" form (more or less equation by equation), and then "assembled" into the optimized data structure.&amp;nbsp; Changing the optimized final data structure can be expensive and may require re-optimization.&amp;nbsp; Thus, we like to think of it as immutable when optimized. &lt;br /&gt;&lt;br /&gt;The problem is, the sparse matrix API contains operations for the "larval" stage, which are either invalid or expensive when the matrix is in its "adult" stage.&amp;nbsp; Rather than discouraging users from doing the wrong thing, it would be nice for the API to forbid invalid operations syntactically (not by throwing an exception and frustrating new users with a mysterious error message), and also make clear which operations are allowed but undesirable due to their cost.&amp;nbsp; Making it syntactically harder to do suboptimal things is a good way to intimidate nonexpert programmers from doing them!&lt;br /&gt;&lt;br /&gt;John mentions the library solution to this problem: "builder" objects or functions, like Java's StringBuilder.&amp;nbsp; They take as input some number of updates, applied incrementally.&amp;nbsp; When the user signals that all updates are done, they produce a "final" immutable object.&amp;nbsp; In the case of sparse matrices, that would mean users can only call mutating methods on the "SparseMatrixBuilder" object.&amp;nbsp; To mutate a "finished" sparse matrix, they would have to request a "builder."&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The problem with this approach is the ability to request a builder, after the original completion of the immutable object.&amp;nbsp; It means that immutable objects can change state.&amp;nbsp; It also means that the immutable object (which is now only immutable in terms of its API) and the mutable "view" coinhabit the same scope.&amp;nbsp; Now the state of the "immutable" object depends on the semantics of view updates.&amp;nbsp; Do they happen as soon as the update methods are called?&amp;nbsp; (Simultaneity doesn't mean much in the context of multiple threads.)&amp;nbsp; Do they get accumulated and applied in batches?&amp;nbsp; Do they involve an asynchronous call to some remote compute device, like a GPU?&amp;nbsp; Programmers shouldn't be depending on these sorts of semantics.&amp;nbsp; It's better to avoid the mess and forbid changing immutable objects back into mutable ones, or return to the original solution and provide only a single object type with immutable and mutable states.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;What's missing here is a "rebinding syntax" that prevents programmers from accessing the supposedly immutable object while it's in a mutable state.&amp;nbsp; This is a good use case for the "WITH-" Lisp idiom, or for Python's context guards:&amp;nbsp; the mutable view created by the "with" isn't allowed to escape its scope.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-6654730228216652311?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/6654730228216652311/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=6654730228216652311' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6654730228216652311'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6654730228216652311'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2010/10/larval-sparse-matrices.html' title='Larval sparse matrices?'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-5852114423905346610</id><published>2010-09-24T10:38:00.000-07:00</published><updated>2010-09-24T10:38:49.305-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ScaLAPACK'/><category scheme='http://www.blogger.com/atom/ns#' term='linear algebra'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel'/><title type='text'>Using ScaLAPACK: not so painful!</title><content type='html'>I finally got an excuse this week to try out &lt;a href="http://www.netlib.org/scalapack/"&gt;ScaLAPACK&lt;/a&gt;, a library for distributed-memory parallel dense linear algebra computations.&amp;nbsp; I've mainly been working on iterative methods for sparse problems, so it's not common for me to want to factor some giant dense matrix.&amp;nbsp; However, part of those iterative methods involves an orthogonalization method for small groups of dense vectors.&amp;nbsp; Comparing against ScaLAPACK's QR factorization &lt;a href="http://www.netlib.org/scalapack/double/pdgeqrf.f"&gt;PDGEQRF&lt;/a&gt; should make a good sanity check for accuracy and performance.&amp;nbsp; (PDGEQRF should be slower, since it uses an algorithm that requires more communication, and also since it likely doesn't use multicore parallelism as effectively for the panel factorization.)&lt;br /&gt;&lt;br /&gt;It took me two and a half days to get a prototype accuracy test and benchmark up and running, which is a lot less than I had thought.&amp;nbsp; The &lt;a href="http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/"&gt;Intel MKL Link Line Advisor&lt;/a&gt; was a big help in linking the right libraries.&amp;nbsp; I've tried building ScaLAPACK from source before -- while that was some four years ago, it was a real pain and I was happy not to have to try that.&amp;nbsp; The last few hours of those two-and-a-half days were spent trying to figure out why the matrix distribution wasn't working.&amp;nbsp; The routine in question, PDGEQRF, either refused to accept the input data (it helpfully identified the first argument of the routine which was off in some way, but didn't say why it was wrong), or only factored part of the matrix.&amp;nbsp; Staring at the &lt;a href="http://www.netlib.org/scalapack/slug/index.html"&gt;ScaLAPACK Users' Guide&lt;/a&gt; -- in particular, at the diagrams illustrating global matrix layout -- helped me decode the mysterious parameters describing how the matrix is distributed among processors.&lt;br /&gt;&lt;br /&gt;Using ScaLAPACK turned out not to be so difficult as I had thought.&amp;nbsp; The Users' Guide is well-written, with many helpful diagrams.&amp;nbsp; The routines have great documentation (and helpfully repeat documentation for certain library-standard interfaces, like the matrix descriptor array).&amp;nbsp; Having a pre-built library with the linker flags ready for copy-and-paste into my build configuration system was a huge help.&amp;nbsp; I imagine much of the pain people have reported came from trying to build everything themselves.&amp;nbsp; Note also that &lt;a href="http://www.mathworks.com/company/newsletters/digest/2010/sept/spmd-distributed-arrays.html?s_v1=15951092_1-D1DS22"&gt;Matlab now has support&lt;/a&gt; for distributed-memory parallel computation, and distributed-memory data layouts.&amp;nbsp; It's based on &lt;a href="http://www.mathworks.com/products/parallel-computing/demos.html?file=/products/demos/shipping/distcomp/paralleldemo_backslash_bench.html"&gt;ScaLAPACK&lt;/a&gt;, which means the MathWorks' developers have gone through a lot of this pain for you ;-)&lt;br /&gt;&lt;br /&gt;A final interesting point:&amp;nbsp; the ScaLAPACK Users' Guide was written when &lt;a href="http://www.netlib.org/hpf/"&gt;High Performance Fortran&lt;/a&gt; (HPF) was under serious consideration as &lt;i&gt;the&lt;/i&gt; way to write distributed-memory parallel code.&amp;nbsp; ScaLAPACK doesn't use HPF, but it does use HPF as an alternate notation to describe different ways to distribute matrices over processors.&amp;nbsp; HPF proved unpopular because it was too hard to build an efficient compiler for it, but parts of HPF made their way into modern versions of the Fortran standard.&amp;nbsp; &lt;br /&gt;&amp;nbsp;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-5852114423905346610?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/5852114423905346610/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=5852114423905346610' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/5852114423905346610'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/5852114423905346610'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2010/09/using-scalapack-not-so-painful.html' title='Using ScaLAPACK: not so painful!'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-7107255257590394941</id><published>2010-09-15T15:08:00.000-07:00</published><updated>2010-09-15T15:08:55.921-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='programming'/><category scheme='http://www.blogger.com/atom/ns#' term='MPI'/><title type='text'>The user experience of lazy automatic tuning</title><content type='html'>I've been doing some performance experiments with an MPI-parallel linear algebra kernel.&amp;nbsp; I've implemented the kernel in two different ways:&amp;nbsp; a "reduce"&amp;nbsp; (not quite MPI_Reduce(), though it has a similar communication pattern) and "broadcast," and a "butterfly."&amp;nbsp; The latter is more general (it can be used to compute things other than what I'm benchmarking).&amp;nbsp; However, it might be slow in certain cases (e.g., if the network can't handle too many messages at once, or if there are many MPI ranks per node and the network card serializes on processing all those messages going in and out of the node).&amp;nbsp; Each run involves testing two different implementation alternatives, and four different scalar data types (float, double, complex float, and complex double).&lt;br /&gt;&lt;br /&gt;When I ran the benchmarks, I noticed that the very first run (butterfly implementation strategy, "float" data type) of each invocation of the benchmark is 10 - 100x slower than for other data types with the same implementation strategy. When I rebuild the executable using a different MPI library, this problem went away.&amp;nbsp; What was going on?&amp;nbsp; I noticed that for the "slow" MPI library, the minimum benchmark run time was reasonable, but the maximum run time was a lot more.&amp;nbsp; After reading through some messages in the e-mail archives of that MPI library, I found out that the library delays some setup costs until execution time.&amp;nbsp; That means the first few calls of MPI collectives might be slow, but overall they will be fast.&amp;nbsp; The other MPI library doesn't seem to be doing this.&lt;br /&gt;&lt;br /&gt;I was reminded of some troubles that the MathWorks had with integrating FFTW into Matlab, a few years ago.&amp;nbsp; FFTW does automatic performance tuning as a function of problem size.&amp;nbsp; The tuning phase takes time, but once it's finished, running any problem of that size will likely be much faster.&amp;nbsp; However, users didn't realize this.&amp;nbsp; They ran an FFT once, noticed that it was much slower than before (because FFTW was busy doing its thing), and then complained.&amp;nbsp; One can imagine all sorts of fixes for this, but the point is that something happened to performance which was not adequately communicated to users.&lt;br /&gt;&lt;br /&gt;Now, MPI users are likely more sophisticated programmers than most Matlab users.&amp;nbsp; They likely have heard of automatic run-time performance tuning, and maybe even use it themselves.&amp;nbsp; I'm fairly familiar with such things and have been programming in MPI for a while.&amp;nbsp; Yet, this phenomenon still puzzled me until I poked around through the e-mail list archives.&amp;nbsp; What I was missing was an obvious pointer to documentation describing performance phenomena such as this one.&amp;nbsp; &lt;i&gt;The performance of this MPI library is not transparent.&lt;/i&gt;&amp;nbsp; This is not a bad thing in this case, because the opaqueness is hiding a performance tuning process that will make my code faster overall.&amp;nbsp; However, &lt;i&gt;I want to know this right away&lt;/i&gt;, so I don't have to think about what I could be doing wrong.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;See, I'm not just calling MPI directly.&amp;nbsp; I've wrapped up some other wrapper over MPI.&amp;nbsp; I didn't know if my wrapper was broken, the wrapper underneath was broken, the MPI library was broken, the cluster job execution software was broken, or if nothing was wrong and this was just expected behavior.&amp;nbsp; That's why I want to know right away whether I should expect significantly nonuniform performance behavior from some library.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-7107255257590394941?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/7107255257590394941/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=7107255257590394941' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7107255257590394941'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7107255257590394941'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2010/09/user-experience-of-lazy-automatic.html' title='The user experience of lazy automatic tuning'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-2528051663604668534</id><published>2010-09-08T22:05:00.000-07:00</published><updated>2010-09-08T22:05:29.875-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='parallel'/><title type='text'>CPUs and GPUs, part 3: memory and programming models</title><content type='html'>My &lt;a href="http://hilbertastronaut.blogspot.com/2010/09/gpus-and-cpus-part-2-programming-models.html"&gt;last post&lt;/a&gt; got a good comment which pointed out that I hadn't been covering another important class of programming models: annotations.&amp;nbsp; C and C++ programmers would call these "pragmas," since their compilers understand lines starting with "#pragma" as directives to the compiler that affect one or more lines of code that follow.&amp;nbsp; They would likely be familiar with &lt;a href="http://www.openmp.org/"&gt;OpenMP&lt;/a&gt;, which uses annotations to tell the compiler to parallelize certain loops or fragments of code.&amp;nbsp; Other programming languages, like Common Lisp, use annotations as hints for optimization -- in particular, as type hints.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;The commenter pointed out that the Hybrid Multicore Parallel Programming (HMPP) system of annotations, developed by &lt;a href="http://www.caps-entreprise.com/"&gt;CAPS&lt;/a&gt; and supported by the &lt;a href="http://www.pathscale.com/"&gt;PathScale&lt;/a&gt; ENZO compiler suit, offers another programming model for heterogeneous node computing.&amp;nbsp; You write a kernel in a subset of C(++) or Fortran, annotated with operations that the compute device (e.g., GPU) understands (like when to transfer data to / from the host CPU, or when to synchronize device threads), and also label the kernel with an annotation to identify it for deployment to the device. &amp;nbsp; I was just introduced to HMPP last night, so please take a look at the comment to read the manual for yourself :-)&amp;nbsp; However, a brief read of the manual reminds me of &lt;a href="https://code.google.com/p/copperhead/"&gt;Copperhead&lt;/a&gt;, a Python annotation system (based on decorators) that accomplishes similar goals.&lt;br /&gt;&lt;br /&gt;Annotations require the programmer to (1) know the annotation language as well as the main programming language being used, (2) understand what subset of the programming language can be sent to the device, and (3) understand when and how the annotation changes program semantics.&amp;nbsp; Compilers &lt;i&gt;should&lt;/i&gt; help with #2, but may not; if given invalid code for the device, they may generate incorrect device code, which may crash or silently give the wrong answer.&amp;nbsp; This is true for other annotation systems as well.&amp;nbsp; OpenMP, for example, doesn't help you avoid race conditions if multiple threads modify shared state.&amp;nbsp; You have to identify and annotate critical sections and / or atomic updates yourself.&amp;nbsp; #3 can also be a challenge, especially if annotations are sprinkled throughout the code.&lt;br /&gt;&lt;br /&gt;Before talking about HMPP, I'll first use OpenMP as an example of the difficulties of popular annotation-based approaches.&amp;nbsp; Many OpenMP performance problems and code-writing challenges relate to the memory model.&amp;nbsp; You've taken a sequential code, where memory is laid out for sequential operation, and introduced parallelism to the computation but not the memory layout.&amp;nbsp; On nonuniform memory access (NUMA) hardware, performance tanks, because the data lives in the wrong place for most of the threads.&amp;nbsp; At least modern OpenMP implementations try not to let threads wander around to different NUMA regions from where they started, which means that the operating system could migrate pages closer to the cores that are working on them.&amp;nbsp; That suggests a "preconditioning" approach, where you first use OpenMP parallelism to read through the whole data set (thereby letting the operating system move data to where it needs to be), and then use OpenMP to do the computations you wanted to do in the first place.&amp;nbsp; Operating systems in common use don't do that yet!&amp;nbsp; Another approach is just syntactic sugar for explicitly threaded programming: each thread allocates its own data, and then you program in something like a distributed-memory model.&amp;nbsp; (It would be easier and less buggy just to run MPI with one process per core.)&lt;br /&gt;&lt;br /&gt;Both of these OpenMP approaches are awkward.&amp;nbsp; The problem is that OpenMP separates computation and memory layout.&amp;nbsp; The annotations only govern computation; they don't say where the data should live in memory.&amp;nbsp; That often means throwing away useful information, which many OpenMP programmers know, about which threads will operate on which data.&amp;nbsp; If you know how to allocate an array, you likely know how you would divide up simple parallel for loops or reductions on the array among the CPU cores.&amp;nbsp; If you know that, you almost certainly know how to lay out the array in the various NUMA regions.&lt;br /&gt;&amp;nbsp;Heterogeneous programming has its own memory layout problem: data either lives on the host CPU or the device.&amp;nbsp; Those are two separate memory spaces.&amp;nbsp; Even if they weren't separate, it still might be expensive to compute on remote data, or even to copy data from one space to another.&amp;nbsp; This is exactly the same problem as on multicore CPUs with NUMA.&amp;nbsp; When I look at HMPP, I see annotations for when to perform data transfers between the host and the device, and even ways to overlap data transfers with useful computation.&amp;nbsp; However, I don't see annotations for where to allocate the data.&amp;nbsp; The HMPP model assumes (at least from what I can tell from a brief look -- please correct me if I'm wrong) that data always lives on the host, is always transferred to the device for computations, and is always transferred back to the host afterwards.&amp;nbsp; This is like allocating all your data in one NUMA region and operating on it using CPU cores in all the NUMA regions.&lt;br /&gt;&lt;br /&gt;One could augment the annotation approach to include a memory layout model, but it seems like this is just making the language more baroque.&amp;nbsp; Plenty of programming languages reify data layout without the complication of additional annotations.&amp;nbsp; &lt;a href="http://www.co-array.org/"&gt;Co-Array Fortran&lt;/a&gt; is a good example.&amp;nbsp; This need not be done in the language; libraries can also hide the details of layout and allocation.&amp;nbsp; &lt;a href="http://trilinos.sandia.gov/packages/docs/dev/packages/kokkos/doc/html/group__kokkos__node__api.html"&gt;Kokkos&lt;/a&gt; in &lt;a href="http://trilinos.sandia.gov/"&gt;Trilinos&lt;/a&gt; does this for GPUs, for example.&amp;nbsp; CUDA and OpenCL are languages, but their special routines for allocating memory on the device are just plain C.&lt;br /&gt;&lt;br /&gt;I like the Array Building Blocks (and Kokkos) approach, because it also gives a good model for interactions between legacy code (with legacy memory layout) and new parallel code (that expects or should have data in a particular nonlegacy layout).&amp;nbsp; Kokkos does this with smart pointers and copy routines: you aren't allowed to call parallel kernels on legacy memory.&amp;nbsp; Array Building Blocks does this with "bind", which links up legacy memory and compute buffers.&amp;nbsp; The general idea is that "you don't get the pointer": memory on which heterogeneous or node-level parallel computation is done, is encapsulated and organized optimally by the language or library system.&amp;nbsp; This is also a great way to unify different kinds of nodes with different memory space(s), which is why Kokkos does it.&amp;nbsp; Pretty much any kind of node has parallel computing resources, associated to one or more memory spaces, for which data transfer costs are high and data layout matters for performance.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-2528051663604668534?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/2528051663604668534/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=2528051663604668534' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/2528051663604668534'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/2528051663604668534'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2010/09/cpus-and-gpus-part-3-memory-and.html' title='CPUs and GPUs, part 3: memory and programming models'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-9075428031100200376</id><published>2010-09-05T13:20:00.000-07:00</published><updated>2010-09-05T13:20:41.820-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='programming'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><title type='text'>GPUs and CPUs, part 2: programming models</title><content type='html'>After finishing &lt;a href="http://hilbertastronaut.blogspot.com/2010/09/programming-gpus-makes-us-better-cpu.html"&gt;last night's post&lt;/a&gt;, I was thinking about CPU and GPU programming models -- in particular, about short vector instructions.&amp;nbsp; On CPUs these are called SIMD (single instruction, multiple data) instructions.&amp;nbsp; NVIDIA markets them instead as "SIMT" (single instruction, multiple thread), not because of differences in the hardware implementation, but because the programming model is different.&amp;nbsp; The "CUDA C Programming Guide" version 3.1 says:&amp;nbsp; "... SIMD vector organizations expose the SIMD width to the software, whereas SIMT instructions specify the execution and branching behavior of a single thread."&amp;nbsp; Here, "thread" corresponds to an element in an array, operated on by a vector instruction.&lt;br /&gt;&lt;br /&gt;CPU vendors have historically done a poor job presenting programmers with a usable interface to SIMD instructions.&amp;nbsp; Compilers are supposed to vectorize code automatically, so that you write ordinary scalar loops and vectorized code comes out.&amp;nbsp; However, my experience a few years ago writing SIMD code is that compilers have a hard time doing this for anything but the simplest code.&amp;nbsp; Often, they will use SIMD instructions, but only operate on one element in the short vector, just because the vector pipes are faster than the scalar ones (even if the extra element(s) in the vector are wasted).&amp;nbsp; The reason they struggle at vectorization -- more so than compilers for traditional vector machines like the Crays -- is because the SIMD instructions are more restrictive than traditional vector instructions.&amp;nbsp; They demand contiguous memory, aligned on wide boundaries (like 128 bits).&amp;nbsp; If the compiler can't prove that your memory accesses have this nice friendly pattern, it won't vectorize your code.&amp;nbsp; SIMD branching is expensive (it means you lose the vector parallelism), so compilers might refuse to vectorize branchy code.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;Note that GPU hardware has the same properties!&amp;nbsp;&amp;nbsp; They also like contiguous memory accesses and nonbranchy code.&amp;nbsp; The difference is the programming model: "thread" means "array element" and consecutive threads are contiguous, whether you like it or not.&amp;nbsp; The model encourages coding in a SIMD-friendly way.&amp;nbsp; In contrast, CPU SIMD instructions didn't historically have much of a programming model at all, other than trusting in the compiler.&amp;nbsp; The fact that compilers make the instructions directly available via "intrinsics" (only one step removed from inline assembler code) already indicates that coders had no other satisfactory interface to these instructions, yet didn't trust the compiler to vectorize.&amp;nbsp; Experts like &lt;a href="https://hpcrd.lbl.gov/%7Eswwilliams/"&gt;Sam Williams&lt;/a&gt;, a master performance tuner of computational kernels like sparse matrix-vector multiply and stencils, used the intrinsics to vectorize.&amp;nbsp; This made their codes dependent on the particular instruction set, as well as on details of the instruction set implementation.&amp;nbsp; (For example, older AMD CPUs used a "half-piped" implementation of SIMD instructions on 64-bit floating-point words.&amp;nbsp; This meant that the implementation wasn't parallel, even though the instructions were.&amp;nbsp; Using the x87 scalar instructions instead offered comparable performance, was easier to program, and even offered better accuracy for certain computations, since temporaries are stored with extra precision.)&amp;nbsp; Using SIMD instructions complicated their code in other ways as well.&amp;nbsp; For example, allocated memory has to be aligned to a particular size, such as 128 bits. That bit value depends on the SIMD vector width, which again decreases portability.&amp;nbsp; SIMD instruction widths increase over time, and continue to do so.&amp;nbsp; Furthermore, these codes are brittle, because feeding nonaligned memory to SIMD instructions often results in errors that can crash one's program.&lt;br /&gt;&lt;br /&gt;CPU vendors are finally starting to think about programming models that make it easier to exploit SIMD instructions.&amp;nbsp; OpenCL, while as low-level a model as CUDA, also lets programmers reason about vector instructions in a less hardware-dependent way.&amp;nbsp; One of the most promising programming models is Intel's &lt;a href="http://www.drdobbs.com/high-performance-computing/227300084"&gt;Array Building Blocks&lt;/a&gt;, featured in a recent Dr. Dobb's Journal article.&amp;nbsp; I'm excited about Array Building Blocks for the following reasons:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;It includes a memory model with a segregated memory space.&amp;nbsp; This can cover all kinds of complicated hardware details (like alignment and NUMA affinity).&amp;nbsp; It's also validation for the work being done in libraries like &lt;a href="http://trilinos.sandia.gov/"&gt;Trilinos'&lt;/a&gt; Kokkos to hide array memory management from the programmer ("you don't get a pointer"), thus freeing the library to place and manage memory as it sees fit.&amp;nbsp; All of this will help future-proof code from the details of allocation, and also make code safer (you don't get the pointer, so you can't give nonaligned memory to instructions that want aligned memory, or give GPU device memory to a CPU host routine).&lt;/li&gt;&lt;li&gt;It's a programming language (embedded in C++, with a run time compilation model) -- which will expose mainstream programmers to the ideas of embedded special-purpose programming languages, and run time code generation and compilation.&amp;nbsp; Scientific computing folks tend to be conservative about generating and compiling code at run time, in part because the systems on which they have to run often don't support it or only support it weakly.&amp;nbsp; If Array Building Blocks gives the promised performance benefits and helps "future-proof" code, HPC system vendors will be driven to support these features.&lt;/li&gt;&lt;li&gt;Its parallel operators are deterministic; programmers won't have to reason about shared-memory nightmares like race conditions or memory models.&amp;nbsp; (Even writing a shared-memory barrier is a challenging task, and the best-performing barrier implementation depends on the application as well as the definition of "performance" (speed or energy?).)&lt;/li&gt;&lt;/ol&gt;I've been pleased with Intel's work on &lt;a href="http://www.threadingbuildingblocks.org/"&gt;Threading Building Blocks&lt;/a&gt;, and I'm happy to see them so committed to high-level, usable, forward-looking programming models.&amp;nbsp; NVIDIA's &lt;a href="https://code.google.com/p/thrust/"&gt;Thrust library&lt;/a&gt; is another great project along these lines.&amp;nbsp; It's also great to see both companies releasing their libraries as open source.&amp;nbsp; The "buy our hardware and we will make programming it easier" business model is great for us coders, great for doing science (open source means we know what the code is doing), and great for performance (we can improve the code if we want).&amp;nbsp; I'm hopeful that the competition between CPU and GPU vendors to woo coders will improve programming models for all platforms.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-9075428031100200376?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/9075428031100200376/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=9075428031100200376' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/9075428031100200376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/9075428031100200376'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2010/09/gpus-and-cpus-part-2-programming-models.html' title='GPUs and CPUs, part 2: programming models'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-118774732785355342</id><published>2010-09-04T23:47:00.000-07:00</published><updated>2010-09-04T23:47:56.047-07:00</updated><title type='text'>Programming GPUs makes us better CPU programmers</title><content type='html'>High-performance computing blogs like "horse-race" journalism, especially when covering competing architectures like CPUs and GPUs.&amp;nbsp; One commonly hears of 20x or even larger speedups when porting a CPU code to a GPU.&amp;nbsp; Recently someone pointed out to me, though, that one rarely hears of the "reverse port": taking an optimized GPU code, and porting it back to the CPU.&amp;nbsp; Many GPU optimizations relate to memory access, and many scientific codes spend a lot of time reading and writing memory.&amp;nbsp; For example, coalescing loads and stores (for consecutive threads on a GPU warp) corresponds roughly to aligned, contiguous memory accesses on a CPU.&amp;nbsp; These are more friendly to cache lines, and are also amenable to optimizations like prefetching or using wide aligned load and store instructions.&amp;nbsp; CPUs are getting wider vector pipes too, which will increase the branch penalty.&amp;nbsp; From what I've heard, taking that GPU-optimized code and porting it back to the CPU might result in only a 2x slowdown over the GPU.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;I don't see this as bad, though.&amp;nbsp; First, GPUs are useful because the hardware forces programmers to think about performance.&amp;nbsp; After investing in learning a new programming language (such as CUDA or OpenCL) or at least a new library (such as Thrust), and after investing in new hardware and in supporting GPU runtimes in their applications, coders are obligated to get return on that investment.&amp;nbsp; Thus, they take the time to learn how to optimize their codes.&amp;nbsp; GPU vendors help a lot with this, and expose performance-related aspects of their architecture, helping programmers find exploitation points.&amp;nbsp; Second, optimizing for the GPU covers the future possibility that CPUs and GPUs will converge.&amp;nbsp; Finally, programmers' dissatisfaction with certain aspects of GPU hardware may cause divergence rather than convergence of GPUs and CPUs, or even the popularization of entirely new architectures.&amp;nbsp; (How about a Cray XMT on an accelerator board in your workstation, for highly irregular data processing?)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-118774732785355342?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/118774732785355342/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=118774732785355342' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/118774732785355342'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/118774732785355342'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2010/09/programming-gpus-makes-us-better-cpu.html' title='Programming GPUs makes us better CPU programmers'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-7873570722757137147</id><published>2010-08-25T12:52:00.000-07:00</published><updated>2010-08-25T12:52:08.208-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='templates'/><category scheme='http://www.blogger.com/atom/ns#' term='programming'/><category scheme='http://www.blogger.com/atom/ns#' term='metaprogramming'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><title type='text'>"Archetype by prototype": a suggestion for making compile-time polymorphism more usable</title><content type='html'>A &lt;a href="http://parasail-programming-language.blogspot.com/2010/08/ad-hoc-interface-matching-in-parasail.html"&gt;recent post on interface matching&lt;/a&gt; in ParaSail (a new parallel programming language under development) inspired me to think about compile-time polymorphism in languages like C++.&amp;nbsp; The author of the article observed that "C++ templates define essentially no 'contract' so there is no easy way to find out what sort of type will be acceptable without trying the instantiation."&amp;nbsp;&lt;br /&gt;&lt;br /&gt;This stirred me to think about what C++ programmers can (and do) do to get around this.&amp;nbsp; A standard technique is to use "concept checks."&amp;nbsp; These test the required syntax by, well, trying the instantiation ;-) in a way that makes it easy to catch syntax errors.&amp;nbsp; They can be tedious, because doing them right requires two things: &lt;br /&gt;&lt;br /&gt;1. Concept checking code, that verifies the interface statically and as simply as possible (so that compiler error messages are easier to read)&lt;br /&gt;&lt;br /&gt;2. An "archetype class": like a Java interface, except with (skeletal) implementations of the required interface&lt;br /&gt;&lt;br /&gt;In the above, i'm using the terminology of the &lt;a href="http://www.boost.org/doc/libs/1_44_0/libs/concept_check/concept_check.htm"&gt;Boost Concept Check Library&lt;/a&gt;.&amp;nbsp; #1 alone adds to the code maintenance burden, since as the "contract" evolves, you have to synchronize the checks with how you are actually using the object.&amp;nbsp; #2 alone doesn't ensure synchronization of your idea of the contract, with the classes you think are implementing it.&amp;nbsp; Nevertheless, there are still four separate bodies of code to synchronize: the archetype class, the concept checking code, the application class(es), and the application code that invokes these class(es).&amp;nbsp; The most straightforward (but not easiest) way to avoid this burden would be to&lt;br /&gt;&lt;br /&gt;1. Write an archetype class, and make the compiler check the concept, using application code&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I call this approach "declared archetypes."&amp;nbsp; It calls for a syntax for declaring that a particular concrete data type implements an archetype's interface.&amp;nbsp; This approach has implications for avoiding error-prone tedium.&amp;nbsp; A common case for templates is generic code for different kinds of numbers.&amp;nbsp; I've written a lot of code templated on "Scalar", which could be anything representing a real or complex number, and Ordinal, which represents an array index type.&amp;nbsp; It would be tedious to declare an archetype supporting all the different kinds of things one might like to do with a Scalar, for example.&amp;nbsp; It would at least have to include all of arithmetic, and all kinds of transcendental functions (such as trigonometric functions and logarithms).&amp;nbsp; That would be an awful lot of effort, especially since most people are only going to instantiate Scalar with "float" or "double".&amp;nbsp; (A few unlucky folks will make Scalar a complex number and discover all the places where the supposedly generic code was using less-than on Scalar!)&lt;br /&gt;&lt;br /&gt;To avoid this tedium, the &lt;a href="http://www.generic-programming.org/languages/conceptcpp/specification/"&gt;"concepts" proposal considered for addition to the C++ standard&lt;/a&gt; (but rejected) offered a shorthand for "aggregate archetypes."&amp;nbsp; You could say that Scalar is "FloatingPointLike" or that Ordinal is "SignedIntegralLike", and that would hopefully bring in the syntax that you want.&amp;nbsp; You could also "inherit" from archetypes to create your own.&lt;br /&gt;&lt;br /&gt;The trouble with this approach is that somebody has to write a huge library of arbitrarily named predefined archetypes.&amp;nbsp; Huge, because pretty much anything you might want to do with a "plain old data" type has to have its own concept.&amp;nbsp; Arbitrarily named, because somebody has to decide what "ArithmeticLike" means.&amp;nbsp; (Does it mean integers or real numbers?)&amp;nbsp; This path seems to call for abstract algebra, where you have semigroups and rings and fields of a certain characteristic, and all kinds of things that nonmathematicians don't want to understand.&amp;nbsp; (I'm convinced the main reason why people don't use Haskell more is because &lt;a href="http://www.haskell.org/tutorial/monads.html"&gt;it's so hard to explain what a monad is&lt;/a&gt;, and why you want to know.)&amp;nbsp; &lt;br /&gt;&lt;br /&gt;This is overkill because in most cases, programmers can accomplish their work without so much formality.&amp;nbsp; The &lt;a href="http://parasail-programming-language.blogspot.com/2010/08/ad-hoc-interface-matching-in-parasail.html"&gt;ParaSail blog post&lt;/a&gt; alludes to the reason why:&amp;nbsp; "... when ad hoc matching is used, there is a presumption that the module formal (or target type) is very stable and new operations are not expected to be added to it."&amp;nbsp; The typical use case of C++ templates (for example) is for things that "look like double" or "look like int."&amp;nbsp; That suggests a second approach:&lt;br /&gt;&lt;br /&gt;2. "Archetype by prototype" or "looks like type T"&lt;br /&gt;&lt;br /&gt;If T is something simple like "double" or "int", then you save a lot of syntax and / or library writing.&amp;nbsp; If T is complicated, this forces you to write an archetype class, which is probably a good thing if T is complicated!&amp;nbsp; For the numerical linear algebra algorithms i write and maintain, this would help remind me whether an algorithm (that claims to be generic on a Scalar type) can handle complex numbers.&amp;nbsp; (Complex arithmetic changes linear algebra in subtle ways that don't only have to do with syntax.)&amp;nbsp; &lt;br /&gt;&lt;br /&gt;This approach does not require special syntax for defining archetypes.&amp;nbsp; However, it might still be nice to have such a syntax, and also to have a small library of archetype classes.&amp;nbsp; I could see this being useful for iterators, for example.&lt;br /&gt;&lt;br /&gt;"Archetype by prototype" would impose requirements "lazily," much like users of C++ template metaprogramming expect.&amp;nbsp; The "actual archetype" would consist only of those operations with the concrete datatype that are actually written in code.&amp;nbsp; For example, if you never ask for the cosine of a Scalar, you don't need to have an implementation of cosine that takes a Scalar argument.&amp;nbsp; It would be nice to have some development environment support for "extracting" the "actual archetype."&amp;nbsp; C++ compilers do some of this already in their error messages, if you have the patience to read them (after about a page of compiler backtrace, you get something like "no implementation of cos(Scalar) exists for Scalar = YourType").&lt;br /&gt;&lt;br /&gt;In summary, i propose "archetype by prototype" as an alternative to "concepts" for making compile-time polymorphism more usable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-7873570722757137147?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/7873570722757137147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=7873570722757137147' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7873570722757137147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7873570722757137147'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2010/08/archetype-by-prototype-suggestion-for.html' title='&quot;Archetype by prototype&quot;: a suggestion for making compile-time polymorphism more usable'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-2253069342634346139</id><published>2010-06-24T21:40:00.000-07:00</published><updated>2010-06-24T21:40:48.220-07:00</updated><title type='text'>"The limits of my language define the limits of my world"</title><content type='html'>&lt;a href="http://blogs.sun.com/jrose/"&gt;John Rose&lt;/a&gt;, speaking of programming languages, quotes the &lt;i&gt;Logisch-philosophische Abhandlung&lt;/i&gt; of&lt;br /&gt; &lt;a href="http://people.umass.edu/phil335-klement-2/tlp/bodygerman.html#p5.6"&gt;Ludwig Wittgenstein&lt;/a&gt;:&amp;nbsp; "The limits of my language define the limits of my world" (Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt).&amp;nbsp; John nicely links to the German version of Wittgenstein's work, which gives me an lets me practice my rusty German!&amp;nbsp;&lt;br /&gt;&lt;br /&gt;The phrase recalled a question a coworker asked me today:&amp;nbsp; "If I could choose the programming language I use for my work, what would it be?"&amp;nbsp; I answered that it depends on the work I want to do:&amp;nbsp; if I need to have a dialogue with the operating system and with low-level hardware details, C(++) is the tool of choice (mainly for ill); for numerical computations, some midway point between (modern) Fortran and Matlab would be nice.&amp;nbsp; C is more expressive than Fortran, I went on to say.&amp;nbsp; "Doesn't that mean C is better than Fortran?" he asked?&amp;nbsp; No, it doesn't mean that.&amp;nbsp; Fortran's restrictiveness (esp. its rules about pointer declarations and aliasing) make it easier for compilers to generate efficient code.&amp;nbsp; C's expressiveness makes it easier to control the computer at a low level.&lt;br /&gt;&lt;br /&gt;Wittgenstein's observation applies here:&amp;nbsp; in C, everything looks like a pointer.&amp;nbsp; In Fortran, everything looks like an array (C doesn't have real multidimensional arrays, remember!).&amp;nbsp; In Smalltalk, everything looks like an object.&amp;nbsp; Lisp programmers fear writing domain-specific languages much less than Java programmers.&amp;nbsp; SISAL programmers fear returning an (apparently) freshly created array much less than C programmers.&amp;nbsp; SQL programmers fear complicated searches over tables of data much less than programmers of other languages, etc.&amp;nbsp; By choosing the language, I choose the way in which I attack programming problems.&lt;br /&gt;&lt;br /&gt;If the language suggests a programming model (or a small set of them), then if a new programming model suggests itself, the rational response is to create a new language or modify an existing one in a way that naturally leads the programmer to work within the model.&amp;nbsp; Programming models ultimately reduce to our assumptions about computer hardware.&amp;nbsp; Where are these assumptions going?&amp;nbsp; &lt;a href="http://blog.ksplice.com/2010/06/attack-of-the-cosmic-rays/"&gt;It's not quite a Turing machine anymore.&lt;/a&gt;&amp;nbsp; We have to extend our hardware assumptions to include unreliability:&amp;nbsp; segfaults, kernel panics, nodes of a cluster going down and up, the occasional bad bit in memory or on disk.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;"Cloud" programming models ("everything is a reduction pass from disk to CPU and back to disk again") already handle nodes crashing to some extent, but it doesn't look like programming models for handling unexpected bit flips have gotten past the discussion phase.&amp;nbsp; In part this is because the hardware hasn't presented us with a programming model other than "compute and pray": even fixed addresses in read-only code pages aren't invulnerable, as the above link explains.&amp;nbsp; It's fair to assume, though, that the hardware will let us distinguish between "accurate" and "inaccurate" data or computations, just like it lets us distinguish between IEEE 754 float and IEEE 754 double.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;This distinction between accurate and possibly inaccurate computations and data will enter the programming model.&amp;nbsp; Languages have type systems for that sort of thing; it doesn't seem much different than a "const" annotation in C++, or a "synchronized" annotation in Java.&amp;nbsp; This attribute needs to be orthogonal to the actual datatype; the typical "oh well, floating-point numbers are inexact anyway" viewpoint will lead to disaster in an LAPACK LWORK query, for example.&amp;nbsp; The programming model will have to include promotions and demotions; these will have costs, of which the programmer must be made aware (just like with Java's "synchronized").&lt;br /&gt;&lt;br /&gt;What are the implications of such a model?&amp;nbsp; First, programmers will have yet another type annotation to learn.&amp;nbsp; Just like with C's "volatile," it will lead to endless confusion, especially as hardware evolves.&amp;nbsp; Compilers can help with type deduction, but programmers will be tempted to use the "inaccurate" annotation as a magic "go faster" switch.&amp;nbsp; If computer science students can't wrap their heads around tail recursion in Scheme, how can we expect them to understand yet another low-level annotation?&amp;nbsp;&lt;br /&gt;&lt;br /&gt;Second, programmers' view of the hardware memory space will grow even more fragmented.&amp;nbsp; We already have "GPU memory" and "CPU memory" (for OpenCL and CUDA programmers), "local" and "remote" memory (for those working in a Partitioned Global Address Space (PGAS) model), caches and local stores and "shared memory" (which is really a cache) and vector registers and...&amp;nbsp; In the future, we may have this times two: for each, an "accurate but slow" and an "inaccurate but fast" version.&amp;nbsp; Going from one to the other will require copying -- which may be implicit (as in a PGAS-like model), but still costs something.&amp;nbsp; Relaxing bit accuracy is a hardware performance optimization and so if you don't want to declare everything "accurate," then you likely are worried about the costs of copying things around.&amp;nbsp; This means you will have to reason about all of those fragmented memory spaces.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Given this likely state of hardware, I can rephrase my colleague's question: "What programming model would I like to use if the hardware behaves in this way?"&amp;nbsp; First, the PGAS model seems most natural: it lets me reason about different memory spaces if I need to, or program as if there is only one memory space (at potentially higher cost) if it helps me get my work done.&amp;nbsp;&amp;nbsp;Second, the model should help programmers distinguish between "data" and "metadata."&amp;nbsp; Metadata (indices, permutations, pointers) should only be bit accurate. &amp;nbsp;(That doesn't mean integers always have to be! &amp;nbsp;Integer types get used a lot for signal processing and other bit-oriented domains.) &amp;nbsp;Third, programmers should have to handle metadata as little as possible. &amp;nbsp;The less metadata you handle explicitly, the less likely you are to "pollute" it with inexactness, and the more freedom the compiler has to perform optimizations on the metadata. &amp;nbsp;That means the language should support all kinds of collections. &amp;nbsp;Operations on collections should be &lt;i&gt;fast&lt;/i&gt;, otherwise programmers won't want to use them. &amp;nbsp;Fourth, by extension of the third point, the language should prefer sequence operations ("map f over the array v") that let programmers avoid naming variables that don't really matter,&amp;nbsp;such as index variables and intermediate quantities in loops. &amp;nbsp;(If you name it, the compiler has to keep it around unless it is clever enough to reason it away. &amp;nbsp;Storage necessitates annotation of its accuracy, which necessitates conservativism in optimizations, in the same way that C aliasing rules force the C compiler to be conservative.)&lt;br /&gt;&lt;br /&gt;This post rambles, but I'm thinking out loud, and I'm curious to hear what others have to say about programming models for this new world of unexpected flipping bits. &amp;nbsp;The one thing that encourages me is the intelligence of those thinking about the problem. &amp;nbsp;As Shakespeare has Miranda say in &lt;i&gt;The Tempest&lt;/i&gt;, "How beauteous mankind is! O brave new world, That has such people in't!"&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-2253069342634346139?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/2253069342634346139/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=2253069342634346139' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/2253069342634346139'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/2253069342634346139'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2010/06/limits-of-my-language-define-limits-of.html' title='&quot;The limits of my language define the limits of my world&quot;'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-2250232333044241141</id><published>2010-05-23T08:56:00.000-07:00</published><updated>2010-05-23T08:57:52.784-07:00</updated><title type='text'>Post to deter spammers</title><content type='html'>I need to post something here to deter spammers, since they seem to be targeting idle blogs.  I'll think of something Real Soon Now (tm) to post about generic APIs for parallelism, or something like that.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-2250232333044241141?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/2250232333044241141/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=2250232333044241141' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/2250232333044241141'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/2250232333044241141'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2010/05/post-to-deter-spammers.html' title='Post to deter spammers'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-5460361187581942858</id><published>2009-11-28T10:26:00.001-08:00</published><updated>2009-11-28T10:26:30.686-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='numerical analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='history'/><category scheme='http://www.blogger.com/atom/ns#' term='math'/><category scheme='http://www.blogger.com/atom/ns#' term='scientific computing'/><title type='text'>James Hardy Wilkinson, FRS</title><content type='html'>I love &lt;a href="http://www.maths.manchester.ac.uk/%7Ehigham/photos/wilkinson/wilkinson_021.htm"&gt;this photo&lt;/a&gt; of &lt;a href="http://www-history.mcs.st-andrews.ac.uk/Biographies/Wilkinson.html"&gt;James Hardy Wilkinson&lt;/a&gt;,the great numerical analyst and computer scientist.&amp;nbsp; Something not socommonly known is that Wilkinson's ability to get along with &lt;a href="http://www-history.mcs.st-andrews.ac.uk/Biographies/Turing.html"&gt;Alan Turing&lt;/a&gt; helped bring the Automatic Computing Engine, which was designed by Turing, to fruition.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-5460361187581942858?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/5460361187581942858/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=5460361187581942858' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/5460361187581942858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/5460361187581942858'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2009/11/james-hardy-wilkinson-frs.html' title='James Hardy Wilkinson, FRS'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-3137273703290135820</id><published>2009-07-21T22:43:00.001-07:00</published><updated>2009-07-21T22:55:46.793-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='code'/><title type='text'>C is too high level</title><content type='html'>I propose that C is too high level of a language for the purposes for which it's used.&amp;nbsp; While it purports to give programmers power over memory layout -- in particular, over heap (via malloc) and stack (via alloca) allocation -- it gives them no control, and does not allow them to describe, how function arguments or struct fields are laid out in memory.&amp;nbsp; Knowing how struct fields are laid out means you can use structs from languages other than C, in a portable way.&amp;nbsp; (Some foreign function interfaces, such as ANSI Common Lisp's CFFI, refuse to allow passing structs by value for this reason.)&amp;nbsp; You can know exactly how much memory to allocate, and bound more tightly from above how much stack space a particular C function call requires.&lt;br /&gt;&lt;br /&gt;Given the state of C as it is, what I would like is a domain-specific language for describing binary interfaces, such as struct layout or function call signatures.&amp;nbsp; I would like C compilers to support standard annotations that guarantee particular layouts.&amp;nbsp; Currently this is done in an ad hoc, compiler-specific way -- sometimes by command-line flags ("pack the structs") and sometimes by pragmas or annotations.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;The main reason I want standard compiler support for such a minilanguage is for interoperability between C and other languages.&amp;nbsp; I have the misfortune of needing to call into a lot of C libraries from my code, but I don't want to be stuck writing C or C++ (The Language Which Shall Not Be Parsed).&amp;nbsp; Nevertheless, I don't want to tie any other users of my code to a particular C compiler (if the code were just for me, it wouldn't matter so much).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-3137273703290135820?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/3137273703290135820/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=3137273703290135820' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3137273703290135820'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3137273703290135820'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2009/07/blog-post.html' title='C is too high level'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-7529572119560091518</id><published>2009-07-01T10:50:00.000-07:00</published><updated>2009-07-01T10:50:10.898-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HPC'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='Python'/><title type='text'>Python win: csv, sqlite3, subprocess, signal</title><content type='html'>I've been working these past few months (ugh, months...) on a benchmark-quality implementation of some new parallel shared-memory algorithms.&amp;nbsp; It's a messy, tempermental code that on occasion randomly hangs, but when it works, it often outperforms the competition.&amp;nbsp; Successful output consists of rows of space-delimited data in a plain text format, written to standard output.&lt;br /&gt;&lt;br /&gt;I spent a week or so on writing a script driver and output data processor for the benchmark.&amp;nbsp; The effort was both minimal, and paid off handsomely, thanks to some handy Python packages: csv, sqlite3, subprocess, and signal.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;The csv ("comma-separated values") package, despite its name, can handle all kinds of text data delimited by some kind of separating character; it reads in the benchmark output with no troubles.&amp;nbsp; sqlite3 is a Python binding to the &lt;a href="http://www.sqlite.org/"&gt;SQLite&lt;/a&gt; library, which is a lightweight database that supports a subset of SQL queries.&amp;nbsp; SQL's SELECT statement can replace pages and pages of potentially bug-ridden loops with a single line of code.&amp;nbsp; I use a cute trick:&amp;nbsp; I read in benchmark output using CSV, and create an SQLite database &lt;i&gt;in memory&lt;/i&gt; (so I don't have to worry about keeping files around).&amp;nbsp; Then, I can issue pretty much arbitrary SQL queries in my script.&amp;nbsp; Since I only use one script, which takes command-line arguments to decide whether to run benchmarks or process the results, I don't have to maintain two different scripts if I decide to change the benchmark output format.&lt;br /&gt;&lt;br /&gt;The subprocess and signal packages work together to help me deal with the benchmark's occasional flakiness.&amp;nbsp; The latter is a wrapper around the POSIX signalling facility, which lets me set a kind of alarm clock in the Python process.&amp;nbsp; If I don't stop the alarm clock early, it "goes off" by sending Python a signal, interrupting whatever it might be doing at the time.&amp;nbsp; The subprocess package lets me start an instance of the benchmark process and block until it returns.&amp;nbsp; "Block until it returns" means that Python doesn't steal my benchmark's cycles in a busy loop, and the alarm means I can time out the benchmark if it hangs (which it does sometimes).&amp;nbsp; This means I don't burn through valuable processing time if I'm benchmarking on a machine with a batch queue.&lt;br /&gt;&lt;br /&gt;I wish the benchmark itself were as easy to write as the driver script was!&amp;nbsp; I've certainly found it immensely productive to use a non-barbaric language, with all kinds of useful libraries, to implement benchmarking driver logic and data processing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-7529572119560091518?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/7529572119560091518/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=7529572119560091518' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7529572119560091518'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7529572119560091518'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2009/07/python-win-csv-sqlite3-subprocess.html' title='Python win: csv, sqlite3, subprocess, signal'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-6387653264693230401</id><published>2009-06-22T12:58:00.000-07:00</published><updated>2009-06-22T12:58:27.657-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HPC'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><title type='text'>On metadata, shared memory, NUMA, and caches</title><content type='html'>Many modern shared-memory multiprocessors improve memory locality by providing each socket (group of processors) in the machine with its own chunk of memory.&amp;nbsp; Each socket can access all the chunks of memory, but accesses to its own chunk are faster (in terms of latency, at least).&amp;nbsp; This design decision is called "nonuniform memory access" (NUMA).&lt;br /&gt;&lt;br /&gt;Programming a NUMA machine for performance in scientific kernels looks a lot like programming a distributed-memory machine:&amp;nbsp; one must "distribute" the data among the chunks so as to improve locality.&amp;nbsp; There is a difference, however, in the way one deals with metadata -- the data describing the layout of the data.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;In the distributed-memory world, one generally tries to distribute the metadata, because fetching it from another node takes a long time.&amp;nbsp; This means that memory usage increases with the number of processors, regardless of the actual problem size.&amp;nbsp; It also means that programmers have to think harder and debug longer, as distributing metadata correctly (and avoiding bottlenecks) is hard.&amp;nbsp; (Imagine, for example, implementing a reduction tree of arbitrary topology on arbitrary subsets of processors, when you aren't allowed to store a list of all the processors on any one node.)&lt;br /&gt;&lt;br /&gt;In the NUMA world, one need not replicate the metadata.&amp;nbsp; Of course one _may_, but metadata access latency is small enough that it's practical to keep one copy of the metadata for all the processors.&amp;nbsp; This is where caches come in:&amp;nbsp; if the metadata is read-only, caches can replicate the relevant parts of the metadata for free (in terms of programming effort).&amp;nbsp; Of course, it would be nice also to have a non-coherent "local store," accessible via direct memory transfers, for the actual data.&amp;nbsp; Once you have the metadata, it's often easy to know what data you want, so it's easier to manage the explicit memory transfers to and from the local store.&amp;nbsp; However, if you &lt;i&gt;only&lt;/i&gt; have a local store, you have to store metadata there, and managing that reduces to the distributed-memory case (except more painful, since you have much less memory capacity).&lt;br /&gt;&lt;br /&gt;Memory capacity is really the most important difference between the shared-memory NUMA and distributed-memory worlds.&amp;nbsp; Our experience is that on a single shared-memory node, memory capacity cannot scale with the number of processors.&amp;nbsp; This is because DRAM uses too much power.&amp;nbsp; The same power concerns hold, of course, for the distributed-memory world.&amp;nbsp; However, an important use of distributed-memory machines is for their scale-out capacity to solve very large problems, so their architects budget for those power requirements.&amp;nbsp; Architects of single-node shared-memory systems don't have that freedom to expand memory capacity.&amp;nbsp; This, along with the lower latencies between sockets within a single node, makes sharing metadata in the shared-memory world attractive.&lt;br /&gt;&lt;br /&gt;It's often easy to delimit sections of a computation in which metadata is only read and never modified.&amp;nbsp; This suggests an architectural feature:&amp;nbsp; using cache coherence to fetch metadata once, but tagging the metadata in cache as read-only, or having a "read-only" section of cache.&amp;nbsp; I'm not a hardware person so I don't know if this is worthwhile, but it's a way to extract locality out of semantic information.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-6387653264693230401?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/6387653264693230401/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=6387653264693230401' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6387653264693230401'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6387653264693230401'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2009/06/on-metadata-shared-memory-numa-and.html' title='On metadata, shared memory, NUMA, and caches'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-7559660877956497024</id><published>2009-06-13T18:57:00.000-07:00</published><updated>2009-06-13T18:57:24.242-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HPC'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='scientific computing'/><title type='text'>Productivity language for performance-oriented programmers</title><content type='html'>Our computer science department last semester offered a graduate seminar course on parallel productivity languages -- that is, programming languages in which it's easy to write parallel code whose performance scales well, even if single-processor performance is not so good (e.g., because the language is interpreted or there is a lot of runtime error checking).&amp;nbsp; Implicit in the word "productivity" is that the target users need not be expert low-level programmers.&amp;nbsp;&amp;nbsp;&lt;br /&gt;&lt;br /&gt;According to a professor who helped lead the course, however, most of the class projects ended up being "productivity languages for efficiency-layer programmers."&amp;nbsp; That is, the target users were not non-computer-scientist domain experts, but experienced low-level programmers who want to boost their productivity when optimizing a particular computational kernel.&amp;nbsp; One example was a domain-specific language (DSL) for stencil operations, along with a compiler and autotuner.&amp;nbsp; Users write the stencil operation in a familiar notation, and the system transforms it at runtime (the source language is dynamic) into highly optimized code which performs just about as well as if it were optimized by hand.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;The professor's observation resonated, because I'm currently near the end of what turned out to be a long, painful implementation and debugging project.&amp;nbsp; It's a benchmark implementation of a new solver, which itself contains at least three different computational kernels.&amp;nbsp; The project has taught me hard lessons about what bugs consume the most time.&amp;nbsp; What I've found is that working in a language "close to the metal" is not the problem.&amp;nbsp; I'm not afraid to optimize for hardware or operating system features, to allocate and deallocate chunks of memory, or to impose matrix semantics on chunks of raw memory.&amp;nbsp; My frustrations usually come from silly human errors, that result in inscrutable errors which debuggers like gdb and memory analysis tools like Valgrind cannot diagnose.&amp;nbsp; Furthermore, sometimes bugs only manifest with very large problems, that make losing a factor of 10-100x performance to an analysis tool (like Valgrind, which is an amazing tool and perhaps one of the best optimized out there) impractical (or at least annoying).&lt;br /&gt;&lt;br /&gt;One source of human errors is conversion between units.&amp;nbsp; The classic example is the &lt;a href="http://www.cnn.com/TECH/space/9909/30/mars.metric/"&gt;loss of the Mars Climate Orbiter&lt;/a&gt; in 1999 due to a failure to convert between metric and English units.&amp;nbsp; However, units, or a related kind of semantic information, often come up in pure programming tasks that have little to do with physical measurements.&amp;nbsp; For example, my current project involves a lot of C code that iterates over one kind of data structure, and copies it into a new kind of data structure.&amp;nbsp; A lot of indices fly around, and I usually have to give them suggestive names like "index_old" or "index_new" to indicate whether they "belong" to the old or new data structure.&amp;nbsp; Naming conventions aren't enough to prevent a mismatch, though, and a mistake (like addressing a "new" array with an "old" index) may result in a segfault, smashing the stack, intermittent semantic errors that depend on runtime parameters, or no problems at all.&amp;nbsp; Here, "old" and "new" are a kind of "unit" or semantic restriction:&amp;nbsp; for example, "this index can only be used to address this array, not any other."&amp;nbsp; This kind of language feature seems like it should result in little or no runtime performance hit in many cases.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;Another source of human errors is resource scope:&amp;nbsp; who initializes what, when, and who de-initializes it, when.&amp;nbsp; Many HPC codes require careful control of memory allocation and deallocation that often cannot be left to a garbage collector, either because the amounts of memory involved are huge, or because the allocation method is specialized (e.g., optimizations for nonuniform memory access (NUMA) machines).&amp;nbsp; Furthermore, garbage collection doesn't solve all resource management problems:&amp;nbsp; files, pipes, sockets, and the like are often shared by multiple threads.&amp;nbsp; Many languages, such as ANSI Common Lisp and the concurrent JVM-based language &lt;a href="http://clojure.org/"&gt;Clojure&lt;/a&gt;, have a special syntax for managing the scope of objects like files that need to be opened and closed, regardless of whether the operations done during the lifetime of the opened file succeed.&amp;nbsp; Common Lisp has a convention to name such scope management macros "WITH-*", where one replaces the * with the kind of object being managed.&amp;nbsp; Of course, a syntax for scope management assumes that one can statically determine the scope of the object, but many situations and kinds of object fit that pattern. &lt;br /&gt;&lt;br /&gt;My current project involves a shared-memory parallel solver written in an SPMD programming model.&amp;nbsp; There's a lot of shared state, and it gets initialized at different times:&amp;nbsp; either before the compute threads are created, during the scope of the threads, or after the threads complete.&amp;nbsp; Sometimes, threads cooperate to initialize an object, and sometimes they let just one compute thread do the work.&amp;nbsp; In general, it's hard for compilers to tell when different threads might be accessing a shared object.&amp;nbsp; However, my choice of the SPMD programming model restricts the implementationenough that the compiler can easily identify whether something happensbefore or after thread creation or completion. It would be nice to have a syntax or some kind of help from the language to manage the different kinds of shared object creation.&amp;nbsp; The runtime could also help out by forbidding multiple initializations, or by "tagging" a shared object as accessible only by a certain thread.&lt;br /&gt;&lt;br /&gt;A third source of errors has been imposing structure on raw pointers.&amp;nbsp; For example, a memory allocation returns a chunk of contiguously stored memory, but I have to divide up that chunk in place into multiple nonoverlapping matrices.&amp;nbsp; This is something that Fortran (especially Fortran 2003, if implemented correctly, *ahem*) does much better than C:&amp;nbsp; Fortran always imposes array structure on chunks of memory, and if you've done the mapping right, the runtime can identify indexing errors.&amp;nbsp; C only gives you raw pointers, onto which you have to impose structure.&amp;nbsp; If you mess up, only something like Valgrind may be able to help.&amp;nbsp; However, even Valgrind has no access to the semantic information that a particular chunk of memory is supposed to be an M by N matrix with leading dimension LDA.&amp;nbsp; Having an option to do bounds checking, which does not interfere with or affect parallel performance, but which can be turned off (at build time even) to improve performance on one thread, would be a big help.&lt;br /&gt;&lt;br /&gt;When I see new languages proposed for scientific computing, often they have lots of new features -- nested parallelism, data structures suited to scientific things like irregular discretizations, or a full library of parallel primitives. &lt;br /&gt; What I &lt;i&gt;don't&lt;/i&gt; see, however, is a new low-level language which can be used to &lt;i&gt;implement&lt;/i&gt; those new higher-level languages, or to implement highly optimized library routines.&amp;nbsp; A good low-level language is also important for bridging the gap between &lt;i&gt;declarative&lt;/i&gt; optimization approaches (i.e., identifying features of the data or algorithm that enable optimizations without requiring them), and &lt;i&gt;imperative&lt;/i&gt; approaches (mandating which optimization or feature to use, so that one can predict and measure performance).&amp;nbsp; Predictable performance makes optimization a science rather than a collection of incantations.&amp;nbsp; Furthermore, if a good low-level imperative language is used as a compilation target of a declarative language, then one can examine the generated code and figure out what the compiler is doing -- or even introspect and alter the implementation strategy at runtime.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-7559660877956497024?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/7559660877956497024/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=7559660877956497024' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7559660877956497024'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/7559660877956497024'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2009/06/productivity-language-for-performance.html' title='Productivity language for performance-oriented programmers'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-6569375938315233447</id><published>2009-06-10T16:35:00.000-07:00</published><updated>2009-06-10T16:35:10.918-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><title type='text'>C++:  Worlds of Pain</title><content type='html'>&lt;br /&gt;&lt;a href="http://blogs.msdn.com/vcblog/archive/2008/10/28/lambdas-auto-and-static-assert-c-0x-features-in-vc10-part-1.aspx"&gt;MS's efforts&lt;/a&gt; to implement the new C++0x standard in their compiler remind me of the world of pain that is C++ syntax.&amp;nbsp; &lt;a href="http://james-iry.blogspot.com/2009/05/brief-incomplete-and-mostly-wrong.html"&gt;James Iry&lt;/a&gt; jokes that the "language is so complex that programs must be sent to thefuture to be compiled by the Skynet artificial intelligence," but of course programmers do a kind of compilation in their head too, in order to understand and work with the language.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;My experience with C++ was typified by looking at Boost's graph library and wondering what the developers were smoking and whether or not I wanted to try some.&amp;nbsp; I took C++ reasonably far in my professional work; I used the STL containers extensively and even dipped into some of the generic algorithms (even with Visual Studio 6's awful C++ templates support, some of them were still useful).&amp;nbsp; However, there were too many ad hoc - seeming rules and conventions and ways to shoot myself in the foot, and name mangling was always an annoyance for me.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-6569375938315233447?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/6569375938315233447/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=6569375938315233447' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6569375938315233447'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6569375938315233447'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2009/06/c-worlds-of-pain.html' title='C++:  Worlds of Pain'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-4792725245534291162</id><published>2009-04-20T00:21:00.000-07:00</published><updated>2009-04-20T00:21:00.967-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='C'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='Fortran'/><title type='text'>C and Fortran: the right balance</title><content type='html'>I'm a fan of picking the right programming language for the job, even if it means mixing languages in a project.&amp;nbsp; Sometimes this can lead to pain, especially if I'm using relatively new language features that haven't had enough code coverage to be fully debugged.&amp;nbsp; However, getting the right balance between languages can make one's code concise and elegant.&lt;br /&gt;&lt;br /&gt;This weekend, I've been working on rewriting a parallel matrix factorization.&amp;nbsp; I had written most of it in (modern) Fortran, but that made it tricky to interface with C code written by collaborators and others in our research group.&amp;nbsp; However, C's lack of 2-D array notation makes it awkward to write matrix operations.&amp;nbsp; Thankfully the factorization algorithm splits naturally into a series of operations on submatrices.&amp;nbsp; I implemented each of those "local" factorization steps in Fortran, using the ISO_C_BINDING module so that I can call them from C without needing to guess the Fortran compiler's name mangling convention, or needing to make everything a pointer (by default, Fortran calls everything by reference).&amp;nbsp; C code sequences those calls to local factorization steps and does the necessary parallel synchronization.&amp;nbsp; The resulting C is significantly shorter than the Fortran version I had written before, but I still get the benefits of Fortran's concise notation for matrix operations.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-4792725245534291162?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/4792725245534291162/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=4792725245534291162' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/4792725245534291162'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/4792725245534291162'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2009/04/c-and-fortran-right-balance.html' title='C and Fortran: the right balance'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-3902511791414078471</id><published>2009-03-10T22:29:00.000-07:00</published><updated>2009-03-10T22:29:23.832-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='Fortran'/><title type='text'>Dark secrets of the art:  C pointers and Fortran pointers</title><content type='html'>&lt;br /&gt;Lately, I've been implementing some new numerical algorithms for benchmarking.&amp;nbsp; Normally I would write such things in C rather than Fortran, out of habit and experience.&amp;nbsp; However, since this code calls Fortran routines (specifically, in the BLAS and LAPACK) a lot, I found it less trouble to write it in Fortran, than to try to manage calling C from Fortran.&amp;nbsp; This is because C and Fortran have different linking conventions, and C has no standard interface for calling into Fortran.&amp;nbsp; Different compilers have different standards about translating Fortran subroutine and function names into linker symbols -- sometimes they add one or more underscores, sometimes not.&amp;nbsp; It's deceptively easy if you only use one compiler, such as gcc; developers that aim to build with multiple compilers often discover the differences by accident, and account for them with ad-hoc symbol translation macros (or paper them over by build systems such as GNU Autoconf, that hide the ad-hockery for you).&amp;nbsp; Fortran subroutines and functions also have no idea of "pass by value," so when calling them from C, you have to keep track of which arguments are inputs and which are "outputs," and make sure to copy the output arguments in and out.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;These factors have annoyed me enough in the past that for this project, I taught myself enough "modern" Fortran to get by.&amp;nbsp; (I define "modern" as Fortran 90 and newer, or generally, anything with free-form input and colon array slice notation.)&amp;nbsp; However, I have users who code exclusively in C.&amp;nbsp; Fortran &amp;gt;= 2003 has what C does not, though:&amp;nbsp; &lt;a href="http://www.intel.com/software/products/compilers/docs/flin/main_for/mergedProjects/bldaps_for/common/bldaps_interopc.htm"&gt;an ISO standard for interoperability with C.&lt;/a&gt;&amp;nbsp; You can give a Fortran routine a C binding, which solves the linker symbol name problem.&amp;nbsp; More importantly, you can get Fortran and C pointers to talk to each other.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Yes, Fortran &amp;gt;= 90 does actually have a pointer type!&amp;nbsp; (It has dynamic memory allocation too, unlike in previous Fortran standards!)&amp;nbsp; Fortran's pointers to arrays are "smarter" than C pointers in some sense:&amp;nbsp; they have a "rank" and a "shape," which means (for instance) that you can compute with a submatrix of a larger matrix, whose elements aren't necessarily contiguous, as if it were just a plain old matrix.&amp;nbsp; The corresponding C code would look much less like linear algebra and much more like the innards of some device driver.&amp;nbsp; The price of Fortran pointers' ease of use is that a Fortran pointer contains more information than just a memory address.&amp;nbsp; You can't just pass a Fortran pointer into C space and expect C to understand it, or vice versa.&amp;nbsp; Fortran 2003 has an ISO_C_BINDING module whose C_F_POINTER subroutine solves this particular problem:&amp;nbsp; it pulls in a C pointer from C space, and associates it to a Fortran pointer along with a rank and shape.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;I'm currently using the ISO_C_BINDING module to build a C interface to my Fortran library.&amp;nbsp; It's easy enough that it makes me wonder why many other languages don't come equipped with a language-level C binding!&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-3902511791414078471?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/3902511791414078471/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=3902511791414078471' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3902511791414078471'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3902511791414078471'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2009/03/dark-secrets-of-art-c-pointers-and.html' title='Dark secrets of the art:  C pointers and Fortran pointers'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-5310089841027292201</id><published>2009-02-14T23:45:00.000-08:00</published><updated>2009-02-14T23:45:59.424-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nerdy'/><title type='text'>You might be an LAPACK geek if...</title><content type='html'>Here's a quiz modeled after the infamous "purity tests," except that it has nothing to do with goats and everything to do with &lt;a href="http://www.netlib.org/lapack/"&gt;LAPACK&lt;/a&gt;, the most awesome body of Fortran ever created.&amp;nbsp; Score +1 for a yes answer, 0 for a no answer, and add up the points.&lt;br /&gt;&lt;br /&gt;n00bs:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;1. Have you ever called an LAPACK routine from one of your codes?&lt;br /&gt;&lt;br /&gt;2. Do you know what the abbreviation LAPACK stands for?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;3. Do you know how to &lt;i&gt;pronounce&lt;/i&gt; LAPACK?&amp;nbsp; (No, it's not "la" like "lalala"...)&lt;br /&gt;&lt;br /&gt;4. Are you aware that using the reference BLAS implementation is generally a bad idea, and do you know how to seek out a better one?&lt;br /&gt;&lt;br /&gt;&lt;a href="http://c2.com/cgi/wiki?SmugLispWeenie"&gt;"Smug LAPACK weenies"&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;5. Have you ever built the reference LAPACK library from the Fortran, not because you didn't know it was installed but because you (a) wanted the latest version, or (b) it wasn't available on the exotic prototype hardware which you have the misfortune of being asked to use?&lt;br /&gt;&lt;br /&gt;6. Have you ever found and reported an actual bug in some LAPACK implementation?&amp;nbsp; (One extra point if you went to the vendor's booth at a conference specifically to report the bug.)&lt;br /&gt;&lt;br /&gt;7. Did you learn Fortran specifically to call BLAS or LAPACK routines, because it "seems more natural" than using the CBLAS header files?&lt;br /&gt;&lt;br /&gt;8. Do you get the math joke on the cover of the &lt;a href="http://www.netlib.org/lapack/lug/"&gt;LAPACK Users' Guide&lt;/a&gt;?&lt;br /&gt;&lt;br /&gt;1337 |-|/\x0rz&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;9. Can you explain from memory what the first three letters of "DGEQRF" mean?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;10. Have you ever contributed to LAPACK code?&lt;br /&gt;&lt;br /&gt;11. Are you an author of an LAPACK Working Note?&lt;br /&gt;&lt;br /&gt;12. Do you belong to a UC Berkeley or UTK "lapackers" e-mail list?&lt;br /&gt;&lt;br /&gt;Beyond the Pale&lt;br /&gt;&lt;br /&gt;13. If somebody asks for a banded LU factorization with partial pivoting, can you tell them the LAPACK routine name without looking?&lt;br /&gt;&lt;br /&gt;14. Do you own and proudly wear a ScaLAPACK 1.0 T-shirt?&amp;nbsp; (One extra point if you get the math joke on the front.)&lt;br /&gt;&lt;br /&gt;15. Have you found a counterexample for convergence of an eigenvalue routine?&lt;br /&gt;&lt;br /&gt;Scoring:&lt;br /&gt;1-5:&amp;nbsp; Just starting&lt;br /&gt;6-10:&amp;nbsp; Escape while you can&lt;br /&gt;11-15:&amp;nbsp; Hopeless &lt;br /&gt;&amp;gt;15:&amp;nbsp; I know exactly who you are so you'd better not skew the curve!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-5310089841027292201?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/5310089841027292201/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=5310089841027292201' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/5310089841027292201'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/5310089841027292201'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2009/02/you-might-be-lapack-geek-if.html' title='You might be an LAPACK geek if...'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-6118137369912513792</id><published>2009-01-21T12:24:00.000-08:00</published><updated>2009-01-21T13:41:46.451-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pedagogy'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><title type='text'>Modeling reality with code:  Nethack</title><content type='html'>One of my secret vices is Nethack, a classic dungeon-crawl game with an ASCII user interface, in the tradition of Rogue and Angband.&amp;nbsp; (The name "nethack" refers to the game's development as a hacking project by people distributed over the Internet, and has nothing to do with the game's content.)&amp;nbsp; Nethack is a great game, despite its ludicrously primitive graphics by modern standards, for the following reasons: &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Game play is rich and complex, but intuitive.&amp;nbsp; Rules make sense (especially if you catch the allusions to myth and modern storytelling) and actions have consequences.&amp;nbsp; Almost any object or creature can and will interact with anything else.&amp;nbsp; &lt;br /&gt;&lt;/li&gt;&lt;li&gt;Intelligent humor makes the perverse and random ways to suffer and die (almost) bearable.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The game is so challenging that even if I skim the spoilers and play with infinite lives, I can get myself into apparently inextricable situations.&amp;nbsp; Yet, it's still enjoyable without cheats and spoilers.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The display is stripped-down but suggestive.&amp;nbsp; For example, line of sight and memory interact intuitively -- both are combined into one map, which lets you fit more of the map on one screen (unlike, say, a first-person shooter, which keeps the map separate from the view).&amp;nbsp; &lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Nethack also makes a good motivating example for teaching new programmers how to model reality with code.&amp;nbsp; The game has some elements which are representative of real-life situations:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Agents have limited information, and may even forget things.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Things happen beyond agents' awareness.&lt;/li&gt;&lt;li&gt;Attempted actions may not succeed: luck (which may improve with skill) plays a role.&amp;nbsp; (Nethack, like most games, models luck probabilistically.)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Actions usually change the world state irreversibly.&lt;/li&gt;&lt;li&gt;Object interactions depend on the properties of &lt;i&gt;all&lt;/i&gt; the objects involved.&amp;nbsp; (In terms of object-oriented programming, your object system needs mixins.)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Some items and creatures are unique and irreplaceable.&lt;/li&gt;&lt;/ol&gt;&amp;nbsp;It also has properties which are not representative of reality:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Turn-based: there is no need to manage simultaneous attempts to change the world's state.&lt;/li&gt;&lt;li&gt;Rule-based: interactions are modeled simply, with little resort to a priori physical modeling (e.g., Newton's laws).&amp;nbsp; This means interactions are not usually computationally intensive.&lt;/li&gt;&lt;li&gt;Very simple graphics.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;These three properties make Nethack fit into a natural progression of increasingly complex situations to model.&amp;nbsp; The game belongs after explaining object-oriented programming (OOP), but before explaining concurrency.&amp;nbsp; The simple graphics let students concentrate on the algorithmic interactions (and save teaching assistants and graders a lot of trouble writing a GUI framework for them!).&amp;nbsp; These interactions are rich enough to exercise many OOP lessons, such as member and class methods, inheritance, generic (a.k.a. virtual) methods, and object factories (for uniqueness).&amp;nbsp; However, they are not generally computationally expensive, so complicated algorithms are not necessary.&lt;br /&gt;&lt;br /&gt;The chosen programming model affects the teaching approach.&amp;nbsp; For example, probabilistic success requires pseudorandom number generation, but if the programming model is functional, the novice's mental model ("rolling a die") differs from the usual stateful implementation.&amp;nbsp; Also, the lessons learned at this stage may require correction when the model is generalized to handle real-time (instead of turn-based) interactions, since multiple agents will need to draw random numbers simultaneously (parallel pseudorandom number generation). In addition, the programming language's version of OOP may make it more or less difficult to explain how to implement object interactions.&amp;nbsp; Some (such as Common Lisp) handle mixins trivially, whereas others (Java, C++) do not.&amp;nbsp; Reliance on "design patterns" to cover the programming language's deficiencies can help students deal with languages used in industry, but may also limit their flexibility with programming languages and models.&amp;nbsp; This doesn't mean a dogmatic insistence on "the one true programming language."&amp;nbsp; It's better to understand the deficiencies of one's chosen language, and learn how to work around them, than not to recognize deficiencies at all.&lt;br /&gt;&lt;br /&gt;Nethack offers another useful pedagogical feature:&amp;nbsp; it's well-crafted and balanced.&amp;nbsp; Playing it teaches students that making an application good comes not so much from learning new programming tricks, as from careful and laborious design and testing.&amp;nbsp;&amp;nbsp; "Testing" here means both correctness ("It didn't crash"), which students experience by necessity, as well as user experience testing ("It's fun to play"), which few students experience much in an undergraduate computer science curriculum.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-6118137369912513792?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/6118137369912513792/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=6118137369912513792' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6118137369912513792'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6118137369912513792'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2009/01/modeling-reality-with-code-nethack.html' title='Modeling reality with code:  Nethack'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-3927185440998835884</id><published>2008-12-06T13:24:00.000-08:00</published><updated>2008-12-06T17:54:48.296-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='typesetting'/><title type='text'>Making LaTeX builds faster</title><content type='html'>&lt;p&gt;I do most of my home computing on a first-generation Asus Eee -- with the seven-inch screen and all (though it's hooked up to a big monitor and ergonomic keyboard -- I'd go nuts otherwise!).  The little Eee is terrifically portable (only 2.2 lbs, and fits easily with fully open screen on an airplane tray) but is a bit underpowered -- mine has an Intel Celeron processor overclocked to 900 MHz, and presumably the memory bandwidth is much less than that of my  workstation at work.  I notice especially how underpowered the Eee is when I typeset documents with LaTeX.  Of course, making the most of underpowered hardware is a great nerd exercise ;-) so I'm posting some tips about how to do that here.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;The first trick I tried was to &lt;a href="http://magic.aladdin.cs.cmu.edu/2007/11/02/precompiled-preamble-for-latex/"&gt; precompile the document header. &lt;/a&gt;  This saved a little bit of time but not much.  Plus, the binary format of the precompiled preamble is not portable between different versions of the LaTeX toolchain, even when running under the same OS and brand of CPU.  I left it in just to save 0.4 or so seconds per document pass.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;Precompiling the preamble didn't save much time, so it seemed then that processing the (100-page) document was taking most of the time.  This made me think about the TeX build process.  Prof. Knuth considered a build an interactive process:  TeX runs, gives some warnings about overly long lines (with locations so you can correct them later), and on an error stops with a prompt for interactive editing.  This was probably because at the time Knuth developed TeX, building the document was a long computation.  One would want to interact with it in midstream to correct errors, rather than breaking off a build, editing, and starting the build over again.  This is partly why it's so much trouble to write a good Makefile for TeX documents -- Knuth set the tools to give you a lot of feedback by default, and you're supposed to read the feedback to tell whether you need to let a tool make another pass over the document.  On a fast computer, all the informative messages and warnings produced by the TeX toolchain zip by too fast to see.  On my Eee, however, they don't -- which made me wonder about the expense of terminal output itself.  TeX has an option ("-interaction batchmode") to turn off all the informative output.  Using this brought the time for a complete build (pdflatex once, bibtex once, then pdflatex twice) down from 30 seconds to 6 seconds!&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;Another optimization is &lt;a href="http://www.ics.mq.edu.au/~rdale/resources/writingnotes/latexstruct.html"&gt; taking advantage of the \include and \includeonly comamnds&lt;/a&gt; before the document is finished.  The \include command generates a .aux file for each included file.  If you add all the included files in the \includeonly list once and do a complete build, then you'll generate .aux files for all the included files.  This gives LaTeX references and page locations for all the files.  Then, you can remove files from the \includeonly list if you're not working on them, and only build the parts of the document in which you're interested.  This technique also preserves page numbers, equation numbers, and other reference-related things.  I was able to optimize a single pdflatex step down from 3.5 seconds to 2.5 seconds with this technique.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;The \include command has a side effect: it adds a page break after the included file.  (Presumably this lets LaTeX compute page numbers without needing to regenerate the .aux file.)  For shorter documents, I prefer to use \input to add files, because it doesn't create page breaks.  Since I'm writing a longer document with chapters and LaTeX inserts a page break at the end of each chapter anyway, I can use \include for "chapter files" and within each chapter file, use \input to include section files.  &lt;a href="http://www.ics.mq.edu.au/~rdale/resources/writingnotes/latexstruct.html"&gt;This website&lt;/a&gt; recommends solving the page break problem by using \include when editing, and \input when publishing, but for me, using \include for chapters reflects how I work on the document anyway.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;LaTeX is missing some features that could make a build go a lot faster.  First, building a document doesn't work like building a C library:  compilation of individual files isn't independent of the whole build.  I have a long document and I've taken the care to break it up into modular files; this helps with revision control, but LaTeX can't exploit this very effectively (other than with the \includeonly trick).  Second, LaTeX must pass several times over the same document in order to get references right; in Knuth's time, this probably was meant to save memory, but now it would be nice to use more memory in order to save some passes.&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-3927185440998835884?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/3927185440998835884/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=3927185440998835884' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3927185440998835884'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3927185440998835884'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/12/making-latex-builds-faster.html' title='Making LaTeX builds faster'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-8639973256275767775</id><published>2008-10-06T12:38:00.000-07:00</published><updated>2008-10-06T13:36:50.445-07:00</updated><title type='text'>Copying by hand</title><content type='html'>&lt;p&gt;A couple weeks ago, I noticed in a &lt;a href="http://www.boingboing.net/2008/09/26/play-cheat-program.html"&gt;BoingBoing post&lt;/a&gt; the following quote:&lt;br /&gt;&lt;blockquote&gt;Before literacy, we were mere listeners. We heard stories read to us as a group. After the printing press, we were elevating to individuals, each with our own, acknowledged perspective on what we read.&lt;br /&gt;&lt;/blockquote&gt;I made a comment (as "hilbertastronaut"), and I feel like expanding on&lt;br /&gt;it a bit more.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;The writer argues that the printing press elevated us (Westerners,&lt;br /&gt;perhaps?) from mere listeners to individuals with opinions.  If&lt;br /&gt;anything, the opposite might be true.  Back when the only way to&lt;br /&gt;propagate information was to copy it by hand, it was standard practice&lt;br /&gt;to add commentary to the margins of the document. This could include&lt;br /&gt;clarifications or even personal reflections. One sees this in Western&lt;br /&gt;monastic manuscripts as well as Chinese brush paintings (where&lt;br /&gt;standard practice was to indicate receipt of the painting by marking&lt;br /&gt;it with one's personal stamp, and then perhaps write one's thoughts on&lt;br /&gt;the piece in the margin).  In contrast, the printing press (and the&lt;br /&gt;Reformation in some sense) gave people the idea of a book as an&lt;br /&gt;official, received text. Written comments have a lesser status than&lt;br /&gt;the printed text. For example, if one buys a used textbook at a&lt;br /&gt;college bookstore, one generally hopes that it hasn't been written on&lt;br /&gt;much. Before the printing press, the commentary could often be equally&lt;br /&gt;or more valuable than the main text itself.  &lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;Copying other kinds of information by hand besides text also has&lt;br /&gt;the same effect.  For example, when J. S. Bach copied Italian&lt;br /&gt;concerti, he added inner voices and ornamentation, making the concerti&lt;br /&gt;into pieces universally recognized as his own compositions (and&lt;br /&gt;arguably more interesting than the originals!).  Bach treated copying&lt;br /&gt;both as an end (to get his hands on other people's music), and as a&lt;br /&gt;means (to improve his compositional skills).  Even today, mathematics&lt;br /&gt;teachers consider copying out and working through proofs an important&lt;br /&gt;part of learning mathematics.  This is because mathematics, like music&lt;br /&gt;composition, is not about memorization of facts but about developing&lt;br /&gt;one's creative problem-solving ability according to a combination of&lt;br /&gt;rules and aesthetic principles.  Copying a proof and understanding&lt;br /&gt;each step is a guided re-creation of the original author's work.&lt;br /&gt;Furthermore, creative mathematical thinkers in this process of&lt;br /&gt;re-creation usually have new things to say, like clarifications or&lt;br /&gt;generalizations, which they "write in the margins" and use in their&lt;br /&gt;research -- just like Bach wrote inner voices and ornamentations "in&lt;br /&gt;the margins" to produce new compositions.  &lt;/p&gt;&lt;br /&gt;&lt;p&gt;My sophomore high school English teacher handed out red pens on&lt;br /&gt;the first day of class, and exhorted us, even &lt;i&gt;required&lt;/i&gt; us to&lt;br /&gt;write in our books with them.  At the time, it seemed ridiculous to&lt;br /&gt;ruin books which could be reused for the next class.  Pencil&lt;br /&gt;annotations could at least be erased.  Now I understand that using a&lt;br /&gt;red pen elevates the status of the handwritten text, to be more&lt;br /&gt;visible and more permanent even than the printed text.  It makes the&lt;br /&gt;reader's commentary just as valuable, if not more so, than the&lt;br /&gt;original document.   Whether my commentary really was more valuable&lt;br /&gt;than what I read at the time is doubtful, but at least the long-term&lt;br /&gt;lesson of the value of "manual" commentary stuck in my head. &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-8639973256275767775?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/8639973256275767775/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=8639973256275767775' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/8639973256275767775'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/8639973256275767775'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/10/copying-by-hand.html' title='Copying by hand'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-8988473128545293338</id><published>2008-08-06T09:48:00.000-07:00</published><updated>2008-08-06T09:54:05.250-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hacks'/><title type='text'>ImageMagick crops your white space!</title><content type='html'>&lt;a href="http://www.imagemagick.org/script/index.php"&gt;ImageMagick&lt;/a&gt; is a fabulous command-line tool for image processing.  I use it a lot for converting between different image formats, which it does with no fuss whatsoever -- as the following PNG to PDF example illustrates:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;$ convert in.png out.pdf&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;ImageMagick can do much, much more than conversions.  Today I learned the following trick from an&lt;br /&gt;&lt;a href="http://studio.imagemagick.org/pipermail/magick-users/2002-May/002462.html"&gt;e-mail list discussion&lt;/a&gt;: &lt;br /&gt;&lt;pre&gt;&lt;br /&gt;$ convert -trim img.pdf&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;which trims all the white space from around an image.  It's a great trick for inserting Matlab plots into papers!  I used to do this by hand with an image editing program like the GIMP; it's great to know that I don't have to fire up such a massive tool in order to accomplish this simple task.  (The GIMP is a wonderful program, incidentally, but starting it up just to crop some white space is a task beneath its mighty powers.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-8988473128545293338?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/8988473128545293338/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=8988473128545293338' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/8988473128545293338'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/8988473128545293338'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/08/imagemagick-crops-your-white-space.html' title='ImageMagick crops your white space!'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-3644046400383624629</id><published>2008-08-05T22:56:00.000-07:00</published><updated>2008-08-05T22:59:57.578-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Lisp'/><title type='text'>New version of ECL is out</title><content type='html'>A &lt;a href="http://sourceforge.net/forum/forum.php?forum_id=853748"&gt;new version&lt;/a&gt; of &lt;a href="http://sourceforge.net/projects/ecls/"&gt;Embeddable Common Lisp&lt;/a&gt; (ECL) is out!!!  I use ECL in my projects to support Lisp as an embedded scripting language in large C-based projects.  ECL's chief developer (Juan Jose Garcia Ripoll) is amazingly responsive, which makes the library a pleasure to use.  Yay ECL!!!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-3644046400383624629?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/3644046400383624629/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=3644046400383624629' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3644046400383624629'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3644046400383624629'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/08/new-version-of-ecl-is-out.html' title='New version of ECL is out'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-6812677308923989647</id><published>2008-06-30T09:11:00.000-07:00</published><updated>2008-06-30T09:25:54.634-07:00</updated><title type='text'>(Pseudo)random numbers matter</title><content type='html'>&lt;p&gt;&lt;br /&gt;The newly released &lt;a href="http://www.netlib.org/lapack/lawnspdf/lawn206.pdf"&gt;LAPACK Working Note #206&lt;/a&gt; gives yet another reason why generating good pseudorandom numbers matters:  &lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;In May 2007, a large high performance computer manufacturer ran a twenty-hour long High Performance Linpack benchmark.  The run fails with the following output:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;|| A x - b ||_oo / ( eps * ||A||_1 * N ) = 9.22e+94 ...... FAILED&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;What happened was that the benchmark's matrix generator uses a lame linear congruential pseudorandom number generator, which causes generated matrices to have repeated columns for certain unfortunate choices of matrix dimension.  This of course makes one wonder why the generator doesn't just make a matrix which is known to be invertible, say, by generating a sufficiently nonzero diagonal matrix and hitting it on both sides with orthogonal transforms until the zeros are filled in.  Regardless, the bug meant 20 hours of very expensive, intensely power-consuming supercomputer time were wasted on computing the wrong answer to a problem which at such sizes very few people need to solve.  So, random numbers do matter ;-)&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-6812677308923989647?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/6812677308923989647/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=6812677308923989647' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6812677308923989647'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6812677308923989647'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/06/pseudorandom-numbers-matter.html' title='(Pseudo)random numbers matter'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-2076237574714966178</id><published>2008-06-26T10:51:00.000-07:00</published><updated>2008-06-26T11:00:35.231-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nerdy'/><title type='text'>Contemplation leads to messes</title><content type='html'>&lt;p&gt;I like to make espresso using &lt;a href="http://www.aerobie.com/Products/aeropress_story.htm"&gt;one of these gadgets&lt;/a&gt;.  When you press out all the water, it leaves a "puck" of compressed grounds, which you can pop out straight into the garbage or compost just by pushing on the handle.  I was looking at the puck this morning and saw some lovely patterns made by different layers of grounds.  One thin layer in the puck appeared more compressed than the other layers, but this more compact layer wasn't a straightforward horizontal cross section -- it had a ridged topography like one sees in the sedimentary rock hills around here.  It made me wonder about the bulk properties of solid particles, about which I had seen an interesting presentation a few months before, on simulating such materials.  &lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;As I was holding the puck in my hand and examining this layer from all angles, the puck suddenly broke into its constituent coffee grounds and made a huge mess all over the floor.  I realized then that I had fallen into the nerd's usual trap of neglecting practical matters for the sake of contemplating lovely abstractions ;-)&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-2076237574714966178?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/2076237574714966178/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=2076237574714966178' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/2076237574714966178'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/2076237574714966178'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/06/contemplation-leads-to-messes.html' title='Contemplation leads to messes'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-4956314315162931453</id><published>2008-06-17T15:27:00.000-07:00</published><updated>2008-06-17T15:28:44.346-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='programming'/><category scheme='http://www.blogger.com/atom/ns#' term='horror'/><title type='text'>The horror</title><content type='html'>&lt;a href="http://www.wiley.com/legacy/compbooks/catalog/12974-7.htm"&gt;Add "MS Visual" in front to get the full effect.&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-4956314315162931453?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/4956314315162931453/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=4956314315162931453' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/4956314315162931453'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/4956314315162931453'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/06/horror.html' title='The horror'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-985205056723482688</id><published>2008-05-15T11:34:00.000-07:00</published><updated>2008-05-15T11:36:27.720-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='security'/><category scheme='http://www.blogger.com/atom/ns#' term='freedom'/><title type='text'>Securing your laptop from customs officials</title><content type='html'>Be sure to read Bruce Schneier's Guardian article on &lt;a href="http://www.guardian.co.uk/technology/2008/may/15/computing.security"&gt;how to secure your laptop from customs officials&lt;/a&gt;.  Why should you trust them not to steal your credit card numbers, just because they're wearing a badge?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-985205056723482688?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/985205056723482688/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=985205056723482688' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/985205056723482688'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/985205056723482688'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/05/be-sure-to-read-bruce-schneiers.html' title='Securing your laptop from customs officials'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-3067479338058304440</id><published>2008-05-13T12:02:00.000-07:00</published><updated>2008-05-14T13:00:24.459-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='security'/><title type='text'>Verifying identity: Existence and uniqueness</title><content type='html'>Some friends of mine have been playing an online diplomacy and military simulation game, in which each player represents an independent country.  One of the biggest problems in the game is "multis" -- multiple accounts created by one person.  The one person uses the multiple nations to magnify his/her power in the game.  The rules prohibit this practice, but it can be hard to detect.  Restricting each player to use only one IP address is annoying (as it prevents multiple people from using the same computer), but a start.  The underlying problem is how to establish &lt;span style="font-style: italic;"&gt;uniqueness&lt;/span&gt; of identity:  Are you the only person who claims to be you?&lt;br /&gt;&lt;br /&gt;Another identity problem is that of &lt;span style="font-style: italic;"&gt;verification&lt;/span&gt;:  Are you the person whom you claim to be?  This was a problem faced by Daniel Plainview, the protagonist of the 2008 movie &lt;a href="http://www.imdb.com/title/tt0469494/"&gt;There Will Be Blood&lt;/a&gt;.  Was the man claiming to be his brother Henry actually that person?  Daniel is understandably mistrustful.  Ultimately the question is resolved accidentally, by the exchange of a bit of personal information -- an oblique reference to an inside joke.  When verifying identity, one has to take care not to give away information when asking for it:  for example, if you give a bank your social security number, the bank could then use it to masquerade as you.  Some recent security research has addressed the question of how to verify identity without giving away a secret.  Daniel Plainview's homegrown solution is just as effective:  inside jokes rely not only on objective information, but on a particular emotional response which would be very hard for either a machine or an unrelated human to mimic.  In that sense, they are even better than &lt;a href="http://en.wikipedia.org/wiki/Captcha"&gt;CAPTCHA&lt;/a&gt;s:  They give away very little information and are very difficult for unauthorized agents to solve.&lt;br /&gt;&lt;br /&gt;Uniqueness is an easy problem to solve in person, aside from comedy sketches involving twins.  This is because human "duplicates" (genetically identical multiple births) are difficult to "make."  In contrast, it's very hard to solve remotely and electronically.  If you can fake a verification test, then you can break a uniqueness test.  So verification and uniqueness go hand-in-hand.&lt;br /&gt;&lt;br /&gt;Oftentimes in math, there's a fruitful tension between existence and uniqueness.  When one wishes to prove that "there exists a unique object," one generally proves existence and uniqueness separately.  (In this context, "uniqueness" means that if such a thing does exist, it must be unique.  So uniqueness by itself doesn't necessarily imply existence, nor does existence by itself necessarily imply uniqueness.)  I'm curious whether this tension could be helpful in the field of verifying identity.  Answering the question "Am I Jane Doe?" is relatively easy, but the question "If I am really Jane Doe as I claim to be, then there is only one such person" doesn't even seem to be the right question to answer.  Furthermore, it shouldn't be necessary to reveal your true identity in order to play an online game.  Can uniqueness be solved without verification?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-3067479338058304440?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/3067479338058304440/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=3067479338058304440' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3067479338058304440'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3067479338058304440'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/05/verifying-identity-existence-and.html' title='Verifying identity: Existence and uniqueness'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-1991198125767485421</id><published>2008-05-07T11:07:00.000-07:00</published><updated>2008-05-14T14:29:27.169-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='multicore'/><category scheme='http://www.blogger.com/atom/ns#' term='LAPACK'/><category scheme='http://www.blogger.com/atom/ns#' term='linear algebra'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel'/><title type='text'></title><content type='html'>Two new &lt;a href="http://www.netlib.org/lapack/lawns/downloads/"&gt;LAPACK Working Notes&lt;/a&gt; have been published!&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.netlib.org/lapack/lawns/downloads/"&gt;&lt;/a&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;"LAPACK Working Note 199: Regular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion."&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;by Fred G. Gustavson, Jerzy Wasniewski, and Jack J. Dongarra&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;UT-CS-08-614, April 28, 2008.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;"LAPACK Working Note 200: Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures." &lt;/div&gt;&lt;div&gt;by Marc Baboulin, Jack  Dongarra and Stanimire Tomov&lt;div&gt;UT-CS-08-615, May 6, 2008.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;You can download them at the above link.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-1991198125767485421?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/1991198125767485421/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=1991198125767485421' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/1991198125767485421'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/1991198125767485421'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/05/two-new-lapack-working-notes-have-been.html' title=''/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-1218039453507667013</id><published>2008-05-02T15:52:00.001-07:00</published><updated>2008-05-02T15:54:28.105-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='religion'/><title type='text'>The Other Blog</title><content type='html'>I post on religious, musical, and other matters at &lt;a href="http://hilberts-thoughts.blogspot.com/"&gt;my other blog&lt;/a&gt;, which you are welcome to visit :-)  I just put up a small sermon on the Ascension as the first post.  Eventually I'll clean up some older essays for reposting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-1218039453507667013?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/1218039453507667013/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=1218039453507667013' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/1218039453507667013'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/1218039453507667013'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/05/other-blog.html' title='The Other Blog'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-9103822999915945900</id><published>2008-04-28T21:53:00.000-07:00</published><updated>2008-04-30T08:55:50.490-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HPC'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel'/><category scheme='http://www.blogger.com/atom/ns#' term='Microsoft'/><category scheme='http://www.blogger.com/atom/ns#' term='business'/><title type='text'>Microsoft comes over to play</title><content type='html'>Monday and Tuesday this week:  a new (three months old!) numerical libraries group from Microsoft came over to speak with us linear algebra hackers and parallel performance tuners.  Today we did most of the talking, but we learned something from them:  They aren't from MS Research, and they aim to do applied research (not "blue-sky research," as the group's manager put it) with a 2-5 year horizon, and then transition successful, desired prototypes out into a production group.  They've been incubating some scientific libraries for about two years, and they want to push it out into a core scientific library (sparse and dense linear algebra for now).  Target hardware is single-node multicore -- no network stuff -- and they are especially interested in higher-level interface design for languages like C++, C#, and IronPython, built on both native and managed code.  ("Managed code" means heavyweight runtimes like the JVM and .NET -- .NET is a big thing for them in improving programmer productivity, and these runtimes have a growing ecosystem of friendly higher-level languages.) Their group is pretty small now, but they are actively hiring (in case any of you are looking for a job in Redmond-world), and they have some bright folks on their team.&lt;br /&gt;&lt;br /&gt;One of our biggest questions was, and probably still is, "why?" -- why wouldn't third-party applications suffice?  Then again, one could ask the same of a company like Intel -- why do they need a large in-house group of HPC hackers?  However, there's a difference:  MS doesn't have the personpower or expertise to contribute as much to, say, ScaLAPACK development as Intel has, nor do they intend to grow their group that much.  This team seems to be mainly focused on interface design:  how to wrap efficient but hard-to-use scientific codes so that coders in the MS world can exploit them.  In that context, my advisor showed one of his favorite slides:  three different interfaces to a linear solver (solve Ax = b for the vector x, where b is a known vector and A is a matrix).  One is Matlab's:  "A \ b".  The other two are two possible invocations of ScaLAPACK's parallel linear solver.  The second of these has nearly twenty obscurely named arguments relating to the data distribution (it's a parallel distributed-memory routine) and to iterative refinement -- clearly not what you want to give to a 22-year-old n00b fresh out of an undergrad CS curriculum who knows barely enough math to balance a checkbook.  Ultimately, MS has to design programmer interfaces for these people, as well as for gurus -- which is something that the gurus often forget.&lt;br /&gt;&lt;br /&gt;Another reason perhaps for the "why" is that high-performance, mathematical calculations are a compelling reason to buy new computing hardware and software.   There are interesting performance-bound consumer applications being developed, most of which have some kind of math kernel(s) at their core.  MS presumably wants to get in on that action, especially as it is starting to lose out on the shrink-wrapped OS-and-Office software market as well as the Web 2.0 / search market. &lt;br /&gt;&lt;br /&gt;It's interesting to watch a large, bureaucratic company like Microsoft struggling to evolve.  IBM managed this sort of transition pretty well, from what I understand.  They still sell mainframes (and supercomputers!), but they also sell services, and maintain a huge and successful research effort on many fronts.  MS Research is also a powerhouse, but somehow we don't see the research transitioning into products, or even driving the brand, as it does in IBM's case (think about the  Kasparov-defeating chess computer &lt;a href="http://www.research.ibm.com/deepblue/"&gt;Deep Blue&lt;/a&gt;,&lt;span style="text-decoration: underline;"&gt;&lt;/span&gt; for example).  Maybe it does drive the products, but somehow the marketing has failed to convey this.  I kind of feel for them, just like the "Mac guy" feels for the "PC guy" in Apple's ad series:  They have to struggle not only to command a new market, but also to reinvent their image and command a brand.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-9103822999915945900?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/9103822999915945900/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=9103822999915945900' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/9103822999915945900'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/9103822999915945900'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/04/microsoft-comes-over-to-play.html' title='Microsoft comes over to play'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-3962323436478410181</id><published>2008-04-17T17:05:00.000-07:00</published><updated>2008-04-17T17:06:55.298-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='engineering'/><category scheme='http://www.blogger.com/atom/ns#' term='relativity'/><title type='text'>Engineering applications of general relativity?</title><content type='html'>The first engineering application of general relativity: GPS. Apparently, global positioning requires accurate time computations, and once you can measure time to sufficient accuracy, you need to account for time dilation due to gravity.&lt;br /&gt;&lt;br /&gt;(Originally posted 16 April 2008.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-3962323436478410181?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/3962323436478410181/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=3962323436478410181' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3962323436478410181'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3962323436478410181'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/04/engineering-applications-of-general.html' title='Engineering applications of general relativity?'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-3299730706756152381</id><published>2008-04-17T17:04:00.001-07:00</published><updated>2008-04-17T17:05:11.622-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='numerical analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='preconditioner'/><category scheme='http://www.blogger.com/atom/ns#' term='Krylov'/><title type='text'>Preconditioners as eigenvalue clusterers</title><content type='html'>At the Copper Mountain Iterative Methods conference this year, I heard a really good explanation of how preconditioners work, from a PhD student at the University of Manchester. (Thanks Richard!) He explained to me that one can think of a Krylov subspace method as polynomial interpolation of the spectrum of the matrix. Each time you do a matrix-vector multiply, you increase the polynomial's degree by one. (This gets more clear if you've seen the proof that conjugate gradients is optimal in the norm induced by the matrix.)&lt;br /&gt;&lt;br /&gt;If the matrix's eigenvalues are evenly distributed, it will get harder and harder to interpolate the spectrum, because of the Gibbs phenomenon. In contrast, if one eigenvalue is separated from the others, it's easy to interpolate that part of the spectrum accurately and "pick off" the separated eigenvalue. This is called "deflation." After this point, the Krylov method converges pretty much as if the separated eigenvalue wasn't there. This makes the goal of a good preconditioner clear: It should separate and cluster the eigenvalues (away from the origin), so that the Krylov method can pick off the clusters.&lt;br /&gt;&lt;br /&gt;If I may also engage in some speculation, this perhaps also explains one reason why block preconditioners are popular in multiphysics solvers: they group together eigenmodes that are on similar scales, and separate unrelated eigenmodes. Of course, block preconditioning is the mathematical equivalent of encapsulation in the software engineering world: it lets you solve one problem at a time and then stitch together a combined solution out of the components. Mushing together all the subproblems into a single big problem loses the structure, whereas sparse linear algebra is all about exploiting structure.&lt;br /&gt;&lt;br /&gt;(Originally posted 16 April 2008.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-3299730706756152381?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/3299730706756152381/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=3299730706756152381' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3299730706756152381'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/3299730706756152381'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/04/at-copper-mountain-iterative-methods.html' title='Preconditioners as eigenvalue clusterers'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-6143496819018316898</id><published>2008-04-17T16:56:00.000-07:00</published><updated>2008-04-17T17:03:42.744-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hermetic'/><category scheme='http://www.blogger.com/atom/ns#' term='science'/><category scheme='http://www.blogger.com/atom/ns#' term='alchemy'/><title type='text'>Alchemy: pride and wonder</title><content type='html'>The website of NOVA, a PBS science program, has &lt;a href="http://www.pbs.org/wgbh/nova/newton/alch-newman.html"&gt;a neat article&lt;/a&gt; which decodes a page from Isaac Newton's alchemy notebooks, and discusses Newton's interest in alchemy in its historical context. Alchemy is a fascinating subject because it shows the tight links between science, philosophy, and religion at the very beginning of what we consider "modern" science, and also because it illustrates the tension between rationalization (wanting to understand or control the universe by discovering or imposing universal laws that govern it) and wonder (non-rational awe of nature). Newton himself shows both tendencies. The article mentions how he secretly called himself "Jehovah the Holy One," and William Blake depicts both Newton and God bearing the compass to measure and bound Creation. Yet, Newton saw his discoveries as if he were just a child picking up a pretty shell on the beach and barely intuiting the existence of countless more in the ocean before him.&lt;br /&gt;&lt;br /&gt;Alchemy's all-encompassing vision arguably failed to persist into its successor sciences. By 1800 or so, Friedrich von Hardenberg (a.k.a. Novalis, a geologist and poet) was already condemning the "Scheidekunst" of the sciences: how they were dissected cadaver-like into subfields, without regard for the living interactions between fields. In some sense, modern physics with quantum mechanics inherited the idea of reaching for a universal theory: "grand unified theory" and "quantum gravity" reveal this desire to explain all interactions at both tiny and grand scales. They also attract their share of superstition and quackery (e.g., the use of "quantum tangling between doctor and patient" to explain how homeopathic medicine supposedly works), just as alchemy did in its time. Yet Novalis' and Newton's pursuit of universal natural philosophy as a spiritual activity shows the importance of non-rational awe, wonder, and curiosity in the sciences. Even such an avowed atheist as Carl Sagan could wax eloquent at the "billions and billions" of stars and speak tenderly of our planet as a "pale blue dot." (I've never heard an atheist write more reverently than he!)&lt;br /&gt;&lt;br /&gt;(Originally posted 31March 2008.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-6143496819018316898?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/6143496819018316898/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=6143496819018316898' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6143496819018316898'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/6143496819018316898'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/04/alchemy-pride-and-wonder.html' title='Alchemy: pride and wonder'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2572826743364240863.post-875814070926315501</id><published>2008-04-17T16:52:00.000-07:00</published><updated>2008-04-17T16:56:06.307-07:00</updated><title type='text'>(Re)new(ed) blog</title><content type='html'>Welcome to my (re)new(ed) blog!&lt;br /&gt;&lt;br /&gt;I deleted my old blog in order to start fresh.  This new version will only have "nerdy" posts -- i.e., about the sciences or related recreational interests.  I will spawn off different blog(s) for other topics, in particular, about music, religion, and politics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2572826743364240863-875814070926315501?l=hilbertastronaut.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hilbertastronaut.blogspot.com/feeds/875814070926315501/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2572826743364240863&amp;postID=875814070926315501' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/875814070926315501'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2572826743364240863/posts/default/875814070926315501'/><link rel='alternate' type='text/html' href='http://hilbertastronaut.blogspot.com/2008/04/renewed-blog.html' title='(Re)new(ed) blog'/><author><name>HilbertAstronaut</name><uri>http://www.blogger.com/profile/11443786031975040593</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/__2n7uXKh6gw/TFXiQ-VH_TI/AAAAAAAABkQ/ypK4h2QHS2A/S220/Franz_Gareis_-_Novalis.jpg'/></author><thr:total>2</thr:total></entry></feed>
