Comments on "The nerdiest of the nerds: GPUs and CPUs, part 2: programming models"

---

HilbertAstronaut (2010-09-08 09:45):

Thanks for the link! I appreciate that you point out annotations as another approach to mixed GPU/CPU programming, since I'm not so familiar with it. I know annotation-based programming from OpenMP, and also from programming languages like ANSI Common Lisp, where optional type hints serve as annotations for improving performance.

A big concern for mixed GPU/CPU programming, and indeed all heterogeneous node programming, is the memory model. It's reasonable to expect the following:

1. Different devices will often have different memory spaces (and this may be good -- GPU memory sacrifices latency for bandwidth, a trade-off you may not want on a CPU).

2. Even within a single shared-memory image, memory layout matters for performance (e.g., for NUMA, or if GPU and CPU memories are mapped into a single address space but remain physically distinct for performance reasons).

3. Different devices have different memory-alignment requirements -- the way you allocate arrays and lay out structures in memory may differ from device to device.

4. Copy overhead is prohibitive for many algorithms. Sparse matrix-vector multiply, for example, is already typically memory-bandwidth-bound.

All of these things mean that managing memory -- allocation, placement, alignment, and "conditioning" (convincing the OS to put pages where they should be for best performance) -- is perhaps the most important part of heterogeneous node computing.
The most productive programming languages, libraries, or models for heterogeneous nodes will be those that help programmers manage those memory issues with minimal intervention.

In Trilinos, we aim to do this with the Kokkos Node API:
http://trilinos.sandia.gov/packages/docs/dev/packages/kokkos/doc/html/group__kokkos__node__api.html

Intel's Array Building Blocks uses a model similar to Kokkos' "compute buffers," which convinces me that this is the way things will (or should) go.

How do the HMPP directives help programmers manage memory issues in heterogeneous computing?

---

Unknown (2010-09-08 08:52):

HMPP directives were made an open standard by PathScale and CAPS, and could be an interesting addition for comparison against the CUDA/OpenCL programming models. The directive/pragma-based approach is similar to OpenMP, but tailored to manycore/GPU, and offers a number of advantages.

Here's our user guide, which should give enough detail to see what I mean:
http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf

---

Unknown (2010-09-08 08:48):

HMPP directives have been made an open standard, and the overall programming model may be an interesting third part to your article. Please do cover the pragma/directive-based approaches as well.

http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf