﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++ Blog - C++ Coder - Post Category: OpenCL</title><link>http://www.cppblog.com/jackdongy/category/20074.html</link><description>HPC: high-performance computing architecture and implementation, compiler instruction optimization, algorithm optimization,
  LLVM   CLANG   OpenCL   CUDA   OpenACC    C++AMP   OpenMP   MPI</description><language>zh-cn</language><lastBuildDate>Wed, 20 Feb 2013 07:10:01 GMT</lastBuildDate><pubDate>Wed, 20 Feb 2013 07:10:01 GMT</pubDate><ttl>60</ttl><item><title>A Brief Look at Load-Balancing Design for Multi-Node CPU+GPU Cooperative Computing</title><link>http://www.cppblog.com/jackdongy/archive/2013/02/17/197878.html</link><dc:creator>jackdong</dc:creator><author>jackdong</author><pubDate>Sun, 17 Feb 2013 05:27:00 GMT</pubDate><guid>http://www.cppblog.com/jackdongy/archive/2013/02/17/197878.html</guid><wfw:comment>http://www.cppblog.com/jackdongy/comments/197878.html</wfw:comment><comments>http://www.cppblog.com/jackdongy/archive/2013/02/17/197878.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/jackdongy/comments/commentRss/197878.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/jackdongy/services/trackbacks/197878.html</trackback:ping><description><![CDATA[<p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;"><div>http://blog.csdn.net/zhang0311/article/details/8224093</div>In recent years, hybrid heterogeneous computing systems built from CPUs and GPUs have become a hot research topic in high-performance computing, both in China and abroad. In practice, many CPU+GPU heterogeneous systems have delivered good performance. However, for a variety of historical and practical reasons, heterogeneous computing still faces many problems. The most prominent is that programs are hard to develop, and the difficulty grows further when scaling up to a whole cluster. It shows up mainly in scalability, load balancing, adaptivity, communication, and memory.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">I.&nbsp;&nbsp;&nbsp; CPU+GPU cooperative computing model</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Figure 1 shows a CPU+GPU heterogeneous cooperative computing cluster. Such a cluster has three levels of parallelism:
- parallelism across nodes;
- heterogeneous CPU/GPU parallelism within a node;
- parallelism within a device (CPU or GPU).

These three levels give the CPU+GPU cooperative computing model: distributed across nodes + heterogeneous within a node + shared memory within a device.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Distributed across nodes</p><p style="margin: 0px; padding: 0px; color: #333333; 
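font-family: Arial; line-height: 26px; background-color: #ffffff;">In a CPU+GPU heterogeneous cluster, the nodes are connected over a network just as in a traditional CPU cluster. Computation across nodes is therefore distributed and can be programmed with MPI message passing.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Heterogeneous within a node</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">In a CPU+GPU heterogeneous cluster, each node contains multi-core CPUs and one or more GPU cards, so the node itself is heterogeneous. It uses a master/slave programming model: every GPU card must be driven by a CPU process or thread.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Each node also has a fairly large number of CPU cores with considerable compute power, so in most cases the CPUs also take part in some of the parallel computation. Depending on whether the CPUs participate, CPU+GPU heterogeneous computing can be divided into two modes:</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CPU/GPU cooperative computing: the CPU handles only serial work such as complex logic and transaction processing, while the GPUs perform the large-scale parallel computation;</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CPU+GPU joint computing: one CPU process/thread handles serial work such as complex logic and transaction processing. The remaining CPU processes/threads handle a small part of the parallel computation, and the GPUs handle most of it.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">The CPU/GPU cooperative mode is simpler than the CPU+GPU joint mode. The rest of this article therefore uses the joint mode as the example when describing the programming models.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">In the CPU+GPU joint mode, all the CPUs in a node are treated together as one device. For example, a dual-socket node with two 8-core CPUs has 16 cores, and those 16 cores together form one device. Each GPU card is a separate device. With this division, MPI processes or OpenMP threads can manage the communication and data partitioning among the devices in a node.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Shared memory within a device</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: 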
#ffffff;">1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CPU device: all the multi-core CPUs in a node share memory, so they are treated as a single device. The parallel computation on their cores can be controlled with MPI processes, OpenMP threads, or Pthreads.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;GPU device: a GPU has its own DRAM and likewise uses a shared-memory model internally. Its many cores are programmed with CUDA or OpenCL. CUDA is supported only on NVIDIA GPUs, whereas OpenCL is supported on both NVIDIA and AMD GPUs.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Table 1 summarizes the resulting programming models for CPU+GPU heterogeneous cooperative computing, using MPI and OpenMP as examples.</p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;"></p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;"><img src="http://img.my.csdn.net/uploads/201211/26/1353891940_7225.jpg" alt="" style="margin: 0px; padding: 0px; border: none;" /><br style="margin: 0px; padding: 0px;" /></p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Figure 1. CPU+GPU heterogeneous cooperative computing architecture</p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Table 1. Programming models for CPU+GPU heterogeneous cooperative computing</p><div align="center" style="margin: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;"><table border="1" cellspacing="0" cellpadding="0" style="margin: 0px; padding: 0px;"><tbody style="margin: 0px; padding: 0px;"><tr style="margin: 0px; padding: 0px;"><td rowspan="2" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">&nbsp;</p></td><td rowspan="2" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">Across nodes (distributed)</p></td><td rowspan="2" style="margin: 0px; padding: 0px;"><p 
align="center" style="margin: 0px; padding: 0px;">Within a node (heterogeneous)</p></td><td colspan="2" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">Within a device (shared memory)</p></td></tr><tr style="margin: 0px; padding: 0px;"><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">CPU</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">GPU</p></td></tr><tr style="margin: 0px; padding: 0px;"><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">Model 1</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">MPI</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">OpenMP</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">OpenMP</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">CUDA/OpenCL</p></td></tr><tr style="margin: 0px; padding: 0px;"><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">Model 2</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">MPI</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">MPI</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">OpenMP</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">CUDA/OpenCL</p></td></tr><tr style="margin: 0px; padding: 0px;"><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">Model 3</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; 
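padding: 0px;">MPI</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">MPI</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">MPI</p></td><td valign="top" style="margin: 0px; padding: 0px;"><p align="center" style="margin: 0px; padding: 0px;">CUDA/OpenCL</p></td></tr></tbody></table></div><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">II.&nbsp;&nbsp;&nbsp; Load-balancing design for CPU+GPU cooperative computing</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Using Model 2 as the example, this section briefly describes task partitioning and load balancing for multi-node CPU+GPU cooperative computing. Figure 2 shows how Model 2 maps processes and threads onto CPU cores and GPU devices. With a master/slave MPI scheme, one extra process (rank 0) is launched on node 0 as the master and controls all the other processes. Each node runs three compute processes: two each drive one GPU device, and one drives all the remaining CPU cores in parallel. Inside a GPU the parallelism comes from CUDA/OpenCL; inside the CPU device it comes from OpenMP multithreading.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">The CPU+GPU cooperative model has three levels, so load balancing must be designed at each of them. In Model 2, MPI processes are used both within and across nodes, which merges those two levels into one. Load balancing then only has to be achieved in three places:
- among processes (that is, among devices);
- among the OpenMP threads inside the CPU device;
- among the CUDA threads inside each GPU device.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Within a device, which uses a shared-memory model, the OpenMP threads on the CPU device can use schedule(static), schedule(dynamic), or schedule(guided). On a GPU device it is enough to keep the threads within each warp balanced.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">In CPU+GPU cooperative computing, the CPU and the GPUs differ greatly in compute capability. The CPU device and the GPU devices therefore cannot be given equal amounts of tasks or data, which makes balancing the load between them harder. Dynamic load balancing works best between CPU and GPU, but some applications cannot be partitioned dynamically and must use a static partition. Both approaches are described below.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: 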
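#ffffff;">1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Dynamic partitioning: for many HPC applications, the load between CPU and GPUs can be balanced dynamically. For example, suppose there are N tasks (or data blocks) and each node has two GPU cards, i.e. three devices (the CPU and two GPUs). Each device first takes one task and computes it. As soon as it finishes, it fetches the next task without waiting for the other devices, until all N tasks are done. This only requires one master process on the cluster that hands out tasks/data to the compute processes.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Static partitioning: some applications cannot be partitioned dynamically and need a static partition instead. A static partition makes load balancing across heterogeneous devices difficult and sometimes impossible. For iterative applications, a learning-based partition can be used:
- let the CPU and the GPU each perform the same amount of computation once;
- derive the ratio of their compute capabilities from their run times (each device's share is proportional to its measured speed, i.e. inversely proportional to its run time);
- partition the data in that ratio.</p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;"></p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;"><img src="http://img.my.csdn.net/uploads/201211/26/1353891946_9356.jpg" alt="" style="margin: 0px; padding: 0px; border: none;" /><br style="margin: 0px; padding: 0px;" /></p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Figure 2. Schematic of CPU+GPU cooperative computing (example with 2 GPUs per node)</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">III.&nbsp;&nbsp;&nbsp; Example of data partitioning for CPU+GPU cooperative computing</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Suppose an application has the data characteristics shown in Figure 3.

On the output side, computing each result value requires information from all of the input data, while the computations of different output values have no data dependencies on one another. This can be written as out<sub style="margin: 0px; padding: 0px;">j</sub> = Σ<sub style="margin: 0px; padding: 0px;">i</sub> f(in<sub style="margin: 0px; padding: 0px;">i</sub>, j), where f(in<sub style="margin: 0px; padding: 0px;">i</sub>, j) is the contribution of input i to output j.

On the input side, every input value affects every output value, and the inputs are likewise independent of one another.

Given these characteristics, the application can be parallelized by partitioning either the inputs or the outputs. The following analyzes how to partition the data for CPU+GPU cooperative computing.</p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;"></p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;"><img 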
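src="http://img.my.csdn.net/uploads/201211/26/1353891950_9638.jpg" alt="" style="margin: 0px; padding: 0px; border: none;" /><br style="margin: 0px; padding: 0px;" /></p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Figure 3. Example of the parallel data</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Partitioning by input data</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">If the data is partitioned by input, the inputs can be assigned to the CPU and GPU devices dynamically, which achieves dynamic load balancing. However, every thread then writes its results to the same output locations. For correctness, every thread must update each result with atomic operations, which severely hurts performance; in the extreme case the threads effectively run one after another. This approach therefore performs poorly.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Partitioning by output data</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Partitioning by output lets each thread compute results at different positions, so the computations are completely independent.

With a static partition, load balance is hard to achieve because the CPU and GPUs differ in compute capability. A dynamic partition does achieve it. Each CPU or GPU device is given one block of the output at a time, and as soon as it finishes that block it requests the next one from the master process, which yields full load balance.

Whether static or dynamic, partitioning by output causes another problem. Every result needs all of the input, so every process (device) must read the entire input at least once, and more than once under a dynamic partition. When the input is large, this puts heavy pressure on input I/O, which is likely to become the program's performance bottleneck.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Partitioning by both input and output</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Partitioning by input alone and by output alone each have drawbacks, so the data can instead be partitioned by input and output at the same time, as shown in Figure 4.</p><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">On the output side, every compute process (device) holds its own copy of the results. The threads inside a device compute those results in parallel, so each device ends up with a partial (local) result. After all devices finish, the MPI processes reduce the partial results of all devices, and the result of the reduction is the final answer.</p><p style="margin: 0px; padding: 0px; 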
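color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">The combined input-and-output partitioning can be illustrated with a small single-machine sketch. This is a minimal illustration under stated assumptions, not the author's implementation:
- Python threads stand in for the compute processes (devices).
- A shared queue of input chunks stands in for the master process that hands out work dynamically.
- A final element-wise sum stands in for the MPI reduction.

The per-input contribution function contribution(x, j) and the names cooperative_compute, num_workers, and chunk are hypothetical placeholders.</p><pre>
```python
import queue
import threading


def contribution(x, j):
    # Hypothetical stand-in for the contribution of one input value x to output j.
    return x * (j + 1)


def cooperative_compute(inputs, num_outputs, num_workers=3, chunk=4):
    # Input side: split the inputs into chunks; the shared queue plays the master process.
    tasks = queue.Queue()
    for start in range(0, len(inputs), chunk):
        tasks.put(inputs[start:start + chunk])

    # Output side: every worker ("device") owns a private copy of all outputs.
    partials = [[0] * num_outputs for _ in range(num_workers)]

    def worker(w):
        local = partials[w]
        while True:
            try:
                block = tasks.get_nowait()  # fetch the next chunk as soon as this worker is idle
            except queue.Empty:
                return                      # no work left
            for x in block:
                for j in range(num_outputs):
                    local[j] += contribution(x, j)  # private copy, so no atomics are needed

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Stand-in for an MPI sum reduction: add the partial outputs element-wise.
    return [sum(p[j] for p in partials) for j in range(num_outputs)]


if __name__ == "__main__":
    print(cooperative_compute(list(range(1, 11)), 3))  # [55, 110, 165]
```
</pre><p style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Each worker accumulates into its own private copy of the outputs, so no atomic operations are needed during the computation. The cost moves into a single reduction at the end. Under this scheme, a faster device would simply pull more chunks from the queue. For example, with inputs 1 to 10, three outputs, and contribution(x, j) = x * (j + 1), the result is [55, 110, 165].</p><p style="margin: 0px; padding: 0px; 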
color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">On the input side, the input data is assigned dynamically to the compute processes (devices), which keeps all compute processes load-balanced.</p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;"></p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;"><img src="http://img.my.csdn.net/uploads/201211/26/1353891955_9226.jpg" alt="" style="margin: 0px; padding: 0px; border: none;" /><br style="margin: 0px; padding: 0px;" /></p><p align="center" style="margin: 0px; padding: 0px; color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff;">Figure 4. Example of data partitioning for CPU+GPU cooperative computing</p><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/jackdongy/" target="_blank">jackdong</a> 2013-02-17 13:27</div>]]></description></item><item><title>VLIW on Cypress and vector addition</title><link>http://www.cppblog.com/jackdongy/archive/2013/01/09/197154.html</link><dc:creator>jackdong</dc:creator><author>jackdong</author><pubDate>Wed, 09 Jan 2013 08:37:00 GMT</pubDate><guid>http://www.cppblog.com/jackdongy/archive/2013/01/09/197154.html</guid><wfw:comment>http://www.cppblog.com/jackdongy/comments/197154.html</wfw:comment><comments>http://www.cppblog.com/jackdongy/archive/2013/01/09/197154.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/jackdongy/comments/commentRss/197154.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/jackdongy/services/trackbacks/197154.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: http://devgurus.amd.com/thread/158866 VLIW on Cypress and vector 
addition (question marked as assumed answered). cadorino, 2012-7-2 10:31 AM: Hi to everybody. I'm thinking about VLIW utilization on a 5870 HD. Suppose you have ...&nbsp;&nbsp;<a href='http://www.cppblog.com/jackdongy/archive/2013/01/09/197154.html'>Read more</a><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/jackdongy/" target="_blank">jackdong</a> 2013-01-09 16:37</div>]]></description></item><item><title>Low ALUBusy and low FetchUnitBusy</title><link>http://www.cppblog.com/jackdongy/archive/2013/01/09/197153.html</link><dc:creator>jackdong</dc:creator><author>jackdong</author><pubDate>Wed, 09 Jan 2013 08:26:00 GMT</pubDate><guid>http://www.cppblog.com/jackdongy/archive/2013/01/09/197153.html</guid><wfw:comment>http://www.cppblog.com/jackdongy/comments/197153.html</wfw:comment><comments>http://www.cppblog.com/jackdongy/archive/2013/01/09/197153.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/jackdongy/comments/commentRss/197153.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/jackdongy/services/trackbacks/197153.html</trackback:ping><description><![CDATA[<div>http://devgurus.amd.com/thread/158866<br /><div><div j-op="" j-rc4=""  "="" style="margin: 50px 0px 0px 54px; outline: 0px; font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; vertical-align: baseline; -webkit-background-clip: padding-box; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; position: relative; -webkit-box-shadow: rgba(0, 0, 0, 0.117647) 0px 1px 1px; box-shadow: rgba(0, 0, 0, 0.117647) 0px 1px 1px; background-image: url(http://devgurus.amd.com/5.0.1/images/steelhead/opstripe.png); 
background-color: #ffffff; line-height: 1.5; color: #575757; background-position: 100% 0%; background-repeat: no-repeat repeat;"><div j-rc4=""  "="" style="margin: -1px 4px -1px -1px; padding: 8px 16px 4px; border-width: 1px 0px 1px 1px; border-top-style: solid; border-bottom-style: solid; border-left-style: solid; border-top-color: #bebebe; border-bottom-color: #bebebe; border-left-color: #bebebe; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; -webkit-background-clip: padding-box; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px;"><header style="margin-bottom: 18px; overflow-x: hidden;"><h1><a href="http://devgurus.amd.com/message/1279678#1279678" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-weight: inherit; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #009966; text-decoration: initial;">Low ALUBusy and low FetchUnitBusy</a></h1><p jive-answer-type-notanswered=""  font-color-meta"="" style="margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #8b8b8b;">This question is&nbsp;<strong style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #c20000;">not answered.</strong></p><span style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-size: 0.9em; font-family: inherit; vertical-align: baseline; position: absolute; top: -23px; left: 8px; height: 20px; display: block; overflow: hidden; white-space: nowrap; width: 670.7000122070313px;"><strong style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"><a href="http://devgurus.amd.com/people/NURBS" data-externalid="" data-username="NURBS" data-avatarid="-1" id="jive-18954413315648021248072" data-userid="189544" data-presence="null"  jive-username-link"="" style="margin: 0px; padding: 0px 3px 0px 0px; border: 0px; outline: 0px; font-weight: inherit; font-style: inherit; font-size: 1.1em; font-family: inherit; vertical-align: baseline; color: #009966; text-decoration: initial;">NURBS</a></strong>&nbsp;2012-3-19 1:35 PM</span></header><section style="margin-bottom: 32px;"><div style="margin: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; overflow-x: auto; overflow-y: hidden;"><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;">Hi,</p><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: 
baseline;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; When my kernel performs badly, the APP Profiler reports very low ALUBusy and low FetchUnitBusy (both less than 10%).</p><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; What can be the bottleneck here? Could it be because of the high number of code paths?</p><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; min-height: 8pt; height: 8pt;">&nbsp;</p><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; min-height: 8pt; height: 8pt;">&nbsp;</p><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;">Thanks</p><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;">NURBS</p></div></section><div clearfix"="" style="margin: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; min-height: 0px;"></div></div></div><ul jive-discussion-threaded=""  jive-discussion-indent-0"="" style="margin: 5px 0px; padding: 0px; border: 0px; outline: 0px; font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; vertical-align: baseline; list-style-type: none; position: relative; color: #575757; line-height: 16px; background-color: #ffffff;"><li id="discussion-1279893" style="margin: 33px 0px 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"><div id="thread-message-1279893"  clearfix"="" style="margin: 0px; padding-top: 17px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; min-height: 0px;"><div j-thread-post-wrapper="" j-rc4="" 
jive-content=""  j-helpful"="" style="margin: 0px 0px 0px 54px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; -webkit-background-clip: padding-box; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; position: relative; -webkit-box-shadow: rgba(0, 0, 0, 0.117647) 0px 1px 1px; box-shadow: rgba(0, 0, 0, 0.117647) 0px 1px 1px; background-image: url(http://devgurus.amd.com/5.0.1/images/steelhead/helpstripe.png); line-height: 1.5; background-position: 100% 0%; background-repeat: no-repeat repeat;"><div j-rc4"="" id="1279893" style="margin: -1px 4px -1px -1px; padding: 8px 16px 4px; border-width: 1px 0px 1px 1px; border-top-style: solid; border-bottom-style: solid; border-left-style: solid; border-top-color: #bebebe; border-bottom-color: #bebebe; border-left-color: #bebebe; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; -webkit-background-clip: padding-box; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px;"><a name="1279893" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"></a><header style="margin-bottom: 18px; overflow-x: hidden;"><h6><span style="margin: 0px 8px 0px 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; display: block; float: left; position: relative;"><span font-color-helpful"="" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-size: 1em; font-family: inherit; vertical-align: baseline; color: #c9891a;">Helpful answer</span></span><strong style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"><a href="http://devgurus.amd.com/message/1279893#1279893" style="margin: 0px; padding: 0px; border: 0px; 
outline: 0px; font-weight: inherit; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #999999; text-decoration: initial;">Re: Low ALUBusy and low FetchUnitBusy</a></strong></h6><span "="" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-size: 0.9em; font-family: inherit; vertical-align: baseline; position: absolute; top: -23px; left: 8px; height: 20px; display: block; overflow: hidden; white-space: nowrap; width: 670.7000122070313px;"><strong style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"><a href="http://devgurus.amd.com/people/pesh" data-externalid="" data-username="pesh" data-avatarid="1125" id="jive-19323913315648024540807" data-userid="193239" data-presence="null"  
jive-username-link"="" style="margin: 0px; padding: 0px 3px 0px 0px; border: 0px; outline: 0px; font-weight: inherit; font-style: inherit; font-size: 1.1em; font-family: inherit; vertical-align: baseline; color: #009966; text-decoration: initial;">pesh</a>&nbsp;</strong>2012-3-26 7:07 AM&nbsp;<span j-thread-replyto"="" style="margin: 0px; padding: 0px 0px 0px 3px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #999999;">(<a href="http://devgurus.amd.com/thread/158866#1279678" title="Go to message"  localscroll"="" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #999999; text-decoration: initial;">in reply to NURBS</a>)</span></span></header><section style="margin-bottom: 32px;"><div style="margin: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; overflow-x: auto; overflow-y: hidden;"><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;">Hi,&nbsp;<span style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; vertical-align: baseline;">NURBS!</span></p><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px; outline: 
0px; font-style: inherit; vertical-align: baseline;">Can you provide information about your device? If it's an AMD APU, there were problems with performance counters in previous versions of APP Profiler.</span></p><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; vertical-align: baseline;">Also, check the ALUPacking counter. If it has a low value, your code is VLIW-limited and ALUBusy will be poor. In that case, try to reduce data dependencies across sequential operations; this lets the compiler pack ALU instructions into VLIW bundles more effectively and make better use of the ALU resources. Also try to reduce control-flow statements, since they affect the counters too. In your situation, do you perhaps have if-statements where one branch performs a fetch and the other does some computation? That causes part of the wavefront to do the fetch first, and only after that does the rest of the wavefront do the ALU operations. 
So only part of the resources is in use at any given time.</span><br /></span></p></div></section></div></div></div><ul jive-discussion-threaded=""  jive-discussion-indent-1"="" style="margin: 5px 0px; padding: 0px; border-width: 0px 0px 0px 1px; 
border-left-style: solid; border-left-color: #dcdcdc; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; list-style-type: none; position: relative;"><li id="discussion-1279896" style="margin: 33px 0px 0px; padding: 0px 0px 0px 24px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; background-image: url(http://devgurus.amd.com/5.0.1/images/steelhead/replyarrow.png); background-color: transparent; position: relative; background-position: 0% -36px; background-repeat: no-repeat no-repeat;"><div id="thread-message-1279896"  clearfix"="" style="margin: 0px; padding-top: 17px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; min-height: 0px;"><div j-thread-post-wrapper="" j-rc4="" j-op=""  jive-content"="" style="margin: 0px 0px 0px 54px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; -webkit-background-clip: padding-box; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; position: relative; -webkit-box-shadow: rgba(0, 0, 0, 0.117647) 0px 1px 1px; box-shadow: rgba(0, 0, 0, 0.117647) 0px 1px 1px; background-image: url(http://devgurus.amd.com/5.0.1/images/steelhead/opstripe.png); line-height: 1.5; background-position: 100% 0%; background-repeat: no-repeat repeat;"><div j-rc4"="" id="1279896" style="margin: -1px 4px -1px -1px; padding: 8px 16px 4px; border-width: 1px 0px 1px 1px; border-top-style: solid; border-bottom-style: solid; border-left-style: solid; border-top-color: #bebebe; border-bottom-color: #bebebe; border-left-color: #bebebe; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; -webkit-background-clip: padding-box; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px;"><a name="1279896" style="margin: 0px; padding: 0px; border: 
0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"></a><header style="margin-bottom: 18px; overflow-x: hidden;"><h6><strong style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"><a href="http://devgurus.amd.com/message/1279896#1279896" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-weight: inherit; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #999999; text-decoration: initial;">Re: Low ALUBusy and low FetchUnitBusy</a></strong></h6><div style="margin: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; position: absolute; top: -18px; left: -56px;"><a href="http://devgurus.amd.com/people/NURBS" data-externalid="" data-username="NURBS" data-avatarid="-1"  jivett-hover-user"="" data-userid="189544" data-presence="null" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #009966; text-decoration: initial;"><img src="http://devgurus.amd.com/people/NURBS/avatar/46.png?a=-1" border="0" height="46" data-height="46" width="46" alt="NURBS" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;" /></a><span style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; display: block;"><img src="http://devgurus.amd.com/5.0.1/images/status/statusicon-47.gif" alt="Newbie" title="Newbie" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;" /></span></div><span "="" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-size: 0.9em; font-family: inherit; vertical-align: baseline; position: absolute; top: -23px; left: 8px; height: 20px; display: block; 
overflow: hidden; white-space: nowrap; width: 646.9500122070313px;"><strong style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"><a href="http://devgurus.amd.com/people/NURBS" data-externalid="" data-username="NURBS" data-avatarid="-1" id="jive-18954413315648028040496" data-userid="189544" data-presence="null"  jive-username-link"="" style="margin: 0px; padding: 0px 3px 0px 0px; border: 0px; outline: 0px; font-weight: inherit; font-style: inherit; font-size: 1.1em; font-family: inherit; vertical-align: baseline; color: #009966; text-decoration: initial;">NURBS</a>&nbsp;</strong>2012-3-26 7:57 AM&nbsp;<span j-thread-replyto"="" style="margin: 0px; padding: 0px 0px 0px 3px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #999999;">(<a href="http://devgurus.amd.com/thread/158866#1279893" title="Go to message"  localscroll"="" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #999999; text-decoration: initial;">in reply to pesh</a>)</span></span></header><section style="margin-bottom: 32px;"><div style="margin: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; overflow-x: auto; overflow-y: hidden;"><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;">I have two Radeon 6950s, running either the 12.3 driver or the new beta driver. It seems control flow was the issue; things are much better now. 
Is there an equation I can use to check that the counter values add up to 100%, so that I can be more confident I am not getting bogus numbers?</p></div></section></div></div></div><ul jive-discussion-threaded=""  
jive-discussion-indent-1"="" style="margin: 5px 0px; padding: 0px; border-width: 0px 0px 0px 1px; border-left-style: solid; border-left-color: #dcdcdc; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; list-style-type: none; position: relative;"><li id="discussion-1279898" style="margin: 33px 0px 0px; padding: 0px 0px 0px 24px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; background-image: url(http://devgurus.amd.com/5.0.1/images/steelhead/replyarrow.png); background-color: transparent; position: relative; background-position: 0% -36px; background-repeat: no-repeat no-repeat;"><div id="thread-message-1279898"  clearfix"="" style="margin: 0px; padding-top: 17px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; min-height: 0px;"><div j-thread-post-wrapper="" j-rc4=""  jive-content"="" style="margin: 0px 0px 0px 54px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; -webkit-background-clip: padding-box; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; position: relative; -webkit-box-shadow: rgba(0, 0, 0, 0.117647) 0px 1px 1px; box-shadow: rgba(0, 0, 0, 0.117647) 0px 1px 1px; line-height: 1.5;"><div j-rc4"="" id="1279898" style="margin: -1px 4px -1px -1px; padding: 8px 16px 4px; border-width: 1px 0px 1px 1px; border-top-style: solid; border-bottom-style: solid; border-left-style: solid; border-top-color: #bebebe; border-bottom-color: #bebebe; border-left-color: #bebebe; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; -webkit-background-clip: padding-box; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px;"><a name="1279898" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; 
vertical-align: baseline;"></a><header style="margin-bottom: 18px; overflow-x: hidden;"><h6><strong style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"><a href="http://devgurus.amd.com/message/1279898#1279898" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-weight: inherit; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #999999; text-decoration: initial;">Re: Low ALUBusy and low FetchUnitBusy</a></strong></h6><div style="margin: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; position: absolute; top: -18px; left: -56px;"><a href="http://devgurus.amd.com/people/pesh" data-externalid="" data-username="pesh" data-avatarid="1125"  jivett-hover-user"="" data-userid="193239" data-presence="null" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #009966; text-decoration: initial;"><img src="http://devgurus.amd.com/people/pesh/avatar/46.png?a=1125" border="0" height="46" data-height="46" width="46" alt="pesh" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;" /></a><span style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; display: block;"><img src="http://devgurus.amd.com/5.0.1/images/status/statusicon-47.gif" alt="Newbie" title="Newbie" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;" /></span></div><span "="" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-size: 0.9em; font-family: inherit; vertical-align: baseline; position: absolute; top: -23px; left: 8px; height: 20px; display: block; overflow: hidden; white-space: nowrap; width: 
623.2000122070313px;"><strong style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;"><a href="http://devgurus.amd.com/people/pesh" data-externalid="" data-username="pesh" data-avatarid="1125" id="jive-19323913315648033687877" data-userid="193239" data-presence="null"  jive-username-link"="" style="margin: 0px; padding: 0px 3px 0px 0px; border: 0px; outline: 0px; font-weight: inherit; font-style: inherit; font-size: 1.1em; font-family: inherit; vertical-align: baseline; color: #009966; text-decoration: initial;">pesh</a>&nbsp;</strong>2012-3-26 8:46 AM&nbsp;<span j-thread-replyto"="" style="margin: 0px; padding: 0px 0px 0px 3px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #999999;">(<a href="http://devgurus.amd.com/thread/158866#1279896" title="Go to message"  localscroll"="" style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; color: #999999; text-decoration: initial;">in reply to NURBS</a>)</span></span></header><section style="margin-bottom: 32px;"><div style="margin: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline; overflow-x: auto; overflow-y: hidden;"><p style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: baseline;">I'm afraid not; there is no such&nbsp;<span style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; vertical-align: baseline;">equation. First of all, when a wavefront executing on a compute unit issues a fetch instruction,&nbsp;</span>that wavefront goes to the fetch unit, where it stays until the fetch is done. Meanwhile, other wavefronts are doing calculations, or are waiting for the fetch unit to become free so they can execute their next fetch instructions. 
So while some wavefronts are doing memory reads or writes, others can do computations; in the best case both counters can reach 100%, and the ALUFetchRatio counter will equal 1. Other important counters are FetchUnitStalled and WriteUnitStalled; try to keep them close to 0. If they are too high, many wavefronts are waiting for the fetch unit to perform memory reads/writes. To improve performance, first try to use a&nbsp;<span style="margin: 0px; padding: 0px; border: 0px; outline: 0px; font-style: inherit; vertical-align: baseline;">sequential</span>&nbsp;memory access pattern, then try to use local memory if your algorithm reuses data several times within a work-group.</p></div></section></div></div></div></li></ul></li></ul></li></ul><div id="jive-thread-reply-footer" style="margin: 30px 0px 0px; padding-right: 5px; padding-left: 5px; outline: 0px; font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; vertical-align: baseline; min-height: 0px; clear: both; position: relative; text-align: right; color: #575757; line-height: 16px; background-color: #ffffff;"><a href="http://devgurus.amd.com/thread/158866#158866" style="margin: 0px 0px 0px 1em; padding: 0px; border: 0px; outline: 0px; font-style: inherit; font-family: inherit; vertical-align: bottom; color: #009966; text-decoration: initial; display: inline-block; zoom: 1;">View original thread</a></div></div></div><img src ="http://www.cppblog.com/jackdongy/aggbug/197153.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/jackdongy/" target="_blank">jackdong</a> 2013-01-09 16:26 <a href="http://www.cppblog.com/jackdongy/archive/2013/01/09/197153.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Understanding performance counters</title><link>http://www.cppblog.com/jackdongy/archive/2013/01/09/197150.html</link><dc:creator>jackdong</dc:creator><author>jackdong</author><pubDate>Wed, 09 Jan 2013 05:36:00 
GMT</pubDate><guid>http://www.cppblog.com/jackdongy/archive/2013/01/09/197150.html</guid><wfw:comment>http://www.cppblog.com/jackdongy/comments/197150.html</wfw:comment><comments>http://www.cppblog.com/jackdongy/archive/2013/01/09/197150.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/jackdongy/comments/commentRss/197150.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/jackdongy/services/trackbacks/197150.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: http://devgurus.amd.com/thread/159558 Understanding performance counters. This question is assumed to be answered. chersanya&nbsp;2012-8-5 12:03 PM: I have a kernel, and each workitem processes tens of elements (firstly perform som...&nbsp;&nbsp;<a href='http://www.cppblog.com/jackdongy/archive/2013/01/09/197150.html'>Read more</a><img src ="http://www.cppblog.com/jackdongy/aggbug/197150.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/jackdongy/" target="_blank">jackdong</a> 2013-01-09 13:36 <a href="http://www.cppblog.com/jackdongy/archive/2013/01/09/197150.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>ALUBusy question</title><link>http://www.cppblog.com/jackdongy/archive/2013/01/09/197140.html</link><dc:creator>jackdong</dc:creator><author>jackdong</author><pubDate>Wed, 09 Jan 2013 02:35:00 GMT</pubDate><guid>http://www.cppblog.com/jackdongy/archive/2013/01/09/197140.html</guid><wfw:comment>http://www.cppblog.com/jackdongy/comments/197140.html</wfw:comment><comments>http://www.cppblog.com/jackdongy/archive/2013/01/09/197140.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/jackdongy/comments/commentRss/197140.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/jackdongy/services/trackbacks/197140.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: 
http://devgurus.amd.com/thread/158655 ALUBusy question. This question has been answered. viscocoa&nbsp;2012-2-20 2:39 PM: What does ALUBusy in APP profiler really mean?&nbsp;If there is branching in a kernel, the SIMD unit wi...&nbsp;&nbsp;<a href='http://www.cppblog.com/jackdongy/archive/2013/01/09/197140.html'>Read more</a><img src ="http://www.cppblog.com/jackdongy/aggbug/197140.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/jackdongy/" target="_blank">jackdong</a> 2013-01-09 10:35 <a href="http://www.cppblog.com/jackdongy/archive/2013/01/09/197140.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Source code for a small GPU MD5 program for ATI cards, developed with the AMD APP SDK</title><link>http://www.cppblog.com/jackdongy/archive/2012/12/27/196701.html</link><dc:creator>jackdong</dc:creator><author>jackdong</author><pubDate>Thu, 27 Dec 2012 02:46:00 GMT</pubDate><guid>http://www.cppblog.com/jackdongy/archive/2012/12/27/196701.html</guid><wfw:comment>http://www.cppblog.com/jackdongy/comments/196701.html</wfw:comment><comments>http://www.cppblog.com/jackdongy/archive/2012/12/27/196701.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/jackdongy/comments/commentRss/196701.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/jackdongy/services/trackbacks/196701.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: The following code was tested on Windows 7 Home Basic with an ATI HD 5450, processing 1 million hashes per second.&nbsp;The program is simple: the host side is a single main.cpp, and the device side is a single md5.cl file. I am posting the code below; since attachments cannot be uploaded, I have put the complete project package in the shared files of group 242337476....&nbsp;main.cpp#include "CL\cl.h"#...&nbsp;&nbsp;<a href='http://www.cppblog.com/jackdongy/archive/2012/12/27/196701.html'>Read more</a><img src ="http://www.cppblog.com/jackdongy/aggbug/196701.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/jackdongy/" target="_blank">jackdong</a> 2012-12-27 10:46 <a 
href="http://www.cppblog.com/jackdongy/archive/2012/12/27/196701.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Test latency for clEnqueueNDRangeKernel</title><link>http://www.cppblog.com/jackdongy/archive/2012/12/03/195936.html</link><dc:creator>jackdong</dc:creator><author>jackdong</author><pubDate>Mon, 03 Dec 2012 13:32:00 GMT</pubDate><guid>http://www.cppblog.com/jackdongy/archive/2012/12/03/195936.html</guid><wfw:comment>http://www.cppblog.com/jackdongy/comments/195936.html</wfw:comment><comments>http://www.cppblog.com/jackdongy/archive/2012/12/03/195936.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/jackdongy/comments/commentRss/195936.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/jackdongy/services/trackbacks/195936.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: http://pastebin.com/fije3CKf#include &lt;stdlib.h&gt;#include &lt;stdio.h&gt;#include &lt;string.h&gt;#include &lt;CL/opencl.h&gt;&nbsp;cl_int cl_error; // OpenCL error codecl_device_i...&nbsp;&nbsp;<a href='http://www.cppblog.com/jackdongy/archive/2012/12/03/195936.html'>Read more</a><img src ="http://www.cppblog.com/jackdongy/aggbug/195936.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/jackdongy/" target="_blank">jackdong</a> 2012-12-03 21:32 <a href="http://www.cppblog.com/jackdongy/archive/2012/12/03/195936.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Implementing FPGA Designs with the OpenCL Standard</title><link>http://www.cppblog.com/jackdongy/archive/2012/11/22/195577.html</link><dc:creator>jackdong</dc:creator><author>jackdong</author><pubDate>Thu, 22 Nov 2012 14:00:00 
GMT</pubDate><guid>http://www.cppblog.com/jackdongy/archive/2012/11/22/195577.html</guid><wfw:comment>http://www.cppblog.com/jackdongy/comments/195577.html</wfw:comment><comments>http://www.cppblog.com/jackdongy/archive/2012/11/22/195577.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/jackdongy/comments/commentRss/195577.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/jackdongy/services/trackbacks/195577.html</trackback:ping><description><![CDATA[<p><strong><span style="font-size: 12px">Introduction to the OpenCL Standard</span></strong></p>
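<p>An OpenCL application consists of two parts. The OpenCL host program is a pure software routine written in standard C/C++ that can run on any type of microprocessor. For example, this processor can be an embedded soft-core processor in the FPGA, a hard ARM processor, or an external x86 processor, as shown in Figure 4.</p>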
<p><span style="font-size: 12px"><img style="height: 271px; width: 590px" alt="" src="http://articles.csdn.net/uploads/allimg/120906/1454102P1-0.jpg" width="647" height="311" /></span></p>
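<p><span style="font-size: 12px">At some point during the execution of this host routine, a function may require a large amount of computation and can therefore benefit from the massive parallel acceleration offered by parallel devices such as CPUs, GPUs, and FPGAs. The function to be accelerated is called an OpenCL kernel. Kernels are written in standard C, but annotated with constructs that specify the parallelism and the memory hierarchy. The example in Figure 5 performs vector addition on two arrays, a and b, and writes the result back to the output array answer. A parallel thread is used for each element of the vector, so the result can be computed very quickly when the kernel is accelerated on a device with a large number of fine-grained parallel units, such as an FPGA. The host program uses the standard OpenCL application programming interface (API) to transfer data to the FPGA, invoke the FPGA kernel, and read back the resulting data.</span></p>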
<p><span style="font-size: 12px"><img alt="" src="http://articles.csdn.net/uploads/allimg/120906/1454105644-1.jpg" width="619" height="356" /></span></p>
<p><span style="font-size: 12px"><sup>1</sup> The Khronos Group website provides a detailed description of the OpenCL standard.</span></p>
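<p>Unlike CPUs and GPUs, where parallel threads execute on different cores, FPGAs offer a different strategy. The kernel function can be translated into a dedicated, deeply pipelined hardware circuit, which is inherently multithreaded through the concept of pipeline parallelism. Each of these pipelines can be replicated multiple times to provide even more parallelism than a single pipeline. As shown in Figure 5, the vector addition kernel can be implemented by cascading functional units, one for each operation in the OpenCL description, and the resulting pipeline can be replicated to meet the throughput and latency requirements of the actual application.</p>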
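<p>Although only a simplified representation is shown, each functional unit can itself be deeply pipelined to ensure that the final circuit runs at a sufficiently high clock frequency. In addition, the compiler can build circuitry to manage communication with the external system. In this example, a DDRx controller and PHY are connected to the kernel so that it can efficiently access off-chip arrays. Similarly, PCI Express® (PCIe®) IP is instantiated automatically and connected to the kernel, so that the x86 host can communicate with the FPGA accelerator through the OpenCL API.</p>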
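<p>Advantages of Implementing the OpenCL Standard on FPGAs</p>
<p>Developing FPGA designs from an OpenCL description offers many advantages over the traditional HDL-based approach. The most significant ones are shown in Figure 6. The flow for developing software-programmable devices typically consists of conceiving the idea, programming the algorithm in a high-level language such as C, and then using an automatic compiler to create the instruction stream.</p>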
<p><span style="font-size: 12px"><img style="height: 137px; width: 533px" alt="" src="http://articles.csdn.net/uploads/allimg/120906/1454105594-2.jpg" width="575" height="121" /></span></p>
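<p><span style="font-size: 12px">Contrast this with the traditional FPGA design methodology, in which the designer's main task is to describe the hardware cycle by cycle in order to implement the algorithm. The traditional flow involves building datapaths, as shown in Figure 7, controlling those datapaths with state machines, connecting them to low-level IP cores with system-level tools (for example, SOPC Builder or Platform Studio), and dealing with timing closure, because the constraints imposed by external interfaces must be met. The goal of the OpenCL compiler is to perform all of these steps automatically, so that designers can concentrate on defining the algorithm rather than on tedious hardware design. Designing this way also makes it easy to migrate to newer FPGAs with better performance and more capabilities, because the OpenCL compiler turns the same high-level description into pipelines that take advantage of the new devices.</span></p>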
<p><span style="font-size: 12px"><img style="height: 282px; width: 556px" alt="" src="http://articles.csdn.net/uploads/allimg/120906/1454104A3-3.jpg" width="828" height="450" /></span></p>
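<p><span style="font-size: 12px">Case Study: The Monte Carlo Black-Scholes Method</span></p>
<p>One of the most important benchmarks in financial markets is computing option prices with the Monte Carlo Black-Scholes method. The method is based on stochastic simulation of the underlying stock price, averaging the expected payoff over millions of different paths. Figure 8 shows an example of such a simulation graphically.</p>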
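<p>For reference, the textbook risk-neutral model behind this method can be written as follows. The symbols S<sub>0</sub> (spot price), K (strike), r (risk-free rate), σ (volatility), T (maturity), Δt (time step), N (number of paths), and Φ (standard normal CDF) are conventional notation assumed here rather than taken from the article. The first line advances one simulated path by one step, the second is the Monte Carlo estimate of a European call price, and the third is the closed-form Black-Scholes price that the estimate converges to.</p>

```latex
\begin{aligned}
S_{t+\Delta t} &= S_t \exp\!\Big(\big(r - \tfrac{1}{2}\sigma^2\big)\Delta t + \sigma\sqrt{\Delta t}\,Z\Big),
  \qquad Z \sim \mathcal{N}(0,1) \\
\hat{C}_N &= e^{-rT}\,\frac{1}{N}\sum_{i=1}^{N} \max\big(S_T^{(i)} - K,\ 0\big) \\
C &= S_0\,\Phi(d_1) - K e^{-rT}\,\Phi(d_2),
  \qquad d_1 = \frac{\ln(S_0/K) + \big(r + \tfrac{1}{2}\sigma^2\big)T}{\sigma\sqrt{T}},
  \quad d_2 = d_1 - \sigma\sqrt{T}
\end{aligned}
```

<p>The standard error of the estimate shrinks only like 1/√N, which is why such benchmarks average over millions of paths and reward hardware that can generate them at a high rate.</p>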
<p><span style="font-size: 12px"><img style="height: 81px; width: 522px" alt="" src="http://articles.csdn.net/uploads/allimg/120906/1454106146-4.jpg" width="817" height="43" /></span></p>
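<p><span style="font-size: 12px">Figure 9 shows the high-level structure of the algorithm that performs this computation. First, a Mersenne Twister random number generator creates uniformly distributed values. This random sequence is fed into the inverse normal cumulative distribution function to produce a normally distributed sequence. These random numbers are then used to simulate the evolution of the stock price using geometric Brownian motion. At the end of each simulated path, the call option payoff is recorded, and the payoffs are averaged to produce the expected payoff. The entire algorithm is implemented in about 300 lines of OpenCL code, which is portable across FPGAs, CPUs, and GPUs.</span></p>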
<p><span style="font-size: 12px"><img style="height: 332px; width: 551px" alt="" src="http://articles.csdn.net/uploads/allimg/120906/145410J04-5.jpg" width="668" height="423" /></span></p>
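<p style="margin-bottom: 15px; font-family: Arial,Helvetica,sans-serif,宋体; margin-top: 3px; line-height: 22px"><span style="font-size: 12px">The OpenCL framework developed for Altera FPGAs produces very good benchmark results, as shown in Table 1. Targeting the Stratix® IV FPGA EP4SGX530, it exceeded both the CPU and the GPU in throughput. When executing the same code, the FPGA solution not only delivered higher throughput than the corresponding GPU but, by a conservative estimate, consumed only one fifth of the power. This combination of speed and power efficiency reduces the power requirements of compute-intensive applications.</span></p>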
<p><span style="font-size: 12px"><img style="height: 89px; width: 536px" alt="" src="http://articles.csdn.net/uploads/allimg/120906/1454103G8-6.jpg" width="608" height="59" /></span></p>
<p style="margin-bottom: 15px; font-family: Arial,Helvetica,sans-serif,宋体; margin-top: 3px; line-height: 22px"><span style="font-size: 12px">Conclusion</span></p>
<p style="margin-bottom: 15px; font-family: Arial,Helvetica,sans-serif,宋体; margin-top: 3px; line-height: 22px"><span style="font-size: 12px">Using the OpenCL standard on FPGAs can deliver significantly higher performance at lower power than current hardware architectures (CPUs, GPUs, and so on). In addition, compared with traditional FPGA development using low-level hardware description languages (HDLs) such as Verilog or VHDL, hybrid CPU+FPGA systems built with the OpenCL standard offer a clear time-to-market advantage. Altera joined the Khronos Group in 2010 and has actively contributed to the development of the standard.</span></p>
<p style="margin-bottom: 15px; font-family: Arial,Helvetica,sans-serif,宋体; margin-top: 3px; line-height: 22px"><span style="font-size: 12px">Originally published at: <a href="http://www.ednchina.com/ART_8800501745_19_35499_AN_a996b8f4.HTM">http://www.ednchina.com/ART_8800501745_19_35499_AN_a996b8f4.HTM</a></span></p><img src ="http://www.cppblog.com/jackdongy/aggbug/195577.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/jackdongy/" target="_blank">jackdong</a> 2012-11-22 22:00 <a href="http://www.cppblog.com/jackdongy/archive/2012/11/22/195577.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>