C++ Coder

HPC high-performance computing architecture, implementation, compiler instruction optimization, algorithm optimization, LLVM CLANG OpenCL CUDA OpenACC C++AMP OpenMP MPI

http://devgurus.amd.com/thread/158655

ALUBusy question

This question has been answered.

viscocoa (Newbie)
viscocoa 2012-2-20 2:39 PM

What does ALUBusy in APP profiler really mean?

 

If there is branching in a kernel, the SIMD unit will run multiple passes to cover all cases. The ALUs are actually used less efficiently. Does ALUBusy reflect this?

 

__kernel void MyKernel()
{
    float x = 0, y = 0;
    /* Even and odd work-items take different branches. */
    if (get_global_id(0) % 2 == 0)
        x++;
    else
        y++;
}

 

Should APP Profiler report ALUBusy as 100% or 50%?

 

Thank you!
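
To illustrate the multi-pass behaviour described in the question, here is a minimal C sketch that models one wavefront running the if/else under an execution mask. The 64-lane width, the `WaveResult` struct, and the function names are assumptions made for this sketch. It is a toy model only and says nothing about how APP Profiler derives its counters.

```c
#include <assert.h>
#include <stdint.h>

#define WAVE_SIZE 64 /* assumption: 64 work-items per wavefront */

/* Outcome of running the divergent branch on one simulated wavefront. */
typedef struct {
    float x[WAVE_SIZE];
    float y[WAVE_SIZE];
    int passes;          /* branch bodies actually issued */
    int active_lane_ops; /* lane slots that did useful work */
} WaveResult;

/* Count the set bits in a 64-bit execution mask. */
static int popcount64(uint64_t m) {
    int n = 0;
    while (m) {
        m &= m - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Issue one branch body; only lanes enabled in mask are updated. */
static void run_pass(WaveResult *r, uint64_t mask, int then_branch) {
    if (mask == 0)
        return; /* no lane enabled: the body can be skipped */
    for (int lane = 0; lane < WAVE_SIZE; lane++) {
        if ((mask >> lane) & 1u) {
            if (then_branch)
                r->x[lane] += 1.0f;
            else
                r->y[lane] += 1.0f;
        }
    }
    r->passes++;
    r->active_lane_ops += popcount64(mask);
}

/* Simulate the kernel body for the wavefront whose lane 0 has the
   given global ID (hypothetical helper for this sketch). */
WaveResult simulate_wave(uint64_t first_global_id) {
    WaveResult r = {0};
    uint64_t then_mask = 0;
    assert(WAVE_SIZE == 64); /* the mask type holds exactly 64 lanes */
    for (int lane = 0; lane < WAVE_SIZE; lane++) {
        if ((first_global_id + (uint64_t)lane) % 2 == 0)
            then_mask |= (uint64_t)1 << lane;
    }
    run_pass(&r, then_mask, 1);  /* "if" body, even work-items */
    run_pass(&r, ~then_mask, 0); /* "else" body, odd work-items */
    return r;
}
```

In this model, `simulate_wave(0)` issues 2 passes and performs 64 useful lane operations out of 128 issued lane slots, so half of the lanes sit idle during each pass.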

Lihan Bin
Correct answer by Lihan Bin on Mar 20, 2012 1:42 PM

ALUBusy measures the percentage of GPU time during which ALU instructions are being processed. A low ALUBusy number can have many causes, for example too few active wavefronts to hide instruction latency, or heavy memory access.

Code divergence can be measured with the VALUUtilization counter if you have SI hardware.
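
The difference between a time-based busy figure and a lane-based utilization figure can be sketched with a toy model in C. Everything here is an assumption made for illustration: the per-cycle trace format, the `STALL` marker, the 64-lane wavefront width, and both formulas. These are not the profiler's actual counter definitions.

```c
#include <assert.h>
#include <stddef.h>

#define WAVE_SIZE 64 /* assumption: 64 lanes per wavefront */
#define STALL (-1)   /* trace entry: no ALU instruction this cycle */

typedef struct {
    double busy_pct;      /* share of cycles with an ALU instruction */
    double lane_util_pct; /* share of lanes active in those cycles */
} ToyCounters;

/* trace[i] is the number of active lanes for the ALU instruction
   processed in cycle i, or STALL if none was processed. */
ToyCounters toy_counters(const int *trace, size_t n) {
    size_t alu_cycles = 0;
    long active_lanes = 0;
    for (size_t i = 0; i < n; i++) {
        if (trace[i] == STALL)
            continue;
        assert(trace[i] >= 0 && trace[i] <= WAVE_SIZE);
        alu_cycles++;
        active_lanes += trace[i];
    }
    ToyCounters c = {0.0, 0.0};
    if (n > 0)
        c.busy_pct = 100.0 * (double)alu_cycles / (double)n;
    if (alu_cycles > 0)
        c.lane_util_pct = 100.0 * (double)active_lanes /
                          ((double)alu_cycles * WAVE_SIZE);
    return c;
}
```

Under this model, a fully divergent trace such as `{32, 32, 32, 32}` is 100% busy with 50% lane utilization, while a memory-stalled trace such as `{64, STALL, STALL, 64}` is 50% busy with 100% lane utilization. The two figures therefore point at different bottlenecks.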

  • Re: ALUBusy question
    viscocoa (Newbie)
    viscocoa 2012-2-22 12:12 PM (in reply to viscocoa)

    Hi,

     

    ALUBusy is a very important factor for program optimization.

     

    When a kernel branches, some ALUs are "busy" doing useless work. I wonder whether that useless work is counted in ALUBusy.

     

    Some branches are unavoidable. If a low ALUBusy is caused by such branches, I will not try to optimize the program any further.

     

    Any suggestions will be appreciated.

     

    Vis Cocoa

    • Re: ALUBusy question
      viscocoa (Newbie)
      viscocoa 2012-3-23 3:26 PM (in reply to Lihan Bin)

      Hi Lihan Bin,

       

      Thank you very much for your answer.

       

      I need a clear definition of ALUBusy. According to your comment, inefficiency caused by branching is not counted: even if only one thread in a group is doing useful work, the ALU is 100% busy (at that moment).

       

      I think divergence is a very important factor for kernel tune-up.

       

      I did not find VALUUtilization in App Profiler. What is SI hardware?

       

      Thank you again and have a great weekend!

       

      Vis Cocoa

      • Re: ALUBusy question
        Skysnake (Newbie)
        Skysnake 2012-4-1 1:50 AM (in reply to viscocoa)

        SI is short for Southern Islands, AMD's newest GPU architecture. On the desktop, SI covers the HD79x0, HD78x0 and HD77x0 series. For mobile chips you will have to look up the model numbers yourself.

posted on 2013-01-09 10:35 by jackdong. Category: OpenCL
