Parallelizing Your Program: Array of Structures vs. Structure of Arrays

Posted on 2012-06-05 10:11 by djx_zh

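We usually build programs around an array of structures (AoS), because it is object-oriented in spirit, maps naturally onto real-world entities, and produces code that is easy to read. But AoS often gets in the way of parallelizing a program.
A structure of arrays (SoA) is the opposite: it is easy to parallelize, but awkward for modeling real-world entities, and the code becomes harder to follow.
For performance you want SoA; for development productivity you want AoS. You have to choose at design time, because switching to the other layout late in development is very costly.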
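To make the difference between the two layouts concrete, here is a minimal sketch that measures how far apart consecutive r values sit in memory under each layout. The Pixel type and the helper functions are hypothetical names introduced only for this illustration; they are not part of the program discussed in this post.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pixel type, used only for this illustration.
struct Pixel { int r; int g; int b; };

// Byte distance between two addresses.
inline std::ptrdiff_t byte_distance(const void* from, const void* to) {
    return static_cast<const char*>(to) - static_cast<const char*>(from);
}

// AoS: the r fields of neighbouring elements are a whole struct apart,
// so loading several r values at once means skipping over g and b.
inline std::ptrdiff_t aos_r_stride(const Pixel* p) {
    return byte_distance(&p[0].r, &p[1].r);
}

// SoA: all r values live in their own array, so neighbours are adjacent
// and a single wide load can pick up several of them.
inline std::ptrdiff_t soa_r_stride(const int* r) {
    return byte_distance(&r[0], &r[1]);
}

// Sanity check of the two strides.
inline void check_layout() {
    Pixel aos[2] = {};
    int r[2] = {};
    assert(aos_r_stride(aos) == static_cast<std::ptrdiff_t>(sizeof(Pixel)));
    assert(soa_r_stride(r) == static_cast<std::ptrdiff_t>(sizeof(int)));
}
```

The contiguous SoA stride is what lets a vector unit process several elements with one instruction, whereas the AoS stride forces it to step over the fields it does not need.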
Intel Array Building Blocks (ArBB) provides a programming interface that lets us write SoA-based programs from an AoS point of view. That sounds convoluted, but think of it this way: the logical layer is AoS, while the physical layer is SoA.
How can we achieve this separation and mapping between the logical layer (AoS) and the physical layer (SoA) in C++?
Here is our AoS-based program:

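#include <cstddef>

#define SIZE 65536

struct RGB
{
        int r;
        int g;
        int b;
};

// Works with any container whose operator[] yields an object with r, g, b members.
template<class T>
void test(T& rgb, std::size_t n)
{
    const int count = static_cast<int>(n);
    // First loop: initialize every element.
    for(int i=0;i<count;i++){
        rgb[i].r = 3*i;
        rgb[i].g = 3*i + 1;
        rgb[i].b = 3*i + 2;
    }
    // Second loop: the one whose generated code is compared later.
    for(int i=0;i<count;i++){
        rgb[i].b=rgb[i].r + rgb[i].g;
    }
}

int main()
{
    RGB* rgb = new RGB[SIZE];
    test(rgb, SIZE);
    delete[] rgb;
    return 0;
}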

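To convert the program above to SoA, we first define a "shadow" for RGB: a struct with the same member names, where each member is a reference instead of a value.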

struct RGBshadow
{
        RGBshadow(int& r, int& g, int& b):r(r),g(g),b(b){}
        int& r;
        int& g;
        int& b;
};
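
The point of the shadow is that assigning to its members writes straight through to whatever storage the references are bound to. A minimal sketch of that behavior, assuming three plain int arrays as the backing storage; write_pixel and shadow_demo are hypothetical helpers, not part of the original program:

```cpp
#include <cassert>

// Same shadow struct as above, repeated so the sketch is self-contained.
struct RGBshadow
{
    RGBshadow(int& r, int& g, int& b) : r(r), g(g), b(b) {}
    int& r;
    int& g;
    int& b;
};

// Hypothetical helper: binds a shadow to element i of three separate
// arrays and updates that element with AoS-style member syntax.
inline void write_pixel(int* r, int* g, int* b, int i, int value)
{
    RGBshadow px(r[i], g[i], b[i]);
    px.r = value;        // writes r[i]
    px.g = value + 1;    // writes g[i]
    px.b = px.r + px.g;  // reads r[i] and g[i], writes b[i]
}

// Small check that the writes reached the arrays.
inline void shadow_demo()
{
    int r[4] = {0}, g[4] = {0}, b[4] = {0};
    write_pixel(r, g, b, 2, 10);
    assert(r[2] == 10 && g[2] == 11 && b[2] == 21);
    assert(r[1] == 0 && g[1] == 0 && b[1] == 0);  // neighbours untouched
}
```

Because the shadow holds only references, it owns no data and is cheap to create on every element access.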

Next comes a class template that defines the SoA container. It is generic and can be reused for any three-field shadow.

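#include <cstddef>
#include <mm_malloc.h>   // _mm_malloc / _mm_free (GCC, Clang)

template<class Shadow, typename T1, typename T2, typename T3>
class SOA
{
    public:
        // GCC/Clang extension: tells the compiler the pointed-to data is
        // 16-byte aligned, so it may use aligned SSE loads and stores.
        typedef T1 aligned_t1 __attribute__((aligned(16)));
        typedef T2 aligned_t2 __attribute__((aligned(16)));
        typedef T3 aligned_t3 __attribute__((aligned(16)));
    public:
        explicit SOA(std::size_t n){
            // One separately allocated, 64-byte-aligned array per field.
            r = (aligned_t1*)_mm_malloc(n*sizeof(T1), 64);
            g = (aligned_t2*)_mm_malloc(n*sizeof(T2), 64);
            b = (aligned_t3*)_mm_malloc(n*sizeof(T3), 64);
        }
        ~SOA(){
            if(r) _mm_free(r);
            if(g) _mm_free(g);
            if(b) _mm_free(b);
        }
        // Returns a shadow whose references bind to element i of each array.
        Shadow operator [] ( std::size_t i){
            return Shadow(r[i],g[i],b[i]);
        }
    private:
        // Not copyable: a copy would free the same arrays twice.
        SOA(const SOA&);
        SOA& operator=(const SOA&);

        aligned_t1* r ;
        aligned_t2* g ;
        aligned_t3* b ;
};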

#define SIZE 65536
int main()
{
  // AoS version.
  RGB* rgb = new RGB[SIZE];
  test(rgb, SIZE);
  delete[] rgb;

  // SoA version: the same test() template, unchanged.
  SOA<RGBshadow, int, int,int> soa(SIZE);
  test(soa, SIZE);
  return 0;
}

When optimization is enabled, the compiler can now auto-vectorize test(soa, SIZE).

The second for loop in test(rgb, SIZE) compiles to the following scalar code, which handles one element per iteration:

.L14:
    movl    (%rbx,%rax), %edx
    addl    4(%rbx,%rax), %edx
    movl    %edx, 8(%rbx,%rax)
    addq    $12, %rax
    cmpq    $786432, %rax
    jne .L14

The second for loop in test(soa, SIZE) compiles to the following SSE code, which handles four ints per iteration:

.L16:
    movdqa  (%rsi,%rax), %xmm0
    paddd   (%rcx,%rax), %xmm0
    movdqa  %xmm0, (%rdx,%rax)
    addq    $16, %rax
    cmpq    $262144, %rax
    jne .L16
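
The loop bounds in the two listings can be checked with a little arithmetic. The sketch below assumes a 4-byte int (which matches the movl instructions and the 12-byte stride above); all constant and function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kCount     = 65536;  // SIZE in the program above
const std::size_t kIntBytes  = 4;      // assumption: 4-byte int
const std::size_t kAosStride = 12;     // addq $12: one RGB struct of three ints
const std::size_t kSoaStride = 16;     // addq $16: one 128-bit XMM register

// Byte offset at which each loop stops (the cmpq immediates).
inline std::size_t aos_end_offset() { return kCount * kAosStride; }
inline std::size_t soa_end_offset() { return kCount * kIntBytes; }

// Number of trips around each loop.
inline std::size_t aos_iterations() { return aos_end_offset() / kAosStride; }
inline std::size_t soa_iterations() { return soa_end_offset() / kSoaStride; }

inline void check_loop_bounds()
{
    assert(aos_end_offset() == 786432u);  // cmpq $786432 in .L14
    assert(soa_end_offset() == 262144u);  // cmpq $262144 in .L16
    assert(aos_iterations() == 65536u);   // one element per iteration
    assert(soa_iterations() == 16384u);   // four elements per iteration
}
```

So the vectorized loop makes a quarter as many trips, and each trip touches only the bytes it actually needs.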

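Converting AoS to SoA takes three steps:
1. Define a shadow struct.
2. Use the SOA<shadow,...> template to define the corresponding SoA type.
3. Modify the business code. SOA<shadow,...> is used the same way as the AoS, so the code changes can be kept to a minimum.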
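
The three steps above can be sketched on a different struct. Everything here is hypothetical: Particle, ParticleShadow, SimpleSOA and advance are names invented for this example, and SimpleSOA is a simplified stand-in that uses std::vector rather than the aligned _mm_malloc storage of the SOA template above, so it illustrates the interface only and makes no claim about vectorization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original AoS element type (hypothetical).
struct Particle { float x; float v; float m; };

// Step 1: a shadow with the same member names, holding references.
struct ParticleShadow
{
    ParticleShadow(float& x, float& v, float& m) : x(x), v(v), m(m) {}
    float& x;
    float& v;
    float& m;
};

// Step 2: a simplified SoA container, one array per field.
template<class Shadow, typename T1, typename T2, typename T3>
class SimpleSOA
{
    public:
        explicit SimpleSOA(std::size_t n) : a_(n), b_(n), c_(n) {}
        Shadow operator[](std::size_t i) { return Shadow(a_[i], b_[i], c_[i]); }
    private:
        std::vector<T1> a_;
        std::vector<T2> b_;
        std::vector<T3> c_;
};

// Step 3: business code written once against the AoS-style interface.
// It accepts a Particle array as well as the SoA container.
template<class Container>
void advance(Container& p, std::size_t n, float dt)
{
    for (std::size_t i = 0; i < n; ++i) {
        p[i].x += p[i].v * dt;
    }
}

// Runs the same business code on both layouts and compares results.
inline void check_advance()
{
    Particle aos[2] = { {0.0f, 0.0f, 1.0f}, {1.0f, 2.0f, 1.0f} };
    advance(aos, 2, 0.5f);

    SimpleSOA<ParticleShadow, float, float, float> soa(2);
    soa[1].x = 1.0f;
    soa[1].v = 2.0f;
    advance(soa, 2, 0.5f);

    assert(aos[1].x == 2.0f);
    assert(soa[1].x == aos[1].x);
}
```

Only the container declaration changes between the two layouts; advance itself is untouched, which is the property step 3 relies on.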

Feedback

# re: Parallelizing Your Program: Array of Structures vs. Structure of Arrays

2012-08-06 14:14 by ningle

Looking at your blog, the UEFI applications are almost all written in C++. Is that the mainstream for UEFI application development these days?

# re: Parallelizing Your Program: Array of Structures vs. Structure of Arrays

2012-08-06 22:16 by djx_zh

@ningle
C is still the mainstream for UEFI application development. For a very large application, though, developing in C++ is more productive.
