C++ Blog - Der See der Vergessens - Post Category - Cg Art

Pure GPU Computing Platform : NVIDIA CUDA Tutorial

周波 — Sat, 24 Feb 2007 06:42:00 GMT

Summary: A close look at CUDA, NVIDIA's newly released GPU computing platform.

Posted by 周波, 2007-02-24 14:42

Using SAH to Build kD-Trees for Fast Model Partitioning, in Practice

周波 — Thu, 15 Feb 2007 14:31:00 GMT

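Summary: The kD-tree is a variant of the binary tree, currently used mainly to accelerate traversal in ray tracing. The simplest binary search tree uses the ordering of its elements as the split point, whereas a kD-tree, in short, picks one "dimension" of the data and constructs a hyperplane that partitions the data set. For example, to partition student records and find which students have a birthday before February 18, you only need to traverse the whole set once and split the data into two groups. If a further condition is then applied to those that satisfied the first con...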
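The partitioning step described in the summary can be sketched in C++ as follows. This is only a minimal illustration, not the code from the full article: the `Student` type, its fields, and the helper names are hypothetical, and the second split on height is an assumed example of a follow-up condition applied to the first group.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record type for the student example; the fields are illustrative.
struct Student {
    int birth_month;
    int birth_day;
    int height_cm;  // assumed second "dimension" for a follow-up split
};

// True when the student's birthday falls before the given month/day.
inline bool birthday_before(const Student& s, int month, int day) {
    return s.birth_month < month ||
           (s.birth_month == month && s.birth_day < day);
}

// Split on the birthday dimension: students born before month/day are moved
// to the front. Returns the number of students in that front group.
inline std::size_t split_by_birthday(std::vector<Student>& v, int month, int day) {
    auto mid = std::partition(v.begin(), v.end(), [=](const Student& s) {
        return birthday_before(s, month, day);
    });
    return static_cast<std::size_t>(mid - v.begin());
}

// Second-level split: within the first `count` students (count <= v.size()),
// move those shorter than `height_cm` to the front. Returns that subgroup's size.
inline std::size_t split_prefix_by_height(std::vector<Student>& v,
                                          std::size_t count, int height_cm) {
    auto end = v.begin() + static_cast<std::ptrdiff_t>(count);
    auto mid = std::partition(v.begin(), end, [=](const Student& s) {
        return s.height_cm < height_cm;
    });
    return static_cast<std::size_t>(mid - v.begin());
}
```

Calling `split_by_birthday(v, 2, 18)` and then `split_prefix_by_height` on the returned count mirrors two levels of a kD-tree, each level splitting along a different dimension. A real builder would also have to choose where to split, which is where a heuristic such as SAH comes in.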

Posted by 周波, 2007-02-15 22:31

Progressive Mesh

周波 — Tue, 19 Dec 2006 05:57:00 GMT

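Exams are coming up, so there is not much time to play around. I borrowed a data structures book again to review, and I have started reading some material to prepare for the winter break.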

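Progressive Mesh is no longer a very new technique; it has been used in the HL2 Source engine. I once ran that ATi demo, the real-time rendering of the Parthenon, on a classmate's old ATi X1600XT. What struck me was not so much the technology used as the level of detail the artists put into the building model...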

I now fully understand what teamwork means. A programmer's ideas alone are not enough without carefully crafted test models from the artists.

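The idea behind OoCS(x) is to preprocess the model data, partition it with an octree, generate LOD data, and write it to disk. At run time the data is loaded, the position of the current view frustum is checked, and the scene is drawn with the appropriate LOD. For me this is quite complex; I have not worked with octrees yet, so next I need to review data structures properly.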
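The run-time half of that pipeline (test a cell against the view frustum, then pick an LOD) can be sketched roughly as below. This is not the OoCS(x) implementation: the `Cell` and `Plane` types, the bounding-sphere test, and the "one level per doubling of distance" rule are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical octree cell as it might be loaded from the preprocessed file:
// a bounding sphere plus the number of stored LOD meshes (index 0 = finest).
struct Cell {
    Vec3  center;
    float radius;
    int   lod_count;
};

// Plane in the form dot(n, p) + d >= 0 for points on the inside; n is unit length.
struct Plane { Vec3 n; float d; };

inline float dist(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Conservative visibility test: the cell is rejected only when its bounding
// sphere lies completely behind one of the six frustum planes.
inline bool sphere_in_frustum(const Cell& c, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        const float signed_dist =
            p.n.x * c.center.x + p.n.y * c.center.y + p.n.z * c.center.z + p.d;
        if (signed_dist < -c.radius) return false;
    }
    return true;
}

// Assumed LOD rule: level 0 up to base_dist from the eye, then one coarser
// level for every doubling of the distance, clamped to the coarsest level.
inline int select_lod(const Cell& c, const Vec3& eye, float base_dist) {
    float d = std::max(dist(c.center, eye) - c.radius, 0.0f);
    int level = 0;
    while (d > base_dist && level < c.lod_count - 1) {
        d *= 0.5f;
        ++level;
    }
    return level;
}
```

A renderer would walk the octree, skip every cell for which `sphere_in_frustum` returns false, and draw the mesh stored at index `select_lod(...)` for the rest.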

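Worth a special mention is the personal website of Hugues Hoppe of Microsoft Research. I have no words for it other than: impressive. The mesh optimization functions in Direct3D are probably based on his research, and his PDFs are a quick way to understand the principles.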

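Thanks to LOGOS for the question; the original post left out some connecting text. That ATi demo uses the OoCS algorithm, and Progressive Mesh shares some ideas with OoCSx; for example, both use a Quadric Matrix to evaluate vertices.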
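The post does not spell out what the Quadric Matrix computes. Assuming it refers to the quadric error metric commonly used in mesh simplification, a minimal C++ sketch looks like this: each plane contributes a 4x4 matrix, the matrices are summed per vertex, and the error of a candidate position is the sum of squared distances to those planes. All names here are illustrative.

```cpp
#include <array>

// Symmetric 4x4 quadric, stored as a full matrix for clarity.
using Quadric = std::array<std::array<double, 4>, 4>;

// Quadric of the plane ax + by + cz + d = 0, with (a, b, c) assumed to be
// unit length: K = p * p^T where p = (a, b, c, d).
inline Quadric plane_quadric(double a, double b, double c, double d) {
    const double p[4] = {a, b, c, d};
    Quadric q{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            q[i][j] = p[i] * p[j];
    return q;
}

// Quadrics accumulate by plain matrix addition.
inline Quadric add_quadrics(const Quadric& a, const Quadric& b) {
    Quadric q{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            q[i][j] = a[i][j] + b[i][j];
    return q;
}

// Error of placing a vertex at (x, y, z): v^T Q v with v = (x, y, z, 1).
// For a sum of plane quadrics this is the sum of squared plane distances.
inline double vertex_error(const Quadric& q, double x, double y, double z) {
    const double v[4] = {x, y, z, 1.0};
    double err = 0.0;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            err += v[i] * q[i][j] * v[j];
    return err;
}
```

A simplifier would rank candidate vertex merges by this error and collapse the cheapest ones first, which is one way of "evaluating vertices" with a quadric.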

Posted by 周波, 2006-12-19 13:57

The Matrix Bible for 3D Computer Graphics (Final)

周波 — Sun, 10 Dec 2006 05:46:00 GMT

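Summary: I have been programming for a while and understand most things well enough, but coordinate transformations never quite made sense to me. I would get confused at the critical moment, write transformation code at random, and of course end up baffled by the results. I read through the chapters on 3D transformations in several books and summarized them here, hoping it helps. A final disclaimer: I may have made mistakes, so I ask those in the know to point them out; I am only a student with no real work experience. I sincerely invite everyone's comments so that we can build a Matrix Bible and spare more beginners the detours. Thank you...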

Posted by 周波, 2006-12-10 13:46

A Short Interview with a Battlefield 2142 Engine Graphics Programmer

周波 — Fri, 10 Nov 2006 03:44:00 GMT

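Lately I have been battling away at BF2142 on vLan and am thoroughly hooked, so I started paying attention to the engine behind the BF series; all I knew was that the scripting part is done in Python. I found this short interview on a foreign site and have translated it here for reference.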

Continuing our series of occasional interviews with game developers about current and upcoming hardware and game graphics engines, we chat with Marko Kylmamaa, senior graphics programmer for Digital Illusions' Canadian studio.

Translator's note: this installment's interviewee is Marko Kylmamaa, senior graphics programmer at DICE.

FiringSquad: First, Intel and AMD are pushing dual core processors and within the next year four core processors are due to be released. How will DICE support this kind of tech in the Battlefield 2/2142 engine and will there be any need for special programming to fully support multi core CPUs in PCs?

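Question: Intel and AMD are currently pushing dual-core CPUs, and quad-core CPUs are due within the next year. How will DICE support this technology in the BF2/2142 engine, and does doing so require any special programming techniques?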

Marko Kylmamaa: While a program geared towards a single-core machine may run fine, with some exceptions, and perhaps even somewhat faster on a multi-core machine, in order to realize the real performance benefits a careful attention has to be paid into structuring the code for the correct granularity in mind, to make it suitable for multi-core execution. With the introduction of the next generation consoles and the PC hardware, the whole industry is in a learning phase for understanding the differences between the traditional multi-threading approaches, and multi-threading for multiple cores. DICE is working closely with hardware vendors in making sure that all of the future titles make the maximum use of the available multi-core architecture.

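Answer: A program written for a single-core machine may run fine, and sometimes even somewhat faster, on a multi-core machine. The real issue is that getting the full benefit is more complicated than on a single core (much like the pain of multithreading in general): the code has to be structured with the right granularity and synchronized correctly. As next-generation consoles and PC hardware spread, the whole industry is learning how multithreading for multiple cores differs from traditional multithreading. DICE is working closely with hardware vendors to get the most out of multi-core architectures.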

FiringSquad: The 64-bit CPU has taken longer to really appear in mainstream PCs than some people expected. Do you think 64-bit CPUs will become more popular and how does DICE support it in their Battlefield 2/2142 engine?

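Question: 64-bit CPUs have taken longer to reach mainstream PCs than some people expected. Do you think 64-bit CPUs will become popular, and how does DICE support them in the BF2/2142 engine?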

Marko Kylmamaa: One of the problems with harnessing the full power of 64-bit CPU's is the lack of adoption of 64-bit operating systems. Due to this it's difficult for the game developers to make full use of the 64-bit execution potential without providing a separate set of executables compiled for the different operating systems. The current Battlefield 2 technology has been thoroughly tested on the 64-bit architecture for guaranteeing a solid performance, and optimizations have been made where possible with such architectures in mind.

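Answer: Because 64-bit operating systems have not been widely adopted, the full power of 64-bit CPUs cannot yet be exploited. The difficulty is that without shipping separate executables compiled for the different operating systems, a game cannot make full use of 64-bit execution. BF2 has been tested on 64-bit platforms and optimized where possible.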

FiringSquad: Game physics are getting more and more attention as well with more attention being put into destructible objects and better collisions. Where does DICE stand on this kind of support for its engine and what solution is best; having a dedicated card (AGEIA) using a graphics card (ATI/Havok) or using a CPU to handle it?

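Question: Game physics is getting more and more attention. Where does DICE stand on it, and which solution do you think is best: a dedicated AGEIA physics card, a graphics card (ATI/Havok), or the CPU?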

Marko Kylmamaa: Especially with multiplayer games in mind, it is difficult to make use of scaleable physics, since especially from the gameplay perspective all of the players must experience the same end result in simulation regardless of their hardware. This leads to a lot of the scalability of the physics being used for visual effects such as richer particle effects or fluid simulation. The GPU can of course be used for offloading the physics simulation from the CPU, but this will compete with the remaining processing time for graphics. Therefore in most cases it is necessary to strike the right balance between the CPU and GPU usage with the needs of the particular game in mind. The next generation technology at DICE is being built on the bleeding edge and will make use of very comprehensive physical modeling.

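Answer: Scalable physics is hard to use in multiplayer games, because from a gameplay perspective every player must experience the same simulation result regardless of their hardware. As a result, much of the scalability in physics goes into visual effects such as richer particle effects or fluid simulation. The GPU can take over some of the physics simulation from the CPU, but that competes with graphics for processing time, so in most cases the load has to be balanced between CPU and GPU according to the needs of the particular game. DICE's next-generation technology will make use of very comprehensive physical modeling.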

FiringSquad: HDR lighting is also getting a lot of attention in more PC games. How does the Battlefield 2/2142 engine support those features and how will that help the graphics in games that use it?

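Question: HDR lighting is also being mentioned more and more. How does the BF2/2142 engine support it, and how will it improve the graphics of games that use it?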

Marko Kylmamaa: HDR lighting can add significantly to the perceived realism in the modern graphics engines. It is becoming an increasingly common feature as the new hardware supports full floating point surfaces and has the required processing power for supporting a multitude of such high end features.
Some aspects of the HDR lighting were simulated especially in the Battlefield 2 Expansion Pack: Special Forces, for adding a degree of realism to the night-time look. The effect is fairly subtle and was used mainly for fine tuning the overall look. Battlefield 2142 does not have night-time levels, so the same technology was not applicable to it, however there are a great number of special lighting effects for enhancing the desired futuristic look of the game.

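Answer: HDR lighting can add significantly to the perceived realism of a modern graphics engine. It is becoming common now that new hardware fully supports floating-point surfaces and has the processing power it requires. Some aspects of HDR were simulated in the BF2 expansion pack Special Forces to make the night-time look more realistic. BF2142 has no night-time levels, so that technology (presumably HDR) was not used, but other special lighting effects give the game its futuristic look.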

FiringSquad: More and more games are using extensive pixel and vertex shading for visual and art effects. How does the Battlefield 2/2142 engine support these features currently and how will pixel and vertex shaders be used in the future, particularly with Windows Vista and DirectX10 support?

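Question: More and more games make extensive use of pixel shaders (PS) and vertex shaders (VS) to improve image quality. How does the BF2/2142 engine support these features, and how will PS and VS be used in the future, particularly with Vista and DX10 on the way?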

Marko Kylmamaa: The Battlefield 2 engine has been built on the DirectX9 architecture and is a fully shader based model. This allowed for a great flexibility during the development, and not supporting the older fixed function pipeline model allowed us to concentrate solely on the high end features. Battlefield 2142 is based on the improved Battlefield 2 technology and will be released later this year, so considering that the DirectX10 hardware won't be widely available just yet, it hasn't been beneficial to re-architect the engine into a DirectX10 based model for this release. This allowed the available time to be used for adding a number of new special effects and polishing the overall look of the existing engine.

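Answer: The BF2 engine is built entirely on the DX9 architecture and is a fully shader-based model. That gave us great flexibility during development, and dropping the fixed-function (FF) pipeline let us concentrate on high-end features. BF2142 is based on improved BF2 technology and will be released shortly; since DX10 hardware will not be widely available that soon, we did not re-architect the engine around DX10 for this release. That left time to add new special effects and polish the look of the existing engine.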

FiringSquad: What other advanced hardware and graphical features do you think will be supported in upcoming Battlefield 2/2142 engine games and in future graphics engine?

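Question: What other advanced hardware and graphics features do you think upcoming BF2/2142 engine games will support, and what about future engines?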

Marko Kylmamaa: Battlefield 2142 will support a large range of high end special effects geared towards creating the desired futuristic look. These involve for example new atmospheric effects for creating a unique look that is quite different from Battlefield 2.

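Answer: BF2142 supports a wide range of high-end special effects aimed at the desired futuristic look. For example, new atmospheric effects give it a look quite different from BF2.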

FiringSquad: Finally, Mark Rein from Epic has said that Intel is hurting the PC gaming industry through its use of integrated graphics in PCs. Is this a real threat and if so what can be done about this from the game developer's side?

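Question: Finally, Mark Rein of Epic (do not tell me you have not heard of them: the upcoming UT2007) has said that Intel is hurting the PC gaming industry with integrated graphics. Is this a real threat, and what can game developers do about it?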

Marko Kylmamaa: Intel produces what you could call the ultra-low end graphics cards for a market segment that typically doesn't wish to invest the money into a higher end, gaming geared hardware. Clearly there is a demand for this type of hardware as Intel's graphics cards boast a large user base. However, this does impose challenges for the games industry in our attempts at reaching especially for the casual gamer market. Hardware requirements for the next generation games keep growing faster than what is needed for running general applications, which increases the rift between the casual and hardcore hardware markets. I believe that we as an industry will also have to recognize the different requirements these markets impose.
From the perspective of a developer, it can be difficult or in some cases practically impossible to make the high-end game run on the ultra-low end hardware. Supporting such scalability range in performance could be prohibitive with the required development time and cost in mind. It is ultimately up to each developer to find the correct range of hardware which allows for the desired market penetration.

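Answer: Intel makes what you could call ultra-low-end graphics hardware for people who typically do not want to spend money on gaming-grade hardware. There is clearly demand for it, given Intel's large user base, but it does make it harder for us to reach the casual gamer market. Hardware requirements for games keep growing faster than those of general applications, which widens the gap between the casual and hardcore hardware markets. I believe the industry will have to recognize the different requirements of these markets. From a developer's perspective, making a high-end game run on ultra-low-end hardware is difficult and sometimes practically impossible, because supporting such a wide performance range costs too much development time and money. Ultimately each developer has to choose the hardware range that matches the market it wants to reach.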

Posted by 周波, 2006-11-10 11:44

What Else Can a GPU Do: Brook for GPUs, Stream Computing on GPUs

周波 — Sat, 14 Oct 2006 14:21:00 GMT

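I have been studying GPGPU for a while; this time last year I was learning GLSL. Some time ago I posted a suggestion on opengl.org that GLSL should learn from the architecture of Cg and CgFX instead of having shaders used in scattered pairs. You can wrap them in your own class, but managing them becomes a real headache once there are many shaders. GLSL should follow HLSL and Cg and render by selecting a technique and a pass, which also fits the multi-pass concept.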

The SIMD performance of a GPU is far beyond that of a CPU, which gives it exceptionally strong floating-point throughput; see the figure below.

Aside: I wonder where my 6200A would rank, haha.

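The figure above is actually somewhat biased: it is excerpted from Siggraph 2004, and today the SIMD performance (not the gaming performance) of the ATi X1800XT is well beyond that of the 6800. Still, it shows that GPU floating-point performance being several times that of a CPU is an indisputable fact. The question is how to exploit it.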

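The arrival of programmable hardware has given us a good start, and perhaps the future trend in computer hardware is general-purpose computing, or Generic Computing (GC, my own coinage, not garbage collection). Graphics cards have always dealt with pixels: reading texels, processing primitives, and writing to the FrameBuffer, which laid a solid foundation for SIMD applications. Graphics chips were designed to be parallel from the start, since that is the only way reading texels from the texture units pays off; the TNT in the famous Riva TNT2 actually stands for TwiN Texel, two texels at once, not the explosive. The GeForce3 implemented vertex programming by adding a handful of expensive registers. NV acquired 3dfx and later released the NV30 series, and with DX8 having brought shaders to the PC, image quality on the PC took a leap. Most popular games today use programmable shading to achieve effects that used to be possible only on workstations, which is why real-time game graphics now look better than the FF8 cinematics that Square rendered on a cluster of SGI workstations back in the day. In fact, advanced CG theory was already quite mature by the 1980s, for example shadow mapping from 1978 and Whitted's ray tracing. I will introduce those techniques gradually in later posts; in the meantime, download an SDK from NVIDIA and study it, and the MS DX SDK is a must as well.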

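First, the limitation of using today's programmable hardware for general-purpose computing, one that, as I see it, may remain unsolved even after Vista and DX10 become mainstream: the API. The drivers that graphics vendors ship are, without exception, built entirely to serve display, not to advertise the card as a GPGPU. They do have their own native compilers (mainly for compiling GLSL source strings; HLSL can be compiled ahead of time and then loaded and executed by the driver), but these still do not serve non-graphics computation. That is how I came across Sh. Sh is an interesting piece of work: it uses metaprogramming to emulate a shading language, which is translated into the corresponding low-level ASM instructions when compiled, and many graphics slide decks use Sh to present their core algorithms. Those interested can find it via the link at the end of this post. I strongly suggest that graphics vendors release drivers that can compute directly, without involving the FrameBuffer, writing results straight to memory over the bus. Technically this is not hard; it is probably a business question. At critical moments it is always business that steers technology, and the wishful thinking of engineers alone does not change the world; this is no longer the era of the Industrial Revolution.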

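Let me introduce Brook from Stanford University (it sounds like an advertisement, but in the shading language world the Stanford Shading Language does have its place). Brook can be thought of as a C compiler, except that what it emits is not a binary but C++ source containing arrays of strings that hold shader instructions. Take this piece of Brook code, for example. I first took it for a simple alpha blend, but that is not quite right; as the kernel name says, it is a SAXPY, alpha * x + y: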

kernel void saxpy(float alpha, float4 x<>, float4 y<>,
                  out float4 result<>) {
    result = (alpha * x) + y;
}

Compiled, the final C++ code becomes:

static const char* __saxpy_fp30[] = {
"!!FP1.0\n"
"DECLARE alpha;\n"
"TEX R0, f[TEX0].xyxx, TEX0, RECT;\n"
"TEX R1, f[TEX1].xyxx, TEX1, RECT;\n"
"MADR o[COLR], alpha.x, R0, R1;\n"
"END \n"
"##!!BRCC\n"
"##narg:4\n"
"##c:1:alpha\n"
"##s:4:x\n"
"##s:4:y\n"
"##o:4:result\n"
"##workspace:1024\n"
"##!!multipleOutputInfo:0:1:\n"
"",NULL};
void saxpy (const float alpha,const ::brook::stream& x,const ::brook::stream& y,
::brook::stream& result) {
    static const void *__saxpy_fp[] = {"fp30", __saxpy_fp30, "ps20", __saxpy_ps20,
                    "cpu", (void *) __saxpy_cpu, NULL, NULL };
    static __BRTKernel k(__saxpy_fp);
    k->PushConstant(alpha);
    k->PushStream(x);
    k->PushStream(y);
    k->PushOutput(result);
    k->Map();
}

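Isn't this just pure shading language? What is worth noting, though, is that Brook wraps it in a runtime library and treats the GPU as a streaming processor controlled by the CPU, which computes over the data and writes out the result. For now it seems limited to graphics-style computation, with demos such as FFT and ray tracing, and has not reached the point of computing pi.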
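To make the stream model concrete, here is a plain C++ sketch of what the `saxpy` kernel computes for every element of its input streams. It is not Brook's actual `__saxpy_cpu` backend, which the post does not show; the `Float4` alias and the function name `saxpy_reference` are made up for illustration, and the streams are modeled as ordinary vectors.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Stand-in for Brook's float4: four packed single-precision components.
using Float4 = std::array<float, 4>;

// Reference loop for the kernel body `result = (alpha * x) + y`, applied
// independently to every stream element and every component. On the GPU
// each element would be handled by one fragment-program invocation.
inline std::vector<Float4> saxpy_reference(float alpha,
                                           const std::vector<Float4>& x,
                                           const std::vector<Float4>& y) {
    // Only process positions present in both input streams.
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    std::vector<Float4> result(n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t c = 0; c < 4; ++c)
            result[i][c] = alpha * x[i][c] + y[i][c];
    return result;
}
```

Because no element depends on any other, the loop iterations can run in any order or all at once, which is the property that lets the same kernel be mapped onto the fragment pipeline.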

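Some further thoughts. The precision problem needs solving: FP16 has only just come into wide use, and FP32 still cannot be filtered in hardware. FP32 offers only the precision of an IEEE 754 float, nowhere near that of a double, so it may not suit work that demands high precision. Computing pi to several million digits, as I had imagined, is unlikely for now. The first obstacle is that shading languages have never provided address operations: a shader cannot choose the position of the pixel it writes, so it cannot address the FrameBuffer precisely. If that problem were solved, truly general-purpose computing would become possible, and the FrameBuffer would then be nothing more than a temporary buffer.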
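The gap between float and double precision is easy to demonstrate on the CPU. The sketch below assumes an IEEE 754 platform, and the helper names are my own: single precision carries a 24-bit significand (roughly 7 decimal digits), so adding 1 to 2^24 is lost to rounding, while double precision with its 53-bit significand still represents the result exactly.

```cpp
#include <limits>

// Significand widths, counting the implicit leading bit.
// On IEEE 754 platforms these are 24 (float) and 53 (double).
constexpr int float_bits  = std::numeric_limits<float>::digits;
constexpr int double_bits = std::numeric_limits<double>::digits;

// True when adding 1 to `big` is lost to rounding in single precision.
// The volatile store forces the sum to be rounded to float.
inline bool float_drops_increment(float big) {
    volatile float sum = big + 1.0f;
    return sum == big;
}

// Same check in double precision.
inline bool double_drops_increment(double big) {
    volatile double sum = big + 1.0;
    return sum == big;
}
```

With `big` set to 16777216 (2^24), the float version reports the increment as lost and the double version does not, which illustrates why FP32 shader arithmetic is a poor fit for long high-precision computations.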

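SIMD physics computation can be extremely powerful. Physics simulation is all about things happening simultaneously, and a GPU computes in parallel, which plays directly to its strengths; no wonder NVIDIA is partnering with Havok. I remember being truly impressed by a physics engine written by a gentleman on 博客园 (cnblogs), and I would suggest he look into this area. The stream concept will be fully realized in DX10; see the DX10 article I translated earlier, in which the Geometry Shader is particularly interesting.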

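I look forward to the next generation of APIs, a brand-new combination of software and hardware that could bring a real revolution to that ancient thing, the Display Adapter. It is worth noting that AMD has already acquired ATi, while Intel is reportedly still weighing the price of a USD 10 billion acquisition of NV. Perhaps the next transformation has already begun; let us wait and see.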

The projects mentioned above can be found here:
    Brook http://sourceforge.net/projects/brook
    libSh http://sourceforge.net/projects/libsh

Posted by 周波, 2006-10-14 22:21