Designing a Data-Driven Renderer

Donald Revie

 

 

4.1         Introduction

Since the advent of hardware acceleration, rendering 3D graphics has been almost entirely achieved through the use of a handful of APIs.  It is thus accepted   that almost all engines that feature real-time 3D visuals will make use of one or more of these. In fact, as computing complexity and power increase, it becomes inconceivable that a single development studio could create proprietary code to access the abilities of hardware or implement all the newest techniques in fields such as physics and AI. Thus, engine development focuses more and more on integrating functionality exposed by APIs, be that hardware, OS features, or middleware solutions [Bell 10].

Each API is built using its own model and paradigms best suited to a specific problem domain, expressing in higher-level terms the structures and concepts of that domain. Graphics APIs are no different in this respect, typically exposing the hardware as a virtual device that must send relatively large amounts of information to the hardware when rendering each frame.

The engine itself can also be thought of as an interface, exposing a superset of the functionality found in all its components in a consistent manner. This interface forms the environment within which the core logic of the game or simulation is implemented. Thus, the engine must adapt the concepts and functionality of its constituent APIs to its own internal model, making them available to the game logic.

This chapter explores designing a renderer to bridge the gap between the logical simulation at the core of most game engines and the strictly ordered stream of commands required to render a frame through a graphics API. While this is a problem solved by any program used to visualize a simulated scene, the solution presented here focuses on providing a flexible data-driven foundation on which to build a rendering pipeline, making minimal assumptions about the exact rendering style used.

 



The aim is to provide a solution that will decouple the design of the renderer from the engine architecture. Such a solution will allow the rendering technology on individual projects to evolve over their lifetime and make it possible to evaluate or develop new techniques with minimal impact on the code base or asset pipeline.

 

4.2         Problem Analysis

So far we have defined our goal as exposing the functionality of a graphics API in a way consistent with our engine, placing emphasis on flexibility of rendering style. As a top-level objective this is sufficient but provides very little information from which to derive a solution. As already stated, every API is designed to its own model requiring its own patterns of use coinciding to a greater or lesser extent with that of the engine. To determine this extent and thus the task of any renderer module, both the API and the intended engine interface must be explored in detail. Discussion of these matters will be kept at a design or conceptual level, focusing on the broader patterns being observed rather than on the specifics of implementation in code. In this way the problem domain can be described in a concise manner and the solution can be applicable to the widest possible range of graphics API and engine, regardless of language or other implementation details.

 

4.2.1       Graphics API Model

Most graphics APIs belong to one of two groups, those based on the OpenGL standards [Shreiner et al. 06] and those belonging to Microsoft’s Direct3D family of libraries [Microsoft 09]. At an implementation level, these two groups are structured very differently: OpenGL is built on the procedural model informed by its roots in the C language, and Direct3D has an object-oriented structure using the COM (Component Object Model) programming model. Despite this, the underlying structure and patterns of use are common to both groups [Sterna 10] (and most likely all other APIs as well). Above this implementation level we can work with the structures and patterns common to all graphics APIs and define a single model applicable to all.

No doubt many of these concepts will be very familiar to most graphics programmers; however, in programming, how a thing is thought about is often more important than what that thing is. Therefore, this section is intended to achieve a consensus of view that will form the foundation for the rest of the chapter. For instance, this section is not concerned with what the graphics API does, only with the patterns of interaction between it and the application.

Graphics device. The graphics API is designed to wrap the functionality of hardware that is distant from the main system architecture, be it an expansion card with self-contained processor and memory or a completely separate computer accessed via a network connection; communication between the main system and the graphics hardware has historically been a bottleneck. This distance is addressed by the concept of a virtual device; the exact term varies between APIs. Differing hardware devices, and even software in some cases, are exposed as a uniform set of functionality that can be roughly divided into three groups (a minimal interface sketch follows the list):

1.   Resource management. This controls the allocation and deallocation of device-owned memory, typically memory local to the hardware device, and controls direct access to this memory from the main system, avoiding concurrent access from both the main system and the graphics hardware.

2.   State management. Operation of the graphics hardware is too complicated to pass all pertinent information as parameters to the draw function. Instead, execution of the draw command is separated from the configuration of the device. Calling such functions does not modify the device resources in any way, and their effects can be completely reversed.

3.   Execution. This executes a draw or other command (such as clearing a framebuffer) on the current device configuration. The results may permanently modify resources owned by the device.
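To make these three groups concrete, the following is a minimal sketch of what such a virtual-device interface might look like in C++; the class and method names (IGraphicsDevice, CreateBuffer, and so on) are illustrative assumptions rather than any particular API's interface.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical handle for device-owned memory (vertex buffers, textures, ...).
using ResourceHandle = std::uint32_t;

// A minimal virtual-device interface grouping the three kinds of calls
// described above; real APIs expose far richer versions of each group.
class IGraphicsDevice {
public:
    virtual ~IGraphicsDevice() = default;

    // 1. Resource management: allocate/free device memory and copy data into it.
    virtual ResourceHandle CreateBuffer(std::size_t bytes) = 0;
    virtual void           ReleaseBuffer(ResourceHandle buffer) = 0;
    virtual void           UploadData(ResourceHandle buffer,
                                      const void* data, std::size_t bytes) = 0;

    // 2. State management: configure the pipeline; nothing is drawn yet and
    //    these calls do not touch the contents of any resource.
    virtual void SetVertexStream(int slot, ResourceHandle buffer) = 0;
    virtual void SetShader(ResourceHandle shader) = 0;
    virtual void SetRenderTarget(ResourceHandle target) = 0;

    // 3. Execution: run a command against the current configuration; results
    //    may permanently modify device-owned resources (e.g., the target).
    virtual void Draw(std::size_t vertexCount) = 0;
    virtual void Clear(float r, float g, float b, float a) = 0;
};
```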

 

Graphics pipeline. The main function of the graphics device is to execute draw commands, thus much of the device is dedicated to this task. The core of the device is structured as a series of stream processing units (Figure 4.1), each with its own specific function in transforming streams of vertex information into pixels in the framebuffer. For the purposes of this chapter, two distinct types of unit comprise the stages of the pipeline.

Figure 4.1. Pipeline (DX9).

Figure 4.2. Shader unit.

Fixed function units such as the raster, merge, and triangle assembler stages are characterized by having a set of specific state values used to configure operations during that stage of the pipeline. For instance, the raster stage supports various culling conditions that may determine the face winding during rasterization or may mask off a subsection of the framebuffer for rendering. The various different states and their descriptions fall outside the scope of this chapter—it is enough to simply define them as a group of hard-coded device states.

In contrast, the programmable shader units (Figure 4.2) have various generic resources associated with them. The nature of each unit remains fixed by its place in the pipeline, but the exact function is defined by the shader program loaded into the unit at the time a draw call is made. To accommodate this flexibility, the resources associated with shader units are not fixed to a specific meaning like the state values of other stages; instead, they are general-purpose resources that are made visible to both the shader program and application. These resources include the following:

 

Input streams. Multiple streams of equal length may be attached to the shader, and the program is executed once for each element. These are the only values that can change across all iterations of the shader in a single draw call.

 

Output streams. Each execution of the shader program will output a value into one or more output streams.


 

 

Figure 4.3. Parameter naming.

 

 

 

Constant registers. Each shader unit has a number of constant registers  that may be set before a draw call is made; the values stored will remain constant for all iterations of the shader.

 

 

Texture samplers. Textures allow the shader access to multidimensional arrays of data. This data can be randomly indexed into, using various filtering options.

 

While increasing flexibility, these generic resources pose a problem. Without imposing a standardized mapping of resources between the shader and the engine, there is little indication telling the engine which values to associate with a given register, sampler, or stream index. Enforcing such a standard would conflict with the stated goal of maximizing flexibility and thus is not acceptable.

Fortunately high-level shading languages such as HLSL, Cg, and GLSL provide a simple and elegant solution (Figure 4.3). They expose these resources as named parameters that can be defined within the shader program, either on a global scope or as arguments. Each parameter must be identified with a type and name and, in some cases, can be given an optional semantic, too. This information is also exposed to the engine, providing the same amount of detail as the fixed function state with the flexibility of being defined in data (shader code). Thus, emphasis moves from interfacing with a finite set of standardized state values to an unlimited set of values defined by a combination of type and name and/or semantic, creating a very different challenge for engine design.
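As a rough illustration of the information this makes available, the sketch below shows the kind of parameter description an engine might reflect from compiled shader code; the structure and field names are assumptions for illustration, not any specific API's reflection interface.

```cpp
#include <string>
#include <vector>

// A hypothetical record describing one shader parameter as reflected from
// the shader source: identified by type plus name and/or semantic rather
// than by a fixed, hard-coded state slot.
enum class ParamType { Float, Float4, Matrix4x4, Texture2D };

struct ShaderParameter {
    ParamType   type;
    std::string name;           // e.g., "WVm"
    std::string semantic;       // e.g., "WORLDVIEW" (may be empty)
    int         registerIndex;  // constant register or sampler slot to bind
};

// The engine sees each shader as an open-ended list of such parameters and
// must decide, per parameter, where the value will come from.
struct ShaderDescription {
    std::vector<ShaderParameter> parameters;
};
```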

It should also be noted that the structure of the pipeline is not fixed across all APIs. With each generation of hardware, new features are exposed, extending and altering the layout of the pipeline (Figure 4.4). Over the last four generations of the Direct3D API, there has been a substantial movement from fixed function to programmable stages. It can be assumed that this trend will continue in the future, and any solution should acknowledge this.


 

 

Figure 4.4. Pipeline expansions.

 

Command buffer. As the device is merely sending commands to the graphics hardware, these are not carried out immediately but are instead queued in a command buffer to be sent to the graphics hardware for execution. By examining the command buffer, we can clearly discern a pattern of use across the whole frame (Figure 4.5). Several state-setting commands are followed by the execution of a draw (or clear) command. Each recurrence of this pattern involves the configuration of the pipeline for a single draw call followed by its execution.

For the purposes of this chapter, this recurring pattern is defined as a batch. A batch describes a single unit of execution on the device that contains all the state-setting commands required to configure the pipeline and the draw command itself. Therefore, the rendering of a frame can be described as a series of batches being executed in order (Figure 4.6).
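Viewed this way, a batch can be captured as a plain data record: the state-setting commands plus the draw command they configure. The sketch below is a minimal illustration, reusing the hypothetical IGraphicsDevice interface from the earlier sketch; all names are assumptions.

```cpp
#include <functional>
#include <vector>

// Stand-in for the virtual-device interface sketched earlier (assumed).
class IGraphicsDevice {
public:
    virtual ~IGraphicsDevice() = default;
};

// One unit of execution on the device: every state-setting command needed to
// configure the pipeline, followed by the draw (or clear) command itself.
struct Batch {
    // Each command is a deferred call against the device,
    // e.g. "set this shader", "bind this vertex stream".
    std::vector<std::function<void(IGraphicsDevice&)>> stateCommands;
    std::function<void(IGraphicsDevice&)>              drawCommand;

    void Execute(IGraphicsDevice& device) const {
        for (const auto& setState : stateCommands)
            setState(device);   // configure the pipeline
        drawCommand(device);    // then execute against that configuration
    }
};

// A frame is then simply an ordered sequence of batches.
using Frame = std::vector<Batch>;
```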

Figure 4.5. Command buffer.

Figure 4.6. Batches.

Summary. In summary, it can be said that the modern graphics API does not recognize higher-level concepts that might be associated with rendering, such as lights, characters, sprites, or postprocesses. Instead, it focuses on providing a homogenizing interface that exposes a uniform set of features across a range of hardware.

A typical pattern of use involves

 

• over the course of a session, managing resources in device memory to ensure they are available during rendering;

• over the course of a frame, constructing batches by assigning appropriate shaders to the programmable units and then gathering both fixed function state information and shader parameters from the application. These batches must then be executed in the correct order to generate a command buffer for the frame.

4.2.2       Engine Model: Intended Pattern of Use

Every engine follows a design model dictated by the personal requirements and tastes of its authors; thus, in contrast to the graphics API, it is very difficult to define a general model that will fit all engines. It is, however, possible to select common examples of rendering during a typical game or simulation and from these derive patterns that could be expected to exist in most engines. By combining and refining these patterns, a general model for all rendering can be derived. These examples of rendering may often be considered the domain of separate modules within the engine, each using a model of rendering that best suits that domain. They may even be the responsibility of different authors, each with their own style of system design. By providing a single interface for rendering at a level above that of the API, it is much simpler to create rendering effects that cross the boundaries between these systems or to build new systems entirely.

3D scene rendering. Rendering 2D images of increasingly complex 3D scenes has been the driving force behind graphics development for many years now. Graphics APIs like OpenGL and Direct3D were originally designed with 3D rendering in mind. The older versions of the pipeline were entirely fixed function, consisting of stages like the transformation and lighting of vertices, focusing on processing 3D geometric data. This rigidity has been largely superseded by the flexibility of the programmable pipeline, but the focus on processing 3D geometry is still prevalent. Typically in real-time 3D applications, a complete visual representation of the scene never actually exists. Instead, the simulation at the core of the engine approximates the scene as a collection of objects with a visual representation being composited only upon rendering.

Figure 4.7. 3D scene creation.

The visual representations of objects within the scene are usually constructed in isolation as part of the asset-creation process and imported into the simulation as data (Figure 4.7). Many simulated objects—entities—of a single type may reference the same visual representation but apply their own individual attributes, such as position and orientation, when rendering. The visual scene itself is constructed around the concept of nested local spaces defined relative to one another. Meshes are described as vertex positions relative to the local space of the whole mesh, visible entities are described as a group of meshes relative to the entity's local origin, and that entity may be described relative to a containing space, all such spaces ultimately being relative to a single root origin. This system has the great advantage of allowing many entities to share the same mesh resources at different locations within the scene. By collapsing the intervening spaces, the vertex positions can then be brought into the correct world-space positions.
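Collapsing the intervening spaces amounts to concatenating each local transform with those of its containing spaces. A minimal sketch, assuming a hypothetical row-major Matrix4 type (row-vector convention) and a simple parent pointer per scene node:

```cpp
#include <array>

// Hypothetical minimal types for illustration: a 4x4 matrix stored row-major
// and a scene node holding a transform relative to its parent space.
struct Matrix4 {
    std::array<float, 16> m{};  // row-major

    static Matrix4 Identity() {
        Matrix4 r;
        r.m = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};
        return r;
    }

    Matrix4 operator*(const Matrix4& b) const {
        Matrix4 r;
        for (int row = 0; row < 4; ++row)
            for (int col = 0; col < 4; ++col) {
                float sum = 0.0f;
                for (int k = 0; k < 4; ++k)
                    sum += m[row * 4 + k] * b.m[k * 4 + col];
                r.m[row * 4 + col] = sum;
            }
        return r;
    }
};

struct SceneNode {
    Matrix4    localTransform = Matrix4::Identity();  // relative to parent
    SceneNode* parent = nullptr;
};

// Collapse the nested spaces by concatenating local transforms out to the root.
Matrix4 ComputeWorldTransform(const SceneNode& node) {
    Matrix4 world = node.localTransform;
    for (const SceneNode* p = node.parent; p != nullptr; p = p->parent)
        world = world * p->localTransform;
    return world;
}
```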

Rendering involves the further step of bringing these world-space vertices into the 2D image space of the framebuffer (Figure 4.8). To do this we need to define a point of view within the scene and a space representing the visible volume of the scene to be projected. These additional transforms are encapsulated within the camera or view frustum entity. Similarly, output information needs to be specified regarding which framebuffer and, if necessary, which subsection of that framebuffer the scene is to be rendered to. This information could be added to the camera object or embodied in further objects. Most importantly, it illustrates a key disjoint between the representational information stored within or referenced by the individual entities and the output information responsible for compositing the scene as an image.

Figure 4.8. 3D scene rendering.

Figure 4.9. Order-dependent rendering.

In some cases, such as the rendering of transparent geometry within the scene, there are further constraints placed on the order in which entities may be rendered (Figure 4.9). This is due to the use of linear interpolation when compositing such entities into the existing scene. To achieve correct composition of the final image, two additional criteria must be met. Transparent entities must be rendered after all opaque parts of the scene, and they must be ordered such that the transparent entities furthest from the camera render first and those closest render last.
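In code, these two criteria typically appear as a partition into opaque and transparent sets, with the transparent set sorted back to front by distance from the camera. A minimal sketch using hypothetical types:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical renderable record: world position plus an opacity flag.
struct Renderable {
    float x, y, z;
    bool  transparent;
};

struct Camera { float x, y, z; };

static float DistanceSq(const Renderable& r, const Camera& c) {
    const float dx = r.x - c.x, dy = r.y - c.y, dz = r.z - c.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns the draw order: all opaque entities first (any order), then
// transparent entities sorted furthest-to-nearest from the camera.
std::vector<Renderable> BuildDrawOrder(std::vector<Renderable> scene,
                                       const Camera& cam) {
    // Move opaque entities to the front of the list.
    auto firstTransparent = std::partition(
        scene.begin(), scene.end(),
        [](const Renderable& r) { return !r.transparent; });

    // Back-to-front sort of the transparent tail.
    std::sort(firstTransparent, scene.end(),
              [&](const Renderable& a, const Renderable& b) {
                  return DistanceSq(a, cam) > DistanceSq(b, cam);
              });
    return scene;
}
```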

There are occasions, such as rendering real-time reflections, where rendering the scene from the main viewpoint will be reliant on output from a previous rendering of the scene, typically from another viewpoint. This creates a scenario where the entire pipeline for rendering is instantiated multiple times in a chain. This could even occur dynamically based on the contents of the scene itself, with later instances of the pipeline being dependent on the results of those prior (Figure 4.10).

 


 

Figure 4.10. Result-dependent rendering.


 

 

Figure 4.11. Differing representations.

 

In some cases it is possible that the required output will not be identical to that of the main viewpoint. For instance, rendering a shadow map from the perspective of a light should result in only depth information. To use the same resources, such as shaders and textures, needed to produce a fully colored and lit image would be highly inefficient. Therefore, multiple potential representations need to be referenced by any given entity and selected between based on the required output (Figure 4.11).

Postprocessing. The previous section focused on projecting geometric shapes from 3D space into a 2D framebuffer. The postprocessing stage is instead mostly concerned with operations in image space. Modifying the contents of previous render targets, postprocessing systems often reference the high-level concept of image filtering [Shodhan and Willmott 10].

However, the graphics hardware is unable to both read from and write to a single framebuffer, with the exception of specific blending operations. This necessitates a pattern of use whereby the postprocessing system accesses the results of previous framebuffers as textures and writes the modified information into a new framebuffer (Figure 4.12). Each stage within postprocessing typically requires rendering as little as a single batch to the framebuffer. This batch will likely not represent an object within the simulation; it will merely consist of a screen-aligned quad that will ensure that every pixel of the source image is processed and output to the framebuffer. More complex postprocesses may require a number of stages to create the desired effect. This will result in a chain of dependent batches, each using the output of one or more previous framebuffers.

Figure 4.12. Postprocessing.

Upon examination it can then be said that the required functionality for postprocessing is not dissimilar to that required for reflection or shadow map creation, as described in the 3D scene rendering section. In fact, with regard to effects such as heat haze, there is some overlap between the two areas of rendering.
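Expressed against a hypothetical subset of the device interface sketched earlier, a single postprocessing stage might reduce to binding the previous target as a texture, binding a new target for output, and drawing a screen-aligned quad; all names below are illustrative assumptions.

```cpp
#include <cstddef>
#include <cstdint>

using ResourceHandle = std::uint32_t;

// Hypothetical subset of a device interface, just enough for one
// postprocessing stage (names are assumptions, not a real API).
struct IPostProcessDevice {
    virtual ~IPostProcessDevice() = default;
    virtual void SetRenderTarget(ResourceHandle target) = 0;
    virtual void SetTexture(int sampler, ResourceHandle texture) = 0;
    virtual void SetShader(ResourceHandle shader) = 0;
    virtual void SetVertexStream(int slot, ResourceHandle buffer) = 0;
    virtual void Draw(std::size_t vertexCount) = 0;
};

// One stage of postprocessing: read the previous framebuffer as a texture,
// write the filtered result into a new framebuffer via a screen-aligned quad.
void RunPostProcessStage(IPostProcessDevice& device,
                         ResourceHandle sourceTexture,    // result of prior stage
                         ResourceHandle destinationTarget,
                         ResourceHandle filterShader,
                         ResourceHandle fullScreenQuad)   // 4-vertex quad mesh
{
    device.SetRenderTarget(destinationTarget);   // must differ from the source
    device.SetTexture(0, sourceTexture);
    device.SetShader(filterShader);
    device.SetVertexStream(0, fullScreenQuad);
    device.Draw(4);  // one screen-aligned quad covers every output pixel
}
```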

GUI rendering. In contrast to 3D scene rendering, the GUI system typically is not interested in projecting 3D entities but operates in a much more confined volume of space encapsulating the screen. This space is populated with various objects, each of which might represent a single element of the interface. There may be a wide range of elements, from active elements such as text or status bars and interactive elements such as buttons, which represent elements of the logical if not physical simulation, to purely decorative static elements.

While these elements and the space within which they exist are often assumed to be inherently 2D in nature, in composing a working interface it is often necessary to layer multiple elements one atop the other in a single area of the screen. For instance, a text element may appear in front of a decorative sprite, thus adding a strict order of rendering that must be observed. This ordering can be represented by adding depth to the space and the position of elements (Figure 4.13). In practice it might be more effective to construct the GUI as any other 3D scene containing transparencies, treating GUI elements as normal entities, thus making effective use of the structures already implemented for such scenes. This approach has the added benefit of addressing the difficulties of constructing interfaces that work well when rendering in stereographic 3D.


Figure 4.13. GUI rendering.

 

Summary. In summary, it can be said that by broadening the definitions of entity, scene, space, and camera, the same structures required for rendering 3D scenes can be extended for use in postprocessing and GUI rendering. Any object that needs to be rendered should be defined as an entity within a scene regardless of whether it is a character, terrain section, postprocessing stage, or GUI element. Similarly, all scenes must be rendered through the use of a pipeline of objects that provide

• contextual information for shaders (such as view and projection matrices),

• culling information about the volume being rendered,

• a correct rendering order of entities,

• output information on the render area and framebuffer to be used.

This investigation of potential rendering patterns is by no means exhaustive. It is therefore important that any solution be extensible in nature, allowing for additional patterns to be readily integrated.

 

4.2.3       Renderer Model

The graphics API and a potential engine-level interface for rendering have been examined, and simplified models have been constructed to describe typical patterns of activity in both. By combining the two models it is possible to accurately describe the desired behavior of the renderer. At various points in the execution of the application, the renderer will be required to perform certain functions.

Session. All visible elements of the game should be represented by entities that provide a single interface for rendering regardless of whether they are 3D, 2D, or a postprocessing stage. Each visible entity will need to reference resources in the form of shaders, meshes, textures, and any other information that describe its visual representation in terms of batch state (Figure 4.14).


Figure 4.14. Session setup.

 

Frame. Over the course of each frame, various targets must be rendered to in a specific order, creating multiple stages of rendering. Each stage requires that rendering for the previous stage be complete, perhaps to allow a change to a new framebuffer or viewport, or to satisfy constraints on the composition of the current one, such as transitioning from opaque to transparent entities.

Stage. Each stage forms a pipeline consisting initially of a camera object that filters entities from the scene it is observing, culling based on its own criteria. The resulting group of entities can then be sorted using the operation specified in this particular stage. Once correctly ordered, the entities are queried for any representative data relevant to this stage (Figure 4.15).


 

 

Figure 4.15. Stage rendering.
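A stage therefore reduces to a short fixed sequence: cull against the camera, sort with the stage's chosen ordering, then query each surviving entity for representative data. The sketch below shows only this flow, with hypothetical placeholder types standing in for engine-defined ones.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Placeholder types; the real definitions are engine-specific.
struct Entity { /* simulation object */ };
struct Batch  { /* state commands + draw call, as sketched earlier */ };

struct Camera {
    // Filters the scene down to entities visible under this camera's criteria.
    virtual std::vector<Entity*> Cull(const std::vector<Entity*>& scene) const = 0;
    virtual ~Camera() = default;
};

// One stage of the frame: a camera, a sort predicate, and a query that turns
// an entity into zero or more batches for this particular stage.
struct Stage {
    const Camera* camera = nullptr;
    std::function<bool(const Entity*, const Entity*)> sortPredicate;
    std::function<std::vector<Batch>(const Entity&)>  queryRepresentation;
};

std::vector<Batch> RenderStage(const Stage& stage,
                               const std::vector<Entity*>& scene) {
    // 1. Cull: the camera filters entities by its own criteria.
    std::vector<Entity*> visible = stage.camera->Cull(scene);

    // 2. Sort: order the survivors as this stage requires (if at all).
    if (stage.sortPredicate)
        std::sort(visible.begin(), visible.end(), stage.sortPredicate);

    // 3. Query: gather representative data relevant to this stage.
    std::vector<Batch> batches;
    for (const Entity* entity : visible) {
        auto produced = stage.queryRepresentation(*entity);
        batches.insert(batches.end(), produced.begin(), produced.end());
    }
    return batches;
}
```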

 

Batch. Representative data from an entity is used to form the basis for a single batch. The other elements of the rendering stage then provide the remainder of the batch’s state, such as the correct viewport dimensions and framebuffer. This batch can now be executed before moving onto the next (Figure 4.16).


Figure 4.16.  Batch rendering.

 

4.2.4       Further Considerations

Having defined the basic operations of the renderer, there are a number of additional requirements to consider during its development.

Exposing diverse API features. Though APIs employ the same general model, some useful features will be unique to certain APIs and new features will become available as hardware advances. It is important to allow the renderer to be quickly extended to expose these.

Supporting multiple rendering techniques. An entity with specific visual properties may exist in multiple projects with different rendering styles (e.g., forward/deferred shading) or in the same project across multiple platforms with differing hardware features. It is important to be able to choose between different implementations of a single visual effect, preferably from a single data set.

Extensible architecture. New graphics techniques are constantly being developed, and it is important that any new patterns of rendering be easily integrated into the renderer with minimal impact to code or data formats.

Resource management. Device resources are finite and any solution should attempt to minimize duplication of resources wherever possible. A comprehensive resource management scheme, however, is beyond the scope of this chapter.

 

4.3         Solution Development

Before outlining an exact design for the renderer, it is a good idea to determine a general approach to the implementation, which can further inform decisions.

 

4.3.1       Object-Based Pipeline

The design of the renderer and additional constraints lend themselves well to a highly modular implementation. By examining the model derived in Section 4.2.3, we can see a pattern emerging where the functions of the renderer are independent operations performed in series to construct a single batch. This design pattern is often defined as a pipeline or pipes and filters pattern [Buschmann et al. 96]. Rather than encapsulating functionality in a monolithic object or function, a system is constructed as a series of autonomous components sharing a single interface where the output of one object becomes the input for the next. The same pattern can also be observed in many asset pipelines used by developers to process assets before loading them into an engine and in the graphics pipeline described in the API section. In some ways the renderer pipeline can be thought of as a continuation of these two pipelines, bridging the gap between the assets and the device state. Such fine-grained modularity allows for the decoupling of the composition of the pipeline as data and the function of its components as code with numerous benefits.

Accessibility. With a little explanation, artists and other asset creators should be able to understand and modify the data set used to configure the renderer without programmer intervention. This process can be further improved by providing tools that will allow users to author a renderer configuration via a visual interface.

 

Flexibility. It should be possible to change the structure of the renderer quickly by modifying the data set from which it is created, even while the engine is running. This allows for quick testing of various configurations without affecting the project's work flow and also for opportunities to optimize rendering as requirements may vary over a session.

 

Extensibility. Objects allow the architecture to be extended by adding new object types that match the same interface but bring new behaviors, thus providing a degree of future proofing with regards to new rendering patterns or to exposing new API features.


4.3.2       Device Object

It has already been posited that all graphics APIs have a central device object, whether explicitly, as in the case of the Direct3D libraries, or implicitly. To improve the portability of the renderer code, it makes sense to write its components to use a single interface regardless of the underlying API. Equally, it makes sense for this interface to formalize the concept of the device object, even if such a concept already exists in the API.

To properly abstract all possible APIs, the device object must expose a superset of all the functionality provided. If a feature is not supported on a given API, then it would need to be emulated or, because the renderer is configured through data, it could warn that the current data set is not fully supported. As new features become available, the device interface would need to be extended before additional renderer components could be written to expose them.

4.3.3       Deriving State from Objects

As described previously, when rendering each batch, the renderer will iterate over the list of objects forming the entity and the pipeline, deriving from each a portion of the state required to render that batch. It is important to differentiate between the various types of states found in the graphics pipeline (Section 4.2.1) and also between the objects representing entities within the simulation and those representing the pipeline (Section 4.2.3). Iteration will ensure that objects get access to the device for state setting.

Fixed function state. Objects that contain fixed function state are simple to manage. When iteration reaches such a node in the pipeline or entity, the node will be able to access the device and make the correct calls.

Shader parameters. Correctly setting the values of shader parameters is a considerably more difficult challenge. Each parameter is identified by a type and a name and an optional semantic (where semantics are supported by the shading language); there is a certain amount of redundancy between the name and semantic values. While the semantic quite literally represents the meaning of the parameter, a good name should be equally expressive; in practice, the name is quite often a shorthand version of the semantic (for instance, a world-view matrix might have the semantic WORLDVIEW but have the abbreviated name WVm).

It is usually enough to match either the name or semantic rather than both; each has its own benefits and drawbacks. Names are a requirement of any parameter and thus are guaranteed to be available in all shader code; however, naming conventions cannot be guaranteed and would have to be enforced across a whole project. There exists a Standard Annotations and Semantics (SAS) initiative that looks to solve this problem by standardizing the use of semantics between applications.


 

 

Figure 4.17. Parameter setting.

 

To correctly set the parameter, an equivalent variable must be found within the logic of the program. Unfortunately, compilation usually strips the context from variables, preventing us from simply performing a search of each object’s member variables by type and name (Figure 4.17). Two possible solutions might be considered:

 

Build a search function into any object that provides such variables; it will return a pointer to the member variable that best matches the parameter (or a NULL pointer if none can be found) [Cafrelli 01]. This approach may be best suited to existing engines that use an inheritance hierarchy of entity types. It has the drawback of enforcing a fixed naming or semantic vocabulary in code.

 

Define entity types using data aggregation instead of classes; each object can store its variables as a structure of nodes that can actually be searched based on type and name or semantic. This may not be realistic in many engine architectures, but it has the added benefit of flexibility, allowing new data to be inserted into an entity and to be automatically picked up by the assigned shaders (a sketch of this approach follows).
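A minimal sketch of the second, data-aggregation approach follows; the node and entity types are hypothetical. Binding then consists of looking up each reflected parameter by name (or semantic) and type, first in the entity and its resources and then in the global context, and caching whichever pointer is returned.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical typed, named value node owned by an entity or shared resource.
enum class ParamType { Float, Float4, Matrix4x4, Texture2D };

struct DataNode {
    ParamType   type;
    const void* value;  // points at storage owned by the entity/resource
};

class DataEntity {
public:
    void Add(const std::string& name, ParamType type, const void* value) {
        nodes_[name] = DataNode{type, value};
    }

    // Returns the best match for a parameter, or nullptr if none exists;
    // the caller caches this link once per session rather than per frame.
    const void* Find(const std::string& nameOrSemantic, ParamType type) const {
        auto it = nodes_.find(nameOrSemantic);
        if (it != nodes_.end() && it->second.type == type)
            return it->second.value;
        return nullptr;
    }

private:
    std::unordered_map<std::string, DataNode> nodes_;
};
```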

Due to the cost of searching for data, these links between shader parameters and variables should be formed once per session when initializing the entity and its shaders rather than per frame. These links can be divided into two groups, those involving data that represent the entity and those involving the current context or configuration of the pipeline at the time of rendering.

Representational objects. Every visible object is represented by a single unique entity within the scene; however, that entity can reference resource objects that may be shared by any number of other entities. Collectively, these objects provide information required to represent the entity as one or more batches under various different conditions. This information is limited to values that remain static across the period of a frame (such as the entities’ absolute positions in the world space). It is taken in isolation and further information is required to dictate how the entity will appear at the exact moment of rendering (such as the position of the particular camera in use). For each of its potential batches, the entity stores links between all the shader parameters and variables used to set them. Where the relevant data is available within the entity or one of its referenced resources, this is used.

Contextual/pipeline objects. Where data is not available directly to the entity, it can be referenced from a global context entity. This entity is unique and contains all the variables needed to describe the current state of the pipeline from within a shader. As the pipeline changes over the course of a frame, its components modify the variables stored in the context entity, thus ensuring any batches rendered will gain the correct inputs.
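A minimal sketch of such a context entity, assuming string-keyed matrix slots; pipeline components write into it as they become active, and cached parameter links simply read the current contents when a batch is drawn. All names are illustrative.

```cpp
#include <array>
#include <string>
#include <unordered_map>

class ContextEntity {
public:
    using Matrix = std::array<float, 16>;

    // Pipeline components (camera, viewport, ...) call this as they activate,
    // e.g. SetMatrix("ViewProjection", ...).
    void SetMatrix(const std::string& name, const Matrix& value) {
        matrices_[name] = value;
    }

    // Shader-parameter links resolve against the context once per session and
    // read whatever the active pipeline components last wrote into the slot.
    const Matrix* FindMatrix(const std::string& name) const {
        auto it = matrices_.find(name);
        return it != matrices_.end() ? &it->second : nullptr;
    }

private:
    std::unordered_map<std::string, Matrix> matrices_;
};
```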

4.4         Representational Objects

As stated, the visual representation of a single entity is built up from multiple objects. Each object references a more general resource and adds an extra layer of specificity, progressing from generic shared resources to a unique instance of an entity. In effect, this creates a hierarchy with resources becoming more specific as they near the root (Figure 4.18).

The exact nature of these resources is somewhat arbitrary, and the structures used could easily vary between engines. The criterion for their selection is to group states roughly by the frequency with which they change between batches; a shader might be used multiple times with different parameters, thus it will change with a lower frequency. As the specificity of an object increases, so does the frequency with which it will change.

As each successive layer of the hierarchy is more specific, any values provided by it will take precedence over those from more general resources.


 

 


 

Figure 4.18. Hierarchy of resources.

4.4.1       Effect


The concept of the effect makes aspects of the programmable graphics pipeline accessible in a way that does not require deep knowledge of graphics or API programming. It does this by encapsulating these elements in a structure based not on their purpose within the pipeline but on the end result, a unique visual quality (Figure 4.19).

While the structure of the effect is fairly well standardized, interpretation  of the components involved is quite open. In this chapter, the interpretation is informed by the meaning of component names, documentation provided by APIs in which effects are a feature, and existing sources on the subject [St-Laurent 05].

 


Figure 4.19. Effect structure.


Techniques. The visual quality embodied by any particular effect can often be achieved in a number of ways, each with various trade-offs and requirements. Each technique within the effect is a different method for achieving comparable results. By applying annotations to the individual techniques, it is possible to group them by various criteria, such as features required or relative cost, allowing the renderer to algorithmically select the appropriate technique depending on the circumstances.

Passes. It may not always be possible to achieve an effect with a single batch, instead requiring multiple successive batches that may render at various points throughout the frame. Each technique is constructed from one or more passes; these passes can be annotated to direct the renderer to execute them at the correct point within the frame.

Passes contain states with a similar frequency of change, such as shaders or state assignments, which are rarely useful taken individually due to their interdependence on one another to create an overall pipeline configuration. Therefore, each pass within the technique combines shaders and state assignments but omits many of the shader parameter values, effectively creating an interface to the graphics pipeline allowing the effect as a whole to be parameterized.

Default values. Having the effect provide default values for all its parameters reduces the time required to implement new techniques. It minimizes the number of values that need to be overridden by more specific resources to just those that are strictly necessary, many of which may be automatically provided.
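Gathering the pieces described above, the in-memory layout of an effect might be sketched as follows; the types are illustrative assumptions rather than the layout used by any existing effect framework.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using ShaderHandle = std::uint32_t;

// A fixed-function state assignment, e.g. name "CullMode", value "None".
struct RenderState { std::string name; std::string value; };

struct Pass {
    std::unordered_map<std::string, std::string> annotations;  // e.g. "Stage" -> "Shadow"
    ShaderHandle vertexShader = 0;
    ShaderHandle pixelShader  = 0;
    std::vector<RenderState> states;  // state assignments for this pass
};

struct Technique {
    std::unordered_map<std::string, std::string> annotations;  // e.g. "Style" -> "Deferred"
    std::vector<Pass> passes;  // executed in declared order when selected
};

struct Effect {
    std::vector<Technique> techniques;
    // Defaults applied to any parameter not overridden by material/entity data.
    std::unordered_map<std::string, std::vector<float>> defaultValues;
};
```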

4.4.2       Assets

Assets further specialize a particular effect by providing inputs to some of the parameters. They are typically authored, at least in part, using tools external to the engine. The end result is a visual representation of an object in isolation.

Although the terms material and mesh are typically used in 3D asset creation, these are equally applicable to GUI or postprocess rendering.

Material. Typically, the material consists of constant and texture information that specializes the effect to emulate the qualities of a certain substance. More generally, to extend the concept to postprocessing, it can be considered an authored configuration or tuning of the effect with static values.

Mesh. A mesh is the set of all vertex data streams and indexes and thus encapsulates all the varying input for an effect. There are many cases in which this information is not used directly as provided by asset creation tools. Instead, systems within the engine will generate or modify vertex streams before transferring them to the graphics device—this could include animation of meshes, particle effects, or generating meshes for GUI text.


A more complete resource management system would likely support instancing of these resources to improve efficiency. However, such details are beyond the scope of this chapter.

 

4.4.3       Simulation

Where resources in the assets section were largely concerned with the visual attributes of batches, this section is concerned with objects representing the logical attributes.

 

Entity class. As part of the game design process, archetypal entities will be described in terms of visual and logical attributes shared by many individual entities. These attributes do not vary between individual entities and do not vary over the course of a game session, so they can be safely shared.

 

Entity instance. Individual entities are instances of a specific class; they represent the actual simulated object. As such they contain all the values that make each instance unique, those which will vary over the course of a game session and between individual instances.

 

4.5         Pipeline Objects

The pipeline is made up of a series of interchangeable objects, each of which observes the same interface. This pipeline controls the rendering of a group of entities during one stage of the frame (Figure 4.20).

 

4.5.1       Contextual Objects

Contextual objects provide values to the effect in a similar way to representative ones. However, there is a level of indirection involved; the effect parameters  are linked to values defined within a global context object, and these values are manipulated over the course of the frame to reflect the current context in which batches are being rendered.

The values provided are specific to each pipeline object; they cannot be concatenated with any values belonging to the entity currently being rendered, although this functionality could be added as an additional stage in rendering.

 

Camera. The camera object will also likely be an entity within the scene, though perhaps not a visible one. It is responsible for the transforms used to project entities from their current space into that of the framebuffer. The camera may also provide other information for use in effects, such as its position within world space. The exact details are at the discretion of those implementing the system.


 

Figure 4.20. Pipeline objects.

 

Viewport. The viewport defines a rectangular area of the framebuffer in which to render, the output image being scaled rather than cropped to fit. The viewport could be extended to work using relative values as well as absolute ones, decoupling the need to know the resolution of any particular target when authoring the pipeline. This could be further extended to permit nesting of viewports, with each child deriving its dimensions relative to those of its immediate parent.
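A small sketch of the relative, nestable viewport idea: each viewport stores its rectangle as fractions of its parent, and absolute pixel values are derived only once the target's dimensions are known. The types are hypothetical.

```cpp
// Absolute viewport rectangle in pixels.
struct ViewportRect { float x, y, width, height; };

// A viewport authored as fractions of its (optional) parent viewport.
struct RelativeViewport {
    float x = 0.0f, y = 0.0f;          // offset as a fraction of the parent
    float width = 1.0f, height = 1.0f; // size as a fraction of the parent
    const RelativeViewport* parent = nullptr;
};

ViewportRect Resolve(const RelativeViewport& vp, const ViewportRect& target) {
    // Start from the render target's full area, or the parent's resolved area.
    ViewportRect base = vp.parent ? Resolve(*vp.parent, target) : target;
    return ViewportRect{
        base.x + vp.x * base.width,
        base.y + vp.y * base.height,
        vp.width * base.width,
        vp.height * base.height,
    };
}
```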

Target. The render target defines the contents of the framebuffer to be used, referencing either a client area provided by the platform or textures that may be used in further stages of rendering.

 

4.5.2       Control Objects

In contrast, control objects do not set the states of effect parameters. Instead, they control the order and selection of the batches being rendered.

Camera. In addition to providing contextual data during rendering, the camera object may also perform culling of entities before they are queried for representational data.


 

 

Figure 4.21. Technique filters.

 

Layer. Layers can be used for coarse-grained control of rendering across a single render target. This forces one set of batches to be rendered only after the previous set has completed. As the name suggests, they are analogous to layers in art packages like Photoshop or GIMP.

Sorting algorithm. Defining a sorting algorithm for the pipeline will force entities to be sorted by arbitrary criteria before any rendering is performed. This is a requirement for compositing semitransparent batches onto a single layer of rendering, but it can also be used to gain additional performance by sorting batches based on material or approximate screen coverage.

Technique filter. Technique filters can be applied to provide additional information for choosing the correct technique from an effect. Each filter can provide values in one or more domains, such as platform, required features, or rendering style. Each domain can only have a single value at any point in the frame, and these are matched to the domain values specified by the effect techniques available to select the best suited (Figure 4.21).

The domains and values are not defined anywhere in the code and only appear in the pipeline configuration and effect files, allowing each project to define its own set. Technique filters are largely optional, being most useful for larger projects and those using extensive effect libraries.
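Technique selection can then be sketched as a simple scoring match between the active filter values and each technique's annotations; domains and values are plain strings, defined only in data. This is an illustrative assumption of one possible selection rule, not a prescribed algorithm.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Active filters provide one value per domain, e.g. "Platform" -> "PC",
// "Style" -> "Deferred".
using DomainValues = std::unordered_map<std::string, std::string>;

struct TechniqueDesc {
    std::string  name;
    DomainValues annotations;  // domains this technique declares
};

const TechniqueDesc* SelectTechnique(const std::vector<TechniqueDesc>& techniques,
                                     const DomainValues& activeFilters) {
    const TechniqueDesc* best = nullptr;
    int bestScore = -1;
    for (const TechniqueDesc& technique : techniques) {
        int score = 0;
        bool rejected = false;
        for (const auto& [domain, value] : technique.annotations) {
            auto it = activeFilters.find(domain);
            if (it == activeFilters.end()) continue;              // domain not filtered
            if (it->second != value) { rejected = true; break; }  // mismatch
            ++score;                                              // matched domain
        }
        if (!rejected && score > bestScore) {
            bestScore = score;
            best = &technique;
        }
    }
    return best;  // nullptr if every technique conflicted with a filter
}
```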

Pass filter. Pass filters are a requirement for the pipeline to operate correctly. These work differently from the technique filters in that each filter just has an identifying value and any number of pass filters can be active at a single point in the frame. When rendering an entity, all passes within the current technique that match an active filter will be rendered. The order in which they are presented in the technique will be observed regardless of any sorting algorithm in use (Figure 4.22). If no passes match the active filters, then nothing will be rendered.


 

 


 

Figure 4.22. Pass filters.

4.6         Frame Graph


The configuration of the pipeline varies from stage to stage over the course of each frame. As with the representative resources, the components of the pipeline vary with different frequencies. It is therefore possible to describe the pipeline over the course of a whole frame as a graph of the various components (Figure 4.23).

 


Figure 4.23. Frame graph iteration and derived pipeline.


This graph effectively partitions the frame based on the current stage of rendering, with the nodes being active pipeline components. The components touched by iterating through the graph from root to leaf form the pipeline at that stage of the frame. Each leaf node represents a complete pipeline configuration for which rendering must occur. By traversing the graph depth first, it is possible to iterate through each of the pipeline configurations in sequence, thus rendering the entire frame.

As each node in the graph is iterated over, it is activated and modifies the context object or the control state of the pipeline. Once the leaf node is reached, the camera object begins the rendering of that stage, iterating over the scene graph and processing the visible entities based on the pipeline state. Upon completion of the stage, iteration returns through the graph, undoing the changes made by the various nodes.

While the concept of the pipeline remains valid, this behavior of modifying global states makes the implementation of the system more akin to a stack. Changes are pushed onto the stack as each node is visited and then popped from the stack upon returning. This reduces the frequency of certain state changes immensely, making it considerably more efficient than having each batch iterate along the full pipeline as it is being rendered.
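The traversal itself can be sketched as a small recursive function: each node applies its changes on the way down, leaves trigger a stage, and changes are reverted on the way back up, giving the stack-like behavior described above. Types are hypothetical placeholders.

```cpp
#include <memory>
#include <vector>

// Context entity + control objects; the concrete definition is engine-specific.
struct PipelineState;

// A frame-graph node applies its changes to the shared pipeline state when
// visited and reverts them when the traversal returns; leaves render a stage.
class FrameGraphNode {
public:
    virtual ~FrameGraphNode() = default;

    virtual void Apply(PipelineState& state)  = 0;      // push changes
    virtual void Revert(PipelineState& state) = 0;      // pop changes
    virtual void RenderStage(PipelineState& state) {}   // only leaves override

    std::vector<std::unique_ptr<FrameGraphNode>> children;
};

// Depth-first traversal: every leaf reached corresponds to one complete
// pipeline configuration, rendered in sequence to build the frame.
void TraverseFrameGraph(FrameGraphNode& node, PipelineState& state) {
    node.Apply(state);
    if (node.children.empty())
        node.RenderStage(state);          // leaf: render with current state
    else
        for (auto& child : node.children)
            TraverseFrameGraph(*child, state);
    node.Revert(state);                   // undo this node's changes
}
```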

 

4.7         Case Study: Praetorian Tech

Praetorian is Cohort Studios' proprietary engine technology. It was developed to provide the studio with a solid platform on which to build full titles and prototypes across multiple platforms (Figure 4.24).

 

 


 

Figure 4.24. Prototypes developed by Cohort Studios: Wildlife (left) and Pioneer (right).


 

 

Figure 4.25. Various rendering styles tested on the Me Monstar motion prototype: custom toon shading defined by Andreas Firnigl (left), soft ambient lighting (center), and deferred shaded fur (right).

 

4.7.1       Motivation

Praetorian was developed with the intention of maximizing code reuse across multiple simultaneous projects. With this in mind, it made sense to define those features unique to each project through data and scripts instead of engine code, allowing artists, designers, and gameplay programmers to control the look and feel of their project in great detail (Figure 4.25). As described earlier, effect files were a natural solution to exposing programmable graphics via data. Being compatible with a large range of art packages meant that many artists were already familiar with their use and in some cases with authoring them. However, it was soon discovered that more advanced visual effects required multiple passes throughout the course of a frame, something which effect files alone could not define. A data structure was needed that could map out the frame so as to give points of reference and control over the order of rendering at various levels of granularity.

 

4.7.2       Implementation Details

Praetorian’s design was based on the concepts described in the introduction to this chapter. At its core lies a highly flexible architecture that split the function- ality of objects, as provided by the engine’s subsystems, from the raw data stored in the central scene graph. As such all entities within the scene were represented in code by a single class that stored a list of data objects, with distinctions. An entity containing references to a mesh and material could be drawn; one with a physics object would be updated by the physics simulation; and so on.  Thus,  a single conceptual entity such as a character or light could be represented as a single entity in the scene or by a subgraph of entities grouped under a single root node. Such details were largely informed by the assets produced by artists and designers rather than led by code.


This approach made it simple to add subgraphs to the scene that could embody many of the renderer concepts discussed in this chapter. The design described throughout this chapter is a refinement of the resulting architecture.

 

4.7.3       Analysis

The purpose of the renderer design was to separate graphics development from that of the engine, exposing through data the functionality of the graphics API beyond that already provided by effect files. In this it proved highly successful, initially in moving from forward to deferred shading on our main project and then in experimenting with a number of styles and variations including deferred and inferred lighting, various forms of cell shading, and other nonphotorealistic rendering methods. All this could occur in parallel to the main development of the projects with the roll-out of new styles being performed gradually to ensure that builds were unharmed by the introduction of potential bugs.

Another area where the design proved successful was in the definition of postprocessing techniques; these benefited from the same development environment as other effects but also from being fully configurable across multiple levels within the same project.

As with any design decisions, there were trade-offs made when developing the renderer, some of which have been addressed to various degrees in the revised design. To a certain extent, the data-driven nature of the renderer became a limitation in its initial and continuing development. In the case of some simple tasks, it took considerably longer to design and implement an elegant way to expose functionality through data than it would to do so through code. Once such structures are in place, however, the savings in time can quickly make up for the initial investment. Praetorian’s initial structure was significantly more rigid than that described in this chapter—this made adding special-case behaviors, such as shadow frustum calculations or stereographic cameras, more difficult to implement.

As the renderer configurations became more complex so too did the XML files used to describe them. This had the undesired effect of reducing the system’s accessibility and increasing the likelihood of errors occurring. One solution would have been to create a visual editor to interpret the files, something that is highly recommended to anyone implementing a similar renderer design.

The generalized nature of the architecture also had the effect of making optimization more difficult. By making minimal assumptions about the batches being rendered, it can be difficult to maximize efficiency. The greatest gains were made in performing as much processing as possible during loading or as part of the asset pipeline, moving the workload away from the time of rendering. This had an impact on the runtime flexibility of the system, forcing entities to be re-initialized if data was added or removed, but overall it was necessary to maintain realistic frame rates.


 


4.8      Further Work and Considerations

4.8.1       Optimization: Multithreaded Rendering Using Thread-Local Devices

Some APIs support the creation of multiple command buffers on separate threads. Where this is available, it would be possible to have multiple threads process the frame graph simultaneously. Each thread would iterate until it reached a leaf node. It would then lock the node before processing the resultant pipeline as normal. Should a thread reach an already locked node, it would simply skip that node and continue iteration until it discovered an unlocked leaf node or the end of the frame graph.

To make this approach safe from concurrency errors, each thread would have a local device with a subset of the functionality of the main device. This would also require thread-local context entities to store the state of the current pipeline; as no other objects are modified during rendering, the scene and pipeline objects can be accessed concurrently.
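One possible shape for this scheme, sketched with hypothetical types: the leaf nodes are claimed with an atomic flag so each is processed exactly once, and every worker owns its own device and builds its own command buffer.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// A leaf of the frame graph with its resolved pipeline configuration; the
// atomic flag lets exactly one worker claim it.
struct LeafNode {
    std::atomic<bool> claimed{false};
    // ... resolved pipeline configuration for this stage ...
};

// Thread-local device with a subset of the main device's functionality.
struct ThreadLocalDevice {
    void RenderLeaf(LeafNode&) { /* build this stage's command buffer */ }
};

void RenderFrameMultithreaded(std::vector<LeafNode>& leaves, unsigned workerCount) {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < workerCount; ++i) {
        workers.emplace_back([&leaves] {
            ThreadLocalDevice device;  // one command buffer per thread
            for (LeafNode& leaf : leaves) {
                // exchange returns the previous value: false means this thread
                // claimed the leaf, true means another thread already has.
                if (leaf.claimed.exchange(true))
                    continue;  // skip locked node, move on to the next leaf
                device.RenderLeaf(leaf);
            }
        });
    }
    for (std::thread& worker : workers)
        worker.join();
    // The per-thread command buffers would then be submitted in frame order.
}
```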

 

4.8.2       Extension: Complex Contextual Data

Some shaders will require data that does not exist within a single entity, such as a concatenated world-view-projection matrix. This data can be created in the shader from its constituent parts but at a considerably higher performance cost than generating it once per batch. Thus, a system could be added to perform operations on values in the context entity and the entity being rendered to combine them prior to rendering. In its most advanced state, this could take the form of executable scripts embedded in an extended effect file, a kind of batch shader.
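A minimal sketch of such a per-batch combination step, with an assumed row-major matrix type; the derived value is computed once on the CPU and bound to the shader parameter instead of being rebuilt per vertex.

```cpp
#include <array>

using Matrix = std::array<float, 16>;  // row-major 4x4 (assumed convention)

// Plain 4x4 product; stands in for the engine's math library.
Matrix Multiply(const Matrix& a, const Matrix& b) {
    Matrix r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
    return r;
}

// Hypothetical per-batch step: derived values such as world-view-projection
// are combined once before the batch executes.
struct BatchContext {
    Matrix world, view, projection;   // constituent parts
    Matrix worldViewProjection;       // derived value bound to the shader
};

void PrepareBatch(BatchContext& ctx) {
    ctx.worldViewProjection =
        Multiply(Multiply(ctx.world, ctx.view), ctx.projection);
}
```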

 

4.8.3       Debugging: Real-Time Toggling of Elements in Pipeline

Shaders for displaying debug information can be added to entities by inserting additional techniques into their effects. These techniques can then be activated by adding the correct filters to the frame graph, and these sections can then be toggled to render various entities under different conditions, showing information such as normals, overdraw, or lighting. As the renderer automatically binds shader parameters to entity data, it is possible to visualize a wide range of information by modifying only the shader code. This debugging information is strictly controlled by data, and as such it is simple to introduce and remove, allowing individual developers to tailor the output to their specific needs and then remove all references before shipping.


 

4.9         Conclusion

The renderer described in this chapter successfully decouples the details of graphics rendering from engine architecture. In doing so it tries to provide an interface that better fits the needs of graphics programmers and asset creators than those currently available—one that takes its lead from the incredibly useful effect structure, attempting to expand the same data-driven approach to the entire rendering pipeline.

The principles discussed have been used in several commercial game titles and various prototypes, being instrumental in the rapid exploration of various graphical styles and techniques that could be freely shared across projects. In the future it might even be possible to expand on these concepts to create a truly standardized notation for describing graphics techniques in their entirety, regardless of the application used to display them.

 

4.10         Acknowledgments

Thanks to everyone who worked at Cohort Studios over the years for making my first real job such a great experience. I learned a lot in that time. All the best wherever the future finds you. Special thanks go to Andrew Collinson who also worked on Praetorian from the very beginning, to Bruce McNeish for showing a lot of faith in letting two graduate programmers design and build the beginnings of an engine, and to Alex Perkins for demanding that artists should be able to define render targets without programmer assistance, even if they never did.

 

Bibliography

[Bell 10] G. Bell. “How to Build Your Own Engine and Why You Should.” Develop 107 (July 2010), 54–55.

[Buschmann et al. 96] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal. Pattern-Oriented Software Architecture. Chichester, West Sussex, UK: John Wiley & Sons, 1996.

[Cafrelli 01] C. Cafrelli. “A Property Class for Generic C++ Member Access.” In Game Programming Gems 2, edited by Mark DeLoura, pp. 46–50. Hingham, MA: Charles River Media, 2001.

[Microsoft 09] Microsoft Corporation. “DirectX SDK.” Available at http://msdn. microsoft.com/en-us/directx/default.aspx, 2009.

[Shodhan and Willmott 10] S. Shodhan and A. Willmott. “Stylized Rendering in Spore.” In GPU Pro, edited by Wolfgang Engel, pp. 549–560. Natick, MA: A K Peters, 2010.

[Shreiner et al. 06] D. Shreiner, M. Woo, J. Neider, and T. Davis. OpenGL Programming Guide, Fifth edition. Upper Saddle River, NJ: Addison Wesley, 2006.


 


[St-Laurent 05] S. St-Laurent. The COMPLETE Effect and HLSL Guide. Redmond, WA: Paradoxal Press, 2005.

[Sterna 10] W. Sterna. "Porting Code between Direct3D9 and OpenGL 2.0." In GPU Pro, edited by Wolfgang Engel, pp. 529–540. Natick, MA: A K Peters, 2010.