Over the past couple of months I have been reading a lot about writing games and engines for multiprocessor machines and it is definately a non-trivial problem! It is complicated even more if you want to make full use of the processors on most home computers (currently single-core), the Xbox360 (Triple-Core with two hardware threads per core
) and the PS3 (Cell-Processor
)! From what I have been reading there are many ways to get use of multiple processors. The simplest, if you have an already existing codebase is called Synchronous Functional Parallel and looks like this:
The idea is to take parts that are easily independant and break them off into their own thread. The hard part is to pull the bits out of an existing system and make then run in parallel. It is also unlikely that you are going to find that many things to run at the same time so you would make use of only 1 or 2 of the processors and even then the time for the cycle would be dictated by the longest running process since each process needs to resync each frame so you could, in theory, spend quite a bit of time waiting. The next possible configuration is called Asynchronous Funtional Parallel.
The down side of this method is that it would require almost a complete rewite if you had an existing engine. We don't so this is not a problem for us! The benefits are that you don't need to sync the threads each frame so each one can run independantly and any time one thread needs to interact with another thread it would just grab the latest state of that thread. This may cause threads to use old information from other threads, but you would only be behind 1 or 2 frames and that shouldn't be noticable. A downside that we would need to be concerned about is the fact that there are a limited number of tasks that you can break your engine into. For example if you look at the picture, there are three. If we had a quad-core processor the fourth core would be completely unutilized! Another way to do threading is to break things up according to the data being processed instead of the tasks that need to be done. This is called Data Parallel.
As you can see this is a somewhat synchronous approach in that each thread must finish before it enters the rendering stage. That means that again the time that section takes is dictated by the thread that takes the longest. The biggest benefit here is that you can break this up over an arbitrarily large number of processors and have them all fully utilized, a down side is that things can get pretty complex if the objects in your game need to interact. It's not unmanagable, but it's something that will require quite a bit of thought to get right!
In reading an article about the rewrite of Valve's Source Engine they say that they have decided to use a Hybrid solution but they don't go into the details as to what it looks like. I think that the best hybrid would be to have something that looked like the Asynchronous Functional Parallel and then break each function down into data objects.
This would allow the function to be sort of a Data Object Manager and it could break them up as it saw fit. Or example in the Physics function it may choose to break the objects into chunks by area so that the objects that are most likely to interact are in the same thread. This would have the synchronous issue where the time to execute would be dictated by the longest execution time, but it would only be in that function and not compuded by each function and so should be less noticable. It also has the benefit that it can be scaled across an arbitarily large number of cores.
I would really like to get peoples opinion on this last setup as it's the one that I am leaning towards using.