I’ve had a couple of queries about my last post - mostly of the ‘what the hell are you talking about?’ variety. I figure now is as good a time as any to explain. But first, a new screenshot:
What exactly is ClusterGL?
It’s a system for taking a standard application that uses OpenGL for 3D graphics and transparently streaming its rendering commands to multiple separate machines over a network.
And why is this useful?
It’s the technique used to make ‘display walls’, like this:

How does it work?
The basic idea is that you have a lot of monitors tiled in a wall, each connected to a different computer. Then you start a copy of ClusterGL on each of them, and run your OpenGL application as normal. ClusterGL intercepts the OpenGL commands and sends them over the network to the CGL renderer application running on each of the display machines. The renderer application is smart enough to modify its internal camera position based on where it’s positioned in the wall, meaning that the application is rendered *once*, over a very big composite display. From the application’s point of view, it doesn’t have to do anything special - as far as it’s concerned, it’s just running at a very high screen resolution.
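To make the ‘modify its internal camera position’ part a bit more concrete, here is a minimal sketch of one common way to do it: give each tile its own slice of the full view frustum. This is an illustration, not the actual ClusterGL code. The function name `tile_frustum`, the regular grid layout, and the decision to ignore monitor bezels are all assumptions made for the example.

```python
def tile_frustum(left, right, bottom, top, cols, rows, col, row):
    """Return (left, right, bottom, top) for one tile of a cols x rows wall.

    The inputs describe the frustum of the whole wall at the near plane.
    col counts from the left edge, row counts from the bottom edge.
    Hypothetical helper: assumes equally sized tiles and no bezel gaps.
    """
    tile_w = (right - left) / cols
    tile_h = (top - bottom) / rows
    t_left = left + col * tile_w
    t_bottom = bottom + row * tile_h
    return (t_left, t_left + tile_w, t_bottom, t_bottom + tile_h)


# A 2x2 wall: each renderer gets one quarter of the full view.
full = (-2.0, 2.0, -1.0, 1.0)
bottom_left = tile_frustum(*full, cols=2, rows=2, col=0, row=0)
top_right = tile_frustum(*full, cols=2, rows=2, col=1, row=1)
```

Each renderer would then pass its own four values, with the near and far planes unchanged, to something like `glFrustum` in place of whatever projection the application asked for.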
More scary technical details:
LD_PRELOAD is used to force my client library to be loaded first so I can capture the GL commands called by the application. Every time the application calls GL_SWAP_BUFFERS (indicating that it has finished the current frame), my program sends the GL data to the remote renderer applications.
At present, the original window is still created (the blank window in the top-left of the screenshot). It provides somewhere to send X events (like keystrokes). This will probably have to change.
To decrease bandwidth usage, instead of sending the entire series of commands every frame, I only send the difference in the stream since the last frame. Typical GL programs consist of a lot of ‘begin frame, move to x/y/z, draw these vertexes, rotate n degrees on an axis, draw another lot of vertexes…’ and so on. The key point here is that most of the calls are the same each frame - in most cases the vertexes drawn are going to be identical, and the only changes are the arguments given to rotation and translation calls. So, we can cache the commands on the renderer side, and simply transmit a command that says ‘and the next N instructions are the same as last time’ instead. If something has changed, we send a ‘and this is the next thing to do’ command instead, obviously.
The effect of this can be seen in the screenshot at the top of this post - each renderer window is using around 2k/s of data, at a fairly high framerate. This is because the only change between frames is one argument to two calls to GL_ROTATE, so that’s all that is transmitted.
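The frame-differencing idea can be sketched in a few lines. This is a toy version under stated assumptions, not the real wire format: commands are modelled as plain tuples, they are compared position by position against the previous frame, and the message names `"same"` and `"new"` and the functions `encode_frame` and `decode_frame` are made up for the example.

```python
def encode_frame(prev, curr):
    """Encode curr against prev as ('same', n) and ('new', cmd) messages."""
    messages = []
    run = 0  # length of the current run of unchanged commands
    for i, cmd in enumerate(curr):
        if i < len(prev) and prev[i] == cmd:
            run += 1
            continue
        if run:
            messages.append(("same", run))
            run = 0
        messages.append(("new", cmd))
    if run:
        messages.append(("same", run))
    return messages


def decode_frame(prev, messages):
    """Rebuild the frame on the renderer side from its cached copy of prev."""
    frame = []
    for kind, payload in messages:
        if kind == "same":
            start = len(frame)  # position in the stream reached so far
            frame.extend(prev[start:start + payload])
        else:
            frame.append(payload)
    return frame


frame1 = [("begin",), ("rotate", 10.0, 0, 1, 0), ("draw", "cube"),
          ("rotate", 5.0, 1, 0, 0), ("draw", "cube"), ("end",)]
frame2 = [("begin",), ("rotate", 11.0, 0, 1, 0), ("draw", "cube"),
          ("rotate", 5.5, 1, 0, 0), ("draw", "cube"), ("end",)]

messages = encode_frame(frame1, frame2)
rebuilt = decode_frame(frame1, messages)
```

In this example only the two changed rotate commands travel in full; everything else collapses into three short ‘same as last time’ messages. A real implementation would also want to send just the changed arguments rather than the whole call, which this sketch doesn’t attempt.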
There are some other nice tricks that I intend to play to make the system more efficient as well:
- Renderer-side interpolation of frames. If the application is only sending the renderers updates at 20fps, we can still ensure a smooth framerate on the display nodes by interpolating between the previous two frames.
- Only sending renderer nodes the data they’re actually going to use. Because I have access to the raw vertex data on the application side, and I know the camera position of the other nodes, it should be fairly simple to compute bounding boxes and only send geometry to the nodes that will actually display it.
- Caching of texture/vertex buffer/shader data on the renderer nodes.
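As a rough illustration of the second trick, here is a sketch of deciding which tiles need a given piece of geometry. None of this exists in the system yet; it assumes the object’s bounding box has already been projected into ‘wall space’, where the whole wall spans 0 to 1 on each axis, and that the tiles form a regular grid. The name `tiles_for_bbox` is hypothetical.

```python
import math


def tiles_for_bbox(bbox, cols, rows):
    """Return the set of (col, row) tiles that a bounding box overlaps.

    bbox is (xmin, ymin, xmax, ymax) in wall space (0..1 on each axis).
    A box that only touches a tile edge still counts, to stay conservative.
    """
    xmin, ymin, xmax, ymax = bbox
    if xmax < 0 or ymax < 0 or xmin > 1 or ymin > 1:
        return set()  # completely off the wall, nobody needs it
    c0 = max(0, math.floor(xmin * cols))
    c1 = min(cols - 1, math.floor(xmax * cols))
    r0 = max(0, math.floor(ymin * rows))
    r1 = min(rows - 1, math.floor(ymax * rows))
    return {(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)}


# On a 2x2 wall, a small object in one corner only goes to one renderer.
corner = tiles_for_bbox((0.1, 0.1, 0.4, 0.4), cols=2, rows=2)
straddling = tiles_for_bbox((0.4, 0.1, 0.6, 0.4), cols=2, rows=2)
```

The projection step that turns raw vertex data into a wall-space box is the harder part and is left out here.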

