I’d almost completely forgotten about posting, but fortunately I have something to show: I’ve implemented tiled deferred shading using compute shaders. I decided to build the light grid in a single compute shader, since that seemed far better for performance than constructing it on the CPU. The implementation follows the approach suggested in http://dice.se/wp-content/uploads/GDC11_DX11inBF3_Public.pdf .
First I calculate the min/max z value for each tile, illustrated below; the differences in the values are mostly visible around the edges of the geometry.
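The per-tile depth bounds can be sketched CPU-side like this. In the actual compute shader this would typically be done with shared-memory atomic min/max, one thread per pixel; the function below is just a hypothetical reference version, and the 16×16 tile size matches the work group size mentioned later in the post.

```python
TILE_SIZE = 16  # assumed tile size, matching the 16x16 work groups below

def tile_depth_bounds(depth_buffer, width, height):
    """Return {(tx, ty): (min_z, max_z)} for each TILE_SIZE x TILE_SIZE tile.

    depth_buffer is indexed as depth_buffer[y][x]. A GPU version would
    instead use atomicMin/atomicMax on shared memory within each work group.
    """
    bounds = {}
    for ty in range((height + TILE_SIZE - 1) // TILE_SIZE):
        for tx in range((width + TILE_SIZE - 1) // TILE_SIZE):
            zs = [depth_buffer[y][x]
                  for y in range(ty * TILE_SIZE, min((ty + 1) * TILE_SIZE, height))
                  for x in range(tx * TILE_SIZE, min((tx + 1) * TILE_SIZE, width))]
            bounds[(tx, ty)] = (min(zs), max(zs))
    return bounds
```

The min/max pair is what tightens each tile's frustum along the z axis, which is why the bounds differ most visibly at depth discontinuities.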
Then a view frustum is calculated for each tile, and each light is checked against that frustum as usual (the details are probably best explained in the aforementioned paper). After that, each work group switches to processing lights, meaning a 16×16 work group can cull 256 lights in parallel, which makes the computation quite fast. My visualisation of the tiles affected by a light’s radius, shown below, is probably an eyesore (I didn’t care much for how it looked), but the yellow tiles show which tiles are still affected by a light after the frustum culling.
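The culling test itself can be sketched as a standard sphere-vs-frustum check: a point light is kept for a tile unless its centre lies further than its radius behind any of the tile frustum's planes. This is a hedged minimal version; the plane representation (inward-pointing normals with a `d` offset) is an assumption, not necessarily how the shader stores them.

```python
def sphere_intersects_frustum(center, radius, planes):
    """Conservative sphere-vs-frustum test for tiled light culling.

    planes is a list of ((nx, ny, nz), d) pairs with unit normals pointing
    into the frustum, so a point p is inside a plane when dot(n, p) + d >= 0.
    """
    for (nx, ny, nz), d in planes:
        dist = nx * center[0] + ny * center[1] + nz * center[2] + d
        if dist < -radius:
            return False  # sphere entirely behind this plane: cull the light
    return True  # possibly intersecting: add the light to this tile's list
```

In the shader, each of the 256 threads would run this test for one light and append surviving light indices to a shared-memory list for the tile.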
And here’s the final result showing the grids and running 1024 lights:
The only problem I’ve encountered, aside from learning compute shaders and setting everything up correctly, is that the light radius isn’t calculated very well in my current setup. One of the key things you can do with tiled shading is set a maximum number of lights to compute per tile. I tried setting a limit of 40, which improves performance by a ton when lots of lights are crowded into a small area; the only problem is that the light radius is so big that the max-lights-per-tile calculation stops working properly, as illustrated below:
If I set the limit to 40 lights, the artifacts above appear; if I remove it, everything runs fine. I will probably look into a better calculation for the radius, as I’d like to have a lot of lights in a small area, at least for testing, even though you might not usually cram more than 40 lights into such a tiny space.
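One common way to tighten the radius is to derive it from the attenuation function instead of picking it by hand: solve for the distance where the light's contribution drops below some cutoff. A hedged sketch, assuming the usual constant/linear/quadratic attenuation model (the coefficient names and the 1/256 cutoff are my assumptions, not from the post):

```python
import math

def light_radius(constant, linear, quadratic, max_intensity, threshold=1 / 256):
    """Distance at which 1/(constant + linear*d + quadratic*d^2), scaled by
    max_intensity, falls below threshold. Solved via the quadratic formula:
    quadratic*d^2 + linear*d + (constant - max_intensity/threshold) = 0.
    """
    c = constant - max_intensity / threshold
    disc = linear * linear - 4.0 * quadratic * c
    return (-linear + math.sqrt(disc)) / (2.0 * quadratic)
```

A smaller, attenuation-derived radius means each light touches fewer tiles, so the 40-light cap overflows far less often in dense clusters.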