
[Final] Now it’s finished!

It’s been a good few weeks, and time has flown by. The project has had a lot of ups and downs, but to my delight it has finally arrived at a conclusion. It was a fun process, I believe I’ve learned a lot, and I have a good foundation to learn even more. I don’t intend to stop working on this project as it is, not now; there are just a lot more things I can do with it. But anyway, here’s my wrap-up. My report. In short.


The positives:

  • A lot of tasks were simpler than I had estimated, so I ended up saving a lot of time.
  • OpenCL was easier to get into than imagined.
  • I was more capable than I first imagined, too. It helps to get confident. =D
  • Managed to finish the work quickly, even with the time lost.

The negatives:

  • Time lost to sickness: approx. 2 weeks.
  • Time lost to driver issues: approx. 1 week.
  • Could have had better initial planning!
  • There’s a lot of room for more sophisticated solutions, like maybe using a heightmap for particle collision. Or something else.

Overall I’m happy with my results, and I feel there is a lot of room for improvement: improvements that I am going to try out from home, at my own leisure.


The final delivery ended up being not quite a .lib. I wasn’t happy enough with the result, so instead I sent out my entire source, which is full of test code and unoptimized parts; but it’s something for anyone to improve upon.

This is Klas, over and out.

[Week 7] Ray-triangle intersection, friction, and I’m nearly done

This seems to be it. Even if most of the mechanics are currently locked down in a conveniently named test kernel, they are there and they seem to work. Now I can start making the code more readable, isolating parts into their own kernels, and of course optimizing the code after doing this. Then I can set up example scenes for the upcoming milestone. For demonstrating them at the grad show, I may or may not add functionality to alter the system on the fly, adding several force fields and several particle systems at a time, all with activated and/or deactivated kernels.

So, today I wish to go through the particle-triangle collisions. For the collision detection, I’m using Möller & Trumbore’s fast triangle intersection test, modified to return a ray value of (float)INFINITY if there is no intersection (or if the ray is behind a triangle). For detecting the actual collision, I make a ray with its origin at the particle and its velocity as its reach; if I get a proper collision, the distance to the collision point is returned. When a collision happens, I split the current velocity vector into two components: one aligned with the surface normal, and another aligned with the surface tangent. This is so that I can apply particle restitution and plane friction using:

V = (1 - F) * vT - R * vN
Where V = resulting velocity,
F = friction coefficient,
vT = tangential velocity component,
R = restitution coefficient,
vN = normal velocity component.

[Week 6] Just a few example videos

Now, I know the system behaves mostly the same. But the purpose of these tests was just to try out functionality.

[Week 6] Simple local forces somewhat finished, trying plane/triangle collision

I didn’t think local forces would go as fast as they did; after settling the sorting and putting extra time into some overdue courses, I did it in less than a day. The local forces are not complex by any means. I had hoped to put in some sort of field texture, but with my time I think I’d rather focus on a simple spherical ’emitter’ that can either drag in or push away particles. When I create the kernel for this functionality, I may also add functionality for bounding boxes with a “wind” vector.

I am currently looking into plane/triangle collision with my particles. If I can get it done swiftly, I can get to cleaning up my code and actually make a demo of sorts. I’ve already planned what the demo should contain. I may also, since it’s an extremely logical thing to do, add support for several particle systems in one particle manager. However, the focus lies on making the particles behave pleasantly first! It doesn’t matter if I can demonstrate several systems at once if they don’t behave as I want. I’d rather have a demo with a single particle emitter than several that don’t work.

[Week 6] Well… that was embarrassing

So… I spent most of the week trying to implement a sorting algorithm. It never went well to begin with; nothing seemed to work quite as I imagined, and I trawled through several papers trying to write my own sorting algorithms.

None of this had any success, so before the weekend I decided to use some existing projects for sorting. None of those seemed to work properly either.


In short: NOTHING. WHATSOEVER. Worked. Apart from my particle kernel. But that one was really simple; nothing much to screw up there. But… in my desperation, I decided today to install new drivers. And… it suddenly works. Thanks a bunch, Nvidia. I had broken drivers all along.


So now I have working depth sorting. I’ll post a simple run-through of the “algorithm” later, and a video. Even if the particle kernel and test emitter still look the same, this means I can start doing other things. For this evening, however, I will be cleaning up my code. A video will come at a later time.

[Week 5] Sitrep! More sickness and overall confusion!

So, it turns out my body decided it was a good idea to ground me for a few more days, losing me a few days off the end of last week. I’ve been trying to read up on and make my own parallel sorting algorithm implementations, but all my efforts so far have been rather… pointless. The resulting sorting kernels have had underwhelming performance. Sorting was supposed to be done this week according to my schedule, but I can’t get my own implementations to work at an acceptable rate. I will likely just resort to using existing code (for example, from NVIDIA’s SDK), as it would seem that understanding and implementing a fully proper parallel sorting algorithm is currently beyond me.

I hope, by the end of the week, to be able to sort particles by depth. I will not reorder the particles themselves, as this would eat up considerable time reordering ALL related blocks, such as color, movement vector, etc. Instead I will have a VBO index buffer to dictate the draw order of my particles on screen. This means I can issue sorting calls only whenever I want them; sorting after each iteration would, as it looks now, just eat up valuable compute time.

[Week 4] A note on not-progress, and some handy links!

So, this week I haven’t gotten a lot done past reworking my schedule, mostly because my time has been spent reading papers and pre-existing code in SDKs. I may end up using pre-existing code for sorting, with slight modifications. However, if I do, I’ll make sure it’s within my rights, and leave a nod towards the original authors/owners.

As said, I’ve had no VISUAL, nor any CODE advancements this week apart from the particle rendering. Instead I’ve read up on sorting on the GPU. I want to sort on the GPU because transferring all of the data back and forth between the GPU and CPU would be out of the question; it would be a bottleneck. However, I thought I’d make a few shout-outs: NVIDIA and AMD both have nice SDK examples (though I haven’t found NVIDIA’s sorting samples). And Intel has got great OpenCL samples too, one of which is a bitonic sort, which is exactly what I’m looking for!


The idea for the particles is that I’m going to sort them, but not once every frame. I’ll look into depth-sorting the particles maybe every fifth frame, or tenth. This is an arbitrary number and shouldn’t be tough to fiddle with.

[Week 4] Colored Billboards done

So, after finishing the billboards I figured I wanted transparency, and thus I read through that a little bit. What seemed simple ended with me realizing just how much I have to learn. Even though I had already bound VBOs, I rewrote the system a little bit so that I could send the buffers to my shaders, something I wasn’t quite sure about, but it worked well. However, I had many issues compiling a geometry shader, and only after fixing a bug in my debug logging could I see that it was because geometry shader input parameters must be arrays. A facepalm later, I fixed it in a jiffy!

Also: I got some help with blending from someone… and I feel I don’t quite grasp it yet; I’ll have to take a more proper look later, as I will have to use different blending schemes. I will also have to look into rendering particles to the depth buffer, but without testing them against themselves.

This little adventure worked out fairly nicely in itself, and I did learn a few things. Now I feel I can start my research on particle sorting! This is gonna be interesting! The coming week I may also clean up my code a little bit and set up more examples than just an all-emitting, fill-rate-consuming ball of chaos. Even if it looks kind of neat!

[Week 4] Here, have a video!

[Week 4] Behind on schedule, but puffing on.

Since I spent most of the previous week bedridden, I have fallen a week behind and will have to rethink my planning. Yesterday went pretty decently however, and now my billboards are in place. Cleaning code has been pushed back, and focus will instead be on functionality. I’ll make it less messy when I feel I have the time.

This is the shader program for my first billboarded particles:

  1. [VS] Pass unprojected points directly down to the Geometry Shader
  2. [GS] GL_POINTS as input, output GL_TRIANGLE_STRIP
  3. [GS] Get the non-projected point. (modelView, no projection)
  4. [GS] Add/Subtract extents from the points for each corner of the billboard, then multiply with the projection matrix.
    projection * vec4(pos.x - extent.x, pos.y - extent.y, pos.z, pos.w), //Lower Left
    projection * vec4(pos.x + extent.x, pos.y - extent.y, pos.z, pos.w), //Lower Right
  5. [GS] Emit vertices accompanied by an outparameter for UV coords,
  6. [FS] Texture the billboard from a sampled texture!

This is mostly done; I just need to put a texture on the billboard now. One interesting detail for someone new to geometry shaders (like me!): remember to generate the corners from a point transformed only by the modelview matrix, and apply the projection afterwards; otherwise the billboard will be skewed depending on the screen width and height!

Hopefully things will work out all right and I won’t have to overthink. I am slightly curious about the difference in cost between these geometry shader billboards and normal billboarded particles. Maybe I’ll measure it when I have time.

[Week 3] Billboards? Point Clouds?

Now that I am finally not sick, it’s time to try to get things done! The first thing I’m tackling this week is some simple representation of the particles. I want to draw scaled textures at the points, and I have two choices for this. I could make a shader for binding and drawing simple point sprites; these would be very cheap, but as far as I understand, they’ve got two issues.

  1. As a GL_POINT is culled off the screen, its texture will no longer be drawn (even though the sprite may still cover screen space). This could lead to some strange artifacts. I have yet to test the severity of this, so I can’t speak for how it would look. Chances are it could look all right, but I can find no direct example of it online. I’ll just trust what I read on this.
  2. GL_POINTS are not rendered with depth scaling, so if I’ve understood this right, these point sprites will all be a fixed unit size, regardless of the depth of the point in the point cloud. Again, this is something I haven’t tried, but if it is like this, it would be an effect I wouldn’t desire. I could make my own distance attenuation in the shader, but there might be a simpler solution altogether.

I first thought that point sprites would be a nice little solution, and true, they may be. But the points presented above made me consider just writing a simple geometry shader (at first) which would take a point and make a quad facing the camera out of it. This would mean I wouldn’t have to worry about either culling or distance attenuation, as these would be handled by OpenGL itself. This shader could later be changed to create actual, rotated particle geometry, something that might be interesting to peek into later, though nothing I will focus on during the run of this project.

[Week 3] Unforeseen consequences

There are always some things you can’t account for in an initial plan: complications, little things you hadn’t thought about that you needed for the larger details, some minor things. This blog post will be short, but it will give a good explanation of this week, and why I might have to revise my planning again.

I’ve been sick. Since the very day I posted the last optimistic update, the only real thing I’ve managed to get done is the milestone presentation; my situation has been very volatile. Some days I’ve been lying down all day, some days I’ve been trying to work all day. But no amount of study or work has been productive, because I’ve shortly thereafter had to go for more medicine, a cup of tea and bed. It’s been absolutely terrible. The extra time I had built up has been eaten, so at least I wouldn’t say I am behind schedule. I can only pray tomorrow will be good enough to finally head out of my home and get my work done.

[Milestone 2] Where I am now

I’ve got an example in the earlier blog post, and a less compressed version available in my MS2 folder. I’ll be brief on the rest.

I have a very simple particle application capable of updating the VBO directly on the graphics card. This is in line with (even ahead of) my initial planning; it turned out relatively simple. The only thing I think I’ve got left to do is to give the particles the ability to “spawn” after a delay.


No time will be wasted on complex if-cases. For every sort of particle system I intend to do, I’ll make a separate subclass that loads the related programs and OpenCL kernels, as some OpenCL kernels can have specific behaviour that I might want for a certain particle system. This will also allow me to draw different sorts of particles in different drawing functions, so that systems meant to be drawn additively can be, while particles that are not will be drawn their own way, etcetera. This is not implemented, and not fully thought through; it is something that will be added in next week’s planning! I will also add cleaning up my code to the planning, as it could easily get out of hand at this rate!

Otherwise everything is going as planned. I’ll be looking into replacing the points in the VBO point cloud with textures or meshes through shaders, and/or start researching sorting on the GPU. I found a neat example where a “bitonic mergesort” was used, and I’ll see if I can’t get some inspiration, or a clue of what I’m doing, from there. It is also mentioned in GPU Gems 3 and in a few papers and assorted websites; I’ll look into it.

I will have to consider doing an n-body simulation of sorts, but I would rather not get into it at this time. I will have to discuss the merits of this with my teacher, as I am still not sure how much time my sorting and rendering will take.

[Week 2] Slightly ahead, slightly not! Preparing milestone!

Here’s a video of my work so far. I’ve made a very simple particle engine, capable of taking a “gravity” force, and an initial impulse and position for each particle over time. Everything so far is calculated only on the GPU, apart from the original values. I intend to add functionality over the upcoming weeks, as well as start working on some more proper visualisations. There is only one thing I’m behind on, and that is spawning the particles at different times! My solution will be to spawn the unused particles under the world until they have passed their first ‘particle death’, at which point they’ll be moved back into the particle simulation. This is everything that is stopping me from having a particle visualisation that isn’t… very… pattern-like.

It’s running 11.2 million particles on the GPU at once in real time. The initial transfer requires three float4 arrays on the GPU (as VBOs). How much is that to transfer? 16 * 11200000 * 3 / (1024^2) = (roughly) 512 MB of data. At around 170 MB, even the position information by itself would be nasty to send back and forth between the CPU and the GPU all of the time, especially considering we want to update the particle field in real time. Thirty or more updates per second is desirable, and at that rate, shuttling data between the CPU and the GPU would just be a major bottleneck. Which is why altering the VBO directly on the GPU is so handy.


I never thought I would say it, but it would seem I’m one step ahead of my planning. I was very lenient with my time scheduling, and I had been rather paranoid about the troubles of drawing on the GPU; as it turned out, drawing on the GPU was surprisingly simple. I have lost a lot of time being sick and bothered, so I got less done than I had imagined, but actually sending a VBO to OpenCL was really easy, especially with the Khronos C++ OpenCL bindings. That was work meant for next week.

[Week 2] (slow) progress

I can draw particles! Yay! No pictures yet, as the particles emitted are far from.. interesting. I will try to set up some simple patterns that would be more interesting to show off.

Particles are currently drawn as a point cloud, where one point is one pixel. I later plan on implementing a shader to replace this with either textures or models.


Also, I’ve got some strange lighting behaviour in my OpenGL context. The particles look odd when drawn, but this is currently treated as a minor issue. The actual positioning and behaviour of the particles is right.

So, for week 2 I’ve had a little success experimenting with examples and even implementing a simple particle emitter. I can now easily render and calculate five million points from home (where I have a GTX 560 card). The particle emissions are far from… interesting at this point in time, but it turned out that calculating and drawing them directly on the GPU was relatively simple. Issues have been few, apart from depression.

I ultimately gave up on tending to the OclManager class as I came to my senses. It would be a waste of time unless I want to tidy my code, and tidying in a sense is similar to optimization; we all know the saying “premature optimization is the root of all evil”. I may get back to it later, however; it isn’t a key detail. It did serve a purpose though, as it gave me room to experiment with setting up an OpenCL context.

[Week 1] Getting started with OpenCL

The week has been a slow start, mostly because I’ve been distracted and/or busy. These last two days I’ve been making progress, however: I’ve been able to create and compile a simple OpenCL program of my own after trying a few example applications. The biggest ‘issue’ I’ve run into yet was a problem with linking, and that was all my doing. After getting started though, I came to a surprising discovery:

OpenCL is surprisingly simple to set up. I hope to get a head start on next week by implementing a simple particle system ahead of time. Of course, this is just me being positive.

After playing a little with OpenCL, I felt it was worthwhile to create a little OpenCL handler of my own, just to ensure the code stays tidy and separated. It will not be anything fancy, just a few functions that return integer handles into std::vectors kept inside the handler. I’m considering making it a singleton, but it won’t really need such treatment.

[MS1 Concluded] Project Goals

But enough about private issues.

GPU Accelerated Particle System (from here on referred to as GAPS) is a 2013 specialization project at Luleå University of Technology, carried out by me, Klas Linde. My goal with GAPS is mainly to learn GPGPU programming practices and to get used to a largely parallel environment: in this case OpenCL, maybe CUDA.

Particle systems are ideal for such optimization, and have seen massive performance increases from parallelization in similar projects; as such, I will be making a parallel particle system. My goal is to offload all particle logic, including depth sorting and drawing, to the Graphics Processing Unit, which should with some work allow me to simulate an intensive amount of particles at once. If time allows, I will make the particle system state-preserving, which will allow me to apply forces even after system initiation. This will provide me with plenty of challenges as it is.

I expect to run into at least a few issues regarding parallel computing along the way, as parallel programming is quite dissimilar to regular programming for the CPU. Work will have to be divided into small chunks with as little per-item workload as possible, as GPUs are well suited to many small workloads rather than one large problem.

[MS1 Concluded] Crisis Averted

After about a week of insecurity and having no idea what to do with my particle system, I managed to pull off a miracle presentation using incorrect slides and an unfinished research document. I’ve been trying hard, but thinking too much.

The first week has, as this implies, been tough on my nerves. However, things are back on track.


A further note for anyone doing later projects: the MS1 presentation is about your research. It can be boiled down to:

  • What you have decided to do, and why.
  • How you are going to do it (your research should have provided information on this).
  • A risk assessment (your research should also have provided information on this).

I had trouble understanding what to present, but I feel this properly describes the process.