

[Week 7] Ray-Triangle intersection, friction, and I’m starting to wrap up.

This seems to be it. Even if most of the mechanics are currently locked down in a conveniently named test kernel, they are there and they seem to work. Now I can start working on making the code more readable, isolating parts into their own kernels, and of course optimizing the code after doing this. Then I can set up example scenes for the upcoming milestone. For demonstrating them at the grad show, I may or may not add functionality to alter the system on the fly: adding several force fields and several particle systems at a time, each with its kernels activated and/or deactivated.

So, today I wish to go through the particle-triangle collisions. For the collision detection, I’m using Möller & Trumbore’s fast ray-triangle intersection test, modified to return (float)INFINITY when the ray does not intersect the triangle (or when the intersection lies behind the ray origin). To detect the actual collision, I cast a ray with its origin at the particle, using the particle’s velocity as its direction and reach; if I get a proper hit, the distance to the collision point is returned. When a collision happens, I split the current velocity vector into two components: one aligned with the surface normal, and one aligned with the surface tangent. This is so that I can apply particle restitution and plane friction using:

V = (1 - F) * vT - R * vN

where
V = resulting velocity,
F = friction coefficient,
vT = tangential velocity component,
R = restitution coefficient,
vN = normal velocity component.
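
Roughly, the logic looks something like this in OpenCL C (a simplified sketch of the approach rather than my exact kernel code; the names are just placeholders):

// Returns the distance along the ray to the hit point, or INFINITY if the ray
// misses the triangle or the hit lies behind the ray origin.
float intersectTriangle(float3 orig, float3 dir, float3 v0, float3 v1, float3 v2)
{
    const float EPS = 1e-6f;
    float3 e1 = v1 - v0;
    float3 e2 = v2 - v0;
    float3 p  = cross(dir, e2);
    float det = dot(e1, p);
    if (fabs(det) < EPS) return INFINITY;          // ray parallel to the triangle
    float inv = 1.0f / det;
    float3 s  = orig - v0;
    float u   = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return INFINITY;     // outside the triangle
    float3 q  = cross(s, e1);
    float v   = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return INFINITY; // outside the triangle
    float t   = dot(e2, q) * inv;
    return (t > EPS) ? t : INFINITY;               // hit behind the origin counts as a miss
}

// Collision response: split the velocity into its normal and tangential parts,
// then apply restitution R to the normal part and friction F to the tangential part.
float3 collisionResponse(float3 vel, float3 n, float R, float F)
{
    float3 vN = dot(vel, n) * n;   // component along the surface normal
    float3 vT = vel - vN;          // component along the surface tangent
    return (1.0f - F) * vT - R * vN;
}

Since the ray direction is the particle’s velocity for the step, a returned distance in (0, 1] means the particle actually reaches the triangle during that step.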

[Week 5] Sitrep! More sickness and overall confusion!

So, it turns out my body decided it was a good idea to ground me for a few more days, costing me a few days at the end of last week. I’ve been trying to read up on and write my own parallel sorting algorithm implementations, but all my efforts so far have been rather.. pointless. The resulting sorting kernels have had underwhelming performance. According to my schedule, sorting was supposed to be done this week, but I can’t get my own code to run at an acceptable rate. I will likely just resort to using existing code (like, for example, NVIDIA’s SDK), as it would seem that understanding and implementing a fully proper parallel sorting algorithm is currently beyond me.

I hope to, by the end of the week, be able to sort particles by depth. I will not reorder the particles themselves, as this would eat up considerable time reordering ALL the related buffers, such as color, movement vector, etc. Instead I will use a VBO index buffer to dictate the draw order of my particles on screen. This also means I can limit the sorting calls to whenever I actually want them; sorting after every iteration would, as it looks now, just eat up valuable compute time.
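
The plan in rough terms (a simplified sketch; the buffer names are just placeholders): one small kernel writes a depth key and an index per particle, a sort reorders only the (key, index) pairs, and the sorted indices become the element buffer OpenGL draws from.

// Simplified sketch: one depth key per particle; only the indices get sorted,
// while the position/color/velocity buffers stay untouched.
__kernel void computeDepthKeys(__global const float4* positions,
                               __global float*        keys,
                               __global uint*         indices,
                               float4                 cameraPos)
{
    uint i = get_global_id(0);
    float3 d = positions[i].xyz - cameraPos.xyz;
    keys[i] = dot(d, d);   // squared distance to the camera is enough for ordering
    indices[i] = i;        // identity index; the sort later reorders these by key
}

After the sort, the index array goes into a GL_ELEMENT_ARRAY_BUFFER and the particles are drawn in that order with glDrawElements.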

[Week 2] Slightly ahead, slightly not! Preparing milestone!

http://youtu.be/QONNn1f8TxU

Here’s a video of my work so far. I’ve made a very simple particle engine, capable of taking a “gravity” force plus an initial impulse and position, and evaluating each particle over time. Everything so far is calculated only on the GPU, apart from the original values. I intend to add functionality to it over the upcoming weeks, as well as start working on some more proper visualisations. There is only one thing I’m behind on, and that is spawning the particles at different times. My solution will be to spawn the unused particles under the world until they have passed their first ‘particle death’, at which point they’re moved back into the particle simulation. That is the only thing stopping me from having a particle visualisation that doesn’t look so.. pattern-like.
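
The core of it is tiny: since everything is derived from the initial values and the elapsed time, the kernel boils down to something like this (a simplified sketch of the idea, not the exact code; names are placeholders):

// Closed-form motion from the initial values: p(t) = p0 + v0*t + 0.5*g*t^2
__kernel void updateParticles(__global float4*       positions,  // VBO shared with OpenGL
                              __global const float4* startPos,   // initial positions
                              __global const float4* startVel,   // initial impulses
                              float4                 gravity,
                              float                  t)
{
    uint i = get_global_id(0);
    positions[i].xyz = startPos[i].xyz
                     + startVel[i].xyz * t
                     + 0.5f * gravity.xyz * (t * t);
}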

It’s running 11.2 million particles on the GPU at once in real time. The initial transfer requires sending three float4 arrays to the GPU (as VBOs). How much is that to transfer? 16 * 11 200 000 * 3 / (1024^2) = roughly 512 MB of data. At around 170 MB, the position information alone would be nasty to send back and forth between the CPU and the GPU all the time, especially considering we want to update the particle field in real time. Thirty or more updates per second is desirable, and at this point sending data back and forth between the CPU and the GPU would just be a major bottleneck. Which is why altering the VBO directly on the GPU is so handy.

 

I never thought I would say it, but it would seem I’m one step ahead of my planning. I was very lenient with my time scheduling, and I had been rather paranoid about the trouble of drawing straight from the GPU, but that turned out to be surprisingly simple. I have wasted a lot of time being sick and bothered, so I got less done than I had imagined, but actually handing a VBO to OpenCL was really easy, especially with the Khronos C++ OpenCL bindings. That was work meant for next week.
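
For the curious, the gist of handing a VBO to OpenCL through the C++ bindings looks roughly like this (a simplified sketch with error handling and context/kernel setup left out; the function and variable names are my own placeholders):

#include <vector>
#include <GL/gl.h>
#include <CL/cl.hpp>

// Run one simulation step directly on a GL VBO via CL/GL sharing.
// Assumes the cl::Context was created with GL sharing enabled and that the
// kernel takes (positions, elapsedTime) as its first two arguments.
void stepParticlesOnVbo(cl::Context& context, cl::CommandQueue& queue,
                        cl::Kernel& kernel, GLuint positionVbo,
                        float elapsedTime, size_t particleCount)
{
    cl::BufferGL clPositions(context, CL_MEM_READ_WRITE, positionVbo);
    std::vector<cl::Memory> glObjects(1, clPositions);

    glFinish();                                  // let OpenGL finish with the buffer first
    queue.enqueueAcquireGLObjects(&glObjects);   // hand the VBO over to OpenCL

    kernel.setArg(0, clPositions);
    kernel.setArg(1, elapsedTime);
    queue.enqueueNDRangeKernel(kernel, cl::NullRange,
                               cl::NDRange(particleCount), cl::NullRange);

    queue.enqueueReleaseGLObjects(&glObjects);   // give it back to OpenGL for drawing
    queue.finish();
}

No particle data ever has to travel back to the CPU in that loop; the kernel writes straight into the memory OpenGL draws from.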

[MS1 Concluded] Project Goals

But enough about private issues.

GPU Accelerated Particle System (from here on referred to as GAPS) is a 2013 specialization project at Luleå University of Technology, carried out by me, Klas Linde. My goal with GAPS is mainly to learn GPGPU programming practices and to get used to a largely parallel environment; in this case OpenCL, maybe CUDA.

Particle systems are ideal for such optimizations and have seen massive performance increases from parallelization in similar projects, so I will be making a parallel particle system. My goal is to offload all particle logic, including depth sorting and drawing, to the graphics processing unit, which should, with some work, allow me to integrate a very large number of particles at once. If time allows, I will make the particle system state-preserving, which will allow me to apply forces even after system initiation. This will provide me with plenty of challenges as it is.

I expect to run into at least a few issues with parallel computing along the way, as parallel programming is quite dissimilar to regular programming for the CPU. Work has to be divided into many small chunks, each with as little workload as possible, since GPUs are well suited to many small workloads rather than one large problem.