
Animations wrap-up

This week has been stressful. Not only did I realize I had made some mistakes in the animation system that restricted us to a single skin, but also that animations have to be queued per agent, not per chunk, and that all animations were looping. What I've fixed is the ability to enqueue a whole chunk of agents to use an animation, and also to use multiple skins. This allows us to have both the new and pretty costume man and the penguin. I solved this by using a texture containing vertex positions for each skin, so that I don't have to update all previous strides when queuing an animation using another skin. Otherwise the skins would have been mixed together in the vertex positions texture, so that the first could be the man and the second the penguin, which would have made everything unnecessarily complex (this solution should be credited to Viktor in the second year). I still have to test the looping part when we get the animations that shouldn't loop, which should be over the weekend.

Also, after some research it turns out that the Nvidia development drivers for the GTX 570, as well as the newest commercial release of the GTX 570 driver, have a major performance issue when it comes to acquiring and releasing OpenGL objects from OpenCL. We saw pretty early in development that the GTX 260 performed better with the application than the GTX 570, even though the 570 has the edge when it comes to raw power. We can't definitively prove it, but we hypothesize that the OpenCL drivers for the new high-end graphics cards might not be as optimized as those for the older ones. This is unfortunate, as we want to max everything out when we demonstrate what we can do.

Animation – OpenCL and LOD

This week I've been focusing on porting the skinning to OpenCL to avoid skinning in a shader. This has worked out pretty well, as we can now animate and skin the characters without any loss of performance. However, this performance increase raises a problem…

To get the skinned vertices to the graphics card, I have to save them in a texture, and textures have limited sizes. Since I need a set of skinned vertices per skin and per animation, one might be concerned about running out of memory, for obvious reasons. The simple solution is just to make the texture bigger, and that I can do, but one might not want to constantly write to an image with dimensions of 4096 x 4096 pixels every frame; it's costly. I'm going to have to investigate how many animations I can queue before I lose the performance I gained by using OpenCL.
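To make that worry concrete, here is a back-of-envelope sanity check (all the counts are made-up placeholders, not our actual asset sizes), assuming one texel per skinned vertex position and one block per (skin, animation, frame) combination:

```c
#include <stddef.h>

/* Hypothetical texel budget for the skinned-vertex texture:
   one texel per skinned vertex, one block per (skin, animation, frame). */
size_t texels_needed(size_t vertsPerMesh, size_t skins,
                     size_t animations, size_t framesPerAnim)
{
    return vertsPerMesh * skins * animations * framesPerAnim;
}
```

With placeholder numbers like 2000 vertices, 2 skins, 4 animations and 30 frames each, the budget stays well under a 4096 x 4096 texture, but it grows linearly in every factor, which is exactly the problem described above.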

Also, together with Simon, I've now implemented skinned characters with LOD. This was fairly simple, as I only have to treat all LOD levels as a single skin and simply pick which LOD level I want to use in the GLSL shader.
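A minimal sketch of that selection logic (the distance thresholds are hypothetical, and in our case the equivalent choice happens in the GLSL shader, not in C):

```c
/* Hypothetical distance thresholds separating the LOD levels. */
static const float kLodThresholds[2] = { 10.0f, 50.0f };

/* Pick an LOD level: each level is treated as just another "skin"
   block in the vertex-position texture, so choosing a level is
   choosing an offset into that texture. */
unsigned pick_lod(float distance, const float *thresholds, unsigned numLevels)
{
    unsigned lod = 0;
    while (lod + 1 < numLevels && distance > thresholds[lod])
        ++lod; /* farther away -> coarser level */
    return lod;
}
```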

//Gustav Sterbrant

Animation – progress update

So I've been porting the skinning algorithm to OpenCL to avoid reskinning each and every character every frame. Instead I'm going to make it so that a single animated character is only skinned once, and then just moved for each instance. I'm going to use CL to GL interop in order to write the skinned characters calculated in CL to a texture in GL, and then per vertex just set each position. If this works out, I expect the skinning won't have a measurable impact on the performance of the program.
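The core idea can be sketched in a few lines (names are mine, not the engine's): skin each unique character once, then every instance reuses the skinned result with only a cheap per-instance move applied.

```c
typedef struct { float x, y, z; } Vec3;

/* Reuse one skinned vertex for many instances: the expensive skinning
   happens once, each instance only adds its own translation. */
Vec3 instance_position(Vec3 skinnedVertex, Vec3 instanceOffset)
{
    Vec3 r = { skinnedVertex.x + instanceOffset.x,
               skinnedVertex.y + instanceOffset.y,
               skinnedVertex.z + instanceOffset.z };
    return r;
}
```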

I've had to write some basic mathematical functions for matrices and matrix-vector operations. Also, even though floatnxm is listed as a reserved OpenCL keyword, the compiler complains if one tries to use a float4x4. So I made my own matrix44 struct that contains an array of four float4 vectors. What I'm afraid of, however, is that my code is as far from optimized as humanly possible. I've read some CL math implementations from Nvidia and AMD, and even downloaded the AMD APPML. To my disappointment, there were no .cl files in the project; instead, there was just a .lib and a header file containing function calls to CL kernels. What I wanted was the math itself, so I'm going to stick with my own, get it working first, and then think about optimizing it, if needed.
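A plain-C approximation of what that workaround looks like (the real version lives in OpenCL C and uses the built-in float4 type; this sketch just mirrors the layout):

```c
/* Mirrors the OpenCL-side workaround: no usable float4x4, so a struct
   of four 4-component rows, multiplied against a vector by hand. */
typedef struct { float s[4]; } float4_t;
typedef struct { float4_t row[4]; } matrix44;

float4_t matrix44_mul_vec(const matrix44 *m, float4_t v)
{
    float4_t r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = m->row[i].s[0] * v.s[0] + m->row[i].s[1] * v.s[1]
               + m->row[i].s[2] * v.s[2] + m->row[i].s[3] * v.s[3];
    return r;
}

/* Demo: translate the point (1,2,3,1) by +5 on x; returns the new x. */
float matrix44_demo_translate_x(void)
{
    matrix44 t = {{ {{1,0,0,5}}, {{0,1,0,0}}, {{0,0,1,0}}, {{0,0,0,1}} }};
    float4_t p = {{1, 2, 3, 1}};
    return matrix44_mul_vec(&t, p).s[0];
}
```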

//Gustav Sterbrant

Animation – optimization

After a weekend of relaxation I started working on some optimizations for the animation system. First of all, I made an animated version of tetrahedron man that we now use in the game. We can render 30 000 agents, with animations, rotations, positions, and AI, at around 12 fps. The low FPS is mainly because of the skinning shader, which currently takes a lot of juice to render, so there are some optimizations to be done there. I managed to cut down the amount of memory fetching by changing from matrices to quaternions when passing the data to the GPU. This didn't give a measurable performance increase, but it does look neater with fewer pixel fetches in the shader. Also, I found that checking whether a weight is zero or not could eliminate both unnecessary texture fetches and matrix multiplications, increasing the FPS by quite a bit.

I've also come up with another optimization technique I could use to speed up the skinning. One of the major problems with the shader is that it does so much per vertex, and that isn't necessarily needed. Seeing as the model actually renders the same vertex several times (one vertex can have more than one texture coordinate and normal), the exact same operations will be performed on the very same vertex several times. To fix this, I thought of rendering the vertices separately, saving their positions to a texture, and then, when the actual rendering is active, just fetching each vertex position from this texture.
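The weight check boils down to something like this (a sketch of the idea only; the real code is GLSL in the skinning shader):

```c
/* Count how many of the (up to four) influences are non-zero; in the
   shader, only the non-zero ones trigger a texture fetch plus a
   matrix multiply, so zero weights are skipped entirely. */
int influences_used(const float weight[4])
{
    int used = 0;
    for (int i = 0; i < 4; ++i)
        if (weight[i] != 0.0f)
            ++used;
    return used;
}
```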

I've also worked on the distribution system for the animations. One can now queue either one animation or a combination of several, bind them to an AI, and that particular AI will move using that animation. I will also add a way to force animations to play using a new skeleton, so that one can choose whether or not two persons should move with the exact same time step.

//Gustav Sterbrant

Animation – skinning

I'm very happy to say that I've finally got the skinning working! This results in an animated Seymour (as previously used), where the entire skinning process takes place on the GPU. Despite it working, though, there seems to be a problem with performance. Since I wanted to get it working, I haven't really focused on getting it fast, so there's a lot of work to be done there. An example would be that the shader currently makes 37 texture fetches per VERTEX, which is a lot! Loads of these values are constants, so I might as well send them as attributes instead. Then there are the matrices: they are stored as pixels, where each matrix takes up 3 pixels. This results in 3 * 8 (maximum of 4 influences per vertex) texture fetches for each vertex. I got a tip that one could use quaternions instead, resulting in fewer texture fetches, which is awesome! I also have to work on a distribution system, so I don't need a separately animated skeleton per agent. I bet you want some kind of proof of what I've accomplished; well, you're in luck! There's a video!
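For reference, this is why the quaternion tip saves fetches: a rotation fits in a single RGBA pixel (x, y, z, w) instead of three pixels for a matrix, and rotating a vertex by a unit quaternion needs no matrix at all. A small C sketch of that rotation (an illustration, not our shader code):

```c
#include <math.h>

typedef struct { float x, y, z; } V3;
typedef struct { float x, y, z, w; } Quat; /* one RGBA pixel's worth */

static V3 cross3(V3 a, V3 b)
{
    V3 r = { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
    return r;
}

/* v' = v + 2 * cross(q.xyz, cross(q.xyz, v) + q.w * v); q must be unit length. */
V3 quat_rotate(Quat q, V3 v)
{
    V3 u = { q.x, q.y, q.z };
    V3 t = cross3(u, v);
    t.x += q.w * v.x; t.y += q.w * v.y; t.z += q.w * v.z;
    V3 c = cross3(u, t);
    V3 r = { v.x + 2.0f*c.x, v.y + 2.0f*c.y, v.z + 2.0f*c.z };
    return r;
}
```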


Animations – stress test and skinning

So I've added support for having more than one skeleton animated at the same time, and then stress tested it. Right now, I can have 500 individually animated skeletons without a major performance drop, on the CPU. I'll have to investigate whether I should move the calculations to the GPU using CL, or stick with having the work done on the CPU. The pro is that I might be able to simulate way more skeletons, giving us much more variation. The con is that I might hog the performance needed for the AI. If I'm going to stick with animating on the CPU, I will need to construct some sort of distribution system for the skeletons. By this I mean that if we have 50 000 agents but only 500 skeletons, I'll have to make several agents use the same skeleton. This introduces a new problem: what if one agent changes animation? This is something I have to figure out when the skinning is done.
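One possible shape for that distribution, sketched here as a guess rather than the final design: map many agents onto few skeletons with a simple modulo, so 50 000 agents can share 500 animated skeletons.

```c
/* Many agents, few skeletons: agent -> skeleton by modulo. Agents that
   land on the same skeleton move in lockstep, which is exactly the open
   problem when one of them wants to switch animation. */
unsigned skeleton_for_agent(unsigned agentId, unsigned numSkeletons)
{
    return agentId % numSkeletons;
}
```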

However, I've only just started on the skinning, which means I'm a little behind on my time plan. I've started out with sending the skin data to the GPU using a texture. I will also need to send the updated joints to the GPU, but I intend to do that later. Anyhow, after fiddling around a bit, I managed to create and render the skeleton joint rotations and positions, all the skin weights, all the joint influences, and all the vertex influence counts (how many joints each vertex is affected by) using a GL texture. Turns out using texture coordinates and texture filters helps 😀

If you've ever wondered what matrices look like in color, this image will show you how much more fun they are in color than as, well, matrices. Each value is stored in RGBA format, which means a single pixel contains either an entire matrix row plus one position parameter, four skin joint indices, four skin weights, or four joint influences. I'm going to send the matrices as 3×4 matrices, assuming there is no skewing going on. Also, the image is filtered using GL_NEAREST; we wouldn't want GL_LINEAR ruining our data.
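In code terms, reading one joint matrix back costs exactly three pixel fetches, and applying it treats the missing bottom row as (0, 0, 0, 1). A small sketch of that application step (my own illustration, not the shader itself):

```c
/* px[row] is one RGBA pixel holding a full matrix row (rotation part
   plus translation). Three rows = three fetches; the fourth row is
   implicitly 0, 0, 0, 1, so it never needs to be stored or fetched. */
void transform_point(const float px[3][4], const float p[3], float out[3])
{
    for (int row = 0; row < 3; ++row)
        out[row] = px[row][0]*p[0] + px[row][1]*p[1]
                 + px[row][2]*p[2] + px[row][3] /* * 1 */;
}

/* Demo: translate (1,2,3) by +5 on x; returns the resulting x. */
float packed_demo_x(void)
{
    const float px[3][4] = { {1,0,0,5}, {0,1,0,0}, {0,0,1,0} };
    const float p[3] = { 1, 2, 3 };
    float out[3];
    transform_point(px, p, out);
    return out[0];
}
```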

//Gustav Sterbrant

Animation progress

So I've finally managed to get the animations working! The result of this is an animated skeleton of Seymour, which is playing all of its 13 saved animation clips. This might seem trivial, and it should, but using COLLADA isn't really that simple. When starting out, I found a thread online describing that all the rotation tags in the COLLADA file represent a rotation that should be applied to each joint. I did this, constructed the skeleton using these rotations, and it worked fine! So I thought I was doing it right all along, but when I tried to animate it, nothing worked like it should. This is because Maya saves the joints in two sets of data: rotation and jointOrientation. This confused me, because I thought a <rotation> tag in the COLLADA file described what operation should be applied to the joint in question. However, one needs to save the jointOrientations separately. This is because the axis around which the animation should be performed is not given by the complete rotation of the joint specified in the file, but by the jointOrientation matrix. So I had to save that matrix away, and then multiply it with the rotation axis to get the correct rotation axis. Phew, it took some analysis to draw that conclusion. The result is the following:

Using our system: Seymour in Chaos

From Maya: Seymour in Maya
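A toy illustration of the fix described above (my own simplification to a 3×3 matrix, not the actual loader code): the animation's rotation axis is transformed by the joint's jointOrientation before it is used.

```c
/* Transform the animated rotation axis by the jointOrientation matrix;
   using the raw axis straight from the file is what broke the animation. */
void orient_axis(const float jointOrientation[3][3],
                 const float axis[3], float out[3])
{
    for (int i = 0; i < 3; ++i)
        out[i] = jointOrientation[i][0] * axis[0]
               + jointOrientation[i][1] * axis[1]
               + jointOrientation[i][2] * axis[2];
}

/* Demo: a 90-degree z-orientation maps the x axis onto the y axis. */
float orient_demo_y(void)
{
    const float R[3][3] = { {0,-1,0}, {1,0,0}, {0,0,1} };
    const float x[3] = { 1, 0, 0 };
    float out[3];
    orient_axis(R, x, out);
    return out[1];
}
```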

I've also managed to store the data in a fairly clever way. Instead of having each animation clip contain its own frames, I sort them by time. So for every AI I've queued a couple of animations, and they are in turn sorted by which time step you are in. I use a two-dimensional array, where the first dimension corresponds to the time intervals, which for bezier curves means an increment of one between each frame. So the bucket at zero will contain the first-segment frames of all the queued animations. This continues until there are no more animation frames to be played, and then it repeats. I have yet to add a dequeue so one can disable certain animations, but that will be added! Also, I need to get it skinned, which shouldn't be too far away. And oh, right, I also need to put everything I can on the GPU if I want thousands of these guys animated in real time.
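The repeat behaviour at the end of a queue is really just an index wrap. As a tiny sketch (the clip length here is a made-up number):

```c
/* Time-sorted playback: bucket t holds every queued animation's frame
   for time step t; when a clip runs out of frames, the index wraps
   around so the animation repeats. */
unsigned frame_at(unsigned timeStep, unsigned clipLength)
{
    return timeStep % clipLength;
}
```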

//Gustav Sterbrant

Skeleton, animations and instancing

Progress report for the skeleton: I've managed to load everything I need from COLLADA (finally!) and also save it just the way I want it. That is, with the exception of animations, but I'll come to that later. What I've managed to do, though, is to load a skeleton and put it into bind pose, or whatever pose it was left in when saved. This means not only that the linear hierarchy structure I talked about earlier actually worked, but also that I'm well on my way to start animating them.

The picture shows the Seymour_anim2_triangulate.dae example file from the model bank. You can see the nodes, with the red node being the root node, the yellow being one foot, the purple being the other, and the cyan being his head. What you are seeing is the skeleton in bind pose, and each joint's corresponding local coordinate system. I've got the math down for the animations as well, but here's where the tricky part comes in. To preserve memory I've saved the skeleton and animation clips once, and each AI will have its own set of joint matrices. If I want every single AI to be able to be in its own time frame with its own set of animations, this is how I have to do it. Each AI will take its associated skeleton's bind pose, animate it, and save the resulting matrices as its own. If I batch this it should be fast; maybe not 50 000 AIs fast, but fairly fast.

The only problem is that if each AI can have several animations playing, each joint can be in several frames at the same time. So I have several AIs that should be animated, and each of them has a set of joints (19 for this model) that can all be in several time steps, but I do not want to process them one at a time. I want to apply every interpolation of every joint of all characters in one function call. But I've saved each animation frame in an array in each animation clip, so I can't really send all the animation frames I want to a function without copying them to a new array, thus allocating memory every animation step, and I do not want that! I'll have to rethink this. Also, you might wonder why I'm doing all this on the CPU when I've made a rough design of how animation data will be stored on the GPU. The answer is that I want to get all the math down first, and by all I mean everything from getting a skeleton correctly structured and rendered to animating and skinning it. When I've done that, I can start working on how to sort it on the GPU.

I've done some work with the rendering engine as well, even though it's not my territory. I did that because I have an ATI graphics card at home, and it seems that the Nvidia cards we have at the university are a bit more forgiving when it comes to messing up your VBOs. So I took the opportunity to fix it so that it works on ATI cards as well. We figured out as late as yesterday that we had made a much bigger VBO than we needed, and also drew a lot of triangles that simply weren't there, giving us a poor frame rate. The important part is that it seems to work now.

//Gustav Sterbrant

Animation – linear hierarchies and GRAM storage

So I'm temporarily done with the COLLADA file parsing. By that I mean that I can import the data I want right now. That includes vertices, normals, texture coordinates, joint bind matrices, joint weights, and vertex influences. Also, I've managed to import the joint hierarchy and convert it to a linear hierarchy. I do this to avoid recursive traversal of the joint hierarchy when updating the skeleton, which also allows me to do this math operation on the GPU if I please.
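The linear hierarchy boils down to ordering joints so that every parent precedes its children; one forward loop then replaces the recursive traversal. A stripped-down sketch (transforms reduced to a single offset to keep it short; the real joints carry full matrices):

```c
typedef struct {
    int parent;         /* index of the parent joint, -1 for the root */
    float localOffset;  /* stand-in for the joint's local transform */
} Joint;

/* Joints are stored parent-before-child, so one forward pass computes
   every world transform without recursion -- which also maps well to
   running the same update on the GPU. */
void update_world(const Joint *joints, int count, float *world)
{
    for (int i = 0; i < count; ++i) {
        float base = (joints[i].parent < 0) ? 0.0f : world[joints[i].parent];
        world[i] = base + joints[i].localOffset; /* parent already done */
    }
}

/* Demo: root(+1) -> child(+2) -> grandchild(+3) = world position 6. */
float hierarchy_demo_leaf(void)
{
    const Joint joints[3] = { {-1, 1.0f}, {0, 2.0f}, {1, 3.0f} };
    float world[3];
    update_world(joints, 3, world);
    return world[2];
}
```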

Also, I've thought of a way to store all animation data on the GPU. Not only do I want to store animation clips on the GPU, but I might also be able to store the skeleton, the bind poses, and the hierarchy in texture memory. Depending on how much space this will occupy, I might even consider saving the agents' current animation states, making it possible for me to make all necessary computations on the GPU. My model for storing the data will look something like this (the joint count should come before the bind poses):

The way of storing animation clips is the same as proposed by Nvidia in Skinned Instancing by Bryan Dudash (2007), except I expand on this idea by storing the skeletons there as well. I hope this will minimize GPU bus choking, improving the performance.

I might also note that the model for saving the matrices only needs 3 rows, seeing as I can always treat the bottom row of the matrix as 0, 0, 0, 1.

//Gustav Sterbrant

Animation progress – COLLADA and skeletons

I've managed to load vertex positions, vertex normals, texture coordinates, joint names, joint matrices, and vertex weights. Also, I've managed to create a hierarchy for the skeleton using the given joint data. The way I do this in a DOD manner is to let the parent node contain all of its children's data. This allows me to batch calculations by letting the parent multiply all the children's rotations with its own.

All I know about the functionality of this process is what I can read in the raw data; it remains to be seen whether the data is read correctly. Also, I need to figure out how to read animations. I've tried this, but there's a problem: animations can be saved either using bezier curves or using linear transformations. Unfortunately, I don't have a COLLADA file containing linear transformations, even though that method would be easier to use, seeing as I would only have to multiply the bind pose with the matrix in the animation frame instead of having to construct a bezier curve, interpret it, and then apply the necessary operations.

I've been wrestling with creating a skeleton that isn't connected using an object hierarchy, but is instead laid out linearly somehow. I wanted this because it lets me perform all calculations without recursive function calls, and what I found is that I can make it faster than an ordinary tree using the method stated above. I'm going to have to test how good the performance is with this hierarchy, but I fear that first multiplying each joint with its transformation each animation frame, and then iterating through the tree to move each joint relative to its parent, will require lots of calculations and perhaps slow down the entire system. This remains to be seen; hopefully we won't have to find out the hard way.

With the VBO rendering up and running, I've managed to test the COLLADA file parser for meshes, and it seems to work for triangular meshes. Next week I'll be focusing on rendering the skeleton, and also on trying to animate it.

//Gustav Sterbrant