The time I’ve had this past week has been focused on animations and characters in Nebula. There is a big difference between exporting characters in the new installment of Nebula compared to the old. The biggest feature is the fact that one can have several characters in one Maya scene, and have them exported as several individual characters! There is a pretty big difference between characters and ordinary static objects in Nebula, mainly in their model files (.n3). You see, a model file describes a scene, which usually starts with a transform node describing the global bounding box for the entire scene. This node then holds all meshes in the scene, with all their corresponding values for material, texture and variables. However, characters are much different! They have another parent node, called CharacterNode, which describes the character skeleton. All meshes described within the CharacterNode are counted as skins to the skeleton, which in turn means they have to be skinnable! This means that having both characters and static objects in the same scene is impossible with the current design. One might ask why I don’t just add a root node which contains a CharacterNode with all its skins, and then have all the other nodes parallel to that node. Well, you see, Nebula has to decide whether a MODEL is a character or a static mesh, so combining both static meshes and characters would cause big problems. This also means every single skeleton needs its very own model. Currently, the batcher decides whether a Maya scene should be a character in Nebula, or a static mesh. Putting all static objects into one model and every character into its own separate model would work, were it not for the problem of giving each of them a proper name! So one has to choose between making an ordinary static object scene and a character scene, and that’s that!
And of course, the biggest problem is getting the skeletons, animation curves and skinning to work properly, seeing how many variables there are that can go wrong. Currently I think I’ve managed to get the skeleton working properly, seeing as I can have a box using three joints, unanimated, and it looks correct. However, as soon as I apply an animation, it breaks. The image to the left shows how it looks after animation, and the right one before animation.
I also realized that Nebula only accepts skins which use 72 or fewer joints, which means that more complex models need to be split into smaller fragments, where each fragment uses 72 or fewer joints. I should have this done by the end of the week unless something very time consuming turns up.
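To make the splitting idea concrete, here is a minimal sketch of one way it could be done: a greedy pass that drops each triangle into the first fragment whose joint set still fits under the limit. The types and the function name (Triangle, Fragment, SplitSkin) are made up for illustration and are not Nebula's actual data structures, and a real exporter would probably want something smarter than first-fit.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical types for illustration; not Nebula's actual data structures.
struct Triangle {
    std::set<int> joints;  // every joint index influencing this triangle's vertices
};

struct Fragment {
    std::set<int> joints;                // joint palette used by this fragment
    std::vector<std::size_t> triangles;  // indices into the source triangle list
};

// Greedy first-fit: put each triangle into the first fragment whose joint
// palette can absorb the triangle's joints without exceeding maxJoints.
std::vector<Fragment> SplitSkin(const std::vector<Triangle>& tris, std::size_t maxJoints)
{
    std::vector<Fragment> fragments;
    for (std::size_t t = 0; t < tris.size(); ++t) {
        bool placed = false;
        for (Fragment& frag : fragments) {
            std::set<int> merged = frag.joints;
            merged.insert(tris[t].joints.begin(), tris[t].joints.end());
            if (merged.size() <= maxJoints) {
                frag.joints.swap(merged);
                frag.triangles.push_back(t);
                placed = true;
                break;
            }
        }
        if (!placed) {
            // Assumes a single triangle never needs more than maxJoints joints.
            fragments.push_back(Fragment{tris[t].joints, {t}});
        }
    }
    return fragments;
}
```

Calling SplitSkin(tris, 72) would then give one primitive group per fragment, each with its own joint palette.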
I’ve also been collaborating with my colleagues and we’ve started wrapping our programs together, mainly by designing a central class for handling settings. For example, if I set the project directory in Nody, it should be remembered by all toolkit applications so that one doesn’t need to reset it everywhere if one is to change the working directory.
My focus this past week has been to improve the usability of Nody. The most obvious thing I came up with was the ability to preview textures in the actual node window, so as to allow the user to see how different textures will look with the given shader. There were two major problems with this. First, is it wiser to use the work folder (containing a very broad mixture of texture formats) or use the export folder? Second, how do I deal with the formats I’m faced with?
Turns out Qt has this covered, for every single file format other than TGA and PSD, which are the currently used work-formats in Nebula. I started off with basically copy-pasting a TGA loader done by the Qt crew. This version is not available in the standard Qt package, but requires Qt3D, so I thought I’d rather not use more modules but instead just copy the code. Then I tried figuring out how PSD works, and while it’s pretty straightforward, it’s still very hard to figure out exactly how the bytes are laid out.
In the PSD and PSB specification, it says the PSD file consists of 4 sections: header, color modes, layer and mask information, and image data. I want the header (for sizes, bit depths and such) and the raw image data. Also, PSD image data is usually RLE-compressed, a lossless compression method which results in a very small size if there is little to no variation in the source. In case you are too lazy to google, it works by having each scan-line (row in unfancy terms) be split into segments, where first you have one byte describing how many of the same pixel will appear in a row, followed by the data for those pixels, depending on pixel depth. So for example, I could have 15 pixels in a row which are pure red, and with an R8 G8 B8 image that would give me something like: 15 255 0 0.
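As a minimal sketch of the simplified scheme described above (one count byte followed by one pixel), a decoder could look like this. DecodeRuns is a made-up name, and this is not a full PSD reader: the actual PSD encoding is PackBits, where the count byte is signed and can also introduce a run of literal bytes, and the data is stored one channel at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes the simplified run-length layout [count][pixel bytes][count][pixel bytes]...
// bytesPerPixel is 3 for an R8 G8 B8 image.
std::vector<std::uint8_t> DecodeRuns(const std::vector<std::uint8_t>& src,
                                     std::size_t bytesPerPixel)
{
    std::vector<std::uint8_t> out;
    std::size_t pos = 0;
    // Stop as soon as there is not enough input left for a full run.
    while (pos + 1 + bytesPerPixel <= src.size()) {
        const std::size_t count = src[pos++];
        for (std::size_t i = 0; i < count; ++i)
            out.insert(out.end(), src.begin() + pos, src.begin() + pos + bytesPerPixel);
        pos += bytesPerPixel;
    }
    return out;
}
```

Feeding it the bytes 15 255 0 0 with three bytes per pixel yields 45 bytes: fifteen pure red pixels.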
PSD supports up to 56 channels, so each pixel could potentially be 56 bytes for the lowest byte per pixel, which is a lot of information. The specification also stated that each row begins with the total byte count for that row, followed by the RLE-compressed data. What’s strange about that is the need to state how many bytes you have, because you already know from each RLE-package how much you are going to receive. If I know I have 3 channels and 8-bit coloring, I can also be sure how many bytes will follow each pixel repeat counter.
Halfway through this though, I got some advice that I maybe shouldn’t focus on handling every single work format there was, but instead reading the exported format, DDS.
DDS means DirectDraw Surface, and is nothing more than a container for an actual texture. The DDS header, specified here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb943982(v=vs.85).aspx has all the information necessary to reach the actual image data. This is the part where stuff gets tricky. DDS can contain raw textures, which are easily read byte for byte, but it can also contain compressed formats, such as DXT1, DXT3 or DXT5. If one likes to read specifications on how these compression algorithms work, one might want to visit: http://msdn.microsoft.com/en-us/library/windows/desktop/bb694531(v=vs.85).aspx#. The compression algorithms described are called block compressions. A block consists of 16 texels, which are approximated by a data structure that takes less room than the original raw data, at the cost of only a small loss in quality. If one is too lazy to read, I can give you a fast introduction to how to decode these files.
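Before any decoding can happen, the header has to tell us what we are looking at. The sketch below pulls the size and the FourCC code out of an in-memory DDS file, using the byte offsets of the documented DDS_HEADER layout (4-byte magic, then a 124-byte header). DdsInfo and ParseDdsHeader are names made up for this example, and the newer DX10 extension header is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct DdsInfo {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    std::string fourCC;  // e.g. "DXT1"; empty when the pixels are uncompressed
    bool valid = false;
};

// Reads a little-endian 32-bit value from a byte buffer.
static std::uint32_t ReadU32(const std::vector<std::uint8_t>& d, std::size_t off)
{
    return std::uint32_t(d[off]) | (std::uint32_t(d[off + 1]) << 8) |
           (std::uint32_t(d[off + 2]) << 16) | (std::uint32_t(d[off + 3]) << 24);
}

DdsInfo ParseDdsHeader(const std::vector<std::uint8_t>& data)
{
    DdsInfo info;
    // Magic "DDS " plus the 124-byte header: image data starts at byte 128.
    if (data.size() < 128 || std::memcmp(data.data(), "DDS ", 4) != 0)
        return info;
    if (ReadU32(data, 4) != 124)  // dwSize must be 124
        return info;
    info.height = ReadU32(data, 12);  // dwHeight
    info.width = ReadU32(data, 16);   // dwWidth
    const std::uint32_t pixelFormatFlags = ReadU32(data, 80);
    if (pixelFormatFlags & 0x4)  // DDPF_FOURCC: a compressed format code is present
        info.fourCC.assign(reinterpret_cast<const char*>(&data[84]), 4);
    info.valid = true;
    return info;
}
```

With the FourCC in hand, the loader can branch to the DXT1, DXT3 or DXT5 path, or fall back to reading raw pixels when it is empty.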
First we have DXT1. DXT1 saves two colors, which are the two extreme colors of a block. Then, a 32-bit integer is saved, which contains the codes for all 16 texels. Each texel only needs two bits (remember BITS) to describe what color it should use. The colors, let’s call them colorOne and colorTwo, which are read from the image, can create two blends, colorThree and colorFour. Together, they can be indexed by using the decimal values 0, 1, 2, 3, or binary values 00, 01, 10, 11. Remember how every texel has two bits? Well, these two bits are just what you need to count to 3. So each texel has its own two bits masked out, and then matched to a list of colors, so the appropriate texel can retrieve the appropriate color. I solved this by using an array, which saves the colors in order, so addressing the array with index 00 would give me colorOne, and 01 colorTwo etc. Getting the indices from the actual int simply required a bit shift to the appropriate bit, and then an & comparison with binary 11 to get their value. I almost forgot that if colorOne is not bigger than colorTwo, one has to set colorFour to 0, and make colorThree the plain average of colorOne and colorTwo, instead of the one-third and two-thirds blends used otherwise. This is to accommodate alpha, and as you can see, DXT1 only supports binary alpha.
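The description above can be sketched as a small block decoder. This is an illustration rather than the code used in Nody: the names (Rgba, Expand565, Blend, DecodeDxt1Block) are made up, and it assumes the standard DXT1 layout of two 16-bit R5G6B5 colors followed by 32 bits of two-bit indices, with texel 0 in the lowest bits.

```cpp
#include <array>
#include <cstdint>

struct Rgba { std::uint8_t r, g, b, a; };

// Expands a 16-bit R5G6B5 color to 8 bits per channel, fully opaque.
static Rgba Expand565(std::uint16_t c)
{
    const int r = (c >> 11) & 0x1F, g = (c >> 5) & 0x3F, b = c & 0x1F;
    return Rgba{ std::uint8_t((r << 3) | (r >> 2)),
                 std::uint8_t((g << 2) | (g >> 4)),
                 std::uint8_t((b << 3) | (b >> 2)), 255 };
}

// Blends two colors channel by channel with integer weights wx and wy.
static Rgba Blend(const Rgba& x, const Rgba& y, int wx, int wy)
{
    const int d = wx + wy;
    return Rgba{ std::uint8_t((wx * x.r + wy * y.r) / d),
                 std::uint8_t((wx * x.g + wy * y.g) / d),
                 std::uint8_t((wx * x.b + wy * y.b) / d), 255 };
}

// Decodes one 8-byte DXT1 block into 16 texels (row-major, 4x4).
std::array<Rgba, 16> DecodeDxt1Block(const std::uint8_t* block)
{
    const std::uint16_t c0 = std::uint16_t(block[0] | (block[1] << 8));
    const std::uint16_t c1 = std::uint16_t(block[2] | (block[3] << 8));
    std::uint32_t codes = 0;
    for (int i = 0; i < 4; ++i)
        codes |= std::uint32_t(block[4 + i]) << (8 * i);

    Rgba palette[4];
    palette[0] = Expand565(c0);
    palette[1] = Expand565(c1);
    if (c0 > c1) {
        // Four-color mode: two blends at one third and two thirds.
        palette[2] = Blend(palette[0], palette[1], 2, 1);
        palette[3] = Blend(palette[0], palette[1], 1, 2);
    } else {
        // Three-color mode: one average plus transparent black.
        palette[2] = Blend(palette[0], palette[1], 1, 1);
        palette[3] = Rgba{ 0, 0, 0, 0 };
    }

    std::array<Rgba, 16> texels;
    for (int t = 0; t < 16; ++t)
        texels[t] = palette[(codes >> (2 * t)) & 0x3];  // two bits per texel
    return texels;
}
```

A full image is then just a loop over the 4x4 blocks, copying each block's 16 texels to the right place in the output.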
Then we have DXT3. DXT3 has the exact same color layout as DXT1. What differs is that DXT3 can handle non-binary alpha. The structure starts with a set of 8 bytes containing the alpha values of all 16 texels, four bits each. Once that is read, the rest is cake. When we have the color index for our texel, all we need to do is to fetch the four alpha bits for that texel and scale them up to the 0 to 255 range. Slam dunk done.
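As a small sketch of that alpha lookup, assuming the standard DXT3 layout where the 8 alpha bytes hold four bits per texel with the even texel in the low nibble (Dxt3Alpha is a made-up name):

```cpp
#include <cstdint>

// Returns the 8-bit alpha of the texel at row i, column j, given a pointer
// to the first 8 bytes (the alpha part) of a DXT3 block.
std::uint8_t Dxt3Alpha(const std::uint8_t* alphaBytes, int i, int j)
{
    const int texel = 4 * i + j;
    const std::uint8_t packed = alphaBytes[texel / 2];
    const int nibble = (texel % 2 == 0) ? (packed & 0x0F) : (packed >> 4);
    // Multiplying by 17 maps 0..15 onto 0..255 exactly.
    return std::uint8_t(nibble * 17);
}
```

The color part of the block, the remaining 8 bytes, is decoded the same way as a DXT1 block in four-color mode.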
DXT5 has an even more complex way of handling alpha. Instead of just simply having an alpha value per texel, DXT5 stores alpha as a palette, much like the colors, and then simply uses interpolation to find the wanted alpha value. So the structure starts with two alpha bytes, our extreme values. Then there are 6 bytes of alpha indices. Why 6 you may ask? Well, remember how 4 bytes could index our colors in a very nice way, using 2 bits per texel? Alpha needs 3 bits (as I said before, BITS again) per texel to be used as an index. So expand 4 bytes by half and you get 6 bytes. The fancy thing about DXT5 is that alpha has to be calculated just like we calculated colors, but now we have more alpha values than we have colors. So if we have a finer grain of alpha values, that means our indices must be bigger, so instead of just having 00, 01, 10 and 11, we have 000, 001, 010, 011, 100, 101, 110, 111. These values are then used to sample the alpha. The tricky part with this though, is that there is no built-in type which holds 6 bytes, so we have to divide the data into one int and one short. The real problem comes when we need to get our indices, because one index straddles the border between the short and the int. The easiest way to do this is to simply count: we start at the short, which ends after 16 bits, so when we are at the 15th bit we only get one bit from the short and need to take the other two from the int. This is our transition zone. This sample code explains it in detail.
int alphaCodeIndex = 3*(4*i+j);
if (alphaCodeIndex <= 12)
    alphaCode = (alphaCode2 >> alphaCodeIndex) & 0x07;
else if (alphaCodeIndex == 15)
    alphaCode = (alphaCode2 >> 15) | ((alphaCode1 << 1) & 0x06);
else // alphaCodeIndex >= 18 && alphaCodeIndex <= 45
    alphaCode = (alphaCode1 >> (alphaCodeIndex - 16)) & 0x07;
alphaCodeIndex is the bit offset of the texel at row i and column j: each row holds four texels and each texel takes three bits, so the offset always moves in three-bit increments. When we go past the size of the short, we need to remove the size of the short from the index, seeing as we need to ‘start over’ when we are going to bit shift the int. I can’t take any credit for this clever solution, seeing as I found it at: http://www.glassechidna.com.au/2009/devblogs/s3tc-dxt1dxt5-texture-decompression/.
When alphaCode is retrieved, it is used to get an indexed alpha value, and the decompression is done. This is the result in Nody:
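For completeness, here is a sketch of the two remaining pieces: building the eight-entry alpha palette from the two extreme values, and an alternative way of fetching the three-bit index that packs the six index bytes into one 64-bit integer instead of an int and a short. Both function names are made up, and the interpolation weights follow the documented DXT5 (BC3) rules rather than anything specific to Nody.

```cpp
#include <array>
#include <cstdint>

// Builds the 8-entry alpha palette from the two stored extreme values.
std::array<std::uint8_t, 8> BuildDxt5AlphaPalette(std::uint8_t a0, std::uint8_t a1)
{
    std::array<std::uint8_t, 8> p{};
    p[0] = a0;
    p[1] = a1;
    if (a0 > a1) {
        // Six interpolated values between the extremes.
        for (int k = 1; k <= 6; ++k)
            p[k + 1] = std::uint8_t(((7 - k) * a0 + k * a1) / 7);
    } else {
        // Four interpolated values, plus fully transparent and fully opaque.
        for (int k = 1; k <= 4; ++k)
            p[k + 1] = std::uint8_t(((5 - k) * a0 + k * a1) / 5);
        p[6] = 0;
        p[7] = 255;
    }
    return p;
}

// Fetches the 3-bit palette index of the texel at row i, column j from the
// six index bytes. Using a 64-bit integer avoids the int/short transition zone.
std::uint8_t Dxt5AlphaIndex(const std::uint8_t* indexBytes, int i, int j)
{
    std::uint64_t bits = 0;
    for (int b = 0; b < 6; ++b)
        bits |= std::uint64_t(indexBytes[b]) << (8 * b);
    return std::uint8_t((bits >> (3 * (4 * i + j))) & 0x07);
}
```

The final alpha for a texel is then palette[Dxt5AlphaIndex(indexBytes, i, j)], while the color part is decoded like DXT1 in four-color mode.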
The image shows us one bump map (to the left) compressed with DXT5 and one diffuse map (to the right) compressed with DXT1 being rendered in real-time in Nody. The next step is to actually see the texture being applied whenever this happens. I couldn’t be more excited!
One major thing a game engine needs to be able to do is distribution. And by distribution I don’t only mean being able to distribute to customers and users, but also to developers and artists. How annoying would it be to have to open every single scene in Maya, every shader project in Nody, every texture, and export them individually? Not only that, every artist or developer would have to do this just to get the engine running.
Batching solves this. Batching can be both a good and a bad thing. The good thing is that everyone can run the batcher once and have everything set up, nice and working. The bad thing is that batching may, and in most cases will, perform lots of redundant batches, which takes time. The primary reason why we’d want to redesign the way Nebula content is batched is the very fragile character batching, where everything had to be done in the exact unintuitive order or else the batcher would crash. But new additions, such as the Nody shading system with the header file and the xml-formatted model files, mean that we need a new set of tools to batch the new content.
Right, I changed how models should be handled. Previously models would be exported DIRECTLY from Maya, so Maya would have to start in order to export the models. Not only that, but Maya would also need a plugin in order to set Nebula-specific stuff, such as the Nebula shader and the Nebula shader variables. In order to avoid this, Maya is only used for modelling. When exporting from Maya there are two paths. If the model file exists, only the mesh, primitive groups, and the node order will be updated. If it doesn’t exist, the exporter, currently under development, will create a basic model file using the always-existing Solid material. Also, instead of saving this directly to the export folder everything gets saved to work, so the model files, which previously only existed in export, can now be modified outside of Maya. Why you may ask?
The idea somehow resembles the material editor found in UDK. A model is presented to the user in a real-time preview of Nebula. The user can then change the material, and also set variables such as the textures, integers, floats and vectors, and of course see the result in real-time. Whenever a change is made, the xml-formatted model file is altered. This means that the modified model-file, still in the work folder, can be committed, and then batch exported without needing the huge Maya binary file, and everyone updating and exporting will see the changes. Not only is this better because one doesn’t need Maya for anything else but uv-mapping and modelling, but it’s also very easy to see how the model will look with the appropriate textures and settings seeing as the preview is rendered directly in Nebula.
This editor will exist as a tool for the level editor, allowing you to edit the model and see it in your level, in real-time!
Models aside, there is also the thing with batching shaders. The easy part is to compile the source, that’s simple, traverse all files in folder A, compile and write to file in folder B. The hard part is writing the .sdh-files without having to load the entire project every time. Since loading the graph network requires us to create all nodes, which in turn creates all the graphics, we’d want to avoid it because of the performance issues. Also, it seems a bit redundant to create graphics if we’re only going to use a command line interface to batch our shaders.
The project loader already works by loading specific objects, so I can choose to only load the settings manager and thus avoid having to create the entire scene in order to batch my shaders. The only problem is that my settings manager does not have the information required to perform a complete header generation, seeing as it doesn’t have any information relating to samplers, textures or class interfaces. These are found and generated when traversing the node graph, and what do we need for that? You guessed it, the graphics scene, which we don’t want in the first place! I’m currently working on a solution to this problem, and when it’s found, we will be able to batch the shaders with the push of a button.
When all the batchers are done, we will implement a program, very much like the current batcher, which will run all of our batchers, or a subset of them, using a GUI application. The batchers themselves should also have a way to only export modified files, so as to avoid redundant exporting. It shouldn’t be more work than checking the changed date for both the exported file (if present) and the source, and seeing if the exported file has a change date older than the source.
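The change-date check could be as small as this sketch, which uses std::filesystem from C++17. NeedsExport is a made-up name, and a real batcher would probably also want to re-export when the exporter itself has changed.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Returns true when the source has to be exported: either no exported file
// exists yet, or the exported file is older than the source.
bool NeedsExport(const fs::path& source, const fs::path& exported)
{
    std::error_code ec;
    if (!fs::exists(exported, ec))
        return true;
    return fs::last_write_time(exported, ec) < fs::last_write_time(source, ec);
}
```

Each batcher would call this once per source file and skip the files for which it returns false.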
So to wrap this up. Batching is good and evil, but is always vital for setting up a game environment from scratch. The key is to make the batching as fast as possible, so as to not slow down the development process whenever someone updates and starts working. It is also very important to avoid batching stuff which is already up-to-date, because that takes time better spent on working.
Well, I’m back from being sick, so this post marks the first of the brand new work year. What we’ve realized is that we need to start wrapping things up, and that means making everything work together. To do this, everything related to materials and shaders has to be fully functional in the level editor (yeah, we have a level editor too!). I have a list with things to do, but it’s mostly smaller tasks such as fixing dynamic linkage, real-time reloading of frame shaders and material palette etc. Seeing as one can live with not using dynamic linkage (because one can just create specialized shaders), it’s more important to work on the cross-application stuff. One of the major ideas we came up with is how to texture and set shader-specific variables on a model, without doing so in Maya using some very static and moronic GUI.
Currently, the .n3-files hold information about what texture is attached to what target. This is a problem because the .n3-files are in the export folder, which you will not commit. Instead, the assignment of variables has to lie in the work folder, in some sort of pre-export model file. A model exported from Maya will have all its basic stuff, such as the Solid material, and no variables attached. Then, using the level editor and Nody lite (not yet implemented), one will select a model from a list of models and see a preview of that model using the assigned material and shader variables. The user can then choose to use another material from a list of materials, and see the model change in real-time. The user will also be able to set all material variables, such as texture, tessellation factor etc., depending on the shaders the material uses, and then apply the new settings. What will be happening beneath the hood is that the model file will be replaced. First, the Nody lite runtime will replace the work .n3-file, so that future batches will still work smoothly, and then re-batch that one model so that its binary .n3-file looks right. That very user will then be done with the model, and will still be able to commit his or her work folder, and allow others to batch what he/she just did.
This of course means that I have to write another n3-writer, which writes to an xml-style model file, and then an xml-to-n3-converter which converts the xml-files to binary .n3. No problem. Nody lite will also be able to process and handle characters and particles. Nody lite can be compared to the UDK material editor. There, one has a big shader node, which has tons of inputs (basically it’s just one huge über-shader), where one can attach variables to each slot. This is basically just putting shader variables at different positions. Now, you might ask, how can Nody and Nody lite fare against such a worthy beast? Well, Nody not only lets you customize per-model shader variables and textures, but also lets you design the ENTIRE shader with every single tiny feature. Of course, this means that whatever engine you are using, you will need to support many different shaders to fully accommodate Nody, but keep in mind that one can just as easily create an übershader in Nody and use it the exact same way. But enough bragging, here’s a couple of images showing the level editor (in which I haven’t been involved) running in DX11.
This image shows Nebula running using the level editor with one global light, four spot lights, and five point lights spread out around a tiger tank. This window renders everything currently implemented using Nody, which means it uses deferred lighting, SSAO, and of course materials. You can also see the debug shape rendering, which now also works in DX11 because of the new ShapeRenderer.
If I haven’t told you already, materials handle shaders for an object. Since an object can be rendered with different shaders in different passes, there needs to be some way to describe what shaders, variations and class interfaces should be applied to an object. The solution for this is called a material. Materials can be created with Nody, using an editor resembling the frame shader editor to some extent. The batch rendering still works because materials are attached in the .n3-files, just like shaders used to be. The only problem is that Nebula only handles one shader, which is the one previously defined in the .n3-file. This is the ONLY shader attached to a model which can have dynamic shader variables. Keep in mind though that the previous shaders were using the effect system, which meant that several shader programs could be defined in one file, which also included their shader variables.
Now, when that is not an option anymore, we need to handle that. It might sound like a small problem, but keep in mind that the entire rendering engine is based on having a shader per model node, as well as the lights, and every frame pass. Now we need to make it so that every model node has a material instead, which means that shader variables need to know what shader they belong to. At first, I thought this would be easy, so I just made a very basic layer which only modified shader variables if you already knew the shader name. I realized that this would be very cumbersome and also very ugly, so I rewrote the materials in Nebula.
A material works like the Nebula shaders, which can be instantiated to allow for object-specific variables. So a material can create a materialinstance, which has materialvariables, which in turn know of the variable in all of the shaders, so that setting a material variable results in setting the variable for all shader instances. This sweet solution made the code much cleaner, and also allowed me to fix a lot of memory leaks that I struggled with before. So there are no memory leaks either. I also tried the system by having two models, one with a tessellated material and one without, and this is the result:
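To illustrate the fan-out idea, here is a heavily simplified sketch. The classes below are made-up stand-ins for illustration only (float-only variables, shaders as plain name-to-value maps) and do not mirror the real Nebula classes; the point is just that one material variable holds a slot per shader instance and writes to all of them.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-object shader instance.
struct ShaderInstance {
    std::map<std::string, float> variables;
};

// One material variable knows the matching slot in every shader instance.
class MaterialVariable {
public:
    void Bind(float* slot) { slots.push_back(slot); }
    void Set(float value) { for (float* s : slots) *s = value; }
private:
    std::vector<float*> slots;
};

class MaterialInstance {
public:
    explicit MaterialInstance(std::size_t shaderCount) : shaders(shaderCount) {}

    ShaderInstance& Shader(std::size_t i) { return shaders[i]; }

    // Looks up (or lazily creates) the material variable with the given name
    // and binds it to every shader instance that declares that variable.
    MaterialVariable& Variable(const std::string& name)
    {
        auto it = variables.find(name);
        if (it == variables.end()) {
            it = variables.emplace(name, MaterialVariable()).first;
            for (ShaderInstance& s : shaders) {
                auto v = s.variables.find(name);
                if (v != s.variables.end())
                    it->second.Bind(&v->second);  // map values keep stable addresses
            }
        }
        return it->second;
    }

private:
    std::vector<ShaderInstance> shaders;
    std::map<std::string, MaterialVariable> variables;
};
```

Setting instance.Variable("TessFactor").Set(8.0f) would then update that variable in every pass that uses it, without the caller needing to know any shader names.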
I also figured out how the actual batching works. When every n3-file is loaded, the visibility resolver checks to see if an n3-scene is visible, and if it is, it will find all visible scenes with a model containing a material. It then finds the unique model nodes using a specific material, and when it’s done doing that, it finds all instances, which are ultimately our models. This ensures that shaders and render states are set once per unique model, and also allows us to render the models without doing much else but setting instance-specific variables such as transformations per draw call. It could also be expanded to use instancing to render static models. Using the principle of having materials with the new material subsystem, it shouldn’t be too much work to simply add the materials to particles, animators and characters.
So I’ve been looking into how characters are saved in the binary .n3 format. So far I’ve managed to write static meshes with materials and all that fancy stuff, but characters are a completely different story. A character contains information describing the skins, the joints, the animation resource and the variation resource. The idea is to have a character that can use a lot of different skins for the same skeleton. This way, one can use a single skeleton and a single animation table, while only swapping skins. The character node can also contain an actual mesh, or several, seeing as the genius node-based content system simply handles nodes in order of features.
The nodes work in the way that they represent pieces of data, very much like the ones you find in Maya. A node implements a set of tags, which contain different data. For example, the StateNode looks for tags concerning the state of a model, for example its material (or in the old days, the shader), shader variables and textures. It also has the ability to pass a tag it doesn’t recognize further down the inheritance tree, where the base-level node is the TransformNode. The problem I encountered first was that every node, including the specialized nodes such as the ParticleSystemNode and the CharacterSkinNode, inherited from StateNode, which in turn was completely based on a system where an object could only have one shader. Seeing as materials represent an object with several shaders, the old StateNode had to be replaced, but at the same time still be compatible with the old system.
To address this issue, one can use the same pattern used for the DirectX rendering stuff, where there would be a base class which all classes inherit, and in the header of the used class, in this case the StateNode class, there is a series of defines which determines what class StateNode should inherit from. So for the old system, StateNode would inherit the old StateNode which has been renamed to SimpleStateNode. For the new system, StateNode inherits from MaterialStateNode. This way, when compiled using materials, Nebula will use the material system, and when it’s not, it will revert back to the old system. I should note that a model not using a material, when loaded with Nebula using materials, will tell you what you did wrong by crashing. The only bad thing is that the functions for applying a state, and for getting, creating and looking for the existence of a shader variable, now need a shader instance, for both the SimpleStateNode and the MaterialStateNode. The only thing is that for the SimpleStateNode, the shader instance does absolutely nothing, which in my mind is bad. Keep in mind that the old system is just an intermediate system, and can easily be converted to use materials with DirectX 9, but it still feels so very wrong to have that extra pointer.
Another solution is to branch it, by simply having a new set of nodes for shapes using materials, and for states using materials. This also means that one has to implement a new set of nodes for the characters, and the particles as well, seeing as they rely on the functionality of the state nodes. The beauty of this solution is that both systems can quite simply run simultaneously, and which one is used depends on the content. This of course requires us to create different types of shape nodes when we export our models. For example, using the original system would require us to use a ShapeNode, and using the new system would require a MaterialShapeNode. The problem with creating the wrong node here is that the ShapeNode won’t be able to recognize the “MNMT” or ModelNodeMaterial tag, while the MaterialShapeNode will. Also, the ShapeNode will not understand why a shader isn’t created by the SHDR tag, which won’t exist when using a MaterialShapeNode. Conversely, content written for a ShapeNode will not work with a MaterialShapeNode, because no material is specified for a ShapeNode. So the result would be much nicer code, but it will crash when used incorrectly.
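The define-based pattern described above can be sketched like this. The macro names USE_MATERIALS and STATE_NODE_BASE are made up for the example, and the two base classes are reduced to a single function so the effect of the switch is visible.

```cpp
#include <string>

// Reduced stand-ins for the two base classes.
class SimpleStateNode {
public:
    std::string Describe() const { return "single shader"; }
};

class MaterialStateNode {
public:
    std::string Describe() const { return "material"; }
};

// Hypothetical build switch; in a real project this would come from the build system.
#ifndef USE_MATERIALS
#define USE_MATERIALS 1
#endif

#if USE_MATERIALS
#define STATE_NODE_BASE MaterialStateNode
#else
#define STATE_NODE_BASE SimpleStateNode
#endif

// Everything else in the engine only ever refers to StateNode.
class StateNode : public STATE_NODE_BASE {
};
```

The upside is that no calling code changes when the switch flips. The downside is the one mentioned above: both base classes must expose the exact same function signatures, even when one of them has no use for a parameter.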
I chose to go with the latter.
While investigating the nodes I found a node called AnimatorNode. For some reason, it was commented as “Legacy n2 crap!”, but I couldn’t find a replacement for it. The AnimatorNode handles keyframed variables, and just like our StateNode, it uses a single shader to do so. Research away!
Qt! Qt is so extremely flexible that it almost makes me teary eyed. I thought saving a project with all node positions, node links, node constant variable settings, link variable connections, effect settings, tessellation settings, geometry settings and variations would be extremely complex. The fact is that a save file containing that information will probably take up less space and take less time to implement than this blog post. I’ve devised a very simple system, consisting of two singletons, a loader and a saver, and an interface called Savable. The idea is that every class that should be savable needs to inherit the appropriately named Savable. Seeing as Savable is abstract, one has to implement its three functions, Save, Load and Reset. What the functions do is pretty obvious: Save basically writes the necessary data to a stream, Load does the inverse, and Reset resets a class to its default state.
A binary file will have all its information in order, because the order of saving follows a specific pattern. But in order to maintain the flexibility of adding new classes to be saved, I’ve devised a very deviant system. It’s pretty simple in fact: a class saves itself using its class name and its data. The saver then first writes the class name, and then the data for that class. Remember that data doesn’t need to be stored in this manner, only the top-level item does. What I mean by this is that the node scene, which is one of these top-level items, stores all the links and nodes internally, which means that a load will read all the nodes and all the links. However, in order to avoid needing the data saved in perfect order, the loader keeps a map of all classes that are supposed to be loaded: you tell the loader what object you want to load, and what class name is supposed to be mapped to that object. Then, when the loader hits the class name in the file, it automatically calls the correct object with the stream at the start of that class’s data. This basically means that classes can be loaded from a save file, even though the load order differs from the save order. The top-level items also have to be unique, either by being singletons or by being simple single instances; however, they can contain several objects of different classes.
The Savable class also implements a function called ClassName, which uses C++ RTTI to perform a class lookup, then converts it to a QString, and removes the class/struct/union-part which type_info contains. This way, the ProjectLoader always knows how to handle saving and loading an object if and only if it implements Savable.
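A stripped-down sketch of the idea, using standard C++ streams instead of Qt and a text format with one line per value so it is easy to follow. The Savable interface follows the description above, but everything else (SaveObject, Loader, the two example classes and the hand-written ClassName) is made up for illustration and is not the actual Nody code.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>  // std::stringstream works as an in-memory file for trying this out
#include <string>

class Savable {
public:
    virtual ~Savable() = default;
    virtual std::string ClassName() const = 0;
    virtual void Save(std::ostream& out) const = 0;
    virtual void Load(std::istream& in) = 0;
    virtual void Reset() = 0;
};

// The saver writes the class name first, then lets the object write its data.
inline void SaveObject(std::ostream& out, const Savable& obj)
{
    out << obj.ClassName() << '\n';
    obj.Save(out);
}

// The loader maps class names to the objects that should receive the data.
class Loader {
public:
    void Register(Savable& obj) { targets[obj.ClassName()] = &obj; }

    void LoadAll(std::istream& in)
    {
        std::string name;
        while (std::getline(in, name)) {
            auto it = targets.find(name);
            if (it != targets.end())
                it->second->Load(in);  // stream is positioned at this class's data
        }
    }

private:
    std::map<std::string, Savable*> targets;
};

// Two example top-level items.
struct Settings : Savable {
    std::string projectDir;
    std::string ClassName() const override { return "Settings"; }
    void Save(std::ostream& out) const override { out << projectDir << '\n'; }
    void Load(std::istream& in) override { std::getline(in, projectDir); }
    void Reset() override { projectDir.clear(); }
};

struct NodeScene : Savable {
    int nodeCount = 0;
    std::string ClassName() const override { return "NodeScene"; }
    void Save(std::ostream& out) const override { out << nodeCount << '\n'; }
    void Load(std::istream& in) override
    {
        std::string line;
        std::getline(in, line);
        nodeCount = std::stoi(line);
    }
    void Reset() override { nodeCount = 0; }
};
```

Because the loader dispatches on the class name it finds in the file, the order in which objects are registered does not have to match the order in which they were saved.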
Fantastic! One can now not only create shaders completely dynamic, but also save them, open them, modify them, and save them again! I estimated at least half a month to getting this to work, but in the end, it took more like 2 and a half days. What’s left for Nody now is Nody-Nebula communication, and the about page. If I have spare time left, I was thinking of adding a new type of node, Custom node. This type of node allows for real-time creation and saving of a node using Nody, and not some text editor. It would have syntax coloring as well. Perhaps the easiest place to begin is to be able to modify the generated source code for the shader. One can argue the flexibility of letting Nody do this, or by simply having your favorite text editor at the ready. Projects can of course still be opened and compiled without the need of traversing the graph and/or saving the project again.
I just wanted to say that I was wrong when it came to the hull shader with the inputs and outputs that has to match. Well, it turns out that I shot myself in the foot. The reason why it didn’t work is because all variables that should be shared globally, and with that I mean things like the orientation, projection and view-matrices, were stored in a cbuffer called Globals. Well, if you then declare another variable that is not in a cbuffer, the shader compiler automatically generates a cbuffer for you, storing all those rogue variables. This cbuffer is called $Globals. Nebula currently just handles one single cbuffer, because variables are treated as if they were to be updated once per frame. If you were to design a program like Nody, where every shader should have a variable that is supposed to be interchangeable from the CPU to the GPU, then you have to put the variable in some sort of global scope, which then becomes the $Globals cbuffer. Also, seeing as the constant buffers are constantly changed due to the fact that each object needs its own set of variables, there is no real reason to bundle them together.
Yes, one might argue that variables which are not used in one shader takes up unnecessary space, and while that is true, there is complete control to design a shader which only contains the variables required. This basically takes that optimization out of my hands and place it into the hands of anyone using the program for their game engine.
But I wanted to say this to keep the record straight. I was wrong, there is no undocumented internal voodoo required to make the hull shader work, it was my mistake. Seeing this from the bright side of course means that making very complex hull and domain shaders are now even easier than before, and as a result of this, I thought I’d recreate the trianglesubdivision hull shader to scale the amount of tessellation based on distance to the camera. This literally means that the detail in the object is dynamically scaled when approaching or retreating. It would also be really nice to try tessellating a surface using a real height map, instead of using a diffuse map as I am now. But I’m probably going to focus on other stuff for a while.
If you haven’t noticed yet, I’ve been trying not only to make hull and domain shaders work, but also to make them work in a practical sense, by which I mean having relevant nodes in Nody that allow a user to create a shader that uses tessellation without any hassle. This of course has to be interlaced with the deferred lighting, where the normals and depth have to be written using a tessellated surface. Sounds easy enough, but consider the fact that you need to split the calculation of the vertex position from that of the view-space position. Why? Well, if you want to perform displacement, then it’s going to be rather hard if the vertex is already projected into the scene. Instead, you want it to be in model-space (rotated by the object rotation), seeing as the heightmap is mapped to the object in model-space, which is what we need to get a correct displacement. When we’ve done our displacement, we’d like to move the new vertices into model-view-projection space so they can be correctly rendered. Then of course, we want to render to our depth-buffer using the view-space position, which means that we not only need to use our displaced version, but we also need to multiply it with the view-matrix. Remember that every vertex is already multiplied with the model-matrix in the vertex-shader, so we only need to multiply the final position with view-projection and the view-space position with the view-matrix.
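The order of operations described above can be sketched roughly like this. This is plain Python standing in for the domain shader, and everything in it is an assumption for illustration: the function names, the column-vector convention, and the idea that the displacement goes along the normal scaled by a sampled height.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix with a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def domain_stage(pos, normal, height, scale, view, projection):
    """pos and normal are assumed to be already multiplied with the model matrix."""
    # 1. Displace along the normal before any view or projection transform.
    displaced = [p + n * height * scale for p, n in zip(pos, normal)] + [1.0]
    # 2. View-space position, used for the depth written to the depth-buffer.
    view_pos = mat_vec(view, displaced)
    # 3. Final position for rendering: projection * view * displaced.
    clip_pos = mat_vec(mat_mul(projection, view), displaced)
    return view_pos, clip_pos
```

Both results come from the same displaced vertex, so the depth that gets written and the surface that gets rendered stay consistent with each other.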
If you’re the observant type, you might wonder why on earth I’m not displacing the normals to fit the displaced surface. A surface looking like this: _____ which turns into this: _/\_ of course has new normals, which matters if you want per-vertex lighting. We still want to use normal-mapping, but why? Well, tessellation is great in a lot of ways, but a normal map will still contain more information than our tessellated surface will. What this means is that the surface doesn’t need to change its normals, because the normals are saved in the normal-map. What we do need, however, is the correct TBN-matrix to transform the sampled normals. But we don’t need to manipulate that either, because despite the fact that the surface goes from: ____ to: _/\_, the normals sampled from a texture mapped to a surface looking like: ____ will still appear to look like _/\_. The conclusion of the confusion is that one doesn’t need to care much about the normals, because the normal-map provides the lighting properties otherwise used on a non-tessellated surface, which means the normal map will always fit the displaced surface if the artist made them match. This is the result:
What you see is Nebula rendering a normally flat surface using a global light, a point light and two spot lights, and a height map to displace the surface. I will admit that I might be babbling about the normal-manipulation, because I can sometimes see small artifacts along the edges of the tessellated surfaces where there is a light source present. Whether this artifact is a result of the lighting, or of the fact that I’m ignorant enough to think I can get away with just displacing the position, remains to be seen. The result looks pretty good, and it’s more important to start focusing on making the rest of Nody easier to use, instead of obsessing over small, rarely occurring light artifacts.
Nody still needs a project-format so one can open up node networks, redesign them, change the settings around, and generate them. Also, Nody needs to be able to recursively traverse all projects and generate all the shaders, which is very useful when committing newly created shaders to SVN or some other version control system. I also need to write the mini-edition of Nody to go along with the level editor, which will be used as a way to change shader variables for a model instance.
And when all that is done, Nody will need to be able to communicate with the Nebula runtime to change shaders, blend/depth/rasterizer-states, class linkage, frame shader passes, materials, render targets and multiple render targets without the need to restart either application.
One can drift very far away from the mindset of common algorithms such as sorting and tree traversal when working with things such as advanced rendering, new technology and complex APIs. One example of this is the way Nody traverses the tree, which is something that has haunted me for the entire development of the project. It might seem simple on the surface, and it really is, but the logical solution is often not as simple as the practical one. Nody used to traverse the tree with the output node as the starting point, and then just do a breadth-first traversal. This worked well until I started working on the hull- and domain-shaders, where one node could have several connections from forks spanning across the node network. The problem was that a node could be reached way before it should be, which in turn put the node’s source code in the shader source code before it was supposed to be there. The image should suffice as an example.
Here you can see the displace-node, which is reached from both the normalize-node and the barycentricposition-node. Well, let’s say the traversal gets there from the barycentricposition-node first, which effectively puts the displace-node code in the shader before the normalize part has been done. This will cause an error in the final result, which is not acceptable. To address this problem, the tree is traversed breadth-first, starting from the vertex node (not the output node), but a node can be traversed to only if all its incoming connections have been handled. This ensures that a node has all its dependencies declared before itself, which means the node code will do what it is supposed to. Also, you might wonder why I changed the starting point from the output node to the vertex node, and that is because leaf nodes, such as the projectiontransform-node you can see there, have to be traversed as well. You might think “hey, that node has no effect on the end result, it’s unnecessary!”, which is very much what I thought at first. But what if this node writes to SV_POSITION, and there is no effect node which uses the actual SV_POSITION as an input? How do you generalize a node that has only an input, for a value that ALWAYS has to be written to? Instead, nodes like this will be treated specially, or at least their outputs will if they are writing to a system value. So this node, despite being a leaf node with no other relation to or effect on the final picture, actually still uses its output as if it were connected. Keep in mind though that nodes with outputs to system values will be treated as a special case. If one wants to use the SV_POSITION-marked variable, it still works to attach it and use it as normal, and in that case, Nody doesn’t have to treat the output variable in a special way.
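The traversal rule described above (breadth-first from the vertex node, but only stepping to a node once all its incoming connections are handled) could look roughly like the sketch below. This is not Nody’s actual code: the function name and the edge-list representation are made up for illustration, and it assumes the network has no cycles and that every node is reachable from the start node.

```python
from collections import deque


def traversal_order(edges, start):
    """Breadth-first order in which a node is emitted only after all of its
    incoming connections have been handled.

    edges: list of (source, target) pairs; start: the node to begin from.
    """
    outgoing, pending = {}, {}
    for src, dst in edges:
        outgoing.setdefault(src, []).append(dst)
        pending[dst] = pending.get(dst, 0) + 1  # unhandled incoming connections

    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)  # every dependency of this node is already emitted
        for nxt in outgoing.get(node, []):
            pending[nxt] -= 1
            if pending[nxt] == 0:  # the last incoming connection was just handled
                queue.append(nxt)
    return order
```

With a hypothetical network where vertex feeds barycentricposition and barycentricnormal, barycentricnormal feeds normalize, and both normalize and barycentricposition feed displace, the displace node is held back until normalize has been emitted, even though barycentricposition reaches it first. The projectiontransform leaf is still visited because the walk starts at the vertex node.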
If you read my last post, you might already know that the hull shader requires the input and output structs to have their variables in the same order. It is OK for the output struct to be a subset of the input struct, but not vice versa. As a result of this, I thought a simple sorting algorithm would come in handy, seeing as the only thing I need to do is sort the variables based on what they are connected to. This is easy enough to implement using a simple insertion sort, because let’s face it, using a faster sorting algorithm here would give us nothing. So that’s working now, which means that we can construct a working shader using hull and domain shaders with Nody. The above image is a piece of the entire node network that makes up a shader that writes normal-depth using a tessellated surface. The following picture shows the network in its entirety.
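For reference, the ordering step mentioned above could be sketched like this. It is only an illustration and not Nody’s code: the function name and the variable names in the example are hypothetical, and it assumes the output variables are a subset of the input variables, as described.

```python
def sort_outputs_like_inputs(input_vars, output_vars):
    """Insertion sort of output_vars by each variable's position in input_vars.

    Assumes every output variable also exists in input_vars.
    """
    rank = {name: i for i, name in enumerate(input_vars)}
    result = list(output_vars)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift variables that come later in the input one slot to the right.
        while j >= 0 and rank[result[j]] > rank[current]:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result
```

With an input struct ordered as position, normal, tangent, uv and an output struct declared as uv, position, normal, the outputs come back as position, normal, uv. A struct only holds a handful of variables, so the quadratic worst case of insertion sort never matters here.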