Omnipotent Octree Implementation

Posted in C++, Uncategorized on May 10th, 2012 by Chaos Engineer

Alright. I know when to admit defeat, and this is no exception. I was trying to create an IDEAL octree implementation that did not sacrifice memory usage for efficiency. I'm not afraid to admit that I failed. In failure, one can find inspiration for future endeavors, and here I present a post I had hoped would end in success. In realizing my failure I learned much, so I post it here as a cautionary tale. Is there an omnipotent octree implementation? Based on my studies it would seem not. What follows is an incomplete post that I had hoped would result in an epiphany about octree structures, but ultimately failed:

Well, if you have read my last two posts regarding my quest for a lean, versatile and efficient octree implementation (with a focus on the node class design, while ignoring glaring problems with the downcasting)... you are probably having a chortle fest right now. Well, I'm not one to give up easily, and this is no exception. I very much like the structure of the implementation, but unfortunately the impact virtual inheritance would have on performance is a game breaker. Readable, extensible, maintainable code is important indeed, but when talking about an octree implementation... performance is key. The entire point of the octree is to increase performance; the organizational aspects of the tree are just a side effect. So how can I create a polymorphic data type that won't slow down the implementation? It's all up to the compiler, so let's see what GCC can tell us.

To break down the details of the struct hierarchy, we can use the g++ switch -fdump-class-hierarchy, which dumps class hierarchy representations and vtables to disk. Neat, this should be interesting... to get a good baseline, let's look at a simple non-virtual inheritance situation where I can still get the required data for a derived type. Please note, I will be using the words "struct" and "class" interchangeably. In C++, there is little difference between the two, save that members of a struct are public by default. Personally, I use structs when the object has no member functions and contains data only. It's a good differentiation. Anyway, for an example, I rewrote the structs for a root octree node including spatial data as follows:

struct OctreeNodeData_NonVirtual
{
	unsigned short uiLevel;
};
 
struct OctreeRootData_NonVirtual
{
	COctreeNode **nodChildren;
};
 
struct OctreeSpatialData_NonVirtual
{
	CVector3 vCenter;
	float fHalfExtents;
};
 
struct OctreeSpatialRootData_NonVirtual : OctreeNodeData_NonVirtual, OctreeRootData_NonVirtual, OctreeSpatialData_NonVirtual{};

Let's look at what g++ dumps regarding this hierarchy when using g++ -g -fdump-class-hierarchy:

Class OctreeNodeData_NonVirtual
   size=2 align=2
   base size=2 base align=2
OctreeNodeData_NonVirtual (0x7f2f28058300) 0

Class OctreeRootData_NonVirtual
   size=8 align=8
   base size=8 base align=8
OctreeRootData_NonVirtual (0x7f2f28058360) 0

Class OctreeSpatialData_NonVirtual
   size=16 align=4
   base size=16 base align=4
OctreeSpatialData_NonVirtual (0x7f2f280583c0) 0

Class OctreeSpatialRootData_NonVirtual
   size=32 align=8
   base size=32 base align=8
OctreeSpatialRootData_NonVirtual (0x7f2f28067d98) 0
  OctreeNodeData_NonVirtual (0x7f2f28058420) 0
  OctreeRootData_NonVirtual (0x7f2f28058480) 8
  OctreeSpatialData_NonVirtual (0x7f2f280584e0) 16
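
If you would rather check those offsets from code than read the dump, something like the following works as a rough sketch. CVector3 and COctreeNode here are stand-ins invented so the snippet compiles on its own (the real engine types are not shown), and baseOffset is a hypothetical helper, not part of any library. The numbers it reports depend on the compiler and ABI.

```cpp
#include <cstddef>

// Hypothetical stand-ins: the post does not show the engine's own definitions.
struct COctreeNode;
struct CVector3 { float x, y, z; };

struct OctreeNodeData_NonVirtual    { unsigned short uiLevel; };
struct OctreeRootData_NonVirtual    { COctreeNode **nodChildren; };
struct OctreeSpatialData_NonVirtual { CVector3 vCenter; float fHalfExtents; };

struct OctreeSpatialRootData_NonVirtual
    : OctreeNodeData_NonVirtual,
      OctreeRootData_NonVirtual,
      OctreeSpatialData_NonVirtual {};

// Byte offset of the Base subobject inside a Derived object (helper name is
// made up for this sketch). The upcast is an ordinary implicit conversion;
// only the address comparison needs reinterpret_cast.
template <typename Base, typename Derived>
std::ptrdiff_t baseOffset(const Derived &obj)
{
    const Base *part = &obj;
    return reinterpret_cast<const char *>(part)
         - reinterpret_cast<const char *>(&obj);
}
```

Calling baseOffset<OctreeSpatialData_NonVirtual>(node) on an OctreeSpatialRootData_NonVirtual node should give 16 on a 64-bit GCC target with the layout dumped above; other ABIs may pad differently.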

Alright, so we can see that even when using multiple inheritance the data is neatly and compactly stored in the derived class. The derived struct is merely composed of all the data from the inherited classes placed sequentially in memory. No vtable is involved. Accessing the members of this derived struct would be fast, and casting would not be complicated. Alright, so let's look at what g++ dumps regarding the previous hierarchy design that uses a virtual base class:

Vtable for OctreeNodeData
OctreeNodeData::_ZTV14OctreeNodeData: 4u entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI14OctreeNodeData)
16    OctreeNodeData::~OctreeNodeData
24    OctreeNodeData::~OctreeNodeData

Class OctreeNodeData
   size=16 align=8
   base size=10 base align=8
OctreeNodeData (0x7f2f28058540) 0
    vptr=((& OctreeNodeData::_ZTV14OctreeNodeData) + 16u)

Vtable for OctreeRootData
OctreeRootData::_ZTV14OctreeRootData: 10u entries
0     16u
8     (int (*)(...))0
16    (int (*)(...))(& _ZTI14OctreeRootData)
24    OctreeRootData::~OctreeRootData
32    OctreeRootData::~OctreeRootData
40    -16u
48    (int (*)(...))-0x00000000000000010
56    (int (*)(...))(& _ZTI14OctreeRootData)
64    OctreeRootData::_ZTv0_n24_N14OctreeRootDataD1Ev
72    OctreeRootData::_ZTv0_n24_N14OctreeRootDataD0Ev

VTT for OctreeRootData
OctreeRootData::_ZTT14OctreeRootData: 2u entries
0     ((& OctreeRootData::_ZTV14OctreeRootData) + 24u)
8     ((& OctreeRootData::_ZTV14OctreeRootData) + 64u)

Class OctreeRootData
   size=32 align=8
   base size=16 base align=8
OctreeRootData (0x7f2f27e7f680) 0
    vptridx=0u vptr=((& OctreeRootData::_ZTV14OctreeRootData) + 24u)
  OctreeNodeData (0x7f2f280585a0) 16 virtual
      vptridx=8u vbaseoffset=-0x00000000000000018 vptr=((& OctreeRootData::_ZTV14OctreeRootData) + 64u)

Vtable for OctreeSpatialData
OctreeSpatialData::_ZTV17OctreeSpatialData: 10u entries
0     24u
8     (int (*)(...))0
16    (int (*)(...))(& _ZTI17OctreeSpatialData)
24    OctreeSpatialData::~OctreeSpatialData
32    OctreeSpatialData::~OctreeSpatialData
40    -24u
48    (int (*)(...))-0x00000000000000018
56    (int (*)(...))(& _ZTI17OctreeSpatialData)
64    OctreeSpatialData::_ZTv0_n24_N17OctreeSpatialDataD1Ev
72    OctreeSpatialData::_ZTv0_n24_N17OctreeSpatialDataD0Ev

VTT for OctreeSpatialData
OctreeSpatialData::_ZTT17OctreeSpatialData: 2u entries
0     ((& OctreeSpatialData::_ZTV17OctreeSpatialData) + 24u)
8     ((& OctreeSpatialData::_ZTV17OctreeSpatialData) + 64u)

Class OctreeSpatialData
   size=40 align=8
   base size=24 base align=8
OctreeSpatialData (0x7f2f27e7f958) 0
    vptridx=0u vptr=((& OctreeSpatialData::_ZTV17OctreeSpatialData) + 24u)
  OctreeNodeData (0x7f2f280586c0) 24 virtual
      vptridx=8u vbaseoffset=-0x00000000000000018 vptr=((& OctreeSpatialData::_ZTV17OctreeSpatialData) + 64u)

Vtable for OctreeSpatialRootData
OctreeSpatialRootData::_ZTV21OctreeSpatialRootData: 15u entries
0     40u
8     (int (*)(...))0
16    (int (*)(...))(& _ZTI21OctreeSpatialRootData)
24    OctreeSpatialRootData::~OctreeSpatialRootData
32    OctreeSpatialRootData::~OctreeSpatialRootData
40    24u
48    (int (*)(...))-0x00000000000000010
56    (int (*)(...))(& _ZTI21OctreeSpatialRootData)
64    OctreeSpatialRootData::_ZThn16_N21OctreeSpatialRootDataD1Ev
72    OctreeSpatialRootData::_ZThn16_N21OctreeSpatialRootDataD0Ev
80    -40u
88    (int (*)(...))-0x00000000000000028
96    (int (*)(...))(& _ZTI21OctreeSpatialRootData)
104   OctreeSpatialRootData::_ZTv0_n24_N21OctreeSpatialRootDataD1Ev
112   OctreeSpatialRootData::_ZTv0_n24_N21OctreeSpatialRootDataD0Ev

Construction vtable for OctreeRootData (0x7f2f27e7fa28 instance) in OctreeSpatialRootData
OctreeSpatialRootData::_ZTC21OctreeSpatialRootData0_14OctreeRootData: 10u entries
0     40u
8     (int (*)(...))0
16    (int (*)(...))(& _ZTI14OctreeRootData)
24    OctreeRootData::~OctreeRootData
32    OctreeRootData::~OctreeRootData
40    -40u
48    (int (*)(...))-0x00000000000000028
56    (int (*)(...))(& _ZTI14OctreeRootData)
64    OctreeRootData::_ZTv0_n24_N14OctreeRootDataD1Ev
72    OctreeRootData::_ZTv0_n24_N14OctreeRootDataD0Ev

Construction vtable for OctreeSpatialData (0x7f2f27e7fa90 instance) in OctreeSpatialRootData
OctreeSpatialRootData::_ZTC21OctreeSpatialRootData16_17OctreeSpatialData: 10u entries
0     24u
8     (int (*)(...))0
16    (int (*)(...))(& _ZTI17OctreeSpatialData)
24    OctreeSpatialData::~OctreeSpatialData
32    OctreeSpatialData::~OctreeSpatialData
40    -24u
48    (int (*)(...))-0x00000000000000018
56    (int (*)(...))(& _ZTI17OctreeSpatialData)
64    OctreeSpatialData::_ZTv0_n24_N17OctreeSpatialDataD1Ev
72    OctreeSpatialData::_ZTv0_n24_N17OctreeSpatialDataD0Ev

VTT for OctreeSpatialRootData
OctreeSpatialRootData::_ZTT21OctreeSpatialRootData: 7u entries
0     ((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 24u)
8     ((& OctreeSpatialRootData::_ZTC21OctreeSpatialRootData0_14OctreeRootData) + 24u)
16    ((& OctreeSpatialRootData::_ZTC21OctreeSpatialRootData0_14OctreeRootData) + 64u)
24    ((& OctreeSpatialRootData::_ZTC21OctreeSpatialRootData16_17OctreeSpatialData) + 24u)
32    ((& OctreeSpatialRootData::_ZTC21OctreeSpatialRootData16_17OctreeSpatialData) + 64u)
40    ((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 104u)
48    ((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 64u)

Class OctreeSpatialRootData
   size=56 align=8
   base size=40 base align=8
OctreeSpatialRootData (0x7f2f27e8c380) 0
    vptridx=0u vptr=((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 24u)
  OctreeRootData (0x7f2f27e7fa28) 0
      primary-for OctreeSpatialRootData (0x7f2f27e8c380)
      subvttidx=8u
    OctreeNodeData (0x7f2f28058720) 40 virtual
        vptridx=40u vbaseoffset=-0x00000000000000018 vptr=((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 104u)
  OctreeSpatialData (0x7f2f27e7fa90) 16
      subvttidx=24u vptridx=48u vptr=((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 64u)
    OctreeNodeData (0x7f2f28058720) alternative-path

O_O

...

Well... one thing is obvious. Virtual inheritance VASTLY complicates the interpretation of an otherwise simple data layout. Also pretty obvious is how badly it bloats data types: every object now carries three vtable pointers at 8B each (one per subobject, at offsets 0, 16 and 40 in the dump) -- the OctreeSpatialRootData type grew to 56B compared to 32B in the OctreeSpatialRootData_NonVirtual type. Seeing the size of a data type nearly double just to store information to support the virtual inheritance makes it pretty apparent that falling back on complex language features instead of carefully planning a class structure isn't often a good idea. So how can I create a polymorphic data type that can be interpreted as any of its constituent parts? Is it as simple as creating the derived classes with the longest and most granular inheritance lists? Oh snap, let's give it a whirl! So for example, if we define the node data types as follows:

 
struct OctreeNodeData
{
    unsigned short uiLevel;
};
 
struct OctreeParentData
{
    COctreeNode **nodChildren;
};
struct OctreeChildData
{
    COctreeNode *nodParent;
};
struct OctreeRootData : OctreeNodeData, OctreeParentData{};
struct OctreeLeafData : OctreeNodeData, OctreeChildData{};
struct OctreeBranchData : OctreeNodeData, OctreeParentData, OctreeChildData{};
 
struct OctreeSpatialData
{
    CVector3<float> vCenter;
    float fHalfExtents;
};
struct OctreeSpatialRootData : OctreeRootData, OctreeSpatialData{};
struct OctreeSpatialBranchData : OctreeBranchData, OctreeSpatialData{};
struct OctreeSpatialLeafData : OctreeLeafData, OctreeSpatialData{};
 

This enables us to maintain the use of OctreeNodeData as a base class, and lets us execute code like:

 
	OctreeSpatialRootData *pData = new OctreeSpatialRootData();
	pData->vCenter = CVector3<float>(0.4f, 0.8f, 1.3f);
 
	//This is kosher
	OctreeNodeData *pBase = (OctreeNodeData*)pData;
	pBase->uiLevel = 2;
 
	//This is sketchy
	OctreeSpatialData *pDataCast = (OctreeSpatialData*)pBase;
 

WOW! AWESOME, RIGHT??!?!??!!!one?!? NO, DAMN WRONG! Lol, I've been ignoring a glaring problem with this entire implementation, specifically the downcast process. What do you think the value of pDataCast->vCenter would be? I'll tell you: the first 12 bytes at offset 0 in the original OctreeSpatialRootData struct. So you'd be reading uiLevel, padding and part of nodChildren, and interpreting them as the 3-float vector you were looking for. Ugly. Unfortunately C++ gives us the freedom to do this when using C-style casts. If we were smart and used static_cast<OctreeSpatialData*>(pBase) instead, the compiler would correctly report an error and tell us the static_cast was invalid, because OctreeNodeData and OctreeSpatialData are unrelated types. Obviously also if we were smart we could use dynamic_cast<OctreeSpatialData*>(pBase), and at the cost of a bit of overhead the C++ cast would actually correctly cast across to the sibling struct from the base pointer. Note that this only compiles when the base type is polymorphic (it needs at least one virtual function), which these plain structs are not. I have been ignoring this obvious solution because run-time type information (RTTI) in C++ has been shunned in large scale projects due to its habit of bloating types (as we saw), and also degrading performance. Again performance is key because we are designing an acceleration structure... it would be against the very nature of this endeavor to fall back on dynamic_cast... even if it's the smart thing to do in this situation.
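
One more option sits between the C-style cast and dynamic_cast, shown here as a hedged sketch rather than a recommendation: when the calling code already knows which composite type it is holding, two conversions through that composite type let the compiler do the pointer adjustment with no RTTI. spatialFromBase is a hypothetical helper, and the CVector3 and COctreeNode definitions are stand-ins so the snippet is self-contained. Nothing verifies the assumption at run time, so passing a base pointer that belongs to any other composite type is undefined behaviour.

```cpp
// Hypothetical stand-ins for the engine types not shown in the post.
struct COctreeNode;
template <typename T> struct CVector3 { T x, y, z; };

struct OctreeNodeData    { unsigned short uiLevel; };
struct OctreeParentData  { COctreeNode **nodChildren; };
struct OctreeSpatialData { CVector3<float> vCenter; float fHalfExtents; };
struct OctreeRootData        : OctreeNodeData, OctreeParentData {};
struct OctreeSpatialRootData : OctreeRootData, OctreeSpatialData {};

// Only valid when pBase really is the OctreeNodeData part of an
// OctreeSpatialRootData; nothing checks that at run time.
inline OctreeSpatialData *spatialFromBase(OctreeNodeData *pBase)
{
    // Down to the concrete composite type (the compiler undoes the base offset)...
    OctreeSpatialRootData *pWhole = static_cast<OctreeSpatialRootData *>(pBase);
    // ...then up to the sibling base (the compiler applies the sibling's offset).
    return pWhole;  // implicit derived-to-base conversion
}
```

The cost is that every call site has to know the concrete composite type, which is exactly the information the m_uiType bitmask approach tries to recover at run time.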

Are we smart though? Nah, in fact we are so stupid we are going to further complicate things. What would happen if we stuck with the normal C-style cast, and provided an appropriate offset to the cast? Indeed, if you did something like (OctreeSpatialData*)((unsigned char*)pBase + 16);, you would get a pointer of type OctreeSpatialData* that actually pointed to the correct location within the original OctreeSpatialRootData structure, and you could do as you please with the pointer. So let's think about how we can generate the required offset into a composite data type that is defined by its m_uiType bitmask, given a particular subtype in the composite type.

Obviously, we can immediately tell if the requested subtype exists in the composite type found in the node instance by using bitwise operations between the node's m_uiType bitmask and the requested subtype's corresponding type bit. It also should be obvious though that the accessor for this subtype shouldn't even be called if the composite type doesn't contain the required type. We can save some branching by ignoring the possibility.

An immediate observation is also that to offset the cast based on the byte size of each constituent subtype struct in the composite data, the pointer has to point to data that is 1B in size. Due to the way pointer arithmetic works, adding 1 to a pointer shifts it in memory based on the size of the pointed-to type, so adding one to a float* pointer would make the pointer point 32 bits (4B) ahead because this is the size of one float element (pointer arithmetic was designed to work on arrays of data). Sooo... in order to make our offsetting simple and eliminate the need for a cast, it is quite possible to redefine the base OctreeNodeData type to span only 1B. The only data this struct contains is the level of the node, and I should think that 256 levels would suffice for ANY purpose. Just for fun, if you recursively split a 256 level octree, you would end up with 1.55×10²³¹ nodes. If each node took 1ms to create, it would take 3.57×10²¹⁰ times longer than the age of the universe (a scant 4.336×10¹⁷ seconds) to create the entire tree. Sound like a big enough tree? Let's continue.
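
The pointer arithmetic point is easy to verify. A small sketch (the helper name strideInBytes is made up for illustration) that measures how far p + 1 actually moves:

```cpp
#include <cstddef>

// Distance in bytes between p and p + 1 for any object pointer type.
// Forming p + 1 is fine even for a single object (one-past-the-end).
template <typename T>
std::ptrdiff_t strideInBytes(const T *p)
{
    return reinterpret_cast<const char *>(p + 1)
         - reinterpret_cast<const char *>(p);
}
```

For a float* this returns sizeof(float), and for an unsigned char* it returns 1, which is why byte offsets have to be applied through a 1B pointer type.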

So we define the base data type as:

 
struct OctreeNodeData
{
    unsigned char uiLevel;
};
 

We can continue to use uiLevel in the bitwise operations described by Frisken and Perry. It is always used as the right operand in bit shifts; it never gets shifted itself. So with that done all we need to do now is generate the offset... I actually had to stop and think about this a while... the offset needs to be generated very fast but depends on the subtypes included in a specific composite type. Iterating through the bits set in the node's m_uiType bitmask would allow us to sum the appropriate offset together pretty quickly given the size of the component associated with each bit... but iteration is expensive. What if we multiplied the size of the subtype by a 0 or 1 based on whether it was contained in the composite type? This would look something like:
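
A minimal sketch of the multiply-by-0-or-1 idea, not the original listing. The bit assignments (kParentBit, kChildBit, kSpatialBit), the component order and the helper name spatialOffset are all assumptions made for illustration, since the actual m_uiType values are never listed. It also assumes every component ahead of the spatial data occupies one pointer-sized slot, which is how the compiler pads the root, leaf and branch composites in this post.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical type bits; the post never lists the actual m_uiType values.
enum : std::uint32_t {
    kParentBit  = 1u << 0,  // composite contains OctreeParentData
    kChildBit   = 1u << 1,  // composite contains OctreeChildData
    kSpatialBit = 1u << 2   // composite contains OctreeSpatialData
};

// Assumed slot size: each pointer-bearing component takes one pointer-sized
// slot, and the leading node data is padded up to one as well.
constexpr std::size_t kSlot = sizeof(void *);

// Branch-free byte offset of the spatial data inside a composite whose
// components are laid out in the order node, parent, child, spatial.
constexpr std::size_t spatialOffset(std::uint32_t uiType)
{
    return kSlot                           // node data plus padding
         + kSlot * ((uiType >> 0) & 1u)    // parent pointer, if present
         + kSlot * ((uiType >> 1) & 1u);   // child pointer, if present
}
```

With 8B pointers this gives 16 for root and leaf composites and 24 for branch composites. The result would then be added to an unsigned char* view of the base pointer before casting to OctreeSpatialData*. Any composite that skips both pointers, or any ABI that pads differently, breaks the slot assumption.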

 
 

So at the cost of some subtle issues, we haven't sacrificed speed or memory usage. In fact by moving away from virtual inheritance we have saved a huge amount of memory bloat, and that's just icing on the cake of vastly improved efficiency for casts and member access. As long as new derived data types are carefully laid out, there are only minor issues expanding or conjoining them with existing data. It is important to note if you for example tried to join the existing spatial node data with the existing location code node data, maybe like this:

 
 

You can easily create a data structure... This is what the new struct hierarchy looks like to GCC:

Class COctreeNode
   size=16 align=8
   base size=10 base align=8
COctreeNode (0x7f66d3ba1240) 0

Class COctree
   size=8 align=8
   base size=8 base align=8
COctree (0x7f66d3ba12a0) 0

Class OctreeNodeData
   size=2 align=2
   base size=2 base align=2
OctreeNodeData (0x7f66d3ba1300) 0

Class OctreeParentData
   size=8 align=8
   base size=8 base align=8
OctreeParentData (0x7f66d3ba1360) 0

Class OctreeChildData
   size=8 align=8
   base size=8 base align=8
OctreeChildData (0x7f66d3ba13c0) 0

Class OctreeRootData
   size=16 align=8
   base size=16 base align=8
OctreeRootData (0x7f66d39ce690) 0
  OctreeNodeData (0x7f66d3ba1420) 0
  OctreeParentData (0x7f66d3ba1480) 8

Class OctreeLeafData
   size=16 align=8
   base size=16 base align=8
OctreeLeafData (0x7f66d39ce700) 0
  OctreeNodeData (0x7f66d3ba14e0) 0
  OctreeChildData (0x7f66d3ba1540) 8

Class OctreeBranchData
   size=24 align=8
   base size=24 base align=8
OctreeBranchData (0x7f66d39d81e0) 0
  OctreeNodeData (0x7f66d3ba15a0) 0
  OctreeParentData (0x7f66d3ba1600) 8
  OctreeChildData (0x7f66d3ba1660) 16

Class OctreeSpatialData
   size=16 align=4
   base size=16 base align=4
OctreeSpatialData (0x7f66d3ba16c0) 0

Class OctreeSpatialRootData
   size=32 align=8
   base size=32 base align=8
OctreeSpatialRootData (0x7f66d39ce770) 0
  OctreeRootData (0x7f66d39ce7e0) 0
    OctreeNodeData (0x7f66d3ba1720) 0
    OctreeParentData (0x7f66d3ba1780) 8
  OctreeSpatialData (0x7f66d3ba17e0) 16

Class OctreeSpatialBranchData
   size=40 align=8
   base size=40 base align=8
OctreeSpatialBranchData (0x7f66d39ce850) 0
  OctreeBranchData (0x7f66d39d83c0) 0
    OctreeNodeData (0x7f66d3ba1840) 0
    OctreeParentData (0x7f66d3ba18a0) 8
    OctreeChildData (0x7f66d3ba1900) 16
  OctreeSpatialData (0x7f66d3ba1960) 24

Class OctreeSpatialLeafData
   size=32 align=8
   base size=32 base align=8
OctreeSpatialLeafData (0x7f66d39ce8c0) 0
  OctreeLeafData (0x7f66d39ce930) 0
    OctreeNodeData (0x7f66d3ba19c0) 0
    OctreeChildData (0x7f66d3ba1a20) 8
  OctreeSpatialData (0x7f66d3ba1a80) 16

So in conclusion, there you have it. An all-powerful octree implementation. Not a bit of unnecessary storage as long as the nodes are constructed appropriately, and no constraints that would stop us from creating further derived classes.

I’m not dead

Posted in Uncategorized on April 20th, 2012 by Chaos Engineer

Contrary to what my blog post updates may hint at, I am not dead. Real life has pulled me away from the niches I previously attached myself to. I have bought a house, resurrected ancient hardware, started exploring electrical and mechanical engineering, and have had prized handheld consoles stolen by squatters.

Yes, I have joined the legions of individuals burdened by mortgages. I too long yearned to have the freedom to freely modify my homestead without the fear of loss. I now have a house and a man-cave I can call my own and am more or less limited only by my whims to create house mods and install devices in my walls.

I recently picked up a couple of Tripp Lite hardwired power strips, the kind of power strips you would see in a laboratory. I installed circuit breakers for them, drilled and fished power line, and mounted them nicely and horizontally (wide blade up -- this is AC neutral) above the tempered hardboard work surfaces (rubbed with boiled linseed oil) in my man-cave. I was quite pleased with the result. I now have a place to explore the mechanical and electrical engineering aspects of my obsession.

The mere prospect of generating chaos in electrical circuits crushes my spirit with its scope. There are things you can accomplish simply in an electrical circuit that would bring even behemoth number crunching computers to their knees. The possibilities opened up by operational amplifiers (opamps) have truly brought me to a standstill. I recently hooked some opamps together to mathematically recreate equations I use on the computer. Yes, indeed... you can use opamps to perform multiplication, define coefficients, and even integrate/differentiate equations with a voltage signal. I created a circuit to recreate an equation I used in software to generate strange attractors, with outputs for the X, Y and Z axes. When I hooked my Tek 465b oscilloscope up to the outputs and saw the same attractor I draw in software represented in varying voltages in hardware, I hung my head. I am simply floored by the possibilities. DRSSTCs driven by chaos? I already hooked the outputs to an audio sampler and was astounded by the results.

Yes, with the freedom of a house I was able to bring old computers out of stasis. A few machines with 1999-era cutting edge hardware were reborn running Ubuntu 12.04 and Fedora 17. Of this experiment I can say one thing for sure. Gnome Shell will change the future of Linux computing. Tablets powered by Linux and Gnome Shell will offer possibilities far exceeding those of even Android (which I respect greatly... I would sever a testicle to have an Asus Transformer Prime this instant). Gnome Shell with Mutter provides a composited graphical interface that is both simple and elaborate at the same time. The Gnome Shell Extensions site is an exciting preview of what is to come for Gnome. Installing shell extensions straight from your browser? Sign me up.

Also... yes, amidst my joys there is some agony. I had left my Japanese Nintendo DSi LL (first run, unique colors) in a backpack in my office at work. Apparently at the same time some homeless fellow had found a reliable way to enter our building on a regular basis and had taken up residence in a closet in an unused area of our floor. This individual stole my Japanese Nintendo DSi LL along with the development cartridge in it. Unfortunately the minimum claim for theft loss by my company's insurance is $1000, which my DSi didn't quite meet. This has left me at a total loss and unable to develop software for the Nintendo DS, which saddens me greatly.

In a sense this loss can be a benefit, in that I can refocus on other things. Regardless, I felt the need to post again... you know, on the same theme as my last post, z-buffer clearing is critical! Since I had no Nintendo DS to develop on, I started writing code for Android. The devices that support Android can still be quite a varied landscape, regardless of what Google says. I wrote some GLES 2.0 code that ran just fine on a Droid 3, but when trying to run it on a Droid X or the Motorola Xoom I was seeing 3D black snakes being stuck in the middle of my scenes. The resolution? If you want to clear a depth buffer, make sure you enable writing to the depth buffer with glDepthMask(GL_TRUE) before clearing. glDepthMask FTW!

Anyway, sorry if my post has been incoherent or riddled with grammatical errors. I am half cocked on Firefly Iced Tea Vodka. At least it gave me the impetus to post again. I hope soon to provide updates on my Gnome Shell 3.4 endeavors and maybe some details on how to generate chaos using non-linear opamp circuits (trust me, this shit will blow your mind).

Regards,

-Me

z-buffer frustration

Posted in OpenGL, rendering, Uncategorized on September 7th, 2011 by Chaos Engineer

As part of trying to keep actively blogging, I thought I would start just detailing the day-to-day things that I scratched my head about before coming to a resolution. Today, I just spent a couple of very frustrating hours trying to figure out why GL_DEPTH_TEST was culling my entire scene when rendering to a texture. I won't detail the process of setting up a framebuffer in OpenGL for render-to-texture (RTT) purposes; there are much better sites for that. If you have done any RTT in the past, you know a framebuffer contains only color information, and to perform depth testing when rendering to texture, you have to attach a renderbuffer object to the framebuffer as a "depth attachment". The renderbuffer object will then store the depth buffer for RTT purposes.

So I was doing a lot of RTT recently, a lot of which was just on fullscreen quads (resting on the z-near plane). When it came time to render some actual geometry to a texture, I realized I needed depth testing, so I enabled GL_DEPTH_TEST and as soon as I did that, my geometry no longer rendered. I was confused at first because the fullscreen quads continued to render correctly... I did have to go verify that my geometry was indeed just failing the depth test by adjusting glDepthFunc. After much reading on the net, something finally caught my eye: "No, the only way to use a renderbuffer object is to attach it to a Framebuffer Object. After you bind that FBO to the context, and set up the draw or read buffers appropriately, you can use pixel transfer operations to read and write to it. Of course, you can also render to it. The standard glClear function will also clear the appropriate buffer."

For some unknown reason it never struck me that the framebuffer / renderbuffer needed to be cleared. I was always either drawing every pixel in the scene, or clearing the render target texture before each RTT operation, so I never noticed the framebuffer needing a clear. I have my engine set up to bind and unbind the framebuffer and renderbuffer at the beginning and end of an RTT operation. I was using glClear only outside the times the buffers were bound, so they were never being cleared! The renderbuffer (depth buffer) was being completely filled with the absolute smallest depth value because I was rendering quads to the z-near plane, and I was never clearing it!

Looking back it was a stupid mistake, but if depth testing is failing hard on your RTTs, just remember to call glClear with the depth bit (GL_DEPTH_BUFFER_BIT) specified WHILE YOUR FRAMEBUFFER AND RENDERBUFFER ARE STILL BOUND.
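
To see why stale depth values swallow everything, here is a toy software model of a depth buffer. It is not OpenGL code, ToyDepthBuffer is an invented name, and it assumes the earlier pass wrote its depth values, with a less-than comparison and a clear depth of 1.0 (the OpenGL defaults).

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a depth attachment; this models the behaviour and is not
// OpenGL code. Depth runs from 0.0 (z-near) to 1.0 (z-far).
struct ToyDepthBuffer
{
    std::vector<float> depth;

    explicit ToyDepthBuffer(std::size_t pixels) : depth(pixels, 1.0f) {}

    // Similar in spirit to clearing the depth bit with a clear depth of 1.0.
    void clear() { depth.assign(depth.size(), 1.0f); }

    // Draws one fragment per pixel at depth z using a "less than" test.
    // Returns how many fragments passed and were written.
    std::size_t drawFullscreen(float z)
    {
        std::size_t passed = 0;
        for (float &stored : depth) {
            if (z < stored) { stored = z; ++passed; }
        }
        return passed;
    }
};
```

Drawing at depth 0.0 and then at 0.5 leaves the second draw with zero passing fragments; calling clear() in between lets all of them through, which is the effect of calling glClear with the depth bit while the framebuffer is bound.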

Well, I just thought I would share and give myself a reason to not forget that I have a blog. I'm headed back off to finish making a geometry buffer and see what kind of awesomeness I can accomplish with some deferred rendering.

Electric Fields and House of Cards

Posted in Uncategorized on August 4th, 2011 by Chaos Engineer

Prior to getting caught up in Windows and gaming, I did bring my electric field simulation to fruition with audio-reactive nuclei. The tech previously posted is pretty crappy, because it is just that... a tech demo. After establishing the electric field simulation, I spent some time using the OSX subsystems to read in audio files and perform realtime FFTs of the audio as it played. I used the FFT data to modulate the "Pauli exclusion" radius, which made the nuclei pulse and push the electrons away based on the exponentially increasing force. Having electron-electron interactions was prohibitively expensive, so I settled on only electron-nuclei interactions.

It would perhaps be possible to create a spatial octree that I could sort the electrons by and mediate the electron-electron interactions to make it computationally reasonable, but I never got around to finishing the implementation I was working up. The end result with only electron-nuclei interactions and integration of the laser scanned point-cloud data provided by Radiohead still provided a nice production-quality video.

An electric field simulation by itself would be rather boring. Taking the points' initial positions as formulated by strange attractors really makes the difference. Moving the nuclei around in 3-space using the color channels of a 32-bit image of generated Perlin clouds adds to this. The video below is just that, nuclei being spatially modified by Perlin noise, pulsing based on FFT data and eating up various strange attractors. Thom Yorke's laser-scanned jibs make a cameo appearance.

Ashamed

Posted in Uncategorized on August 4th, 2011 by Chaos Engineer

Well, I have to admit I am totally ashamed. I am ashamed of a number of things.

First, I am ashamed I have yet again let my blog stagnate. My MacBook Pro keyboard malfunctioned, and when I brought it to Apple to have it fixed, I found out my AppleCare plan had expired and it is IMPOSSIBLE to get my computer covered again. The replacement cost for the keyboard was somewhat prohibitive, but more than anything else I was angry with Apple and became disenchanted with my MacBook... it has been gathering dust for over a year now. I moved on, which led to the second thing.

Second, I am ashamed to admit I bought a behemoth number cruncher laptop -- an ASUS G73JH-B2 -- and started to actually enjoy Windows 7. However, after months and months of furious gaming and spending money on Steam... I realized I wasn't being at all productive, and no amount of gaming could assuage the feeling of guilt I felt for letting my projects gather dust. I only recently partitioned the primary disk on my laptop and installed Linux, and I quickly recalled why Linux is glorious. Even Windows 7 can't stand up to the versatility and streamlined glory of a well done GNU/Linux install. Flashy? Nah, Aero << Compiz-Fusion.

Third, and somewhat off-topic, I am ashamed of the current state of our nation... especially the political system. The FEC is not doing its job to protect citizens from being governed by figureheads bought out by corporations and rich fat cats ( For Example ). The debt ceiling was raised, but with NO TAX INCREASES. The entirety of the deficit reduction measures are going to be taken out of budgets? It is inconceivable that we can address the deficit by mere budget cuts... and our infrastructure and education system can't stand them. Unfortunately due to all our representatives being motivated/owned by big money, it is nearly impossible to get tax increases on the rich (Republican translation: "Job Creators"), which when coupled with modest budget cuts, is THE ONLY REASONABLE WAY FOR OUR NATION TO REDUCE OUR DEFICIT.

The first two things I can address and already have started to by installing Linux and resurrecting my dead projects, and trying to stay out of Windows and off Steam... although when Skyrim comes out I will certainly disappear for a while. I never released the final version of my NDS Wifi project for a couple of reasons... for one I wrote it using SDL 1.3, which isn't used much yet... and for two I couldn't get it in a neatly functional state on Windows. By biting the bullet and actually posting here again, I can begin addressing the stagnant blog. By reviving my projects and working towards completion I can address the disuse of GNU/Linux. They will couple and support one another.

The last item I feel completely helpless about. Can I write my congressman? Sure, but will it make a difference? Probably not. The FEC needs to be purged and for lack of a better term, reformatted. They need to actually protect the working man and common citizens against a government being owned by corporations and big money. Such a government will constantly lean in favor of the rich, at the expense of the common man. The provisions of the debt ceiling increase make it obvious that such a government is already established. It is unreasonable to not increase taxes--especially on the rich, who can afford it--when attempting to address our national deficit. I'm all for budget cuts--especially in the military sector--but our nation cannot afford to take cuts anywhere else.

States are going broke; it's already become obvious to me that our state parks have suffered as a direct result. What's next? Budget cuts for our already faltering education system? The rich don't care because they send their children to private schools where these cuts don't matter. The rich are padded from nearly all the ramifications of the cuts they favor over tax increases. They don't vacation in state parks, they go to private resorts. I could ramble on this for hours, but it makes me angry and anxious just to think about.

Anyway, that aside... I hope to get back into my chaos generation and visualization scheme, and blogging about my discoveries along the way. We shall see if I succeed in refocusing.

Hope everything is well for you so far in 2011.

Coulomb’s Law Simulation

Posted in Uncategorized on March 18th, 2010 by Chaos Engineer

I created a gravitational simulation a while back, and it was pretty cool. I was recently reading an article about how similar Coulomb's Law is to Newton's Law of Universal Gravitation. I thought I would visit the world of physical simulations again, this time using my strange attractor engine to simulate charges in an electric field.

Obviously the first step is to create charges that contribute to an electric field. The simple solution is to turn the points on the attractor into electric charges, let's say electrons. We now have a collection of charges creating an electric field. The next step is to implement Coulomb's Law.

The scalar form of Coulomb's Law states that f = k * ( (q0 * q1) / (r * r) ), where k is the Coulomb constant, q0 and q1 are the charges of the two particles, and r is the distance between them.

There are a couple of things to note about this equation. Most important is how the sign of the charge on each particle affects the sign of the force magnitude itself. Two charges of like sign result in a positive (repulsive) force, and two charges of opposing sign result in a negative (attractive) force. Also, this equation gives only the magnitude of the force; there is no direction associated with it.

Getting a resultant force vector is quite simple. We take the vector between the two particles and reduce it to unit length. We then multiply this unit vector by the scalar magnitude of the force as determined by Coulomb's Law.
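Here is a minimal sketch of that pairwise force, not the engine's actual code. The Vec3 struct, the coulombForce name and the use of SI units are assumptions made for illustration. The direction vector points from particle 1 to particle 0, so a positive magnitude pushes particle 0 away and a negative one pulls it closer.

```cpp
#include <cmath>

// Hypothetical minimal vector type for this sketch.
struct Vec3 { double x, y, z; };

// Coulomb constant in N*m^2/C^2 (assumes SI units).
constexpr double kCoulomb = 8.9875517923e9;

// Force felt by particle 0 (charge q0 at p0) due to particle 1 (charge q1 at p1).
Vec3 coulombForce(double q0, const Vec3& p0, double q1, const Vec3& p1)
{
    // Vector from particle 1 to particle 0.
    Vec3 d{p0.x - p1.x, p0.y - p1.y, p0.z - p1.z};
    double r2 = d.x * d.x + d.y * d.y + d.z * d.z;
    if (r2 == 0.0)
        return Vec3{0.0, 0.0, 0.0}; // coincident particles: direction is undefined

    double r = std::sqrt(r2);
    double f = kCoulomb * (q0 * q1) / r2; // signed scalar magnitude

    // Unit vector (d / r) scaled by the signed magnitude.
    return Vec3{f * d.x / r, f * d.y / r, f * d.z / r};
}
```

Calling coulombForce(1e-6, a, 1e-6, b) with a at the origin and b one meter along +x gives a force on a that points along -x, which is the repulsion you would expect from two like charges.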

To create a "field", we have to find the cumulative force felt on a charge from all sources. So to put it in programmer language, for each individual particle we iterate through all the charged particles in range, accumulating the force vector for each to create a resultant overall force vector. It can be said that this resultant force vector represents the field at that given point.

Obviously we can then resolve this force into acceleration knowing the particle's mass. Maintaining a velocity vector for each particle, we can use this acceleration to create a realtime simulation of the particle's motion through the field. As the positions of these particles change, the field changes, creating a feedback system. Any lover of chaos knows we have established the prime requisite for a complex system. Hopefully we can stir some emergence from this system.
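The accumulate-then-integrate loop described above could be sketched roughly as follows. This is an illustration under assumptions: the Particle struct, the function names and the semi-implicit Euler integrator are my choices (the post doesn't say which integrator the engine uses), and the sketch visits every particle instead of only those "in range".

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Hypothetical particle layout for this sketch.
struct Particle {
    Vec3 pos;
    Vec3 vel;
    double charge;
    double mass;
};

constexpr double kCoulomb = 8.9875517923e9; // assumes SI units

// Net Coulomb force on particles[i] from every other particle.
Vec3 netForce(const std::vector<Particle>& particles, std::size_t i)
{
    Vec3 total{0.0, 0.0, 0.0};
    const Particle& self = particles[i];
    for (std::size_t j = 0; j < particles.size(); ++j) {
        if (j == i) continue;
        const Particle& other = particles[j];
        Vec3 d{self.pos.x - other.pos.x, self.pos.y - other.pos.y, self.pos.z - other.pos.z};
        double r2 = d.x * d.x + d.y * d.y + d.z * d.z;
        if (r2 == 0.0) continue; // skip coincident particles
        double r = std::sqrt(r2);
        // Signed magnitude divided by r, so multiplying by d yields (unit vector * magnitude).
        double scale = kCoulomb * self.charge * other.charge / (r2 * r);
        total.x += scale * d.x;
        total.y += scale * d.y;
        total.z += scale * d.z;
    }
    return total;
}

// Advance the whole system by dt seconds.
void step(std::vector<Particle>& particles, double dt)
{
    // Compute every force first so all particles see the same snapshot of the field.
    std::vector<Vec3> forces(particles.size());
    for (std::size_t i = 0; i < particles.size(); ++i)
        forces[i] = netForce(particles, i);

    for (std::size_t i = 0; i < particles.size(); ++i) {
        Particle& p = particles[i];
        // a = F / m, then update velocity, then position (semi-implicit Euler).
        p.vel.x += forces[i].x / p.mass * dt;
        p.vel.y += forces[i].y / p.mass * dt;
        p.vel.z += forces[i].z / p.mass * dt;
        p.pos.x += p.vel.x * dt;
        p.pos.y += p.vel.y * dt;
        p.pos.z += p.vel.z * dt;
    }
}
```

Computing all the forces before moving anything matters here: if you moved particle 0 first, particle 1 would feel a field that is half a frame out of date.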

Applying these rules to a simple collection of electrons isn't very interesting. The cloud of electrons just explodes outward, and continues on to infinity (assuming no damping). We need to add some positive charges to the mix, and shake it about. In the interest of science, let us create atomic nuclei with a charge much greater than that of a single electron.

Throwing a couple of randomly positioned positively charged nuclei into the simulation does indeed spice things up. Now we get quite interesting swirling clusters of electrons, some maintaining orbits strongly resembling electrons in the common depiction of the atom.

Sometimes an electron will approach a nucleus on just the right vector and get very, very close to its true position. This causes the denominator of the Coulomb's Law equation to approach zero, and the magnitude of the force to skyrocket. This results in the electron being flung off into space at high speeds. I wanted to address this problem, as it didn't jibe with the aesthetics I hoped for.

I remember watching an episode of Fringe not too long ago, where two people were discussing the collision of two dimensions. One of the individuals dropped a reference to the Pauli Exclusion Principle, and then smashed two snowglobes together for effect. This principle explains much about the structure of atoms, and applies here because of what it says about electrons orbiting the atomic nucleus. Exercising some artistic liberties with the interpretation of this principle, I made our nuclei exert an exponentially increasing opposing force on electrons that approached too closely. If you wanted to add some scientific rigor, it would be elementary to make definitions on the characteristics of each nucleus, and apply a different force gradient to approaching electrons based on how many were already in orbit. You could easily create electron shells where the outermost (valence) shell could contain electrons that could easily be pulled off by passing nuclei, just like in chemical reactions!
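One way such a short-range opposing force might look is sketched below. This is a guess at the general shape, not the formula used in the simulation: the CoreRepulsion struct, all of its parameters, and the distance clamp are hypothetical, and the units are arbitrary simulation units with k passed in by the caller. Positive results push the electron away from the nucleus, negative results pull it in.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical tuning knobs, in simulation units.
struct CoreRepulsion {
    double radius;      // the repulsion switches on inside this distance
    double strength;    // scale of the repulsive term
    double falloff;     // smaller values make the wall steeper
    double minDistance; // floor on r so the Coulomb term stays finite
};

// Signed radial force between a nucleus and an electron at distance r.
double radialForce(double k, double qNucleus, double qElectron, double r,
                   const CoreRepulsion& core)
{
    // Clamp the distance so the 1/r^2 term cannot blow up.
    r = std::max(r, core.minDistance);

    // Plain Coulomb term: negative (attractive) for opposite charges.
    double force = k * qNucleus * qElectron / (r * r);

    // Inside the core radius, add a term that grows exponentially as r shrinks.
    // expm1 makes the term exactly zero at r == radius, so there is no jump.
    if (r < core.radius)
        force += core.strength * std::expm1((core.radius - r) / core.falloff);

    return force;
}
```

With a steep enough falloff the exponential term overtakes the Coulomb attraction somewhere inside the core radius, and that crossover distance is where electrons would tend to settle.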

I didn't go so far as to create multiple shells (yet), but the initial results are promising. The electrons around the nucleus form self-organizing spheres, as they seek the lowest energy by spacing themselves evenly across the surface. When multiple nuclei are interacting, it's neat to see how the charge density changes and the electrons pack themselves more densely on the sides of the sphere facing nearby nuclei.

I'm looking forward to modulating the charge of nuclei based on realtime audio FFT data.

NDS WiFi Programming with devkitARM — Part 4

Posted in NintendoDS, sensors, Uncategorized on February 25th, 2010 by Chaos Engineer

We've established the important parts of the PC application, the NdsInterfaceHost, and now comes time to actually delve into the DS programming aspects w/ devkitARM and dswifi. We will here lay out the design of the NintendoDS application, the NdsInterfaceClient. Many of the network and socket concepts we covered during Part 3 of this article will carry over into the DS code. I will try not to cover the same ground. Also note that it is entirely possible to use C++ and iostream type code with devkitARM. I chose to use standard C because it is less verbose, and for the DS platform, it just seems more appropriate to use less abstraction.

The NDS application exhibits much code that is ancillary to the main purpose of the project. Being a platform that is rarely developed for, there is only a small community focusing on NintendoDS development. I figure that any code that exploits the features of the DS or establishes a framework that can be extended and used for new purposes is a good contribution to the community. I won't cover this ancillary code, but be sure to download the project and examine the extra bits like the chat server that will let you telnet into your DS.

The basis of the NdsInterfaceClient is simple. On start, it displays some information about the DS it is running on, and then prompts the user to press B to get a list of wireless access points (APs). The code then calls Wifi_ScanMode(), and enters a loop that constantly polls the number of APs, collects information about each, and displays this information on the screen. The Wifi_ScanMode() call actually tells the dswifi library to poll and maintain an internal list of available APs. You can then query the length of this internal list using Wifi_GetNumAP(), and get specifics using Wifi_GetAPData(...), passing the index of a particular AP.

For the AP list, I reused some code written by Stephen Stair (the author of the dswifi library). I added some ANSI color for readability, and changed the format for AP data so it offered more info, particularly letting the user know if an AP was WPA protected (and thus unusable by dswifi). One of devkitPro's first feats was to implement a console on the DS. The state of this implementation today is quite elite. The console supports ANSI escape sequences, which allow coloring text and arbitrarily relocating the cursor position. One could create an ASCII game using the console and escape sequences alone. The chat server code shows how to use escape sequences to create a "window" in the console.

Once an AP is selected from the list, the code will determine if the user needs to enter an encryption key. If so, a keyboard appears, and allows the entry of an encryption key. The user can enter either a 40b or 104b encryption key. Based on the length of the entry, the code determines if it is 64 or 128-bit WEP encryption, and attempts a connection to the AP. This is one of the least documented basic steps for using dswifi, and on top of that, there are some curiosities worth mentioning.

WPA encryption is unsupported by the NDS and NDS Lite. Support for WPA was added to the DSi, but it isn't likely we'll see it supported by the dswifi library any time soon, if ever. We are limited to using WEP-secured APs. There are two levels of supported WEP encryption, 64-bit and 128-bit. Here is where a curiosity appears. Both 64b and 128b WEP encryption use a 24b Initialization Vector, which is not included in the length of the encryption key you enter. So the actual lengths of the entered encryption keys for 64b and 128b WEP encryption are 40b and 104b, respectively. It is fine to refer to these encryption levels by either bit length including the IV, or bit length excluding the IV. Strangely, the dswifi library does both, and defines the following two values: WEPMODE_40BIT and WEPMODE_128BIT. It should be obvious which is which.

So, with the keyboard displayed, a user can enter the ASCII representation of a 40b or 104b hex key, which is either a 10 or 26 character long string, respectively. We then need to convert this ASCII representation of a hex value into an actual 5 or 13 byte binary value. To accomplish this, we look at each character in the string and convert it from its ASCII value to its numeric value (0 to 15). We then pack two of these 4-bit values into a single byte (unsigned char), and append each byte onto our encryption key. Once we have 5 or 13 of these bytes, we pass them to the Wifi_ConnectAP(...) function, and wait for the connection to succeed or fail. This is accomplished by polling Wifi_AssocStatus(), waiting for it to return ASSOCSTATUS_ASSOCIATED, or ASSOCSTATUS_CANNOTCONNECT.
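The hex-string-to-bytes step can be sketched like this. The project itself is written in C; this C++ version is only an illustration, and the hexDigitValue and parseWepKey names are made up for it. It leaves out the dswifi calls entirely and only produces the byte array you would hand to the connect function.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Numeric value (0..15) of one hex digit, or -1 if the character is not hex.
int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Converts 10 or 26 hex characters into 5 or 13 key bytes.
// Returns false (and leaves key untouched) on a bad length or a non-hex character.
bool parseWepKey(const std::string& text, std::vector<std::uint8_t>& key)
{
    if (text.size() != 10 && text.size() != 26)
        return false;

    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i < text.size(); i += 2) {
        int hi = hexDigitValue(text[i]);     // upper 4 bits
        int lo = hexDigitValue(text[i + 1]); // lower 4 bits
        if (hi < 0 || lo < 0)
            return false;
        bytes.push_back(static_cast<std::uint8_t>((hi << 4) | lo));
    }

    key = bytes;
    return true;
}
```

The length of the resulting key (5 or 13 bytes) is then enough to decide between the 40-bit and 128-bit WEP modes.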

Once we successfully connect to a wireless access point, our DS will be assigned an IP address (we assume the use of DHCP). Now connected to the network, we can then start using functions like gethostbyname(...) to look up DNS entries, or connect(...) to make socket connections to remote hosts. Things are finally getting interesting. To provide a user interface to a variety of functions, we print a simple menu, with each DS button bound to a different function, and wait for a keypress:

 
if(status == ASSOCSTATUS_ASSOCIATED) while(1)
{
	u32 ip = Wifi_GetIP();
 
	consoleClear();
 
	iprintf("\n");
	iprintf("ip: [%i.%i.%i.%i]\n", (ip ) & 0xFF, (ip >> 8) & 0xFF, (ip >> 16) & 0xFF, (ip >> 24) & 0xFF);
	iprintf("--------------------------------\n");
	iprintf("Press Y to nslookup host name\n");
	iprintf("Press X to connect to interface\n");
	iprintf("Press A to listen on socket\n");
	iprintf("Press B to break\n\n");
 
	keypressed = 0;
	while(!(keypressed & (KEY_Y | KEY_X | KEY_A | KEY_B | KEY_L)))
	{
		scanKeys();
		keypressed = keysDown();
	}
 
	if(keypressed & KEY_Y)
	{
		//execute gethostbyname(...)
	}
 
	//etc...
}
 

Now the code governing the interaction with the Host interface is very similar to the code in the NdsInterfaceHost application. We synchronously send a register signal, and receive the signal port (and data port) back with the register acknowledgement signal from the Host interface. We then use this signal port to construct a listener, and this data port to pipe input data to the Host. This is the main loop of the NdsInterfaceClient:

 
while(1)
{
	tvTimeout.tv_sec = 0;
	tvTimeout.tv_usec = 1000; // This will throttle the data xfer rate just a tad...
	// I've found that anything less than 1000 microseconds will make select return immediately
	// when called on the DS platform
 
	FD_ZERO(&fdsRead);
	FD_SET(sktServer, &fdsRead);
 
	retval = select(sktServer + 1, &fdsRead, NULL, NULL, &tvTimeout); // nfds is the highest descriptor plus one
 
	if (retval == -1)
		iprintf("select() error\n");
	else if (retval)
	{
 
		if((sktClient = accept(sktServer, (struct sockaddr *)&adrSignalClient, &iClientSize)) < 0)
		{
			iprintf("Failed to accept client connection.\n");
			continue;
		}
		iprintf("Client connected: %s\n", inet_ntoa(adrSignalClient.sin_addr));
		InterfaceSignalHandler(sktClient);
	}
 
	// If there are no signals from the host, and pen is down, send input data on UDP port.
	if(keysHeld() & KEY_TOUCH)
	{
		touchRead(&touchXY);
		pos = 0x00000000;
		pos = (touchXY.px | ((int)(touchXY.py) << 16));
		iprintf("Sending 0x%x", pos);
		sendto(sktData, &pos, 4, 0, &adrData, sizeof(adrData));
	}
 
	scanKeys();
	if(keysDown() & KEY_B)
		break;
}
 
// Broke out of main loop, so unregister from Host and shutdown...
 

That is really the gist of the NdsInterfaceClient. Obviously check out the source code to see how to appropriately construct sockets. In my last article in this series, I'll provide links to the complete source code for both applications. I might take the time to make the NdsInterfaceHost compile on Windows as well as Linux and OSX, but no promises. I'd rather leave the standard Berkeley sockets implementation alone, and let you port it to winsock yourself... or even better let you install a REAL operating system, and run it there. =)
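The main loop above packs the touch coordinates into a single 32-bit value (x in the low 16 bits, y in the high 16 bits) before sending it. A small sketch of that packing, along with the matching unpack you would need on the Host side, follows. The TouchPoint struct and both function names are hypothetical, not taken from the project.

```cpp
#include <cstdint>

// Hypothetical pair of touchscreen pixel coordinates.
struct TouchPoint {
    std::uint16_t x;
    std::uint16_t y;
};

// x goes in the low 16 bits, y in the high 16 bits.
std::uint32_t packTouch(TouchPoint p)
{
    return static_cast<std::uint32_t>(p.x) | (static_cast<std::uint32_t>(p.y) << 16);
}

// Inverse of packTouch.
TouchPoint unpackTouch(std::uint32_t pos)
{
    TouchPoint p;
    p.x = static_cast<std::uint16_t>(pos & 0xFFFFu);
    p.y = static_cast<std::uint16_t>(pos >> 16);
    return p;
}
```

One thing to keep in mind: the loop sends the raw four bytes of the integer, so this only round-trips if both machines share the same byte order. The DS's ARM CPUs and x86 PCs are both little-endian, so it works there, but a big-endian host would need to swap the bytes first.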

NDS WiFi Programming with devkitARM — Part 3

Posted in NintendoDS, sockets on February 18th, 2010 by Chaos Engineer

Finally, after half a year... part 3.

I don't want to go any further without laying out some terminology to use. This project will consist of two parts, an application running on a PC, which I will refer to as the "Interface Host", and an application running on a NintendoDS, which I will refer to as the "Interface Client".

The host application as you may know will be rendering a cube using SDL and OpenGL. It will have a socket listener that will allow the NintendoDS to connect and negotiate terms by which it will send input data to the PC. I wanted to make the project simple but at the same time built with a framework that can be easily extended to do even more interesting things like pipe audio from the DS to the PC or something of the like. In order to do this, we'll define a simple protocol.

The protocol is quite simple. The Host machine will have a TCP listener that will allow Clients to connect and register. Once registered, a Client opens its own TCP listener that will allow the Host to send commands to it. These listeners are used for signaling, or sending commands back and forth. I will use the terms signal and command interchangeably. After establishing signal listeners on both ends, there is a channel by which communication between the applications is simple. They can use this channel to negotiate terms for a transfer of DS touchpad data, for example.

By having an array of agreed-upon values that correspond to specific signals, and expected data to follow those signals, the applications have the foundation of a protocol. There may be more to the protocol as well, such as an expected order of operations. Again with a focus on simplicity, let us outline a protocol:

[Figure: Nds Interface Protocol]

The idea here is that the machines first negotiate with one another, eventually establishing a method to transfer data from the client to the host. All negotiation occurs over TCP, but the actual data transfer will occur over UDP on an agreed-upon port. I won't go into details of TCP vs UDP. There are articles on the net written by people far more qualified than me if you are curious. The basic reasoning is that we need the negotiation to be reliable, but we can tolerate a missing data transfer. Although in theory a series of UDP packets sent over a network can arrive at the destination in any order, I have never seen this happen in practice. Perhaps because all my practice has been over a LAN... maybe things over a WAN would be different. Regardless... UDP works great in practice and provides the benefits of being simpler and carrying less overhead. UDP is generally used for realtime streaming applications where a lost transfer doesn't break the overall functionality of the system. This is exactly the situation for our input data.

The host machine opens a listener on a port the client knows to look for. The client connects to the host machine on this port, and sends the Register signal. The host responds on the same socket with the RegisterAck signal. This acknowledgement signal is followed by a 32-bit integer telling the client what port to open as its own signal listener. The host then knows how to communicate with its newly registered "node". Once the client has constructed its own signal listener, the machines can have any number of conversations initiated by either end. Each conversation will typically occur over the same socket connection, and consist of a signal followed by an acknowledgement signal w/ a data payload. A simple Beowulf cluster could be implemented using this design. Typically deployed over a non-homogeneous array of machines, a Beowulf host might send a signal to query each node's capability, and the node might respond with the time it required to crunch some numbers using its potentially unique hardware specifications.
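To make the RegisterAck exchange concrete, here is a rough sketch of how that message could be laid out on the wire. Everything specific in it is an assumption for illustration: the numeric signal values, the one-byte signal field, and the choice to send the 32-bit port in big-endian (network) byte order are not taken from the project.

```cpp
#include <array>
#include <cstdint>

// Hypothetical signal values; the real ones only need to be agreed on by both ends.
enum Signal : std::uint8_t {
    SignalRegister    = 1,
    SignalRegisterAck = 2
};

// One signal byte followed by a 32-bit port number.
using RegisterAckMessage = std::array<std::uint8_t, 5>;

// Host side: build the acknowledgement telling the client which port to listen on.
RegisterAckMessage encodeRegisterAck(std::uint32_t signalPort)
{
    RegisterAckMessage msg{};
    msg[0] = static_cast<std::uint8_t>(SignalRegisterAck);
    msg[1] = static_cast<std::uint8_t>((signalPort >> 24) & 0xFF); // most significant byte first
    msg[2] = static_cast<std::uint8_t>((signalPort >> 16) & 0xFF);
    msg[3] = static_cast<std::uint8_t>((signalPort >> 8) & 0xFF);
    msg[4] = static_cast<std::uint8_t>(signalPort & 0xFF);
    return msg;
}

// Client side: check the signal byte and recover the port.
bool decodeRegisterAck(const RegisterAckMessage& msg, std::uint32_t& signalPort)
{
    if (msg[0] != SignalRegisterAck)
        return false; // not the signal we were waiting for

    signalPort = (static_cast<std::uint32_t>(msg[1]) << 24) |
                 (static_cast<std::uint32_t>(msg[2]) << 16) |
                 (static_cast<std::uint32_t>(msg[3]) << 8) |
                 static_cast<std::uint32_t>(msg[4]);
    return true;
}
```

Writing the bytes out by hand like this keeps the layout independent of either machine's native byte order, which is handy when one end is a DS and the other could be anything.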

The first applications I wrote that implemented this signaling design used multiple threads. The main thread was the signal listener, and when it got incoming connections, it would accept them and then pass the connected socket to a newly spawned thread that would handle the signal from the client. At the time, I thought this was an elite implementation, and without a doubt the best way to accomplish things. I then got myself into situations where I wanted to construct a signal listener either in a project where I didn't want to implement threads or on a platform that doesn't support them. Enter the select() system call.

Multiplexing in a single thread, now THAT is elite. Using select(), you can easily monitor a listening socket for new connections while still handling signals from new sockets and even sending data every frame. The innermost loop of our Interface Client application on the DS will do exactly this. It will be managing the signal listener, handling any new signals and sending touchpad coordinates over the established data socket.

Here is what this synchronous code would look like inside an application's main loop, here being part of the Interface Host's main loop:

 
while(!g_bDone) // Pseudo main loop
{
	// Check listener for new signals. This is done in a time-sensitive manner
	//using select to return immediately if there is no data/connections present.
	tvTimeout.tv_sec = 0;
	tvTimeout.tv_usec = 0;
 
	FD_ZERO(&fdsRead);
	FD_SET(sktHost, &fdsRead);
 
	retval = select(FD_SETSIZE, &fdsRead, NULL, NULL, &tvTimeout);
 
	if (retval == -1)
	{
		cerr << __FILE__ << '@' << __LINE__ << " : select() error waiting on accept." << endl;
	}
	else if (retval)
	{
		cout << "Incoming connection..." << endl;
 
		if((sktClient = accept(sktHost, (struct sockaddr*)&adrInterfaceClient, (socklen_t*)&iClientSize)) < 0)
		{
			cerr << __FILE__ << '@' << __LINE__ << " : Failed to accept client connection." << endl;
		}
		else
		{
			strcpy(g_szConnectedClientName, inet_ntoa(adrInterfaceClient.sin_addr));
			cout << "Client connected: " << g_szConnectedClientName << endl;
 
			InterfaceClientSignalHandler(sktClient);
		}
	}
 
	// Do whatever else here... render objects, process input, etc.
}
 

The next article -- part 4 -- will detail the NintendoDS implementation using the dswifi library, and I'll probably write a 5th article just to wrap it all up.


Unforgotten

Posted in gaming, OSX on February 15th, 2010 by Chaos Engineer

I almost forgot I started a blog!

About six months ago I got a MacBook Pro, so I spent a long while just updating my applications to run on OSX, and also exploring Core Foundation. I've made some neat stuff that I'll share with you guys, as soon as I get out this NDS WiFi programming project that I've been promising for nearly a year. =P

Around the same time I got my new Mac, I also found Dragon Age: Origins. Installing this game created a time anomaly centered around my computer. I played enough DA:O that the heat generated by my video card has likely contributed greatly to global warming, and melted a few icebergs. The deaths of many penguins were worth the glory of turning back a blight.

After DA:O (like 6 characters and playthroughs later), I got wrapped up in the whole Bioware strategic RPG thing. I played through the entire Baldur's Gate series with all expansions, then the Neverwinter Nights series with all expansions, and I just finished Mass Effect. SOO... needless to say, I've been distracted.

I haven't forgotten about you however, and I will soon piece together the final bits of the NDS WiFi project, and make a post to bring it all together, I promise. =)


DERAILED — Platform independent dynamic linking

Posted in Uncategorized on August 9th, 2009 by Chaos Engineer

Ooops, I definitely got sidetracked mucking around some more with my platform independent rendering engine. Instead of holding out on the NDS wifi article in silence any longer, I thought I would chime in, let you know I haven't forgotten about you, and show you what I've been up to.

My platform independent rendering engine is based on a pure virtual renderer base class that is filled with an instance of a derived class by a dynamically linked code module. Along with the pure virtual renderer class, there are a few pure virtual resource classes as well, the actual instantiation of which is performed by the renderer class. Trying to maintain platform independence, the singleton (factory / manager) class that returns this renderer instance has some interesting code I thought I would share.

It is possible to have a project directory (even though a bit messy) that will compile in both Microsoft Visual Studio and in GCC without having to change a lick of code. That's child's play for a simple project, but it's still possible even for a project that contains a main executable and several other dynamic libraries.

So first off, since we are dynamically linking, we are definitely going to have function pointers returned by dlsym(...) or GetProcAddress(...). The actual DLL/SO modules export only three functions--one to create the instance, one to destroy it, and one to return the version. So we have the typedefs for these as so:

 
typedef int (*fpCreateRendererInterface)(CRenderer **pInterface);
typedef int (*fpDestroyRendererInterface)(CRenderer **pInterface);
typedef int (*fpQueryRendererVersion)(void);
 

Second off, this is platform independent, so obviously we will need to be using some fancy pre-processor conditionals (PPCs) to detect the platform / compiler. Now in certain instances we only want to know what the compiler is, because the actual platform doesn't matter, but in other cases we want to know the actual platform as well because the same compiler is used on multiple platforms (GCC is used on Linux and also on the Mac). First for the compiler, we can detect Visual C++ by checking to see if _MSC_VER is defined, and we can detect GCC by checking to see if __GNUC__ is defined. For the platform, we can detect Windows by checking if _WIN32 is defined (yes, even 64-bit Windows defines this), we can detect Linux by checking if __linux__ is defined, and we can detect OSX by checking if __MACH__ and __APPLE__ are defined.
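Pulled together, those checks might look something like the sketch below. The COMPILER_NAME and PLATFORM_NAME macros and the two helper functions are hypothetical names invented for this example. It tests _WIN32 (with the leading underscore), which is the spelling the compilers predefine themselves. Keep in mind that Clang also defines __GNUC__, so that branch really means "GCC or something compatible".

```cpp
// Compiler detection: check _MSC_VER first, then fall back to __GNUC__.
#if defined(_MSC_VER)
#define COMPILER_NAME "Visual C++"
#elif defined(__GNUC__)
#define COMPILER_NAME "GCC-compatible"
#else
#define COMPILER_NAME "unknown"
#endif

// Platform detection: independent of which compiler is in use.
#if defined(_WIN32)
#define PLATFORM_NAME "Windows"
#elif defined(__APPLE__) && defined(__MACH__)
#define PLATFORM_NAME "OSX"
#elif defined(__linux__)
#define PLATFORM_NAME "Linux"
#else
#define PLATFORM_NAME "unknown"
#endif

// Small accessors so the rest of the code never touches the raw macros.
const char* compilerName() { return COMPILER_NAME; }
const char* platformName() { return PLATFORM_NAME; }
```

Keeping the compiler check and the platform check as two separate chains is the point of the exercise: GCC on Windows (MinGW) and GCC on Linux land in the same compiler branch but in different platform branches.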

We can then use these PPCs to create code that is savagely flexible. In this singleton class, I have a static void pointer to the library (Windows' HMODULE is just a void*, don't let them trick you) called m_hLibrary. Let's see how we could dynamically link either a .so file in Linux or a .dll file in Windows and place the returned handle in the same variable... here is the code for the "SGL" (SDL GL) renderer type which can be used in either Linux or Windows, taking into account if the code is in release or debug mode:

 
case rstSGL:
	Log::Entry(-1,__FILE__,__LINE__,"Creating SGL rendering interface...");
#if defined(__GNUC__)
 
#if defined(NDEBUG)
	Log::Entry(-1,__FILE__,__LINE__,"__GNUC__ defined... opening release library");
	m_hLibrary = dlopen("./libRendererSGL.so",RTLD_NOW);
#else
	Log::Entry(-1,__FILE__,__LINE__,"__GNUC__ defined... opening debug library");
	m_hLibrary = dlopen("../RendererSGL/bin/Debug/libRendererSGL.so",RTLD_NOW);
#endif
	if(!m_hLibrary)
		Log::Entry(2,__FILE__,__LINE__,"Error loading shared library. dlerror() = %s",dlerror());
#elif defined(_MSC_VER)
#if defined(NDEBUG)
	m_hLibrary = LoadLibraryExA("RendererSGL.dll",NULL,0); // third argument is a DWORD flags value, not a pointer
#else
	m_hLibrary = LoadLibraryExA("../RendererSGL/bin/Debug/RendererSGL.dll",NULL,0);
#endif
#endif
	break;
 

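Please note that GCC does not define NDEBUG on its own, even when using -O2, so you need to set your release target to define it manually (-D NDEBUG). I found this strange at first because NDEBUG is a standard macro, but the standard only specifies what it does (it disables assert); defining it is left to the build.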

Also note that when linking using dlopen(...), we specify RTLD_NOW to force resolution of all symbols in the dynamic library at load time. If we used RTLD_LAZY, we could exclude a lot of code from our library, and let it resolve symbols later (i.e. from the main executable). This works great on Linux but seemingly has no analogue in Windows, so in order to keep the projects symmetrical across platforms, we use RTLD_NOW.

So let's look at what's going on here... it's a bit overcomplicated, but I thought I would provide some extra good ideas for you guys. If using GCC, we use dlopen(...) to get the handle to the .so file, and if using VC++, we use LoadLibraryExA(...) to get the handle to the .dll file. If NDEBUG is defined, we link to the release version of the dynamic library (actually look in the current directory or system path as if it were a proper deployment), otherwise we link to the debug version of the library to simplify development.

Oh snap! So we now have linked to a dynamic library regardless of what platform we are running on! This is kickass, so where to now? We need to get pointers to the functions in the library we are going to use. The function that does this on Linux is dlsym(...), and on Windows it is GetProcAddress(...). Here is what this would look like in a platform independent form:

 
#if defined(__GNUC__)
	fpCreateRendererInterface IntCreate=(fpCreateRendererInterface)dlsym(m_hLibrary,"CreateInterface");
#elif defined(_MSC_VER)
	fpCreateRendererInterface IntCreate=(fpCreateRendererInterface)GetProcAddress((HMODULE)m_hLibrary,"CreateInterface");
#endif
 

Damn, it's that easy? Indeed it is. We now have a function pointer to a procedure in a dynamically linked library regardless of whether it came from a .so file in Linux or a .dll file in Windows. Obviously once armed with the function pointer, the remaining code is the same regardless of platform. We just use IntCreate(...) as if it were a normal function. In this case we pass a pointer to a pointer to the pure virtual renderer base class, and inside the dynamic library, we assign the pointer to an instance of a derived class:

 
extern "C"
{
	int CreateInterface(CRenderer **pInterface)
	{
		if(*pInterface)
			return -1;
 
		*pInterface = new CRendererSGL();
 
		return 0;
	}
...
 

This is really quite powerful, and since the only actual call to an exported function is when creating the derived class instance, the performance hit while using the derived class is only from the vftable. More importantly this lets us do something ultra simple in our main code to get a reference to an abstract renderer interface that doesn't require knowledge of the nitty-gritty of either D3D or OpenGL:

 
	Log::Entry(0,__FILE__,__LINE__,"Creating rendering device...");
	Graphics::CreateInterface();
	CRenderer *renderer = Graphics::GetInterface();
	renderer->Initialize(0);
 
	if(g_bSafeDevice)
		renderer->CreateDeviceSafe();
	else
		renderer->CreateDevice(800, 600, 32, g_bWindowed);
 

Yeah, I know. That code is delicious. It's even more tasty when you have a CMesh that contains instances of pure virtual CResourceVtxBuff and CResourceIdxBuff created by the derived CRenderer class in the dynamic library, and CModel that contains a number of CMesh instances as well as a number of CMaterial instances that contain instances of pure virtual CResourceTexture also created by the derived CRenderer class in the dynamic library. So to create a model and render it, you would have to do something like the following:

 
	// example of loading a mesh
	mshOut = new CMesh();
	iStride=mshOut->GetVtxStride();
	mshOut->SetVtxCount(iVertexCount);
	mshOut->SetFaceCount(iFaceCount);
 
	m_pRenderer->CreateMeshBuffers(iStride*iVertexCount,sizeof(int)*3*iFaceCount,mshOut);
 
	// if pRenderer is D3D, this lock would be like IDirect3DVertexBuffer9->Lock(...),
	//while in OpenGL it would be like glMapBufferARB(...). Transparent at this point.
	VtxData=(unsigned char*)mshOut->GetVtxPtr()->LockWrite(0,0);
 
	// Write vertex buffer to VtxData here...
 
	mshOut->GetVtxPtr()->Unlock();
 
	IdxData=(unsigned char*)mshOut->GetIdxPtr()->LockWrite(0,0);
 
	// write index buffer to IdxData here..
 
	mshOut->GetIdxPtr()->Unlock();
 
	// example of loading a material
	mtlOut=new CMaterial();
	mtlOut->SetDiffuse(FloatToLongColor(v3fDiffuse.x, v3fDiffuse.y, v3fDiffuse.z));
	mtlOut->SetAmbient(FloatToLongColor(v3fAmbient.x, v3fAmbient.y, v3fAmbient.z));
	mtlOut->SetSpecular(FloatToLongColor(v3fSpecular.x, v3fSpecular.y, v3fSpecular.z));
	mtlOut->SetMaterialName(sMaterialName);
 
	mtlOut->SetTexture(m_pRenderer->CreateTexture(&imgTexture));
 
	// and you add them to a model which contains std::vectors of meshes and materials...
 
	iMaterialIdxList[j] = mdlOut->AddMaterial(mtlOut);
 
	//...
 
	mdlOut->AddMesh(mshOut,iMaterialIdxList[iMaterialRef]);
 
	// and eventually render!
 
	iMeshCount = mdlCurr->GetMeshCount();
 
	for(j=0;j<iMeshCount;j++)
	{
		pRenderer->SetWorld(&matWorld);
 
		k = mdlCurr->GetMeshMaterialIdx(j);
 
		pRenderer->SetTexture(mdlCurr->GetMaterial(k)->GetTexture());
		pRenderer->Render(mdlCurr->GetMesh(j));
	}
 

Anyway, I guess this isn't all that useful, and maybe I'm just showing off at this point. It does exhibit some good ideas, and shows how glorious abstraction can be, though. Regardless, I got distracted again. The NDS Wifi example continues. I will post in the next few days to describe the protocol used and give a little primer on sockets. Keep your eyes peeled.
