Omnipotent Octree Implementation
Posted in C++, Uncategorized on May 10th, 2012 by Chaos EngineerAlright. I know when to admit defeat, and this is no exception. I was trying to create an IDEAL octree implemetatiion that did not sacrifice memory usage for efficiency. I'm not afraid to admit that I failed. In failure, one can find inspiration for future endeavors, and here I present a post i had hoped would end in succcess. In realizing my failure I learneed much, so I post it here as a cautionary tail. is there an omnipotent octree implemenetation? Based on my studies it would seem not. What Follows is an incomplete post that I had hoped would result in an epiphany about octree structures, but ultimately failed:
Well, if you have read my last two posts regarding my quest for a lean, versatile and efficient octree implementation (with a focus on the node class design, while ignoring glaring problems with the downcasting)... you are probably having a chortle fest right now. Well, I'm not one to give up easily, and this is no exception. I very much like the structure of the implementation, but unfortunately the impact virtual inheritance would have on performance is a game breaker. Readable, extensible, maintainable code is important indeed, but when talking about an octree implementation... performance is key. The entire point of the octree is to increase performance, the organizational aspects of the tree are just a side-effect. So how can I create a polymorphic data type that won't slow down the implementation? Its all up to the compiler, so lets see what GCC can tell us.
To break down the details of the struct hierarchy, we can use the g++ switch -fdump-class-hierarchy, this will dump class hierarchy representations and vtables to disk. Neat, this should be interesting... to get a good baseline, lets look at a simple non-virtual ineheritance situation where I can still get the required data for a derived type. Please note, I will be using the words "struct" and "class" interchangeably. In C++, there is little difference between the two objects, save members in a struct are considered public by default. Personally, I use structs when the object has no members, and contains data only. Its a good differentiation. Anyway, for an example, I rewrote the structs for a root octree node including spatial data as follows :
struct OctreeNodeData_NonVirtual { unsigned short uiLevel; }; struct OctreeRootData_NonVirtual { COctreeNode **nodChildren; }; struct OctreeSpatialData_NonVirtual { CVector3 vCenter; float fHalfExtents; }; struct OctreeSpatialRootData_NonVirtual : OctreeNodeData_NonVirtual, OctreeRootData_NonVirtual, OctreeSpatialData_NonVirtual{};
Lets look at what g++ dumps regarding this hierarchy when using g++ -g -fdump-class-hierarchy :
Class OctreeNodeData_NonVirtual size=2 align=2 base size=2 base align=2 OctreeNodeData_NonVirtual (0x7f2f28058300) 0 Class OctreeRootData_NonVirtual size=8 align=8 base size=8 base align=8 OctreeRootData_NonVirtual (0x7f2f28058360) 0 Class OctreeSpatialData_NonVirtual size=16 align=4 base size=16 base align=4 OctreeSpatialData_NonVirtual (0x7f2f280583c0) 0 Class OctreeSpatialRootData_NonVirtual size=32 align=8 base size=32 base align=8 OctreeSpatialRootData_NonVirtual (0x7f2f28067d98) 0 OctreeNodeData_NonVirtual (0x7f2f28058420) 0 OctreeRootData_NonVirtual (0x7f2f28058480) 8 OctreeSpatialData_NonVirtual (0x7f2f280584e0) 16
Alright, so we can see even when using multiple inheritance the data is neatly and compactly stored in the derived class. The derived struct is merely composed of all the data from inherited classes placed sequentially in memory. No vtable is involved. Accessing the members of this derived struct would be fast, and casting would not be complicated. Aright, so lets look at what g++ dumps regarding the previous hierarchy design that uses a virtual base class:
Vtable for OctreeNodeData
OctreeNodeData::_ZTV14OctreeNodeData: 4u entries
0 (int (*)(...))0
8 (int (*)(...))(& _ZTI14OctreeNodeData)
16 OctreeNodeData::~OctreeNodeData
24 OctreeNodeData::~OctreeNodeData
Class OctreeNodeData
size=16 align=8
base size=10 base align=8
OctreeNodeData (0x7f2f28058540) 0
vptr=((& OctreeNodeData::_ZTV14OctreeNodeData) + 16u)
Vtable for OctreeRootData
OctreeRootData::_ZTV14OctreeRootData: 10u entries
0 16u
8 (int (*)(...))0
16 (int (*)(...))(& _ZTI14OctreeRootData)
24 OctreeRootData::~OctreeRootData
32 OctreeRootData::~OctreeRootData
40 -16u
48 (int (*)(...))-0x00000000000000010
56 (int (*)(...))(& _ZTI14OctreeRootData)
64 OctreeRootData::_ZTv0_n24_N14OctreeRootDataD1Ev
72 OctreeRootData::_ZTv0_n24_N14OctreeRootDataD0Ev
VTT for OctreeRootData
OctreeRootData::_ZTT14OctreeRootData: 2u entries
0 ((& OctreeRootData::_ZTV14OctreeRootData) + 24u)
8 ((& OctreeRootData::_ZTV14OctreeRootData) + 64u)
Class OctreeRootData
size=32 align=8
base size=16 base align=8
OctreeRootData (0x7f2f27e7f680) 0
vptridx=0u vptr=((& OctreeRootData::_ZTV14OctreeRootData) + 24u)
OctreeNodeData (0x7f2f280585a0) 16 virtual
vptridx=8u vbaseoffset=-0x00000000000000018 vptr=((& OctreeRootData::_ZTV14OctreeRootData) + 64u)
Vtable for OctreeSpatialData
OctreeSpatialData::_ZTV17OctreeSpatialData: 10u entries
0 24u
8 (int (*)(...))0
16 (int (*)(...))(& _ZTI17OctreeSpatialData)
24 OctreeSpatialData::~OctreeSpatialData
32 OctreeSpatialData::~OctreeSpatialData
40 -24u
48 (int (*)(...))-0x00000000000000018
56 (int (*)(...))(& _ZTI17OctreeSpatialData)
64 OctreeSpatialData::_ZTv0_n24_N17OctreeSpatialDataD1Ev
72 OctreeSpatialData::_ZTv0_n24_N17OctreeSpatialDataD0Ev
VTT for OctreeSpatialData
OctreeSpatialData::_ZTT17OctreeSpatialData: 2u entries
0 ((& OctreeSpatialData::_ZTV17OctreeSpatialData) + 24u)
8 ((& OctreeSpatialData::_ZTV17OctreeSpatialData) + 64u)
Class OctreeSpatialData
size=40 align=8
base size=24 base align=8
OctreeSpatialData (0x7f2f27e7f958) 0
vptridx=0u vptr=((& OctreeSpatialData::_ZTV17OctreeSpatialData) + 24u)
OctreeNodeData (0x7f2f280586c0) 24 virtual
vptridx=8u vbaseoffset=-0x00000000000000018 vptr=((& OctreeSpatialData::_ZTV17OctreeSpatialData) + 64u)
Vtable for OctreeSpatialRootData
OctreeSpatialRootData::_ZTV21OctreeSpatialRootData: 15u entries
0 40u
8 (int (*)(...))0
16 (int (*)(...))(& _ZTI21OctreeSpatialRootData)
24 OctreeSpatialRootData::~OctreeSpatialRootData
32 OctreeSpatialRootData::~OctreeSpatialRootData
40 24u
48 (int (*)(...))-0x00000000000000010
56 (int (*)(...))(& _ZTI21OctreeSpatialRootData)
64 OctreeSpatialRootData::_ZThn16_N21OctreeSpatialRootDataD1Ev
72 OctreeSpatialRootData::_ZThn16_N21OctreeSpatialRootDataD0Ev
80 -40u
88 (int (*)(...))-0x00000000000000028
96 (int (*)(...))(& _ZTI21OctreeSpatialRootData)
104 OctreeSpatialRootData::_ZTv0_n24_N21OctreeSpatialRootDataD1Ev
112 OctreeSpatialRootData::_ZTv0_n24_N21OctreeSpatialRootDataD0Ev
Construction vtable for OctreeRootData (0x7f2f27e7fa28 instance) in OctreeSpatialRootData
OctreeSpatialRootData::_ZTC21OctreeSpatialRootData0_14OctreeRootData: 10u entries
0 40u
8 (int (*)(...))0
16 (int (*)(...))(& _ZTI14OctreeRootData)
24 OctreeRootData::~OctreeRootData
32 OctreeRootData::~OctreeRootData
40 -40u
48 (int (*)(...))-0x00000000000000028
56 (int (*)(...))(& _ZTI14OctreeRootData)
64 OctreeRootData::_ZTv0_n24_N14OctreeRootDataD1Ev
72 OctreeRootData::_ZTv0_n24_N14OctreeRootDataD0Ev
Construction vtable for OctreeSpatialData (0x7f2f27e7fa90 instance) in OctreeSpatialRootData
OctreeSpatialRootData::_ZTC21OctreeSpatialRootData16_17OctreeSpatialData: 10u entries
0 24u
8 (int (*)(...))0
16 (int (*)(...))(& _ZTI17OctreeSpatialData)
24 OctreeSpatialData::~OctreeSpatialData
32 OctreeSpatialData::~OctreeSpatialData
40 -24u
48 (int (*)(...))-0x00000000000000018
56 (int (*)(...))(& _ZTI17OctreeSpatialData)
64 OctreeSpatialData::_ZTv0_n24_N17OctreeSpatialDataD1Ev
72 OctreeSpatialData::_ZTv0_n24_N17OctreeSpatialDataD0Ev
VTT for OctreeSpatialRootData
OctreeSpatialRootData::_ZTT21OctreeSpatialRootData: 7u entries
0 ((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 24u)
8 ((& OctreeSpatialRootData::_ZTC21OctreeSpatialRootData0_14OctreeRootData) + 24u)
16 ((& OctreeSpatialRootData::_ZTC21OctreeSpatialRootData0_14OctreeRootData) + 64u)
24 ((& OctreeSpatialRootData::_ZTC21OctreeSpatialRootData16_17OctreeSpatialData) + 24u)
32 ((& OctreeSpatialRootData::_ZTC21OctreeSpatialRootData16_17OctreeSpatialData) + 64u)
40 ((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 104u)
48 ((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 64u)
Class OctreeSpatialRootData
size=56 align=8
base size=40 base align=8
OctreeSpatialRootData (0x7f2f27e8c380) 0
vptridx=0u vptr=((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 24u)
OctreeRootData (0x7f2f27e7fa28) 0
primary-for OctreeSpatialRootData (0x7f2f27e8c380)
subvttidx=8u
OctreeNodeData (0x7f2f28058720) 40 virtual
vptridx=40u vbaseoffset=-0x00000000000000018 vptr=((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 104u)
OctreeSpatialData (0x7f2f27e7fa90) 16
subvttidx=24u vptridx=48u vptr=((& OctreeSpatialRootData::_ZTV21OctreeSpatialRootData) + 64u)
OctreeNodeData (0x7f2f28058720) alternative-path
O_O
...
Well... .... one thing is obvious. Virtual inheritance VASTLY complicates the interpretation of an otherwise simple data layout. Also pretty obvious is how badly it bloats data types by including all the vtable information in them -- the OctreeSpatialRootData type grew to 56B compared to 32B in the OctreeSpatialRootData_NonVirtual type. Seeing the size of a data type nearly double just to store information to support the virtual inheritance makes it pretty apparent that not carefully planning a class structure and falling back on complex language features isn't often a good idea. So how can I create a polymorphic data type can can be interpreted as any of it's constituent parts? Is it as simple as creating the derived classes with the longest and most granular inheritance lists? Oh snap, lets give it a whirl! So for example, if we define the node data types as follows :
struct OctreeNodeData { unsigned short uiLevel; }; struct OctreeParentData { COctreeNode **nodChildren; }; struct OctreeChildData { COctreeNode *nodParent; }; struct OctreeRootData : OctreeNodeData, OctreeParentData{}; struct OctreeLeafData : OctreeNodeData, OctreeChildData{}; struct OctreeBranchData : OctreeNodeData, OctreeParentData, OctreeChildData{}; struct OctreeSpatialData { CVector3<float> vCenter; float fHalfExtents; }; struct OctreeSpatialRootData : OctreeRootData, OctreeSpatialData{}; struct OctreeSpatialBranchData : OctreeBranchData, OctreeSpatialData{}; struct OctreeSpatialLeafData : OctreeLeafData, OctreeSpatialData{};
This enables us to maintain the use of OctreeNodeData as a base class, and lets us execute code like :
OctreeSpatialRootData *pData = new OctreeSpatialRootData(); pData->vCenter = CVector3<float>(0.4f, 0.8f, 1.3f); //This is kosher OctreeNodeData *pBase = (OctreeNodeData*)pData; pBase->uiLevel = 2; //This is sketchy OctreeSpatialData *pDataCast = (OctreeSpatialData*)pBase;
WOW! AWESOME, RIGHT??!?!??!!!one?!? NO, DAMN WRONG! Lol, I've been ignoring a glaring problem with this entire implementation, specifically the downcast process. What do you think the value of pDataCast->vCenter would be? I'll tell you: the first 12 bytes at offset 0 in the original OctreeSpatialRootData struct. So you'd be reading uiLevel, nodChildren etc, and interpreting it as the 3 float vector you were looking for. Ugly. There is absolutely Unfortunately C++ gives us the freedom to do this when using C style casts. If we were smart and used static_cast
Are we smart though? Nah, in fact we are so stupid we are going to further complicate things. What would happen if we stuck with the normal C style cast, and provided an appropriate offset to the cast? Indeed, if you did something like (OctreeSpatialData*)((unsigned char*)pBase + 16);, you would get a pointer of type OctreeSpatialData* that actually pointed to the correct location within the original OctreeSpatialRootData structure, and you could do as you please with the pointers. So lets think about how we can somehow generate the required offset into a composite data type that is defined by its m_uiType bitmask given a particular subtype in the composite subtype.
Obviously, we can immediately tell if the requested subtype exists in the composite type found in the node instance by using bitwise operations between the node's m_uiType bitmask and the requested subtype's corresponding type bit. It also should be obvious though that the accessor for this subtype shouldn't even be called if the composite type doesn't contain the required type. We can save some branching by ignoring the possibility.
An immediate observation is also that to offset the cast based on the byte size of each constituent subtype struct in the composite data, the pointer has to point to data that is 1B in size. Due to the way pointer arithmetic works, adding 1 to a pointer shifts it in memory based on the size of the type, so adding one to a float* pointer would make the pointer point 32 bits (4B) ahead because this is the size of one float element (pointer arithmetic was designed to work on arrays of data). Sooo... in order to make our offsetting simple and eliminating the need for a cast, it is quite possible to redefine the base OctreeNodeData type to span only 1B. The only data this struct contains is the level of the node, and I should think that a 256 level would suffice for ANY purpose. Just for fun, if you recusively split a 256 level octree, you would end up with 1.55×10²³¹ nodes. If each node took 1ms to create, it would take 3.57×10²¹¹ times longer than the age of the universe (a scant 4.336×10¹? seconds) to create the entire tree. Sound like a big enough tree? Lets continue.
So we define the base data type as:
struct OctreeNodeData { unsigned char uiLevel; };
We can continue to use uiLevel in bitwise operations described by Frisken and Perry. It is always used as the right operand in bit shifts, it never gets shifted itself. So with that done all we need to do now is generate the offset... I actually had to stop and think about this a while... the offset needs to be generated very fast but depends on the subtypes included in a specific composite type. Iterating though the bits set in the node's m_uiType bitmask would allow us to sum the appropriate offset together pretty quick given the size of the component associated with each bit... but iteration is expensive. What if we multiplied the size of the subtype by a 0 or 1 based on if it was contained in the composite type? This would look something like:
So at the cost of some subtle issues, we haven't sacrificed speed or memory usage. In fact by moving away from virtual inheritance we have saved a huge amount of memory bloat, and that's just icing on the cake of vastly improved efficiency for casts and member access. As long as new derived data types are carefully laid out, there are only minor issues expanding or conjoining them with existing data. It is important to note if you for example tried to join the existing spatial node data with the existing location code note data, maybe like this:
You can easily create a data structure This is what the new struct hierarchy looks like to GCC :
Class COctreeNode
size=16 align=8
base size=10 base align=8
COctreeNode (0x7f66d3ba1240) 0
Class COctree
size=8 align=8
base size=8 base align=8
COctree (0x7f66d3ba12a0) 0
Class OctreeNodeData
size=2 align=2
base size=2 base align=2
OctreeNodeData (0x7f66d3ba1300) 0
Class OctreeParentData
size=8 align=8
base size=8 base align=8
OctreeParentData (0x7f66d3ba1360) 0
Class OctreeChildData
size=8 align=8
base size=8 base align=8
OctreeChildData (0x7f66d3ba13c0) 0
Class OctreeRootData
size=16 align=8
base size=16 base align=8
OctreeRootData (0x7f66d39ce690) 0
OctreeNodeData (0x7f66d3ba1420) 0
OctreeParentData (0x7f66d3ba1480) 8
Class OctreeLeafData
size=16 align=8
base size=16 base align=8
OctreeLeafData (0x7f66d39ce700) 0
OctreeNodeData (0x7f66d3ba14e0) 0
OctreeChildData (0x7f66d3ba1540) 8
Class OctreeBranchData
size=24 align=8
base size=24 base align=8
OctreeBranchData (0x7f66d39d81e0) 0
OctreeNodeData (0x7f66d3ba15a0) 0
OctreeParentData (0x7f66d3ba1600) 8
OctreeChildData (0x7f66d3ba1660) 16
Class OctreeSpatialData
size=16 align=4
base size=16 base align=4
OctreeSpatialData (0x7f66d3ba16c0) 0
Class OctreeSpatialRootData
size=32 align=8
base size=32 base align=8
OctreeSpatialRootData (0x7f66d39ce770) 0
OctreeRootData (0x7f66d39ce7e0) 0
OctreeNodeData (0x7f66d3ba1720) 0
OctreeParentData (0x7f66d3ba1780) 8
OctreeSpatialData (0x7f66d3ba17e0) 16
Class OctreeSpatialBranchData
size=40 align=8
base size=40 base align=8
OctreeSpatialBranchData (0x7f66d39ce850) 0
OctreeBranchData (0x7f66d39d83c0) 0
OctreeNodeData (0x7f66d3ba1840) 0
OctreeParentData (0x7f66d3ba18a0) 8
OctreeChildData (0x7f66d3ba1900) 16
OctreeSpatialData (0x7f66d3ba1960) 24
Class OctreeSpatialLeafData
size=32 align=8
base size=32 base align=8
OctreeSpatialLeafData (0x7f66d39ce8c0) 0
OctreeLeafData (0x7f66d39ce930) 0
OctreeNodeData (0x7f66d3ba19c0) 0
OctreeChildData (0x7f66d3ba1a20) 8
OctreeSpatialData (0x7f66d3ba1a80) 16
So in conclusion, there you have it. An all-powerful octree implementation. Not a bit of unnecessary storage as long as the nodes are constructed appropriately, not a bit of constraints that would stop us from creating further derived classes
