Procedural Voxel Destruction

C++
DirectX 11

About

Voxel art (or voxel-based meshes) is a style of 3D modeling that uses voxels (uniformly-sized cubes) to compose a 3D model. You've seen multiple examples of this already in this portfolio, but here are some more examples in case you're interested:

Video games will often opt for pre-calculated fracture meshes for objects that need to be able to be destroyed in-game, since generating them in real-time is expensive. But voxel meshes have some unique characteristics that make real-time procedural destruction viable, such as:

The fractured object shown on the left is the result of determining two random fracture points. As you can see, internal information is revealed by the fracture meshes (notice how the inside pages of the book are visible from the side). This is possible because the full voxel grid is stored - a feature that wouldn’t be possible with normal meshes without manually creating a high-resolution 3D texture of some kind.

The destruction algorithm has three major steps: Partition, Merge, and Mesh. The Partition step separates all voxels into groups to determine the content of the new fracture meshes. The Merge step occurs for each fracture group to combine nearby voxels that share the same color. The purpose of this step is to simplify the fracture meshes for optimal rendering. The Mesh step generates the fracture meshes from the merged bounds calculated previously.

Development

Octrees

My initial idea was to use an octree for merging and storing voxel information efficiently, since voxels are well-suited for them, being uniformly-sized cubes. While subdividing, the octree would check if all contained voxels are the same color. A subdivision where all voxels have the same color would stop subdividing and become a leaf node. This is simple to do because each voxel is simply three unsigned integers (x, y, z), identifying its place on the grid and allowing for simple for-loop iteration.

However, it turned out that this approach was both inefficient for storage and impossible to completely parallelize. Because the subdivisions of an octree are cubes, adjacent voxels that are only similar along one or two dimensions are not merged at all. Also, because each level of subdivided node requires its parent to be processed first, voxel merging cannot be parallelized and therefore cannot enjoy the benefits of SIMD processing.

In hindsight, I should have been able to predict these drawbacks and shouldn't have gone deep into the implementation details before confirming that it was the best option. I won't go into the weeds with this one, but here's the code if you're curious:

Full-Compute - X/Y/Z Merge

In order for this to be viable in real-time, we need a full-compute algorithm that utilizes SIMD parallelization at every step possible. In other words, we should keep the resources used by the algorithm as shader resources for the entire process, until the new mesh's vertex/index buffers are populated in CSVoxelMesh.

Instead of using octrees to merge similar voxels, a number of parallel loops are performed to merge voxels along the X-axis, then merge the result along the Y-axis, and then merge that result along the Z-axis. For example, if the voxel model is 12x8x4, an X-merge pass will consist of 8*4=32 compute threads, each of which iterate through 12 voxels and merge similar voxels as they go. This will be followed by a Y-merge pass on the result and then a Z-merge pass on the Y-merge's result. This approach succeeds where octrees fail, since one or two-dimensional strips of similar voxels are always merged, no matter which axis they align along.

Here's a high-level outline of the full-compute algorithm described:

These steps are initiated in the function below. The Partition and Merge steps are both done in PartitionGrid() (the fractureCount parameter is hard-set to 2 in this case), and the Mesh step is done upon construction of the VoxelRenderer component after the fracture is complete. The CSVoxelCollider step, which is not completely relevant here, is done upon construction of the VoxelCollider component.

void VoxelDestructable::Fracture()
{
std::unordered_map<IntVector, VoxelGridComponent*> fractureGroups = PartitionGrid(2);

Transform* transform = entity->GetComponent<Transform>();
for(auto& fractureGroup : fractureGroups)
{
Vector groupDir = voxelGrid->GridCoordsToNormalizedLocalSpace(fractureGroup.first);

Entity* fractureEntity = new Entity();
fractureEntity->AddComponent(new Transform(transform->GetLocation() + groupDir, transform->GetRotation(), transform->GetScale()));
fractureEntity->AddComponent(fractureGroup.second);
fractureEntity->AddComponent(new VoxelRenderer(fractureGroup.second, ResourceManager::GetMaterial(L"Palette")));
fractureEntity->AddComponent(new VoxelCollider(fractureGroup.second));
fractureEntity->AddComponent(new Rigidbody(Vector(0, -9.81f, 0)));
fractureEntity->AddComponent(new VoxelDestructable(voxelGrid));
EntityManager::AddEntity(fractureEntity);

fractureEntity->GetComponent<Rigidbody>()->SetVelocity(groupDir);
}

EntityManager::DestroyEntity(entity);
}

Here's a more detailed look at the PartitionGrid() function:

std::unordered_map<IntVector, VoxelGridComponent*> VoxelDestructable::PartitionGrid(size_t fractureCount)
{
VoxelGridComponent* voxelGridComponent = entity->GetComponent<VoxelGridComponent>();

/* Generate random grid points and initialize point groups */

std::vector<IntVector> points;
for(size_t i = 0; i < fractureCount; i++)
points.push_back(IntVector::Rand(IntVector::Zero(), voxelGridComponent->GetDimensions()));

std::unordered_map<IntVector, VoxelGridComponent*> pointGroups;
for(IntVector point : points)
pointGroups.emplace(point,
new VoxelGridComponent(voxelGridComponent->width, voxelGridComponent->height, voxelGridComponent->depth));

// Partition this grid into point groups
CS_VoxelPartition(voxelGridComponent, pointGroups);

// Merge the voxels of each of the partitioned grids
for(auto& pointGroup : pointGroups)
{
VoxelGridComponent* voxelGridComponent = pointGroup.second;

VoxelGridComponent::CS_VoxelMerge(
voxelGridComponent->width, voxelGridComponent->height, voxelGridComponent->depth,
voxelGridComponent->boundsTextureUAV,
voxelGridComponent->colorBoundsBufferUAV,
voxelGridComponent->nullBoundsBufferUAV,
voxelGridComponent->colorBoundsCounterBuffer, voxelGridComponent->colorBoundsCount,
voxelGridComponent->nullBoundsCounterBuffer);
}

return pointGroups;
}

CS_VoxelPartition() is one of the functions that run a compute shader using DX11 API calls. All of these functions are mostly the same, so here's an example from Partition:

void VoxelDestructable::CS_VoxelPartition(VoxelGridComponent* voxelGridComponent, std::unordered_map<IntVector, VoxelGridComponent*> pointGroups)
{
Microsoft::WRL::ComPtr<ID3D11ComputeShader> computeShader = ResourceManager::GetComputeShader(L"CS_VoxelPartition");
Microsoft::WRL::ComPtr<ID3D11Buffer> constantBuffer = ResourceManager::GetConstantBuffer(L"CS_VoxelPartition");

/* UAVs for the output textures are managed by their respective VoxelGridComponents */

std::vector<IntVector> inputPoints;
std::vector<ID3D11UnorderedAccessView*> outputUAVs;
for(auto& pointGroup : pointGroups)
{
inputPoints.push_back(pointGroup.first);
outputUAVs.push_back(pointGroup.second->GetBoundsTextureUAV().Get());
}
// Pad the rest of the vector with null UAVs
while(outputUAVs.size() < MAX_PARTITIONS)
outputUAVs.push_back(nullptr);

Graphics::Context->CSSetShader(computeShader.Get(), nullptr, 0);

Graphics::Context->CSSetShaderResources(0, 1, voxelGridComponent->boundsTextureSRV.GetAddressOf());

Graphics::Context->CSSetUnorderedAccessViews(0, MAX_PARTITIONS, outputUAVs.data(), nullptr);

Graphics::Context->CSSetConstantBuffers(0, 1, constantBuffer.GetAddressOf());

/* Dispatch */
{
VoxelPartitionConstants constants = {};
for(size_t i = 0; i < inputPoints.size(); i++)
constants.points[i] = DirectX::XMUINT4(inputPoints[i].x, inputPoints[i].y, inputPoints[i].z, inputPoints[i].w);
constants.pointCount = inputPoints.size();

D3D11_MAPPED_SUBRESOURCE cbMapped = {};
Graphics::Context->Map(constantBuffer.Get(), 0, D3D11_MAP_WRITE_DISCARD, 0, &cbMapped);
memcpy(cbMapped.pData, &constants, sizeof(VoxelPartitionConstants));
Graphics::Context->Unmap(constantBuffer.Get(), 0);

Graphics::Context->Dispatch(voxelGridComponent->width, voxelGridComponent->height, voxelGridComponent->depth);
}

// Unbind
ID3D11ShaderResourceView* nullSRVs[16] = {};
Graphics::Context->CSSetShaderResources(0, 16, nullSRVs);
ID3D11UnorderedAccessView* nullUAVs[MAX_PARTITIONS] = {};
Graphics::Context->CSSetUnorderedAccessViews(0, MAX_PARTITIONS, nullUAVs, nullptr);
}

Future Work

Although I've achieved what I originally aimed for and plan to migrate this to Vulkan for my game engine (see this page for more info), there is more that can be done in terms of optimization. Specifically, collision interactions between the fractured meshes are still quite inaccurate and expensive. Even after adding a CSVoxelCollider stage, the approximated collision bounds look pretty awkward, and when there are many meshes close to each other (which is often the case immediately after fracturing), there's a noticable performance hit. I've read that this is a common problem in engines and is usually solved by adding a short period immediately after destruction where collision detection is disabled.

Demos

The model is broken into two meshes using random fracture points. Notice how the fracture occurs instantly, without any lag spikes.

Another fracture pattern. Notice how the surfaces of the new meshes account for all of the newly-visible surface voxels.

The fracture meshes are treated the same as the original, so they can be fractured as well. However, having too many fracture meshes can cause lag due to a suboptimal physics system and the frequency of collisions between the pieces.

Code

Compute Shaders

All of the main logic discussed here is found in the compute shaders; each step uses one compute shader and passes the resulting information to the next step via their shared shader resources on the GPU.

Copyright © Kevin Mapstone 2025