3DWorld: Instancing 3D Models in Tiled Terrain

My grand goal for 3DWorld is to allow an entire city of many square miles to be explored seamlessly. No loading screens, no delays, a huge view distance, shadows, indirect lighting, collision detection, etc. I started with the terrain in tiled terrain mode. Then I added water, trees, grass, flowers, plants, and rocks. That's good for natural scenery, but what about buildings? I haven't implemented procedural generation of building yet. In the meantime, I'll stick with importing 3D models in OBJ and 3DS formats.

Over the past few weeks, I've implemented a model instancing system that has allowed me to place a 2D array of - not just a few - but 10 thousand building models into a scene. I'm using the museum model from this post, which contains about 1.5M triangles across all materials. While only 100 or so museums are visible at any viewing location (limited by fog/distance), this is still a huge amount of data. 3DWorld's instancing system also includes sun/moon shadows, indirect sky lighting, and collision detection for the player. I'll discuss these features in more detail below.

These museum models are all placed on the mesh at the proper height/Z value using the terrain height values, in this case from a heightmap source texture. The mesh can be flattened under the buildings to make them level and remove any gaps. Buildings that are far away are obscured by fog and are not rendered.

Rendering / Level of Detail

Each museum model has nearly 1.5M triangles, so rendering all 10K of them would require 15 billion triangles. Clearly, that's no good for realtime. I needed to cut that down by three orders of magnitude (1000x) to something closer to 15M triangles. The obvious first step is to skip drawing of models that are far away. I already do distance culling for terrain, trees, etc. - that's what the distant fog is for. Also, View Frustum Culling (VFC) can be used to skip drawing models that aren't in the camera's view, for example buildings behind the camera. If I use the same visibility distance for buildings, and add VFC, this brings the number of models to draw down to only a hundred or so. Here is a screenshot of them, taken from a view direction near where the corner of the array starts. I believe there are about 120 buildings visible.

Large 2D arrays of museum models in various orientations, with shadows and indirect lighting. Around 120 model instances are visible in this view.

Okay, that's 120 * 1.5M = 180M triangles. If I use 3DWorld to brute force draw these, it runs at around 12 Frames Per Second (FPS). Interactive, but not realtime. Now, the buildings have a lot of small objects inside them, and these objects can hardly be seen when the player camera is outside the building such as in the screenshot above. Can you actually see any dinosaur bones in the nearby buildings? Disabling these small objects when the player is a few hundred meters away from a building helps somewhat, and the frame rate increases to 19 FPS. This is definitely helpful, but doesn't quite reach my goal.

Why doesn't this help more? Well, the problem is all that brick texture you see on the buildings. Over half the total triangles are brick material, and it's all one large object with randomly distributed triangle sizes. Most of what you see are the large outer wall polygons. What you don't see from outside are all the tiny triangles from the interior columns, stairs, railings, walkways, etc. Here is an interior screenshot showing indirect lighting and shadows (more on those topics later). The model looks much more complex on the inside than on the outside. Look at all those bricks!

Interior of museum showing indirect lighting. Adjacent museum models are visible though the windows.

I decided to bin the triangles by area in a power-of-two histogram, using up to 10 bins per material. Each bin contains triangles that are on average twice the surface area of the triangles in the previous bin. If only the first bin is drawn, this represents the largest 2% to 5% of triangles, which together account for 50% or so of the total surface area of that material. The min/max/average area values are stored for each material, along with the offsets for where each bin starts. The maximum visible bin can be determined based on projected pixel area, which varies as the square of the distance from the camera to the closest point on the model's bounding cube. The further the object is from the camera, the fewer bins need to be drawn.

If the player is inside a model, the distance is zero and all triangle bins are drawn. If the player is far from the model, only the first few bins are drawn, drastically reducing the number of triangles sent to the GPU. The largest count bins happen to be the ones with the smallest triangles, so even dropping the last bin or two can reduce triangle count by a factor of 2. This yields an overall 3-4x speedup, increasing frame rate from 19 FPS to 63 FPS for the view in the first image above. I think I was lucky with this model, because the outer walls don't have any small triangles in them that would produce holes when they're removed. The image with LOD turned on looks almost exactly like the image with LOD turned off. So much similar that I'm not even going to show both screenshots.

The final rendering performance improvement is dynamic occlusion culling. I manually added occlusion cubes to the museum model that include the large rectangular walls. Yes, these walls have some windows in them, so the occlusion isn't entirely correct. I was able to exclude the large windows in the roof though. This makes such a big difference in performance that I enabled occlusion culling anyway, even if it's not entirely correct. Each frame, 3DWorld collects a list of the 5 closest visible models to use as occluders. All of the models are checked to see if they're completely enclosed in the projected volume of any of the occluders from the point of view of the camera. If so, drawing of that model is skipped. This optimization has the most effect when the player is at ground level in the middle of the buildings, where the row of nearby museums forms a wall that obscures the other rows of museums behind it. In this case, only a handful of museums are drawn, and frame rate is increased from 60 FPS to 150-250 FPS.

Here is a video showing showing the array of museum models from various view points, both from the air and from the ground. There are almost no visible LOD artifacts while in the air. There are some artifacts due to occlusion culling when entering and exiting buildings, where the player crosses through a building's occlusion cube. Occluders are disabled when the player is inside them. I'll see if I can fix that somehow later.

This system works pretty well. I'm getting a good trade-off of performance and visual quality. But, I'm still lacking variety. I can't have a scene with the same one building placed over and over again. It's difficult to find high quality 3D building/architecture models for use in 3DWorld, and I don't have the time, tools, or experience to create them myself. Many of the free model files I can find online are poor quality, have missing meshes or textures, only represent one part/side of a building, are in a format that 3DWorld can't import (not OBJ or 3DS format), or have import errors/invalid data. I'll have to invest more time in searching for suitable model files in the future if I want to create a realistic city. However, the low-level rendering technology may be close to completion. That is, assuming there's enough memory for storing shadow maps and indirect lighting for each model. On to those topics next.

Shadows

I enabled standard shadow maps with a 9x9 tap percentage closer filter for 3D models in tiled terrain mode. Shadow map size is defined in the config file and currently set to 1024x1024 pixels. Models cast shadows on the terrain, trees, plants, grass, and water. They're rendered into the individual shadow maps of each terrain tile and cached for reuse across multiple frames. This is no different from how mesh tile and tree shadows work.

Models also cast shadows on themselves. Shadows from directional light sources only depend on light direction, so the shadow maps can be reused in all translated instances of a model that have the same orientation. My arrays of museum models use three different orientations (0, 90, and 180 degree rotations), so three shadow maps are needed.

These shadow maps only need to be regenerated when the light source (sun or moon) moves. They're shared, so updating them only requires rendering one museum model in a depth only pass for each orientation, which is quite cheap. This means that the light sources can move in realtime with only a small reduction in frame rate - for self shadows, anyway. Updating all of the tile shadows can be more expensive, especially for low light positions during sunrise and sunset. This is because the shadow of a single model can stretch far across the landscape, which requires drawing many models into the shadow map of each tile. Note that models out of the player's field of view can still cast shadows that fall within the field of view. For this reason, nearby models have to be loaded and ready to draw, even if they're behind the player.

However, a model can't currently cast a shadow on another nearby model. This breaks the translational invariance, and seems to require many more than three shadow maps. If the models were all on the same plane I could reuse the same shadow map for all interior instances, which are known to have neighbors in all directions. Unfortunately, the models are all placed at different Z height values (based on terrain height), so this doesn't work. It can't generally be relied on. I'll try to find a workaround for this problem later. As long as the buildings aren't too close together, and the sun isn't too low in the sky, this shouldn't be much of a problem.

Indirect Lighting

I managed to apply my previous work on indirect lighting to tiled terrain models. In particular, I wanted to apply indirect sky lighting to instanced buildings. Indirect lighting is precomputed by casting rays from uniformly distributed points in the upper (+Z) hemisphere in random directions into the scene. A billion points are ray traced along their multiple-bounce reflection paths on all CPU cores using multiple threads. All ray paths are rasterized into a 3D grid that's then sampled on the GPU during lighting of each pixel fragment. The sampled texel contains the intensity and color of the indirect lighting. The resulting lighting information is also saved to disk and reused when the same building model is loaded later.

The nice property of sky light is that it comes from all directions, which means the lighting solution for an isolated model is independent of it's position or orientation within the scene. All I needed to do was generate the indirect lighting solution for an isolated museum, and the same solution could be used for all instances. This assumes nearby buildings have little impact on the indirect illumination. It all depends on how close the buildings are to each other. I'm not sure how much influence the other buildings would have on the lighting because I have no easy way to show it. The scene doesn't look obviously wrong, so it must be acceptable to drop this term. Buildings are pretty bright when viewed from the outside, even when in shadow and close to other buildings. The interior lighting mostly comes from the skylights in the roof, which aren't affected by adjacent models.

One additional benefit of my lighting system is that it stores normalized reflection values, rather than absolute color and intensity. Meaning that the cached lighting values are multiplied by the sky color and intensity during rendering, which allows these values to be changed dynamically. The lighting solution can be reused for all times of day, even at night! Just swap the daytime blue sky color with a much lower intensity night time color. This also works for changes in weather such as cloud cover, where bright blue is replaced with a dim white on cloudy days.

Collision Detection

3DWorld supports a simple, limited collision detection for tiled terrain mode that has been extended to models. It's a limited version of the ray/sphere collision detection system used in ground and gameplay modes. Here it's only used for player collisions, since I haven't implemented any gameplay yet.

Each unique model stores its own Bounding Volume Hierarchy (BVH), which is used across all model instances. This serves as an acceleration structure for fast ray queries. When the player is within the bounding cube of a model, two lines/rays are fired from the player's center point.

One ray points down, in the -Z direction. This is the gravity ray. Gravity is enabled in "walking" mode but not in "flight" mode. The first polygon that this ray hits is the polygon the player is walking on, and is used to set the height (Z) value of the player. This test is what allows for walking up stairs and falling over ledges. I haven't implemented real falling, so walking over a ledge will just teleport the player to the bottom. There are some holes in the stairs of the museum model which can cause the player to fall though the floor. Oops! I'm not sure what I can do to fix this, other than inserting some other tiny model to fill in the gaps like I did in another scene. It's not like I can easily find and fix these polygons in a 56MB file filled with numbers.

The second line extends from the player position in the previous frame to the position in the current frame. This represents the distance the player has walked over the past frame, and is typically very short. If the movement is legal, the line won't intersect any polygons. But if the line does intersect, this means the player has run into a wall. The normal to the polygon is used to produce an updated player position that allows for sliding against a wall but not traveling through it. I haven't implemented anything more complex such as bumping your head on a low ceiling.

This simple collision system is enough to allow for exploring the buildings and terrain by walking. I'll have to find a way to extend this system to volume (cube/sphere) intersections if I want gameplay to work in the future.

3 comments:

Paul SpoonerOctober 16, 2020 at 11:42 AM
I really like how your collision model allows for arbitrarily large velocities, and should prevent multiple geometries from getting stuck in each other.

What height is the walking collision vector set at? It occurs to me that you could pass through small holes if they were the right height off the ground.
Frank GennariOctober 16, 2020 at 12:59 PM
The ray collision test uses the camera/eye height. You can in theory pass through small holes if you move fast enough such that your bounding sphere doesn't intersect with any geometry in the previous or current frames. I haven't seen this happen for horizontal movement.

There is an issue where you can fall through a small gap in the upper museum floor by standing in the correct location. Here the previous position is on top of the mesh (so not colliding). There's no gravity physics in tiled terrain mode, so falling happens instantly. This means the next frame the player is on the ground under the floor, and also not colliding. I'm not really sure how to fix this without rewriting the entire system. As a hack/workaround, I edited the model to add a small cube to fill the crack and prevent this problem.

Sunday, May 14, 2017

Instancing 3D Models in Tiled Terrain

3 comments: