EDIT: BLAS build is now per-frame async
113
docs/IBL.md
@@ -5,8 +5,9 @@ Overview
|
|||||||
- Shaders share a common include, `shaders/ibl_common.glsl`, which defines the IBL bindings for descriptor set 3 and helper functions used by deferred, forward, and background passes.
|
- Shaders share a common include, `shaders/ibl_common.glsl`, which defines the IBL bindings for descriptor set 3 and helper functions used by deferred, forward, and background passes.
|
||||||
- The engine currently supports:
|
- The engine currently supports:
|
||||||
- Specular environment from an equirectangular 2D texture with prefiltered mips (`sampler2D iblSpec2D`).
|
- Specular environment from an equirectangular 2D texture with prefiltered mips (`sampler2D iblSpec2D`).
|
||||||
- Diffuse irradiance from 2nd‑order SH (9 coefficients baked on the CPU).
|
- Diffuse irradiance from 2nd-order SH (9 coefficients baked on the CPU).
|
||||||
- A 2D BRDF integration LUT used for the split‑sum approximation.
|
- A 2D BRDF integration LUT used for the split-sum approximation.
|
||||||
|
- An optional separate background environment texture (`sampler2D iblBackground2D`); when not provided, the system falls back to using the specular environment for background rendering.
|
||||||
|
|
||||||
Data Flow
|
Data Flow
|
||||||
- Init:
|
- Init:
|
||||||
@@ -18,49 +19,115 @@ Data Flow
|
|||||||
- Tries `ktxutil::load_ktx2_cubemap` first. If successful, uploads via `ResourceManager::create_image_compressed_layers` with `VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT`.
|
- Tries `ktxutil::load_ktx2_cubemap` first. If successful, uploads via `ResourceManager::create_image_compressed_layers` with `VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT`.
|
||||||
- If cubemap loading fails, falls back to 2D `.ktx2` via `ktxutil::load_ktx2_2d` and `create_image_compressed`. The image is treated as equirectangular with prefiltered mips.
|
- If cubemap loading fails, falls back to 2D `.ktx2` via `ktxutil::load_ktx2_2d` and `create_image_compressed`. The image is treated as equirectangular with prefiltered mips.
|
||||||
- When the specular `.ktx2` is HDR (`R16G16B16A16_SFLOAT` or `R32G32B32A32_SFLOAT`) and 2:1 aspect, `IBLManager` computes 9 SH coefficients on the CPU:
|
- When the specular `.ktx2` is HDR (`R16G16B16A16_SFLOAT` or `R32G32B32A32_SFLOAT`) and 2:1 aspect, `IBLManager` computes 9 SH coefficients on the CPU:
|
||||||
- Integrates the environment over the sphere using real SH basis functions (L2) with solid‑angle weighting.
|
- Integrates the environment over the sphere using real SH basis functions (L2) with solid-angle weighting.
|
||||||
- Applies Lambert band scaling (A0 = π, A1 = 2π/3, A2 = π/4).
|
- Applies Lambert band scaling (A0 = pi, A1 = 2pi/3, A2 = pi/4).
|
||||||
- Uploads the result as `vec4 sh[9]` in a uniform buffer (`_shBuffer`).
|
- Uploads the result as `vec4 sh[9]` in a uniform buffer (`_shBuffer`).
|
||||||
- Diffuse:
|
- Diffuse:
|
||||||
- If `IBLPaths::diffuseCube` is provided and valid, loads it as a cubemap via `load_ktx2_cubemap` + `create_image_compressed_layers`.
|
- If `IBLPaths::diffuseCube` is provided and valid, loads it as a cubemap via `load_ktx2_cubemap` + `create_image_compressed_layers`.
|
||||||
- Current shaders only use the SH buffer for diffuse; the diffuse cubemap is reserved for future variants.
|
- Current shaders only use the SH buffer for diffuse; the diffuse cubemap is reserved for future variants.
|
||||||
|
- Background:
|
||||||
|
- If `IBLPaths::background2D` is provided and valid, loads it as a 2D equirectangular `.ktx2` via `load_ktx2_2d` + `create_image_compressed`.
|
||||||
|
- This allows using a separate, potentially higher-resolution or unfiltered environment for the sky background while using a prefiltered version for specular IBL.
|
||||||
- BRDF LUT:
|
- BRDF LUT:
|
||||||
- Loaded as 2D `.ktx2` via `ktxutil::load_ktx2_2d` and uploaded with `create_image_compressed`.
|
- Loaded as 2D `.ktx2` via `ktxutil::load_ktx2_2d` and uploaded with `create_image_compressed`.
|
||||||
- Fallbacks:
|
- Fallbacks:
|
||||||
- If `diffuseCube` is missing but a specular env exists, `_diff` is aliased to `_spec`.
|
- If `diffuseCube` is missing but a specular env exists, `_diff` is aliased to `_spec`.
|
||||||
- `IBLManager::load_async(const IBLPaths&)` + `IBLManager::pump_async()` (runtime path used by the engine):
|
- If `background2D` is missing but a specular env exists, `_background` is aliased to `_spec`.
|
||||||
- `load_async` runs KTX2 file I/O and SH bake on a worker thread and stores a prepared CPU-side description (`PreparedIBLData`).
|
|
||||||
- `pump_async` is called on the main thread once per frame (after the previous frame is idle) to:
|
|
||||||
- Destroy old IBL images/SH via `destroy_images_and_sh()`.
|
|
||||||
- Create new GPU images with `create_image_compressed(_layers)` and upload the SH buffer.
|
|
||||||
- This avoids stalls in the main/game loop when switching IBL volumes or loading the default environment at startup.
|
|
||||||
- `IBLManager::unload()` releases GPU images, the SH buffer, and the descriptor set layout.
|
- `IBLManager::unload()` releases GPU images, the SH buffer, and the descriptor set layout.
|
||||||
- Descriptor layout:
|
- Descriptor layout:
|
||||||
- `IBLManager::ensureLayout()` builds a descriptor set layout (set=3) with:
|
- `IBLManager::ensureLayout()` builds a descriptor set layout (set=3) with:
|
||||||
- binding 0: `COMBINED_IMAGE_SAMPLER` — specular environment (2D equirect).
|
- binding 0: `COMBINED_IMAGE_SAMPLER` - specular environment (2D equirect).
|
||||||
- binding 1: `COMBINED_IMAGE_SAMPLER` — BRDF LUT 2D.
|
- binding 1: `COMBINED_IMAGE_SAMPLER` - BRDF LUT 2D.
|
||||||
- binding 2: `UNIFORM_BUFFER` — SH coefficients (`vec4 sh[9]`).
|
- binding 2: `UNIFORM_BUFFER` - SH coefficients (`vec4 sh[9]`).
|
||||||
|
- binding 3: `COMBINED_IMAGE_SAMPLER` - background environment (2D equirect, optional).
|
||||||
- Passes request this layout from `EngineContext::ibl` and plug it into their pipeline set layouts:
|
- Passes request this layout from `EngineContext::ibl` and plug it into their pipeline set layouts:
|
||||||
- Background: `vk_renderpass_background.cpp` (set 3 used for env background).
|
- Background: `vk_renderpass_background.cpp` (set 3 used for env background).
|
||||||
- Lighting: `vk_renderpass_lighting.cpp` (deferred lighting pass, set 3).
|
- Lighting: `vk_renderpass_lighting.cpp` (deferred lighting pass, set 3).
|
||||||
- Transparent: `vk_renderpass_transparent.cpp` (forward/transparent materials, set 3).
|
- Transparent: `vk_renderpass_transparent.cpp` (forward/transparent materials, set 3).
|
||||||
|
|
||||||
|
Asynchronous Loading
|
||||||
|
- Overview:
|
||||||
|
- `IBLManager` provides an asynchronous loading path via `load_async()` + `pump_async()` to avoid blocking the main/game loop during IBL environment switches or initial loading.
|
||||||
|
- Heavy CPU work (KTX2 file I/O, decompression, SH coefficient baking) runs on a dedicated worker thread.
|
||||||
|
- GPU resource creation (image uploads, buffer allocation) is deferred to the main thread via `pump_async()`.
|
||||||
|
- API:
|
||||||
|
- `bool load_async(const IBLPaths &paths)`:
|
||||||
|
- Queues an asynchronous IBL load job.
|
||||||
|
- Returns `false` if the job could not be queued (e.g., context not initialized).
|
||||||
|
- If called while a previous job is still pending, the new request supersedes the old one (the old result is discarded when ready).
|
||||||
|
- `struct AsyncResult { bool completed; bool success; }`:
|
||||||
|
- `completed`: `true` when an async job finished since the last `pump_async()` call.
|
||||||
|
- `success`: `true` when the finished job successfully produced new GPU IBL resources.
|
||||||
|
- `AsyncResult pump_async()`:
|
||||||
|
- Must be called on the main thread, typically once per frame after the previous frame's GPU work is idle.
|
||||||
|
- If a completed async job is pending:
|
||||||
|
- Destroys old IBL images and SH buffer via `destroy_images_and_sh()`.
|
||||||
|
- Creates new GPU images with `create_image_compressed(_layers)` and uploads the SH buffer.
|
||||||
|
- Returns `AsyncResult` indicating whether a job completed and its success status.
|
||||||
|
- Internal Architecture:
|
||||||
|
- `IBLManager::init()` spawns a persistent worker thread that waits on a condition variable.
|
||||||
|
- When `load_async()` is called:
|
||||||
|
- The request paths and a unique job ID are stored in `AsyncStateData`.
|
||||||
|
- The worker thread is signaled via condition variable.
|
||||||
|
- Any previous pending result is invalidated (superseded by the new job ID).
|
||||||
|
- Worker thread execution:
|
||||||
|
- Calls `prepare_ibl_cpu()` to load KTX2 files and bake SH coefficients.
|
||||||
|
- Stores the prepared data (`PreparedIBLData`) in `AsyncStateData`.
|
||||||
|
- Marks the result as ready with the corresponding job ID.
|
||||||
|
- If the job ID no longer matches (superseded), the result is discarded.
|
||||||
|
- Main thread integration (`pump_async()`):
|
||||||
|
- Checks if a result is ready.
|
||||||
|
- If ready, calls `commit_prepared()` to create GPU resources from the prepared CPU data.
|
||||||
|
- Clears the ready flag and returns the result status.
|
||||||
|
- Thread Safety:
|
||||||
|
- All shared state in `AsyncStateData` is protected by a mutex.
|
||||||
|
- The worker thread only reads request data and writes result data.
|
||||||
|
- The main thread only reads result data and writes request data.
|
||||||
|
- GPU resource creation is strictly on the main thread.
|
||||||
|
- Usage Example:
|
||||||
|
```cpp
// Queue async IBL load (non-blocking)
iblManager->load_async(IBLPaths{
    .specularCube = "assets/ibl/studio_spec.ktx2",
    .brdfLut2D = "assets/ibl/brdf_lut.ktx2",
    .background2D = "assets/ibl/studio_bg.ktx2"
});

// In main loop, after waiting for previous frame:
auto result = iblManager->pump_async();
if (result.completed) {
    if (result.success) {
        // New IBL environment is now active
    } else {
        // Loading failed, handle error (e.g., keep previous IBL)
    }
}
```
|
||||||
|
- Benefits:
|
||||||
|
- No frame stalls when loading large HDR environment maps.
|
||||||
|
- Seamless IBL volume transitions (e.g., entering a building with different lighting).
|
||||||
|
- SH baking (CPU-intensive) happens off the main thread.
|
||||||
|
- Cleanup:
|
||||||
|
- `IBLManager::unload()` shuts down the async worker thread (joins) and releases all GPU resources.
|
||||||
|
- The destructor also calls `shutdown_async()` to ensure clean termination.
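The job-ID supersede scheme described under Internal Architecture can be illustrated with a small, engine-independent sketch. It is not the `IBLManager` source: `AsyncLoaderSketch`, `PreparedSketch`, `AsyncResultSketch`, and the trivial `prepare_cpu` are hypothetical stand-ins, and a string replaces the real KTX2/SH payload. It only shows one way a persistent worker, a condition variable, and a job counter can combine so that a newer request discards an older result.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for PreparedIBLData (decoded KTX2 payloads + SH coefficients).
struct PreparedSketch {
    std::string source;
    bool ok = false;
};

struct AsyncResultSketch {
    bool completed = false;
    bool success = false;
};

class AsyncLoaderSketch {
public:
    AsyncLoaderSketch() : worker_([this] { worker_loop(); }) {}

    ~AsyncLoaderSketch() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            shutdown_ = true;
        }
        cv_.notify_one();
        assert(worker_.joinable());
        worker_.join();
    }

    // Main thread: store the request, bump the job ID, drop any unconsumed result.
    void load_async(std::string path) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            request_ = std::move(path);
            ++requested_id_;
            result_.reset();
        }
        cv_.notify_one();
    }

    // Main thread: consume a finished result, if one is ready.
    AsyncResultSketch pump_async() {
        std::optional<PreparedSketch> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready.swap(result_);
        }
        if (!ready) return {};
        // GPU resource creation would happen here; on failure the old state is kept.
        if (ready->ok) active_ = ready->source;
        return {true, ready->ok};
    }

    const std::string& active() const { return active_; }

private:
    // Placeholder for the heavy CPU work (file I/O, SH bake); assumption of this sketch.
    static PreparedSketch prepare_cpu(const std::string& path) {
        return {path, !path.empty()};
    }

    void worker_loop() {
        std::uint64_t done_id = 0;
        for (;;) {
            std::string path;
            std::uint64_t id = 0;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [&] { return shutdown_ || requested_id_ != done_id; });
                if (shutdown_) return;
                path = request_;
                id = requested_id_;
            }
            PreparedSketch data = prepare_cpu(path);  // runs without holding the lock
            std::lock_guard<std::mutex> lock(mutex_);
            done_id = id;
            if (id == requested_id_) result_ = std::move(data);  // otherwise superseded
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::string request_;
    std::uint64_t requested_id_ = 0;
    std::optional<PreparedSketch> result_;
    bool shutdown_ = false;
    std::string active_;  // touched by the main thread only
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

Calling `load_async` twice in a row and then polling `pump_async` yields a single completed result for the second request; the first is either never stored or cleared before it can be consumed.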
|
||||||
|
|
||||||
Shader Side (`shaders/ibl_common.glsl`)
|
Shader Side (`shaders/ibl_common.glsl`)
|
||||||
- Bindings:
|
- Bindings:
|
||||||
- `layout(set=3, binding=0) uniform sampler2D iblSpec2D;`
|
- `layout(set=3, binding=0) uniform sampler2D iblSpec2D;`
|
||||||
- `layout(set=3, binding=1) uniform sampler2D iblBRDF;`
|
- `layout(set=3, binding=1) uniform sampler2D iblBRDF;`
|
||||||
- `layout(std140, set=3, binding=2) uniform IBL_SH { vec4 sh[9]; } iblSH;`
|
- `layout(std140, set=3, binding=2) uniform IBL_SH { vec4 sh[9]; } iblSH;`
|
||||||
|
- `layout(set=3, binding=3) uniform sampler2D iblBackground2D;`
|
||||||
- Helpers:
|
- Helpers:
|
||||||
- `vec3 sh_eval_irradiance(vec3 n)`:
|
- `vec3 sh_eval_irradiance(vec3 n)`:
|
||||||
- Evaluates the 9 SH basis functions (L2) at direction `n` using the same real SH basis as the CPU bake.
|
- Evaluates the 9 SH basis functions (L2) at direction `n` using the same real SH basis as the CPU bake.
|
||||||
- Multiplies each basis value by the corresponding `iblSH.sh[i].rgb` coefficient and sums the result.
|
- Multiplies each basis value by the corresponding `iblSH.sh[i].rgb` coefficient and sums the result.
|
||||||
- Coefficients are already convolved with the Lambert kernel on the CPU; the function returns diffuse irradiance directly.
|
- Coefficients are already convolved with the Lambert kernel on the CPU; the function returns diffuse irradiance directly.
|
||||||
- `vec2 dir_to_equirect(vec3 d)`:
|
- `vec2 dir_to_equirect(vec3 d)`:
|
||||||
- Normalizes `d`, computes `(phi, theta)` and returns equirectangular UV in `[0,1]²`.
|
- Normalizes `d`, computes `(phi, theta)` and returns equirectangular UV in `[0,1]^2`.
|
||||||
- Used consistently by background, deferred, and forward pipelines.
|
- Used consistently by background, deferred, and forward pipelines.
|
||||||
- `float ibl_lod_from_roughness(float roughness, float levels)`:
|
- `float ibl_lod_from_roughness(float roughness, float levels)`:
|
||||||
- Computes the mip LOD for specular IBL using `roughness² * (levels - 1)`.
|
- Computes the mip LOD for specular IBL using `roughness^2 * (levels - 1)`.
|
||||||
- This biases mid‑roughness reflections towards blurrier mips and avoids overly sharp reflections.
|
- This biases mid-roughness reflections towards blurrier mips and avoids overly sharp reflections.
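For CPU-side tests or tooling it can help to mirror two of these helpers in C++. The sketch below is an approximation, not a copy of `ibl_common.glsl`: the `Vec2`/`Vec3` types are hypothetical, and the angle convention in `dir_to_equirect` (+Y up, `phi = atan2(z, x)`, `theta = acos(y)`) is an assumption that should be checked against the shader before relying on it. The LOD formula follows the `roughness^2 * (levels - 1)` rule stated above; the clamps are an addition of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Maps a direction to equirectangular UV in [0,1]^2.
// Assumed convention: +Y up, phi = atan2(z, x), theta = acos(y).
Vec2 dir_to_equirect(Vec3 d) {
    const float pi = 3.14159265358979323846f;
    const float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    assert(len > 0.0f);
    d = {d.x / len, d.y / len, d.z / len};
    const float phi = std::atan2(d.z, d.x);                       // [-pi, pi]
    const float theta = std::acos(std::clamp(d.y, -1.0f, 1.0f));  // [0, pi]
    return {phi / (2.0f * pi) + 0.5f, theta / pi};
}

// Mip LOD for specular IBL: roughness^2 * (levels - 1).
float ibl_lod_from_roughness(float roughness, float levels) {
    const float r = std::clamp(roughness, 0.0f, 1.0f);
    return r * r * std::max(levels - 1.0f, 0.0f);
}
```

With 9 mip levels, a roughness of 0.5 selects LOD 2 rather than LOD 4, which is the quadratic mapping the helper applies instead of a linear one.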
|
||||||
|
|
||||||
Usage in Passes
|
Usage in Passes
|
||||||
- Deferred lighting (`shaders/deferred_lighting.frag` and `shaders/deferred_lighting_nort.frag`):
|
- Deferred lighting (`shaders/deferred_lighting.frag` and `shaders/deferred_lighting_nort.frag`):
|
||||||
@@ -84,24 +151,30 @@ Usage in Passes
|
|||||||
- Same include and IBL logic as deferred, applied after direct lighting.
|
- Same include and IBL logic as deferred, applied after direct lighting.
|
||||||
- Uses the same `ibl_lod_from_roughness` helper for LOD selection.
|
- Uses the same `ibl_lod_from_roughness` helper for LOD selection.
|
||||||
- Background (`shaders/background_env.frag`):
|
- Background (`shaders/background_env.frag`):
|
||||||
- Includes `ibl_common.glsl` and uses `dir_to_equirect(worldDir)` + `textureLod(iblSpec2D, uv, 0.0)` to render the environment at LOD 0.
|
- Includes `ibl_common.glsl` and uses `dir_to_equirect(worldDir)` + `textureLod(iblBackground2D, uv, 0.0)` to render the environment at LOD 0.
|
||||||
|
- When a dedicated background texture is provided via `IBLPaths::background2D`, the background pass renders from `iblBackground2D` which may differ from `iblSpec2D`.
|
||||||
|
|
||||||
Authoring IBL Assets
|
Authoring IBL Assets
|
||||||
- Specular environment:
|
- Specular environment:
|
||||||
- Preferred: prefiltered HDR cubemap in `.ktx2` (BC6H or `R16G16B16A16_SFLOAT`) with multiple mips.
|
- Preferred: prefiltered HDR cubemap in `.ktx2` (BC6H or `R16G16B16A16_SFLOAT`) with multiple mips.
|
||||||
- Alternative: prefiltered equirectangular 2D `.ktx2` with width = 2 × height and full mip chain.
|
- Alternative: prefiltered equirectangular 2D `.ktx2` with width = 2 x height and full mip chain.
|
||||||
- Make sure the mip chain is generated with a GGX importance sampling tool so the BRDF LUT + mip chain match.
|
- Make sure the mip chain is generated with a GGX importance sampling tool so the BRDF LUT + mip chain match.
|
||||||
- BRDF LUT:
|
- BRDF LUT:
|
||||||
- A standard 2D preintegrated GGX LUT (RG), usually stored as `R8G8_UNORM` or BC5.
|
- A standard 2D preintegrated GGX LUT (RG), usually stored as `R8G8_UNORM` or BC5.
|
||||||
- The LUT is sampled with `(NoV, roughness)` coordinates.
|
- The LUT is sampled with `(NoV, roughness)` coordinates.
|
||||||
- Diffuse:
|
- Diffuse:
|
||||||
- The engine currently uses SH coefficients baked from the specular equirectangular map. If you provide a separate diffuse cubemap, the CPU SH bake still uses the specular HDR; you can adjust this in `IBLManager` if you want SH to come from a different source.
|
- The engine currently uses SH coefficients baked from the specular equirectangular map. If you provide a separate diffuse cubemap, the CPU SH bake still uses the specular HDR; you can adjust this in `IBLManager` if you want SH to come from a different source.
|
||||||
|
- Background:
|
||||||
|
- Optional: equirectangular 2D `.ktx2` used exclusively for the sky background pass.
|
||||||
|
- Useful when you want a sharper or unfiltered environment for the visible sky while using a prefiltered version for specular reflections.
|
||||||
|
- If not provided, the system falls back to using `specularCube` for background rendering.
|
||||||
|
|
||||||
Implementation Notes
|
Implementation Notes
|
||||||
- CPU SH bake:
|
- CPU SH bake:
|
||||||
- Implemented in `IBLManager::load` using libktx to access raw HDR pixel data from `.ktx2`.
|
- Implemented in `IBLManager::load` using libktx to access raw HDR pixel data from `.ktx2`.
|
||||||
- Uses a simple nested loop over pixels with solid‑angle weighting and the same SH basis as `sh_eval_irradiance`.
|
- Uses a simple nested loop over pixels with solid-angle weighting and the same SH basis as `sh_eval_irradiance`.
|
||||||
- Fallbacks:
|
- Fallbacks:
|
||||||
- Lighting and transparent passes create small fallback textures so that the IBL descriptor set is always valid, even when no IBL assets are loaded.
|
- Lighting and transparent passes create small fallback textures so that the IBL descriptor set is always valid, even when no IBL assets are loaded.
|
||||||
- Background pass builds a 1×1×6 black cube as a fallback env.
|
- Background pass builds a 1x1x6 black cube as a fallback env.
|
||||||
|
- When `background2D` is not provided, `IBLManager::background()` returns the same image as `specular()`.
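The CPU SH bake noted above (nested pixel loop, solid-angle weighting, Lambert band scaling with A0 = pi, A1 = 2pi/3, A2 = pi/4) can be sketched in a few lines. This is a single-channel illustration, not the `IBLManager` code: `sh_basis` and `bake_sh` are hypothetical names, the pixel-to-direction mapping is an assumed +Y-up equirectangular convention, and the real bake works on RGB data read through libktx.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr float kPi = 3.14159265358979323846f;

// Real SH basis, bands 0..2, evaluated for a unit direction (x, y, z).
std::array<float, 9> sh_basis(float x, float y, float z) {
    return {
        0.282095f,
        0.488603f * y, 0.488603f * z, 0.488603f * x,
        1.092548f * x * y, 1.092548f * y * z,
        0.315392f * (3.0f * z * z - 1.0f),
        1.092548f * x * z,
        0.546274f * (x * x - y * y),
    };
}

// Projects an equirectangular radiance image onto 9 SH coefficients and
// applies the Lambert band scaling, so the result encodes irradiance.
std::array<float, 9> bake_sh(const std::vector<float>& radiance,
                             std::size_t width, std::size_t height) {
    assert(radiance.size() == width * height);
    std::array<float, 9> sh{};
    const float d_phi = 2.0f * kPi / static_cast<float>(width);
    const float d_theta = kPi / static_cast<float>(height);
    for (std::size_t y = 0; y < height; ++y) {
        const float theta = (static_cast<float>(y) + 0.5f) * d_theta;
        const float sin_t = std::sin(theta);
        for (std::size_t x = 0; x < width; ++x) {
            const float phi = (static_cast<float>(x) + 0.5f) * d_phi - kPi;
            // Solid angle of this pixel: sin(theta) * dTheta * dPhi.
            const float w = radiance[y * width + x] * sin_t * d_theta * d_phi;
            const auto b = sh_basis(sin_t * std::cos(phi), std::cos(theta),
                                    sin_t * std::sin(phi));
            for (std::size_t k = 0; k < 9; ++k) sh[k] += b[k] * w;
        }
    }
    const float a0 = kPi, a1 = 2.0f * kPi / 3.0f, a2 = kPi / 4.0f;
    const float band[9] = {a0, a1, a1, a1, a2, a2, a2, a2, a2};
    for (std::size_t k = 0; k < 9; ++k) sh[k] *= band[k];
    return sh;
}

// CPU counterpart of sh_eval_irradiance: dot product of basis and coefficients.
float sh_eval_irradiance(const std::array<float, 9>& sh, float x, float y, float z) {
    const auto b = sh_basis(x, y, z);
    float e = 0.0f;
    for (std::size_t k = 0; k < 9; ++k) e += b[k] * sh[k];
    return e;
}
```

A quick sanity check for any implementation of this kind: a constant environment of radiance 1 should evaluate to an irradiance close to pi in every direction.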
|
||||||
|
|
||||||
|
|||||||
@@ -12,22 +12,27 @@ Optional subsystem that enables hybrid or full ray traced shadows via Ray Query.
|
|||||||
|
|
||||||
### BLAS Build & Cache
|
### BLAS Build & Cache
|
||||||
- `AccelStructureHandle getOrBuildBLAS(const std::shared_ptr<MeshAsset>& mesh)`:
|
- `AccelStructureHandle getOrBuildBLAS(const std::shared_ptr<MeshAsset>& mesh)`:
|
||||||
- One GAS per `MeshAsset`, keyed by vertex buffer `VkBuffer`.
|
- One GAS per `MeshAsset`, keyed by mesh pointer.
|
||||||
- Populated with one triangle geometry per `GeoSurface`.
|
- Populated with one triangle geometry per `GeoSurface`.
|
||||||
- Built with `VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR` and device-local storage + scratch.
|
- Built with `VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR` and device-local storage + scratch.
|
||||||
- Cached in `_blasByVB` for reuse across frames.
|
- Cached in `_blasByMesh` for reuse across frames.
|
||||||
- Called from `AssetManager::createMesh(...)` and from GLTF loader after mesh upload.
|
- When a BLAS does not exist yet, the mesh is queued for an asynchronous build and an empty handle is returned; callers must treat this as "BLAS not ready" and skip the instance for the current frame (see TLAS section below).
|
||||||
|
- `pump_blas_builds(max_builds_per_frame)` advances an internal BLAS build queue and is called once per frame from the engine main loop to spread work across multiple frames instead of doing all BLAS builds in a single spike.
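The "return an empty handle, build later" contract can be sketched without any Vulkan code. The types below (`BlasQueueSketch`, `BlasHandleSketch`, `MeshKey`) are hypothetical and the "build" is just a counter increment; the real `RayTracingManager` records and submits GPU work at that point. The sketch only shows the queueing and per-frame budget logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <unordered_set>

using MeshKey = const void*;  // stands in for MeshAsset*

struct BlasHandleSketch {
    std::uint64_t id = 0;  // 0 means "BLAS not ready"
    bool valid() const { return id != 0; }
};

class BlasQueueSketch {
public:
    // Returns the cached handle, or queues the mesh once and returns an empty handle.
    BlasHandleSketch get_or_queue(MeshKey mesh) {
        assert(mesh != nullptr);
        auto it = built_.find(mesh);
        if (it != built_.end()) return it->second;
        if (queued_.insert(mesh).second) pending_.push_back(mesh);
        return {};
    }

    // Builds at most max_builds queued entries; returns how many were built.
    std::size_t pump(std::size_t max_builds) {
        std::size_t built = 0;
        while (built < max_builds && !pending_.empty()) {
            MeshKey mesh = pending_.front();
            pending_.pop_front();
            queued_.erase(mesh);
            built_[mesh] = BlasHandleSketch{++next_id_};  // placeholder for the GPU build
            ++built;
        }
        return built;
    }

private:
    std::unordered_map<MeshKey, BlasHandleSketch> built_;
    std::unordered_set<MeshKey> queued_;  // prevents duplicate queue entries
    std::deque<MeshKey> pending_;
    std::uint64_t next_id_ = 0;
};
```

A caller that receives an invalid handle skips the instance for the current frame and simply asks again on the next one; repeated requests for the same mesh do not grow the queue.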
|
||||||
|
|
||||||
### TLAS Rebuild Per Frame
|
### TLAS Rebuild Per Frame
|
||||||
- `VkAccelerationStructureKHR buildTLASFromDrawContext(const DrawContext& dc)`:
|
- `VkAccelerationStructureKHR buildTLASFromDrawContext(const DrawContext& dc)`:
|
||||||
- Iterates `dc.OpaqueSurfaces` and creates one instance per render object.
|
- Iterates `dc.OpaqueSurfaces` and creates one instance per render object.
|
||||||
- Looks up BLAS by `RenderObject::vertexBuffer`; if missing, instance is skipped.
|
- Looks up BLAS by `RenderObject::sourceMesh` (the `MeshAsset*`); if a BLAS is not cached yet, it calls `getOrBuildBLAS` with a non-owning `shared_ptr` to queue a build and then skips the instance for this frame.
|
||||||
- Uploads instances to a CPU→GPU buffer with device address.
|
- Uploads instances to a CPU→GPU buffer with device address.
|
||||||
- Builds TLAS with `immediate_submit` and stores device address for Ray Query.
|
- Builds TLAS with `immediate_submit` and stores device address for Ray Query.
|
||||||
|
|
||||||
### Renderer Integration
|
### Renderer Integration
|
||||||
- In `VulkanEngine::draw()` before building passes:
|
- In `VulkanEngine::draw()` before building passes:
|
||||||
- If RT mode is enabled (`shadowSettings.mode != 0`) and manager exists, TLAS is rebuilt from the latest draw context.
|
- If RT mode is enabled (`shadowSettings.mode != 0`) or ray-traced SSR is enabled (`enableSSR && reflectionMode != 0`), and the manager exists, TLAS is rebuilt from the latest draw context.
|
||||||
|
- TLAS only references BLAS that are already built; instances whose meshes are still in the BLAS build queue are skipped until their BLAS completes.
|
||||||
|
- In `VulkanEngine::run()` at the start of each frame (after waiting for the previous frame fence):
|
||||||
|
- Calls `RayTracingManager::flushPendingDeletes()` to safely destroy any BLAS scheduled for deferred deletion.
|
||||||
|
- Calls `RayTracingManager::pump_blas_builds(1)` to build at most one queued BLAS per frame (tunable if you want more or fewer builds per frame).
|
||||||
- Lighting pass binds the TLAS at `set=0,binding=1` when available.
|
- Lighting pass binds the TLAS at `set=0,binding=1` when available.
|
||||||
|
|
||||||
### Modes & UI
|
### Modes & UI
|
||||||
@@ -36,7 +41,8 @@ Optional subsystem that enables hybrid or full ray traced shadows via Ray Query.
|
|||||||
- Mode 2: Ray Query only (no shadow maps).
|
- Mode 2: Ray Query only (no shadow maps).
|
||||||
|
|
||||||
### Notes & Caveats
|
### Notes & Caveats
|
||||||
- BLAS cache key is the vertex buffer handle; if you rebuild meshes in-place, BLAS must be invalidated.
|
- BLAS cache key is the `MeshAsset*`. If you destroy or rebuild meshes (or their GPU buffers), you must invalidate the associated BLAS via `RayTracingManager::removeBLASForMesh(mesh)` or `removeBLASForBuffer(vertexBuffer)`.
|
||||||
- CPU→GPU memory is used for the TLAS instance buffer to simplify updates. On some platforms, you may prefer staging + device-local.
|
- CPU→GPU memory is used for the TLAS instance buffer to simplify updates. On some platforms, you may prefer staging + device-local.
|
||||||
|
- Because BLAS builds are asynchronous and capped per frame, newly spawned meshes may take a few frames before they appear in the ray-traced path; this is a deliberate tradeoff to avoid large hitches when many meshes are introduced.
|
||||||
- The RT path requires Vulkan 1.2+ with Ray Query and Acceleration Structure features available.
|
- The RT path requires Vulkan 1.2+ with Ray Query and Acceleration Structure features available.
|
||||||
|
|
||||||
|
|||||||
@@ -111,8 +111,8 @@ Buffer usage → stage/access examples:
|
|||||||
|
|
||||||
### Built‑In Pass Wiring (Current)
|
### Built‑In Pass Wiring (Current)
|
||||||
|
|
||||||
- Resource uploads (if any) → Background (compute) → Geometry (G‑Buffer) → Lighting (deferred) → Transparent → CopyToSwapchain → ImGui → PreparePresent.
|
- Resource uploads (if any) → Background (compute) → Geometry (G‑Buffer) → Lighting (deferred) → SSR → Tonemap+Bloom → FXAA → Transparent → CopyToSwapchain → ImGui → PreparePresent.
|
||||||
- See registrations: `src/core/engine.cpp:321`–`src/core/engine.cpp:352`.
|
- See registrations in `src/core/engine.cpp`.
|
||||||
|
|
||||||
### Notes & Limits
|
### Notes & Limits
|
||||||
|
|
||||||
|
|||||||
@@ -68,10 +68,14 @@ addPass(std::move(myPass));
|
|||||||
### Built-in Passes
|
### Built-in Passes
|
||||||
|
|
||||||
- Background (compute): Declares `ComputeWrite(drawImage)` and dispatches a selected effect instance.
|
- Background (compute): Declares `ComputeWrite(drawImage)` and dispatches a selected effect instance.
|
||||||
- Geometry (G-Buffer): Declares 3 color attachments and `DepthAttachment`, plus buffer reads for shared index/vertex buffers.
|
- Geometry (G-Buffer): Declares 4 color attachments (position, normal+roughness, albedo+metallic, AO+emissive) and `DepthAttachment`, plus buffer reads for shared index/vertex buffers.
|
||||||
- Lighting (deferred): Reads G‑Buffer as sampled images and writes to `drawImage`.
|
- Lighting (deferred): Reads G‑Buffer as sampled images and writes to `drawImage`. Applies AO to indirect lighting and adds emissive contribution.
|
||||||
- Shadows: Cascaded shadow maps render to per-frame transient depth images (four cascades). If Ray Query is enabled,
|
- Shadows: Cascaded shadow maps render to per-frame transient depth images (four cascades). If Ray Query is enabled,
|
||||||
the lighting pass additionally samples TLAS to evaluate shadow visibility according to the selected mode.
|
the lighting pass additionally samples TLAS to evaluate shadow visibility according to the selected mode.
|
||||||
|
- SSR (Screen Space Reflections): Reads HDR lighting result + G-Buffer and outputs reflections blended with the scene.
|
||||||
|
Two variants: `ssr.nort` (screen-space only) and `ssr.rt` (SSR + RT fallback using TLAS ray queries).
|
||||||
|
- Tonemap + Bloom: Converts HDR to LDR with exposure control and optional bloom. Supports Reinhard and ACES tonemapping.
|
||||||
|
- FXAA: Post-process anti-aliasing on the LDR tonemapped image. Simple 5-tap edge-detection blur.
|
||||||
- Transparent (forward): Writes to `drawImage` with depth test against `depthImage` after lighting.
|
- Transparent (forward): Writes to `drawImage` with depth test against `depthImage` after lighting.
|
||||||
- ImGui: Inserted just before present to draw on the swapchain image.
|
- ImGui: Inserted just before present to draw on the swapchain image.
|
||||||
|
|
||||||
@@ -90,3 +94,83 @@ addPass(std::move(myPass));
|
|||||||
|
|
||||||
See also: `docs/RenderGraph.md` for the builder API and synchronization details.
|
See also: `docs/RenderGraph.md` for the builder API and synchronization details.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Post-Processing Pipeline
|
||||||
|
|
||||||
|
After deferred lighting, the engine runs a post-processing chain: SSR → Tonemap (with Bloom) → FXAA → Present.
|
||||||
|
|
||||||
|
### SSR (Screen Space Reflections)
|
||||||
|
|
||||||
|
Located in `src/render/passes/ssr.cpp` and `shaders/ssr.frag` / `shaders/ssr_rt.frag`.
|
||||||
|
|
||||||
|
**Algorithm:**
|
||||||
|
- World-space ray marching along the reflection vector `R = reflect(-V, N)`.
|
||||||
|
- Depth comparison against G-Buffer position to find intersection.
|
||||||
|
- Fresnel (Schlick) and glossiness-based blending with the base HDR color.
|
||||||
|
|
||||||
|
**Parameters (shader constants):**
|
||||||
|
- `MAX_STEPS = 64` – maximum ray march iterations (reduced for rough surfaces).
|
||||||
|
- `STEP_LENGTH = 0.5` – world units per step.
|
||||||
|
- `MAX_DISTANCE = 50.0` – maximum ray travel distance.
|
||||||
|
- `THICKNESS = 3.0` – depth tolerance for hit detection.
|
||||||
|
|
||||||
|
**Variants:**
|
||||||
|
- `ssr.nort` – Pure screen-space reflections.
|
||||||
|
- `ssr.rt` – SSR + RT fallback using TLAS ray queries when SSR misses (requires `GL_EXT_ray_query`).
|
||||||
|
The reflection mode is controlled via `sceneData.rtOptions.w`: 0 = SSR only, 1 = SSR + RT fallback, 2 = RT only.
|
||||||
|
|
||||||
|
**Inputs (set=1):**
|
||||||
|
- binding 0: `hdrColor` – HDR lighting result.
|
||||||
|
- binding 1: `posTex` – G-Buffer world position (RGBA32F).
|
||||||
|
- binding 2: `normalTex` – G-Buffer normal + roughness.
|
||||||
|
- binding 3: `albedoTex` – G-Buffer albedo + metallic.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Tonemap + Bloom
|
||||||
|
|
||||||
|
Located in `src/render/passes/tonemap.cpp` and `shaders/tonemap.frag`.
|
||||||
|
|
||||||
|
**Tonemapping modes:**
|
||||||
|
- `mode = 0` – Reinhard: `x / (1 + x)`.
|
||||||
|
- `mode = 1` – ACES (Narkowicz approximation, default).
|
||||||
|
|
||||||
|
**Bloom:**
|
||||||
|
- Simple gather-based bloom computed in HDR space before tonemapping.
|
||||||
|
- 5×5 kernel (radius=2) samples neighbors; pixels exceeding `bloomThreshold` contribute weighted by their brightness.
|
||||||
|
- Accumulated bloom is multiplied by `bloomIntensity` and added to the HDR color.
|
||||||
|
|
||||||
|
**Runtime parameters:**
|
||||||
|
- `exposure` (default 1.0) – exposure multiplier.
|
||||||
|
- `bloomEnabled` (default true) – toggle bloom.
|
||||||
|
- `bloomThreshold` (default 1.0) – brightness threshold for bloom contribution.
|
||||||
|
- `bloomIntensity` (default 0.7) – bloom blend strength.
|
||||||
|
|
||||||
|
**Output:** LDR image (`VK_FORMAT_R8G8B8A8_UNORM`) with gamma correction (γ = 2.2).
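The two tonemapping modes and the gamma step can be written out as scalar C++ for reference. This is a sketch rather than a transcription of `tonemap.frag`: the ACES constants are the commonly published Narkowicz fit, and applying exposure before the curve is an assumption about ordering. Bloom is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// mode = 0: Reinhard, x / (1 + x).
float tonemap_reinhard(float x) {
    return x / (1.0f + x);
}

// mode = 1: ACES filmic curve, Narkowicz approximation.
float tonemap_aces(float x) {
    const float a = 2.51f, b = 0.03f, c = 2.43f, d = 0.59f, e = 0.14f;
    return std::clamp((x * (a * x + b)) / (x * (c * x + d) + e), 0.0f, 1.0f);
}

// Exposure -> tonemap -> gamma 2.2 encode, for a single channel.
float hdr_to_ldr(float hdr, float exposure, int mode) {
    assert(exposure >= 0.0f);
    const float x = std::max(hdr * exposure, 0.0f);
    const float mapped = (mode == 0) ? tonemap_reinhard(x) : tonemap_aces(x);
    return std::pow(mapped, 1.0f / 2.2f);
}
```

Reinhard approaches 1 asymptotically and never clips, while the ACES fit overshoots slightly for very bright inputs and relies on the clamp.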
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### FXAA (Fast Approximate Anti-Aliasing)
|
||||||
|
|
||||||
|
Located in `src/render/passes/fxaa.cpp` and `shaders/fxaa.frag`.
|
||||||
|
|
||||||
|
**Algorithm:**
|
||||||
|
- Luma-based edge detection using a 5-tap cross pattern (N, S, E, W, center).
|
||||||
|
- If luma range exceeds threshold, apply a simple box blur; otherwise pass through.
|
||||||
|
|
||||||
|
**Runtime parameters:**
|
||||||
|
- `enabled` (default true) – toggle FXAA.
|
||||||
|
- `edge_threshold` (default 0.125) – relative contrast threshold.
|
||||||
|
- `edge_threshold_min` (default 0.0312) – absolute minimum threshold.
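How the two thresholds interact is easiest to see in code. The following sketch uses the conventional FXAA early-out test (luma range compared against the larger of the absolute minimum and a fraction of the brightest tap); whether `fxaa.frag` uses exactly this expression is an assumption, and `LumaTaps`, `luma`, and `fxaa_is_edge` are hypothetical names with Rec. 601 luma weights chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Perceptual luma from linear RGB (Rec. 601 weights, assumed here).
float luma(float r, float g, float b) {
    return 0.299f * r + 0.587f * g + 0.114f * b;
}

// Luma of the 5-tap cross: north, south, east, west, center.
struct LumaTaps { float n, s, e, w, c; };

// True when local contrast is high enough to treat the pixel as an edge.
bool fxaa_is_edge(const LumaTaps& t,
                  float edge_threshold = 0.125f,
                  float edge_threshold_min = 0.0312f) {
    assert(edge_threshold >= 0.0f && edge_threshold_min >= 0.0f);
    const float lo = std::min({t.n, t.s, t.e, t.w, t.c});
    const float hi = std::max({t.n, t.s, t.e, t.w, t.c});
    // Relative threshold scales with brightness; the minimum suppresses dark noise.
    return (hi - lo) >= std::max(edge_threshold_min, hi * edge_threshold);
}
```

Under this test a flat region and faint noise in dark areas both pass through unfiltered, while a strong luma step triggers the blur.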
|
||||||
|
|
||||||
|
**Push constants:**
|
||||||
|
```glsl
layout(push_constant) uniform Push {
    float inverse_width;
    float inverse_height;
    float edge_threshold;
    float edge_threshold_min;
} pc;
```
|
||||||
|
|
||||||
|
|||||||
@@ -23,9 +23,36 @@
|
|||||||
- `vec4 metal_rough_factors; // x = metallic, y = roughness`
|
- `vec4 metal_rough_factors; // x = metallic, y = roughness`
|
||||||
- `vec4 extra[14]; // extra[0].x = normalScale`
|
- `vec4 extra[14]; // extra[0].x = normalScale`
|
||||||
- Material texture bindings (set=1):
|
- Material texture bindings (set=1):
|
||||||
- binding=1 `colorTex`, binding=2 `metalRoughTex`, binding=3 `normalMap`.
|
- binding=1 `colorTex`, binding=2 `metalRoughTex`, binding=3 `normalMap`, binding=4 `occlusionTex`, binding=5 `emissiveTex`.
|
||||||
|
|
||||||
- Adding a pipeline (graphics)
|
- Adding a pipeline (graphics)
|
||||||
- Fill `GraphicsPipelineCreateInfo` with shader paths, descriptor set layouts, optional push constants, and a `configure(PipelineBuilder&)` callback to set topology, raster, depth/blend, and attachment formats.
|
- Fill `GraphicsPipelineCreateInfo` with shader paths, descriptor set layouts, optional push constants, and a `configure(PipelineBuilder&)` callback to set topology, raster, depth/blend, and attachment formats.
|
||||||
- Register with `PipelineManager::createGraphicsPipeline(name, info)`. Retrieve via `getGraphics` or `getMaterialPipeline`.
|
- Register with `PipelineManager::createGraphicsPipeline(name, info)`. Retrieve via `getGraphics` or `getMaterialPipeline`.
|
||||||
|
|
||||||
|
Shader File Reference
|
||||||
|
|
||||||
|
| File | Stage | Description |
|------|-------|-------------|
| `fullscreen.vert` | Vertex | Fullscreen triangle for post-process passes |
| `mesh.vert` | Vertex | Standard mesh vertex transform |
| `mesh.frag` | Fragment | Forward shading (deprecated, use gbuffer) |
| `gbuffer.frag` | Fragment | G-Buffer output (position, normal, albedo, AO+emissive) |
| `deferred_lighting.frag` | Fragment | Deferred lighting with RT shadows + IBL |
| `deferred_lighting_nort.frag` | Fragment | Deferred lighting without RT |
| `shadow.vert/.frag` | Vertex/Fragment | Shadow map generation |
| `ssr.frag` | Fragment | Screen-space reflections (ray march) |
| `ssr_rt.frag` | Fragment | SSR + RT fallback (ray query) |
| `tonemap.frag` | Fragment | HDR→LDR tonemapping + bloom |
| `fxaa.frag` | Fragment | Fast approximate anti-aliasing |
| `sky.comp` | Compute | Procedural sky background |
| `gradient_color.comp` | Compute | Gradient background |
| `background_env.frag` | Fragment | Environment map background |
|
||||||
|
|
||||||
|
GLSL Includes
|
||||||
|
|
||||||
|
| File | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `input_structures.glsl` | SceneData UBO, material bindings, light structs |
|
||||||
|
| `lighting_common.glsl` | BRDF evaluation, point light helpers |
|
||||||
|
| `ibl_common.glsl` | IBL split-sum, SH irradiance |
|
||||||
|
|
||||||
|
|||||||
@@ -1,20 +1,27 @@
|
|||||||
Materials and Textures Overview (PBR)
|
Materials and Textures Overview (PBR)
|
||||||
|
|
||||||
Current state (as of Nov 1, 2025)
|
|
||||||
- PBR textures bound per material (set=1):
|
- PBR textures bound per material (set=1):
|
||||||
- binding=0: GLTFMaterialData (UBO)
|
- binding=0: GLTFMaterialData (UBO)
|
||||||
- binding=1: `colorTex` (albedo/base color) — sRGB
|
- binding=1: `colorTex` (albedo/base color) — sRGB
|
||||||
- binding=2: `metalRoughTex` (G=roughness, B=metallic) — UNORM
|
- binding=2: `metalRoughTex` (G=roughness, B=metallic) — UNORM
|
||||||
- binding=3: `normalMap` (tangent-space normal, UNORM)
|
- binding=3: `normalMap` (tangent-space normal, UNORM)
|
||||||
|
- binding=4: `occlusionTex` (ambient occlusion, R channel) — UNORM
|
||||||
|
- binding=5: `emissiveTex` (emissive RGB) — sRGB
|
||||||
- G‑Buffer writes world‑space normals. Tangent‑space normal maps are decoded with TBN using a sign‑correct bitangent (B = sign * cross(N, T)).
|
- G‑Buffer writes world‑space normals. Tangent‑space normal maps are decoded with TBN using a sign‑correct bitangent (B = sign * cross(N, T)).
|
||||||
- Numeric fallbacks via `MaterialConstants` (CPU) / `GLTFMaterialData` (GPU):
|
- Numeric fallbacks via `MaterialConstants` (CPU) / `GLTFMaterialData` (GPU):
|
||||||
- `colorFactors` (RGBA). Defaults to 1 if zero.
|
- `colorFactors` (RGBA). Defaults to 1 if zero.
|
||||||
- `metal_rough_factors` (X=metallic, Y=roughness). Roughness is clamped to ≥ 0.04 in shaders.
|
- `metal_rough_factors` (X=metallic, Y=roughness). Roughness is clamped to ≥ 0.04 in shaders.
|
||||||
- `extra[0].x` = `normalScale` (scalar, default 1.0). Multiplies the XY of decoded normal.
|
- `extra[0].x` = `normalScale` (scalar, default 1.0). Multiplies the XY of decoded normal.
|
||||||
|
- `extra[0].y` = `aoStrength` (scalar, 0–1). Controls AO influence.
|
||||||
|
- `extra[0].z` = `hasAO` (flag, 1 = use AO texture, 0 = skip).
|
||||||
|
- `extra[1].rgb` = `emissiveFactor` (vec3). Multiplied with emissive texture.
|
||||||
|
- `extra[2].x` = `alphaCutoff` (scalar). For MASK alpha mode.
|
||||||
- Defaults when a texture is missing:
|
- Defaults when a texture is missing:
|
||||||
- Albedo → checkerboard error texture
|
- Albedo → checkerboard error texture
|
||||||
- MR → white (no effect)
|
- MR → white (no effect)
|
||||||
- Normal → 1×1 flat normal (0.5, 0.5, 1.0)
|
- Normal → 1×1 flat normal (0.5, 0.5, 1.0)
|
||||||
|
- Occlusion → 1×1 white (AO = 1.0, no occlusion)
|
||||||
|
- Emissive → 1×1 black (no emission)
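
The numeric fallbacks listed above can be sketched on the CPU side as follows. This is a minimal illustration only: `Vec4`, `MaterialConstantsSketch`, `set_pbr_extras`, and `shading_roughness` are hypothetical names, and the engine's real `MaterialConstants` layout may differ beyond the slots documented here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical 4-float vector; the engine's real vector type is not shown in this document.
struct Vec4 { float x{0}, y{0}, z{0}, w{0}; };

// Hypothetical mirror of the documented slots (assumption: three extra[] entries are enough).
struct MaterialConstantsSketch {
    Vec4 colorFactors{1, 1, 1, 1};
    Vec4 metal_rough_factors{0, 1, 0, 0}; // X = metallic, Y = roughness
    std::array<Vec4, 3> extra{};
};

// Fill the extra[] slots in the order the documentation lists them.
inline void set_pbr_extras(MaterialConstantsSketch& m, float normalScale, float aoStrength,
                           bool hasAO, float emR, float emG, float emB, float alphaCutoff)
{
    assert(aoStrength >= 0.0f && aoStrength <= 1.0f); // documented range is 0..1
    m.extra[0].x = normalScale;          // scales XY of the decoded normal
    m.extra[0].y = aoStrength;           // AO influence
    m.extra[0].z = hasAO ? 1.0f : 0.0f;  // 1 = sample the AO texture
    m.extra[1].x = emR;                  // emissiveFactor.rgb
    m.extra[1].y = emG;
    m.extra[1].z = emB;
    m.extra[2].x = alphaCutoff;          // used for MASK alpha mode
}

// Mirrors the documented shader-side clamp: roughness never drops below 0.04.
inline float shading_roughness(const MaterialConstantsSketch& m)
{
    return std::max(m.metal_rough_factors.y, 0.04f);
}
```

Keeping the scalars in a small `extra[]` array means new per-material parameters can be added without changing the descriptor layout, at the cost of slot meanings living only in documentation.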
|
||||||
|
|
||||||
Implications for primitive meshes
|
Implications for primitive meshes
|
||||||
- Primitives can use:
|
- Primitives can use:
|
||||||
@@ -41,3 +48,14 @@ Usage Examples
|
|||||||
- Adjust normal strength per material: set `material.constants.extra[0].x` (CPU) or `normalTexture.scale` in glTF.
|
- Adjust normal strength per material: set `material.constants.extra[0].x` (CPU) or `normalTexture.scale` in glTF.
|
||||||
- Primitive with PBR textures:
|
- Primitive with PBR textures:
|
||||||
- Set `MeshMaterialDesc::Kind::Textured` and fill `albedoPath`, `metalRoughPath`, and `normalPath`.
|
- Set `MeshMaterialDesc::Kind::Textured` and fill `albedoPath`, `metalRoughPath`, and `normalPath`.
|
||||||
|
|
||||||
|
G-Buffer Outputs
- The geometry pass (`gbuffer.frag`) writes 4 render targets:
  - `outPos` (location 0): World position (xyz) + valid flag (w=1).
  - `outNorm` (location 1): World normal (xyz) + roughness (w).
  - `outAlbedo` (location 2): Albedo (rgb) + metallic (a).
  - `outExtra` (location 3): AO (x) + emissive (yzw).
- Deferred lighting reads these and computes:
```glsl
vec3 color = direct + indirect * ao + emissive;
```
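
The packing of `outExtra` and the final composition can be checked on the CPU with a small C++ sketch. The `Vec3`/`Vec4` types and the `pack_extra`/`compose` helpers are hypothetical and exist only for this illustration; the actual shaders are GLSL.

```cpp
#include <cassert>

// Hypothetical minimal vector types for the illustration.
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Pack AO and emissive the way outExtra is described: AO in x, emissive in yzw.
inline Vec4 pack_extra(float ao, Vec3 emissive)
{
    assert(ao >= 0.0f && ao <= 1.0f); // AO is expected to be a 0..1 factor
    return {ao, emissive.x, emissive.y, emissive.z};
}

// Evaluate color = direct + indirect * ao + emissive from the packed value.
inline Vec3 compose(Vec3 direct, Vec3 indirect, Vec4 extra)
{
    const float ao = extra.x;
    return {direct.x + indirect.x * ao + extra.y,
            direct.y + indirect.y * ao + extra.z,
            direct.z + indirect.z * ao + extra.w};
}
```

Note that AO scales only the indirect term here; direct lighting and emission pass through unchanged, which matches the formula above.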
|
||||||
|
|||||||
@@ -1185,6 +1185,10 @@ void VulkanEngine::run()
|
|||||||
// Safe to destroy any BLAS queued for deletion now that the previous frame is idle.
|
// Safe to destroy any BLAS queued for deletion now that the previous frame is idle.
|
||||||
if (_rayManager) { _rayManager->flushPendingDeletes(); }
|
if (_rayManager) { _rayManager->flushPendingDeletes(); }
|
||||||
|
|
||||||
|
// Progress queued BLAS builds over multiple frames to avoid large
|
||||||
|
// stalls when many meshes require ray tracing structures at once.
|
||||||
|
if (_rayManager) { _rayManager->pump_blas_builds(1); }
|
||||||
|
|
||||||
// Commit any completed async IBL load now that the GPU is idle.
|
// Commit any completed async IBL load now that the GPU is idle.
|
||||||
if (_iblManager && _pendingIBLRequest.active)
|
if (_iblManager && _pendingIBLRequest.active)
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -35,6 +35,8 @@ void RayTracingManager::cleanup()
|
|||||||
VkDevice dv = _device->device();
|
VkDevice dv = _device->device();
|
||||||
// Destroy any deferred BLAS first
|
// Destroy any deferred BLAS first
|
||||||
flushPendingDeletes();
|
flushPendingDeletes();
|
||||||
|
_blasBuildQueue.clear();
|
||||||
|
_blasPendingMeshes.clear();
|
||||||
|
|
||||||
if (_tlas.handle)
|
if (_tlas.handle)
|
||||||
{
|
{
|
||||||
@@ -100,21 +102,53 @@ AccelStructureHandle RayTracingManager::getOrBuildBLAS(const std::shared_ptr<Mes
|
|||||||
{
|
{
|
||||||
if (!mesh) return {};
|
if (!mesh) return {};
|
||||||
|
|
||||||
// If uploads are deferred, ensure any pending mesh buffer uploads are flushed
|
const MeshAsset* key = mesh.get();
|
||||||
// before building a BLAS that reads from those GPU buffers.
|
|
||||||
if (_resources && _resources->deferred_uploads() && _resources->has_pending_uploads())
|
|
||||||
{
|
|
||||||
fmt::println("[RT] getOrBuildBLAS: flushing pending resource uploads before BLAS build");
|
|
||||||
_resources->process_queued_uploads_immediate();
|
|
||||||
}
|
|
||||||
|
|
||||||
if (auto it = _blasByMesh.find(mesh.get()); it != _blasByMesh.end())
|
// If a BLAS is already cached (even an empty sentinel), return it directly.
|
||||||
|
if (auto it = _blasByMesh.find(key); it != _blasByMesh.end())
|
||||||
{
|
{
|
||||||
fmt::println("[RT] getOrBuildBLAS reuse by mesh mesh='{}' handle={}", mesh->name,
|
fmt::println("[RT] getOrBuildBLAS reuse by mesh mesh='{}' handle={}", mesh->name,
|
||||||
static_cast<const void *>(it->second.handle));
|
static_cast<const void *>(it->second.handle));
|
||||||
return it->second;
|
return it->second;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// If a build is already queued or in progress for this mesh, do not enqueue
|
||||||
|
// another job; simply report "not ready yet".
|
||||||
|
if (_blasPendingMeshes.find(key) != _blasPendingMeshes.end())
|
||||||
|
{
|
||||||
|
fmt::println("[RT] getOrBuildBLAS pending build mesh='{}'", mesh->name);
|
||||||
|
return {};
|
||||||
|
}
|
||||||
|
|
||||||
|
// If uploads are deferred, ensure any pending mesh buffer uploads are flushed
|
||||||
|
// before queuing a BLAS that will read from those GPU buffers.
|
||||||
|
if (_resources && _resources->deferred_uploads() && _resources->has_pending_uploads())
|
||||||
|
{
|
||||||
|
fmt::println("[RT] getOrBuildBLAS: flushing pending resource uploads before queuing BLAS build");
|
||||||
|
_resources->process_queued_uploads_immediate();
|
||||||
|
}
|
||||||
|
|
||||||
|
fmt::println("[RT] getOrBuildBLAS queue build mesh='{}'", mesh->name);
|
||||||
|
_blasPendingMeshes.insert(key);
|
||||||
|
_blasBuildQueue.push_back(PendingBlasBuild{key});
|
||||||
|
|
||||||
|
// BLAS will be built asynchronously by pump_blas_builds(); until then,
|
||||||
|
// callers should treat the empty handle as "not ready yet".
|
||||||
|
return {};
|
||||||
|
}
|
||||||
|
|
||||||
|
AccelStructureHandle RayTracingManager::build_blas_for_mesh(const MeshAsset *mesh)
|
||||||
|
{
|
||||||
|
if (!mesh || !_resources || !_device) return {};
|
||||||
|
|
||||||
|
// If uploads are deferred, ensure any pending mesh buffer uploads are flushed
|
||||||
|
// before building a BLAS that reads from those GPU buffers.
|
||||||
|
if (_resources->deferred_uploads() && _resources->has_pending_uploads())
|
||||||
|
{
|
||||||
|
fmt::println("[RT] build_blas_for_mesh: flushing pending resource uploads before BLAS build");
|
||||||
|
_resources->process_queued_uploads_immediate();
|
||||||
|
}
|
||||||
|
|
||||||
// Build BLAS with one geometry per surface (skip empty primitives)
|
// Build BLAS with one geometry per surface (skip empty primitives)
|
||||||
std::vector<VkAccelerationStructureGeometryKHR> geoms;
|
std::vector<VkAccelerationStructureGeometryKHR> geoms;
|
||||||
std::vector<VkAccelerationStructureBuildRangeInfoKHR> ranges;
|
std::vector<VkAccelerationStructureBuildRangeInfoKHR> ranges;
|
||||||
@@ -126,7 +160,7 @@ AccelStructureHandle RayTracingManager::getOrBuildBLAS(const std::shared_ptr<Mes
|
|||||||
const uint32_t vcount = mesh->meshBuffers.vertexCount;
|
const uint32_t vcount = mesh->meshBuffers.vertexCount;
|
||||||
VkBuffer vb = mesh->meshBuffers.vertexBuffer.buffer;
|
VkBuffer vb = mesh->meshBuffers.vertexBuffer.buffer;
|
||||||
|
|
||||||
fmt::println("[RT] getOrBuildBLAS build mesh='{}' surfaces={} vcount={}", mesh->name,
|
fmt::println("[RT] build_blas_for_mesh mesh='{}' surfaces={} vcount={}", mesh->name,
|
||||||
mesh->surfaces.size(), vcount);
|
mesh->surfaces.size(), vcount);
|
||||||
|
|
||||||
for (const auto &s: mesh->surfaces)
|
for (const auto &s: mesh->surfaces)
|
||||||
@@ -162,9 +196,11 @@ AccelStructureHandle RayTracingManager::getOrBuildBLAS(const std::shared_ptr<Mes
|
|||||||
ranges.push_back(r);
|
ranges.push_back(r);
|
||||||
}
|
}
|
||||||
|
|
||||||
// If no valid geometries, skip BLAS build
|
// If no valid geometries, record an empty sentinel to avoid re-queuing.
|
||||||
if (geoms.empty())
|
if (geoms.empty())
|
||||||
{
|
{
|
||||||
|
fmt::println("[RT] build_blas_for_mesh: mesh='{}' has no primitives; skipping BLAS", mesh->name);
|
||||||
|
_blasByMesh.emplace(mesh, AccelStructureHandle{});
|
||||||
return {};
|
return {};
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -231,10 +267,50 @@ AccelStructureHandle RayTracingManager::getOrBuildBLAS(const std::shared_ptr<Mes
|
|||||||
dai.accelerationStructure = blas.handle;
|
dai.accelerationStructure = blas.handle;
|
||||||
blas.deviceAddress = _vkGetAccelerationStructureDeviceAddressKHR(_device->device(), &dai);
|
blas.deviceAddress = _vkGetAccelerationStructureDeviceAddressKHR(_device->device(), &dai);
|
||||||
|
|
||||||
_blasByMesh.emplace(mesh.get(), blas);
|
_blasByMesh.emplace(mesh, blas);
|
||||||
return blas;
|
return blas;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
void RayTracingManager::pump_blas_builds(uint32_t max_builds_per_frame)
|
||||||
|
{
|
||||||
|
if (max_builds_per_frame == 0 || _blasBuildQueue.empty())
|
||||||
|
{
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
uint32_t built = 0;
|
||||||
|
|
||||||
|
while (built < max_builds_per_frame && !_blasBuildQueue.empty())
|
||||||
|
{
|
||||||
|
PendingBlasBuild job = _blasBuildQueue.front();
|
||||||
|
_blasBuildQueue.pop_front();
|
||||||
|
|
||||||
|
const MeshAsset* mesh = job.mesh;
|
||||||
|
if (mesh)
|
||||||
|
{
|
||||||
|
// Drop the pending flag for this mesh now; if the build ends up
|
||||||
|
// with an empty handle, getOrBuildBLAS will see the cache entry
|
||||||
|
// (including the empty sentinel) and avoid re-queuing.
|
||||||
|
_blasPendingMeshes.erase(mesh);
|
||||||
|
|
||||||
|
// Skip if a BLAS was already created meanwhile.
|
||||||
|
if (_blasByMesh.find(mesh) == _blasByMesh.end())
|
||||||
|
{
|
||||||
|
AccelStructureHandle blas = build_blas_for_mesh(mesh);
|
||||||
|
if (blas.handle)
|
||||||
|
{
|
||||||
|
++built;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
// Mesh pointer is null; just drop the pending flag.
|
||||||
|
_blasPendingMeshes.erase(mesh);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
void RayTracingManager::ensure_tlas_storage(VkDeviceSize requiredASSize, VkDeviceSize /*requiredScratch*/, DeletionQueue& dq)
|
void RayTracingManager::ensure_tlas_storage(VkDeviceSize requiredASSize, VkDeviceSize /*requiredScratch*/, DeletionQueue& dq)
|
||||||
{
|
{
|
||||||
// Recreate TLAS storage if size grows. Defer destruction to the frame DQ to
|
// Recreate TLAS storage if size grows. Defer destruction to the frame DQ to
|
||||||
@@ -296,7 +372,10 @@ VkAccelerationStructureKHR RayTracingManager::buildTLASFromDrawContext(const Dra
|
|||||||
}
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
// Try to build on the fly if the mesh is still alive (non-owning shared_ptr wrapper).
|
// Queue an async BLAS build if the mesh is still alive
|
||||||
|
// (non-owning shared_ptr wrapper). The BLAS will be built
|
||||||
|
// over subsequent frames by pump_blas_builds(); until then,
|
||||||
|
// this instance will be skipped.
|
||||||
std::shared_ptr<MeshAsset> nonOwning(const_cast<MeshAsset *>(r.sourceMesh), [](MeshAsset *) {});
|
std::shared_ptr<MeshAsset> nonOwning(const_cast<MeshAsset *>(r.sourceMesh), [](MeshAsset *) {});
|
||||||
blas = getOrBuildBLAS(nonOwning);
|
blas = getOrBuildBLAS(nonOwning);
|
||||||
}
|
}
|
||||||
@@ -423,6 +502,24 @@ void RayTracingManager::removeBLASForBuffer(VkBuffer vertexBuffer)
|
|||||||
{
|
{
|
||||||
if (!vertexBuffer) return;
|
if (!vertexBuffer) return;
|
||||||
|
|
||||||
|
// Drop any queued builds referencing this vertex buffer.
|
||||||
|
if (!_blasBuildQueue.empty())
|
||||||
|
{
|
||||||
|
for (auto itQ = _blasBuildQueue.begin(); itQ != _blasBuildQueue.end(); )
|
||||||
|
{
|
||||||
|
const MeshAsset* mesh = itQ->mesh;
|
||||||
|
if (mesh && mesh->meshBuffers.vertexBuffer.buffer == vertexBuffer)
|
||||||
|
{
|
||||||
|
_blasPendingMeshes.erase(mesh);
|
||||||
|
itQ = _blasBuildQueue.erase(itQ);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
++itQ;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// Find any mesh whose vertex buffer matches and evict its BLAS.
|
// Find any mesh whose vertex buffer matches and evict its BLAS.
|
||||||
for (auto it = _blasByMesh.begin(); it != _blasByMesh.end(); )
|
for (auto it = _blasByMesh.begin(); it != _blasByMesh.end(); )
|
||||||
{
|
{
|
||||||
@@ -443,6 +540,24 @@ void RayTracingManager::removeBLASForBuffer(VkBuffer vertexBuffer)
|
|||||||
void RayTracingManager::removeBLASForMesh(const MeshAsset *mesh)
|
void RayTracingManager::removeBLASForMesh(const MeshAsset *mesh)
|
||||||
{
|
{
|
||||||
if (!mesh) return;
|
if (!mesh) return;
|
||||||
|
|
||||||
|
// Drop any queued builds for this mesh.
|
||||||
|
if (!_blasBuildQueue.empty())
|
||||||
|
{
|
||||||
|
for (auto itQ = _blasBuildQueue.begin(); itQ != _blasBuildQueue.end(); )
|
||||||
|
{
|
||||||
|
if (itQ->mesh == mesh)
|
||||||
|
{
|
||||||
|
itQ = _blasBuildQueue.erase(itQ);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
++itQ;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
_blasPendingMeshes.erase(mesh);
|
||||||
|
|
||||||
auto it = _blasByMesh.find(mesh);
|
auto it = _blasByMesh.find(mesh);
|
||||||
if (it == _blasByMesh.end()) return;
|
if (it == _blasByMesh.end()) return;
|
||||||
|
|
||||||
|
|||||||
@@ -1,20 +1,22 @@
|
|||||||
#pragma once
|
#pragma once
|
||||||
#include <core/types.h>
|
#include <core/types.h>
|
||||||
#include <unordered_map>
|
#include <unordered_map>
|
||||||
#include <vector>
|
#include <unordered_set>
|
||||||
#include <memory>
|
#include <vector>
|
||||||
|
#include <deque>
|
||||||
class DeviceManager;
|
#include <memory>
|
||||||
class ResourceManager;
|
|
||||||
struct DrawContext;
|
class DeviceManager;
|
||||||
struct MeshAsset;
|
class ResourceManager;
|
||||||
|
struct DrawContext;
|
||||||
struct AccelStructureHandle {
|
struct MeshAsset;
|
||||||
VkAccelerationStructureKHR handle{VK_NULL_HANDLE};
|
|
||||||
AllocatedBuffer storage{}; // buffer that backs the AS
|
struct AccelStructureHandle {
|
||||||
VkDeviceAddress deviceAddress{0};
|
VkAccelerationStructureKHR handle{VK_NULL_HANDLE};
|
||||||
};
|
AllocatedBuffer storage{}; // buffer that backs the AS
|
||||||
|
VkDeviceAddress deviceAddress{0};
|
||||||
|
};
|
||||||
|
|
||||||
// Ray tracing helper that caches BLAS per mesh and rebuilds TLAS per frame
|
// Ray tracing helper that caches BLAS per mesh and rebuilds TLAS per frame
|
||||||
// for hybrid/full ray query shadows. See docs/RayTracing.md.
|
// for hybrid/full ray query shadows. See docs/RayTracing.md.
|
||||||
class RayTracingManager {
|
class RayTracingManager {
|
||||||
@@ -22,8 +24,16 @@ public:
|
|||||||
void init(DeviceManager* dev, ResourceManager* res);
|
void init(DeviceManager* dev, ResourceManager* res);
|
||||||
void cleanup();
|
void cleanup();
|
||||||
|
|
||||||
// Build (or get) BLAS for a mesh. Safe to call multiple times.
|
// Queue a BLAS build for a mesh (if not already built or queued) and
|
||||||
AccelStructureHandle getOrBuildBLAS(const std::shared_ptr<MeshAsset>& mesh);
|
// return the cached handle when available. Safe to call multiple times.
|
||||||
|
// When builds are pending, this may return an empty handle; callers
|
||||||
|
// should treat that as "BLAS not ready yet" and skip ray instances.
|
||||||
|
AccelStructureHandle getOrBuildBLAS(const std::shared_ptr<MeshAsset>& mesh);
|
||||||
|
|
||||||
|
// Progress asynchronous BLAS builds. Call once per frame after waiting
|
||||||
|
// for the previous frame's GPU fence. max_builds_per_frame controls how
|
||||||
|
// many BLAS are built in this pump to spread work over multiple frames.
|
||||||
|
void pump_blas_builds(uint32_t max_builds_per_frame = 1);
|
||||||
|
|
||||||
// Rebuild TLAS from current draw context; returns TLAS handle (or null if unavailable)
|
// Rebuild TLAS from current draw context; returns TLAS handle (or null if unavailable)
|
||||||
// Destruction of previous TLAS resources is deferred via the provided frame deletion queue
|
// Destruction of previous TLAS resources is deferred via the provided frame deletion queue
|
||||||
@@ -41,12 +51,12 @@ public:
|
|||||||
void removeBLASForMesh(const MeshAsset *mesh);
|
void removeBLASForMesh(const MeshAsset *mesh);
|
||||||
|
|
||||||
private:
|
private:
|
||||||
// function pointers (resolved on init)
|
// function pointers (resolved on init)
|
||||||
PFN_vkCreateAccelerationStructureKHR _vkCreateAccelerationStructureKHR{};
|
PFN_vkCreateAccelerationStructureKHR _vkCreateAccelerationStructureKHR{};
|
||||||
PFN_vkDestroyAccelerationStructureKHR _vkDestroyAccelerationStructureKHR{};
|
PFN_vkDestroyAccelerationStructureKHR _vkDestroyAccelerationStructureKHR{};
|
||||||
PFN_vkGetAccelerationStructureBuildSizesKHR _vkGetAccelerationStructureBuildSizesKHR{};
|
PFN_vkGetAccelerationStructureBuildSizesKHR _vkGetAccelerationStructureBuildSizesKHR{};
|
||||||
PFN_vkCmdBuildAccelerationStructuresKHR _vkCmdBuildAccelerationStructuresKHR{};
|
PFN_vkCmdBuildAccelerationStructuresKHR _vkCmdBuildAccelerationStructuresKHR{};
|
||||||
PFN_vkGetAccelerationStructureDeviceAddressKHR _vkGetAccelerationStructureDeviceAddressKHR{};
|
PFN_vkGetAccelerationStructureDeviceAddressKHR _vkGetAccelerationStructureDeviceAddressKHR{};
|
||||||
|
|
||||||
DeviceManager* _device{nullptr};
|
DeviceManager* _device{nullptr};
|
||||||
ResourceManager* _resources{nullptr};
|
ResourceManager* _resources{nullptr};
|
||||||
@@ -55,6 +65,16 @@ private:
|
|||||||
// when a mesh is destroyed or its GPU buffers are freed, the owning code
|
// when a mesh is destroyed or its GPU buffers are freed, the owning code
|
||||||
// must call removeBLASForMesh/removeBLASForBuffer to drop the cached BLAS.
|
// must call removeBLASForMesh/removeBLASForBuffer to drop the cached BLAS.
|
||||||
std::unordered_map<const MeshAsset*, AccelStructureHandle> _blasByMesh;
|
std::unordered_map<const MeshAsset*, AccelStructureHandle> _blasByMesh;
|
||||||
|
|
||||||
|
struct PendingBlasBuild
|
||||||
|
{
|
||||||
|
const MeshAsset* mesh{nullptr};
|
||||||
|
};
|
||||||
|
|
||||||
|
// Queue of BLAS builds to execute over multiple frames.
|
||||||
|
std::deque<PendingBlasBuild> _blasBuildQueue;
|
||||||
|
// Tracks meshes that have a queued or in-progress BLAS build.
|
||||||
|
std::unordered_set<const MeshAsset*> _blasPendingMeshes;
|
||||||
|
|
||||||
// TLAS + scratch / instance buffer (rebuilt per frame)
|
// TLAS + scratch / instance buffer (rebuilt per frame)
|
||||||
AccelStructureHandle _tlas{};
|
AccelStructureHandle _tlas{};
|
||||||
@@ -68,5 +88,6 @@ private:
|
|||||||
VkDeviceSize _minScratchAlignment{256};
|
VkDeviceSize _minScratchAlignment{256};
|
||||||
|
|
||||||
void ensure_tlas_storage(VkDeviceSize requiredASSize, VkDeviceSize requiredScratch, DeletionQueue& frameDQ);
|
void ensure_tlas_storage(VkDeviceSize requiredASSize, VkDeviceSize requiredScratch, DeletionQueue& frameDQ);
|
||||||
|
AccelStructureHandle build_blas_for_mesh(const MeshAsset* mesh);
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|||||||
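
The queue-and-pump pattern that `getOrBuildBLAS` and `pump_blas_builds` follow in this commit can be sketched in isolation. This is a simplified model, not the engine code: `Mesh`, `Handle`, and `BlasQueueSketch` are hypothetical stand-ins, and the "build" here just hands out an integer id instead of recording Vulkan commands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <unordered_set>

struct Mesh { int surfaces{0}; };  // hypothetical stand-in for MeshAsset
struct Handle {                    // hypothetical stand-in for AccelStructureHandle
    int id{0};
    explicit operator bool() const { return id != 0; }
};

class BlasQueueSketch {
public:
    // Return the cached handle if present; otherwise queue a build once and
    // report "not ready yet" with an empty handle.
    Handle get_or_queue(const Mesh* mesh)
    {
        if (!mesh) return {};
        if (auto it = _built.find(mesh); it != _built.end()) return it->second;
        if (_pending.insert(mesh).second) _queue.push_back(mesh);
        return {};
    }

    // Build at most max_builds non-empty structures per call.
    void pump(std::uint32_t max_builds)
    {
        std::uint32_t built = 0;
        while (built < max_builds && !_queue.empty()) {
            const Mesh* mesh = _queue.front();
            _queue.pop_front();
            _pending.erase(mesh);
            if (_built.count(mesh)) continue;  // already built meanwhile
            if (build(mesh)) ++built;          // empty results do not use up the budget
        }
    }

    std::size_t queued() const { return _queue.size(); }

private:
    Handle build(const Mesh* mesh)
    {
        assert(mesh != nullptr);
        Handle h{};
        if (mesh->surfaces > 0) h.id = ++_nextId;
        _built.emplace(mesh, h);  // an empty handle is cached as a sentinel
        return h;
    }

    std::unordered_map<const Mesh*, Handle> _built;
    std::deque<const Mesh*> _queue;
    std::unordered_set<const Mesh*> _pending;
    int _nextId{0};
};
```

The pending set keeps repeated per-frame calls from enqueuing the same mesh twice, and caching the empty sentinel keeps a mesh with no geometry from being re-queued every frame.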