EDIT: BLAS build is now per-frame async

This commit is contained in:
2025-12-08 15:43:06 +09:00
parent f95520dcb1
commit 33fc35ab6c
9 changed files with 416 additions and 68 deletions

View File

@@ -5,8 +5,9 @@ Overview
- Shaders share a common include, `shaders/ibl_common.glsl`, which defines the IBL bindings for descriptor set 3 and helper functions used by deferred, forward, and background passes. - Shaders share a common include, `shaders/ibl_common.glsl`, which defines the IBL bindings for descriptor set 3 and helper functions used by deferred, forward, and background passes.
- The engine currently supports: - The engine currently supports:
- Specular environment from an equirectangular 2D texture with prefiltered mips (`sampler2D iblSpec2D`). - Specular environment from an equirectangular 2D texture with prefiltered mips (`sampler2D iblSpec2D`).
- Diffuse irradiance from 2ndorder SH (9 coefficients baked on the CPU). - Diffuse irradiance from 2nd-order SH (9 coefficients baked on the CPU).
- A 2D BRDF integration LUT used for the splitsum approximation. - A 2D BRDF integration LUT used for the split-sum approximation.
- An optional separate background environment texture (`sampler2D iblBackground2D`); when not provided, the system falls back to using the specular environment for background rendering.
Data Flow Data Flow
- Init: - Init:
@@ -18,49 +19,115 @@ Data Flow
- Tries `ktxutil::load_ktx2_cubemap` first. If successful, uploads via `ResourceManager::create_image_compressed_layers` with `VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT`. - Tries `ktxutil::load_ktx2_cubemap` first. If successful, uploads via `ResourceManager::create_image_compressed_layers` with `VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT`.
- If cubemap loading fails, falls back to 2D `.ktx2` via `ktxutil::load_ktx2_2d` and `create_image_compressed`. The image is treated as equirectangular with prefiltered mips. - If cubemap loading fails, falls back to 2D `.ktx2` via `ktxutil::load_ktx2_2d` and `create_image_compressed`. The image is treated as equirectangular with prefiltered mips.
- When the specular `.ktx2` is HDR (`R16G16B16A16_SFLOAT` or `R32G32B32A32_SFLOAT`) and 2:1 aspect, `IBLManager` computes 9 SH coefficients on the CPU: - When the specular `.ktx2` is HDR (`R16G16B16A16_SFLOAT` or `R32G32B32A32_SFLOAT`) and 2:1 aspect, `IBLManager` computes 9 SH coefficients on the CPU:
- Integrates the environment over the sphere using real SH basis functions (L2) with solidangle weighting. - Integrates the environment over the sphere using real SH basis functions (L2) with solid-angle weighting.
- Applies Lambert band scaling (A0 = π, A1 = 2π/3, A2 = π/4). - Applies Lambert band scaling (A0 = pi, A1 = 2pi/3, A2 = pi/4).
- Uploads the result as `vec4 sh[9]` in a uniform buffer (`_shBuffer`). - Uploads the result as `vec4 sh[9]` in a uniform buffer (`_shBuffer`).
- Diffuse: - Diffuse:
- If `IBLPaths::diffuseCube` is provided and valid, loads it as a cubemap via `load_ktx2_cubemap` + `create_image_compressed_layers`. - If `IBLPaths::diffuseCube` is provided and valid, loads it as a cubemap via `load_ktx2_cubemap` + `create_image_compressed_layers`.
- Current shaders only use the SH buffer for diffuse; the diffuse cubemap is reserved for future variants. - Current shaders only use the SH buffer for diffuse; the diffuse cubemap is reserved for future variants.
- Background:
- If `IBLPaths::background2D` is provided and valid, loads it as a 2D equirectangular `.ktx2` via `load_ktx2_2d` + `create_image_compressed`.
- This allows using a separate, potentially higher-resolution or unfiltered environment for the sky background while using a prefiltered version for specular IBL.
- BRDF LUT: - BRDF LUT:
- Loaded as 2D `.ktx2` via `ktxutil::load_ktx2_2d` and uploaded with `create_image_compressed`. - Loaded as 2D `.ktx2` via `ktxutil::load_ktx2_2d` and uploaded with `create_image_compressed`.
- Fallbacks: - Fallbacks:
- If `diffuseCube` is missing but a specular env exists, `_diff` is aliased to `_spec`. - If `diffuseCube` is missing but a specular env exists, `_diff` is aliased to `_spec`.
- `IBLManager::load_async(const IBLPaths&)` + `IBLManager::pump_async()` (runtime path used by the engine): - If `background2D` is missing but a specular env exists, `_background` is aliased to `_spec`.
- `load_async` runs KTX2 file I/O and SH bake on a worker thread and stores a prepared CPU-side description (`PreparedIBLData`).
- `pump_async` is called on the main thread once per frame (after the previous frame is idle) to:
- Destroy old IBL images/SH via `destroy_images_and_sh()`.
- Create new GPU images with `create_image_compressed(_layers)` and upload the SH buffer.
- This avoids stalls in the main/game loop when switching IBL volumes or loading the default environment at startup.
- `IBLManager::unload()` releases GPU images, the SH buffer, and the descriptor set layout. - `IBLManager::unload()` releases GPU images, the SH buffer, and the descriptor set layout.
- Descriptor layout: - Descriptor layout:
- `IBLManager::ensureLayout()` builds a descriptor set layout (set=3) with: - `IBLManager::ensureLayout()` builds a descriptor set layout (set=3) with:
- binding 0: `COMBINED_IMAGE_SAMPLER` specular environment (2D equirect). - binding 0: `COMBINED_IMAGE_SAMPLER` - specular environment (2D equirect).
- binding 1: `COMBINED_IMAGE_SAMPLER` BRDF LUT 2D. - binding 1: `COMBINED_IMAGE_SAMPLER` - BRDF LUT 2D.
- binding 2: `UNIFORM_BUFFER` SH coefficients (`vec4 sh[9]`). - binding 2: `UNIFORM_BUFFER` - SH coefficients (`vec4 sh[9]`).
- binding 3: `COMBINED_IMAGE_SAMPLER` - background environment (2D equirect, optional).
- Passes request this layout from `EngineContext::ibl` and plug it into their pipeline set layouts: - Passes request this layout from `EngineContext::ibl` and plug it into their pipeline set layouts:
- Background: `vk_renderpass_background.cpp` (set 3 used for env background). - Background: `vk_renderpass_background.cpp` (set 3 used for env background).
- Lighting: `vk_renderpass_lighting.cpp` (deferred lighting pass, set 3). - Lighting: `vk_renderpass_lighting.cpp` (deferred lighting pass, set 3).
- Transparent: `vk_renderpass_transparent.cpp` (forward/transparent materials, set 3). - Transparent: `vk_renderpass_transparent.cpp` (forward/transparent materials, set 3).
Asynchronous Loading
- Overview:
- `IBLManager` provides an asynchronous loading path via `load_async()` + `pump_async()` to avoid blocking the main/game loop during IBL environment switches or initial loading.
- Heavy CPU work (KTX2 file I/O, decompression, SH coefficient baking) runs on a dedicated worker thread.
- GPU resource creation (image uploads, buffer allocation) is deferred to the main thread via `pump_async()`.
- API:
- `bool load_async(const IBLPaths &paths)`:
- Queues an asynchronous IBL load job.
- Returns `false` if the job could not be queued (e.g., context not initialized).
- If called while a previous job is still pending, the new request supersedes the old one (the old result is discarded when ready).
- `struct AsyncResult { bool completed; bool success; }`:
- `completed`: `true` when an async job finished since the last `pump_async()` call.
- `success`: `true` when the finished job successfully produced new GPU IBL resources.
- `AsyncResult pump_async()`:
- Must be called on the main thread, typically once per frame after the previous frame's GPU work is idle.
- If a completed async job is pending:
- Destroys old IBL images and SH buffer via `destroy_images_and_sh()`.
- Creates new GPU images with `create_image_compressed(_layers)` and uploads the SH buffer.
- Returns `AsyncResult` indicating whether a job completed and its success status.
- Internal Architecture:
- `IBLManager::init()` spawns a persistent worker thread that waits on a condition variable.
- When `load_async()` is called:
- The request paths and a unique job ID are stored in `AsyncStateData`.
- The worker thread is signaled via condition variable.
- Any previous pending result is invalidated (superseded by the new job ID).
- Worker thread execution:
- Calls `prepare_ibl_cpu()` to load KTX2 files and bake SH coefficients.
- Stores the prepared data (`PreparedIBLData`) in `AsyncStateData`.
- Marks the result as ready with the corresponding job ID.
- If the job ID no longer matches (superseded), the result is discarded.
- Main thread integration (`pump_async()`):
- Checks if a result is ready.
- If ready, calls `commit_prepared()` to create GPU resources from the prepared CPU data.
- Clears the ready flag and returns the result status.
- Thread Safety:
- All shared state in `AsyncStateData` is protected by a mutex.
- The worker thread only reads request data and writes result data.
- The main thread only reads result data and writes request data.
- GPU resource creation is strictly on the main thread.
- Usage Example:
```cpp
// Queue async IBL load (non-blocking)
iblManager->load_async(IBLPaths{
.specularCube = "assets/ibl/studio_spec.ktx2",
.brdfLut2D = "assets/ibl/brdf_lut.ktx2",
.background2D = "assets/ibl/studio_bg.ktx2"
});
// In main loop, after waiting for previous frame:
auto result = iblManager->pump_async();
if (result.completed) {
if (result.success) {
// New IBL environment is now active
} else {
// Loading failed, handle error (e.g., keep previous IBL)
}
}
```
- Benefits:
- No frame stalls when loading large HDR environment maps.
- Seamless IBL volume transitions (e.g., entering a building with different lighting).
- SH baking (CPU-intensive) happens off the main thread.
- Cleanup:
- `IBLManager::unload()` shuts down the async worker thread (joins) and releases all GPU resources.
- The destructor also calls `shutdown_async()` to ensure clean termination.
Shader Side (`shaders/ibl_common.glsl`) Shader Side (`shaders/ibl_common.glsl`)
- Bindings: - Bindings:
- `layout(set=3, binding=0) uniform sampler2D iblSpec2D;` - `layout(set=3, binding=0) uniform sampler2D iblSpec2D;`
- `layout(set=3, binding=1) uniform sampler2D iblBRDF;` - `layout(set=3, binding=1) uniform sampler2D iblBRDF;`
- `layout(std140, set=3, binding=2) uniform IBL_SH { vec4 sh[9]; } iblSH;` - `layout(std140, set=3, binding=2) uniform IBL_SH { vec4 sh[9]; } iblSH;`
- `layout(set=3, binding=3) uniform sampler2D iblBackground2D;`
- Helpers: - Helpers:
- `vec3 sh_eval_irradiance(vec3 n)`: - `vec3 sh_eval_irradiance(vec3 n)`:
- Evaluates the 9 SH basis functions (L2) at direction `n` using the same real SH basis as the CPU bake. - Evaluates the 9 SH basis functions (L2) at direction `n` using the same real SH basis as the CPU bake.
- Multiplies each basis value by the corresponding `iblSH.sh[i].rgb` coefficient and sums the result. - Multiplies each basis value by the corresponding `iblSH.sh[i].rgb` coefficient and sums the result.
- Coefficients are already convolved with the Lambert kernel on the CPU; the function returns diffuse irradiance directly. - Coefficients are already convolved with the Lambert kernel on the CPU; the function returns diffuse irradiance directly.
- `vec2 dir_to_equirect(vec3 d)`: - `vec2 dir_to_equirect(vec3 d)`:
- Normalizes `d`, computes `(phi, theta)` and returns equirectangular UV in `[0,1]²`. - Normalizes `d`, computes `(phi, theta)` and returns equirectangular UV in `[0,1]^2`.
- Used consistently by background, deferred, and forward pipelines. - Used consistently by background, deferred, and forward pipelines.
- `float ibl_lod_from_roughness(float roughness, float levels)`: - `float ibl_lod_from_roughness(float roughness, float levels)`:
- Computes the mip LOD for specular IBL using `roughness² * (levels - 1)`. - Computes the mip LOD for specular IBL using `roughness^2 * (levels - 1)`.
- This biases midroughness reflections towards blurrier mips and avoids overly sharp reflections. - This biases mid-roughness reflections towards blurrier mips and avoids overly sharp reflections.
Usage in Passes Usage in Passes
- Deferred lighting (`shaders/deferred_lighting.frag` and `shaders/deferred_lighting_nort.frag`): - Deferred lighting (`shaders/deferred_lighting.frag` and `shaders/deferred_lighting_nort.frag`):
@@ -84,24 +151,30 @@ Usage in Passes
- Same include and IBL logic as deferred, applied after direct lighting. - Same include and IBL logic as deferred, applied after direct lighting.
- Uses the same `ibl_lod_from_roughness` helper for LOD selection. - Uses the same `ibl_lod_from_roughness` helper for LOD selection.
- Background (`shaders/background_env.frag`): - Background (`shaders/background_env.frag`):
- Includes `ibl_common.glsl` and uses `dir_to_equirect(worldDir)` + `textureLod(iblSpec2D, uv, 0.0)` to render the environment at LOD 0. - Includes `ibl_common.glsl` and uses `dir_to_equirect(worldDir)` + `textureLod(iblBackground2D, uv, 0.0)` to render the environment at LOD 0.
- When a dedicated background texture is provided via `IBLPaths::background2D`, the background pass renders from `iblBackground2D` which may differ from `iblSpec2D`.
Authoring IBL Assets Authoring IBL Assets
- Specular environment: - Specular environment:
- Preferred: prefiltered HDR cubemap in `.ktx2` (BC6H or `R16G16B16A16_SFLOAT`) with multiple mips. - Preferred: prefiltered HDR cubemap in `.ktx2` (BC6H or `R16G16B16A16_SFLOAT`) with multiple mips.
- Alternative: prefiltered equirectangular 2D `.ktx2` with width = 2 × height and full mip chain. - Alternative: prefiltered equirectangular 2D `.ktx2` with width = 2 x height and full mip chain.
- Make sure the mip chain is generated with a GGX importance sampling tool so the BRDF LUT + mip chain match. - Make sure the mip chain is generated with a GGX importance sampling tool so the BRDF LUT + mip chain match.
- BRDF LUT: - BRDF LUT:
- A standard 2D preintegrated GGX LUT (RG), usually stored as `R8G8_UNORM` or BC5. - A standard 2D preintegrated GGX LUT (RG), usually stored as `R8G8_UNORM` or BC5.
- The LUT is sampled with `(NoV, roughness)` coordinates. - The LUT is sampled with `(NoV, roughness)` coordinates.
- Diffuse: - Diffuse:
- The engine currently uses SH coefficients baked from the specular equirectangular map. If you provide a separate diffuse cubemap, the CPU SH bake still uses the specular HDR; you can adjust this in `IBLManager` if you want SH to come from a different source. - The engine currently uses SH coefficients baked from the specular equirectangular map. If you provide a separate diffuse cubemap, the CPU SH bake still uses the specular HDR; you can adjust this in `IBLManager` if you want SH to come from a different source.
- Background:
- Optional: equirectangular 2D `.ktx2` used exclusively for the sky background pass.
- Useful when you want a sharper or unfiltered environment for the visible sky while using a prefiltered version for specular reflections.
- If not provided, the system falls back to using `specularCube` for background rendering.
Implementation Notes Implementation Notes
- CPU SH bake: - CPU SH bake:
- Implemented in `IBLManager::load` using libktx to access raw HDR pixel data from `.ktx2`. - Implemented in `IBLManager::load` using libktx to access raw HDR pixel data from `.ktx2`.
- Uses a simple nested loop over pixels with solidangle weighting and the same SH basis as `sh_eval_irradiance`. - Uses a simple nested loop over pixels with solid-angle weighting and the same SH basis as `sh_eval_irradiance`.
- Fallbacks: - Fallbacks:
- Lighting and transparent passes create small fallback textures so that the IBL descriptor set is always valid, even when no IBL assets are loaded. - Lighting and transparent passes create small fallback textures so that the IBL descriptor set is always valid, even when no IBL assets are loaded.
- Background pass builds a 1×1×6 black cube as a fallback env. - Background pass builds a 1x1x6 black cube as a fallback env.
- When `background2D` is not provided, `IBLManager::background()` returns the same image as `specular()`.

View File

@@ -12,22 +12,27 @@ Optional subsystem that enables hybrid or full ray traced shadows via Ray Query.
### BLAS Build & Cache ### BLAS Build & Cache
- `AccelStructureHandle getOrBuildBLAS(const std::shared_ptr<MeshAsset>& mesh)`: - `AccelStructureHandle getOrBuildBLAS(const std::shared_ptr<MeshAsset>& mesh)`:
- One GAS per `MeshAsset`, keyed by vertex buffer `VkBuffer`. - One GAS per `MeshAsset`, keyed by mesh pointer.
- Populated with one triangle geometry per `GeoSurface`. - Populated with one triangle geometry per `GeoSurface`.
- Built with `VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR` and device-local storage + scratch. - Built with `VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR` and device-local storage + scratch.
- Cached in `_blasByVB` for reuse across frames. - Cached in `_blasByMesh` for reuse across frames.
- Called from `AssetManager::createMesh(...)` and from GLTF loader after mesh upload. - When a BLAS does not exist yet, the mesh is queued for an asynchronous build and an empty handle is returned; callers must treat this as "BLAS not ready" and skip the instance for the current frame (see TLAS section below).
- `pump_blas_builds(max_builds_per_frame)` advances an internal BLAS build queue and is called once per frame from the engine main loop to spread work across multiple frames instead of doing all BLAS builds in a single spike.
### TLAS Rebuild Per Frame ### TLAS Rebuild Per Frame
- `VkAccelerationStructureKHR buildTLASFromDrawContext(const DrawContext& dc)`: - `VkAccelerationStructureKHR buildTLASFromDrawContext(const DrawContext& dc)`:
- Iterates `dc.OpaqueSurfaces` and creates one instance per render object. - Iterates `dc.OpaqueSurfaces` and creates one instance per render object.
- Looks up BLAS by `RenderObject::vertexBuffer`; if missing, instance is skipped. - Looks up BLAS by `RenderObject::sourceMesh` (the `MeshAsset*`); if a BLAS is not cached yet, it calls `getOrBuildBLAS` with a non-owning `shared_ptr` to queue a build and then skips the instance for this frame.
- Uploads instances to a CPU→GPU buffer with device address. - Uploads instances to a CPU→GPU buffer with device address.
- Builds TLAS with `immediate_submit` and stores device address for Ray Query. - Builds TLAS with `immediate_submit` and stores device address for Ray Query.
### Renderer Integration ### Renderer Integration
- In `VulkanEngine::draw()` before building passes: - In `VulkanEngine::draw()` before building passes:
- If RT mode is enabled (`shadowSettings.mode != 0`) and manager exists, TLAS is rebuilt from the latest draw context. - If RT mode is enabled (`shadowSettings.mode != 0`) or ray-traced SSR is enabled (`enableSSR && reflectionMode != 0`), and the manager exists, TLAS is rebuilt from the latest draw context.
- TLAS only references BLAS that are already built; instances whose meshes are still in the BLAS build queue are skipped until their BLAS completes.
- In `VulkanEngine::run()` at the start of each frame (after waiting for the previous frame fence):
- Calls `RayTracingManager::flushPendingDeletes()` to safely destroy any BLAS scheduled for deferred deletion.
- Calls `RayTracingManager::pump_blas_builds(1)` to build at most one queued BLAS per frame (tunable if you want more or fewer builds per frame).
- Lighting pass binds the TLAS at `set=0,binding=1` when available. - Lighting pass binds the TLAS at `set=0,binding=1` when available.
### Modes & UI ### Modes & UI
@@ -36,7 +41,8 @@ Optional subsystem that enables hybrid or full ray traced shadows via Ray Query.
- Mode 2: Ray Query only (no shadow maps). - Mode 2: Ray Query only (no shadow maps).
### Notes & Caveats ### Notes & Caveats
- BLAS cache key is the vertex buffer handle; if you rebuild meshes in-place, BLAS must be invalidated. - BLAS cache key is the `MeshAsset*`. If you destroy or rebuild meshes (or their GPU buffers) you must invalidate associated BLAS via `RayTracingManager::removeBLASForMesh(mesh)` or `removeBLASForBuffer(vertexBuffer)`.
- CPU→GPU memory is used for the TLAS instance buffer to simplify updates. On some platforms, you may prefer staging + device-local. - CPU→GPU memory is used for the TLAS instance buffer to simplify updates. On some platforms, you may prefer staging + device-local.
- Because BLAS builds are asynchronous and capped per frame, newly spawned meshes may take a few frames before they appear in the ray-traced path; this is a deliberate tradeoff to avoid large hitches when many meshes are introduced.
- The RT path requires Vulkan 1.2+ with Ray Query and Acceleration Structure features available. - The RT path requires Vulkan 1.2+ with Ray Query and Acceleration Structure features available.

View File

@@ -111,8 +111,8 @@ Buffer usage → stage/access examples:
### BuiltIn Pass Wiring (Current) ### BuiltIn Pass Wiring (Current)
- Resource uploads (if any) → Background (compute) → Geometry (GBuffer) → Lighting (deferred) → Transparent → CopyToSwapchain → ImGui → PreparePresent. - Resource uploads (if any) → Background (compute) → Geometry (GBuffer) → Lighting (deferred) → SSR → Tonemap+Bloom → FXAA → Transparent → CopyToSwapchain → ImGui → PreparePresent.
- See registrations: `src/core/engine.cpp:321``src/core/engine.cpp:352`. - See registrations in `src/core/engine.cpp`.
### Notes & Limits ### Notes & Limits

View File

@@ -68,10 +68,14 @@ addPass(std::move(myPass));
### Built-in Passes ### Built-in Passes
- Background (compute): Declares `ComputeWrite(drawImage)` and dispatches a selected effect instance. - Background (compute): Declares `ComputeWrite(drawImage)` and dispatches a selected effect instance.
- Geometry (G-Buffer): Declares 3 color attachments and `DepthAttachment`, plus buffer reads for shared index/vertex buffers. - Geometry (G-Buffer): Declares 4 color attachments (position, normal+roughness, albedo+metallic, AO+emissive) and `DepthAttachment`, plus buffer reads for shared index/vertex buffers.
- Lighting (deferred): Reads GBuffer as sampled images and writes to `drawImage`. - Lighting (deferred): Reads GBuffer as sampled images and writes to `drawImage`. Applies AO to indirect lighting and adds emissive contribution.
- Shadows: Cascaded shadow maps render to per-frame transient depth images (four cascades). If Ray Query is enabled, - Shadows: Cascaded shadow maps render to per-frame transient depth images (four cascades). If Ray Query is enabled,
the lighting pass additionally samples TLAS to evaluate shadow visibility according to the selected mode. the lighting pass additionally samples TLAS to evaluate shadow visibility according to the selected mode.
- SSR (Screen Space Reflections): Reads HDR lighting result + G-Buffer and outputs reflections blended with the scene.
Two variants: `ssr.nort` (screen-space only) and `ssr.rt` (SSR + RT fallback using TLAS ray queries).
- Tonemap + Bloom: Converts HDR to LDR with exposure control and optional bloom. Supports Reinhard and ACES tonemapping.
- FXAA: Post-process anti-aliasing on the LDR tonemapped image. Simple 5-tap edge-detection blur.
- Transparent (forward): Writes to `drawImage` with depth test against `depthImage` after lighting. - Transparent (forward): Writes to `drawImage` with depth test against `depthImage` after lighting.
- ImGui: Inserted just before present to draw on the swapchain image. - ImGui: Inserted just before present to draw on the swapchain image.
@@ -90,3 +94,83 @@ addPass(std::move(myPass));
See also: `docs/RenderGraph.md` for the builder API and synchronization details. See also: `docs/RenderGraph.md` for the builder API and synchronization details.
---
## Post-Processing Pipeline
After deferred lighting, the engine runs a post-processing chain: SSR → Tonemap (with Bloom) → FXAA → Present.
### SSR (Screen Space Reflections)
Located in `src/render/passes/ssr.cpp` and `shaders/ssr.frag` / `shaders/ssr_rt.frag`.
**Algorithm:**
- World-space ray marching along the reflection vector `R = reflect(-V, N)`.
- Depth comparison against G-Buffer position to find intersection.
- Fresnel (Schlick) and glossiness-based blending with the base HDR color.
**Parameters (shader constants):**
- `MAX_STEPS = 64` maximum ray march iterations (reduced for rough surfaces).
- `STEP_LENGTH = 0.5` world units per step.
- `MAX_DISTANCE = 50.0` maximum ray travel distance.
- `THICKNESS = 3.0` depth tolerance for hit detection.
**Variants:**
- `ssr.nort` Pure screen-space reflections.
- `ssr.rt` SSR + RT fallback using TLAS ray queries when SSR misses (requires `GL_EXT_ray_query`).
Reflection mode controlled via `sceneData.rtOptions.w`: 0 = SSR only, 1 = SSR + RT fallback, 2 = RT only.
**Inputs (set=1):**
- binding 0: `hdrColor` HDR lighting result.
- binding 1: `posTex` G-Buffer world position (RGBA32F).
- binding 2: `normalTex` G-Buffer normal + roughness.
- binding 3: `albedoTex` G-Buffer albedo + metallic.
---
### Tonemap + Bloom
Located in `src/render/passes/tonemap.cpp` and `shaders/tonemap.frag`.
**Tonemapping modes:**
- `mode = 0` Reinhard: `x / (1 + x)`.
- `mode = 1` ACES (Narkowicz approximation, default).
**Bloom:**
- Simple gather-based bloom computed in HDR space before tonemapping.
- 5×5 kernel (radius=2) samples neighbors; pixels exceeding `bloomThreshold` contribute weighted by their brightness.
- Accumulated bloom is multiplied by `bloomIntensity` and added to the HDR color.
**Runtime parameters:**
- `exposure` (default 1.0) exposure multiplier.
- `bloomEnabled` (default true) toggle bloom.
- `bloomThreshold` (default 1.0) brightness threshold for bloom contribution.
- `bloomIntensity` (default 0.7) bloom blend strength.
**Output:** LDR image (`VK_FORMAT_R8G8B8A8_UNORM`) with gamma correction (γ = 2.2).
---
### FXAA (Fast Approximate Anti-Aliasing)
Located in `src/render/passes/fxaa.cpp` and `shaders/fxaa.frag`.
**Algorithm:**
- Luma-based edge detection using a 5-tap cross pattern (N, S, E, W, center).
- If luma range exceeds threshold, apply a simple box blur; otherwise pass through.
**Runtime parameters:**
- `enabled` (default true) toggle FXAA.
- `edge_threshold` (default 0.125) relative contrast threshold.
- `edge_threshold_min` (default 0.0312) absolute minimum threshold.
**Push constants:**
```glsl
layout(push_constant) uniform Push {
float inverse_width;
float inverse_height;
float edge_threshold;
float edge_threshold_min;
} pc;
```

View File

@@ -23,9 +23,36 @@
- `vec4 metal_rough_factors; // x = metallic, y = roughness` - `vec4 metal_rough_factors; // x = metallic, y = roughness`
- `vec4 extra[14]; // extra[0].x = normalScale` - `vec4 extra[14]; // extra[0].x = normalScale`
- Material texture bindings (set=1): - Material texture bindings (set=1):
- binding=1 `colorTex`, binding=2 `metalRoughTex`, binding=3 `normalMap`. - binding=1 `colorTex`, binding=2 `metalRoughTex`, binding=3 `normalMap`, binding=4 `occlusionTex`, binding=5 `emissiveTex`.
- Adding a pipeline (graphics) - Adding a pipeline (graphics)
- Fill `GraphicsPipelineCreateInfo` with shader paths, descriptor set layouts, optional push constants, and a `configure(PipelineBuilder&)` callback to set topology, raster, depth/blend, and attachment formats. - Fill `GraphicsPipelineCreateInfo` with shader paths, descriptor set layouts, optional push constants, and a `configure(PipelineBuilder&)` callback to set topology, raster, depth/blend, and attachment formats.
- Register with `PipelineManager::createGraphicsPipeline(name, info)`. Retrieve via `getGraphics` or `getMaterialPipeline`. - Register with `PipelineManager::createGraphicsPipeline(name, info)`. Retrieve via `getGraphics` or `getMaterialPipeline`.
Shader File Reference
| File | Stage | Description |
|------|-------|-------------|
| `fullscreen.vert` | Vertex | Fullscreen triangle for post-process passes |
| `mesh.vert` | Vertex | Standard mesh vertex transform |
| `mesh.frag` | Fragment | Forward shading (deprecated, use gbuffer) |
| `gbuffer.frag` | Fragment | G-Buffer output (position, normal, albedo, AO+emissive) |
| `deferred_lighting.frag` | Fragment | Deferred lighting with RT shadows + IBL |
| `deferred_lighting_nort.frag` | Fragment | Deferred lighting without RT |
| `shadow.vert/.frag` | Vertex/Fragment | Shadow map generation |
| `ssr.frag` | Fragment | Screen-space reflections (ray march) |
| `ssr_rt.frag` | Fragment | SSR + RT fallback (ray query) |
| `tonemap.frag` | Fragment | HDR→LDR tonemapping + bloom |
| `fxaa.frag` | Fragment | Fast approximate anti-aliasing |
| `sky.comp` | Compute | Procedural sky background |
| `gradient_color.comp` | Compute | Gradient background |
| `background_env.frag` | Fragment | Environment map background |
GLSL Includes
| File | Purpose |
|------|---------|
| `input_structures.glsl` | SceneData UBO, material bindings, light structs |
| `lighting_common.glsl` | BRDF evaluation, point light helpers |
| `ibl_common.glsl` | IBL split-sum, SH irradiance |

View File

@@ -1,20 +1,27 @@
Materials and Textures Overview (PBR) Materials and Textures Overview (PBR)
Current state (as of Nov 1, 2025)
- PBR textures bound per material (set=1): - PBR textures bound per material (set=1):
- binding=0: GLTFMaterialData (UBO) - binding=0: GLTFMaterialData (UBO)
- binding=1: `colorTex` (albedo/base color) — sRGB - binding=1: `colorTex` (albedo/base color) — sRGB
- binding=2: `metalRoughTex` (G=roughness, B=metallic) — UNORM - binding=2: `metalRoughTex` (G=roughness, B=metallic) — UNORM
- binding=3: `normalMap` (tangent-space normal, UNORM) - binding=3: `normalMap` (tangent-space normal, UNORM)
- binding=4: `occlusionTex` (ambient occlusion, R channel) — UNORM
- binding=5: `emissiveTex` (emissive RGB) — sRGB
- GBuffer writes worldspace normals. Tangentspace normal maps are decoded with TBN using a signcorrect bitangent (B = sign * cross(N, T)). - GBuffer writes worldspace normals. Tangentspace normal maps are decoded with TBN using a signcorrect bitangent (B = sign * cross(N, T)).
- Numeric fallbacks via `MaterialConstants` (CPU) / `GLTFMaterialData` (GPU): - Numeric fallbacks via `MaterialConstants` (CPU) / `GLTFMaterialData` (GPU):
- `colorFactors` (RGBA). Defaults to 1 if zero. - `colorFactors` (RGBA). Defaults to 1 if zero.
- `metal_rough_factors` (X=metallic, Y=roughness). Roughness is clamped to ≥ 0.04 in shaders. - `metal_rough_factors` (X=metallic, Y=roughness). Roughness is clamped to ≥ 0.04 in shaders.
- `extra[0].x` = `normalScale` (scalar, default 1.0). Multiplies the XY of decoded normal. - `extra[0].x` = `normalScale` (scalar, default 1.0). Multiplies the XY of decoded normal.
- `extra[0].y` = `aoStrength` (scalar, 01). Controls AO influence.
- `extra[0].z` = `hasAO` (flag, 1 = use AO texture, 0 = skip).
- `extra[1].rgb` = `emissiveFactor` (vec3). Multiplied with emissive texture.
- `extra[2].x` = `alphaCutoff` (scalar). For MASK alpha mode.
- Defaults when a texture is missing: - Defaults when a texture is missing:
- Albedo → checkerboard error texture - Albedo → checkerboard error texture
- MR → white (no effect) - MR → white (no effect)
- Normal → 1×1 flat normal (0.5, 0.5, 1.0) - Normal → 1×1 flat normal (0.5, 0.5, 1.0)
- Occlusion → 1×1 white (AO = 1.0, no occlusion)
- Emissive → 1×1 black (no emission)
Implications for primitive meshes Implications for primitive meshes
- Primitives can use: - Primitives can use:
@@ -41,3 +48,14 @@ Usage Examples
- Adjust normal strength per material: set `material.constants.extra[0].x` (CPU) or `normalTexture.scale` in glTF. - Adjust normal strength per material: set `material.constants.extra[0].x` (CPU) or `normalTexture.scale` in glTF.
- Primitive with PBR textures: - Primitive with PBR textures:
- Set `MeshMaterialDesc::Kind::Textured` and fill `albedoPath`, `metalRoughPath`, and `normalPath`. - Set `MeshMaterialDesc::Kind::Textured` and fill `albedoPath`, `metalRoughPath`, and `normalPath`.
G-Buffer Outputs
- The geometry pass (`gbuffer.frag`) writes 4 render targets:
- `outPos` (location 0): World position (xyz) + valid flag (w=1).
- `outNorm` (location 1): World normal (xyz) + roughness (w).
- `outAlbedo` (location 2): Albedo (rgb) + metallic (a).
- `outExtra` (location 3): AO (x) + emissive (yzw).
- Deferred lighting reads these and computes:
```glsl
vec3 color = direct + indirect * ao + emissive;
```

View File

@@ -1185,6 +1185,10 @@ void VulkanEngine::run()
// Safe to destroy any BLAS queued for deletion now that the previous frame is idle. // Safe to destroy any BLAS queued for deletion now that the previous frame is idle.
if (_rayManager) { _rayManager->flushPendingDeletes(); } if (_rayManager) { _rayManager->flushPendingDeletes(); }
// Progress queued BLAS builds over multiple frames to avoid large
// stalls when many meshes require ray tracing structures at once.
if (_rayManager) { _rayManager->pump_blas_builds(1); }
// Commit any completed async IBL load now that the GPU is idle. // Commit any completed async IBL load now that the GPU is idle.
if (_iblManager && _pendingIBLRequest.active) if (_iblManager && _pendingIBLRequest.active)
{ {

View File

@@ -35,6 +35,8 @@ void RayTracingManager::cleanup()
VkDevice dv = _device->device(); VkDevice dv = _device->device();
// Destroy any deferred BLAS first // Destroy any deferred BLAS first
flushPendingDeletes(); flushPendingDeletes();
_blasBuildQueue.clear();
_blasPendingMeshes.clear();
if (_tlas.handle) if (_tlas.handle)
{ {
@@ -100,21 +102,53 @@ AccelStructureHandle RayTracingManager::getOrBuildBLAS(const std::shared_ptr<Mes
{ {
if (!mesh) return {}; if (!mesh) return {};
// If uploads are deferred, ensure any pending mesh buffer uploads are flushed const MeshAsset* key = mesh.get();
// before building a BLAS that reads from those GPU buffers.
if (_resources && _resources->deferred_uploads() && _resources->has_pending_uploads())
{
fmt::println("[RT] getOrBuildBLAS: flushing pending resource uploads before BLAS build");
_resources->process_queued_uploads_immediate();
}
if (auto it = _blasByMesh.find(mesh.get()); it != _blasByMesh.end()) // If a BLAS is already cached (even an empty sentinel), return it directly.
if (auto it = _blasByMesh.find(key); it != _blasByMesh.end())
{ {
fmt::println("[RT] getOrBuildBLAS reuse by mesh mesh='{}' handle={}", mesh->name, fmt::println("[RT] getOrBuildBLAS reuse by mesh mesh='{}' handle={}", mesh->name,
static_cast<const void *>(it->second.handle)); static_cast<const void *>(it->second.handle));
return it->second; return it->second;
} }
// If a build is already queued or in progress for this mesh, do not enqueue
// another job; simply report "not ready yet".
if (_blasPendingMeshes.find(key) != _blasPendingMeshes.end())
{
fmt::println("[RT] getOrBuildBLAS pending build mesh='{}'", mesh->name);
return {};
}
// If uploads are deferred, ensure any pending mesh buffer uploads are flushed
// before queuing a BLAS that will read from those GPU buffers.
if (_resources && _resources->deferred_uploads() && _resources->has_pending_uploads())
{
fmt::println("[RT] getOrBuildBLAS: flushing pending resource uploads before queuing BLAS build");
_resources->process_queued_uploads_immediate();
}
fmt::println("[RT] getOrBuildBLAS queue build mesh='{}'", mesh->name);
_blasPendingMeshes.insert(key);
_blasBuildQueue.push_back(PendingBlasBuild{key});
// BLAS will be built asynchronously by pump_blas_builds(); until then,
// callers should treat the empty handle as "not ready yet".
return {};
}
AccelStructureHandle RayTracingManager::build_blas_for_mesh(const MeshAsset *mesh)
{
if (!mesh || !_resources || !_device) return {};
// If uploads are deferred, ensure any pending mesh buffer uploads are flushed
// before building a BLAS that reads from those GPU buffers.
if (_resources->deferred_uploads() && _resources->has_pending_uploads())
{
fmt::println("[RT] build_blas_for_mesh: flushing pending resource uploads before BLAS build");
_resources->process_queued_uploads_immediate();
}
// Build BLAS with one geometry per surface (skip empty primitives) // Build BLAS with one geometry per surface (skip empty primitives)
std::vector<VkAccelerationStructureGeometryKHR> geoms; std::vector<VkAccelerationStructureGeometryKHR> geoms;
std::vector<VkAccelerationStructureBuildRangeInfoKHR> ranges; std::vector<VkAccelerationStructureBuildRangeInfoKHR> ranges;
@@ -126,7 +160,7 @@ AccelStructureHandle RayTracingManager::getOrBuildBLAS(const std::shared_ptr<Mes
const uint32_t vcount = mesh->meshBuffers.vertexCount; const uint32_t vcount = mesh->meshBuffers.vertexCount;
VkBuffer vb = mesh->meshBuffers.vertexBuffer.buffer; VkBuffer vb = mesh->meshBuffers.vertexBuffer.buffer;
fmt::println("[RT] getOrBuildBLAS build mesh='{}' surfaces={} vcount={}", mesh->name, fmt::println("[RT] build_blas_for_mesh mesh='{}' surfaces={} vcount={}", mesh->name,
mesh->surfaces.size(), vcount); mesh->surfaces.size(), vcount);
for (const auto &s: mesh->surfaces) for (const auto &s: mesh->surfaces)
@@ -162,9 +196,11 @@ AccelStructureHandle RayTracingManager::getOrBuildBLAS(const std::shared_ptr<Mes
ranges.push_back(r); ranges.push_back(r);
} }
// If no valid geometries, skip BLAS build // If no valid geometries, record an empty sentinel to avoid re-queuing.
if (geoms.empty()) if (geoms.empty())
{ {
fmt::println("[RT] build_blas_for_mesh: mesh='{}' has no primitives; skipping BLAS", mesh->name);
_blasByMesh.emplace(mesh, AccelStructureHandle{});
return {}; return {};
} }
@@ -231,10 +267,50 @@ AccelStructureHandle RayTracingManager::getOrBuildBLAS(const std::shared_ptr<Mes
dai.accelerationStructure = blas.handle; dai.accelerationStructure = blas.handle;
blas.deviceAddress = _vkGetAccelerationStructureDeviceAddressKHR(_device->device(), &dai); blas.deviceAddress = _vkGetAccelerationStructureDeviceAddressKHR(_device->device(), &dai);
_blasByMesh.emplace(mesh.get(), blas); _blasByMesh.emplace(mesh, blas);
return blas; return blas;
} }
void RayTracingManager::pump_blas_builds(uint32_t max_builds_per_frame)
{
if (max_builds_per_frame == 0 || _blasBuildQueue.empty())
{
return;
}
uint32_t built = 0;
while (built < max_builds_per_frame && !_blasBuildQueue.empty())
{
PendingBlasBuild job = _blasBuildQueue.front();
_blasBuildQueue.pop_front();
const MeshAsset* mesh = job.mesh;
if (mesh)
{
// Drop the pending flag for this mesh now; if the build ends up
// with an empty handle, getOrBuildBLAS will see the cache entry
// (including the empty sentinel) and avoid re-queuing.
_blasPendingMeshes.erase(mesh);
// Skip if a BLAS was already created meanwhile.
if (_blasByMesh.find(mesh) == _blasByMesh.end())
{
AccelStructureHandle blas = build_blas_for_mesh(mesh);
if (blas.handle)
{
++built;
}
}
}
else
{
// Mesh pointer is null; just drop the pending flag.
_blasPendingMeshes.erase(mesh);
}
}
}
void RayTracingManager::ensure_tlas_storage(VkDeviceSize requiredASSize, VkDeviceSize /*requiredScratch*/, DeletionQueue& dq) void RayTracingManager::ensure_tlas_storage(VkDeviceSize requiredASSize, VkDeviceSize /*requiredScratch*/, DeletionQueue& dq)
{ {
// Recreate TLAS storage if size grows. Defer destruction to the frame DQ to // Recreate TLAS storage if size grows. Defer destruction to the frame DQ to
@@ -296,7 +372,10 @@ VkAccelerationStructureKHR RayTracingManager::buildTLASFromDrawContext(const Dra
} }
else else
{ {
// Try to build on the fly if the mesh is still alive (non-owning shared_ptr wrapper). // Queue an async BLAS build if the mesh is still alive
// (non-owning shared_ptr wrapper). The BLAS will be built
// over subsequent frames by pump_blas_builds(); until then,
// this instance will be skipped.
std::shared_ptr<MeshAsset> nonOwning(const_cast<MeshAsset *>(r.sourceMesh), [](MeshAsset *) {}); std::shared_ptr<MeshAsset> nonOwning(const_cast<MeshAsset *>(r.sourceMesh), [](MeshAsset *) {});
blas = getOrBuildBLAS(nonOwning); blas = getOrBuildBLAS(nonOwning);
} }
@@ -423,6 +502,24 @@ void RayTracingManager::removeBLASForBuffer(VkBuffer vertexBuffer)
{ {
if (!vertexBuffer) return; if (!vertexBuffer) return;
// Drop any queued builds referencing this vertex buffer.
if (!_blasBuildQueue.empty())
{
for (auto itQ = _blasBuildQueue.begin(); itQ != _blasBuildQueue.end(); )
{
const MeshAsset* mesh = itQ->mesh;
if (mesh && mesh->meshBuffers.vertexBuffer.buffer == vertexBuffer)
{
_blasPendingMeshes.erase(mesh);
itQ = _blasBuildQueue.erase(itQ);
}
else
{
++itQ;
}
}
}
// Find any mesh whose vertex buffer matches and evict its BLAS. // Find any mesh whose vertex buffer matches and evict its BLAS.
for (auto it = _blasByMesh.begin(); it != _blasByMesh.end(); ) for (auto it = _blasByMesh.begin(); it != _blasByMesh.end(); )
{ {
@@ -443,6 +540,24 @@ void RayTracingManager::removeBLASForBuffer(VkBuffer vertexBuffer)
void RayTracingManager::removeBLASForMesh(const MeshAsset *mesh) void RayTracingManager::removeBLASForMesh(const MeshAsset *mesh)
{ {
if (!mesh) return; if (!mesh) return;
// Drop any queued builds for this mesh.
if (!_blasBuildQueue.empty())
{
for (auto itQ = _blasBuildQueue.begin(); itQ != _blasBuildQueue.end(); )
{
if (itQ->mesh == mesh)
{
itQ = _blasBuildQueue.erase(itQ);
}
else
{
++itQ;
}
}
}
_blasPendingMeshes.erase(mesh);
auto it = _blasByMesh.find(mesh); auto it = _blasByMesh.find(mesh);
if (it == _blasByMesh.end()) return; if (it == _blasByMesh.end()) return;

View File

@@ -1,20 +1,22 @@
#pragma once #pragma once
#include <core/types.h> #include <core/types.h>
#include <unordered_map> #include <unordered_map>
#include <vector> #include <unordered_set>
#include <memory> #include <vector>
#include <deque>
class DeviceManager; #include <memory>
class ResourceManager;
struct DrawContext; class DeviceManager;
struct MeshAsset; class ResourceManager;
struct DrawContext;
struct AccelStructureHandle { struct MeshAsset;
VkAccelerationStructureKHR handle{VK_NULL_HANDLE};
AllocatedBuffer storage{}; // buffer that backs the AS struct AccelStructureHandle {
VkDeviceAddress deviceAddress{0}; VkAccelerationStructureKHR handle{VK_NULL_HANDLE};
}; AllocatedBuffer storage{}; // buffer that backs the AS
VkDeviceAddress deviceAddress{0};
};
// Ray tracing helper that caches BLAS per mesh and rebuilds TLAS per frame // Ray tracing helper that caches BLAS per mesh and rebuilds TLAS per frame
// for hybrid/full ray query shadows. See docs/RayTracing.md. // for hybrid/full ray query shadows. See docs/RayTracing.md.
class RayTracingManager { class RayTracingManager {
@@ -22,8 +24,16 @@ public:
void init(DeviceManager* dev, ResourceManager* res); void init(DeviceManager* dev, ResourceManager* res);
void cleanup(); void cleanup();
// Build (or get) BLAS for a mesh. Safe to call multiple times. // Queue a BLAS build for a mesh (if not already built or queued) and
AccelStructureHandle getOrBuildBLAS(const std::shared_ptr<MeshAsset>& mesh); // return the cached handle when available. Safe to call multiple times.
// When builds are pending, this may return an empty handle; callers
// should treat that as "BLAS not ready yet" and skip ray instances.
AccelStructureHandle getOrBuildBLAS(const std::shared_ptr<MeshAsset>& mesh);
// Progress asynchronous BLAS builds. Call once per frame after waiting
// for the previous frame's GPU fence. max_builds_per_frame controls how
// many BLAS are built in this pump to spread work over multiple frames.
void pump_blas_builds(uint32_t max_builds_per_frame = 1);
// Rebuild TLAS from current draw context; returns TLAS handle (or null if unavailable) // Rebuild TLAS from current draw context; returns TLAS handle (or null if unavailable)
// Destruction of previous TLAS resources is deferred via the provided frame deletion queue // Destruction of previous TLAS resources is deferred via the provided frame deletion queue
@@ -41,12 +51,12 @@ public:
void removeBLASForMesh(const MeshAsset *mesh); void removeBLASForMesh(const MeshAsset *mesh);
private: private:
// function pointers (resolved on init) // function pointers (resolved on init)
PFN_vkCreateAccelerationStructureKHR _vkCreateAccelerationStructureKHR{}; PFN_vkCreateAccelerationStructureKHR _vkCreateAccelerationStructureKHR{};
PFN_vkDestroyAccelerationStructureKHR _vkDestroyAccelerationStructureKHR{}; PFN_vkDestroyAccelerationStructureKHR _vkDestroyAccelerationStructureKHR{};
PFN_vkGetAccelerationStructureBuildSizesKHR _vkGetAccelerationStructureBuildSizesKHR{}; PFN_vkGetAccelerationStructureBuildSizesKHR _vkGetAccelerationStructureBuildSizesKHR{};
PFN_vkCmdBuildAccelerationStructuresKHR _vkCmdBuildAccelerationStructuresKHR{}; PFN_vkCmdBuildAccelerationStructuresKHR _vkCmdBuildAccelerationStructuresKHR{};
PFN_vkGetAccelerationStructureDeviceAddressKHR _vkGetAccelerationStructureDeviceAddressKHR{}; PFN_vkGetAccelerationStructureDeviceAddressKHR _vkGetAccelerationStructureDeviceAddressKHR{};
DeviceManager* _device{nullptr}; DeviceManager* _device{nullptr};
ResourceManager* _resources{nullptr}; ResourceManager* _resources{nullptr};
@@ -55,6 +65,16 @@ private:
// when a mesh is destroyed or its GPU buffers are freed, the owning code // when a mesh is destroyed or its GPU buffers are freed, the owning code
// must call removeBLASForMesh/removeBLASForBuffer to drop the cached BLAS. // must call removeBLASForMesh/removeBLASForBuffer to drop the cached BLAS.
std::unordered_map<const MeshAsset*, AccelStructureHandle> _blasByMesh; std::unordered_map<const MeshAsset*, AccelStructureHandle> _blasByMesh;
struct PendingBlasBuild
{
const MeshAsset* mesh{nullptr};
};
// Queue of BLAS builds to execute over multiple frames.
std::deque<PendingBlasBuild> _blasBuildQueue;
// Tracks meshes that have a queued or in-progress BLAS build.
std::unordered_set<const MeshAsset*> _blasPendingMeshes;
// TLAS + scratch / instance buffer (rebuilt per frame) // TLAS + scratch / instance buffer (rebuilt per frame)
AccelStructureHandle _tlas{}; AccelStructureHandle _tlas{};
@@ -68,5 +88,6 @@ private:
VkDeviceSize _minScratchAlignment{256}; VkDeviceSize _minScratchAlignment{256};
void ensure_tlas_storage(VkDeviceSize requiredASSize, VkDeviceSize requiredScratch, DeletionQueue& frameDQ); void ensure_tlas_storage(VkDeviceSize requiredASSize, VkDeviceSize requiredScratch, DeletionQueue& frameDQ);
AccelStructureHandle build_blas_for_mesh(const MeshAsset* mesh);
}; };