Files
QuaternionEngine/docs/TextureLoading.md
2025-12-07 00:26:55 +09:00

120 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
Texture Loading & Streaming
Overview
- Streaming cache: `src/core/assets/texture_cache.{h,cpp}` asynchronously decodes images (stb_image) on a small worker pool (14 threads, clamped by hardware concurrency) and uploads them via `ResourceManager` with optional mipmaps. For FilePath keys, a sibling `<stem>.ktx2` (or direct `.ktx2`) is preferred over PNG/JPEG. Descriptors registered upfront are patched inplace once the texture becomes resident. Large decodes can be downscaled on workers before upload to cap peak memory.
- Uploads: `src/core/frame/resource.{h,cpp}` stages pixel data and either submits immediately or registers a Render Graph transfer pass. Mipmaps use `vkutil::generate_mipmaps(...)` and finish in `SHADER_READ_ONLY_OPTIMAL`.
- Integration points:
- Materials: layouts use `UPDATE_AFTER_BIND`; descriptors can be rewritten after bind.
- glTF loader: `src/scene/vk_loader.cpp` builds keys, requests handles, and registers descriptor patches with the cache.
- Primitives/adhoc: `src/core/assets/manager.cpp` builds materials and registers texture watches.
- Visibility: `src/render/passes/geometry.cpp` and `src/render/passes/transparent.cpp` call `TextureCache::markSetUsed(...)` for sets that are actually drawn.
- IBL: highdynamicrange environment textures are typically loaded directly as `.ktx2` via `IBLManager` instead of the generic streaming cache. See “ImageBased Lighting (IBL)” below.
Data Flow
- Request
- Build a `TextureCache::TextureKey` (FilePath or Bytes), set `srgb` and `mipmapped`.
- Call `request(key, sampler)` → returns a stable `TextureHandle`, deduplicated by a 64bit FNV1a hash. For FilePath keys the path plus the sRGB bit are hashed; for Bytes keys the payload hash is XORd with a constant when `srgb=true`.
- Register target descriptors via `watchBinding(handle, set, binding, sampler, fallbackView)`.
- Visibilitygated scheduling
- `pumpLoads(...)` looks for entries in `Unloaded` or `Evicted` state that were seen recently (`now == 0` or `now - lastUsed <= 1`) and starts at most `max_loads_per_pump` decodes per call, while enforcing a byte budget for uploads per frame.
- Render passes mark used sets each frame with `markSetUsed(...)` (or specific handles via `markUsed(...)`).
- Decode
- FilePath: if the path ends with `.ktx2` or a sibling exists, we load via libktx (and transcode to BCn if needed). Otherwise, decode to RGBA8 via stb_image.
- Bytes: always decode via stb_image (no sibling discovery possible).
- Admission & Upload
- Before upload, an expected resident size is computed (exact for KTX2 by summing level byte lengths; estimated for raster by format×area×mipfactor). A perframe byte budget (`max_bytes_per_pump`) throttles uploads.
- If a GPU texture budget is set, the cache evicts leastrecentlyused textures not used this frame. If it still cannot fit, the decode is deferred or dropped with backoff.
- Raster: `ResourceManager::create_image(...)` stages a single region, then optionally generates mips on GPU.
- KTX2: `ResourceManager::create_image_compressed(...)` allocates an image with the files `VkFormat` (from libktx) and records one `VkBufferImageCopy` per mip level (no GPU mip gen). Immediate path transitions to `SHADER_READ_ONLY_OPTIMAL`; the RenderGraph path transitions after copy when no mip gen.
- If the device cannot sample the KTX2 format, the cache falls back to raster decode.
- After upload: state → `Resident`, descriptors recorded via `watchBinding` are rewritten to the new image view with the chosen sampler and `SHADER_READ_ONLY_OPTIMAL` layout. For Bytesbacked keys, compressed source bytes are dropped unless `keep_source_bytes` is enabled.
- Eviction & Reload
- `evictToBudget(bytes)` rewrites watchers to fallbacks, destroys images, and marks entries `Evicted`. Evicted entries can reload automatically when seen again and a short cooldown has passed (default ~2 frames), avoiding immediate thrash.
Runtime UI
- ImGui → Debug → Textures (see `src/core/engine.cpp`)
- Shows: devicelocal budget/usage (from VMA), texture streaming budget (~35% of devicelocal by default), resident MiB, CPU source MiB, counts per state, and a TopN table of consumers.
- Controls: `Loads/Frame`, `Upload Budget (MiB)` (bytebased throttle), `Keep Source Bytes`, `CPU Source Budget (MiB)`, `Max Upload Dimension` (progressive downscale cap), and `Trim To Budget Now`.
Key APIs (src/core/assets/texture_cache.h)
- `TextureHandle request(const TextureKey&, VkSampler)`
- `void watchBinding(TextureHandle, VkDescriptorSet, uint32_t binding, VkSampler, VkImageView fallback)`
- `void unwatchSet(VkDescriptorSet)` — call before destroying descriptor pools/sets
- `void markSetUsed(VkDescriptorSet, uint32_t frameIndex)` and `void markUsed(TextureHandle, uint32_t frameIndex)`
- `void pumpLoads(ResourceManager&, FrameResources&)`
- `void evictToBudget(size_t bytes)`
- Controls: `set_max_loads_per_pump`, `set_keep_source_bytes`, `set_cpu_source_budget`, `set_gpu_budget_bytes`
Defaults & Budgets
- Worker threads: 14 decode threads depending on hardware.
- Loads per pump: default 4.
- Upload byte budget: default 128 MiB per frame.
- GPU budget: unlimited until the engine sets one each frame. The engine queries ~35% of devicelocal memory (via VMA) and calls `set_gpu_budget_bytes(...)`, then runs `evictToBudget(...)` and `pumpLoads(...)` during the frame loop (`src/core/engine.cpp`).
- CPU source bytes: default budget 64 MiB; `keep_source_bytes` defaults to false. Retention only applies to entries created from Bytes keys.
Examples
- Asset materials (`src/core/assets/manager.cpp`)
- Create materials with visible fallbacks (checkerboard/white/flatnormal), then:
- Build a key from an asset path, `request(key, sampler)`, and `watchBinding(handle, materialSet, binding, sampler, fallbackView)` for albedo (1), metalrough (2), normal (3).
- glTF loader (`src/scene/vk_loader.cpp`)
- Builds keys from URI/Vector/BufferView sources, requests handles, and registers watches for material textures. On teardown, calls `unwatchSet(materialSet)` before resetting descriptor pools to avoid patching dead sets. The geometry/transparent passes mark used sets each frame.
Implementation Notes
- Uploads and layouts
- Deferred uploads: the RG transfer pass transitions `UNDEFINED → TRANSFER_DST_OPTIMAL`, copies, and either generates mipmaps (finishing in `SHADER_READ_ONLY_OPTIMAL`) or transitions directly there. No extra transition is needed after mip gen.
- Descriptor rewrites
- Material descriptor sets and pools are created with `UPDATE_AFTER_BIND` flags; patches are applied safely across frames using a `DescriptorWriter`.
- Key hashing
- 64bit FNV1a for dedup. FilePath keys hash `PATH:<path>#(sRGB|UNORM)`. Bytes keys hash the payload and XOR an sRGB tag when requested.
- Format selection and channel packing
- `TextureKey::channels` can be `Auto` (default), `R`, `RG`, or `RGBA`. The cache chooses `VK_FORMAT_R8/R8G8/RGBA8` (sRGB variants when requested) and packs channels on CPU for `R`/`RG` to reduce staging + VRAM.
- Progressive downscale
- The decode thread downsizes large images by powers of 2 until within `Max Upload Dimension`, reducing both staging and VRAM. You can increase the cap or disable it (set to 0) from the UI.
KTX2 specifics
- Supported: 2D, singleface, singlelayer KTX2. If BasisLZ/UASTC, libktx transcodes to BCn. sRGB/UNORM is honored from the files DFD and can be nudged by request (albedo sRGB, MR/normal UNORM).
- Not supported: Cube/array/multilayer KTX2 in the generic cache path (it assumes singlelayer, 2D). Cubemap KTX2 for IBL is loaded via `IBLManager` (see below).
Limitations / Future Work
- Linearblit capability check
- `vkutil::generate_mipmaps` / `generate_mipmaps_levels` always use `VK_FILTER_LINEAR` for blits without checking `VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT`. Add a performat capability check and a fallback path (nearest or compute downsample) for formats that do not support linear filtering (especially some compressed formats).
- Texture formats
- Raster path: limited to 8bit R/RG/RGBA via `stbi_load`. KTX2 path in `TextureCache::worker_loop` currently accepts only BCn/BC6H formats and rejects other VkFormats returned by libktx (e.g., uncompressed `R16G16B16A16_SFLOAT`). Future work: ASTC/ETC2, specialized R8/RG8 parsing, and float HDR support (`stbi_loadf``R16G16B16A16_SFLOAT`) so HDR albedo/lighting data can stream through the generic cache (today HDR IBL uses the separate `IBLManager` path).
- Normalmap mip quality
- Normal maps share the same linear blit pipeline as color textures; no renormalization pass runs after mip generation. Consider a compute or fragment pass to renormalize normal map mips (or a dedicated normalaware downsample) to improve shading at grazing angles and distant LODs.
- Samplers
- Anisotropy is currently disabled in `SamplerManager` (`anisotropyEnable = VK_FALSE`). Enable it when the feature is present, expose a knob in the Debug UI, and consider permaterial/pertexture anisotropy settings.
- Minor robustness
- `enqueue_decode()` computes the handle from the entry pointer (`&e - _entries.data()`) and passes it to worker threads. This is safe as long as `_entries` is not resized during enqueue, but storing the index explicitly when the entry is created (in `request()`) would make the relationship clearer and robust against future refactors.
Operational Tips
- Keep deferred uploads enabled (`ResourceManager::set_deferred_uploads(true)`) to coalesce copies per frame (engine does this during init).
- To debug VMA allocations and name images, set `VE_VMA_DEBUG=1`.
ImageBased Lighting (IBL) Textures
- Manager: `src/core/assets/ibl_manager.{h,cpp}` owns IBL GPU resources and the shared descriptor set layout for set=3.
- Inputs (`IBLPaths`):
- `specularCube`: preferred is a GPUready `.ktx2` (BC6H or `R16G16B16A16_SFLOAT`) containing either a cubemap or an equirectangular 2D env with prefiltered mips.
- `diffuseCube`: optional `.ktx2` cubemap for diffuse irradiance. If missing, diffuse IBL falls back to SH only.
- `brdfLut2D`: `.ktx2` 2D RG LUT (e.g., `VK_FORMAT_R8G8_UNORM` or BC5).
- Loading:
- Specular:
- If `specularCube` is a cubemap `.ktx2`, `IBLManager` uses `ktxutil::load_ktx2_cubemap` and uploads via `ResourceManager::create_image_compressed_layers`, preserving the files format and mip chain.
- If cubemap load fails, it falls back to 2D `.ktx2` via `ktxutil::load_ktx2_2d` + `ResourceManager::create_image_compressed`. The image is treated as equirectangular with prefiltered mips and sampled with explicit LOD in shaders.
- If the format is float HDR (`R16G16B16A16_SFLOAT` or `R32G32B32A32_SFLOAT`) and the aspect ratio is 2:1, `IBLManager` additionally computes 2ndorder SH coefficients (9×`vec3`) on a worker thread and uploads them to a UBO (`_shBuffer`) when `pump_async()` is called on the main thread.
- Diffuse (optional):
- If `diffuseCube` is provided and valid, it is uploaded as a cubemap using `create_image_compressed_layers`. Current shaders use the SH buffer for diffuse; this cubemap can be wired into a future path if you want to sample it directly.
- BRDF LUT:
- `brdfLut2D` is loaded as 2D `.ktx2` via `ktxutil::load_ktx2_2d` and uploaded with `create_image_compressed`.
- Fallbacks:
- `LightingPass` and `TransparentPass` create tiny 1×1 UNORM textures (grey 2D for env, RG for BRDF LUT) so shaders can safely sample IBL bindings even when IBL assets are not loaded.
- Descriptor layout & bindings:
- `IBLManager::ensureLayout()` creates a descriptor set layout for set=3 with:
- binding 0: `COMBINED_IMAGE_SAMPLER` — specular env (2D equirect with mips or cubemap sampled via 2D path).
- binding 1: `COMBINED_IMAGE_SAMPLER` — BRDF LUT 2D.
- binding 2: `UNIFORM_BUFFER` — SH coefficients (`vec4 sh[9]`, RGB in `.xyz`).
- Render passes that use IBL fetch this layout from `EngineContext::ibl` and allocate perframe sets:
- `passes/lighting.cpp`: deferred lighting (set=3).
- `passes/transparent.cpp`: forward/transparent PBR materials (set=3).
- `passes/background.cpp`: environment background (set=3; only binding 0 is used in the shader).