Files
QuaternionEngine/docs/TextureLoading.md

92 lines
8.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
Texture Loading & Streaming
Overview
- Streaming cache: `src/core/texture_cache.{h,cpp}` asynchronously decodes images (stb_image) on a small worker pool (14 threads, clamped by hardware concurrency) and uploads them via `ResourceManager` with optional mipmaps. For FilePath keys, a sibling `<stem>.ktx2` (or direct `.ktx2`) is preferred over PNG/JPEG. Descriptors registered upfront are patched inplace once the texture becomes resident. Large decodes can be downscaled on workers before upload to cap peak memory.
- Uploads: `src/core/vk_resource.{h,cpp}` stages pixel data and either submits immediately or registers a Render Graph transfer pass. Mipmaps use `vkutil::generate_mipmaps(...)` and finish in `SHADER_READ_ONLY_OPTIMAL`.
- Integration points:
- Materials: layouts use `UPDATE_AFTER_BIND`; descriptors can be rewritten after bind.
- glTF loader: `src/scene/vk_loader.cpp` builds keys, requests handles, and registers descriptor patches with the cache.
- Primitives/adhoc: `src/core/asset_manager.cpp` builds materials and registers texture watches.
- Visibility: `src/render/vk_renderpass_geometry.cpp` and `src/render/vk_renderpass_transparent.cpp` call `TextureCache::markSetUsed(...)` for sets that are actually drawn.
Data Flow
- Request
- Build a `TextureCache::TextureKey` (FilePath or Bytes), set `srgb` and `mipmapped`.
- Call `request(key, sampler)` → returns a stable `TextureHandle`, deduplicated by a 64bit FNV1a hash. For FilePath keys the path plus the sRGB bit are hashed; for Bytes keys the payload hash is XORd with a constant when `srgb=true`.
- Register target descriptors via `watchBinding(handle, set, binding, sampler, fallbackView)`.
- Visibilitygated scheduling
- `pumpLoads(...)` looks for entries in `Unloaded` or `Evicted` state that were seen recently (`now == 0` or `now - lastUsed <= 1`) and starts at most `max_loads_per_pump` decodes per call, while enforcing a byte budget for uploads per frame.
- Render passes mark used sets each frame with `markSetUsed(...)` (or specific handles via `markUsed(...)`).
- Decode
- FilePath: if the path ends with `.ktx2` or a sibling exists, we load via libktx (and transcode to BCn if needed). Otherwise, decode to RGBA8 via stb_image.
- Bytes: always decode via stb_image (no sibling discovery possible).
- Admission & Upload
- Before upload, an expected resident size is computed (exact for KTX2 by summing level byte lengths; estimated for raster by format×area×mipfactor). A perframe byte budget (`max_bytes_per_pump`) throttles uploads.
- If a GPU texture budget is set, the cache evicts leastrecentlyused textures not used this frame. If it still cannot fit, the decode is deferred or dropped with backoff.
- Raster: `ResourceManager::create_image(...)` stages a single region, then optionally generates mips on GPU.
- KTX2: `ResourceManager::create_image_compressed(...)` allocates an image with the files `VkFormat` (from libktx) and records one `VkBufferImageCopy` per mip level (no GPU mip gen). Immediate path transitions to `SHADER_READ_ONLY_OPTIMAL`; the RenderGraph path transitions after copy when no mip gen.
- If the device cannot sample the KTX2 format, the cache falls back to raster decode.
- After upload: state → `Resident`, descriptors recorded via `watchBinding` are rewritten to the new image view with the chosen sampler and `SHADER_READ_ONLY_OPTIMAL` layout. For Bytesbacked keys, compressed source bytes are dropped unless `keep_source_bytes` is enabled.
- Eviction & Reload
- `evictToBudget(bytes)` rewrites watchers to fallbacks, destroys images, and marks entries `Evicted`. Evicted entries can reload automatically when seen again and a short cooldown has passed (default ~2 frames), avoiding immediate thrash.
Runtime UI
- ImGui → Debug → Textures (see `src/core/vk_engine.cpp`)
- Shows: devicelocal budget/usage (from VMA), texture streaming budget (~35% of devicelocal by default), resident MiB, CPU source MiB, counts per state, and a TopN table of consumers.
- Controls: `Loads/Frame`, `Upload Budget (MiB)` (bytebased throttle), `Keep Source Bytes`, `CPU Source Budget (MiB)`, `Max Upload Dimension` (progressive downscale cap), and `Trim To Budget Now`.
Key APIs (src/core/texture_cache.h)
- `TextureHandle request(const TextureKey&, VkSampler)`
- `void watchBinding(TextureHandle, VkDescriptorSet, uint32_t binding, VkSampler, VkImageView fallback)`
- `void unwatchSet(VkDescriptorSet)` — call before destroying descriptor pools/sets
- `void markSetUsed(VkDescriptorSet, uint32_t frameIndex)` and `void markUsed(TextureHandle, uint32_t frameIndex)`
- `void pumpLoads(ResourceManager&, FrameResources&)`
- `void evictToBudget(size_t bytes)`
- Controls: `set_max_loads_per_pump`, `set_keep_source_bytes`, `set_cpu_source_budget`, `set_gpu_budget_bytes`
Defaults & Budgets
- Worker threads: 14 decode threads depending on hardware.
- Loads per pump: default 4.
- Upload byte budget: default 128 MiB per frame.
- GPU budget: unlimited until the engine sets one each frame. The engine queries ~35% of devicelocal memory (via VMA) and calls `set_gpu_budget_bytes(...)`, then runs `evictToBudget(...)` and `pumpLoads(...)` during the frame loop (`src/core/vk_engine.cpp`).
- CPU source bytes: default budget 64 MiB; `keep_source_bytes` defaults to false. Retention only applies to entries created from Bytes keys.
Examples
- Asset materials (`src/core/asset_manager.cpp`)
- Create materials with visible fallbacks (checkerboard/white/flatnormal), then:
- Build a key from an asset path, `request(key, sampler)`, and `watchBinding(handle, materialSet, binding, sampler, fallbackView)` for albedo (1), metalrough (2), normal (3).
- glTF loader (`src/scene/vk_loader.cpp`)
- Builds keys from URI/Vector/BufferView sources, requests handles, and registers watches for material textures. On teardown, calls `unwatchSet(materialSet)` before resetting descriptor pools to avoid patching dead sets. The geometry/transparent passes mark used sets each frame.
Implementation Notes
- Uploads and layouts
- Deferred uploads: the RG transfer pass transitions `UNDEFINED → TRANSFER_DST_OPTIMAL`, copies, and either generates mipmaps (finishing in `SHADER_READ_ONLY_OPTIMAL`) or transitions directly there. No extra transition is needed after mip gen.
- Descriptor rewrites
- Material descriptor sets and pools are created with `UPDATE_AFTER_BIND` flags; patches are applied safely across frames using a `DescriptorWriter`.
- Key hashing
- 64bit FNV1a for dedup. FilePath keys hash `PATH:<path>#(sRGB|UNORM)`. Bytes keys hash the payload and XOR an sRGB tag when requested.
- Format selection and channel packing
- `TextureKey::channels` can be `Auto` (default), `R`, `RG`, or `RGBA`. The cache chooses `VK_FORMAT_R8/R8G8/RGBA8` (sRGB variants when requested) and packs channels on CPU for `R`/`RG` to reduce staging + VRAM.
- Progressive downscale
- The decode thread downsizes large images by powers of 2 until within `Max Upload Dimension`, reducing both staging and VRAM. You can increase the cap or disable it (set to 0) from the UI.
KTX2 specifics
- Supported: 2D, singleface, singlelayer KTX2. If BasisLZ/UASTC, libktx transcodes to BCn. sRGB/UNORM is honored from the files DFD and can be nudged by request (albedo sRGB, MR/normal UNORM).
- Not supported: Cube/array/multilayer KTX2 (current code path assumes single layer, 2D).
Limitations / Future Work
- Linearblit capability check
- `generate_mipmaps` always uses `VK_FILTER_LINEAR`. Add a format/feature check and a fallback path (nearest or compute downsample).
- Texture formats
- Raster path: 8bit R/RG/RGBA via stb_image. Compressed path: BCn via `.ktx2`. Future: ASTC/ETC2, specialized R8/RG8 parsing, and float HDR support (`stbi_loadf``R16G16B16A16_SFLOAT`).
- Normalmap mip quality
- Linear blits reduce normal length; consider a compute renormalization pass.
- Samplers
- Anisotropy is currently disabled in `SamplerManager`; enable when supported and expose a knob.
- Minor robustness
- `enqueue_decode()` derives the handle via pointer arithmetic on `_entries`. Passing the precomputed index would avoid any future reallocation hazards.
Operational Tips
- Keep deferred uploads enabled (`ResourceManager::set_deferred_uploads(true)`) to coalesce copies per frame (engine does this during init).
- To debug VMA allocations and name images, set `VE_VMA_DEBUG=1`.