Texture Loading & Streaming

Overview

  • Streaming cache: src/core/texture_cache.{h,cpp} asynchronously decodes images (stb_image) on a small worker pool (1-4 threads, clamped by hardware concurrency) and uploads them via ResourceManager with optional mipmaps. For FilePath keys, a sibling <stem>.ktx2 (or a direct .ktx2 path) is preferred over PNG/JPEG. Descriptors registered up front are patched in place once the texture becomes resident. Large decodes can be downscaled on workers before upload to cap peak memory.
  • Uploads: src/core/vk_resource.{h,cpp} stages pixel data and either submits immediately or registers a Render Graph transfer pass. Mipmaps use vkutil::generate_mipmaps(...) and finish in SHADER_READ_ONLY_OPTIMAL.
  • Integration points:
    • Materials: layouts use UPDATE_AFTER_BIND; descriptors can be rewritten after bind.
    • glTF loader: src/scene/vk_loader.cpp builds keys, requests handles, and registers descriptor patches with the cache.
    • Primitives/ad hoc: src/core/asset_manager.cpp builds materials and registers texture watches.
    • Visibility: src/render/vk_renderpass_geometry.cpp and src/render/vk_renderpass_transparent.cpp call TextureCache::markSetUsed(...) for sets that are actually drawn.

Data Flow

  • Request
    • Build a TextureCache::TextureKey (FilePath or Bytes), set srgb and mipmapped.
    • Call request(key, sampler) → returns a stable TextureHandle, deduplicated by a 64-bit FNV-1a hash. For FilePath keys the path plus the sRGB bit are hashed; for Bytes keys the payload hash is XORed with a constant when srgb=true.
    • Register target descriptors via watchBinding(handle, set, binding, sampler, fallbackView).
  • Visibility-gated scheduling
    • pumpLoads(...) looks for entries in Unloaded or Evicted state that were seen recently (now == 0 or now - lastUsed <= 1) and starts at most max_loads_per_pump decodes per call, while enforcing a byte budget for uploads per frame.
    • Render passes mark used sets each frame with markSetUsed(...) (or specific handles via markUsed(...)).
  • Decode
    • FilePath: if the path ends with .ktx2 or a sibling exists, we load via libktx (and transcode to BCn if needed). Otherwise, decode to RGBA8 via stb_image.
    • Bytes: always decode via stb_image (no sibling discovery possible).
  • Admission & Upload
    • Before upload, an expected resident size is computed (exact for KTX2 by summing level byte lengths; estimated for raster by format×area×mipfactor). A per-frame byte budget (max_bytes_per_pump) throttles uploads.
    • If a GPU texture budget is set, the cache evicts least-recently-used textures not used this frame. If it still cannot fit, the decode is deferred or dropped with backoff.
    • Raster: ResourceManager::create_image(...) stages a single region, then optionally generates mips on GPU.
    • KTX2: ResourceManager::create_image_compressed(...) allocates an image with the file's VkFormat (from libktx) and records one VkBufferImageCopy per mip level (no GPU mip gen). The immediate path transitions to SHADER_READ_ONLY_OPTIMAL; the Render Graph path transitions right after the copy, since there is no mip generation.
    • If the device cannot sample the KTX2 format, the cache falls back to raster decode.
    • After upload: state → Resident, descriptors recorded via watchBinding are rewritten to the new image view with the chosen sampler and SHADER_READ_ONLY_OPTIMAL layout. For Bytes-backed keys, compressed source bytes are dropped unless keep_source_bytes is enabled.
  • Eviction & Reload
    • evictToBudget(bytes) rewrites watchers to fallbacks, destroys images, and marks entries Evicted. Evicted entries can reload automatically when seen again and a short cooldown has passed (default ~2 frames), avoiding immediate thrash.
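
The visibility gate and the per-pump limits described above can be sketched as a single selection pass. This is a simplified illustration, not the cache's code: Entry, PumpLimits, and select_loads are hypothetical names, and the sketch folds the load-count limit and the byte budget into one loop, whereas the cache applies the former to decode starts and the latter to uploads.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the cache's internal types.
enum class EntryState { Unloaded, Decoding, Resident, Evicted };

struct Entry {
    EntryState state = EntryState::Unloaded;
    uint32_t lastUsed = 0;      // frame index of the last markUsed/markSetUsed
    size_t expectedBytes = 0;   // expected resident size of the texture
};

struct PumpLimits {
    int maxLoadsPerPump = 4;                  // documented default
    size_t maxBytesPerPump = size_t{128} << 20; // documented default: 128 MiB
};

// Returns the indices of entries that would be started during this pump.
std::vector<size_t> select_loads(const std::vector<Entry>& entries,
                                 uint32_t now,
                                 const PumpLimits& limits)
{
    std::vector<size_t> started;
    size_t bytes = 0;
    for (size_t i = 0; i < entries.size(); ++i) {
        const Entry& e = entries[i];
        // Only entries that are not loaded (or were evicted) are candidates.
        if (e.state != EntryState::Unloaded && e.state != EntryState::Evicted) continue;
        // Visibility gate: used this frame or the previous one (or frame 0).
        const bool recent = (now == 0) || (now - e.lastUsed <= 1);
        if (!recent) continue;
        // Stop once the per-pump load count is reached.
        if (static_cast<int>(started.size()) >= limits.maxLoadsPerPump) break;
        // Skip entries that would exceed the byte budget; they wait for a later pump.
        if (bytes + e.expectedBytes > limits.maxBytesPerPump) continue;
        bytes += e.expectedBytes;
        started.push_back(i);
    }
    return started;
}
```

Entries that fail the gate stay in their current state, so textures that are never drawn never consume decode time or upload bandwidth.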

Runtime UI

  • ImGui → Debug → Textures (see src/core/vk_engine.cpp)
    • Shows: device-local budget/usage (from VMA), texture streaming budget (~35% of device-local by default), resident MiB, CPU source MiB, counts per state, and a Top-N table of consumers.
    • Controls: Loads/Frame, Upload Budget (MiB) (byte-based throttle), Keep Source Bytes, CPU Source Budget (MiB), Max Upload Dimension (progressive downscale cap), and Trim To Budget Now.
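
To make the effect of the Max Upload Dimension control concrete, the sketch below computes the extent that results from halving an image until its larger side fits under the cap, with 0 meaning "disabled". Extent and capped_extent are hypothetical names, and the sketch only computes dimensions; the actual pixel resampling on the decode thread is not shown.

```cpp
#include <algorithm>
#include <cstdint>

struct Extent {
    uint32_t width;
    uint32_t height;
};

// Halve both dimensions until the larger one fits within maxDim.
// maxDim == 0 disables the cap and returns the extent unchanged.
Extent capped_extent(Extent e, uint32_t maxDim)
{
    if (maxDim == 0) return e;
    while (std::max(e.width, e.height) > maxDim) {
        // Clamp to 1 so very thin images never reach a zero dimension.
        e.width  = std::max<uint32_t>(1, e.width / 2);
        e.height = std::max<uint32_t>(1, e.height / 2);
    }
    return e;
}
```

For example, an 8192×4096 source with a cap of 2048 is halved twice and uploaded at 2048×1024, which cuts the base level to one sixteenth of its original size.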

Key APIs (src/core/texture_cache.h)

  • TextureHandle request(const TextureKey&, VkSampler)
  • void watchBinding(TextureHandle, VkDescriptorSet, uint32_t binding, VkSampler, VkImageView fallback)
  • void unwatchSet(VkDescriptorSet) — call before destroying descriptor pools/sets
  • void markSetUsed(VkDescriptorSet, uint32_t frameIndex) and void markUsed(TextureHandle, uint32_t frameIndex)
  • void pumpLoads(ResourceManager&, FrameResources&)
  • void evictToBudget(size_t bytes)
  • Controls: set_max_loads_per_pump, set_keep_source_bytes, set_cpu_source_budget, set_gpu_budget_bytes

Defaults & Budgets

  • Worker threads: 1-4 decode threads, depending on hardware.
  • Loads per pump: default 4.
  • Upload byte budget: default 128 MiB per frame.
  • GPU budget: unlimited until the engine sets one each frame. The engine queries ~35% of device-local memory (via VMA) and calls set_gpu_budget_bytes(...), then runs evictToBudget(...) and pumpLoads(...) during the frame loop (src/core/vk_engine.cpp).
  • CPU source bytes: default budget 64 MiB; keep_source_bytes defaults to false. Retention only applies to entries created from Bytes keys.
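
As a rough illustration of how these budgets relate to texture sizes, the sketch below computes the resident size of a raster texture and a streaming budget from device-local memory. The function names are hypothetical. The cache is described as estimating raster size with a mip factor; this sketch sums the mip levels explicitly instead, which gives the same order of magnitude (a full chain adds roughly one third to the base level).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Number of levels in a full mip chain: floor(log2(max(w, h))) + 1.
uint32_t mip_level_count(uint32_t w, uint32_t h)
{
    uint32_t levels = 1;
    for (uint32_t m = std::max(w, h); m > 1; m /= 2) ++levels;
    return levels;
}

// Byte size of an uncompressed texture, summed over its mip levels.
size_t raster_bytes(uint32_t w, uint32_t h, uint32_t bytesPerTexel, bool mipmapped)
{
    const uint32_t levels = mipmapped ? mip_level_count(w, h) : 1;
    size_t total = 0;
    for (uint32_t i = 0; i < levels; ++i) {
        total += static_cast<size_t>(w) * h * bytesPerTexel;
        w = std::max<uint32_t>(1, w / 2);
        h = std::max<uint32_t>(1, h / 2);
    }
    return total;
}

// Streaming budget as a fraction of device-local memory (0.35 mirrors the ~35% default).
size_t gpu_budget_bytes(size_t deviceLocalBytes, double fraction = 0.35)
{
    return static_cast<size_t>(static_cast<double>(deviceLocalBytes) * fraction);
}
```

With these numbers, a mipmapped 2048×2048 RGBA8 texture occupies about 21.3 MiB, so the default 128 MiB upload budget admits roughly six of them per frame.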

Examples

  • Asset materials (src/core/asset_manager.cpp)
    • Create materials with visible fallbacks (checkerboard/white/flat-normal), then:
      • Build a key from an asset path, request(key, sampler), and watchBinding(handle, materialSet, binding, sampler, fallbackView) for albedo (1), metal-rough (2), normal (3).
  • glTF loader (src/scene/vk_loader.cpp)
    • Builds keys from URI/Vector/BufferView sources, requests handles, and registers watches for material textures. On teardown, calls unwatchSet(materialSet) before resetting descriptor pools to avoid patching dead sets. The geometry/transparent passes mark used sets each frame.

Implementation Notes

  • Uploads and layouts
    • Deferred uploads: the RG transfer pass transitions UNDEFINED → TRANSFER_DST_OPTIMAL, copies, and either generates mipmaps (finishing in SHADER_READ_ONLY_OPTIMAL) or transitions directly there. No extra transition is needed after mip gen.
  • Descriptor rewrites
    • Material descriptor sets and pools are created with UPDATE_AFTER_BIND flags; patches are applied safely across frames using a DescriptorWriter.
  • Key hashing
    • 64-bit FNV-1a for dedup. FilePath keys hash PATH:<path>#(sRGB|UNORM). Bytes keys hash the payload and XOR an sRGB tag when requested.
  • Format selection and channel packing
    • TextureKey::channels can be Auto (default), R, RG, or RGBA. The cache chooses VK_FORMAT_R8/R8G8/RGBA8 (sRGB variants when requested) and packs channels on CPU for R/RG to reduce staging + VRAM.
  • Progressive downscale
    • The decode thread downsizes large images by powers of 2 until within Max Upload Dimension, reducing both staging and VRAM. You can increase the cap or disable it (set to 0) from the UI.
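
The key hashing scheme noted above can be sketched as follows. fnv1a64 is the standard 64-bit FNV-1a algorithm; hash_file_key, hash_bytes_key, and the value of kSrgbTag are assumptions made for illustration, since the engine's actual tag constant and exact string layout are not given here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
uint64_t fnv1a64(const void* data, size_t size)
{
    const auto* bytes = static_cast<const unsigned char*>(data);
    uint64_t h = 0xcbf29ce484222325ull;  // FNV offset basis
    for (size_t i = 0; i < size; ++i) {
        h ^= bytes[i];
        h *= 0x100000001b3ull;           // FNV prime
    }
    return h;
}

// FilePath keys: hash a tagged string so sRGB and UNORM requests for one file differ.
uint64_t hash_file_key(const std::string& path, bool srgb)
{
    const std::string tagged = "PATH:" + path + (srgb ? "#sRGB" : "#UNORM");
    return fnv1a64(tagged.data(), tagged.size());
}

// Placeholder tag value chosen for this sketch only.
constexpr uint64_t kSrgbTag = 0x9E3779B97F4A7C15ull;

// Bytes keys: hash the payload, then XOR the tag when sRGB is requested.
uint64_t hash_bytes_key(const std::vector<unsigned char>& payload, bool srgb)
{
    const uint64_t h = fnv1a64(payload.data(), payload.size());
    return srgb ? (h ^ kSrgbTag) : h;
}
```

Folding the color space into the hash means the same file requested once as albedo (sRGB) and once as data (UNORM) produces two distinct cache entries rather than sharing an image with the wrong format.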

KTX2 specifics

  • Supported: 2D, single-face, single-layer KTX2. If BasisLZ/UASTC, libktx transcodes to BCn. sRGB/UNORM is honored from the file's DFD and can be nudged by request (albedo sRGB, MR/normal UNORM).
  • Not supported: Cube/array/multi-layer KTX2 (current code path assumes single layer, 2D).
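
The sibling lookup that makes a .ktx2 file take precedence over PNG/JPEG might look like the sketch below, written with std::filesystem. The function names are hypothetical, and the extension comparison here is case-sensitive, which may differ from what the cache does.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// True when the path already names a .ktx2 file.
bool is_ktx2(const fs::path& p)
{
    return p.extension() == ".ktx2";
}

// Sibling candidate: same directory and stem, with a .ktx2 extension.
fs::path ktx2_sibling(const fs::path& p)
{
    fs::path sibling = p;
    sibling.replace_extension(".ktx2");
    return sibling;
}

// Returns the file to decode: the .ktx2 itself, an existing sibling,
// or the originally requested raster file.
fs::path resolve_texture_path(const fs::path& requested)
{
    if (is_ktx2(requested)) return requested;
    const fs::path sibling = ktx2_sibling(requested);
    std::error_code ec;  // non-throwing overload: a failed query counts as "not found"
    if (fs::exists(sibling, ec)) return sibling;
    return requested;
}
```

Because the lookup is keyed on the stem, shipping assets/rock.ktx2 next to assets/rock.png is enough to switch a material to the compressed path without touching the scene file.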

Limitations / Future Work

  • Linear-blit capability check
    • generate_mipmaps always uses VK_FILTER_LINEAR. Add a format/feature check and a fallback path (nearest or compute downsample).
  • Texture formats
    • Raster path: 8-bit R/RG/RGBA via stb_image. Compressed path: BCn via .ktx2. Future: ASTC/ETC2, specialized R8/RG8 parsing, and float HDR support (stbi_loadf → R16G16B16A16_SFLOAT).
  • Normal-map mip quality
    • Linear blits reduce normal length; consider a compute renormalization pass.
  • Samplers
    • Anisotropy is currently disabled in SamplerManager; enable when supported and expose a knob.
  • Minor robustness
    • enqueue_decode() derives the handle via pointer arithmetic on _entries. Passing the precomputed index would avoid any future reallocation hazards.
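
The normal-map issue noted above can be shown on the CPU: averaging two unit normals, which is what linear filtering does per texel, yields a vector shorter than 1, and renormalizing restores unit length. This is a minimal sketch on decoded vectors with hypothetical helper names; a real pass would run in a compute shader and would also handle the UNORM encoding of the texels.

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
};

float length(const Vec3& v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Component-wise average of two samples, as a linear downsample would compute.
Vec3 average(const Vec3& a, const Vec3& b)
{
    return { (a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f, (a.z + b.z) * 0.5f };
}

// Scale the vector back to unit length; degenerate inputs fall back to a flat normal.
Vec3 renormalize(const Vec3& v)
{
    const float len = length(v);
    if (len < 1e-6f) return { 0.0f, 0.0f, 1.0f };
    return { v.x / len, v.y / len, v.z / len };
}
```

Averaging (1, 0, 0) and (0, 0, 1) gives (0.5, 0, 0.5) with length of about 0.707, so lighting computed from unnormalized lower mips dims as the surface recedes unless the shader or a mip pass renormalizes.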

Operational Tips

  • Keep deferred uploads enabled (ResourceManager::set_deferred_uploads(true)) to coalesce copies per frame (engine does this during init).
  • To debug VMA allocations and name images, set VE_VMA_DEBUG=1.