QuaternionEngine/docs/TextureLoading.md
2025-12-07 00:26:55 +09:00

Texture Loading & Streaming

Overview

  • Streaming cache: src/core/assets/texture_cache.{h,cpp} asynchronously decodes images (stb_image) on a small worker pool (1-4 threads, clamped by hardware concurrency) and uploads them via ResourceManager with optional mipmaps. For FilePath keys, a sibling <stem>.ktx2 (or a direct .ktx2 path) is preferred over PNG/JPEG. Descriptors registered up front are patched in place once the texture becomes resident. Large decodes can be downscaled on workers before upload to cap peak memory.
  • Uploads: src/core/frame/resource.{h,cpp} stages pixel data and either submits immediately or registers a Render Graph transfer pass. Mipmaps use vkutil::generate_mipmaps(...) and finish in SHADER_READ_ONLY_OPTIMAL.
  • Integration points:
    • Materials: layouts use UPDATE_AFTER_BIND; descriptors can be rewritten after bind.
    • glTF loader: src/scene/vk_loader.cpp builds keys, requests handles, and registers descriptor patches with the cache.
    • Primitives/ad hoc: src/core/assets/manager.cpp builds materials and registers texture watches.
    • Visibility: src/render/passes/geometry.cpp and src/render/passes/transparent.cpp call TextureCache::markSetUsed(...) for sets that are actually drawn.
  • IBL: high-dynamic-range environment textures are typically loaded directly as .ktx2 via IBLManager instead of the generic streaming cache. See “Image-Based Lighting (IBL)” below.

Data Flow

  • Request
    • Build a TextureCache::TextureKey (FilePath or Bytes), set srgb and mipmapped.
    • Call request(key, sampler) → returns a stable TextureHandle, deduplicated by a 64-bit FNV-1a hash. For FilePath keys the path plus the sRGB bit are hashed; for Bytes keys the payload hash is XORed with a constant when srgb=true.
    • Register target descriptors via watchBinding(handle, set, binding, sampler, fallbackView).
  • Visibility-gated scheduling
    • pumpLoads(...) looks for entries in Unloaded or Evicted state that were seen recently (now == 0 or now - lastUsed <= 1) and starts at most max_loads_per_pump decodes per call, while enforcing a byte budget for uploads per frame.
    • Render passes mark used sets each frame with markSetUsed(...) (or specific handles via markUsed(...)).
  • Decode
    • FilePath: if the path ends with .ktx2 or a sibling exists, we load via libktx (and transcode to BCn if needed). Otherwise, decode to RGBA8 via stb_image.
    • Bytes: always decode via stb_image (no sibling discovery possible).
  • Admission & Upload
    • Before upload, an expected resident size is computed (exact for KTX2 by summing level byte lengths; estimated for raster as format × area × mip factor). A per-frame byte budget (max_bytes_per_pump) throttles uploads.
    • If a GPU texture budget is set, the cache evicts least-recently-used textures not used this frame. If it still cannot fit, the decode is deferred or dropped with backoff.
    • Raster: ResourceManager::create_image(...) stages a single region, then optionally generates mips on GPU.
    • KTX2: ResourceManager::create_image_compressed(...) allocates an image with the file's VkFormat (from libktx) and records one VkBufferImageCopy per mip level (no GPU mip generation). The immediate path transitions to SHADER_READ_ONLY_OPTIMAL; the Render Graph path transitions right after the copy, since no mip generation runs.
    • If the device cannot sample the KTX2 format, the cache falls back to raster decode.
    • After upload: state → Resident, descriptors recorded via watchBinding are rewritten to the new image view with the chosen sampler and SHADER_READ_ONLY_OPTIMAL layout. For Bytes-backed keys, compressed source bytes are dropped unless keep_source_bytes is enabled.
  • Eviction & Reload
    • evictToBudget(bytes) rewrites watchers to fallbacks, destroys images, and marks entries Evicted. Evicted entries can reload automatically when seen again and a short cooldown has passed (default ~2 frames), avoiding immediate thrash.
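
The visibility gate and the per-pump limits described above can be sketched as a small stand-alone model. This is not the engine's code: Entry, State, select_loads, and estimate_raster_bytes are hypothetical names, the 4/3 mip factor is an assumption (the geometric-series bound for a full mip chain), and the byte budget is folded into the same loop for brevity even though the cache may apply it at upload time. Only the gating rule (now == 0 or now - lastUsed <= 1), the loads-per-pump cap, and the per-pump byte budget come from the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the cache's per-texture bookkeeping.
enum class State { Unloaded, Loading, Resident, Evicted };

struct Entry {
    State state;
    std::uint32_t lastUsed;      // frame index of the last markUsed/markSetUsed
    std::size_t expectedBytes;   // estimated resident size
};

// Raster estimate: bytes per pixel * area, plus roughly 1/3 for a full mip chain
// (assumed factor; the document only says "format x area x mip factor").
inline std::uint64_t estimate_raster_bytes(std::uint32_t width, std::uint32_t height,
                                           std::uint32_t bytesPerPixel, bool mipmapped)
{
    const std::uint64_t base =
        static_cast<std::uint64_t>(width) * height * bytesPerPixel;
    return mipmapped ? base + base / 3 : base;
}

// Returns the indices of entries that would start loading during this pump.
inline std::vector<std::size_t> select_loads(const std::vector<Entry>& entries,
                                             std::uint32_t now,
                                             std::size_t maxLoadsPerPump,
                                             std::size_t maxBytesPerPump)
{
    std::vector<std::size_t> picked;
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (picked.size() >= maxLoadsPerPump) break;
        const Entry& e = entries[i];
        // Only entries that are not already loading or resident are candidates.
        if (e.state != State::Unloaded && e.state != State::Evicted) continue;
        // Visibility gate: seen this frame or the previous one (or first frame).
        const bool recent = (now == 0) || (now - e.lastUsed <= 1);
        if (!recent) continue;
        // Byte budget: skip entries that would exceed this pump's allowance.
        if (bytes + e.expectedBytes > maxBytesPerPump) continue;
        bytes += e.expectedBytes;
        picked.push_back(i);
    }
    return picked;
}
```

In this model an entry that was last marked used several frames ago is never picked, which is the point of the gate: textures that are off screen do not consume decode slots or upload bandwidth.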

Runtime UI

  • ImGui → Debug → Textures (see src/core/engine.cpp)
    • Shows: device-local budget/usage (from VMA), texture streaming budget (~35% of device-local by default), resident MiB, CPU source MiB, counts per state, and a Top-N table of consumers.
    • Controls: Loads/Frame, Upload Budget (MiB) (byte-based throttle), Keep Source Bytes, CPU Source Budget (MiB), Max Upload Dimension (progressive downscale cap), and Trim To Budget Now.

Key APIs (src/core/assets/texture_cache.h)

  • TextureHandle request(const TextureKey&, VkSampler)
  • void watchBinding(TextureHandle, VkDescriptorSet, uint32_t binding, VkSampler, VkImageView fallback)
  • void unwatchSet(VkDescriptorSet) — call before destroying descriptor pools/sets
  • void markSetUsed(VkDescriptorSet, uint32_t frameIndex) and void markUsed(TextureHandle, uint32_t frameIndex)
  • void pumpLoads(ResourceManager&, FrameResources&)
  • void evictToBudget(size_t bytes)
  • Controls: set_max_loads_per_pump, set_keep_source_bytes, set_cpu_source_budget, set_gpu_budget_bytes
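
request(...) deduplicates keys by hash, as described under Data Flow. A minimal sketch of that scheme, assuming the standard 64-bit FNV-1a constants: fnv1a64, hash_file_key, and hash_bytes_key are hypothetical helper names, and kSrgbTag is a made-up placeholder because the document does not state the actual XOR constant. The "PATH:<path>#sRGB|UNORM" string layout follows the Key hashing note in this document.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard 64-bit FNV-1a over a byte range.
inline std::uint64_t fnv1a64(const unsigned char* data, std::size_t size)
{
    std::uint64_t h = 0xcbf29ce484222325ull;   // FNV offset basis
    for (std::size_t i = 0; i < size; ++i) {
        h ^= data[i];                          // XOR first, then multiply (the "1a" order)
        h *= 0x100000001b3ull;                 // FNV prime
    }
    return h;
}

// FilePath keys: hash the path together with the colour-space tag.
inline std::uint64_t hash_file_key(const std::string& path, bool srgb)
{
    const std::string s = "PATH:" + path + (srgb ? "#sRGB" : "#UNORM");
    return fnv1a64(reinterpret_cast<const unsigned char*>(s.data()), s.size());
}

// Placeholder tag; the real constant is not given in this document.
constexpr std::uint64_t kSrgbTag = 0x9e3779b97f4a7c15ull;

// Bytes keys: hash the payload, then XOR the tag when sRGB is requested.
inline std::uint64_t hash_bytes_key(const std::vector<unsigned char>& payload, bool srgb)
{
    const std::uint64_t h = fnv1a64(payload.data(), payload.size());
    return srgb ? (h ^ kSrgbTag) : h;
}
```

Mixing the colour space into the hash means the same file requested once as sRGB and once as UNORM yields two distinct handles, which matches the albedo versus metal-rough/normal usage described elsewhere in this document.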

Defaults & Budgets

  • Worker threads: 1-4 decode threads, depending on hardware.
  • Loads per pump: default 4.
  • Upload byte budget: default 128 MiB per frame.
  • GPU budget: unlimited until the engine sets one each frame. The engine queries ~35% of device-local memory (via VMA) and calls set_gpu_budget_bytes(...), then runs evictToBudget(...) and pumpLoads(...) during the frame loop (src/core/engine.cpp).
  • CPU source bytes: default budget 64 MiB; keep_source_bytes defaults to false. Retention only applies to entries created from Bytes keys.
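
The two derived defaults above (worker count and GPU budget) amount to simple arithmetic, sketched here under stated assumptions. decode_thread_count and texture_budget_bytes are hypothetical helpers, the pool's upper bound is passed in as a parameter rather than hard-coded, and integer percent math is used only to keep the example exact; the engine's own rounding may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>

// Clamp the decode pool size to [1, maxThreads].
// std::thread::hardware_concurrency() may return 0 when the value is unknown,
// so the lower bound matters.
inline unsigned decode_thread_count(unsigned hardwareThreads, unsigned maxThreads)
{
    return std::min(maxThreads, std::max(1u, hardwareThreads));
}

// Texture streaming budget as a percentage of device-local memory.
// Dividing before multiplying avoids overflow for very large heaps.
inline std::uint64_t texture_budget_bytes(std::uint64_t deviceLocalBytes,
                                          unsigned percent = 35)
{
    return deviceLocalBytes / 100 * percent;
}

// Convenience wrapper that queries the current machine.
inline unsigned decode_thread_count_for_this_machine(unsigned maxThreads)
{
    return decode_thread_count(std::thread::hardware_concurrency(), maxThreads);
}
```

For example, an 8 GiB device-local heap gives a streaming budget of roughly 2.8 GiB at 35%.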

Examples

  • Asset materials (src/core/assets/manager.cpp)
    • Create materials with visible fallbacks (checkerboard/white/flat-normal), then:
      • Build a key from an asset path, request(key, sampler), and watchBinding(handle, materialSet, binding, sampler, fallbackView) for albedo (1), metal-rough (2), normal (3).
  • glTF loader (src/scene/vk_loader.cpp)
    • Builds keys from URI/Vector/BufferView sources, requests handles, and registers watches for material textures. On teardown, calls unwatchSet(materialSet) before resetting descriptor pools to avoid patching dead sets. The geometry/transparent passes mark used sets each frame.

Implementation Notes

  • Uploads and layouts
    • Deferred uploads: the RG transfer pass transitions UNDEFINED → TRANSFER_DST_OPTIMAL, copies, and either generates mipmaps (finishing in SHADER_READ_ONLY_OPTIMAL) or transitions directly there. No extra transition is needed after mip gen.
  • Descriptor rewrites
    • Material descriptor sets and pools are created with UPDATE_AFTER_BIND flags; patches are applied safely across frames using a DescriptorWriter.
  • Key hashing
    • 64-bit FNV-1a for dedup. FilePath keys hash PATH:<path>#(sRGB|UNORM). Bytes keys hash the payload and XOR an sRGB tag when requested.
  • Format selection and channel packing
    • TextureKey::channels can be Auto (default), R, RG, or RGBA. The cache chooses VK_FORMAT_R8/R8G8/RGBA8 (sRGB variants when requested) and packs channels on CPU for R/RG to reduce staging + VRAM.
  • Progressive downscale
    • The decode thread downsizes large images by powers of 2 until within Max Upload Dimension, reducing both staging and VRAM. You can increase the cap or disable it (set to 0) from the UI.
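
The progressive downscale rule can be sketched as follows. This is a minimal illustration of "halve until within the cap", with hypothetical names (Extent2D, downscale_target); it only computes the target size and does not resample pixels, and the clamp to a minimum of 1 texel per axis is an assumption for very thin images.

```cpp
#include <algorithm>
#include <cstdint>

struct Extent2D {
    std::uint32_t width;
    std::uint32_t height;
};

// Halve both dimensions until the larger one fits within maxDim.
// maxDim == 0 disables the cap, mirroring the UI behaviour described above.
inline Extent2D downscale_target(Extent2D src, std::uint32_t maxDim)
{
    if (maxDim == 0) return src;
    Extent2D out = src;
    while (std::max(out.width, out.height) > maxDim) {
        out.width  = std::max<std::uint32_t>(1, out.width / 2);
        out.height = std::max<std::uint32_t>(1, out.height / 2);
    }
    return out;
}
```

With a cap of 2048, an 8192×4096 source would be uploaded at 2048×1024, a 16× reduction in both staging and resident bytes.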

KTX2 specifics

  • Supported: 2D, single-face, single-layer KTX2. If BasisLZ/UASTC, libktx transcodes to BCn. sRGB/UNORM is honored from the file's DFD and can be nudged by request (albedo sRGB, MR/normal UNORM).
  • Not supported: Cube/array/multi-layer KTX2 in the generic cache path (it assumes single-layer, 2D). Cubemap KTX2 for IBL is loaded via IBLManager (see below).

Limitations / Future Work

  • Linear-blit capability check
    • vkutil::generate_mipmaps / generate_mipmaps_levels always use VK_FILTER_LINEAR for blits without checking VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT. Add a per-format capability check and a fallback path (nearest or compute downsample) for formats that do not support linear filtering (especially some compressed formats).
  • Texture formats
    • Raster path: limited to 8-bit R/RG/RGBA via stbi_load. KTX2 path in TextureCache::worker_loop currently accepts only BCn/BC6H formats and rejects other VkFormats returned by libktx (e.g., uncompressed R16G16B16A16_SFLOAT). Future work: ASTC/ETC2, specialized R8/RG8 parsing, and float HDR support (stbi_loadf to R16G16B16A16_SFLOAT) so HDR albedo/lighting data can stream through the generic cache (today HDR IBL uses the separate IBLManager path).
  • Normal-map mip quality
    • Normal maps share the same linear blit pipeline as color textures; no renormalization pass runs after mip generation. Consider a compute or fragment pass to renormalize normal map mips (or a dedicated normal-aware downsample) to improve shading at grazing angles and distant LODs.
  • Samplers
    • Anisotropy is currently disabled in SamplerManager (anisotropyEnable = VK_FALSE). Enable it when the feature is present, expose a knob in the Debug UI, and consider per-material/per-texture anisotropy settings.
  • Minor robustness
    • enqueue_decode() computes the handle from the entry pointer (&e - _entries.data()) and passes it to worker threads. This is safe as long as _entries is not resized during enqueue, but storing the index explicitly when the entry is created (in request()) would make the relationship clearer and robust against future refactors.
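
To make the normal-map item above concrete, here is a CPU reference of what a normal-aware downsample would compute for one 2×2 block: decode, average, renormalize, re-encode. It is a sketch only. The names (Rgb8, decode_normal, encode_normal, downsample_normal_2x2) are hypothetical, the 8-bit [0,255] to [-1,1] encoding is the common convention and is assumed here, and a real implementation would more likely run as the compute or fragment pass suggested above.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

using Rgb8 = std::array<std::uint8_t, 3>;
using Vec3 = std::array<float, 3>;

// Map an 8-bit encoded normal from [0,255] to [-1,1] per channel.
inline Vec3 decode_normal(const Rgb8& p)
{
    Vec3 n;
    for (int i = 0; i < 3; ++i) n[i] = p[i] / 255.0f * 2.0f - 1.0f;
    return n;
}

// Map a normal from [-1,1] back to 8-bit, clamping and rounding to nearest.
inline Rgb8 encode_normal(const Vec3& n)
{
    Rgb8 p;
    for (int i = 0; i < 3; ++i) {
        const float v = std::min(1.0f, std::max(0.0f, n[i] * 0.5f + 0.5f));
        p[i] = static_cast<std::uint8_t>(std::lround(v * 255.0f));
    }
    return p;
}

// One output texel of the next mip: sum four normals and renormalize.
inline Rgb8 downsample_normal_2x2(const Rgb8& a, const Rgb8& b,
                                  const Rgb8& c, const Rgb8& d)
{
    Vec3 sum = {{0.0f, 0.0f, 0.0f}};
    const Rgb8* texels[4] = {&a, &b, &c, &d};
    for (const Rgb8* t : texels) {
        const Vec3 n = decode_normal(*t);
        for (int i = 0; i < 3; ++i) sum[i] += n[i];
    }
    const float len = std::sqrt(sum[0] * sum[0] + sum[1] * sum[1] + sum[2] * sum[2]);
    if (len < 1e-6f) return Rgb8{{128, 128, 255}};   // degenerate: fall back to flat normal
    for (int i = 0; i < 3; ++i) sum[i] /= len;
    return encode_normal(sum);
}
```

When neighbouring normals tilt in opposite directions, a plain average is shorter than unit length; the renormalization step restores unit length so lighting does not darken at lower mips.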

Operational Tips

  • Keep deferred uploads enabled (ResourceManager::set_deferred_uploads(true)) to coalesce copies per frame (engine does this during init).
  • To debug VMA allocations and name images, set VE_VMA_DEBUG=1.

Image-Based Lighting (IBL) Textures

  • Manager: src/core/assets/ibl_manager.{h,cpp} owns IBL GPU resources and the shared descriptor set layout for set=3.
  • Inputs (IBLPaths):
    • specularCube: preferred is a GPU-ready .ktx2 (BC6H or R16G16B16A16_SFLOAT) containing either a cubemap or an equirectangular 2D env with prefiltered mips.
    • diffuseCube: optional .ktx2 cubemap for diffuse irradiance. If missing, diffuse IBL falls back to SH only.
    • brdfLut2D: .ktx2 2D RG LUT (e.g., VK_FORMAT_R8G8_UNORM or BC5).
  • Loading:
    • Specular:
      • If specularCube is a cubemap .ktx2, IBLManager uses ktxutil::load_ktx2_cubemap and uploads via ResourceManager::create_image_compressed_layers, preserving the file's format and mip chain.
      • If cubemap load fails, it falls back to 2D .ktx2 via ktxutil::load_ktx2_2d + ResourceManager::create_image_compressed. The image is treated as equirectangular with prefiltered mips and sampled with explicit LOD in shaders.
      • If the format is float HDR (R16G16B16A16_SFLOAT or R32G32B32A32_SFLOAT) and the aspect ratio is 2:1, IBLManager additionally computes 2nd-order SH coefficients (9×vec3) on a worker thread and uploads them to a UBO (_shBuffer) when pump_async() is called on the main thread.
    • Diffuse (optional):
      • If diffuseCube is provided and valid, it is uploaded as a cubemap using create_image_compressed_layers. Current shaders use the SH buffer for diffuse; this cubemap can be wired into a future path if you want to sample it directly.
    • BRDF LUT:
      • brdfLut2D is loaded as 2D .ktx2 via ktxutil::load_ktx2_2d and uploaded with create_image_compressed.
    • Fallbacks:
      • LightingPass and TransparentPass create tiny 1×1 UNORM textures (grey 2D for env, RG for BRDF LUT) so shaders can safely sample IBL bindings even when IBL assets are not loaded.
  • Descriptor layout & bindings:
    • IBLManager::ensureLayout() creates a descriptor set layout for set=3 with:
      • binding 0: COMBINED_IMAGE_SAMPLER — specular env (2D equirect with mips or cubemap sampled via 2D path).
      • binding 1: COMBINED_IMAGE_SAMPLER — BRDF LUT 2D.
      • binding 2: UNIFORM_BUFFER — SH coefficients (vec4 sh[9], RGB in .xyz).
    • Render passes that use IBL fetch this layout from EngineContext::ibl and allocate per-frame sets:
      • passes/lighting.cpp: deferred lighting (set=3).
      • passes/transparent.cpp: forward/transparent PBR materials (set=3).
      • passes/background.cpp: environment background (set=3; only binding 0 is used in the shader).
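
The SH coefficients bound at binding 2 come from projecting the equirectangular environment onto the first nine real spherical-harmonic basis functions. The sketch below shows that projection in its textbook form; it is not IBLManager's code. The names (Rgb, sh_basis, project_equirect_sh9), the z-up direction mapping, and the coefficient ordering are assumptions, and the engine's conventions may differ.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct Rgb { float r, g, b; };

// Real SH basis for bands 0..2, evaluated at a unit direction (x, y, z).
inline std::array<double, 9> sh_basis(double x, double y, double z)
{
    std::array<double, 9> b = {{
        0.282095,
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    }};
    return b;
}

// Integrate radiance * basis over the sphere using one sample per texel.
// Each texel is weighted by its solid angle: sin(theta) * dTheta * dPhi.
inline std::array<Rgb, 9> project_equirect_sh9(const std::vector<Rgb>& pixels,
                                               std::size_t width, std::size_t height)
{
    const double pi = 3.14159265358979323846;
    double acc[9][3] = {};
    for (std::size_t row = 0; row < height; ++row) {
        const double theta = pi * (row + 0.5) / height;   // polar angle from +z
        const double sinT = std::sin(theta);
        const double cosT = std::cos(theta);
        const double dOmega = (pi / height) * (2.0 * pi / width) * sinT;
        for (std::size_t col = 0; col < width; ++col) {
            const double phi = 2.0 * pi * (col + 0.5) / width;
            const std::array<double, 9> b =
                sh_basis(sinT * std::cos(phi), sinT * std::sin(phi), cosT);
            const Rgb& p = pixels[row * width + col];
            for (int i = 0; i < 9; ++i) {
                acc[i][0] += p.r * b[i] * dOmega;
                acc[i][1] += p.g * b[i] * dOmega;
                acc[i][2] += p.b * b[i] * dOmega;
            }
        }
    }
    std::array<Rgb, 9> out;
    for (int i = 0; i < 9; ++i) {
        out[i] = Rgb{static_cast<float>(acc[i][0]),
                     static_cast<float>(acc[i][1]),
                     static_cast<float>(acc[i][2])};
    }
    return out;
}
```

The nine RGB results map naturally onto the vec4 sh[9] layout listed for binding 2, with RGB stored in .xyz. For a uniform environment only the first coefficient is significantly non-zero, which is a convenient sanity check when validating a loader.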