Texture Loading & Streaming
Overview
- Streaming cache: `src/core/texture_cache.{h,cpp}` asynchronously decodes images (stb_image) on a small worker pool (1–4 threads, clamped by hardware concurrency) and uploads them via `ResourceManager` with optional mipmaps. For FilePath keys, a sibling `<stem>.ktx2` (or direct `.ktx2`) is preferred over PNG/JPEG. Descriptors registered up‑front are patched in‑place once the texture becomes resident. Large decodes can be downscaled on workers before upload to cap peak memory.
- Uploads: `src/core/vk_resource.{h,cpp}` stages pixel data and either submits immediately or registers a Render Graph transfer pass. Mipmaps use `vkutil::generate_mipmaps(...)` and finish in `SHADER_READ_ONLY_OPTIMAL`.
- Integration points:
  - Materials: layouts use `UPDATE_AFTER_BIND`; descriptors can be rewritten after bind.
  - glTF loader: `src/scene/vk_loader.cpp` builds keys, requests handles, and registers descriptor patches with the cache.
  - Primitives/adhoc: `src/core/asset_manager.cpp` builds materials and registers texture watches.
  - Visibility: `src/render/vk_renderpass_geometry.cpp` and `src/render/vk_renderpass_transparent.cpp` call `TextureCache::markSetUsed(...)` for sets that are actually drawn.
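
The worker-pool sizing mentioned above (1–4 threads, clamped by hardware concurrency) could be computed roughly as follows. This is a minimal sketch: `decode_worker_count` is a hypothetical helper name, not taken from `texture_cache.cpp`, and the real code may derive the count differently.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper (not from the engine): pick a decode worker count in [1, 4].
// std::thread::hardware_concurrency() may return 0 when the value is unknown,
// so the lower clamp also covers that case.
inline unsigned decode_worker_count(
    unsigned hardware_threads = std::thread::hardware_concurrency())
{
    return std::clamp(hardware_threads, 1u, 4u);
}
```
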
Data Flow
- Request
  - Build a `TextureCache::TextureKey` (FilePath or Bytes), set `srgb` and `mipmapped`.
  - Call `request(key, sampler)` → returns a stable `TextureHandle`, deduplicated by a 64‑bit FNV‑1a hash. For FilePath keys the path plus the sRGB bit are hashed; for Bytes keys the payload hash is XOR’d with a constant when `srgb=true`.
  - Register target descriptors via `watchBinding(handle, set, binding, sampler, fallbackView)`.
- Visibility‑gated scheduling
  - `pumpLoads(...)` looks for entries in `Unloaded` or `Evicted` state that were seen recently (`now == 0` or `now - lastUsed <= 1`) and starts at most `max_loads_per_pump` decodes per call, while enforcing a byte budget for uploads per frame.
  - Render passes mark used sets each frame with `markSetUsed(...)` (or specific handles via `markUsed(...)`).
- Decode
  - FilePath: if the path ends with `.ktx2` or a sibling exists, we load via libktx (and transcode to BCn if needed). Otherwise, decode to RGBA8 via stb_image.
  - Bytes: always decode via stb_image (no sibling discovery possible).
- Admission & Upload
  - Before upload, an expected resident size is computed (exact for KTX2 by summing level byte lengths; estimated for raster by format×area×mip‑factor). A per‑frame byte budget (`max_bytes_per_pump`) throttles uploads.
  - If a GPU texture budget is set, the cache evicts least‑recently‑used textures not used this frame. If it still cannot fit, the decode is deferred or dropped with backoff.
  - Raster: `ResourceManager::create_image(...)` stages a single region, then optionally generates mips on GPU.
  - KTX2: `ResourceManager::create_image_compressed(...)` allocates an image with the file’s `VkFormat` (from libktx) and records one `VkBufferImageCopy` per mip level (no GPU mip gen). Immediate path transitions to `SHADER_READ_ONLY_OPTIMAL`; the RenderGraph path transitions after copy when no mip gen.
  - If the device cannot sample the KTX2 format, the cache falls back to raster decode.
  - After upload: state → `Resident`, descriptors recorded via `watchBinding` are rewritten to the new image view with the chosen sampler and `SHADER_READ_ONLY_OPTIMAL` layout. For Bytes‑backed keys, compressed source bytes are dropped unless `keep_source_bytes` is enabled.
- Eviction & Reload
  - `evictToBudget(bytes)` rewrites watchers to fallbacks, destroys images, and marks entries `Evicted`. Evicted entries can reload automatically when seen again and a short cooldown has passed (default ~2 frames), avoiding immediate thrash.
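
The key deduplication described under Request can be sketched with a standard 64‑bit FNV‑1a. The `PATH:<path>#sRGB` / `#UNORM` tagging follows the format given under Key hashing in Implementation Notes; the function names and the value of `kSrgbTag` are assumptions made for illustration, since the actual XOR constant is not stated in this document.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Standard 64-bit FNV-1a parameters.
constexpr std::uint64_t kFnvOffsetBasis = 14695981039346656037ull;
constexpr std::uint64_t kFnvPrime = 1099511628211ull;

// FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
inline std::uint64_t fnv1a64(const void* data, std::size_t size,
                             std::uint64_t hash = kFnvOffsetBasis)
{
    const auto* bytes = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= bytes[i];
        hash *= kFnvPrime;
    }
    return hash;
}

// FilePath keys: hash "PATH:<path>#sRGB" or "PATH:<path>#UNORM".
inline std::uint64_t hash_file_key(std::string_view path, bool srgb)
{
    std::string tagged = "PATH:";
    tagged += path;
    tagged += srgb ? "#sRGB" : "#UNORM";
    return fnv1a64(tagged.data(), tagged.size());
}

// ASSUMPTION: placeholder tag value; the engine's real constant is not documented here.
constexpr std::uint64_t kSrgbTag = 0x9E3779B97F4A7C15ull;

// Bytes keys: hash the payload, then XOR the tag only when sRGB is requested.
inline std::uint64_t hash_bytes_key(const std::vector<std::uint8_t>& payload, bool srgb)
{
    const std::uint64_t h = fnv1a64(payload.data(), payload.size());
    return srgb ? (h ^ kSrgbTag) : h;
}
```

Because the sRGB flag participates in the hash, the same image requested once as sRGB and once as UNORM yields two distinct cache entries, while repeated requests with identical settings map to the same handle.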
Runtime UI
- ImGui → Debug → Textures (see `src/core/vk_engine.cpp`)
  - Shows: device‑local budget/usage (from VMA), texture streaming budget (~35% of device‑local by default), resident MiB, CPU source MiB, counts per state, and a Top‑N table of consumers.
  - Controls: `Loads/Frame`, `Upload Budget (MiB)` (byte‑based throttle), `Keep Source Bytes`, `CPU Source Budget (MiB)`, `Max Upload Dimension` (progressive downscale cap), and `Trim To Budget Now`.
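
To illustrate what the `Max Upload Dimension` cap does, the sketch below computes the target size for a decoded image by halving both dimensions until the larger one fits, with 0 meaning "no cap". `Extent2D` and `clamp_upload_extent` are hypothetical names; the sketch only derives the dimensions and leaves out the actual pixel resampling done on the decode thread.

```cpp
#include <algorithm>
#include <cstdint>

// Stand-in for an image size; not an engine or Vulkan type.
struct Extent2D {
    std::uint32_t width;
    std::uint32_t height;
};

// Halve both dimensions (power-of-two steps) until the larger one is within
// max_dim. A cap of 0 disables downscaling. Dimensions never drop below 1.
inline Extent2D clamp_upload_extent(Extent2D e, std::uint32_t max_dim)
{
    if (max_dim == 0) {
        return e;
    }
    while (std::max(e.width, e.height) > max_dim) {
        e.width = std::max<std::uint32_t>(1, e.width / 2);
        e.height = std::max<std::uint32_t>(1, e.height / 2);
    }
    return e;
}
```

Each halving step cuts the base level's area, and therefore its staging and VRAM footprint, to roughly a quarter.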
Key APIs (src/core/texture_cache.h)
- `TextureHandle request(const TextureKey&, VkSampler)`
- `void watchBinding(TextureHandle, VkDescriptorSet, uint32_t binding, VkSampler, VkImageView fallback)`
- `void unwatchSet(VkDescriptorSet)`: call before destroying descriptor pools/sets
- `void markSetUsed(VkDescriptorSet, uint32_t frameIndex)` and `void markUsed(TextureHandle, uint32_t frameIndex)`
- `void pumpLoads(ResourceManager&, FrameResources&)`
- `void evictToBudget(size_t bytes)`
- Controls: `set_max_loads_per_pump`, `set_keep_source_bytes`, `set_cpu_source_budget`, `set_gpu_budget_bytes`
Defaults & Budgets
- Worker threads: 1–4 decode threads depending on hardware.
- Loads per pump: default 4.
- Upload byte budget: default 128 MiB per frame.
- GPU budget: unlimited until the engine sets one, which it does each frame. The engine queries device‑local memory via VMA, takes ~35% of it as the budget, and calls `set_gpu_budget_bytes(...)`; it then runs `evictToBudget(...)` and `pumpLoads(...)` during the frame loop (`src/core/vk_engine.cpp`).
- CPU source bytes: default budget 64 MiB; `keep_source_bytes` defaults to false. Retention only applies to entries created from Bytes keys.
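
The interaction between the visibility gate, the loads-per-pump limit, and the per-frame byte budget can be sketched as a single selection pass. This is a simplified model under stated assumptions: `Entry`, `PumpLimits`, and `select_loads` are hypothetical, and the sketch applies the byte budget when a load is started, whereas the real cache throttles the upload step after decoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class State { Unloaded, Decoding, Resident, Evicted };

// Hypothetical, reduced view of a cache entry.
struct Entry {
    State state = State::Unloaded;
    std::uint32_t lastUsed = 0;      // frame index of the last markUsed/markSetUsed
    std::size_t estimatedBytes = 0;  // expected resident size
};

struct PumpLimits {
    std::size_t maxLoadsPerPump = 4;                          // documented default
    std::size_t maxBytesPerPump = 128ull * 1024ull * 1024ull; // documented default (128 MiB)
};

// Returns the indices of entries whose load is started during this pump.
inline std::vector<std::size_t> select_loads(std::vector<Entry>& entries,
                                             std::uint32_t now,
                                             const PumpLimits& limits)
{
    std::vector<std::size_t> started;
    std::size_t bytesThisPump = 0;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (started.size() >= limits.maxLoadsPerPump) {
            break;  // load-count limit reached
        }
        Entry& e = entries[i];
        const bool idle = e.state == State::Unloaded || e.state == State::Evicted;
        const bool seenRecently = (now == 0) || (now - e.lastUsed <= 1);
        if (!idle || !seenRecently) {
            continue;  // already loading/resident, or not visible lately
        }
        if (bytesThisPump + e.estimatedBytes > limits.maxBytesPerPump) {
            continue;  // over the byte budget: leave it for a later pump
        }
        bytesThisPump += e.estimatedBytes;
        e.state = State::Decoding;
        started.push_back(i);
    }
    return started;
}
```

With the defaults, a single 100 MiB texture leaves only 28 MiB of upload budget for the rest of that pump, so further large textures wait for the next frame even if fewer than four loads were started.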
Examples
- Asset materials (`src/core/asset_manager.cpp`)
  - Create materials with visible fallbacks (checkerboard/white/flat‑normal), then:
    - Build a key from an asset path, `request(key, sampler)`, and `watchBinding(handle, materialSet, binding, sampler, fallbackView)` for albedo (1), metal‑rough (2), normal (3).
- glTF loader (`src/scene/vk_loader.cpp`)
  - Builds keys from URI/Vector/BufferView sources, requests handles, and registers watches for material textures. On teardown, calls `unwatchSet(materialSet)` before resetting descriptor pools to avoid patching dead sets. The geometry/transparent passes mark used sets each frame.
Implementation Notes
- Uploads and layouts
  - Deferred uploads: the RG transfer pass transitions `UNDEFINED → TRANSFER_DST_OPTIMAL`, copies, and either generates mipmaps (finishing in `SHADER_READ_ONLY_OPTIMAL`) or transitions directly there. No extra transition is needed after mip gen.
- Descriptor rewrites
  - Material descriptor sets and pools are created with `UPDATE_AFTER_BIND` flags; patches are applied safely across frames using a `DescriptorWriter`.
- Key hashing
  - 64‑bit FNV‑1a for dedup. FilePath keys hash `PATH:<path>#(sRGB|UNORM)`. Bytes keys hash the payload and XOR an sRGB tag when requested.
- Format selection and channel packing
  - `TextureKey::channels` can be `Auto` (default), `R`, `RG`, or `RGBA`. The cache chooses `VK_FORMAT_R8/R8G8/RGBA8` (sRGB variants when requested) and packs channels on CPU for `R`/`RG` to reduce staging + VRAM.
- Progressive downscale
  - The decode thread downsizes large images by powers of 2 until within `Max Upload Dimension`, reducing both staging and VRAM. You can increase the cap or disable it (set to 0) from the UI.
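
CPU-side channel packing for `R`/`RG` can be sketched as copying a subset of components out of the decoded RGBA8 buffer. `pack_channels` is a hypothetical helper, and the choice of keeping the first one or two components is an assumption: the real cache may pick different source channels depending on the texture's role.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: keep the first `channels` components of every RGBA8 pixel
// (1 -> R, 2 -> RG). ASSUMPTION: which source channels the engine keeps is not
// documented; this sketch simply takes the leading ones.
// Values of 0 or >= 4 return the input unchanged.
inline std::vector<std::uint8_t> pack_channels(const std::vector<std::uint8_t>& rgba,
                                               std::size_t channels)
{
    if (channels == 0 || channels >= 4) {
        return rgba;
    }
    const std::size_t pixels = rgba.size() / 4;
    std::vector<std::uint8_t> out;
    out.reserve(pixels * channels);
    for (std::size_t p = 0; p < pixels; ++p) {
        for (std::size_t c = 0; c < channels; ++c) {
            out.push_back(rgba[p * 4 + c]);
        }
    }
    return out;
}
```

Packing to `R` shrinks the staging buffer and the base level to a quarter of the RGBA8 size, and `RG` to half.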
KTX2 specifics
- Supported: 2D, single‑face, single‑layer KTX2. If BasisLZ/UASTC, libktx transcodes to BCn. sRGB/UNORM is honored from the file’s DFD and can be nudged by request (albedo sRGB, MR/normal UNORM).
- Not supported: Cube/array/multilayer KTX2 (current code path assumes single layer, 2D).
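
For the single‑layer 2D case, the "one copy per mip level" layout and the exact resident size can be sketched from the per‑level byte lengths alone. `LevelCopy`, `CopyPlan`, and `plan_level_copies` are hypothetical stand-ins rather than engine or Vulkan types, and the sketch assumes levels are packed back to back in the staging buffer starting at level 0; the real upload path may order or align them differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the fields of a per-level buffer-to-image copy; not a Vulkan struct.
struct LevelCopy {
    std::size_t bufferOffset;
    std::uint32_t mipLevel;
    std::uint32_t width;
    std::uint32_t height;
};

struct CopyPlan {
    std::vector<LevelCopy> regions;
    std::size_t totalBytes = 0;  // sum of level byte lengths
};

// ASSUMPTION: levels are tightly packed in the staging buffer, level 0 first.
// levelBytes[i] is the byte length of mip level i as reported by the container.
inline CopyPlan plan_level_copies(std::uint32_t baseWidth, std::uint32_t baseHeight,
                                  const std::vector<std::size_t>& levelBytes)
{
    CopyPlan plan;
    for (std::uint32_t level = 0; level < levelBytes.size(); ++level) {
        // Each mip level halves the previous one, never dropping below 1 texel.
        const std::uint32_t w = std::max<std::uint32_t>(1, baseWidth >> level);
        const std::uint32_t h = std::max<std::uint32_t>(1, baseHeight >> level);
        plan.regions.push_back({plan.totalBytes, level, w, h});
        plan.totalBytes += levelBytes[level];
    }
    return plan;
}
```

As a worked example, a 256×256 BC7 texture (16 bytes per 4×4 block) with three levels has level sizes 65536, 16384, and 4096 bytes, for a total of 86016 bytes.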
Limitations / Future Work
- Linear‑blit capability check
  - `generate_mipmaps` always uses `VK_FILTER_LINEAR`. Add a format/feature check and a fallback path (nearest or compute downsample).
- Texture formats
  - Raster path: 8‑bit R/RG/RGBA via stb_image. Compressed path: BCn via `.ktx2`. Future: ASTC/ETC2, specialized R8/RG8 parsing, and float HDR support (`stbi_loadf` → `R16G16B16A16_SFLOAT`).
- Normal‑map mip quality
  - Linear blits reduce normal length; consider a compute renormalization pass.
- Samplers
  - Anisotropy is currently disabled in `SamplerManager`; enable when supported and expose a knob.
- Minor robustness
  - `enqueue_decode()` derives the handle via pointer arithmetic on `_entries`. Passing the precomputed index would avoid any future reallocation hazards.
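
The normal‑map issue is easy to see numerically: averaging two unit normals, which is what a linear downsample does, yields a shorter vector. The sketch below is CPU‑side arithmetic only, with hypothetical helper names; it illustrates the effect and the renormalization step, not the compute pass itself.

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
};

inline float length(Vec3 v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Component-wise average of two texels, as a linear filter would produce.
inline Vec3 average(Vec3 a, Vec3 b)
{
    return {(a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f, (a.z + b.z) * 0.5f};
}

// Rescale to unit length; (near-)zero vectors are returned unchanged.
inline Vec3 renormalize(Vec3 v)
{
    const float len = length(v);
    if (len < 1e-8f) {
        return v;
    }
    return {v.x / len, v.y / len, v.z / len};
}
```

Averaging (1, 0, 0) and (0, 1, 0) gives (0.5, 0.5, 0) with length of about 0.707; renormalizing restores unit length while keeping the averaged direction.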
Operational Tips
- Keep deferred uploads enabled (`ResourceManager::set_deferred_uploads(true)`) to coalesce copies per frame (engine does this during init).
- To debug VMA allocations and name images, set `VE_VMA_DEBUG=1`.