Texture Loading & Streaming
Overview
- Streaming cache: src/core/texture_cache.{h,cpp} asynchronously decodes images (stb_image) on a small worker pool (1–4 threads, clamped by hardware concurrency) and uploads them via ResourceManager with optional mipmaps. Descriptors registered up‑front are patched in‑place once the texture becomes resident. Large decodes can be downscaled on workers before upload to cap peak memory.
- Uploads: src/core/vk_resource.{h,cpp} stages pixel data and either submits immediately or registers a Render Graph transfer pass. Mipmaps use vkutil::generate_mipmaps(...) and finish in SHADER_READ_ONLY_OPTIMAL.
- Integration points:
  - Materials: layouts use UPDATE_AFTER_BIND; descriptors can be rewritten after bind.
  - glTF loader: src/scene/vk_loader.cpp builds keys, requests handles, and registers descriptor patches with the cache.
  - Primitives/adhoc: src/core/asset_manager.cpp builds materials and registers texture watches.
  - Visibility: src/render/vk_renderpass_geometry.cpp and src/render/vk_renderpass_transparent.cpp call TextureCache::markSetUsed(...) for sets that are actually drawn.
Data Flow
- Request
  - Build a TextureCache::TextureKey (FilePath or Bytes), set srgb and mipmapped.
  - Call request(key, sampler) → returns a stable TextureHandle, deduplicated by a 64‑bit FNV‑1a hash. For FilePath keys the path plus the sRGB bit are hashed; for Bytes keys the payload hash is XOR’d with a constant when srgb=true.
  - Register target descriptors via watchBinding(handle, set, binding, sampler, fallbackView).
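The deduplication performed by request(...) can be pictured as a hash-to-handle table. The sketch below is a minimal illustration, not the cache's actual code: HandleTable and its members are hypothetical names, a handle is assumed to be an index into an append-only array, and hash collisions are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the cache's handle bookkeeping.
class HandleTable {
public:
    using Handle = std::uint32_t;

    // Returns the existing handle for a known key hash, or appends a new entry.
    Handle request(std::uint64_t keyHash) {
        const auto it = _lookup.find(keyHash);
        if (it != _lookup.end()) return it->second;
        const Handle handle = static_cast<Handle>(_hashes.size());
        _hashes.push_back(keyHash);
        _lookup.emplace(keyHash, handle);
        return handle;
    }

    // Looks up the key hash an existing handle was created from.
    std::uint64_t keyHashOf(Handle handle) const {
        assert(handle < _hashes.size());
        return _hashes[handle];
    }

    std::size_t size() const { return _hashes.size(); }

private:
    std::vector<std::uint64_t> _hashes;                 // indexed by handle
    std::unordered_map<std::uint64_t, Handle> _lookup;  // key hash -> handle
};
```

Because entries are only appended, a handle stays valid for the lifetime of the table, which is one way to get the "stable handle" property described above.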
- Visibility‑gated scheduling
  - pumpLoads(...) looks for entries in Unloaded or Evicted state that were seen recently (now == 0 or now - lastUsed <= 1) and starts at most max_loads_per_pump decodes per call, while enforcing a byte budget for uploads per frame.
  - Render passes mark used sets each frame with markSetUsed(...) (or specific handles via markUsed(...)).
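The visibility gate above can be sketched as a small predicate plus a capped scan. This is an illustrative sketch only: the Entry layout, the Decoding and Ready states, and the function names wantsLoad and pickLoads are assumptions, and the reload cooldown and byte budget are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decoding and Ready are assumed intermediate states, not taken from the source.
enum class State { Unloaded, Decoding, Ready, Resident, Evicted };

struct Entry {
    State         state;
    std::uint32_t lastUsed;  // frame index of the last markUsed/markSetUsed
};

// True when the entry may start decoding during this pump.
bool wantsLoad(const Entry& e, std::uint32_t now) {
    const bool loadable     = e.state == State::Unloaded || e.state == State::Evicted;
    const bool recentlySeen = now == 0 || (now - e.lastUsed) <= 1;
    return loadable && recentlySeen;
}

// Returns the indices of entries to start decoding, at most maxLoadsPerPump of them.
std::vector<std::size_t> pickLoads(const std::vector<Entry>& entries,
                                   std::uint32_t now,
                                   std::size_t maxLoadsPerPump) {
    std::vector<std::size_t> picked;
    for (std::size_t i = 0; i < entries.size() && picked.size() < maxLoadsPerPump; ++i) {
        if (wantsLoad(entries[i], now)) picked.push_back(i);
    }
    assert(picked.size() <= maxLoadsPerPump);
    return picked;
}
```

Textures that are requested but never drawn keep a stale lastUsed and are therefore never decoded, which is the point of gating on visibility.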
- Decode
  - Worker threads decode to RGBA8 with stb_image (stbi_load/stbi_load_from_memory). Results are queued for the main thread.
- Admission & Upload
  - Before upload, an expected resident size is computed from chosen format (R/RG/RGBA) and mip count (full chain or clamped). A per‑frame byte budget (max_bytes_per_pump) throttles the total amount uploaded each pump.
  - If a GPU texture budget is set, the cache tries to free space by evicting least‑recently‑used textures not used this frame. If it still cannot fit, the decode is deferred (kept in the ready queue) or dropped with backoff if VRAM is tight.
  - Uploads are created via ResourceManager::create_image(...), which now supports an explicit mip count. Deferred upload paths generate exactly the requested number of mips.
  - After upload: state → Resident, descriptors recorded via watchBinding are rewritten to the new image view with the chosen sampler and SHADER_READ_ONLY_OPTIMAL layout. For Bytes‑backed keys, compressed source bytes are dropped unless keep_source_bytes is enabled.
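The expected resident size used for admission can be estimated from the extent, the bytes per texel (1, 2, or 4 for R/RG/RGBA), and the mip count. The following is a sketch under those assumptions; fullMipCount and expectedResidentBytes are hypothetical names, and a real allocation may be larger because of driver alignment and padding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Levels in a full mip chain: floor(log2(max(w, h))) + 1.
std::uint32_t fullMipCount(std::uint32_t w, std::uint32_t h) {
    assert(w > 0 && h > 0);
    std::uint32_t levels = 1;
    for (std::uint32_t m = std::max(w, h); m > 1; m >>= 1) ++levels;
    return levels;
}

// Sums the tightly packed size of the first mipLevels levels.
// Each level halves both dimensions, never dropping below 1.
std::uint64_t expectedResidentBytes(std::uint32_t w, std::uint32_t h,
                                    std::uint32_t bytesPerTexel,
                                    std::uint32_t mipLevels) {
    assert(w > 0 && h > 0);
    std::uint64_t total = 0;
    for (std::uint32_t level = 0; level < mipLevels; ++level) {
        total += static_cast<std::uint64_t>(w) * h * bytesPerTexel;
        w = std::max<std::uint32_t>(w / 2, 1);
        h = std::max<std::uint32_t>(h / 2, 1);
    }
    return total;
}
```

An estimate like this is what would be compared against max_bytes_per_pump and the GPU budget before committing to an upload; passing a mip count smaller than fullMipCount(w, h) models the clamped case.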
- Eviction & Reload
  - evictToBudget(bytes) rewrites watchers to fallbacks, destroys images, and marks entries Evicted. Evicted entries can reload automatically when seen again and a short cooldown has passed (default ~2 frames), avoiding immediate thrash.
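The least-recently-used selection behind eviction can be sketched as follows. This is only an illustration of the policy described above, not the cache's implementation: the Resident struct and chooseEvictions are hypothetical, and the real evictToBudget additionally rewrites descriptors to fallbacks and destroys the images.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Resident {
    std::size_t   id;        // hypothetical entry index
    std::uint64_t bytes;     // resident GPU size
    std::uint32_t lastUsed;  // frame index when last marked used
};

// Returns entry ids to evict, oldest first, until the total fits the budget.
// Entries used in the current frame (lastUsed == now) are never chosen.
std::vector<std::size_t> chooseEvictions(std::vector<Resident> residents,
                                         std::uint64_t budgetBytes,
                                         std::uint32_t now) {
    std::uint64_t total = 0;
    for (const Resident& r : residents) total += r.bytes;

    std::sort(residents.begin(), residents.end(),
              [](const Resident& a, const Resident& b) { return a.lastUsed < b.lastUsed; });

    std::vector<std::size_t> victims;
    for (const Resident& r : residents) {
        if (total <= budgetBytes) break;
        if (r.lastUsed == now) continue;  // still visible this frame
        assert(total >= r.bytes);
        victims.push_back(r.id);
        total -= r.bytes;
    }
    return victims;
}
```

If every remaining texture was used this frame, the function returns without reaching the budget, which corresponds to the "defer or drop with backoff" case in the admission step.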
Runtime UI
- ImGui → Debug → Textures (see src/core/vk_engine.cpp)
  - Shows: device‑local budget/usage (from VMA), texture streaming budget (~35% of device‑local by default), resident MiB, CPU source MiB, counts per state, and a Top‑N table of consumers.
  - Controls: Loads/Frame, Upload Budget (MiB) (byte‑based throttle), Keep Source Bytes, CPU Source Budget (MiB), Max Upload Dimension (progressive downscale cap), and Trim To Budget Now.
Key APIs (src/core/texture_cache.h)
- TextureHandle request(const TextureKey&, VkSampler)
- void watchBinding(TextureHandle, VkDescriptorSet, uint32_t binding, VkSampler, VkImageView fallback)
- void unwatchSet(VkDescriptorSet): call before destroying descriptor pools/sets
- void markSetUsed(VkDescriptorSet, uint32_t frameIndex) and void markUsed(TextureHandle, uint32_t frameIndex)
- void pumpLoads(ResourceManager&, FrameResources&)
- void evictToBudget(size_t bytes)
- Controls: set_max_loads_per_pump, set_keep_source_bytes, set_cpu_source_budget, set_gpu_budget_bytes
Defaults & Budgets
- Worker threads: 1–4 decode threads depending on hardware.
- Loads per pump: default 4.
- Upload byte budget: default 128 MiB per frame.
- GPU budget: unlimited until the engine sets one each frame. The engine queries ~35% of device‑local memory (via VMA) and calls set_gpu_budget_bytes(...), then runs evictToBudget(...) and pumpLoads(...) during the frame loop (src/core/vk_engine.cpp).
- CPU source bytes: default budget 64 MiB; keep_source_bytes defaults to false. Retention only applies to entries created from Bytes keys.
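Two of the defaults above reduce to simple arithmetic. The sketch below assumes the worker count is the hardware thread count clamped to the 1–4 range and that the streaming budget is exactly 35% of device-local memory; the function names are hypothetical, and the engine's actual rounding may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Clamps a reported hardware thread count (which may be 0 when unknown) to 1..4.
unsigned decodeWorkerCount(unsigned hardwareThreads) {
    return std::max(1u, std::min(hardwareThreads, 4u));
}

// 35% of device-local memory, using integer math.
std::uint64_t streamingBudgetBytes(std::uint64_t deviceLocalBytes) {
    assert(deviceLocalBytes <= std::numeric_limits<std::uint64_t>::max() / 35);
    return deviceLocalBytes * 35 / 100;
}
```

A caller would typically pass std::thread::hardware_concurrency() to decodeWorkerCount and the device-local heap size reported by the allocator to streamingBudgetBytes.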
Examples
- Asset materials (src/core/asset_manager.cpp)
  - Create materials with visible fallbacks (checkerboard/white/flat‑normal), then:
    - Build a key from an asset path, request(key, sampler), and watchBinding(handle, materialSet, binding, sampler, fallbackView) for albedo (1), metal‑rough (2), normal (3).
- glTF loader (src/scene/vk_loader.cpp)
  - Builds keys from URI/Vector/BufferView sources, requests handles, and registers watches for material textures. On teardown, calls unwatchSet(materialSet) before resetting descriptor pools to avoid patching dead sets. The geometry/transparent passes mark used sets each frame.
Implementation Notes
- Uploads and layouts
  - Deferred uploads: the RG transfer pass transitions UNDEFINED → TRANSFER_DST_OPTIMAL, copies, and either generates mipmaps (finishing in SHADER_READ_ONLY_OPTIMAL) or transitions directly there. No extra transition is needed after mip gen.
- Descriptor rewrites
  - Material descriptor sets and pools are created with UPDATE_AFTER_BIND flags; patches are applied safely across frames using a DescriptorWriter.
- Key hashing
  - 64‑bit FNV‑1a for dedup. FilePath keys hash PATH:<path>#(sRGB|UNORM). Bytes keys hash the payload and XOR an sRGB tag when requested.
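The key hashing scheme can be sketched with a standard 64-bit FNV-1a routine. The offset basis and prime below are the published FNV-1a constants; everything else is an assumption made for illustration: the helper names, the exact string layout for FilePath keys, and in particular the kSrgbTag value, which is a placeholder rather than the constant the cache uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::uint64_t kFnvOffsetBasis = 14695981039346656037ull;
constexpr std::uint64_t kFnvPrime       = 1099511628211ull;

// Standard 64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime.
std::uint64_t fnv1a64(const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::uint64_t hash = kFnvOffsetBasis;
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= bytes[i];
        hash *= kFnvPrime;
    }
    return hash;
}

// FilePath keys: hash a string of the form PATH:<path>#sRGB or PATH:<path>#UNORM.
std::uint64_t hashFilePathKey(const std::string& path, bool srgb) {
    const std::string id = "PATH:" + path + (srgb ? "#sRGB" : "#UNORM");
    return fnv1a64(id.data(), id.size());
}

// Bytes keys: hash the payload, then XOR a tag when sRGB is requested.
std::uint64_t hashBytesKey(const void* payload, std::size_t size, bool srgb) {
    constexpr std::uint64_t kSrgbTag = 0x9E3779B97F4A7C15ull;  // placeholder value
    const std::uint64_t hash = fnv1a64(payload, size);
    return srgb ? (hash ^ kSrgbTag) : hash;
}
```

Folding the sRGB flag into the hash means the same image requested as color data and as linear data resolves to two separate cache entries.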
- Format selection and channel packing
  - TextureKey::channels can be Auto (default), R, RG, or RGBA. The cache chooses VK_FORMAT_R8_UNORM, VK_FORMAT_R8G8_UNORM, or VK_FORMAT_R8G8B8A8_UNORM (sRGB variants when requested) and packs channels on the CPU for R/RG to reduce staging and VRAM use.
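CPU-side channel packing amounts to copying a subset of each decoded RGBA8 texel into a tighter buffer. The sketch below assumes the leading channels are the ones kept (R for one channel, R and G for two); packChannels is a hypothetical helper, and the cache may select different source channels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Keeps the first `channels` components of every RGBA8 texel.
std::vector<std::uint8_t> packChannels(const std::vector<std::uint8_t>& rgba,
                                       std::size_t channels) {
    assert(channels >= 1 && channels <= 4);
    assert(rgba.size() % 4 == 0);
    const std::size_t texels = rgba.size() / 4;
    std::vector<std::uint8_t> packed;
    packed.reserve(texels * channels);
    for (std::size_t t = 0; t < texels; ++t) {
        for (std::size_t c = 0; c < channels; ++c) {
            packed.push_back(rgba[t * 4 + c]);
        }
    }
    return packed;
}
```

Packing to one or two channels cuts the staging buffer and the resident image to a quarter or half of the RGBA8 size, respectively.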
- Progressive downscale
  - The decode thread downsizes large images by powers of 2 until within Max Upload Dimension, reducing both staging and VRAM. You can increase the cap or disable it (set to 0) from the UI.
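The target size for progressive downscaling can be computed by halving until the larger side fits under the cap. This sketch covers only the dimension arithmetic, under the assumption that both sides are halved together and that a cap of 0 disables the step; Extent and clampToMaxDimension are hypothetical names, and the pixel resampling itself is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Extent {
    std::uint32_t width;
    std::uint32_t height;
};

// Halves both dimensions until the larger one is within maxDim.
// maxDim == 0 means "no cap" and returns the extent unchanged.
Extent clampToMaxDimension(Extent e, std::uint32_t maxDim) {
    assert(e.width > 0 && e.height > 0);
    if (maxDim == 0) return e;
    while (std::max(e.width, e.height) > maxDim) {
        e.width  = std::max<std::uint32_t>(e.width / 2, 1);
        e.height = std::max<std::uint32_t>(e.height / 2, 1);
    }
    return e;
}
```

Each halving reduces the base level to a quarter of its previous byte size, so a single step on an oversized texture already removes most of its staging and VRAM cost.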
Limitations / Future Work
- Linear‑blit capability check
  - generate_mipmaps always uses VK_FILTER_LINEAR. Add a format/feature check and a fallback path (nearest or compute downsample).
- Texture formats
  - Only 8‑bit RGBA uploads via stb_image today. Consider KTX2/BasisU for ASTC/BCn, specialized R8/RG8 paths, and float HDR support (stbi_loadf → R16G16B16A16_SFLOAT).
- Normal‑map mip quality
  - Linear blits reduce normal length; consider a compute renormalization pass.
- Samplers
  - Anisotropy is currently disabled in SamplerManager; enable when supported and expose a knob.
- Minor robustness
  - enqueue_decode() derives the handle via pointer arithmetic on _entries. Passing the precomputed index would avoid any future reallocation hazards.
Operational Tips
- Keep deferred uploads enabled (ResourceManager::set_deferred_uploads(true)) to coalesce copies per frame (engine does this during init).
- To debug VMA allocations and name images, set VE_VMA_DEBUG=1.