Texture Loading & Streaming
Overview
- Streaming cache: `src/core/assets/texture_cache.{h,cpp}` asynchronously decodes images (stb_image) on a small worker pool (1–4 threads, clamped by hardware concurrency) and uploads them via `ResourceManager` with optional mipmaps. For FilePath keys, a sibling `<stem>.ktx2` (or a direct `.ktx2` path) is preferred over PNG/JPEG. Descriptors registered up front are patched in place once the texture becomes resident. Large decodes can be downscaled on workers before upload to cap peak memory.
- Uploads: `src/core/frame/resource.{h,cpp}` stages pixel data and either submits immediately or registers a Render Graph transfer pass. Mipmaps use `vkutil::generate_mipmaps(...)` and finish in `SHADER_READ_ONLY_OPTIMAL`.
- Integration points:
  - Materials: layouts use `UPDATE_AFTER_BIND`; descriptors can be rewritten after bind.
  - glTF loader: `src/scene/vk_loader.cpp` builds keys, requests handles, and registers descriptor patches with the cache.
  - Primitives/ad hoc: `src/core/assets/manager.cpp` builds materials and registers texture watches.
  - Visibility: `src/render/passes/geometry.cpp` and `src/render/passes/transparent.cpp` call `TextureCache::markSetUsed(...)` for sets that are actually drawn.
- IBL: high-dynamic-range environment textures are typically loaded directly as `.ktx2` via `IBLManager` instead of the generic streaming cache. See "Image-Based Lighting (IBL)" below.
Data Flow
- Request
  - Build a `TextureCache::TextureKey` (FilePath or Bytes) and set `srgb` and `mipmapped`.
  - Call `request(key, sampler)`, which returns a stable `TextureHandle` deduplicated by a 64-bit FNV-1a hash. For FilePath keys the path plus the sRGB bit are hashed; for Bytes keys the payload hash is XOR'd with a constant when `srgb=true`.
  - Register target descriptors via `watchBinding(handle, set, binding, sampler, fallbackView)`.
- Visibility-gated scheduling
  - `pumpLoads(...)` looks for entries in the `Unloaded` or `Evicted` state that were seen recently (`now == 0` or `now - lastUsed <= 1`) and starts at most `max_loads_per_pump` decodes per call, while enforcing a per-frame byte budget for uploads.
  - Render passes mark used sets each frame with `markSetUsed(...)` (or specific handles via `markUsed(...)`).
- Decode
  - FilePath: if the path ends with `.ktx2` or a sibling `.ktx2` exists, the cache loads it via libktx (transcoding to BCn if needed). Otherwise, it decodes to RGBA8 via stb_image.
  - Bytes: always decoded via stb_image (no sibling discovery is possible).
- Admission & Upload
  - Before upload, an expected resident size is computed (exact for KTX2 by summing level byte lengths; estimated for raster by format × area × mip factor). A per-frame byte budget (`max_bytes_per_pump`) throttles uploads.
  - If a GPU texture budget is set, the cache evicts least-recently-used textures that were not used this frame. If the texture still cannot fit, the decode is deferred or dropped with backoff.
  - Raster: `ResourceManager::create_image(...)` stages a single region, then optionally generates mips on the GPU.
  - KTX2: `ResourceManager::create_image_compressed(...)` allocates an image with the file's `VkFormat` (from libktx) and records one `VkBufferImageCopy` per mip level (no GPU mip generation). The immediate path transitions to `SHADER_READ_ONLY_OPTIMAL`; the RenderGraph path transitions after the copy when no mip generation is needed.
  - If the device cannot sample the KTX2 format, the cache falls back to raster decode.
  - After upload, the state becomes `Resident`, and descriptors recorded via `watchBinding` are rewritten to the new image view with the chosen sampler and `SHADER_READ_ONLY_OPTIMAL` layout. For Bytes-backed keys, compressed source bytes are dropped unless `keep_source_bytes` is enabled.
- Eviction & Reload
  - `evictToBudget(bytes)` rewrites watchers to fallbacks, destroys images, and marks entries `Evicted`. Evicted entries can reload automatically when seen again once a short cooldown has passed (default ~2 frames), which avoids immediate thrash.
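The scheduling and admission rules above can be modeled in a few lines. This is a minimal sketch, not the engine's `TextureCache` code: `Entry`, `State`, `PumpLimits`, `select_loads`, and `estimate_raster_bytes` are hypothetical names, the load cap and byte budget are folded into one selection pass, the eviction cooldown is ignored, and the 4/3 factor is the usual approximation for a full mip chain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the pumpLoads(...) admission rules.
enum class State { Unloaded, Loading, Resident, Evicted };

struct Entry {
    State state = State::Unloaded;
    uint32_t lastUsed = 0;     // frame index of the last markUsed/markSetUsed
    size_t expectedBytes = 0;  // expected resident size, computed before upload
};

struct PumpLimits {
    unsigned maxLoadsPerPump = 4;                        // documented default
    size_t maxBytesPerPump = size_t(128) * 1024 * 1024;  // documented default: 128 MiB
};

// Raster estimate: width * height * bytes per pixel, plus ~1/3 for a full mip chain.
size_t estimate_raster_bytes(uint32_t w, uint32_t h, uint32_t bytesPerPixel, bool mipmapped) {
    const size_t base = size_t(w) * h * bytesPerPixel;
    return mipmapped ? base + base / 3 : base;
}

// Returns the indices of entries whose decode starts during this pump.
std::vector<size_t> select_loads(std::vector<Entry>& entries, uint32_t now, const PumpLimits& lim) {
    std::vector<size_t> started;
    size_t bytesThisPump = 0;
    for (size_t i = 0; i < entries.size() && started.size() < lim.maxLoadsPerPump; ++i) {
        Entry& e = entries[i];
        if (e.state != State::Unloaded && e.state != State::Evicted) continue;
        const bool seenRecently = (now == 0) || (now - e.lastUsed <= 1);
        if (!seenRecently) continue;  // not drawn recently: leave it for a later frame
        // Over budget: defer. The first admission is always allowed so a single
        // oversized texture cannot block itself forever (an assumption of this sketch).
        if (bytesThisPump > 0 && bytesThisPump + e.expectedBytes > lim.maxBytesPerPump) continue;
        bytesThisPump += e.expectedBytes;
        e.state = State::Loading;
        started.push_back(i);
    }
    return started;
}
```

With the documented defaults, at most four 1024×1024 RGBA8 textures (about 5.3 MiB each with mips) start per pump; shrinking the byte budget below two of them admits only one.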
Runtime UI
- ImGui → Debug → Textures (see `src/core/engine.cpp`)
  - Shows: device-local budget/usage (from VMA), texture streaming budget (~35% of device-local memory by default), resident MiB, CPU source MiB, counts per state, and a Top-N table of consumers.
  - Controls: `Loads/Frame`, `Upload Budget (MiB)` (byte-based throttle), `Keep Source Bytes`, `CPU Source Budget (MiB)`, `Max Upload Dimension` (progressive downscale cap), and `Trim To Budget Now`.
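The `Max Upload Dimension` control caps decode size by repeated halving. The sketch below illustrates that idea, assuming tightly packed RGBA8 pixels and a 2×2 box filter. `downscaled_extent` and `halve_rgba8` are hypothetical helpers, not engine functions, and the filter the engine's decode workers actually use is not specified here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: halve both sides until they fit within maxDim.
// maxDim == 0 disables the cap, matching the UI behavior described above.
std::pair<uint32_t, uint32_t> downscaled_extent(uint32_t w, uint32_t h, uint32_t maxDim) {
    if (maxDim == 0) return {w, h};
    while ((w > maxDim || h > maxDim) && (w > 1 || h > 1)) {
        w = std::max<uint32_t>(w / 2, 1);
        h = std::max<uint32_t>(h / 2, 1);
    }
    return {w, h};
}

// One 2x2 box-filter step on tightly packed RGBA8 pixels. Odd trailing rows or
// columns are skipped; a side that is already 1 stays 1 (std::min clamps the sample).
std::vector<uint8_t> halve_rgba8(const std::vector<uint8_t>& src, uint32_t w, uint32_t h) {
    assert(src.size() == size_t(w) * h * 4);
    const uint32_t nw = std::max<uint32_t>(w / 2, 1);
    const uint32_t nh = std::max<uint32_t>(h / 2, 1);
    std::vector<uint8_t> dst(size_t(nw) * nh * 4);
    for (uint32_t y = 0; y < nh; ++y)
        for (uint32_t x = 0; x < nw; ++x)
            for (uint32_t c = 0; c < 4; ++c) {
                uint32_t sum = 0;
                for (uint32_t dy = 0; dy < 2; ++dy)
                    for (uint32_t dx = 0; dx < 2; ++dx) {
                        const uint32_t sx = std::min(2 * x + dx, w - 1);
                        const uint32_t sy = std::min(2 * y + dy, h - 1);
                        sum += src[(size_t(sy) * w + sx) * 4 + c];
                    }
                dst[(size_t(y) * nw + x) * 4 + c] = uint8_t((sum + 2) / 4);  // rounded average
            }
    return dst;
}
```

Averaging sRGB-encoded bytes directly is only an approximation for color textures. Converting to linear before averaging is more accurate but costs more time on the decode workers.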
Key APIs (`src/core/assets/texture_cache.h`)
- `TextureHandle request(const TextureKey&, VkSampler)`
- `void watchBinding(TextureHandle, VkDescriptorSet, uint32_t binding, VkSampler, VkImageView fallback)`
- `void unwatchSet(VkDescriptorSet)`: call before destroying descriptor pools/sets
- `void markSetUsed(VkDescriptorSet, uint32_t frameIndex)` and `void markUsed(TextureHandle, uint32_t frameIndex)`
- `void pumpLoads(ResourceManager&, FrameResources&)`
- `void evictToBudget(size_t bytes)`
- Controls: `set_max_loads_per_pump`, `set_keep_source_bytes`, `set_cpu_source_budget`, `set_gpu_budget_bytes`
Defaults & Budgets
- Worker threads: 1–4 decode threads depending on hardware.
- Loads per pump: default 4.
- Upload byte budget: default 128 MiB per frame.
- GPU budget: unlimited by default; the engine sets it every frame. It computes a budget of ~35% of device-local memory (queried via VMA), calls `set_gpu_budget_bytes(...)`, and then runs `evictToBudget(...)` and `pumpLoads(...)` in the frame loop (`src/core/engine.cpp`).
- CPU source bytes: default budget 64 MiB; `keep_source_bytes` defaults to false. Retention only applies to entries created from Bytes keys.
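The interaction between the ~35% GPU budget and `evictToBudget(...)` can be modeled as follows. The sketch evicts least-recently-used resident textures and never touches ones used in the current frame. `ResidentTex`, `streaming_budget`, and `evict_to_budget` are hypothetical names. The real cache also rewrites watchers to fallbacks and destroys images, which appears here only as a comment. A 64-bit `size_t` is assumed so the percentage multiplication cannot overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ResidentTex {
    size_t bytes = 0;
    uint32_t lastUsed = 0;  // frame index of last use
    bool resident = true;
};

// Default streaming budget: ~35% of device-local memory (assumes 64-bit size_t).
size_t streaming_budget(size_t deviceLocalBytes) {
    return deviceLocalBytes * 35 / 100;
}

// Evicts least-recently-used resident textures that were not used in frame `now`
// until the resident total fits `budget`. Returns the remaining resident bytes,
// which can still exceed the budget if everything left was used this frame.
size_t evict_to_budget(std::vector<ResidentTex>& texs, size_t budget, uint32_t now) {
    size_t total = 0;
    std::vector<size_t> candidates;
    for (size_t i = 0; i < texs.size(); ++i) {
        if (!texs[i].resident) continue;
        total += texs[i].bytes;
        if (texs[i].lastUsed != now) candidates.push_back(i);
    }
    // Oldest first.
    std::sort(candidates.begin(), candidates.end(),
              [&](size_t a, size_t b) { return texs[a].lastUsed < texs[b].lastUsed; });
    for (size_t i : candidates) {
        if (total <= budget) break;
        // The real cache would rewrite watchers to fallbacks and destroy the image here.
        texs[i].resident = false;
        total -= texs[i].bytes;
    }
    return total;
}
```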
Examples
- Asset materials (`src/core/assets/manager.cpp`)
  - Create materials with visible fallbacks (checkerboard/white/flat-normal), then:
    - Build a key from an asset path, call `request(key, sampler)`, and call `watchBinding(handle, materialSet, binding, sampler, fallbackView)` for albedo (binding 1), metal-rough (2), and normal (3).
- glTF loader (`src/scene/vk_loader.cpp`)
  - Builds keys from URI/Vector/BufferView sources, requests handles, and registers watches for material textures. On teardown, it calls `unwatchSet(materialSet)` before resetting descriptor pools to avoid patching dead sets. The geometry/transparent passes mark used sets each frame.
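The watch-and-patch pattern these examples rely on is shown below as a toy model with no Vulkan dependency. `ToyTextureCache`, `makeResident`, `bound`, and the string-typed handles are stand-ins invented for illustration. The real `request`/`watchBinding`/`unwatchSet` take samplers and Vulkan handles and patch sets through a `DescriptorWriter`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Handle = uint32_t;
using SetId = uint32_t;                   // stands in for VkDescriptorSet
using ViewId = std::string;               // stands in for VkImageView
using Slot = std::pair<SetId, uint32_t>;  // (set, binding)

struct ToyTextureCache {
    std::map<std::string, Handle> byKey;  // dedup: same key -> same handle
    std::vector<ViewId> residentView;     // empty string = not resident yet
    std::map<Slot, ViewId> descriptors;   // what each (set, binding) currently points at
    std::vector<std::pair<Handle, Slot>> watches;

    Handle request(const std::string& key) {
        auto it = byKey.find(key);
        if (it != byKey.end()) return it->second;
        const Handle h = static_cast<Handle>(residentView.size());
        residentView.emplace_back();
        byKey.emplace(key, h);
        return h;
    }

    void watchBinding(Handle h, SetId set, uint32_t binding, const ViewId& fallback) {
        const Slot slot{set, binding};
        // Point at the fallback now, or at the real view if already resident.
        descriptors[slot] = residentView[h].empty() ? fallback : residentView[h];
        watches.emplace_back(h, slot);
    }

    // Drop all pending patches for a set that is about to be destroyed.
    void unwatchSet(SetId set) {
        watches.erase(std::remove_if(watches.begin(), watches.end(),
                                     [set](const std::pair<Handle, Slot>& w) { return w.second.first == set; }),
                      watches.end());
    }

    // Stand-in for "upload finished": record the view and patch every watcher.
    void makeResident(Handle h, const ViewId& view) {
        residentView[h] = view;
        for (const auto& [wh, slot] : watches)
            if (wh == h) descriptors[slot] = view;
    }

    ViewId bound(SetId set, uint32_t binding) const {
        auto it = descriptors.find(Slot{set, binding});
        return it == descriptors.end() ? ViewId{} : it->second;
    }
};
```

In the toy model a binding reads its fallback until `makeResident` runs. This mirrors how materials show checkerboard/white/flat-normal textures until streaming completes.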
Implementation Notes
- Uploads and layouts
  - Deferred uploads: the RG transfer pass transitions `UNDEFINED → TRANSFER_DST_OPTIMAL`, copies, and then either generates mipmaps (finishing in `SHADER_READ_ONLY_OPTIMAL`) or transitions directly to `SHADER_READ_ONLY_OPTIMAL`. No extra transition is needed after mip generation.
- Descriptor rewrites
  - Material descriptor sets and pools are created with `UPDATE_AFTER_BIND` flags; patches are applied safely across frames using a `DescriptorWriter`.
- Key hashing
  - 64-bit FNV-1a for deduplication. FilePath keys hash `PATH:<path>#(sRGB|UNORM)`. Bytes keys hash the payload and XOR an sRGB tag when requested.
- Format selection and channel packing
  - `TextureKey::channels` can be `Auto` (default), `R`, `RG`, or `RGBA`. The cache chooses `VK_FORMAT_R8`/`R8G8`/`RGBA8` (sRGB variants when requested) and packs channels on the CPU for `R`/`RG` to reduce staging and VRAM usage.
- Progressive downscale
  - The decode thread downsizes large images by powers of 2 until they fit within `Max Upload Dimension`, reducing both staging and VRAM usage. You can raise the cap or disable it (set it to 0) from the UI.
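The key hashing note can be sketched as follows. The FNV-1a constants are the standard 64-bit offset basis and prime, and the `PATH:<path>#sRGB`/`#UNORM` layout follows the note above. `kSrgbTag` is a hypothetical value because the engine's actual XOR constant is not documented here, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

constexpr uint64_t kFnvOffsetBasis = 0xcbf29ce484222325ull;  // standard FNV-1a 64-bit basis
constexpr uint64_t kFnvPrime = 0x100000001b3ull;             // standard FNV 64-bit prime

uint64_t fnv1a64(const void* data, size_t len) {
    const auto* p = static_cast<const unsigned char*>(data);
    uint64_t h = kFnvOffsetBasis;
    for (size_t i = 0; i < len; ++i) {
        h ^= p[i];       // FNV-1a: XOR the byte first...
        h *= kFnvPrime;  // ...then multiply (wraps mod 2^64)
    }
    return h;
}

// FilePath keys: hash "PATH:<path>#sRGB" or "PATH:<path>#UNORM".
uint64_t hash_path_key(std::string_view path, bool srgb) {
    std::string s = "PATH:";
    s += path;
    s += srgb ? "#sRGB" : "#UNORM";
    return fnv1a64(s.data(), s.size());
}

// Bytes keys: hash the payload, then XOR a tag when sRGB is requested.
constexpr uint64_t kSrgbTag = 0x9e3779b97f4a7c15ull;  // hypothetical tag value

uint64_t hash_bytes_key(const std::vector<uint8_t>& bytes, bool srgb) {
    const uint64_t h = fnv1a64(bytes.data(), bytes.size());
    return srgb ? (h ^ kSrgbTag) : h;
}
```

Folding the sRGB bit into the hash lets the same file be requested once as color (sRGB) and once as data (UNORM) without the two entries colliding.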
KTX2 specifics
- Supported: 2D, single-face, single-layer KTX2. BasisLZ/UASTC payloads are transcoded to BCn by libktx. sRGB/UNORM is read from the file's DFD and can be nudged by the request (albedo sRGB, MR/normal UNORM).
- Not supported: cube/array/multilayer KTX2 in the generic cache path (it assumes single-layer 2D). Cubemap KTX2 for IBL is loaded via `IBLManager` (see below).
Limitations / Future Work
- Linear-blit capability check
  - `vkutil::generate_mipmaps`/`generate_mipmaps_levels` always use `VK_FILTER_LINEAR` for blits without checking `VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT`. Add a per-format capability check (blits also require `VK_FORMAT_FEATURE_BLIT_SRC_BIT` and `VK_FORMAT_FEATURE_BLIT_DST_BIT`) and a fallback path (nearest filtering or a compute downsample) for formats that do not support linear filtering (especially some compressed formats).
- Texture formats
  - Raster path: limited to 8-bit R/RG/RGBA via `stbi_load`. The KTX2 path in `TextureCache::worker_loop` currently accepts only BCn/BC6H formats and rejects other VkFormats returned by libktx (e.g., uncompressed `R16G16B16A16_SFLOAT`). Future work: ASTC/ETC2, specialized R8/RG8 parsing, and float HDR support (`stbi_loadf` → `R16G16B16A16_SFLOAT`) so HDR albedo/lighting data can stream through the generic cache (today HDR IBL uses the separate `IBLManager` path).
- Normal-map mip quality
  - Normal maps share the same linear blit pipeline as color textures; no renormalization pass runs after mip generation. Consider a compute or fragment pass to renormalize normal-map mips (or a dedicated normal-aware downsample) to improve shading at grazing angles and distant LODs.
- Samplers
  - Anisotropy is currently disabled in `SamplerManager` (`anisotropyEnable = VK_FALSE`). Enable it when the `samplerAnisotropy` device feature is present, expose a knob in the Debug UI, and consider per-material/per-texture anisotropy settings.
- Minor robustness
  - `enqueue_decode()` computes the handle from the entry pointer (`&e - _entries.data()`) and passes it to worker threads. This is safe as long as `_entries` is not resized during enqueue, but storing the index explicitly when the entry is created (in `request()`) would make the relationship clearer and more robust against future refactors.
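For the linear-blit item, one possible capability check is sketched below. It assumes the caller has already queried `VkFormatProperties::optimalTilingFeatures` (e.g., via `vkGetPhysicalDeviceFormatProperties`). The flag values are copied from `VkFormatFeatureFlagBits` so the sketch compiles without `vulkan.h`. `MipGenPath` and `choose_mip_gen_path` are hypothetical names, and the compute fallback is only a label here.

```cpp
#include <cassert>
#include <cstdint>

// Values mirror VkFormatFeatureFlagBits from the Vulkan specification.
constexpr uint32_t kBlitSrc = 0x00000400;       // VK_FORMAT_FEATURE_BLIT_SRC_BIT
constexpr uint32_t kBlitDst = 0x00000800;       // VK_FORMAT_FEATURE_BLIT_DST_BIT
constexpr uint32_t kFilterLinear = 0x00001000;  // VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT

enum class MipGenPath { BlitLinear, BlitNearest, ComputeDownsample };

MipGenPath choose_mip_gen_path(uint32_t optimalTilingFeatures) {
    const bool canBlit = (optimalTilingFeatures & kBlitSrc) && (optimalTilingFeatures & kBlitDst);
    if (!canBlit) return MipGenPath::ComputeDownsample;  // vkCmdBlitImage is not allowed at all
    if (optimalTilingFeatures & kFilterLinear) return MipGenPath::BlitLinear;
    return MipGenPath::BlitNearest;  // VK_FILTER_LINEAR would violate valid usage
}
```

Mip generation blits from one level of an image to another level of the same image, so both the source and destination features apply to the same format. That is why the sketch checks all three bits on a single `optimalTilingFeatures` mask.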
Operational Tips
- Keep deferred uploads enabled (`ResourceManager::set_deferred_uploads(true)`) to coalesce copies per frame (the engine does this during init).
- To debug VMA allocations and name images, set `VE_VMA_DEBUG=1`.
Image‑Based Lighting (IBL) Textures
- Manager: `src/core/assets/ibl_manager.{h,cpp}` owns IBL GPU resources and the shared descriptor set layout for set=3.
- Inputs (`IBLPaths`):
  - `specularCube`: preferably a GPU-ready `.ktx2` (BC6H or `R16G16B16A16_SFLOAT`) containing either a cubemap or an equirectangular 2D environment with prefiltered mips.
  - `diffuseCube`: optional `.ktx2` cubemap for diffuse irradiance. If missing, diffuse IBL falls back to SH only.
  - `brdfLut2D`: `.ktx2` 2D RG LUT (e.g., `VK_FORMAT_R8G8_UNORM` or BC5).
- Loading:
  - Specular:
    - If `specularCube` is a cubemap `.ktx2`, `IBLManager` uses `ktxutil::load_ktx2_cubemap` and uploads via `ResourceManager::create_image_compressed_layers`, preserving the file's format and mip chain.
    - If the cubemap load fails, it falls back to a 2D `.ktx2` via `ktxutil::load_ktx2_2d` + `ResourceManager::create_image_compressed`. The image is treated as equirectangular with prefiltered mips and sampled with explicit LOD in shaders.
    - If the format is float HDR (`R16G16B16A16_SFLOAT` or `R32G32B32A32_SFLOAT`) and the aspect ratio is 2:1, `IBLManager` additionally computes 2nd-order SH coefficients (9 × vec3) on a worker thread and uploads them to a UBO (`_shBuffer`) when `pump_async()` is called on the main thread.
  - Diffuse (optional):
    - If `diffuseCube` is provided and valid, it is uploaded as a cubemap using `create_image_compressed_layers`. Current shaders use the SH buffer for diffuse; this cubemap can be wired into a future path that samples it directly.
  - BRDF LUT:
    - `brdfLut2D` is loaded as a 2D `.ktx2` via `ktxutil::load_ktx2_2d` and uploaded with `create_image_compressed`.
  - Fallbacks:
    - `LightingPass` and `TransparentPass` create tiny 1×1 UNORM textures (grey 2D for the environment, RG for the BRDF LUT) so shaders can safely sample IBL bindings even when IBL assets are not loaded.
- Descriptor layout & bindings:
  - `IBLManager::ensureLayout()` creates a descriptor set layout for set=3 with:
    - binding 0: `COMBINED_IMAGE_SAMPLER` for the specular environment (2D equirect with mips, or a cubemap sampled via the 2D path).
    - binding 1: `COMBINED_IMAGE_SAMPLER` for the 2D BRDF LUT.
    - binding 2: `UNIFORM_BUFFER` for the SH coefficients (`vec4 sh[9]`, RGB in `.xyz`).
  - Render passes that use IBL fetch this layout from `EngineContext::ibl` and allocate per-frame sets:
    - `passes/lighting.cpp`: deferred lighting (set=3).
    - `passes/transparent.cpp`: forward/transparent PBR materials (set=3).
    - `passes/background.cpp`: environment background (set=3; only binding 0 is used in the shader).
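The SH step that `IBLManager` runs on a worker thread can be illustrated as a plain projection of an equirectangular float image onto the 9 real SH basis functions (bands 0 to 2). Each texel is weighted by its solid angle `sin(theta) * dTheta * dPhi`. This is a sketch under assumptions. The axis convention (polar angle from +Z across rows, azimuth across columns), the `RGB`/`SH9` types, and `project_equirect_sh9` are hypothetical. It is also not stated here whether the engine pre-applies the cosine-lobe convolution factors (π, 2π/3, π/4 per band) before uploading to `_shBuffer`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using RGB = std::array<float, 3>;
using SH9 = std::array<RGB, 9>;

// Real SH basis for bands l = 0..2, ordered (m = -l..l) within each band.
std::array<float, 9> sh_basis(float x, float y, float z) {
    return {0.282095f,
            0.488603f * y, 0.488603f * z, 0.488603f * x,
            1.092548f * x * y, 1.092548f * y * z, 0.315392f * (3.0f * z * z - 1.0f),
            1.092548f * x * z, 0.546274f * (x * x - y * y)};
}

// Projects a row-major equirectangular float RGB image (width = 2 * height) onto 9 SH coefficients.
SH9 project_equirect_sh9(const std::vector<RGB>& pixels, int width, int height) {
    const double pi = 3.14159265358979323846;
    const double dTheta = pi / height, dPhi = 2.0 * pi / width;
    std::array<std::array<double, 3>, 9> acc{};
    for (int row = 0; row < height; ++row) {
        const double theta = (row + 0.5) * dTheta;                   // texel-center polar angle
        const double dOmega = std::sin(theta) * dTheta * dPhi;       // solid angle of one texel
        for (int col = 0; col < width; ++col) {
            const double phi = (col + 0.5) * dPhi;                   // texel-center azimuth
            const float x = float(std::sin(theta) * std::cos(phi));
            const float y = float(std::sin(theta) * std::sin(phi));
            const float z = float(std::cos(theta));
            const auto basis = sh_basis(x, y, z);
            const RGB& L = pixels[size_t(row) * width + col];
            for (int i = 0; i < 9; ++i)
                for (int c = 0; c < 3; ++c) acc[i][c] += L[c] * basis[i] * dOmega;
        }
    }
    SH9 out{};
    for (int i = 0; i < 9; ++i)
        for (int c = 0; c < 3; ++c) out[i][c] = float(acc[i][c]);
    return out;
}
```

For a constant unit radiance, only the band-0 coefficient is non-zero, at about 2√π ≈ 3.545. Padding each `RGB` to a `vec4` gives the `vec4 sh[9]` layout used by binding 2.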