Texture Loading & Streaming

Overview

- Streaming cache: `src/core/texture_cache.{h,cpp}` asynchronously decodes images (stb_image) on a small worker pool (1–4 threads, clamped by hardware concurrency) and uploads them via `ResourceManager` with optional mipmaps. For FilePath keys, a sibling `.ktx2` (or direct `.ktx2`) is preferred over PNG/JPEG. Descriptors registered up‑front are patched in‑place once the texture becomes resident. Large decodes can be downscaled on workers before upload to cap peak memory.
- Uploads: `src/core/vk_resource.{h,cpp}` stages pixel data and either submits immediately or registers a Render Graph transfer pass. Mipmaps use `vkutil::generate_mipmaps(...)` and finish in `SHADER_READ_ONLY_OPTIMAL`.
- Integration points:
  - Materials: layouts use `UPDATE_AFTER_BIND`; descriptors can be rewritten after bind.
  - glTF loader: `src/scene/vk_loader.cpp` builds keys, requests handles, and registers descriptor patches with the cache.
  - Primitives/adhoc: `src/core/asset_manager.cpp` builds materials and registers texture watches.
  - Visibility: `src/render/vk_renderpass_geometry.cpp` and `src/render/vk_renderpass_transparent.cpp` call `TextureCache::markSetUsed(...)` for sets that are actually drawn.

Data Flow

- Request
  - Build a `TextureCache::TextureKey` (FilePath or Bytes), set `srgb` and `mipmapped`.
  - Call `request(key, sampler)` → returns a stable `TextureHandle`, deduplicated by a 64‑bit FNV‑1a hash. For FilePath keys the path plus the sRGB bit are hashed; for Bytes keys the payload hash is XOR’d with a constant when `srgb=true`.
  - Register target descriptors via `watchBinding(handle, set, binding, sampler, fallbackView)`.
- Visibility‑gated scheduling
  - `pumpLoads(...)` looks for entries in `Unloaded` or `Evicted` state that were seen recently (`now == 0` or `now - lastUsed <= 1`) and starts at most `max_loads_per_pump` decodes per call, while enforcing a byte budget for uploads per frame.
  - Render passes mark used sets each frame with `markSetUsed(...)` (or specific handles via `markUsed(...)`).
- Decode
  - FilePath: if the path ends with `.ktx2` or a sibling exists, parse KTX2 (2D, single‑face, single‑layer, no supercompression). Otherwise, decode to RGBA8 via stb_image.
  - Bytes: always decode via stb_image (no sibling discovery possible).
- Admission & Upload
  - Before upload, an expected resident size is computed (exact for KTX2 by summing level byte lengths; estimated for raster by format×area×mip‑factor). A per‑frame byte budget (`max_bytes_per_pump`) throttles uploads.
  - If a GPU texture budget is set, the cache evicts least‑recently‑used textures not used this frame. If it still cannot fit, the decode is deferred or dropped with backoff.
  - Raster: `ResourceManager::create_image(...)` stages a single region, then optionally generates mips on GPU.
  - KTX2: `ResourceManager::create_image_compressed(...)` allocates an image with the file’s `VkFormat` and records one `VkBufferImageCopy` per mip level (no GPU mip gen). Immediate path transitions to `SHADER_READ_ONLY_OPTIMAL`; RG path leaves it in `TRANSFER_DST` until a sampling pass.
  - If the device cannot sample the KTX2 format, the cache falls back to raster decode.
  - After upload: state → `Resident`, descriptors recorded via `watchBinding` are rewritten to the new image view with the chosen sampler and `SHADER_READ_ONLY_OPTIMAL` layout. For Bytes‑backed keys, compressed source bytes are dropped unless `keep_source_bytes` is enabled.
- Eviction & Reload
  - `evictToBudget(bytes)` rewrites watchers to fallbacks, destroys images, and marks entries `Evicted`. Evicted entries can reload automatically when seen again and a short cooldown has passed (default ~2 frames), avoiding immediate thrash.
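
The key deduplication and the visibility gate described in this section can be sketched as follows. This is a minimal illustration, not the cache's actual code: `fnv1a64`, `hash_file_key`, `hash_bytes_key`, `seen_recently`, and `kSrgbTag` are hypothetical names, the tag constant is an arbitrary placeholder, and the exact string layout hashed for FilePath keys is an assumption.

```cpp
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a over a byte range (standard offset basis and prime).
// The second parameter lets several pieces be hashed as one stream.
inline std::uint64_t fnv1a64(std::string_view bytes,
                             std::uint64_t hash = 14695981039346656037ull) {
    for (unsigned char c : bytes) {
        hash ^= c;
        hash *= 1099511628211ull;
    }
    return hash;
}

// Placeholder tag (assumption): the constant used by the real cache is not
// stated in this document.
constexpr std::uint64_t kSrgbTag = 0x9E3779B97F4A7C15ull;

// FilePath keys: hash the path together with a colour-space suffix so the
// sRGB and UNORM variants of the same file get distinct handles.
// The "PATH:<path>#sRGB" layout is assumed for illustration.
inline std::uint64_t hash_file_key(std::string_view path, bool srgb) {
    std::uint64_t h = fnv1a64("PATH:");
    h = fnv1a64(path, h);
    return fnv1a64(srgb ? "#sRGB" : "#UNORM", h);
}

// Bytes keys: hash the payload, then XOR a tag when sRGB is requested.
inline std::uint64_t hash_bytes_key(std::string_view payload, bool srgb) {
    const std::uint64_t h = fnv1a64(payload);
    return srgb ? (h ^ kSrgbTag) : h;
}

// Visibility gate as described for pumpLoads: an entry is eligible when
// now == 0 or when it was marked used within the last frame.
// Unsigned wrap-around makes a lastUsed in the "future" ineligible.
inline bool seen_recently(std::uint32_t now, std::uint32_t lastUsed) {
    return now == 0 || now - lastUsed <= 1;
}
```

Because the colour-space bit participates in the hash, requesting the same image once as sRGB and once as UNORM yields two cache entries, which matches the deduplication behaviour described above.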
Runtime UI

- ImGui → Debug → Textures (see `src/core/vk_engine.cpp`)
  - Shows: device‑local budget/usage (from VMA), texture streaming budget (~35% of device‑local by default), resident MiB, CPU source MiB, counts per state, and a Top‑N table of consumers.
  - Controls: `Loads/Frame`, `Upload Budget (MiB)` (byte‑based throttle), `Keep Source Bytes`, `CPU Source Budget (MiB)`, `Max Upload Dimension` (progressive downscale cap), and `Trim To Budget Now`.

Key APIs (src/core/texture_cache.h)

- `TextureHandle request(const TextureKey&, VkSampler)`
- `void watchBinding(TextureHandle, VkDescriptorSet, uint32_t binding, VkSampler, VkImageView fallback)`
- `void unwatchSet(VkDescriptorSet)`: call before destroying descriptor pools/sets
- `void markSetUsed(VkDescriptorSet, uint32_t frameIndex)` and `void markUsed(TextureHandle, uint32_t frameIndex)`
- `void pumpLoads(ResourceManager&, FrameResources&)`
- `void evictToBudget(size_t bytes)`
- Controls: `set_max_loads_per_pump`, `set_keep_source_bytes`, `set_cpu_source_budget`, `set_gpu_budget_bytes`

Defaults & Budgets

- Worker threads: 1–4 decode threads depending on hardware.
- Loads per pump: default 4.
- Upload byte budget: default 128 MiB per frame.
- GPU budget: unlimited until the engine sets one each frame. The engine queries ~35% of device‑local memory (via VMA) and calls `set_gpu_budget_bytes(...)`, then runs `evictToBudget(...)` and `pumpLoads(...)` during the frame loop (`src/core/vk_engine.cpp`).
- CPU source bytes: default budget 64 MiB; `keep_source_bytes` defaults to false. Retention only applies to entries created from Bytes keys.

Examples

- Asset materials (`src/core/asset_manager.cpp`)
  - Create materials with visible fallbacks (checkerboard/white/flat‑normal), then:
    - Build a key from an asset path, `request(key, sampler)`, and `watchBinding(handle, materialSet, binding, sampler, fallbackView)` for albedo (1), metal‑rough (2), normal (3).
- glTF loader (`src/scene/vk_loader.cpp`)
  - Builds keys from URI/Vector/BufferView sources, requests handles, and registers watches for material textures. On teardown, calls `unwatchSet(materialSet)` before resetting descriptor pools to avoid patching dead sets. The geometry/transparent passes mark used sets each frame.

Implementation Notes

- Uploads and layouts
  - Deferred uploads: the RG transfer pass transitions `UNDEFINED → TRANSFER_DST_OPTIMAL`, copies, and either generates mipmaps (finishing in `SHADER_READ_ONLY_OPTIMAL`) or transitions directly there. No extra transition is needed after mip gen.
- Descriptor rewrites
  - Material descriptor sets and pools are created with `UPDATE_AFTER_BIND` flags; patches are applied safely across frames using a `DescriptorWriter`.
- Key hashing
  - 64‑bit FNV‑1a for dedup. FilePath keys hash `PATH:#(sRGB|UNORM)`. Bytes keys hash the payload and XOR an sRGB tag when requested.
- Format selection and channel packing
  - `TextureKey::channels` can be `Auto` (default), `R`, `RG`, or `RGBA`. The cache chooses `VK_FORMAT_R8/R8G8/RGBA8` (sRGB variants when requested) and packs channels on CPU for `R`/`RG` to reduce staging + VRAM.
- Progressive downscale
  - The decode thread downsizes large images by powers of 2 until within `Max Upload Dimension`, reducing both staging and VRAM. You can increase the cap or disable it (set to 0) from the UI.

KTX2 specifics

- Supported: 2D, single‑face, single‑layer, no supercompression; pre‑transcoded BCn (including sRGB variants).
- Not supported: UASTC/BasisLZ transcoding at runtime, cube/array/multilayer.

Limitations / Future Work

- Linear‑blit capability check
  - `generate_mipmaps` always uses `VK_FILTER_LINEAR`. Add a format/feature check and a fallback path (nearest or compute downsample).
- Texture formats
  - Raster path: 8‑bit R/RG/RGBA via stb_image. Compressed path: BCn via `.ktx2`. Future: ASTC/ETC2, specialized R8/RG8 parsing, and float HDR support (`stbi_loadf` → `R16G16B16A16_SFLOAT`).
- Normal‑map mip quality
  - Linear blits reduce normal length; consider a compute renormalization pass.
- Samplers
  - Anisotropy is currently disabled in `SamplerManager`; enable when supported and expose a knob.
- Minor robustness
  - `enqueue_decode()` derives the handle via pointer arithmetic on `_entries`. Passing the precomputed index would avoid any future reallocation hazards.

Operational Tips

- Keep deferred uploads enabled (`ResourceManager::set_deferred_uploads(true)`) to coalesce copies per frame (engine does this during init).
- To debug VMA allocations and name images, set `VE_VMA_DEBUG=1`.
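
When tuning `Max Upload Dimension` and the upload budget, it can help to estimate what a given source image will cost once resident. The sketch below is a rough sizing aid under stated assumptions, not the cache's implementation: `Extent2D`, `progressive_downscale`, and `estimate_resident_bytes` are hypothetical names, and the estimate sums the mip chain level by level rather than applying the fixed mip factor the cache uses for raster images.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type for this sketch.
struct Extent2D {
    std::uint32_t width;
    std::uint32_t height;
};

// Halve both dimensions until the larger one fits within maxDim.
// maxDim == 0 means the cap is disabled, mirroring the UI setting.
inline Extent2D progressive_downscale(Extent2D e, std::uint32_t maxDim) {
    if (maxDim == 0) return e;
    while (std::max(e.width, e.height) > maxDim) {
        e.width  = std::max<std::uint32_t>(1, e.width / 2);
        e.height = std::max<std::uint32_t>(1, e.height / 2);
    }
    return e;
}

// Estimate resident bytes for an uncompressed image. With mipmaps, every
// level down to 1x1 is summed; for large images the total approaches
// 4/3 of the base level.
inline std::size_t estimate_resident_bytes(Extent2D e,
                                           std::size_t bytesPerTexel,
                                           bool mipmapped) {
    assert(bytesPerTexel > 0);
    if (e.width == 0 || e.height == 0) return 0;

    std::size_t total = 0;
    std::uint32_t w = e.width;
    std::uint32_t h = e.height;
    for (;;) {
        total += static_cast<std::size_t>(w) * h * bytesPerTexel;
        if (!mipmapped || (w == 1 && h == 1)) break;
        w = std::max<std::uint32_t>(1, w / 2);
        h = std::max<std::uint32_t>(1, h / 2);
    }
    return total;
}
```

For example, an 8192×4096 RGBA8 source capped at 2048 would be uploaded at 2048×1024, so `estimate_resident_bytes(progressive_downscale({8192, 4096}, 2048), 4, true)` gives the figure to compare against the per-frame upload budget and the GPU budget.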