diff --git a/docs/TextureLoading.md b/docs/TextureLoading.md
index 282c06d..71913f7 100644
--- a/docs/TextureLoading.md
+++ b/docs/TextureLoading.md
@@ -1,7 +1,7 @@
 Texture Loading & Streaming
 
 Overview
-- Streaming cache: `src/core/texture_cache.{h,cpp}` asynchronously decodes images (stb_image) on a small worker pool (1–4 threads, clamped by hardware concurrency) and uploads them via `ResourceManager` with optional mipmaps. Descriptors registered up‑front are patched in‑place once the texture becomes resident. Large decodes can be downscaled on workers before upload to cap peak memory.
+- Streaming cache: `src/core/texture_cache.{h,cpp}` asynchronously decodes images (stb_image) on a small worker pool (1–4 threads, clamped by hardware concurrency) and uploads them via `ResourceManager` with optional mipmaps. For FilePath keys, a sibling `.ktx2` (or a direct `.ktx2` path) is preferred over PNG/JPEG. Descriptors registered up‑front are patched in‑place once the texture becomes resident. Large decodes can be downscaled on workers before upload to cap peak memory.
 - Uploads: `src/core/vk_resource.{h,cpp}` stages pixel data and either submits immediately or registers a Render Graph transfer pass. Mipmaps use `vkutil::generate_mipmaps(...)` and finish in `SHADER_READ_ONLY_OPTIMAL`.
 - Integration points:
   - Materials: layouts use `UPDATE_AFTER_BIND`; descriptors can be rewritten after bind.
@@ -18,11 +18,14 @@ Data Flow
 - `pumpLoads(...)` looks for entries in `Unloaded` or `Evicted` state that were seen recently (`now == 0` or `now - lastUsed <= 1`) and starts at most `max_loads_per_pump` decodes per call, while enforcing a byte budget for uploads per frame.
 - Render passes mark used sets each frame with `markSetUsed(...)` (or specific handles via `markUsed(...)`).
 - Decode
-  - Worker threads decode to RGBA8 with stb_image (`stbi_load` / `stbi_load_from_memory`). Results are queued for the main thread.
-- Admission & Upload
-  - Before upload, an expected resident size is computed from chosen format (R/RG/RGBA) and mip count (full chain or clamped). A per‑frame byte budget (`max_bytes_per_pump`) throttles the total amount uploaded each pump.
-  - If a GPU texture budget is set, the cache tries to free space by evicting least‑recently‑used textures not used this frame. If it still cannot fit, the decode is deferred (kept in the ready queue) or dropped with backoff if VRAM is tight.
-  - Uploads are created via `ResourceManager::create_image(...)`, which now supports an explicit mip count. Deferred upload paths generate exactly the requested number of mips.
+  - FilePath: if the path ends with `.ktx2` or a sibling `.ktx2` exists, parse KTX2 (2D, single‑face, single‑layer, no supercompression). Otherwise, decode to RGBA8 via stb_image.
+  - Bytes: always decode via stb_image (no sibling discovery possible).
+- Admission & Upload
+  - Before upload, an expected resident size is computed (exact for KTX2 by summing level byte lengths; estimated for raster as format × area × mip factor). A per‑frame byte budget (`max_bytes_per_pump`) throttles uploads.
+  - If a GPU texture budget is set, the cache evicts least‑recently‑used textures not used this frame. If it still cannot fit, the decode is deferred or dropped with backoff.
+  - Raster: `ResourceManager::create_image(...)` stages a single region, then optionally generates mips on GPU.
+  - KTX2: `ResourceManager::create_image_compressed(...)` allocates an image with the file’s `VkFormat` and records one `VkBufferImageCopy` per mip level (no GPU mip generation). The immediate path transitions to `SHADER_READ_ONLY_OPTIMAL`; the Render Graph path leaves the image in `TRANSFER_DST` until a sampling pass.
+  - If the device cannot sample the KTX2 format, the cache falls back to raster decode.
 - After upload: state → `Resident`, descriptors recorded via `watchBinding` are rewritten to the new image view with the chosen sampler and `SHADER_READ_ONLY_OPTIMAL` layout. For Bytes‑backed keys, compressed source bytes are dropped unless `keep_source_bytes` is enabled.
 - Eviction & Reload
   - `evictToBudget(bytes)` rewrites watchers to fallbacks, destroys images, and marks entries `Evicted`. Evicted entries can reload automatically when seen again and a short cooldown has passed (default ~2 frames), avoiding immediate thrash.
@@ -67,11 +70,15 @@ Implementation Notes
 - Progressive downscale
   - The decode thread downsizes large images by powers of 2 until within `Max Upload Dimension`, reducing both staging and VRAM. You can increase the cap or disable it (set to 0) from the UI.
 
+KTX2 specifics
+- Supported: 2D, single‑face, single‑layer, no supercompression; pre‑transcoded BCn (including sRGB variants).
+- Not supported: UASTC/BasisLZ transcoding at runtime; cube, array, and multilayer textures.
+
 Limitations / Future Work
 - Linear‑blit capability check
   - `generate_mipmaps` always uses `VK_FILTER_LINEAR`. Add a format/feature check and a fallback path (nearest or compute downsample).
 - Texture formats
-  - Only 8‑bit RGBA uploads via stb_image today. Consider KTX2/BasisU for ASTC/BCn, specialized R8/RG8 paths, and float HDR support (`stbi_loadf` → `R16G16B16A16_SFLOAT`).
+  - Raster path: 8‑bit R/RG/RGBA via stb_image. Compressed path: BCn via `.ktx2`. Future: ASTC/ETC2, specialized R8/RG8 parsing, and float HDR support (`stbi_loadf` → `R16G16B16A16_SFLOAT`).
 - Normal‑map mip quality
   - Linear blits reduce normal length; consider a compute renormalization pass.
 - Samplers
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 36b5fd4..055d819 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -34,6 +34,8 @@ add_executable (vulkan_engine
     core/frame_resources.cpp
     core/texture_cache.h
     core/texture_cache.cpp
+    core/ktx2_loader.h
+    core/ktx2_loader.cpp
    core/config.h
    core/vk_engine.h
    core/vk_engine.cpp
diff --git a/src/core/texture_cache.cpp b/src/core/texture_cache.cpp
index 64f204f..29706a1 100644
--- a/src/core/texture_cache.cpp
+++ b/src/core/texture_cache.cpp
@@ -6,9 +6,12 @@
 #include
 #include
 #include "stb_image.h"
+#include "ktx2_loader.h"
 #include
 #include "vk_device.h"
 #include
+#include <filesystem>
+#include <fstream>
 #include
 #include
 
@@ -372,58 +375,131 @@ void TextureCache::worker_loop()
             _queue.pop_front();
         }
 
-        // Decode using stb_image
-        int w = 0, h = 0, comp = 0;
-        unsigned char *data = nullptr;
-        if (rq.key.kind == TextureKey::SourceKind::FilePath)
-        {
-            data = stbi_load(rq.path.c_str(), &w, &h, &comp, 4);
-        }
-        else
-        {
-            if (!rq.bytes.empty())
-            {
-                data = stbi_load_from_memory(rq.bytes.data(), static_cast<int>(rq.bytes.size()), &w, &h, &comp, 4);
-            }
-        }
-
         DecodedResult out{};
         out.handle = rq.handle;
-        out.width = w;
-        out.height = h;
         out.mipmapped = rq.key.mipmapped;
         out.srgb = rq.key.srgb;
         out.channels = rq.key.channels;
         out.mipClampLevels = rq.key.mipClampLevels;
-        if (data && w > 0 && h > 0)
+
+        // 1) Prefer KTX2 when source is a file path and a .ktx2 version exists
+        bool attemptedKTX2 = false;
+        if (rq.key.kind == TextureKey::SourceKind::FilePath)
         {
-            // Progressive downscale if requested
-            if (_maxUploadDimension > 0 && (w > static_cast<int>(_maxUploadDimension) || h > static_cast<int>(_maxUploadDimension)))
+            std::filesystem::path p = rq.path;
+            std::filesystem::path ktxPath;
+            if (p.extension() == ".ktx2")
             {
-                std::vector<unsigned char> scaled;
-                scaled.assign(data, data + static_cast<size_t>(w) * h * 4);
-                int cw = w, ch = h;
-                while (cw > static_cast<int>(_maxUploadDimension) || ch > static_cast<int>(_maxUploadDimension))
-                {
-                    auto tmp = downscale_half(scaled.data(), cw, ch, 4);
-                    scaled.swap(tmp);
-                    cw = std::max(1, cw / 2);
-                    ch = std::max(1, ch / 2);
-                }
-                stbi_image_free(data);
-                out.rgba = std::move(scaled);
-                out.width = cw;
-                out.height = ch;
+                ktxPath = p;
             }
             else
             {
-                out.heap = data;
-                out.heapBytes = static_cast<size_t>(w) * static_cast<size_t>(h) * 4u;
+                ktxPath = p;
+                ktxPath.replace_extension(".ktx2");
+            }
+            std::error_code ec;
+            bool hasKTX2 = (!ktxPath.empty() && std::filesystem::exists(ktxPath, ec) && !ec);
+            if (hasKTX2)
+            {
+                attemptedKTX2 = true;
+                // Read file
+                fmt::println("[TextureCache] KTX2 candidate for '{}' → '{}'", rq.path, ktxPath.string());
+                std::ifstream ifs(ktxPath, std::ios::binary);
+                if (ifs)
+                {
+                    std::vector<uint8_t> fileBytes(std::istreambuf_iterator<char>(ifs), {});
+                    fmt::println("[TextureCache] KTX2 read {} bytes", fileBytes.size());
+                    KTX2Image ktx{};
+                    std::string err;
+                    if (parse_ktx2(fileBytes.data(), fileBytes.size(), ktx, &err))
+                    {
+                        fmt::println("[TextureCache] KTX2 parsed: format={}, {}x{}, mips={}, faces={}, layers={}, supercompression={}",
+                                     string_VkFormat(static_cast<VkFormat>(ktx.format)), ktx.width, ktx.height,
+                                     ktx.mipLevels, ktx.faceCount, ktx.layerCount, ktx.supercompression);
+                        size_t sum = 0; for (const auto &lv : ktx.levels) sum += static_cast<size_t>(lv.length);
+                        fmt::println("[TextureCache] KTX2 levels: {} totalBytes={}", ktx.levels.size(), sum);
+                        for (size_t li = 0; li < ktx.levels.size(); ++li)
+                        {
+                            fmt::println("  L{}: off={}, len={}, extent={}x{}", li, ktx.levels[li].offset,
+                                         ktx.levels[li].length,
+                                         std::max(1u, ktx.width >> li),
+                                         std::max(1u, ktx.height >> li));
+                        }
+                        out.isKTX2 = true;
+                        out.ktxFormat = ktx.format;
+                        out.ktxMipLevels = ktx.mipLevels;
+                        out.ktx.bytes = std::move(ktx.data);
+                        out.ktx.levels.reserve(ktx.levels.size());
+                        for (const auto &lv : ktx.levels)
+                        {
+                            out.ktx.levels.push_back({lv.offset, lv.length, lv.width, lv.height});
+                        }
+                        out.width = static_cast<int>(ktx.width);
+                        out.height = static_cast<int>(ktx.height);
+                    }
+                    else
+                    {
+                        fmt::println("[TextureCache] parse_ktx2 failed for '{}' ({} bytes): {}",
+                                     ktxPath.string(), fileBytes.size(), err);
+                    }
+                }
+                else
+                {
+                    fmt::println("[TextureCache] Failed to open KTX2 file '{}'", ktxPath.string());
+                }
+            }
+            else if (p.extension() == ".ktx2")
+            {
+                fmt::println("[TextureCache] Requested .ktx2 '{}' but file not found (ec={})", p.string(), ec.value());
             }
         }
-        else if (data)
+
+        // 2) Raster fallback via stb_image if not KTX2 or unsupported
+        if (!out.isKTX2)
         {
-            stbi_image_free(data);
+            int w = 0, h = 0, comp = 0;
+            unsigned char *data = nullptr;
+            if (rq.key.kind == TextureKey::SourceKind::FilePath)
+            {
+                data = stbi_load(rq.path.c_str(), &w, &h, &comp, 4);
+            }
+            else if (!rq.bytes.empty())
+            {
+                data = stbi_load_from_memory(rq.bytes.data(), static_cast<int>(rq.bytes.size()), &w, &h, &comp, 4);
+            }
+
+            out.width = w;
+            out.height = h;
+            if (data && w > 0 && h > 0)
+            {
+                // Progressive downscale if requested
+                if (_maxUploadDimension > 0 && (w > static_cast<int>(_maxUploadDimension) || h > static_cast<int>(_maxUploadDimension)))
+                {
+                    std::vector<unsigned char> scaled;
+                    scaled.assign(data, data + static_cast<size_t>(w) * h * 4);
+                    int cw = w, ch = h;
+                    while (cw > static_cast<int>(_maxUploadDimension) || ch > static_cast<int>(_maxUploadDimension))
+                    {
+                        auto tmp = downscale_half(scaled.data(), cw, ch, 4);
+                        scaled.swap(tmp);
+                        cw = std::max(1, cw / 2);
+                        ch = std::max(1, ch / 2);
+                    }
+                    stbi_image_free(data);
+                    out.rgba = std::move(scaled);
+                    out.width = cw;
+                    out.height = ch;
+                }
+                else
+                {
+                    out.heap = data;
+                    out.heapBytes = static_cast<size_t>(w) * static_cast<size_t>(h) * 4u;
+                }
+            }
+            else if (data)
+            {
+                stbi_image_free(data);
+            }
         }
 
         {
@@ -447,34 +523,38 @@ size_t TextureCache::drain_ready_uploads(ResourceManager &rm, size_t budgetBytes)
     {
         if (res.handle == InvalidHandle || res.handle >= _entries.size()) continue;
         Entry &e = _entries[res.handle];
-        if ((res.heap == nullptr && res.rgba.empty()) || res.width <= 0 || res.height <= 0)
+        if (!res.isKTX2 && ((res.heap == nullptr && res.rgba.empty()) || res.width <= 0 || res.height <= 0))
        {
             e.state = EntryState::Evicted; // failed decode; keep fallback
             continue;
         }
 
         const uint32_t now = _context ? _context->frameIndex : 0u;
-        VkExtent3D extent{static_cast<uint32_t>(res.width), static_cast<uint32_t>(res.height), 1u};
+        VkExtent3D extent{static_cast<uint32_t>(std::max(0, res.width)), static_cast<uint32_t>(std::max(0, res.height)), 1u};
         TextureKey::ChannelsHint hint = (e.key.channels == TextureKey::ChannelsHint::Auto) ? TextureKey::ChannelsHint::Auto : e.key.channels;
-        VkFormat fmt = choose_format(hint, res.srgb);
-        // Estimate resident size for admission control (match post-upload computation)
+        size_t expectedBytes = 0;
+        VkFormat fmt = VK_FORMAT_UNDEFINED;
         uint32_t desiredLevels = 1;
-        if (res.mipmapped)
+        if (res.isKTX2)
         {
-            if (res.mipClampLevels > 0)
-            {
-                desiredLevels = res.mipClampLevels;
-            }
-            else
-            {
-                desiredLevels = static_cast<uint32_t>(std::floor(std::log2(std::max(extent.width, extent.height)))) + 1u;
-            }
+            fmt = res.ktxFormat;
+            desiredLevels = res.ktxMipLevels;
+            for (const auto &lv : res.ktx.levels) expectedBytes += static_cast<size_t>(lv.length);
+        }
+        else
+        {
+            fmt = choose_format(hint, res.srgb);
+            if (res.mipmapped)
+            {
+                if (res.mipClampLevels > 0) desiredLevels = res.mipClampLevels;
+                else desiredLevels = static_cast<uint32_t>(std::floor(std::log2(std::max(extent.width, extent.height)))) + 1u;
+            }
+            const float mipFactor = res.mipmapped ? mip_factor_for_levels(desiredLevels) : 1.0f;
+            expectedBytes = static_cast<size_t>(extent.width) * extent.height * bytes_per_texel(fmt) * mipFactor;
         }
-        const float mipFactor = res.mipmapped ? mip_factor_for_levels(desiredLevels) : 1.0f;
-        const size_t expectedBytes = static_cast<size_t>(extent.width) * extent.height * bytes_per_texel(fmt) * mipFactor;
 
         // Byte budget for this pump (frame)
         if (admitted + expectedBytes > budgetBytes)
@@ -503,38 +583,95 @@ size_t TextureCache::drain_ready_uploads(ResourceManager &rm, size_t budgetBytes)
             }
         }
 
-        // Optionally repack channels to R or RG to save memory
-        std::vector<uint8_t> packed;
-        const void *src = nullptr;
-        if (hint == TextureKey::ChannelsHint::R)
+        if (res.isKTX2)
         {
-            packed.resize(static_cast<size_t>(extent.width) * extent.height);
-            const uint8_t* in = res.heap ? res.heap : res.rgba.data();
-            for (size_t i = 0, px = static_cast<size_t>(extent.width) * extent.height; i < px; ++i)
+            // Basic format support check: ensure the GPU can sample this format
+            bool supported = true;
+            if (_context && _context->getDevice())
             {
-                packed[i] = in[i * 4 + 0];
+                VkFormatProperties props{};
+                vkGetPhysicalDeviceFormatProperties(_context->getDevice()->physicalDevice(), fmt, &props);
+                supported = (props.optimalTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT) != 0;
             }
-            src = packed.data();
-        }
-        else if (hint == TextureKey::ChannelsHint::RG)
-        {
-            packed.resize(static_cast<size_t>(extent.width) * extent.height * 2);
-            const uint8_t* in = res.heap ? res.heap : res.rgba.data();
-            for (size_t i = 0, px = static_cast<size_t>(extent.width) * extent.height; i < px; ++i)
+
+            if (!supported)
             {
-                packed[i * 2 + 0] = in[i * 4 + 0];
-                packed[i * 2 + 1] = in[i * 4 + 1];
+                VkFormatProperties props{};
+                if (_context && _context->getDevice())
+                {
+                    vkGetPhysicalDeviceFormatProperties(_context->getDevice()->physicalDevice(), fmt, &props);
+                }
+                fmt::println("[TextureCache] Compressed format unsupported: format={} (optimalFeatures=0x{:08x}) — fallback raster for {}",
+                             string_VkFormat(fmt), props.optimalTilingFeatures, e.path);
+                // Fall back to raster path: requeue by synthesizing a non-KTX result
+                // Attempt synchronous fallback decode from file if available.
+                int fw = 0, fh = 0, comp = 0;
+                unsigned char *fdata = nullptr;
+                if (e.key.kind == TextureKey::SourceKind::FilePath)
+                {
+                    fdata = stbi_load(e.path.c_str(), &fw, &fh, &comp, 4);
+                }
+                if (!fdata)
+                {
+                    e.state = EntryState::Evicted;
+                    continue;
+                }
+                VkExtent3D fext{ (uint32_t)fw, (uint32_t)fh, 1 };
+                VkFormat ffmt = choose_format(hint, res.srgb);
+                uint32_t mips = (res.mipmapped) ? static_cast<uint32_t>(std::floor(std::log2(std::max(fext.width, fext.height)))) + 1u : 1u;
+                e.image = rm.create_image(fdata, fext, ffmt, VK_IMAGE_USAGE_SAMPLED_BIT, res.mipmapped, mips);
+                stbi_image_free(fdata);
+                e.sizeBytes = static_cast<size_t>(fext.width) * fext.height * bytes_per_texel(ffmt) * (res.mipmapped ? mip_factor_for_levels(mips) : 1.0f);
+            }
+            else
+            {
+                // Prepare level table for ResourceManager
+                std::vector<ResourceManager::MipLevelCopy> levels;
+                levels.reserve(res.ktx.levels.size());
+                for (const auto &lv : res.ktx.levels)
+                {
+                    levels.push_back(ResourceManager::MipLevelCopy{ lv.offset, lv.length, lv.width, lv.height });
+                }
+                e.image = rm.create_image_compressed(res.ktx.bytes.data(), res.ktx.bytes.size(), fmt, levels);
+                e.sizeBytes = expectedBytes;
             }
-            src = packed.data();
         }
         else
         {
-            src = res.heap ? static_cast<const void*>(res.heap)
-                           : static_cast<const void*>(res.rgba.data());
-        }
+            // Optionally repack channels to R or RG to save memory
+            std::vector<uint8_t> packed;
+            const void *src = nullptr;
+            if (hint == TextureKey::ChannelsHint::R)
+            {
+                packed.resize(static_cast<size_t>(extent.width) * extent.height);
+                const uint8_t* in = res.heap ? res.heap : res.rgba.data();
+                for (size_t i = 0, px = static_cast<size_t>(extent.width) * extent.height; i < px; ++i)
+                {
+                    packed[i] = in[i * 4 + 0];
+                }
+                src = packed.data();
+            }
+            else if (hint == TextureKey::ChannelsHint::RG)
+            {
+                packed.resize(static_cast<size_t>(extent.width) * extent.height * 2);
+                const uint8_t* in = res.heap ? res.heap : res.rgba.data();
+                for (size_t i = 0, px = static_cast<size_t>(extent.width) * extent.height; i < px; ++i)
+                {
+                    packed[i * 2 + 0] = in[i * 4 + 0];
+                    packed[i * 2 + 1] = in[i * 4 + 1];
+                }
+                src = packed.data();
+            }
+            else
+            {
+                src = res.heap ? static_cast<const void*>(res.heap)
+                               : static_cast<const void*>(res.rgba.data());
+            }
 
-        uint32_t mipOverride = (res.mipmapped ? desiredLevels : 1);
-        e.image = rm.create_image(src, extent, fmt, VK_IMAGE_USAGE_SAMPLED_BIT, res.mipmapped, mipOverride);
+            uint32_t mipOverride = (res.mipmapped ? desiredLevels : 1);
+            e.image = rm.create_image(src, extent, fmt, VK_IMAGE_USAGE_SAMPLED_BIT, res.mipmapped, mipOverride);
+            e.sizeBytes = expectedBytes;
+        }
 
         if (vmaDebugEnabled())
         {
@@ -542,7 +679,6 @@ size_t TextureCache::drain_ready_uploads(ResourceManager &rm, size_t budgetBytes)
             vmaSetAllocationName(_context->getDevice()->allocator(), e.image.allocation, name.c_str());
         }
 
-        e.sizeBytes = expectedBytes;
         _residentBytes += e.sizeBytes;
         e.state = EntryState::Resident;
         e.nextAttemptFrame = 0; // clear backoff after success
diff --git a/src/core/texture_cache.h b/src/core/texture_cache.h
index fa642ff..03a57ba 100644
--- a/src/core/texture_cache.h
+++ b/src/core/texture_cache.h
@@ -188,6 +188,17 @@ private:
         bool srgb{false};
         TextureKey::ChannelsHint channels{TextureKey::ChannelsHint::Auto};
         uint32_t mipClampLevels{0};
+
+        // Compressed path (KTX2 pre-transcoded BCn). When true, 'rgba/heap'
+        // are ignored and the fields below describe the payload.
+        bool isKTX2{false};
+        VkFormat ktxFormat{VK_FORMAT_UNDEFINED};
+        uint32_t ktxMipLevels{0};
+        struct KTXPack {
+            struct L { uint64_t offset{0}, length{0}; uint32_t width{0}, height{0}; };
+            std::vector<uint8_t> bytes; // full file content
+            std::vector<L> levels;      // per-mip region description
+        } ktx;
     };
 
     void worker_loop();
diff --git a/src/core/vk_resource.cpp b/src/core/vk_resource.cpp
index f8bcaba..4b97341 100644
--- a/src/core/vk_resource.cpp
+++ b/src/core/vk_resource.cpp
@@ -391,22 +391,34 @@ void ResourceManager::process_queued_uploads_immediate()
         vkutil::transition_image(cmd, imageUpload.image, imageUpload.initialLayout, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
 
-        VkBufferImageCopy copyRegion = {};
-        copyRegion.bufferOffset = 0;
-        copyRegion.bufferRowLength = 0;
-        copyRegion.bufferImageHeight = 0;
-        copyRegion.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
-        copyRegion.imageSubresource.mipLevel = 0;
-        copyRegion.imageSubresource.baseArrayLayer = 0;
-        copyRegion.imageSubresource.layerCount = 1;
-        copyRegion.imageExtent = imageUpload.extent;
+        if (!imageUpload.copies.empty())
+        {
+            vkCmdCopyBufferToImage(cmd,
+                                   imageUpload.staging.buffer,
+                                   imageUpload.image,
+                                   VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
+                                   static_cast<uint32_t>(imageUpload.copies.size()),
+                                   imageUpload.copies.data());
+        }
+        else
+        {
+            VkBufferImageCopy copyRegion = {};
+            copyRegion.bufferOffset = 0;
+            copyRegion.bufferRowLength = 0;
+            copyRegion.bufferImageHeight = 0;
+            copyRegion.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+            copyRegion.imageSubresource.mipLevel = 0;
+            copyRegion.imageSubresource.baseArrayLayer = 0;
+            copyRegion.imageSubresource.layerCount = 1;
+            copyRegion.imageExtent = imageUpload.extent;
 
-        vkCmdCopyBufferToImage(cmd,
-                               imageUpload.staging.buffer,
-                               imageUpload.image,
-                               VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
-                               1,
-                               &copyRegion);
+            vkCmdCopyBufferToImage(cmd,
+                                   imageUpload.staging.buffer,
+                                   imageUpload.image,
+                                   VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
+                                   1,
+                                   &copyRegion);
+        }
 
         if (imageUpload.generateMips)
         {
@@ -571,26 +583,37 @@ void ResourceManager::register_upload_pass(RenderGraph &graph, FrameResources &f
         VkBuffer staging = res.buffer(binding.stagingHandle);
         VkImage image = res.image(binding.imageHandle);
 
-        VkBufferImageCopy region{};
-        region.bufferOffset = 0;
-        region.bufferRowLength = 0;
-        region.bufferImageHeight = 0;
-        region.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
-        region.imageSubresource.mipLevel = 0;
-        region.imageSubresource.baseArrayLayer = 0;
-        region.imageSubresource.layerCount = 1;
-        region.imageExtent = upload.extent;
-
-        vkCmdCopyBufferToImage(cmd, staging, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
+        if (!upload.copies.empty())
+        {
+            vkCmdCopyBufferToImage(cmd, staging, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
+                                   static_cast<uint32_t>(upload.copies.size()), upload.copies.data());
+        }
+        else
+        {
+            VkBufferImageCopy region{};
+            region.bufferOffset = 0;
+            region.bufferRowLength = 0;
+            region.bufferImageHeight = 0;
+            region.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+            region.imageSubresource.mipLevel = 0;
+            region.imageSubresource.baseArrayLayer = 0;
+            region.imageSubresource.layerCount = 1;
+            region.imageExtent = upload.extent;
+            vkCmdCopyBufferToImage(cmd, staging, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
+        }
 
         if (upload.generateMips)
         {
             // NOTE: generate_mipmaps_levels() transitions the image to
             // VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL at the end.
             // Do not transition back to TRANSFER here.
             vkutil::generate_mipmaps_levels(cmd, image, VkExtent2D{upload.extent.width, upload.extent.height}, static_cast<int>(upload.mipLevels));
         }
+        else
+        {
+            // Transition to final layout for sampling
+            vkutil::transition_image(cmd, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, upload.finalLayout);
+        }
     }
 });
@@ -606,3 +629,63 @@ void ResourceManager::register_upload_pass(RenderGraph &graph, FrameResources &f
         }
     });
 }
+
+AllocatedImage ResourceManager::create_image_compressed(const void* bytes, size_t size,
+                                                        VkFormat fmt,
+                                                        std::span<const MipLevelCopy> levels,
+                                                        VkImageUsageFlags usage)
+{
+    if (bytes == nullptr || size == 0 || levels.empty())
+    {
+        return {};
+    }
+
+    // Determine base extent from level 0
+    VkExtent3D extent{ levels[0].width, levels[0].height, 1 };
+
+    // Stage full payload as-is
+    AllocatedBuffer uploadbuffer = create_buffer(size, VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
+                                                 VMA_MEMORY_USAGE_CPU_TO_GPU);
+    std::memcpy(uploadbuffer.info.pMappedData, bytes, size);
+    vmaFlushAllocation(_deviceManager->allocator(), uploadbuffer.allocation, 0, size);
+
+    // Create GPU image with explicit mip count; no mip generation
+    const uint32_t mipCount = static_cast<uint32_t>(levels.size());
+    AllocatedImage new_image = create_image(extent, fmt,
+                                            usage | VK_IMAGE_USAGE_TRANSFER_DST_BIT,
+                                            /*mipmapped=*/true, mipCount);
+
+    PendingImageUpload pending{};
+    pending.staging = uploadbuffer;
+    pending.image = new_image.image;
+    pending.extent = extent;
+    pending.format = fmt;
+    pending.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
+    pending.finalLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
+    pending.generateMips = false;
+    pending.mipLevels = mipCount;
+    pending.copies.reserve(levels.size());
+
+    for (uint32_t i = 0; i < mipCount; ++i)
+    {
+        VkBufferImageCopy region{};
+        region.bufferOffset = levels[i].offset;
+        region.bufferRowLength = 0; // tightly packed
+        region.bufferImageHeight = 0;
+        region.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+        region.imageSubresource.mipLevel = i;
+        region.imageSubresource.baseArrayLayer = 0;
+        region.imageSubresource.layerCount = 1;
+        region.imageExtent = { levels[i].width, levels[i].height, 1 };
+        pending.copies.push_back(region);
+    }
+
+    _pendingImageUploads.push_back(std::move(pending));
+
+    if (!_deferUploads)
+    {
+        process_queued_uploads_immediate();
+    }
+
+    return new_image;
+}
diff --git a/src/core/vk_resource.h b/src/core/vk_resource.h
index 8973435..a6d414a 100644
--- a/src/core/vk_resource.h
+++ b/src/core/vk_resource.h
@@ -13,6 +13,13 @@ struct FrameResources;
 class ResourceManager
 {
 public:
+    struct MipLevelCopy
+    {
+        uint64_t offset{0};
+        uint64_t length{0};
+        uint32_t width{0};
+        uint32_t height{0};
+    };
     struct BufferCopyRegion
     {
         VkBuffer destination = VK_NULL_HANDLE;
@@ -37,6 +44,8 @@ public:
         VkImageLayout finalLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
         bool generateMips = false;
         uint32_t mipLevels = 1;
+        // For multi-region (per-mip) uploads
+        std::vector<VkBufferImageCopy> copies;
     };
 
     void init(DeviceManager *deviceManager);
@@ -59,6 +68,14 @@ public:
     AllocatedImage create_image(const void *data, VkExtent3D size, VkFormat format, VkImageUsageFlags usage, bool mipmapped, uint32_t mipLevelsOverride);
 
+    // Create an image from a compressed payload (e.g., KTX2 pre-transcoded BCn).
+    // 'bytes' backs a single staging buffer; 'levels' provides per-mip copy regions.
+    // No GPU mip generation is performed; the number of mips equals levels.size().
+    AllocatedImage create_image_compressed(const void* bytes, size_t size,
+                                           VkFormat fmt,
+                                           std::span<const MipLevelCopy> levels,
+                                           VkImageUsageFlags usage = VK_IMAGE_USAGE_SAMPLED_BIT);
+
     void destroy_image(const AllocatedImage &img) const;
 
     GPUMeshBuffers uploadMesh(std::span<uint32_t> indices, std::span<Vertex> vertices);
diff --git a/texture_compression.py b/texture_compression.py
index 92bd1e5..2ca636d 100644
--- a/texture_compression.py
+++ b/texture_compression.py
@@ -156,7 +156,6 @@ def process_one(img_path: Path, out_dir: Path, role, opts):
     ktx_trans = [
         "ktx", "transcode",
         "--target", target_bc,
-        "--zstd", "18",
         str(tmp_ktx2), str(out_ktx2)
    ]
    rc = run_cmd(ktx_trans, dry_run=opts.dry_run)
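
Supplementary note: the level table that `create_image_compressed(...)` consumes (one tightly packed region per mip) can be derived purely from the base extent and the BCn block size. The sketch below is illustrative, not part of the patch; `MipLevelCopy` is a stand-in for `ResourceManager::MipLevelCopy`, and `build_bc_level_table` is a hypothetical helper (a real KTX2 file provides these offsets in its level index instead).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for ResourceManager::MipLevelCopy from the patch.
struct MipLevelCopy {
    uint64_t offset{0};
    uint64_t length{0};
    uint32_t width{0};
    uint32_t height{0};
};

// Build a tightly packed per-mip region table for a BCn payload, the same
// shape create_image_compressed() records into VkBufferImageCopy entries.
// blockBytes is 8 for BC1/BC4 and 16 for BC3/BC5/BC6H/BC7.
std::vector<MipLevelCopy> build_bc_level_table(uint32_t w, uint32_t h,
                                               uint32_t mipLevels,
                                               uint64_t blockBytes)
{
    std::vector<MipLevelCopy> levels;
    uint64_t offset = 0;
    for (uint32_t i = 0; i < mipLevels; ++i) {
        const uint32_t mw = std::max(1u, w >> i);
        const uint32_t mh = std::max(1u, h >> i);
        // BCn encodes 4x4 texel blocks; partial blocks round up to a full block.
        const uint64_t blocksX = (mw + 3) / 4;
        const uint64_t blocksY = (mh + 3) / 4;
        const uint64_t len = blocksX * blocksY * blockBytes;
        levels.push_back({offset, len, mw, mh});
        offset += len;
    }
    return levels;
}
```

Summing `offset + length` of the last level gives the exact resident size the cache uses as `expectedBytes` on the KTX2 path.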
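
Supplementary note: on the raster path the admission estimate is `width * height * bytes_per_texel(fmt) * mipFactor`, with the level count computed as `floor(log2(max(w, h))) + 1`. The sketch below shows one plausible implementation of that arithmetic; `mip_factor_for_levels` here is a guess at the engine's helper (a partial geometric series, since each level adds a quarter of the previous area), whose actual body is not shown in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Full-chain mip count, matching the cache's expression:
// floor(log2(max(w, h))) + 1.
uint32_t full_mip_chain_levels(uint32_t w, uint32_t h)
{
    return static_cast<uint32_t>(std::floor(std::log2(std::max(w, h)))) + 1u;
}

// Plausible mip_factor_for_levels(): level i contributes (1/4)^i of the
// base area, so the factor approaches 4/3 for a full chain.
float mip_factor_for_levels(uint32_t levels)
{
    float factor = 0.0f;
    float term = 1.0f;
    for (uint32_t i = 0; i < levels; ++i) {
        factor += term;
        term *= 0.25f;
    }
    return factor;
}

// Raster-path admission estimate (hypothetical free-function form of the
// computation done inline in drain_ready_uploads).
size_t expected_resident_bytes(uint32_t w, uint32_t h, size_t bytesPerTexel,
                               bool mipmapped)
{
    const uint32_t levels = mipmapped ? full_mip_chain_levels(w, h) : 1u;
    const float mipFactor = mipmapped ? mip_factor_for_levels(levels) : 1.0f;
    return static_cast<size_t>(static_cast<double>(w) * h * bytesPerTexel * mipFactor);
}
```

Because this is only an estimate, the cache deliberately assigns `e.sizeBytes` the same value it used for admission, so `_residentBytes` bookkeeping stays consistent even when the true GPU allocation differs.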