Files
QuaternionEngine/docs/RenderGraph.md

243 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
## Render Graph: PerFrame Scheduling, Barriers, and Dynamic Rendering
Lightweight render graph that builds a perframe DAG from pass declarations, computes the necessary resource barriers/layout transitions, and records passes with dynamic rendering when attachments are declared.
### Why
- Centralize synchronization and image layout transitions across passes.
- Make passes declarative: author declares reads/writes; the graph inserts barriers and begins/ends rendering.
- Keep existing pass classes (`IRenderPass`) while migrating execution to the graph.
- Provide runtime profiling and debugging capabilities for pass execution.
### HighLevel Flow
- Engine creates the graph each frame and imports swapchain/GBuffer images: `src/core/engine.cpp`.
- Each pass registers its work by calling `register_graph(graph, ...)` and declaring resources via a builder.
- The graph appends a present chain (copy HDR `drawImage` → swapchain, then transition to `PRESENT`), optionally inserting ImGui before present.
- `compile()` topologically sorts passes by data dependencies (read/write hazards: RAW/WAW/WAR) and computes perpass barriers using `VkDependencyInfo` with `Vk*MemoryBarrier2`.
- `execute(cmd)` creates timestamp query pools, emits barriers, begins dynamic rendering if attachments were declared, calls the pass record lambda, ends rendering, and records GPU/CPU timings.
- `resolve_timings()` retrieves GPU timestamp results after the fence is signaled, converting them to milliseconds.
### Core API
**Lifecycle:**
- `RenderGraph::init(ctx)` — Initialize with engine context. See `src/render/graph/graph.cpp:28`.
- `RenderGraph::clear()` — Clear all passes and reset resources. See `src/render/graph/graph.cpp:34`.
- `RenderGraph::shutdown()` — Destroy GPU resources (query pools) before device shutdown. See `src/render/graph/graph.cpp:40`.
**Pass Registration:**
- `RenderGraph::add_pass(name, RGPassType type, BuildCallback build, RecordCallback record)`
- Declare image/buffer accesses and attachments inside `build` using `RGPassBuilder`.
- Do your actual rendering/copies in `record` using resolved Vulkan objects from `RGPassResources`.
- See: `src/render/graph/graph.h:42`, `src/render/graph/graph.cpp:91`.
- Legacy form: `add_pass(name, type, record)` for passes with no resource declarations. See `src/render/graph/graph.cpp:117`.
**Resource Creation:**
- `import_image(desc)` / `import_buffer(desc)` — Import externally owned resources (deduplicated by VkImage/VkBuffer handle).
- `create_image(desc)` / `create_buffer(desc)` — Create transient resources (destroyed at end of frame via deletion queue).
- `create_depth_image(name, extent, format=D32_SFLOAT)` — Convenience helper for depth-only images with depth attachment + sampled usage. See `src/render/graph/graph.cpp:67`.
**Compilation and Execution:**
- `RenderGraph::compile()` — Build topological ordering (Kahn's algorithm) and perpass `VkImageMemoryBarrier2` / `VkBufferMemoryBarrier2` lists. Returns false on error. See `src/render/graph/graph.cpp:123`.
- `RenderGraph::execute(cmd)` — Creates timestamp query pool, emits barriers via `vkCmdPipelineBarrier2`, begins dynamic rendering if attachments exist, invokes record callbacks, ends rendering, and writes GPU timestamps. See `src/render/graph/graph.cpp:874`.
- `RenderGraph::resolve_timings()` — Fetch GPU timestamp results after fence wait and convert to milliseconds. Must be called before next `execute()`. See `src/render/graph/graph.cpp:1314`.
**Import Helpers:**
- `import_draw_image()`, `import_depth_image()`, `import_gbuffer_position()`, `import_gbuffer_normal()`, `import_gbuffer_albedo()`, `import_gbuffer_extra()`, `import_id_buffer()`, `import_swapchain_image(index)` — Convenience wrappers for engine-owned images. See `src/render/graph/graph.cpp:11471312`.
**Present Chain:**
- `add_present_chain(draw, swapchain, appendExtra)` — Inserts `PresentLetterbox` pass (blit draw→swapchain with letterboxing) and `PreparePresent` pass (layout transition to `PRESENT_SRC_KHR`). Optional `appendExtra` callback injects passes (e.g., ImGui) in between. See `src/render/graph/graph.cpp:1043`.
**Debug and Profiling:**
- `pass_count()`, `pass_name(i)`, `pass_enabled(i)`, `set_pass_enabled(i, enabled)` — Runtime pass enable/disable. See `src/render/graph/graph.h:105108`.
- `debug_get_passes(out)` — Retrieve pass metadata including GPU/CPU timings, resource access counts, attachment info. See `src/render/graph/graph.cpp:1163`.
- `debug_get_images(out)` — Retrieve image metadata (imported/transient, format, extent, usage, lifetime). See `src/render/graph/graph.cpp:1186`.
- `debug_get_buffers(out)` — Retrieve buffer metadata. See `src/render/graph/graph.cpp:1207`.
### Declaring a Pass
Use `register_graph(...)` on your pass to declare resources and record work. The graph handles transitions and dynamic rendering.
```c++
void MyPass::register_graph(RenderGraph* graph,
RGImageHandle draw,
RGImageHandle depth) {
graph->add_pass(
"MyPass",
RGPassType::Graphics,
// Build: declare resources + attachments
[draw, depth](RGPassBuilder& b, EngineContext*) {
b.read(draw, RGImageUsage::SampledFragment); // example read
b.write_color(draw); // render target
b.write_depth(depth, /*clear*/ false); // depth test
},
// Record: issue Vulkan commands (no begin/end rendering needed)
[this, draw](VkCommandBuffer cmd, const RGPassResources& res, EngineContext* ctx) {
VkPipeline p{}; VkPipelineLayout l{};
ctx->pipelines->getGraphics("my_pass", p, l); // hotreload safe
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, p);
VkViewport vp{0,0,(float)ctx->getDrawExtent().width,(float)ctx->getDrawExtent().height,0,1};
vkCmdSetViewport(cmd, 0, 1, &vp);
VkRect2D sc{{0,0}, ctx->getDrawExtent()};
vkCmdSetScissor(cmd, 0, 1, &sc);
vkCmdDraw(cmd, 3, 1, 0, 0);
}
);
}
```
### Builder Reference (`RGPassBuilder`)
Passed to the `BuildCallback` to declare resource accesses and attachments. See `src/render/graph/builder.h:40`.
**Image Access:**
- `read(RGImageHandle, RGImageUsage)` — Declare sampled/read usage (e.g., `SampledFragment`, `TransferSrc`). See `src/render/graph/builder.cpp:20`.
- `write(RGImageHandle, RGImageUsage)` — Declare write usage (e.g., `ComputeWrite`, `TransferDst`). See `src/render/graph/builder.cpp:25`.
- `write_color(RGImageHandle, bool clearOnLoad=false, VkClearValue clear={})` — Declare color attachment with optional clear. Sets usage to `ColorAttachment` and `store=true` by default. See `src/render/graph/builder.cpp:30`.
- `write_depth(RGImageHandle, bool clearOnLoad=false, VkClearValue clear={})` — Declare depth attachment with optional clear. See `src/render/graph/builder.cpp:40`.
**Buffer Access:**
- `read_buffer(RGBufferHandle, RGBufferUsage)` — Declare buffer read (e.g., `VertexRead`, `IndexRead`, `UniformRead`, `StorageRead`). See `src/render/graph/builder.cpp:50`.
- `write_buffer(RGBufferHandle, RGBufferUsage)` — Declare buffer write (e.g., `StorageReadWrite`, `TransferDst`). See `src/render/graph/builder.cpp:55`.
- Convenience overloads: `read_buffer(VkBuffer, RGBufferUsage, size, name)` and `write_buffer(VkBuffer, ...)` automatically import and deduplicate by raw `VkBuffer` handle. See `src/render/graph/builder.cpp:60,70`.
**Resource Resolution (`RGPassResources`):**
Used inside the `RecordCallback` to fetch resolved Vulkan objects. See `src/render/graph/builder.h:22`.
- `image(RGImageHandle)` → `VkImage`
- `image_view(RGImageHandle)` → `VkImageView`
- `buffer(RGBufferHandle)` → `VkBuffer`
### Resource Model (`RGResourceRegistry`)
Manages both imported (externally owned) and transient (graph-owned) resources. See `src/render/graph/resources.h:52`.
**Imported Resources:**
- Deduplicated by raw Vulkan handle (`VkImage`/`VkBuffer`) using hash maps (`_imageLookup`/`_bufferLookup`). See `src/render/graph/resources.cpp`.
- Initial layout/stage/access preserved from `RGImportedImageDesc`/`RGImportedBufferDesc`.
- Ownership remains external; graph does not destroy these resources.
**Transient Resources:**
- Created via `ResourceManager` (`AllocatedImage`/`AllocatedBuffer`) with VMA allocations. See `src/render/graph/resources.cpp`.
- Automatically destroyed at end of frame via frame deletion queue.
- Usage flags must cover all declared usages (validated during `compile()`).
**Lifetime Tracking:**
- `firstUse` and `lastUse` indices computed during `compile()` (see `src/render/graph/graph.cpp:854869`).
- Used for debug visualization and future aliasing/pooling optimizations.
**Records (`RGImageRecord`/`RGBufferRecord`):**
Unified representation storing `VkImage`/`VkBuffer`, `VkImageView`, format, extent, initial state, and allocation info. See `src/render/graph/resources.h:11,34`.
### Synchronization and Layouts
**Barrier Generation (see `src/render/graph/graph.cpp:232851`):**
For each enabled pass, `compile()` tracks per-resource state (`ImageState`/`BufferState`) and inserts barriers when hazards are detected:
**Image Barriers (`VkImageMemoryBarrier2`):**
- Triggered by: layout change, prior write before read/write (RAW/WAW), prior reads before write (WAR).
- Stage/access/layout derived from `RGImageUsage` via `usage_info_image()` (see `src/render/graph/graph.cpp:313365`).
- Aspect determined by usage and format (depth formats get `DEPTH_BIT`, others `COLOR_BIT`).
- Initial state from `RGImportedImageDesc::currentLayout/currentStage/currentAccess`; if unknown (layout ≠ UNDEFINED but stage=NONE), conservatively assumes `ALL_COMMANDS + MEMORY_READ|WRITE`.
**Buffer Barriers (`VkBufferMemoryBarrier2`):**
- Triggered by: prior write before read/write, prior reads before write.
- Stage/access derived from `RGBufferUsage` via `usage_info_buffer()` (see `src/render/graph/graph.cpp:367411`).
- Size: exact size for transients, `VK_WHOLE_SIZE` for imports (to avoid validation errors).
**Usage Priority and Conflict Resolution:**
When a pass declares multiple conflicting usages for the same resource (e.g., both `SampledFragment` and `ColorAttachment`), the graph selects the highest-priority usage for layout determination (see `image_usage_priority()` at `src/render/graph/graph.cpp:499`). Stages and access masks are unioned. Warns if layout mismatch detected.
**Image Usage → Layout/Stage/Access Mapping:**
See `usage_info_image()` at `src/render/graph/graph.cpp:313`.
| RGImageUsage | Layout | Stage | Access |
|---|---|---|---|
| `SampledFragment` | `SHADER_READ_ONLY_OPTIMAL` | `FRAGMENT_SHADER` | `SHADER_SAMPLED_READ` |
| `SampledCompute` | `SHADER_READ_ONLY_OPTIMAL` | `COMPUTE_SHADER` | `SHADER_SAMPLED_READ` |
| `TransferSrc` | `TRANSFER_SRC_OPTIMAL` | `TRANSFER` | `TRANSFER_READ` |
| `TransferDst` | `TRANSFER_DST_OPTIMAL` | `TRANSFER` | `TRANSFER_WRITE` |
| `ColorAttachment` | `COLOR_ATTACHMENT_OPTIMAL` | `COLOR_ATTACHMENT_OUTPUT` | `COLOR_ATTACHMENT_READ\|WRITE` |
| `DepthAttachment` | `DEPTH_ATTACHMENT_OPTIMAL` | `EARLY_FRAGMENT_TESTS\|LATE_FRAGMENT_TESTS` | `DEPTH_STENCIL_ATTACHMENT_READ\|WRITE` |
| `ComputeWrite` | `GENERAL` | `COMPUTE_SHADER` | `SHADER_STORAGE_READ\|WRITE` |
| `Present` | `PRESENT_SRC_KHR` | `BOTTOM_OF_PIPE` | `MEMORY_READ` |
**Buffer Usage → Stage/Access Mapping:**
See `usage_info_buffer()` at `src/render/graph/graph.cpp:367`.
| RGBufferUsage | Stage | Access |
|---|---|---|
| `TransferSrc` | `TRANSFER` | `TRANSFER_READ` |
| `TransferDst` | `TRANSFER` | `TRANSFER_WRITE` |
| `VertexRead` | `VERTEX_INPUT` | `VERTEX_ATTRIBUTE_READ` |
| `IndexRead` | `INDEX_INPUT` | `INDEX_READ` |
| `UniformRead` | `ALL_GRAPHICS\|COMPUTE_SHADER` | `UNIFORM_READ` |
| `StorageRead` | `COMPUTE_SHADER\|ALL_GRAPHICS` | `SHADER_STORAGE_READ` |
| `StorageReadWrite` | `COMPUTE_SHADER\|ALL_GRAPHICS` | `SHADER_STORAGE_READ\|WRITE` |
| `IndirectArgs` | `DRAW_INDIRECT` | `INDIRECT_COMMAND_READ` |
**Validation Warnings:**
- Depth-format image declared as color attachment (or vice versa). See `src/render/graph/graph.cpp:645657`.
- Transient resource used without required usage flags. See `src/render/graph/graph.cpp:659667` (images), `818826` (buffers).
- Multiple conflicting layouts in single pass. See `src/render/graph/graph.cpp:536543`.
### BuiltIn Pass Wiring (Current)
- Resource uploads (if any) → Background (compute) → Geometry (GBuffer) → Lighting (deferred) → SSR → Tonemap+Bloom → FXAA → Transparent → CopyToSwapchain → ImGui → PreparePresent.
- See registrations in `src/core/engine.cpp`.
### Topological Sorting and Scheduling
**Dependency Graph Construction (see `src/render/graph/graph.cpp:127231`):**
- Reads/writes create directed edges: `writer → reader` (RAW), `writer → writer` (WAW), `reader → writer` (WAR).
- Disabled passes are skipped during edge construction but remain in the pass list.
- Kahn's algorithm produces a linear execution order respecting all dependencies.
- If cycle detected (topological sort fails), falls back to insertion order but still computes barriers.
**Execution Order:**
Passes execute in sorted order (or insertion order if cycle). Only enabled passes run; disabled passes are skipped during `execute()`. See `src/render/graph/graph.cpp:895`.
### Dynamic Rendering Setup
**Render Area Calculation (see `src/render/graph/graph.cpp:9361000`):**
- Chooses min extent across all color/depth attachments.
- Falls back to `EngineContext::drawExtent` if no attachments.
- Warns if color attachments have mismatched extents.
**Attachment Construction:**
- Color attachments: `VkRenderingAttachmentInfo` with `clearOnLoad` → `LOAD_OP_CLEAR` / `LOAD_OP_LOAD`, `store` → `STORE_OP_STORE` / `STORE_OP_DONT_CARE`.
- Depth attachment: similar logic; `clearValue.depthStencil` used if `clearOnLoad=true`.
- Layout forced to `COLOR_ATTACHMENT_OPTIMAL` or `DEPTH_ATTACHMENT_OPTIMAL`.
See `src/render/graph/graph.cpp:9271012`.
### Profiling and Timing
**GPU Timing (Timestamps):**
- Per-frame `VkQueryPool` with 2 queries per pass (begin/end). Created in `execute()`, destroyed in `resolve_timings()` or next `execute()`.
- `vkCmdWriteTimestamp2()` at `ALL_COMMANDS_BIT` stage before/after pass recording (see `src/render/graph/graph.cpp:919923`, `10281032`).
- `resolve_timings()` fetches results with `VK_QUERY_RESULT_WAIT_BIT`, converts ticks to milliseconds using `timestampPeriod`. See `src/render/graph/graph.cpp:13141355`.
**CPU Timing:**
- `std::chrono::high_resolution_clock` measures command recording duration (`cpuStart`/`cpuEnd`). See `src/render/graph/graph.cpp:924`, `1026`.
- Stored in `_lastCpuMillis` vector; accessible via `debug_get_passes()`.
**Debug Structures:**
- `RGDebugPassInfo`: name, type, enabled, resource counts, attachment info, `gpuMillis`, `cpuMillis`. See `src/render/graph/graph.h:66`.
- `RGDebugImageInfo`: id, name, imported, format, extent, usage, lifetime. See `src/render/graph/graph.h:83`.
- `RGDebugBufferInfo`: id, name, imported, size, usage, lifetime. See `src/render/graph/graph.h:94`.
### Notes & Limits
- **No aliasing or transient pooling**: Transient images/buffers created via `create_*` are released endofframe via frame deletion queue.
- **Single-queue execution**: Topological order is linear; no multi-queue parallelization.
- **Minimal load/store control**: Only `clearOnLoad` and `store` flags on `RGAttachmentInfo`; no resolve or stencil control.
- **No mid-pass barriers**: Conflicting usages within a single pass cannot be synchronized (warns but proceeds with unioned stages/access).
- **No automatic resource aliasing**: Future work could reuse transient allocations based on lifetime non-overlap.
### Debugging
- **Per-pass debug labels**: `vkdebug::cmd_begin_label(cmd, "RG: <name>")` wraps each pass (see `src/render/graph/graph.cpp:903906`, `10351038`).
- **Compile-time validation warnings**: Printed via `fmt::println` for format mismatches, missing usage flags, layout conflicts.
- **Runtime introspection**: Use `debug_get_*` APIs to export pass/image/buffer metadata for visualization/debugging tools.