Render Graph: Per‑Frame Scheduling, Barriers, and Dynamic Rendering
Lightweight render graph that builds a per‑frame DAG from pass declarations, computes the necessary resource barriers/layout transitions, and records passes with dynamic rendering when attachments are declared.
Why
- Centralize synchronization and image layout transitions across passes.
- Make passes declarative: author declares reads/writes; the graph inserts barriers and begins/ends rendering.
- Keep existing pass classes (IRenderPass) while migrating execution to the graph.
- Provide runtime profiling and debugging capabilities for pass execution.
High‑Level Flow
- Engine creates the graph each frame and imports swapchain/G‑Buffer images: src/core/engine.cpp.
- Each pass registers its work by calling register_graph(graph, ...) and declaring resources via a builder.
- The graph appends a present chain (copy HDR drawImage → swapchain, then transition to PRESENT), optionally inserting ImGui before present.
- compile() topologically sorts passes by data dependencies (read/write hazards: RAW/WAW/WAR) and computes per‑pass barriers using VkDependencyInfo with Vk*MemoryBarrier2.
- execute(cmd) creates timestamp query pools, emits barriers, begins dynamic rendering if attachments were declared, calls the pass record lambda, ends rendering, and records GPU/CPU timings.
- resolve_timings() retrieves GPU timestamp results after the fence is signaled, converting them to milliseconds.
Core API
Lifecycle:
- RenderGraph::init(ctx): Initialize with engine context. See src/render/graph/graph.cpp:28.
- RenderGraph::clear(): Clear all passes and reset resources. See src/render/graph/graph.cpp:34.
- RenderGraph::shutdown(): Destroy GPU resources (query pools) before device shutdown. See src/render/graph/graph.cpp:40.
Pass Registration:
- RenderGraph::add_pass(name, RGPassType type, BuildCallback build, RecordCallback record)
  - Declare image/buffer accesses and attachments inside build using RGPassBuilder.
  - Do your actual rendering/copies in record using resolved Vulkan objects from RGPassResources.
  - See: src/render/graph/graph.h:42, src/render/graph/graph.cpp:91.
- Legacy form: add_pass(name, type, record) for passes with no resource declarations. See src/render/graph/graph.cpp:117.
Resource Creation:
- import_image(desc) / import_buffer(desc): Import externally owned resources (deduplicated by VkImage/VkBuffer handle).
- create_image(desc) / create_buffer(desc): Create transient resources (destroyed at end of frame via deletion queue).
- create_depth_image(name, extent, format=D32_SFLOAT): Convenience helper for depth-only images with depth attachment + sampled usage. See src/render/graph/graph.cpp:67.
Compilation and Execution:
- RenderGraph::compile(): Build topological ordering (Kahn's algorithm) and per‑pass VkImageMemoryBarrier2/VkBufferMemoryBarrier2 lists. Returns false on error. See src/render/graph/graph.cpp:123.
- RenderGraph::execute(cmd): Creates timestamp query pool, emits barriers via vkCmdPipelineBarrier2, begins dynamic rendering if attachments exist, invokes record callbacks, ends rendering, and writes GPU timestamps. See src/render/graph/graph.cpp:874.
- RenderGraph::resolve_timings(): Fetch GPU timestamp results after fence wait and convert to milliseconds. Must be called before the next execute(). See src/render/graph/graph.cpp:1314.
Import Helpers:
- import_draw_image(), import_depth_image(), import_gbuffer_position(), import_gbuffer_normal(), import_gbuffer_albedo(), import_gbuffer_extra(), import_id_buffer(), import_swapchain_image(index): Convenience wrappers for engine-owned images. See src/render/graph/graph.cpp:1147–1312.
Present Chain:
- add_present_chain(draw, swapchain, appendExtra): Inserts a PresentLetterbox pass (blit draw → swapchain with letterboxing) and a PreparePresent pass (layout transition to PRESENT_SRC_KHR). The optional appendExtra callback injects passes (e.g., ImGui) in between. See src/render/graph/graph.cpp:1043.
Debug and Profiling:
- pass_count(), pass_name(i), pass_enabled(i), set_pass_enabled(i, enabled): Runtime pass enable/disable. See src/render/graph/graph.h:105–108.
- debug_get_passes(out): Retrieve pass metadata including GPU/CPU timings, resource access counts, attachment info. See src/render/graph/graph.cpp:1163.
- debug_get_images(out): Retrieve image metadata (imported/transient, format, extent, usage, lifetime). See src/render/graph/graph.cpp:1186.
- debug_get_buffers(out): Retrieve buffer metadata. See src/render/graph/graph.cpp:1207.
Declaring a Pass
Use register_graph(...) on your pass to declare resources and record work. The graph handles transitions and dynamic rendering.
    void MyPass::register_graph(RenderGraph* graph,
                                RGImageHandle draw,
                                RGImageHandle depth) {
        graph->add_pass(
            "MyPass",
            RGPassType::Graphics,
            // Build: declare resources + attachments
            [draw, depth](RGPassBuilder& b, EngineContext*) {
                b.read(draw, RGImageUsage::SampledFragment); // example read
                b.write_color(draw);                         // render target
                b.write_depth(depth, /*clear*/ false);       // depth test
            },
            // Record: issue Vulkan commands (no begin/end rendering needed)
            [this, draw](VkCommandBuffer cmd, const RGPassResources& res, EngineContext* ctx) {
                VkPipeline p{}; VkPipelineLayout l{};
                ctx->pipelines->getGraphics("my_pass", p, l); // hot‑reload safe
                vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, p);
                VkViewport vp{0,0,(float)ctx->getDrawExtent().width,(float)ctx->getDrawExtent().height,0,1};
                vkCmdSetViewport(cmd, 0, 1, &vp);
                VkRect2D sc{{0,0}, ctx->getDrawExtent()};
                vkCmdSetScissor(cmd, 0, 1, &sc);
                vkCmdDraw(cmd, 3, 1, 0, 0);
            }
        );
    }
Builder Reference (RGPassBuilder)
Passed to the BuildCallback to declare resource accesses and attachments. See src/render/graph/builder.h:40.
Image Access:
- read(RGImageHandle, RGImageUsage): Declare sampled/read usage (e.g., SampledFragment, TransferSrc). See src/render/graph/builder.cpp:20.
- write(RGImageHandle, RGImageUsage): Declare write usage (e.g., ComputeWrite, TransferDst). See src/render/graph/builder.cpp:25.
- write_color(RGImageHandle, bool clearOnLoad=false, VkClearValue clear={}): Declare color attachment with optional clear. Sets usage to ColorAttachment and store=true by default. See src/render/graph/builder.cpp:30.
- write_depth(RGImageHandle, bool clearOnLoad=false, VkClearValue clear={}): Declare depth attachment with optional clear. See src/render/graph/builder.cpp:40.
Buffer Access:
- read_buffer(RGBufferHandle, RGBufferUsage): Declare buffer read (e.g., VertexRead, IndexRead, UniformRead, StorageRead). See src/render/graph/builder.cpp:50.
- write_buffer(RGBufferHandle, RGBufferUsage): Declare buffer write (e.g., StorageReadWrite, TransferDst). See src/render/graph/builder.cpp:55.
- Convenience overloads: read_buffer(VkBuffer, RGBufferUsage, size, name) and write_buffer(VkBuffer, ...) automatically import and deduplicate by raw VkBuffer handle. See src/render/graph/builder.cpp:60,70.
Resource Resolution (RGPassResources):
Used inside the RecordCallback to fetch resolved Vulkan objects. See src/render/graph/builder.h:22.
- image(RGImageHandle) → VkImage
- image_view(RGImageHandle) → VkImageView
- buffer(RGBufferHandle) → VkBuffer
Resource Model (RGResourceRegistry)
Manages both imported (externally owned) and transient (graph-owned) resources. See src/render/graph/resources.h:52.
Imported Resources:
- Deduplicated by raw Vulkan handle (VkImage/VkBuffer) using hash maps (_imageLookup/_bufferLookup). See src/render/graph/resources.cpp.
- Initial layout/stage/access preserved from RGImportedImageDesc/RGImportedBufferDesc.
- Ownership remains external; graph does not destroy these resources.
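Deduplication by raw handle can be illustrated in a few lines. The sketch below is an assumption about the general shape, not the contents of resources.cpp: Handle stands in for VkImage, and ImportRegistry and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Handle = std::uint64_t;  // stand-in for a raw VkImage handle

// Hypothetical registry: maps each raw handle to a stable graph-local id.
struct ImportRegistry {
    std::vector<Handle> images;                        // id -> handle
    std::unordered_map<Handle, std::uint32_t> lookup;  // handle -> id

    std::uint32_t import_image(Handle h) {
        auto it = lookup.find(h);
        if (it != lookup.end()) return it->second;  // already imported: reuse the id
        const auto id = static_cast<std::uint32_t>(images.size());
        images.push_back(h);
        lookup.emplace(h, id);
        return id;
    }
};
```

Importing the same handle twice returns the same id, so every pass that touches that image shares one tracked state.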
Transient Resources:
- Created via ResourceManager (AllocatedImage/AllocatedBuffer) with VMA allocations. See src/render/graph/resources.cpp.
- Automatically destroyed at end of frame via frame deletion queue.
- Usage flags must cover all declared usages (validated during compile()).
Lifetime Tracking:
- firstUse and lastUse indices computed during compile() (see src/render/graph/graph.cpp:854–869).
- Used for debug visualization and future aliasing/pooling optimizations.
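A minimal sketch of how first/last use indices could be derived, and how a future aliasing pass might test two lifetimes for overlap. The names Lifetime, compute_lifetimes, and lifetimes_overlap are hypothetical, and resources are reduced to integer ids listed per pass in execution order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Lifetime {
    int firstUse = -1;  // index of the first pass that touches the resource
    int lastUse = -1;   // index of the last pass that touches it; -1 means unused
};

// passResources[i] lists the resource ids accessed by the i-th pass in execution order.
std::vector<Lifetime> compute_lifetimes(const std::vector<std::vector<int>>& passResources,
                                        int resourceCount) {
    std::vector<Lifetime> out(static_cast<std::size_t>(resourceCount));
    for (int pass = 0; pass < static_cast<int>(passResources.size()); ++pass) {
        for (int r : passResources[static_cast<std::size_t>(pass)]) {
            Lifetime& l = out[static_cast<std::size_t>(r)];
            if (l.firstUse < 0) l.firstUse = pass;
            l.lastUse = pass;
        }
    }
    return out;
}

// Two resources could share memory only if their [firstUse, lastUse] ranges are disjoint.
bool lifetimes_overlap(const Lifetime& a, const Lifetime& b) {
    if (a.firstUse < 0 || b.firstUse < 0) return false;  // an unused resource overlaps nothing
    return a.firstUse <= b.lastUse && b.firstUse <= a.lastUse;
}
```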
Records (RGImageRecord/RGBufferRecord):
Unified representation storing VkImage/VkBuffer, VkImageView, format, extent, initial state, and allocation info. See src/render/graph/resources.h:11,34.
Synchronization and Layouts
Barrier Generation (see src/render/graph/graph.cpp:232–851):
For each enabled pass, compile() tracks per-resource state (ImageState/BufferState) and inserts barriers when hazards are detected:
Image Barriers (VkImageMemoryBarrier2):
- Triggered by: layout change, prior write before read/write (RAW/WAW), prior reads before write (WAR).
- Stage/access/layout derived from RGImageUsage via usage_info_image() (see src/render/graph/graph.cpp:313–365).
- Aspect determined by usage and format (depth formats get DEPTH_BIT, others COLOR_BIT).
- Initial state from RGImportedImageDesc::currentLayout/currentStage/currentAccess; if unknown (layout ≠ UNDEFINED but stage=NONE), conservatively assumes ALL_COMMANDS + MEMORY_READ|WRITE.
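The three triggers can be expressed as a small state machine. This is a simplified sketch under stated assumptions: the real ImageState fields are not shown in this document, so the struct below, the reduced Layout enum, and needs_image_barrier are all hypothetical, and stage/access masks are omitted.

```cpp
#include <cassert>

// Reduced stand-in for VkImageLayout.
enum class Layout { Undefined, General, ShaderReadOnly, ColorAttachment };

// Hypothetical per-image tracking state.
struct ImageState {
    Layout layout = Layout::Undefined;
    bool pendingWrite = false;  // a prior pass wrote and no barrier has followed
    bool pendingReads = false;  // prior passes read since the last barrier
};

struct Access {
    Layout layout;
    bool writes;
};

// Returns true when a barrier is needed before `next`, then advances the state.
bool needs_image_barrier(ImageState& s, const Access& next) {
    const bool layoutChange = s.layout != next.layout;
    const bool rawOrWaw = s.pendingWrite;            // prior write before read/write
    const bool war = s.pendingReads && next.writes;  // prior reads before write
    const bool barrier = layoutChange || rawOrWaw || war;
    if (barrier) {  // the barrier resolves all earlier accesses
        s.pendingWrite = false;
        s.pendingReads = false;
    }
    s.layout = next.layout;
    if (next.writes) s.pendingWrite = true;
    else s.pendingReads = true;
    return barrier;
}
```

Note that two consecutive reads in the same layout need no barrier, which is why read-only fan-out from one writer stays cheap.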
Buffer Barriers (VkBufferMemoryBarrier2):
- Triggered by: prior write before read/write, prior reads before write.
- Stage/access derived from RGBufferUsage via usage_info_buffer() (see src/render/graph/graph.cpp:367–411).
- Size: exact size for transients, VK_WHOLE_SIZE for imports (to avoid validation errors).
Usage Priority and Conflict Resolution:
When a pass declares multiple conflicting usages for the same resource (e.g., both SampledFragment and ColorAttachment), the graph selects the highest-priority usage for layout determination (see image_usage_priority() at src/render/graph/graph.cpp:499). Stages and access masks are unioned. The graph warns if the declared usages imply different layouts.
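The merge rule can be sketched as follows. The actual priority values returned by image_usage_priority() are not given here, so the priority field, the placeholder bit values, and the names UsageInfo, Merged, and merge_usages are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Layout { ShaderReadOnly, ColorAttachment, General };

// Hypothetical per-usage info; stage/access bits are placeholders, not Vulkan flag values.
struct UsageInfo {
    Layout layout;
    std::uint32_t stages;
    std::uint32_t access;
    int priority;  // assumed ordering: the larger value wins the layout decision
};

struct Merged {
    Layout layout = Layout::General;
    std::uint32_t stages = 0;
    std::uint32_t access = 0;
    bool layoutConflict = false;  // true when the usages disagree on layout
};

// Picks the layout of the highest-priority usage and unions stages and access masks.
Merged merge_usages(const std::vector<UsageInfo>& usages) {
    assert(!usages.empty());
    Merged m;
    m.layout = usages.front().layout;
    int best = usages.front().priority;
    for (const UsageInfo& u : usages) {
        m.stages |= u.stages;
        m.access |= u.access;
        if (u.layout != usages.front().layout) m.layoutConflict = true;
        if (u.priority > best) {
            best = u.priority;
            m.layout = u.layout;
        }
    }
    return m;
}
```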
Image Usage → Layout/Stage/Access Mapping:
See usage_info_image() at src/render/graph/graph.cpp:313.
| RGImageUsage | Layout | Stage | Access |
|---|---|---|---|
| SampledFragment | SHADER_READ_ONLY_OPTIMAL | FRAGMENT_SHADER | SHADER_SAMPLED_READ |
| SampledCompute | SHADER_READ_ONLY_OPTIMAL | COMPUTE_SHADER | SHADER_SAMPLED_READ |
| TransferSrc | TRANSFER_SRC_OPTIMAL | TRANSFER | TRANSFER_READ |
| TransferDst | TRANSFER_DST_OPTIMAL | TRANSFER | TRANSFER_WRITE |
| ColorAttachment | COLOR_ATTACHMENT_OPTIMAL | COLOR_ATTACHMENT_OUTPUT | COLOR_ATTACHMENT_READ\|WRITE |
| DepthAttachment | DEPTH_ATTACHMENT_OPTIMAL | EARLY_FRAGMENT_TESTS\|LATE_FRAGMENT_TESTS | DEPTH_STENCIL_ATTACHMENT_READ\|WRITE |
| ComputeWrite | GENERAL | COMPUTE_SHADER | SHADER_STORAGE_READ\|WRITE |
| Present | PRESENT_SRC_KHR | BOTTOM_OF_PIPE | MEMORY_READ |
Buffer Usage → Stage/Access Mapping:
See usage_info_buffer() at src/render/graph/graph.cpp:367.
| RGBufferUsage | Stage | Access |
|---|---|---|
| TransferSrc | TRANSFER | TRANSFER_READ |
| TransferDst | TRANSFER | TRANSFER_WRITE |
| VertexRead | VERTEX_INPUT | VERTEX_ATTRIBUTE_READ |
| IndexRead | INDEX_INPUT | INDEX_READ |
| UniformRead | ALL_GRAPHICS\|COMPUTE_SHADER | UNIFORM_READ |
| StorageRead | COMPUTE_SHADER\|ALL_GRAPHICS | SHADER_STORAGE_READ |
| StorageReadWrite | COMPUTE_SHADER\|ALL_GRAPHICS | SHADER_STORAGE_READ\|WRITE |
| IndirectArgs | DRAW_INDIRECT | INDIRECT_COMMAND_READ |
Validation Warnings:
- Depth-format image declared as color attachment (or vice versa). See src/render/graph/graph.cpp:645–657.
- Transient resource used without required usage flags. See src/render/graph/graph.cpp:659–667 (images), 818–826 (buffers).
- Multiple conflicting layouts in single pass. See src/render/graph/graph.cpp:536–543.
Built‑In Pass Wiring (Current)
- Resource uploads (if any) → Background (compute) → Geometry (G‑Buffer) → Lighting (deferred) → SSR → Tonemap+Bloom → FXAA → Transparent → CopyToSwapchain → ImGui → PreparePresent.
- See registrations in src/core/engine.cpp.
Topological Sorting and Scheduling
Dependency Graph Construction (see src/render/graph/graph.cpp:127–231):
- Reads/writes create directed edges: writer → reader (RAW), writer → writer (WAW), reader → writer (WAR).
- Disabled passes are skipped during edge construction but remain in the pass list.
- Kahn's algorithm produces a linear execution order respecting all dependencies.
- If a cycle is detected (topological sort fails), the graph falls back to insertion order but still computes barriers.
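The edge rules and the cycle fallback can be sketched in isolation. This is an illustration under stated assumptions, not the code in graph.cpp: PassDecl, build_edges, and schedule are hypothetical names, resources are reduced to integer ids, and the min-heap tie-break (keeping insertion order among ready passes) is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pass declaration.
struct PassDecl {
    std::vector<int> reads;
    std::vector<int> writes;
    bool enabled = true;
};

// Build hazard edges (from, to) by walking passes in insertion order.
std::set<std::pair<int, int>> build_edges(const std::vector<PassDecl>& passes) {
    std::set<std::pair<int, int>> edges;
    std::map<int, int> lastWriter;                 // resource -> last writing pass
    std::map<int, std::vector<int>> readersSince;  // resource -> readers since that write
    for (int i = 0; i < static_cast<int>(passes.size()); ++i) {
        const PassDecl& p = passes[static_cast<std::size_t>(i)];
        if (!p.enabled) continue;  // disabled passes contribute no edges
        for (int r : p.reads) {
            auto w = lastWriter.find(r);
            if (w != lastWriter.end() && w->second != i) edges.insert({w->second, i});  // RAW
            readersSince[r].push_back(i);
        }
        for (int r : p.writes) {
            auto w = lastWriter.find(r);
            if (w != lastWriter.end() && w->second != i) edges.insert({w->second, i});  // WAW
            for (int reader : readersSince[r])
                if (reader != i) edges.insert({reader, i});  // WAR
            readersSince[r].clear();
            lastWriter[r] = i;
        }
    }
    return edges;
}

// Kahn's algorithm over enabled passes; returns insertion order if a cycle is found.
std::vector<int> schedule(const std::vector<PassDecl>& passes,
                          const std::set<std::pair<int, int>>& edges) {
    const int n = static_cast<int>(passes.size());
    std::vector<int> indegree(static_cast<std::size_t>(n), 0);
    std::vector<std::vector<int>> out(static_cast<std::size_t>(n));
    for (const auto& e : edges) {
        out[static_cast<std::size_t>(e.first)].push_back(e.second);
        ++indegree[static_cast<std::size_t>(e.second)];
    }
    std::priority_queue<int, std::vector<int>, std::greater<int>> ready;  // lowest index first
    std::size_t enabledCount = 0;
    for (int i = 0; i < n; ++i) {
        if (!passes[static_cast<std::size_t>(i)].enabled) continue;
        ++enabledCount;
        if (indegree[static_cast<std::size_t>(i)] == 0) ready.push(i);
    }
    std::vector<int> order;
    while (!ready.empty()) {
        const int p = ready.top();
        ready.pop();
        order.push_back(p);
        for (int next : out[static_cast<std::size_t>(p)])
            if (--indegree[static_cast<std::size_t>(next)] == 0) ready.push(next);
    }
    if (order.size() != enabledCount) {  // cycle: fall back to insertion order
        order.clear();
        for (int i = 0; i < n; ++i)
            if (passes[static_cast<std::size_t>(i)].enabled) order.push_back(i);
    }
    return order;
}
```

Because build_edges only ever points from an earlier pass to a later one, its output cannot contain a cycle; the fallback matters only if edges come from another source.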
Execution Order:
Passes execute in sorted order (or insertion order if cycle). Only enabled passes run; disabled passes are skipped during execute(). See src/render/graph/graph.cpp:895.
Dynamic Rendering Setup
Render Area Calculation (see src/render/graph/graph.cpp:936–1000):
- Chooses min extent across all color/depth attachments.
- Falls back to EngineContext::drawExtent if no attachments.
- Warns if color attachments have mismatched extents.
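A small sketch of the render-area rule. It assumes "min extent" means a component-wise minimum of width and height, which the text does not state explicitly; Extent2D is a stand-in for VkExtent2D, and choose_render_area and extents_mismatch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Extent2D {  // stand-in for VkExtent2D
    std::uint32_t width;
    std::uint32_t height;
};

// Component-wise minimum over all attachments; drawExtent is used when there are none.
Extent2D choose_render_area(const std::vector<Extent2D>& attachments, Extent2D drawExtent) {
    if (attachments.empty()) return drawExtent;
    Extent2D area = attachments.front();
    for (const Extent2D& e : attachments) {
        area.width = std::min(area.width, e.width);
        area.height = std::min(area.height, e.height);
    }
    return area;
}

// True when any color attachment differs in size from the first (the case that warns).
bool extents_mismatch(const std::vector<Extent2D>& colors) {
    for (const Extent2D& e : colors)
        if (e.width != colors.front().width || e.height != colors.front().height) return true;
    return false;
}
```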
Attachment Construction:
- Color attachments: VkRenderingAttachmentInfo with clearOnLoad → LOAD_OP_CLEAR/LOAD_OP_LOAD, store → STORE_OP_STORE/STORE_OP_DONT_CARE.
- Depth attachment: similar logic; clearValue.depthStencil used if clearOnLoad=true.
- Layout forced to COLOR_ATTACHMENT_OPTIMAL or DEPTH_ATTACHMENT_OPTIMAL.
See src/render/graph/graph.cpp:927–1012.
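The flag-to-op mapping is small enough to show directly. The enums below are stand-ins for VkAttachmentLoadOp and VkAttachmentStoreOp, and AttachmentDecl and resolve_ops are hypothetical names; only the clearOnLoad and store flags come from the text above.

```cpp
#include <cassert>

enum class LoadOp { Load, Clear };       // stand-in for VK_ATTACHMENT_LOAD_OP_*
enum class StoreOp { Store, DontCare };  // stand-in for VK_ATTACHMENT_STORE_OP_*

// Hypothetical view of the two flags an attachment declaration carries.
struct AttachmentDecl {
    bool clearOnLoad = false;
    bool store = true;
};

struct AttachmentOps {
    LoadOp load;
    StoreOp store;
};

// clearOnLoad selects CLEAR over LOAD; store selects STORE over DONT_CARE.
AttachmentOps resolve_ops(const AttachmentDecl& d) {
    return AttachmentOps{d.clearOnLoad ? LoadOp::Clear : LoadOp::Load,
                         d.store ? StoreOp::Store : StoreOp::DontCare};
}
```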
Profiling and Timing
GPU Timing (Timestamps):
- Per-frame VkQueryPool with 2 queries per pass (begin/end). Created in execute(), destroyed in resolve_timings() or the next execute().
- vkCmdWriteTimestamp2() at ALL_COMMANDS_BIT stage before/after pass recording (see src/render/graph/graph.cpp:919–923, 1028–1032).
- resolve_timings() fetches results with VK_QUERY_RESULT_WAIT_BIT, converts ticks to milliseconds using timestampPeriod. See src/render/graph/graph.cpp:1314–1355.
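The tick-to-millisecond conversion is simple arithmetic. In Vulkan, VkPhysicalDeviceLimits::timestampPeriod is the number of nanoseconds per timestamp tick, so milliseconds = (end − begin) × timestampPeriod × 10⁻⁶. The sketch below assumes query results are laid out as [begin, end] pairs per pass, matching the 2-queries-per-pass layout described above; ticks_to_millis is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ticks holds 2 entries per pass: [begin0, end0, begin1, end1, ...].
// timestampPeriod is nanoseconds per tick (VkPhysicalDeviceLimits::timestampPeriod).
std::vector<double> ticks_to_millis(const std::vector<std::uint64_t>& ticks,
                                    float timestampPeriod) {
    std::vector<double> millis(ticks.size() / 2);
    for (std::size_t i = 0; i < millis.size(); ++i) {
        const std::uint64_t begin = ticks[2 * i];
        const std::uint64_t end = ticks[2 * i + 1];
        const std::uint64_t delta = end >= begin ? end - begin : 0;  // guard against wrap
        millis[i] = static_cast<double>(delta) * static_cast<double>(timestampPeriod) * 1e-6;
    }
    return millis;
}
```

For example, a 2000-tick delta with a period of 1 ns per tick is 0.002 ms.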
CPU Timing:
- std::chrono::high_resolution_clock measures command recording duration (cpuStart/cpuEnd). See src/render/graph/graph.cpp:924, 1026.
- Stored in _lastCpuMillis vector; accessible via debug_get_passes().
Debug Structures:
- RGDebugPassInfo: name, type, enabled, resource counts, attachment info, gpuMillis, cpuMillis. See src/render/graph/graph.h:66.
- RGDebugImageInfo: id, name, imported, format, extent, usage, lifetime. See src/render/graph/graph.h:83.
- RGDebugBufferInfo: id, name, imported, size, usage, lifetime. See src/render/graph/graph.h:94.
Notes & Limits
- No aliasing or transient pooling: Transient images/buffers created via create_* are released end‑of‑frame via frame deletion queue.
- Single-queue execution: Topological order is linear; no multi-queue parallelization.
- Minimal load/store control: Only clearOnLoad and store flags on RGAttachmentInfo; no resolve or stencil control.
- No mid-pass barriers: Conflicting usages within a single pass cannot be synchronized (warns but proceeds with unioned stages/access).
- No automatic resource aliasing: Future work could reuse transient allocations based on lifetime non-overlap.
Debugging
- Per-pass debug labels: vkdebug::cmd_begin_label(cmd, "RG: <name>") wraps each pass (see src/render/graph/graph.cpp:903–906, 1035–1038).
- Compile-time validation warnings: Printed via fmt::println for format mismatches, missing usage flags, layout conflicts.
- Runtime introspection: Use debug_get_* APIs to export pass/image/buffer metadata for visualization/debugging tools.