Thesis Research
Stylized Deferred Rendering

A stylized deferred rendering pipeline built on a custom DirectX 12 engine, inspired by Minecraft’s Iris/OptiFine shader ecosystem. The project implements a complete G-Buffer pipeline with SM6.6 bindless resources, a modular ShaderBundle system with hot-swappable shader packs, and a suite of post-processing effects including volumetric clouds, volumetric lighting, screen-space reflections with three-layer blending, SSAO, bloom via compute-shader mipmap generation, Lottes 2016 tonemapping with BSL-derived color grading, water refraction, and sky glare. The engine layer provides chunk batch rendering with frustum culling, multi-frame in-flight GPU pipelining, and dirty-tracked root signature binding for efficient draw-call submission.

Demonstration of Stylized Rendering and In-App ShaderPack Switching

Deferred Rendering Pipeline

The rendering pipeline is built on a strict four-layer architecture: the application layer (Game) drives scene logic, the integration layer (RendererSubsystem) exposes 60+ public APIs, the core system layer manages DirectX 12 state and PSO caching, and the resource layer handles GPU memory through RAII-based D12Resource subclasses. All textures and buffers are registered into a single global descriptor heap (1M capacity) and accessed via SM6.6 bindless indexing, eliminating traditional root signature slot juggling and enabling the shader to reference any resource through ResourceDescriptorHeap[index]. The pipeline is driven by a set of discrete RenderPass classes on the application side (Shadow, Terrain, TerrainCutout, TerrainTranslucent, SkyBasic, SkyTextured, Cloud, Deferred, Composite, and Final), each encapsulating its own shader programs, texture bindings, and render state. Render targets, depth textures, and shader programs are resolved at runtime through a data-driven ShaderBundle configuration.

Supporting this pipeline is a set of tightly scoped subsystems: a DXCCompiler that translates HLSL to DXIL with include graph resolution and comment-directive parsing, a UniformManager orchestrating 11 constant buffers via the Template Method pattern, a PSOManager that lazily creates and caches pipeline state objects keyed by shader + render state + RT format, and provider classes (ColorTextureProvider, DepthTextureProvider, SamplerProvider) that own render target allocation, ping-pong flipping, depth copy semantics, and dynamic sampler binding.

graph TB
    subgraph APP["App Layer"]
        RP["RenderPasses"]
    end

    subgraph INT["Integration Layer"]
        RS["RendererSubsystem"]
        FQ["FullQuadsRenderer"]
    end

    subgraph CORE["Core Systems"]
        D3D["D3D12RenderSystem"]
        PSO["PSOManager"]
        UM["UniformManager"]
        DXC["DXCCompiler"]
    end

    subgraph BUNDLE["ShaderBundle System"]
        SBM["ShaderBundleManager"]
        FB["FallbackChain"]
        PROP["ShaderProperties"]
    end

    subgraph PROVIDER["Providers"]
        CTP["ColorTextureProvider"]
        DTP["DepthTextureProvider"]
        SP["SamplerProvider"]
    end

    subgraph RES["Resource Layer"]
        BRM["BindlessResourceManager"]
        GDH["GlobalDescriptorHeap"]
        BUF["D12Buffer"]
        TEX["D12Texture"]
    end

    RP --> RS
    RP --> FQ
    RS --> D3D
    RS --> PSO
    RS --> UM
    RS --> CTP
    RS --> DTP
    RS --> SP
    SBM --> DXC
    SBM --> FB
    SBM --> PROP
    PSO --> SBM
    PSO --> D3D
    CTP --> TEX
    DTP --> TEX
    UM --> BUF
    BRM --> GDH
    TEX --> BRM
    BUF --> BRM

Render Target / Depth Provider

The engine manages all render targets through a unified provider architecture built on the IRenderTargetProvider interface. Four concrete providers, ColorTextureProvider (colortex0 to 15), DepthTextureProvider (depthtex0 to 2), ShadowColorProvider (shadowcolor0 to 7), and ShadowTextureProvider (shadowtex0 to 1), each own their GPU resources and expose a consistent API for creation, binding, clearing, and resizing. A RenderTargetBinder aggregates all four providers behind a single facade, using pending/current state hashing to minimize redundant OMSetRenderTargets calls.

Color and shadow-color targets use a dual-texture ping-pong scheme: each D12RenderTarget holds a main and an alternate D12Texture, with a BufferFlipState<N> bitset tracking which buffer is currently the write target per slot. Calling Flip(index) swaps read/write roles without any GPU copy. Depth textures are single-buffered, so rather than flipping, DepthTextureProvider snapshots depth at key pipeline stages via CopyDepth() (for example, copying depthtex0 into depthtex1 before translucent geometry). Every texture is registered into the global bindless descriptor heap at creation time. Each provider maintains an IndexUniforms constant buffer that maps slot indices to bindless SRV handles, updated per-frame via UpdateIndices() and uploaded to a dedicated cbuffer register (b3 to b6) so shaders can sample any RT by index.
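The ping-pong bookkeeping can be sketched in a few lines. This is a minimal illustration, not the engine's code: PingPongTarget, WriteTexture(), and ReadTexture() are hypothetical stand-ins, and the integer handles take the place of real D12Texture objects.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: one bit per render-target slot.
// Bit clear -> the main texture is written and the alternate is read.
// Bit set   -> the roles are swapped.
template <std::size_t N>
class BufferFlipState {
public:
    // Swap read/write roles for one slot; no GPU copy is involved.
    void Flip(std::size_t index) {
        assert(index < N);
        m_flipped.flip(index);
    }
    bool IsFlipped(std::size_t index) const {
        assert(index < N);
        return m_flipped.test(index);
    }
    void Reset() { m_flipped.reset(); }

private:
    std::bitset<N> m_flipped;
};

// Stand-in for a D12RenderTarget: two texture handles per slot (assumption).
struct PingPongTarget {
    int mainTexture;
    int altTexture;
};

template <std::size_t N>
int WriteTexture(const PingPongTarget& rt, const BufferFlipState<N>& state, std::size_t slot) {
    return state.IsFlipped(slot) ? rt.altTexture : rt.mainTexture;
}

template <std::size_t N>
int ReadTexture(const PingPongTarget& rt, const BufferFlipState<N>& state, std::size_t slot) {
    return state.IsFlipped(slot) ? rt.mainTexture : rt.altTexture;
}
```

A composite pass would call Flip(slot) after writing, so the next pass reads what was just written and writes into the other texture.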

graph TB
    RTB["RenderTargetBinder"]

    subgraph Providers
        CTP["ColorTextureProvider"]
        DTP["DepthTextureProvider"]
        SCP["ShadowColorProvider"]
        STP["ShadowTextureProvider"]
    end

    subgraph Resources
        DRT["D12RenderTarget"]
        DDT["D12DepthTexture"]
    end

    BFS["BufferFlipState"]
    IDX["IndexUniforms"]
    GDH["GlobalDescriptorHeap"]

    RTB --> CTP
    RTB --> DTP
    RTB --> SCP
    RTB --> STP

    CTP --> DRT
    SCP --> DRT
    DTP --> DDT
    STP --> DDT

    CTP --> BFS
    SCP --> BFS

    CTP --> IDX
    DTP --> IDX
    SCP --> IDX
    STP --> IDX

    DRT --> GDH
    DDT --> GDH
    IDX -.-> GDH

Vertex Layout Registration

A deferred renderer needs to push very different vertex data depending on the pass. A full-screen quad only requires position, UV, and color (PCU, 24 bytes), while terrain geometry carries normals, lightmap coordinates, block entity IDs, and mid-texture coordinates for atlas animation (TerrainVertex, 56 bytes). Hardcoding any single input layout into the PSO would either waste bandwidth or lock out game-layer extensions entirely. The engine solves this with a VertexLayoutRegistry: an abstract VertexLayout base class exposes GetInputElements() / GetStride(), concrete implementations (Vertex_PCU, Vertex_PCUTBN, TerrainVertex) self-register at startup, and the registry hands out const VertexLayout* pointers by name. Because the pointer is part of the PSOKey hash, the PSOManager automatically caches a separate PSO per shader-layout combination with zero manual bookkeeping.

Each RenderPass declares its layout in BeginPass() via SetVertexLayout(). Terrain passes bind TerrainVertexLayout; everything else uses the default Vertex_PCUTBN. The renderer resets to the default at the start of every frame, so passes that don’t call SetVertexLayout() just work. Game-layer layouts like TerrainVertexLayout are registered after engine startup, keeping the dependency arrow pointing inward (Open-Closed Principle). TerrainVertexLayout also exposes a MulticastDelegate event (OnBuildVertexLayout) that lets the ShaderBundle system inject material IDs into vertices at mesh-build time, without the vertex type or chunk mesh ever knowing about shader packs.
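A name-keyed registry of this kind might look like the sketch below. It is a reduced illustration under assumptions: the VertexLayout interface is trimmed to GetStride() only, and PcuLayout, TerrainLayout, Register(), and Find() are hypothetical names rather than the engine's actual API. The strides are the ones quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Trimmed interface (assumption): the real class also exposes input elements.
class VertexLayout {
public:
    virtual ~VertexLayout() = default;
    virtual std::size_t GetStride() const = 0;
};

class PcuLayout : public VertexLayout {
public:
    std::size_t GetStride() const override { return 24; }  // position + UV + color
};

class TerrainLayout : public VertexLayout {
public:
    std::size_t GetStride() const override { return 56; }  // terrain vertex
};

class VertexLayoutRegistry {
public:
    // The registry owns the layout. The returned pointer stays valid for the
    // registry's lifetime, which is what makes it usable inside a PSO key.
    // If the name is already registered, the existing layout is kept.
    const VertexLayout* Register(const std::string& name, std::unique_ptr<VertexLayout> layout) {
        assert(layout != nullptr);
        auto it = m_layouts.emplace(name, std::move(layout)).first;
        return it->second.get();
    }

    const VertexLayout* Find(const std::string& name) const {
        auto it = m_layouts.find(name);
        return it == m_layouts.end() ? nullptr : it->second.get();
    }

private:
    std::unordered_map<std::string, std::unique_ptr<VertexLayout>> m_layouts;
};
```

Because each layout lives at a stable heap address, comparing or hashing the pointer is enough to distinguish shader-layout combinations in a cache key.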

graph LR
    subgraph Registry["VertexLayoutRegistry"]
        VL["VertexLayout"]
        PCU["Vertex_PCU"]
        PCUTBN["Vertex_PCUTBN"]
        TV["TerrainVertex"]
    end

    VL --> PCU
    VL --> PCUTBN
    VL --> TV

    RP["RenderPass"] -->|SetVertexLayout| RS["RendererSubsystem"]
    RS -->|layout ptr in PSOKey| PSO["PSOManager"]
    PSO -->|GetInputElements| VL

Per Render Target Blend Config

In a deferred pipeline, different render targets in the same pass often need different blending behavior. A water pass might alpha-blend color output on colortex0 while writing normals to colortex4 with blending disabled to avoid corrupting G-Buffer data. DirectX 12 supports this through IndependentBlendEnable, but letting shader pack authors configure it per pass and per RT requires a clean data-driven path. The engine provides that through two configuration channels.

The first channel is shaders.properties, where ShaderBundle authors declare blend modes per program and optionally per colortex slot. A global directive like blend.gbuffers_water = SRC_ALPHA ONE_MINUS_SRC_ALPHA ONE ONE_MINUS_SRC_ALPHA sets the default blend for all RTs in that pass, while a per-buffer override like blend.gbuffers_water.colortex4 = off disables blending on a specific slot. During ShaderBundle loading, ShaderProperties parses these directives and InjectBlendDirectives() maps the semantic colortex index to the physical RT slot index through the drawBuffers array, then stores the result in ProgramDirectives. The second channel is direct code configuration, where a RenderPass calls SetBlendConfig(config) for global blend or SetBlendConfig(config, rtIndex) for per-RT override.

At draw time, RendererSubsystem packs the current blend state (global config + up to 8 per-RT overrides) into the PSOKey. The PSOManager fills all 8 RT slots with the global config first, then applies per-RT overrides only where isUndefined is false. This sentinel value distinguishes “explicitly set to opaque” from “not configured”, ensuring that unspecified slots inherit the global default cleanly. Each RenderPass resets blend to Opaque() after drawing to prevent state leaking into the next program.
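The fill-then-override rule can be sketched as follows. BlendConfig is reduced here to a single blendEnabled flag plus the isUndefined sentinel, and ResolveBlend() and Alpha() are hypothetical names; the real configuration carries full blend factors and operations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxRenderTargets = 8;

// Reduced blend description (assumption): one flag instead of full factors.
struct BlendConfig {
    bool blendEnabled = false;
    bool isUndefined  = true;  // sentinel: true means "not configured"

    static BlendConfig Opaque() { return {false, false}; }
    static BlendConfig Alpha()  { return {true, false}; }
};

using BlendSlots = std::array<BlendConfig, kMaxRenderTargets>;

// Fill every slot with the global config, then apply only the overrides that
// were explicitly set. An override of Opaque() is "defined", so it wins over
// an alpha-blended global default.
BlendSlots ResolveBlend(const BlendConfig& global, const BlendSlots& overrides) {
    assert(!global.isUndefined);
    BlendSlots resolved;
    resolved.fill(global);
    for (std::size_t i = 0; i < kMaxRenderTargets; ++i) {
        if (!overrides[i].isUndefined) {
            resolved[i] = overrides[i];
        }
    }
    return resolved;
}
```

With a global alpha blend and an Opaque() override on slot 4, slots 0 to 3 and 5 to 7 inherit alpha blending while slot 4 stays unblended, which mirrors the water-pass example above.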

graph TB
    subgraph Config["Configuration Sources"]
        SP["shaders.properties"]
        CODE["RenderPass Code"]
    end

    subgraph Processing
        PARSE["ShaderProperties"]
        INJ["InjectBlendDirectives"]
        PD["ProgramDirectives"]
    end

    subgraph Runtime
        RSUB["RendererSubsystem"]
        PKEY["PSOKey"]
        PSOMGR["PSOManager"]
        D3D["D3D12_BLEND_DESC"]
    end

    SP --> PARSE --> INJ --> PD
    CODE --> RSUB
    PD --> RSUB
    RSUB --> PKEY --> PSOMGR --> D3D

DXC Compiler

The engine compiles all HLSL shaders at runtime through a DXCCompiler wrapper around Microsoft’s DirectX Shader Compiler. Every shader targets Shader Model 6.6 with HLSL 2021 syntax and 16-bit type support enabled, producing DXIL bytecode that the PSOManager consumes directly. Because the engine uses a fully bindless architecture, traditional shader reflection (ID3D12ShaderReflection) is stripped entirely. Resource indices arrive through root constants rather than reflected binding slots, which cuts compilation time significantly and removes a whole class of binding mismatch bugs.

Before source code reaches DXC, the engine’s own include and directive systems have already processed it. The IncludeGraph builds a dependency tree, the IncludeProcessor flattens it into a single translation unit, and the CommentDirectiveParser extracts render state from structured comments. DXC receives a fully self-contained source string with no remaining #include directives.

Include Graph

ShaderBundle HLSL files use #include to share common libraries (lighting math, noise functions, uniform declarations). Rather than relying on DXC’s built-in file-system include handler, the engine resolves includes ahead of time through a two-phase process. First, IncludeGraph performs a BFS traversal starting from the entry file, discovering every transitive dependency and building a DAG of FileNode objects. Each node records its ShaderPath (a normalized Unix-style virtual path) and its list of child includes. Second, IncludeProcessor walks this graph in DFS order and concatenates the file contents into a single expanded source string, skipping duplicates so that shared headers are included exactly once. The result is a flat, self-contained HLSL source that DXC compiles without ever touching the file system.

This design also enables fast incremental checks. Because the graph is cached per ShaderBundle load, the engine can detect when a shared header changes and invalidate only the programs that depend on it.
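The two phases can be illustrated with an in-memory file map. This is a simplified sketch under assumptions: VirtualFiles, BuildGraph(), and Expand() are hypothetical names, only the quoted #include form on its own line is recognized, and the expansion step re-scans each file so that an included header's text lands at the point of its #include.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using VirtualFiles = std::map<std::string, std::string>;               // path -> source text
using IncludeGraph = std::map<std::string, std::vector<std::string>>;  // path -> child includes

// Extract the quoted path from a line of the form: #include "path"
static bool ParseInclude(const std::string& line, std::string& outPath) {
    const std::string tag = "#include \"";
    if (line.compare(0, tag.size(), tag) != 0) return false;
    const std::size_t end = line.find('"', tag.size());
    if (end == std::string::npos) return false;
    outPath = line.substr(tag.size(), end - tag.size());
    return true;
}

// Phase 1: breadth-first discovery of every transitive dependency.
IncludeGraph BuildGraph(const VirtualFiles& files, const std::string& entry) {
    IncludeGraph graph;
    std::queue<std::string> pending;
    pending.push(entry);
    while (!pending.empty()) {
        const std::string path = pending.front();
        pending.pop();
        if (graph.count(path)) continue;  // already discovered
        std::vector<std::string>& children = graph[path];
        std::istringstream in(files.at(path));
        for (std::string line, inc; std::getline(in, line);) {
            if (ParseInclude(line, inc)) {
                children.push_back(inc);
                pending.push(inc);
            }
        }
    }
    return graph;
}

// Phase 2: depth-first expansion; each file is emitted at most once.
void Expand(const VirtualFiles& files, const IncludeGraph& graph, const std::string& path,
            std::set<std::string>& emitted, std::string& out) {
    if (!emitted.insert(path).second) return;  // shared header already emitted
    assert(graph.count(path) == 1);            // every file was discovered in phase 1
    std::istringstream in(files.at(path));
    for (std::string line, inc; std::getline(in, line);) {
        if (ParseInclude(line, inc)) Expand(files, graph, inc, emitted, out);
        else out += line + "\n";
    }
}
```

Marking a file as emitted before recursing also means an accidental include cycle terminates instead of recursing forever.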

Comment Directive Parsing

Iris-style shaders encode render state directly in HLSL comments rather than in external configuration files. The engine’s CommentDirectiveParser is a stateless scanner that extracts these directives from the expanded pixel shader source and populates a ProgramDirectives object. Supported directives include:

  • /* RENDERTARGETS: 0,3,4 */ or /* DRAWBUFFERS:034 */ to declare which color attachments this program writes to, determining NumRenderTargets and RTVFormats[] in the PSO
  • /* DEPTH_TEST: LEQUAL */ and /* DEPTH_WRITE: true */ to configure depth-stencil state
  • /* CULLFACE: BACK */ to set rasterizer cull mode
  • /* BLEND: ADD */ for blend operation hints

The PSOManager reads these directives when building a PSOKey, so each unique combination of draw buffers, depth mode, cull face, and blend config produces its own cached PSO. This keeps shader authors in full control of GPU state without touching any C++ code.
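As a rough illustration of the scanning step, the sketch below extracts the draw-buffer list from either directive form. FindDirective() and ParseDrawBuffers() are hypothetical helpers, and the exact whitespace handling of the engine's parser is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Return the text between "/* KEY:" and "*/", or an empty string if absent.
std::string FindDirective(const std::string& source, const std::string& key) {
    assert(!key.empty());
    const std::string open = "/* " + key + ":";
    const std::size_t begin = source.find(open);
    if (begin == std::string::npos) return "";
    const std::size_t valueStart = begin + open.size();
    const std::size_t end = source.find("*/", valueStart);
    if (end == std::string::npos) return "";
    return source.substr(valueStart, end - valueStart);
}

// RENDERTARGETS lists comma-separated indices (so 10 and above are possible);
// DRAWBUFFERS uses one digit per target.
std::vector<int> ParseDrawBuffers(const std::string& source) {
    std::vector<int> targets;
    const std::string list = FindDirective(source, "RENDERTARGETS");
    if (!list.empty()) {
        int value = -1;  // -1 means "no number in progress"
        for (char c : list) {
            if (std::isdigit(static_cast<unsigned char>(c))) {
                value = (value < 0 ? 0 : value * 10) + (c - '0');
            } else if (value >= 0) {
                targets.push_back(value);
                value = -1;
            }
        }
        if (value >= 0) targets.push_back(value);
        return targets;
    }
    for (char c : FindDirective(source, "DRAWBUFFERS")) {
        if (std::isdigit(static_cast<unsigned char>(c))) targets.push_back(c - '0');
    }
    return targets;
}
```

The length of the returned list would correspond to NumRenderTargets, and each entry selects the colortex slot whose format goes into RTVFormats[].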

graph LR
    subgraph Include["Include Resolution"]
        IG["IncludeGraph BFS"]
        IP["IncludeProcessor DFS"]
    end

    subgraph Directives["Directive Extraction"]
        CDP["CommentDirectiveParser"]
        PD["ProgramDirectives"]
    end

    HLSL["HLSL Source Files"] --> IG --> IP --> FLAT["Expanded Source"]
    FLAT --> DXC["DXCCompiler SM6.6"]
    FLAT --> CDP --> PD

    DXC --> DXIL["DXIL Bytecode"]
    PD --> PKEY["PSOKey"]
    DXIL --> PSOMGR["PSOManager"]
    PKEY --> PSOMGR
    PSOMGR --> PSO["ID3D12PipelineState"]

Built-in Engine Shader Library

The engine ships with a self-contained shader library that provides every built-in program, uniform declaration, and utility function a ShaderBundle needs to render a complete frame. Any ShaderBundle shader can #include "../include/core.hlsl" to pull in all uniform cbuffers, vertex structures (VSInput/VSOutput), the standard vertex transform, and math constants in a single line.

Uniform buffers are split across two register spaces. Space 0 holds engine-managed cbuffers that the renderer populates automatically every frame or every draw call: transform matrices (b7), camera parameters (b9), viewport dimensions (b10), per-object data (b1), and all bindless index tables for color textures (b3), depth textures (b4), shadow colors (b5), shadow textures (b6), samplers (b8), and custom images (b2). Space 1 is reserved for game-side and ShaderBundle-authored uniforms that map to Iris-compatible variables: world time (b1), fog parameters (b2), world info like cloud height and ambient light (b3), common rendering state such as rain strength and sky color (b8), and celestial data including sun angle and shadow light position (b9). This separation means the engine’s root signature owns space 0 through direct root CBVs while space 1 binds through a descriptor table that the game layer fills, so ShaderBundle authors can add custom cbuffers in space 1 without touching engine code.

All texture access is fully bindless. Rather than declaring Texture2D : register(t*), each include file stores bindless SRV indices in uint4 arrays inside cbuffers, and shaders sample via ResourceDescriptorHeap[index]. The uint4 packing avoids the HLSL cbuffer alignment trap where scalar arrays pad each element to 16 bytes. Several include files also provide helper functions beyond raw data: LinearizeDepth() and LinearToNDCDepth() in camera uniforms, CalculateFogFactor() and ApplyFog() in fog uniforms, and named sampler aliases (linearSampler, pointSampler, shadowSampler, wrapSampler) in sampler uniforms.
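The uint4 packing is easiest to see from the CPU side. The following is a sketch of what a mirrored index table could look like; UInt4 and ColorTextureIndices are hypothetical names, and the 16-slot count is taken from the colortex0 to 15 range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kColorTextureCount = 16;

// Mirrors HLSL's uint4: four 32-bit values in one 16-byte register.
struct alignas(16) UInt4 {
    std::uint32_t v[4];
};

// Hypothetical CPU-side mirror of a packed bindless index table.
struct ColorTextureIndices {
    UInt4 packed[kColorTextureCount / 4];

    // Slot i lives in register i / 4, component i % 4.
    void Set(std::size_t slot, std::uint32_t bindlessIndex) {
        assert(slot < kColorTextureCount);
        packed[slot / 4].v[slot % 4] = bindlessIndex;
    }
    std::uint32_t Get(std::size_t slot) const {
        assert(slot < kColorTextureCount);
        return packed[slot / 4].v[slot % 4];
    }
};

// Packed, 16 indices occupy 4 registers (64 bytes). Declared as a scalar
// array in a cbuffer, each element would start a new 16-byte register and the
// same data would span 16 registers.
static_assert(sizeof(ColorTextureIndices) == 64, "packed table should be 64 bytes");
```

On the shader side the same slot / 4 and slot % 4 arithmetic recovers the index before it is used with ResourceDescriptorHeap.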

engine/shaders/
  core/
    core.hlsl                    main entry, includes all uniforms
    gbuffers_basic.vs/ps.hlsl
    gbuffers_textured.vs/ps.hlsl
  include/
    camera_uniforms.hlsl         b9 space0, LinearizeDepth()
    matrices_uniforms.hlsl       b7 space0, all MVP/shadow matrices
    viewport_uniforms.hlsl       b10 space0, resolution and aspect
    color_texture_uniforms.hlsl  b3 space0, colortex0-15 indices
    depth_texture_uniforms.hlsl  b4 space0, depthtex0-2 indices
    shadow_color_uniforms.hlsl   b5 space0, shadowcolor0-7 indices
    shadow_texture_uniforms.hlsl b6 space0, shadowtex0-1 indices
    sampler_uniforms.hlsl        b8 space0, sampler aliases
    perobject_uniforms.hlsl      b1 space0, model matrix/color
    custom_image_uniforms.hlsl   b2 space0, custom texture indices
    common_uniforms.hlsl         b8 space1, renderStage/rain/sky
    worldtime_uniforms.hlsl      b1 space1, worldTime/moonPhase
    fog_uniforms.hlsl            b2 space1, fog color/density
    worldinfo_uniforms.hlsl      b3 space1, cloud height/ambient
    celestial_uniforms.hlsl      b9 space1, sun/moon/shadow pos
    developer_uniforms.hlsl      debug-only uniforms
  lib/
    fog.hlsl                     fog calculation utilities
    spaceConversion.hlsl         coordinate space transforms
  program/
    gbuffers_terrain.vs/ps.hlsl  terrain rendering
    gbuffers_water.vs/ps.hlsl    water surface
    shadow.vs/ps.hlsl            shadow map generation
    composite.vs/ps.hlsl         post-process compositing
    final.vs/ps.hlsl             final output to swapchain
    ...                          sky, cloud, debug programs

Compute Shader and Mipmap Pipeline

The engine extends its PSO infrastructure to support compute shaders alongside the existing graphics pipeline. ShaderProgram and PSOManager handle compute PSO creation and caching using the same key-based approach as graphics PSOs, with UniformManager supporting a PerDispatch update frequency for compute-specific constant buffers. The bindless root signature includes a dedicated ROOT_CBV_MIPGEN slot (b11) for mipmap generation uniforms.

The MipmapGenerator is a pure static engine service that produces GPU-driven mipmaps via compute shader dispatch. It supports four filter modes (Box, AlphaWeighted, SRGB, AlphaWeightedSRGB) compiled as shader variants at pipeline initialization time through the RendererEvents::OnPipelineReady event. Each mip level is generated by dispatching ceil(w/8) x ceil(h/8) thread groups that read from the source mip’s SRV and write to the target mip’s UAV, with UAV barriers inserted between levels to ensure correct ordering. The generator routes dispatches to the compute queue when available, falling back to the graphics queue with proper cross-queue synchronization.
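The per-level dispatch sizes follow directly from the 8x8 group size. The sketch below plans the whole chain on the CPU; PlanMipChain() and MipDispatch are hypothetical names, and it assumes w and h in the formula refer to the target mip's dimensions and that each level halves the previous one, clamped at 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kThreadsPerAxis = 8;  // 8x8 threads per group, as in the text

struct MipDispatch {
    std::uint32_t width, height;     // target mip size in texels
    std::uint32_t groupsX, groupsY;  // thread groups to dispatch
};

// Integer ceil(value / divisor).
constexpr std::uint32_t DivRoundUp(std::uint32_t value, std::uint32_t divisor) {
    return (value + divisor - 1) / divisor;
}

// One entry per generated level (mip 1 and below); mip 0 is the source.
std::vector<MipDispatch> PlanMipChain(std::uint32_t width, std::uint32_t height) {
    assert(width > 0 && height > 0);
    std::vector<MipDispatch> plan;
    while (width > 1 || height > 1) {
        width  = std::max<std::uint32_t>(1, width / 2);
        height = std::max<std::uint32_t>(1, height / 2);
        plan.push_back({width, height,
                        DivRoundUp(width, kThreadsPerAxis),
                        DivRoundUp(height, kThreadsPerAxis)});
    }
    return plan;
}
```

For a 1920x1080 source, the first generated level is 960x540 and needs 120 x 68 groups; the rounding up matters because 540 is not a multiple of 8.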

Render targets opt into mipmap generation through the colortexNMipmapEnabled directive in shaders.properties. The CompositeRenderPass calls generateMipmapsForMarkedTargets() between composite sub-passes, so later passes (like composite4 bloom) can read hardware mipmaps written by earlier passes. Atlas border extrusion prevents color bleeding at texture edges during mip downsampling.

Multi-frame In-flight Rendering

The engine supports configurable multi-frame in-flight GPU pipelining (1 to 3 concurrent frames, default 2), allowing the CPU to submit frame N while the GPU executes frame N-1. Frame-partitioned resource layouts use the compiled active depth to size ring buffers, descriptor tables, and per-frame allocations. FrameSlotAcquisitionResult provides detailed diagnostics including wait tracking and retirement statistics. This architecture eliminates CPU stalls on GPU completion for the common case, with fence-based synchronization preventing resource hazards across in-flight frames.
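The core of fence-based slot reuse can be modeled without a GPU. In this CPU-only sketch, FrameSlotRing is a hypothetical class and completedFence stands in for the value the GPU has signaled (what ID3D12Fence::GetCompletedValue would report); the engine's real acquisition path and its diagnostics are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class FrameSlotRing {
public:
    explicit FrameSlotRing(std::size_t framesInFlight) : m_slotFence(framesInFlight, 0) {
        assert(framesInFlight >= 1 && framesInFlight <= 3);
    }

    // True if the next slot's previous frame has retired on the GPU, so its
    // per-frame resources can be reused without waiting.
    bool CanAcquire(std::uint64_t completedFence) const {
        return m_slotFence[m_next] <= completedFence;
    }

    // Record the fence value that will be signaled when this frame finishes,
    // then advance to the following slot. Returns the slot index used.
    std::size_t Submit(std::uint64_t completedFence) {
        assert(CanAcquire(completedFence));
        const std::size_t slot = m_next;
        m_slotFence[slot] = ++m_lastSubmitted;
        m_next = (m_next + 1) % m_slotFence.size();
        return slot;
    }

private:
    std::vector<std::uint64_t> m_slotFence;  // fence value guarding each slot
    std::uint64_t m_lastSubmitted = 0;
    std::size_t m_next = 0;
};
```

With two frames in flight, the CPU can submit two frames back to back; the third submission only has to wait if the GPU has not yet signaled the first frame's fence.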

Dirty-tracked Root Signature Binding

GraphicsRootBinder replaces the previous DrawBindingHelper with a dirty-tracking cache that minimizes redundant D3D12 API calls. It maintains per-slot validity flags for the root signature, 15 engine CBV slots, and descriptor tables. Each Bind*IfDirty() method returns a boolean indicating whether the binding was actually necessary, and the system tracks diagnostics (bind counts, cache hits, invalidation counts) for profiling. Selective invalidation (InvalidateAll() vs InvalidateDescriptorTables()) allows fine-grained cache control when pipeline state changes. Profiling shows this reduces root signature rebinding by over 90% in typical terrain rendering scenarios, with BindDescriptorTableIfDirty contributing only ~5ms per hot path.
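The dirty-tracking idea reduces to comparing the requested binding against a cached one. The sketch below covers only root CBV slots; RootBindingCache, BindCbvIfDirty(), and CbvEntry are hypothetical names, and the actual bind call that a true return value would trigger is left to the caller.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCbvSlotCount = 15;  // engine CBV slots, as stated above

struct CbvEntry {
    std::uint64_t address = 0;  // last bound GPU virtual address
    bool valid = false;         // false until bound, or after invalidation
};

class RootBindingCache {
public:
    // Returns true when the caller must issue the real bind call.
    bool BindCbvIfDirty(std::size_t slot, std::uint64_t gpuAddress) {
        assert(slot < kCbvSlotCount);
        CbvEntry& entry = m_cbv[slot];
        if (entry.valid && entry.address == gpuAddress) {
            ++m_cacheHits;
            return false;  // same buffer already bound: skip the API call
        }
        entry = CbvEntry{gpuAddress, true};
        ++m_bindCount;
        return true;
    }

    // Forget everything, e.g. after the root signature itself changes.
    void InvalidateAll() { m_cbv.fill(CbvEntry{}); }

    std::uint32_t BindCount() const { return m_bindCount; }
    std::uint32_t CacheHits() const { return m_cacheHits; }

private:
    std::array<CbvEntry, kCbvSlotCount> m_cbv{};
    std::uint32_t m_bindCount = 0;
    std::uint32_t m_cacheHits = 0;
};
```

The two counters correspond to the kind of bind-count and cache-hit diagnostics mentioned above.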

PerPass Uniform Scope

The PerPass uniform scope system allows each composite sub-pass to maintain its own set of custom image bindings without cross-contamination. SceneRenderPass tracks a pass scope that commits dirty-tracked CustomImage snapshots at pass boundaries, ensuring that composite4’s bloom textures don’t leak into composite1’s volumetric light pass. This replaces the previous per-frame-only uniform commit model and is critical for the multi-sub-pass composite architecture.

Chunk Batch Rendering

The chunk batch system organizes voxel terrain into 4x4 chunk regions, each maintaining separate vertex/index buffer allocations with per-layer spans (opaque, cutout, translucent). A ChunkBatchCollector performs frustum culling against region world bounds using the ICullingVolumeProvider interface, producing a ChunkBatchCollection of visible draw items. ChunkBatchRenderer submits these items with base vertex and start index resolution for efficient batched rendering.
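Mapping a chunk to its 4x4 region is a floor division. The sketch below assumes regions are aligned to multiples of four chunks and that chunk coordinates may be negative; ChunkToRegion(), LocalIndex(), and RegionCoord are hypothetical names.

```cpp
#include <cassert>

constexpr int kRegionSizeInChunks = 4;

// Integer division that rounds toward negative infinity, so chunk -1 lands in
// region -1 rather than region 0.
constexpr int FloorDiv(int value, int divisor) {
    return value / divisor - ((value % divisor != 0 && (value < 0) != (divisor < 0)) ? 1 : 0);
}

struct RegionCoord {
    int x;
    int y;
};

constexpr RegionCoord ChunkToRegion(int chunkX, int chunkY) {
    return {FloorDiv(chunkX, kRegionSizeInChunks), FloorDiv(chunkY, kRegionSizeInChunks)};
}

// Index of a chunk inside its region, in the range [0, 16).
constexpr int LocalIndex(int chunkX, int chunkY) {
    const int localX = chunkX - FloorDiv(chunkX, kRegionSizeInChunks) * kRegionSizeInChunks;
    const int localY = chunkY - FloorDiv(chunkY, kRegionSizeInChunks) * kRegionSizeInChunks;
    return localY * kRegionSizeInChunks + localX;
}

static_assert(ChunkToRegion(-1, 5).x == -1, "negative chunks round down");
```

Plain integer division would place chunks -3 to 3 in the same region, which is why the rounding direction matters near the origin.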

The system uses arena-based GPU memory management with dynamic growth and relocation tracking. When a region’s geometry changes (chunk mesh rebuild), the arena allocation is updated in-place or relocated if the buffer needs to grow. A dedicated PlayerCameraRig provides separate cameras for gameplay, rendering, debug inspection, and chunk batch culling, allowing the culling frustum to be frozen for debugging while the render camera continues to move. The ChunkBachingRenderPass provides debug visualization with color-coded region wireframes (green=visible, red=culled, yellow=dirty, magenta=build failed).

graph TB
    subgraph Collection["Frustum Culling"]
        CAM["PlayerCameraRig"] --> FRUST["ICullingVolumeProvider"]
        FRUST --> COLL["ChunkBatchCollector"]
    end

    subgraph Regions["4x4 Chunk Regions"]
        R1["Region (Opaque)"]
        R2["Region (Cutout)"]
        R3["Region (Translucent)"]
    end

    COLL --> R1
    COLL --> R2
    COLL --> R3

    subgraph Submit["GPU Submission"]
        REND["ChunkBatchRenderer"]
        ARENA["Arena GPU Buffers"]
    end

    R1 --> REND
    R2 --> REND
    R3 --> REND
    REND --> ARENA

Application-Side Render Pipeline

The engine deliberately does not own the rendering order. RendererSubsystem provides stateless APIs (bind shader, set blend, draw geometry, upload uniforms) but never decides which pass runs when. That responsibility belongs to the application layer, where Game::RenderWorld() calls each RenderPass in an explicit, linear sequence. This separation means the engine can serve any rendering strategy (forward, deferred, hybrid) without modification, while the application defines the exact pipeline topology for its use case.

Each RenderPass inherits from SceneRenderPass, an abstract base class that defines three hooks: Execute() (public entry point), BeginPass() (set up render state, bind programs and targets), and EndPass() (restore state). The base class also handles ShaderBundle hot-reload automatically by subscribing to OnBundleLoaded / OnBundleUnloaded events in its constructor and unsubscribing in its destructor, so concrete passes only need to override the callbacks to refresh their cached ShaderProgram pointers. A static helper RenderPassHelper translates drawBuffers index lists into typed render target references that UseProgram() consumes.

Game owns all passes as unique_ptr<SceneRenderPass> and calls them in a fixed Iris-compatible order inside RenderWorld(). The pipeline follows a strict sequence: shadow generation, sky rendering into the G-Buffer, opaque terrain geometry, deferred lighting (which must complete before translucents so they blend onto a fully lit scene), translucent geometry (water and clouds), multi-stage compositing (SSR, volumetric light, tonemapping), and final output to the swapchain.

graph TB
    subgraph Shadow["1. Shadow"]
        S1["ShadowRenderPass"]
        S2["ShadowCompositeRenderPass"]
    end

    subgraph Sky["2. Sky"]
        SK1["SkyBasicRenderPass"]
        SK2["SkyTexturedRenderPass"]
    end

    subgraph GBuffer["3. Opaque G-Buffer"]
        T1["TerrainRenderPass"]
        T2["TerrainCutoutRenderPass"]
    end

    DEF["4. DeferredRenderPass (SSAO + Sky Glare)"]

    subgraph Trans["5. Translucent"]
        TT["TerrainTranslucentRenderPass (SSR + Refraction)"]
        CL["CloudRenderPass"]
    end

    subgraph Comp["6. CompositeRenderPass (Multi-Sub-Pass)"]
        C0["composite (SSR opaque)"]
        C1["composite1 (VL + Rainbow)"]
        MIP["Mipmap Generation"]
        C4["composite4 (Bloom Tile Atlas)"]
        C5["composite5 (Bloom + Tonemap + Color Grading)"]
    end

    FIN["7. FinalRenderPass (Underwater Distortion)"]
    CBR["8. ChunkBachingRenderPass"]
    DBG["DebugRenderPass"]

    S1 --> S2 --> SK1 --> SK2 --> T1 --> T2 --> DEF --> TT --> CL --> C0 --> C1 --> MIP --> C4 --> C5 --> FIN
    FIN --> CBR
    CBR -.-> DBG

Code/Game/Framework/RenderPass/
  SceneRenderPass.hpp/cpp           abstract base class (with PerPass scope)
  RenderPassHelper.hpp/cpp          static utilities
  WorldRenderingPhase.hpp           phase enum
  ConstantBuffer/                   POD uniform structs
  RenderShadow/                     shadow map generation
  RenderShadowComposite/            shadow post-process
  RenderSkyBasic/                   sky dome, void, and sky glare
  RenderSkyTextured/                sun, moon, stars
  RenderTerrain/                    opaque terrain (chunk batch integration)
  RenderTerrainCutout/              alpha-tested foliage
  RenderTerrainTranslucent/         water (SSR + refraction) and ice
  RenderCloud/                      cloud geometry
  RenderDeferred/                   deferred lighting (SSAO + sky glare)
  RenderComposite/                  multi-sub-pass: SSR, VL, mipmap, bloom, tonemap
  RenderFinal/                      output to swapchain (underwater distortion)
  RenderChunkBaching/               chunk batch debug visualization
  RenderDebug/                      debug overlays

The ShaderBundle system is this engine’s equivalent of Minecraft’s Iris/OptiFine ShaderPack. A ShaderBundle is a self-contained directory of HLSL programs, property files, fallback rules, and custom textures that completely defines how the world is rendered. Shader authors can write their own bundles to achieve any visual style (toon shading, photorealistic PBR, stylized painterly) without modifying a single line of engine C++ code. The engine discovers available bundles by scanning the .enigma/shaderbundles/ directory at startup and supports hot-swapping between them at runtime through an ImGui selector or API call.

The architecture uses a dual-bundle design. An engine bundle ships with the renderer and is always loaded as the final fallback. When a user bundle is active, every GetProgram() call walks a three-level fallback chain: first the current user-defined sub-bundle (for per-profile shader variants), then the bundle’s own program/ folder following fallback_rule.json chains (for example gbuffers_clouds falls back to gbuffers_textured then to gbuffers_basic), and finally the engine bundle which guarantees that every pass always has a valid shader. Bundle switching is deferred to the frame boundary to avoid D3D12 errors from deleting render targets mid-frame, and the previous bundle’s resources are released automatically through shared_ptr ownership.
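The three-level lookup can be sketched with plain sets and a rule map. BundleModel, WalkChain(), and ResolveProgram() are hypothetical, program objects are reduced to their names, and applying the same fallback rules when searching the engine bundle is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using ProgramSet    = std::set<std::string>;
using FallbackRules = std::map<std::string, std::string>;  // program -> next program to try

struct BundleModel {
    ProgramSet subBundle;      // level 1: active user-defined sub-bundle
    ProgramSet programFolder;  // level 2: the bundle's own program/ folder
    FallbackRules rules;       // chains in the style of fallback_rule.json
    ProgramSet engineBundle;   // level 3: always-loaded engine bundle
};

// Follow the rule chain until a program exists in `programs`.
// Returns an empty string if the chain ends without a match.
static std::string WalkChain(const ProgramSet& programs, const FallbackRules& rules,
                             std::string name) {
    std::set<std::string> visited;  // guards against cyclic rules
    while (visited.insert(name).second) {
        if (programs.count(name)) return name;
        auto next = rules.find(name);
        if (next == rules.end()) break;
        name = next->second;
    }
    return "";
}

// Returns "<level>:<program>" describing where the request resolved.
std::string ResolveProgram(const BundleModel& bundle, const std::string& name) {
    if (bundle.subBundle.count(name)) return "sub:" + name;
    std::string hit = WalkChain(bundle.programFolder, bundle.rules, name);
    if (!hit.empty()) return "bundle:" + hit;
    hit = WalkChain(bundle.engineBundle, bundle.rules, name);
    assert(!hit.empty() && "engine bundle is expected to cover every pass");
    return "engine:" + hit;
}
```

A request for gbuffers_clouds in a bundle that only ships gbuffers_textured resolves at the second level, without ever reaching the engine bundle.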

Shader Bundle System

A ShaderBundle is organized as a directory under .enigma/shaderbundles/ with a fixed layout that the engine scans at load time. The program/ folder holds the primary shader programs (one .vs.hlsl and one .ps.hlsl per pass). The bundle/ folder contains named sub-bundles that can override any program for profile or dimension variants. lib/ stores reusable HLSL libraries (lighting, noise, shadow math, tonemapping) that programs pull in via #include. include/ holds shared declarations specific to this bundle. shaders.properties configures render targets, blend modes, and buffer formats per program. block.properties maps block IDs to material categories for the MaterialIdMapper. Custom textures live in textures/ with optional .enigmeta sidecar files that define sampling and format metadata.

On the engine side, the Bundle module is split into focused subsystems. ShaderBundle and UserDefinedBundle manage program ownership and the three-level fallback chain. ShaderProperties and PackRenderTargetDirectives parse the properties files into structured data. MaterialIdMapper bridges block IDs to shader material categories. BundleTextureLoader and EnigmetaParser handle custom texture loading. ShaderBundleSubsystem ties everything together as the engine integration point, handling discovery, lifecycle, and configuration persistence through shaderbundle.yml.

graph TB
    subgraph Integration
        SBS["ShaderBundleSubsystem"]
        CFG["Configuration"]
    end

    subgraph Core
        SB["ShaderBundle"]
        UDB["UserDefinedBundle"]
        PFC["ProgramFallbackChain"]
    end

    subgraph Config["Configuration Parsing"]
        SP["ShaderProperties"]
        RTD["PackRenderTargetDirectives"]
        PSD["PackShadowDirectives"]
        MIM["MaterialIdMapper"]
    end

    subgraph Assets["Asset Loading"]
        BTL["BundleTextureLoader"]
        EMP["EnigmetaParser"]
    end

    subgraph Helpers
        JH["JsonHelper"]
        FH["FileHelper"]
        SH["ScanHelper"]
    end

    SBS --> SB
    SBS --> CFG
    SB --> UDB
    SB --> PFC
    SB --> SP
    SB --> RTD
    SB --> PSD
    SB --> MIM
    SB --> BTL
    BTL --> EMP
    SB --> Helpers

EnigmaDefault/shaders/
  shaders.properties           RT formats, blend modes, buffer config
  block.properties             block ID to material mapping
  bundle.json                  bundle metadata
  program/
    gbuffers_terrain.vs/ps     terrain geometry
    gbuffers_water.vs/ps       water surface
    shadow.vs/ps               shadow map generation
    deferred1.vs/ps            deferred lighting
    composite.vs/ps            post-process pass 0 (SSR opaque)
    composite1.vs/ps           post-process pass 1 (VL + rainbow)
    composite4.vs/ps           bloom tile atlas generation
    composite5.vs/ps           bloom apply + tonemapping + color grading
    final.vs/ps                output to swapchain
    ...                        sky, cloud, debug programs
  bundle/
    mycustom_bundle_0/         sub-bundle variant (overrides program/)
      gbuffers_terrain.vs/ps
      composite.vs/ps
      ...
  lib/
    atmosphere.hlsl            sky and atmosphere math
    bloom.hlsl                 bloom tile atlas generation and reading
    clouds.hlsl                volumetric cloud ray marching
    common.hlsl                shared utilities (Luma, Pow2, Pow4)
    fog.hlsl                   fog calculations
    lighting.hlsl              diffuse and specular models
    noise.hlsl                 procedural noise functions
    pipelineSettings.hlsl      quality toggles and pipeline config
    rainbow.hlsl               procedural rainbow generation
    reflection.hlsl            three-layer reflection coordinator
    refraction.hlsl            water surface refraction
    shadow.hlsl                shadow sampling and bias
    skyGlare.hlsl              sun/moon atmospheric halo
    ssao.hlsl                  screen-space ambient occlusion
    ssr.hlsl                   screen-space reflection ray march
    tonemap.hlsl               Lottes 2016 HDR to LDR + color grading
    underwaterDistortion.hlsl  underwater screen-space distortion
    volumetricLight.hlsl       volumetric ray marching
    water.hlsl                 water surface utilities
  include/
    settings.hlsl              user-facing quality toggles
    ...                        bundle-specific declarations
  textures/
    cloud-water.png            custom texture asset
    cloud-water.png.enigmeta   sampling and format metadata

Material ID Mapper

In a voxel renderer, all terrain geometry shares the same vertex shader and pixel shader per pass. Water, grass, stone, and leaves all flow through gbuffers_terrain as identical quads with no built-in way for the shader to tell them apart. Without a material identification mechanism, effects like water reflections, translucent tinting, or emissive glow would require separate render passes per block type, which defeats the purpose of batched chunk rendering.

The MaterialIdMapper solves this by bridging a data file (block.properties) to the vertex stream. ShaderBundle authors define mappings in a simple properties format where each line assigns a numeric material ID to one or more namespaced block names (for example block.32000=simpleminer:water). At bundle load time, MaterialIdMapper parses these entries into an unordered_map<string, uint16_t> lookup table. The mapper then subscribes to TerrainVertexLayout::OnBuildVertexLayout, a MulticastDelegate event that fires every time the voxel mesher builds a quad. When the event fires with a block name that has a mapping, OnBuildVertex() stamps the material ID into the m_entityId field of all four quad vertices. The shader reads this value from TEXCOORD2 (matching Iris’s mc_Entity semantic) and branches on it to apply material-specific logic.

This design keeps the voxel mesher, vertex layout, and shader completely decoupled. The mesher knows nothing about materials. The vertex layout only knows it has a uint16 entity ID slot. The shader only reads an integer. All the knowledge of “water is 32000” lives in block.properties, which ShaderBundle authors can edit without recompiling anything.

graph LR
    BP["block.properties"] --> MIM["MaterialIdMapper"]
    MIM --> EVT["OnBuildVertexLayout"]
    MESH["Voxel Mesher"] --> EVT
    EVT --> TV["TerrainVertex m_entityId"]
    TV --> PS["Pixel Shader TEXCOORD2"]
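
The load-time parsing step can be sketched as follows. This is a minimal illustration, not the engine's parser: the function name ParseBlockProperties is hypothetical, and whitespace-separated block names are an assumption borrowed from the Iris-style properties format.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: parses "block.<id>=<name> [<name> ...]" lines into a
// block name -> material ID table. Lines that do not match are skipped.
std::unordered_map<std::string, uint16_t> ParseBlockProperties(const std::string& text)
{
    std::unordered_map<std::string, uint16_t> table;
    std::istringstream lines(text);
    std::string line;
    const std::string prefix = "block.";

    while (std::getline(lines, line))
    {
        const size_t eq = line.find('=');
        if (line.compare(0, prefix.size(), prefix) != 0 || eq == std::string::npos)
            continue; // comments, blank lines, unrelated keys

        // The numeric ID sits between "block." and '='.
        const std::string idText = line.substr(prefix.size(), eq - prefix.size());
        if (idText.empty() || idText.size() > 5 ||
            idText.find_first_not_of("0123456789") != std::string::npos)
            continue;
        const unsigned long id = std::stoul(idText);
        if (id > UINT16_MAX)
            continue; // does not fit the uint16 entity ID slot

        // Assumption: names after '=' are separated by whitespace.
        std::istringstream names(line.substr(eq + 1));
        std::string name;
        while (names >> name)
            table[name] = static_cast<uint16_t>(id);
    }
    return table;
}
```

Keying the table by block name means the per-quad callback does a single hash lookup, which matters because the event fires for every quad the mesher emits.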

Life Hook Event

The ShaderBundle system uses two event mechanisms to keep subsystems decoupled from the bundle lifecycle. The first is a string-based EventSystem that fires named events through the engine’s global event bus. The second is a typed MulticastDelegate system that provides compile-time safe, direct callback registration. Both allow any subsystem to react to bundle changes without the bundle module holding references to its consumers.

| Event | Mechanism | Trigger | Typical Subscriber |
| --- | --- | --- | --- |
| OnShaderBundleLoaded | EventSystem | After a user bundle finishes loading | RenderPasses (rebuild PSO cache) |
| OnShaderBundleUnloaded | EventSystem | Before switching back to engine bundle | RenderPasses (release user programs) |
| OnShaderBundlePropertiesModified | EventSystem | When shaders.properties is edited at runtime | PSOManager (invalidate cached state) |
| OnShaderBundlePropertiesReset | EventSystem | When properties are reset to defaults | PSOManager (restore original config) |
| OnShaderBundleReload | EventSystem | When a hot-reload is requested | ShaderBundleSubsystem (recompile all) |
| OnBundleLoaded | MulticastDelegate | After bundle load completes | MaterialIdMapper subscription setup |
| OnBundleUnloaded | MulticastDelegate | After bundle unload | MaterialIdMapper subscription teardown |
| OnBuildVertexLayout | MulticastDelegate | Per quad during chunk meshing | MaterialIdMapper (inject entity ID) |

The MulticastDelegate events are particularly important for the MaterialIdMapper workflow. When a bundle loads, ShaderBundleSubsystem subscribes the mapper’s OnBuildVertex callback to TerrainVertexLayout::OnBuildVertexLayout and stores the returned DelegateHandle. When the bundle unloads, it removes the subscription using that handle. This ensures material ID injection is only active while a bundle with block.properties is loaded, and no dangling callbacks survive a bundle swap.
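
The handle-based subscribe and unsubscribe pattern described here could look roughly like the sketch below. The class shape, the integer DelegateHandle alias, and the method names Add, Remove, and Broadcast are assumptions for illustration; the engine's actual MulticastDelegate interface is not shown in this article.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Assumption: a handle is just a unique integer.
using DelegateHandle = uint64_t;

template <typename... Args>
class MulticastDelegate
{
public:
    using Callback = std::function<void(Args...)>;

    // Registers a callback and returns the handle needed to remove it later.
    DelegateHandle Add(Callback cb)
    {
        const DelegateHandle handle = m_nextHandle++;
        m_callbacks.emplace(handle, std::move(cb));
        return handle;
    }

    // Returns true if a callback was registered under this handle.
    bool Remove(DelegateHandle handle) { return m_callbacks.erase(handle) > 0; }

    // Invokes every registered callback (order is unspecified in this sketch).
    void Broadcast(Args... args) const
    {
        for (const auto& entry : m_callbacks)
            entry.second(args...);
    }

private:
    DelegateHandle m_nextHandle = 1;
    std::unordered_map<DelegateHandle, Callback> m_callbacks;
};
```

Because removal goes through the stored handle rather than comparing callables, the subscriber never needs to be identified by pointer, which is what keeps a stale callback from surviving a bundle swap. This sketch does not guard against callbacks that add or remove subscribers while a broadcast is in progress.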

Update RTs Configuration

A deferred renderer packs different data into each render target: HDR color in colortex0 might need R16G16B16A16_FLOAT, while normals in colortex2 fit in R8G8B8A8_SNORM, and a material mask in colortex3 only needs R8G8B8A8_UNORM. The engine lets ShaderBundle authors declare these formats, clear behavior, and clear colors directly in HLSL through a dedicated include file (rt_formats.hlsl). This keeps all RT configuration colocated with the shaders that write to them, rather than scattered across C++ code or external config files.

The configuration uses two directive styles parsed by PackRenderTargetDirectives. Format directives live inside /* */ comment blocks because DXGI format names like R16G16B16A16_FLOAT are not declared HLSL symbols, so using them as initializers would fail to compile. The ConstDirectiveParser extracts them by scanning raw source lines before compilation. Clear and clear-color directives are valid HLSL const declarations (const bool colortex0Clear = true, const float4 colortex0ClearColor = float4(0,0,0,1)) that the same parser picks up through AST-level const evaluation. Both directive types support all four RT categories: colortex (0 to 15), depthtex (0 to 2), shadowcolor (0 to 7), and shadowtex (0 to 1).

At bundle load time, the engine collects these directives from all shader sources, merges them with YAML-defined defaults, and produces a RenderTargetConfig per slot. The providers then use these configs to create GPU resources with the correct DXGI format, set the appropriate clear action (Load, Clear, or DontCare), and apply the specified clear color at the start of each frame.

// rt_formats.hlsl example (EnigmaDefault)

// Format directives (inside comments, not valid HLSL)
/*
const int colortex0Format = R16G16B16A16_FLOAT;
const int colortex1Format = R8G8B8A8_UNORM;
const int colortex2Format = R8G8B8A8_SNORM;
*/

// Clear control (valid HLSL const declarations)
const bool colortex0Clear = true;
const bool colortex6Clear = false;

// Clear color
const float4 colortex1ClearColor = float4(0.0, 0.0, 1.0, 1.0);

// Shadow RT configuration
const bool   shadowcolor0Clear      = true;
const float4 shadowcolor0ClearColor = float4(1.0, 1.0, 1.0, 1.0);

graph TB
    subgraph HLSL["rt_formats.hlsl"]
        FMT["Format Directives"]
        CLR["Clear / ClearColor"]
    end

    subgraph Parsing
        CDP["ConstDirectiveParser"]
        PRTD["PackRenderTargetDirectives"]
    end

    subgraph Output["RenderTargetConfig per slot"]
        CF["DXGI Format"]
        CA["Clear Action"]
        CC["Clear Color"]
    end

    FMT --> CDP
    CLR --> CDP
    CDP --> PRTD
    YAML["YAML Defaults"] --> PRTD
    PRTD --> CF
    PRTD --> CA
    PRTD --> CC
    CF --> PROV["Providers"]
    CA --> PROV
    CC --> PROV
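
As a rough illustration of the raw-text scan for format directives, the sketch below pulls target and format names out of shader source with a regular expression. The function name ExtractFormatDirectives is hypothetical, and for brevity it matches the directive anywhere in the text rather than only inside comment blocks.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical sketch: collects "<target>Format = <DXGI name>" directives from
// raw shader text without compiling it. Returns target name -> format name.
std::map<std::string, std::string> ExtractFormatDirectives(const std::string& source)
{
    static const std::regex pattern(
        R"re(const\s+int\s+((?:colortex|depthtex|shadowcolor|shadowtex)\d+)Format\s*=\s*([A-Za-z0-9_]+)\s*;)re");

    std::map<std::string, std::string> formats;
    for (std::sregex_iterator it(source.begin(), source.end(), pattern), end; it != end; ++it)
        formats[(*it)[1].str()] = (*it)[2].str(); // later declarations win
    return formats;
}
```

A real parser would still need to map each format name to a DXGI_FORMAT value and range-check the slot index against the limits listed above; both steps are left out here.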

Shader Properties

Each ShaderBundle includes a shaders.properties file that acts as the central configuration surface for shader authors. This file controls pipeline behavior without requiring any C++ changes, giving bundle creators full authority over how their shaders interact with the rendering engine.

The properties file supports several directive categories. Quality profiles define named presets (POTATO through ULTRA) that map to sets of macro definitions and numeric parameters like shadow distance, reflection quality, and SSAO settings. Custom texture bindings assign image assets to numbered slots per program (for example texture.deferred.3=textures/cloud-water.png), making them available in HLSL through the bindless GetCustomImage() accessor. Blend directives configure per-program and per-RT blending as described in the Per Render Target Blend Config section. The ShaderProperties parser loads this file at bundle load time and distributes the parsed data to the relevant subsystems: profiles feed into macro definitions for DXC compilation, texture bindings go to BundleTextureLoader, and blend directives are injected into ProgramDirectives for PSO creation.

Stylized Shader Bundle

EnigmaDefault is the project’s flagship ShaderBundle, targeting high-quality stylized rendering inspired by the Complementary Reimagined shader pack for Minecraft. The visual direction aims for rich atmospheric depth, soft natural lighting, and painterly color grading while preserving the blocky charm of voxel geometry. Rather than chasing photorealism, the bundle emphasizes mood and readability through carefully tuned volumetric clouds, warm atmospheric scattering, smooth water reflections, and subtle ambient occlusion. Every effect is implemented in pure HLSL within the bundle’s lib/ and program/ directories, running entirely through the engine’s data-driven deferred pipeline with no hardcoded rendering logic on the C++ side.

Atmospheric scattering with warm sunset tones, showing depth fog and sky-to-horizon color blending

Shadow Mapping and Bias

The shadow system uses a single shadow map with nonlinear XY distortion rather than cascaded shadow maps (CSM). The distortion warps clip-space coordinates so that the area near the camera receives higher texel density while distant regions are compressed, giving a near-field precision benefit similar to CSM from a single depth pass. The Z axis is additionally compressed to 20% of its original range, expanding effective depth precision across the entire shadow frustum. The distortion factor is driven by SHADOW_DISTANCE (configurable per quality profile, defaulting to 128 blocks).
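
The article does not give the exact distortion formula, so the following is only a sketch of one form commonly seen in Iris-style shader packs: XY is divided by a factor that grows linearly with distance from the shadow-map center. The constant kDistortBias and the function name are assumptions; only the 20% Z compression comes from the text.

```cpp
#include <cmath>

struct Float3 { float x, y, z; };

// Assumed blend constant, not taken from the project. The real factor is tied
// to SHADOW_DISTANCE.
constexpr float kDistortBias = 0.85f;
// Z compression to 20% of the original range, as stated in the text.
constexpr float kDepthScale = 0.2f;

// Warps shadow clip-space XY so that points near the center (near the camera)
// are magnified, while points at distance 1 from the center stay in place.
Float3 DistortShadowClipPos(Float3 p)
{
    const float dist = std::sqrt(p.x * p.x + p.y * p.y);
    const float factor = dist * kDistortBias + (1.0f - kDistortBias);
    return { p.x / factor, p.y / factor, p.z * kDepthScale };
}
```

With this form, a point at 0.1 from the center lands at roughly 0.43 after distortion, so the near field occupies far more of the shadow map than it would under a linear projection. The same function has to be applied when sampling the map, otherwise lookups will not line up with what the shadow pass wrote.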

Bias uses a two-layer approach. The primary technique is normal offset bias applied in world space: the shadow sampling point is pushed along the surface normal by an amount that adapts to both distance (farther surfaces get larger offsets to compensate for reduced texel density) and angle (grazing angles where NdotL approaches zero receive up to 2x the offset). This eliminates shadow acne without the light-bleeding artifacts that constant depth bias introduces on thin geometry. The secondary layer is a soft depth comparison that uses a narrow transition window (factor of 256) normalized against the Z compression, producing near-hard edges that the PCF filter then softens.

Shadow sampling supports six quality tiers from hard single-tap to 16-tap circular PCF. The PCF kernel distributes samples in a circular pattern using Interleaved Gradient Noise (IGN) for screen-space dithering, breaking up banding artifacts without temporal filtering. The kernel radius scales dynamically with three factors: distance from camera (far shadows get wider kernels to match the lower texel density), rain strength (overcast weather softens shadows up to 3x), and an edge fade band that smoothly transitions shadows to full brightness over the last 8 blocks before the shadow distance cutoff.
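
A sketch of the dithered circular kernel follows. The noise function uses the widely published Interleaved Gradient Noise constants; the spiral tap layout and the name CircularPcfOffsets are assumptions, since the article does not specify how taps are placed within the disk.

```cpp
#include <cmath>
#include <vector>

struct Float2 { float x, y; };

// Interleaved Gradient Noise: a per-pixel value in [0, 1) computed from
// screen-space pixel coordinates.
float InterleavedGradientNoise(float px, float py)
{
    const float inner = 0.06711056f * px + 0.00583715f * py;
    const float f = 52.9829189f * (inner - std::floor(inner));
    return f - std::floor(f);
}

// Assumed layout: taps spiral outward over one revolution, with the start
// angle rotated per pixel by the noise value. The sqrt spacing keeps the tap
// density roughly uniform over the disk area.
std::vector<Float2> CircularPcfOffsets(int tapCount, float radius, float noise)
{
    const float kTwoPi = 6.28318530718f;
    std::vector<Float2> offsets;
    offsets.reserve(static_cast<size_t>(tapCount));
    for (int i = 0; i < tapCount; ++i)
    {
        const float angle = (static_cast<float>(i) + noise) / tapCount * kTwoPi;
        const float r = radius * std::sqrt((i + 0.5f) / tapCount);
        offsets.push_back({ std::cos(angle) * r, std::sin(angle) * r });
    }
    return offsets;
}
```

The radius argument is where the distance and rain scaling described above would be applied before the taps are generated.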

The shadow pass also writes two additional render targets beyond the depth buffer. shadowcolor0 stores a water caustic pattern sampled from a custom noise texture with dual-frequency blending, and shadowcolor1 stores underwater volumetric light color with exponential distance attenuation. Both are consumed later in the lighting and composite stages for water rendering.

graph TB
    subgraph Shadow["ShadowRenderPass"]
        SVS["shadow.vs.hlsl"]
        SPS["shadow.ps.hlsl"]
    end

    subgraph Output["Shadow Output"]
        ST1["shadowtex1 depth"]
        SC0["shadowcolor0 caustics"]
        SC1["shadowcolor1 underwater VL"]
    end

    subgraph Deferred["DeferredRenderPass"]
        D1["deferred1.ps.hlsl"]
        LIT["lighting.hlsl"]
        SHAD["shadow.hlsl"]
    end

    SVS --> ST1
    SPS --> SC0
    SPS --> SC1

    ST1 --> SHAD
    SHAD --> LIT
    SC0 --> LIT
    SC1 --> LIT
    LIT --> D1
    D1 --> CT0["colortex0 lit scene"]

Volumetric Cloud

The cloud system replaces vanilla geometric clouds entirely with screen-space ray-marched volumetric clouds computed in the deferred lighting pass (deferred1). Cloud shape is driven by a 2D texture lookup (cloud-water.png blue channel) with an 8th-power threshold for sharp, Complementary Reimagined-style edges, rather than expensive 3D Perlin or Worley noise. The system features self-shadowing via light-direction sampling, height-gradient shading (dark bottoms, bright tops), forward scattering from a half-Lambert view-sun dot product, and three quality tiers (16/32/48 samples). Cloud depth output feeds into the composite pass to modulate volumetric light intensity, preventing god rays from shining through cloud bodies.

For a detailed breakdown of the ray march algorithm, texture-driven shape generation, self-shadow computation, and cloud lighting model, see the dedicated blog post: Volumetric Cloud in Deferred Rendering.

Volumetric clouds from above, with bright tops and self-shadowed undersides

Volumetric Light

The volumetric light system renders screen-space god rays by ray marching from the camera toward each fragment in world space, sampling the shadow map at each step to accumulate lit segments. Directional modulation via VdotL (view-to-light dot product) creates visible light shafts toward the sun or moon. The system supports four quality tiers (12 to 50 daytime samples), time-of-day color transitions from warm sunset tones to cool blue moonlight, and noon intensity reduction to avoid overexposure when the sun is overhead. A dual shadow test (shadowtex0 vs shadowtex1) enables colored light shafts through translucent surfaces like water, with shadowcolor1 providing the tint color. Underwater, the system forces full scene intensity, shortens the ray march distance to 80 blocks, and zeroes fully lit samples so that VL comes only from colored shafts passing through the water surface.
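
The core accumulation loop can be sketched as below. This covers only the lit-segment fraction; the isLit callback is a hypothetical stand-in for the shadow map lookup, and the VdotL modulation, color transitions, and underwater handling are omitted.

```cpp
#include <functional>

struct Float3 { float x, y, z; };

// Marches from the camera to the fragment in world space and returns the
// fraction of samples that are visible from the light.
// isLit: hypothetical stand-in for the shadow map test at a world position.
float MarchVolumetricLight(Float3 camera, Float3 fragment, int sampleCount,
                           const std::function<bool(Float3)>& isLit)
{
    if (sampleCount <= 0)
        return 0.0f;

    int litSamples = 0;
    for (int i = 0; i < sampleCount; ++i)
    {
        // Sample at the midpoint of segment i along the view ray.
        const float t = (i + 0.5f) / sampleCount;
        const Float3 p = { camera.x + (fragment.x - camera.x) * t,
                           camera.y + (fragment.y - camera.y) * t,
                           camera.z + (fragment.z - camera.z) * t };
        if (isLit(p))
            ++litSamples;
    }
    return static_cast<float>(litSamples) / sampleCount;
}
```

In a shader, the start of the ray would typically be jittered per pixel to trade banding for noise at low sample counts; that is a common practice and not something the article states for this implementation.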

For the complete implementation details including the shadow sampling strategy, directional modulation formulas, and underwater VL attenuation, see the dedicated blog post: Volumetric Light in Deferred Rendering.

Volumetric light shafts at sunrise, showing directional modulation and warm sunset color grading
Volumetric light shafts at night, with cool blue moonlight and shadow-driven ray attenuation

Screen Space Reflection and Three-Layer Reflection

The water rendering system implements a three-layer reflection architecture that blends near-field ray marching, mid-distance mirrored image reprojection, and far-field procedural sky to produce seamless reflections across all viewing distances.

Layer 1 (SSR Ray March) performs view-space ray marching with exponential step growth and binary refinement. SSR is computed inline during the gbuffers_water pass rather than deferred to a composite pass, eliminating intermediate render target storage. The ray march uses depthtex1 (opaque-only depth) to avoid false hits on the water surface itself, with screen-border fade via pow(max(cdist.x, cdist.y), 50.0) and depth proximity rejection to prevent self-reflection artifacts. A smoothness modulation scales the result alpha, allowing rougher surfaces to fall through to lower layers.

Layer 2 (Mirrored Image) provides mid-distance reflections where SSR runs out of steps but the scene is still visible on screen from a different angle. It projects the reflected view direction to clip space with a vertical parallax stretch, samples colortex0 at the reprojected coordinate, and applies pow-8 screen-edge fade with exponential distance fog. A view angle consistency check rejects reflections that point behind the camera.

Layer 3 (Procedural Sky) serves as the always-available fallback, generating an atmospheric gradient with zenith-to-horizon color blending, sunset warmth overlay at the horizon, and below-horizon darkening. Sky light factor modulation ensures indoor and cave environments receive no sky reflection.

The three layers blend with priority ordering: SSR (highest) overrides mirrored image, which overrides procedural sky. Each layer’s alpha acts as a coverage mask, and uncovered regions fall through to the next layer. A GGX microfacet specular highlight adds sun/moon glints on the water surface.
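
The fall-through behavior amounts to compositing the layers back to front with each layer's alpha as its coverage. The sketch below assumes straightforward linear interpolation; the type and function names are illustrative, and the GGX highlight is left out.

```cpp
struct Color { float r, g, b; };

struct ReflectionLayer
{
    Color color;
    float alpha; // coverage in [0, 1]; 0 means "fall through to the next layer"
};

static Color Lerp(Color a, Color b, float t)
{
    return { a.r + (b.r - a.r) * t,
             a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t };
}

// Composites lowest priority first, so higher-priority layers overwrite the
// result wherever they have coverage.
Color BlendReflectionLayers(ReflectionLayer ssr, ReflectionLayer mirror, Color sky)
{
    Color result = sky;                                 // layer 3: always available
    result = Lerp(result, mirror.color, mirror.alpha);  // layer 2 over sky
    result = Lerp(result, ssr.color, ssr.alpha);        // layer 1 over both
    return result;
}
```

Because the sky layer has implicit full coverage, the result is always defined, even when both SSR and the mirrored image miss.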

The water surface features multi-layer normal distortion (three frequency layers scrolling at different speeds to simulate wind), depth-based transparency that reveals the seabed in shallow water via exponential decay against depthtex1, Fresnel-driven reflection blending with a cubic approximation guaranteeing 15% minimum reflectivity, and shoreline foam.

graph TB
    subgraph Layers["Three-Layer Reflection"]
        L1["Layer 1: SSR Ray March (near)"]
        L2["Layer 2: Mirrored Image (mid)"]
        L3["Layer 3: Procedural Sky (far)"]
    end

    L1 --> BLEND["Priority Blend"]
    L2 --> BLEND
    L3 --> BLEND
    GGX["GGX Specular"] --> BLEND
    BLEND --> FINAL["Final Reflection Color"]
    FRESNEL["Fresnel"] --> MIX["Reflection + Scene Mix"]
    FINAL --> MIX

For the complete implementation covering the ray march algorithm, normal distortion, depth-based culling, and underwater effects, see the dedicated blog post: Screen-Space Reflection Water in Deferred Rendering.

Screen-space reflections on water at noon, showing three-layer blending with depth-based transparency and Fresnel-driven reflection
Screen-space reflections at a low view angle, demonstrating strong Fresnel reflectivity and mirrored image fallback blending

Water Refraction

Above-water refraction uses noise-based UV warping with three-stage validation to convincingly distort the submerged scene seen through the water surface without artifacts. The system samples a noise texture (customImage4) at world-space coordinates animated by frameTimeCounter, producing coherent distortion that moves with the water surface rather than the camera. The offset magnitude scales with a configurable intensity, FOV compensation from the projection matrix, and inverse view distance (farther water gets less refraction to avoid swimming artifacts).

Three validation stages prevent refraction bleeding: first, a material mask check confirms the current pixel is water via colortex4; second, depth-based attenuation reduces refraction in shallow water by comparing the linear distance between the water surface (depthtex0) and opaque geometry (depthtex1); third, the offset pixel is re-checked against the material mask to ensure it still falls within a water region. This prevents the common artifact where refraction samples non-water pixels at water boundaries.

Underwater Effects

When the camera is submerged, a full underwater post-processing pipeline activates across multiple render passes. The composite1 pass applies water refraction with the same noise-based UV warping system. The final pass adds sinusoidal screen-space distortion: a diagonal ripple animated by frameTimeCounter that creates the characteristic wavering underwater view. Distance fog uses squared exponential falloff with wavelength-dependent color attenuation, applied in gamma space before linearization to amplify the visual effect. Colored volumetric light shafts pass through the water surface via the dual shadow architecture (shadowtex0 vs shadowtex1), with shadowcolor1 providing the underwater tint color.

Underwater colored volumetric light shafts, with wavelength-dependent fog attenuation and screen-space distortion

Rainbow

The composite1 pass renders a procedural rainbow that appears only when the sun sits at a low angle (solar elevation between 0.1 and 0.25). This is consistent with real rainbows, which form at approximately 42 degrees from the anti-solar point and are therefore visible only when the sun is low in the sky. The arc geometry uses a bell curve intensity profile with configurable diameter, and spectral colors are generated procedurally through non-linear coordinate mapping with modulo arithmetic across three phase-offset channels. Rain strength, cloud depth, and noon factor all modulate visibility, and the rainbow is significantly dimmed underwater.

Procedural rainbow arc at low solar elevation, with spectral color generation and rain-modulated visibility

SSAO

The deferred lighting pass (deferred1) computes screen-space ambient occlusion using a depth-only Poisson disk sampling approach with bilateral depth comparison, ported from Complementary Reimagined. Rather than reconstructing normals from the G-Buffer, the algorithm works entirely in screen space by comparing depth differences at jittered sample offsets, combining an angle metric (how much the sample is above the surface) with a distance metric (how far the depth difference extends) for robust occlusion detection.

Sample offsets are generated by a pseudo-random function using golden-ratio angle distribution with squared radial falloff, concentrating samples near the fragment center where occlusion detail matters most. The kernel radius adapts to both FOV (via the projection matrix) and view distance (farther fragments get smaller sample radii to match perspective foreshortening). Two quality tiers are available: a fast mode with 4 bilateral samples and 0.4 scale, and a quality mode with 12 samples and 0.6 scale. Each sample tests both the +offset and -offset directions for bilateral coverage, doubling the effective sample count. The final AO factor is raised to a configurable intensity power (SSAO_IM) for artistic control over occlusion strength.

Screen-space ambient occlusion adding contact shadows and depth cues to voxel terrain geometry

Bloom

The bloom system uses a two-pass tile atlas architecture driven by hardware mipmap levels, computed across the composite4 (generation) and composite5 (application) sub-passes. This approach avoids the traditional multi-pass Gaussian blur chain by leveraging the engine’s compute-shader mipmap generator to produce pre-filtered downsampled images, then packing seven LOD levels (LOD 2 through LOD 8, covering 1/4 to 1/256 resolution) into a single tile atlas stored in colortex3.

In the generation pass (composite4), each tile reads from the corresponding hardware mip level of colortex0 and applies a 7x7 Gaussian blur using Pascal’s triangle row-6 weights (1, 6, 15, 20, 15, 6, 1), normalized by 4096 (the row sum of 64, squared for the two axes). Because each mip texel already covers exp2(lod) original pixels along each axis, stepping the kernel by one texel at mip scale yields a 7-texel Gaussian whose screen-space footprint grows with each resolution tier. The blurred result is gamma-encoded as pow(x/128, 0.25) to preserve HDR range within the RGBA8 format of colortex3.

In the application pass (composite5), GetBloomTile() reads each tile from the atlas, decodes the gamma encoding (x^4 * 128), and DoBloom() averages all seven LOD levels with equal weight (1/7). The bloom strength includes a darkness boost factor that intensifies bloom in dark scenes, creating a natural glow around bright light sources in dim environments. Resolution rescaling normalizes the tile layout to 1920x1080, preventing tile overlap at lower resolutions.
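
The encode and decode pair and the kernel normalization can be checked with a few lines of scalar code. The formulas come from the text above; the function names here are illustrative and do not refer to the bundle's HLSL functions.

```cpp
#include <cmath>

// Encodes an HDR value in [0, 128] into [0, 1] for storage in an 8-bit
// channel. The fourth root spends more of the 8-bit range on dark values.
float EncodeBloom(float hdr)
{
    return std::pow(hdr / 128.0f, 0.25f);
}

// Inverse of EncodeBloom: x^4 * 128.
float DecodeBloom(float encoded)
{
    const float squared = encoded * encoded;
    return squared * squared * 128.0f;
}

// 2D weight of the separable 7x7 kernel for tap (dx, dy), with dx and dy in
// [-3, 3]: the product of two row-6 binomial coefficients over 64 * 64 = 4096.
float GaussianWeight7x7(int dx, int dy)
{
    static const float kRow6[7] = { 1.0f, 6.0f, 15.0f, 20.0f, 15.0f, 6.0f, 1.0f };
    return kRow6[dx + 3] * kRow6[dy + 3] / 4096.0f;
}
```

Summing GaussianWeight7x7 over all 49 taps gives exactly 1, which is why 4096 is the right divisor. Values above 128 clip on encode, so 128 acts as the maximum HDR intensity the atlas can carry.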

graph LR
    CT0["colortex0 HDR"] --> MIP["Compute Mipmap Generator"]
    MIP --> C4["composite4: 7x7 Gaussian per LOD"]
    C4 --> ATLAS["colortex3 Tile Atlas (7 LODs)"]
    ATLAS --> C5["composite5: Read + Decode + Average"]
    C5 --> BLOOM["Bloom + Scene Blend"]
Bloom effect showing soft glow around bright light sources via 7-LOD tile atlas and Gaussian blur

Tonemapping and Color Grading

The final color pipeline in composite5 applies Lottes 2016 HDR-to-LDR tonemapping followed by BSL-derived color grading. The Lottes curve (from Timothy Lottes’ GDC 2016 presentation “Advanced Techniques and Optimization of HDR Color Pipelines”) provides configurable exposure, contrast, and highlight compression through precomputed curve coefficients, offering more artistic control than simpler Reinhard or ACES curves.

The tonemapping pipeline processes color through a fixed sequence of stages: exposure scaling, Lottes curve application with precomputed a/b/c/d coefficients, linear-to-sRGB gamma conversion (IEC 61966-2-1), dark lift (a smoothstep blend that preserves readability in very dark regions by mixing in a softer gamma curve), white path compression (smoothly pushing bright highlights toward pure white), and dark desaturation (reducing color saturation in shadows to prevent oversaturated dark tones). Each stage has configurable parameters exposed through settings.hlsl for ShaderBundle authors.
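
For reference, the sketch below shows the commonly published form of the Lottes curve, x^a / (x^(a*d) * b + c), with b and c solved so that a chosen mid-grey input maps to a chosen output and the HDR maximum maps to 1. The struct and function names are illustrative, the parameter set is an assumption about how the bundle exposes the curve, and the later stages (dark lift, white path, desaturation) are not covered.

```cpp
#include <cmath>

// Assumed parameterization; the bundle's actual settings are not shown here.
struct LottesParams
{
    float contrast; // a
    float shoulder; // d
    float hdrMax;   // input that should map to 1.0
    float midIn;    // mid-grey input
    float midOut;   // mid-grey output
};

struct LottesCoefficients { float a, b, c, d; };

// Solves for b and c so that f(midIn) = midOut and f(hdrMax) = 1.
LottesCoefficients PrecomputeLottes(const LottesParams& p)
{
    const float a = p.contrast;
    const float d = p.shoulder;
    const float midPowA   = std::pow(p.midIn, a);
    const float midPowAD  = std::pow(p.midIn, a * d);
    const float maxPowA   = std::pow(p.hdrMax, a);
    const float maxPowAD  = std::pow(p.hdrMax, a * d);
    const float denom     = (maxPowAD - midPowAD) * p.midOut;
    const float b = (maxPowA * p.midOut - midPowA) / denom;
    const float c = (maxPowAD * midPowA - maxPowA * midPowAD * p.midOut) / denom;
    return { a, b, c, d };
}

// f(x) = x^a / (x^(a*d) * b + c), applied per channel.
float ApplyLottes(float x, const LottesCoefficients& k)
{
    return std::pow(x, k.a) / (std::pow(x, k.a * k.d) * k.b + k.c);
}

// Linear to sRGB transfer function from IEC 61966-2-1.
float LinearToSrgb(float linear)
{
    return linear <= 0.0031308f
        ? 12.92f * linear
        : 1.055f * std::pow(linear, 1.0f / 2.4f) - 0.055f;
}
```

Precomputing b and c once per parameter change keeps the per-pixel cost to two pow calls and a divide, which is the practical reason the coefficients are stored rather than derived in the shader.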

After tonemapping, a BSL-derived color grading pass applies saturation and vibrance adjustments. Vibrance selectively boosts muted colors more than already-saturated ones by computing a per-pixel saturation metric from the min/max channel spread, while the saturation control applies a uniform shift relative to perceptual gray. With default settings (T_SATURATION=1.0, T_VIBRANCE=1.0), the system applies a subtle +7% saturation boost that enriches the stylized look without oversaturating.

Sky Glare

The sky rendering pass (gbuffers_skybasic) adds a Fresnel-like atmospheric halo around the sun and moon, creating a visible bright glow when looking near celestial bodies. The glare intensity follows an inverse Fresnel formula driven by the view-to-sun dot product, with an adaptive scatter exponent that narrows the halo core while maintaining a wide soft falloff. Rain dynamically widens and dims the glare while desaturating its color toward grey, simulating overcast atmospheric scattering.

Atmospheric sky glare around the sun, with Fresnel-driven halo intensity and warm color transition

Glare color transitions between warm yellow-orange for the sun (shifting toward white at noon) and cool blue-grey for the moon, with an underwater brightness boost of 7x to simulate the sun seen through the water surface. The system is fully parameterized through SUN_GLARE_VISFACTOR (halo width) and SUN_GLARE_STRENGTH (intensity), allowing ShaderBundle authors to tune the effect per quality profile.

Design Philosophy

The architecture is built on a strict separation between the engine (which provides stateless rendering APIs and GPU resource management) and the application layer (which defines the rendering order and visual style). The engine never decides which pass runs when or what effects to apply — it provides the tools, and the ShaderBundle system plus the application’s RenderWorld() sequence define the complete visual pipeline. This separation means the same engine can serve forward, deferred, or hybrid rendering strategies without modification.

Data-driven configuration is the primary extension mechanism. ShaderBundle authors control render target formats, blend modes, draw buffer assignments, material mappings, quality profiles, and custom texture bindings entirely through shaders.properties, block.properties, HLSL comment directives, and const declarations — no C++ changes required. The engine’s directive parsers (CommentDirectiveParser, ConstDirectiveParser, ShaderProperties) translate these declarations into runtime state that feeds directly into PSO creation and resource allocation.

Performance optimization follows a “measure first, optimize the hot path” discipline. The dirty-tracked GraphicsRootBinder eliminates redundant root signature and CBV rebinding. The PerPass uniform scope prevents unnecessary constant buffer uploads across composite sub-passes. Chunk batch rendering with frustum culling reduces draw call volume at the region level before individual chunks are submitted. Multi-frame in-flight pipelining keeps the GPU fed while the CPU prepares the next frame. Compute-shader mipmap generation avoids CPU-side downsampling and enables hardware mip levels for bloom without additional render passes.
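
The dirty-tracking idea behind GraphicsRootBinder can be illustrated with a small cache that remembers what is bound to each root parameter slot and skips repeat binds. This is a sketch with assumed names and an assumed slot count; it counts binds instead of calling into a D3D12 command list.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

class DirtyTrackedBinder
{
public:
    static constexpr size_t kSlotCount = 16; // assumed number of root parameters

    // Returns true when the bind was actually issued, false when it was
    // skipped as redundant or the slot is out of range.
    bool BindConstantBuffer(size_t slot, uint64_t gpuAddress)
    {
        if (slot >= kSlotCount)
            return false;
        if (m_valid[slot] && m_bound[slot] == gpuAddress)
            return false; // same address already bound: skip

        m_bound[slot] = gpuAddress;
        m_valid[slot] = true;
        ++m_issuedBinds; // a real binder would record the CBV bind on the command list here
        return true;
    }

    // Forgets all cached bindings, e.g. when the root signature changes or a
    // new command list begins and previous state no longer applies.
    void Invalidate() { m_valid.fill(false); }

    uint32_t IssuedBinds() const { return m_issuedBinds; }

private:
    std::array<uint64_t, kSlotCount> m_bound{};
    std::array<bool, kSlotCount>     m_valid{};
    uint32_t                         m_issuedBinds = 0;
};
```

The cache is only correct if every bind goes through it and it is invalidated whenever command-list state resets, which is why the invalidation hook matters as much as the comparison.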

The ShaderBundle fallback chain (user sub-bundle → user bundle → engine bundle) guarantees that every render pass always has a valid shader, making the system robust against incomplete or experimental shader packs. Hot-reload support through MulticastDelegate events and deferred bundle switching at frame boundaries allows shader authors to iterate without restarting the application.
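
The fallback lookup can be sketched as a walk over an ordered list of program tables. ProgramTable and ResolveProgram are hypothetical names, and mapping a program name to a path string is a simplification of whatever the bundles actually store.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical: program name -> shader source path for one bundle layer.
using ProgramTable = std::map<std::string, std::string>;

// Walks the chain from highest to lowest priority and returns the first match.
// Returns an empty string only if no layer provides the program.
std::string ResolveProgram(const std::vector<const ProgramTable*>& chain,
                           const std::string& programName)
{
    for (const ProgramTable* layer : chain)
    {
        if (layer == nullptr)
            continue; // e.g. no sub-bundle is selected
        const auto it = layer->find(programName);
        if (it != layer->end())
            return it->second;
    }
    return std::string();
}
```

As long as the engine bundle at the end of the chain defines every program, the lookup never comes back empty, which is the guarantee the paragraph above describes.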