Skip to main content

Profiling and Performance

wgpu-html includes built-in timing instrumentation, a ring-buffer frame profiler, and a caching pipeline that skips work when inputs haven't changed. Zero-overhead when disabled — the only cost is a single Option::is_some branch per instrumented site.


Quick start

use wgpu_html_tree::Profiler;

// Enable the profiler on your tree:
tree.profiler = Some(Profiler::tagged("my app"));
tree.profiler.as_ref().map(|p| p.enable());

The pipeline automatically records every stage. After each frame, call summary_string() to get a human-readable block:

if let Some(ref prof) = tree.profiler {
if let Some(summary) = prof.summary_string() {
eprintln!("{summary}");
}
}

Output:

[my app] frame 4123 total 3.82 ms fps 60.0
cascade 0.42 ms ████████
layout 2.12 ms ██████████████████████████████████████
└ flex 1.34 ms ████████████████████████
paint 0.28 ms █████
└ glyphs 0.11 ms ██
counters: nodes=412 boxes=388 quads=903 glyphs=218

Bar width = fraction of the slowest span × 40. Children are indented with .


Profiler API

Enabling and disabling

prof.set_enabled(true);
prof.enable(); // shortcut
prof.disable(); // shortcut
prof.is_enabled(); // → bool

When disabled, every recording method is a no-op and the ring buffer is not updated.

Alert threshold (on-when-slow mode)

Silent until a frame exceeds a threshold. Useful for catching intermittent jank without log spam during normal operation.

prof.set_alert_threshold(Some(33_000_000)); // 33 ms (two vsyncs)
// summary_string() returns None for frames below the threshold
prof.set_alert_threshold(None); // always report

Frame lifecycle

The harness manages frame boundaries:

prof.frame_begin();
// … pipeline work (scopes, counters) …
prof.frame_end();

Functions that may be called standalone use ensure_frame_begin(), which is a no-op if already inside a frame:

prof.ensure_frame_begin();
// … work …

Scopes (RAII spans)

Wrap a block of work in a named scope. Timing is recorded automatically when the guard drops. Nested scopes track parent-child relationships via an internal stack.

{
let _guard = prof.scope("cascade");
// … cascade work …
}
// guard dropped — end time recorded

For Option<Profiler> boundaries, use the macro:

use wgpu_html_tree::prof_scope;

prof_scope!(&tree.profiler, "layout");
// ^ expands to:
// let __prof_scope_guard = $prof_opt.as_ref().map(|p| p.scope("layout"));
// Compiles to a no-op when tree.profiler is None.

For non-RAII paths (async, callbacks):

let id = prof.begin_span("upload_atlas");
// … work …
prof.end_span(id);

Counters and events

Record scalar values and zero-duration markers per frame:

prof.counter("nodes", 412);
prof.counter("quads", 903);

prof.event("viewport_resize");

Both appear in summary_string() (counters line) and trace export.

Reading the ring buffer

prof.last_frame() // → Option<FrameRecord>
prof.frame_history() // → Vec<FrameRecord> (oldest first)
prof.history_len() // → usize (frames in buffer)
prof.total_frames() // → u64 (frames ever recorded, including overwritten)

Each FrameRecord contains:

pub struct FrameRecord {
pub frame_index: u64,
pub frame_start_ns: u64,
pub spans: Vec<Span>,
pub counters: Vec<(LabelId, i64)>,
pub events: Vec<Event>,
}

pub struct Span {
pub name: LabelId,
pub start_ns: u64,
pub end_ns: u64,
pub parent: Option<u16>, // index into FrameRecord::spans
pub category: SpanCategory, // Cpu | Gpu
}

PipelineTimings

The numeric return value. Always available (no profiler required).

#[derive(Debug, Clone, Copy, Default)]
pub struct PipelineTimings {
pub cascade_ms: f64,
pub layout_ms: f64,
pub paint_ms: f64,
}

impl PipelineTimings {
pub fn total_ms(self) -> f64 {
self.cascade_ms + self.layout_ms + self.paint_ms
}
}

Returned by compute_layout_profiled() and paint_tree_returning_layout_profiled():

let (list, layout, timings) = wgpu_html::paint_tree_returning_layout_profiled(
&tree, &mut text_ctx, &mut image_cache,
viewport_w, viewport_h, scale, viewport_scroll_y,
);
println!("frame: cascade={:.2}ms layout={:.2}ms paint={:.2}ms total={:.2}ms",
timings.cascade_ms, timings.layout_ms, timings.paint_ms, timings.total_ms());

PipelineCache

pub struct PipelineCache {
snapshot: InteractionSnapshot,
viewport: (f32, f32),
scale: f32,
font_generation: u64,
tree_generation: u64,
layout: Option<LayoutBox>,
cascaded: Option<CascadedTree>,
pub paint_only_pseudo_rules: bool,
}

Three action levels

pub enum PipelineAction {
FullPipeline, // DOM / viewport / fonts changed — full re-cascade
PartialCascade, // Only pseudo-class state changed (hover / active / focus)
RepaintOnly, // Only scroll / selection / caret changed
}

pub fn classify_frame(
tree: &Tree, cache: &PipelineCache,
image_cache: &ImageCache,
viewport_w: f32, viewport_h: f32, scale: f32,
) -> PipelineAction

classify_frame() compares:

  • Cached vs current viewport size and scale.
  • Cached vs current tree.generation (DOM mutations).
  • Cached vs current tree.fonts.generation() (font changes).
  • Cached vs current InteractionSnapshot (hover / active / focus paths).
  • Whether images are still loading or animated.

paint_only_pseudo_rules

cache.paint_only_pseudo_rules = wgpu_html_style::pseudo_rules_are_paint_only(tree);

When all pseudo-class rules (:hover, :active, :focus) only set paint properties (color, background-color, opacity), the PartialCascade path skips re-layout entirely. Instead, patch_layout_colors() does an O(n) walk updating color fields in the existing LayoutBox tree — no geometry recomputation.

Usage

let mut cache = wgpu_html::PipelineCache::new();

// Each frame:
let (list, layout, timings) = wgpu_html::paint_tree_cached(
&tree, &mut text_ctx, &mut image_cache,
viewport_w, viewport_h, scale, viewport_scroll_y, &mut cache,
);

paint_tree_cached() automatically determines the needed action and only does the required work. It also manages the profiler's frame lifecycle (ensure_frame_begin → stages → frame_end).


Generational tracking

pub struct Tree {
pub generation: u64, // bumped on DOM mutation
pub fonts: FontRegistry, // .generation() bumped on font registration
}

The pipeline cache compares these against stored values. Hosts bump tree.generation when mutating the DOM:

tree.generation += 1; // or use tree.set_custom_property() which bumps automatically

Demo profiling (F9)

The demo (wgpu-html-demo) has two profiling outputs, both toggled with F9:

Compact (stdout): rolling 1-second averages per stage with max values and hover latency.

profile: 1.01s frames=60 fps=59.4 cascade=0.42/2.10 layout=2.12/8.34 paint=0.28/1.05 render=1.04/3.21 hover[moves=217 changed=12 ptr=0.042/0.310ms]

Ring-buffer detail (stderr): every frame's summary_string() output with nested spans and proportional bars.

Launch with --profile to enable both at startup:

cargo run -p wgpu-html-demo -- --profile

Dev profile

For development, set hot crates to opt-level = 2 in your workspace Cargo.toml:

[profile.dev.package]
wgpu-html-layout = { opt-level = 2 }
wgpu-html-style = { opt-level = 2 }
wgpu-html-renderer = { opt-level = 2 }
wgpu-html-text = { opt-level = 2 }

This keeps debug builds fast enough for interactive development while preserving debug info in your host code.


Data types reference

TypeCrateDescription
Profilerwgpu-html-treeRing-buffer frame profiler with scopes, counters, string interner
ScopeGuardwgpu-html-treeRAII guard returned by Profiler::scope()
SpanIdwgpu-html-treeOpaque handle for manual begin_span / end_span
LabelIdwgpu-html-treeInterned string identifier for span/counter/event names
FrameRecordwgpu-html-treeAll profiling data for one frame
Spanwgpu-html-treeOne measured time span with parent tracking
RingBuffer<N>wgpu-html-treeFixed-capacity ring buffer (default N=240, ≈ 4s at 60 Hz)
PipelineTimingswgpu-html{cascade_ms, layout_ms, paint_ms} return value
PipelineCachewgpu-htmlCaches layout + cascade to skip redundant work
PipelineActionwgpu-htmlFullPipeline | PartialCascade | RepaintOnly
FrameTimingswgpu-html-winit{cascade_ms, layout_ms, paint_ms, render_ms} passed to AppHook::on_frame

Complete example

use wgpu_html::PipelineCache;
use wgpu_html_tree::Profiler;

// Setup
let mut cache = PipelineCache::new();
tree.profiler = Some(Profiler::tagged("my app"));
tree.profiler.as_ref().map(|p| p.enable());

// Per-frame:
let (list, layout, timings) = wgpu_html::paint_tree_cached(
&tree, &mut text_ctx, &mut image_cache,
viewport_w, viewport_h, scale, viewport_scroll_y, &mut cache,
);

// Print profiler summary (only when above threshold, if set)
if let Some(ref prof) = tree.profiler {
if let Some(summary) = prof.summary_string() {
eprintln!("{summary}");
}
}

// PipelineTimings always available regardless of profiler state:
match wgpu_html::classify_frame(&tree, &cache, &image_cache, vw, vh, scale) {
PipelineAction::FullPipeline => println!("full frame: {:.2}ms", timings.total_ms()),
PipelineAction::PartialCascade => println!("cascade only: {:.2}ms", timings.cascade_ms),
PipelineAction::RepaintOnly => println!("repaint only"),
}