Viewing Test Results in Donobu Studio

Explore test run results, diagnose failures, and review self-healed tests using the Donobu Studio web interface.

Donobu Studio is the web interface for that workflow. Results are automatically synced to Studio when you run tests locally.

How results appear in Studio

Each Playwright test run produces one flow in Donobu Studio.

Donobu Studio - Flow Details screen

A flow contains:

  • Run state — SUCCESS or FAILED
  • Step-by-step timeline — every tool call the AI made (or replayed from cache), in order, with timestamps
  • Screenshots — a screenshot captured after each tool call
  • Video — a full replay of the browser session (when video is enabled in playwright.config.ts)
  • Metadata JSON — the raw test-flow-metadata.json attached to the test, including token usage and run mode
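Video only appears in Studio when recording is turned on in your Playwright configuration. A minimal sketch of that setting (the mode `retain-on-failure` keeps video only for failed tests; `'on'`, `'off'`, and `'on-first-retry'` are the other standard Playwright values):

```typescript
// playwright.config.ts — enable session video so Studio can show a replay.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    // Record video, keeping it only when a test fails.
    video: "retain-on-failure",
  },
});
```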
To find a flow:

  1. Select View Flows from the sidebar navigation.
  2. You will see all recent flows, sorted by time.
  3. Search by Name, Description, or Flow ID.
  4. Click the View Details icon under Actions to open its detail view.

If you receive a failing CI notification (e.g. from Slack or a GitHub Actions failure email), look for the flow ID in the notification or in the Playwright report's test-flow-metadata.json attachment. Paste it into Studio's search to find that run.
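If you want to script that lookup, a minimal sketch of pulling the flow ID out of the metadata attachment so it can be pasted into Studio's search. The field name `flowId` is an assumption; inspect your actual test-flow-metadata.json for the real key:

```typescript
// Sketch: extract a flow ID from a test-flow-metadata.json payload.
// The "flowId" field name is hypothetical — check your actual attachment.
function extractFlowId(jsonText: string): string | undefined {
  const meta = JSON.parse(jsonText) as Record<string, unknown>;
  return typeof meta.flowId === "string" ? meta.flowId : undefined;
}

// Usage with inline sample data (the shape is hypothetical):
const sample = JSON.stringify({ flowId: "flow-1a2b3c", runMode: "AUTONOMOUS" });
console.log(extractFlowId(sample));
```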

Diagnosing failures in Studio

The List and Canvas views show every tool call the AI made, in order. For each step you can see:

  • The tool name and a description for that call, e.g. "Opening the appearance menu to switch to Dark mode."
  • Whether it succeeded or failed
  • The screenshot taken immediately after the call
  • The current URL
  • The times the action started and completed
  • The outcome of the call
  • Optional additional debugging information, e.g. the selector that matched when finding an element to Click on

To understand why a test failed:

  1. Find the first step that failed or produced an unexpected result
  2. Compare the screenshot at that step against what you expected to see
  3. For cached runs, check whether the run mode was DETERMINISTIC (replaying a stale cache) or AUTONOMOUS (the AI was actually running)

Playing the video can also help: it shows transient states between steps, such as loading spinners or dismissed dialogs, that the per-step screenshots may miss.

Cached vs. live runs

The details screen shows the run mode for the flow:

  • AUTONOMOUS — the AI made the decisions; the flow was not cached or the cache was invalidated
  • DETERMINISTIC — the flow replayed a cached sequence without calling the AI

A test that passes in AUTONOMOUS mode but fails in DETERMINISTIC mode usually means the cached sequence has gone stale. Delete the relevant .cache-lock/ entry and re-run to regenerate.

Token usage

Each flow's detail view shows input and output token counts for the run. Use this to:

  • Identify unusually expensive flows (high token counts may indicate the AI is struggling and taking many exploratory steps)
  • Compare token usage between autonomous and deterministic runs (deterministic runs should use very few tokens)
  • Track cost over time as your test suite grows
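The comparisons above can be scripted against a batch of metadata files. A minimal sketch, assuming the metadata exposes `inputTokens` and `outputTokens` fields (those names are guesses; match them to your actual schema):

```typescript
// Sketch: aggregate token counts across flows and flag unusually
// expensive ones. Field names are assumptions, not the real schema.
interface FlowMetadata {
  inputTokens?: number;
  outputTokens?: number;
  runMode?: "AUTONOMOUS" | "DETERMINISTIC";
}

// Sum input/output tokens across all flows.
function totalTokens(flows: FlowMetadata[]): { input: number; output: number } {
  return flows.reduce(
    (acc, f) => ({
      input: acc.input + (f.inputTokens ?? 0),
      output: acc.output + (f.outputTokens ?? 0),
    }),
    { input: 0, output: 0 },
  );
}

// Return flows whose input usage is `factor` times the suite average —
// a rough signal that the AI took many exploratory steps.
function expensiveFlows(flows: FlowMetadata[], factor = 3): FlowMetadata[] {
  const avg = totalTokens(flows).input / Math.max(flows.length, 1);
  return flows.filter((f) => (f.inputTokens ?? 0) > factor * avg);
}
```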