Recently, Jason Huggins, the creator of Selenium core, created quite a buzz on LinkedIn by talking about what an AI-first successor to Selenium would look like. He's calling that project Vibium, and it promises to leverage AI for smarter, more resilient browser tests. As longtime test automation folks (most of us at Donobu cut our teeth on Selenium over 15 years ago), we're naturally excited about Selenium getting a big refresh with ideas from Vibium. In fact, we've been building our own project, Donobu, on very similar ideas, but on top of Playwright. Donobu is essentially a small layer that adds AI superpowers to the existing Playwright library. Our goal is to empower existing dev and test teams by keeping their tests in their own hands (not locked away in a third-party cloud). That's why Donobu is completely compatible with Playwright – you can drop it into your workflow with minimal friction.
For example, switching to Donobu can be as simple as changing your import:
// Using the standard Playwright test library
import { test } from '@playwright/test';
// Switching to Donobu's extended test library
import { test } from 'donobu';
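To make that drop-in claim concrete, here's a minimal sketch of an ordinary Playwright-style test running under the Donobu import, with no AI features involved. (We pull expect from @playwright/test here, and the example site and title check are purely illustrative.)
import { test } from 'donobu';
import { expect } from '@playwright/test';
test('homepage has the expected title', async ({ page }) => {
  await page.goto('https://playwright.dev');
  // A plain Playwright assertion – no Donobu-specific features needed
  await expect(page).toHaveTitle(/Playwright/);
});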
By using Donobu's version of test, you gain access to AI-driven features while still running tests like a normal Playwright suite. We focused on enabling two key capabilities in Donobu: self-healing tests that can recover from minor UI changes, and the ability to test partially autonomous (AI-driven) software.
Let's dive into each of these in detail.
Anyone who's written end-to-end tests knows how fragile they can be. A minor UI change – say a new pop-up or a slightly different element locator – can break a bunch of tests. We want Donobu tests to heal themselves when possible, instead of failing at the first sign of trouble.
First, Donobu introduces the concept of an objective for each test. You can annotate your test with a description of what it's supposed to accomplish. This gives an AI agent context about the test's goal. We also added what you might call smart assertions – assertions that don't rely on hard-coded selectors or expected text, but rather describe a condition in natural language (which an AI can interpret).
For example, here's a simple test written with Donobu that tries to assert a seasonal theme on Starbucks' menu page. In vanilla Playwright, you couldn't write such a high-level assertion out of the box:
import { test } from 'donobu';
const title = 'Starbucks Fall Menu Test';
const details = {
  annotation: [
    {
      type: 'objective',
      description: `Assert that the featured menu page has Autumn season vibes.`,
    },
  ],
};
test(title, details, async ({ page }) => {
  await page.goto('https://www.starbucks.com/menu');
  // Click the "Featured" menu link
  await page.click("xpath=.//a[normalize-space(.)='Featured']");
  // Smart assertion: verify the page has a Fall season theme
  await page.visuallyAssert({
    assertionToTest: 'The page has a Fall season theme.',
  });
});
In the code above, we navigate to the Starbucks menu, click the Featured section, and then use page.visuallyAssert to check that the page “has a Fall season theme.” Under the hood, Donobu interprets this description and determines whether the current page's content and appearance match the idea of a fall-themed page. This is something a human tester might do at a glance (e.g. seeing pumpkin spice lattes, autumn colors, etc.), but doing it in an automated test is non-trivial without AI. Donobu's visual/semantic assertion makes it possible to perform this kind of high-level check.
However, just having smarter assertions doesn't magically make a test self-healing. Consider what would happen if Starbucks suddenly added a cookie consent modal that pops up on the first visit. In our test above, the line that clicks “Featured” would fail because the modal blocks the click. A traditional test would just error out – you'd come in later, realize what happened, then update the test to handle the popup. With Donobu, we can do better.
Thanks to the objective we defined (the test's goal is to verify the Fall theme on the featured menu), Donobu's autonomous agent can step in when things go off course. Our test runner monitors for failures, and when one occurs, it pings our autonomous browser agent (we call it Flowpilot) to see if the test can be steered back on track. Flowpilot knows what the test was trying to achieve (from the objective), so it can try alternative actions to reach that goal.
In this scenario, Flowpilot might detect the cookie consent dialog and attempt to close it. It could, for instance, find and click the “Agree” button on the popup, then retry the step that failed. The really neat part: Donobu will then rewrite the test code to include this new step automatically. In essence, the test heals itself by adding the necessary adaptation.
After Flowpilot's intervention, the updated test might look like this (notice the new step to accept cookies):
import { test } from 'donobu';
const title = 'Starbucks Fall Menu Test';
const details = {
  annotation: [
    {
      type: 'objective',
      description: `Assert that the featured menu page has Autumn season vibes.`,
    },
  ],
};
test(title, details, async ({ page }) => {
  await page.goto('https://www.starbucks.com/menu');
  // NEW → Accepting the cookie policy to ensure the page can load correctly
  await page.click("xpath=.//button[normalize-space(.)='Agree']");
  // Now click the "Featured" menu link after closing pop-ups
  await page.click("xpath=.//a[normalize-space(.)='Featured']");
  // Smart assertion: verify the page has a Fall season theme
  await page.visuallyAssert({
    assertionToTest: 'The page has a Fall season theme.',
  });
});
As shown in the code, Donobu inserted a new step to accept the cookie policy before proceeding to the original steps. The comment NEW → ... highlights what was added. This is a simple example of a test self-healing: the test realized what was blocking progress and adjusted itself to fix the issue.
Even more importantly, Donobu keeps you in the loop when this happens. Our test runner is designed to run in your own CI environment (for example, as a GitHub or GitLab Action in your repository). When a test self-heals, Donobu can automatically open a pull request with the modified test code. This way, you can review the change (e.g. see that it added a step to click the “Agree” button) and merge it into your test suite. We like to call this outcome a “self-healed” test – it's neither a simple pass nor a failure. The test failed initially, but then fixed itself and is essentially suggesting that fix to you via a PR. This approach saves a ton of maintenance effort and makes tests far more resilient to minor app changes.
The second big challenge we set out to tackle is how to test applications that have AI components, which we refer to as partially autonomous software. These are scenarios where the flows aren't completely deterministic. A common example is a chatbot or an AI assistant integrated into a web app. In a traditional flow (say, a user onboarding), every step and outcome is known in advance, so writing assertions is straightforward. But once you throw generative AI into the mix — often at a CEO's insistence to “add AI” everywhere — the responses can vary and be context-dependent.
Partially autonomous software needs a partially autonomous test suite to truly test it. A rigid, deterministic test runner just won't cut it when the software itself isn't deterministic. To validate AI-driven features, your tests have to be able to adapt and reason a little, too.
Let's illustrate this with a real example. Suppose we have a legal-advice chatbot on a site. We need to ensure this chatbot stays on topic: it should refuse to answer non-legal questions. This is tricky to verify with conventional test automation, because the AI could phrase "refusing to answer" in many different ways, and the test has to decide whether a given response counts as a refusal.
One approach with Donobu is to use the built-in AI assertion helper, page.ai(). This function lets us express the partially autonomous part of the flow in natural language, offloading the heavy reasoning to Donobu's AI layer. Here's what such a test looks like:
import { test } from 'donobu';
const title = 'Briefcase Chatbot Topic Test';
const details = {
  annotation: [
    {
      type: 'objective',
      description: `Verify that the legal chatbot refuses to answer questions
        that are not legal in nature.`,
    },
  ],
};
test(title, details, async ({ page }) => {
  await page.goto('https://briefcase.chat');
  // Start a chat session by entering a name and clicking "Start Chatting"
  await page.getByPlaceholder("Enter your name").fill("Test User");
  await page.getByRole('button', { name: 'Start Chatting' }).click();
  // Hand the rest of the flow to Donobu's AI in natural language
  await page.ai('Assert that the chatbot does not answer non-legal questions');
});
In this test, we navigate to the chatbot page, initiate a chat by providing a name and clicking the start button, and then we use page.ai(...) to make our assertion. The string we pass – "Assert that the chatbot does not answer non-legal questions" – is effectively telling Donobu: “Hey, check what the chatbot says next, and confirm that it's a refusal to engage with off-topic queries.” Donobu will handle the details of this check using an AI model under the hood. For example, when the chatbot responds, Donobu's AI component can analyze the response and decide if it's essentially a polite refusal (versus actually giving an answer). This frees us from writing brittle string comparisons or regexes for various refusal phrases. It's a very human-like way of writing the test expectation.
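For contrast, here's roughly what the brittle alternative looks like in vanilla Playwright: hard-coding a regex over a handful of guessed refusal phrases. (The .chat-message selector and the phrasing in the pattern are illustrative assumptions, which is exactly the problem – the assertion breaks as soon as the bot's wording drifts.)
import { test, expect } from '@playwright/test';
test('chatbot refuses off-topic questions (brittle version)', async ({ page }) => {
  await page.goto('https://briefcase.chat');
  await page.getByPlaceholder('Enter your name').fill('Test User');
  await page.getByRole('button', { name: 'Start Chatting' }).click();
  await page.getByPlaceholder('Type your message...').fill('What is the capital of France?');
  await page.keyboard.press('Enter');
  // Hard-coded guess at how the bot phrases a refusal – fails if the wording changes at all
  await expect(page.locator('.chat-message').last())
    .toContainText(/only (provide|answer|discuss) legal/i, { timeout: 10000 });
});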
Alternatively, if you prefer a more explicit approach (or want to double-check what's happening), you can combine traditional steps with a visual/semantic assertion. In other words, we can explicitly ask a non-legal question and then assert that the response was a refusal. Here's the same test idea, but written in a more step-by-step manner:
test(title, details, async ({ page }) => {
  await page.goto('https://briefcase.chat');
  // Start the chat session
  await page.getByPlaceholder("Enter your name").fill("Test User");
  await page.getByRole('button', { name: 'Start Chatting' }).click();
  // Ask a non-legal question to test the chatbot's behavior
  await page.getByPlaceholder("Type your message...")
    .fill("What is the capital of France?");
  // Submit the question (assuming the chat input sends on Enter)
  await page.keyboard.press("Enter");
  // Wait for a few seconds to give the chatbot time to respond
  await page.waitForTimeout(5000);
  // Smart assertion: the response should be a refusal, not an answer
  await page.visuallyAssert({
    assertionToTest: `The chatbot's response states something like "I'm only able
      to provide legal information" and refuses to answer the question.`,
  });
});
In this version, after starting the chat, we type the question "What is the capital of France?" into the chatbot, send it, and then wait for the response. The final step uses page.visuallyAssert with a descriptive assertion: we expect the chatbot's response to state something along the lines of "I'm only able to provide legal information", thereby refusing to answer the general-knowledge question. We're still writing the assertion in plain English, but here we're being a bit more prescriptive about the flow (we supply the exact question and check the specific response). Donobu's AI will interpret the page's state or text to see if it matches the description of a proper refusal.
Both approaches achieve the same goal: verifying that the AI agent behaves correctly. The first approach with page.ai() is more concise and hands off the logic to Donobu entirely. The second approach gives us more control over the scenario and then uses Donobu's semantic understanding to validate the outcome. In practice, you might use whichever style you're more comfortable with – or even combine them. The key is that Donobu enables testing of AI-driven behavior in a robust way, which would be very hard to do with rigid assertions alone.
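A combined version might let page.ai drive the conversation while keeping the final check as an explicit semantic assertion. This is only a sketch, and it assumes page.ai accepts action-style instructions as well as assertions:
test(title, details, async ({ page }) => {
  await page.goto('https://briefcase.chat');
  await page.getByPlaceholder("Enter your name").fill("Test User");
  await page.getByRole('button', { name: 'Start Chatting' }).click();
  // Let the AI drive the off-topic probe...
  await page.ai('Ask the chatbot a clearly non-legal question, such as a geography trivia question.');
  // ...but keep the final check as an explicit semantic assertion
  await page.visuallyAssert({
    assertionToTest: 'The chatbot declined to answer and pointed the user back to legal topics.',
  });
});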
In summary, we see Donobu as an AI-first testing toolkit that's very much in line with the vision behind Vibium. We're bringing features like self-healing tests and AI-based assertions into the everyday testing workflow, today. And we're doing it in a way that augments existing tools (Playwright) rather than replacing them, so teams can adopt AI capabilities without a complete overhaul of their test stack. It's exciting to see industry veterans like Jason Huggins pushing in this direction as well. The future of testing will undoubtedly involve smarter automation that can adapt to change and deal with uncertainty – and we built Donobu precisely to embrace that future.