page.ai.extract()

Extract structured, typed data from the current page without running a full autonomous flow.

The result is returned as an object typed by the schema you pass in. Use this when you want to read information from the page rather than interact with it.

Signature

page.ai.extract<Schema extends z.ZodObject>(
  schema: Schema,
  options?: {
    instruction?: string;
    gptClient?: GptClient | LanguageModel;
    timeout?: number;
  }
): Promise<z.infer<Schema>>

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| schema | z.ZodObject | | Zod schema describing the shape of the data to extract |
| options.instruction | string | | Optional hint that narrows what the AI focuses on (e.g. "Only read the hero section at the top") |
| options.gptClient | GptClient \| LanguageModel | Project default | Override the AI provider for this call |
| options.timeout | number | 60000 (60 s) | Maximum milliseconds to wait for the AI response before aborting |
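As a rough illustration of how the optional fields combine, the sketch below builds an options object with both an instruction and a longer timeout. The `ExtractOptions` interface is a local stand-in written for this example (it is not exported by the library as far as this page states), and the instruction text and the 120-second value are hypothetical.

```typescript
// Local mirror of the documented options shape, for illustration only.
// gptClient is left out because its types are not described on this page.
interface ExtractOptions {
  instruction?: string;
  timeout?: number; // milliseconds
}

// Hypothetical values for a long, slow-rendering page.
const slowPageOptions: ExtractOptions = {
  instruction: 'Only read the pricing table, not the FAQ below it',
  timeout: 120_000, // twice the documented 60 s default
};

// In a test, this object would be passed as the second argument:
//   const pricing = await page.ai.extract(schema, slowPageOptions);
```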

Basic usage

import { test, expect } from 'donobu';
import { z } from 'zod';

test('read pricing plans', async ({ page }) => {
  await page.goto('https://app.example.com/pricing');

  const pricing = await page.ai.extract(
    z.object({
      plans: z.array(
        z.object({
          name: z.string(),
          monthlyPrice: z.string(),
          features: z.array(z.string()),
        }),
      ),
    }),
  );

  expect(pricing.plans).toHaveLength(3);
  expect(pricing.plans[0].name).toBe('Starter');
});

With an instruction hint

const hero = await page.ai.extract(
  z.object({
    heading: z.string(),
    subheading: z.string().optional(),
    ctaLabel: z.string(),
  }),
  {
    instruction: 'Only read the hero section at the very top of the page, not any other sections',
  },
);

page.ai.extract vs. page.ai with a schema

| | page.ai.extract | page.ai with schema |
| --- | --- | --- |
| Interaction | Read-only: takes a screenshot and reads text | Full flow: the AI may click, scroll, navigate, etc. |
| Cache | Not cached | Cached |
| Speed | Fast (single LLM call) | Slower (multiple tool calls) |
| Use when | The data you need is already visible on the page | The data requires navigation or interaction to reach |

Throws

Schema mismatch: throws a Zod ZodError if the AI's response does not conform to the provided schema. This can happen when the page does not contain the expected data, or when the AI misinterprets the page content. Adding an instruction hint often resolves ambiguous cases.

Timeout: throws an AbortError if the AI response takes longer than options.timeout milliseconds (default: 60 seconds). Increase the timeout for complex pages or slow network conditions.
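One possible way to act on the advice about raising the timeout is to retry once with a larger value when the first attempt aborts. The helper below is a sketch and not part of the library: `withLongerTimeoutOnAbort` is a hypothetical name, and it assumes the thrown error exposes `name === 'AbortError'`, which this page implies but does not spell out.

```typescript
// Runs `attempt` with an initial timeout. If it fails with an AbortError,
// runs it one more time with double the timeout. Any other error is rethrown.
async function withLongerTimeoutOnAbort<T>(
  attempt: (timeoutMs: number) => Promise<T>,
  initialTimeoutMs: number = 60_000,
): Promise<T> {
  try {
    return await attempt(initialTimeoutMs);
  } catch (err) {
    // Assumption: the timeout error carries the name 'AbortError'.
    const isAbort =
      typeof err === 'object' &&
      err !== null &&
      (err as { name?: unknown }).name === 'AbortError';
    if (!isAbort) throw err;
    return attempt(initialTimeoutMs * 2);
  }
}

// Inside a test it might be used like this:
//   const data = await withLongerTimeoutOnAbort(
//     (timeout) => page.ai.extract(schema, { timeout }),
//   );
```

Retrying only on the abort case keeps schema mismatches visible, since a longer wait would not change what is on the page.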

Neither error type is a PageAiException, because page.ai.extract does not run a full autonomous flow and therefore does not produce flow-level failure states.
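To tell the two failure modes apart in a test, one option is to branch on the error's `name`. The sketch below is not part of the library: `classifyExtractError` and `ExtractFailure` are hypothetical names, and it assumes schema failures carry the name `ZodError` (as Zod's error class does) and timeouts carry the name `AbortError`.

```typescript
type ExtractFailure = 'schema-mismatch' | 'timeout' | 'other';

// Maps a caught error to one of the documented failure modes by its name.
function classifyExtractError(err: unknown): ExtractFailure {
  const name =
    typeof err === 'object' && err !== null
      ? (err as { name?: unknown }).name
      : undefined;
  if (name === 'ZodError') return 'schema-mismatch'; // response did not fit the schema
  if (name === 'AbortError') return 'timeout'; // options.timeout elapsed
  return 'other';
}

// Inside a test it might be used like this:
//   try {
//     await page.ai.extract(schema);
//   } catch (err) {
//     if (classifyExtractError(err) === 'schema-mismatch') {
//       // consider adding an instruction hint and trying again
//     }
//     throw err;
//   }
```

Checking the name rather than using `instanceof` avoids importing error classes and still works if more than one copy of Zod is installed.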