Overview
Copelf uses AI at every stage of workflow execution. Instead of relying on brittle CSS selectors, Copelf's AI understands web pages through a combination of DOM analysis and visual recognition — making workflows resilient to UI changes.
Element Detection
When a step targets a UI element (click, fill, or select), Copelf identifies it using two signals:
- DOM structure — The AI analyzes the page's HTML to find matching elements
- Visual appearance — A vision model examines a screenshot to confirm the correct element
Each target is described in natural language:
```yaml
target:
  description: 'Email input field in the login form'
  vision:
    hint: "Text field with placeholder 'Enter your email'"
```

The description provides semantic context, while the vision.hint guides the visual model. Together, they allow Copelf to locate elements even when class names, IDs, or page layouts change.
Element detection uses Gemini 3.1 Flash Lite Preview — a lightweight model optimized for fast, cost-efficient element selection.
Step Verification
After each step executes, Copelf verifies the outcome by comparing the page state before and after the action.
Capture Before State
A screenshot and page metadata are recorded immediately before the step executes.
Execute the Step
The AI performs the action (click, fill, select, or navigate).
Capture After State
A new screenshot and page metadata are captured.
AI Comparison
The verification model compares before and after states and produces a result.
Verification results include:
| Field | Description |
|---|---|
| Confidence | A score from 0 to 1 indicating how certain the AI is that the step succeeded |
| Evidence type | What changed — visual, elements, text, or url |
| Explanation | A human-readable description of the observed change |
Step verification uses Gemini 3.1 Flash Lite Preview — the same lightweight model as element detection, keeping verification fast and affordable.
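The four verification steps above can be sketched as a toy comparison. VerificationResult mirrors the fields in the table; the capture format and the comparison logic are simplified stand-ins for the AI model's judgment:

```python
from dataclasses import dataclass

@dataclass
class VerificationResult:
    confidence: float   # 0 to 1, how certain the model is that the step succeeded
    evidence_type: str  # what changed: "visual", "elements", "text", or "url"
    explanation: str    # human-readable description of the observed change

def verify_step(before: dict, after: dict) -> VerificationResult:
    """Toy comparison standing in for the model's before/after analysis."""
    if before["url"] != after["url"]:
        return VerificationResult(0.95, "url", f"URL changed to {after['url']}")
    if before["screenshot"] != after["screenshot"]:
        return VerificationResult(0.8, "visual", "Page appearance changed after the action")
    return VerificationResult(0.2, "visual", "No observable change detected")

# 1. Capture before state  2. Execute the step  3. Capture after state  4. Compare
before = {"url": "https://example.com/login", "screenshot": b"png-1"}
after = {"url": "https://example.com/dashboard", "screenshot": b"png-2"}
result = verify_step(before, after)
```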
Browser Agent
The agent action type enables autonomous multi-step browser operation. Instead of a single action, the agent receives a natural language prompt and independently decides what to do.
How the Agent Works
Receive Prompt
The agent reads the prompt (e.g., "Fill in all required fields on this form") and takes a screenshot of the current page.
Plan and Act
The agent reasons about what needs to happen and executes browser actions using its built-in tools:
| Tool | Description |
|---|---|
| view | Take a screenshot of the current page |
| click | Click on a page element |
| input | Type text into an input field |
| navigate | Go to a URL |
Iterate
After each action, the agent captures a new screenshot, evaluates progress, and decides the next step. This loop continues until the task is complete or maxSteps is reached.
```yaml
- id: step-5
  action: agent
  name: 'Complete registration'
  with:
    prompt: 'Fill in all required fields and submit the registration form'
    maxSteps: 15
```

The browser agent uses Gemini 3.1 Pro Preview — a more powerful model capable of complex reasoning and multi-step planning. The maxSteps parameter (1–100) controls the maximum number of iterations.
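The plan-act-iterate cycle amounts to a bounded control loop. In this sketch, take_screenshot, decide_next_action, and execute are hypothetical callables standing in for the agent's internals, not Copelf's API:

```python
from typing import Callable

def run_agent(prompt: str,
              take_screenshot: Callable[[], bytes],
              decide_next_action: Callable[[str, bytes], dict],
              execute: Callable[[dict], None],
              max_steps: int = 15) -> bool:
    """Loop until the model signals completion or max_steps is exhausted."""
    for _ in range(max_steps):
        shot = take_screenshot()                   # observe the current page
        action = decide_next_action(prompt, shot)  # model picks a tool: view/click/input/navigate
        if action["tool"] == "done":
            return True                            # task complete
        execute(action)                            # perform the chosen browser action
    return False                                   # maxSteps reached without completion

# Toy run: a scripted "model" clicks once, then reports done.
script = iter([{"tool": "click", "target": "Submit"}, {"tool": "done"}])
done = run_agent("Submit the form",
                 take_screenshot=lambda: b"png",
                 decide_next_action=lambda p, s: next(script),
                 execute=lambda a: None)
```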
Adaptive Reasoning
Copelf adjusts its reasoning depth based on step complexity:
| Step Type | Model | Reasoning Depth |
|---|---|---|
| click, fill, select | Gemini 3.1 Flash Lite Preview | Lightweight — fast element detection and verification |
| agent | Gemini 3.1 Pro Preview | Deep — multi-step planning, complex form handling, error recovery |
| navigate | None | Direct URL navigation, no AI reasoning needed |
This adaptive approach keeps simple steps fast and cheap while giving complex tasks the reasoning power they need.
Safety in AI Automation
Copelf's AI automation is built on three safety pillars:
- Extension-based execution — Runs are sent to the connected Copelf browser extension and executed in that browser session
- Human-in-the-loop — Critical steps can require your approval before the AI proceeds
- Full visibility — Per-step screenshots and execution history help you review every run