
Browser Control

Drive a real Chromium browser from the agent using five native tools. Covers setup, the tool surface, domain policies, and configuration.

agent-afk ships five native tools for driving a real Chromium browser. Each session gets its own isolated browser context. Subagents share the parent session's context.

Prerequisites

The Playwright Node package ships as a regular dependency and is installed with pnpm install. The ~300 MB Chromium binaries are not bundled — install them once before using a browser tool:

pnpm exec playwright install chromium

Quick start

# Install Chromium binaries (one-time)
pnpm exec playwright install chromium

# Start the REPL and ask the agent to browse
afk
> Open github.com and tell me what's on the homepage.

The agent issues browser_open → browser_observe → additional actions under the hood. In REPL mode, Chromium opens headed by default so you can watch the agent work.

The five tools

| Tool | Purpose |
| --- | --- |
| browser_open | Navigate to a URL and return a page observation |
| browser_observe | Re-snapshot the current page without navigating |
| browser_act | Click, fill, press, select, hover, scroll, or wait |
| browser_screenshot | Capture a PNG of the viewport or a specific element |
| browser_close | Tear down the current session's browser context |

All five appear in afk --help once the package is built, alongside bash, read_file, and other tools.

Targeting elements in browser_act

browser_act accepts three target shapes:

// Preferred — by visible label and ARIA role.
{ "kind": "semantic", "text": "Sign in", "role": "button" }

// For unambiguous follow-up on an element from a recent observation.
{ "kind": "element_id", "element_id": "el_a1b2c3" }

// Escape hatch when the page has no accessible labels.
{ "kind": "selector", "selector": "button.primary[type=submit]" }

When a semantic target matches two or more elements, the tool returns an error with a disambiguation list of up to five candidates. The agent retries using element_id. Nothing silently picks among ambiguous matches.

Element IDs are stable within one observation only. Re-observe after any DOM mutation to get fresh IDs.
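
The three target shapes and the "never guess" rule can be modelled as a discriminated union plus a resolver. This is only a sketch: the names Target, ObservedElement, Resolution, and resolveTarget are hypothetical and are not taken from the agent-afk source, and exact text equality is assumed as the matching rule for semantic targets.

```typescript
// Hypothetical types mirroring the three JSON target shapes shown above.
type Target =
  | { kind: "semantic"; text: string; role?: string }
  | { kind: "element_id"; element_id: string }
  | { kind: "selector"; selector: string };

// Hypothetical element record, as one observation might list it.
interface ObservedElement {
  element_id: string;
  text: string;
  role: string;
}

type Resolution =
  | { ok: true; element: ObservedElement }
  | { ok: false; error: string; candidates: ObservedElement[] };

// The text above states "up to five candidates".
const MAX_CANDIDATES = 5;

function resolveTarget(target: Target, observed: ObservedElement[]): Resolution {
  let matches: ObservedElement[] = [];
  if (target.kind === "element_id") {
    const id = target.element_id;
    matches = observed.filter((el) => el.element_id === id);
  } else if (target.kind === "semantic") {
    const { text, role } = target;
    // Assumption: exact text match, with role as an optional extra filter.
    matches = observed.filter(
      (el) => el.text === text && (role === undefined || el.role === role)
    );
  } else {
    // CSS selectors need a live DOM, so this sketch does not model them.
    return { ok: false, error: "selector targets need a live DOM", candidates: [] };
  }

  if (matches.length === 1) return { ok: true, element: matches[0] };
  if (matches.length === 0) return { ok: false, error: "no match", candidates: [] };
  // Two or more matches: refuse to pick and return a bounded disambiguation list.
  return {
    ok: false,
    error: "ambiguous target",
    candidates: matches.slice(0, MAX_CANDIDATES),
  };
}
```

A caller that receives the ambiguous result would retry with { kind: "element_id", ... } using one of the returned candidates, and would re-observe first if the DOM has changed since those IDs were issued.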

Configuration

All browser environment variables are optional.

| Variable | Default | Description |
| --- | --- | --- |
| AFK_BROWSER_HEADLESS | surface-aware | 1 forces headless; 0 forces headed. Default is headed for REPL, headless for daemon/subagent/telegram. |
| AFK_BROWSER_ALLOWED_DOMAINS | (empty, permissive) | Comma-separated host globs. Non-matching navigation is blocked. |
| AFK_BROWSER_BLOCKED_DOMAINS | (empty) | Comma-separated host globs. Matching navigation is blocked regardless of the allowlist. |
| AFK_BROWSER_DOM_SNAPSHOTS | off | 1 writes a gzipped DOM snapshot per browser_act to ~/.afk/state/witness/<sid>/browser/dom-snapshots/. |
| AFK_BROWSER_BACKEND | playwright | Reserved for future backends. Only playwright is supported. |
| AFK_BROWSER_CONFIG | (none) | Absolute path to a JSON config file that overrides env-derived settings. |
| AFK_SESSION_ID | default | Override the per-session BrowserContext key. Set when running multiple concurrent AFK processes that need isolated browser contexts. |
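
The "surface-aware" default for AFK_BROWSER_HEADLESS can be read as a small decision function. The sketch below is illustrative only: the Surface type and the resolveHeadless name are hypothetical, and treating any value other than 1 or 0 as "unset" is an assumption rather than documented behavior.

```typescript
// Hypothetical surface names, taken from the table's description column.
type Surface = "repl" | "daemon" | "subagent" | "telegram";

function resolveHeadless(
  env: Record<string, string | undefined>,
  surface: Surface
): boolean {
  const raw = env.AFK_BROWSER_HEADLESS;
  if (raw === "1") return true; // explicit override: force headless
  if (raw === "0") return false; // explicit override: force headed
  // Assumption: anything else falls through to the surface-aware default,
  // which is headed for the REPL and headless everywhere else.
  return surface !== "repl";
}
```

Passing the environment in as a plain record (for example process.env in Node) keeps the function easy to exercise without touching real environment state.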

Domain policy

export AFK_BROWSER_ALLOWED_DOMAINS="github.com,*.atlassian.net"
export AFK_BROWSER_BLOCKED_DOMAINS="*.ads.example.com"

Block wins over allow. Matching applies a simple * glob to the URL host only: it is not a regex and it does not look at the path. When the allowlist is non-empty, any host not on the list is refused. When the allowlist is empty (the default), only the blocklist applies.

A blocked navigation returns an error the agent can read and act on — it tries a different URL instead of crashing.
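
The evaluation order described above (blocklist first, then allowlist, with an empty allowlist meaning permissive) might look like the following. This is a sketch under stated assumptions, not the agent-afk implementation: the function names are hypothetical, matching is assumed to be case-insensitive, * is assumed to match any run of characters including dots, and a pattern such as *.atlassian.net is assumed not to match the bare host atlassian.net.

```typescript
// Convert a host glob into an anchored RegExp. Only "*" is special;
// every other character is escaped and matched literally.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`, "i");
}

// Split a comma-separated env value into trimmed, non-empty globs.
function parseList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function isNavigationAllowed(
  url: string,
  allowed: string[],
  blocked: string[]
): boolean {
  // Only the host takes part in matching; path and query are ignored.
  const host = new URL(url).hostname;
  const matchesAny = (globs: string[]) =>
    globs.some((g) => globToRegExp(g).test(host));

  if (matchesAny(blocked)) return false; // block wins over allow
  if (allowed.length === 0) return true; // empty allowlist is permissive
  return matchesAny(allowed);
}
```

With the two exports shown earlier, https://github.com/login would pass, https://example.org/ would be refused because the allowlist is non-empty, and any host under ads.example.com would be refused regardless of the allowlist.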

JSON config file

For per-project overrides, create ~/.afk/config/browser.json (or point AFK_BROWSER_CONFIG at any absolute path):

{
  "headless": true,
  "allowedDomains": ["my-internal-tool.example.com"],
  "blockedDomains": [],
  "domSnapshots": false,
  "backend": "playwright"
}

File values override env-derived values. Arrays replace rather than append.
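
The override rule can be illustrated with an object spread, which replaces whole keys and therefore replaces arrays instead of concatenating them. The BrowserConfig interface and mergeConfig function below are hypothetical names based on the JSON keys above; how agent-afk actually merges the two sources is not shown in this document.

```typescript
// Hypothetical shape, using the keys from the JSON example above.
interface BrowserConfig {
  headless: boolean;
  allowedDomains: string[];
  blockedDomains: string[];
  domSnapshots: boolean;
  backend: string;
}

// Keys present in the file win; keys absent from the file keep the
// env-derived value. Arrays are replaced as a whole, never appended.
function mergeConfig(
  fromEnv: BrowserConfig,
  fromFile: Partial<BrowserConfig>
): BrowserConfig {
  return { ...fromEnv, ...fromFile };
}
```

For example, if the environment yields allowedDomains of ["github.com"] and the file sets ["my-internal-tool.example.com"], the merged allowlist contains only the file's entry.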

Trace and screenshots

Every browser tool emits records to ~/.afk/state/witness/<sessionId>/trace.jsonl:

  • tool_call — the generic dispatcher record for every tool.
  • browser_event — browser-domain semantics: URL transitions, action outcomes, screenshot paths, ambiguity verdicts, policy refusals.

Screenshots are written as sidecar files under ~/.afk/state/witness/<sessionId>/browser/screenshots/ and referenced by path in the trace — never embedded in the JSONL line.

The witness layer redacts known credential formats (AWS keys, OpenAI bearer tokens, GitHub PATs, Slack tokens, JWTs, password= form values) before persisting. The page receives the real value; only the trace file is redacted.
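
A redaction pass of this kind is typically a list of patterns applied to each line before it is written. The sketch below is an approximation for illustration: the patterns cover only some of the formats listed above, they are simplified guesses at those formats and are not the patterns agent-afk uses, and the [REDACTED] placeholder and the redact name are hypothetical.

```typescript
// Illustrative patterns only; real credential formats vary more than this.
const REDACTION_PATTERNS: Array<[RegExp, string]> = [
  // AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
  [/\bAKIA[0-9A-Z]{16}\b/g, "[REDACTED]"],
  // GitHub classic personal access tokens: "ghp_" plus 36 alphanumerics.
  [/\bghp_[A-Za-z0-9]{36}\b/g, "[REDACTED]"],
  // JWTs: three base64url segments, the first starting with "eyJ".
  [/\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, "[REDACTED]"],
  // password= form values: keep the key, drop the value.
  [/(password=)[^&\s"]+/gi, "$1[REDACTED]"],
];

// Returns a redacted copy; the input string itself is left untouched,
// so the value sent to the page is unaffected.
function redact(line: string): string {
  return REDACTION_PATTERNS.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    line
  );
}
```

Because the function returns a new string, the same value can be typed into the page unchanged while only the redacted copy is persisted to the trace.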

Troubleshooting

browserType.launch: Executable doesn't exist — run pnpm exec playwright install chromium.

Cannot find module 'playwright' — run pnpm install.

Navigation blocked by policy: the agent tried to reach a host that is outside AFK_BROWSER_ALLOWED_DOMAINS or that matches AFK_BROWSER_BLOCKED_DOMAINS. Adjust the relevant list, or unset the allowlist for permissive mode.

Zombie Chromium after a crash — SIGINT/SIGTERM/exit handlers normally close the browser provider. After an ungraceful kill:

# Caution: this ends every process whose command line mentions chromium, not only AFK's.
pkill -f chromium