v1.17

Computer Use Remote, Visual Verification, & Platform Targeting

May 23, 2026

v1.17 turns Computer Use Remote into a fuller host desktop-control pipeline, with direct tool access, screenshot-backed verification, multimodal captures, platform-aware targeting, and clearer safety behavior when host permissions are missing.

🖥️ Host Desktop Control

computer_use_remote is callable in live sessions — The model can now invoke the remote computer-use tool directly, while availability, trust mode, and re-arm enforcement remain runtime checks.
Host desktop control is separated from Xpra — computer_use_remote is now the sole path for controlling the user's host desktop. linux-desktop targets only Agent Zero's internal Docker/Xpra desktop.
Host-screen requests route more accurately — Host desktop queries rank ahead of the Xpra skill, while explicit Agent Zero Desktop requests continue to target the internal desktop.
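
The split between host and internal desktop, plus the runtime checks, can be pictured as a small routing-and-gate step. This is a minimal sketch under stated assumptions, not Agent Zero's actual code: `Target`, `route_request`, `RuntimeState`, and `check_runtime` are hypothetical names, keyword matching stands in for whatever ranking the real router uses, and trust mode is modeled as a plain boolean.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    HOST = "computer_use_remote"   # the user's host desktop
    INTERNAL = "linux-desktop"     # Agent Zero's internal Docker/Xpra desktop


def route_request(text: str) -> Target:
    """Route a desktop request (hypothetical keyword heuristic).

    Explicit mentions of the internal desktop go to INTERNAL;
    every other desktop request is treated as a host-screen request.
    """
    lowered = text.lower()
    if "agent zero desktop" in lowered or "xpra" in lowered:
        return Target.INTERNAL
    return Target.HOST


@dataclass
class RuntimeState:
    available: bool  # is the remote tool connected?
    trusted: bool    # assumption: trust mode reduced to on/off
    armed: bool      # has the user (re-)armed host control?


def check_runtime(state: RuntimeState) -> str:
    """Return 'ok', or the reason the call must be refused at runtime."""
    if not state.available:
        return "unavailable"
    if not state.trusted:
        return "untrusted"
    if not state.armed:
        return "re-arm-required"
    return "ok"
```

The point of the sketch is ordering: the tool being callable does not bypass the gate, so a call can be routed to the host desktop and still be refused at runtime.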

👁️ Visual Verification

Fresh screenshots are required after desktop actions — State-changing desktop actions are considered unverified until a new screenshot visibly confirms the result.
Agents stop when screenshots are unavailable — If a verification screenshot cannot be captured, the agent must stop instead of guessing or continuing blindly.
Screenshots return as multimodal results — Computer-use captures now arrive as real vision messages, allowing the model to inspect the screen visually after each action.
Older captures are pruned — Stale capture payloads are removed to keep long desktop-control sessions from growing context without bound.
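
The verify-or-stop rule and capture pruning described above might look roughly like the following. It is an illustrative sketch only: `act_and_verify`, `prune_captures`, `VerificationUnavailable`, the `image` and `note` keys, and the `keep_last` default are all hypothetical, and the release does not state how many captures are retained.

```python
from typing import Callable, Optional


class VerificationUnavailable(Exception):
    """Raised when no fresh screenshot can be captured after an action."""


def act_and_verify(
    action: Callable[[], None],
    capture: Callable[[], Optional[bytes]],
) -> bytes:
    """Run a state-changing action, then require a fresh screenshot."""
    action()
    shot = capture()
    if not shot:
        # Stop rather than guess: without a capture the action stays unverified.
        raise VerificationUnavailable("no screenshot available; stopping")
    return shot


def prune_captures(history: list, keep_last: int = 2) -> list:
    """Blank out all but the newest `keep_last` screenshot payloads."""
    capture_idx = [i for i, msg in enumerate(history) if msg.get("image") is not None]
    # Slicing with [:-0] would select nothing, so handle keep_last == 0 explicitly.
    stale = set(capture_idx[:-keep_last]) if keep_last > 0 else set(capture_idx)
    return [
        {**msg, "image": None, "note": "capture pruned"} if i in stale else msg
        for i, msg in enumerate(history)
    ]
```

The sketch only guarantees that a new capture exists. Deciding whether the capture visibly confirms the result is still left to the model inspecting the image.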

🧭 Platform Targeting

macOS Accessibility targeting — A dedicated macOS skill supports Accessibility structural targeting through ax_snapshot and ax_action when the backend reports those capabilities.
Windows UIA targeting — Windows gains UI Automation guidance for window management, selector passthrough, and click-last workflows.
Linux AT-SPI and Wayland targeting — Linux snapshots include compact structural tree outlines so agents can select semantic targets more reliably.
Generic prompts stay backend-neutral — Backend-specific action details live in platform skills, while the generic layer focuses on capability discovery and skill loading.
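
One way to read the generic-layer split is as capability discovery followed by skill selection. The sketch below assumes a hypothetical skill registry and hypothetical skill names; only the `ax_snapshot` and `ax_action` capability names come from the notes above.

```python
from typing import Iterable, Optional

# Hypothetical skill names, used only for illustration.
PLATFORM_SKILLS = {
    "macos": "computer-use-macos",
    "windows": "computer-use-windows",
    "linux": "computer-use-linux",
}


def select_skill(platform: str) -> Optional[str]:
    """Map a reported platform to its skill, or None if none is known."""
    return PLATFORM_SKILLS.get(platform.lower())


def can_use_ax_targeting(capabilities: Iterable[str]) -> bool:
    """Accessibility targeting needs both capabilities reported by the backend."""
    return {"ax_snapshot", "ax_action"} <= set(capabilities)
```

Keeping the capability check separate from skill selection mirrors the notes: the generic layer asks what the backend can do, and the platform skill carries the action details.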

🔐 Permissions & Safety

macOS approval denial handled cleanly — COMPUTER_USE_APPROVAL_REQUIRED now maps to the existing re-arm-required stop flow, preventing repeated retries or screenshot fallbacks before permission is granted.
Window-hide guidance updated — Ubuntu, GNOME, and Wayland sessions now prefer Super+H over Alt+F9.
Keystroke verification is clearer — Guidance now reminds agents that a sent key chord proves only that the keys were sent, not that the requested window action succeeded.
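
The approval-denial handling can be sketched as a lookup that sends one error code into the stop flow instead of a retry loop. Only `COMPUTER_USE_APPROVAL_REQUIRED` is taken from the notes; the `next_step` function and its return labels are hypothetical.

```python
from typing import Optional

# Codes that must end in the re-arm-required stop flow (no retry, no fallback).
REARM_REQUIRED_CODES = {"COMPUTER_USE_APPROVAL_REQUIRED"}


def next_step(error_code: Optional[str]) -> str:
    """Decide what the agent does after a host desktop call returns."""
    if error_code is None:
        # The call was sent, which is not proof of success: verify visually.
        return "verify_with_screenshot"
    if error_code in REARM_REQUIRED_CODES:
        return "stop_rearm_required"
    return "report_error"
```

Note that the success branch still ends in verification, matching the keystroke guidance: a sent key chord proves only that the keys were sent.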

🧠 Vision & Context

Codex OAuth proxy preserves image inputs — Image content parts are converted to Responses API input_image parts instead of being flattened to text.
Multimodal regression coverage added — Screenshot-bearing tool results now have coverage to protect the vision path.
Screenshot token estimates fixed — Embedded base64 image data URLs are sanitized from prompt token estimates so screenshots no longer inflate context budgets.
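
Two of the fixes above lend themselves to a short illustration. The sketch below assumes incoming image parts use the Chat Completions `image_url` shape and that token estimates use a crude four-characters-per-token heuristic; both are assumptions, and the function names and `[image]` placeholder are hypothetical rather than the project's own.

```python
import re

# Matches an embedded base64 image data URL.
_DATA_URL = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def strip_image_data_urls(text: str, placeholder: str = "[image]") -> str:
    """Replace embedded base64 image data URLs with a short placeholder."""
    return _DATA_URL.sub(placeholder, text)


def estimate_tokens(text: str) -> int:
    """Rough estimate (~4 characters per token) that ignores image payloads."""
    return len(strip_image_data_urls(text)) // 4


def to_responses_part(part: dict) -> dict:
    """Convert a Chat Completions-style content part to a Responses API part."""
    if part.get("type") == "image_url":
        image = part["image_url"]
        # Chat Completions nests the URL in an object; Responses takes a string.
        url = image["url"] if isinstance(image, dict) else image
        return {"type": "input_image", "image_url": url}
    return {"type": "input_text", "text": part.get("text", "")}
```

Without the stripping step, a single screenshot encoded as base64 would be counted as thousands of characters of text, which is the inflation the fix removes.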
