FlareDesk Docs

Workers AI

Monitor and inspect AI model inference calls from your worker

Workers AI lets you run machine learning models directly on Cloudflare's network. FlareDesk captures every AI call your worker makes, showing you the model used, input prompt, response, token counts, and latency, all in real time.

Overview

The Workers AI page in FlareDesk gives you full visibility into your AI inference pipeline:

Live Trace Feed

See every AI call in real time as your worker processes requests

Input & Output Inspector

View the full prompt sent and the complete model response

Token Counts

Track input and output tokens per inference call

Latency Tracking

Measure how long each model inference takes

Requirements

Before you start

  1. Add the [ai] binding to your wrangler.toml
  2. Run your worker with wrangler dev --remote (Workers AI requires remote execution)
  3. Enable Profiling in FlareDesk (click Enable Profiling on the Workers AI page)
wrangler.toml

# Add this to your wrangler.toml
[ai]
binding = "AI"

Worker Env type

interface Env {
  AI: Ai;
  // ... other bindings
}
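
With the binding configured, a minimal worker that calls Workers AI might look like the following sketch. The `Ai` interface here is a simplified stand-in for the real generated binding type, and the prompt is purely illustrative:

```typescript
// Minimal sketch of a worker that calls Workers AI through the AI binding.
// `Ai` is a simplified stand-in for the real generated binding type.
interface Ai {
  run(model: string, input: unknown): Promise<unknown>;
}

interface Env {
  AI: Ai;
  // ... other bindings
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Each env.AI.run() call shows up as a trace in FlareDesk.
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [{ role: "user", content: "What is Workers AI?" }],
    });
    return new Response(JSON.stringify(result), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```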

Viewing AI Traces

  1. Navigate to Workers AI in the sidebar under Bindings.
  2. Click Enable Profiling if it's not already active.
  3. Make a request to your worker that calls env.AI.run().
  4. The trace appears instantly in the list. Click it to inspect the full input and output.

Live mode: FlareDesk auto-refreshes every 3 seconds when profiling is enabled. You can pause live updates by clicking the Live toggle in the header.

Inspecting a Trace

Click any trace in the list to open the detail drawer on the right side:

Trace Details Include

  • Model: The full model identifier (e.g. @cf/meta/llama-3.1-8b-instruct)
  • Duration: Total inference latency in milliseconds
  • Timestamp: Exact time the call was made
  • Input Tokens: Number of tokens in the prompt
  • Output Tokens: Number of tokens in the response
  • Request Input: The full prompt or messages array sent to the model
  • Response Output: The complete model response

Supported Models

FlareDesk captures traces for all Workers AI model categories:

Text Generation

@cf/meta/llama-3.1-8b-instruct

Text Classification

@cf/huggingface/distilbert-sst-2-int8

Text Embeddings

@cf/baai/bge-base-en-v1.5

Translation

@cf/meta/m2m100-1.2b

Summarization

@cf/facebook/bart-large-cnn

Image Classification

@cf/microsoft/resnet-50
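
For non-chat categories the input shape differs per model. As one example, an embeddings call could look like the sketch below; the `Ai` interface is a simplified stand-in for the real binding type, and the `{ text: [...] }` input shape is an assumption based on the BGE embeddings models:

```typescript
// Sketch: generating text embeddings with @cf/baai/bge-base-en-v1.5.
// `Ai` is a simplified stand-in for the real binding type, and the
// { text: [...] } input shape is an assumption for the BGE models.
interface Ai {
  run(model: string, input: unknown): Promise<unknown>;
}

async function embed(ai: Ai, texts: string[]): Promise<unknown> {
  // Each call appears as a Text Embeddings trace in FlareDesk.
  return ai.run("@cf/baai/bge-base-en-v1.5", { text: texts });
}
```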

Handling Errors

Failed AI calls are clearly marked with an Error badge in the trace list. Clicking the trace shows the full error message in the detail drawer.

Common error: AI not available in local mode

Workers AI cannot run in local-only mode. If you see this error, restart your worker with wrangler dev --remote.
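
A common pattern is to catch inference errors and return them explicitly, so the failure is visible both to the caller and on the FlareDesk trace. A sketch, again using a simplified stand-in `Ai` interface:

```typescript
// Sketch: surface AI inference failures instead of letting the worker crash.
// `Ai` is a simplified stand-in for the real binding type.
interface Ai {
  run(model: string, input: unknown): Promise<unknown>;
}

async function runModel(ai: Ai, model: string, input: unknown): Promise<Response> {
  try {
    const result = await ai.run(model, input);
    return new Response(JSON.stringify(result), {
      headers: { "content-type": "application/json" },
    });
  } catch (err) {
    // Failed calls still appear in FlareDesk with an Error badge.
    const message = err instanceof Error ? err.message : String(err);
    return new Response(message, { status: 500 });
  }
}
```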

Tips & Best Practices

Use structured messages

Pass a messages array (OpenAI-style) rather than a raw prompt string for better visibility in the inspector.
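
For instance, instead of a single prompt string, the call could build a structured array. A sketch; the `buildMessages` helper is our own illustration, not part of any API:

```typescript
// Sketch: building an OpenAI-style messages array for env.AI.run().
// The `buildMessages` helper is illustrative, not part of any API.
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

function buildMessages(systemPrompt: string, userPrompt: string): ChatMessage[] {
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userPrompt },
  ];
}

// Usage: env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
//   messages: buildMessages("You are concise.", "Summarise this page."),
// });
```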

Monitor token usage

Keep an eye on input/output token counts to understand your usage patterns and optimise prompt lengths for cost and speed.

Use the Profiler for full context

The Profiler shows AI calls alongside all other binding calls (D1, KV, R2) in a waterfall view, which is great for understanding the full latency breakdown of a request.

Next Steps

Profiler

See AI calls alongside all other bindings in a waterfall view