Human-in-the-Loop (HITL) Review

Human-in-the-Loop (HITL) actions pause workflow execution for manual review and approval. Use them when a step needs human judgment before the workflow proceeds, for example when reviewing AI-generated content, approving data transformations, or quality-checking critical outputs.

Quick Start

Add a HITL action to your workflow:

actions:
  - name: review_data
    kind: hitl  # Required: marks this as a human review action
    dependencies: [extract_data]
    intent: "Human reviews extracted data before processing"
    hitl:
      port: 3001  # Optional: port for review UI (default: 3001)
      instructions: "Review the extracted data for accuracy"
      timeout: 300  # Optional: seconds before timeout (default: 300)
      require_comment_on_reject: true  # Optional: require comment when rejecting
    context_scope:
      observe:
        - extract_data.*  # Data to review

When the agentic workflow reaches this action:

1. A browser UI opens at http://localhost:3001
2. The workflow pauses and waits for your decision
3. You review each record and approve or reject it
4. After you submit, the workflow continues automatically

How It Works

Architecture

HITL actions use a client-server architecture:

- HitlClient: Validates config, starts server, blocks workflow execution
- HitlServer: Flask server serving the review UI at localhost
- Browser UI: Interactive approval interface with per-record navigation
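
The blocking behavior described above can be sketched with the Python standard library. This is an illustration only, not the actual HitlClient or HitlServer code: the real server uses Flask, and the class name ReviewGate, its wait method, and the POST-to-root endpoint are all assumptions made for this example.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


class ReviewGate:
    """Hypothetical stand-in for the client/server pair (not the real API)."""

    def __init__(self, port: int = 0) -> None:
        self.decision: Dict[str, Any] = {}
        self._submitted = threading.Event()
        gate = self

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self) -> None:
                # Read the JSON decision posted by the (assumed) review UI.
                length = int(self.headers.get("Content-Length", 0))
                gate.decision = json.loads(self.rfile.read(length))
                self.send_response(200)
                self.end_headers()
                gate._submitted.set()  # unblock the waiting workflow

            def log_message(self, format: str, *args: Any) -> None:
                pass  # keep the example quiet

        # Bind to localhost only; port 0 lets the OS pick a free port.
        self._server = HTTPServer(("127.0.0.1", port), Handler)
        self.port = self._server.server_address[1]
        threading.Thread(target=self._server.serve_forever, daemon=True).start()

    def wait(self, timeout: float) -> Dict[str, Any]:
        """Block until a decision arrives or the timeout expires, then shut down."""
        try:
            if self._submitted.wait(timeout):
                return self.decision
            return {"hitl_status": "timeout"}
        finally:
            self._server.shutdown()
            self._server.server_close()
```

The caller thread blocks in wait() while the server thread handles requests, which mirrors how the workflow pauses until the reviewer submits or the timeout elapses.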

Review UI Features

The browser UI provides:

- Per-record navigation: Review each record individually with auto-advance
- Keyboard shortcuts:
  - A - Approve current record
  - R - Reject current record
  - ←/→ - Navigate between records
- View toggles:
  - Fields view: Structured display of record fields
  - JSON view: Raw JSON for debugging
- Progress tracking: Visual progress bar and stats (pending/approved/rejected)
- State persistence: Refresh the page without losing your reviews
- Auto-shutdown: Server closes automatically after submission

Configuration

HITL Config Block

hitl:
  port: 3001                        # Port for review server (1024-65535)
  instructions: "Review carefully"  # Instructions shown in UI (required)
  timeout: 300                      # Seconds before auto-timeout (30-3600)
  require_comment_on_reject: true   # Require comment when rejecting

Configuration options:

Field	Type	Default	Description
`port`	int	3001	Server port (1024-65535). If busy, tries up to 5 consecutive ports.
`instructions`	str	(required)	Instructions displayed in review UI. Be clear and specific.
`timeout`	int	300	Seconds before timeout. Server shuts down and workflow continues with `hitl_status: timeout`.
`require_comment_on_reject`	bool	true	If true, rejecting a record requires a comment explaining why.
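
As a rough illustration of the checks the table implies, the sketch below models the config block as a dataclass. The class name HitlConfig and the use of ValueError are assumptions for this example, not the library's actual types. It checks only the port range and the required instructions field, plus a basic positivity check on timeout; the exact timeout bounds are left out.

```python
from dataclasses import dataclass


@dataclass
class HitlConfig:
    """Hypothetical container mirroring the fields in the table above."""

    instructions: str                       # required, shown in the review UI
    port: int = 3001                        # default from the table
    timeout: int = 300                      # default from the table
    require_comment_on_reject: bool = True  # default from the table

    def __post_init__(self) -> None:
        if not self.instructions or not self.instructions.strip():
            raise ValueError("hitl.instructions is required")
        if not 1024 <= self.port <= 65535:
            raise ValueError(f"hitl.port must be in 1024-65535, got {self.port}")
        if self.timeout <= 0:
            raise ValueError(f"hitl.timeout must be positive, got {self.timeout}")


config = HitlConfig(instructions="Review the extracted data for accuracy")
print(config.port, config.timeout, config.require_comment_on_reject)
```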

Workflow-Level Default Timeout

Set a default HITL timeout for all HITL actions in the workflow using defaults.hitl_timeout. Individual actions can still override this value.

defaults:
  hitl_timeout: 600  # 10 minutes for all HITL actions

actions:
  - name: review_data
    kind: hitl
    dependencies: [extract_data]
    hitl:
      instructions: "Review extracted data"
      # Uses workflow default: 600s

  - name: review_summary
    kind: hitl
    dependencies: [generate_summary]
    hitl:
      instructions: "Review generated summary"
      timeout: 120  # Overrides workflow default

Resolution order: action hitl.timeout > defaults.hitl_timeout > 300s hardcoded default.
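
The resolution order can be expressed as a small function. This is a sketch of the rule as stated, with a hypothetical function name; None stands for "not set in the YAML".

```python
from typing import Optional

HARDCODED_DEFAULT_TIMEOUT = 300  # seconds, the final fallback in the resolution order


def resolve_hitl_timeout(action_timeout: Optional[int],
                         workflow_default: Optional[int]) -> int:
    """Pick the action value first, then defaults.hitl_timeout, then 300."""
    if action_timeout is not None:
        return action_timeout
    if workflow_default is not None:
        return workflow_default
    return HARDCODED_DEFAULT_TIMEOUT


print(resolve_hitl_timeout(120, 600))    # review_summary: action override wins
print(resolve_hitl_timeout(None, 600))   # review_data: workflow default applies
print(resolve_hitl_timeout(None, None))  # nothing set: hardcoded default
```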

Minimum timeout

The minimum allowed value is 5 seconds (useful for testing). For real reviews, use at least 60 seconds — reviewers need time to read instructions and inspect records.

Granularity

HITL actions always use FILE granularity — all records are presented in a single review session. Within that session, the reviewer can navigate between records, approve or reject each individually, and submit once. (File granularity here means one UI session for the entire batch, not one session per record.)

Record granularity not supported

Setting granularity: record on a HITL action raises a ConfigurationError. Record granularity would launch a separate approval UI for every record, which is unworkable for reviewers. If you need per-record filtering before HITL, use a guard to pre-filter records.

- name: review_data
  kind: hitl
  dependencies: [extract_data]
  hitl:
    instructions: "Review the full dataset and approve or reject"
  context_scope:
    observe:
      - extract_data.*

Output: Each record gets its own hitl_status, user_comment, and timestamp based on per-record review decisions in the UI.

[
  {
    "id": 1,
    "name": "Alice",
    "hitl_status": "approved",
    "user_comment": "",
    "timestamp": "2026-02-12T10:00:00Z"
  },
  {
    "id": 2,
    "name": "Bob",
    "hitl_status": "rejected",
    "user_comment": "Invalid email",
    "timestamp": "2026-02-12T10:00:00Z"
  }
]
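
One way to picture how per-record decisions end up in the output above is a merge of each record with its review. The function name merge_reviews and the shape of the reviews list are assumptions for illustration; the actual merge logic is not shown in this document.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_reviews(records: List[Record],
                  reviews: List[Dict[str, str]],
                  timestamp: str) -> List[Record]:
    """Copy each record and add its decision fields at the top level."""
    merged = []
    for record, review in zip(records, reviews):
        merged.append({
            **record,
            "hitl_status": review["hitl_status"],
            "user_comment": review.get("user_comment", ""),
            "timestamp": timestamp,  # one submission time for the whole session
        })
    return merged


records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
reviews = [
    {"hitl_status": "approved"},
    {"hitl_status": "rejected", "user_comment": "Invalid email"},
]
for row in merge_reviews(records, reviews, "2026-02-12T10:00:00Z"):
    print(row["name"], row["hitl_status"], repr(row["user_comment"]))
```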

Output Schema

HITL actions return decisions in a standardized format:

Response Fields

Field	Type	Description
`hitl_status`	str	`"approved"`, `"rejected"`, or `"timeout"`
`user_comment`	str	Optional comment from reviewer
`timestamp`	str	ISO-8601 timestamp (UTC) when review was submitted
`record_reviews`	list	(FILE mode only) Per-record decisions

Accessing HITL Decisions Downstream

HITL decision fields (hitl_status, user_comment, timestamp) are merged directly into each record's content. Downstream actions receive these fields at the top level of each item.

Guards

Guards evaluate against the item's content fields directly — do not prefix with the action name:

- name: process_approved_data
  dependencies: [review_data]
  guard:
    condition: "hitl_status == 'approved'"
    on_false: skip  # Skip processing if HITL rejected
  prompt: |
    Process the approved data:
    {{ review_data.* }}

    Reviewer comment: {{ review_data.user_comment }}

Guard field resolution

Guards evaluate against the flattened item content. The HITL fields (hitl_status, user_comment) are top-level keys in each record, not nested under the action name.

# Correct - field is at the top level of each item
condition: "hitl_status == 'approved'"

# Wrong - tries to look up data["review_data"]["hitl_status"], which doesn't exist
condition: "review_data.hitl_status == 'approved'"

Note: Prompt templates ({{ review_data.user_comment }}) use a different resolution mechanism (context scope) and do use the action name prefix. Guards do not.

Common guard patterns:

# Only process approved items (filter out rejected/timeout)
guard:
  condition: "hitl_status == 'approved'"
  on_false: filter

# Skip downstream if rejected (passthrough original content)
guard:
  condition: "hitl_status == 'approved'"
  on_false: skip

# Handle timeout
guard:
  condition: "hitl_status != 'timeout'"
  on_false: skip
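
The difference between the filter and skip modes in these patterns can be sketched as follows. This is a simplified model based on the comments above (filter drops the item, skip passes the original content through without running the action). The function apply_guard is hypothetical, and it takes a Python callable rather than a condition string.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def apply_guard(items: List[Record],
                condition: Callable[[Record], bool],
                on_false: str) -> Tuple[List[Record], List[Record]]:
    """Split items into (to_process, passed_through) for a given on_false mode."""
    to_process: List[Record] = []
    passed_through: List[Record] = []
    for item in items:
        if condition(item):
            to_process.append(item)      # guard passed: the action runs
        elif on_false == "skip":
            passed_through.append(item)  # kept unchanged, action not run
        elif on_false == "filter":
            continue                     # dropped from the output
        else:
            raise ValueError(f"unknown on_false mode: {on_false}")
    return to_process, passed_through


items = [
    {"id": 1, "hitl_status": "approved"},
    {"id": 2, "hitl_status": "rejected"},
    {"id": 3, "hitl_status": "timeout"},
]
approved = lambda item: item.get("hitl_status") == "approved"

print(apply_guard(items, approved, "filter"))  # only id 1 survives
print(apply_guard(items, approved, "skip"))    # id 1 processed, ids 2 and 3 passed through
```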

Usage Patterns

Pattern 1: Quality Gate

Approve AI-generated content before using it:

actions:
  - name: generate_summary
    intent: "LLM generates article summary"
    prompt: "Summarize this article..."

  - name: review_summary
    kind: hitl
    dependencies: [generate_summary]
    intent: "Human reviews AI summary"
    hitl:
      instructions: "Review the generated summary for accuracy and tone"
    context_scope:
      observe:
        - generate_summary.summary

  - name: publish_summary
    dependencies: [review_summary]
    intent: "Publish approved summary"
    guard:
      condition: "hitl_status == 'approved'"
      on_false: skip
    prompt: "Publish the summary..."

Pattern 2: Batch Approval

Review and filter a dataset before processing:

actions:
  - name: extract_candidates
    intent: "Extract potential matches from data"

  - name: review_candidates
    kind: hitl
    dependencies: [extract_candidates]
    hitl:
      instructions: "Approve valid candidates, reject false positives"
      require_comment_on_reject: true
    context_scope:
      observe:
        - extract_candidates.*

  - name: process_approved_only
    dependencies: [review_candidates]
    intent: "Process only approved candidates"
    guard:
      condition: "hitl_status == 'approved'"
      on_false: filter

Pattern 3: Checkpoint Review

Pause between workflow stages for manual inspection:

actions:
  - name: stage_1_transformation
    intent: "Initial data transformation"

  - name: checkpoint_review
    kind: hitl
    granularity: file  # One review session for the entire stage (HITL always uses file granularity)
    dependencies: [stage_1_transformation]
    hitl:
      instructions: "Verify stage 1 output before continuing to stage 2"
    context_scope:
      observe:
        - stage_1_transformation.*

  - name: stage_2_enrichment
    dependencies: [checkpoint_review]
    guard:
      condition: "hitl_status == 'approved'"
      on_false: filter  # Exclude records that were not approved at the checkpoint

Pattern 4: Pre-filtered HITL Review

Use a guard on the HITL action itself to show only flagged records to the reviewer:

actions:
  - name: auto_review_quality
    intent: "LLM scores each record for quality"
    prompt: "Score this Q&A for quality (1-10)..."

  - name: review_flagged_items
    kind: hitl
    dependencies: [auto_review_quality]
    guard:
      condition: 'decision == "review"'
      on_false: skip  # Auto-approved records skip HITL, preserve original content
    hitl:
      instructions: "Review items flagged by auto-review"
    context_scope:
      observe:
        - auto_review_quality.*

The guard runs per-record before the HITL UI launches. Only records where decision == "review" appear in the approval UI. See Guards with File Granularity for how on_false modes behave.

Debugging & Troubleshooting

Guard Filters or Skips All Items

If your downstream guard filters/skips every item (even approved ones), the most common causes are:

1. Using action name prefix in the guard condition

# Wrong - guard evaluates against flattened item content, not namespaced context
condition: "review_data.hitl_status == 'approved'"

# Correct - hitl_status is a top-level field in each item
condition: "hitl_status == 'approved'"

The guard evaluator flattens the item's content dict into a top-level namespace. Fields like hitl_status are accessed directly, not under the action name. See the guard field resolution warning above.

2. Wrong field name

Field names are case-sensitive. Verify the exact field name in your data:

- The HITL server produces hitl_status (snake_case)
- Your storage layer may display it as hitlStatus (camelCase)
- Use the field name as it appears during processing, which is hitl_status

3. Default passthrough behavior

With passthrough_on_error: true (the default), an item passes through if evaluating the guard condition raises an error (e.g., field not found). If the field instead resolves to None and the comparison simply evaluates to False, that is not an error, so the on_false behavior applies. When this happens for every item, the guard filters or skips all of them.
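
The distinction between an evaluation error and a plain False result can be made concrete with a toy model. The function evaluate_guard below is hypothetical and takes Python callables instead of condition strings; whether the real evaluator raises or yields None for a missing field is not specified here, so both cases are shown.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def evaluate_guard(item: Record,
                   condition: Callable[[Record], bool],
                   passthrough_on_error: bool = True) -> str:
    """Return "pass" if the item continues, "on_false" if on_false should apply."""
    try:
        result = condition(item)
    except Exception:
        if passthrough_on_error:
            return "pass"  # evaluation error: the item passes through
        raise
    return "pass" if result else "on_false"


# The field is stored under the wrong name in both cases below.
item = {"hitlStatus": "approved"}

# The lookup raises KeyError, so the error path lets the item through.
print(evaluate_guard(item, lambda r: r["hitl_status"] == "approved"))      # pass

# The lookup yields None; None == "approved" is False, so on_false applies.
print(evaluate_guard(item, lambda r: r.get("hitl_status") == "approved"))  # on_false
```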

Check Server Logs

HITL logs appear in workflow output:

45:32 | Action 'review_data': Starting HITL server...
45:32 | 🔍 APPROVAL REQUIRED
45:32 | ============================================================
45:32 | Open this URL in your browser:
45:32 |   http://localhost:3001
45:32 | ============================================================

Browser Console Debugging

Open DevTools (F12) → Console to see:

// When saving a decision:
Persisted decision for record 0: approved

// When refreshing (state restoration):
Loaded review state from server: {record_count: 3, record_reviews: [...]}
Restored 2 of 3 reviews from server

// If restoration fails:
Failed to restore review state: <error details>

State Refresh Issue

If refreshing the page shows all records as "pending":

1. Check the browser console for errors
2. Check the server logs for /api/review-state requests
3. Verify persistence by checking whether decisions were saved:
   - Approve a record
   - Check console: Persisted decision for record 0: approved
   - Refresh page
   - Check console: Restored 1 of 3 reviews from server

If state isn't restoring, the console will show the error.

Port Conflicts

If the configured port is busy:

WARNING: Port 3001 in use, using 3002 instead

HITL tries up to 5 consecutive ports. If all are busy:

NetworkError: Could not find available port near 3001
Attempted ports: [3001, 3002, 3003, 3004, 3005]

Fix: Close unused servers or set a different hitl.port for the action.
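
The fallback over consecutive ports can be sketched with the standard socket module. This is an approximation of the behavior described above, not the library's implementation: the function name find_available_port is hypothetical, and it raises a plain OSError rather than the NetworkError shown in the log.

```python
import socket
from typing import List


def find_available_port(preferred: int, attempts: int = 5,
                        host: str = "127.0.0.1") -> int:
    """Return the first port in [preferred, preferred + attempts) that can be bound."""
    tried: List[int] = []
    # Stay inside the valid port range even when preferred is near 65535.
    for port in range(preferred, min(preferred + attempts, 65536)):
        tried.append(port)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port   # socket is closed on exit so the server can bind it
    raise OSError(f"Could not find available port near {preferred}; attempted {tried}")


if __name__ == "__main__":
    print(find_available_port(3001))
```

Note that a probe like this only reports that the port was free at the moment of the check; another process could claim it before the server binds.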

Timeout Handling

If the review isn't completed within the timeout:

hitl:
  timeout: 300  # 5 minutes

After timeout:

- The server shuts down
- The workflow continues with hitl_status: "timeout"

Downstream guards can handle this:

guard:
  condition: "hitl_status != 'timeout'"
  on_false: filter

Advanced Topics

Custom Ports

Assign a distinct port to each HITL action to avoid conflicts:

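actions:
  - name: review_stage_1
    kind: hitl
    hitl:
      port: 3001  # First review
      instructions: "Review stage 1 output"  # Required field

  - name: review_stage_2
    kind: hitl
    hitl:
      port: 3002  # Second review (different port)
      instructions: "Review stage 2 output"  # Required field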

Workflow Continuation

After clicking "Submit Reviews":

✅ Server shuts down automatically (1.5s delay)
✅ Browser tab attempts to close (may not work due to browser security)
✅ Workflow continues immediately (doesn't wait for browser)
Message shown: "✅ Complete! Workflow is continuing. You can close this tab now."

The workflow continues as soon as you submit, even if the browser tab stays open.

Security Notes

✅ Server binds to 127.0.0.1 (localhost only, no network exposure)
✅ No authentication required (local-only access)
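⚠️ No CSRF protection (the server is not exposed to the network, but other pages open in the same browser may still be able to send requests to localhost while a review is running)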
✅ HTML escaping prevents XSS
⚠️ Not suitable for remote or multi-user reviews (use webhooks instead)

Testing HITL Actions

For automated testing, mock the HITL decision:

from agent_actions.llm.providers.hitl.client import HitlClient

def test_workflow_with_hitl(monkeypatch):
    # Mock HITL to auto-approve
    def mock_invoke(self, context, config):
        return {
            "hitl_status": "approved",
            "user_comment": "Auto-approved for testing",
            "timestamp": "2026-02-12T10:00:00Z"
        }

    monkeypatch.setattr(HitlClient, "invoke", mock_invoke)

    # Run workflow - HITL will auto-approve.
    # run_workflow is not defined in this snippet; import your project's workflow runner.
    result = run_workflow("workflow.yml")
    assert result.success
