Data Lineage

Storage backend: SQLite as of v0.2.6. File paths under agent_io/target/.../data.json in older snippets are stale; this page uses the current SQLite layout (agent_io/store/<workflow>.db, table target_data, where each row's data column is a JSON array of records).

How does Agent Actions track data as it flows through parallel branches, merges, and splits? The Ancestry Chain provides complete lineage tracking that enables complex workflow patterns like Diamond, Map-Reduce, and Ensemble voting.

Overview

Every record in Agent Actions carries lineage metadata:

jsonJSON
{
  "source_guid": "cbbd09ca-2503-591c-b712-4c378c101b9d",
  "target_id": "550e8400-e29b-41d4-a716-446655440003",
  "parent_target_id": "550e8400-e29b-41d4-a716-446655440001",
  "root_target_id": "550e8400-e29b-41d4-a716-446655440000",
  "node_id": "merge_abc12345-6789-0abc-def0-123456789abc",
  "lineage": [
    "extract_def45678-1234-5678-9abc-def012345678",
    "validate_ghi78901-2345-6789-abcd-ef0123456789",
    "merge_abc12345-6789-0abc-def0-123456789abc"
  ],
  "content": { ... }
}

Ancestry Fields

Field	Purpose	Use Case
`source_guid`	Correlates records from same source file	Basic file-level grouping
`target_id`	Unique identifier for this specific record	Individual record tracking
`parent_target_id`	Links to immediate parent record	Parallel branch merge (Diamond pattern)
`root_target_id`	Links to original ancestor record	Map-Reduce aggregation
`lineage`	Array of node IDs this record passed through	Sequential chain tracking

How It Works

Record Lifecycle

When a record flows through an action, the ancestry chain propagates automatically:

Propagation Rules:

target_id = new UUID (unique for each output)
parent_target_id = input's target_id (links to immediate parent)
root_target_id = input's root_target_id (preserves original ancestor)

First Record (Root)

When a record first enters the pipeline:

jsonJSON
{
  "target_id": "ROOT-UUID",
  "root_target_id": "ROOT-UUID"
}

The first record is its own root—root_target_id equals target_id. Note that parent_target_id is absent on root records (not null): use a presence check ("parent_target_id" in record), not record["parent_target_id"] is None.

Parallel Branch Merge (Diamond Pattern)

Multiple actions process the same data in parallel, then a downstream action needs all their outputs.

yamlYAML
actions:
  - name: validate
    dependencies: []

  - name: generate_seo
    dependencies: validate

  - name: generate_recommendations
    dependencies: validate

  - name: assess_reading_level
    dependencies: validate

  - name: score_quality
    dependencies: [generate_seo, generate_recommendations, assess_reading_level]

How matching works: All three parallel branches share the same parent_target_id (the validate action's target_id). When score_quality runs, it queries for records with that parent_target_id and finds all siblings.

Accessing Parallel Outputs

In your merge action's prompt, use namespaced field references:

yamlYAML
- name: score_quality
  dependencies: [generate_seo, generate_recommendations, assess_reading_level]
  prompt: |
    SEO Keywords: {{ generate_seo.primary_keywords }}
    Similar Books: {{ generate_recommendations.similar_books }}
    Reading Level: {{ assess_reading_level.reading_level }}

    Score the overall quality of this enriched catalog entry.

Map-Reduce Pattern

For splitting a document into chunks, processing each, then aggregating results:

yamlYAML
actions:
  - name: chunk_document
    kind: tool
    impl: chunk_document

  - name: process_chunk
    dependencies: chunk_document

  - name: aggregate_results
    dependencies: process_chunk
    kind: tool
    impl: aggregate_results

How matching works: All chunks preserve the original document's root_target_id. The aggregate action queries by root_target_id to collect all descendants.

Ensemble/Voting Pattern

Run the same input through multiple models, then select or combine the best answers:

yamlYAML
actions:
  - name: prepare

  - name: gpt4_answer
    dependencies: prepare
    model_vendor: openai

  - name: claude_answer
    dependencies: prepare
    model_vendor: anthropic

  - name: gemini_answer
    dependencies: prepare
    model_vendor: google

  - name: best_answer
    dependencies: [gpt4_answer, claude_answer, gemini_answer]

All three model responses share the same parent_target_id, enabling best_answer to access all of them for comparison.

Conditional Merge

When some branches may be skipped due to guards:

yamlYAML
actions:
  - name: classify

  - name: fast_path
    dependencies: classify
    guard:
      condition: "classify.complexity == 'low'"

  - name: slow_path
    dependencies: classify
    guard:
      condition: "classify.complexity == 'high'"

  - name: combine
    dependencies: [fast_path, slow_path]

Handling missing branches: The merge action receives null for skipped branches. Your template should handle this gracefully:

Important: On root records (records with no parent), parent_target_id is absent from the JSON object — not null. Always use a presence check ("parent_target_id" in record) rather than a null check (record["parent_target_id"] is None). A null check will raise KeyError on root records.

yamlYAML
- name: combine
  prompt: |
    {% if fast_path %}Fast result: {{ fast_path.result }}{% endif %}
    {% if slow_path %}Slow result: {{ slow_path.result }}{% endif %}

FILE-Mode Lineage

FILE-mode tools receive all records at once. The framework tracks each record's identity through the tool using node_id — inspired by Apache NiFi's FlowFile model where every record carries an immutable UUID through every processor.

How it works:

Each input record carries a node_id from the previous action
The tool receives full records and returns them
The framework matches each output to its input by node_id
Matched outputs extend the parent's lineage chain
Outputs without node_id (aggregation results) get fresh lineage

textTXT
Input records:                        Tool output:                     After enrichment:
┌──────────────────────┐              ┌──────────────────────┐         ┌──────────────────────┐
│ node_id: flatten_q0  │──── kept ───▶│ node_id: flatten_q0  │────────▶│ lineage: [...,       │
│ content: {q: "Q0"}   │              │ content: {q: "Q0"}   │         │   flatten_q0,        │
└──────────────────────┘              └──────────────────────┘         │   dedup_tool_0]      │
┌──────────────────────┐                                               └──────────────────────┘
│ node_id: flatten_q1  │──── dropped (not in output)
│ content: {q: "Q1"}   │
└──────────────────────┘
┌──────────────────────┐              ┌──────────────────────┐         ┌──────────────────────┐
│ node_id: flatten_q2  │──── kept ───▶│ node_id: flatten_q2  │────────▶│ lineage: [...,       │
│ content: {q: "Q2"}   │              │ content: {q: "Q2"}   │         │   flatten_q2,        │
└──────────────────────┘              └──────────────────────┘         │   dedup_tool_1]      │
                                                                       └──────────────────────┘

Downstream actions can use context_scope.observe to load data from any ancestor in the lineage chain — because each record traces back to the correct parent, not a shared fallback.

Matching Priority

When loading historical data, Agent Actions uses this priority:

Lineage match — Dependency's node_id is in current record's lineage (sequential chain)
Parent match — Records share the same parent_target_id (parallel siblings)
Root match — Records share the same root_target_id (Map-Reduce descendants)

Debugging Lineage

Inspect Record Ancestry

bashBASH
sqlite3 agent_io/store/<workflow>.db "
  SELECT json_extract(r.value, '\$.source_guid'),
         json_extract(r.value, '\$.target_id'),
         json_extract(r.value, '\$.parent_target_id'),
         json_extract(r.value, '\$.root_target_id'),
         json_extract(r.value, '\$.lineage')
  FROM target_data t, json_each(t.data) r
  WHERE t.action_name = 'merge'
  LIMIT 1
"

Verify Sibling Relationships

Check that parallel branches share the same parent:

bashBASH
# All three should return the same parent_target_id
sqlite3 agent_io/store/<workflow>.db "
  SELECT t.action_name, json_extract(r.value, '\$.parent_target_id')
  FROM target_data t, json_each(t.data) r
  WHERE t.action_name IN ('branch_a', 'branch_b', 'branch_c')
"

Trace Root Ancestry

For Map-Reduce, verify all chunks trace back to the same root:

bashBASH
sqlite3 agent_io/store/<workflow>.db "
  SELECT DISTINCT json_extract(r.value, '\$.root_target_id')
  FROM target_data t, json_each(t.data) r
  WHERE t.action_name = 'process_chunk'
"
# Should output exactly one UUID

Overview​

Ancestry Fields​

How It Works​

Record Lifecycle​

First Record (Root)​

Parallel Branch Merge (Diamond Pattern)​

Accessing Parallel Outputs​

Map-Reduce Pattern​

Ensemble/Voting Pattern​

Conditional Merge​

FILE-Mode Lineage​

Matching Priority​

Debugging Lineage​

Inspect Record Ancestry​

Verify Sibling Relationships​

Trace Root Ancestry​

See Also​