Data Lineage

How does Agent Actions track data as it flows through parallel branches, merges, and splits? The Ancestry Chain provides complete lineage tracking that enables complex workflow patterns like Diamond, Map-Reduce, and Ensemble voting.

Overview

Every record in Agent Actions carries lineage metadata:

{
  "source_guid": "cbbd09ca-2503-591c-b712-4c378c101b9d",
  "target_id": "550e8400-e29b-41d4-a716-446655440003",
  "parent_target_id": "550e8400-e29b-41d4-a716-446655440001",
  "root_target_id": "550e8400-e29b-41d4-a716-446655440000",
  "node_id": "merge_abc12345-6789-0abc-def0-123456789abc",
  "lineage": [
    "extract_def45678-1234-5678-9abc-def012345678",
    "validate_ghi78901-2345-6789-abcd-ef0123456789",
    "merge_abc12345-6789-0abc-def0-123456789abc"
  ],
  "content": { ... }
}

Ancestry Fields

Field | Purpose | Use Case
source_guid | Correlates records from same source file | Basic file-level grouping
target_id | Unique identifier for this specific record | Individual record tracking
parent_target_id | Links to immediate parent record | Parallel branch merge (Diamond pattern)
root_target_id | Links to original ancestor record | Map-Reduce aggregation
lineage | Array of node IDs this record passed through | Sequential chain tracking

How It Works

Record Lifecycle

When a record flows through an action, the ancestry chain propagates automatically:

Propagation Rules:

  1. target_id = new UUID (unique for each output)
  2. parent_target_id = input's target_id (links to immediate parent)
  3. root_target_id = input's root_target_id (preserves original ancestor)
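
The three rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the Agent Actions implementation: the function name `propagate` is hypothetical, and two behaviours are assumptions inferred from the example record in the Overview, namely that `source_guid` is carried over unchanged and that `lineage` grows by one node ID per action.

```python
import uuid


def propagate(input_record, node_id, content):
    """Build an output record whose ancestry follows the three propagation rules."""
    return {
        # Assumption: the source file correlation ID is carried over unchanged.
        "source_guid": input_record["source_guid"],
        "target_id": str(uuid.uuid4()),                    # rule 1: new UUID per output
        "parent_target_id": input_record["target_id"],     # rule 2: immediate parent
        "root_target_id": input_record["root_target_id"],  # rule 3: original ancestor
        "node_id": node_id,
        # Assumption: each action appends its own node ID to the lineage.
        "lineage": input_record["lineage"] + [node_id],
        "content": content,
    }


# Hypothetical records; the IDs are placeholders, not real Agent Actions values.
root = {
    "source_guid": "src-1",
    "target_id": "ROOT-UUID",
    "parent_target_id": None,
    "root_target_id": "ROOT-UUID",
    "node_id": "extract_1",
    "lineage": ["extract_1"],
    "content": {},
}
child = propagate(root, "validate_1", {"valid": True})
grandchild = propagate(child, "merge_1", {"merged": True})
```

After two hops, `grandchild` points at `child` through `parent_target_id` but still carries the root's ID in `root_target_id`, which is what lets later actions match either by parent or by root.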

First Record (Root)

When a record first enters the pipeline:

{
  "target_id": "ROOT-UUID",
  "parent_target_id": null,
  "root_target_id": "ROOT-UUID"
}

The first record is its own root—root_target_id equals target_id.
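
As a rough illustration of the root case, a record entering the pipeline could be built as follows. The helper names `make_root_record` and `is_root` are hypothetical and only mirror the JSON shape shown above.

```python
import uuid


def make_root_record(source_guid, content):
    """Create a record with no parent that is its own root."""
    root_id = str(uuid.uuid4())
    return {
        "source_guid": source_guid,
        "target_id": root_id,
        "parent_target_id": None,   # serialised as null in JSON
        "root_target_id": root_id,  # same value as target_id
        "content": content,
    }


def is_root(record):
    """A record is a root when it has no parent and points at itself."""
    return (
        record["parent_target_id"] is None
        and record["root_target_id"] == record["target_id"]
    )


first = make_root_record("src-1", {"title": "Example"})
```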

Parallel Branch Merge (Diamond Pattern)

Multiple actions process the same data in parallel, then a downstream action needs all their outputs.

actions:
  - name: validate
    dependencies: []

  - name: generate_seo
    dependencies: validate

  - name: generate_recommendations
    dependencies: validate

  - name: assess_reading_level
    dependencies: validate

  - name: score_quality
    dependencies: [generate_seo, generate_recommendations, assess_reading_level]

How matching works: All three parallel branches share the same parent_target_id (the validate action's target_id). When score_quality runs, it queries for records with that parent_target_id and finds all siblings.
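
The sibling lookup described above amounts to grouping branch outputs by `parent_target_id`. The sketch below shows the idea on in-memory records; the `action` key and the function `group_siblings` are hypothetical conveniences for the example and are not part of the record format shown in the Overview.

```python
from collections import defaultdict


def group_siblings(records):
    """Group branch outputs by parent_target_id, keyed by the producing action."""
    groups = defaultdict(dict)
    for rec in records:
        groups[rec["parent_target_id"]][rec["action"]] = rec["content"]
    return dict(groups)


# Hypothetical outputs of the three parallel branches for two validated records.
records = [
    {"action": "generate_seo", "parent_target_id": "V1",
     "content": {"primary_keywords": ["mystery"]}},
    {"action": "generate_recommendations", "parent_target_id": "V1",
     "content": {"similar_books": ["Book A"]}},
    {"action": "assess_reading_level", "parent_target_id": "V1",
     "content": {"reading_level": "adult"}},
    {"action": "generate_seo", "parent_target_id": "V2",
     "content": {"primary_keywords": ["history"]}},
]
merged = group_siblings(records)
```

Here `merged["V1"]` holds the outputs of all three branches for the same validated record, while `merged["V2"]` holds only the one branch that has produced output so far.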

Accessing Parallel Outputs

In your merge action's prompt, use namespaced field references:

- name: score_quality
  dependencies: [generate_seo, generate_recommendations, assess_reading_level]
  prompt: |
    SEO Keywords: {{ generate_seo.primary_keywords }}
    Similar Books: {{ generate_recommendations.similar_books }}
    Reading Level: {{ assess_reading_level.reading_level }}

    Score the overall quality of this enriched catalog entry.

Map-Reduce Pattern

For splitting a document into chunks, processing each, then aggregating results:

actions:
  - name: chunk_document
    kind: tool
    impl: chunk_document

  - name: process_chunk
    dependencies: chunk_document

  - name: aggregate_results
    dependencies: process_chunk
    kind: tool
    impl: aggregate_results

How matching works: All chunks preserve the original document's root_target_id. The aggregate action queries by root_target_id to collect all descendants.
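
To see why the root ID matters here, note that each processed chunk has a different parent (its own chunk record), so a parent match would not bring them back together. A minimal sketch of the root-based collection, using a hypothetical `collect_by_root` helper and made-up record IDs:

```python
def collect_by_root(records, root_target_id):
    """Return every record descended from the given root."""
    return [r for r in records if r["root_target_id"] == root_target_id]


# Hypothetical process_chunk outputs: parents differ, roots identify the document.
processed = [
    {"target_id": "P1", "parent_target_id": "C1", "root_target_id": "DOC-A",
     "content": {"summary": "intro"}},
    {"target_id": "P2", "parent_target_id": "C2", "root_target_id": "DOC-A",
     "content": {"summary": "methods"}},
    {"target_id": "P3", "parent_target_id": "C3", "root_target_id": "DOC-B",
     "content": {"summary": "other"}},
]

doc_a = collect_by_root(processed, "DOC-A")
combined = " / ".join(r["content"]["summary"] for r in doc_a)
```

The join at the end stands in for whatever the `aggregate_results` tool actually does with the collected chunks.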

Ensemble/Voting Pattern

Run the same input through multiple models, then select or combine the best answers:

actions:
  - name: prepare

  - name: gpt4_answer
    dependencies: prepare
    model_vendor: openai

  - name: claude_answer
    dependencies: prepare
    model_vendor: anthropic

  - name: gemini_answer
    dependencies: prepare
    model_vendor: google

  - name: best_answer
    dependencies: [gpt4_answer, claude_answer, gemini_answer]

All three model responses share the same parent_target_id, enabling best_answer to access all of them for comparison.
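
How `best_answer` chooses is up to the action itself; the configuration above does not prescribe a strategy. As one possible approach, assuming the three answers are short strings that can be compared directly, a simple majority vote could look like this (the function `majority_vote` is hypothetical):

```python
from collections import Counter


def majority_vote(answers):
    """Pick the most common answer; answers maps action name to answer text."""
    counts = Counter(a for a in answers.values() if a is not None)
    if not counts:
        return None
    winner, _ = counts.most_common(1)[0]
    return winner


# Hypothetical sibling outputs that share one parent_target_id.
answers = {
    "gpt4_answer": "Paris",
    "claude_answer": "Paris",
    "gemini_answer": "Lyon",
}
best = majority_vote(answers)
```

For free-form text, exact string comparison is rarely enough, and an LLM-based comparison inside the `best_answer` prompt is likely the more practical choice.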

Conditional Merge

When some branches may be skipped due to guards:

actions:
  - name: classify

  - name: fast_path
    dependencies: classify
    guard:
      condition: "classify.complexity == 'low'"

  - name: slow_path
    dependencies: classify
    guard:
      condition: "classify.complexity == 'high'"

  - name: combine
    dependencies: [fast_path, slow_path]

Handling missing branches: The merge action receives null for skipped branches. Your template should handle this gracefully:

- name: combine
  prompt: |
    {% if fast_path %}Fast result: {{ fast_path.result }}{% endif %}
    {% if slow_path %}Slow result: {{ slow_path.result }}{% endif %}
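
The same null handling, written as plain Python for readers who want to reason about the edge cases. This is only a model of the template's logic, with a hypothetical `render_combine` function; it does not use the Agent Actions template engine.

```python
def render_combine(fast_path=None, slow_path=None):
    """Mirror the template: emit a line only for branches that actually ran."""
    lines = []
    if fast_path:
        lines.append(f"Fast result: {fast_path['result']}")
    if slow_path:
        lines.append(f"Slow result: {slow_path['result']}")
    return "\n".join(lines)


only_fast = render_combine(fast_path={"result": "cached answer"})
neither = render_combine()
```

Note the last case: with the guards shown above, a record whose complexity is neither 'low' nor 'high' would skip both branches, and the rendered text would be empty. If that can happen in your data, consider adding a fallback line to the template.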

Matching Priority

When loading historical data (outputs already written by upstream actions), Agent Actions tries these matches in priority order:

  1. Lineage match — Dependency's node_id is in current record's lineage (sequential chain)
  2. Parent match — Records share the same parent_target_id (parallel siblings)
  3. Root match — Records share the same root_target_id (Map-Reduce descendants)
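
The priority order above can be read as a chain of fallbacks. The sketch below is one interpretation under stated assumptions: the function `match_dependency` is hypothetical, the first rule that yields any match wins, and a null `parent_target_id` never counts as a parent match.

```python
def match_dependency(current, candidates):
    """Return (rule_name, matches) for the first rule that matches anything."""
    rules = [
        # 1. Lineage: the dependency's node_id appears in the current lineage.
        ("lineage", lambda c: c.get("node_id") in current.get("lineage", [])),
        # 2. Parent: both records share a non-null parent_target_id.
        ("parent", lambda c: current.get("parent_target_id") is not None
            and c.get("parent_target_id") == current["parent_target_id"]),
        # 3. Root: both records share the same root_target_id.
        ("root", lambda c: c.get("root_target_id") == current.get("root_target_id")),
    ]
    for name, rule in rules:
        matched = [c for c in candidates if rule(c)]
        if matched:
            return name, matched
    return None, []


# Hypothetical records with placeholder IDs.
current = {"target_id": "T3", "parent_target_id": "T1", "root_target_id": "T0",
           "lineage": ["extract_1", "validate_1"]}
upstream = {"node_id": "validate_1", "target_id": "T1",
            "parent_target_id": "T0", "root_target_id": "T0"}
sibling = {"node_id": "seo_1", "target_id": "T4",
           "parent_target_id": "T1", "root_target_id": "T0"}
cousin = {"node_id": "chunk_9", "target_id": "T9",
          "parent_target_id": "T8", "root_target_id": "T0"}
```

With all three candidates available, the upstream record wins through the lineage rule; remove it and the sibling is found by parent; remove that too and the cousin is found by root.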

Debugging Lineage

Inspect Record Ancestry

jq '.[0] | {source_guid, target_id, parent_target_id, root_target_id, lineage}' \
  agent_io/target/merge/data.json

Verify Sibling Relationships

Check that parallel branches share the same parent:

# All three should have the same parent_target_id
jq '.[].parent_target_id' agent_io/target/branch_a/*.json
jq '.[].parent_target_id' agent_io/target/branch_b/*.json
jq '.[].parent_target_id' agent_io/target/branch_c/*.json

Trace Root Ancestry

For Map-Reduce, verify all chunks trace back to the same root:

jq '.[].root_target_id' agent_io/target/process_chunk/*.json | sort -u
# Should output exactly one UUID
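
The same checks can be scripted when jq is not available or when they should run inside a test suite. The sketch below assumes, as the jq filters above imply, that each file in an action's output directory contains a JSON array of records; the helper name `distinct_values` is hypothetical.

```python
import json
from pathlib import Path


def distinct_values(action_dir, field):
    """Collect the distinct values of one lineage field across an action's output files."""
    values = set()
    for path in sorted(Path(action_dir).glob("*.json")):
        # Assumption: every file holds a JSON array of record objects.
        for record in json.loads(path.read_text()):
            values.add(record.get(field))
    return values
```

For the Map-Reduce check, `distinct_values("agent_io/target/process_chunk", "root_target_id")` should return a set with exactly one UUID when all chunks come from a single document. For the sibling check, the sets returned for each branch directory with `parent_target_id` should be equal.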

See Also