Data I/O
Every agentic workflow needs data to flow in, through, and out. Agent Actions uses a standardized directory structure that makes this flow predictable and traceable.
Think of it like a factory floor: raw materials enter through one door (staging/), get registered for tracking (source/), move through workstations (actions), and finished products exit through another (target/). The directory structure enforces this separation, making it easy to inspect what went in and what came out.
Directory Structure
agent_workflow/
└── my_workflow/
    ├── agent_config/
    │   └── my_workflow.yml   # Workflow definition
    ├── agent_io/
    │   ├── staging/          # Input data (starting point)
    │   ├── source/           # Metadata tracking staging files (JSON mode)
    │   ├── target/           # Output data (JSON mode)
    │   └── outputs.db        # SQLite database (database mode)
    └── seed_data/            # Static reference data
Agent Actions supports two storage modes for source and target data:
- SQLite mode (default): All data in a single `outputs.db` database file, configured via `output_storage` in `agent_actions.yml`
- JSON mode: Individual JSON files in the `source/` and `target/` directories
SQLite mode offers better query performance, built-in deduplication, and atomic writes. Staging data always remains as JSON files regardless of mode.
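The mode is selected with the `output_storage` key in `agent_actions.yml`. A minimal sketch, assuming the values are `sqlite` and `json` (check your version's docs for the exact accepted values):

```yaml
# agent_actions.yml (sketch; only the output_storage key is documented above,
# the value names are assumptions)
output_storage: sqlite   # or "json" for per-file source/ and target/ output
```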
staging/
This is where your agentic workflow begins. Place input files here before running:
agent_io/staging/
├── document_1.json
├── document_2.json
└── batch_input.csv
You can also point the start node at a local folder via data_source config:
actions:
  - name: extract_facts
    data_source:
      type: local
      folder: ./data
      file_type: [json, csv]
source/
Metadata layer that tracks what's in staging:
- References to staging files for lineage tracking
- Enables tracing outputs back to original inputs
- Auto-generated when you run the agentic workflow
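In JSON mode these records are plain JSON files, so inspecting them is just a directory walk. A minimal sketch; the record below is fabricated for illustration, and its `staging_file` field is an assumption, since the real fields are generated by Agent Actions:

```python
import json
import tempfile
from pathlib import Path

# Sketch: fabricate one source record to demonstrate the inspection loop.
# Real records are auto-generated and their fields may differ.
source_dir = Path(tempfile.mkdtemp()) / "agent_io" / "source"
source_dir.mkdir(parents=True)
(source_dir / "document_1.json").write_text(
    json.dumps({"staging_file": "staging/document_1.json"})  # assumed field
)

# Load every source record into a filename-keyed dict
records = {p.name: json.loads(p.read_text()) for p in source_dir.glob("*.json")}
print(records)
```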
target/
Outputs organized by action. In JSON mode:
agent_io/target/
├── node_0_extract_facts/
│ └── document_1.json
├── node_1_validate_facts/
│ └── document_1.json
└── node_2_summarize/
└── document_1.json
In SQLite mode, the same data is stored in the `target_data` table with `action_name` and `relative_path` columns.
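The table can be read with Python's built-in `sqlite3` module. A minimal sketch: an in-memory database stands in for `agent_io/outputs.db`, and the schema below only includes the columns named above (`action_name`, `relative_path`, plus a `data` column); the real table has more:

```python
import sqlite3

# Stand-in for sqlite3.connect("my_workflow/agent_io/outputs.db")
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE target_data (action_name TEXT, relative_path TEXT, data TEXT)"
)
conn.execute(
    "INSERT INTO target_data VALUES ('extract_facts', 'document_1.json', '{}')"
)

# List outputs per action, mirroring the JSON-mode folder layout
rows = conn.execute(
    "SELECT action_name, relative_path FROM target_data ORDER BY action_name"
).fetchall()
for action, path in rows:
    print(f"{action}/{path}")
```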
Data Flow
Let's trace how a document moves through an agentic workflow. Here is what happens at each stage:
- Input data placed in `staging/`
- Agent Actions creates tracking references in `source/`
- Each action writes to `target/node_{n}_{name}/`
- Downstream actions read from upstream `target/` folders
- Filenames preserved through the agentic workflow
Notice that filenames stay consistent across all stages. The source/ layer provides lineage tracking—you can trace any output back to its original staging file, which is essential for debugging and auditing.
Storage Backend
Agent Actions uses a pluggable storage backend system for source and target data. The default SQLite backend stores all workflow data in a single database file.
SQLite Database Schema
The database contains two main tables:
| Table | Purpose |
|---|---|
| `source_data` | Stores source records with deduplication by `source_guid` |
| `target_data` | Stores action outputs organized by `action_name` |
Querying the Database
You can inspect workflow data directly using SQLite:
sqlite3 my_workflow/agent_io/outputs.db
-- List all actions with output
SELECT DISTINCT action_name FROM target_data;
-- Count records per action
SELECT action_name, SUM(record_count) FROM target_data GROUP BY action_name;
-- Preview data from an action
SELECT data FROM target_data WHERE action_name = 'extract_facts' LIMIT 1;
Benefits
- Performance: Indexed queries for fast data access
- Integrity: ACID transactions prevent partial writes
- Deduplication: Automatic source_guid-based deduplication
- Concurrency: WAL mode enables concurrent reads
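WAL (write-ahead logging) is a standard SQLite journal mode, so you can confirm it is active on any database file with a `PRAGMA`. A quick sketch using a throwaway file in place of `agent_io/outputs.db` (this is plain SQLite, not Agent Actions code):

```python
import os
import sqlite3
import tempfile

# Use a real file: in-memory databases report "memory", not "wal"
db_path = os.path.join(tempfile.mkdtemp(), "outputs.db")  # stand-in path
conn = sqlite3.connect(db_path)

# Enable write-ahead logging; the PRAGMA returns the resulting mode
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)
```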
Learn More
- Input Formats — JSON, CSV, and other supported formats
- Output Format — Output structure and lineage tracking
- Data Lineage — Ancestry chain for parallel merges and Map-Reduce
- Chunking — Split large documents for LLM processing