Data Migration

Data Migration enables you to extract data from source databases and synchronize it to destination databases using the same powerful query execution layer used for notifications.

Purpose

Data Migration allows you to:

Transfer data between databases (same or different database engines)
Execute multi-step queries across multiple databases for complex data extraction
Synchronize data with multiple modes (Insert, Upsert, Truncate, Sync with Delete)
Schedule automated data transfers with cron expressions
Track execution history with detailed performance metrics
Handle failures with automatic retry mechanism

Use Cases

Data Warehouse ETL: Extract data from production databases and load into analytics databases
Multi-Tenant Data Sync: Synchronize data between tenant databases
Cross-Database Reporting: Consolidate data from PostgreSQL, MySQL, and SQL Server databases
Backup & Archival: Copy data to backup databases on a schedule
Data Distribution: Distribute master data to multiple downstream systems
Database Migration: Migrate data between different database engines

Creating a Migration Job

Step 1: Navigate to Data Migration

Log in to Beacon at the React UI (/login)
Click Data Migration in the left navigation menu (/data-migration)
Click Create New Migration Job

Step 2: Fill Basic Information

Field	Description	Required	Example
Name	Descriptive migration job name	Yes	`Daily User Sync`
Description	Purpose of this migration	No	`Sync active users to analytics database`

Step 3: Configure Source Query

The source query extracts data from one or more databases using multi-step queries.

Single-Step Query

For simple data extraction from one database:

Click Add Query Step
Fill in step details:
- Step Name: Descriptive name (e.g., “Extract Active Users”)
- Description: What this step does
- Project: Select source database project
- SQL Query: Write your SELECT query

Example:

SELECT user_id, username, email, created_at, last_login
FROM users
WHERE active = true
  AND last_login > CURRENT_DATE - INTERVAL '30 days'

Multi-Step Queries

For complex data extraction across multiple databases:

Step 1 - Extract Users:

-- Project: Production PostgreSQL
SELECT user_id, username, email
FROM users
WHERE active = true

Step 2 - Enrich with Orders:

-- Project: Orders MySQL
-- Reference previous step results with @result1
SELECT
    u.user_id,
    u.username,
    u.email,
    COUNT(o.order_id) as total_orders,
    SUM(o.amount) as total_spent
FROM @result1 u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username, u.email

Step 3 - Add Analytics:

-- Project: Analytics SQL Server
-- Reference any previous step: @result1, @result2
SELECT
    r.*,
    a.page_views,
    a.session_count
FROM @result2 r
LEFT JOIN user_analytics a ON r.user_id = a.user_id

Multi-Step Features:

Reference previous results using @result1, @result2, etc.

Each step can use a different project/database

Mix any of Beacon’s connectors in the same pipeline

Visual flow diagram shows step dependencies

Cross-database joins materialize intermediate results into an in-memory SQLite database, where the final join runs

Test Your Query

Click Preview Query to test execution
Review results in the preview panel
Verify data structure matches destination table
Check for any errors

Step 4: Configure Destination

Field	Description	Example
Destination Project	Target database	Analytics Database
Destination Table	Target table name	`user_summary` or `schema.user_summary`
Migration Mode	How to write data	See modes below

Migration Modes

Insert Only

Inserts new rows only
Skips rows that already exist
Use for: Append-only logs, historical data

Upsert (Insert or Update)

Inserts new rows
Updates existing rows based on primary key
Use for: Maintaining current state, incremental updates

Truncate and Load

Clears destination table completely
Inserts all rows from source
Use for: Full data refresh, replacing entire dataset

Sync with Delete

Inserts new rows
Updates existing rows
Deletes rows missing from source
Use for: Perfect synchronization, removing obsolete data

Truncate and Sync with Delete modes permanently delete data. Test thoroughly before using in production.

Step 5: Configure Execution Options

Field	Description	Default	Range
Schedule	Cron expression for automated execution	Empty (manual only)	Valid cron
Max Retries	Retry attempts on failure	3	0-10
Timeout (minutes)	Maximum execution time	30	1-1440
Enabled	Enable automatic scheduled execution	Unchecked	-
Validate Before Execution	Test query and table before running	Checked	-

Common Schedules:

Expression	Description
`0 2 * * *`	Daily at 2:00 AM
`0 /6 * *`	Every 6 hours
`0 0 * * 0`	Weekly on Sunday at midnight
`0 0 1 * *`	Monthly on the 1st at midnight

Use crontab.guru to build and validate cron expressions.

Step 6: Advanced Options (Optional)

Transformation Script

Apply data transformations using @result syntax
Modify column values before writing to destination
Currently in development

Step 7: Save Migration Job

Click Save to create the migration job.

Managing Migration Jobs

View Migration Jobs

The Data Migration page shows all configured jobs with:

Job name and description
Source and destination projects
Destination table
Schedule (if configured)
Last execution status
Enabled/disabled state
Actions (Execute, Edit, History, Delete)

Execute Job Manually

Test or run a job on-demand without waiting for the schedule:

Click Execute (play icon) on the job row
Execution starts immediately
Monitor progress in execution history
Check destination database for results

Manual executions don’t affect the cron schedule. Scheduled executions continue as configured.

View Execution History

Review past executions with detailed metrics:

Click History (clock icon) on the job row
Or navigate to Data Migration → History for all jobs
Filter by:
- Status (Completed, Failed, Running, etc.)
- Date Range (Start Date, End Date)

Execution History Shows:

Start time and duration
Status with color-coded indicator
Rows read from source
Rows written to destination
Rows failed (with error details)
Performance (rows/second)
Retry attempt number

Edit Migration Job

Click Edit (pencil icon) on the job row
Modify any configuration
Test changes with Preview Query
Click Save

Changes to query or destination table structure may break existing jobs. Always test after editing.

Enable/Disable Job

Temporarily pause scheduled execution:

Toggle the Enabled switch in the edit page
Or use the Disable action on the job row
Disabled jobs won’t execute on schedule
Manual execution is still available

Use cases for disabling:

Database maintenance windows
Destination table schema changes in progress
Temporary data freeze requirements
Testing alternative approaches

Delete Migration Job

Click Delete (trash icon) on the job row
Confirm deletion in the dialog
Job is archived (soft delete)
Execution history is preserved

Examples

Example 1: Daily User Sync

Scenario: Sync active users from production to analytics database every night.

Configuration:

Name: Daily Active User Sync
Source Project: Production PostgreSQL

Query:

SELECT user_id, username, email, created_at, last_login, subscription_tier
FROM users
WHERE active = true

Destination Project: Analytics PostgreSQL
Destination Table: analytics.active_users
Mode: Truncate and Load
Schedule: 0 2 * * * (daily at 2 AM)
Timeout: 60 minutes

Purpose: Provide analytics team with fresh daily snapshot of active users.

Example 2: Multi-Database Order Summary

Scenario: Combine user data from PostgreSQL, orders from MySQL, and analytics from SQL Server.

Configuration:

Name: Multi-DB Order Summary

Step 1 - Extract Users:

-- Project: Users DB (PostgreSQL)
SELECT user_id, username, email, signup_date
FROM users
WHERE active = true

Step 2 - Join Orders:

-- Project: Orders DB (MySQL)
SELECT
    u.*,
    COUNT(o.order_id) as order_count,
    SUM(o.total_amount) as lifetime_value,
    MAX(o.order_date) as last_order_date
FROM @result1 u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.user_id, u.username, u.email, u.signup_date

Step 3 - Add Web Analytics:

-- Project: Analytics DB (SQL Server)
SELECT
    r.*,
    a.total_sessions,
    a.total_pageviews,
    a.avg_session_duration
FROM @result2 r
LEFT JOIN web_analytics a ON r.user_id = a.user_id

Destination Project: Data Warehouse
Destination Table: dwh.customer_360
Mode: Upsert
Schedule: 0 */6 * * * (every 6 hours)

Purpose: Create unified customer view across systems.

Example 3: Incremental Log Migration

Scenario: Copy application logs from production to archive database, only new entries.

Configuration:

Name: Incremental Log Archive
Source Project: Production DB

Query:

SELECT log_id, timestamp, level, message, user_id, request_id
FROM application_logs
WHERE timestamp > CURRENT_TIMESTAMP - INTERVAL '1 hour'
  AND level IN ('ERROR', 'WARN')
ORDER BY timestamp DESC

Destination Project: Archive DB
Destination Table: logs.application_errors
Mode: Insert Only
Schedule: 0 * * * * (every hour)
Timeout: 15 minutes

Purpose: Archive error logs for long-term retention and compliance.

Example 4: Weekly Tenant Data Sync

Scenario: Synchronize master data to multiple tenant databases weekly.

Configuration:

Name: Tenant A Product Sync
Source Project: Master DB

Query:

SELECT product_id, sku, name, description, price, category, active
FROM products
WHERE active = true
  AND tenant_id = 'tenant_a'

Destination Project: Tenant A DB
Destination Table: public.products
Mode: Sync with Delete
Schedule: 0 0 * * 0 (Sunday at midnight)
Timeout: 30 minutes

Purpose: Keep tenant product catalogs in sync with master, removing discontinued items.

Migration Modes in Detail

Insert Only Mode

Behavior:

Attempts to insert each row from source
If row already exists (primary key conflict), skips it
Continues processing remaining rows
No updates, no deletes

Performance:

Fastest mode (no lookups required)
Minimal database load

Best For:

Append-only data (logs, events, history)
One-time data loads
When updates never happen

Example:

-- Source: New user registrations from last hour
SELECT user_id, email, signup_date
FROM users
WHERE signup_date > NOW() - INTERVAL '1 hour'

Upsert Mode

Behavior:

Attempts to insert each row
If row exists (primary key conflict), updates it
No deletes

Requirements:

Destination table must have a primary key
Source query must include primary key column(s)

Best For:

Maintaining current state
Incremental updates
Slowly changing dimensions

Example:

-- Source: Current user profile data
SELECT user_id, username, email, last_login, profile_updated_at
FROM users
WHERE active = true

Truncate and Load Mode

Behavior:

Deletes ALL rows from destination table
Inserts all rows from source query
Atomic operation (transaction-based)

DESTRUCTIVE: All existing data is deleted before insert. Rollback occurs only if insert fails.

Best For:

Complete data refresh
Small to medium datasets
When destination has no dependencies (no foreign keys)

Example:

-- Source: Complete current product catalog
SELECT product_id, name, price, stock_quantity
FROM products
WHERE active = true

Sync with Delete Mode

Behavior:

Inserts rows that don’t exist in destination
Updates rows that exist in destination
Deletes rows in destination that aren’t in source
Perfect synchronization

Requirements:

Destination table must have a primary key
Source query must include primary key column(s)

DESTRUCTIVE: Deletes rows missing from source. Ensure source query is correct.

Best For:

Perfect replica synchronization
Master data distribution
Removing obsolete data automatically

Example:

-- Source: Current active customer list
SELECT customer_id, name, email, status
FROM customers
WHERE status IN ('active', 'pending')
-- Inactive customers will be DELETED from destination

Error Handling & Retry

Row-Level Error Handling

Migration jobs handle errors at the row level:

Failed rows are recorded in execution history
Processing continues for remaining rows
Status becomes “Partial Success” if some rows fail
Error limiting: Stops after 100 consecutive failures

Common Row Errors:

Data type mismatch
Constraint violations (foreign key, check, unique)
Column count mismatch
NULL in NOT NULL column

Automatic Retry

Failed jobs can automatically retry:

Configure Max Retries (0-10) in job settings
Retry uses same query and parameters
Execution history tracks retry chain
Exponential backoff between retries (future enhancement)

Retry Scenarios:

Source database temporarily unavailable
Destination database connection timeout
Network interruption
Query timeout (execution exceeds configured limit)

Viewing Error Details

Navigate to Data Migration → History
Find failed or partial success execution
Click Error Details to view:
- Full error message
- Exception stack trace
- Row-specific failures
- Timestamp of failure

Performance Tracking

Execution Metrics

Each execution records:

Start Time: When execution began
Completion Time: When execution finished
Duration: Total execution time (formatted: 1h 23m 45s)
Source Rows Read: Total rows extracted from source
Destination Rows Written: Rows successfully inserted/updated
Rows Failed: Rows that failed to write
Throughput: Rows per second

Performance Optimization Tips

For Slow Source Queries:

Add indexes on filter columns
Reduce result set size with WHERE clauses
Break into smaller multi-step queries
Increase query timeout if complex aggregations needed

For Slow Destination Writes:

Ensure destination has appropriate indexes
Use Insert Only mode when possible (fastest)
Avoid Upsert/Sync modes for large datasets (require lookups)
Check destination database load during execution

For Large Datasets:

Schedule during off-peak hours
Increase timeout setting appropriately
Consider batch processing (multiple jobs with date ranges)
Monitor row-level errors for constraints

Bulk inserts and updates are performed via EFCore.BulkExtensions for high-throughput writes to the destination.

Validation

Pre-Execution Validation

When Validate Before Execution is enabled, Beacon checks:

Source query syntax is valid
Source database is accessible
Destination table exists
Destination database is accessible
Column count/types are compatible (basic check)

Validation runs the query in a transaction and rolls back immediately. No data is written.

Connection Validation

Test connectivity before saving:

Click Test Query in source configuration
Review preview results
Verify destination project is connectable
Check destination table exists with correct schema

Troubleshooting

Job Not Executing on Schedule

Check:

Job is enabled (toggle in edit page)
Schedule cron expression is valid
Next execution time shows correct schedule
Application background job service is running

Verify:

Check application logs for job scheduling errors
Ensure system time zone is configured correctly

Source Query Fails

Common Issues:

Syntax error in SQL query
Table or column doesn’t exist
Source database connection failed
Query timeout (exceeds configured limit)
Insufficient permissions (need SELECT)

Solutions:

Use Preview Query to test before saving
Check source project connection string
Verify database user has SELECT permissions
Increase timeout for complex queries

Destination Write Fails

Common Issues:

Destination table doesn’t exist
Column mismatch (count or names differ)
Data type incompatibility
Constraint violations (foreign key, unique, check)
Insufficient permissions (need INSERT/UPDATE/DELETE)

Solutions:

Verify destination table schema matches source query columns
Check destination project connection
Verify database user has appropriate permissions:
- Insert Only: INSERT permission
- Upsert: INSERT, UPDATE permissions
- Truncate: DELETE, INSERT permissions
- Sync with Delete: INSERT, UPDATE, DELETE permissions

Partial Success Status

When It Happens:

Some rows written successfully
Some rows failed (constraint violations, data errors)
Execution completed but with errors

Action:

View execution history
Click Error Details
Review failed row errors
Fix data issues in source or schema in destination
Re-run job

Performance Issues

Symptoms:

Execution consistently times out
Duration much longer than expected
High rows/second but still slow overall

Diagnosis:

Check execution history duration trends
Review source query execution plan
Check destination database load
Monitor network latency between databases

Solutions:

Optimize source query (add indexes, reduce complexity)
Increase timeout setting
Schedule during off-peak hours
Split into multiple smaller jobs
Use faster migration mode (Insert vs Upsert)

Best Practices

Query Design

Use WHERE Clauses: Filter unnecessary data at source
Select Only Needed Columns: Avoid SELECT *
Test with LIMIT: Test queries with small datasets first
Index Source Tables: Ensure filter columns are indexed
Use Multi-Step for Complexity: Break complex logic into steps

Scheduling

Off-Peak Hours: Schedule large migrations during low-traffic periods
Stagger Jobs: Don’t schedule multiple large jobs simultaneously
Match Frequency to Freshness: Don’t over-schedule if data changes slowly
Allow Time Windows: Use buffers between dependent jobs

Error Prevention

Test Before Production: Execute manually several times before enabling schedule
Validate Data Types: Ensure source and destination schemas match
Use Transactions: Rely on atomic operations (built-in)
Monitor First Runs: Watch initial scheduled executions closely
Enable Validation: Keep “Validate Before Execution” checked

Monitoring

Check History Regularly: Review execution history weekly
Set Up Alerts: Create notifications for failed migrations (future enhancement)
Track Performance: Monitor duration trends over time
Review Errors: Don’t ignore partial success statuses

Security

Use Read-Only Users for Source: Limit permissions to SELECT only
Limit Destination Permissions: Grant only needed permissions (INSERT/UPDATE/DELETE)
Avoid Sensitive Data: Don’t migrate passwords, tokens, or PII unless necessary
Audit Regularly: Review job configurations and execution history

Queries - Understanding the query execution layer
Data Sources - Managing database connections
Subscriptions - Scheduling and automation concepts
Configuration - Connection string reference

Data Migration

Purpose

Use Cases

Creating a Migration Job

Step 1: Navigate to Data Migration

Step 2: Fill Basic Information

Step 3: Configure Source Query

Single-Step Query

Multi-Step Queries

Test Your Query

Step 4: Configure Destination

Migration Modes

Step 5: Configure Execution Options

Step 6: Advanced Options (Optional)

Step 7: Save Migration Job

Managing Migration Jobs

View Migration Jobs

Execute Job Manually

View Execution History

Edit Migration Job

Enable/Disable Job

Delete Migration Job

Examples

Example 1: Daily User Sync

Example 2: Multi-Database Order Summary

Example 3: Incremental Log Migration

Example 4: Weekly Tenant Data Sync

Migration Modes in Detail

Insert Only Mode

Upsert Mode

Truncate and Load Mode

Sync with Delete Mode

Error Handling & Retry

Row-Level Error Handling

Automatic Retry

Viewing Error Details

Performance Tracking

Execution Metrics

Performance Optimization Tips

Validation

Pre-Execution Validation

Connection Validation

Troubleshooting

Job Not Executing on Schedule

Source Query Fails

Destination Write Fails

Partial Success Status

Performance Issues

Best Practices

Query Design

Scheduling

Error Prevention

Monitoring

Security

Related Documentation