Anomaly Detection
Statistical anomaly detection that learns from historical subscription execution data and alerts on unusual patterns.
Purpose
Anomaly detection in Semantico provides:
- Automatic Baseline Learning: System learns “normal” patterns from historical data
- Statistical Detection: Multiple methods (Z-score, IQR, percentage change) to detect anomalies
- Real-time Alerting: Immediate notifications when anomalies are detected
- False Positive Reduction: Configurable thresholds and minimum data requirements
How It Works
Overview
- Baseline Building: Every subscription execution stores row count in
AnomalyBaselinetable - Historical Analysis: System accumulates data over configured lookback period
- Anomaly Evaluation: When configured minimum data points is reached, detection activates
- Alert Generation: If anomaly detected, creates alert task or sends notification
Baseline Learning
Automatic Data Collection:
- Every time a subscription executes, row count is stored
- No manual configuration required
- Baseline continuously refines as more data is collected
Requirements:
- Minimum data points (default: 30) must be collected before detection activates
- Data older than lookback period (default: 30 days) is excluded
- At least one execution per day recommended for daily detection
Detection Methods
1. Standard Deviation (Z-Score)
Best for: Normally distributed data, general-purpose anomaly detection
How it works:
- Calculate mean (μ) and standard deviation (σ) from historical data
- Compute Z-score:
z = (x - μ) / σ - Flag as anomaly if
|z| > threshold(typically 2-3 standard deviations)
Example:
Historical data: [10, 12, 11, 13, 10, 12, 11]
Mean: 11.29
Std Dev: 1.03
Threshold: 2.5
Current value: 18
Z-score: (18 - 11.29) / 1.03 = 6.51
Result: ANOMALY (6.51 > 2.5)
Configuration:
{
"DetectionMethod": "StandardDeviation",
"Threshold": 2.5, // Number of standard deviations
"LookbackDays": 30,
"MinimumDataPoints": 30
}
Pros:
- Intuitive and well-understood
- Works well for normally distributed data
- Adjustable sensitivity via threshold
Cons:
- Sensitive to outliers in historical data
- Assumes normal distribution
- May have high false positives for skewed data
2. Interquartile Range (IQR)
Best for: Skewed distributions, outlier detection, data with extreme values
How it works:
- Calculate Q1 (25th percentile) and Q3 (75th percentile)
- Calculate IQR:
IQR = Q3 - Q1 - Define bounds: Lower = Q1 - 1.5×IQR, Upper = Q3 + 1.5×IQR
- Flag as anomaly if value < Lower or value > Upper
Example:
Historical data (sorted): [8, 10, 11, 12, 13, 14, 16, 20]
Q1 (25%): 10.5
Q3 (75%): 15
IQR: 15 - 10.5 = 4.5
Lower bound: 10.5 - (1.5 × 4.5) = 3.75
Upper bound: 15 + (1.5 × 4.5) = 21.75
Current value: 35
Result: ANOMALY (35 > 21.75)
Configuration:
{
"DetectionMethod": "IQR",
"Threshold": 1.5, // IQR multiplier
"LookbackDays": 30,
"MinimumDataPoints": 30
}
Pros:
- Robust to outliers
- No assumption of normal distribution
- Good for skewed data
Cons:
- Less sensitive to subtle changes
- Requires more data points for accuracy
- May miss gradual shifts
3. Percentage Change
Best for: Trend monitoring, business metrics, detecting sudden spikes/drops
How it works:
- Get most recent historical value
- Calculate percentage change:
((current - previous) / previous) × 100 - Flag as anomaly if
|percentage_change| > threshold
Example:
Previous value: 100
Current value: 165
Percentage change: ((165 - 100) / 100) × 100 = 65%
Threshold: 50%
Result: ANOMALY (65% > 50%)
Configuration:
{
"DetectionMethod": "PercentageChange",
"Threshold": 50, // Percentage threshold
"LookbackDays": 7, // Recent comparison window
"MinimumDataPoints": 7
}
Pros:
- Intuitive for business users
- Detects sudden changes quickly
- Works well for trending metrics
Cons:
- Sensitive to daily/weekly patterns
- May give false positives during expected spikes
- Doesn’t consider historical variance
Configuration
Subscription-Level Configuration
Each subscription can have its own anomaly detection configuration:
Entity: AnomalyConfig
| Property | Type | Description | Default |
|---|---|---|---|
Enabled |
bool | Enable/disable anomaly detection | false |
DetectionMethod |
enum | StandardDeviation, IQR, PercentageChange | StandardDeviation |
Threshold |
decimal | Sensitivity threshold (meaning varies by method) | 2.5 |
LookbackDays |
int | How many days of history to analyze | 30 |
MinimumDataPoints |
int | Min historical executions before detection activates | 30 |
Example Configurations
Daily Error Monitoring
new AnomalyConfig
{
SubscriptionId = 1,
Enabled = true,
DetectionMethod = AnomalyDetectionMethod.StandardDeviation,
Threshold = 2.5m, // 2.5 std deviations
LookbackDays = 30,
MinimumDataPoints = 30
}
Sales Spike Detection
new AnomalyConfig
{
SubscriptionId = 2,
Enabled = true,
DetectionMethod = AnomalyDetectionMethod.PercentageChange,
Threshold = 50m, // 50% increase/decrease
LookbackDays = 7,
MinimumDataPoints = 7
}
Outlier Detection for Skewed Data
new AnomalyConfig
{
SubscriptionId = 3,
Enabled = true,
DetectionMethod = AnomalyDetectionMethod.IQR,
Threshold = 1.5m, // Standard IQR multiplier
LookbackDays = 60,
MinimumDataPoints = 50
}
Usage Example
Via Service
public class AnomalyDetectionExample
{
private readonly IAnomalyDetectionService _anomalyService;
private readonly IDbContextFactory<SemanticoContext> _contextFactory;
public async Task ConfigureAnomalyDetectionAsync(int subscriptionId)
{
using var context = await _contextFactory.CreateDbContextAsync();
// Create or update configuration
var config = await context.AnomalyConfigs
.FirstOrDefaultAsync(x => x.SubscriptionId == subscriptionId)
?? new AnomalyConfig { SubscriptionId = subscriptionId };
config.Enabled = true;
config.DetectionMethod = AnomalyDetectionMethod.StandardDeviation;
config.Threshold = 2.5m;
config.LookbackDays = 30;
config.MinimumDataPoints = 30;
if (config.Id == 0)
context.AnomalyConfigs.Add(config);
await context.SaveChangesAsync();
}
public async Task EvaluateExecutionAsync(int subscriptionId, int rowCount)
{
// Store baseline
await _anomalyService.StoreBaselineAsync(
subscriptionId,
rowCount,
DateTime.UtcNow);
// Evaluate for anomaly
var result = await _anomalyService.EvaluateAnomalyAsync(
subscriptionId,
rowCount);
if (result.IsAnomaly)
{
Console.WriteLine($"ANOMALY DETECTED!");
Console.WriteLine($"Current: {result.CurrentValue}");
Console.WriteLine($"Mean: {result.Mean}");
Console.WriteLine($"Std Dev: {result.StandardDeviation}");
Console.WriteLine($"Explanation: {result.Explanation}");
// Create alert task or send notification
}
else
{
Console.WriteLine($"Normal execution: {rowCount} rows");
}
}
}
Integration with Subscriptions
Anomaly detection is automatically integrated into subscription execution:
public class SubscriptionExecutionFlow
{
// Simplified flow
public async Task ExecuteAsync(Subscription subscription)
{
// 1. Execute query
var result = await ExecuteQueryAsync(subscription.Query);
int rowCount = result.Rows.Count;
// 2. Store baseline (always, regardless of config)
await _anomalyService.StoreBaselineAsync(
subscription.Id,
rowCount,
DateTime.UtcNow);
// 3. Evaluate for anomaly (only if configured)
var anomalyResult = await _anomalyService.EvaluateAnomalyAsync(
subscription.Id,
rowCount);
// 4. If anomaly detected
if (anomalyResult.IsAnomaly)
{
// Option A: Create alert task
await _taskService.CreateAnomalyTaskAsync(
subscription,
anomalyResult);
// Option B: Send notification
await _notificationService.SendAnomalyAlertAsync(
subscription,
anomalyResult);
}
// 5. Continue with normal notification flow
if (rowCount > 0)
{
await _notificationService.SendAsync(subscription, result);
}
}
}
Evaluation Result
AnomalyEvaluationResult Model
public class AnomalyEvaluationResult
{
public bool IsAnomaly { get; set; }
public decimal CurrentValue { get; set; }
public decimal? Mean { get; set; }
public decimal? StandardDeviation { get; set; }
public decimal? PercentageChange { get; set; }
public string Explanation { get; set; }
public int HistoricalDataPoints { get; set; }
}
Example Results
Normal Execution:
{
"IsAnomaly": false,
"CurrentValue": 12,
"Mean": 11.5,
"StandardDeviation": 1.2,
"Explanation": "Value within expected range (Z-score: 0.42)",
"HistoricalDataPoints": 45
}
Anomaly Detected:
{
"IsAnomaly": true,
"CurrentValue": 85,
"Mean": 12.3,
"StandardDeviation": 2.1,
"Explanation": "Value significantly above historical mean (Z-score: 34.62). This is 34.62 standard deviations above normal.",
"HistoricalDataPoints": 45
}
Insufficient Data:
{
"IsAnomaly": false,
"CurrentValue": 15,
"Explanation": "Not enough historical data (need 30, have 5). Building baseline...",
"HistoricalDataPoints": 5
}
Use Cases
1. Error Rate Monitoring
Scenario: Monitor application error rates
Query:
SELECT COUNT(*) as error_count
FROM logs
WHERE level = 'ERROR'
AND created_at > NOW() - INTERVAL '1 hour';
Configuration:
- Method: StandardDeviation
- Threshold: 3.0 (very sensitive)
- Lookback: 7 days
- Min Data Points: 168 (hourly for 7 days)
Result: Alerts when error count exceeds 3 standard deviations above normal hourly rate
2. Sales Performance Tracking
Scenario: Detect unusual sales patterns (both spikes and drops)
Query:
SELECT COUNT(*) as daily_orders
FROM orders
WHERE DATE(created_at) = CURRENT_DATE;
Configuration:
- Method: PercentageChange
- Threshold: 30% (alert on 30% deviation)
- Lookback: 30 days
- Min Data Points: 30
Result: Alerts when daily orders are 30%+ different from previous day
3. System Resource Monitoring
Scenario: Monitor database connection pool usage
Query:
SELECT COUNT(*) as active_connections
FROM pg_stat_activity
WHERE state = 'active';
Configuration:
- Method: IQR
- Threshold: 1.5 (standard outlier detection)
- Lookback: 90 days
- Min Data Points: 90
Result: Alerts when connection count is an outlier compared to historical distribution
4. Data Quality Monitoring
Scenario: Detect sudden increases in null values
Query:
SELECT COUNT(*) as null_count
FROM products
WHERE description IS NULL OR price IS NULL;
Configuration:
- Method: StandardDeviation
- Threshold: 2.0
- Lookback: 14 days
- Min Data Points: 14
Result: Alerts when null count deviates from normal pattern
Choosing the Right Method
| Scenario | Best Method | Why |
|---|---|---|
| Consistent daily metrics | StandardDeviation | Good for stable patterns |
| Sales, revenue, traffic | PercentageChange | Business users understand percentages |
| Highly variable data | IQR | Robust to outliers |
| Error rates, system metrics | StandardDeviation | Sensitive to subtle changes |
| Detecting spikes/crashes | PercentageChange | Quick reaction to sudden changes |
| Data with seasonal patterns | IQR | Less affected by regular variations |
Tuning Recommendations
Threshold Sensitivity
StandardDeviation:
2.0- Very sensitive (more false positives)2.5- Balanced (recommended start)3.0- Conservative (fewer false positives)
IQR:
1.5- Standard outlier detection2.0- More conservative3.0- Only extreme outliers
PercentageChange:
20%- Very sensitive50%- Balanced100%- Only major changes
Lookback Period
Short (7-14 days):
- Pros: Adapts quickly to recent changes
- Cons: Sensitive to weekly patterns
- Best for: Fast-changing environments
Medium (30 days):
- Pros: Balanced, captures monthly patterns
- Cons: Slower to adapt to new normals
- Best for: General-purpose monitoring
Long (60-90 days):
- Pros: Stable baseline, seasonal awareness
- Cons: May miss gradual shifts
- Best for: Seasonal business metrics
Minimum Data Points
Low (7-14 points):
- Pros: Detects anomalies faster
- Cons: Higher false positive rate
- Best for: New subscriptions, urgent monitoring
Medium (30 points):
- Pros: Good balance of speed and accuracy
- Cons: 30-day wait before detection
- Best for: Most use cases
High (60-90 points):
- Pros: Very stable baseline
- Cons: Long baseline building period
- Best for: Critical systems, low false positive tolerance
Troubleshooting
Issue: “Not enough historical data”
Symptom: Anomaly detection never activates
Causes:
- Subscription hasn’t executed enough times
MinimumDataPointsset too high- Subscription executions failing
Solutions:
- Wait for more executions to accumulate
- Lower
MinimumDataPoints(carefully) - Check subscription execution history for failures
Issue: Too many false positives
Symptom: Anomaly alerts on normal variations
Causes:
- Threshold too sensitive
- Data has natural high variance
- Wrong detection method for data pattern
Solutions:
- Increase threshold (e.g., 2.5 → 3.0 for Z-score)
- Try IQR method (more robust)
- Increase lookback period for more context
Issue: Missing real anomalies
Symptom: Known anomalies not detected
Causes:
- Threshold too conservative
- Insufficient historical data
- Anomaly within “normal” range after drift
Solutions:
- Decrease threshold
- Ensure adequate baseline data
- Consider PercentageChange for sudden spikes
Database Schema
AnomalyConfig Entity
CREATE TABLE anomaly_configs (
id SERIAL PRIMARY KEY,
subscription_id INT NOT NULL REFERENCES subscriptions(id),
enabled BOOLEAN NOT NULL DEFAULT false,
detection_method VARCHAR(50) NOT NULL,
threshold DECIMAL(10,2) NOT NULL,
lookback_days INT NOT NULL DEFAULT 30,
minimum_data_points INT NOT NULL DEFAULT 30,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
AnomalyBaseline Entity
CREATE TABLE anomaly_baselines (
id SERIAL PRIMARY KEY,
subscription_id INT NOT NULL REFERENCES subscriptions(id),
execution_time TIMESTAMP NOT NULL,
metric_value DECIMAL(18,2) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_anomaly_baselines_subscription_time
ON anomaly_baselines(subscription_id, execution_time DESC);
API Reference
IAnomalyDetectionService
public interface IAnomalyDetectionService
{
/// <summary>
/// Evaluates current execution for anomalies based on historical baseline
/// </summary>
Task<AnomalyEvaluationResult> EvaluateAnomalyAsync(
int subscriptionId,
int rowCount,
CancellationToken cancellationToken = default);
/// <summary>
/// Stores execution result for future baseline calculations
/// </summary>
Task StoreBaselineAsync(
int subscriptionId,
decimal metricValue,
DateTime executionTime,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets historical metrics for analysis
/// </summary>
Task<List<decimal>> GetHistoricalMetricsAsync(
int subscriptionId,
int lookbackDays,
CancellationToken cancellationToken = default);
}
Related Features
- Subscriptions - Configure scheduled queries with anomaly detection
- Tasks - Alert tasks created when anomalies are detected
- AI Integration - Complement statistical detection with AI-powered insights