AI Integration (Experimental)
⚠️ Experimental Feature
AI-powered features are experimental and may produce incorrect, incomplete, or misleading results. Large language models can hallucinate facts, miss important details, or generate invalid SQL. Always review and validate all AI-generated content before using in production environments.
AI-powered features that leverage large language models (LLMs) to automate documentation generation and simplify alert creation.
Purpose
AI integration in Semantico provides:
- Automatic Documentation: Generate comprehensive data source documentation by analyzing schemas
- Natural Language Alerts: Create complex SQL queries from plain English descriptions
- Smart Insights: AI-powered data analysis and recommendations
- Multiple Export Formats: Export documentation as Markdown, HTML with ERD diagrams, or PDF
Supported LLM Providers
Semantico supports multiple LLM providers through a pluggable architecture:
| Provider | Recommended Models | Configuration |
|---|---|---|
| OpenAI | gpt-4o (recommended), gpt-4 | API key + model name |
| Anthropic Claude | claude-3-5-sonnet-20241022 (recommended) | API key + model name |
| Azure OpenAI | Latest models only | Custom base URL + deployment name |
Model Recommendations: Use the latest models only (gpt-4o, claude-3-5-sonnet). Older models like gpt-3.5-turbo and claude-3-opus are not recommended and may produce lower quality results.
Configuration
Prerequisites
AI features require:
- LLM provider API key
- Appropriate model access
- Network connectivity to provider APIs
Runtime Configuration via Admin Settings (Recommended)
The easiest way to configure AI is through the Admin Settings UI at runtime - no restart required:
- Navigate to Admin Settings > AI Configuration
- Select your provider (OpenAI, Anthropic, Azure OpenAI, Bedrock)
- Enter your API key and model name
- Configure rate limits and budget
- Click Save
Changes take effect immediately. You can switch providers at runtime without restarting.
Admin Settings requires the User Management system to be enabled with an Admin user. See the Admin Settings Guide for details.
appsettings.json Configuration
{
"Semantico": {
"LLM": {
"Provider": "OpenAI",
"ApiKey": "sk-your-api-key-here",
"Model": "gpt-4o",
"BaseUrl": "https://api.openai.com/v1",
"Limits": {
"MaxConcurrentRequests": 5,
"RequestsPerMinute": 60,
"MaxTokensPerRequest": 4000
}
}
}
}
Configuration Options
| Option | Description | Default | Required |
|---|---|---|---|
Provider |
LLM provider (OpenAI, Anthropic, AzureOpenAI) | - | Yes |
ApiKey |
API key for authentication | - | Yes |
Model |
Model name or deployment ID | - | Yes |
BaseUrl |
API endpoint URL | Provider default | No |
Limits.MaxConcurrentRequests |
Max parallel requests | 5 | No |
Limits.RequestsPerMinute |
Rate limit per minute | 60 | No |
Limits.MaxTokensPerRequest |
Max tokens per request | 4000 | No |
AI Documentation Generation
Overview
Automatically generate comprehensive documentation by analyzing database schemas with AI.
How It Works
- Schema Analysis: Semantico fetches schema metadata (tables, columns, data types, constraints, relationships)
- Sample Data Collection: Retrieves first 10 rows from each table for context
- AI Processing: Sends schema structure and samples to LLM with specialized prompts
- Documentation Generation: AI generates table descriptions, column explanations, relationships, and insights
- Storage: Documentation is stored with versioning and edit history
Generated Content
AI-generated documentation includes:
- Table Descriptions: Purpose and usage of each table
- Column Details: Meaning, data type explanation, and usage patterns
- Relationships: Foreign key relationships and join patterns
- Data Quality Observations: Potential issues, null patterns, constraint violations
- Business Context: Inferred business meaning from names and data
Review Required: AI-generated descriptions may be inaccurate or incomplete. Always validate documentation against actual schema and business requirements before relying on it.
Export Formats
Markdown Export
Clean markdown format compatible with GitHub, GitLab, and documentation platforms.
# Database Documentation
## Users Table
**Purpose**: Stores user account information
### Columns
- `id` (INT): Primary key, auto-increment
- `email` (VARCHAR): User email address, unique
- `created_at` (TIMESTAMP): Account creation timestamp
HTML Export with ERD Diagrams
Interactive HTML with:
- Collapsible sections for each table
- Table of contents with anchor navigation
- Embedded Mermaid ERD diagrams showing relationships
- Professional styling
<!DOCTYPE html>
<html>
<head>
<title>Data Source Documentation</title>
<script src="mermaid.min.js"></script>
</head>
<body>
<h1>Database Schema</h1>
<div class="mermaid">
erDiagram
USERS ||--o{ ORDERS : places
USERS {
int id PK
string email
timestamp created_at
}
</div>
</body>
</html>
PDF Export
Professional PDF document with:
- Table of contents
- Schema diagrams
- Formatted tables
- Page numbers and headers
Usage Example
Via UI:
- Navigate to Data Sources
- Select a data source
- Click “Generate Documentation”
- Configure options (tables to include, export format)
- Review and export
Via Service:
public class DocumentationExample
{
private readonly IAiDocumentationService _docService;
public async Task GenerateAsync(int dataSourceId)
{
var options = new GenerationOptions
{
Title = "Production Database Documentation",
TableFilter = null, // All tables
IncludeSampleData = true,
Temperature = 0.3m, // Lower = more focused
MaxTokens = 4000
};
var documentation = await _docService.GenerateDocumentationAsync(
dataSourceId,
userId: 1,
options,
CancellationToken.None);
// Export as HTML
var html = await _docService.ExportAsHtmlAsync(documentation.Id);
// Export as PDF
var pdf = await _docService.ExportAsPdfAsync(documentation.Id);
}
}
AI Alert Generation
Overview
Create sophisticated SQL queries from natural language descriptions without writing SQL.
How It Works
- Natural Language Input: User describes alert in plain English
- Schema Context: System provides data source schema to AI
- Query Generation: AI generates appropriate SQL query
- Validation: Generated SQL is validated for syntax correctness
- Review: User reviews natural language description + generated SQL
- Integration: Approved query integrates with existing Subscription system
Example Transformations
Validation Required: AI-generated SQL queries must be reviewed and tested before use. Queries may contain errors, inefficiencies, or may not accurately reflect the natural language intent.
Example 1: Threshold Alert
Natural Language:
“Alert me when error rates exceed 5% of total requests in the last hour”
Generated SQL:
SELECT
(COUNT(CASE WHEN status_code >= 500 THEN 1 END) * 100.0 / COUNT(*)) AS error_rate
FROM requests
WHERE created_at > NOW() - INTERVAL '1 hour'
HAVING (COUNT(CASE WHEN status_code >= 500 THEN 1 END) * 100.0 / COUNT(*)) > 5;
Example 2: Percentage Change Alert
Natural Language:
“Notify me when today’s sales drop more than 20% compared to yesterday”
Generated SQL:
WITH today AS (
SELECT SUM(amount) as total
FROM sales
WHERE DATE(created_at) = CURRENT_DATE
),
yesterday AS (
SELECT SUM(amount) as total
FROM sales
WHERE DATE(created_at) = CURRENT_DATE - INTERVAL '1 day'
)
SELECT
today.total as today_sales,
yesterday.total as yesterday_sales,
((today.total - yesterday.total) / yesterday.total * 100) as pct_change
FROM today, yesterday
WHERE ((today.total - yesterday.total) / yesterday.total * 100) < -20;
Example 3: Orphaned Records Alert
Natural Language:
“Find orders that don’t have a matching customer”
Generated SQL:
SELECT o.id, o.customer_id, o.created_at
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE c.id IS NULL;
Usage Example
Via Service:
public class AlertGenerationExample
{
private readonly IAiAlertGenerationService _alertService;
public async Task CreateAlertAsync(int dataSourceId)
{
var request = new AlertGenerationRequest
{
DataSourceId = dataSourceId,
Description = "Alert when error rates exceed 5% in the last hour"
};
var result = await _alertService.GenerateAlertAsync(
request,
CancellationToken.None);
Console.WriteLine($"Generated SQL: {result.GeneratedQuery}");
Console.WriteLine($"Explanation: {result.Explanation}");
// User reviews and approves...
// Then creates subscription with generated query
}
}
Cost and Usage Tracking
Token Usage
Each AI request consumes tokens based on:
- Schema size (number of tables/columns)
- Sample data included
- Response length
Typical token usage:
- Documentation generation (20 tables): ~3,000-5,000 tokens
- Alert generation: ~500-1,500 tokens
Cost Estimation
Costs vary by provider and model:
| Provider | Model | Cost per 1M Tokens (Input/Output) |
|---|---|---|
| OpenAI | gpt-4o | $5.00 / $15.00 |
| OpenAI | gpt-3.5-turbo | $0.50 / $1.50 |
| Anthropic | claude-3-5-sonnet | $3.00 / $15.00 |
Example calculation:
- Documentation generation: 3,500 input tokens, 1,500 output tokens
- Using gpt-4o: ($5 × 3.5k/1M) + ($15 × 1.5k/1M) = $0.04 per generation
Tracking
Usage metrics stored per request:
TokensUsed: Total tokens consumedEstimatedCost: Calculated cost based on provider pricingGeneratedByModel: Model used for generationGeneratedAt: Timestamp
Query usage history:
var documentations = await context.DataSourceDocumentations
.Where(d => d.DataSourceId == dataSourceId)
.Select(d => new {
d.TokensUsed,
d.EstimatedCost,
d.GeneratedAt
})
.ToListAsync();
var totalCost = documentations.Sum(d => d.EstimatedCost);
Rate Limiting
Configuration
Rate limits prevent abuse and control costs:
{
"Semantico": {
"LLM": {
"Limits": {
"MaxConcurrentRequests": 5,
"RequestsPerMinute": 60,
"MaxTokensPerRequest": 4000
}
}
}
}
Implementation
Request Queue (LlmRequestQueue):
- Enforces max concurrent requests
- Queues requests when limit reached
- Processes requests FIFO
Provider Rate Limits:
- OpenAI: Tier-based limits (tier 1: 500 RPM, tier 2: 5000 RPM)
- Anthropic: 50 requests/minute (free tier), higher for paid
- Azure: Configurable per deployment
Handling Rate Limit Errors
try
{
var result = await _llmProvider.CompleteAsync(request, cancellationToken);
}
catch (RateLimitException ex)
{
// Wait and retry
await Task.Delay(ex.RetryAfterSeconds * 1000);
// Retry logic...
}
Security Considerations
API Key Security
⚠️ Never commit API keys to source control
Best practices:
- Store API keys in environment variables
- Use secrets management (Azure Key Vault, AWS Secrets Manager)
- Rotate keys regularly
- Use different keys for dev/staging/production
{
"Semantico": {
"LLM": {
"ApiKey": "${LLM_API_KEY}" // Environment variable
}
}
}
Data Privacy
AI providers receive:
- Database schema structure (table/column names)
- Sample data (first 10 rows)
- Natural language descriptions
Recommendations:
- Exclude sensitive tables from AI analysis
- Filter sensitive columns before sending to AI
- Use de-identified sample data when possible
- Review provider data retention policies
- Consider self-hosted LLM for sensitive data
Configuration Example with Filtering
var options = new GenerationOptions
{
TableFilter = t => !t.Name.Contains("sensitive"),
ExcludeColumns = new[] { "password", "ssn", "credit_card" }
};
Troubleshooting
Common Issues
Issue: “LLM configuration not found”
Solution: Ensure Semantico:LLM section exists in appsettings.json
Issue: “Invalid API key” Solution: Verify API key is correct and has appropriate permissions
Issue: Rate limit exceeded
Solution: Reduce RequestsPerMinute or increase provider tier
Issue: Generated SQL is invalid Solution: AI is probabilistic and experimental - always review and test queries before use. Manual adjustment may be required.
Debugging
Enable verbose logging:
{
"Logging": {
"LogLevel": {
"Semantico.Core.Services.Ai": "Debug",
"Semantico.Core.Services.LlmProviders": "Debug"
}
}
}
Architecture
Service Layer
IAiDocumentationService
├── GenerateDocumentationAsync()
├── ExportAsMarkdownAsync()
├── ExportAsHtmlAsync()
└── ExportAsPdfAsync()
IAiAlertGenerationService
├── GenerateAlertAsync()
└── RefineAlertAsync()
ILlmProvider (interface)
├── DelegatingLlmProvider (proxy - injected everywhere)
│ └── Delegates to LlmProviderManager.CurrentProvider
├── OpenAiProvider
├── AnthropicProvider
├── AzureOpenAiProvider
└── BedrockProvider
LlmProviderFactory
└── CreateProvider() -> ILlmProvider
LlmProviderManager (implements ILlmConfigurationUpdater)
├── CurrentProvider (hot-swappable)
└── UpdateConfiguration() (called by Admin Settings)
Hot-Swap Flow
All consumers inject ILlmProvider, which is registered as DelegatingLlmProvider. This proxy delegates to the current provider held by LlmProviderManager. When Admin Settings saves a new LLM configuration, LlmProviderManager recreates the provider and all subsequent requests use the new provider automatically.
Data Model
DataSourceDocumentation
├── Id
├── DataSourceId
├── Title
├── GeneratedByModel
├── GeneratedAt
├── TokensUsed
├── EstimatedCost
└── Sections (List<DocumentationSection>)
DocumentationSection
├── Id
├── DocumentationId
├── SectionType (Table, Column, Relationship, etc.)
├── Content (markdown)
└── IsAiGenerated
AiAlertConfiguration
├── Id
├── DataSourceId
├── NaturalLanguageDescription
├── GeneratedQuery
├── Explanation
└── CreatedAt
Related Features
- Admin Settings - Configure and hot-swap LLM providers at runtime
- Data Sources - Configure data sources for AI analysis
- Subscriptions - Use AI-generated queries in subscriptions
- Anomaly Detection - Statistical anomaly detection complement to AI