AI isn't just for chatbots. Here's how we're using AI tools to improve infrastructure management and cloud optimisation.
## Log Analysis with LLMs
Instead of writing complex KQL queries, describe what you're looking for:
**Azure OpenAI + Log Analytics**

```python
import openai
from azure.monitor.query import LogsQueryClient  # used to execute the generated query

# Use an LLM to generate KQL from natural language
def generate_kql(user_query):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": """You are a KQL expert.
Convert natural language queries to KQL for Azure Log Analytics.
Return only the KQL query, no explanation."""},
            {"role": "user", "content": user_query}
        ]
    )
    return response.choices[0].message.content

# Example usage
kql = generate_kql("Show me all failed logins in the last 24 hours grouped by user")
# Returns: SigninLogs | where TimeGenerated > ago(24h) | where ResultType != 0 | summarize count() by UserPrincipalName
```
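In practice the raw completion sometimes arrives wrapped in markdown fences, so it pays to sanitise it before sending anything to Log Analytics. A minimal sketch, assuming a hypothetical `sanitize_kql` helper and an illustrative table allow-list (neither is part of any SDK):

```python
import re

# Assumption: the tables your workspace actually exposes
ALLOWED_TABLES = {"SigninLogs", "AzureActivity", "Heartbeat"}

def sanitize_kql(raw):
    """Strip markdown fences the model sometimes adds, then check the target table."""
    query = re.sub(r"^```(?:kql|kusto)?\s*|\s*```$", "", raw.strip())
    first_table = query.split("|")[0].strip()
    if first_table not in ALLOWED_TABLES:
        raise ValueError(f"Query targets unexpected table: {first_table}")
    return query
```

Run the cleaned query through `LogsQueryClient` as usual; rejecting unknown tables stops a hallucinated query from failing silently.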
## Automated Anomaly Detection
```python
# Use the Azure Anomaly Detector service for time series anomaly detection
from azure.ai.anomalydetector import AnomalyDetectorClient
from azure.ai.anomalydetector.models import DetectRequest, TimeSeriesPoint

client = AnomalyDetectorClient(endpoint, credential)

# Detect unusual cost patterns
cost_data = get_daily_costs()  # your daily cost data: [{"date": ..., "cost": ...}, ...]
request = DetectRequest(
    series=[TimeSeriesPoint(timestamp=d["date"], value=d["cost"]) for d in cost_data],
    granularity="daily",
    sensitivity=95
)
response = client.detect_last_point(request)

if response.is_anomaly:
    send_alert(f"Unusual spend detected: expected {response.expected_value}, got {cost_data[-1]['cost']}")
```
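If you'd rather prototype before wiring up the Anomaly Detector service, a plain z-score over the same daily-cost series captures the idea. A stdlib-only sketch with an illustrative threshold:

```python
from statistics import mean, stdev

def is_cost_anomaly(daily_costs, z_threshold=3.0):
    """Flag the latest day if it sits more than z_threshold standard deviations
    from the historical mean. Crude, but a useful baseline."""
    history, latest = daily_costs[:-1], daily_costs[-1]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

# A sudden spike against a stable baseline is flagged
print(is_cost_anomaly([100, 102, 98, 101, 99, 100, 250]))  # True
```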
## Intelligent Cost Prediction
Predict next month's Azure costs:
```python
from azureml.core import Experiment
from azureml.train.automl import AutoMLConfig

# Train a forecasting model on historical cost data
def train_cost_predictor(experiment: Experiment):
    # Features: resource count, previous costs, day of week, etc.
    features = extract_cost_features()

    # Use Azure AutoML for best model selection
    automl_config = AutoMLConfig(
        task="forecasting",
        primary_metric="normalized_root_mean_squared_error",
        training_data=features,
        label_column_name="cost",
        time_column_name="date",
        forecast_horizon=30
    )
    run = experiment.submit(automl_config)
    run.wait_for_completion()
    return run.get_output()

# Predict and alert if significantly over budget
predicted = model.predict(current_features)
if predicted > current_budget * 1.2:
    alert(f"Predicted cost £{predicted} exceeds budget by more than 20%")
```
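AutoML is worth benchmarking against something trivial. A naive projection of trailing average plus linear trend, purely illustrative, gives a sanity-check baseline:

```python
def naive_monthly_forecast(daily_costs, horizon=30):
    """Project total cost over the horizon from the recent average daily cost
    plus a simple linear trend. A baseline, not a real forecaster."""
    recent = daily_costs[-30:]
    avg = sum(recent) / len(recent)
    trend = (recent[-1] - recent[0]) / (len(recent) - 1) if len(recent) > 1 else 0
    return sum(avg + trend * i for i in range(1, horizon + 1))
```

If AutoML can't beat this on held-out months, the model isn't earning its keep.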
## Automated Remediation
Use AI to suggest and implement fixes:
```python
import json
import openai
from azure.mgmt.advisor import AdvisorManagementClient

# Analyze Azure Advisor recommendations with an LLM
def analyze_recommendations():
    advisor = AdvisorManagementClient(credential, subscription_id)
    recommendations = list(advisor.recommendations.list())

    # Use the LLM to prioritise and explain
    analysis = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": """You are a cloud architect.
Analyze these Azure Advisor recommendations and:
1. Rank by impact and ease of implementation
2. Identify any that conflict with each other
3. Suggest an implementation order"""},
            {"role": "user", "content": json.dumps([r.as_dict() for r in recommendations])}
        ]
    )
    return analysis.choices[0].message.content
```
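Advisor can return far more recommendations than fit comfortably in a prompt, so a cheap pre-ranking step keeps token usage down. A sketch, assuming a hypothetical `prioritise` helper over the recommendation dicts (Advisor reports impact as High/Medium/Low):

```python
IMPACT_ORDER = {"High": 0, "Medium": 1, "Low": 2}

def prioritise(recommendations, limit=20):
    """Sort recommendation dicts by Advisor impact, keeping only the top
    few to send to the LLM."""
    ranked = sorted(recommendations, key=lambda r: IMPACT_ORDER.get(r.get("impact"), 3))
    return ranked[:limit]
```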
## Intelligent Right-Sizing
Beyond simple thresholds, use ML for better VM recommendations:
```python
# Collect comprehensive metrics
metrics = {
    "cpu_avg": query_metrics("Percentage CPU"),
    "cpu_p95": query_metrics("Percentage CPU", aggregation="P95"),
    "memory_avg": query_metrics("Available Memory Bytes"),
    "network_in": query_metrics("Network In Total"),
    "disk_ops": query_metrics("Disk Operations/Sec")
}

# Use an LLM to analyse the patterns
analysis = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": """You are a cloud optimization expert.
Analyze these VM metrics and recommend the optimal Azure VM size.
Consider:
- Workload patterns (steady vs bursty)
- Memory vs compute bound
- Network requirements
- Cost optimization"""},
        {"role": "user", "content": f"Current VM: {vm_size}\nMetrics: {json.dumps(metrics)}"}
    ]
)
```
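You can also classify the workload shape before asking the model anything: the p95-to-average CPU ratio is a crude but serviceable burstiness signal. A sketch with illustrative cutoffs:

```python
def workload_profile(cpu_avg, cpu_p95):
    """Classify a VM's CPU pattern from the p95/average ratio.
    Cutoffs are illustrative, not tuned."""
    if cpu_avg == 0:
        return "idle"
    return "bursty" if cpu_p95 / cpu_avg > 2 else "steady"
```

Feeding the label alongside the raw metrics gives the LLM a head start, and the rule on its own catches the obvious cases for free.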
## Infrastructure as Code Generation
Generate Terraform from descriptions:
```python
def generate_terraform(description):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": """You are a Terraform expert for Azure.
Generate production-ready Terraform code following best practices:
- Use variables for configurable values
- Include resource naming conventions
- Add appropriate tags
- Consider security defaults"""},
            {"role": "user", "content": description}
        ]
    )
    return response.choices[0].message.content

# Example
terraform = generate_terraform("""
Create an Azure web application:
- Linux App Service Plan (Standard S1)
- Python web app with Application Insights
- Azure SQL Database (Standard S0)
- Storage account for static files
All resources in UK South, tagged with environment=production
""")
```
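LLM-generated Terraform still needs checking. `terraform validate` is the real gate, but even a quick string check for the resource types you asked for catches obvious omissions. A sketch; the expected-resource list below is an assumption matching the description above:

```python
# Assumption: the azurerm resource types the description above should produce
EXPECTED_RESOURCES = [
    "azurerm_service_plan",
    "azurerm_linux_web_app",
    "azurerm_mssql_database",
    "azurerm_storage_account",
]

def missing_resources(terraform_code):
    """Return the expected resource types absent from the generated code."""
    return [r for r in EXPECTED_RESOURCES if f'resource "{r}"' not in terraform_code]
```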
## ChatOps for Infrastructure
Integrate AI into your chat platform:
```python
# Slack bot for infrastructure queries
@app.message("what's my azure spend")
def handle_spend_query(message, say):
    # Get costs
    costs = get_current_costs()

    # Generate a natural-language summary
    summary = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Summarize Azure costs in a friendly, concise way"},
            {"role": "user", "content": json.dumps(costs)}
        ]
    )
    say(summary.choices[0].message.content)
```
## Practical Applications
| Use Case | Benefit |
|---|---|
| Log analysis | Faster troubleshooting |
| Cost prediction | Budget planning |
| Right-sizing | Reduced waste |
| IaC generation | Faster deployments |
| Anomaly detection | Proactive alerts |
## Getting Started
- Azure OpenAI for LLM capabilities
- Azure Machine Learning for custom models
- Logic Apps for orchestration
- Start small - one use case at a time
Don't try to automate everything. Pick high-value, repetitive tasks first.
Need help implementing AI-driven infrastructure optimisation? Get in touch - we help organisations leverage AI for smarter cloud management.