Azure
4 min read

Using AI to Optimise Cloud Infrastructure

AI · Infrastructure · Automation · Azure · DevOps

AI isn't just for chatbots. Here's how we're using AI tools to improve infrastructure management and cloud optimisation.

Log Analysis with LLMs

Instead of writing complex KQL queries, describe what you're looking for:

Azure OpenAI + Log Analytics

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-02-01",
)
logs = LogsQueryClient(DefaultAzureCredential())

# Use an LLM to generate KQL from natural language
def generate_kql(user_query):
    response = client.chat.completions.create(
        model="gpt-4",  # your Azure OpenAI deployment name
        messages=[
            {"role": "system", "content": """You are a KQL expert.
            Convert natural language queries to KQL for Azure Log Analytics.
            Return only the KQL query, no explanation."""},
            {"role": "user", "content": user_query}
        ]
    )
    return response.choices[0].message.content

# Example usage
kql = generate_kql("Show me all failed logins in the last 24 hours grouped by user")
# Returns something like:
# SigninLogs | where TimeGenerated > ago(24h) | where ResultType != 0 | summarize count() by UserPrincipalName

# Run the generated query against your workspace
result = logs.query_workspace(workspace_id, kql, timespan=timedelta(days=1))
```
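Don't trust generated queries blindly before execution. A minimal guardrail (my own sketch, not part of any Azure SDK) is to reject KQL management commands and queries against unexpected tables:

```python
ALLOWED_TABLES = {"SigninLogs", "AzureActivity", "AzureMetrics", "Heartbeat"}

def is_safe_kql(kql: str) -> bool:
    """Reject control commands and queries against unexpected tables."""
    stripped = kql.strip()
    # KQL management/control commands start with a dot, e.g. ".drop table"
    if stripped.startswith("."):
        return False
    # The first pipe segment names the source table
    first_segment = stripped.split("|")[0].strip()
    return first_segment in ALLOWED_TABLES

print(is_safe_kql("SigninLogs | where TimeGenerated > ago(24h)"))  # True
print(is_safe_kql(".drop table SigninLogs"))                       # False
```

Extend the allow-list to match the tables actually present in your workspace.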

Automated Anomaly Detection

```python
# Use the Azure Anomaly Detector service for time series anomaly detection
from azure.ai.anomalydetector import AnomalyDetectorClient
from azure.ai.anomalydetector.models import DetectRequest, TimeSeriesPoint
from azure.core.credentials import AzureKeyCredential

client = AnomalyDetectorClient(endpoint, AzureKeyCredential(key))

# Detect unusual cost patterns
cost_data = get_daily_costs()  # your cost data: [{"date": ..., "cost": ...}, ...]
request = DetectRequest(
    series=[TimeSeriesPoint(timestamp=d["date"], value=d["cost"]) for d in cost_data],
    granularity="daily",
    sensitivity=95
)

response = client.detect_last_point(request)
if response.is_anomaly:
    send_alert(f"Unusual spend detected: expected ~{response.expected_value}, "
               f"got {cost_data[-1]['cost']}")
```
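If you'd rather not call a managed service for something this simple, the same idea can be sketched locally with a z-score check (my own illustration, not the Anomaly Detector algorithm):

```python
from statistics import mean, stdev

def is_cost_anomaly(costs, threshold=3.0):
    """Flag the latest value if it sits more than `threshold`
    standard deviations from the mean of the preceding history."""
    history, latest = costs[:-1], costs[-1]
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

print(is_cost_anomaly([100, 102, 98, 101, 99, 300]))  # True: sudden spike
print(is_cost_anomaly([100, 102, 98, 101, 99, 103]))  # False
```

The managed service earns its keep once you need seasonality handling (weekend dips, month-end spikes), which this naive version ignores.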

Intelligent Cost Prediction

Predict next month's Azure costs:

```python
# Azure ML SDK v1 AutoML (AutoMLConfig lives in azureml-train-automl-client)
from azureml.core import Experiment, Workspace
from azureml.train.automl import AutoMLConfig

# Train a model on historical cost data
def train_cost_predictor():
    # Features: resource count, previous costs, day of week, etc.
    features = extract_cost_features()

    # Use Azure AutoML for best model selection
    automl_config = AutoMLConfig(
        task="forecasting",
        primary_metric="normalized_root_mean_squared_error",
        training_data=features,
        label_column_name="cost",
        time_column_name="date",
        forecast_horizon=30
    )

    experiment = Experiment(Workspace.from_config(), "cost-forecasting")
    run = experiment.submit(automl_config)
    return run.get_output()

# Predict and alert if significantly higher
best_run, model = train_cost_predictor()
predicted = model.predict(current_features)
if predicted > current_budget * 1.2:
    alert(f"Predicted cost £{predicted} exceeds budget by over 20%")
```
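Before reaching for AutoML, a plain linear trend over daily costs gives a surprisingly useful baseline. A hand-rolled sketch, assuming evenly spaced daily data:

```python
def forecast_next_month(daily_costs, days_ahead=30):
    """Project total spend for the next `days_ahead` days by fitting
    a least-squares line through historical daily costs."""
    n = len(daily_costs)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(daily_costs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_costs))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var if var else 0.0
    intercept = mean_y - slope * mean_x
    # Sum the fitted line over the forecast horizon
    return sum(intercept + slope * (n + i) for i in range(days_ahead))

# Flat £100/day history projects to ~£3000 for the next 30 days
print(round(forecast_next_month([100.0] * 60)))  # 3000
```

If the AutoML forecast and this baseline disagree wildly, that itself is worth investigating before trusting either number.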

Automated Remediation

Use AI to suggest and implement fixes:

```python
import json

from azure.mgmt.advisor import AdvisorManagementClient

# Analyze Azure Advisor recommendations with an LLM
# (assumes an AzureOpenAI `client`, plus `credential` and `subscription_id`)
def analyze_recommendations():
    advisor = AdvisorManagementClient(credential, subscription_id)
    recommendations = list(advisor.recommendations.list())

    # Use the LLM to prioritize and explain
    analysis = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": """You are a cloud architect.
            Analyze these Azure Advisor recommendations and:
            1. Rank by impact and ease of implementation
            2. Identify any that conflict with each other
            3. Suggest an implementation order"""},
            {"role": "user", "content": json.dumps([r.as_dict() for r in recommendations])}
        ]
    )

    return analysis.choices[0].message.content
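Where you need the ranking to be reproducible rather than model-generated, a simple sort over Advisor's own `impact` field works as a deterministic fallback (the field names mirror `as_dict()` output; the ordering rule is my own):

```python
IMPACT_SCORE = {"High": 3, "Medium": 2, "Low": 1}

def rank_recommendations(recs):
    """Sort recommendation dicts by impact (descending), then category.
    Each dict is assumed to carry at least `impact` ('High'/'Medium'/'Low')
    and `category` keys."""
    return sorted(recs, key=lambda r: (-IMPACT_SCORE.get(r.get("impact"), 0),
                                       r.get("category", "")))

recs = [
    {"impact": "Low", "category": "Cost"},
    {"impact": "High", "category": "Security"},
    {"impact": "Medium", "category": "Cost"},
]
print([r["impact"] for r in rank_recommendations(recs)])  # ['High', 'Medium', 'Low']
```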

Intelligent Right-Sizing

Beyond simple thresholds, use ML for better VM recommendations:

```python
import json

# Collect comprehensive metrics (query_metrics is your own Azure Monitor helper)
metrics = {
    "cpu_avg": query_metrics("Percentage CPU"),
    "cpu_p95": query_metrics("Percentage CPU", aggregation="P95"),
    "memory_avg": query_metrics("Available Memory Bytes"),
    "network_in": query_metrics("Network In Total"),
    "disk_ops": query_metrics("Disk Operations/Sec")
}

# Use an LLM to analyze patterns (assumes an AzureOpenAI `client` as before)
analysis = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": """You are a cloud optimization expert.
        Analyze these VM metrics and recommend the optimal Azure VM size.
        Consider:
        - Workload patterns (steady vs bursty)
        - Memory vs compute bound
        - Network requirements
        - Cost optimization"""},
        {"role": "user", "content": f"Current VM: {vm_size}\nMetrics: {json.dumps(metrics)}"}
    ]
)
```
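A deterministic pre-filter helps here too: only escalate to the LLM when the numbers clearly show headroom or pressure. A rough rule of thumb (my own, not an Azure recommendation):

```python
def sizing_hint(cpu_avg: float, cpu_p95: float) -> str:
    """Crude right-sizing hint from CPU utilisation percentages."""
    if cpu_p95 < 20:
        return "downsize"       # even peaks leave 80% idle
    if cpu_avg > 70 or cpu_p95 > 90:
        return "upsize"
    if cpu_p95 - cpu_avg > 40:
        return "bursty: consider B-series"
    return "keep"

print(sizing_hint(cpu_avg=5, cpu_p95=12))   # downsize
print(sizing_hint(cpu_avg=80, cpu_p95=95))  # upsize
print(sizing_hint(cpu_avg=15, cpu_p95=70))  # bursty: consider B-series
```

VMs that land on "keep" never need an LLM call at all, which keeps the token bill in check.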

Infrastructure as Code Generation

Generate Terraform from descriptions:

```python
# Assumes an AzureOpenAI `client` as in the earlier examples
def generate_terraform(description):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": """You are a Terraform expert for Azure.
            Generate production-ready Terraform code following best practices:
            - Use variables for configurable values
            - Include resource naming conventions
            - Add appropriate tags
            - Consider security defaults"""},
            {"role": "user", "content": description}
        ]
    )
    return response.choices[0].message.content

# Example
terraform = generate_terraform("""
Create an Azure web application:
- Linux App Service Plan (Standard S1)
- Python web app with Application Insights
- Azure SQL Database (Standard S0)
- Storage account for static files
All resources in UK South, tagged with environment=production
""")
```
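As with KQL, generated HCL deserves a sanity check before it goes near a pipeline. A real workflow would run `terraform validate`; one cheap, deterministic extra check (my own crude text scan, not an HCL parser) is that every resource block mentions the required tag:

```python
import re

def resources_missing_tags(hcl: str, required_tag: str = "environment") -> list:
    """Return names of terraform resource blocks whose body never
    mentions the required tag. Crude text scan, not an HCL parser."""
    missing = []
    # Match resource headers; the text up to the next header is the body
    for match in re.finditer(r'resource\s+"[^"]+"\s+"([^"]+)"\s*\{', hcl):
        start = match.end()
        next_res = hcl.find('\nresource ', start)
        body = hcl[start:next_res if next_res != -1 else len(hcl)]
        if required_tag not in body:
            missing.append(match.group(1))
    return missing

hcl = '''
resource "azurerm_app_service_plan" "main" {
  tags = { environment = "production" }
}
resource "azurerm_storage_account" "static" {
  account_tier = "Standard"
}
'''
print(resources_missing_tags(hcl))  # ['static']
```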

ChatOps for Infrastructure

Integrate AI into your chat platform:

```python
import json

from slack_bolt import App

app = App(token=slack_bot_token, signing_secret=slack_signing_secret)

# Slack bot for infrastructure queries
@app.message("what's my azure spend")
def handle_spend_query(message, say):
    # Get costs
    costs = get_current_costs()

    # Generate a natural language summary
    # (assumes an AzureOpenAI `client` as in the earlier examples)
    summary = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Summarize Azure costs in a friendly, concise way"},
            {"role": "user", "content": json.dumps(costs)}
        ]
    )

    say(summary.choices[0].message.content)
```

Practical Applications

| Use Case | Benefit |
| --- | --- |
| Log analysis | Faster troubleshooting |
| Cost prediction | Budget planning |
| Right-sizing | Reduced waste |
| IaC generation | Faster deployments |
| Anomaly detection | Proactive alerts |

Getting Started

  1. Azure OpenAI for LLM capabilities
  2. Azure Machine Learning for custom models
  3. Logic Apps for orchestration
  4. Start small - one use case at a time

Don't try to automate everything. Pick high-value, repetitive tasks first.


Need help implementing AI-driven infrastructure optimization? Get in touch - we help organisations leverage AI for smarter cloud management.
