Traditional VM-based monitoring doesn't translate well to containers. Here's how to implement monitoring that scales with your containerised workloads.
## The Container Monitoring Challenge
VMs are persistent. Containers are ephemeral. Traditional monitoring approaches fail because:
- Containers come and go constantly
- IP addresses change with each deployment
- Log files disappear when containers restart
- A per-node agent sees hosts, not the short-lived workloads scheduled onto them
## Azure Monitor for Containers

Container insights is Microsoft's native monitoring solution for AKS, enabled by attaching a Log Analytics workspace via the `oms_agent` block:
```hcl
resource "azurerm_kubernetes_cluster" "this" {
  name                = "aks-production"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  dns_prefix          = "aks-production"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D2s_v3"
  }

  identity {
    type = "SystemAssigned"
  }

  oms_agent {
    log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id
  }
}
```
What you get:
- Container-level CPU/memory metrics
- Pod and node health
- Container logs in Log Analytics
- Pre-built workbooks
## Querying Container Logs
```kusto
// Container errors in the last hour
ContainerLog
| where TimeGenerated > ago(1h)
| where LogEntry contains "error" or LogEntry contains "exception"
| project TimeGenerated, ContainerID, LogEntry
| order by TimeGenerated desc

// Pod restarts (the pod name column in KubePodInventory is "Name")
KubePodInventory
| where TimeGenerated > ago(24h)
| where PodRestartCount > 0
| summarize RestartCount = max(PodRestartCount) by Name, Namespace
| order by RestartCount desc

// High-memory pods
Perf
| where ObjectName == "K8SContainer"
| where CounterName == "memoryWorkingSetBytes"
| summarize AvgMemory = avg(CounterValue) by InstanceName
| where AvgMemory > 500000000 // ~500 MB
| order by AvgMemory desc
```
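The restart summarisation is simple enough to reproduce in plain Python, which is handy for unit-testing dashboard or alert logic against exported rows. The dict keys mirror the query's columns and are an assumption about your export format:

```python
def max_restarts(rows):
    """rows: iterable of dicts with Name, Namespace and PodRestartCount keys
    (an assumption about how you export KubePodInventory rows)."""
    counts = {}
    for row in rows:
        key = (row["Name"], row["Namespace"])
        # keep the highest restart count seen for each pod
        counts[key] = max(counts.get(key, 0), row["PodRestartCount"])
    # keep only pods that restarted, highest count first
    return dict(sorted(
        ((k, v) for k, v in counts.items() if v > 0),
        key=lambda kv: -kv[1],
    ))
```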
## Prometheus Integration

For cloud-native metrics, use Azure Monitor managed Prometheus:
```hcl
resource "azurerm_monitor_workspace" "this" {
  name                = "prometheus-workspace"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
}

resource "azurerm_monitor_data_collection_rule" "prometheus" {
  name                = "dcr-prometheus"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  kind                = "Linux" # required for the Prometheus forwarder

  data_sources {
    prometheus_forwarder {
      name    = "prometheus-source"
      streams = ["Microsoft-PrometheusMetrics"]
    }
  }

  destinations {
    monitor_account {
      monitor_account_id = azurerm_monitor_workspace.this.id
      name               = "prometheus-destination"
    }
  }

  data_flow {
    streams      = ["Microsoft-PrometheusMetrics"]
    destinations = ["prometheus-destination"]
  }
}
```

Without a `prometheus_forwarder` data source the rule collects nothing, and the rule still needs to be linked to the cluster with an `azurerm_monitor_data_collection_rule_association`.
### Prometheus Queries
```promql
# Container CPU usage
sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod)

# Memory usage percentage
sum(container_memory_working_set_bytes{namespace="production"}) by (pod)
  / sum(container_spec_memory_limit_bytes{namespace="production"}) by (pod)
  * 100

# Request rate
sum(rate(http_requests_total{namespace="production"}[5m])) by (service)
```
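The memory-percentage query is just per-pod sums of two series, divided and scaled. A small Python sketch of the same arithmetic (the tuple sample shape is hypothetical) can be useful for testing thresholds offline:

```python
from collections import defaultdict

def memory_pct_by_pod(samples):
    """samples: iterable of (pod, working_set_bytes, limit_bytes) tuples."""
    usage = defaultdict(float)
    limit = defaultdict(float)
    for pod, working_set, spec_limit in samples:
        usage[pod] += working_set  # sum(...) by (pod) over the usage series
        limit[pod] += spec_limit   # sum(...) by (pod) over the limit series
    # divide and scale to a percentage, skipping pods with no limit set
    return {pod: usage[pod] / limit[pod] * 100 for pod in usage if limit[pod]}
```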
## Application-Level Monitoring

### OpenTelemetry in Containers
```python
# Python app with OpenTelemetry, exporting spans to a collector over OTLP/gRPC
from flask import Flask, jsonify
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

app = Flask(__name__)

# Configure the exporter (plaintext is fine for an in-cluster collector)
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(otlp_exporter)
)

tracer = trace.get_tracer(__name__)

@app.route("/api/orders")
def get_orders():
    with tracer.start_as_current_span("get_orders"):
        return jsonify([])  # your code here
```
### Application Insights for Containers
```yaml
# Kubernetes deployment with App Insights
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: myapp:latest
          env:
            - name: APPLICATIONINSIGHTS_CONNECTION_STRING
              valueFrom:
                secretKeyRef:
                  name: appinsights
                  key: connection-string
```
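The injected connection string is a semicolon-separated list of `key=value` pairs (e.g. `InstrumentationKey=...;IngestionEndpoint=...`). If you ever need to inspect one in app code, a small parser is enough (the helper name is ours):

```python
def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' pairs into a dict."""
    return dict(
        part.split("=", 1)   # split on the first '=' only; values may contain '='
        for part in conn_str.split(";")
        if part              # tolerate a trailing semicolon
    )
```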
## Health Probes and Liveness

Configure proper health checks so Kubernetes restarts unhealthy containers and routes traffic only to ready ones:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: myapp:latest
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
```
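The endpoints themselves are trivial to serve. A minimal sketch using only the Python standard library, assuming the `/health/live` and `/health/ready` paths and port 8080 from the manifest above (the `READY` flag stands in for real dependency checks):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

READY = {"ok": True}  # flip to False while dependencies are still warming up

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health/live":
            self._respond(200, b"alive")          # process is up
        elif self.path == "/health/ready":
            if READY["ok"]:
                self._respond(200, b"ready")      # safe to receive traffic
            else:
                self._respond(503, b"not ready")  # kept out of the Service
        else:
            self._respond(404, b"not found")

    def _respond(self, code, body):
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep probe traffic out of the container logs

# In the container entrypoint:
# HTTPServer(("", 8080), HealthHandler).serve_forever()
```

Keep the liveness check cheap and dependency-free; put database or downstream checks behind the readiness endpoint only, so a flaky dependency drains traffic instead of triggering restart loops.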
## Alerting on Container Issues
```hcl
resource "azurerm_monitor_metric_alert" "container_restarts" {
  name                = "alert-container-restarts"
  resource_group_name = azurerm_resource_group.this.name
  scopes              = [azurerm_kubernetes_cluster.this.id]
  description         = "Alert when containers restart frequently"

  criteria {
    metric_namespace = "insights.container/pods"
    metric_name      = "restartingContainerCount"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 0
  }

  action {
    action_group_id = azurerm_monitor_action_group.ops.id
  }
}
```
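The condition is simple enough to unit-test before you rely on it: the alert fires when the average of the `restartingContainerCount` samples in the evaluation window exceeds the threshold. A hypothetical sketch:

```python
def alert_fires(samples, threshold=0):
    """Mirror of the metric alert's logic: fire when the window average
    of restartingContainerCount exceeds the threshold."""
    return bool(samples) and sum(samples) / len(samples) > threshold
```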
## Container Apps Monitoring

For Azure Container Apps:
```hcl
resource "azurerm_container_app" "api" {
  # ... other config ...

  template {
    container {
      name  = "api"
      image = "myapp:latest"

      # Built-in logging: JSON output for easier parsing
      env {
        name  = "ASPNETCORE_LOGGING__CONSOLE__FORMATTERTYPE"
        value = "json"
      }
    }
  }
}
```
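The one-JSON-object-per-line approach works in any runtime, not just ASP.NET Core. A minimal sketch for a Python container using the standard `logging` module (the field names are our choice):

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line on stdout."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("api")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("order created")  # one JSON line, easy to query from Log Analytics
```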
Query Container Apps logs:
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "api"
| where Log_s contains "error"
| project TimeGenerated, Log_s
```
Need help implementing container monitoring? Get in touch - we help organisations build observable container platforms.