Infrastructure as Code: Best Practices for 2024

Building reliable and maintainable infrastructure
Jesús Pérez
Tags: infrastructure, iac, devops, terraform

Introduction

Infrastructure as Code (IaC) has become the cornerstone of modern DevOps practices. By treating infrastructure configuration as code, teams can achieve better reliability, consistency, and maintainability in their deployments. This guide explores the latest best practices and tools for implementing IaC in 2024.

Why Infrastructure as Code Matters

Traditional Infrastructure Challenges

  • Manual Configuration Drift: Systems diverge from their intended state over time
  • Lack of Reproducibility: Difficulty recreating environments consistently
  • No Version Control: Infrastructure changes aren’t tracked or reviewed
  • Poor Documentation: Tribal knowledge and outdated documentation
  • Scaling Difficulties: Manual processes don’t scale with growth

IaC Benefits

  • Consistency: Identical environments across development, staging, and production
  • Version Control: Track all infrastructure changes with Git
  • Automation: Reduce manual errors and deployment time
  • Cost Management: Better resource optimization and cleanup
  • Disaster Recovery: Quick environment recreation from code

Core Principles of IaC

1. Declarative Configuration

Define the desired end state rather than the steps to achieve it:

# Terraform example - Declarative
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1d0"
  instance_type = "t3.micro"
  
  tags = {
    Name        = "web-server"
    Environment = "production"
  }
}
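
To make the declarative idea concrete, here is a minimal Python sketch of what a tool might do with such a definition: compare the desired state with the actual state and derive the actions needed. The `plan` function and the dictionary shapes are hypothetical illustrations, not Terraform's internals.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Compare desired and actual resources (keyed by name) and list the actions needed."""
    shared = desired.keys() & actual.keys()
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(name for name in shared if desired[name] != actual[name]),
        "delete": sorted(actual.keys() - desired.keys()),
    }


# Hypothetical resource names and attributes, for illustration only
desired = {"web": {"instance_type": "t3.micro"}, "db": {"instance_class": "db.t3.micro"}}
actual = {"web": {"instance_type": "t3.small"}, "cache": {"node_type": "cache.t3.micro"}}

print(plan(desired, actual))
# {'create': ['db'], 'update': ['web'], 'delete': ['cache']}
```

The author of the configuration never writes the create, update, or delete steps; they follow from the difference between the two states.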

2. Idempotency

Operations should produce the same result regardless of how many times they’re executed:

# Ansible example - Idempotent
- name: Ensure nginx is installed
  package:
    name: nginx
    state: present
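
The same property can be sketched in a few lines of Python. The `ensure_line` helper below is a hypothetical example, not part of Ansible: it checks the current state first and only changes it when needed, so a second run is a no-op.

```python
def ensure_line(lines: list[str], line: str) -> bool:
    """Append `line` only if it is missing. Returns True when a change was made."""
    if line in lines:
        return False
    lines.append(line)
    return True


config = ["worker_processes auto;"]
first = ensure_line(config, "gzip on;")   # changes the config
second = ensure_line(config, "gzip on;")  # no-op: the desired state already holds

print(first, second, config)
# True False ['worker_processes auto;', 'gzip on;']
```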

3. Immutable Infrastructure

Replace rather than modify existing infrastructure:

# Build new images instead of modifying running containers
FROM nginx:alpine
COPY ./config/nginx.conf /etc/nginx/nginx.conf
COPY ./static /usr/share/nginx/html

Modern IaC Tools and Technologies

Terraform: The Industry Standard

Terraform has established itself as the de facto standard for infrastructure provisioning:

Key Features

  • Multi-cloud support: AWS, Azure, GCP, and thousands of other providers in the Terraform Registry
  • State management: Tracks infrastructure state and changes
  • Plan and apply workflow: Preview changes before execution
  • Module system: Reusable infrastructure components

Terraform Best Practices

# Use variables for configuration
variable "environment" {
  description = "Environment name"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

# Use data sources for existing resources
data "aws_availability_zones" "available" {
  state = "available"
}

# Create reusable modules
module "vpc" {
  source = "./modules/vpc"
  
  cidr_block         = var.vpc_cidr
  availability_zones = data.aws_availability_zones.available.names
  environment        = var.environment
}

Pulumi: Modern IaC with Real Programming Languages

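import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Use the current stack name (for example "dev") as the environment label
const environment = pulumi.getStack();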

// Create VPC with TypeScript
const vpc = new aws.ec2.Vpc("main", {
    cidrBlock: "10.0.0.0/16",
    enableDnsHostnames: true,
    enableDnsSupport: true,
    tags: {
        Name: `${environment}-vpc`,
    },
});

// Type-safe configuration
interface DatabaseConfig {
    instanceClass: string;
    allocatedStorage: number;
    engine: string;
}

const dbConfig: DatabaseConfig = {
    instanceClass: "db.t3.micro",
    allocatedStorage: 20,
    engine: "postgres",
};

AWS CDK: Cloud-Native IaC

from aws_cdk import (
    Stack,
    aws_ec2 as ec2,
    aws_ecs as ecs,
    aws_ecs_patterns as ecs_patterns,
)
from constructs import Construct  # base type of the `scope` argument

class WebServiceStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Create VPC
        vpc = ec2.Vpc(self, "VPC", max_azs=2)

        # Create ECS cluster
        cluster = ecs.Cluster(self, "Cluster", vpc=vpc)

        # Create Fargate service
        ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "Service",
            cluster=cluster,
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_registry("nginx"),
                container_port=80,
            ),
            public_load_balancer=True,
        )

State Management Best Practices

Remote State Storage

Never store Terraform state locally in production:

# terraform/backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

State Locking

Prevent concurrent modifications:

resource "aws_dynamodb_table" "terraform_locks" {
  name           = "terraform-locks"
  billing_mode   = "PAY_PER_REQUEST"
  hash_key       = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
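
Conceptually, a lock like this relies on a write that succeeds only when no lock entry exists yet. The Python sketch below imitates that behavior with an in-memory table; `LockTable` and its methods are hypothetical stand-ins and do not reflect how Terraform or DynamoDB are implemented.

```python
import threading


class LockTable:
    """In-memory stand-in for a lock table: acquire succeeds only if no lock exists."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}
        self._mutex = threading.Lock()

    def acquire(self, lock_id: str, owner: str) -> bool:
        # Check and write under one mutex so two callers cannot both succeed
        with self._mutex:
            if lock_id in self._items:
                return False
            self._items[lock_id] = owner
            return True

    def release(self, lock_id: str, owner: str) -> bool:
        # Only the current holder may release the lock
        with self._mutex:
            if self._items.get(lock_id) != owner:
                return False
            del self._items[lock_id]
            return True


table = LockTable()
state_key = "infrastructure/terraform.tfstate"
print(table.acquire(state_key, "alice"))  # True
print(table.acquire(state_key, "bob"))    # False: alice holds the lock
print(table.release(state_key, "alice"))  # True
print(table.acquire(state_key, "bob"))    # True
```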

Security Best Practices

Secrets Management

Never hardcode secrets in IaC:

# Use AWS Secrets Manager
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "rds-password"
}

resource "aws_db_instance" "main" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
  # ... other configuration
}
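
A simple guard against hardcoded secrets is to scan configuration files before they are committed. The following Python sketch is a rough illustration under stated assumptions: the attribute names in the pattern are guesses at what a team might consider sensitive, and the check is line-based, so it would also flag a quoted interpolation and is no substitute for a dedicated secret scanner.

```python
import re

# Hypothetical pattern: a secret-like attribute assigned a quoted literal
HARDCODED = re.compile(r'^\s*(password|secret|token)\s*=\s*"[^"]+"', re.IGNORECASE)


def find_hardcoded_secrets(hcl_text: str) -> list[int]:
    """Return 1-based line numbers where a secret-like attribute has a string literal."""
    return [
        number
        for number, line in enumerate(hcl_text.splitlines(), start=1)
        if HARDCODED.match(line)
    ]


sample = '''resource "aws_db_instance" "bad" {
  password = "hunter2"
}
resource "aws_db_instance" "good" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}'''

print(find_hardcoded_secrets(sample))
# [2]
```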

Least Privilege Access

# IAM policy with minimal permissions
resource "aws_iam_role_policy" "lambda_policy" {
  name = "lambda-policy"
  role = aws_iam_role.lambda_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:*:*:*"
      }
    ]
  })
}
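
A least-privilege review can be partly automated. As a minimal sketch, the hypothetical `wildcard_actions` function below walks a policy document shaped like the one above and reports any allowed action that uses a wildcard; it only inspects `Action` and does not evaluate `Resource` scoping.

```python
def wildcard_actions(policy: dict) -> list[str]:
    """Return every allowed action that is '*' or ends in ':*'."""
    found = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM accepts a single string or a list
            actions = [actions]
        found.extend(a for a in actions if a == "*" or a.endswith(":*"))
    return found


lambda_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:*:*:*",
        }
    ],
}
too_broad = {"Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}

print(wildcard_actions(lambda_policy))  # []
print(wildcard_actions(too_broad))      # ['s3:*']
```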

Network Security

# Security group with minimal access
resource "aws_security_group" "web" {
  name_prefix = "web-"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "web-security-group"
  }
}
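
Rules like these can also be checked mechanically. The sketch below assumes ingress rules have already been parsed into dictionaries with the same field names as the Terraform block; `open_to_world` is a hypothetical helper that flags rules exposing ports other than the allowed ones to every address.

```python
import ipaddress


def open_to_world(rules: list[dict], allowed_ports: set[int]) -> list[dict]:
    """Return ingress rules that expose a non-allowed port to any address."""
    flagged = []
    for rule in rules:
        # A prefix length of 0 (e.g. 0.0.0.0/0) matches every address
        public = any(ipaddress.ip_network(c).prefixlen == 0 for c in rule["cidr_blocks"])
        ports = range(rule["from_port"], rule["to_port"] + 1)
        if public and any(port not in allowed_ports for port in ports):
            flagged.append(rule)
    return flagged


rules = [
    {"from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
    {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},   # hypothetical bad rule
    {"from_port": 5432, "to_port": 5432, "cidr_blocks": ["10.0.0.0/16"]},
]

print(open_to_world(rules, allowed_ports={443}))
# [{'from_port': 22, 'to_port': 22, 'cidr_blocks': ['0.0.0.0/0']}]
```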

Testing Infrastructure Code

Terratest for Terraform

package test

import (
    "testing"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestTerraformWebModule(t *testing.T) {
    terraformOptions := &terraform.Options{
        TerraformDir: "../modules/web",
        Vars: map[string]interface{}{
            "environment": "test",
        },
    }

    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)

    // Verify outputs
    instanceId := terraform.Output(t, terraformOptions, "instance_id")
    assert.NotEmpty(t, instanceId)
}

Kitchen-Terraform

# .kitchen.yml
driver:
  name: terraform

provisioner:
  name: terraform

verifier:
  name: terraform
  color: true

platforms:
  - name: aws

suites:
  - name: default
    driver:
      variables:
        environment: test
    verifier:
      controls:
        - instance_created
        - security_group_configured

Environment Management

Multi-Environment Strategy

environments/
├── dev/
│   ├── main.tf
│   ├── variables.tf
│   └── terraform.tfvars
├── staging/
│   ├── main.tf
│   ├── variables.tf
│   └── terraform.tfvars
└── prod/
    ├── main.tf
    ├── variables.tf
    └── terraform.tfvars

modules/
├── networking/
├── compute/
└── database/

Environment-Specific Configuration

# environments/prod/terraform.tfvars
environment = "prod"
instance_type = "t3.large"
min_size = 3
max_size = 10
enable_backup = true

# environments/dev/terraform.tfvars
environment = "dev"
instance_type = "t3.micro"
min_size = 1
max_size = 2
enable_backup = false
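
The layering behind these files can be sketched in Python: shared defaults, per-environment overrides, and a check that rejects unknown environment names. The `settings_for` function and the choice of which values count as defaults are assumptions for illustration; the values themselves come from the two files above.

```python
# Defaults match the dev values above; prod overrides them
DEFAULTS = {"instance_type": "t3.micro", "min_size": 1, "max_size": 2, "enable_backup": False}
OVERRIDES = {
    "dev": {},
    "prod": {"instance_type": "t3.large", "min_size": 3, "max_size": 10, "enable_backup": True},
}


def settings_for(environment: str) -> dict:
    """Merge defaults with the overrides of one environment."""
    if environment not in OVERRIDES:
        raise ValueError(f"unknown environment: {environment!r}")
    # Later entries win, so overrides replace defaults
    return {"environment": environment, **DEFAULTS, **OVERRIDES[environment]}


print(settings_for("prod")["instance_type"])  # t3.large
print(settings_for("dev")["max_size"])        # 2
```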

CI/CD Integration

GitLab CI Pipeline

# .gitlab-ci.yml
stages:
  - validate
  - plan
  - deploy

variables:
  TF_ROOT: ${CI_PROJECT_DIR}/terraform

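before_script:
  # Run every job from the Terraform root defined above
  - cd "${TF_ROOT}"
  - terraform --version
  - terraform init

validate:
  stage: validate
  script:
    - terraform validate
    - terraform fmt -check

# The plan also runs on main so that the deploy job has a saved plan to apply
plan:
  stage: plan
  script:
    - terraform plan -out=plan.tfplan
  artifacts:
    paths:
      - ${TF_ROOT}/plan.tfplan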

deploy:
  stage: deploy
  script:
    - terraform apply plan.tfplan
  when: manual
  only:
    - main

GitHub Actions

# .github/workflows/terraform.yml
name: Terraform

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  terraform:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v3
    
    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v2
      with:
        terraform_version: 1.6.0
        
    - name: Terraform Init
      run: terraform init
      
    - name: Terraform Validate
      run: terraform validate
      
    - name: Terraform Plan
      run: terraform plan -no-color
      if: github.event_name == 'pull_request'
      
    - name: Terraform Apply
      run: terraform apply -auto-approve
      if: github.ref == 'refs/heads/main'

Monitoring and Observability

Infrastructure Drift Detection

# CloudWatch alarm for drift detection
resource "aws_cloudwatch_metric_alarm" "config_compliance" {
  alarm_name          = "config-compliance-alarm"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "ComplianceByConfigRule"
  namespace           = "AWS/Config"
  period              = "300"
  statistic           = "Average"
  threshold           = "1"
  alarm_description   = "This metric monitors config compliance"
  
  dimensions = {
    ConfigRuleName = aws_config_config_rule.required_tags.name
  }
}
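
The alarm settings above (`LessThanThreshold`, two evaluation periods, threshold 1) can be read as "alert when compliance stays below 1 for two consecutive periods". The Python sketch below models only that simplified reading; `in_alarm` is a hypothetical function and ignores CloudWatch details such as missing data handling.

```python
def in_alarm(datapoints: list[float], threshold: float, evaluation_periods: int) -> bool:
    """True when the most recent `evaluation_periods` datapoints are all below threshold."""
    recent = datapoints[-evaluation_periods:]
    return len(recent) == evaluation_periods and all(value < threshold for value in recent)


print(in_alarm([1.0, 1.0, 0.5], threshold=1, evaluation_periods=2))  # False: one low period
print(in_alarm([1.0, 0.5, 0.0], threshold=1, evaluation_periods=2))  # True: two in a row
```

Requiring two consecutive periods avoids alerting on a single transient evaluation.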

Cost Monitoring

resource "aws_budgets_budget" "infrastructure" {
  name         = "infrastructure-budget"
  budget_type  = "COST"
  limit_amount = "100"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  cost_filter {
    name   = "Service"
    values = ["Amazon Elastic Compute Cloud - Compute"]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                 = 80
    threshold_type            = "PERCENTAGE"
    notification_type         = "ACTUAL"
    subscriber_email_addresses = ["alerts@company.com"]
  }
}
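
The notification arithmetic is simple: with a limit of 100 USD and a threshold of 80 percent, an alert is sent once actual spend exceeds 80 USD. A minimal Python sketch, with `budget_alert` as a hypothetical helper mirroring the `GREATER_THAN` comparison:

```python
def budget_alert(actual_spend: float, limit_amount: float, threshold_pct: float) -> bool:
    """True when actual spend is greater than threshold_pct percent of the limit."""
    return actual_spend > limit_amount * threshold_pct / 100


print(budget_alert(79.50, limit_amount=100, threshold_pct=80))  # False
print(budget_alert(80.00, limit_amount=100, threshold_pct=80))  # False: not strictly greater
print(budget_alert(80.01, limit_amount=100, threshold_pct=80))  # True
```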

Common Pitfalls and Solutions

1. Resource Dependencies

Problem: Resources are created in the wrong order.
Solution: Declare explicit dependencies with depends_on when Terraform cannot infer them from resource references.

resource "aws_instance" "web" {
  depends_on = [aws_security_group.web]
  # configuration
}
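
Creation order follows from a dependency graph: a resource is created only after everything it depends on exists. The Python sketch below illustrates this with the standard library's `graphlib`; the resource names other than the instance and security group are hypothetical, and this is not a description of Terraform's own graph code.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on
dependencies = {
    "aws_instance.web": {"aws_security_group.web", "aws_subnet.public"},
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_subnet.public": {"aws_vpc.main"},
    "aws_vpc.main": set(),
}

# static_order() yields each resource after all of its dependencies
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The VPC always comes first and the instance last; the security group and subnet may appear in either order because neither depends on the other.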

2. Hardcoded Values

Problem: Environment-specific values are embedded in code.
Solution: Use variables and data sources.

# Bad
cidr_block = "10.0.0.0/16"

# Good
cidr_block = var.vpc_cidr

3. Large State Files

Problem: A single monolithic state covers all infrastructure.
Solution: Split it into multiple states, one per component.

terraform/
├── networking/
├── compute/
├── database/
└── monitoring/

Advanced Patterns

Module Composition

module "web_tier" {
  source = "./modules/web-tier"
  
  vpc_id            = module.networking.vpc_id
  private_subnets   = module.networking.private_subnets
  security_groups   = [module.security.web_sg_id]
  
  instance_type = var.web_instance_type
  min_size     = var.web_min_size
  max_size     = var.web_max_size
}

module "database" {
  source = "./modules/rds"
  
  vpc_id           = module.networking.vpc_id
  database_subnets = module.networking.database_subnets
  security_groups  = [module.security.db_sg_id]
  
  instance_class    = var.db_instance_class
  allocated_storage = var.db_allocated_storage
}

Conditional Resources

resource "aws_cloudwatch_log_group" "app_logs" {
  count = var.enable_logging ? 1 : 0
  
  name              = "/aws/lambda/${var.function_name}"
  retention_in_days = var.log_retention_days
}

Future of Infrastructure as Code

Emerging Trends

  1. AI-Assisted Infrastructure: Tools like GitHub Copilot for IaC
  2. Policy as Code: Open Policy Agent (OPA) integration
  3. GitOps: Argo CD and Flux for Kubernetes
  4. Infrastructure from Code: Generate IaC from application code

Next-Generation Tools

  • Crossplane: Kubernetes-native infrastructure management
  • Nitric: Application-focused infrastructure
  • Wing: Cloud-oriented programming language

Conclusion

Infrastructure as Code has evolved from a best practice to a necessity in modern software development. By following the principles and practices outlined in this guide, teams can build more reliable, secure, and maintainable infrastructure.

Key takeaways for 2024:

  1. Choose the right tool for your team and use case
  2. Implement proper state management from the beginning
  3. Security should be built-in, not bolted on
  4. Test your infrastructure code like application code
  5. Monitor for drift and compliance continuously
  6. Use CI/CD pipelines for automated deployments

The future of IaC continues to evolve with better tooling, improved cloud provider APIs, and new paradigms like Infrastructure from Code. Stay current with these trends to maintain a competitive advantage in cloud infrastructure management.


Published on 2024-01-10 by Jesús Pérez
