# Terraform AWS EKS Node Group Module

## Overview

This Terraform module provisions an Amazon EKS node group with managed EC2 worker nodes. It provides a production-ready configuration for running containerized workloads on Amazon EKS, with support for auto-scaling, custom AMIs, SSH access, and IAM role management.

## Features

- Managed EKS node group with auto-scaling capabilities
- Configurable EC2 instance types and disk sizes
- Support for `AL2_x86_64` and `AL2_x86_64_GPU` AMI types
- Optional SSH access with key pairs
- Kubernetes labels support
- IAM roles and policies for worker nodes
- Automatic attachment of required AWS managed policies
- Support for custom IAM policies
- CloudPosse naming conventions for consistent resource naming
- Conditional module enablement

## Resources Created

### Compute

- AWS EKS Node Group
- Auto Scaling Group (managed by EKS)
- EC2 instances (managed by EKS)

### IAM

- IAM role for worker nodes
- IAM role policy attachments:
  - AmazonEKSWorkerNodePolicy
  - AmazonEKS_CNI_Policy
  - AmazonEC2ContainerRegistryReadOnly
  - Custom policies (optional)

## Usage

### Basic Example

```hcl
module "eks_node_group" {
  source = "git@github.com:webuildyourcloud/terraform-aws-eks_node_group.git?ref=tags/0.0.2"

  # Naming
  namespace  = "myorg"
  stage      = "prod"
  name       = "app"
  attributes = []

  # Cluster Configuration
  cluster_name = "my-eks-cluster"
  subnet_ids   = ["subnet-12345678", "subnet-87654321"]

  # Node Group Sizing
  desired_size = 3
  min_size     = 2
  max_size     = 5

  # Instance Configuration
  instance_types = ["t3.medium"]
  disk_size      = 20

  # Kubernetes Labels
  kubernetes_labels = {
    Environment = "production"
    Team        = "platform"
  }

  # Tags
  tags = {
    ManagedBy = "terraform"
  }
}
```

### Advanced Example with SSH Access

```hcl
module "eks_node_group" {
  source = "git@github.com:webuildyourcloud/terraform-aws-eks_node_group.git?ref=tags/0.0.2"

  namespace = "myorg"
  stage     = "prod"
  name      = "app"

  cluster_name = module.eks_cluster.cluster_id
  subnet_ids   = module.vpc.private_subnet_ids

  # Node Group Configuration
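  # Note: the EKS API requires min_size <= desired_size <= max_size. If the
  # Cluster Autoscaler manages this group, expect desired_size to drift from
  # the value set here as nodes are added and removed at runtime.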
  desired_size = 5
  min_size     = 3
  max_size     = 10

  instance_types = ["t3.large", "t3.xlarge"]
  disk_size      = 50

  # AMI Configuration
  ami_type            = "AL2_x86_64"
  ami_release_version = "1.21.5-20220123"
  kubernetes_version  = "1.21"

  # SSH Access
  ec2_ssh_key               = "my-keypair"
  source_security_group_ids = ["sg-12345678"]

  # Custom IAM Policies
  existing_workers_role_policy_arns = [
    "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
  ]
  existing_workers_role_policy_arns_count = 1

  kubernetes_labels = {
    NodeType    = "general-purpose"
    Environment = "production"
  }

  tags = {
    CostCenter = "engineering"
  }
}
```

### GPU-Enabled Node Group

```hcl
module "eks_gpu_node_group" {
  source = "git@github.com:webuildyourcloud/terraform-aws-eks_node_group.git?ref=tags/0.0.2"

  namespace = "myorg"
  stage     = "prod"
  name      = "gpu"

  cluster_name = module.eks_cluster.cluster_id
  subnet_ids   = module.vpc.private_subnet_ids

  desired_size = 2
  min_size     = 1
  max_size     = 4

  instance_types = ["g4dn.xlarge"]
  ami_type       = "AL2_x86_64_GPU"
  disk_size      = 100

  kubernetes_labels = {
    NodeType         = "gpu"
    "nvidia.com/gpu" = "true"
  }
}
```

## Variables

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| namespace | Namespace (e.g., 'eg' or 'cp') | `string` | `""` | no |
| stage | Stage (e.g., 'prod', 'staging', 'dev') | `string` | `""` | no |
| name | Solution name (e.g., 'app' or 'cluster') | `string` | n/a | yes |
| delimiter | Delimiter between namespace, stage, name and attributes | `string` | `"-"` | no |
| attributes | Additional attributes | `list(string)` | `[]` | no |
| tags | Additional tags | `map(string)` | `{}` | no |
| enabled | Enable/disable module resources | `bool` | `true` | no |
| cluster_name | Name of the EKS cluster | `string` | n/a | yes |
| ec2_ssh_key | SSH key name for worker node access | `string` | `null` | no |
| desired_size | Desired number of worker nodes | `number` | n/a | yes |
| max_size | Maximum number of worker nodes | `number` | n/a | yes |
| min_size | Minimum number of worker nodes | `number` | n/a | yes |
| subnet_ids | List of subnet IDs to launch resources in | `list(string)` | n/a | yes |
| existing_workers_role_policy_arns | List of existing policy ARNs to attach | `list(string)` | `[]` | no |
| existing_workers_role_policy_arns_count | Count of existing policy ARNs | `number` | `0` | no |
| ami_type | Type of AMI (`AL2_x86_64`, `AL2_x86_64_GPU`) | `string` | `"AL2_x86_64"` | no |
| disk_size | Disk size in GiB for worker nodes | `number` | `20` | no |
| instance_types | List of instance types for the EKS node group | `list(string)` | n/a | yes |
| kubernetes_labels | Key-value mapping of Kubernetes labels | `map(string)` | `{}` | no |
| ami_release_version | AMI version of the EKS node group | `string` | `null` | no |
| kubernetes_version | Kubernetes version | `string` | `null` | no |
| source_security_group_ids | Security Group IDs to allow SSH access | `list(string)` | `[]` | no |

## Outputs

| Name | Description |
|------|-------------|
| eks_node_group_role_arn | ARN of the worker nodes IAM role |
| eks_node_group_role_name | Name of the worker nodes IAM role |
| eks_node_group_id | EKS cluster name and EKS node group name, separated by a colon |
| eks_node_group_arn | Amazon Resource Name (ARN) of the EKS node group |
| eks_node_group_resources | List of objects containing information about underlying resources |
| eks_node_group_status | Status of the EKS node group |

## Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.13 |
| aws | ~> 3.27 |
| template | ~> 2.2 |
| local | ~> 2.0 |

## Dependencies

This module uses:

- [cloudposse/terraform-null-label](https://github.com/cloudposse/terraform-null-label) - Resource naming

## IAM Policies

The module automatically attaches the following AWS managed policies to the worker node IAM role:

1. **AmazonEKSWorkerNodePolicy** - Allows worker nodes to connect to the EKS cluster
2. **AmazonEKS_CNI_Policy** - Provides IP address management for pods
3. **AmazonEC2ContainerRegistryReadOnly** - Allows pulling images from ECR

Additional custom policies can be attached via `existing_workers_role_policy_arns`.

## Important Notes

1. **Cluster Name**: The `cluster_name` must match an existing EKS cluster.
2. **Subnets**: Node groups should typically be deployed in private subnets.
3. **Instance Types**: You can specify multiple instance types for better capacity availability.
4. **SSH Access**: If `ec2_ssh_key` is specified without `source_security_group_ids`, port 22 is opened to the internet (0.0.0.0/0).
5. **Auto Scaling**: The node group can scale between `min_size` and `max_size`; scaling in response to pod scheduling needs requires the Cluster Autoscaler.
6. **Kubernetes Version**: If not specified, the cluster's Kubernetes version is used.
7. **AMI Updates**: When `ami_release_version` is not specified, the latest AMI for the Kubernetes version is used.
8. **Tagging**: The module automatically adds the `kubernetes.io/cluster/<cluster_name> = "owned"` tag.

## Best Practices

1. **Use Multiple Instance Types**: Specify multiple instance types for better EC2 capacity availability.
2. **Private Subnets**: Deploy node groups in private subnets for security.
3. **Right-Sizing**: Start with conservative instance sizes and scale based on actual usage.
4. **Disk Size**: Allocate sufficient disk space for container images and logs (minimum 20 GiB).
5. **Labels**: Use Kubernetes labels for node selection in pod specifications.
6. **Security Groups**: Restrict SSH access to specific security groups.
7. **IAM Policies**: Attach only the custom IAM policies you need.
8. **Version Management**: Pin `ami_release_version` and `kubernetes_version` for consistency.

## Kubernetes Integration

After node group creation, nodes automatically join the cluster.
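To confirm that the new nodes registered and carry the expected labels, you can query them with `kubectl`; the label value below assumes the `kubernetes_labels` from the basic example:

```shell
# List all nodes with their labels to confirm the node group joined the cluster
kubectl get nodes --show-labels

# Filter to nodes from this node group by one of its labels
# (assumes kubernetes_labels included Environment = "production")
kubectl get nodes -l Environment=production
```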
You can target specific node groups using node selectors:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  nodeSelector:
    Environment: production
    Team: platform
  containers:
    - name: my-container
      image: nginx
```

## Scaling Behavior

The node group scales in response to:

- Pod resource requests that cannot be scheduled
- Cluster Autoscaler policies (if installed)
- Manual scaling via the AWS console or API
- Kubernetes Horizontal Pod Autoscaler demands (indirectly, by creating pods that need capacity)

## Troubleshooting

### Nodes not joining the cluster

- Verify the cluster name is correct
- Check that subnets have the proper tags (`kubernetes.io/cluster/<cluster_name> = "shared"`)
- Ensure the IAM role has the required policies attached
- Verify security groups allow communication with the cluster

### Auto-scaling not working

- Install and configure the Cluster Autoscaler
- Verify IAM permissions for auto-scaling
- Check that the min/max size allows scaling

### SSH access not working

- Verify the `ec2_ssh_key` exists in the region
- Check that security group rules allow SSH from your IP
- Ensure bastion host or VPN connectivity to private subnets

## License

This module is provided as-is for use within your organization.