# Terraform AWS EKS Node Group Module
## Overview
This Terraform module provisions an Amazon EKS Node Group with managed EC2 worker nodes. It provides a production-ready configuration for running containerized workloads on Amazon EKS with support for auto-scaling, custom AMIs, SSH access, and IAM role management.
## Features
- Managed EKS Node Group with auto-scaling capabilities
- Configurable EC2 instance types and disk sizes
- Support for AL2_x86_64 and AL2_x86_64_GPU AMI types
- Optional SSH access with key pairs
- Kubernetes labels support
- IAM roles and policies for worker nodes
- Automatic attachment of required AWS managed policies
- Support for custom IAM policies
- CloudPosse naming conventions for consistent resource naming
- Conditional module enablement
## Resources Created
### Compute
- AWS EKS Node Group
- Auto Scaling Group (managed by EKS)
- EC2 Instances (managed by EKS)
### IAM
- IAM Role for worker nodes
- IAM Role Policy Attachments:
  - AmazonEKSWorkerNodePolicy
  - AmazonEKS_CNI_Policy
  - AmazonEC2ContainerRegistryReadOnly
  - Custom policies (optional)
## Usage
### Basic Example
```hcl
module "eks_node_group" {
  source = "git@github.com:webuildyourcloud/terraform-aws-eks_node_group.git?ref=tags/0.0.2"

  # Naming
  namespace  = "myorg"
  stage      = "prod"
  name       = "app"
  attributes = []

  # Cluster Configuration
  cluster_name = "my-eks-cluster"
  subnet_ids   = ["subnet-12345678", "subnet-87654321"]

  # Node Group Sizing
  desired_size = 3
  min_size     = 2
  max_size     = 5

  # Instance Configuration
  instance_types = ["t3.medium"]
  disk_size      = 20

  # Kubernetes Labels
  kubernetes_labels = {
    Environment = "production"
    Team        = "platform"
  }

  # Tags
  tags = {
    ManagedBy = "terraform"
  }
}
```
### Advanced Example with SSH Access
```hcl
module "eks_node_group" {
  source = "git@github.com:webuildyourcloud/terraform-aws-eks_node_group.git?ref=tags/0.0.2"

  namespace = "myorg"
  stage     = "prod"
  name      = "app"

  cluster_name = module.eks_cluster.cluster_id
  subnet_ids   = module.vpc.private_subnet_ids

  # Node Group Configuration
  desired_size   = 5
  min_size       = 3
  max_size       = 10
  instance_types = ["t3.large", "t3.xlarge"]
  disk_size      = 50

  # AMI Configuration
  ami_type            = "AL2_x86_64"
  ami_release_version = "1.21.5-20220123"
  kubernetes_version  = "1.21"

  # SSH Access
  ec2_ssh_key               = "my-keypair"
  source_security_group_ids = ["sg-12345678"]

  # Custom IAM Policies
  existing_workers_role_policy_arns = [
    "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
  ]
  existing_workers_role_policy_arns_count = 1

  kubernetes_labels = {
    NodeType    = "general-purpose"
    Environment = "production"
  }

  tags = {
    CostCenter = "engineering"
  }
}
```
### GPU-Enabled Node Group
```hcl
module "eks_gpu_node_group" {
  source = "git@github.com:webuildyourcloud/terraform-aws-eks_node_group.git?ref=tags/0.0.2"

  namespace = "myorg"
  stage     = "prod"
  name      = "gpu"

  cluster_name = module.eks_cluster.cluster_id
  subnet_ids   = module.vpc.private_subnet_ids

  desired_size = 2
  min_size     = 1
  max_size     = 4

  instance_types = ["g4dn.xlarge"]
  ami_type       = "AL2_x86_64_GPU"
  disk_size      = 100

  kubernetes_labels = {
    NodeType         = "gpu"
    "nvidia.com/gpu" = "true"
  }
}
```
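While the `AL2_x86_64_GPU` AMI ships with NVIDIA drivers, pods can only request the `nvidia.com/gpu` resource once the NVIDIA device plugin runs in the cluster. One way to deploy it is via the Terraform Helm provider; this is a sketch, not part of this module, and the chart version should be checked against the upstream project:

```hcl
# Sketch: deploy the NVIDIA device plugin DaemonSet so GPU nodes
# advertise the nvidia.com/gpu resource. Requires a configured
# "helm" provider pointing at the cluster; the chart version here
# is illustrative.
resource "helm_release" "nvidia_device_plugin" {
  name       = "nvidia-device-plugin"
  repository = "https://nvidia.github.io/k8s-device-plugin"
  chart      = "nvidia-device-plugin"
  namespace  = "kube-system"
}
```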
## Variables
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| namespace | Namespace (e.g., 'eg' or 'cp') | `string` | `""` | no |
| stage | Stage (e.g., 'prod', 'staging', 'dev') | `string` | `""` | no |
| name | Solution name (e.g., 'app' or 'cluster') | `string` | n/a | yes |
| delimiter | Delimiter between namespace, stage, name and attributes | `string` | `"-"` | no |
| attributes | Additional attributes | `list(string)` | `[]` | no |
| tags | Additional tags | `map(string)` | `{}` | no |
| enabled | Enable/disable module resources | `bool` | `true` | no |
| cluster_name | Name of the EKS cluster | `string` | n/a | yes |
| ec2_ssh_key | SSH key name for worker node access | `string` | `null` | no |
| desired_size | Desired number of worker nodes | `number` | n/a | yes |
| max_size | Maximum number of worker nodes | `number` | n/a | yes |
| min_size | Minimum number of worker nodes | `number` | n/a | yes |
| subnet_ids | List of subnet IDs to launch resources in | `list(string)` | n/a | yes |
| existing_workers_role_policy_arns | List of existing policy ARNs to attach | `list(string)` | `[]` | no |
| existing_workers_role_policy_arns_count | Count of existing policy ARNs | `number` | `0` | no |
| ami_type | Type of AMI (AL2_x86_64, AL2_x86_64_GPU) | `string` | `"AL2_x86_64"` | no |
| disk_size | Disk size in GiB for worker nodes | `number` | `20` | no |
| instance_types | List of instance types for EKS Node Group | `list(string)` | n/a | yes |
| kubernetes_labels | Key-value mapping of Kubernetes labels | `map(string)` | `{}` | no |
| ami_release_version | AMI version of the EKS Node Group | `string` | `null` | no |
| kubernetes_version | Kubernetes version | `string` | `null` | no |
| source_security_group_ids | Security Group IDs to allow SSH access | `list(string)` | `[]` | no |
## Outputs
| Name | Description |
|------|-------------|
| eks_node_group_role_arn | ARN of the worker nodes IAM role |
| eks_node_group_role_name | Name of the worker nodes IAM role |
| eks_node_group_id | EKS Cluster name and EKS Node Group name separated by colon |
| eks_node_group_arn | Amazon Resource Name (ARN) of the EKS Node Group |
| eks_node_group_resources | List of objects containing information about underlying resources |
| eks_node_group_status | Status of the EKS Node Group |
## Requirements
| Name | Version |
|------|---------|
| terraform | >= 0.13 |
| aws | ~> 3.27 |
| template | ~> 2.2 |
| local | ~> 2.0 |
## Dependencies
This module uses:
- [cloudposse/terraform-null-label](https://github.com/cloudposse/terraform-null-label) - Resource naming
## IAM Policies
The module automatically attaches the following AWS managed policies to the worker node IAM role:
1. **AmazonEKSWorkerNodePolicy** - Allows worker nodes to connect to EKS cluster
2. **AmazonEKS_CNI_Policy** - Provides IP address management for pods
3. **AmazonEC2ContainerRegistryReadOnly** - Allows pulling images from ECR
Additional custom policies can be attached via `existing_workers_role_policy_arns`.
## Important Notes
1. **Cluster Name**: The `cluster_name` must match an existing EKS cluster
2. **Subnets**: Node groups should typically be deployed in private subnets
3. **Instance Types**: You can specify multiple instance types for better availability
4. **SSH Access**: If `ec2_ssh_key` is specified without `source_security_group_ids`, port 22 will be open to the internet (0.0.0.0/0)
5. **Auto Scaling**: The node group can scale between `min_size` and `max_size`, but a Cluster Autoscaler (or manual action) must adjust the desired size; managed node groups do not scale on their own
6. **Kubernetes Version**: If not specified, the cluster's Kubernetes version will be used
7. **AMI Updates**: When `ami_release_version` is not specified, the latest AMI for the Kubernetes version is used
8. **Tagging**: The module automatically adds `kubernetes.io/cluster/<cluster_name> = "owned"` tag
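Since notes 2 and 8 both hinge on correct subnet tagging, here is a sketch of how the private subnets hosting the node group might be tagged (the cluster name, CIDR, and surrounding resources are illustrative):

```hcl
# Sketch: private subnet with the cluster-discovery tag EKS expects.
# "my-eks-cluster" and the subnet definition are illustrative.
resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"

  tags = {
    "kubernetes.io/cluster/my-eks-cluster" = "shared"
    "kubernetes.io/role/internal-elb"      = "1"
  }
}
```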
## Best Practices
1. **Use Multiple Instance Types**: Specify multiple instance types for better EC2 capacity availability
2. **Private Subnets**: Deploy node groups in private subnets for security
3. **Right-Sizing**: Start with conservative instance sizes and scale based on actual usage
4. **Disk Size**: Allocate sufficient disk space for container images and logs (minimum 20 GiB)
5. **Labels**: Use Kubernetes labels for node selection in pod specifications
6. **Security Groups**: Restrict SSH access to specific security groups
7. **IAM Policies**: Only attach necessary custom IAM policies
8. **Version Management**: Pin `ami_release_version` and `kubernetes_version` for consistency
## Kubernetes Integration
After node group creation, nodes automatically join the cluster. You can target specific node groups using node selectors:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  nodeSelector:
    Environment: production
    Team: platform
  containers:
    - name: my-container
      image: nginx
```
## Scaling Behavior
An EKS managed node group does not add or remove nodes on its own; it scales between `min_size` and `max_size` when something adjusts its desired size:
- The Cluster Autoscaler (if installed), which adds nodes when pods cannot be scheduled due to insufficient capacity and removes underutilized nodes
- Manual scaling via the AWS console, CLI, or API
- Indirectly, the Kubernetes Horizontal Pod Autoscaler, whose changes in pod count drive Cluster Autoscaler decisions
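Manual scaling through this module amounts to changing the sizing inputs and re-applying; a minimal sketch with illustrative values:

```hcl
# Sketch: scale the node group by editing the sizing inputs and
# running `terraform apply`. Other arguments are as in the usage
# examples above; values here are illustrative.
module "eks_node_group" {
  source = "git@github.com:webuildyourcloud/terraform-aws-eks_node_group.git?ref=tags/0.0.2"

  # ...cluster, naming, and instance arguments unchanged...

  desired_size = 4
  min_size     = 2
  max_size     = 6
}
```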
## Troubleshooting
### Nodes not joining cluster
- Verify the cluster name is correct
- Check that subnets have proper tags (`kubernetes.io/cluster/<cluster_name> = "shared"`)
- Ensure IAM role has required policies attached
- Verify security groups allow communication with cluster
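The checks above can be worked through with a few CLI commands; this is a sketch with placeholder cluster and node group names:

```shell
# Sketch: diagnostic commands for a node group that won't join.
# Replace my-eks-cluster / my-node-group with your own names.

# Inspect node group status and health issues reported by EKS
aws eks describe-nodegroup \
  --cluster-name my-eks-cluster \
  --nodegroup-name my-node-group \
  --query 'nodegroup.{status: status, health: health}'

# Confirm the subnets carry the cluster-discovery tag
aws ec2 describe-subnets \
  --filters "Name=tag-key,Values=kubernetes.io/cluster/my-eks-cluster"

# See which nodes have actually registered with the cluster
kubectl get nodes -o wide
```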
### Auto-scaling not working
- Install and configure Cluster Autoscaler
- Verify IAM permissions for auto-scaling
- Check that min/max size allows scaling
### SSH access not working
- Verify `ec2_ssh_key` exists in the region
- Check security group rules allow SSH from your IP
- Ensure bastion host or VPN connectivity to private subnets
## License
This module is provided as-is for use within your organization.