# Terraform AWS EKS Node Group Module

## Overview

This Terraform module provisions an Amazon EKS Node Group with managed EC2 worker nodes. It provides a production-ready configuration for running containerized workloads on Amazon EKS with support for auto-scaling, custom AMIs, SSH access, and IAM role management.

## Features

- Managed EKS Node Group with auto-scaling capabilities
- Configurable EC2 instance types and disk sizes
- Support for AL2_x86_64 and AL2_x86_64_GPU AMI types
- Optional SSH access with key pairs
- Kubernetes labels support
- IAM roles and policies for worker nodes
- Automatic attachment of required AWS managed policies
- Support for custom IAM policies
- CloudPosse naming conventions for consistent resource naming
- Conditional module enablement

## Resources Created

### Compute

- AWS EKS Node Group
- Auto Scaling Group (managed by EKS)
- EC2 Instances (managed by EKS)

### IAM

- IAM Role for worker nodes
- IAM Role Policy Attachments:
  - AmazonEKSWorkerNodePolicy
  - AmazonEKS_CNI_Policy
  - AmazonEC2ContainerRegistryReadOnly
  - Custom policies (optional)

## Usage

### Basic Example

```hcl
module "eks_node_group" {
  source = "git@github.com:webuildyourcloud/terraform-aws-eks_node_group.git?ref=tags/0.0.2"

  # Naming
  namespace  = "myorg"
  stage      = "prod"
  name       = "app"
  attributes = []

  # Cluster Configuration
  cluster_name = "my-eks-cluster"
  subnet_ids   = ["subnet-12345678", "subnet-87654321"]

  # Node Group Sizing
  desired_size = 3
  min_size     = 2
  max_size     = 5

  # Instance Configuration
  instance_types = ["t3.medium"]
  disk_size      = 20

  # Kubernetes Labels
  kubernetes_labels = {
    Environment = "production"
    Team        = "platform"
  }

  # Tags
  tags = {
    ManagedBy = "terraform"
  }
}
```

### Advanced Example with SSH Access

```hcl
module "eks_node_group" {
  source = "git@github.com:webuildyourcloud/terraform-aws-eks_node_group.git?ref=tags/0.0.2"

  namespace = "myorg"
  stage     = "prod"
  name      = "app"

  cluster_name = module.eks_cluster.cluster_id
  subnet_ids   = module.vpc.private_subnet_ids

  # Node Group Configuration
  desired_size   = 5
  min_size       = 3
  max_size       = 10
  instance_types = ["t3.large", "t3.xlarge"]
  disk_size      = 50

  # AMI Configuration
  ami_type            = "AL2_x86_64"
  ami_release_version = "1.21.5-20220123"
  kubernetes_version  = "1.21"

  # SSH Access
  ec2_ssh_key               = "my-keypair"
  source_security_group_ids = ["sg-12345678"]

  # Custom IAM Policies
  existing_workers_role_policy_arns = [
    "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
  ]
  existing_workers_role_policy_arns_count = 1

  kubernetes_labels = {
    NodeType    = "general-purpose"
    Environment = "production"
  }

  tags = {
    CostCenter = "engineering"
  }
}
```

### GPU-Enabled Node Group

```hcl
module "eks_gpu_node_group" {
  source = "git@github.com:webuildyourcloud/terraform-aws-eks_node_group.git?ref=tags/0.0.2"

  namespace = "myorg"
  stage     = "prod"
  name      = "gpu"

  cluster_name = module.eks_cluster.cluster_id
  subnet_ids   = module.vpc.private_subnet_ids

  desired_size   = 2
  min_size       = 1
  max_size       = 4
  instance_types = ["g4dn.xlarge"]
  ami_type       = "AL2_x86_64_GPU"
  disk_size      = 100

  kubernetes_labels = {
    NodeType         = "gpu"
    "nvidia.com/gpu" = "true"
  }
}
```

## Variables

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| namespace | Namespace (e.g., 'eg' or 'cp') | `string` | `""` | no |
| stage | Stage (e.g., 'prod', 'staging', 'dev') | `string` | `""` | no |
| name | Solution name (e.g., 'app' or 'cluster') | `string` | n/a | yes |
| delimiter | Delimiter between namespace, stage, name and attributes | `string` | `"-"` | no |
| attributes | Additional attributes | `list(string)` | `[]` | no |
| tags | Additional tags | `map(string)` | `{}` | no |
| enabled | Enable/disable module resources | `bool` | `true` | no |
| cluster_name | Name of the EKS cluster | `string` | n/a | yes |
| ec2_ssh_key | SSH key name for worker node access | `string` | `null` | no |
| desired_size | Desired number of worker nodes | `number` | n/a | yes |
| max_size | Maximum number of worker nodes | `number` | n/a | yes |
| min_size | Minimum number of worker nodes | `number` | n/a | yes |
| subnet_ids | List of subnet IDs to launch resources in | `list(string)` | n/a | yes |
| existing_workers_role_policy_arns | List of existing policy ARNs to attach | `list(string)` | `[]` | no |
| existing_workers_role_policy_arns_count | Count of existing policy ARNs | `number` | `0` | no |
| ami_type | Type of AMI (AL2_x86_64, AL2_x86_64_GPU) | `string` | `"AL2_x86_64"` | no |
| disk_size | Disk size in GiB for worker nodes | `number` | `20` | no |
| instance_types | List of instance types for the EKS Node Group | `list(string)` | n/a | yes |
| kubernetes_labels | Key-value mapping of Kubernetes labels | `map(string)` | `{}` | no |
| ami_release_version | AMI version of the EKS Node Group | `string` | `null` | no |
| kubernetes_version | Kubernetes version | `string` | `null` | no |
| source_security_group_ids | Security Group IDs to allow SSH access | `list(string)` | `[]` | no |

## Outputs

| Name | Description |
|------|-------------|
| eks_node_group_role_arn | ARN of the worker nodes IAM role |
| eks_node_group_role_name | Name of the worker nodes IAM role |
| eks_node_group_id | EKS Cluster name and EKS Node Group name separated by a colon |
| eks_node_group_arn | Amazon Resource Name (ARN) of the EKS Node Group |
| eks_node_group_resources | List of objects containing information about underlying resources |
| eks_node_group_status | Status of the EKS Node Group |
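
These outputs can feed follow-on resources in the calling configuration. As a sketch (the SSM policy choice is illustrative, and `module.eks_node_group` refers to the basic example above), an extra managed policy could be attached to the worker role:

```hcl
# Illustrative: attach an additional AWS managed policy to the
# worker node role exposed by this module's outputs.
resource "aws_iam_role_policy_attachment" "ssm" {
  role       = module.eks_node_group.eks_node_group_role_name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
```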

## Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.13 |
| aws | ~> 3.27 |
| template | ~> 2.2 |
| local | ~> 2.0 |
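
Mirroring these constraints in the calling configuration might look like this (a sketch; only the `aws` provider is shown):

```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.27"
    }
  }
}
```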

## Dependencies

This module uses:

- [cloudposse/terraform-null-label](https://github.com/cloudposse/terraform-null-label) - Resource naming

## IAM Policies

The module automatically attaches the following AWS managed policies to the worker node IAM role:

1. **AmazonEKSWorkerNodePolicy** - Allows worker nodes to connect to the EKS cluster
2. **AmazonEKS_CNI_Policy** - Provides IP address management for pods
3. **AmazonEC2ContainerRegistryReadOnly** - Allows pulling images from ECR

Additional custom policies can be attached via `existing_workers_role_policy_arns`.

## Important Notes

1. **Cluster Name**: The `cluster_name` must match an existing EKS cluster
2. **Subnets**: Node groups should typically be deployed in private subnets
3. **Instance Types**: You can specify multiple instance types for better availability
4. **SSH Access**: If `ec2_ssh_key` is specified without `source_security_group_ids`, port 22 will be open to the internet (0.0.0.0/0)
5. **Auto Scaling**: The node group will automatically scale between `min_size` and `max_size` based on pod scheduling needs
6. **Kubernetes Version**: If not specified, the cluster's Kubernetes version will be used
7. **AMI Updates**: When `ami_release_version` is not specified, the latest AMI for the Kubernetes version is used
8. **Tagging**: The module automatically adds the `kubernetes.io/cluster/<cluster_name> = "owned"` tag

## Best Practices

1. **Use Multiple Instance Types**: Specify multiple instance types for better EC2 capacity availability
2. **Private Subnets**: Deploy node groups in private subnets for security
3. **Right-Sizing**: Start with conservative instance sizes and scale based on actual usage
4. **Disk Size**: Allocate sufficient disk space for container images and logs (minimum 20 GiB)
5. **Labels**: Use Kubernetes labels for node selection in pod specifications
6. **Security Groups**: Restrict SSH access to specific security groups
7. **IAM Policies**: Only attach necessary custom IAM policies
8. **Version Management**: Pin `ami_release_version` and `kubernetes_version` for consistency

## Kubernetes Integration

After node group creation, nodes automatically join the cluster. You can target specific node groups using node selectors:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  nodeSelector:
    Environment: production
    Team: platform
  containers:
    - name: my-container
      image: nginx
```

## Scaling Behavior

The node group will automatically scale based on:

- Pod resource requests that cannot be scheduled
- Cluster Autoscaler policies (if installed)
- Manual scaling via the AWS console or API
- Kubernetes Horizontal Pod Autoscaler demands

## Troubleshooting

### Nodes not joining cluster

- Verify the cluster name is correct
- Check that subnets have the proper tags (`kubernetes.io/cluster/<cluster_name> = "shared"`)
- Ensure the IAM role has the required policies attached
- Verify security groups allow communication with the cluster

### Auto-scaling not working

- Install and configure the Cluster Autoscaler
- Verify IAM permissions for auto-scaling
- Check that the min/max size allows scaling

### SSH access not working

- Verify the `ec2_ssh_key` key pair exists in the region
- Check that security group rules allow SSH from your IP
- Ensure bastion host or VPN connectivity to private subnets

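The subnet tagging mentioned above can be expressed directly in Terraform. A minimal sketch (VPC ID, CIDR, and cluster name are all illustrative):

```hcl
# Illustrative: a private subnet tagged so that the EKS cluster
# named "my-eks-cluster" can place worker nodes in it.
resource "aws_subnet" "private" {
  vpc_id     = "vpc-12345678"
  cidr_block = "10.0.1.0/24"

  tags = {
    "kubernetes.io/cluster/my-eks-cluster" = "shared"
  }
}
```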
## License

This module is provided as-is for use within your organization.