
[AWS EKS] (20) EKS Study Week 7 (EKS Auto Mode)

This post is based on materials written for season 2 of the CloudNet@ team's AEWS EKS study.

Auto Mode Hands-on Lab

  • git clone
# Get the code : note the deployment code contains no add-ons!
git clone https://github.com/aws-samples/sample-aws-eks-auto-mode.git
cd sample-aws-eks-auto-mode/terraform

# eks.tf : the "system" node pool (dedicated instances for system workloads) is not added
...
  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose"]
  }
...

 

  • Edit the Terraform variables (variables.tf)
  • region = ap-northeast-2
  • cidr = 10.20.0.0/16
variable "name" {
  description = "Name of the VPC and EKS Cluster"
  default     = "automode-cluster"
  type        = string
}

variable "region" {
  description = "region"
  default     = "ap-northeast-2" 
  type        = string
}

variable "eks_cluster_version" {
  description = "EKS Cluster version"
  default     = "1.31"
  type        = string
}

# VPC with 65536 IPs (10.20.0.0/16) for 3 AZs
variable "vpc_cidr" {
  description = "VPC CIDR. This should be a valid private (RFC 1918) CIDR range"
  default     = "10.20.0.0/16"
  type        = string
}

 

  • init, apply
# Initialize and apply Terraform
terraform init
terraform plan
terraform apply -auto-approve
...
null_resource.create_nodepools_dir: Creating...
null_resource.create_nodepools_dir: Provisioning with 'local-exec'...
null_resource.create_nodepools_dir (local-exec): Executing: ["/bin/sh" "-c" "mkdir -p ./../nodepools"]
...


# Configure kubectl
aws eks --region ap-northeast-2 update-kubeconfig --name automode-cluster
Added new context arn:aws:eks:ap-northeast-2:015609516422:cluster/automode-cluster to /Users/kpkim/.kube/config

# switch the kubectl context (a use-context sketch follows the output)
kubectl config get-contexts 
CURRENT   NAME                                                                 CLUSTER                                                              AUTHINFO                                                             NAMESPACE
          admin                                                                arn:aws:eks:ap-northeast-2:015609516422:cluster/myeks                admin                                                                
*         arn:aws:eks:ap-northeast-2:015609516422:cluster/automode-cluster     arn:aws:eks:ap-northeast-2:015609516422:cluster/automode-cluster     arn:aws:eks:ap-northeast-2:015609516422:cluster/automode-cluster     
          arn:aws:eks:ap-northeast-2:015609516422:cluster/fargate-serverless   arn:aws:eks:ap-northeast-2:015609516422:cluster/fargate-serverless   arn:aws:eks:ap-northeast-2:015609516422:cluster/fargate-serverless   
          arn:aws:iam::015609516422:user/Ted                                   arn:aws:eks:ap-northeast-2:015609516422:cluster/myeks                arn:aws:iam::015609516422:user/Ted                                   
          babo                                                                 kind-myk8s                                                           kind-myk8s                                                           
          kind-myk8s     
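
To actually switch, use-context works with any NAME from the table above; a quick sketch:
kubectl config use-context arn:aws:eks:ap-northeast-2:015609516422:cluster/automode-cluster
kubectl config current-context   # confirm the active context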

# let's find the ENIs behind the IPs below (a lookup sketch follows the output)
kubectl get svc,ep 
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   27m

NAME                   ENDPOINTS                           AGE
endpoints/kubernetes   10.20.22.204:443,10.20.40.216:443   27m
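
A sketch for that ENI lookup, assuming the AWS CLI is configured for this account and region:
# find the ENIs holding the endpoint IPs; the Description field shows who owns them
aws ec2 describe-network-interfaces \
  --filters "Name=addresses.private-ip-address,Values=10.20.22.204,10.20.40.216" \
  --query 'NetworkInterfaces[].{ENI:NetworkInterfaceId,Description:Description,IP:PrivateIpAddress}' \
  --output table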

# inspect the Terraform state
terraform state list
terraform show
terraform state show 'module.eks.aws_eks_cluster.this[0]'
...
    compute_config {
        enabled       = true
        node_pools    = [
            "general-purpose",
        ]
        node_role_arn = "arn:aws:iam::911283464785:role/automode-cluster-eks-auto-20250316042752605600000003"
    }
...
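
The same compute config can also be read straight from the EKS API; a small sketch (the computeConfig field should mirror the Terraform state above):
aws eks describe-cluster --name automode-cluster \
  --query 'cluster.computeConfig' --output json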

These are EKS-owned ENIs.

EKS Auto Mode installs no visible add-ons (the CNI, CSI driver, and load balancer controller are AWS-managed), and you cannot access the nodes directly, so diagnostics support is provided through a CRD instead:

nodediagnostics

# list the installed CRDs
kubectl get crd
NAME                                         CREATED AT
cninodes.eks.amazonaws.com                   2025-03-14T12:27:23Z
cninodes.vpcresources.k8s.aws                2025-03-14T12:23:31Z
ingressclassparams.eks.amazonaws.com         2025-03-14T12:27:23Z
nodeclaims.karpenter.sh                      2025-03-14T12:27:23Z
nodeclasses.eks.amazonaws.com                2025-03-14T12:27:23Z
nodediagnostics.eks.amazonaws.com            2025-03-14T12:27:23Z
nodepools.karpenter.sh                       2025-03-14T12:27:23Z
policyendpoints.networking.k8s.aws           2025-03-14T12:23:31Z
securitygrouppolicies.vpcresources.k8s.aws   2025-03-14T12:23:31Z
targetgroupbindings.eks.amazonaws.com        2025-03-14T12:27:23Z

kubectl api-resources | grep -i node
nodes                               no           v1                                false        Node
cninodes                            cni,cnis     eks.amazonaws.com/v1alpha1        false        CNINode
nodeclasses                                      eks.amazonaws.com/v1              false        NodeClass
nodediagnostics                                  eks.amazonaws.com/v1alpha1        false        NodeDiagnostic
nodeclaims                                       karpenter.sh/v1                   false        NodeClaim
nodepools                                        karpenter.sh/v1                   false        NodePool
runtimeclasses                                   node.k8s.io/v1                    false        RuntimeClass
csinodes                                         storage.k8s.io/v1                 false        CSINode
cninodes                            cnd          vpcresources.k8s.aws/v1alpha1     false        CNINode

# since node access is not possible, diagnostics support is provided as a CRD
kubectl explain nodediagnostics
GROUP:      eks.amazonaws.com
KIND:       NodeDiagnostic
VERSION:    v1alpha1

DESCRIPTION:
    The name of the NodeDiagnostic resource is meant to match the name of the
    node which should perform the diagnostic tasks
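
A minimal sketch of requesting a log capture with it. Assumptions: the resource name must match the node name, and logCapture.destination takes a presigned S3 upload URL (verify the exact schema with kubectl explain nodediagnostics.spec):
cat <<EOF | kubectl apply -f -
apiVersion: eks.amazonaws.com/v1alpha1
kind: NodeDiagnostic
metadata:
  name: <node-name>                           # must equal the target node's name
spec:
  logCapture:
    destination: <presigned-s3-upload-url>    # presigned PUT URL for your bucket
EOF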

# check the default NodeClass
kubectl get nodeclasses.eks.amazonaws.com
NAME      ROLE                                                   READY   AGE
default   automode-cluster-eks-auto-20250314121820950800000003   True    29m

kubectl get nodeclasses.eks.amazonaws.com -o yaml
...
  spec:
    ephemeralStorage:
      iops: 3000
      size: 80Gi
      throughput: 125
    networkPolicy: DefaultAllow
    networkPolicyEventLogs: Disabled
    role: automode-cluster-eks-auto-20250314121820950800000003
    securityGroupSelectorTerms:
    - id: sg-05d210218e5817fa1
    snatPolicy: Random # Random (the default) or Disabled
    subnetSelectorTerms:
    - id: subnet-0539269140458ced5
    - id: subnet-055dc112cdd434066
    - id: subnet-0865f60e4a6d8ad5c
  status:
    ...
    instanceProfile: eks-ap-northeast-2-automode-cluster-4905473370491687283
    securityGroups:
    - id: sg-05d210218e5817fa1
      name: eks-cluster-sg-automode-cluster-2065126657
    subnets:
    - id: subnet-0539269140458ced5
      zone: ap-northeast-2a
      zoneID: apne2-az1
    - id: subnet-055dc112cdd434066
      zone: ap-northeast-2b
      zoneID: apne2-az2
    - id: subnet-0865f60e4a6d8ad5c
      zone: ap-northeast-2c
      zoneID: apne2-az3

# check the built-in NodePool
kubectl get nodepools
NAME              NODECLASS   NODES   READY   AGE
general-purpose   default     0       True    33m

kubectl get nodepools -o yaml
...
  spec:
    disruption:
      budgets:
      - nodes: 10%
      consolidateAfter: 30s
      consolidationPolicy: WhenEmptyOrUnderutilized
    template:
      metadata: {}
      spec:
        expireAfter: 336h # 14 days
        nodeClassRef:
          group: eks.amazonaws.com
          kind: NodeClass
          name: default
        requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
          - on-demand
        - key: eks.amazonaws.com/instance-category
          operator: In
          values:
          - c
          - m
          - r
        - key: eks.amazonaws.com/instance-generation
          operator: Gt
          values:
          - "4"
        - key: kubernetes.io/arch
          operator: In
          values:
          - amd64
        - key: kubernetes.io/os
          operator: In
          values:
          - linux
        terminationGracePeriod: 24h0m0s
...
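
A quick sketch to pull just those scheduling constraints out of the built-in pool (assumes jq is installed):
# on-demand only, c/m/r families, generation > 4, amd64, linux
kubectl get nodepool general-purpose -o jsonpath='{.spec.template.spec.requirements}' | jq .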

# check the admission webhook configurations
kubectl get mutatingwebhookconfiguration
kubectl get validatingwebhookconfiguration

 

Installing kube-ops-view

Once it's deployed, an instance node gets created.

# monitoring
eks-node-viewer --node-sort=eks-node-viewer/node-cpu-usage=dsc --extra-labels eks-node-viewer/node-age
watch -d kubectl get node,pod -A

# deploy with helm
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set env.TZ="Asia/Seoul" --namespace kube-system
kubectl get events -w --sort-by '.lastTimestamp' # analyze the emitted event log

# verify
kubectl get nodeclaims
NAME                      TYPE         CAPACITY    ZONE              NODE                  READY   AGE
general-purpose-528mt     c5a.large    on-demand   ap-northeast-2c   i-09cf206aee76f0bee   True    54s

# check the OS, kernel, and container runtime
kubectl get node -owide
NAME                  STATUS   ROLES    AGE     VERSION               INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                          KERNEL-VERSION   CONTAINER-RUNTIME
i-09cf206aee76f0bee   Ready    <none>   2m14s   v1.31.4-eks-0f56d01   10.20.44.40    <none>        Bottlerocket (EKS Auto) 2025.3.9 (aws-k8s-1.31)   6.1.129          containerd://1.7.25+bottlerocket

# check the CNI nodes
kubectl get cninodes.eks.amazonaws.com   
NAME                  AGE
i-09cf206aee76f0bee   3m24s
 
 
# [new terminal] port forwarding
kubectl port-forward deployment/kube-ops-view -n kube-system 8080:8080 &

# access URLs : 1x, 1.5x, and 3x zoom respectively
echo -e "KUBE-OPS-VIEW URL = http://localhost:8080"
echo -e "KUBE-OPS-VIEW URL = http://localhost:8080/#scale=1.5"
echo -e "KUBE-OPS-VIEW URL = http://localhost:8080/#scale=3"

open "http://127.0.0.1:8080/#scale=1.5" # macOS

It runs Karpenter under the hood, too!

[Compute] Verifying Karpenter behavior

Step 1. Set up the watches

Step 2. Deploy a pod (1 replica first, then scale to 100 -> 200 -> 50)

Step 3. Scale

# Step 1: Review existing compute resources (optional)
kubectl get nodepools
general-purpose

# Step 2: Deploy a sample application to the cluster
# eks.amazonaws.com/compute-type: auto selector requires the workload be deployed on an Amazon EKS Auto Mode node.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      nodeSelector:
        eks.amazonaws.com/compute-type: auto
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
          securityContext:
            allowPrivilegeEscalation: false
EOF
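
A small check (sketch): the pod should sit Pending until Karpenter provisions a node, then flip to Running:
kubectl get pods -l app=inflate -o wide -w   # Ctrl+C to stop watching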


# Step 3: Watch Kubernetes Events + eks-node-viewer + kubeopsview
eks-node-viewer --node-sort=eks-node-viewer/node-cpu-usage=dsc --extra-labels eks-node-viewer/node-age
watch -d kubectl get node,pod -A

# scale out to 100 replicas and watch the events
kubectl scale deployment inflate --replicas 100 && kubectl get events -w --sort-by '.lastTimestamp'

# scale out to 200 replicas
kubectl scale deployment inflate --replicas 200 && kubectl get events -w --sort-by '.lastTimestamp'

# scale in to 50 replicas
kubectl scale deployment inflate --replicas 50 && kubectl get events -w --sort-by '.lastTimestamp'

# once verified, clean up
kubectl delete deployment inflate && kubectl get events -w --sort-by '.lastTimestamp'

[Networking] Deploying a Graviton workload (2048 game) with Ingress (ALB)

 

1. Create a custom NodePool

# create a custom NodePool : note that the keys differ from open-source Karpenter!
## before (karpenter.k8s.aws/instance-family) → after (eks.amazonaws.com/instance-family)
ls ../nodepools
cat ../nodepools/graviton-nodepool.yaml
kubectl apply -f ../nodepools/graviton-nodepool.yaml
---
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: graviton-nodeclass
spec:
  role: automode-cluster-eks-auto-20250314121820950800000003
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "automode-demo"
  securityGroupSelectorTerms:
    - tags:
        kubernetes.io/cluster/automode-cluster: owned
  tags:
    karpenter.sh/discovery: "automode-demo"
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: graviton-nodeclass
      requirements:
        - key: "eks.amazonaws.com/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "eks.amazonaws.com/instance-cpu"
          operator: In
          values: ["4", "8", "16", "32"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["arm64"]
      taints:
        - key: "arm64"
          value: "true"
          effect: "NoSchedule"  # Prevents non-ARM64 pods from scheduling
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s

# verify the new NodeClass and NodePool
kubectl get NodeClass
NAME                 ROLE                                                   READY   AGE
default              automode-cluster-eks-auto-20250314121820950800000003   True    64m
graviton-nodeclass   automode-cluster-eks-auto-20250314121820950800000003   True    3m4s

kubectl get NodePool
NAME                NODECLASS            NODES   READY   AGE
general-purpose     default              0       True    64m
graviton-nodepool   graviton-nodeclass   0       True    3m32s

# review the sample 2048 app manifest
ls ../examples/graviton
cat ../examples/graviton/game-2048.yaml
...
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "200m"
              memory: "256Mi"
      automountServiceAccountToken: false
      tolerations:
      - key: "arm64"
        value: "true"
        effect: "NoSchedule"
      nodeSelector:
        kubernetes.io/arch: arm64
...

kubectl apply -f ../examples/graviton/game-2048.yaml
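
A quick sketch to confirm the pods actually landed on an arm64 node:
kubectl get node -L kubernetes.io/arch   # the new node's ARCH column should read arm64
kubectl get pod -n game-2048 -o wide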

# c6g.xlarge : 4 vCPUs, 8 GiB RAM > a spot instance was selected!
kubectl get nodeclaims
NAME                      TYPE         CAPACITY   ZONE              NODE                  READY   AGE
graviton-nodepool-ngp42   c6g.xlarge   spot       ap-northeast-2b   i-0b7ca5072ebf3c969   True    9m48s

kubectl get nodeclaims -o yaml
...
  spec:
    expireAfter: 336h
    ...
kubectl get cninodes.eks.amazonaws.com
kubectl get cninodes.eks.amazonaws.com -o yaml
eks-node-viewer --resources cpu,memory
kubectl get node -owide
kubectl describe node

# check the deployment and pods
kubectl get deploy,pod -n game-2048 -owide

The instance shows as managed = true in the EC2 console.

Confirm that one EC2 instance was created! Note that you cannot log in to it. ⇒ Try a reboot ⇒ Try terminating it!

The reboot does not go through.
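
Since console access is blocked, kubectl debug is the documented way onto an Auto Mode node; a sketch, assuming kubectl 1.27+ for the --profile flag (node name from kubectl get node):
kubectl debug node/i-0b7ca5072ebf3c969 -it --profile=sysadmin \
  --image=public.ecr.aws/amazonlinux/amazonlinux:2023
# inside the debug pod, the node's root filesystem is mounted at /host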

 

2. Create the Ingress

# inspect the ingress manifest
cat ../examples/graviton/2048-ingress.yaml
...
apiVersion: eks.amazonaws.com/v1
kind: IngressClassParams
metadata:
  namespace: game-2048
  name: params
spec:
  scheme: internet-facing

---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  namespace: game-2048
  labels:
    app.kubernetes.io/name: LoadBalancerController
  name: alb
spec:
  controller: eks.amazonaws.com/alb
  parameters:
    apiGroup: eks.amazonaws.com
    kind: IngressClassParams
    name: params

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-2048
                port:
                  number: 80

kubectl apply -f ../examples/graviton/2048-ingress.yaml

# verify the created resources
kubectl get ingressclass,ingressclassparams,ingress,svc,ep -n game-2048
NAME                                 CONTROLLER              PARAMETERS                                    AGE
ingressclass.networking.k8s.io/alb   eks.amazonaws.com/alb   IngressClassParams.eks.amazonaws.com/params   25s

NAME                                          GROUP-NAME   SCHEME            IP-ADDRESS-TYPE   AGE
ingressclassparams.eks.amazonaws.com/params                internet-facing                     25s

NAME                                     CLASS   HOSTS   ADDRESS                                                                       PORTS   AGE
ingress.networking.k8s.io/ingress-2048   alb     *       k8s-game2048-ingress2-db993ba6ac-210498627.ap-northeast-2.elb.amazonaws.com   80      25s

NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/service-2048   NodePort   172.20.95.201   <none>        80:30218/TCP   25s

NAME                     ENDPOINTS       AGE
endpoints/service-2048   10.20.8.48:80   25s

 

3. Configure security groups between the ALB and the EKS cluster

# Get security group IDs 
ALB_SG=$(aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[?contains(DNSName, `game2048`)].SecurityGroups[0]' \
  --output text)

EKS_SG=$(aws eks describe-cluster \
  --name automode-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
  --output text)

echo $ALB_SG $EKS_SG # first check these security groups' rules in the management console

# Allow ALB to communicate with EKS cluster : when tearing down the lab, remove only this rule added to $EKS_SG beforehand.
aws ec2 authorize-security-group-ingress \
  --group-id $EKS_SG \
  --source-group $ALB_SG \
  --protocol tcp \
  --port 80
 
# open the web address below over HTTP!
kubectl get ingress ingress-2048 \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' \
  -n game-2048
  
# k8s-game2048-ingress2-db993ba6ac-210498627.ap-northeast-2.elb.amazonaws.com
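
A quick sketch to confirm the ALB answers before opening a browser (provisioning can take a minute or two):
curl -s -o /dev/null -w "%{http_code}\n" \
  "http://$(kubectl get ingress ingress-2048 -n game-2048 -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"   # expect 200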

 

4. Cleanup

# Remove application components
kubectl delete ingress -n game-2048 ingress-2048 # remove the rule added to $EKS_SG first!!! (revoke sketch below)
kubectl delete svc -n game-2048 service-2048
kubectl delete deploy -n game-2048 deployment-2048
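
As the comment above says, revoke the rule that was added to $EKS_SG; a sketch mirroring the earlier authorize call (assumes $ALB_SG and $EKS_SG are still set in this shell):
aws ec2 revoke-security-group-ingress \
  --group-id $EKS_SG \
  --source-group $ALB_SG \
  --protocol tcp \
  --port 80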

# remove the Graviton node pool only after the nodes it created are gone
kubectl delete -f ../nodepools/graviton-nodepool.yaml

# cleanup
helm uninstall -n kube-system kube-ops-view
# confirm the node (instance) is deleted, then run the terraform destroy below
terraform destroy --auto-approve
rm -rf ~/.kube/config