[AWS EKS] (25) EKS Study Week 8 (Blue-Green Upgrade)

This post is based on material from the CloudNet@ team's EKS study group (AEWS, 2nd cohort).

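The in-place upgrade strategy tested in the previous post has clear drawbacks.

In-place upgrade strategy:

Components are updated sequentially, so brief service interruptions occur.
Rollback is not possible if an issue occurs.

Blue-Green upgrade strategy: create a new EKS cluster (add-ons, apps) and gradually shift traffic from the old cluster to the new one.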
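1. Create a new EKS cluster (the Green cluster) with the desired Kubernetes version and configuration.
2. Deploy applications, add-ons, and configuration to the new cluster.
3. Test and validate thoroughly to confirm that the new cluster works as expected.
4. Gradually shift traffic from the old cluster (Blue) to the new cluster (Green) using techniques such as DNS updates, load balancer configuration, or a service mesh.
5. Monitor the new cluster closely to confirm that it handles the traffic and performs as expected.
6. Once all traffic has moved to the new cluster, decommission the old cluster.
Advantages of blue-green upgrades
1. The new cluster can be tested thoroughly before it receives production traffic, which makes the upgrade process more controlled and safer.
2. Several Kubernetes versions can be skipped in a single upgrade, which reduces the overall upgrade time and effort.
3. If a problem occurs, traffic can be switched back to the old cluster, which gives a fast and simple rollback mechanism.
4. The old cluster keeps serving traffic until the new cluster is fully validated, so downtime during the upgrade is minimized.
Disadvantages of blue-green upgrades
1. Two clusters must be maintained at the same time, so the upgrade requires additional infrastructure resources and cost.
2. Shifting traffic between clusters requires more complex coordination and management.
3. External integrations such as CI/CD pipelines, monitoring systems, and access control must be updated to point to the new cluster.
4. It can be difficult for stateful applications that need data migration or synchronization between clusters.
Considerations for stateful workloads
- When performing a blue-green upgrade for stateful workloads, plan and execute the data migration and synchronization process carefully. This is where the blue-green strategy can be at a disadvantage.
- Use a tool such as Velero to migrate persistent data, keep data synchronized between clusters, and enable fast rollback.
- Configure persistent volume provisioning in the new cluster so that it matches the storage classes or provisioners of the old cluster.
- Consult the application documentation and work with the application owners to understand the specific requirements for data migration and synchronization.
- Careful planning and proper use of tools such as Velero are key to minimizing risk and keeping the upgrade of stateful applications smooth.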

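Blue-Green Cluster Upgrades: about 30 minutes in total (estimated), including the lab

A Blue-Green EKS cluster upgrade consists of the following steps:

Launch a new EKS cluster running the latest or desired Kubernetes version
Deploy compatible Kubernetes add-ons and custom controllers as needed
Deploy the workloads to the new cluster (applying updates for deprecated APIs as needed)
Route traffic from the blue cluster to the green cluster

Where needed, update the affected resources by removing deprecated APIs and updating the Kubernetes manifests.

The cluster can be upgraded by following the steps above in order.

These steps are best run end to end in a test environment first, so that problems with the cluster configuration and the application manifests surface before the production cluster is upgraded.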

1. versions.tf

terraform {
  required_version = ">= 1.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.34"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.9"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.20"
    }
  }

  # ##  Used for end-to-end testing on project; update to suit your needs
  # backend "s3" {
  #   bucket = "terraform-ssp-github-actions-state"
  #   region = "us-west-2"
  #   key    = "e2e/karpenter/terraform.tfstate"
  # }
}

2. variables.tf

variable "cluster_version" {
  description = "EKS cluster version."
  type        = string
  default     = "1.30"
}

variable "mng_cluster_version" {
  description = "EKS cluster mng version."
  type        = string
  default     = "1.30"
}

variable "ami_id" {
  description = "EKS AMI ID for node groups"
  type        = string
  default     = ""
}

variable "efs_id" {
  description = "The ID of the already provisioned EFS Filesystem"
  type        = string
}

3. base.tf

provider "aws" {
  region = local.region
}

# Required for public ECR where Karpenter artifacts are hosted
provider "aws" {
  region = "us-east-1"
  alias  = "virginia"
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      # This requires the awscli to be installed locally where Terraform is executed
      args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}

data "aws_partition" "current" {}
data "aws_caller_identity" "current" {}

data "aws_ecrpublic_authorization_token" "token" {
  provider = aws.virginia
}

# Data source to reference the existing VPC
data "aws_vpc" "existing_vpc" {
  filter {
    name   = "tag:Name"
    values = [local.name]
  }
}

data "aws_subnets" "existing_private_subnets" {
  filter {
    name   = "vpc-id"
    values = [ data.aws_vpc.existing_vpc.id ]
  }

  filter {
    name   = "tag:karpenter.sh/discovery"
    values = [local.name]
  }
}

data "aws_availability_zones" "available" {}

# tflint-ignore: terraform_unused_declarations
variable "eks_cluster_id" {
  description = "EKS cluster name"
  type        = string
}
variable "aws_region" {
  description = "AWS Region"
  type        = string
}

locals {
  name   = var.eks_cluster_id
  region = var.aws_region

  #vpc_cidr = "10.0.0.0/16"
  #azs      = slice(data.aws_availability_zones.available.names, 0, 3)

  tags = {
    Blueprint  = "${local.name}-gr"
    GithubRepo = "github.com/aws-ia/terraform-aws-eks-blueprints"
  }
}

################################################################################
# Cluster
################################################################################

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.14"

  cluster_name                   = "${local.name}-gr"
  cluster_version                = "1.30"
  cluster_endpoint_public_access = true

  vpc_id     = data.aws_vpc.existing_vpc.id
  subnet_ids = data.aws_subnets.existing_private_subnets.ids

  enable_cluster_creator_admin_permissions = true

  eks_managed_node_group_defaults = {
    cluster_version = var.mng_cluster_version
  }

  eks_managed_node_groups = {
    initial = {
      instance_types = ["m5.large", "m6a.large", "m6i.large"]
      min_size     = 2
      max_size     = 10
      desired_size = 2
    }
  }


  # For demonstrating node-termination-handler
  self_managed_node_groups = {
    default-selfmng = {
      instance_type = "m5.large"
      
      min_size     = 2
      max_size     = 4
      desired_size = 2

      # Additional configurations
      subnet_ids       = data.aws_subnets.existing_private_subnets.ids
      disk_size        = 100

      # Optional
      bootstrap_extra_args = "--kubelet-extra-args '--node-labels=node.kubernetes.io/lifecycle=self-managed,team=carts,type=OrdersMNG'"
      
      # Required for self-managed node groups
      create_launch_template = true
      launch_template_use_name_prefix = true
    }
  }
  
  tags = merge(local.tags, {
    # NOTE - if creating multiple security groups with this module, only tag the
    # security group that Karpenter should utilize with the following tag
    # (i.e. - at most, only one security group should have this tag in your account)
    "karpenter.sh/discovery" = "${local.name}-gr"
  })
}

resource "time_sleep" "wait_60_seconds" {
  create_duration = "60s"

  depends_on = [module.eks]
}

4. addons.tf

################################################################################
# EKS Blueprints Addons
################################################################################

module "eks_blueprints_addons" {
  depends_on = [ time_sleep.wait_60_seconds ]
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "~> 1.16"

  cluster_name      = module.eks.cluster_name
  cluster_endpoint  = module.eks.cluster_endpoint
  cluster_version   = module.eks.cluster_version
  oidc_provider_arn = module.eks.oidc_provider_arn

  # We want to wait for the Fargate profiles to be deployed first
  create_delay_dependencies = [for prof in module.eks.fargate_profiles : prof.fargate_profile_arn]

  eks_addons = {
    coredns = {
      addon_version = "v1.11.3-eksbuild.1"
    }
    vpc-cni    = {
      most_recent = true
    }
    kube-proxy = {
      addon_version = "v1.30.3-eksbuild.2"
    }
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
  }

  enable_karpenter = true
  enable_aws_efs_csi_driver = true
  enable_argocd = true
  enable_aws_load_balancer_controller = true
  enable_metrics_server = true

  argocd = {
    set = [
      {
        name = "server.service.type"
        value = "LoadBalancer"
      }
    ]
    wait = true
  }


  aws_load_balancer_controller = {
    set = [
      {
        name  = "vpcId"
        value = data.aws_vpc.existing_vpc.id
      },
      {
        name  = "region"
        value = local.region
      },
      {
        name  = "podDisruptionBudget.maxUnavailable"
        value = 1
      },
      {
        name  = "enableServiceMutatorWebhook"
        value = "false"
      }
    ]
    wait = true
  }

  karpenter_node = {
    # Use static name so that it matches what is defined in `karpenter.yaml` example manifest
    iam_role_use_name_prefix = false
  }

  tags = local.tags
}

resource "time_sleep" "wait_90_seconds" {
  create_duration = "90s"

  depends_on = [module.eks_blueprints_addons]
}

resource "aws_eks_access_entry" "karpenter_node_access_entry" {
  cluster_name      = module.eks.cluster_name
  principal_arn     = module.eks_blueprints_addons.karpenter.node_iam_role_arn
  # kubernetes_groups = []
  type              = "EC2_LINUX"

  lifecycle {
    ignore_changes = [
      user_name
    ]
  }
}

################################################################################
# Supporting Resources
################################################################################
module "ebs_csi_driver_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.20"

  role_name_prefix = "${module.eks.cluster_name}-ebs-csi-driver-"

  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }

  tags = local.tags
}

################################################################################
# Storage Classes
################################################################################

resource "kubernetes_annotations" "gp2" {
  api_version = "storage.k8s.io/v1"
  kind        = "StorageClass"
  # This is true because the resource was already created by the ebs-csi-driver addon
  force = "true"

  metadata {
    name = "gp2"
  }

  annotations = {
    # Modify the annotation so gp2 is no longer the default storage class while the class itself is retained
    "storageclass.kubernetes.io/is-default-class" = "false"
  }

  depends_on = [
    module.eks_blueprints_addons
  ]
}

resource "kubernetes_storage_class_v1" "gp3" {
  metadata {
    name = "gp3"

    annotations = {
      # Annotation to set gp3 as default storage class
      "storageclass.kubernetes.io/is-default-class" = "true"
    }
  }

  storage_provisioner    = "ebs.csi.aws.com"
  allow_volume_expansion = true
  reclaim_policy         = "Delete"
  volume_binding_mode    = "WaitForFirstConsumer"

  parameters = {
    encrypted = true
    fsType    = "ext4"
    type      = "gp3"
  }

  depends_on = [
    module.eks_blueprints_addons
  ]
}

# done: update parameters
resource "kubernetes_storage_class_v1" "efs" {
  metadata {
    name = "efs"
  }

  storage_provisioner = "efs.csi.aws.com"
  reclaim_policy      = "Delete"
  parameters = {
    provisioningMode = "efs-ap"
    fileSystemId     = var.efs_id
    directoryPerms   = "755"
    gidRangeStart    = "1000" # optional
    gidRangeEnd      = "2000" # optional
    basePath         = "/dynamic_provisioning" # optional
    subPathPattern   = "$${.PVC.namespace}/$${.PVC.name}" # optional
    ensureUniqueDirectory = "false"    # optional
    reuseAccessPoint = "false"         # optional
  }

  mount_options = [
    "iam"
  ]

  depends_on = [
    module.eks_blueprints_addons
  ]
}

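Overview - Creating the new Green EKS cluster

In this lab section we create a new EKS cluster running v1.30 with the desired configuration. With the blue-green strategy you can either jump several Kubernetes versions at once or perform several upgrades one at a time. The new EKS cluster is created in the same VPC as the existing cluster, which brings several benefits:

Network connectivity: keeping both clusters in the same VPC allows resources to communicate directly, which makes it easier to migrate workloads and data.
Shared resources: existing VPC resources such as NAT gateways, VPN connections, and Direct Connect can be reused, which reduces complexity and cost.
Security groups: consistent security group rules can be kept across both clusters, which simplifies security management.
Service discovery: with AWS Cloud Map or a similar service discovery mechanism, services can find each other across clusters more easily.
Subnet utilization: existing subnets can be used efficiently, so no new network ranges need to be provisioned.
VPC peering: if the VPC is peered with other VPCs, those connections remain valid for both clusters.
Consistent DNS: using the same VPC allows private DNS zones and Route 53 configuration to be used consistently.
IAM and resource policies: some IAM roles and resource policies are scoped to a specific VPC, so reusing the same VPC simplifies permission management.

Lab - Creating the new Green EKS cluster: about 20 minutes (estimated), including the lab

1. Deploy the Green cluster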

#
watch -d 'aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" --query "Reservations[*].Instances[*].[InstanceId, InstanceType, PublicIpAddress]" --output table'

-------------------------------------------------------
|                  DescribeInstances                  |
+----------------------+------------+-----------------+
|  i-04cc219772416d7bd |  m5.large  |  None           |
|  i-026216a29706ed2f1 |  m5.large  |  None           |
|  i-0259f87750935e615 |  t3.medium |  54.213.247.50  |
|  i-0ff475868205d0ca6 |  m5.large  |  None           |
|  i-0bf65899efc9e232c |  c5.large  |  None           |
|  i-0ab00c4551fa43dd5 |  m5.large  |  None           |
|  i-0c1b2898f9832a69a |  m6i.large |  None           |
|  i-00879efc482bd490f |  m5.large  |  None           |
+----------------------+------------+-----------------+

#
cd ~/environment
unzip eksgreen.zip

tree eksgreen-terraform/
eksgreen-terraform/
├── README.md
├── addons.tf
├── base.tf
├── variables.tf
└── versions.tf

#
cd eksgreen-terraform
terraform init
terraform plan -var efs_id=$EFS_ID
terraform apply -var efs_id=$EFS_ID -auto-approve

2. Update the kubectl context (using aliases)

#
aws eks --region ${AWS_REGION} update-kubeconfig --name ${EKS_CLUSTER_NAME} --alias blue && \
  kubectl config use-context blue

aws eks --region ${AWS_REGION} update-kubeconfig --name ${EKS_CLUSTER_NAME}-gr --alias green && \
  kubectl config use-context green

#
cat ~/.kube/config
kubectl ctx
kubectl ctx green

# Verify the EC2 worker nodes are attached to the cluster
kubectl get nodes --context green
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-20-212.us-west-2.compute.internal   Ready    <none>   21m   v1.30.9-eks-5d632ec
ip-10-0-21-169.us-west-2.compute.internal   Ready    <none>   21m   v1.30.9-eks-5d632ec
ip-10-0-35-26.us-west-2.compute.internal    Ready    <none>   21m   v1.30.9-eks-5d632ec
ip-10-0-9-126.us-west-2.compute.internal    Ready    <none>   21m   v1.30.9-eks-5d632ec

# Verify the operational Addons on the cluster:
helm list -A --kube-context green
NAME                            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                                APP VERSION
argo-cd                         argocd          1               2025-03-26 11:58:02.453602353 +0000 UTC deployed        argo-cd-5.55.0                       v2.10.0    
aws-efs-csi-driver              kube-system     1               2025-03-26 11:58:30.395245866 +0000 UTC deployed        aws-efs-csi-driver-2.5.6             1.7.6      
aws-load-balancer-controller    kube-system     1               2025-03-26 11:58:31.887699595 +0000 UTC deployed        aws-load-balancer-controller-1.7.1   v2.7.1     
karpenter                       karpenter       1               2025-03-26 11:58:31.926407743 +0000 UTC deployed        karpenter-0.37.0                     0.37.0     
metrics-server                  kube-system     1               2025-03-26 11:58:02.447165223 +0000 UTC deployed        metrics-server-3.12.0                0.7.0

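Stateless Workload Migration - ArgoCD, UI HPA fix (on blue, edit it directly)

Stateless applications do not need to keep persistent data in the cluster, so during the upgrade you only have to deploy them to the new green cluster and route traffic to them.
In this workshop, the source of the retail store app was already created in the AWS CodeCommit repository eks-gitops-repo and deployed to the Blue cluster with ArgoCD. Having this source available is essential for a Blue-Green cluster upgrade: to deploy the applications, we only need to bootstrap the new cluster from eks-gitops-repo.
Before bootstrapping, the applications must be changed so that they are compatible with Kubernetes 1.30. As described in the cluster upgrade preparation module, tools such as EKS Upgrade Insights and kubent can be used to find and remediate deprecated API usage.

Lab - Stateless Workload Migration

1. Start the bootstrapping process:

To isolate the changes required for the green cluster, we keep the modifications on a new git branch and bootstrap the green cluster from that branch.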

#
cd ~/environment/eks-gitops-repo
git status
git branch
* main

# Create the new local branch green
git switch -c green
git branch -a
* green
  main
  remotes/origin/HEAD -> origin/main
  remotes/origin/main

Now update the relevant Kubernetes manifests for 1.30. For example, the Karpenter EC2NodeClass references the 1.26 Amazon Linux 2 AMI and the IAM role and security group of the blue cluster. Update it to use the 1.30 Amazon Linux 2023 AMI and the security group and IAM role of the green cluster. Fetch the AL2023 AMI with the following command:

export AL2023_130_AMI=$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.30/amazon-linux-2023/x86_64/standard/recommended/image_id --region ${AWS_REGION} --query "Parameter.Value" --output text)
echo $AL2023_130_AMI
ami-08eb2eb81143e2902
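
If the SSM lookup above fails (for example because AWS_REGION is not set), AL2023_130_AMI ends up empty and the heredoc in the next step would write an empty AMI ID into the manifest without any error. A minimal guard, assuming the usual AMI ID shape of ami- followed by 8 or 17 hexadecimal characters; the function name is_ami_id is hypothetical and not part of the workshop:

```shell
# Hypothetical helper (not part of the workshop): succeeds only when the
# argument looks like an AMI ID, i.e. "ami-" plus 8 or 17 hex characters.
is_ami_id() {
  printf '%s\n' "$1" | grep -Eq '^ami-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Check the value exported above; an unset variable is treated as empty.
if is_ami_id "${AL2023_130_AMI:-}"; then
  echo "AMI ID looks valid: ${AL2023_130_AMI}"
else
  echo "AL2023_130_AMI is empty or malformed; re-run the SSM lookup" >&2
fi
```

The check only validates the format; it does not confirm that the AMI exists in the target region.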

Update the AMI ID, security group, IAM role, and other details in default-ec2nc.yaml.

cat << EOF > ~/environment/eks-gitops-repo/apps/karpenter/default-ec2nc.yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  amiSelectorTerms:
  - id: "${AL2023_130_AMI}" # Latest EKS 1.30 AMI
  role: karpenter-eksworkshop-eksctl-gr
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: eksworkshop-eksctl-gr
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: eksworkshop-eksctl
  tags:
    intent: apps
    managed-by: karpenter
    team: checkout
EOF

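Now check for API usage that is deprecated as of 1.30.

The Pluto utility is pre-installed so that deprecated API usage can be found in the GitOps repository.

# If the HPA has already been fixed, nothing is reported; otherwise Pluto detects it from the manifest files as shown below.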
pluto detect-files -d ~/environment/eks-gitops-repo/
NAME   KIND                      VERSION               REPLACEMENT      REMOVED   DEPRECATED   REPL AVAIL  
ui     HorizontalPodAutoscaler   autoscaling/v2beta2   autoscaling/v2   false     true         false       

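# Convert the HPA manifest to autoscaling/v2 (kubectl convert is provided by the separate kubectl-convert plugin)
kubectl convert -f apps/ui/hpa.yaml --output-version autoscaling/v2 -o yaml > apps/ui/tmp.yaml && mv apps/ui/tmp.yaml apps/ui/hpa.yaml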

#
cat apps/ui/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ui
  namespace: ui
spec:
  minReplicas: 1
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ui
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 80
        type: Utilization
    type: Resource

Finally, point the ArgoCD apps application at the green branch.

#
cat app-of-apps/values.yaml 
spec:
  destination:
    # HIGHLIGHT
    server: https://kubernetes.default.svc
  source:
    # HIGHLIGHT
    repoURL: https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo
    # HIGHLIGHT
    targetRevision: main

# HIGHLIGHT
applications:
  - name: assets
  - name: carts
  - name: catalog
  - name: checkout
  - name: orders
  - name: other
  - name: rabbitmq
  - name: ui
  - name: karpenter

#
sed -i 's/targetRevision: main/targetRevision: green/' app-of-apps/values.yaml

# Commit the change to green branch and push it to the CodeCommit repo.
git add .  && git commit -m "1.30 changes"
git push -u origin green

ArgoCD login

# Login to ArgoCD using credentials from the following commands:
export ARGOCD_SERVER_GR=$(kubectl get svc argo-cd-argocd-server -n argocd -o json --context green | jq --raw-output '.status.loadBalancer.ingress[0].hostname')
echo "ArgoCD URL: http://${ARGOCD_SERVER_GR}"
export ARGOCD_USER_GR="admin"
export ARGOCD_PWD_GR=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" --context green | base64 -d)
echo "Username: ${ARGOCD_USER_GR}"
echo "Password: ${ARGOCD_PWD_GR}"

# Alternatively you can login using ArgoCD CLI:
argocd login --name green ${ARGOCD_SERVER_GR} --username ${ARGOCD_USER_GR} --password ${ARGOCD_PWD_GR} --insecure --skip-test-tls --grpc-web
'admin:login' logged in successfully
Context 'green' updated

#
argo_creds=$(aws secretsmanager get-secret-value --secret-id argocd-user-creds --query SecretString --output text)

argocd repo add $(echo $argo_creds | jq -r .url) --username $(echo $argo_creds | jq -r .username) --password $(echo $argo_creds | jq -r .password) --server ${ARGOCD_SERVER_GR}
Repository 'https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo' added

Deploy the retail-store-sample application components on the green cluster. We use the ArgoCD App of Apps pattern to deploy all components of retail-store-sample.

#
argocd app create apps --repo $(echo $argo_creds | jq -r .url) --path app-of-apps \
  --dest-server https://kubernetes.default.svc --sync-policy automated --revision green --server ${ARGOCD_SERVER_GR}

#
argocd app list --server ${ARGOCD_SERVER_GR}
NAME              CLUSTER                         NAMESPACE  PROJECT  STATUS  HEALTH       SYNCPOLICY  CONDITIONS  REPO                                                                     PATH            TARGET
argocd/apps       https://kubernetes.default.svc             default  Synced  Healthy      Auto        <none>      https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo  app-of-apps     green
argocd/assets     https://kubernetes.default.svc             default  Synced  Progressing  Auto-Prune  <none>      https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo  apps/assets     green
argocd/carts      https://kubernetes.default.svc             default  Synced  Progressing  Auto-Prune  <none>      https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo  apps/carts      green
argocd/catalog    https://kubernetes.default.svc             default  Synced  Progressing  Auto-Prune  <none>      https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo  apps/catalog    green
argocd/checkout   https://kubernetes.default.svc             default  Synced  Progressing  Auto-Prune  <none>      https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo  apps/checkout   green
argocd/karpenter  https://kubernetes.default.svc             default  Synced  Healthy      Auto-Prune  <none>      https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo  apps/karpenter  green
argocd/orders     https://kubernetes.default.svc             default  Synced  Progressing  Auto-Prune  <none>      https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo  apps/orders     green
argocd/other      https://kubernetes.default.svc             default  Synced  Healthy      Auto-Prune  <none>      https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo  apps/other      green
argocd/rabbitmq   https://kubernetes.default.svc             default  Synced  Progressing  Auto-Prune  <none>      https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo  apps/rabbitmq   green
argocd/ui         https://kubernetes.default.svc             default  Synced  Progressing  Auto-Prune  <none>      https://git-codecommit.us-west-2.amazonaws.com/v1/repos/eks-gitops-repo  apps/ui         green

Traffic Routing

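Once the applications are deployed, the next important question is how to shift traffic from the blue cluster to the green cluster.

Several techniques are available, such as weighted records in Amazon Route 53 or weighted target groups on an AWS Application Load Balancer.

With Amazon Route 53 weighted resource records, you can shift traffic during a blue-green upgrade or a canary-style migration by changing the weights of the DNS records that point to the ingress resources defined in each cluster.
Route 53 weighted records let you direct traffic for one domain name (example.com) to several different endpoints.
Each of the records that share the same name and type can be assigned a weight between 0 and 255.
Route 53 sums the weights of all records in the weighted set and routes traffic to each record in proportion to its weight relative to that total.

For example, with two records weighted 1 and 3, the first record receives 25% (1/4) of the traffic and the second receives 75% (3/4).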
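
The arithmetic can be checked with a small shell function. This is only an illustration of the ratio described above; traffic_share is a hypothetical name, and integer division rounds down:

```shell
# Hypothetical helper: percentage of traffic a record receives, given its
# weight and the sum of all weights in the weighted set (integer division).
traffic_share() {
  weight=$1
  total=$2
  if [ "$total" -eq 0 ]; then
    # Avoid division by zero when every weight is 0; this sketch reports 0.
    echo 0
    return
  fi
  echo $(( weight * 100 / total ))
}

traffic_share 1 4   # prints 25
traffic_share 3 4   # prints 75
```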

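Traffic Routing with Route 53 (no hands-on lab)

For the retail store sample application in this workshop, this works as follows:

* With the blue cluster DNS record weight = 100 and the green cluster DNS record weight = 0, Route 53 routes all requests to the blue cluster.

* With the blue cluster DNS record weight = 0 and the green cluster DNS record weight = 100, Route 53 routes all requests to the green cluster.

* Intermediate values are also possible, such as blue = 50 and green = 50, which splits requests evenly between the blue and green clusters.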
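
Shifting traffic then comes down to updating the Weight of each record. A minimal sketch that only prints a Route 53 change batch for one weighted CNAME record; the record name app.example.com, the set identifier, the target hostname green-lb.example.com, the TTL, and the function name weighted_change_batch are all placeholders chosen for illustration and do not come from the workshop:

```shell
# Hypothetical helper: print a change batch that UPSERTs one weighted CNAME.
# Arguments: record name, set identifier, weight (0-255), target hostname.
weighted_change_batch() {
  cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "CNAME",
        "SetIdentifier": "$2",
        "Weight": $3,
        "TTL": 60,
        "ResourceRecords": [{ "Value": "$4" }]
      }
    }
  ]
}
EOF
}

# Example: the green record with weight 100 (placeholder names throughout).
weighted_change_batch app.example.com green 100 green-lb.example.com
```

The printed JSON could be saved to a file and passed to aws route53 change-resource-record-sets with --hosted-zone-id and --change-batch file://..., using your own hosted zone ID. A matching record for the blue cluster with the complementary weight would be needed as well.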

Stateful application upgrade process

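This lab demonstrates how to migrate a stateful application as part of a Blue-Green upgrade strategy.

There are many persistent storage options for stateful applications running on Kubernetes, including Amazon EBS volumes and database containers that need to be synchronized between clusters.

To keep this workshop simple we use an Amazon EFS file system, but the general concepts stay the same and apply to upgrading stateful applications with the blue-green process.

The EFS file system provides a network file share volume that is attached to both the blue and the green cluster.

Instead of synchronizing data between clusters, this workshop simply mounts the EFS file system on both the blue and the green cluster to simulate the synchronization process.
To simulate the upgrade of a stateful application with persistent storage, follow these steps:
- Create a Kubernetes StatefulSet resource on the blue cluster
- Manually create a static HTML file on the blue cluster, which ends up stored on the shared EFS file system
- Create a StatefulSet resource on the green cluster that hosts a pod running nginx and accesses the same file store as the blue pod
- (Optional) Delete the original StatefulSet from the blue cluster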

Checking the EFS File System

This StatefulSet deploys 1 nginx pod.
The StatefulSet mounts the EFS file share so that the nginx pod can access it. In a later step the same EFS file share is mounted by a StatefulSet on the green cluster, so that both clusters can access the same shared files. This simulates the persistent storage layer of an application.

The StorageClass object specifies the parameters required for the EFS file system.

The pods of the StatefulSet reference this StorageClass to mount the file system.

# Check the EFS file system ID
echo $EFS_ID  
fs-05d87c33956454415

#
kubectl ctx
arn:aws:eks:us-west-2:271345173787:cluster/eksworkshop-eksctl
blue
green

# We will set our kubectl context to work within the EKS blue cluster. Run this command to connect to the Blue cluster:
kubectl config use-context blue
Switched to context "blue".

# Run this command to review the StorageClass manifest (optional):
# Note the fileSystemId field, which references the ID of the EFS file system. It should match the EFS_ID value set earlier.
kubectl get storageclass efs -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2025-03-25T02:36:29Z"
  name: efs
  resourceVersion: "3809"
  uid: c7a10b9b-4de6-4fa8-ab1a-bd1a04009916
mountOptions:
- iam
parameters:
  basePath: /dynamic_provisioning
  directoryPerms: "755"
  ensureUniqueDirectory: "false"
  fileSystemId: fs-0b269b8e2735b9c59
  gidRangeEnd: "200"
  gidRangeStart: "100"
  provisioningMode: efs-ap
  reuseAccessPoint: "false"
  subPathPattern: ${.PVC.namespace}/${.PVC.name}
provisioner: efs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Deploy the StatefulSet application (using the StorageClass)

cat <<EOF | kubectl apply -f -
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: efs-example
  namespace: default
spec:
  serviceName: "efs-example"
  replicas: 1
  selector:
    matchLabels:
      app: efs-example
  template:
    metadata:
      labels:
        app: efs-example
    spec:
      containers:
      - name: app
        image: nginx:latest
        volumeMounts:
        - name: efs-storage
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: efs-storage
    spec:
      accessModes: ["ReadWriteMany"]
      storageClassName: efs
      resources:
        requests:
          storage: 1Gi
EOF

Check the StatefulSet and the PVC

#
kubectl get sts,pvc
NAME                           READY   AGE
statefulset.apps/efs-example   1/1     29s

NAME                                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/efs-storage-efs-example-0   Bound    pvc-bca0344c-55b8-43e9-9b30-821fb2fc8886   1Gi        RWX            efs            28s

We use a simple static HTML web page to simulate an application with persistent storage.

In a typical scenario this persistent-layer asset could be an entire database or other persistent files that the application depends on.

In this workshop, however, a single static HTML file represents the persistent storage asset.

Again, the persistent storage layer where this HTML file lives is the EFS file share, which is accessible from both the blue and the green cluster.

# Use the commands below to create a new file named index.html in the directory /usr/share/nginx/html of one of the pods in the blue cluster:
kubectl exec $(kubectl get pods -o jsonpath='{.items[0].metadata.name}' -l app=efs-example) -- bash -c 'touch /usr/share/nginx/html/index.html'
kubectl exec $(kubectl get pods -o jsonpath='{.items[0].metadata.name}' -l app=efs-example) -- bash -c 'echo aews study 9w end! > /usr/share/nginx/html/index.html'
kubectl exec $(kubectl get pods -o jsonpath='{.items[0].metadata.name}' -l app=efs-example) -- bash -c 'cat /usr/share/nginx/html/index.html'

# To confirm that the EFS file share works, delete the pod and check that the pod recreated by the StatefulSet can still access the new file:
kubectl delete pods -l app=efs-example && \
  kubectl exec $(kubectl get pods -o jsonpath='{.items[0].metadata.name}' -l app=efs-example) -- bash -c 'ls -lh /usr/share/nginx/html/index.html'
pod "efs-example-0" deleted
-rw-r--r-- 1 100 users 0 Mar 26 13:26 /usr/share/nginx/html/index.html

kubectl exec $(kubectl get pods -o jsonpath='{.items[0].metadata.name}' -l app=efs-example) -- bash -c 'cat /usr/share/nginx/html/index.html'
