Canary deployments using Argo Rollouts and Istio

How to roll out software releases safely

Harpreet Singh
5 min read · Jan 28, 2021

Overview

In modern continuous delivery practice, the canary deployment is one of the most widely adopted strategies for reducing the risk involved in releasing new software versions.

In a canary-style deployment, the idea is to deploy the new release version in parallel with its stable version and then divert traffic to the new version in small increments. In between increments you can configure custom checks (manual or automatic) to test the new version, and finally you divert all traffic to the new version and delete the old one.

In this blog, we are going to cover canary-style deployment for a distributed microservices architecture running on Kubernetes, with Istio as the service mesh, using Argo Rollouts. Argo Rollouts is a Kubernetes controller and set of CRDs that provides advanced deployment capabilities such as canary, blue-green, canary analysis, and experimentation. Our main focus will be on how to use the canary deployment strategy of Argo Rollouts with Istio for traffic management.

Before Argo Rollouts, we were using Helm for rolling out changes. During a rollout we replaced the current replica set with a new one, and once the new replica set was ready to serve traffic we simply deleted the old one, a.k.a. a rolling update. In this process we had no way to assess the result of the changes during the deployment; we had to wait until the deployment completed. In contrast, Argo Rollouts' canary-style deployment feature lets us divert a small amount of traffic to the canary version of the software instead of doing a 100% switch in one attempt. This allows us to assess the state of the system during the deployment and take action if required. Overall, canary-style deployment helps limit the blast radius and reduces deployment risk.

In addition, Argo Rollouts provides an analysis feature, which you can configure to perform automatic checks during the deployment; based on the result of the analysis, the Argo Rollouts controller can decide to either keep moving forward or abort the deployment. We are not going to cover this feature in depth in this blog.
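To give a flavour of what analysis looks like, below is a minimal sketch of an AnalysisTemplate that checks the canary's HTTP success rate using Istio's istio_requests_total metric. The template name, the Prometheus address, and the 95% threshold are assumptions for illustration, not values from our setup.

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate            # hypothetical name
  namespace: nsA
spec:
  args:
  - name: service-name          # passed in from the Rollout's analysis step
  metrics:
  - name: success-rate
    interval: 1m                # evaluate every minute while the canary runs
    successCondition: result[0] >= 0.95   # assumed success threshold
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc.cluster.local:9090  # assumed address
        query: |
          sum(rate(istio_requests_total{destination_service=~"{{args.service-name}}",response_code!~"5.*"}[2m]))
          /
          sum(rate(istio_requests_total{destination_service=~"{{args.service-name}}"}[2m]))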

Implementation

We will refer to appA and appB as two example microservices, both running inside the Istio service mesh under namespaces nsA and nsB respectively. Application appA receives public traffic, i.e. via the internet, and also internal traffic, i.e. within the mesh, from application appB.

Prerequisites

There are two prerequisite steps that need to be done, as explained below:

1. Stable and canary k8s services
We need to configure two k8s services, a stable and a canary service, for appA; these will point to the stable and canary replica set pods respectively.

apiVersion: v1
kind: Service
metadata:
  name: appA-stable-service
  labels:
    ...
  annotations:
    ...
spec:
  ports:
    ...
---
apiVersion: v1
kind: Service
metadata:
  name: appA-canary-service
  labels:
    ...
  annotations:
    ...
spec:
  ports:
    ...
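One detail worth calling out: both services can initially use the same pod selector, matching the labels on appA's pods, and Argo Rollouts later narrows each service to the right replica set (see step 2 in the deployment walkthrough below). A minimal sketch, assuming appA's pods carry a hypothetical app: appA label:

spec:
  selector:
    app: appA   # assumed pod label; the controller later appends rollouts-pod-template-hash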

2. Istio virtual service
We need to configure an Istio virtual service with routes pointing to both the stable and the canary k8s services.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: virtual-service-appA
  namespace: nsA
spec:
  hosts:
  - appA.nsA.svc.cluster.local
  - appA.domain.com
  exportTo:
  - "*"
  gateways:
  - appA-ingress-gateway.nsA.svc.cluster.local
  - mesh
  http:
  - name: appA-http-route
    match:
    - uri:
        prefix: /
    route:
    - destination:
        host: appA-stable-service
        port:
          number: 80
      weight: 100
    - destination:
        host: appA-canary-service
        port:
          number: 80
      weight: 0

Things to highlight in the above Istio virtual service configuration of application appA:

  • hosts : The list of hostnames to which this virtual service configuration applies.
    We have appA.nsA.svc.cluster.local, which is used by other applications such as appB, and appA.domain.com, which is used by public users, i.e. via the internet.
  • exportTo : Being set to * means this virtual service configuration is exported to all namespaces.
  • gateways : The virtual service configuration is applied to traffic coming from the gateways or Istio sidecar proxies listed here, provided the hostname is one of the domains listed under hosts.
    We have appA-ingress-gateway.nsA.svc.cluster.local, which is the ingress gateway for public traffic, and mesh, which applies to all Istio sidecars within the mesh.
  • In the http routes, traffic is directed to the stable k8s service with weight 100%, while weight 0% is set on the canary k8s service.

Configure Argo Rollout object

Now that we have both the k8s services and the Istio virtual service ready, it's time to see how the Argo Rollout object is configured.
A Rollout is a Kubernetes workload resource equivalent to a Kubernetes Deployment object. Here, we are going to configure the canary strategy, using Istio for traffic shaping/switching between the stable and canary versions of the application.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  ...
spec:
  ...
  template:
    metadata:
      ...
    spec:
      ...
  strategy:
    canary:
      maxUnavailable: 0
      maxSurge: 3
      steps:
      - setWeight: 20
      - pause: {duration: 1m}
      - setWeight: 40
      - pause: {duration: 1m}
      - setWeight: 60
      - pause: {duration: 1m}
      - setWeight: 80
      - pause: {duration: 1m}
      stableService: appA-stable-service # stable k8s service
      canaryService: appA-canary-service # canary k8s service
      trafficRouting:
        istio:
          virtualService:
            name: virtual-service-appA # istio virtual service
            routes:
            - appA-http-route # route name

Things to highlight in the strategy section of the above Rollout object configuration:

  • steps : Defines how much weight, i.e. setWeight, should be set on the canary version and how long to wait, i.e. pause, between two weight increments (see the sketch after this list for a manual gate variant).
  • stableService : The stable k8s service pointing to the stable replica set.
  • canaryService : The canary k8s service pointing to the canary replica set.
  • trafficRouting : Tells the controller to use the Istio virtual service and its routes to divert traffic between the stable and canary replica sets.
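A pause step can also be declared without a duration, in which case the rollout waits at that step indefinitely until it is promoted manually (for example with the kubectl argo rollouts promote command). A minimal sketch of such a manual gate, using the same weights as above:

steps:
- setWeight: 20
- pause: {}              # no duration: wait here until the rollout is promoted manually
- setWeight: 60
- pause: {duration: 1m}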

What the Argo Rollouts controller does during a deployment

When a new deployment is triggered for application appA, the following things happen in the given order:

1. The Argo Rollouts controller creates the canary replica set, i.e. pods running the new software version, in parallel with the existing replica set, i.e. pods running the stable software version. The pod label rollouts-pod-template-hash is added to both the stable and canary replica sets, with values stablehash and canaryhash respectively.

2. The Argo Rollouts controller also updates the selector of both the stable and canary k8s services by adding the key rollouts-pod-template-hash, set to stablehash for the stable k8s service and canaryhash for the canary k8s service, so that each service selects only its own replica set's pods.
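A sketch of how the two selectors might look at this point, continuing the hypothetical app: appA label from the earlier service sketch (the hash values are placeholders; the controller generates real pod-template hashes):

# appA-stable-service selector at this point
spec:
  selector:
    app: appA                                # assumed pod label, for illustration only
    rollouts-pod-template-hash: stablehash   # placeholder hash value
---
# appA-canary-service selector at this point
spec:
  selector:
    app: appA
    rollouts-pod-template-hash: canaryhash   # placeholder hash value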

3. The Argo Rollouts controller updates the Istio virtual service routes to divert 20% of traffic, i.e. setWeight: 20, to the canary k8s service and then pauses for 1 minute, i.e. pause: {duration: 1m}. At this moment, the virtual service routes look like this:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: virtual-service-appA
  namespace: nsA
spec:
  hosts:
    ...
  exportTo:
  - "*"
  gateways:
    ...
  http:
  - name: appA-http-route
    match:
    - uri:
        prefix: /
    route:
    - destination:
        host: appA-stable-service
        port:
          number: 80
      weight: 80
    - destination:
        host: appA-canary-service
        port:
          number: 80
      weight: 20

4. The Argo Rollouts controller keeps incrementing the traffic weight as defined under steps until it diverts 100% of traffic to the canary k8s service.

5. The Argo Rollouts controller marks the canary replica set as the new stable replica set and updates the Istio virtual service routes with weight 100% set on the stable k8s service.

6. Lastly, the Argo Rollouts controller scales down the previous stable replica set.
