kubernetes-alibaba-cloud

Posted on 24 August 2020, updated on 28 May 2024.

I was recently using Alibaba Cloud to explore its Kubernetes service and couldn’t find a convenient way to create an Autoscaling Group (ASG) for Kubernetes.

Following this observation, I began to dive into the service to discover how to do it myself using Terraform.

Pros and Cons of known methods

  • Using the Alibaba Cloud Console / API directly

Alibaba Cloud has a resource called Node Pools for its container service that fits our needs well, but it is not well supported by popular IaC tools such as Terraform. In an ever-changing, ever-scaling environment, calling the API directly or using a UI to modify multiple resources becomes less convenient the more you scale up.

  • Using the alicloud_cs_kubernetes_autoscaler terraform resource

Telling you that there was no Terraform resource available for Node Pools was a bit of a lie: reading the provider documentation, you will encounter a resource called alicloud_cs_kubernetes_autoscaler.

Let’s dive a little into what this resource does:

  • Quoting the documentation: “This resource will help you to manage cluster-autoscaler in Kubernetes Cluster.”
  • The resource requires an already-created ASG and Kubernetes cluster
  • After applying the resource, we can see a cluster-autoscaler deployment on our Kubernetes cluster, and our fresh ASG is now a Kubernetes ASG
  • Looking at the code base, the resource modifies the “launch configuration” of instances in the ASG by changing their user data, their name, and their RAM Role (the name and RAM Role are changed while the user data executes, not by the resource itself). It also deploys a well-configured cluster-autoscaler, but gives you no way to fine-tune it.

I was a bit confused at first by this resource, which does two things that are usually handled through two different providers (Kubernetes and Alibaba Cloud).

Then I reran a terraform apply command and understood that it would cause some issues as we could not have different scaling configurations for different pools of nodes.

Neither option was satisfying, and I knew I had to find a better way to handle it. Looking at the provider codebase gave me some ideas related to a peculiar API endpoint on Alibaba Cloud.

Creating an Autoscaling Group for Kubernetes the AWS way

Every cloud provider now takes the same approach to creating a pool of nodes for a managed Kubernetes cluster: a dedicated resource or service. Not long ago, though, AWS didn’t have a way to create managed pools of workers.

The old-fashioned way combined an autoscaling group with a provided user-data script that attached the workers to the master.
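To make the idea concrete, here is a minimal sketch of what such user data could look like on AWS. It assumes the Amazon EKS optimized AMI, which ships a /etc/eks/bootstrap.sh script that takes the cluster name as its argument. The function name render_eks_user_data and the cluster name my-cluster are placeholders for illustration, not part of any AWS tooling.

```shell
#!/usr/bin/env bash
# Render the user data for a self-managed EKS worker node.
# Assumption: the instance boots an Amazon EKS optimized AMI, which ships
# /etc/eks/bootstrap.sh; render_eks_user_data is a hypothetical helper name.
render_eks_user_data() {
  local cluster_name="$1"
  cat <<EOF
#!/bin/bash
set -o errexit
# Join this instance to the cluster control plane.
/etc/eks/bootstrap.sh ${cluster_name}
EOF
}

# Example: write the script that the launch configuration of the
# autoscaling group would carry as user data.
# render_eks_user_data my-cluster > user-data.sh
```

The autoscaling group itself knows nothing about Kubernetes here: every instance it launches simply runs this script at boot and joins the cluster.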

Remembering how to do it on AWS, I tried to apply the same approach to Alibaba Cloud.

Applying the AWS way to Alibaba Cloud

At first, I tried to reproduce what Alibaba Cloud does when it initializes another node pool through the resource mentioned earlier, by retrieving the user data from those instances.

The user data ran an attach_nodes.sh script, and the only thing that puzzled me was the openapi-token this script required: I couldn’t think of a way to generate it.
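As a rough illustration of the shape of such user data (not the exact script Alibaba Cloud generates), the sketch below renders a script that downloads attach_nodes.sh and passes it a token. The URL is a placeholder, the --openapi-token flag spelling is an assumption based on the parameter name, and render_ack_user_data is a hypothetical helper. Check the user data of a real node-pool instance for the actual values.

```shell
#!/usr/bin/env bash
# Render user data that fetches attach_nodes.sh and runs it with a token.
# Assumptions: the URL is a placeholder and the --openapi-token spelling is
# inferred from the parameter name; render_ack_user_data is a hypothetical helper.
render_ack_user_data() {
  local script_url="$1" token="$2"
  cat <<EOF
#!/bin/bash
curl -fsSL '${script_url}' | bash -s -- --openapi-token '${token}'
EOF
}

# Example:
# render_ack_user_data "$ATTACH_SCRIPT_URL" "$OPENAPI_TOKEN" > user-data.sh
```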

As I mentioned earlier, I noticed something while investigating the Terraform provider: a peculiar endpoint, “CreateClusterToken”, that provided what seemed to be a bootstrap token for Kubernetes. I tried the CLI Alibaba Cloud provides but couldn’t find the endpoint using aliyun cs --help.

After a bit of research, I found out you can use the CLI to forge API requests directly, so I tried to get a bootstrap token:

aliyun --secure cs POST /clusters/"cluster-id"/token --body '{}'

Replacing “cluster-id” with my cluster ID returned a bootstrap token, but one with a 24-hour lifespan, which couldn’t be used in a static user-data setup.

...1 hour later…

I finally found a way to make permanent tokens using this endpoint; I only had to add a parameter to the body:

aliyun --secure cs POST /clusters/"cluster-id"/token --body '{"is_permanently": true}'
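The two calls differ only in their request body, which can be captured in a small wrapper. This is a minimal sketch, assuming the aliyun CLI is installed and configured. The function names and the CLUSTER_ID variable are placeholders for illustration, while the path and the is_permanently parameter are the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Helpers around the token endpoint; function names are hypothetical.

# Build the request path for a given cluster ID.
token_path() {
  printf '/clusters/%s/token' "$1"
}

# Build the request body: "permanent" asks for a non-expiring token,
# anything else falls back to the default 24-hour token.
token_body() {
  if [ "$1" = "permanent" ]; then
    printf '{"is_permanently": true}'
  else
    printf '{}'
  fi
}

# Send the request through the aliyun CLI (must be installed and configured).
create_cluster_token() {
  local cluster_id="$1" mode="${2:-temporary}"
  aliyun --secure cs POST "$(token_path "$cluster_id")" --body "$(token_body "$mode")"
}

# Example:
# create_cluster_token "$CLUSTER_ID" permanent
```

Keeping the path and body builders separate from the CLI call makes it easy to check what will be sent before touching the API.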

I could now replicate in Alibaba Cloud what AWS had taught me: create an autoscaling group and generate user data that lets nodes join the cluster.

This wasn’t over: when using the CLI one last time, I found out there was an already packaged DescribeClusterAttachScripts endpoint. I quickly created a RAM Role for my instances because I wanted them to send this request themselves and retrieve their user data dynamically.

But I was disappointed to find out that those APIs (AttachScripts and Token) don’t behave the same way when called from instances in Alibaba Cloud: their responses were empty JSON.

After this setback, I had to find a way to generate the user data during the Terraform execution itself.

And yes, the “is_permanently” parameter is mandatory to avoid getting a 24-hour bootstrap token. As a bonus, you can also set up labels and taints that way.
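One way to run the request during a Terraform execution is to call a script from Terraform’s external data source, which expects the program to print a JSON object of string values on stdout. The sketch below is one possible wiring under stated assumptions, not a definitive implementation: the attachscript path for DescribeClusterAttachScripts and its acceptance of is_permanently should be checked against the API reference, the function names are placeholders, and the raw response is base64-encoded so it can be embedded in JSON without escaping.

```shell
#!/usr/bin/env bash
# Fetch the attach script for a cluster and hand it to Terraform.
# Assumptions: aliyun CLI installed and configured; the /attachscript path
# is assumed for DescribeClusterAttachScripts; function names are hypothetical.

# Wrap whatever arrives on stdin in a one-key JSON object, base64-encoded
# so that no JSON escaping is needed.
wrap_for_terraform() {
  printf '{"response_b64":"%s"}\n' "$(base64 | tr -d '\n')"
}

# Request the attach script with a permanent token.
fetch_attach_script() {
  local cluster_id="$1"
  aliyun --secure cs POST "/clusters/${cluster_id}/attachscript" \
    --body '{"is_permanently": true}'
}

# Example, as the program of an external data source:
# fetch_attach_script "$CLUSTER_ID" | wrap_for_terraform
```

On the Terraform side, the base64decode function can turn response_b64 back into the raw response before it is placed in the user data of the scaling configuration.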

Future methods of implementation

You can use this type of implementation, but be aware that it is not optimal. Alibaba Cloud could modify its Container Service API, and you’d have to rethink your way of creating clusters. I would recommend using it for now, but be on the lookout for another way of implementing it.

Alibaba Cloud will surely provide a way to create NodePool through their API, and soon enough, it will be implemented in their Terraform provider.

Deep-diving into Alibaba Cloud Kubernetes service was a bit time-consuming, but in the end, I was happy to have a more convenient way to deploy my Autoscaling Groups, and I hope you will too.

If you have any specific questions about how to deploy an Autoscaling Group following this method or if you know of any elements that could potentially improve the content of this article, feel free to contact us.

If you want more intel on autoscaling, I recommend this article on how to set up a Kubernetes autoscaler in 5 minutes!