Installing Kubeflow on AWS and running a training

If you ended up on this blog post, you probably know what Kubeflow is and want to experiment with it.

The Kubeflow project was started by Google, and although there is some documentation on how to use it on AWS, it is not very accurate, quite limited, and out of date. So I decided to write this blog post to describe my experience creating a PoC for Kubeflow.

First, we need an AWS account. I will not go into the details of setting up IAM permissions for users; for the sake of simplicity, I will assume an account with all the necessary permissions.

You can also follow the official Kubeflow installation guide for AWS in parallel with this post.

I. Set up the local environment

I am using a Linux (Ubuntu) laptop for this.

Create a working directory, e.g.:

mkdir -p ~/projects/kuebflow-demo

cd ~/projects/kuebflow-demo

1. Install kfctl, the command line tool to interact with Kubeflow. Download it from https://github.com/kubeflow/kfctl/releases – check the latest version and replace it accordingly in the command below.

wget https://github.com/kubeflow/kfctl/releases/download/v1.1.0/kfctl_v1.1.0-0-g9a3621e_linux.tar.gz  
tar zxpvf kfctl_v1.1.0-0-g9a3621e_linux.tar.gz  
sudo mv kfctl /usr/local/bin

2. Install kubectl – this is the tool used to interact with the Kubernetes cluster. Follow the steps from the official documentation.

3. Install eksctl – this is the tool for creating clusters on EKS – Amazon’s new managed Kubernetes service for EC2.

curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

4. Install the AWS CLI following the official documentation and configure it. We use an AWS profile called kubeflow_profile. If you use the default profile instead, replace the name in the environment variable below.

export AWS_PROFILE=kubeflow_profile
export AWS_SDK_LOAD_CONFIG=1
export AWS_DEFAULT_REGION=eu-west-1
export AWS_REGION=eu-west-1

If you already have other AWS_* environment variables set, make sure they do not interfere with the ones above (e.g., unset AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY).
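To spot leftover credentials quickly, a small helper like the one below can list them. This is only a sketch: the function name conflicting_aws_vars is my own, and I am assuming that credentials exported in the shell may take precedence over the profile selected with AWS_PROFILE.

```shell
# List AWS credential variables that are exported in the current shell.
# Assumption: these may override the profile selected via AWS_PROFILE.
conflicting_aws_vars() {
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    # printenv exits non-zero when the variable is not in the environment
    if printenv "$name" >/dev/null; then
      echo "$name"
    fi
  done
}

# Unset whatever was found so that only the profile is used.
for name in $(conflicting_aws_vars); do
  unset "$name"
done
```

Afterwards, running aws sts get-caller-identity should show the identity that belongs to your profile.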

5. Set other environment variables

export AWS_CLUSTER_NAME=kbfl-demo-1
export KF_NAME=${AWS_CLUSTER_NAME}
# replace with your working dir path 
export BASE_DIR=${HOME}/projects/kuebflow-demo
export KF_DIR=${BASE_DIR}/${KF_NAME}
export NODEGROUP_NAME=kbfl-ng-1

6. Create the folder from which we’ll create the Kubernetes cluster

mkdir -p $KF_DIR && cd $KF_DIR
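Before moving on to the cluster, it may be worth checking that everything from this section is in place. The helpers below are a minimal sketch (missing_tools and missing_vars are hypothetical names of mine, not part of any of the tools): they print the tools that are not on the PATH and the variables that are unset or empty.

```shell
# Print every command-line tool from the argument list that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Print every variable from the argument list that is unset or empty.
missing_vars() {
  for name in "$@"; do
    # eval reads the variable whose name is stored in $name
    # (plain POSIX sh has no ${!name} indirection)
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Both calls print nothing when the environment is complete.
missing_tools kfctl kubectl eksctl aws
missing_vars AWS_PROFILE AWS_REGION AWS_CLUSTER_NAME KF_NAME BASE_DIR KF_DIR NODEGROUP_NAME
```

Any name that is printed points back to the step above that still needs to be done.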

II. Install Kubernetes cluster and Kubeflow

1. Install the Kubernetes cluster

Be careful: from now on, AWS will start charging you for 3 m5.large machines and other infrastructure elements.

Before running the command below, create a new key pair in the EC2 section of the AWS console and substitute its name for <YOUR_KEY_PAIR_NAME>.

eksctl create cluster --name=${AWS_CLUSTER_NAME} --ssh-access=true --ssh-public-key=<YOUR_KEY_PAIR_NAME> --nodegroup-name=${NODEGROUP_NAME} --node-type=m5.large --nodes=3 --nodes-min=2 --nodes-max=4 --node-ami=auto --profile=${AWS_PROFILE} --region=${AWS_REGION}

Optional: if you want to scale to more nodes:

eksctl scale nodegroup --cluster ${AWS_CLUSTER_NAME} --name $NODEGROUP_NAME --nodes 6 --nodes-max 6 --profile=${AWS_PROFILE}
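Whether or not you scaled the node group, it can help to confirm that the nodes have actually joined the cluster before installing Kubeflow. The function below is a sketch under the assumption that kubectl get nodes prints its default columns (NAME, STATUS, ROLES, AGE, VERSION); ready_node_count is a name I made up.

```shell
# Count the nodes whose STATUS column is exactly "Ready".
# Assumes the default column layout of `kubectl get nodes`.
ready_node_count() {
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example usage once the cluster exists:
#   echo "Ready nodes: $(ready_node_count)"
```

With the cluster created as above, I would expect the count to reach 3 after a few minutes.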

2. Download the manifest for Kubeflow for AWS. We’ll use version 1.1.0

wget https://raw.githubusercontent.com/kubeflow/manifests/v1.1-branch/kfdef/kfctl_aws.v1.1.0.yaml -O kfctl_aws.yaml

Open the kfctl_aws.yaml file, look for region: near the end of the file, and replace its default “us-west-2” value with the value you defined above in $AWS_REGION (in our case eu-west-1).

Also, replace the username and password in this file with your own values.
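If you prefer not to edit the region by hand, the change can probably be scripted. The sketch below assumes the file contains a line of the form "region: us-west-2" (which is what I saw in the 1.1.0 manifest); set_kfdef_region is a hypothetical helper name. The username and password are left for manual editing.

```shell
# Replace the default region in a KfDef file with the given region.
# Assumes the file contains a line of the form "region: us-west-2".
set_kfdef_region() {
  file=$1
  region=$2
  # Keep the original indentation and rewrite only the value.
  # A temporary file is used instead of `sed -i` for portability.
  sed "s/^\([[:space:]]*region:\)[[:space:]]*us-west-2[[:space:]]*$/\1 ${region}/" \
    "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Example usage from inside $KF_DIR:
#   set_kfdef_region kfctl_aws.yaml "${AWS_REGION}"
```

It is still a good idea to open the file afterwards and check that the region line has the value you expect.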

I am not sure whether the following step (installing aws-iam-authenticator) is necessary:

curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.17.9/2020-08-04/bin/linux/amd64/aws-iam-authenticator
chmod +x aws-iam-authenticator
sudo mv aws-iam-authenticator /usr/local/bin

3. Apply the manifest file (install Kubeflow)

kfctl apply -V -f kfctl_aws.yaml

This command can take a while. Even after it finishes, you may need to wait a bit longer before executing:

kubectl get ingress -n istio-system

This command should show the URL for Kubeflow in the ADDRESS field.

Once the address becomes available, you can log in to Kubeflow with the credentials from kfctl_aws.yaml.
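Instead of re-running the command by hand, the wait can be scripted. The function below is a rough sketch: wait_for_ingress_address is a name of my own, the retry count and delay are arbitrary defaults, and I am assuming the load balancer hostname shows up under status.loadBalancer.ingress of the first ingress in istio-system.

```shell
# Poll until the first ingress in istio-system reports a load balancer hostname.
# Arguments: number of attempts (default 30) and delay in seconds (default 10).
wait_for_ingress_address() {
  attempts=${1:-30}
  delay=${2:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # kubectl fails while no ingress exists yet, so errors are ignored here
    address=$(kubectl get ingress -n istio-system \
      -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}' 2>/dev/null || true)
    if [ -n "$address" ]; then
      echo "$address"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "No ingress address after ${attempts} attempts" >&2
  return 1
}

# Example usage:
#   echo "Kubeflow URL: http://$(wait_for_ingress_address)"
```

If the function gives up, checking the pods in the istio-system and kubeflow namespaces with kubectl get pods is a reasonable next step.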

To be continued…
