Intel Kubernetes Service¶
The Intel® Kubernetes Service (IKS) gives you the tools to manage Kubernetes clusters for application development, AI/ML training, and helm chart deployments.
Tip
Currently IKS is only available to premium and enterprise account users.
Control Plane
IKS provides a managed Kubernetes service in Intel® Tiber™ AI Cloud. IKS manages the availability and scalability of the Kubernetes control plane. For a technical overview, see also Kubernetes Control Plane Components.
Pricing
Pricing is 0.10 cents per cluster per hour. See Billing and usage for more information on payment methods and account types.
Provision Kubernetes Cluster¶
Create a Cluster¶
Navigate to the Intel® Tiber™ AI Cloud console.
In the menu at left, click the Intel Kubernetes Service menu.
Visit the Overview tab to view the workflow.
Click the Clusters tab.
Click Launch Cluster.
Complete the required fields under Cluster details and configuration.
In Cluster name, enter a name.
In Select cluster K8S version, select a version.
Cluster details and configuration
Click Launch. After launching, the State column shows Updating.
Under Cluster Name column, click your cluster.
Note
Your cluster name now appears below, along with an Actions menu.
Add Node Group to Cluster¶
From the Actions pulldown menu, select Add node group.
Enter your data in the Node group configuration menu.
In Node type, choose between Virtual Machine and Bare Metal for your node. Note the cost per hour. See also Compare Instance Types below.
In Node group name, enter a name.
In Node quantity, select the number of worker nodes you need in your cluster, from 1 to 10.
Tip
You can scale the number of worker nodes up or down.
Under Public Keys, select Upload Key or Refresh Keys.
Select Upload Key, name your key, and paste your local SSH public key into the fields shown (if you need to generate a key first, see the example after these steps).
Select Upload Key.
Now, in Node group configuration, check the box next to the SSH key you added.
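If you don't yet have an SSH key pair to upload, you can generate one locally and print the public key to paste into the Upload Key fields. These are standard OpenSSH commands; the file path shown is simply the OpenSSH default.
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519
cat ~/.ssh/id_ed25519.pub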
Compare Instance Types¶
At any time during Node group configuration, you may choose Compare instance types. This pop-out screen helps you compare and select your preferred processor.
Launch Kubernetes Cluster¶
When you create a cluster, it includes:
K8S Control-plane
ETCD Database
Scheduler
API Server
Select Launch.
After your node group is added, its State shows Updating in the submenu.
When the node group is added successfully, each node name appears and its State shows Active.
Connect to cluster¶
Set the KUBECONFIG environment variable.
Linux or macOS:
export KUBECONFIG=/path/to/your/kubeconfig
Windows PowerShell:
$Env:KUBECONFIG = "C:\path\to\your\kubeconfig"
Verify Configuration: Ensure that the current context points to the correct cluster.
kubectl config view
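You can also print just the active context name to confirm it matches your IKS cluster:
kubectl config current-context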
Kubeconfig Admin Access¶
Ideally, export the KUBECONFIG file to your secret management system and continue.
In the Kubernetes Console, locate options below Kube Config.
Copy or download the KUBECONFIG file and export it to your development environment. For more help on exporting, follow the related steps in the next section.
Caution
Exercise caution while downloading, accessing, or sharing this file.
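One possible approach (a minimal sketch; the file name and location below are placeholders, not IKS defaults) is to save the downloaded kubeconfig with restricted permissions and point KUBECONFIG at it:
mkdir -p ~/.kube
chmod 600 ~/.kube/my-iks-cluster.yaml
export KUBECONFIG=~/.kube/my-iks-cluster.yaml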
Set Context for Multiple Clusters¶
Optional: List all available contexts.
kubectl config get-contexts -o=name
Create a directory for your kubeconfig files if it doesn't exist, then change into it.
mkdir -p ./kubeconfig
cd ./kubeconfig
In the Kubernetes Console, navigate to My clusters, Kube Config.
From the Kubernetes Console, download (or copy) the KUBECONFIG file to the current directory.
Extract the value from the KUBECONFIG and paste it into the shell, following the example below.
Export KUBECONFIG as an environment variable as shown below.
export KUBECONFIG=/home/sdp/.kube/dev-env
Use kubectl config set-context to modify an existing context or create a new cluster context.
kubectl config set-context
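For example, to create a context and then switch to it (the context, cluster, and user names below are placeholders; use the names defined in your kubeconfig):
kubectl config set-context my-iks-context --cluster=my-iks-cluster --user=my-iks-admin
kubectl config use-context my-iks-context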
To view the cluster's nodes, enter this command.
kubectl get nodes
Important
If you wish to launch another cluster, return to the start of this section and perform all steps again, exporting a different KUBECONFIG file.
Controlling Node Auto-repair Behavior¶
By default, IKS detects when a worker node becomes unavailable. If the node remains unavailable beyond a grace period, it is automatically replaced (auto-repair) with a new node of the same type. If you do not want this behavior for one or more worker nodes in your cluster, you can turn off auto-repair for any given worker node.
Auto-repair Options¶
To opt out of auto-repair for a node (so it is not automatically replaced when it becomes unavailable or unreachable after the grace period elapses), label that node with iks.cloud.intel.com/autorepair=false.
As long as the node has this label, IKS will not replace it if it becomes unavailable. The user interface shows the status as Updating while the node is unavailable (and not ready in Kubernetes), indicating that IKS has detected the unavailability. If the node becomes available again, the status changes from Updating to Active. If you remove the auto-repair label while the node is unavailable, the default auto-replacement behavior resumes and IKS replaces the node, as designed.
We do not recommend removing the node from the compute console while this label is set; doing so defeats the purpose of the label and results in a dangling node in your Kubernetes console.
Note
You can add and remove this label with standard kubectl label commands.
Examples¶
Add a label to a node to avoid auto replacement:
kubectl label node ng-hdmqnphxi-f49b8 iks.cloud.intel.com/autorepair=false
Remove a label from a node to enable auto replacement:
kubectl label node ng-hdmqnphxi-f49b8 iks.cloud.intel.com/autorepair-
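To see which nodes currently carry the label, list nodes with the label shown as a column:
kubectl get nodes -L iks.cloud.intel.com/autorepair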
Manage Kubernetes Cluster¶
Create a pod.
kubectl apply -f pod-definition.yaml
Create a YAML or JSON file with your pod specification. See the example below.
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: mycontainer
    image: nginx
Check pod status. Replace mypod with the name of your pod.
kubectl get pods
kubectl describe pod mypod
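You can also inspect the pod's container logs, which is often the quickest way to debug a failing workload:
kubectl logs mypod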
Update a Pod:
kubectl edit pod mypod
Note
This opens the pod configuration in your default editor. Make changes and save the file.
Delete a Pod. Replace mypod with the name of your pod.
kubectl delete pod mypod
Upgrade Kubernetes Cluster¶
In the Cluster name submenu, under Details, find the Upgrade link.
Select Upgrade.
In the Upgrade K8S Version pull-down menu, select your desired version.
Click the Upgrade button.
During the upgrade, the Details menu State may show Upgrading controlplane.
Note
If the current version is penultimate to the latest version, only the latest version appears. When the version upgrade is successful, Cluster reconciled appears.
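After the upgrade completes, you can confirm the control plane version and the kubelet version reported by each worker node with standard kubectl commands:
kubectl version
kubectl get nodes -o wide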
Apply Load Balancer¶
Navigate to the Cluster name submenu.
In the Actions menu, select Add load balancer.
In the Add load balancer dialog, complete these fields.
Select the port number of your service from the dropdown menu.
For Type, select public or private.
Click on Launch.
In the Cluster name submenu, view the Load Balancer menu.
Your Load Balancer appears with its Name, and its State shows Active.
K8S will automatically perform load balancing for your service.
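To confirm which service and port the load balancer should target, you can list the services running in your cluster:
kubectl get services --all-namespaces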
Add Security Rules¶
You can create a security rule if you have already created a Load Balancer.
Note
If you haven't created a Load Balancer, return to the section above before proceeding. After a cluster is available, you must create a Node Group.
Click on your Cluster name.
Select the Worker Node Group tab.
Select Add Node Group.
Complete all required fields as shown in Add Node Group to Cluster. Then return to this workflow.
Wait until the State shows “Active” before proceeding.
Complete all steps in Apply Load Balancer. Then return here.
Add security rule to your own Load Balancer¶
For your own Load Balancer, click Edit.
Add a Source IP address to create a security rule.
Select a protocol.
Click Save. The rule is created.
Edit or delete security rule¶
Optional: After the State changes to Active:
You may edit the security rules by selecting Edit.
You may delete the security rule by selecting Delete.
Add security rule to default Load Balancer¶
Navigate to the Security tab. You may see Load Balancers populated in a table.
Note
The public-apiserver is the default Load Balancer.
For the public-apiserver, click Edit.
Then add a Source IP address to create a security rule.
Select a protocol.
Click Save. The rule is created.
Additional resources¶
Configure Ingress, Expose Cluster Services¶
Note
This requires helm version 3 or a helm client utility. See also Helm Docs.
Create a cluster with at least one worker node. See Create a Cluster.
Create a Load balancer (public) using port 80. See Apply Load Balancer.
Note
This IP is used in the URL in the last step for testing. Your port number may differ.
Install the ingress controller.
helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace --set controller.hostPort.enabled=true
To install a test NGINX Pod, Service, and Ingress object, download ingress-test.yml. Alternatively, copy the contents below and save them as ingress-test.yml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:stable
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /test
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80
Run this command to apply it.
kubectl apply -f ingress-test.yml
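Optionally, verify that the Deployment, Service, and Ingress were created before testing:
kubectl get deployments,services,ingresses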
Visit your browser and test, inserting your IP where shown below.
http://<IP>/test
The IP mentioned here is the Public Load balancer IP.
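You can also test from a terminal with curl, replacing <IP> with the public Load Balancer IP:
curl http://<IP>/test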
De-Provision Kubernetes Cluster¶
Delete Cluster Group or Node¶
Delete Node Group¶
In the Cluster name submenu, select the Node group you wish to delete.
Click the Delete button.
Delete Node¶
Below the Node name table, note the Add node and Delete node buttons.
Click the Delete node button, as desired.
Select Continue.
Deploy Example AI/ML Workloads¶
Add an Intel® Gaudi® 2 processor instance to a cluster to deploy LLM and Stable Diffusion models.
Complete the tutorial Training a PyTorch Model on Intel Gaudi 2.
Add nodes to the Intel Kubernetes Cluster.
Ensure you're able to access the KUBECONFIG file and the Kubernetes cluster.
Deploy Stable Diffusion¶
To deploy Stable Diffusion, try the example below. Run it on an Intel® Gaudi® 2 processor instance and deploy it on an IKS cluster.
Intel® Gaudi® 2 processor with Stable Diffusion¶
To run Stable Diffusion in IKS with an Intel® Gaudi® 2 processor, apply the following configuration.
Apply this configuration if huge pages are not set on all nodes. Otherwise, skip to the next section.
sudo sysctl -w vm.nr_hugepages=156300
Verify configuration.
grep HugePages_Free /proc/meminfo
grep HugePages_Total /proc/meminfo
Ensure that your output is similar to this.
HugePages_Free:    34142
HugePages_Total:   35201
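You can also confirm that Kubernetes reports the huge pages as an allocatable resource on the node (the node name below is a placeholder):
kubectl describe node <node-name> | grep -i hugepages-2mi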
Use the suggested settings for model inference.
hugepages2Mi: 500Mi
memory: 60G
Revise your YAML file, using this example.
apiVersion: v1
kind: Pod
metadata:
  name: std
  labels:
    name: std
spec:
  containers:
  - name: std
    image: docker.io/rramamu1/std-gaudi:latest
    securityContext:
      capabilities:
        add: ["SYS_NICE"]
    ports:
    - containerPort: 8000
    resources:
      limits:
        habana.ai/gaudi: "1"
        hugepages-2Mi: 500Mi
        memory: 60G
        #cpu: "25"
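To create the pod from this file and watch it start, you might run the following (the file name std-pod.yaml is a placeholder for wherever you saved the YAML above):
kubectl apply -f std-pod.yaml
kubectl get pod std -w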
HugePages Settings by Model¶
| Model Name | hugepages-2Mi | Memory | Number of Cards |
|---|---|---|---|
| runwayml/stable-diffusion-v1-5 | 500Mi | 60G | 1 |
| meta-llama/Meta-Llama-3-70B-Instruct | 9800Mi | 250G | >= 2 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 9800Mi | 250G | >= 2 |
| mistralai/Mistral-7B-v0.1 | 600Mi | 50G | 1 |
Generate Image with Stable Diffusion¶
Consider using this YAML deployment for Helm Chart resources.
Download the Helm Charts from the STD Helm Charts.
Configuration for hugepages, as noted above, is already applied.
Note
This YAML file overrides default configuration. Apply your custom configuration to this file to ensure your settings are applied.
# Default values for tgi-chart.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
replicaCount: 1
modelName: runwayml/stable-diffusion-v1-5
hostVolumePath: /scratch-2/data
image:
  repository: docker.io/rramamu1/std-gaudi
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: "latest"
service:
  type: ClusterIP
  port: 8000
resources:
  numofgaudi: 1
  hugepages2Mi: 500Mi
  #cpu: 25
  memory: 60G
Next, run the install command.
helm install std std-chart -f ./std-values.yaml
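To confirm the release deployed and its pod is running, standard Helm and kubectl checks apply:
helm list
kubectl get pods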
Access the result using the load balancer IP.
Note
Ensure you followed the section Apply Load Balancer.
Construct a full URL for the Load Balancer by following this two-step process.
Replace the value of <Load Balancer IP> with your own, as shown below.
http://<Load Balancer IP>/std/generate_image
Add the prompt, including parameters, as the second part of the URL.
Example: The second part starts with “prompts=”
http://<Load Balancer IP>/std/generate_image/prompts=dark sci-fi , A huge radar on mountain ,sunset, concept art&height=512&width=512&num_inference_steps=50&guidance_scale=7.5&batch_size=1&negative_prompts=''&seed=100&num_images_per_prompt=1
Paste the full URL in a browser and press <Enter>.
Change the value of “prompts=”, as desired.
Example 2: Change the second part of the URL. Replace the text, starting with “prompts=”, as shown below.
http://<Load Balancer IP>/std/generate_image/prompts=Flying Cars&height=512&width=512&num_inference_steps=50&guidance_scale=7.5&batch_size=1&negative_prompts=''&seed=100&num_images_per_prompt=1
Paste the full URL in a browser and press <Enter>.
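You can also send the same request from a terminal. The quotes keep the shell from interpreting the & characters; saving to a file assumes the endpoint returns image data:
curl -o output.png "http://<Load Balancer IP>/std/generate_image/prompts=Flying Cars&height=512&width=512&num_inference_steps=50&guidance_scale=7.5&batch_size=1&negative_prompts=''&seed=100&num_images_per_prompt=1"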
Tip
Your image will differ. Any image that you generate may require managing copyright permissions.
See Helm Docs for more details.
Generate Text with Stable Diffusion¶
Consider using this sample YAML deployment for Text Generation Inference (TGI). Refer to HugePages Settings by Model.
Note
To use this sample template, you must provide your own HUGGING_FACE_HUB_TOKEN value.
apiVersion: v1
kind: Pod
metadata:
  name: tgi-lama3
  labels:
    name: tgi-lama3
spec:
  tolerations:
  - key: "nodeowner"
    operator: "Equal"
    value: "admin"
    effect: "NoSchedule"
  containers:
  - name: tgi-lama3
    envFrom:
    - configMapRef:
        name: proxy-config
    image: ghcr.io/huggingface/tgi-gaudi:1.2.1 #amr-registry.caas.intel.com/bda-mlop/genops/tgi_gaudi:1.3 #ghcr.io/huggingface/tgi-gaudi:1.2.1
    securityContext:
      capabilities:
        add: ["SYS_NICE"]
    env:
    - name: PT_HPU_ENABLE_LAZY_COLLECTIVES
      value: "true"
    - name: OMPI_MCA_btl_vader_single_copy_mechanism
      value: none
    - name: MODEL_ID
      value: meta-llama/Meta-Llama-3-8B-Instruct #meta-llama/Meta-Llama-3-8B #meta-llama/Llama-2-70b-chat-hf
    - name: PORT
      value: "8080"
    - name: HUGGINGFACE_HUB_CACHE
      value: /models-cache
    - name: TGI_PROFILER_ENABLED
      value: "true"
    - name: NUM_SHARD
      value: "1"
    - name: SHARDED
      value: "false"
    - name: HUGGING_FACE_HUB_TOKEN
      value: "xxxxxxxxxxxxxxxxxxxxxxx"
    resources:
      limits:
        habana.ai/gaudi: "1"
        hugepages-2Mi: 9200Mi
        memory: 200G
        #cpu: "50"
    volumeMounts:
    - name: models-cache
      # mounted at the path referenced by HUGGINGFACE_HUB_CACHE
      mountPath: /models-cache
  volumes:
  - name: models-cache
    hostPath:
      path: /data
      type: Directory
Download the TGI Helm Charts.
To deploy TGI with Mistral using Helm:
helm install mistral tgi-chart -f ./mistral-values.yaml
Note
See also Hugging Face Text Generation Inference and text-generation-launcher arguments.
Access the result with the load balancer IP.
Follow the section Apply Load Balancer.
Replace the value of <Load Balancer IP>, shown below, with your own.
http://<Load Balancer IP>/mistral/generate
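As a quick check from a terminal, you can send a request to the TGI generate endpoint. This sketch assumes the default TGI REST API (a JSON body with inputs and parameters) and uses a placeholder prompt:
curl -X POST "http://<Load Balancer IP>/mistral/generate" -H "Content-Type: application/json" -d '{"inputs": "What is Kubernetes?", "parameters": {"max_new_tokens": 100}}'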