Intel Kubernetes Service¶
The Intel® Kubernetes Service (IKS) gives you the tools to manage Kubernetes clusters for application development, AI/ML training, and helm chart deployments.
Tip
Currently IKS is only available to premium and enterprise account users.
Control Plane
IKS provides a managed Kubernetes service in Intel® Tiber™ AI Cloud. IKS manages the availability and scalability of the Kubernetes control plane. For a technical overview, see also Kubernetes Control Plane Components.
Pricing
Pricing is 0.10 cents per cluster per hour. See Billing and usage for more information on payment methods and account types.
Provision Kubernetes Cluster¶
Create a Cluster¶
Navigate to the Intel® Tiber™ AI Cloud console.
In the menu at left, click the Intel Kubernetes Service menu.
Visit the Overview tab to view the workflow.
Click the Clusters tab.
Click Launch Cluster.
Complete the required fields under Cluster details and configuration.
In Cluster name, enter a name.
In Select cluster K8S version, select a version.
Cluster details and configuration
Click Launch. After launching, the State column shows Updating.
Under Cluster Name column, click your cluster.
Note
Your cluster name now appears below, along with an Actions menu.
Add Node Group to Cluster¶
From the Actions pulldown menu, select Add node group.
Enter your data in the Node group configuration menu.
In Node type, choose between Virtual Machine and Bare Metal for your node. Note the cost per hour. See also Compare Instance Types below.
In Node group name, enter a name.
In Node quantity, select the number of worker nodes you need in your cluster, from 1 to 10.
Tip
You can scale the number of worker nodes up or down.
Under Public Keys, select Upload Key or Refresh Keys.
Select Upload Key, name your key, and paste your local SSH public key into the fields shown (if you need to generate a key first, see the example after these steps).
Select Upload Key.
Now, in Node group configuration, check the box next to the SSH key you added.
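If you don't yet have an SSH key pair to upload, you can generate one locally and print the public key to paste into the Upload Key fields. These are standard OpenSSH commands; the file path shown is simply the OpenSSH default.
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519
cat ~/.ssh/id_ed25519.pub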
Compare Instance Types¶
At any time during Node group configuration, you may choose Compare instance types. This pop-out screen helps you compare and select your preferred processor.
Launch Kubernetes Cluster¶
When you create a cluster, it includes:
K8S Control-plane
ETCD Database
Scheduler
API Server
Select Launch.
After your node group is added, its State shows Updating in the submenu.
When the node group is added successfully, each node name appears and its State shows Active.
Connect to cluster¶
Set the KUBECONFIG environment variable.
Linux or macOS:
export KUBECONFIG=/path/to/your/kubeconfig
Windows PowerShell:
$Env:KUBECONFIG = "C:\path\to\your\kubeconfig"
Verify Configuration: Ensure that the current context points to the correct cluster.
kubectl config view
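You can also print just the active context name to confirm it matches your IKS cluster:
kubectl config current-context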
Kubeconfig Admin Access¶
Ideally, export the KUBECONFIG file to your secret management system and continue.
In the Kubernetes Console, locate options below Kube Config.
Copy or download the KUBECONFIG file and export it to your development environment. For more help on exporting, follow the related steps in the next section.
Caution
Exercise caution while downloading, accessing, or sharing this file.
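One possible approach (a minimal sketch; the file name and location below are placeholders, not IKS defaults) is to save the downloaded kubeconfig with restricted permissions and point KUBECONFIG at it:
mkdir -p ~/.kube
chmod 600 ~/.kube/my-iks-cluster.yaml
export KUBECONFIG=~/.kube/my-iks-cluster.yaml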
Set Context for Multiple Clusters¶
Optional: List all available contexts.
kubectl config get-contexts -o=name
Create a directory for your kubeconfig files if it doesn't exist, then change into it.
mkdir -p ./kubeconfig
cd ./kubeconfig
In the Kubernetes Console, navigate to My clusters, Kube Config.
From the Kubernetes Console, download (or copy) the KUBECONFIG file to the current directory.
Extract the value from the KUBECONFIG and paste it into the shell, following the example below.
Export KUBECONFIG as an environment variable as shown below.
export KUBECONFIG=/home/sdp/.kube/dev-env
Use kubectl config set-context to modify an existing context or create a new cluster context.
kubectl config set-context
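For example, to create a context and then switch to it (the context, cluster, and user names below are placeholders; use the names defined in your kubeconfig):
kubectl config set-context my-iks-context --cluster=my-iks-cluster --user=my-iks-admin
kubectl config use-context my-iks-context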
To view the cluster's nodes, enter this command.
kubectl get nodes
Important
If you wish to launch another cluster, return to the start of this section and perform all steps again, exporting a different KUBECONFIG file.
Controlling Node Auto-repair Behavior¶
By default, IKS detects when a worker node becomes unavailable. If the node remains unavailable beyond a grace period, it is automatically replaced (auto-repair) with a new node of the same type. If you do not want this behavior for one or more worker nodes in your cluster, you can turn off auto-repair for any given worker node.
Auto-repair Options¶
To opt out of auto-repair for a node (so it is not automatically replaced when it becomes unavailable or unreachable after the grace period elapses), label that node with iks.cloud.intel.com/autorepair=false.
As long as the node has this label, IKS will not replace it if it becomes unavailable. The user interface shows the status as Updating while the node is unavailable (and not ready in Kubernetes), indicating that IKS has detected the unavailability. If the node becomes available again, the status changes from Updating to Active. If you remove the auto-repair label while the node is unavailable, the default auto-replacement behavior resumes and IKS replaces the node, as designed.
We do not recommend removing the node from the compute console while this label is set; doing so defeats the purpose of the label and results in a dangling node in your Kubernetes console.
Note
You can add and remove this label with standard kubectl label commands.
Examples¶
Add a label to a node to avoid auto replacement:
kubectl label node ng-hdmqnphxi-f49b8 iks.cloud.intel.com/autorepair=false
Remove a label from a node to enable auto replacement:
kubectl label node ng-hdmqnphxi-f49b8 iks.cloud.intel.com/autorepair-
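To see which nodes currently carry the label, list nodes with the label shown as a column:
kubectl get nodes -L iks.cloud.intel.com/autorepair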
Manage Kubernetes Cluster¶
Create a pod.
kubectl apply -f pod-definition.yaml
Create a YAML or JSON file with your pod specification. See the example below.
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: mycontainer
    image: nginx
Check pod status. Replace mypod with the name of your pod.
kubectl get pods
kubectl describe pod mypod
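You can also inspect the pod's container logs, which is often the quickest way to debug a failing workload:
kubectl logs mypod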
Update a Pod:
kubectl edit pod mypod
Note
This opens the pod configuration in your default editor. Make changes and save the file.
Delete a Pod. Replace mypod with the name of your pod.
kubectl delete pod mypod
Upgrade Kubernetes Cluster¶
In the Cluster name submenu, under Details, find the Upgrade link.
Select Upgrade.
In the Upgrade K8S Version pull-down menu, select your desired version.
Click the Upgrade button.
During the upgrade, the Details menu State may show Upgrading controlplane.
Note
If the current version is penultimate to the latest version, only the latest version appears. When the version upgrade is successful, Cluster reconciled appears.
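After the upgrade completes, you can confirm the control plane version and the kubelet version reported by each worker node with standard kubectl commands:
kubectl version
kubectl get nodes -o wide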
Apply Load Balancer¶
Navigate to the Cluster name submenu.
In the Actions menu, select Add load balancer.
In the Add load balancer dialog, complete these fields.
Select the port number of your service from the dropdown menu.
For Type, select public or private.
Click on Launch.
In the Cluster name submenu, view the Load Balancer menu.
Your Load Balancer appears with its Name, and its State shows Active.
K8S will automatically perform load balancing for your service.
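To confirm which service and port the load balancer should target, you can list the services running in your cluster:
kubectl get services --all-namespaces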
Add Security Rules¶
You can create a security rule if you have already created a Load Balancer.
Note
If you haven't created a Load Balancer, return to the section above before proceeding. After a cluster is available, you must create a Node Group.
Click on your Cluster name.
Select the Worker Node Group tab.
Select Add Node Group.
Complete all required fields as shown in Add Node Group to Cluster. Then return to this workflow.
Wait until the State shows “Active” before proceeding.
Complete all steps in Apply Load Balancer. Then return here.
Add security rule to your own Load Balancer¶
For your own Load Balancer, click Edit.
Add a Source IP address to create a security rule.
Select a protocol.
Click Save. The rule is created.
Edit or delete security rule¶
Optional: After the State changes to Active:
You may edit the security rules by selecting Edit.
You may delete the security rule by selecting Delete.
Add security rule to default Load Balancer¶
Navigate to the Security tab. You may see Load Balancers populated in a table.
Note
The public-apiserver is the default Load Balancer.
For the public-apiserver, click Edit.
Then add a Source IP address to create a security rule.
Select a protocol.
Click Save. The rule is created.
Additional resources¶
Configure Ingress, Expose Cluster Services¶
Note
This requires helm version 3 or a helm client utility. See also Helm Docs.
Create a cluster with at least one worker node. See Create a Cluster.
Create a Load balancer (public) using port 80. See Apply Load Balancer.
Note
This IP is used in the URL in the last step for testing. Your port number may differ.
Install the ingress controller.
helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace --set controller.hostPort.enabled=true
To install a test NGINX Pod, Service, and Ingress object, download ingress-test.yml. Alternatively, copy the contents below and save them as ingress-test.yml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:stable
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /test
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80
Run this command to apply it.
kubectl apply -f ingress-test.yml
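Optionally, verify that the Deployment, Service, and Ingress were created before testing:
kubectl get deployments,services,ingresses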
Visit your browser and test, inserting your IP where shown below.
http://<IP>/test
The IP mentioned here is the Public Load balancer IP.
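You can also test from a terminal with curl, replacing <IP> with the public Load Balancer IP:
curl http://<IP>/test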
De-Provision Kubernetes Cluster¶
Delete Cluster Group or Node¶
Delete Node Group¶
In the Cluster name submenu, select the Node group you wish to delete.
Click the Delete button.
Delete Node¶
Below the Node name table, note the Add node and Delete node buttons.
Click the Delete node button, as desired.
Select Continue.
Deploy Example AI/ML Workloads¶
Add an Intel® Gaudi® 2 processor instance to a cluster to deploy LLM and Stable Diffusion models.
Complete the tutorial Training a PyTorch Model on Intel Gaudi 2.
Add nodes to the Intel Kubernetes Cluster.
Ensure you're able to access the KUBECONFIG file and the Kubernetes cluster.
Deploy Stable Diffusion¶
To deploy Stable Diffusion, try the example below. Run it on an Intel® Gaudi® 2 processor instance and deploy it on an IKS cluster.
Intel® Gaudi® 2 processor with Stable Diffusion¶
To run Stable Diffusion in IKS with an Intel® Gaudi® 2 processor, apply the following configuration.
Apply this configuration if huge pages are not set on all nodes. Otherwise, skip to the next section.
sudo sysctl -w vm.nr_hugepages=156300
Verify configuration.
grep HugePages_Free /proc/meminfo
grep HugePages_Total /proc/meminfo
Ensure that your output is similar to this.
HugePages_Free:    34142
HugePages_Total:   35201
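You can also confirm that Kubernetes reports the huge pages as an allocatable resource on the node (the node name below is a placeholder):
kubectl describe node <node-name> | grep -i hugepages-2mi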
Use the suggested settings for model inference.
hugepages2Mi: 500Mi
memory: 60G
Revise your YAML file, using this example.
apiVersion: v1
kind: Pod
metadata:
  name: std
  labels:
    name: std
spec:
  containers:
  - name: std
    image: docker.io/rramamu1/std-gaudi:latest
    securityContext:
      capabilities:
        add: ["SYS_NICE"]
    ports:
    - containerPort: 8000
    resources:
      limits:
        habana.ai/gaudi: "1"
        hugepages-2Mi: 500Mi
        memory: 60G
        #cpu: "25"
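To create the pod from this file and watch it start, you might run the following (the file name std-pod.yaml is a placeholder for wherever you saved the YAML above):
kubectl apply -f std-pod.yaml
kubectl get pod std -w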
HugePages Settings by Model¶
| Model Name | hugepages-2Mi | Memory | Number of Cards |
|---|---|---|---|
| runwayml/stable-diffusion-v1-5 | 500Mi | 60G | 1 |
| meta-llama/Meta-Llama-3-70B-Instruct | 9800Mi | 250G | >= 2 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 9800Mi | 250G | >= 2 |
| mistralai/Mistral-7B-v0.1 | 600Mi | 50G | 1 |
Generate Image with Stable Diffusion¶
Consider using this YAML deployment for Helm Chart resources.
Download the Helm Charts from the STD Helm Charts.
Configuration for hugepages, as noted above, is already applied.
Note
This YAML file overrides default configuration. Apply your custom configuration to this file to ensure your settings are applied.
# Default values for tgi-chart.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
replicaCount: 1
modelName: runwayml/stable-diffusion-v1-5
hostVolumePath: /scratch-2/data
image:
  repository: docker.io/rramamu1/std-gaudi
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: "latest"
service:
  type: ClusterIP
  port: 8000
resources:
  numofgaudi: 1
  hugepages2Mi: 500Mi
  #cpu: 25
  memory: 60G
Next, run the install command.
helm install std std-chart -f ./std-values.yaml
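To confirm the release deployed and its pod is running, standard Helm and kubectl checks apply:
helm list
kubectl get pods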
Access the result using the load balancer IP.
Note
Ensure you followed the section Apply Load Balancer.
Construct a full URL for the Load Balancer by following this two-step process.
Replace the value of <Load Balancer IP> with your own, as shown below.
http://<Load Balancer IP>/std/generate_image
Add the prompt, including parameters, as the second part of the URL.
Example: The second part starts with “prompts=”
http://<Load Balancer IP>/std/generate_image/prompts=dark sci-fi , A huge radar on mountain ,sunset, concept art&height=512&width=512&num_inference_steps=50&guidance_scale=7.5&batch_size=1&negative_prompts=''&seed=100&num_images_per_prompt=1
Paste the full URL in a browser and press <Enter>.
Change the value of “prompts=”, as desired.
Example 2: Change the second part of the URL. Replace the text, starting with “prompts=”, as shown below.
http://<Load Balancer IP>/std/generate_image/prompts=Flying Cars&height=512&width=512&num_inference_steps=50&guidance_scale=7.5&batch_size=1&negative_prompts=''&seed=100&num_images_per_prompt=1
Paste the full URL in a browser and press <Enter>.
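You can also send the same request from a terminal. The quotes keep the shell from interpreting the & characters; saving to a file assumes the endpoint returns image data:
curl -o output.png "http://<Load Balancer IP>/std/generate_image/prompts=Flying Cars&height=512&width=512&num_inference_steps=50&guidance_scale=7.5&batch_size=1&negative_prompts=''&seed=100&num_images_per_prompt=1"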
Tip
Your image will differ. Any image that you generate may require managing copyright permissions.
See Helm Docs for more details.
Generate Text with Stable Diffusion¶
Consider using this sample YAML deployment for Text Generation Inference (TGI). Refer to HugePages Settings by Model.
Note
To use this sample template, you must provide your own HUGGING_FACE_HUB_TOKEN value.
apiVersion: v1
kind: Pod
metadata:
  name: tgi-lama3
  labels:
    name: tgi-lama3
spec:
  tolerations:
  - key: "nodeowner"
    operator: "Equal"
    value: "admin"
    effect: "NoSchedule"
  containers:
  - name: tgi-lama3
    envFrom:
    - configMapRef:
        name: proxy-config
    image: ghcr.io/huggingface/tgi-gaudi:1.2.1 #amr-registry.caas.intel.com/bda-mlop/genops/tgi_gaudi:1.3 #ghcr.io/huggingface/tgi-gaudi:1.2.1
    securityContext:
      capabilities:
        add: ["SYS_NICE"]
    env:
    - name: PT_HPU_ENABLE_LAZY_COLLECTIVES
      value: "true"
    - name: OMPI_MCA_btl_vader_single_copy_mechanism
      value: none
    - name: MODEL_ID
      value: meta-llama/Meta-Llama-3-8B-Instruct #meta-llama/Meta-Llama-3-8B #meta-llama/Llama-2-70b-chat-hf
    - name: PORT
      value: "8080"
    - name: HUGGINGFACE_HUB_CACHE
      value: /models-cache
    - name: TGI_PROFILER_ENABLED
      value: "true"
    - name: NUM_SHARD
      value: "1"
    - name: SHARDED
      value: "false"
    - name: HUGGING_FACE_HUB_TOKEN
      value: "xxxxxxxxxxxxxxxxxxxxxxx"
    resources:
      limits:
        habana.ai/gaudi: "1"
        hugepages-2Mi: 9200Mi
        memory: 200G
        #cpu: "50"
    volumeMounts:
    - name: models-cache
      # mounted at the path referenced by HUGGINGFACE_HUB_CACHE
      mountPath: /models-cache
  volumes:
  - name: models-cache
    hostPath:
      path: /data
      type: Directory
Download the TGI Helm Charts.
To deploy TGI with Mistral using Helm:
helm install mistral tgi-chart -f ./mistral-values.yaml
Note
See also Hugging Face Text Generation Inference and text-generation-launcher arguments.
Access the result with the load balancer IP.
Follow the section Apply Load Balancer.
Replace the value of <Load Balancer IP>, shown below, with your own.
http://<Load Balancer IP>/mistral/generate
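As a quick check from a terminal, you can send a request to the TGI generate endpoint. This sketch assumes the default TGI REST API (a JSON body with inputs and parameters) and uses a placeholder prompt:
curl -X POST "http://<Load Balancer IP>/mistral/generate" -H "Content-Type: application/json" -d '{"inputs": "What is Kubernetes?", "parameters": {"max_new_tokens": 100}}'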