Auto-scaling OpenStack Instances with Senlin and Prometheus

Prerequisite

OpenStack cloud
Prometheus
Alertmanager
Both Prometheus and Alertmanager should be the latest version.
Installing the OpenStack client, Senlin client, and Octavia client
- Update the package cache: sudo apt update
- Install Python 3 (python3-venv is needed to create the virtual environment below): sudo apt install python3-dev python3-pip python3-venv
- Install the OpenStack client: sudo apt install python3-openstackclient
- Verify the installation by opening the help: openstack --help
- Create a Python virtual environment for the Senlin and Octavia clients and give your user ownership of it: sudo python3 -m venv /opt/osenv && sudo chown -R "$USER" /opt/osenv
- Activate the virtual environment: source /opt/osenv/bin/activate
- Install the Senlin and Octavia clients: pip install python-senlinclient python-octaviaclient
- Verify that the Senlin and Octavia clients are installed:
  - Senlin: openstack cluster -h
  - Octavia: openstack loadbalancer -h

Requirement

OpenStack CLI installed on an Ubuntu machine
OpenStack RC File
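
The RC file exports the OS_* environment variables that the openstack command reads. As a minimal sketch, the helper below lists any required variable that is still unset after the file has been sourced. The function name missing_os_vars and the variable list are assumptions for illustration; adjust the list to what your RC file actually sets.

```shell
# Print the name of every listed OpenStack variable that is unset or empty.
# The variable list is an assumption; edit it to match your RC file.
missing_os_vars() {
  for name in OS_AUTH_URL OS_USERNAME OS_PASSWORD OS_PROJECT_NAME; do
    # Read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}
```

Source the RC file first, then call missing_os_vars. Empty output means every listed variable is set.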

Setup Senlin Cluster

Create one instance from which the cluster will be created [OPTIONAL], using the following commands:

FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address {Network_Pool_id})

PORT_ID=$(openstack port create \
  --network {network_id} \
  --security-group ALL \
  --security-group default \
  --security-group http \
  --security-group ICMP \
  --security-group HTTP \
  --security-group In-Cluster \
  --security-group Ping \
  --security-group ssh \
  Instance1234-port \
  -f value -c id)

openstack floating ip set --port $PORT_ID $FLOATING_IP

openstack server create \
  --flavor {flavor_id} \
  --image {image_id} \
  --boot-from-volume 20 \
  --availability-zone NCP-NON \
  --port $PORT_ID \
  --key-name gaebolg \
  Instance1234

Then connect to the instance with ssh -i [path-file-privatekey] username@host-IP and install the OpenStack client and Senlin client as described in Prerequisite.

Create a Cluster with Senlin

Create the file node_exporter_profile.yaml

touch node_exporter_profile.yaml

Edit node_exporter_profile.yaml as follows. Additional properties are listed under Nova Profile.

Note: use either image: or block_device_mapping_v2:, not both.

Limitation: admin_pass cannot be used.

type: os.nova.server
version: 1.0
properties:
  name: UbuntuG
  flavor: {flavor_name} # Ex.csa.large.v2
  availability_zone: {az} # Ex. NCP-BKK,NCP-NON,NCP-BKK2
  image: {image_name} # Ex.ubuntu-24-v240703
####################### 
#  block_device_mapping_v2:    
#    - uuid: {image_id}
#      boot_index: 0
#      source_type: image
#      destination_type: volume
#      volume_size: 30
#######################
  networks:
    - network: default # found in the project's VPC network
      floating_network: {floating_network_name} # Ex.Standard_Public_IP_Pool_NON
      security_groups:
        - ALL
        - default
        - In-Cluster
        - HTTP
        - Ping
  user_data: |
    #!/bin/bash
    # Update the package index
    sudo apt-get update
    cd /home/ubuntu

    wget \
      https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz

    sudo groupadd -f node_exporter
    sudo useradd -g node_exporter --no-create-home --shell /bin/false node_exporter
    sudo mkdir /etc/node_exporter
    sudo chown node_exporter:node_exporter /etc/node_exporter

    tar -xvf node_exporter-1.0.1.linux-amd64.tar.gz
    mv node_exporter-1.0.1.linux-amd64 node_exporter-files

    sudo cp node_exporter-files/node_exporter /usr/bin/
    sudo chown node_exporter:node_exporter /usr/bin/node_exporter

    echo "[Unit]
    Description=Node Exporter
    Documentation=https://prometheus.io/docs/guides/node-exporter/
    Wants=network-online.target
    After=network-online.target

    [Service]
    User=node_exporter
    Group=node_exporter
    Type=simple
    Restart=on-failure
    ExecStart=/usr/bin/node_exporter --web.listen-address=:9100

    [Install]
    WantedBy=multi-user.target" | sudo tee /usr/lib/systemd/system/node_exporter.service > /dev/null

    sudo chmod 664 /usr/lib/systemd/system/node_exporter.service

    sudo systemctl daemon-reload
    sudo systemctl start node_exporter

    sudo systemctl status node_exporter

    sudo systemctl enable node_exporter.service

    sudo apt-get install -y wget nginx

    systemctl start nginx

    systemctl enable nginx

    # NGINX EXPORTER
    #Setting nginx Configuration

    apt-get install -y wget tar

        sudo sed -i '/^http {/a \
        \n    server {\n\
            listen 8080;\n\
            server_name localhost;\n\
            \n\
            location /stub_status {\n\
                stub_status;\n\
                allow 0.0.0.0/0; # Allow all IPs for testing (restrict in production)\n\
                deny all;\n\
            }\n\
        }\n
    ' /etc/nginx/nginx.conf

    # Restart Nginx to apply changes
    systemctl restart nginx


    # Download and install Nginx Exporter
    
    wget \
      https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v1.2.0/nginx-prometheus-exporter_1.2.0_linux_amd64.tar.gz

    tar -xvzf nginx-prometheus-exporter_1.2.0_linux_amd64.tar.gz

    mv nginx-prometheus-exporter /usr/local/bin/
    
    # Create systemd service for Nginx Exporter
    cat <<EOL > /etc/systemd/system/nginx-exporter.service
    [Unit]
    Description=Nginx Prometheus Exporter
    After=network.target

    [Service]
    ExecStart=/usr/local/bin/nginx-prometheus-exporter --nginx.scrape-uri=http://localhost:8080/stub_status
    Restart=always
    User=nobody
    Group=nogroup

    [Install]
    WantedBy=multi-user.target
    EOL

    # Reload systemd and start the Nginx Exporter service
    systemctl daemon-reload
    systemctl start nginx-exporter
    systemctl enable nginx-exporter

The user_data section installs Node Exporter, Nginx, and Nginx Exporter.

If you use IIS, edit node_exporter_profile.yaml as follows instead:

type: os.nova.server
version: 1.0
properties:
  name: WindowG
  flavor: {flavor_name} # Ex. asa.large.v2
  key_name: gaebolg
  image: {Image_id}    # image with windows_exporter already installed
  availability_zone: {AZ_name} # Ex. NCP-BKK
  networks:
    - network: {Network_name} # Ex. Network-fortest
      floating_network: {floating_network_name}  # Ex. Standard_Public_IP_Pool_NON
      security_groups:
        - ALL
        - default
        - http
        - HTTP
        - ICMP
        - In-Cluster
        - Ping
        - ssh

Create the cluster profile from node_exporter_profile.yaml

openstack cluster profile create --spec-file node_exporter_profile.yaml {profile_name}

Note: {profile_name} can be any name.

Create the cluster from the profile created in step 2

Note: {Clustername} can be any name.

openstack cluster create --profile {profile_name} --desired-capacity 1 --min-size 1 {Clustername}

or

openstack cluster create --profile {profile_name} --desired-capacity 1 --min-size 1 --max-size 3 {Clustername}

Create receivers to trigger scale in and scale out

openstack cluster receiver create --cluster {Clustername} --action CLUSTER_SCALE_OUT {receiver_name}

After creating each receiver, always copy its alarm_url. It is needed later when setting up Alertmanager.

For example:

alarm_url: https://cloud-api.nipa.cloud:8778/v1/webhooks/cec91e87-f31d-4471-b451-283a8e382c7d/trigger?V=2

openstack cluster receiver create --cluster {Clustername} --action CLUSTER_SCALE_IN {receiver_name}

alarm_url: https://cloud-api.nipa.cloud:8778/v1/webhooks/e2dc6ab7-4827-4477-b4f7-90f849dfd47a/trigger?V=2

To look up the alarm_url later, use:

openstack cluster receiver show {Receivername} -f value -c channel
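
The alarm_url is a webhook: an HTTP POST to it requests the receiver's action once. The sketch below, which assumes curl is installed, can be used to test a receiver by hand before Alertmanager is connected. Both function names are hypothetical helpers, not part of any OpenStack client.

```shell
# Extract the receiver ID from an alarm_url of the form
# .../v1/webhooks/<id>/trigger?V=2
receiver_id_from_url() {
  printf '%s\n' "$1" | sed -n 's#.*/webhooks/\([^/]*\)/trigger.*#\1#p'
}

# Send an empty POST to the alarm_url; each call requests one scaling action.
trigger_receiver() {
  curl -sS -X POST "$1"
}
```

After calling trigger_receiver with the alarm_url, openstack cluster show {Clustername} should report the changed capacity once the action completes.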

Node exporter metrics are available at http://<FloatingIP>:9100/metrics

Configure Prometheus

Create an instance for Prometheus (the instance-creation commands from Setup Senlin Cluster can be reused).
Install Prometheus (the latest version is recommended).
Create the application credential used in the config. The application_credential_secret is shown only once, when the credential is created.

openstack application credential create {credentialname}

After Prometheus is installed, edit its config file at /etc/prometheus/prometheus.yml as follows:

global:
 scrape_interval: 15s
scrape_configs:
 - job_name: 'prometheus'
   scrape_interval: 5s
   static_configs:
     - targets: ['localhost:9090']
 # Job, for nginx_exporter
 - job_name: 'NginxExporter'
   openstack_sd_configs:  # this section must be edited
      - identity_endpoint: https://identity-api.nipa.cloud/
        region: NCP-TH
        application_credential_id: {appcredentialID}   
        application_credential_secret: {appcredentialsecret}
        role: instance
   relabel_configs:
        - source_labels: [__meta_openstack_tag_cluster_id]
          action: keep
          regex: .+
          target_label: cluster_id
        # Update the scraping port if required
        - source_labels:
          - __address__
          action: replace
          regex: ([^:]+)(?::\d+)
          replacement: $1:9113
          target_label: __address__
    # Scrape IIS instances  
 - job_name: 'IIS'
   openstack_sd_configs:
      - identity_endpoint: https://identity-api.nipa.cloud/
        region: NCP-TH
        application_credential_id: {appcredentialID}   
        application_credential_secret: {appcredentialsecret}
        role: instance
   relabel_configs:
        - source_labels: [__meta_openstack_tag_cluster_id]
          action: keep
          regex: .+
          target_label: cluster_id
        # Update the scraping port if required
        - source_labels:
          - __address__
          action: replace
          regex: ([^:]+)(?::\d+)
          replacement: $1:9182
          target_label: __address__
 # Scrape OpenStack instances
 - job_name: 'openstack'
   openstack_sd_configs:
      - identity_endpoint: https://identity-api.nipa.cloud/
        region: NCP-TH
        application_credential_id: {appcredentialID}   
        application_credential_secret: {appcredentialsecret}
        role: instance
   relabel_configs:
      # Keep Senlin instances that have cluster_id
      - source_labels: [__meta_openstack_tag_cluster_id]
        action: keep
        regex: .+
        target_label: cluster_id
      # Update the scraping port if required
      - source_labels:
        - __address__
        action: replace
        regex: ([^:]+)(?::\d+)
        replacement: $1:9100
        target_label: __address__
        
rule_files:
  - alert_rules.yml

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'localhost:9093'
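
Each relabel_configs block above rewrites __address__ so that the discovered host is kept and the port is replaced with the exporter port (9113, 9182, or 9100). The sketch below emulates that rewrite with sed so the effect can be checked on sample addresses. It is an illustration only: rewrite_port is a hypothetical helper, and Prometheus applies the rule with its own regex engine, not sed.

```shell
# Emulate the __address__ relabel rule: keep the host, replace the port.
# $1 = discovered address (host:port), $2 = exporter port
rewrite_port() {
  printf '%s\n' "$1" | sed -E 's/^([^:]+):[0-9]+$/\1:'"$2"'/'
}

rewrite_port "10.0.0.5:80" 9100    # prints 10.0.0.5:9100
```

Like the Prometheus rule, the pattern must match the whole address, so an address without a port is left unchanged.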

In /etc/prometheus/, also create alert_rules.yml and edit it as follows. The rules to use depend on which exporter you chose.

4.1 When using Node Exporter

groups:
- name: example
  rules:
  # Define minimum cluster size
  - record: min_cluster_size
    expr: 2  # Change this value to your actual minimum cluster size

  # Record the count of instances with idle CPU
  - record: instance_count
    expr: count(count by (instance) (node_cpu_seconds_total{mode="idle"}))

  # Alert when the average CPU idle across all instances is < 60% (usage > 40%), only if instance count is equal to or greater than min_cluster_size
  # The CPU expression is on the left of "and" so that $value carries the idle percentage
  - alert: HighUsage
    expr: |
      (
        avg(irate(node_cpu_seconds_total{mode="idle"}[1m])) * 100 < 60
      )
      and
      (instance_count >= min_cluster_size)
    for: 1m
    annotations:
      summary: "High CPU usage across the cluster"
      description: "Average CPU idle across the cluster is less than 60% (current value: {{ $value }}%)"

  # Alert when the average CPU idle across all instances is > 75% (usage < 25%), only if instance count is greater than min_cluster_size
  # The CPU expression is on the left of "and" so that $value carries the idle percentage
  - alert: LowUsage
    expr: |
      (
        avg(irate(node_cpu_seconds_total{mode="idle"}[1m])) * 100 > 75
      )
      and
      (instance_count > min_cluster_size)
    for: 1m
    annotations:
      summary: "Low CPU usage across the cluster"
      description: "Average CPU idle across the cluster is more than 75% (current value: {{ $value }}%)"
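
The two alerts above form a simple decision: scale out when CPU idle drops below 60% and the cluster has at least the minimum size, scale in when idle rises above 75% and the cluster is larger than the minimum. The sketch below restates that logic as a shell function so the thresholds can be checked against sample numbers. scaling_decision is a hypothetical helper that takes whole-number percentages; Prometheus evaluates the real rules.

```shell
# $1 = current instance count, $2 = min_cluster_size, $3 = average CPU idle (%)
scaling_decision() {
  count=$1
  min=$2
  idle=$3
  if [ "$count" -ge "$min" ] && [ "$idle" -lt 60 ]; then
    echo scale_out    # corresponds to the HighUsage alert
  elif [ "$count" -gt "$min" ] && [ "$idle" -gt 75 ]; then
    echo scale_in     # corresponds to the LowUsage alert
  else
    echo none
  fi
}

scaling_decision 2 2 45    # prints scale_out
scaling_decision 3 2 90    # prints scale_in
scaling_decision 2 2 90    # prints none
```

The last call shows why LowUsage uses a strict comparison: a cluster already at its minimum size is never asked to scale in.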

4.2 When using Nginx Exporter

groups:
- name: HTTPREQUEST
  rules:
  # Record the minimum cluster size (adjust as needed)
  - record: min_cluster_size
    expr: 1  # Adjust this value to your actual minimum cluster size
  
  # Record the count of instances (using a different metric if available)
  - record: instance_count
    expr: count(count by (instance) (nginx_http_requests_total))  # Adjust based on available metrics

  # Alert to scale out if the total HTTP request rate is greater than 200 requests/second and instance count is >= min_cluster_size
  - alert: HighUsage
    expr: |
      (instance_count >= min_cluster_size)
      and
      (sum(rate(nginx_http_requests_total[1m]))) > 200
    for: 1m
    annotations:
      summary: "Scale out: High number of HTTP requests"
      description: "HTTP request rate exceeds 200 requests/second and instance count is greater than or equal to minimum cluster size."

  # Alert to scale in if the total HTTP request rate is less than 200 requests/second and instance count is > min_cluster_size
  - alert: LowUsage
    expr: |
      (instance_count > min_cluster_size)
      and
      (sum(rate(nginx_http_requests_total[1m]))) < 200
    for: 1m
    annotations:
      summary: "Scale in: Low number of HTTP requests"
      description: "HTTP request rate is less than 200 requests/second and instance count is greater than minimum cluster size."

4.3 When using windows_exporter (for IIS)

groups:
- name: WINDOWREQUEST
  rules:
  # Record the minimum cluster size (adjust as needed)
  - record: min_cluster_size
    expr: 1  # Adjust this value to your actual minimum cluster size
  
  # Record the count of instances (using a different metric if available)
  - record: instance_count
    expr: count(count by (instance) (windows_iis_requests_total))  # Adjust based on available metrics

  # Alert to scale out if the total HTTP request rate is greater than 200 requests/second and instance count is >= min_cluster_size
  - alert: HighUsage
    expr: |
      (instance_count >= min_cluster_size)
      and
      (sum(rate(windows_iis_requests_total[1m]))) > 200
    for: 1m
    annotations:
      summary: "Scale out: High number of HTTP requests"
      description: "HTTP request rate exceeds 200 requests/second and instance count is greater than or equal to minimum cluster size."

  # Alert to scale in if the total HTTP request rate is less than 200 requests/second and instance count is > min_cluster_size
  - alert: LowUsage
    expr: |
      (instance_count > min_cluster_size)
      and
      (sum(rate(windows_iis_requests_total[1m]))) < 200
    for: 1m
    annotations:
      summary: "Scale in: Low number of HTTP requests"
      description: "HTTP request rate is less than 200 requests/second and instance count is greater than minimum cluster size."

Restart the Prometheus service

systemctl restart prometheus
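
A syntax error in either file keeps Prometheus from starting, so it can help to validate both before restarting. The sketch below wraps promtool, which ships with the Prometheus release, in a hypothetical helper named check_prometheus_files. The paths are the ones used in this guide; change them if your files live elsewhere.

```shell
# Validate the Prometheus config and the alert rules before a restart.
check_prometheus_files() {
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  promtool check config /etc/prometheus/prometheus.yml &&
    promtool check rules /etc/prometheus/alert_rules.yml
}
```

A typical use is check_prometheus_files && sudo systemctl restart prometheus, so the restart only happens when both checks pass.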

You can verify the result in the Prometheus web UI at http://{FloatingIP}:9090

Configure Alertmanager

Install Alertmanager on the same instance as Prometheus (the latest version is recommended).
Edit /etc/alertmanager/alertmanager.yml and set each url: to the alarm_url copied when the corresponding Senlin receiver was created.

global:
  resolve_timeout: 5m

route:
  group_by: ["alertname"]
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 5m
  receiver: default

  routes:
    - match:
        alertname: HighUsage
      receiver: "scale-out-senlin"
    - match:
        alertname: LowUsage
      receiver: "scale-in-senlin"

receivers:
  - name: "scale-out-senlin"
    webhook_configs:
      - url: "{alarm_url copied when creating the CLUSTER_SCALE_OUT receiver}"
        send_resolved: false

  - name: "scale-in-senlin"
    webhook_configs:
      - url: "{alarm_url copied when creating the CLUSTER_SCALE_IN receiver}"
        send_resolved: false

  - name: "default"   # this receiver uses a dummy link so that a default receiver exists; it has no effect on autoscaling
    webhook_configs:
      - url: "http://localhost:9999/dummy"  
        send_resolved: false

inhibit_rules:
  - source_match:
      severity: "critical"
    target_match:
      severity: "warning"
    equal: ["alertname", "dev", "instance"]

Limitation: set repeat_interval to 5-10 minutes (5m-10m). The cluster scales in by only one node per trigger, so a longer interval makes scale-in slower.
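
To see what this means in practice, the sketch below estimates how long a scale-in takes. It assumes one node is removed per notification and that Alertmanager re-sends the notification once per repeat_interval while the alert keeps firing, counting one interval per removed node. scale_in_minutes is a hypothetical helper and the result is a rough estimate, not a guarantee.

```shell
# $1 = current node count, $2 = target node count, $3 = repeat_interval in minutes
scale_in_minutes() {
  current=$1
  target=$2
  repeat_min=$3
  echo $(( (current - target) * repeat_min ))
}

scale_in_minutes 3 1 5     # prints 10
scale_in_minutes 3 1 30    # prints 60
```

With the same cluster, raising repeat_interval from 5m to 30m stretches the estimate from 10 minutes to an hour.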

Restart the Alertmanager service

systemctl restart alertmanager

You can verify the result in the Alertmanager web UI:

http://{FloatingIP}:9093/#/alerts

Configure the Load-Balancing Policy

In the same folder where the cluster profile was created, create a file named lb_policy.yaml

touch lb_policy.yaml

Edit lb_policy.yaml as follows. There are two cases.

2.1 Let lb_policy create the load balancer automatically once the policy is attached to the cluster

type: senlin.policy.loadbalance
version: 1.0
properties:
  availability_zone: NCP-BKK
  flavor_id: [loadbalancer flavor_id]
  vip:
    subnet: [subnet_id]  
    protocol: TCP
    protocol_port: 80
  health_monitor:
    type: TCP  # Options can be PING, TCP, HTTP, HTTPS
    delay: 5
    timeout: 3
    max_retries: 3
  pool:
    subnet: [subnet_id]  # use the same subnet as vip
    lb_method: ROUND_ROBIN
    session_persistence: {}
    protocol: TCP
    protocol_port: 80

2.2 Attach a load balancer that you created beforehand to the cluster. The value of each argument should match the load balancer being attached.

type: senlin.policy.loadbalance
version: 1.2
properties:
  loadbalancer: [loadbalancer_id]
  vip:
    address: 192.168.0.5  
    subnet: [subnet_id]
    protocol: TCP
    protocol_port: 80
  health_monitor:
    id: [health_monitor_id]
    type: TCP  # Options can be PING, TCP, HTTP, HTTPS
    delay: 5
    timeout: 3
    max_retries: 3
  pool:
    id: [pool_id]
    subnet: [subnet_id]
    lb_method: ROUND_ROBIN
    session_persistence: {}
    protocol: TCP
    protocol_port: 80

Then create the policy from lb_policy.yaml:

openstack cluster policy create --spec-file lb_policy.yaml lb_policy

Attach the policy to the cluster created earlier:

openstack cluster policy attach --policy <POLICY_ID> <CLUSTER_ID>
