  • 1. Environment Preparation
    • 1.1. Deployment Machines
    • 1.2. Configuring the Management Machine
    • 1.3. Configuring the Deployment Machines
    • 1.4. Images Involved
  • 2. Deploying the Cluster
    • 2.1. Downloading the Kubespray Source Code
    • 2.2. Editing the Configuration Files
      • 2.2.1. hosts.ini
      • 2.2.2. k8s-cluster.yml
    • 2.3. Running the Deployment
  • 3. Verifying the Deployment
    • 3.1. Ansible Results
    • 3.2. Kubernetes Cluster Status
  • 4. Scaling Out the Cluster
    • 4.1. Modifying hosts.ini
    • 4.2. Running the Scale-Out Command
    • 4.3. Checking the Scale-Out Results
  • 5. Deploying a Highly Available Cluster
  • 6. Upgrading the Cluster
  • 7. Troubleshooting
    • 7.1. python-netaddr Not Installed
    • 7.2. Swap Not Disabled
    • 7.3. Insufficient Memory on Deployment Machines
    • 7.4. kube-scheduler Fails to Run
    • 7.5. Docker Package Conflicts

    1. Environment Preparation

    1.1. Deployment Machines

    The following machines are all virtual machines.

    Machine IP    | Hostname      | Role                          | OS (kernel)                | Notes
    172.16.94.140 | kube-master-0 | k8s master                    | CentOS 7 (kernel 4.17.14)  | Memory: 3 GB
    172.16.94.141 | kube-node-41  | k8s node                      | CentOS 7 (kernel 4.17.14)  | Memory: 3 GB
    172.16.94.142 | kube-node-42  | k8s node                      | CentOS 7 (kernel 4.17.14)  | Memory: 3 GB
    172.16.94.135 | -             | deployment management machine | -                          | -

    1.2. Configuring the Management Machine

    The management machine is used to drive the deployment of the k8s cluster. It needs the following software versions installed (they can also all be installed in one step from requirements.txt, as sketched after the list below); see:

    • https://github.com/kubernetes-incubator/kubespray#requirements

    • https://github.com/kubernetes-incubator/kubespray/blob/master/requirements.txt

    1. ansible>=2.4.0
    2. jinja2>=2.9.6
    3. netaddr
    4. pbr>=1.6
    5. ansible-modules-hashivault>=3.9.4
    6. hvac
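
    Instead of installing each dependency by hand, everything listed above can be installed in one step from the repository's requirements.txt. A minimal sketch, assuming pip is available and the repository has already been cloned as described in section 2.1:

    # install all controller-side dependencies required by Kubespray
    pip install -r kubespray/requirements.txt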

    1. Install and configure Ansible

    • Refer to the separate article on using Ansible.
    • Set up passwordless SSH login from the management machine to the deployment machines; see the article on passwordless SSH login and the sketch below.
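
    A minimal sketch of setting up passwordless SSH from the management machine to the three deployment machines, assuming root login and that no key pair exists yet:

    # generate a key pair on the management machine (accept the defaults)
    ssh-keygen -t rsa
    # copy the public key to every deployment machine
    for host in 172.16.94.140 172.16.94.141 172.16.94.142; do
        ssh-copy-id root@${host}
    done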

    2. Install python-netaddr

    # Install pip
    yum -y install epel-release
    yum -y install python-pip
    # Install python-netaddr
    pip install netaddr

    3. Upgrade Jinja2

    # Jinja 2.9 (or newer)
    pip install --upgrade jinja2

    1.3. Configuring the Deployment Machines

    The deployment machines are the machines that will run the k8s cluster itself, i.e. both the Master and the Node machines.

    1. Confirm the OS version

    This article uses CentOS 7; upgrading the kernel to 4.x or later is recommended (a sketch of checking and upgrading the kernel follows).
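
    A sketch of checking the current kernel and, if needed, upgrading it to a mainline 4.x kernel via the ELRepo repository. This assumes CentOS 7; the exact repository RPM URL and kernel package may differ for your environment:

    # check the OS release and current kernel version
    cat /etc/redhat-release
    uname -r
    # install the ELRepo repository and a mainline kernel, then reboot into it
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    yum -y install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
    yum -y --enablerepo=elrepo-kernel install kernel-ml
    grub2-set-default 0
    reboot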

    2. Disable the firewall

    systemctl stop firewalld
    systemctl disable firewalld
    iptables -F
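
    To confirm the firewall is stopped and will not come back after a reboot, a quick sanity check (not required by Kubespray):

    systemctl is-active firewalld    # expect "inactive"
    systemctl is-enabled firewalld   # expect "disabled"
    iptables -L -n                   # chains should be empty with policy ACCEPT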

    3. Disable swap

    Kubespray v2.5.0 requires swap to be disabled; see:

    • https://github.com/kubernetes-incubator/kubespray/blob/02cd5418c22d51e40261775908d55bc562206023/roles/kubernetes/preinstall/tasks/verify-settings.yml#L75

    - name: Stop if swap enabled
      assert:
        that: ansible_swaptotal_mb == 0
      when: kubelet_fail_swap_on|default(true)
      ignore_errors: "{{ ignore_assert_errors }}"

    Version v2.6.0 removed the swap check; see:

    • https://github.com/kubernetes-incubator/kubespray/commit/b902602d161f8c147f3d155d2ac5360244577127#diff-b92ae64dd18d34a96fbeb7f7e48a6a9b

    Disable swap by running swapoff -a:

    [root@master ~]# swapoff -a
    [root@master ~]#
    [root@master ~]# free -m
                  total        used        free      shared  buff/cache   available
    Mem:            976         366         135           6         474         393
    Swap:             0           0           0
    # A Swap row of all zeros means swap is now disabled
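
    Note that swapoff -a only disables swap until the next reboot. To keep it disabled permanently, comment out the swap entry in /etc/fstab as well. A sketch; review your fstab before editing it:

    # comment out any swap line in /etc/fstab so swap is not re-enabled on boot
    sed -ri 's/^([^#].*\sswap\s.*)$/# \1/' /etc/fstab
    grep swap /etc/fstab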

    4. Confirm the memory of the deployment machines

    Because this article deploys onto virtual machines, memory can be insufficient, so the VM memory was increased to 3 GB or more; physical machines generally do not run into this problem. See:

    • https://github.com/kubernetes-incubator/kubespray/blob/95f1e4634a1c50fa77312d058a2b713353f4307e/roles/kubernetes/preinstall/tasks/verify-settings.yml#L52

    - name: Stop if memory is too small for masters
      assert:
        that: ansible_memtotal_mb >= 1500
      ignore_errors: "{{ ignore_assert_errors }}"
      when: inventory_hostname in groups['kube-master']

    - name: Stop if memory is too small for nodes
      assert:
        that: ansible_memtotal_mb >= 1024
      ignore_errors: "{{ ignore_assert_errors }}"
      when: inventory_hostname in groups['kube-node']

    1.4. Images Involved

    The Docker version is 17.03.2-ce.

    1. Master node images

    Image                                                          | Version | Size    | Image ID     | Notes
    gcr.io/google-containers/hyperkube                             | v1.9.5  | 620 MB  | a7e7fdbc5fee | k8s
    quay.io/coreos/etcd                                            | v3.2.4  | 35.7 MB | 498ffffcfd05 | -
    gcr.io/google_containers/pause-amd64                           | 3.0     | 747 kB  | 99e59f495ffa | -
    quay.io/calico/node                                            | v2.6.8  | 282 MB  | e96a297310fd | calico
    quay.io/calico/cni                                             | v1.11.4 | 70.8 MB | 4c4cb67d7a88 | calico
    quay.io/calico/ctl                                             | v1.6.3  | 44.4 MB | 46d3aace8bc6 | calico

    2. Node images

    Image                                                          | Version | Size    | Image ID     | Notes
    gcr.io/google-containers/hyperkube                             | v1.9.5  | 620 MB  | a7e7fdbc5fee | k8s
    gcr.io/google_containers/pause-amd64                           | 3.0     | 747 kB  | 99e59f495ffa | -
    quay.io/calico/node                                            | v2.6.8  | 282 MB  | e96a297310fd | calico
    quay.io/calico/cni                                             | v1.11.4 | 70.8 MB | 4c4cb67d7a88 | calico
    quay.io/calico/ctl                                             | v1.6.3  | 44.4 MB | 46d3aace8bc6 | calico
    gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64           | 1.14.8  | 40.9 MB | c2ce1ffb51ed | dns
    gcr.io/google_containers/k8s-dns-sidecar-amd64                 | 1.14.8  | 42.2 MB | 6f7f2dc7fab5 | dns
    gcr.io/google_containers/k8s-dns-kube-dns-amd64                | 1.14.8  | 50.5 MB | 80cc5ea4b547 | dns
    gcr.io/google_containers/cluster-proportional-autoscaler-amd64 | 1.1.2   | 50.5 MB | 78cf3f492e6b | -
    gcr.io/google_containers/kubernetes-dashboard-amd64            | v1.8.3  | 102 MB  | 0c60bcf89900 | dashboard
    nginx                                                          | 1.13    | 109 MB  | ae513a47849c | -

    3. Notes

    • These registries are blocked in mainland China and pulling all of the images takes a long time, so it is recommended to download them onto the deployment machines in advance (see the sketch after this list).
    • The hyperkube image is used to run the core k8s components (such as kube-apiserver).
    • The network plugin used here is calico.
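
    A sketch of pre-pulling the Master-node images listed above on a machine that can reach gcr.io and quay.io. The names and tags are taken from the tables above; adjust them to the Kubespray version you are deploying:

    for image in \
        gcr.io/google-containers/hyperkube:v1.9.5 \
        quay.io/coreos/etcd:v3.2.4 \
        gcr.io/google_containers/pause-amd64:3.0 \
        quay.io/calico/node:v2.6.8 \
        quay.io/calico/cni:v1.11.4 \
        quay.io/calico/ctl:v1.6.3; do
        docker pull "${image}"
    done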

    2. Deploying the Cluster

    2.1. Downloading the Kubespray Source Code

    git clone https://github.com/kubernetes-incubator/kubespray.git
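
    The master branch changes frequently; to match the behavior described in this article you may want to pin a specific release tag. A sketch, assuming the v2.5.0 release mentioned above:

    cd kubespray
    git tag -l            # list available release tags
    git checkout v2.5.0   # pin to the release used in this article
    cd ..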

    2.2. Editing the Configuration Files

    2.2.1. hosts.ini

    hosts.ini describes the machines the cluster will be deployed onto; its path is kubespray/inventory/sample/hosts.ini.

    cd kubespray
    # Copy the sample inventory and edit the copy
    cp -rfp inventory/sample inventory/k8s
    vi inventory/k8s/hosts.ini

    For example:

    In hosts.ini you can either provide the SSH login password for each deployment machine, or omit the password and rely on passwordless SSH login instead.

    # Configure 'ip' variable to bind kubernetes services on a
    # different ip than the default iface
    # hostname  ssh host IP  ssh user  ssh password  machine IP  netmask
    kube-master-0 ansible_ssh_host=172.16.94.140 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.140 mask=/24
    kube-node-41 ansible_ssh_host=172.16.94.141 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.141 mask=/24
    kube-node-42 ansible_ssh_host=172.16.94.142 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.142 mask=/24

    # configure a bastion host if your nodes are not directly reachable
    # bastion ansible_ssh_host=x.x.x.x

    [kube-master]
    kube-master-0

    [etcd]
    kube-master-0

    [kube-node]
    kube-node-41
    kube-node-42

    [k8s-cluster:children]
    kube-node
    kube-master

    [calico-rr]
    2.2.2. k8s-cluster.yml

    k8s-cluster.yml is the main configuration file of the k8s cluster; its path is kubespray/inventory/k8s/group_vars/k8s-cluster.yml. Among other things, it sets the version of k8s to install via the parameter kube_version: v1.9.5. See:

    • https://github.com/kubernetes-incubator/kubespray/blob/master/inventory/sample/group_vars/k8s-cluster.yml#L22
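
    A sketch of inspecting and pinning the relevant settings in the copied inventory, assuming the variable names used in the sample file (kube_version, kube_network_plugin):

    # show the current cluster version and network plugin settings
    grep -E '^(kube_version|kube_network_plugin):' inventory/k8s/group_vars/k8s-cluster.yml
    # pin the version used in this article
    sed -i 's/^kube_version:.*/kube_version: v1.9.5/' inventory/k8s/group_vars/k8s-cluster.yml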

    2.3. Running the Deployment

    The playbook involved is cluster.yml.

    # Enter the repository root
    cd kubespray
    # Run the deployment
    ansible-playbook -i inventory/k8s/hosts.ini cluster.yml -b -vvv

    The -vvv flag enables verbose output of the run.

    If you need to reset the cluster, run the following command.

    The playbook involved is reset.yml.

    ansible-playbook -i inventory/k8s/hosts.ini reset.yml -b -vvv

    3. Verifying the Deployment

    3.1. Ansible Results

    When the ansible-playbook run finishes, output like the following indicates the deployment succeeded; otherwise, fix whatever the error messages report.

    PLAY RECAP *****************************************************************************
    kube-master-0   : ok=309  changed=30  unreachable=0  failed=0
    kube-node-41    : ok=203  changed=8   unreachable=0  failed=0
    kube-node-42    : ok=203  changed=8   unreachable=0  failed=0
    localhost       : ok=2    changed=0   unreachable=0  failed=0

    Below is part of the deployment log:

    kubernetes/preinstall : Update package management cache (YUM) --------------------23.96s
    /root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/main.yml:121
    kubernetes/master : Master | wait for the apiserver to be running ----------------23.44s
    /root/gopath/src/kubespray/roles/kubernetes/master/handlers/main.yml:79
    kubernetes/preinstall : Install packages requirements ----------------------------20.20s
    /root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/main.yml:203
    kubernetes/secrets : Check certs | check if a cert already exists on node --------13.94s
    /root/gopath/src/kubespray/roles/kubernetes/secrets/tasks/check-certs.yml:17
    gather facts from all instances --------------------------------------------------9.98s
    /root/gopath/src/kubespray/cluster.yml:25
    kubernetes/node : install | Compare host kubelet with hyperkube container --------9.66s
    /root/gopath/src/kubespray/roles/kubernetes/node/tasks/install_host.yml:2
    kubernetes-apps/ansible : Kubernetes Apps | Start Resources -----------------------9.27s
    /root/gopath/src/kubespray/roles/kubernetes-apps/ansible/tasks/main.yml:37
    kubernetes-apps/ansible : Kubernetes Apps | Lay Down KubeDNS Template ------------8.47s
    /root/gopath/src/kubespray/roles/kubernetes-apps/ansible/tasks/kubedns.yml:3
    download : Sync container ---------------------------------------------------------8.23s
    /root/gopath/src/kubespray/roles/download/tasks/main.yml:15
    kubernetes-apps/network_plugin/calico : Start Calico resources --------------------7.82s
    /root/gopath/src/kubespray/roles/kubernetes-apps/network_plugin/calico/tasks/main.yml:2
    download : Download items ---------------------------------------------------------7.67s
    /root/gopath/src/kubespray/roles/download/tasks/main.yml:6
    download : Download items ---------------------------------------------------------7.48s
    /root/gopath/src/kubespray/roles/download/tasks/main.yml:6
    download : Sync container ---------------------------------------------------------7.35s
    /root/gopath/src/kubespray/roles/download/tasks/main.yml:15
    download : Download items ---------------------------------------------------------7.16s
    /root/gopath/src/kubespray/roles/download/tasks/main.yml:6
    network_plugin/calico : Calico | Copy cni plugins from calico/cni container -------7.10s
    /root/gopath/src/kubespray/roles/network_plugin/calico/tasks/main.yml:62
    download : Download items ---------------------------------------------------------7.04s
    /root/gopath/src/kubespray/roles/download/tasks/main.yml:6
    download : Download items ---------------------------------------------------------7.01s
    /root/gopath/src/kubespray/roles/download/tasks/main.yml:6
    download : Sync container ---------------------------------------------------------7.00s
    /root/gopath/src/kubespray/roles/download/tasks/main.yml:15
    download : Download items ---------------------------------------------------------6.98s
    /root/gopath/src/kubespray/roles/download/tasks/main.yml:6
    download : Download items ---------------------------------------------------------6.79s
    /root/gopath/src/kubespray/roles/download/tasks/main.yml:6

    3.2. Kubernetes Cluster Status

    1. k8s component information

    # kubectl get all --namespace=kube-system
    NAME                          DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    ds/calico-node                3         3         3         3            3           <none>          2h

    NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    deploy/kube-dns               2         2         2            2           2h
    deploy/kubedns-autoscaler     1         1         1            1           2h
    deploy/kubernetes-dashboard   1         1         1            1           2h

    NAME                                 DESIRED   CURRENT   READY     AGE
    rs/kube-dns-79d99cdcd5               2         2         2         2h
    rs/kubedns-autoscaler-5564b5585f     1         1         1         2h
    rs/kubernetes-dashboard-69cb58d748   1         1         1         2h
    NAME                                       READY     STATUS    RESTARTS   AGE
    po/calico-node-22vsg                       1/1       Running   0          2h
    po/calico-node-t7zgw                       1/1       Running   0          2h
    po/calico-node-zqnx8                       1/1       Running   0          2h
    po/kube-apiserver-kube-master-0            1/1       Running   0          22h
    po/kube-controller-manager-kube-master-0   1/1       Running   0          2h
    po/kube-dns-79d99cdcd5-f2t6t               3/3       Running   0          2h
    po/kube-dns-79d99cdcd5-gw944               3/3       Running   0          2h
    po/kube-proxy-kube-master-0                1/1       Running   2          22h
    po/kube-proxy-kube-node-41                 1/1       Running   3          22h
    po/kube-proxy-kube-node-42                 1/1       Running   3          22h
    po/kube-scheduler-kube-master-0            1/1       Running   0          2h
    po/kubedns-autoscaler-5564b5585f-lt9bb     1/1       Running   0          2h
    po/kubernetes-dashboard-69cb58d748-wmb9x   1/1       Running   0          2h
    po/nginx-proxy-kube-node-41                1/1       Running   3          22h
    po/nginx-proxy-kube-node-42                1/1       Running   3          22h

    NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
    svc/kube-dns               ClusterIP   10.233.0.3     <none>        53/UDP,53/TCP   2h
    svc/kubernetes-dashboard   ClusterIP   10.233.27.24   <none>        443/TCP         2h
    2. k8s node information

    # kubectl get nodes
    NAME            STATUS    ROLES     AGE       VERSION
    kube-master-0   Ready     master    22h       v1.9.5
    kube-node-41    Ready     node      22h       v1.9.5
    kube-node-42    Ready     node      22h       v1.9.5

    3. Component health information

    # kubectl get cs
    NAME                 STATUS    MESSAGE              ERROR
    scheduler            Healthy   ok
    controller-manager   Healthy   ok
    etcd-0               Healthy   {"health": "true"}

    4. Scaling Out the Cluster

    4.1. Modifying hosts.ini

    To add Node machines, modify hosts.ini and add the information for the new machine. For example, to add the node kube-node-43 (IP 172.16.94.143), the modified file looks like this:

    # Configure 'ip' variable to bind kubernetes services on a
    # different ip than the default iface
    # hostname  ssh host IP  ssh user  ssh password  machine IP  netmask
    kube-master-0 ansible_ssh_host=172.16.94.140 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.140 mask=/24
    kube-node-41 ansible_ssh_host=172.16.94.141 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.141 mask=/24
    kube-node-42 ansible_ssh_host=172.16.94.142 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.142 mask=/24
    kube-node-43 ansible_ssh_host=172.16.94.143 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.143 mask=/24

    # configure a bastion host if your nodes are not directly reachable
    # bastion ansible_ssh_host=x.x.x.x

    [kube-master]
    kube-master-0

    [etcd]
    kube-master-0

    [kube-node]
    kube-node-41
    kube-node-42
    kube-node-43

    [k8s-cluster:children]
    kube-node
    kube-master

    [calico-rr]

    4.2. Running the Scale-Out Command

    The playbook involved is scale.yml.

    # Enter the repository root
    cd kubespray
    # Run the scale-out command
    ansible-playbook -i inventory/k8s/hosts.ini scale.yml -b -vvv

    4.3. Checking the Scale-Out Results

    1. Ansible results

    PLAY RECAP ***************************************
    kube-node-41   : ok=228  changed=11  unreachable=0  failed=0
    kube-node-42   : ok=197  changed=6   unreachable=0  failed=0
    kube-node-43   : ok=227  changed=69  unreachable=0  failed=0   # newly added node
    localhost      : ok=2    changed=0   unreachable=0  failed=0

    2. k8s node information

    # kubectl get nodes
    NAME            STATUS    ROLES     AGE       VERSION
    kube-master-0   Ready     master    1d        v1.9.5
    kube-node-41    Ready     node      1d        v1.9.5
    kube-node-42    Ready     node      1d        v1.9.5
    kube-node-43    Ready     node      1m        v1.9.5   # newly added node

    As shown above, the new kube-node-43 node has joined the cluster successfully.

    3. k8s component information

    # kubectl get po --namespace=kube-system -o wide
    NAME                                      READY     STATUS    RESTARTS   AGE       IP               NODE
    calico-node-22vsg                         1/1       Running   0          10h       172.16.94.140    kube-master-0
    calico-node-8fz9x                         1/1       Running   2          27m       172.16.94.143    kube-node-43
    calico-node-t7zgw                         1/1       Running   0          10h       172.16.94.142    kube-node-42
    calico-node-zqnx8                         1/1       Running   0          10h       172.16.94.141    kube-node-41
    kube-apiserver-kube-master-0              1/1       Running   0          1d        172.16.94.140    kube-master-0
    kube-controller-manager-kube-master-0     1/1       Running   0          10h       172.16.94.140    kube-master-0
    kube-dns-79d99cdcd5-f2t6t                 3/3       Running   0          10h       10.233.100.194   kube-node-41
    kube-dns-79d99cdcd5-gw944                 3/3       Running   0          10h       10.233.107.1     kube-node-42
    kube-proxy-kube-master-0                  1/1       Running   2          1d        172.16.94.140    kube-master-0
    kube-proxy-kube-node-41                   1/1       Running   3          1d        172.16.94.141    kube-node-41
    kube-proxy-kube-node-42                   1/1       Running   3          1d        172.16.94.142    kube-node-42
    kube-proxy-kube-node-43                   1/1       Running   0          26m       172.16.94.143    kube-node-43
    kube-scheduler-kube-master-0              1/1       Running   0          10h       172.16.94.140    kube-master-0
    kubedns-autoscaler-5564b5585f-lt9bb       1/1       Running   0          10h       10.233.100.193   kube-node-41
    kubernetes-dashboard-69cb58d748-wmb9x     1/1       Running   0          10h       10.233.107.2     kube-node-42
    nginx-proxy-kube-node-41                  1/1       Running   3          1d        172.16.94.141    kube-node-41
    nginx-proxy-kube-node-42                  1/1       Running   3          1d        172.16.94.142    kube-node-42
    nginx-proxy-kube-node-43                  1/1       Running   0          26m       172.16.94.143    kube-node-43

    5. Deploying a Highly Available Cluster

    Add more machines to the master and etcd groups in hosts.ini, then run the deployment command:

    ansible-playbook -i inventory/k8s/hosts.ini cluster.yml -b -vvv

    For example:

    # Configure 'ip' variable to bind kubernetes services on a
    # different ip than the default iface
    # hostname  ssh host IP  ssh user  ssh password  machine IP  netmask
    kube-master-0 ansible_ssh_host=172.16.94.140 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.140 mask=/24
    kube-master-1 ansible_ssh_host=172.16.94.144 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.144 mask=/24
    kube-master-2 ansible_ssh_host=172.16.94.145 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.145 mask=/24
    kube-node-41 ansible_ssh_host=172.16.94.141 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.141 mask=/24
    kube-node-42 ansible_ssh_host=172.16.94.142 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.142 mask=/24
    kube-node-43 ansible_ssh_host=172.16.94.143 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.143 mask=/24

    # configure a bastion host if your nodes are not directly reachable
    # bastion ansible_ssh_host=x.x.x.x

    [kube-master]
    kube-master-0
    kube-master-1
    kube-master-2

    [etcd]
    kube-master-0
    kube-master-1
    kube-master-2

    [kube-node]
    kube-node-41
    kube-node-42
    kube-node-43

    [k8s-cluster:children]
    kube-node
    kube-master

    [calico-rr]

    6. Upgrading the Cluster

    Choose the target k8s version and run the upgrade command. The playbook involved is upgrade-cluster.yml.

    ansible-playbook upgrade-cluster.yml -b -i inventory/k8s/hosts.ini -e kube_version=v1.10.4 -vvv

    7. Troubleshooting

    The following errors were the main ones encountered while deploying the k8s cluster with Kubespray.

    7.1. python-netaddr Not Installed

    • Error message:

    fatal: [node1]: FAILED! => {"failed": true, "msg": "The ipaddr filter requires python-netaddr be installed on the ansible controller"}

    • Solution:

    Install python-netaddr on the Ansible controller; see the [Environment Preparation] section above.

    7.2. Swap Not Disabled

    • Error message:

    fatal: [kube-master-0]: FAILED! => {
        "assertion": "ansible_swaptotal_mb == 0",
        "changed": false,
        "evaluated_to": false
    }
    fatal: [kube-node-41]: FAILED! => {
        "assertion": "ansible_swaptotal_mb == 0",
        "changed": false,
        "evaluated_to": false
    }
    fatal: [kube-node-42]: FAILED! => {
        "assertion": "ansible_swaptotal_mb == 0",
        "changed": false,
        "evaluated_to": false
    }

    • Solution:

    Run swapoff -a on all deployment machines to disable swap; see the [Environment Preparation] section above.

    7.3. Insufficient Memory on Deployment Machines

    • Error message:

    TASK [kubernetes/preinstall : Stop if memory is too small for masters] *************************
    task path: /root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/verify-settings.yml:52
    Friday 10 August 2018 21:50:26 +0800 (0:00:00.940) 0:01:14.088 *********
    fatal: [kube-master-0]: FAILED! => {
        "assertion": "ansible_memtotal_mb >= 1500",
        "changed": false,
        "evaluated_to": false
    }
    TASK [kubernetes/preinstall : Stop if memory is too small for nodes] ***************************
    task path: /root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/verify-settings.yml:58
    Friday 10 August 2018 21:50:27 +0800 (0:00:00.570) 0:01:14.659 *********
    fatal: [kube-node-41]: FAILED! => {
        "assertion": "ansible_memtotal_mb >= 1024",
        "changed": false,
        "evaluated_to": false
    }
    fatal: [kube-node-42]: FAILED! => {
        "assertion": "ansible_memtotal_mb >= 1024",
        "changed": false,
        "evaluated_to": false
    }
    to retry, use: --limit @/root/gopath/src/kubespray/cluster.retry

    • Solution:

    Increase the memory of all deployment machines; in this example it was increased to 3 GB or more.

    7.4. kube-scheduler Fails to Run

    The kube-scheduler component fails to start, so the call to http://localhost:10251/healthz fails.

    • Error message:

    FAILED - RETRYING: Master | wait for kube-scheduler (1 retries left).
    FAILED - RETRYING: Master | wait for kube-scheduler (1 retries left).
    fatal: [node1]: FAILED! => {"attempts": 60, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:10251/healthz"}

    • Solution:

    This is most likely caused by insufficient memory; in this example the deployment machines' memory was increased. A few diagnostic commands are sketched below.
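
    If increasing memory does not help, the following checks on the master can narrow the problem down. A sketch; it assumes the scheduler runs as a static pod whose container is visible to Docker on the master:

    # is the scheduler health endpoint answering on the master?
    curl -s http://localhost:10251/healthz
    # inspect the kube-scheduler container and its logs
    docker ps -a | grep kube-scheduler
    docker logs $(docker ps -aq --filter name=kube-scheduler | head -n 1)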

    7.5. Docker Package Conflicts

    • Error message:
    failed: [k8s-node-1] (item={u'name': u'docker-engine-1.13.1-1.el7.centos'}) => {
        "attempts": 4,
        "changed": false,
        ...
        "item": {
            "name": "docker-engine-1.13.1-1.el7.centos"
        },
        "msg": "Error: docker-ce-selinux conflicts with 2:container-selinux-2.66-1.el7.noarch\n",
        "rc": 1,
        "results": [
            "Loaded plugins: fastestmirror\nLoading mirror speeds from cached hostfile\n * elrepo: mirrors.tuna.tsinghua.edu.cn\n * epel: mirrors.tongji.edu.cn\nPackage docker-engine is obsoleted by docker-ce, trying to install docker-ce-17.03.2.ce-1.el7.centos.x86_64 instead\nResolving Dependencies\n--> Running transaction check\n---> Package docker-ce.x86_64 0:17.03.2.ce-1.el7.centos will be installed\n--> Processing Dependency: docker-ce-selinux >= 17.03.2.ce-1.el7.centos for package: docker-ce-17.03.2.ce-1.el7.centos.x86_64\n--> Processing Dependency: libltdl.so.7()(64bit) for package: docker-ce-17.03.2.ce-1.el7.centos.x86_64\n--> Running transaction check\n---> Package docker-ce-selinux.noarch 0:17.03.2.ce-1.el7.centos will be installed\n---> Package libtool-ltdl.x86_64 0:2.4.2-22.el7_3 will be installed\n--> Processing Conflict: docker-ce-selinux-17.03.2.ce-1.el7.centos.noarch conflicts docker-selinux\n--> Restarting Dependency Resolution with new changes.\n--> Running transaction check\n---> Package container-selinux.noarch 2:2.55-1.el7 will be updated\n---> Package container-selinux.noarch 2:2.66-1.el7 will be an update\n--> Processing Conflict: docker-ce-selinux-17.03.2.ce-1.el7.centos.noarch conflicts docker-selinux\n--> Finished Dependency Resolution\n You could try using --skip-broken to work around the problem\n You could try running: rpm -Va --nofiles --nodigest\n"
        ]
    }
    • Solution:

    Uninstall the old Docker packages and let Kubespray install Docker itself.

    sudo yum remove -y docker \
                       docker-client \
                       docker-client-latest \
                       docker-common \
                       docker-latest \
                       docker-latest-logrotate \
                       docker-logrotate \
                       docker-selinux \
                       docker-engine-selinux \
                       docker-engine
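
    After removing the old packages, you can confirm nothing conflicting is left before re-running the playbook (a quick check):

    rpm -qa | grep -E 'docker|container-selinux'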

    References:

    • https://github.com/kubernetes-incubator/kubespray
    • https://github.com/kubernetes-incubator/kubespray/blob/master/docs/upgrades.md