• Cluster Networking
    • The Kubernetes network model
    • How to implement the Kubernetes networking model
      • ACI
      • AOS from Apstra
      • AWS VPC CNI for Kubernetes
      • Big Cloud Fabric from Big Switch Networks
      • Cilium
      • CNI-Genie from Huawei
      • cni-ipvlan-vpc-k8s
      • Contiv
      • Contrail / Tungsten Fabric
      • DANM
      • Flannel
      • Google Compute Engine (GCE)
      • Jaguar
      • k-vswitch
      • Knitter
      • Kube-OVN
      • Kube-router
      • L2 networks and linux bridging
      • Multus (a Multi Network plugin)
      • NSX-T
      • Nuage Networks VCS (Virtualized Cloud Services)
      • OpenVSwitch
      • OVN (Open Virtual Networking)
      • Project Calico
      • Romana
      • Weave Net from Weaveworks
    • What's next

    Cluster Networking

    Networking is a central part of Kubernetes, but it can be challenging to understand exactly how it is expected to work. There are 4 distinct networking problems to address:

    • Highly-coupled container-to-container communications: this is solved by pods and localhost communications.
    • Pod-to-Pod communications: this is the primary focus of this document.
    • Pod-to-Service communications: this is covered by services.
    • External-to-Service communications: this is covered by services.

    Kubernetes is all about sharing machines between applications. Typically, sharing machines requires ensuring that two applications do not try to use the same ports. Coordinating ports across multiple developers is very difficult to do at scale and exposes users to cluster-level issues outside of their control.

    Dynamic port allocation brings a lot of complications to the system - every application has to take ports as flags, the API servers have to know how to insert dynamic port numbers into configuration blocks, services have to know how to find each other, etc. Rather than deal with this, Kubernetes takes a different approach.

    The Kubernetes network model

    Every Pod gets its own IP address. This means you do not need to explicitly create links between Pods and you almost never need to deal with mapping container ports to host ports. This creates a clean, backwards-compatible model where Pods can be treated much like VMs or physical hosts from the perspectives of port allocation, naming, service discovery, load balancing, application configuration, and migration.

    Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):

    • pods on a node can communicate with all pods on all nodes without NAT
    • agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node

    Note: For those platforms that support Pods running in the host network (e.g. Linux):

    • pods in the host network of a node can communicate with all pods on all nodes without NAT
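
    One way to read the "without NAT" requirement is that the address a sender uses for itself is the same address the receiver observes. The following is a minimal sketch of that check, run over the loopback interface so that it is self-contained. The function name is hypothetical, and in a real cluster the two ends would run in different Pods rather than in one process.

```python
import socket


def peer_sees_same_address(host="127.0.0.1"):
    """Return True if the receiver observes the sender's own (ip, port).

    When no address translation sits between the two endpoints, the
    address reported by accept() equals the client's local address.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    conn = None
    try:
        server.bind((host, 0))  # port 0: let the kernel pick a free port
        server.listen(1)
        client.connect(server.getsockname())
        conn, observed = server.accept()
        return observed == client.getsockname()
    finally:
        if conn is not None:
            conn.close()
        client.close()
        server.close()
```

    If a NAT device rewrote the source address in transit, the two tuples would differ and the function would return False.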

    This model is not only less complex overall, but it is principally compatible with the desire for Kubernetes to enable low-friction porting of apps from VMs to containers. If your job previously ran in a VM, your VM had an IP and could talk to other VMs in your project. This is the same basic model.

    Kubernetes IP addresses exist at the Pod scope - containers within a Pod share their network namespaces - including their IP address. This means that containers within a Pod can all reach each other’s ports on localhost. This also means that containers within a Pod must coordinate port usage, but this is no different from processes in a VM. This is called the “IP-per-pod” model.
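
    The need to coordinate ports inside a shared network namespace can be illustrated with two sockets in a single process, which share a namespace in the same way that containers in a Pod do. This is a minimal sketch with a hypothetical function name, not a description of how any container runtime behaves.

```python
import socket


def second_bind_fails(host="127.0.0.1"):
    """Return True if a second socket cannot bind a port already in use."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: let the kernel pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # Same namespace, same address, same port: the kernel refuses.
            second.bind((host, port))
        except OSError:
            return True
        return False
    finally:
        first.close()
        second.close()
```

    Two containers in the same Pod that both try to listen on the same port hit the same kind of conflict, which is why they have to agree on port assignments.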

    How this is implemented is a detail of the particular container runtime in use.

    It is possible to request ports on the Node itself which forward to your Pod (called host ports), but this is a very niche operation. How that forwarding is implemented is also a detail of the container runtime. The Pod itself is blind to the existence or non-existence of host ports.

    How to implement the Kubernetes networking model

    There are a number of ways that this network model can be implemented. This document is not an exhaustive study of the various methods, but hopefully serves as an introduction to various technologies and serves as a jumping-off point.

    The following networking options are sorted alphabetically - the order does not imply any preferential status.

    ACI

    Cisco Application Centric Infrastructure offers an integrated overlay and underlay SDN solution that supports containers, virtual machines, and bare metal servers. ACI provides container networking integration for ACI. An overview of the integration is provided here.

    AOS from Apstra

    AOS is an Intent-Based Networking system that creates and manages complex datacenter environments from a simple integrated platform. AOS leverages a highly scalable distributed design to eliminate network outages while minimizing costs.

    The AOS Reference Design currently supports Layer-3 connected hosts that eliminate legacy Layer-2 switching problems. These Layer-3 hosts can be Linux servers (Debian, Ubuntu, CentOS) that create BGP neighbor relationships directly with the top of rack switches (TORs). AOS automates the routing adjacencies and then provides fine grained control over the route health injections (RHI) that are common in a Kubernetes deployment.

    AOS has a rich set of REST API endpoints that enable Kubernetes to quickly change the network policy based on application requirements. Further enhancements will integrate the AOS Graph model used for the network design with the workload provisioning, enabling an end to end management system for both private and public clouds.

    AOS supports the use of common vendor equipment from manufacturers including Cisco, Arista, Dell, Mellanox, HPE, and a large number of white-box systems and open network operating systems like Microsoft SONiC, Dell OPX, and Cumulus Linux.

    Details on how the AOS system works can be accessed here: http://www.apstra.com/products/how-it-works/

    AWS VPC CNI for Kubernetes

    The AWS VPC CNI offers integrated AWS Virtual Private Cloud (VPC) networking for Kubernetes clusters. This CNI plugin offers high throughput and availability, low latency, and minimal network jitter. Additionally, users can apply existing AWS VPC networking and security best practices for building Kubernetes clusters. This includes the ability to use VPC flow logs, VPC routing policies, and security groups for network traffic isolation.

    Using this CNI plugin allows Kubernetes pods to have the same IP address inside the pod as they do on the VPC network. The CNI allocates AWS Elastic Networking Interfaces (ENIs) to each Kubernetes node and uses the secondary IP range from each ENI for pods on the node. The CNI includes controls for pre-allocation of ENIs and IP addresses for fast pod startup times and enables large clusters of up to 2,000 nodes.
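
    Because pods draw their addresses from the secondary IPs of each ENI, the number of pod addresses available on a node follows from simple arithmetic. The sketch below assumes that one address per ENI (its primary address) is not handed to pods. The function name and the example figures are hypothetical and are not taken from any AWS limit table.

```python
def pod_ip_capacity(enis, ips_per_eni):
    """Secondary addresses available to pods on one node.

    Assumption: each ENI keeps one primary address for itself, so only
    (ips_per_eni - 1) addresses per ENI can be assigned to pods.
    """
    if enis < 0 or ips_per_eni < 1:
        raise ValueError("enis must be >= 0 and ips_per_eni must be >= 1")
    return enis * (ips_per_eni - 1)


# Illustrative figures only: 3 ENIs with 10 addresses each.
capacity = pod_ip_capacity(enis=3, ips_per_eni=10)
```

    The per-instance-type ENI and address limits are set by AWS and should be looked up rather than assumed.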

    Additionally, the CNI can be run alongside Calico for network policy enforcement. The AWS VPC CNI project is open source with documentation on GitHub.

    Big Cloud Fabric from Big Switch Networks

    Big Cloud Fabric is a cloud native networking architecture, designed to run Kubernetes in private cloud/on-premises environments. Using unified physical & virtual SDN, Big Cloud Fabric tackles inherent container networking problems such as load balancing, visibility, troubleshooting, security policies & container traffic monitoring.

    With the help of the Big Cloud Fabric’s virtual pod multi-tenant architecture, container orchestration systems such as Kubernetes, RedHat OpenShift, Mesosphere DC/OS & Docker Swarm will be natively integrated alongside VM orchestration systems such as VMware, OpenStack & Nutanix. Customers will be able to securely inter-connect any number of these clusters and enable inter-tenant communication between them if needed.

    BCF was recognized by Gartner as a visionary in the latest Magic Quadrant. One of the BCF Kubernetes on-premises deployments (which includes Kubernetes, DC/OS & VMware running on multiple DCs across different geographic regions) is also referenced here.

    Cilium

    Cilium is open source software for providing and transparently securing network connectivity between application containers. Cilium is L7/HTTP aware and can enforce network policies on L3-L7 using an identity based security model that is decoupled from network addressing.

    CNI-Genie from Huawei

    CNI-Genie is a CNI plugin that enables Kubernetes to simultaneously have access to different implementations of the Kubernetes network model at runtime. This includes any implementation that runs as a CNI plugin, such as Flannel, Calico, Romana, and Weave-net.

    CNI-Genie also supports assigning multiple IP addresses to a pod, each from a different CNI plugin.

    cni-ipvlan-vpc-k8s

    cni-ipvlan-vpc-k8s contains a set of CNI and IPAM plugins to provide a simple, host-local, low latency, high throughput, and compliant networking stack for Kubernetes within Amazon Virtual Private Cloud (VPC) environments by making use of Amazon Elastic Network Interfaces (ENI) and binding AWS-managed IPs into Pods using the Linux kernel’s IPvlan driver in L2 mode.

    The plugins are designed to be straightforward to configure and deploy within a VPC. Kubelets boot and then self-configure and scale their IP usage as needed without requiring the often recommended complexities of administering overlay networks, BGP, disabling source/destination checks, or adjusting VPC route tables to provide per-instance subnets to each host (which is limited to 50-100 entries per VPC). In short, cni-ipvlan-vpc-k8s significantly reduces the network complexity required to deploy Kubernetes at scale within AWS.

    Contiv

    Contiv provides configurable networking (native l3 using BGP, overlay using vxlan, classic l2, or Cisco-SDN/ACI) for various use cases. Contiv is all open sourced.

    Contrail / Tungsten Fabric

    Contrail, based on Tungsten Fabric, is a truly open, multi-cloud network virtualization and policy management platform. Contrail and Tungsten Fabric are integrated with various orchestration systems such as Kubernetes, OpenShift, OpenStack and Mesos, and provide different isolation modes for virtual machines, containers/pods and bare metal workloads.

    DANM

    DANM is a networking solution for telco workloads running in a Kubernetes cluster. It’s built up from the following components:

    • A CNI plugin capable of provisioning IPVLAN interfaces with advanced features
    • An in-built IPAM module with the capability of managing multiple, cluster-wide, discontinuous L3 networks and providing a dynamic, static, or no IP allocation scheme on-demand
    • A CNI metaplugin capable of attaching multiple network interfaces to a container, either through its own CNI, or through delegating the job to any of the popular CNI solutions such as SR-IOV or Flannel in parallel
    • A Kubernetes controller capable of centrally managing both VxLAN and VLAN interfaces of all Kubernetes hosts
    • Another Kubernetes controller extending Kubernetes’ Service-based service discovery concept to work over all network interfaces of a Pod

    With this toolset DANM is able to provide multiple separated network interfaces, the possibility to use different networking back ends and advanced IPAM features for the pods.

    Flannel

    Flannel is a very simple overlay network that satisfies the Kubernetes requirements. Many people have reported success with Flannel and Kubernetes.

    Google Compute Engine (GCE)

    For the Google Compute Engine cluster configuration scripts, advanced routing is used to assign each VM a subnet (default is /24 - 254 IPs). Any traffic bound for that subnet will be routed directly to the VM by the GCE network fabric. This is in addition to the “main” IP address assigned to the VM, which is NAT’ed for outbound internet access. A linux bridge (called cbr0) is configured to exist on that subnet, and is passed to docker’s --bridge flag.

    Docker is started with:

    DOCKER_OPTS="--bridge=cbr0 --iptables=false --ip-masq=false"

    This bridge is created by Kubelet (controlled by the --network-plugin=kubenet flag) according to the Node’s .spec.podCIDR.

    Docker will now allocate IPs from the cbr-cidr block. Containers can reach each other and Nodes over the cbr0 bridge. Those IPs are all routable within the GCE project network.
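
    The per-node subnet sizing described above can be checked with Python's standard ipaddress module. This is only a sketch of the arithmetic: the cluster range 10.244.0.0/16 and the function names are hypothetical, chosen so that the range falls inside the 10.0.0.0/8 project network.

```python
import ipaddress


def node_subnets(cluster_cidr, node_prefix=24):
    """Split a cluster-wide range into one subnet per node."""
    network = ipaddress.ip_network(cluster_cidr)
    return list(network.subnets(new_prefix=node_prefix))


def usable_ips(subnet):
    """Count host addresses, excluding the network and broadcast addresses."""
    return sum(1 for _ in subnet.hosts())


# Hypothetical cluster range, carved into per-node /24 blocks.
subnets = node_subnets("10.244.0.0/16")
first_node_subnet = subnets[0]
ips_per_node = usable_ips(first_node_subnet)
```

    With a /24 per node, each node has 254 usable addresses, which matches the default quoted above, and a /16 cluster range yields 256 such per-node blocks.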

    GCE itself does not know anything about these IPs, though, so it will not NAT them for outbound internet traffic. To achieve that an iptables rule is used to masquerade (aka SNAT - to make it seem as if packets came from the Node itself) traffic that is bound for IPs outside the GCE project network (10.0.0.0/8).

    iptables -t nat -A POSTROUTING ! -d 10.0.0.0/8 -o eth0 -j MASQUERADE
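
    The destination match in this rule (! -d 10.0.0.0/8) can be expressed as a small predicate. The sketch below models only that destination test and ignores the outgoing-interface match (-o eth0). The function name is hypothetical.

```python
import ipaddress

# The GCE project network named in the rule above.
PROJECT_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def is_masqueraded(destination):
    """Return True if traffic to this destination would be SNAT'ed.

    Mirrors "! -d 10.0.0.0/8": anything outside the project network
    leaves with the Node's own address as its source.
    """
    return ipaddress.ip_address(destination) not in PROJECT_NETWORK
```

    For example, is_masqueraded("8.8.8.8") is True, while is_masqueraded("10.244.3.7") is False because that address is inside the project network and stays routable as-is.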

    Lastly, IP forwarding is enabled in the kernel (so the kernel will process packets for bridged containers):

    sysctl net.ipv4.ip_forward=1

    The result of all this is that all Pods can reach each other and can egress traffic to the internet.

    Jaguar

    Jaguar is an open source solution for Kubernetes’s network based on OpenDaylight. Jaguar provides overlay network using vxlan and Jaguar CNIPlugin provides one IP address per pod.

    k-vswitch

    k-vswitch is a simple Kubernetes networking plugin based on Open vSwitch. It leverages existing functionality in Open vSwitch to provide a robust networking plugin that is easy-to-operate, performant and secure.

    Knitter

    Knitter is a network solution which supports multiple networking in Kubernetes. It provides the ability of tenant management and network management. Knitter includes a set of end-to-end NFV container networking solutions besides multiple network planes, such as keeping IP address for applications, IP address migration, etc.

    Kube-OVN

    Kube-OVN is an OVN-based kubernetes network fabric for enterprises. With the help of OVN/OVS, it provides some advanced overlay network features like subnet, QoS, static IP allocation, traffic mirroring, gateway, openflow-based network policy and service proxy.

    Kube-router

    Kube-router is a purpose-built networking solution for Kubernetes that aims to provide high performance and operational simplicity. Kube-router provides a Linux LVS/IPVS-based service proxy, a Linux kernel forwarding-based pod-to-pod networking solution with no overlays, and iptables/ipset-based network policy enforcer.

    L2 networks and linux bridging

    If you have a “dumb” L2 network, such as a simple switch in a “bare-metal” environment, you should be able to do something similar to the above GCE setup. Note that these instructions have only been tried very casually - it seems to work, but has not been thoroughly tested. If you use this technique and perfect the process, please let us know.

    Follow the “With Linux Bridge devices” section of this very nice tutorial from Lars Kellogg-Stedman.

    Multus (a Multi Network plugin)

    Multus is a Multi CNI plugin to support the Multi Networking feature in Kubernetes using CRD based network objects in Kubernetes.

    Multus supports all reference plugins (e.g. Flannel, DHCP, Macvlan) that implement the CNI specification and 3rd party plugins (e.g. Calico, Weave, Cilium, Contiv). In addition, Multus supports SRIOV, DPDK, OVS-DPDK & VPP workloads in Kubernetes with both cloud native and NFV based applications.

    NSX-T

    VMware NSX-T is a network virtualization and security platform. NSX-T can provide network virtualization for a multi-cloud and multi-hypervisor environment and is focused on emerging application frameworks and architectures that have heterogeneous endpoints and technology stacks. In addition to vSphere hypervisors, these environments include other hypervisors such as KVM, containers, and bare metal.

    NSX-T Container Plug-in (NCP) provides integration between NSX-T and container orchestrators such as Kubernetes, as well as integration between NSX-T and container-based CaaS/PaaS platforms such as Pivotal Container Service (PKS) and OpenShift.

    Nuage Networks VCS (Virtualized Cloud Services)

    Nuage provides a highly scalable policy-based Software-Defined Networking (SDN) platform. Nuage uses the open source Open vSwitch for the data plane along with a feature rich SDN Controller built on open standards.

    The Nuage platform uses overlays to provide seamless policy-based networking between Kubernetes Pods and non-Kubernetes environments (VMs and bare metal servers). Nuage’s policy abstraction model is designed with applications in mind and makes it easy to declare fine-grained policies for applications. The platform’s real-time analytics engine enables visibility and security monitoring for Kubernetes applications.

    OpenVSwitch

    OpenVSwitch is a somewhat more mature but also complicated way to build an overlay network. This is endorsed by several of the “Big Shops” for networking.

    OVN (Open Virtual Networking)

    OVN is an open source network virtualization solution developed by the Open vSwitch community. It lets one create logical switches, logical routers, stateful ACLs, load-balancers etc. to build different virtual networking topologies. The project has a specific Kubernetes plugin and documentation at ovn-kubernetes.

    Project Calico

    Project Calico is an open source container networking provider and network policy engine.

    Calico provides a highly scalable networking and network policy solution for connecting Kubernetes pods based on the same IP networking principles as the internet, for both Linux (open source) and Windows (proprietary - available from Tigera). Calico can be deployed without encapsulation or overlays to provide high-performance, high-scale data center networking. Calico also provides fine-grained, intent based network security policy for Kubernetes pods via its distributed firewall.

    Calico can also be run in policy enforcement mode in conjunction with other networking solutions such as Flannel, aka canal, or native GCE, AWS or Azure networking.

    Romana

    Romana is an open source network and security automation solution that lets you deploy Kubernetes without an overlay network. Romana supports Kubernetes Network Policy to provide isolation across network namespaces.

    Weave Net from Weaveworks

    Weave Net is a resilient and simple to use network for Kubernetes and its hosted applications. Weave Net runs as a CNI plug-in or stand-alone. In either version, it doesn’t require any configuration or extra code to run, and in both cases, the network provides one IP address per pod - as is standard for Kubernetes.

    What's next

    The early design of the networking model and its rationale, and some future plans are described in more detail in the networking design document.
