OpenShift Node-to-Node encrypted network mesh with WireGuard and Ansible


The content of this post is part of my master's thesis (to be completed in June 2019), in which I research the security of OpenShift and how it can be extended. The main focus is on threats to traffic flows and interconnections, and on how a built-in encryption mechanism could prevent malicious influences on the operation of the platform and its data.

What's the point?

Based on the underlying Kubernetes, each node in an OpenShift cluster runs a control unit called "kubelet" which locally manages the node's components. Due to its importance, this key component has certain requirements regarding availability and security. A malicious modification of the kubelet would allow attackers not only to modify the state of the node and possibly other nodes, but also to request resources from the master (for example secrets) in the name of the kubelet.

Now, two different attack vectors must be differentiated. First, there's the takeover of the kubelet on the node. An exploit could allow users of the platform to break out of their contained environment and to escalate their privileges to access the kubelet. Second, traffic flows between nodes and masters could be intercepted and modified. In this article, we will focus on the second scenario.

Currently, security in OpenShift is mainly handled by TLS and authentication/authorization mechanisms. The deployment of IPsec between nodes is proposed for further hardening. On the nodes, the Linux kernel provides multiple isolation techniques (namespaces, cgroups, SELinux) used by Docker, and the SDN can be configured to further isolate network traffic between pods. For example, using the redhat/openshift-ovs-multitenant Open vSwitch SDN plugin restricts traffic between pods to pods within the same project (a namespace in Kubernetes).

The aim of the implementation presented in this article is best described with the following picture, illustrating the resulting topology design with one master and two compute nodes.

Network topology of an OpenShift cluster with WireGuard mesh

WireGuard is a relatively new VPN implementation. People call it 'hyped', but its strengths justify the enthusiasm for its deployment, even though it is still considered experimental. WireGuard is not only considerably easier to deploy than other VPNs, benchmarks also show that it performs well. And thanks to its small code base, reviews and security audits are feasible. Lastly, it is foreseeable that the WireGuard kernel module might be included in future Linux releases.

Deploying the mesh with Ansible

Let's dive straight into the deployment. As mentioned, Ansible is used in this example setup for the orchestrated configuration of all nodes. We need a playbook, an inventory, SSH access to the nodes and WireGuard installed on the nodes. In my setup, CentOS served as the OS, with the EPEL and WireGuard repositories enabled.
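
For orientation, a minimal inventory for the WireGuard playbook could look like the sketch below. The hostnames match the ones used later for openshift-ansible; the public IP addresses are placeholders for this example, and ansible_host is set explicitly because the wg0.j2 template shown later uses it as the peers' endpoint address.

# Example inventory for the WireGuard playbook (public IPs are placeholders)
master.cluster ansible_host=203.0.113.1
node1.cluster  ansible_host=203.0.113.2
node2.cluster  ansible_host=203.0.113.3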

As an example, the host_vars for the master would contain the following snippet to define a fixed overlay IP address for the wg0 interface. In this case, 192.168.66.1 is chosen because it's the master node. You can choose any IP from a private range you wish, as long as the CIDR is correct (/32). One could even try IPv6 ULA ranges. This must be adapted for all other nodes.

wireguard:
  address: 192.168.66.1/32
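
For the compute nodes, the host_vars look the same with a different address. The addresses below match the ones used in the openshift-ansible inventory further down; the file names assume the usual convention of naming host_vars files after the inventory hostnames.

# host_vars/node1.cluster
wireguard:
  address: 192.168.66.2/32

# host_vars/node2.cluster
wireguard:
  address: 192.168.66.3/32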

Now we need a way to configure each node to connect its WireGuard interface with all other nodes in the cluster, creating a full mesh. In the first two tasks, /etc/hosts on every node is adjusted so that each peer's hostname resolves to its WireGuard address. Where OpenShift would otherwise resolve master.cluster to e.g. 1.2.3.4, we need master.cluster to resolve to 192.168.66.1.
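
To make this concrete, after the playbook run /etc/hosts on each node would contain entries like the following (using the example addresses from above; the short names come from inventory_hostname_short):

# Resulting /etc/hosts entries
192.168.66.1 master.cluster master
192.168.66.2 node1.cluster node1
192.168.66.3 node2.cluster node2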

After that, WireGuard is executed on the host to generate key pairs. The keys are kept as Ansible facts for the duration of the run and would be recreated on a new execution of the playbook. In the end, the WireGuard configuration is rendered on every node, containing a list of all peers.

The recreation of key pairs is not a problem during runtime, since OpenShift does not depend on the state of the wg0 interface, but solely uses it for traffic towards other nodes. The two parts (WireGuard mesh and the deployment of OpenShift) are independent of each other! In fact, you can test the rotation of keys by pinging other nodes, running the playbook and observing that there are no interruptions. As a side note, calling wg could also be done locally on the Ansible master, but executing it on each node has the advantage of detecting errors early, e.g. if WireGuard is not installed.

# Snippet from the playbook setting up the WireGuard mesh

- name: remove FQDN localhost mapping from /etc/hosts
  lineinfile:
    path: /etc/hosts
    regexp: "^{{ item }} {{ ansible_fqdn }} {{ ansible_hostname }}"
    state: absent
  loop:
    - "127.0.0.1"
    - "::1"

- name: update /etc/hosts for all hosts in inventory
  lineinfile:
    path: /etc/hosts
    line: "{{ hostvars[item].wireguard.address[:-3] }} {{ hostvars[item].inventory_hostname }} {{ hostvars[item].inventory_hostname_short }}"
    state: present
  with_items: "{{ groups.all }}"

- name: Configure wireguard directory
  file:
    path: /etc/wireguard
    state: directory
    mode: 0700
    owner: root
    group: root

- name: Create wireguard private key on host
  command: wg genkey
  register: wireguard_key

- name: Derive the wireguard public key
  shell: echo "{{ wireguard_key.stdout }}" | wg pubkey
  register: wireguard_pubkey

- name: Keep both keys as facts for the config template
  set_fact:
    wireguard_key: "{{ wireguard_key.stdout }}"
    wireguard_pubkey: "{{ wireguard_pubkey.stdout }}"
    cacheable: false

- name: Render wireguard configs
  template:
    src: wg0.j2
    dest: /etc/wireguard/wg0.conf
    owner: root
    group: root
    mode: 0600
  notify: restart wg0
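
The restart wg0 handler is not shown above. A minimal version, assuming the interface is managed through the wg-quick@wg0 systemd unit shipped with wireguard-tools, could look like this (placed in the play's handlers section or the role's handlers file):

# Handler referenced by "notify: restart wg0"; assumes wg-quick@wg0 from wireguard-tools
- name: restart wg0
  systemd:
    name: wg-quick@wg0
    state: restarted
    enabled: yes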

The template for wg0.j2 looks like the following. Note the iteration over all nodes known to Ansible (the "all" group), skipping the node on which the template is currently being rendered.

# wg0.j2 to be rendered as /etc/wireguard/wg0.conf

[Interface]
ListenPort = 54396
PrivateKey = {{ wireguard_key }}
Address = {{ wireguard.address }}

{% for node in groups.all %}
{% if node != ansible_nodename %}
[Peer]
PublicKey = {{ hostvars[node].wireguard_pubkey }}
AllowedIPs = {{ hostvars[node].wireguard.address }}
Endpoint = {{ hostvars[node].ansible_host }}:54396
{% endif %}

{% endfor %}

After the template is rendered, each node would have one [Interface] section with its own private key and the address from the host_vars, and multiple (here: two) [Peer] sections with the public key, the routed IP address for this peer (called AllowedIPs in WireGuard) and the endpoint with the public IP address of the peer node.
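
For illustration, the rendered configuration on the master could look roughly like this; the keys and the peers' public endpoint addresses are placeholders, everything else follows from the template and the example host_vars above.

# Example of a rendered /etc/wireguard/wg0.conf on master.cluster (keys and endpoints are placeholders)
[Interface]
ListenPort = 54396
PrivateKey = <private key of master.cluster>
Address = 192.168.66.1/32

[Peer]
PublicKey = <public key of node1.cluster>
AllowedIPs = 192.168.66.2/32
Endpoint = 203.0.113.2:54396

[Peer]
PublicKey = <public key of node2.cluster>
AllowedIPs = 192.168.66.3/32
Endpoint = 203.0.113.3:54396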

Configuring OpenShift with openshift-ansible

Now that we have configured WireGuard channels between all nodes in the cluster, we can deploy OpenShift and configure it to use the encrypted channels as the default connection interfaces. In the next code block, the Ansible inventory for openshift-ansible is defined. There are multiple important parts: opening ports in the firewall, lowering the MTU of the overlay network interfaces and setting the IP addresses to the ones designated for the internal WireGuard network 192.168.66.0/24.

For reference, the lower MTU value is a solution to a known problem with failing builds: Builds on a Virtual Network are Failing. It might be possible to choose a higher value than 1300, since eth0 has an MTU of 1500 and wg0 an MTU of 1420. Please note that this inventory defines the list of node groups. If you wish to use other groups like node-config-infra as well, adapt the inventory accordingly.

# Ansible inventory for openshift-ansible

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=root
os_sdn_network_plugin_name=redhat/openshift-ovs-multitenant
openshift_master_open_ports=[{"service":"wg0","port":"54396/udp"}]
openshift_node_open_ports=[{"service":"wg0","port":"54396/udp"}]

# Edit the default MTU on all nodes
openshift_node_groups=[{'name': 'node-config-master-infra',
  'labels': ['node-role.kubernetes.io/master=true',
  'node-role.kubernetes.io/infra=true'],
  'edits': [{ 'key': 'networkConfig.mtu', 'value': 1300}]},
  {'name': 'node-config-compute',
  'labels': ['node-role.kubernetes.io/compute=true'],
  'edits': [{ 'key': 'networkConfig.mtu',
  'value': 1300}]}]

openshift_deployment_type=origin
openshift_release="3.10"  # used in this case, more recent releases are available

[masters]
master.cluster openshift_public_ip=192.168.66.1 openshift_ip=192.168.66.1

[etcd]
master.cluster openshift_public_ip=192.168.66.1 openshift_ip=192.168.66.1

[nodes]
master.cluster openshift_node_group_name='node-config-master-infra' openshift_public_ip=192.168.66.1 openshift_ip=192.168.66.1
node1.cluster openshift_node_group_name='node-config-compute' openshift_public_ip=192.168.66.2 openshift_ip=192.168.66.2
node2.cluster openshift_node_group_name='node-config-compute' openshift_public_ip=192.168.66.3 openshift_ip=192.168.66.3

Running the playbooks playbooks/prerequisites.yml and playbooks/deploy_cluster.yml then deploys OpenShift. In this case, the result is a cluster with three nodes: one master with infra containers and two compute nodes. The VXLAN overlay with an MTU of 1300 is routed over the WireGuard mesh, which appears as the wg0 interface on each node.
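
To get a rough idea of whether the cluster traffic really traverses the mesh, something like the following sketch can be used on the master (assuming cluster admin access; exact output columns vary by version):

# Rough verification sketch
wg show wg0         # latest handshakes and transfer counters should grow for both peers
oc get hostsubnets  # each node should be registered with its 192.168.66.x address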