WIP hardening

2017-10-09 16:53:10 +02:00 · 2017-10-09 16:53:10 +02:00 · d8c5da0487
commit d8c5da0487
parent f0d5fe98f0
1 changed files with 86 additions and 2 deletions
--- a/_posts/2017-08-22-hardening-lxc-containers.md
+++ b/_posts/2017-08-22-hardening-lxc-containers.md
@@ -75,15 +75,99 @@ Historically, there is a huge difference between the root user (with uid 0) whic
 A good idea would be to only allow the ping command to execute actions related to network as root, not everything. You can do that by using capabilities, by giving the `CAP_NET_RAW` capability to your ping command.

 But capabilities, and more precisely the **capability bounding set**, can also be used to restrict the capabilities that any process in your container can acquire. Indeed, if you allow a process in your container to load kernel modules, what prevents it from loading a faulty module that lets it escape the container? One way to prevent this catastrophic scenario is to drop `CAP_SYS_MODULE` from the capability bounding set. When you use `lxc.cap.keep` and `lxc.cap.drop`, you are modifying the capability bounding set of your container.
+You can show your current **capability bounding set** with the following command:

-One capability is a bit special, `CAP_SYS_ADMIN`, as it is sometimes considered as ["the new root"](https://lwn.net/Articles/486306/) because of its large and not strictly defined scope. This capability is very useful because it permits to mount filesystems from the container. Unfortunately, it also enables interaction with ioctl, IPC resources, namespaces, etc. So, we want to drop this capability. So, we can just drop it ?
+```bash
+capsh --print
+```
+
+One capability, `CAP_SYS_ADMIN`, is a bit special: it is sometimes called ["the new root"](https://lwn.net/Articles/486306/) because of its large and loosely defined scope. This capability is very useful because it allows mounting filesystems from inside the container. Unfortunately, it also enables interaction with ioctl, IPC resources, namespaces, etc. So we want to drop this capability. Can we just drop it?

 ```ini
 # /var/lib/harden/config
 lxc.cap.drop = sys_admin
 ```

-Now try to restart your container...
+Now try to restart your container... It turns out we can't simply drop the capability:
+
+```raw
+Failed to mount tmpfs at /dev/shm: Operation not permitted
+Failed to mount tmpfs at /run: Operation not permitted
+Failed to mount tmpfs at /sys/fs/cgroup: Operation not permitted
+Failed to mount cgroup at /sys/fs/cgroup/systemd: No such file or directory
+[!!!!!!] Failed to mount API filesystems, freezing.
+Freezing execution.
+```
+
+It looks like the only solution is to mount these filesystems manually before systemd starts.
+The procedure differs slightly from what Christian Seiler described, because our kernel supports cgroup namespaces.
+Indeed, the following directive does nothing:
+
+```ini
+# /var/lib/harden/config
+lxc.mount.auto = cgroup:mixed
+```
+
+The reason is visible in the LXC source code:
+
+```c
+/*
+ /src/lxc/cgroups/cgfsng.c
+ /src/lxc/cgroups/cgfs.c
+*/
+static bool cgfsng_mount(void *hdata, const char *root, int type)
+{
+  /* some initializations */
+  if (cgns_supported())
+    return true;
+  /* rest of the function */
+}
+```
+
+The developers added this condition because, with cgroup namespaces, we can safely mount the cgroup hierarchy like any other filesystem from our LXC configuration file:
+
+<pre style="white-space: pre">
+# /var/lib/harden/config
+lxc.mount.entry = tmpfs dev/shm tmpfs rw,nosuid,nodev,create=dir 0 0
+lxc.mount.entry = tmpfs run tmpfs rw,nosuid,nodev,mode=755,create=dir 0 0
+lxc.mount.entry = tmpfs run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,create=dir 0 0
+lxc.mount.entry = tmpfs run/user tmpfs rw,nosuid,nodev,mode=755,size=50m,create=dir 0 0
+lxc.mount.entry = tmpfs sys/fs/cgroup tmpfs rw,nosuid,nodev,create=dir 0 0
+</pre>
+
+But to mount our cgroup hierarchy (we only need one, for systemd), we first need to create its mount point, so we can't simply add the following line:
+
+<pre style="white-space: pre">
+# /var/lib/harden/config
+lxc.mount.entry = cgroup sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,name=systemd 0 0
+</pre>
+
+Instead, the only solution I found was to create a (simple) [LXC mount hook](https://linuxcontainers.org/lxc/manpages//man5/lxc.container.conf.5.html#lbBC):
+
+```bash
+#!/bin/bash
+# /usr/local/bin/mount-cgroup on the host
+mkdir -p "$LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd"
+mount cgroup "$LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd" \
+  -t cgroup \
+  -o rw,nosuid,nodev,noexec,relatime,xattr,name=systemd
+```
+
+Now, we call this script from our configuration:
+
+```ini
+# /var/lib/harden/config
+lxc.hook.mount = /usr/local/bin/mount-cgroup
+```
+
+And now your container works!
+
+But instead of maintaining a blacklist of capabilities, can we define a whitelist?
+
+```ini
+lxc.cap.keep =
+lxc.cap.keep = chown ipc_lock ipc_owner kill net_admin net_bind_service
+```

 You can find the whole capability list in the dedicated man page [capabilities(7)](http://man7.org/linux/man-pages/man7/capabilities.7.html) and how to use them with LXC in the LXC man page [lxc.container.conf(5)](https://linuxcontainers.org/fr/lxc/manpages//man5/lxc.container.conf.5.html#lbAV).
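
To check from inside the container that a dropped capability is really gone from the bounding set, the `CapBnd` mask in `/proc/self/status` can be decoded by hand. The following is only a minimal sketch and not part of the original post: the helper names `has_cap` and `current_capbnd` are made up for illustration, and the bit numbers (16 for `CAP_SYS_MODULE`, 21 for `CAP_SYS_ADMIN`) are assumed from `linux/capability.h`.

```bash
#!/bin/bash
# Hypothetical helpers, written for illustration only.

# has_cap MASK_HEX BIT: succeed if bit number BIT is set in the hexadecimal mask.
has_cap() {
  local mask=$((16#$1))
  local bit=$2
  (( (mask >> bit) & 1 ))
}

# current_capbnd: print the capability bounding set mask of the calling shell.
current_capbnd() {
  local key value
  while read -r key value; do
    if [[ $key == "CapBnd:" ]]; then
      echo "$value"
      return 0
    fi
  done < /proc/self/status
  return 1
}

# Report the state of the two capabilities discussed in this post (Linux only).
if [[ -r /proc/self/status ]] && mask=$(current_capbnd); then
  for entry in "16 cap_sys_module" "21 cap_sys_admin"; do
    read -r bit name <<< "$entry"
    if has_cap "$mask" "$bit"; then
      echo "$name is still in the bounding set"
    else
      echo "$name has been dropped"
    fi
  done
fi
```

Run inside a container configured with `lxc.cap.drop = sys_admin`, the script should report `cap_sys_admin` as dropped. `capsh --print` gives the same information in a friendlier form when it is installed.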