Rewording

Quentin 2021-07-14 23:33:01 +02:00
parent 765e046ff3
commit 5579c70d92
Signed by untrusted user: quentin
GPG key ID: A98E9B769E4FF428


@@ -18,7 +18,7 @@ As said by Jessie Frazelle in her blog post [Setting the Record Straight: contai
> A "container" is just a term people use to describe a combination of Linux namespaces and cgroups. Linux namespaces and cgroups ARE first class objects. NOT containers.
The challenge when it comes to hardening a LXC container, compared to other solutions, is that there is a great probability that you'll run systemd in your container. And systemd heavily uses the primitives quoted before. Especially, systemd rely on *cgroups* to handle its services. We can also mention that many systemd daemon will be provided with a configuration that need to interact with the *capabilities*.
The challenge when it comes to hardening an LXC container, compared to other solutions, is that you will very likely run systemd in your container, and systemd makes heavy use of the primitives mentioned above. In particular, systemd relies on *cgroups* to manage its services. Moreover, many systemd daemons ship with a configuration that needs to interact with *capabilities*.
In this article, we'll boot a container running systemd without `CAP_SYS_ADMIN`; all other hardening measures are out of scope for now.
@@ -26,7 +26,7 @@ If you feel a bit lost with containers, a good start is the reading of this whit
## Creating a standard LXC container
Before starting, you'll need a very recent version of LXC, at least lxc-2.0.9.
Before starting, you'll need at least lxc-2.0.9.
In any case, compiling LXC is quite straightforward.
Here is a quick reminder on how to compile LXC:
@@ -51,29 +51,35 @@ As you'll need to debug the launch of your container, I can only recommend you t
sudo lxc-start -n harden -lDEBUG -F
```
It will launch your container in foreground (so you'll be able to see systemd logs at boot) and it will log many useful informations in the `/var/log/lxc/harden.log` file.
This command launches your container in the foreground (so you can see the systemd logs at boot) and writes a lot of useful information to the `/var/log/lxc/harden.log` file.
## Capabilities: split the root
Historically, there is a huge difference between the root user (with uid 0) which bypass any access control and the other users of the system which must pass every control. So, if you want to send an ICMP request via the `ping` command for example, you must run the command as root (with the magic of [setuid](https://en.wikipedia.org/wiki/Setuid) to enable non privileged users to launch it). As the command is launched as root for everyone, ping can load a kernel module, change the time on your system, erase every files, etc. That's dangerous, particularly if someone find a vulnerability in your command and use it to do a [privilege escalation](https://en.wikipedia.org/wiki/Privilege_escalation).
Historically, there is a huge difference between the root user (uid 0), who bypasses every access control, and the other users of the system, who must pass every check. So if you want to send an ICMP request with the `ping` command, for example, the command must run as root (thanks to the magic of [setuid](https://en.wikipedia.org/wiki/Setuid), which lets non-privileged users launch it). Since the command runs as root for everyone, ping could load a kernel module, change the system time, erase any file, etc. That's dangerous, particularly if someone finds a vulnerability in the command and uses it for [privilege escalation](https://en.wikipedia.org/wiki/Privilege_escalation).
A good idea would be to only allow the ping command to execute actions related to network as root, not everything. You can do that by using capabilities, by giving the `CAP_NET_RAW` capability to your ping command.
A better idea is to let the ping command perform only network-related privileged actions, not everything. You can do that with capabilities, by granting the `CAP_NET_RAW` capability to the ping binary.
But capabilities, and more precisely **capability bounding set**, can also be used to reduce the capabilities that any process of your container can inquire. Indeed, if you allow a process in your container to load kernel modules, what prevent him to load a faulty module enabling him to escape the container ? So, one way to prevent this catastrophic scenario is to drop `CAP_SYS_MODULE` from the capability bounding set. When you use `lxc.cap.keep` and `lxc.cap.drop`, you're modifying the capability bounding set of your container.
You can show your current **capability bounding set** with the following command:
But capabilities, and more precisely the **capability bounding set**, can also be used to limit the capabilities that any process in your container can acquire. Indeed, if you allow a process in your container to load kernel modules, what prevents it from loading a malicious module that lets an attacker escape the container? One way to prevent this catastrophic scenario is to drop `CAP_SYS_MODULE` from the capability bounding set. When you use `lxc.cap.keep` and `lxc.cap.drop`, you're modifying the capability bounding set of your container.
Let's start by displaying your current **capability bounding set**:
```bash
capsh --print
```
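If `capsh` is not installed, the same bounding set can be read from the `CapBnd:` field of `/proc/self/status`. The kernel exposes it as a hexadecimal bit mask in which bit *n* stands for capability number *n* (`CAP_NET_RAW` is 13, `CAP_SYS_MODULE` is 16, `CAP_SYS_ADMIN` is 21). The bash sketch below is only an illustration. The helper name `cap_in_mask` is hypothetical and is not part of LXC or libcap.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an LXC or libcap tool): print "yes" if capability
# number BIT is set in the hexadecimal capability MASK, "no" otherwise.
cap_in_mask() {
    local mask=$1 bit=$2
    # 16#... parses the mask as base 16; shift right and test the lowest bit.
    if (( (16#$mask >> bit) & 1 )); then
        echo yes
    else
        echo no
    fi
}

# awk reads its own /proc/self/status, but the bounding set is inherited
# from this shell, so the mask is the same as the shell's.
bnd=$(awk '/^CapBnd:/ { print $2 }' /proc/self/status)
echo "CapBnd mask:    $bnd"
echo "CAP_SYS_MODULE: $(cap_in_mask "$bnd" 16)"
echo "CAP_SYS_ADMIN:  $(cap_in_mask "$bnd" 21)"
```

When this is run inside the container after `lxc.cap.drop = sys_admin` has been applied, the `CAP_SYS_ADMIN` line should print `no`.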
One capability is a bit special, `CAP_SYS_ADMIN`, as it is sometimes considered as ["the new root"](https://lwn.net/Articles/486306/) because of its large and not strictly defined scope. This capability is very useful because it permits to mount filesystems from the container. Unfortunately, it also enables interaction with ioctl, IPC resources, namespaces, etc. So, we want to drop this capability. Can we just drop it ?
Among all the existing capabilities, one is a bit special: `CAP_SYS_ADMIN`.
Some consider it ["the new root"](https://lwn.net/Articles/486306/) because of its broad and loosely defined scope.
This capability is also very useful, as it is needed to mount filesystems from inside the container.
Unfortunately, it also grants access to critical kernel APIs such as ioctl, IPC resources, namespaces, etc.
Given how powerful this capability is, we want to drop it in our container.
But can we simply drop it?
```ini
# /var/lib/harden/config
lxc.cap.drop = sys_admin
```
Now try to restart your container... we can't just drop the capability:
Now try to restart your container... and enjoy the crash:
```raw
Failed to mount tmpfs at /dev/shm: Operation not permitted
@@ -145,10 +151,10 @@ Now, we call this script from our configuration:
lxc.hook.mount = /usr/local/bin/mount-cgroup
```
And now, your container is working !
And finally, your container is working!
But instead of creating a capabilities blacklist, can we create a capabilities whitelist ?
Yes, we can:
But one more thing: instead of creating a capabilities blacklist, can we create a more secure whitelist?
The answer is yes:
```ini
lxc.cap.keep =