WIP 4 LXC hardening

This commit is contained in:
Quentin Dufour 2017-08-22 15:18:24 +02:00
parent eeb89b8cd6
commit f0d5fe98f0
1 changed files with 23 additions and 1 deletions

View File

@ -18,7 +18,7 @@ tags:
*A container. Photo by [Mr. Rollers](https://www.flickr.com/photos/mr-rollers/32972266123/). CC BY-NC-ND 2.0*
</div>
Hardening Linux Containers, and more especially [LXC containers](https://linuxcontainers.org/fr/lxc/introduction/), is needed to prevent a malicious user to escape your container. But before starting, we need to understand how containers work under the hood.
Hardening Linux Containers, and more especially [LXC containers](https://linuxcontainers.org/fr/lxc/introduction/), is needed to prevent a malicious user to escape your container. However, even hardened, a container can't be considered totally safe today. You can consider this article as part of a [defence in depth strategy](https://en.wikipedia.org/wiki/Defense_in_depth_(computing)). But before starting, we need to understand how containers work under the hood.
As said by Jessie Frazelle in her blog post [Setting the Record Straight: containers vs. Zones vs. Jails vs. VMs](https://blog.jessfraz.com/post/containers-zones-jails-vms/), containers in Linux are not a top level design like Zone in Solaris and Jails in BSD.
@ -70,6 +70,23 @@ It will launch your container in foreground (so you'll be able to see systemd lo
*The great puzzle of root. Photo by [Kevin Dooley](https://www.flickr.com/photos/pagedooley/14555354976). CC BY 2.0.*
</div>
Historically, there is a huge difference between the root user (with uid 0) which bypass any access control and the other users of the system which must pass every control. So, if you want to send an ICMP request via the `ping` command for example, you must run the command as root (with the magic of [setuid](https://en.wikipedia.org/wiki/Setuid) to enable non privileged users to launch it). As the command is launched as root for everyone, ping can load a kernel module, change the time on your system, erase every files, etc. That's dangerous, particularly if someone find a vulnerability in your command and use it to do a [privilege escalation](https://en.wikipedia.org/wiki/Privilege_escalation).
A good idea would be to only allow the ping command to execute actions related to network as root, not everything. You can do that by using capabilities, by giving the `CAP_NET_RAW` capability to your ping command.
But capabilities, and more precisely **capability bounding set**, can also be used to reduce the capabilities that any process of your container can inquire. Indeed, if you allow a process in your container to load kernel modules, what prevent him to load a faulty module enabling him to escape the container ? So, one way to prevent this catastrophic scenario is to drop `CAP_SYS_MODULE` from the capability bounding set. When you use `lxc.cap.keep` and `lxc.cap.drop`, you're modifying the capability bounding set of your container.
One capability is a bit special, `CAP_SYS_ADMIN`, as it is sometimes considered as ["the new root"](https://lwn.net/Articles/486306/) because of its large and not strictly defined scope. This capability is very useful because it permits to mount filesystems from the container. Unfortunately, it also enables interaction with ioctl, IPC resources, namespaces, etc. So, we want to drop this capability. So, we can just drop it ?
```ini
# /var/lib/harden/config
lxc.cap.drop = sys_admin
```
Now try to restart your container...
You can find the whole capability list in the dedicated man page [capabilities(7)](http://man7.org/linux/man-pages/man7/capabilities.7.html) and how to use them with LXC in the LXC man page [lxc.container.conf(5)](https://linuxcontainers.org/fr/lxc/manpages//man5/lxc.container.conf.5.html#lbAV).
## cgroups: group your processes
![Lions](/assets/images/posts/harden-lions.jpg)
@ -97,6 +114,7 @@ Michael Kerrisk wrote an interesting [serie of articles about namespaces](https:
> The purpose of each namespace is to wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.
At first glance, namespaces handle could appear trivial in LXC: every available namespaces are used and that's all.
The reality is more complex.
## Seccomp: filter your syscalls
@ -117,3 +135,7 @@ At first glance, namespaces handle could appear trivial in LXC: every available
### prlimit
### /dev
### The bridge
ethtable