Analysis on the safety of Docker


1. docker run syntax
Syntax:

docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

2. Security-related parameters of docker run
2.1 Enable AppArmor
AppArmor applies mandatory access control to individual executables. A profile can restrict which directories and files a program may read or write, which network ports it may open, read or write, and so on.

AppArmor profiles for containers are stored in the /etc/apparmor.d/containers/ directory.

This example uses the Nginx profile from the official documentation, see https://docs.docker.com/engine/security/apparmor/

Load a new profile:

$ sudo apparmor_parser -r -W /etc/apparmor.d/containers/docker-nginx

Unload a profile:

$ apparmor_parser -R /path/to/profile

Run a container with the AppArmor profile:

$ docker run --security-opt "apparmor=docker-nginx" -p 80:80 -d --name apparmor-nginx nginx

2.2 Enable SELinux
Configuration steps

Enable SELinux on the host and for the Docker daemon (for example by starting dockerd with --selinux-enabled). Containers are then started with SELinux confinement by default.

[root@localhost selinux]# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31
[root@localhost selinux]# docker info
...
init version: fec3683
Security Options:
seccomp
WARNING: You're not using the default seccomp profile
Profile: /etc/docker/seccomp/default-no-chmod.json
selinux
userns
Kernel Version: 3.10.0-1062.12.1.el7.x86_64
...

Test: mount the host's / directory at /hacking in the container.

[root@localhost selinux]# docker run -it --rm -v /:/hacking centos:latest /bin/sh
sha256:fe8d824220415eed5477b63addf40fb06c3b049404242b31982106ac204f6700
Status: Downloaded newer image for centos:latest
sh-4.4# cd /
sh-4.4# ls
bin dev etc hacking home lib lib64 lost+found media mnt opt proc root run sbin srv sys tmp usr var
sh-4.4# cd hacking/
sh-4.4# ls
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
sh-4.4# cd var/log
sh-4.4# ls
ls: cannot open directory '.': Permission denied

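Although the shell runs as root inside the container, SELinux denies access to the host's /var/log. The --security-opt label options select the SELinux context a container runs with. A label has four parts, user:role:type:level, and each part can be set separately to choose how the container is confined.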

docker run --security-opt label=type:spc_t replicated

docker run --interactive --tty --security-opt label=level:TopSecret centos /bin/bash

docker run -it --security-opt label=disable alpine sh
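To make the four-part label structure concrete, here is a minimal sketch of a shell helper that splits a context string into its fields. The function name split_selinux_label and the sample context are assumptions chosen for illustration. The level field may itself contain colons (for example s0:c1,c2), so only the first three colons are treated as separators.

```shell
# Split an SELinux context of the form user:role:type:level.
# Hypothetical helper for illustration; prints one "name=value" pair per line.
split_selinux_label() {
    label=$1
    se_user=${label%%:*};  rest=${label#*:}
    se_role=${rest%%:*};   rest=${rest#*:}
    se_type=${rest%%:*};   se_level=${rest#*:}
    printf 'user=%s\nrole=%s\ntype=%s\nlevel=%s\n' \
        "$se_user" "$se_role" "$se_type" "$se_level"
}

# Sample context (an assumed value, not taken from the host above)
split_selinux_label 'system_u:system_r:container_t:s0:c1,c2'
```

The type field is the part that label=type:... replaces, and the level field is the part that label=level:... replaces.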

File context definitions of the targeted policy:

[root@localhost ~]# ls /etc/selinux/targeted/contexts/files/
file_contexts file_contexts.homedirs file_contexts.subs media
file_contexts.bin file_contexts.homedirs.bin file_contexts.subs_dist
[root@localhost ~]# cat /etc/selinux/targeted/contexts/files/file_contexts

2.3 Restrict the kernel capabilities of running containers

The Linux kernel can break the privileges of the root user into distinct units called capabilities. For example, CAP_CHOWN allows arbitrary changes to file UIDs and GIDs, and CAP_DAC_OVERRIDE allows bypassing the kernel's permission checks for file read, write and execute operations. Almost all special powers of the Linux root user are broken down into individual capabilities.

This finer granularity makes it possible to:

Remove individual capabilities from the root user account to make it less powerful and less dangerous.

Add privileges to non-root users at a very granular level.

Capabilities apply to files and to threads. File capabilities allow users to execute programs with higher privileges, similar to the way the setuid bit works. Thread capabilities track the current state of capabilities in a running program.

By default, Docker uses an allowlist approach: it drops all capabilities except those it requires.

Start command

docker run --rm -it --cap-drop $CAP alpine sh

docker run --rm -it --cap-add $CAP alpine sh

docker run --rm -it --cap-drop ALL --cap-add $CAP alpine sh

$CAP is one of the capability names documented at http://man7.org/linux/man-pages/man7/capabilities.7.html. Of the roughly 40 Linux capabilities, Docker grants only 14 basic ones by default to keep containers secure (written here without the CAP_ prefix, as in the commands in this section): CHOWN, DAC_OVERRIDE, FSETID, MKNOD, FOWNER, NET_RAW, SETGID, SETUID, SETFCAP, SETPCAP, NET_BIND_SERVICE, SYS_CHROOT, KILL and AUDIT_WRITE.

Test command

$ docker container run --rm -it alpine chown nobody /

$ docker container run --rm -it --cap-drop ALL --cap-add CHOWN alpine chown nobody /

$ docker container run --rm -it --cap-drop CHOWN alpine chown nobody /

$ docker container run --rm -it --cap-add chown -u nobody alpine chown nobody /
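When planning a --cap-drop ALL configuration, it helps to know whether a capability is part of the default set at all. The following is a minimal sketch, assuming the 14-entry list given above. The names DEFAULT_CAPS and in_default_caps are hypothetical, and the helper only compares strings, it does not query Docker.

```shell
# The 14 capabilities in Docker's default set, as listed in the text above.
DEFAULT_CAPS="CHOWN DAC_OVERRIDE FSETID MKNOD FOWNER NET_RAW SETGID SETUID SETFCAP SETPCAP NET_BIND_SERVICE SYS_CHROOT KILL AUDIT_WRITE"

# Return 0 if the capability is in the default set, 1 otherwise.
# The helper itself strips an optional CAP_ prefix and ignores letter case.
in_default_caps() {
    name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    name=${name#CAP_}
    for cap in $DEFAULT_CAPS; do
        [ "$cap" = "$name" ] && return 0
    done
    return 1
}

for c in chown CAP_NET_RAW SYS_ADMIN; do
    if in_default_caps "$c"; then
        echo "$c: granted by default"
    else
        echo "$c: not in the default set, needs --cap-add"
    fi
done
```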

2.4 Run the container without using privileged mode

The --privileged flag gives a container direct access to host resources, so an attacker who abuses a privileged container gains access to those resources as well. A privileged container runs with so many additional permissions that code executed in it effectively has root on the host, with all capabilities including CAP_SYS_ADMIN.

Once an attacker has access to an exposed privileged container, many further attacks become possible. The attacker can identify the software running on the host and find and exploit its vulnerabilities, or exploit vulnerabilities and misconfigurations in container software, such as weak credentials or missing authentication. With root access, malicious code such as a cryptocurrency miner can be executed and effectively hidden.

Test

Create a privileged container:

docker run -d --name centos7 --privileged=true centos:7 /usr/sbin/init

Enter the container:

docker exec -it centos7 /bin/bash

2.5 To restrict the container from acquiring new privileges, use --security-opt=no-new-privileges
This option works as follows:

A process can set the no_new_privs bit in the kernel, and the bit persists across fork, clone and exec.

The no_new_privs bit ensures that the process and its children cannot gain any additional privileges.

Once the no_new_privs bit is set, the process cannot unset it.

A process with no_new_privs cannot change its uid/gid or gain any capabilities, even by executing a setuid binary or an executable with file capability bits set.

no_new_privs also prevents Linux security modules (LSMs) such as SELinux from transitioning to process labels that have access not allowed to the current process. An SELinux process can therefore only transition to a process type with fewer privileges.

testnnp.c prints the effective UID:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
        printf("Effective uid: %d\n", geteuid());
        return 0;
}

Compile

[root@localhost ~]# make testnnp
cc testnnp.c -o testnnp

Dockerfile for the test image:

FROM fedora:latest
ADD testnnp /root/testnnp
RUN chmod +s /root/testnnp
ENTRYPOINT /root/testnnp

Build the test image:

[root@localhost ~]# docker build -t testnnp .

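Run the test as an unprivileged user. Because testnnp is setuid root, it should report an effective UID of 0:

# docker run -it --rm --user=1000 testnnp

With no-new-privileges the setuid bit is ignored, and the effective UID should stay at 1000: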

# docker run -it --rm --user=1000 --security-opt=no-new-privileges testnnp
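On Linux 4.10 and later the kernel also reports the bit in the NoNewPrivs field of /proc/<pid>/status, which gives a second way to confirm the option took effect. The helper below is a minimal sketch; the name nnp_value is an assumption, and it only parses text supplied on standard input.

```shell
# Print the NoNewPrivs value (0 or 1) found in /proc-style status text on stdin.
# Prints nothing if the field is absent (for example on kernels older than 4.10).
nnp_value() {
    awk '$1 == "NoNewPrivs:" { print $2 }'
}

# Demonstration on sample text.
# Inside a container one would run: nnp_value < /proc/self/status
printf 'Name:\tsh\nNoNewPrivs:\t1\nSeccomp:\t2\n' | nnp_value
```

A value of 1 means the process can no longer gain privileges through setuid binaries or file capabilities.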

2.6 Use cgroup to ensure that the container runs in the defined cgroup

Do not use the --cgroup-parent option unless it is required, so that containers run in the default Docker cgroup. Control groups (cgroups) are another important feature of the Linux kernel, used mainly for resource limiting and accounting.

A cgroup lets you limit the system resources (such as CPU, RAM, IOPS and network) that processes and containers can use.

$ docker run -d --name='low_prio' --cpuset-cpus=0 --cpu-shares=20 busybox md5sum /dev/urandom
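The --cpu-shares value is a relative weight rather than a hard cap, so it only matters when containers compete for the same CPU. The following rough sketch shows the arithmetic, assuming the competing containers use Docker's default weight of 1024. The function name cpu_share_percent is hypothetical, and the real scheduler's behavior is only approximated.

```shell
# Approximate CPU percentage a container receives under full contention.
# $1 = this container's --cpu-shares, remaining arguments = shares of competitors.
cpu_share_percent() {
    own=$1; total=$1; shift
    for s in "$@"; do
        total=$((total + s))
    done
    # Integer arithmetic: the result is rounded down.
    echo $((own * 100 / total))
}

# A container with 20 shares competing with one container at the default 1024 shares
cpu_share_percent 20 1024
```

With no competitor on the same CPU, a container with 20 shares can still use the whole core.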

2.7 Enable seccomp

As noted in the discussion of Docker daemon security, seccomp is a kernel mechanism that applies a system call filtering policy, and different policies are kept in differently named profiles. You can specify the profile used when a container runs instead of the default profile set by the Docker daemon. seccomp=unconfined disables seccomp, so the following startup command is not recommended:

$ docker container run --rm -it --security-opt seccomp=unconfined debian:jessie sh

Specify a seccomp profile:

$ docker run --rm -it --security-opt seccomp=default-no-chmod.json alpine sh

2.8 Run the container without mounting the host's system directories

Sensitive host directories include:

/

/boot

/dev

/etc

/lib

/proc

/sys

/usr
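A check like this can be automated before a container is started. The sketch below tests the host side of a -v SRC:DST[:OPTS] value against the directories listed above. SENSITIVE_DIRS and is_sensitive_mount are hypothetical names, and the comparison is purely textual: symlinks and relative paths are not resolved.

```shell
# Host directories the text above lists as sensitive (/ is handled separately).
SENSITIVE_DIRS="/boot /dev /etc /lib /proc /sys /usr"

# Return 0 if the host side of a -v SRC:DST[:OPTS] spec is / or lies in a
# sensitive directory, 1 otherwise.
is_sensitive_mount() {
    src=${1%%:*}
    [ "$src" = "/" ] && return 0
    for dir in $SENSITIVE_DIRS; do
        case $src in
            "$dir"|"$dir"/*) return 0 ;;
        esac
    done
    return 1
}

for spec in /:/hacking /etc/passwd:/tmp/passwd:ro /opt/app/data:/run/app/data:rw; do
    if is_sensitive_mount "$spec"; then
        echo "REJECT $spec"
    else
        echo "ok     $spec"
    fi
done
```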

2.9 Mount the container root directory in read-only mode with --read-only

Using --read-only makes the root directory of the running container read-only.

[root@localhost ~]# docker run -it --read-only 72300a873c2c /bin/bash
root@f077b480dbe5:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@f077b480dbe5:/# touch 111
touch: cannot touch '111': Read-only file system

If the application needs to write to a particular directory, use finer-grained control and mount only that directory as a volume with read and write permissions.

docker run --interactive --tty --read-only -v /opt/app/data:/run/app/data:rw centos /bin/bash

2.10 Do not mount the host’s docker.sock to any container

After installation, the Docker daemon listens on the Unix domain socket /var/run/docker.sock.

The following command runs a container in interactive mode (you enter the container directly) and bind-mounts docker.sock into it.

# docker run -v /var/run/docker.sock:/var/run/docker.sock -ti alpine sh

With the Docker socket mounted, the container has very high privileges: it can control the Docker daemon.

2.11 Do not use shared mount propagation mode

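Do not mount volumes with the shared propagation option, as in --volume=/hostPath:/containerPath:shared. With shared propagation, mounts made inside the container under that path also appear on the host.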

# docker run <Run arguments> --volume=/hostPath:/containerPath:shared <Container Image Name or ID> <Command> 
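The propagation mode is the last colon-separated field of the volume value and can be combined with other options by commas. A minimal sketch of a check for it follows; uses_shared_propagation is a hypothetical name, and only the -v/--volume syntax is handled, not --mount.

```shell
# Return 0 if a -v SRC:DST[:OPTS] spec requests shared or rshared propagation.
uses_shared_propagation() {
    case $1 in
        *:*:*) opts=${1##*:} ;;   # text after the last colon is the option list
        *) return 1 ;;            # no options field at all
    esac
    case ",$opts," in
        *,shared,*|*,rshared,*) return 0 ;;
    esac
    return 1
}

for spec in /hostPath:/containerPath:shared /hostPath:/containerPath:ro; do
    if uses_shared_propagation "$spec"; then
        echo "$spec: shared propagation, avoid"
    else
        echo "$spec: ok"
    fi
done
```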

2.12 Do not expose host devices directly; if necessary, make them read-only

This prevents users in the container from modifying host devices directly. In the following example the first device is read-write and the second is read-only:

docker run --interactive --tty --device=/dev/tty0:/dev/tty0:rw --device=/dev/temp_sda:/dev/temp_sda:r centos bash 

2.13 Set the on-failure restart policy with at most 5 retries

The --restart flag of docker run specifies how a container is restarted when it exits. Choose the on-failure restart policy and limit the number of restart attempts to 5.

Restarting a container indefinitely can cause a denial of service on the host, especially when several containers share it. Ignoring the exit status and always restarting also makes it impossible to investigate why the container terminated. If a container terminates, investigate the cause instead of restarting it without limit.

docker run --detach --restart=on-failure:5 nginx
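The rule can be checked mechanically against a --restart value. The following is a minimal sketch, with the hypothetical function name restart_policy_ok; it treats a bare on-failure (no retry count, so unlimited retries) as non-compliant.

```shell
# Return 0 if a --restart value is on-failure with 1 to 5 retries, 1 otherwise.
restart_policy_ok() {
    case $1 in
        on-failure:*) retries=${1#on-failure:} ;;
        *) return 1 ;;             # always, unless-stopped, no, or bare on-failure
    esac
    case $retries in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    [ "$retries" -ge 1 ] && [ "$retries" -le 5 ]
}

for policy in on-failure:5 on-failure:20 always on-failure; do
    if restart_policy_ok "$policy"; then
        echo "$policy: ok"
    else
        echo "$policy: not compliant"
    fi
done
```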

2.14 Do not share the host network namespace, disable --net=host

In --net=host mode, Docker does not create an isolated network environment for the container. The container shares the host's network namespace, so it can use the host's eth0 to communicate with the outside world just as the host does. Characteristics:

The container does not have an isolated network namespace

The IP address of the container is the same as the IP address of the Docker host

Ports used by services in the container must not conflict with ports already in use on the Docker host

Containers in host mode can coexist with containers that use other network modes

2.15 Do not share the host process namespace, disable --pid=host

By default, all containers have their own PID namespace, which separates their processes from the rest of the system. The PID namespace removes the view of system processes and allows process IDs, including PID 1, to be reused.

In some cases a container must share the host's process namespace, which essentially lets processes in the container see all processes on the host. One example is a container built with debugging tools that are meant to debug host processes.

With --pid=host, the container can see and manipulate processes on the host directly. If the dockerd daemon is configured with user namespace remapping, using this parameter causes the container to fail to start.

docker run --interactive --tty --pid=host centos /bin/bash 

2.16 Do not share the host IPC namespace, disable --ipc=host

Processes can share memory to exchange data, for example through shared buffers with a single "supervisor" process. This design is chosen for performance.

--ipc=MODE sets the IPC mode of the container. shareable gives the container its own private IPC namespace that can be shared with other containers. host uses the IPC namespace of the host system.

Host mode should not be used. The following is an example of what not to do:

docker run --interactive --tty --ipc=host centos /bin/bash 

Sharing IPC with other containers is acceptable:

docker run --ipc=container:<id> <image>
docker run -d --ipc=shareable data-server
docker run -d --ipc=container:data-server data-client

2.17 Do not share the host UTS namespace, disable --uts=host

The UTS namespace sets the hostname and the domain name visible to the processes running in the namespace. By default, all containers, including those running with --network=host, have their own UTS namespace. Setting --uts=host makes the container use the same UTS namespace as the host. Note that --hostname is invalid in host UTS mode.

In host UTS mode the container and the host share one hostname, so a change to the hostname on the host also changes it in the container.

2.18 Do not share the host user namespace, disable --userns=host
By default, the Docker daemon runs as root. This allows the daemon to create and use the kernel structures needed to start containers, but it is also a potential security risk.

By default, a container runs as the root account:

# docker run --rm alpine id

# docker run --rm --user 1000:1000 alpine id

# docker run --rm --privileged --userns=host alpine id
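Sections 2.14 to 2.18 all come down to avoiding a =host value for a namespace option, so a wrapper script can scan a command line before running it. The sketch below uses the hypothetical name find_host_namespace_flags and recognizes only the --flag=value spelling. It also does not distinguish docker options from arguments passed to the container's own command.

```shell
# Print each argument that shares a host namespace; return 1 if any is found.
find_host_namespace_flags() {
    status=0
    for arg in "$@"; do
        case $arg in
            --net=host|--network=host|--pid=host|--ipc=host|--uts=host|--userns=host)
                echo "$arg"
                status=1 ;;
        esac
    done
    return $status
}

find_host_namespace_flags run --interactive --tty --pid=host centos /bin/bash \
    || echo "host namespace sharing requested"
```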

2.19 Limit the memory and CPU usage of containers when running

Parameters commonly used to limit CPU and memory:

-c, --cpu-shares: a relative weight that only limits the proportion of CPU time the container gets when CPUs are contended

--cpus: a floating-point number giving the maximum number of cores the container can use, accurate to two decimal places, so the smallest allocation is 0.01 of a core

-m, --memory: the maximum amount of memory the container can use; the minimum is 4m (newer Docker releases document a minimum of 6m)

--memory-swap: the total amount of memory plus swap the container can use
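Memory values take a unit suffix (b, k, m or g), and Docker interprets them as binary units. The following is a minimal sketch of the conversion, useful when comparing a requested limit with a byte count reported elsewhere. The name mem_to_bytes is hypothetical, and only whole numbers are handled.

```shell
# Convert a Docker size string such as 512m or 2g to bytes (binary units).
# Prints nothing and returns 1 for input it does not understand.
mem_to_bytes() {
    num=${1%[bkmgBKMG]}
    case $num in
        ''|*[!0-9]*) return 1 ;;
    esac
    case $1 in
        *[kK]) echo $((num * 1024)) ;;
        *[mM]) echo $((num * 1024 * 1024)) ;;
        *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;     # plain number or b suffix: already bytes
    esac
}

mem_to_bytes 4m
mem_to_bytes 2g
```

The result can be compared with the byte value that docker inspect reports in HostConfig.Memory for a running container.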

2.20 Override the default ulimit at run time only when necessary

The Docker daemon can be configured with default ulimits. If necessary, you can override the defaults with the docker run command:

# docker run --ulimit nofile=1024:1024 --interactive --tty centos /bin/bash 

2.21 Use --pids-limit to limit the number of processes in a container

The --pids-limit parameter uses the PIDs cgroup to cap the number of processes that can exist inside the container at any one time, which prevents fork bombs and similar attacks.

# docker run -it --pids-limit 100 <Image_ID>

2.22 Check the health of the container at run time with the --health-cmd parameter

The health check command is run periodically inside the container to determine whether it is healthy:

# docker run -d --name db --health-cmd "curl --fail http://localhost:8091/pools || exit 1" --health-interval=5s --health-timeout=3s arungupta/couchbase

2.23 Bind incoming container traffic to a specific host interface

Publish the container port on a specific IP address and port of the host:

docker run --detach --publish 10.2.3.4:49153:80 nginx 
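When the IP field is omitted, Docker by default publishes the port on all host interfaces. A small check for the three-field form can catch that; the sketch below uses the hypothetical name binds_specific_ip and handles IPv4 addresses and single ports only.

```shell
# Return 0 if a --publish value has the form IP:HOSTPORT:CONTAINERPORT with a
# specific IPv4 address, 1 otherwise. IPv6 and port ranges are not handled.
binds_specific_ip() {
    spec=${1%/*}                 # drop an optional /tcp or /udp suffix
    case $spec in
        *:*:*) ip=${spec%%:*} ;; # three fields: the first is the host IP
        *) return 1 ;;           # no IP given
    esac
    [ -n "$ip" ] && [ "$ip" != "0.0.0.0" ]
}

for p in 10.2.3.4:49153:80 49153:80 0.0.0.0:8080:80/tcp; do
    if binds_specific_ip "$p"; then
        echo "$p: bound to one address"
    else
        echo "$p: reachable on all host interfaces"
    fi
done
```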

2.24 Do not use Docker's default bridge network docker0

Docker connects the virtual interfaces created in bridge mode to a common bridge named docker0. This default network model is vulnerable to ARP spoofing and MAC flooding attacks because no filtering is applied to it.

In practice, container networking is usually provided by the orchestration system.

2.25 Use ports greater than 1024, and map only the ports that the container must expose
Ports below 1024 are usually reserved for system services, so using them may conflict with services on the host. Ports 80 and 443 are the usual exceptions. A container should publish only the ports that must be open.

2.26 Ensure that Docker commands always use the latest version of their image
Use the latest version of the image so that known vulnerabilities are not introduced.

2.27 Use minimal containers without redundant components or services
Leave out SSH, Telnet and any other components or services the container does not need.

2.28 Do not use the --privileged option with docker exec
The --privileged option of docker exec gives the executed command extended Linux capabilities, so do not use it.

2.29 Do not use the --user=root option with docker exec
The --user=root option of docker exec runs the command in the container as the root user. For example, if the container runs as the tomcat user (or any other non-root user), --user=root still lets docker exec run commands as root.

3. Description of other parameters

[OPTIONS] parameter descriptions:
--add-host list Add a custom host-to-IP mapping (format: host:ip)
-a, --attach list Attach to STDIN, STDOUT or STDERR
--blkio-weight uint16 Block IO (relative weight), between 10 and 1000, or 0 to disable (default 0)
--blkio-weight-device list Block IO weight (relative device weight) (default [])
--cap-add list Add Linux capabilities
--cap-drop list Drop Linux capabilities
--cgroup-parent string Optional parent cgroup for the container
--cidfile string Write the container ID to a file
--cpu-period int Limit the CPU CFS (Completely Fair Scheduler) period
--cpu-quota int Limit the CPU CFS (Completely Fair Scheduler) quota
--cpu-rt-period int Limit the CPU real-time period (in microseconds)
--cpu-rt-runtime int Limit the CPU real-time runtime (in microseconds)
-c, --cpu-shares int CPU shares (relative weight)
--cpus decimal Number of CPUs
--cpuset-cpus string CPUs in which to allow execution (0-3, 0,1)
--cpuset-mems string Memory nodes (MEMs) in which to allow execution (0-3, 0,1)
-d, --detach Run the container in the background and print the container ID
--detach-keys string Override the key sequence for detaching a container
--device list Add a host device to the container
--device-cgroup-rule list Add a rule to the cgroup allowed devices list
--device-read-bps list Limit the read rate from a device (unit: bytes/s) (default [])
--device-read-iops list Limit the read rate from a device (unit: IO/s) (default [])
--device-write-bps list Limit the write rate to a device (unit: bytes/s) (default [])
--device-write-iops list Limit the write rate to a device (unit: IO/s) (default [])
--disable-content-trust Skip image verification (default true)
--dns list Set custom DNS servers
--dns-option list Set DNS options
--dns-search list Set custom DNS search domains
--entrypoint string Override the default ENTRYPOINT of the image
-e, --env list Set environment variables
--env-file list Read in a file of environment variables
--expose list Expose a port or a range of ports
--group-add list Add additional groups to join
--health-cmd string Command to run to check health
--health-interval duration Time between running the check (ms|s|m|h)
--health-retries int Consecutive failures needed to report unhealthy
--health-start-period duration Start period for the container to initialize before starting the health-retries countdown (ms|s|m|h)
--health-timeout duration Maximum time to allow one check to run (ms|s|m|h)
--help Print usage
-h, --hostname string Container hostname
--init Run an init inside the container that forwards signals and reaps processes
-i, --interactive Keep STDIN open even if not attached
--ip string Set the IPv4 address of the container (for example, 192.168.155.139)
--ip6 string Set the IPv6 address (for example, 2001:db8::33)
--ipc string IPC mode to use
--isolation string Container isolation technology
--kernel-memory bytes Kernel memory limit
-l, --label list Set metadata on the container
--label-file list Read in a line-delimited file of labels
--link list Add a link to another container
--link-local-ip list Container IPv4/IPv6 link-local addresses
--log-driver string Logging driver for the container
--log-opt list Log driver options
--mac-address string Container MAC address (for example, 92:d0:c6:0a:29:33)
-m, --memory bytes Memory limit
--memory-reservation bytes Memory soft limit
--memory-swap bytes Swap limit equal to memory plus swap: '-1' to enable unlimited swap
--memory-swappiness int Tune container memory swappiness (0 to 100) (default -1)
--mount mount Attach a filesystem mount to the container
--name string Assign a name to the container
--network string Connect the container to a network
--network-alias list Add a network-scoped alias for the container
--no-healthcheck Disable any container-specified HEALTHCHECK
--oom-kill-disable Disable the OOM killer for the container
--oom-score-adj int Tune the host's OOM preferences (-1000 to 1000)
--pid string PID namespace to use
--pids-limit int Tune the container pids limit (set -1 for unlimited)
--privileged Give extended privileges to the container
-p, --publish list Publish a container's port(s) to the host
-P, --publish-all Publish all exposed ports to random ports
--read-only Mount the container's root filesystem as read-only (see 2.9)
--restart string Restart policy to apply when the container exits (default "no")
--rm Automatically remove the container when it exits
--runtime string Runtime to use for the container
--security-opt list Security options
--shm-size bytes Size of /dev/shm
--sig-proxy Proxy received signals to the process (default true)
--stop-signal string Signal to stop the container (default "SIGTERM")
--stop-timeout int Timeout (in seconds) to stop the container
--storage-opt list Storage driver options for the container
--sysctl map Sysctl options (default map[])
--tmpfs list Mount a tmpfs directory
-t, --tty Allocate a pseudo-TTY
--ulimit ulimit Ulimit options (default [])
-u, --user string Username or UID (format: <name|uid>[:<group|gid>])
--userns string User namespace to use
--uts string UTS namespace to use
-v, --volume list Bind mount a volume
--volume-driver string Optional volume driver for the container
--volumes-from list Mount volumes from the specified container(s)
-w, --workdir string Working directory inside the container
