Pod network loss
Introduction
Pod network loss is a Kubernetes pod-level chaos fault that causes packet loss in a specific container by starting a traffic control (tc) process with netem rules to add egress (or ingress) loss.
Use cases
Pod network loss:
- Simulates a degraded network with varied percentages of dropped packets between microservices.
- Simulates loss of access to specific third-party or dependent services or components.
- Simulates a blackhole for traffic to a given availability zone (that is, simulates the failure of an availability zone).
- Simulates network partitions (split-brain) between peer replicas for a stateful application.
- Tests the application's resilience to a lossy (or flaky) network.
Prerequisites
- Kubernetes > 1.16 is required to execute this fault.
- The application pods should be in the running state before and after injecting chaos.
Fault tunables
Optional tunables
Tunable | Description | Notes |
---|---|---|
NETWORK_INTERFACE | Name of the ethernet interface considered for shaping traffic. | Default: eth0. For more information, go to network interface. |
TARGET_CONTAINER | Name of the container subject to network loss. Applicable for containerd and crio runtimes only. | With these runtimes, if the value is not provided, the fault injects chaos on the first container of the pod. For more information, go to target specific container. |
NETWORK_PACKET_LOSS_PERCENTAGE | Packet loss (in percentage). | Default: 100 %. For more information, go to network packet loss. |
CONTAINER_RUNTIME | Container runtime interface for the cluster. | Default: containerd. Supports docker, containerd and crio. For more information, go to container runtime. |
SOCKET_PATH | Path of the containerd, crio, or docker socket file. | Default: /run/containerd/containerd.sock. For more information, go to socket path. |
TOTAL_CHAOS_DURATION | Duration of chaos injection (in seconds). | Default: 60 s. For more information, go to duration of the chaos. |
TARGET_PODS | Comma-separated list of application pod names subject to pod network loss. | If not provided, the fault selects target pods randomly based on the provided appLabels. For more information, go to target specific pods. |
DESTINATION_IPS | IP addresses of the services or pods, or CIDR blocks (ranges of IPs), whose accessibility is impacted. Comma-separated IPs or CIDRs can be provided. | If not provided, the fault induces network chaos for all IPs or destinations. For more information, go to destination IPs. |
DESTINATION_HOSTS | DNS names or FQDNs of the services whose accessibility is impacted. Comma-separated hosts can be provided. | If not provided, the fault induces network chaos for all IPs or destinations, or for the DESTINATION_IPS if already defined. For more information, go to destination hosts. |
SOURCE_PORTS | Ports of the target application whose accessibility is impacted. | Comma-separated ports can be provided. If not provided, the fault induces network chaos for all ports. For more information, go to source and destination ports. |
DESTINATION_PORTS | Ports of the destination services or pods, or CIDR blocks (ranges of IPs), whose accessibility is impacted. | Comma-separated ports can be provided. If not provided, the fault induces network chaos for all ports. For more information, go to source and destination ports. |
PODS_AFFECTED_PERC | Percentage of total pods to target. Provide numeric values. | Default: 0 (corresponds to 1 replica). For more information, go to pod affected percentage. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time. |
SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution. |
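You can combine the optional tunables in a single ChaosEngine. The following sketch targets about half of the pods labeled app=nginx, injects chaos into them one after another, and waits 30 seconds before and after injection. The values 50, serial, and 30 are arbitrary illustrations, not recommendations.

```yaml
# illustrative values only: ~50% of matching pods, serial execution, 30s ramp
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-loss
    spec:
      components:
        env:
        # percentage of the matching pods to target
        - name: PODS_AFFECTED_PERC
          value: '50'
        # inject chaos into the target pods one after another
        - name: SEQUENCE
          value: 'serial'
        # wait period (in seconds) before and after injecting chaos
        - name: RAMP_TIME
          value: '30'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```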
Network packet loss
Percentage of packets dropped in the target application's network traffic. Its default value is 100. Tune it by using the NETWORK_PACKET_LOSS_PERCENTAGE environment variable.
The following YAML snippet illustrates the use of this environment variable:
# it injects network-loss for the egress traffic
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-loss
    spec:
      components:
        env:
        # network packet loss percentage
        - name: NETWORK_PACKET_LOSS_PERCENTAGE
          value: '100'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
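A value of 100 drops every matching packet. To model a degraded link instead of a dead one (the first use case listed earlier), you can set a lower percentage. This sketch uses 20 percent, which is an arbitrary value chosen for illustration. Only the env entries differ from the previous snippet.

```yaml
# partial loss: drop roughly 20% of egress packets (illustrative value)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-loss
    spec:
      components:
        env:
        # lower percentage simulates a flaky rather than a broken network
        - name: NETWORK_PACKET_LOSS_PERCENTAGE
          value: '20'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```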
Destination IPs and destination hosts
By default, the fault interrupts traffic to all IPs and hosts. To restrict it to specific destinations, use the DESTINATION_IPS and DESTINATION_HOSTS environment variables.
- DESTINATION_IPS: IP addresses of the services or pods, or CIDR blocks (ranges of IPs), whose accessibility is impacted.
- DESTINATION_HOSTS: DNS names or FQDNs of the services whose accessibility is impacted.
The following YAML snippet illustrates the use of these environment variables:
# it injects the chaos for the egress traffic for specific ips/hosts
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-loss
    spec:
      components:
        env:
        # supports comma separated destination ips
        - name: DESTINATION_IPS
          value: '8.8.8.8,192.168.5.6'
        # supports comma separated destination hosts
        - name: DESTINATION_HOSTS
          value: 'nginx.default.svc.cluster.local,google.com'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
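DESTINATION_IPS also accepts CIDR blocks. This gives one way to approach the availability-zone blackhole use case, provided that the zone's workloads are reachable through a known subnet. The 10.0.32.0/20 range below is a hypothetical zone subnet. Replace it with a CIDR from your own network.

```yaml
# blackhole egress traffic to one (hypothetical) zone subnet
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-loss
    spec:
      components:
        env:
        # hypothetical CIDR standing in for one availability zone
        - name: DESTINATION_IPS
          value: '10.0.32.0/20'
        # drop every packet sent to that range
        - name: NETWORK_PACKET_LOSS_PERCENTAGE
          value: '100'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```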
Source and destination ports
By default, the fault disrupts traffic on all source and destination ports. To restrict the disruption to specific ports, use the SOURCE_PORTS and DESTINATION_PORTS environment variables.
- SOURCE_PORTS: Ports of the target application whose accessibility is impacted.
- DESTINATION_PORTS: Ports of the destination services or pods, or CIDR blocks (ranges of IPs), whose accessibility is impacted.
The following YAML snippet illustrates the use of these environment variables:
# it injects the chaos for the egress traffic for specific ports
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-loss
    spec:
      components:
        env:
        # supports comma separated source ports
        - name: SOURCE_PORTS
          value: '80'
        # supports comma separated destination ports
        - name: DESTINATION_PORTS
          value: '8080,9000'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
Network interface
Name of the ethernet interface considered to shape the traffic. Its default value is eth0. Tune it by using the NETWORK_INTERFACE environment variable.
The following YAML snippet illustrates the use of this environment variable:
# provide the network interface
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-loss
    spec:
      components:
        env:
        # name of the network interface
        - name: NETWORK_INTERFACE
          value: 'eth0'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
Container runtime and socket path
Use the CONTAINER_RUNTIME and SOCKET_PATH environment variables to set the container runtime and socket file path, respectively.
- CONTAINER_RUNTIME: Supports the docker, containerd, and crio runtimes. The default value is containerd.
- SOCKET_PATH: Path of the runtime socket file. The default value is the containerd socket, /run/containerd/containerd.sock. For docker, specify /var/run/docker.sock. For crio, specify /var/run/crio/crio.sock.
The following YAML snippet illustrates the use of these environment variables:
# provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-loss
    spec:
      components:
        env:
        # runtime for the container
        # supports docker, containerd, crio
        - name: CONTAINER_RUNTIME
          value: 'containerd'
        # path of the socket file
        - name: SOCKET_PATH
          value: '/run/containerd/containerd.sock'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
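On a cluster that uses the crio runtime, you change both variables. You can also use TARGET_CONTAINER, which is supported only for containerd and crio, to pick a container other than the first one in the pod. This sketch assumes a crio cluster that uses the crio socket path listed above. It also assumes a container named nginx inside the target pods; that name is a hypothetical placeholder.

```yaml
# illustrative sketch: crio runtime with an explicit target container
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-loss
    spec:
      components:
        env:
        - name: CONTAINER_RUNTIME
          value: 'crio'
        # socket path for the crio runtime
        - name: SOCKET_PATH
          value: '/var/run/crio/crio.sock'
        # hypothetical container name; replace with a container in your pod
        - name: TARGET_CONTAINER
          value: 'nginx'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```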