ECS container network latency
ECS container network latency disrupts the state of infrastructure resources. It injects network delay into the containers of an AWS ECS task using the Amazon SSM Run Command, which executes an SSM document that is built into the fault.
- It injects network latency into the containers of the ECS task in the cluster given by the CLUSTER_NAME environment variable, for a specific duration.
- To select the Task Under Chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service are selected as chaos targets.
- It tests the ECS task sanity (service availability) and the recovery of the task containers subjected to network latency.
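The target selection described above can be approximated with the ECS `ListTasks` API. The following is a minimal sketch, not the fault's actual implementation: it assumes a boto3-style ECS client (for example, `boto3.client("ecs")`), and `list_target_tasks` is a hypothetical helper name.

```python
from typing import Any, List, Optional


def list_target_tasks(ecs_client: Any, cluster: str, service: Optional[str] = None) -> List[str]:
    """Return the ARNs of tasks in `cluster`, optionally limited to one service.

    Hypothetical helper: `ecs_client` is assumed to behave like a boto3 ECS
    client; only its list_tasks method is used.
    """
    kwargs = {"cluster": cluster}
    if service:
        # With a service name, only the tasks that belong to that service are returned.
        kwargs["serviceName"] = service
    task_arns: List[str] = []
    while True:
        page = ecs_client.list_tasks(**kwargs)
        task_arns.extend(page.get("taskArns", []))
        token = page.get("nextToken")
        if not token:
            break
        # Follow pagination until every matching task has been collected.
        kwargs["nextToken"] = token
    return task_arns
```

Every returned ARN would be a chaos target, which mirrors the rule that all tasks of the given service are selected.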
Prerequisites
- Kubernetes >= 1.17
- Adequate AWS access to describe the target ECS cluster and run SSM commands on its container instances.
- Create a Kubernetes secret that has the AWS access configuration (key) in the CHAOS_NAMESPACE. Below is a sample secret file:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials respectively
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXX
It is recommended to use the same secret name, i.e. cloud-secret. Otherwise, you will need to update the AWS_SHARED_CREDENTIALS_FILE environment variable in the fault template, and you may be unable to use the default health check probes. Refer to AWS Named Profile For Chaos to learn how to use a different profile for AWS faults.
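Although the key is named `cloud_config.yml`, its content is an INI-style AWS shared-credentials file. A quick local sanity check can parse it with Python's `configparser` before you create the secret. `read_aws_profile` is a hypothetical helper for illustration; it is not part of the fault.

```python
import configparser


def read_aws_profile(config_text: str, profile: str = "default") -> dict:
    """Parse AWS shared-credentials text and return the keys of `profile`.

    Hypothetical helper: raises KeyError if the profile or a required key is missing.
    """
    parser = configparser.ConfigParser()
    # Lines starting with '#' (like the sample comment) are skipped as comments.
    parser.read_string(config_text)
    if not parser.has_section(profile):
        raise KeyError(f"profile [{profile}] not found")
    section = parser[profile]
    creds = {}
    for key in ("aws_access_key_id", "aws_secret_access_key"):
        if key not in section:
            raise KeyError(f"{key} missing from profile [{profile}]")
        creds[key] = section[key]
    return creds
```

If you use a named profile instead of `[default]`, pass that profile name to check it.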
Permissions required
Here is an example AWS policy to execute the fault.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ecs:UpdateContainerInstancesState",
        "ecs:RegisterContainerInstance",
        "ecs:ListContainerInstances",
        "ecs:DeregisterContainerInstance",
        "ecs:DescribeContainerInstances",
        "ecs:ListTasks",
        "ecs:DescribeClusters"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetDocument",
        "ssm:DescribeDocument",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:SendCommand",
        "ssm:CancelCommand",
        "ssm:CreateDocument",
        "ssm:DeleteDocument",
        "ssm:GetCommandInvocation",
        "ssm:UpdateInstanceInformation",
        "ssm:DescribeInstanceInformation"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2messages:AcknowledgeMessage",
        "ec2messages:DeleteMessage",
        "ec2messages:FailMessage",
        "ec2messages:GetEndpoint",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
Refer to the superset permission/policy to execute all AWS faults.
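Before running the fault, you might check that an attached policy covers the actions listed above. The sketch below is a deliberately simplified, hypothetical checker (`missing_actions` is not an AWS API). It is not an IAM policy evaluator: it ignores Deny statements, conditions, resource scoping, and partial wildcards such as `ssm:Get*`.

```python
import json
from typing import Iterable, Set


def missing_actions(policy_json: str, required: Iterable[str]) -> Set[str]:
    """Return the required actions not granted by any Allow statement.

    Simplified: only exact action names and the "*" / "service:*" wildcards
    are recognised. IAM action names are case-insensitive, so comparison is
    done in lower case.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    granted: Set[str] = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        granted.update(a.lower() for a in actions)

    def is_granted(action: str) -> bool:
        action = action.lower()
        service = action.split(":", 1)[0]
        return action in granted or "*" in granted or f"{service}:*" in granted

    return {a for a in required if not is_granted(a)}
```

For example, passing the policy above together with `["ssm:SendCommand", "ecs:ListTasks"]` should return an empty set.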
Default validations
The ECS container instance should be in a healthy state.
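One way to approximate this validation yourself is to describe the cluster's container instances and confirm that each one is `ACTIVE` with a connected ECS agent. This is a minimal sketch assuming a boto3-style ECS client; `unhealthy_container_instances` is a hypothetical helper, the fault's own check may use different criteria, and the sketch ignores pagination, so it only inspects the first page of results. The two calls it makes correspond to `ecs:ListContainerInstances` and `ecs:DescribeContainerInstances` in the policy above.

```python
from typing import Any, List


def unhealthy_container_instances(ecs_client: Any, cluster: str) -> List[str]:
    """Return ARNs of container instances that are not ACTIVE or whose agent is disconnected.

    Hypothetical helper: `ecs_client` is assumed to behave like boto3.client("ecs").
    """
    arns = ecs_client.list_container_instances(cluster=cluster).get("containerInstanceArns", [])
    if not arns:
        return []
    resp = ecs_client.describe_container_instances(cluster=cluster, containerInstances=arns)
    bad = []
    for ci in resp.get("containerInstances", []):
        # Treat an instance as healthy only if it is ACTIVE and its ECS agent is connected.
        if ci.get("status") != "ACTIVE" or not ci.get("agentConnected", False):
            bad.append(ci["containerInstanceArn"])
    return bad
```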
Fault tunables
Mandatory fields
Variables | Description | Notes |
---|---|---|
CLUSTER_NAME | Name of the target ECS cluster. | For example, cluster-1. |
REGION | Region name of the target ECS cluster. | For example, us-east-1. |
Optional fields
Variables | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 30s. |
CHAOS_INTERVAL | Interval between successive chaos injections (in seconds). | Defaults to 30s. |
AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials. | Defaults to /tmp/cloud_config.yml. |
NETWORK_LATENCY | Network latency to inject into the target containers (in milliseconds). | Defaults to 2000 ms. |
DESTINATION_IPS | IP addresses of the services, or CIDR blocks (ranges of IPs), whose accessibility is impacted. | Comma-separated IPs or CIDRs can be provided. If not provided, network chaos is induced for all IPs/destinations. |
DESTINATION_HOSTS | DNS names of the services whose accessibility is impacted. | If not provided, network chaos is induced for all IPs/destinations, or for DESTINATION_IPS if already defined. |
NETWORK_INTERFACE | Name of the ethernet interface considered for shaping traffic. | Defaults to eth0. |
JITTER | Network delay variation (in milliseconds). | Defaults to 0. |
SEQUENCE | Sequence of chaos execution for multiple targets. | Defaults to parallel. Supports serial sequence as well. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30. |
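The two tables can be summarized as a typed configuration that applies the documented defaults. This is a hypothetical illustration of how the variables relate (`LatencyTunables` and `load_tunables` are invented names), not the fault's own parsing code; RAMP_TIME and the DESTINATION_* variables are left out for brevity.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class LatencyTunables:
    cluster_name: str
    region: str
    total_chaos_duration: int = 30   # seconds
    chaos_interval: int = 30         # seconds
    network_latency: int = 2000      # milliseconds
    jitter: int = 0                  # milliseconds
    network_interface: str = "eth0"
    sequence: str = "parallel"
    credentials_file: str = "/tmp/cloud_config.yml"


def load_tunables(env: Mapping[str, str] = os.environ) -> LatencyTunables:
    """Read the tunables from environment variables, applying the documented defaults."""
    missing = [name for name in ("CLUSTER_NAME", "REGION") if not env.get(name)]
    if missing:
        raise ValueError(f"mandatory variables not set: {', '.join(missing)}")
    sequence = env.get("SEQUENCE", "parallel")
    if sequence not in ("parallel", "serial"):
        raise ValueError(f"unsupported SEQUENCE: {sequence}")
    return LatencyTunables(
        cluster_name=env["CLUSTER_NAME"],
        region=env["REGION"],
        total_chaos_duration=int(env.get("TOTAL_CHAOS_DURATION", "30")),
        chaos_interval=int(env.get("CHAOS_INTERVAL", "30")),
        network_latency=int(env.get("NETWORK_LATENCY", "2000")),
        jitter=int(env.get("JITTER", "0")),
        network_interface=env.get("NETWORK_INTERFACE", "eth0"),
        sequence=sequence,
        credentials_file=env.get("AWS_SHARED_CREDENTIALS_FILE", "/tmp/cloud_config.yml"),
    )
```

Only the two mandatory fields must be set; everything else falls back to the defaults listed in the optional fields table.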
Fault examples
Common and AWS-specific tunables
Refer to the common attributes and AWS-specific tunables to tune the tunables common to all faults and those specific to AWS faults.
Network latency
It defines the network latency (in ms) to be injected into the target container. You can tune it using the NETWORK_LATENCY ENV.
Use the following example to tune it:
# injects network latency for a certain chaos duration
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ecs-container-network-latency
    spec:
      components:
        env:
        # network latency to be injected
        - name: NETWORK_LATENCY
          value: '2000' # in ms
        - name: TOTAL_CHAOS_DURATION
          value: '60'
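Network latency faults in LitmusChaos are commonly implemented with the Linux `tc` netem queueing discipline. Assuming that is also the case here (the built-in SSM document is not shown on this page), the sketch below builds the kind of `tc` commands such a document might run. `netem_delay_command` and `netem_revert_command` are hypothetical helpers, and this simple form delays all egress traffic on the interface rather than only selected destinations.

```python
import shlex
from typing import List


def netem_delay_command(interface: str = "eth0", latency_ms: int = 2000, jitter_ms: int = 0) -> List[str]:
    """Build a `tc` command that adds `latency_ms` (plus optional jitter) of egress delay on `interface`."""
    if latency_ms < 0 or jitter_ms < 0:
        raise ValueError("latency and jitter must be non-negative")
    cmd = ["tc", "qdisc", "replace", "dev", interface, "root", "netem", "delay", f"{latency_ms}ms"]
    if jitter_ms:
        # netem takes the jitter as the value that follows the base delay.
        cmd.append(f"{jitter_ms}ms")
    return cmd


def netem_revert_command(interface: str = "eth0") -> List[str]:
    """Build the `tc` command that removes the root qdisc again after chaos."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]


print(shlex.join(netem_delay_command("eth0", 2000)))
# prints: tc qdisc replace dev eth0 root netem delay 2000ms
```

Pairing every injection with a revert command is what lets the containers recover once TOTAL_CHAOS_DURATION has elapsed.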
Network interface
It defines the name of the ethernet interface considered for shaping traffic. You can tune it using the NETWORK_INTERFACE ENV. Its default value is eth0.
Use the following example to tune it:
# provide the network interface
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ecs-container-network-latency
    spec:
      components:
        env:
        # name of the network interface
        - name: NETWORK_INTERFACE
          value: 'eth0'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
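If you are unsure which name to set, you could list the interfaces on the ECS container instance (for example, from an SSM session) before running the fault. On Linux, interface names appear as entries under `/sys/class/net`. The helpers below are a hypothetical check, not part of the fault; which interface the task's traffic actually uses depends on how the task is networked, which this page does not cover.

```python
import os
from typing import List


def list_interfaces(sys_net: str = "/sys/class/net") -> List[str]:
    """Return the network interface names visible in `sys_net` (Linux sysfs)."""
    try:
        return sorted(os.listdir(sys_net))
    except FileNotFoundError:
        # Not a Linux host, or sysfs is not mounted.
        return []


def check_interface(name: str, sys_net: str = "/sys/class/net") -> None:
    """Raise ValueError if `name` is not an interface on this host."""
    available = list_interfaces(sys_net)
    if name not in available:
        raise ValueError(f"interface {name!r} not found; available: {available}")
```

Calling `check_interface("eth0")` on the instance raises an error that lists the available names if the default does not exist there.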
Jitter
It defines the jitter (in ms), a parameter that introduces variation in the network delay. You can tune it using the JITTER ENV. Its default value is 0.
Use the following example to tune it:
# provide the network latency jitter
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ecs-container-network-latency
    spec:
      components:
        env:
        # value of the network latency jitter (in ms)
        - name: JITTER
          value: '200'
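When sizing probe timeouts, it can help to work out the delay range that a latency and jitter pair produces. This is a rough sketch that assumes the jitter varies each packet's delay within plus or minus JITTER around NETWORK_LATENCY; actual per-packet delays depend on how the fault applies jitter and may not be spread evenly, or strictly bounded, within this range. `delay_bounds` is a hypothetical helper.

```python
from typing import Tuple


def delay_bounds(latency_ms: int, jitter_ms: int = 0) -> Tuple[int, int]:
    """Return the approximate (min, max) added delay for a latency/jitter pair.

    Assumes jitter spreads the delay symmetrically around the base latency,
    clamped at zero.
    """
    if latency_ms < 0 or jitter_ms < 0:
        raise ValueError("latency and jitter must be non-negative")
    return max(0, latency_ms - jitter_ms), latency_ms + jitter_ms


low, high = delay_bounds(2000, 200)
print(f"expect roughly {low}-{high} ms of added delay")
# prints: expect roughly 1800-2200 ms of added delay
```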
Destination IPs and destination hosts
By default, network faults interrupt traffic to all IPs/hosts. You can restrict the interruption to specific IPs/hosts using the DESTINATION_IPS and DESTINATION_HOSTS ENVs.
- DESTINATION_IPS: IP addresses of the services or pods, or CIDR blocks (ranges of IPs), whose accessibility is impacted.
- DESTINATION_HOSTS: DNS names/FQDNs of the services whose accessibility is impacted.
Use the following example to tune it:
# injects chaos into the egress traffic for specific IPs/hosts
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ecs-container-network-latency
    spec:
      components:
        env:
        # supports comma-separated destination ips
        - name: DESTINATION_IPS
          value: '8.8.8.8,192.168.5.6'
        # supports comma-separated destination hosts
        - name: DESTINATION_HOSTS
          value: 'nginx.default.svc.cluster.local,google.com'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
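Before putting values into DESTINATION_IPS and DESTINATION_HOSTS, you might validate the comma-separated lists locally. The sketch below uses Python's standard `ipaddress` module; `parse_destinations` is a hypothetical helper, not part of the fault. It normalizes single IPs and CIDR blocks into networks and only splits the host names, without resolving them.

```python
import ipaddress
from typing import List, Tuple


def parse_destinations(ips: str = "", hosts: str = "") -> Tuple[List[str], List[str]]:
    """Split and validate DESTINATION_IPS / DESTINATION_HOSTS values.

    Returns (networks, hosts). Both lists empty means "all destinations".
    Raises ValueError for an entry that is not a valid IP or CIDR block.
    """
    networks = []
    for item in filter(None, (part.strip() for part in ips.split(","))):
        # strict=False accepts single IPs and CIDR blocks whose host bits are set.
        networks.append(str(ipaddress.ip_network(item, strict=False)))
    host_list = [h.strip() for h in hosts.split(",") if h.strip()]
    return networks, host_list
```

With the values from the example above, this returns `["8.8.8.8/32", "192.168.5.6/32"]` and the two host names unchanged.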