EC2 IO stress
Introduction
EC2 IO stress disrupts the state of infrastructure resources. This fault:
- Induces stress on an AWS EC2 instance using the Amazon SSM Run Command. The SSM Run Command is executed using an SSM document that is built into the fault.
- Causes IO stress on the EC2 instance for a specific duration.
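As a rough illustration of the mechanism described above, the following Python sketch builds a request in the shape that boto3's `ssm.send_command` accepts. It is not the fault's built-in SSM document: the AWS-managed `AWS-RunShellScript` document and the `stress-ng` command are stand-in assumptions, and `build_send_command_request` is a hypothetical helper name. The sketch only builds the request and does not call AWS.

```python
import shlex


def build_send_command_request(instance_ids, duration_s, workers, mount_path):
    """Build keyword arguments shaped for boto3's ssm.send_command.

    Assumptions (not taken from the fault): AWS-RunShellScript is used as the
    SSM document and stress-ng as the IO stress tool.
    """
    command = " ".join([
        "stress-ng",
        "--hdd", str(workers),                    # workers that write and read temp files
        "--temp-path", shlex.quote(mount_path),   # directory where the temp files live
        "--timeout", f"{duration_s}s",            # stop after the chaos duration
    ])
    return {
        "InstanceIds": list(instance_ids),
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {
            "commands": [command],
            # allow the script some head room beyond the stress duration
            "executionTimeout": [str(duration_s + 60)],
        },
    }


request = build_send_command_request(["i-044d3cb4b03b8af1f"], 30, 4, "/tmp")
print(request["Parameters"]["commands"][0])
```

With credentials that carry the permissions listed in the policy on this page, such a dictionary could be passed as `boto3.client("ssm").send_command(**request)`.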
Use cases
EC2 IO stress:
- Simulates slower disk operations by the application.
- Simulates noisy neighbour problems by hogging the disk bandwidth.
- Verifies the disk performance on increasing IO threads and varying IO block sizes.
- Checks how the application functions under high disk latency conditions, when IO traffic is high and includes large IO blocks, and when other services monopolize the disks.
Prerequisites
- Kubernetes version 1.17 or later is required to execute the fault.
- The EC2 instance should be in a healthy state.
- SSM agent should be installed and running on the target EC2 instance.
- The Kubernetes secret should have the AWS Access Key ID and Secret Access Key credentials in the CHAOS_NAMESPACE. Below is a sample secret file:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials respectively
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
- We recommend you use the same secret name, that is, cloud-secret. Otherwise, you will need to update the AWS_SHARED_CREDENTIALS_FILE environment variable in the fault template, and you won't be able to use the default health check probes.
- Go to AWS named profile for chaos to use a different profile for AWS faults, and the superset permission/policy to execute all AWS faults.
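Before creating the secret, it can help to confirm that the credentials text contains both keys under the expected profile. The following is a minimal sketch that parses the `cloud_config.yml` content from the sample secret with Python's standard `configparser`. The helper name `missing_credential_keys` is hypothetical, and the check only looks for presence, not validity, of the keys.

```python
import configparser

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(credentials_text, profile="default"):
    """Return the required keys that are absent or empty in the given profile."""
    # interpolation=None so that a '%' inside a secret key is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


sample = """# Add the cloud AWS credentials respectively
[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
"""
print(missing_credential_keys(sample))  # prints []
```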
Below is an example AWS policy to execute the fault.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetDocument",
        "ssm:DescribeDocument",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:SendCommand",
        "ssm:CancelCommand",
        "ssm:CreateDocument",
        "ssm:DeleteDocument",
        "ssm:GetCommandInvocation",
        "ssm:UpdateInstanceInformation",
        "ssm:DescribeInstanceInformation"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2messages:AcknowledgeMessage",
        "ec2messages:DeleteMessage",
        "ec2messages:FailMessage",
        "ec2messages:GetEndpoint",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
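If you maintain your own policy rather than copying the example, a quick local check can show which of the actions listed above it does not allow. The sketch below is a simplification and not an IAM evaluator: it considers only `Allow` statements, matches `*` and `?` wildcards case-insensitively, and ignores `Deny` statements, resources, and conditions. The helper name `missing_actions` is hypothetical.

```python
from fnmatch import fnmatchcase

# Actions taken from the example policy on this page.
REQUIRED_ACTIONS = [
    "ssm:GetDocument", "ssm:DescribeDocument", "ssm:GetParameter",
    "ssm:GetParameters", "ssm:SendCommand", "ssm:CancelCommand",
    "ssm:CreateDocument", "ssm:DeleteDocument", "ssm:GetCommandInvocation",
    "ssm:UpdateInstanceInformation", "ssm:DescribeInstanceInformation",
    "ec2messages:AcknowledgeMessage", "ec2messages:DeleteMessage",
    "ec2messages:FailMessage", "ec2messages:GetEndpoint",
    "ec2messages:GetMessages", "ec2messages:SendReply",
    "ec2:DescribeInstanceStatus", "ec2:DescribeInstances",
]


def missing_actions(policy, required=REQUIRED_ACTIONS):
    """Return the required actions that no Allow statement in the policy covers."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):      # a single statement may be a bare object
        statements = [statements]
    allowed = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):      # a single action may be a bare string
            actions = [actions]
        allowed.extend(action.lower() for action in actions)
    return [
        action for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in allowed)
    ]


partial_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ssm:*"], "Resource": "*"},
        {"Effect": "Allow", "Action": "ec2:Describe*", "Resource": "*"},
    ],
}
print(missing_actions(partial_policy))  # lists the six ec2messages actions
```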
Fault tunables
Mandatory tunables
Tunable | Description | Notes |
---|---|---|
EC2_INSTANCE_ID | ID of the target EC2 instance. Provide multiple IDs as a comma-separated list. | For example, i-044d3cb4b03b8af1f. |
REGION | The AWS region ID where the EC2 instance has been created. | For example, us-east-1. |
Optional tunables
Tunable | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration to insert chaos (in seconds). | Default: 30 s. |
CHAOS_INTERVAL | Time interval between two successive chaos iterations (in seconds). | Default: 60 s. |
AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials. | Default: /tmp/cloud_config.yml. |
INSTALL_DEPENDENCIES | Install dependencies used to run IO chaos. It can be 'True' or 'False'. | If the dependency already exists, you can turn it off. Defaults to True. |
FILESYSTEM_UTILIZATION_PERCENTAGE | Specify the size as percentage of free space on the file system. | Default: 0 %. Results in 1 GB utilization. |
FILESYSTEM_UTILIZATION_BYTES | Specify the size in gigabytes (GB). FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both are provided, FILESYSTEM_UTILIZATION_PERCENTAGE is prioritized. | Default: 0 GB. Results in 1 GB utilization. |
NUMBER_OF_WORKERS | Number of IO workers involved in IO stress. | Default: 4. |
VOLUME_MOUNT_PATH | Fill the given volume mount path. | Default: User HOME directory. |
SEQUENCE | Sequence of chaos execution for multiple instances. | Default: parallel. Supports serial and parallel. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. |
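The precedence between the two file system utilization tunables in the table above can be sketched as follows. This is an illustration of the documented rules only (percentage wins when both are set, and 1 GB is used when both are left at 0), not the fault's own code. The function name `resolve_fill_target` and the tuple it returns are hypothetical.

```python
def resolve_fill_target(env):
    """Decide which utilization tunable applies, following the table's rules.

    Returns a (kind, value) tuple:
      ("percentage", N) when FILESYSTEM_UTILIZATION_PERCENTAGE is set,
      ("bytes", N)      when only FILESYSTEM_UTILIZATION_BYTES is set,
      ("default_gb", 1) when both are left at 0.
    """
    percentage = int(env.get("FILESYSTEM_UTILIZATION_PERCENTAGE", "0"))
    size = int(env.get("FILESYSTEM_UTILIZATION_BYTES", "0"))
    if not 0 <= percentage <= 100:
        raise ValueError("FILESYSTEM_UTILIZATION_PERCENTAGE must be between 0 and 100")
    if size < 0:
        raise ValueError("FILESYSTEM_UTILIZATION_BYTES must not be negative")
    if percentage > 0:          # percentage is prioritized when both are provided
        return ("percentage", percentage)
    if size > 0:
        return ("bytes", size)
    return ("default_gb", 1)    # both at 0 results in 1 GB utilization


both_set = {"FILESYSTEM_UTILIZATION_PERCENTAGE": "50", "FILESYSTEM_UTILIZATION_BYTES": "2"}
print(resolve_fill_target(both_set))  # prints ('percentage', 50)
print(resolve_fill_target({}))        # prints ('default_gb', 1)
```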
File system utilization in megabytes
Amount of file system that is utilized on the EC2 instance (in megabytes). Tune it by using the FILESYSTEM_UTILIZATION_BYTES environment variable.
The following YAML snippet illustrates the use of this environment variable:
# filesystem bytes to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: '1024'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
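When many variations of this manifest are needed, the same structure can be generated from a dictionary of tunables. The sketch below mirrors the fields of the manifest above and prints the result as JSON, which Kubernetes tooling generally accepts in place of YAML. The helper name `build_chaos_engine` is hypothetical, and the check for mandatory tunables is an assumption based on the tunables tables on this page.

```python
import json

MANDATORY_TUNABLES = ("EC2_INSTANCE_ID", "REGION")


def build_chaos_engine(tunables, name="engine-nginx", service_account="litmus-admin"):
    """Build a ChaosEngine manifest (as a dict) for the ec2-io-stress fault."""
    missing = [key for key in MANDATORY_TUNABLES if key not in tunables]
    if missing:
        raise ValueError(f"missing mandatory tunables: {', '.join(missing)}")
    # env values are strings in the manifest, so numbers are converted
    env = [{"name": key, "value": str(value)} for key, value in tunables.items()]
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name},
        "spec": {
            "engineState": "active",
            "chaosServiceAccount": service_account,
            "experiments": [
                {"name": "ec2-io-stress", "spec": {"components": {"env": env}}}
            ],
        },
    }


engine = build_chaos_engine({
    "FILESYSTEM_UTILIZATION_BYTES": 1024,
    "EC2_INSTANCE_ID": "instance-1",
    "REGION": "us-east-1",
})
print(json.dumps(engine, indent=2))
```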
File system utilization in percentage
Amount of file system that is utilized on the EC2 instance (in percentage). Tune it by using the FILESYSTEM_UTILIZATION_PERCENTAGE environment variable.
The following YAML snippet illustrates the use of this environment variable:
# filesystem percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: '50'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
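To estimate how much data a given percentage corresponds to, the percentage can be applied to the free space of the file system, as the tunable description states. The following sketch shows that arithmetic using Python's standard `shutil.disk_usage`. The function names are hypothetical, and the fault may compute or round the value differently.

```python
import shutil


def bytes_to_fill(free_bytes, percentage):
    """Return the number of bytes that `percentage` of the free space represents."""
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be greater than 0 and at most 100")
    return free_bytes * percentage // 100


def bytes_to_fill_on(path, percentage):
    """Same calculation, reading the free space of the file system holding `path`."""
    return bytes_to_fill(shutil.disk_usage(path).free, percentage)


# 50 % of 200 GiB of free space is 100 GiB.
print(bytes_to_fill(200 * 1024**3, 50) == 100 * 1024**3)  # prints True
```

Running `bytes_to_fill_on("/tmp", 50)` on the target instance would give the estimate for the file system that holds `/tmp`.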
Multiple workers
Number of IO workers (threads) that run to stress the file system. More workers increase the IO load and the amount of file system consumed. Tune it by using the NUMBER_OF_WORKERS environment variable.
The following YAML snippet illustrates the use of this environment variable:
# multiple workers to utilize resources
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: NUMBER_OF_WORKERS
          value: '3'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
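To give an intuition for what several IO workers do, the following toy sketch starts a number of threads that each write blocks to a temporary file and force them to disk. It is an illustration only and not how the fault generates load, which is handled by the built-in SSM document. The function names and the `block_size` and `blocks` parameters are hypothetical.

```python
import os
import tempfile
import threading


def io_worker(directory, block_size, blocks, written, index):
    """Write `blocks` blocks of random data to a temp file, syncing each one."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            block = os.urandom(block_size)
            for _ in range(blocks):
                handle.write(block)
                handle.flush()
                os.fsync(handle.fileno())   # push the block to disk, not just the cache
                written[index] += block_size
    finally:
        os.remove(path)                     # clean up so the disk is not left filled


def run_io_workers(directory, workers=4, block_size=4096, blocks=8):
    """Run the workers in parallel and return the total number of bytes written."""
    written = [0] * workers                 # one slot per worker, so no lock is needed
    threads = [
        threading.Thread(target=io_worker, args=(directory, block_size, blocks, written, i))
        for i in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(written)


with tempfile.TemporaryDirectory() as scratch:
    print(run_io_workers(scratch, workers=3, block_size=1024, blocks=4))  # prints 12288
```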
Volume mount path
Path of the volume mount on the EC2 instance that the fault fills during IO stress. Tune it by using the VOLUME_MOUNT_PATH environment variable.
The following YAML snippet illustrates the use of this environment variable:
# volume path to be used for io stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: VOLUME_MOUNT_PATH
          value: '/tmp'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
Multiple EC2 instances
Multiple EC2 instances can be targeted in one chaos run by providing their IDs as a comma-separated list. Tune it by using the EC2_INSTANCE_ID environment variable.
The following YAML snippet illustrates the use of this environment variable:
# multiple instance targets
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        # ids of the EC2 instances
        - name: EC2_INSTANCE_ID
          value: 'instance-1,instance-2'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
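The values instance-1 and instance-2 in the manifest are placeholders for real instance IDs such as i-044d3cb4b03b8af1f. A small pre-flight check can catch typos in the comma-separated value before the fault runs. The sketch below assumes the standard EC2 instance ID shape (`i-` followed by 8 or 17 lowercase hexadecimal characters). The helper name `parse_instance_ids` is hypothetical, and the fault may parse the value differently.

```python
import re

# EC2 instance IDs: "i-" plus 8 (older) or 17 (current) lowercase hex characters.
INSTANCE_ID = re.compile(r"^i-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def parse_instance_ids(raw):
    """Split a comma-separated EC2_INSTANCE_ID value and validate each ID."""
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    if not ids:
        raise ValueError("EC2_INSTANCE_ID is empty")
    invalid = [instance_id for instance_id in ids if not INSTANCE_ID.match(instance_id)]
    if invalid:
        raise ValueError(f"not valid EC2 instance IDs: {', '.join(invalid)}")
    return ids


print(parse_instance_ids("i-044d3cb4b03b8af1f, i-0123456789abcdef0"))
# prints ['i-044d3cb4b03b8af1f', 'i-0123456789abcdef0']
```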