EC2 IO stress

Introduction

EC2 IO stress disrupts the state of infrastructure resources. This fault:

  • Induces stress on an AWS EC2 instance using the AWS Systems Manager (SSM) Run Command. The Run Command is executed using an SSM document that is built into the fault.
  • Causes IO stress on the EC2 instance for a specific duration.

Use cases

EC2 IO stress:

  • Simulates slower disk operations by the application.
  • Simulates noisy neighbour problems by hogging the disk bandwidth.
  • Verifies the disk performance on increasing IO threads and varying IO block sizes.
  • Checks how the application functions under high disk latency conditions, when IO traffic is high and includes large I/O blocks, and when other services monopolize the IO disks.
Note
  • Kubernetes version 1.17 or later is required to execute the fault.
  • The EC2 instance should be in a healthy state.
  • SSM agent should be installed and running on the target EC2 instance.
  • The Kubernetes secret should have the AWS Access Key ID and Secret Access Key credentials in the CHAOS_NAMESPACE. Below is a sample secret file:
    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-secret
    type: Opaque
    stringData:
      cloud_config.yml: |-
        # Add the cloud AWS credentials respectively
        [default]
        aws_access_key_id = XXXXXXXXXXXXXXXXXXX
        aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • We recommend you use the same secret name, that is, cloud-secret. Otherwise, you will need to update the AWS_SHARED_CREDENTIALS_FILE environment variable in the fault template, and you won't be able to use the default health check probes.
  • To use a different profile for AWS faults, go to AWS named profile for chaos. For a policy that covers all AWS faults, go to superset permission/policy to execute all AWS faults.

Below is an example AWS policy to execute the fault.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetDocument",
        "ssm:DescribeDocument",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:SendCommand",
        "ssm:CancelCommand",
        "ssm:CreateDocument",
        "ssm:DeleteDocument",
        "ssm:GetCommandInvocation",
        "ssm:UpdateInstanceInformation",
        "ssm:DescribeInstanceInformation"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2messages:AcknowledgeMessage",
        "ec2messages:DeleteMessage",
        "ec2messages:FailMessage",
        "ec2messages:GetEndpoint",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Fault tunables

Mandatory tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| EC2_INSTANCE_ID | ID of the target EC2 instance. | For example, i-044d3cb4b03b8af1f. |
| REGION | The AWS region ID where the EC2 instance has been created. | For example, us-east-1. |

Optional tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| TOTAL_CHAOS_DURATION | Duration to insert chaos (in seconds). | Default: 30 s. |
| CHAOS_INTERVAL | Time interval between two successive chaos iterations (in seconds). | Default: 60 s. |
| AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials. | Default: /tmp/cloud_config.yml. |
| INSTALL_DEPENDENCIES | Install dependencies used to run IO chaos. It can be 'True' or 'False'. | If the dependency already exists, you can turn it off. Default: True. |
| FILESYSTEM_UTILIZATION_PERCENTAGE | Specify the size as a percentage of free space on the file system. | Default: 0 %. Results in 1 GB utilization. |
| FILESYSTEM_UTILIZATION_BYTES | Specify the size in gigabytes (GB). FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both are provided, FILESYSTEM_UTILIZATION_PERCENTAGE is prioritized. | Default: 0 GB. Results in 1 GB utilization. |
| NUMBER_OF_WORKERS | Number of IO workers involved in IO stress. | Default: 4. |
| VOLUME_MOUNT_PATH | Fill the given volume mount path. | Default: User HOME directory. |
| SEQUENCE | Sequence of chaos execution for multiple instances. | Default: parallel. Supports serial and parallel. |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. |
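
The optional tunables that have no dedicated section in this document are set through the same env list as the others. The following is a minimal sketch, assuming the ChaosEngine layout used in the other examples here. The values are illustrative placeholders, not recommendations.

```yaml
# timing and dependency tunables (illustrative values)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        # duration to insert chaos (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '120'
        # value for CHAOS_INTERVAL (in seconds)
        - name: CHAOS_INTERVAL
          value: '60'
        # wait before and after injecting chaos (in seconds)
        - name: RAMP_TIME
          value: '30'
        # assumes the IO stress dependencies are already on the instance
        - name: INSTALL_DEPENDENCIES
          value: 'False'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
```

Set INSTALL_DEPENDENCIES to 'False' only when the dependencies are known to exist on the target instance. Otherwise, leave it at its default.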

File system utilization in gigabytes

Amount of file system that is utilized on the EC2 instance (in gigabytes). Tune it by using the FILESYSTEM_UTILIZATION_BYTES environment variable.

The following YAML snippet illustrates the use of this environment variable:

# filesystem bytes to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: '1024'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'

File system utilization in percentage

Amount of file system that is utilized on the EC2 instance (in percentage). Tune it by using the FILESYSTEM_UTILIZATION_PERCENTAGE environment variable.

The following YAML snippet illustrates the use of this environment variable:

# filesystem percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: '50'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
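
Because FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive, it helps to see what happens when both are set. The sketch below reuses the values from the two examples above and is meant only to illustrate the precedence rule stated in the optional tunables. In practice, set one of the two.

```yaml
# both utilization tunables set: the percentage takes priority
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        # prioritized when both tunables are provided
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: '50'
        # not used here, because the percentage is also provided
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: '1024'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
```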

Multiple workers

Number of IO workers that run to stress the file system. Increasing the number of workers increases the amount of file system consumed. Tune it by using the NUMBER_OF_WORKERS environment variable.

The following YAML snippet illustrates the use of this environment variable:

# multiple workers to utilize resources
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: NUMBER_OF_WORKERS
          value: '3'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'

Volume mount path

Path of the volume mount on the target EC2 instance that the fault fills. Tune it by using the VOLUME_MOUNT_PATH environment variable.

The following YAML snippet illustrates the use of this environment variable:

# volume path to be used for io stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: VOLUME_MOUNT_PATH
          value: '/tmp'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'

Multiple EC2 instances

Multiple EC2 instances, specified as comma-separated IDs, that are targeted in one chaos run. Tune it by using the EC2_INSTANCE_ID environment variable.

The following YAML snippet illustrates the use of this environment variable:

# multiple instance targets
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        # ids of the EC2 instances
        - name: EC2_INSTANCE_ID
          value: 'instance-1,instance-2'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
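
When several instances are targeted, the SEQUENCE tunable controls whether they are stressed in parallel (the default) or one after another. The following is a minimal sketch of a serial run, assuming the same ChaosEngine layout and the placeholder instance IDs used above.

```yaml
# multiple instance targets, stressed one after another
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        # supports serial and parallel; the default is parallel
        - name: SEQUENCE
          value: 'serial'
        # ids of the EC2 instances
        - name: EC2_INSTANCE_ID
          value: 'instance-1,instance-2'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
```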