ECS task stop
ECS task stop is an AWS fault that injects chaos to stop the ECS tasks based on the services or task replica ID and checks the task availability.
- This fault results in the unavailability of the application running on the tasks.
Usage
View fault usage
Prerequisites
- Kubernetes >= 1.17
- Sufficient AWS access to stop the ECS tasks.
- Kubernetes secret that has the AWS access configuration (key) in the
CHAOS_NAMESPACE
. Below is a sample secret file:
apiVersion: v1
kind: Secret
metadata:
name: cloud-secret
type: Opaque
stringData:
cloud_config.yml: |-
# Add the cloud AWS credentials respectively
[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXX
It is recommended to use the same secret name, i.e.
cloud-secret
. Otherwise, you will need to update theAWS_SHARED_CREDENTIALS_FILE
environment variable in the fault template and you may be unable to use the default health check probes.Refer to AWS Named Profile For Chaos to know how to use a different profile for AWS faults.
Permissions required
Here is an example AWS policy to help execute the fault.
View policy for this fault
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"ecs:ListServices",
"ecs:ListTasks",
"ecs:StopTask",
"ecs:DescribeServices",
"ecs:DescribeTasks"
],
"Resource": "*"
}
]
}
Refer to the superset permission (or policy) to execute all AWS faults.
Default validations
The target ECS tasks should be in a healthy state.
Fault tunables
Fault tunables
Mandatory fields
Variables | Description | Notes |
---|---|---|
CLUSTER_NAME | Name of the target ECS cluster. | For example, cluster-1 . |
REGION | Region name of the target ECS cluster. | For example, us-east-1 . |
SERVICE_NAME | Target ECS service name. | For example, app-svc . |
TASK_REPLICA_ID | Comma-separated target task replica IDs. | `SERVICE_NAME` and `TASK_REPLICA_ID` are mutually exclusive. If both the values are provided, `SERVICE_NAME` takes precedence. |
Optional Fields
Variables | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration to insert chaos (in seconds). | Defaults to 30s. |
CHAOS_INTERVAL | Time interval between two successive instance terminations (in seconds). | Defaults to 30s. |
TASK_REPLICA_AFFECTED_PERC | Percentage of total tasks that are targeted. | Defaults to 100. |
SEQUENCE | Sequence of chaos execution for multiple instances. | Defaults to parallel. Supports serial sequence as well. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. |
AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials. | Defaults to /tmp/cloud_config.yml . |
Fault examples
Common and AWS-specific tunables
Refer to the common attributes and AWS-specific tunables to tune the common tunables for all faults and aws specific tunables.
ECS service name
It stops the tasks that are a part of the particular service using the SERVICE_NAME
environment variable.
Use the following example to tune it:
# stop the tasks of an ECS cluster
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
chaosServiceAccount: litmus-admin
experiments:
- name: ecs-task-stop
spec:
components:
env:
# provide the name of ECS cluster
- name: CLUSTER_NAME
value: 'demo'
- name: SERVICE_NAME
vale: 'test-svc'
- name: REGION
value: 'us-east-1'
- name: TOTAL_CHAOS_DURATION
VALUE: '60'
ECS task replica IDs
It stops all the tasks that are set using the TASK_REPLICA_ID
environment variable.
Use the following example to tune it:
# stop the tasks of an ECS cluster
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
chaosServiceAccount: litmus-admin
experiments:
- name: ecs-task-stop
spec:
components:
env:
# provide the name of ECS cluster
- name: CLUSTER_NAME
value: 'demo'
- name: TASK_REPLICA_ID
vale: '1b751cf956e34e54b9d83b6a5c067f60,20d5041c044941dfb2126f1722d10558'
- name: REGION
value: 'us-east-1'
- name: TOTAL_CHAOS_DURATION
VALUE: '60'
ECS task replica affected percentage
It selects the number of tasks to be targeted (in percentage) using the TASK_REPLICA_AFFECTED_PERC
environment variable.
Use the following example to tune it:
# stop the tasks of an ECS cluster
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
chaosServiceAccount: litmus-admin
experiments:
- name: ecs-task-stop
spec:
components:
env:
# provide the name of ECS cluster
- name: CLUSTER_NAME
value: 'demo'
- name: SERVICE_NAME
vale: 'test-svc'
- name: TASK_REPLICA_AFFECTED_PERC
vale: '100'
- name: REGION
value: 'us-east-1'
- name: TOTAL_CHAOS_DURATION
VALUE: '60'