
Windows EC2 blackhole chaos

Introduction

Windows EC2 blackhole chaos blocks access from the target Windows EC2 instance to the specified hosts or IP addresses by injecting firewall rules. This fault:

  • Degrades the network without marking the EC2 instance as unhealthy (or unworthy of traffic). You can mitigate this degradation by using middleware that switches traffic based on certain SLOs (performance parameters).
  • Limits the impact (the blast radius) to only the traffic that you want to test, by specifying the destination hosts or IP addresses.


Use cases

Windows EC2 blackhole chaos helps determine how the application (or process) running on the EC2 instances performs when the specified destinations become unreachable.

Note
  • Kubernetes version 1.17 or later is required to execute this fault.
  • The EC2 instance must be in a healthy state.
  • The SSM agent must be installed and running on the target EC2 instance.
  • A Kubernetes secret containing the AWS access key ID and secret access key credentials must exist in the CHAOS_NAMESPACE. Below is a sample secret file:
    ```yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-secret
    type: Opaque
    stringData:
      cloud_config.yml: |-
        # Add the cloud AWS credentials respectively
        [default]
        aws_access_key_id = XXXXXXXXXXXXXXXXXXX
        aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    ```
  • Harness recommends using the same secret name, that is, cloud-secret. Otherwise, you must update the AWS_SHARED_CREDENTIALS_FILE environment variable in the fault template and you won't be able to use the default health check probes.
  • Go to AWS named profile for chaos to learn how to use a different profile for AWS faults.
  • Go to superset permission/policy for the permissions required to execute all AWS faults.

Here is an example AWS policy to execute the fault.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetDocument",
        "ssm:DescribeDocument",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:SendCommand",
        "ssm:CancelCommand",
        "ssm:CreateDocument",
        "ssm:DeleteDocument",
        "ssm:GetCommandInvocation",
        "ssm:UpdateInstanceInformation",
        "ssm:DescribeInstanceInformation"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2messages:AcknowledgeMessage",
        "ec2messages:DeleteMessage",
        "ec2messages:FailMessage",
        "ec2messages:GetEndpoint",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

Fault tunables

Mandatory tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| EC2_INSTANCE_ID | ID of the target EC2 instance. For example, i-044d3cb4b03b8af1f. | Provide either EC2_INSTANCE_ID or EC2_INSTANCE_TAGS. |
| EC2_INSTANCE_TAGS | Tag of the target EC2 instances. For example, type:chaos. | Provide either EC2_INSTANCE_ID or EC2_INSTANCE_TAGS. |
| REGION | AWS region ID where the EC2 instance was created. | For example, us-east-1. |
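As a sketch of tag-based targeting, the following ChaosEngine selects instances by the type:chaos tag instead of an instance ID. The engine name, service account, region, and the single blocked IP are illustrative values carried over from the examples on this page; replace them with your own.

```yaml
# Targets EC2 instances by tag instead of by instance ID.
# Set only one of EC2_INSTANCE_TAGS or EC2_INSTANCE_ID.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: windows-ec2-blackhole-chaos
      spec:
        components:
          env:
            # tag of the target instances, in key:value form
            - name: EC2_INSTANCE_TAGS
              value: 'type:chaos'
            # region where the tagged instances run
            - name: REGION
              value: 'us-east-1'
            # illustrative destination; narrows the blast radius to one IP
            - name: IP_ADDRESSES
              value: '8.8.8.8'
```

Setting a destination here keeps the blackhole scoped to the traffic you want to test rather than all egress traffic from the tagged instances.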

Optional tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| TOTAL_CHAOS_DURATION | Duration (in seconds) for which chaos is injected into the target resource. | Default: 30 s |
| AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials file. | Default: /tmp/cloud_config.yml |
| IP_ADDRESSES | IP addresses of the services whose accessibility is impacted. | Accepts comma-separated IP addresses. |
| DESTINATION_HOSTS | DNS names of the services whose accessibility is impacted. | Accepts comma-separated host names. If this value is not provided, the fault blocks traffic to the addresses in IP_ADDRESSES, if defined; otherwise, it blocks traffic to all destinations. |
| SEQUENCE | Sequence of chaos execution for multiple instances. | Default: parallel. Supports serial and parallel. |
| RAMP_TIME | Period (in seconds) to wait before and after injecting chaos. | For example, 30 s. |
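The optional tunables can be combined in a single ChaosEngine. The sketch below is illustrative only: it assumes the tag matches more than one instance (SEQUENCE has an effect only then), passes durations as strings in seconds like the other env values on this page, and sets AWS_SHARED_CREDENTIALS_FILE to its documented default, which you would change only if your credentials file is mounted elsewhere.

```yaml
# Sets the optional timing, sequencing, and credentials tunables.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: windows-ec2-blackhole-chaos
      spec:
        components:
          env:
            # inject chaos for 60 seconds (default: 30)
            - name: TOTAL_CHAOS_DURATION
              value: '60'
            # wait 30 seconds before and after injecting chaos
            - name: RAMP_TIME
              value: '30'
            # affect matching instances one at a time (default: parallel)
            - name: SEQUENCE
              value: 'serial'
            # documented default path of the credentials file
            - name: AWS_SHARED_CREDENTIALS_FILE
              value: '/tmp/cloud_config.yml'
            - name: EC2_INSTANCE_TAGS
              value: 'type:chaos'
            - name: REGION
              value: 'us-east-1'
            - name: DESTINATION_HOSTS
              value: 'google.com'
```

With serial execution, each matching instance experiences the blackhole in turn, which keeps part of the fleet reachable at any moment; parallel execution affects all matching instances at once.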

Run with destination IPs

IP addresses of the services to which traffic is blocked. Tune it by using the IP_ADDRESSES environment variable.

The following YAML snippet illustrates the use of this environment variable:

```yaml
# it injects the chaos into the egress traffic for specific IPs/hosts
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: windows-ec2-blackhole-chaos
      spec:
        components:
          env:
            # supports comma-separated destination ips
            - name: IP_ADDRESSES
              value: '8.8.8.8,192.168.5.6'
            - name: EC2_INSTANCE_ID
              value: 'instance-1'
            - name: REGION
              value: 'us-west-2'
```

Run with destination hosts

DNS names of the services to which traffic is blocked. Tune it by using the DESTINATION_HOSTS environment variable.

The following YAML snippet illustrates the use of this environment variable:

```yaml
# it injects the chaos into the egress traffic for specific IPs/hosts
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: windows-ec2-blackhole-chaos
      spec:
        components:
          env:
            # supports comma-separated destination hosts
            - name: DESTINATION_HOSTS
              value: 'google.com'
            - name: EC2_INSTANCE_ID
              value: 'instance-1'
            - name: REGION
              value: 'us-west-2'
```
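The DESTINATION_HOSTS description refers to IP_ADDRESSES, which suggests that both variables can be set together. The following sketch assumes that combination is supported and blocks traffic to specific IPs and DNS names in one run; example.com is a hypothetical second host, and you should confirm the combined behavior in your environment before relying on it.

```yaml
# Blocks egress traffic to both specific IPs and DNS names.
# Assumption: IP_ADDRESSES and DESTINATION_HOSTS may be set together.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: windows-ec2-blackhole-chaos
      spec:
        components:
          env:
            # comma-separated destination IPs
            - name: IP_ADDRESSES
              value: '8.8.8.8,192.168.5.6'
            # comma-separated destination hosts (example.com is hypothetical)
            - name: DESTINATION_HOSTS
              value: 'google.com,example.com'
            - name: EC2_INSTANCE_ID
              value: 'instance-1'
            - name: REGION
              value: 'us-west-2'
```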