EC2 network loss

Introduction

EC2 network loss injects packet loss into the network traffic of one or more EC2 instances, which makes access to the application (or services) running on them flaky. This fault:

  • Degrades the network without marking the EC2 instance as unhealthy (or unworthy) of traffic, which is resolved using a middleware that switches traffic based on SLOs (performance parameters).
  • May cause the EC2 instance to stall, or its workload to become corrupted, while waiting endlessly for a packet.
  • Limits the impact (blast radius) to the traffic that you wish to test, by specifying the IP addresses.

Use cases

EC2 network loss:

  • Determines the performance of the application (or process) running on the EC2 instances.
  • Simulates a consistently slow network connection between microservices (for example, cross-region connectivity between active-active peers of a given service or across services or poor cni-performance in the inter-pod-communication network).
  • Simulates a jittery connection with transient latency spikes between microservices.
  • Simulates a slow response on specific third party (or dependent) components (or services), and degraded data-plane of service-mesh infrastructure.
note
  • Kubernetes version 1.17 or later is required to execute the fault.
  • The SSM agent should be installed and running on the target EC2 instance.
  • The EC2 instance should be in a healthy state.
  • The Kubernetes secret should have the AWS Access Key ID and Secret Access Key credentials in the CHAOS_NAMESPACE. Below is the sample secret file:
    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-secret
    type: Opaque
    stringData:
      cloud_config.yml: |-
        # Add the cloud AWS credentials respectively
        [default]
        aws_access_key_id = XXXXXXXXXXXXXXXXXXX
        aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • We recommend you use the same secret name, that is, cloud-secret. Otherwise, you will need to update the AWS_SHARED_CREDENTIALS_FILE environment variable in the fault template, and you won't be able to use the default health check probes.
  • Go to AWS named profile for chaos to use a different profile for AWS faults, and to the superset permission/policy to execute all AWS faults.
  • Go to the common tunables to tune the parameters shared by all the faults.

Below is an example AWS policy to execute the fault.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetDocument",
        "ssm:DescribeDocument",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:SendCommand",
        "ssm:CancelCommand",
        "ssm:CreateDocument",
        "ssm:DeleteDocument",
        "ssm:GetCommandInvocation",
        "ssm:UpdateInstanceInformation",
        "ssm:DescribeInstanceInformation"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2messages:AcknowledgeMessage",
        "ec2messages:DeleteMessage",
        "ec2messages:FailMessage",
        "ec2messages:GetEndpoint",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Fault tunables

Mandatory tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| EC2_INSTANCE_ID | ID of the target EC2 instance. | For example, i-044d3cb4b03b8af1f. |
| REGION | The AWS region ID where the EC2 instance has been created. | For example, us-east-1. |

Optional tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Default: 30 s. |
| CHAOS_INTERVAL | Time interval between two successive chaos iterations (in seconds). | Default: 30 s. |
| AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials file. | Default: /tmp/cloud_config.yml. |
| INSTALL_DEPENDENCY | Whether to install the dependencies used to run the network chaos. It can be either True or False. | If the dependencies already exist, you can set it to False. Default: True. |
| NETWORK_PACKET_LOSS_PERCENTAGE | Packet loss, as a percentage. | Default: 100 %. |
| DESTINATION_IPS | IP addresses of the services, or CIDR blocks (ranges of IPs), whose accessibility is impacted. | Comma-separated IPs or CIDRs can be provided. If not provided, the fault induces network chaos for all IPs/destinations. |
| DESTINATION_HOSTS | DNS names of the services whose accessibility is impacted. | If not provided, the fault induces network chaos for all IPs/destinations, or for DESTINATION_IPS if already defined. |
| NETWORK_INTERFACE | Name of the ethernet interface considered for shaping traffic. | Default: `eth0`. |
| SEQUENCE | Sequence of chaos execution for multiple instances. | Default: parallel. Supports serial sequence. |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. |
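
Optional tunables are passed the same way as the mandatory ones, as entries under env in the ChaosEngine. The following is a minimal sketch, assuming the ChaosEngine layout used in the other examples on this page. The values (50, 60, 30) and the placeholder names engine-nginx and instance-1 are illustrative only, not recommendations.

```yaml
# Illustrative sketch: partial packet loss with an explicit duration and ramp time.
# All values below are assumptions chosen for the example.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # packet loss percentage, instead of the default 100
        - name: NETWORK_PACKET_LOSS_PERCENTAGE
          value: '50'
        # duration of chaos injection, in seconds
        - name: TOTAL_CHAOS_DURATION
          value: '60'
        # period to wait before and after injecting chaos, in seconds
        - name: RAMP_TIME
          value: '30'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
```

Setting a loss percentage below 100 degrades the connection without cutting it off entirely, which may be closer to the flaky-network behavior described in the introduction.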

Network packet loss

Percentage of packet loss injected into the network traffic of the EC2 instance. Tune it by using the NETWORK_PACKET_LOSS_PERCENTAGE environment variable.

The following YAML snippet illustrates the use of this environment variable:

# it injects the chaos into the egress traffic
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # network packet loss percentage
        - name: NETWORK_PACKET_LOSS_PERCENTAGE
          value: '100'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'

Run with destination IPs and destination hosts

Destination IPs and hosts whose traffic is interrupted. By default, traffic to all IPs and hosts is affected. To limit the fault to specific destinations, use the DESTINATION_IPS and DESTINATION_HOSTS environment variables.

  • DESTINATION_IPS: IP addresses of the services, or CIDR blocks (ranges of IPs), whose accessibility is impacted.
  • DESTINATION_HOSTS: DNS names of the services whose accessibility is impacted.

The following YAML snippet illustrates the use of these environment variables:

# it injects the chaos into the egress traffic for specific IPs/hosts
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # supports comma-separated destination ips
        - name: DESTINATION_IPS
          value: '8.8.8.8,192.168.5.6'
        # supports comma-separated destination hosts
        - name: DESTINATION_HOSTS
          value: 'google.com'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
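
DESTINATION_IPS is also described as accepting CIDR blocks, so a whole subnet can be targeted instead of individual addresses. The following is a sketch of that variant, assuming the same ChaosEngine layout as the snippet above. The 10.0.1.0/24 range is a hypothetical placeholder; substitute the subnet of the service under test.

```yaml
# Illustrative sketch: restrict the packet loss to one subnet.
# The CIDR block below is a hypothetical placeholder.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # a CIDR block instead of individual IP addresses
        - name: DESTINATION_IPS
          value: '10.0.1.0/24'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
```

Scoping the fault to a single range keeps the blast radius limited to the traffic under test, while other destinations stay unaffected.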

Network interface

Name of the ethernet interface considered for shaping traffic. Tune it by using the NETWORK_INTERFACE environment variable. Its default value is eth0.

The following YAML snippet illustrates the use of this environment variable:

# it injects the chaos into the egress traffic for specific network interface
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # name of the network interface
        - name: NETWORK_INTERFACE
          value: 'eth0'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'