Injecting Chaos to AWS Lambda functions with Lambda Layers - RELOADED
UPDATE — October 2019
The code used for this blog post has evolved a bit and become that and that. However, the principles explained here remain similar. Just be aware :-)
I would like to express my gratitude to Gunnar Grosch for giving me some very useful feedback and for providing a nice little serverless application to easily test and visualize the effect of the failure injection layer. Give him a big thumbs up and give him a follow!
In the initial post, Injecting Chaos to AWS Lambda functions with Lambda Layers, I explained how to use AWS Lambda Layers to conduct chaos engineering experiments on Lambda functions using latency injection. Since then, the little latency injection layer has evolved into a more general failure injection layer. In this post, I will explain the new features and how to use them.
What’s new?
The current v0.2-alpha version of the FailureInjection layer supports:
HTTP error status code injection using error_code
Exception injection using exception_msg
Latency injection, as before, using delay
Per-function injection control using an environment variable (FAILURE_INJECTION_PARAM) (thanks to Gunnar)
Easy deployment using the Serverless Framework with sls deploy (thanks to Gunnar)
Control over the rate of failure using rate
If you don’t care about the details and just want to get the code now, click here :)
Resetting the context
As I explained in my previous post on getting started with AWS Lambda Layers, a Layer is a ZIP archive that contains libraries and other dependencies that you can import at runtime for your Lambda functions to use. It is especially useful if you have several Lambda functions that use the same set of functions or libraries, because it promotes code reuse. This reusability makes Lambda Layers ideal for running chaos experiments.
In this updated version of the FailureInjection layer, I am still using the AWS Systems Manager (SSM) Parameter Store. However, the new configuration is as follows:
{
    "isEnabled": true,
    "delay": 400,
    "error_code": 404,
    "exception_msg": "I really failed seriously",
    "rate": 1
}
To store the above configuration in an SSM parameter called chaoslambda.config in the Stockholm AWS Region (eu-north-1), you can use the following AWS CLI command:
> aws ssm put-parameter \
--region eu-north-1 \
--name chaoslambda.config \
--type String --overwrite \
--value \
"{
\"delay\": 400, \
\"isEnabled\": true, \
\"error_code\": 404, \
\"exception_msg\": \"I really failed seriously\", \
\"rate\": 1 \
}"
isEnabled acts like the big red button and allows you to stop the experiment real fast. Indeed, if you want to stop the experiment immediately, simply replace isEnabled: true with isEnabled: false and update the SSM parameter:
> aws ssm put-parameter \
--region eu-north-1 \
--name chaoslambda.config \
--type String --overwrite \
--value \
"{
\"delay\": 400, \
\"isEnabled\": false, \
\"error_code\": 404, \
\"exception_msg\": \"I really failed seriously\", \
\"rate\": 1 \
}"
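On the function side, this parameter has to be read back and parsed before anything can be injected. Here is a minimal sketch of how that might look. It is not the layer's actual code. It assumes that FAILURE_INJECTION_PARAM holds the name of the SSM parameter (falling back to chaoslambda.config when unset), and load_chaos_config is a hypothetical helper name. The SSM client is passed in as an argument so the function can be exercised without AWS credentials.

```python
import json
import os


def load_chaos_config(ssm_client, default_name="chaoslambda.config"):
    """Fetch and parse the failure-injection config from Parameter Store.

    Returns None when isEnabled is false, so callers can skip injection.
    """
    # Assumption: the environment variable names the SSM parameter to read.
    name = os.environ.get("FAILURE_INJECTION_PARAM", default_name)
    response = ssm_client.get_parameter(Name=name)
    config = json.loads(response["Parameter"]["Value"])
    # isEnabled is the "big red button": when it is off, report no config.
    return config if config.get("isEnabled") else None
```

In a real function you would call it with something like load_chaos_config(boto3.client("ssm", region_name="eu-north-1")). Calling SSM on every invocation adds latency of its own, so caching the value for a short time is probably worth considering.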
An easy way to deploy the FailureInjection layer is to use the Serverless Framework template provided by Gunnar and simply run:
> sls deploy --region eu-north-1 --stage dev
If successful, the output should contain the ARN (Amazon Resource Name) of the Lambda layer:
arn:aws:lambda:eu-north-1:123556789:layer:ChaosInjectionLayer-dev:1
The FailureInjection layer should now be available in the eu-north-1 AWS Region and can be attached to any of your Lambda functions running Python 3.7.
Before using it, let’s take a look at the code.
1 — Latency injection
As in the previous version of the FailureInjection layer, we can create a Python decorator that injects some delay (in milliseconds) when the Lambda function executes.
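A minimal sketch of such a latency decorator could look like the following. It is an illustration and not the layer's actual implementation. To stay self-contained it reads from a module-level CONFIG dictionary, which is a hypothetical stand-in for the configuration stored in SSM.

```python
import functools
import random
import time

# Hypothetical stand-in for the configuration stored in SSM.
CONFIG = {"isEnabled": True, "delay": 400, "rate": 1}


def corrupt_delay(func):
    """Sleep for `delay` milliseconds before calling func, `rate` of the time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        delay = CONFIG.get("delay", 0)
        rate = CONFIG.get("rate", 0)
        # random.random() is in [0.0, 1.0): rate 1 always injects, rate 0 never.
        if CONFIG.get("isEnabled") and delay > 0 and random.random() < rate:
            start = time.time()
            time.sleep(delay / 1000.0)  # delay is expressed in milliseconds
            added_ms = (time.time() - start) * 1000
            print("Added {:.2f}ms to {}".format(added_ms, func.__name__))
        return func(*args, **kwargs)
    return wrapper
```

The sleep happens before the handler runs, so the handler's own work is unchanged and only the total duration grows.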
Invoking a Lambda function decorated with @corrupt_delay as follows
from chaos_lib import corrupt_delay

@corrupt_delay
def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': 'Hello from Lambda!'
    }
produces the following execution result:
{
    "statusCode": 200,
    "body": "Hello from Lambda!"
}
You should also notice the Log output:
START RequestId: 5295aa0b-...-50fcfbebea1f Version: $LATEST
delay: 400, rate: 1
Added 400.61ms to lambda_handler
END RequestId: 5295aa0b-...-50fcfbebea1f
REPORT RequestId: 5295aa0b-...-50fcfbebea1f Duration: 442.65 ms Billed Duration: 500 ms Memory Size: 128 MB Max Memory Used: 79 MB
You can see that the Duration is indeed approximately the value of delay (400 ms), plus the function's own execution time.
2 — Exception injection decorator
As explained in AWS Lambda Function Errors, if your Lambda function raises an exception, Lambda recognizes it, serializes the exception information into JSON and returns it. Consider the following example:
def lambda_handler(event, context):
    raise Exception('I failed!')
Invoking this function will raise an exception and Lambda will return the following error message:
{
    "errorMessage": "I failed!",
    "errorType": "Exception",
    "stackTrace": [
        " File \"/var/task/lambda_function.py\", line 2, in lambda_handler\n raise Exception('I failed!')\n"
    ]
}
Keep in mind that depending on the event source, Lambda may retry the failed Lambda function. For example, if Kinesis is the event source, Lambda will retry the failed invocation until the Lambda function succeeds or the records in the stream expire.
Moreover, if the invocation type is asynchronous, Lambda will not return anything. Instead, it will log the error information to CloudWatch Logs.
We can use that exception behavior of Lambda to create a Python decorator that injects a failure when the Lambda function executes.
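A minimal sketch of such an exception decorator is shown below. It is an illustration and not the layer's actual code. The module-level CONFIG dictionary is a hypothetical stand-in for the SSM configuration. Running the handler first and raising afterwards is a choice made for this sketch: the handler's side effects still happen, and only the caller sees the failure.

```python
import functools
import random

# Hypothetical stand-in for the configuration stored in SSM.
CONFIG = {
    "isEnabled": True,
    "exception_msg": "I really failed seriously",
    "rate": 1,
}


def corrupt_exception(func):
    """Raise Exception(exception_msg) instead of returning, `rate` of the time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Run the real handler first so its side effects still take place.
        result = func(*args, **kwargs)
        message = CONFIG.get("exception_msg")
        rate = CONFIG.get("rate", 0)
        # random.random() is in [0.0, 1.0): rate 1 always raises, rate 0 never.
        if CONFIG.get("isEnabled") and message and random.random() < rate:
            raise Exception(message)
        return result
    return wrapper
```

Raising before calling the handler would be an equally reasonable variant if you want to simulate a function that fails without doing any work.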
Invoking a Lambda function decorated with @corrupt_exception as follows
from chaos_lib import corrupt_exception

@corrupt_exception
def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': 'Hello from Lambda!'
    }
produces the following execution result:
{
    "errorMessage": "I really failed seriously",
    "errorType": "Exception",
    "stackTrace": [
        " File \"/opt/python/chaos_lib.py\", line 76, in wrapper\n raise Exception(exception_msg)\n"
    ]
}
3 — HTTP status code injection
Distributed systems and microservices are often vulnerable to unexpected failures in the services they depend on. Being able to simulate different HTTP status codes will most definitely help you build more resilient applications.
I have found that injecting errors such as 400 Bad Request, 500 Internal Server Error and 503 Service Unavailable often helps to better understand how the application responds to various failure conditions.
To modify the HTTP status code of the Lambda response, we can use a Python decorator that overwrites the statusCode field of the value returned by the handler.
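A minimal sketch of such a status code decorator could look like this. It is an illustration and not the layer's actual code. The module-level CONFIG dictionary is a hypothetical stand-in for the SSM configuration, and the sketch assumes the handler returns a dictionary with a statusCode key, as in the examples in this post.

```python
import functools
import random

# Hypothetical stand-in for the configuration stored in SSM.
CONFIG = {"isEnabled": True, "error_code": 404, "rate": 1}


def corrupt_statuscode(func):
    """Replace the response's statusCode with error_code, `rate` of the time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        error_code = CONFIG.get("error_code")
        rate = CONFIG.get("rate", 0)
        if not (CONFIG.get("isEnabled") and error_code):
            return result
        # Only touch responses that look like {"statusCode": ..., ...}.
        if not (isinstance(result, dict) and "statusCode" in result):
            return result
        # random.random() is in [0.0, 1.0): rate 1 always injects, rate 0 never.
        if random.random() < rate:
            # Copy the response so the handler's own dict is not mutated.
            return dict(result, statusCode=error_code)
        return result
    return wrapper
```

Only the status code changes. The body is passed through untouched, which matches the execution result shown in this section.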
Invoking a Lambda function decorated with @corrupt_statuscode as follows
from chaos_lib import corrupt_statuscode

@corrupt_statuscode
def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': 'Hello from Lambda!'
    }
produces the following execution result:
{
    "statusCode": 404,
    "body": "Hello from Lambda!"
}
See it in action!
The best way to start using this chaos injection layer and see its effect is to use the serverless-chaos-demo provided by Gunnar, a.k.a. Mister Serverless Chaos!
A word of warning before you start breaking things: please, DO NOT use the FailureInjectionLayer in production to start with! Make sure you first experiment with it in a test environment, or you risk creating a real outage! Chaos engineering is not about breaking things randomly without a purpose. It is about breaking things in a controlled environment, through well-planned experiments, in order to build CONFIDENCE in your application's ability to withstand turbulent conditions. So build that confidence first :)
Wrapping up.
That’s all for now, folks. Feel free to comment, share your ideas, or submit pull requests if you want to improve the FailureInjectionLayer or add new functionality to it.
adhorn/LatencyInjectionLayer
Small example of how to use AWS Lambda Layers to inject latency into AWS Lambda Functions …github.com
Adrian