Playbooks and Runbooks for Incident Response

Most of the cloud service providers align their incident response around the life cycle popularised by NIST.

NIST Incident Response Life Cycle

Four main parts

AWS Security Incident Response Whitepaper

For AWS, the process may look different at first if we rely on the official Security Incident Response Whitepaper.

For AWS, the main life cycle stages are:

Fundamentals of responding to security incidents within a customer’s AWS Cloud environment

Num	Stage	Tools and Aids
1.	Preparation - People	Roles and Responsibilities known and informed
2.	Preparation - People	Owners for assets by appropriate tags
3.	Preparation - Technology	Best practices, standards, and benchmarks as checklists
4.	Preparation - Technology	Secure by default at the time of creation, with continuous config audits
5.	Detect and Analyse - Compute	Monitoring for events and logs, supported by serverless AWS Lambda functions
6.	Detect and Analyse - Compute	If required, additional processing using Fargate tasks (containers)
7.	Detect and Analyse - Compute	Store raw logs, data, and analysis in secure S3 buckets
8.	Contain, Remove and Recover - Network Layer	Contain the EC2 instance using security groups and network ACLs
9.	Contain, Remove and Recover - Platform Layer	Remove any backdoor users and revoke STS tokens
10.	Contain, Remove and Recover - Application Layer	Attach compromised disks to another secure host for forensics
11.	Runbooks and Playbooks - Runbook	To ensure that all standard operating procedures are documented
12.	Runbooks and Playbooks - Playbook	A series of steps to be used in case something fails
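Row 9 of the table mentions revoking STS tokens. Temporary credentials cannot be deleted once issued, so a common approach is to attach a deny policy keyed on the `aws:TokenIssueTime` condition, which blocks every session issued before a cutoff. The sketch below only builds that policy document. The function name `build_revoke_sessions_policy` is hypothetical, and which role or user to attach the policy to is left as an assumption.

```python
import json
from datetime import datetime, timezone


def build_revoke_sessions_policy(cutoff: datetime) -> dict:
    """Build an IAM policy that denies all actions to sessions whose
    temporary credentials were issued before `cutoff`.

    Hypothetical helper for illustration; it only returns a dict and
    makes no AWS calls.
    """
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    # IAM date conditions accept ISO 8601 timestamps in UTC.
    issued_before = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "DateLessThan": {"aws:TokenIssueTime": issued_before}
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_revoke_sessions_policy(datetime.now(timezone.utc))
    print(json.dumps(policy, indent=2))
```

With boto3, a document like this could be attached as an inline policy through `put_role_policy` on the IAM client, passing `json.dumps(policy)` as `PolicyDocument`. The role name and policy name would be your own choices.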

Stage - 1 - Who you gonna call

Roles and Responsibilities in an incident response scenario

Stage - 2 - Maintaining Assets using Tags

Owners for assets by appropriate tags
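As a rough illustration of ownership by tags, the sketch below scans a list of tagged resources and reports those with no owner. It assumes the input has the shape returned in `ResourceTagMappingList` by the Resource Groups Tagging API (`ResourceARN` plus a list of `Key`/`Value` tags). The tag key `Owner`, the function name, and the sample ARNs are all assumptions made for this example.

```python
from typing import Dict, List

OWNER_TAG_KEY = "Owner"  # assumed tag key; use whatever your tagging policy defines


def find_unowned_resources(
    tag_mappings: List[Dict], owner_key: str = OWNER_TAG_KEY
) -> List[str]:
    """Return the ARNs of resources with a missing or blank owner tag."""
    unowned = []
    for mapping in tag_mappings:
        # Flatten [{"Key": k, "Value": v}, ...] into a plain dict.
        tags = {t["Key"]: t["Value"] for t in mapping.get("Tags", [])}
        if not tags.get(owner_key, "").strip():
            unowned.append(mapping["ResourceARN"])
    return unowned


# Hypothetical sample data in the Tagging API response shape.
sample = [
    {
        "ResourceARN": "arn:aws:ec2:us-east-1:111122223333:instance/i-0aaa",
        "Tags": [{"Key": "Owner", "Value": "secops@example.com"}],
    },
    {
        "ResourceARN": "arn:aws:s3:::example-logs-bucket",
        "Tags": [{"Key": "Env", "Value": "prod"}],
    },
    {
        "ResourceARN": "arn:aws:ec2:us-east-1:111122223333:instance/i-0bbb",
        "Tags": [{"Key": "Owner", "Value": "  "}],
    },
]

if __name__ == "__main__":
    for arn in find_unowned_resources(sample):
        print("No owner tag:", arn)
```

During an incident, a list like this tells responders which assets have nobody to call, so it is worth running before one happens.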

Stage - 3 & 4 - Compliance Checks like CIS Benchmark

Stage - 5 - CloudWatch Metric and Alarms

Monitoring for events and logs
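A minimal sketch of what an alarm definition for this stage might look like. It builds the keyword arguments for the CloudWatch `put_metric_alarm` call but does not call AWS. The metric name, namespace, and SNS topic ARN are assumptions: they presume a metric filter on CloudTrail logs already publishes a custom metric, as CIS-style monitoring usually does.

```python
from typing import Any, Dict


def build_alarm_params(
    metric_name: str,
    namespace: str,
    sns_topic_arn: str,
    threshold: float = 1.0,
    period_seconds: int = 300,
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch `put_metric_alarm`.

    Hypothetical helper: it only assembles a dict so the alarm
    definition can be reviewed and tested before it is applied.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Standard-resolution metrics use periods that are multiples of 60 seconds.
    if period_seconds <= 0 or period_seconds % 60 != 0:
        raise ValueError("period_seconds must be a positive multiple of 60")
    return {
        "AlarmName": f"{namespace}-{metric_name}",
        "AlarmDescription": f"Fires when {metric_name} reaches {threshold}",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No data points means no matching events, so do not alarm.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    params = build_alarm_params(
        metric_name="UnauthorizedApiCalls",  # assumed custom metric
        namespace="CISBenchmark",  # assumed namespace
        sns_topic_arn="arn:aws:sns:us-east-1:111122223333:security-alerts",
    )
    print(params)
```

With boto3, the dict could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. The SNS topic is where a Lambda function or an on-call channel would subscribe.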

Stage - 6 - Using Prowler as Fargate Task to continuously check for CIS benchmark compliance

Additional processing using Fargate tasks
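To make the Fargate step more concrete, here is a sketch of the arguments one might pass to the ECS `run_task` call to launch a Prowler scan. It assumes a task definition wrapping the Prowler container has already been registered. The cluster name, task definition name, subnet IDs, and security group IDs are all hypothetical.

```python
from typing import Any, Dict, List


def build_prowler_run_task_params(
    cluster: str,
    task_definition: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for ECS `run_task` on Fargate.

    Hypothetical helper: it assembles the request only and makes no
    AWS calls.
    """
    if not subnet_ids:
        raise ValueError("at least one subnet is required for awsvpc networking")
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnet_ids),
                "securityGroups": list(security_group_ids),
                # With no public IP, the subnets need a NAT gateway or
                # VPC endpoints so the task can pull its image and reach AWS APIs.
                "assignPublicIp": "DISABLED",
            }
        },
    }


if __name__ == "__main__":
    params = build_prowler_run_task_params(
        cluster="security-tools",  # assumed cluster name
        task_definition="prowler-cis-scan",  # assumed task definition
        subnet_ids=["subnet-0aaa", "subnet-0bbb"],
        security_group_ids=["sg-0ccc"],
    )
    print(params)
```

With boto3, this could be run as `boto3.client("ecs").run_task(**params)`. For continuous checks, one option is to trigger the same task on a schedule, for example from an EventBridge rule.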

Runbooks and playbooks seem similar, but there are a few key differences

Num	Similarity or Difference	Runbook	Playbook
1.	Difference	Document known procedures	Document how to investigate/troubleshoot when a known thing fails
2.	Difference	Ensures when required, SOP is applied consistently	Ensures when needed, response is consistent
3.	Similarity	Well documented manual procedures should be automated	Well documented manual troubleshooting steps should be automated
4.	Difference	Useful post incident to recover and resume normal operations	Useful when investigating what could be causing failure
5.	Difference	After every successful recovery post incident, runbooks should be reviewed and updated as per learnings	After every failure, playbooks should be reviewed and updated as per learnings