CloudSec: Three Availability Principles
Cybersecurity is so cool that we have a CIA of our own. No, not the agency, but Confidentiality, Integrity & Availability.
It may seem counter-intuitive that security people should make sure data is available. Isn’t that a DevOps thing? You may have a point, but what’s the use of safeguarding your stuff if it isn’t there when you most need it? Attackers may not even bother threatening the integrity of your resources or breaking into them: if they can stop people from accessing your resources, their job is done. Not surprisingly, DDoS (distributed denial-of-service) attacks are among the most common incidents in cybersec.
So let’s explore three basic principles of resilient, always-on systems together.
1. Don’t put all your eggs in one basket
Or, if you want to go technical, avoid single points of failure.
Single point of failure (SPOF): a part of a system that, if it fails, stops the entire system from working.
Let’s play Where’s Waldo: in the system diagram below, can you spot any single point of failure?
It’s everywhere. Why? You only have one instance for each layer. If your web server goes down, the whole thing fails. Same for the app server and the DB. Any failure at one of them makes everything fall apart. And this leads us to the next point.
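To make "one instance per layer" concrete, here is a minimal Python sketch. It assumes a hypothetical model in which each layer of the diagram (web, app, DB) maps to a list of instance names; the names are made up for illustration. Any layer with exactly one instance is a SPOF, and because requests have to pass through every layer, losing that one instance takes the whole system down.

```python
from typing import Dict, List, Set

# Hypothetical model: each layer of the request path maps to its instances.
Architecture = Dict[str, List[str]]


def find_spofs(arch: Architecture) -> List[str]:
    """Return the layers that run on exactly one instance (single points of failure)."""
    return [layer for layer, instances in arch.items() if len(instances) == 1]


def system_is_up(arch: Architecture, failed: Set[str]) -> bool:
    """Requests flow web -> app -> DB, so every layer needs at least one healthy instance."""
    return all(any(i not in failed for i in instances) for instances in arch.values())


single = {"web": ["web-1"], "app": ["app-1"], "db": ["db-1"]}
print(find_spofs(single))               # ['web', 'app', 'db']
print(system_is_up(single, {"app-1"}))  # False: one app server down takes everything down
```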
2. Twin up, and spread over
Duplicates are the dream of every high availability system. You want extra servers, and you also want to replicate data to prevent data loss in case of failure.
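Why do duplicates help so much? A common back-of-the-envelope model multiplies availabilities for components chained in series, and for redundant replicas computes one minus the chance that all of them are down at once. The sketch below assumes an illustrative 99% availability per instance (not a measured figure) and, importantly, that failures are independent. Replicas sharing the same room or AZ break that assumption, since they tend to fail together.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Components chained together (web -> app -> DB): all must be up."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant replicas: the layer is down only if every replica is down."""
    return 1 - prod(1 - a for a in availabilities)


a = 0.99  # illustrative per-instance availability, not a measured figure
print(round(series(a, a, a), 4))                 # 0.9703: one instance per layer
print(round(parallel(a, a), 4))                  # 0.9999: two replicas in one layer
print(round(series(*[parallel(a, a)] * 3), 4))   # 0.9997: two replicas in every layer
```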
And although redundancy is the point, our very first principle (avoid SPOFs) won’t let us keep all of it in the same room. That’s why we should spread our redundant resources across multiple availability zones (AZs).
Redundancy: additional or alternative resources that keep your functionality going if their counterparts are lost or fail.
As long as both web servers are active, the system above may seem to have avoided a SPOF in the first layer. But if all web servers are in the same availability zone, what happens if that AZ goes down? Yep, everything breaks. Again.
So, if you want a resilient and highly available system, twin your resources in the same AZ, and double down in at least one more AZ.
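One way to sanity-check a layout is to pretend each AZ goes down in turn and see whether every layer keeps at least one server. This is a minimal sketch; the placement format and the AZ names ("az-a", "az-b") are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical placement: layer -> list of (server, availability zone) pairs.
Placement = Dict[str, List[Tuple[str, str]]]


def survives_az_outage(placement: Placement, lost_az: str) -> bool:
    """True if every layer still has at least one server outside the failed AZ."""
    return all(any(az != lost_az for _, az in servers) for servers in placement.values())


def tolerates_any_single_az_loss(placement: Placement) -> bool:
    """Check every AZ in use: losing any one of them must not take a layer down."""
    azs = {az for servers in placement.values() for _, az in servers}
    return all(survives_az_outage(placement, az) for az in azs)


same_az = {"web": [("web-1", "az-a"), ("web-2", "az-a")]}
multi_az = {"web": [("web-1", "az-a"), ("web-2", "az-a"),
                    ("web-3", "az-b"), ("web-4", "az-b")]}
print(tolerates_any_single_az_loss(same_az))   # False
print(tolerates_any_single_az_loss(multi_az))  # True
```

Two servers in one AZ pass a "no single instance" check but still fail this one, which is exactly the trap described above.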
Although our system is now good on the SPOF front, you’ve got four different points of entry to your web servers and an ugly mess of arrows between your public and private subnets. Unless you want to become the next network traffic warden, you’d better use a load balancer to keep track of all your servers and their health.
NGINX explains well what a load balancer does:
- Distributes client requests or network load efficiently across multiple servers
- Ensures high availability and reliability by sending requests only to servers that are online
- Provides the flexibility to add or subtract servers as demand dictates
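The three bullets above map onto a toy round-robin balancer: spread requests in turn, skip servers that are not online, and allow servers to be added or removed. This is a minimal sketch, not how NGINX or a cloud load balancer is implemented. Real ones offer other strategies (NGINX, for example, also supports least connections and IP hash) and run health checks themselves; here the hypothetical `set_health` method stands in for that check.

```python
from typing import Dict, List, Optional


class RoundRobinBalancer:
    """Toy load balancer: rotates through servers and skips unhealthy ones."""

    def __init__(self, servers: List[str]) -> None:
        self.servers: List[str] = list(servers)
        self.healthy: Dict[str, bool] = {s: True for s in self.servers}
        self._next = 0

    def add(self, server: str) -> None:
        """Scale out: the new server joins the rotation."""
        self.servers.append(server)
        self.healthy[server] = True

    def remove(self, server: str) -> None:
        """Scale in: stop routing to this server."""
        self.servers.remove(server)
        del self.healthy[server]

    def set_health(self, server: str, is_healthy: bool) -> None:
        """Stand-in for a health check result (set manually in this sketch)."""
        self.healthy[server] = is_healthy

    def pick(self) -> Optional[str]:
        """Return the next healthy server, or None if none are online."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next = (self._next + 1) % len(self.servers)
            if self.healthy[server]:
                return server
        return None


lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
lb.set_health("web-2", False)
print([lb.pick() for _ in range(4)])  # ['web-1', 'web-3', 'web-1', 'web-3']
```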
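Besides reduced downtime, redundancy and so on, load balancers bring an extra benefit: they let each component work independently, unaware of the others (aka decoupled). Clients only talk to the load balancer, so servers behind it can be added, replaced or removed without the clients noticing.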
How would a load balancer change our architecture? Like this:
Phew, that looks a bit more manageable, right?
So, in business-speak: use multi-AZ for critical components, and get yourself a load balancer if you want to keep your sanity. You’re welcome.
And although this is not the focus here, for extra peace of mind it’s recommended to use autoscaling, which adds or removes servers automatically as demand changes. Autoscaling can also help you use fewer resources (e.g. 3 AZs with one web server and one app server in each, rather than two of each in two AZs) and make your FinOps team happier. Together with load balancers and other resources, autoscaling is very helpful for surviving a DDoS attack, since it can add capacity as traffic spikes.
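To give a feel for what "scale properly" means, here is a minimal sketch of a proportional scaling rule with the same shape as the formula the Kubernetes Horizontal Pod Autoscaler documents: desired = ceil(current × metric / target), clamped to a minimum and maximum. The metric (average requests per second per server), the target of 100, and the bounds (minimum 3 to keep one server per AZ, maximum 9) are all hypothetical. Real autoscalers also add cooldowns, tolerances and stabilization windows that this sketch leaves out.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Scale the server count by how far the metric is from its target,
    then clamp the result to [minimum, maximum]."""
    if target <= 0:
        raise ValueError("target must be positive")
    desired = math.ceil(current * metric / target)
    return max(minimum, min(maximum, desired))


# Hypothetical group: 3 servers (one per AZ), target 100 requests/s per server.
print(desired_capacity(3, 180.0, 100.0, minimum=3, maximum=9))  # 6: scale out
print(desired_capacity(3, 20.0, 100.0, minimum=3, maximum=9))   # 3: never below one per AZ
print(desired_capacity(3, 600.0, 100.0, minimum=3, maximum=9))  # 9: capped during a flood
```

The maximum matters during a DDoS: without a cap, autoscaling can turn an availability problem into a billing problem.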
3. Observe
This goes without saying: you need to know about failures, disruptions and performance issues. You’ve put this system out into the world, so you’d better care for it. Observability is the name of the game.
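Observability covers far more than this (metrics, logs, traces, dashboards), and in practice you would use a monitoring service rather than roll your own. Still, a minimal sketch of one building block helps: track the outcome of the most recent requests and raise an alert when the error rate crosses a threshold. The window size and threshold below are hypothetical.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the last `window` requests and flags when the error rate exceeds a threshold."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        self.outcomes: deque = deque(maxlen=window)  # True means the request failed
        self.threshold = threshold

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)  # the oldest outcome drops off once the window is full

    def error_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def should_alert(self) -> bool:
        return self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
for failed in [False] * 8 + [True] * 2:
    monitor.record(failed)
print(monitor.should_alert())  # False: 2/10 errors is not above 20%
monitor.record(True)           # pushes out the oldest success
print(monitor.should_alert())  # True: 3/10 errors
```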
Conclusion
It’s too late to think of a cute ending, so yeah. The end.
Edit: If you want to learn more about availability, check out my next post, where I talk more about autoscaling.
Thanks :)