As an American, I have participated in the shopping bonanza event, Black Friday, and yes, I have run and pushed my way through the crowds of frantic bargain hunters to snag the hottest deal.
While pushing my way through the crowds, I had a sudden feeling that I was an HTTP request, squeezing through those small store doors with the rest of the crowd, trying to grab the latest HTML page that my daughter wanted so badly. The weird thing is that this store has a strange rule about how long you can stand in line, and every 60 seconds the store security guard drags me outside while shouting “504, timeout!” for some reason.
I may have gone a bit overboard with that analogy, but you catch my drift.
Unlike physical stores, in the ether of the internet we are not bound by the earthly constraints of building a bigger store to handle Black Friday-like events. With the help of giants like AWS, GCP, and Azure, we can be the designers of our own dynamically expanding store, given some knowledge and preparation, of course.
We here at DevOps Pro are sharing some insights to help you prepare your infrastructure and prevent service interruptions during large traffic spikes.
1. Make sure you know your peak:
During the holidays, web traffic normally increases by 300–1000%, as you may already know from your traffic cycles in previous years.
If you are unaware of your past traffic cycles, then we recommend you start measuring them today! You can use Google Analytics and access logs on every endpoint (CDNs, load balancers, web servers, etc.).
Knowing your traffic behavior over 24-hour cycles as well as over a whole year will ensure you are well prepared for this year and for many years to come.
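To make "know your peak" concrete, here is a minimal sketch that counts requests per hour from web server access logs and reports the busiest hour. It assumes the logs are in Common Log Format and that all servers log in the same timezone; the function names `requests_per_hour` and `peak_summary` are ours, purely for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the timestamp of a Common Log Format line,
# e.g. [25/Nov/2016:09:15:02 +0000]. The UTC offset is ignored
# (assumption: every log source uses the same timezone).
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def requests_per_hour(lines):
    """Count requests per hourly bucket from access log lines."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
        counts[ts.replace(minute=0, second=0)] += 1
    return counts


def peak_summary(counts):
    """Return (peak_hour, peak_count, peak_to_average_ratio)."""
    peak_hour, peak_count = max(counts.items(), key=lambda item: item[1])
    average = sum(counts.values()) / len(counts)
    return peak_hour, peak_count, peak_count / average
```

Feeding it a full day of logs shows your 24-hour cycle, and feeding it a full year shows the seasonal one. The peak-to-average ratio is a rough indicator of how much extra capacity the busiest hour needs compared with a typical one.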
2. Make sure you handle the peak:
Be sure to scale your system on a schedule, before the traffic comes in.
If you normally have certain thresholds for scaling, this is the time to tighten them and make scaling far more sensitive, i.e., adding more resources sooner and faster!
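The two ideas above, scaling ahead of time and tightening thresholds, can be sketched in a few lines. This is a provider-neutral illustration rather than any cloud's actual API: the `ScalingPolicy` class, the `NORMAL` and `PEAK` values, and both functions are hypothetical, and the numbers are placeholders you would replace with your own.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScalingPolicy:
    min_instances: int     # capacity floor, raised ahead of the peak
    scale_out_cpu: float   # average CPU % above which capacity is added
    step: int              # instances added per scaling action


# Hypothetical values: the peak policy starts from a higher floor,
# reacts at a lower CPU threshold, and adds capacity in bigger steps.
NORMAL = ScalingPolicy(min_instances=2, scale_out_cpu=75.0, step=1)
PEAK = ScalingPolicy(min_instances=8, scale_out_cpu=50.0, step=4)


def policy_for(now, peak_start, peak_end):
    """Pick the policy on a schedule, so the switch happens before the traffic."""
    return PEAK if peak_start <= now < peak_end else NORMAL


def desired_instances(current, avg_cpu, policy):
    """Apply the capacity floor first, then scale out if CPU is over threshold."""
    target = max(current, policy.min_instances)
    if avg_cpu > policy.scale_out_cpu:
        target += policy.step
    return target
```

With these placeholder numbers, a fleet of 2 instances at 60% CPU stays at 2 under the normal policy but is raised to 12 under the peak policy: the floor takes it to 8 and the lower threshold adds 4 more.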
3. Scale smart: Scaling-up and scaling-out are two different things.
Scaling-up means replacing your existing resources with “bigger” ones (e.g. larger instances that can handle more CPU load or more memory-intensive processes). Scaling up is usually recommended as an architectural change, rather than as a way to handle peaks.
Scaling-out means adding resources to the pool to help out with traffic, instead of enlarging the existing ones. The load is spread across the pool by a load balancer designed for that task, rather than being carried by the same number of resources (even if more powerful ones).
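For scaling out, a quick back-of-the-envelope calculation tells you how many instances to put behind the load balancer. The sketch below is an assumption-laden estimate, not a sizing tool: `peak_from_increase` and `instances_needed` are hypothetical helpers, the per-instance capacity is something you would measure with a load test, and the 30% headroom is an arbitrary default.

```python
import math


def peak_from_increase(baseline_rps, increase_pct):
    """Expected peak when traffic grows by increase_pct over the baseline."""
    return baseline_rps * (1 + increase_pct / 100)


def instances_needed(peak_rps, rps_per_instance, headroom=0.3):
    """Instances required to serve peak_rps with spare capacity on top."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    return math.ceil(peak_rps * (1 + headroom) / rps_per_instance)
```

For example, a baseline of 200 requests per second with a 300% increase gives a peak of 800, and if one instance handles 150 requests per second, `instances_needed(800, 150)` returns 7. Running the same numbers for the 1000% end of the range shows how wide the gap between a good day and a great day can be.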
4. Measure everything:
Monitoring during this period has two main values. The first is that it lets you scale automatically based on the system’s health, e.g. CPU load, memory load, latency levels, number of requests, and number of errors (segmented into 3XX, 4XX, 5XX, etc.). The second is that you know the state of your system at all times, so you can respond quickly and efficiently.
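As a small illustration of segmenting responses by status class, the sketch below buckets HTTP status codes into 2XX, 3XX, 4XX, and 5XX and computes the share of one class. The function names are hypothetical, and in practice the codes would come from your access logs or metrics pipeline.

```python
from collections import Counter


def status_classes(status_codes):
    """Bucket HTTP status codes by class, e.g. 504 -> '5XX'."""
    return Counter(f"{code // 100}XX" for code in status_codes)


def error_rate(status_codes, error_class="5XX"):
    """Fraction of responses that fall in the given status class."""
    classes = status_classes(status_codes)
    total = sum(classes.values())
    return classes[error_class] / total if total else 0.0
```

A rising 5XX rate is usually the signal to act on, whether that means triggering a scale-out or paging someone, while a rising 4XX rate more often points at clients or broken links than at capacity.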
5. Monitor your providers:
This time of year brings an increased load to your service providers as well, so make sure you monitor their services and availability at all times.
Knowing that a certain Availability Zone or Region is on fire or lacking available resources can help you make decisions to deploy load to different places before you reach your critical point.
Find more info here: AWS Amazon, Google Cloud, GitHub.
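Once you know a region is in trouble, shifting load away from it can be as simple as recomputing traffic weights. The following is a minimal sketch under stated assumptions: `rebalance_weights` is a hypothetical helper, the status values (`"ok"`, `"degraded"`) are made up for illustration, and the region names are only examples. How you obtain provider status and how you apply the weights (DNS, load balancer, and so on) depends on your setup.

```python
def rebalance_weights(weights, status_by_region):
    """Move traffic off unhealthy regions and renormalize the rest to sum to 1."""
    healthy_total = sum(
        weight
        for region, weight in weights.items()
        if status_by_region.get(region) == "ok"
    )
    if healthy_total <= 0:
        raise RuntimeError("no healthy region has capacity assigned")
    return {
        region: (weight / healthy_total if status_by_region.get(region) == "ok" else 0.0)
        for region, weight in weights.items()
    }
```

A region with no reported status is treated as unhealthy here, which is the cautious choice. Remember that the surviving regions need enough capacity to absorb the extra share before you send it their way.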
Prepare for the worst: even with planning, disasters happen.
The load affects everyone, not only your own business, and it can show up in many ways:
Load on AWS, GCP, or specific cloud services, resources not available for purchase in specific regions, etc. Be prepared! Have everything automated and templated so that you can redeploy your entire infrastructure in a click.
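The idea behind "templated" infrastructure is that one base definition can be rendered for any region with a few overrides. The sketch below shows only that idea in plain Python. `BASE_STACK`, its keys, and `render_stack` are hypothetical, and a real setup would use an infrastructure-as-code tool rather than a dictionary.

```python
import copy

# Hypothetical base definition shared by every deployment.
BASE_STACK = {
    "web": {"instance_type": "medium", "min_instances": 2},
    "load_balancer": {"health_check_path": "/health"},
}


def render_stack(base, region, overrides=None):
    """Return a region-specific copy of the base stack with overrides applied."""
    stack = copy.deepcopy(base)  # never mutate the shared template
    for component, settings in (overrides or {}).items():
        stack.setdefault(component, {}).update(settings)
    stack["region"] = region
    return stack
```

Because the template is copied rather than edited, redeploying to a different region during an outage is a matter of calling `render_stack` with a new region name and handing the result to your deployment automation.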
We hope this list will help you and your team focus as you deal with increased traffic in future years, avoiding the pitfalls of the past. There are plenty of things you can do right now to get your infrastructure ready.
We hope y'all have a great holiday this year, filled with food, family, fun, and hopefully no interruptions to your business infrastructure. We recognize the massive opportunity that the event offers, but also that it can be intimidating when crushing crowds, crashing websites, and customer brawls dominate. In 2016, Cyber Monday sales hit an all-time high, with consumers spending $3.45 billion in a single day. In the same year, Black Friday traffic was up 220% over a regular day. Even with “smart” and automated monitoring systems that help to catch problems early, some issues will still slip through the cracks.