ECS provides the freedom to scale different services by adding “tasks”. Tasks operate and communicate independently and are hosted on a series of EC2 instances. Scaling the system’s services in and out, as well as scaling out EC2 hosts, is a convenient task, achieved with the help of CloudWatch metrics and alarms. However, scaling in EC2 hosts requires more complex decision making.
ECS is a three-layer architecture:
Scaling tasks in and out is relatively simple: a couple of metric thresholds, such as CPU load percentage, determine whether new tasks should be started or current ones removed. Scaling ECS hosts out is about the same: for example, when a certain level of CPU is crossed, or when the cluster doesn’t have enough memory to hold the current load, new EC2 instances are launched.
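As a rough illustration of the threshold logic described above, here is a minimal sketch. The function names and the 70%/30% thresholds are assumptions made for the example, not values taken from ECS or CloudWatch.

```python
def task_scaling_decision(cpu_pct, scale_out_above=70.0, scale_in_below=30.0):
    """Map a single CPU reading to a task scaling action.

    The thresholds are hypothetical; in practice they would be the
    values configured on the relevant alarms.
    """
    if cpu_pct > scale_out_above:
        return "out"   # start more tasks
    if cpu_pct < scale_in_below:
        return "in"    # remove tasks
    return "hold"


def should_add_host(cpu_pct, required_mem, cluster_mem, cpu_out_above=70.0):
    """Scale hosts out when CPU is high OR the cluster cannot fit the load."""
    return cpu_pct > cpu_out_above or required_mem > cluster_mem
```

Either condition alone is enough to add capacity, which is why scaling out is the easy direction: adding a host never makes another metric worse.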
Removing resources is about cost reduction: these resources are being paid for, and removing them once they are no longer required improves utilization, which translates directly into lower costs.
However, scaling in is somewhat challenging: applying the general method of crossing a single metric threshold might cut resources that are still required according to other metrics. For example, hosts could be removed because CPU levels are low while the level of memory reservation cannot “afford” to lose them.
Since AWS doesn’t provide multi-metric scaling, and since such scaling requires a few more calculations before deciding to remove a resource, an external tool is required for the task.
As said, scaling in is not a straightforward task. Assume a host cluster was scaled out because of low available memory, and then scaled in because of low CPU levels: removing instances may again result in low available memory, which triggers new hosts again. This can become a never-ending scaling loop.
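The loop can be reproduced with a small simulation. This is only a sketch: the rules, thresholds, and memory units below are invented for illustration and do not reflect any specific alarm configuration.

```python
def naive_step(hosts, cpu_pct, reserved_mem, mem_per_host,
               cpu_in_below=30.0, mem_out_above=80.0):
    """Apply two independent single-metric rules once; return the new host count."""
    mem_reservation = 100.0 * reserved_mem / (hosts * mem_per_host)
    if mem_reservation > mem_out_above:
        return hosts + 1          # scale out on memory alone
    if cpu_pct < cpu_in_below and hosts > 1:
        return hosts - 1          # scale in on CPU alone
    return hosts


hosts = 4
history = []
for _ in range(6):
    history.append(hosts)
    # CPU stays low while tasks keep 26 of the available memory units reserved.
    hosts = naive_step(hosts, cpu_pct=20.0, reserved_mem=26, mem_per_host=8)

print(history)  # [4, 5, 4, 5, 4, 5]
```

With four hosts the memory reservation is 81.25%, so a host is added; with five hosts it drops to 65% and the low CPU reading removes a host again. Neither rule is wrong on its own, but together they never settle.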
A multi-metric scale trigger is required. Ignoring for a moment the fact that AWS doesn’t offer such a feature, what if a multi-metric trigger is used but the metric levels after the removal trigger a scale-out again? This is another loop. So, a forecast of the metrics after removal is also required. There’s also the need to decide which is the least utilized instance to remove, and then to make the removal graceful by draining client connections before termination, together with a cleanup of resources at 0% utilization.
https://github.com/omerxx/ecscale
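To make these requirements concrete, here is a minimal sketch of a multi-metric check with a forecast step and a least-utilized pick. It is not the tool's actual code: the `Host` type, the function names, and the thresholds are all hypothetical, and draining is left out.

```python
from dataclasses import dataclass


@dataclass
class Host:
    instance_id: str
    running_tasks: int
    reserved_mem: int   # memory reserved by tasks on this host (arbitrary units)
    total_mem: int      # memory this host contributes to the cluster


def forecast_mem_reservation(hosts, removed):
    """Cluster memory reservation (%) if `removed` left the cluster.

    All reserved memory is kept, since the removed host's tasks
    would have to be rescheduled on the remaining hosts.
    """
    capacity = sum(h.total_mem for h in hosts if h is not removed)
    reserved = sum(h.reserved_mem for h in hosts)
    if capacity == 0:
        return float("inf")
    return 100.0 * reserved / capacity


def pick_host_to_remove(hosts, cpu_pct, cpu_in_below=30.0, mem_ceiling=70.0):
    """Return the host to remove, or None if scaling in is not safe."""
    if cpu_pct >= cpu_in_below or len(hosts) <= 1:
        return None
    # Least utilized: fewest running tasks, then least reserved memory.
    candidate = min(hosts, key=lambda h: (h.running_tasks, h.reserved_mem))
    # Forecast: skip the removal if it would push memory over the ceiling.
    if forecast_mem_reservation(hosts, candidate) >= mem_ceiling:
        return None
    return candidate


cluster = [Host("i-a", 3, 6, 8), Host("i-b", 1, 2, 8), Host("i-c", 2, 4, 8)]
blocked = pick_host_to_remove(cluster, cpu_pct=20.0)                    # forecast is 75%
allowed = pick_host_to_remove(cluster, cpu_pct=20.0, mem_ceiling=80.0)  # picks i-b
```

With the default 70% ceiling the removal is refused, because losing `i-b` would leave 12 reserved units on 16 units of capacity. Raising the ceiling to 80% allows it. In a real cluster the chosen host would then be drained before termination.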
ECSCALE answers all of the above. It is ready to run on AWS Lambda, which usually means it is completely free to run, providing a virtual butler that cleans up the mess of unused instances.
The tool is ready to be deployed as a serverless function running on AWS Lambda. As such, it only requires ~3000 Lambda seconds every month. Since the AWS Lambda free tier covers 1,000,000 requests and 400,000 GB-seconds of compute every month, unless you’re running other serverless applications on Lambda, the resource is in fact completely free of charge. The scaling process runs every 60 minutes (configurable; one hour is the default because EC2 instances are billed for a whole hour even if terminated earlier), iterating over an account's ECS resources and cleaning them up.
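The ~3000 seconds figure can be put in perspective with some quick arithmetic. This sketch assumes a 30-day month and that the monthly total is spread evenly over the scheduled runs; both are assumptions, and the helper names are made up for the example.

```python
def runs_per_month(interval_minutes, days=30):
    """Number of scheduled invocations in a month of `days` days."""
    return days * 24 * 60 // interval_minutes


def seconds_per_run(monthly_seconds, interval_minutes, days=30):
    """Average duration of one run, given the monthly total."""
    return monthly_seconds / runs_per_month(interval_minutes, days)


hourly_runs = runs_per_month(60)        # 720 runs in a 30-day month
per_run = seconds_per_run(3000, 60)     # roughly 4.2 seconds per run
```

Under these assumptions each hourly run takes a little over four seconds. A shorter interval raises the monthly total in proportion to the number of runs.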
Edit: AWS has since announced a new EC2 billing model which charges an instance by the second. This means that when an instance is terminated, it's billed for the part of the hour it was running and not for a whole hour. Therefore, it is recommended to run the scaling tool more frequently. Since API calls are made, and the system should be given time to make unrelated changes such as deployments, service scaling and others, intervals of 1 second or even 1 minute are not ideal. After running some tests, the sweet spot seems to be a trigger of around 20 minutes.
You should, however, test this in your own cluster and make sure current procedures are not affected by the new frequent changes.
Best of luck!