Smart Check sizing guidelines
This page provides guidance on sizing your Smart Check cluster. Smart Check can be scaled either vertically (adding more CPU and memory) or horizontally (adding more worker nodes to the cluster).
Summary and recommendations
Our testing revealed that scaling vertically produces faster scan rates than scaling horizontally. For example, deploying Deep Security Smart Check on 4 worker nodes of m5a.xlarge instances resulted in a faster scan rate than using 8 worker nodes of m5.large instances.
To get the best performance per dollar spent, we recommend that you scale your pods until you see high CPU or memory utilization (see Scale Smart Check pods). As shown in Pod scaling results, CPU utilization caps out before memory does, so you might consider an EC2 instance type with more vCPUs rather than simply upgrading from m5.large to m5a.xlarge (which increases both CPU and memory).
We also tested Smart Check with Amazon RDS to identify the impact that Smart Check has on database resources. Our testing showed that there is no major impact on the RDS database during scans.
The following sections provide details on how we reached the conclusions in the "Summary and recommendations" section.
We used these cluster configurations for our testing:
| Cluster configuration | Cluster name | Nodes | Instance type | Total vCPUs in cluster | Total memory in cluster |
| --- | --- | --- | --- | --- | --- |
| Baseline configuration | sc-d-4 | 4 | m5.large (2 vCPU, 8 GB) | 8 | 32 GB |
| Horizontal scaling | sc-d-8 | 8 | m5.large (2 vCPU, 8 GB) | 16 | 64 GB |
| Vertical scaling | sc-xl-4 | 4 | m5a.xlarge (4 vCPU, 16 GB) | 16 | 64 GB |
| Horizontal and vertical scaling | sc-xl-8 | 8 | m5a.xlarge (4 vCPU, 16 GB) | 32 | 128 GB |
Because AWS instance types may change over time, here are the specifications for m5.large and m5a.xlarge instances at the time of our testing:
| Specification | m5.large | m5a.xlarge |
| --- | --- | --- |
| Processor type | 1st or 2nd generation Intel Xeon Platinum 8000 series | AMD EPYC 7000 series |
| Clock speed | Turbo clock speed up to 3.1 GHz | Turbo clock speed up to 2.5 GHz |
| Advanced Vector Extensions support | Yes; AVX-512 instruction set (two times the FLOPS per core compared to M4 instances) | No |
| Memory | 8 GB | 16 GB |
| Network bandwidth | Up to 10 Gbps | Up to 10 Gbps |
| EBS bandwidth | Up to 4,750 Mbps | Up to 2,880 Mbps |
We used this registry configuration for our testing:
| Registry size | Total images | Average image size |
| --- | --- | --- |
| 11.949 GB | 43 | 284.5 MB |
We used a t2.small RDS instance for each EKS cluster, and our testing showed no major impact on the RDS database during scans.
Because Amazon RDS instance types may change over time, here are the specifications for a t2.small instance at the time of our testing:
- Core: 1
- vCPU: 1
- CPU credits/hour: 12
- Memory: 2 GB
- Network performance: Low to moderate
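For context on why such a small burstable instance suffices: one CPU credit equals one vCPU running at 100% for one minute, so the credit rate above determines the sustained CPU baseline the t2.small can deliver. A quick sketch of that arithmetic:

```python
# One CPU credit = one vCPU running at 100% for one minute.
# A t2.small earns 12 credits/hour, so its sustainable CPU baseline is:
credits_per_hour = 12
baseline = credits_per_hour / 60  # vCPU-minutes earned per wall-clock minute
print(f"{baseline:.0%}")  # 20%
```

In other words, the database can sustain about 20% of one vCPU indefinitely, which was more than enough headroom during our scan tests.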
Cluster resource utilization
Scale Smart Check pods
You can use the overrides.yaml file to specify the replica count for each pod. Add the following snippet to your overrides.yaml file:

```yaml
replicas:
  malwareScan: 5
  scan: 5
  imageScan: 5
  vulnerabilityScan: 5
  contentScan: 5
```
Then, in your terminal, run:
```shell
helm upgrade --values /path/to/overrides.yaml <release-name> /path/to/smartcheck-helm/
```
Using our default configuration (m5.large with 4 nodes), scaling each pod to a replica count of 5 resulted in an average CPU utilization of approximately 72%. That means you could potentially configure a replica count higher than 5.
Scaling the pods uniformly (setting every replica count to 5, rather than different values for each) is a good approach if you're not sure what kinds of content findings your images typically have. If, for example, your content findings consist mostly of malware, you could scale the malwareScan pod's replica count higher than the others.
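A weighted configuration for a malware-heavy workload might look like the following overrides.yaml snippet. The specific replica counts here are illustrative assumptions, not tested values; adjust them based on your own utilization measurements:

```yaml
replicas:
  malwareScan: 8        # weighted higher for malware-heavy workloads (illustrative)
  scan: 5
  imageScan: 5
  vulnerabilityScan: 3  # weighted lower (illustrative)
  contentScan: 3        # weighted lower (illustrative)
```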
You can also adjust the space requested for the vulnerability scan working volume in overrides.yaml:

```yaml
vulnerabilityScan:
  workVolume:
    ## The amount of space to request for the
    ## vulnerability scan working volume
    ##
    ## Default value: 3Gi
    sizeLimit: 4Gi
```
Pod scaling results
This section details our test results with caching disabled. Because caching is enabled by default in Smart Check, actual performance in most environments should be better than the results shown here.
In the results below, R1, R3, and R5 refer to replica counts of 1, 3, and 5 for each pod.
Each scan was run 10 times (10 iterations, with a one-hour break between iterations). The results were:
- sc-d-4: Scan rate for R3 is 130% faster than R1, and R5 is 152% faster than R1
- sc-d-8: Scan rate for R3 is 135% faster than R1, and R5 is 237% faster than R1
- sc-xl-4: Scan rate for R3 is 82% faster than R1, and R5 is 144% faster than R1
- sc-xl-8: Scan rate for R3 is 90% faster than R1, and R5 is 161% faster than R1
Assuming R1 (a replica count of 1) is the baseline, with an improvement factor of 1.0:
- Scaling to R3 provides an improvement factor of 1.82 to 2.35 compared to R1.
- Scaling to R5 provides an improvement factor of 2.44 to 3.37 compared to R1.
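These factors follow directly from the percentages: a scan rate that is X% faster corresponds to an improvement factor of 1 + X/100. As a quick check against the per-cluster results:

```python
# Convert "percent faster than R1" figures into improvement factors.
def improvement_factor(percent_faster: float) -> float:
    return round(1 + percent_faster / 100, 2)

# Percentages from the results above (sc-d-4, sc-d-8, sc-xl-4, sc-xl-8).
r3 = [130, 135, 82, 90]
r5 = [152, 237, 144, 161]

print([improvement_factor(p) for p in r3])  # [2.3, 2.35, 1.82, 1.9]
print([improvement_factor(p) for p in r5])  # [2.52, 3.37, 2.44, 2.61]
```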
When we increased the pod replica count within the same cluster configuration, we saw a significant improvement in the scan rate while consuming very little additional CPU and memory.