Smart Check sizing guidelines

This page provides guidance on sizing your Smart Check cluster. Smart Check can be scaled either vertically (adding more CPU and memory per node) or horizontally (adding more worker nodes to the cluster).

Summary and recommendations

Our testing revealed that scaling vertically produces faster scan rates than scaling horizontally. For example, deploying Deep Security Smart Check on 4 worker nodes of m5a.xlarge instances resulted in a faster scan rate than using 8 worker nodes of m5.large instances.

To get the best performance per dollar spent, we recommend that you scale your pods until you see high CPU or memory utilization (see Scale Smart Check pods). As shown in Pod scaling results, we found that CPU utilization caps before memory does, so you might consider increasing the CPUs in a custom EC2 instance rather than just upgrading from m5.large to m5a.xlarge (which increases both CPUs and memory).

We also tested Smart Check with Amazon RDS to identify the impact that Smart Check has on database resources. Our testing showed that there is no major impact on the RDS database during scans.

Details about how we reached these conclusions are provided below.



Test environment

Cluster configuration

We used these cluster configurations for our testing:

Cluster configuration | Cluster name | Nodes | Instance type | Total vCPUs in cluster | Total memory in cluster
Baseline configuration | sc-d-4 | 4 | m5.large (2 vCPU, 8 GB) | 8 | 32 GB
Horizontal scaling | sc-d-8 | 8 | m5.large (2 vCPU, 8 GB) | 16 | 64 GB
Vertical scaling | sc-xl-4 | 4 | m5a.xlarge (4 vCPU, 16 GB) | 16 | 64 GB
Horizontal and vertical scaling | sc-xl-8 | 8 | m5a.xlarge (4 vCPU, 16 GB) | 32 | 128 GB
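
Note that sc-d-8 and sc-xl-4 have identical aggregate resources, so comparing those two clusters isolates the horizontal-versus-vertical question. A quick arithmetic check of the totals in the table:

```shell
# Aggregate cluster resources: nodes x per-node vCPU and memory
awk 'BEGIN {
  printf "sc-d-8:  %d vCPU, %d GB\n", 8 * 2, 8 * 8
  printf "sc-xl-4: %d vCPU, %d GB\n", 4 * 4, 4 * 16
}'
```

Both work out to 16 vCPUs and 64 GB of memory.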

Because AWS instance types may change over time, here are the specifications for m5.large and m5a.xlarge instances at the time of our testing:

Specification | m5.large | m5a.xlarge
Processor type | 1st or 2nd generation Intel Xeon Platinum 8000 series | AMD EPYC 7000 series
Clock speed | Turbo clock speed up to 3.1 GHz | Turbo clock speed up to 2.5 GHz
Advanced Vector Extensions support | Yes (AVX-512 instruction set; two times the FLOPS per core compared to M4 instances) | No
Memory | 8 GB | 16 GB
Instance storage | EBS-only | EBS-only
Network bandwidth | Up to 10 Gbps | Up to 10 Gbps
EBS bandwidth | Up to 4,750 Mbps | Up to 2,880 Mbps

Registry configuration

We used this registry configuration for our testing:

Registry size | Total images | Average image size
11.949 GB | 43 | 284.5 MB
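
The average image size follows directly from the other two columns (using 1 GB = 1024 MB):

```shell
# Total registry size in MB divided by the number of images
awk 'BEGIN { printf "%.2f MB\n", 11.949 * 1024 / 43 }'
```

This prints 284.55 MB, consistent with the 284.5 MB in the table.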

Database configuration

We used a t2.small RDS instance for each of the EKS clusters, and our testing showed that there is no major impact on the RDS database during scans.

Because Amazon RDS types may change over time, here are the specifications for a t2.small instance at the time of our testing:

  • Core: 1
  • vCPU: 1
  • CPU credits/hour: 12
  • Memory: 2 GB
  • Network performance (Gbps): Low to moderate
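
To put the 12 CPU credits/hour in context: on burstable T2 instances, one CPU credit represents one minute of a full vCPU, so 12 credits/hour corresponds to a 20% baseline CPU performance (this is a standard property of the T2 family, not something we measured):

```shell
# 12 full-vCPU minutes out of every 60-minute hour
awk 'BEGIN { printf "baseline = %d%%\n", 12 / 60 * 100 }'
```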

Cluster resource utilization

Chart showing average cluster CPU utilization

Scale Smart Check pods

You can use the overrides.yaml file to specify the replica count for each pod:

  1. Add the following snippet to your overrides.yaml file, nesting the per-service counts under the chart's replicas key:

        replicas:
          malwareScan: 5
          scan: 5
          imageScan: 5
          vulnerabilityScan: 5
          contentScan: 5
  2. Then, in your terminal, run:

    helm upgrade --values /path/to/overrides.yaml <release-name> /path/to/smartcheck-helm/

Using our default configuration (m5.large with 4 nodes), scaling each pod to a replica count of 5 resulted in an average CPU utilization of approximately 72%. That means you could potentially configure a replica count higher than 5.

Scaling the pods linearly (all counts at 5 instead of different numbers) is a good approach if you’re not sure what kind of content findings your images typically have. But, for example, if your content findings consist mostly of malware, you could scale your malware pod's replica count to a larger number.
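
For example, if your findings do skew toward malware, a sketch of an overrides.yaml that weights the malware scanner more heavily might look like this (assuming the chart nests these counts under a replicas key, as in the snippet above; the specific numbers are illustrative, not tested values):

```yaml
replicas:
  malwareScan: 8       # illustrative: scaled higher for malware-heavy registries
  scan: 5
  imageScan: 5
  vulnerabilityScan: 5
  contentScan: 5
```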

If the vulnerability scan pods become evicted, it’s usually because there is not enough room for them on the disk. To allocate more space for them, add the following to overrides.yaml:

    ## The amount of space to request for the vulnerability scan working volume
    ## Default value: 3Gi
    sizeLimit: 4Gi

Pod scaling results

This section details our test results with caching disabled. By default, caching is enabled for Smart Check, so actual performance in most environments should be better than the results shown here.

In the results below, R1, R3, and R5 refer to replica counts of 1, 3, and 5, respectively.

Each scan was run 10 times (10 iterations with a one-hour break between them). The results were:

  • sc-d-4: Scan rate for R3 is 130% faster than R1, and R5 is 152% faster than R1
  • sc-d-8: Scan rate for R3 is 135% faster than R1, and R5 is 237% faster than R1
  • sc-xl-4: Scan rate for R3 is 82% faster than R1, and R5 is 144% faster than R1
  • sc-xl-8: Scan rate for R3 is 90% faster than R1, and R5 is 161% faster than R1
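
The "% faster" figures above convert to improvement factors relative to R1: a scan rate that is X% faster than R1 is (1 + X/100) times the R1 rate. For example:

```shell
# Improvement factor over R1 for each "% faster" result
for pct in 82 90 130 135 144 152 161 237; do
  awk -v p="$pct" 'BEGIN { printf "%3d%% faster => %.2fx R1\n", p, 1 + p / 100 }'
done
```

The R3 results span roughly 1.82x to 2.35x and the R5 results 2.44x to 3.37x, which is where the improvement factors summarized below come from.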

Chart showing average scan rate comparison

Chart showing average memory usage comparison

Chart showing average CPU usage comparison

Chart comparing amounts of improvement

Assuming R1 (replica count of 1) is the baseline (i.e. an improvement factor of 1 when compared to itself):

  • Scaling to R3 provides an improvement factor of 1.82 to 2.35 times compared to R1.
  • Scaling to R5 provides an improvement factor of 2.44 to 3.37 times compared to R1.

When we increased the pod replica count within the same cluster configuration, we saw a significant improvement in the scan rate while using very few additional resources (CPU and memory).