Scale Logic, Inc. — Recommended Cloud Deployment Profile
Overview
This article describes Scale Logic’s recommended configuration for deploying CaraOne on Amazon Web Services (AWS) EC2, either as an alternative to an on-premises physical appliance or as a complement to one. It is intended for customers who prefer to run CaraOne in their own AWS account or in a Scale Logic–managed cloud environment rather than purchasing dedicated hardware.
CaraOne is a GPU-accelerated application that places its working database into system memory (RAM) for fastest processing, and requires guaranteed, non-variable GPU memory. The configuration below is a single, fixed deployment size providing dedicated CPU, dedicated memory, and a dedicated CUDA-capable GPU with the full 24 GB of addressable GPU memory CaraOne requires. CaraOne is a workstation-class GPU workload (equivalent to an NVIDIA RTX 4500 Ada Generation, 24 GB) and does not require premium data-center accelerators such as the A100, H100, or L40S.
Recommended configuration
| Component | Scale Logic recommendation |
|---|---|
| Operating System | Ubuntu 22.04 LTS (64-bit) |
| vCPU | 8–12 vCPU (dedicated) |
| System Memory (RAM) | 128 GB, dedicated |
| GPU | One dedicated NVIDIA CUDA-capable GPU with 24 GB of addressable GPU memory, workstation/Ada class. On EC2 this maps to the NVIDIA L4. Full physical GPU, not shared. |
| Network | 10 Gbps or higher (Elastic Network Adapter / ENA enabled) |
| Storage | 960 GB, thin-provisioned (EBS gp3) |
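The table can also be written down as a small machine-checkable profile. The sketch below is illustrative only: `InstanceSpec` and `profile_gaps` are hypothetical names, not part of CaraOne or any AWS SDK, and the thresholds are simply the minimums from the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InstanceSpec:
    """Hypothetical description of a candidate deployment target."""
    vcpus: int
    ram_gb: int
    gpu_count: int
    gpu_memory_gb: int
    network_gbps: float
    storage_gb: int


def profile_gaps(spec: InstanceSpec) -> List[str]:
    """Return a list of shortfalls against the recommended profile."""
    gaps = []
    # The table recommends 8-12 vCPU; only the lower bound is checked here,
    # since extra vCPUs are treated as headroom.
    if spec.vcpus < 8:
        gaps.append("vCPU: at least 8 required")
    if spec.ram_gb < 128:
        gaps.append("RAM: 128 GB required")
    if spec.gpu_count < 1 or spec.gpu_memory_gb < 24:
        gaps.append("GPU: one dedicated GPU with 24 GB required")
    if spec.network_gbps < 10:
        gaps.append("Network: 10 Gbps or higher required")
    if spec.storage_gb < 960:
        gaps.append("Storage: 960 GB required")
    return gaps


# Illustrative values only; confirm them against the AWS instance type listing.
candidate = InstanceSpec(vcpus=32, ram_gb=128, gpu_count=1,
                         gpu_memory_gb=24, network_gbps=25, storage_gb=960)
print(profile_gaps(candidate))  # an empty list means no gaps were found
```

A check like this does not confirm that the GPU or memory is dedicated; it only compares advertised sizes against the table.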
Recommended EC2 instance
CaraOne’s GPU requirement is a workstation-class 24 GB card based on the NVIDIA Ada Lovelace architecture or better. The NVIDIA L4 in the G6 family is the same Ada Lovelace generation with 24 GB and is AWS’s most cost-effective 24 GB GPU. We therefore recommend g6.8xlarge (128 GB RAM, one dedicated NVIDIA L4 24 GB GPU). On the single-GPU sizes of the G6 family, AWS passes the entire physical GPU through to the instance, so the full 24 GB is dedicated and cannot be reduced or “stolen” by another tenant or VM. CaraOne uses 8–12 of the instance’s vCPUs, with the remainder available as headroom.
Cost note: Because CaraOne is a workstation-class GPU workload, there is no need for the more expensive data-center-GPU instances (A100/H100/L40S). The L4-based G6 family is the lowest-cost EC2 option that still delivers a dedicated 24 GB CUDA GPU, and g6.8xlarge is less expensive than the equivalent A10G-based g5.8xlarge. The g5.8xlarge (NVIDIA A10G, 24 GB) remains a fully supported alternative if G6 capacity is unavailable in your region.
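The preference order described here (g6.8xlarge first, g5.8xlarge as the supported alternative) can be expressed as a small selection helper. This is a sketch under assumptions: `choose_instance_type` is a hypothetical name, and the list of offered types is assumed to come from whatever source you use to see what your region offers, for example the EC2 DescribeInstanceTypeOfferings API.

```python
from typing import Iterable, Optional

# Preference order taken from this article: G6 first, G5 as the alternative.
PREFERRED_INSTANCE_TYPES = ("g6.8xlarge", "g5.8xlarge")


def choose_instance_type(offered: Iterable[str]) -> Optional[str]:
    """Return the most preferred supported type that appears in `offered`.

    Returns None when neither supported type is offered.
    """
    offered_set = set(offered)
    for instance_type in PREFERRED_INSTANCE_TYPES:
        if instance_type in offered_set:
            return instance_type
    return None


# Example: a region (hypothetical list) that offers only the G5 size.
print(choose_instance_type(["m5.large", "g5.8xlarge"]))
```

Note that an instance type being offered in a region does not guarantee that capacity is available at launch time, so a launch attempt may still need to fall back to the alternative.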
Memory (RAM)
RAM is fixed at 128 GB. AWS does not oversubscribe memory on standard EC2 instances, so the full 128 GB is physically dedicated to your instance. This satisfies CaraOne’s requirement for dedicated, non-variable memory for its in-memory database.
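One way to confirm the memory size from inside the running Ubuntu instance is to read `MemTotal` from `/proc/meminfo`. The sketch below is not a Scale Logic tool; the function names are hypothetical, and the 5% tolerance is an assumption made because the kernel reserves some memory, so the reported total is normally a little below the nominal size.

```python
import re


def mem_total_gib(meminfo_text: str) -> float:
    """Extract MemTotal from /proc/meminfo text and convert it to GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    # /proc/meminfo reports values in units of 1024 bytes.
    return int(match.group(1)) / (1024 * 1024)


def has_expected_ram(meminfo_text: str,
                     expected_gib: float = 128.0,
                     tolerance: float = 0.05) -> bool:
    """True if reported memory is within `tolerance` of the expected size.

    The tolerance value is an assumption, not a CaraOne requirement.
    """
    return mem_total_gib(meminfo_text) >= expected_gib * (1 - tolerance)
```

On the instance itself this could be called as `has_expected_ram(open("/proc/meminfo").read())`.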
GPU
CaraOne will fail if GPU memory is variable or contended. On the recommended g6.8xlarge:
- The full physical NVIDIA L4 (Ada Lovelace, workstation-class) is passed through to the instance, providing the complete 24 GB of addressable GPU memory.
- GPU memory is fixed and dedicated — it is not time-sliced, oversubscribed, or shared with other VMs.
- The L4 is a CUDA-capable GPU fully supported under Ubuntu 22.04 with NVIDIA’s drivers and the CUDA toolkit.
Driver setup: The correct CUDA driver and toolkit are installed as part of the CaraOne software installation.
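Once the driver is in place, the GPU visible to the instance can be checked with `nvidia-smi`. The following is a minimal sketch, not part of CaraOne: the function names are hypothetical, and the default `min_mib` threshold is an assumption chosen because the driver usually reports somewhat less usable memory than the nominal 24 GB. Scale Logic may specify a different figure.

```python
import csv
import io
import subprocess
from typing import List, Tuple

QUERY_CMD = ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"]


def query_gpus() -> str:
    """Run nvidia-smi and return its raw CSV output (requires the driver)."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return result.stdout


def parse_gpu_query(output: str) -> List[Tuple[str, int]]:
    """Parse 'name, memory_in_MiB' lines into (name, MiB) tuples."""
    rows = csv.reader(io.StringIO(output), skipinitialspace=True)
    return [(row[0].strip(), int(row[1])) for row in rows if row]


def gpu_meets_requirement(output: str, min_mib: int = 22000) -> bool:
    """True if at least one GPU reports `min_mib` MiB or more.

    `min_mib` is an assumed tolerance below the nominal 24 GB.
    """
    return any(mem >= min_mib for _, mem in parse_gpu_query(output))
```

On the instance, `gpu_meets_requirement(query_gpus())` would perform the check. It reports the size seen by the driver and does not by itself prove that the GPU is unshared.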
Networking & storage
- Network: Enable the Elastic Network Adapter (ENA), which provides 10 Gbps or higher throughput and meets CaraOne’s 10 GbE-or-above requirement.
- Storage: Provision a 960 GB thin-provisioned EBS gp3 volume for the OS, the CaraOne database, and vector database data. (EBS volumes are thin-provisioned by default.)
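The storage and instance choices above could be captured as launch parameters in the shape of the EC2 RunInstances request (the same keys boto3’s `run_instances` accepts). This is a sketch under assumptions: `build_launch_params` is a hypothetical helper, the AMI ID is a placeholder you must supply, and the root device name `/dev/sda1` is assumed and should be checked against your chosen Ubuntu 22.04 AMI. No AWS call is made here.

```python
from typing import Any, Dict


def build_launch_params(ami_id: str,
                        instance_type: str = "g6.8xlarge",
                        root_device: str = "/dev/sda1") -> Dict[str, Any]:
    """Build a RunInstances-style parameter dict for a single instance.

    `root_device` is an assumption; confirm it for the AMI in use.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            {
                "DeviceName": root_device,
                # 960 GB gp3 volume, as recommended in this article.
                "Ebs": {"VolumeSize": 960, "VolumeType": "gp3"},
            }
        ],
    }


# Placeholder AMI ID for illustration only.
params = build_launch_params("ami-EXAMPLE")
print(params["InstanceType"], params["BlockDeviceMappings"][0]["Ebs"])
```

ENA is not set in this sketch because ENA support depends on the AMI and instance type rather than on these launch parameters.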
Comparison to the physical appliance
The dedicated physical CaraOne appliance delivers fixed, isolated hardware (one 4th Gen AMD EPYC processor, 128 GB ECC DDR5, workstation-class NVIDIA 24 GB GPU, 4 × 10 GbE). The g6.8xlarge configuration above reproduces the same performance-critical guarantees — dedicated cores, a dedicated 128 GB of memory, and a dedicated workstation-class 24 GB CUDA GPU — in the cloud, without purchasing dedicated hardware.