Capacity Planning

Capacity planning defines the resources required for a customer-controlled FactVerse environment before production rollout. Use this page for private container deployments on Kubernetes, OpenShift, customer VMs, or restricted networks.

The numbers below are initial planning bands for customer-side deployments. Validate final values against the release package, enabled modules, expected user concurrency, asset size, connector schedule, retention policy, and the customer's infrastructure standards.

Prerequisites

Decide the deployment model and container runtime first. Then confirm the product modules in scope, number of environments, identity method, source systems, expected users, client devices, data retention policy, backup target, monitoring platform, and change approval process.

Capacity workflow

Product deployment units

Image names and chart names are provided in the project delivery package. Use this table to understand the deployment units that usually need independent capacity planning.

Product scope	Typical deployment units	Capacity driver
FactVerse Platform baseline	Web console, API gateway, tenant and identity services, asset metadata service, database, cache or queue, object storage, ingress.	Concurrent users, asset metadata volume, API calls, authentication traffic, object storage growth.
DataMesh Inspector	Inspector API, work-order and inspection services, evidence upload service, notification jobs, optional DFS connector workers, ECM evidence storage.	Field users, inspection records, photo or video evidence, work-order synchronization, mobile upload bursts.
Data Fusion Services	DFS API, connector controller, connector workers, mapping and quality jobs, scheduler, queue, connector logs.	Connector count, sync frequency, batch size, source-system limits, retry rate, quality-rule volume.
FactVerse AI Agent	Agent API, workflow orchestrator, tool execution workers, retrieval or index service, approval queue, audit records.	Workflow concurrency, tool-call volume, document retrieval, scheduled automation, human approval backlog.
Enterprise Content Management	ECM API, document service, search or index service, object storage, approval workflow jobs.	Document count, file size, retention period, approval activity, search frequency.
Designer, asset preparation, and Physical AI	Asset service, model conversion worker, simulation or rendering worker, worker scratch storage, optional GPU worker.	Largest model size, conversion frequency, simulation jobs, SimReady asset preparation, rendering or physics workload.
Client applications	Web access, desktop clients, mobile clients, mixed-reality devices, field caches.	Download size, site bandwidth, device cache behavior, update cadence, offline package requirements.

Initial container sizing

These values are per replica or per worker unless the row states otherwise. Use requests for scheduler planning, because the scheduler places workloads based on requests, and use limits to cap consumption and protect the node. Raise limits only after measuring the validation workload.

Deployment unit	Initial request	Initial limit	Replicas or workers	I/O and storage notes
Web console or static frontend	0.1-0.25 vCPU, 256-512 MiB	0.5 vCPU, 1 GiB	2 replicas for production.	Low disk I/O. Cache static assets at ingress or customer CDN when allowed.
API gateway and lightweight APIs	0.5-1 vCPU, 1-2 GiB	2 vCPU, 4 GiB	2 replicas for production.	Watch p95 latency, error rate, and connection pool usage.
Product API services	1 vCPU, 2-4 GiB	4 vCPU, 8 GiB	2 replicas for production, more for high concurrency.	Sensitive to database latency and object storage access.
Tenant, identity, and admin services	0.5 vCPU, 1-2 GiB	2 vCPU, 4 GiB	2 replicas for production.	Keep SSO callback and session behavior stable during failover tests.
DFS connector worker	0.5-2 vCPU, 1-4 GiB	4 vCPU, 8 GiB	Start with 1 worker per connector group or schedule window.	Batch size and source-system latency usually dominate. Avoid overlapping large sync jobs.
AI Agent workflow worker	1-2 vCPU, 4-8 GiB	4 vCPU, 16 GiB	Start with 2 workers when scheduled workflows are enabled.	Queue depth, tool-call latency, retrieval latency, and approval backlog drive scaling.
ECM document and search service	1-2 vCPU, 2-8 GiB	4 vCPU, 16 GiB	2 API replicas; size index service separately.	Search index needs fast persistent storage and memory headroom.
Model conversion or asset processing worker	2-4 vCPU, 8-16 GiB	8 vCPU, 24-32 GiB	Start with 1-2 workers; isolate from API nodes for heavy assets.	Use fast local scratch storage. Large models can spike memory and temporary disk usage.
Simulation, rendering, or Physical AI worker	4-8 vCPU, 16-32 GiB	16 vCPU, 64 GiB	Size per project workload. Add GPU nodes when required by the delivery package.	Needs dedicated scratch storage and longer validation runs.
Cache or queue	1-2 vCPU, 2-4 GiB	4 vCPU, 8 GiB	Production should use a customer-approved HA pattern.	Monitor queue depth, memory eviction, and persistence mode.
Ingress controller	0.5-1 vCPU, 512 MiB-2 GiB	2 vCPU, 4 GiB	At least 2 replicas when cluster policy allows.	Size for TLS termination, upload size, and client download bursts.
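
As a rough illustration of using requests for scheduler planning, the sketch below sums CPU and memory requests across replicas. The per-replica values take the upper end of the request ranges in the table above with 2 replicas each. The `Unit` class, the unit names, and `total_requests` are hypothetical helpers for this example, not part of the delivery package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str               # hypothetical label, not a real image or chart name
    cpu_request: float      # vCPU per replica
    mem_request_gib: float  # GiB per replica
    replicas: int


def total_requests(units):
    """Return (total vCPU, total GiB) requested across all replicas."""
    cpu = sum(u.cpu_request * u.replicas for u in units)
    mem = sum(u.mem_request_gib * u.replicas for u in units)
    return cpu, mem


# Upper-end requests from the sizing table, 2 replicas per unit.
units = [
    Unit("web-console", 0.25, 0.5, 2),
    Unit("api-gateway", 1.0, 2.0, 2),
    Unit("product-api", 1.0, 4.0, 2),
    Unit("identity", 0.5, 2.0, 2),
    Unit("ingress", 1.0, 2.0, 2),
]

cpu, mem = total_requests(units)
print(f"{cpu:.2f} vCPU, {mem:.1f} GiB requested")
```

The total covers only the listed units. Workers, data services, and platform overhead still need to be added before comparing against node capacity.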

Environment sizing bands

Use these bands as starting points for customer-side planning. They are cluster-level or environment-level references, not a replacement for the release-specific values file.

Profile	Typical use	Compute baseline	Data services	Storage and I/O baseline
Single-node validation	Lab validation, training, configuration review, issue reproduction.	8-12 vCPU, 32-48 GiB RAM on one VM or node.	Local or customer-provided database and cache.	300-500 GiB SSD. Use for validation only.
Small production	One site, moderate users, limited connectors, standard Inspector or ECM workload.	3 worker nodes, each 8 vCPU and 32 GiB RAM, plus control-plane nodes per customer standard.	PostgreSQL 4 vCPU and 16 GiB RAM; cache or queue 2 vCPU and 4 GiB RAM.	Database SSD with at least 3,000 IOPS; object storage 1-2 TiB; worker scratch 100-200 GiB.
Standard production	Multiple sites or departments, regular DFS sync, AI Agent workflows, document and evidence retention.	3-5 worker nodes, each 16 vCPU and 64 GiB RAM.	PostgreSQL 8 vCPU and 32 GiB RAM; cache or queue 4 vCPU and 8 GiB RAM; search service 4 vCPU and 16 GiB RAM when enabled.	Database SSD with 6,000-10,000 IOPS; object storage 2-5 TiB; worker scratch 300-500 GiB.
Asset-heavy or Physical AI	Large models, frequent conversion, simulation, rendering, SimReady asset preparation, robotics training scenarios.	Standard production plus dedicated worker nodes with 16-32 vCPU and 64-128 GiB RAM. Add GPU nodes only when required.	PostgreSQL 8-16 vCPU and 32-64 GiB RAM; separate search or index capacity when retrieval is enabled.	Database SSD with 10,000+ IOPS; object storage 5 TiB or more; scratch storage 500 GiB or more with high sequential throughput.
High-control environment	Restricted network, offline package import, strict retention, separate validation and production paths.	Size production and validation separately. Keep spare capacity for offline upgrade validation.	Customer-managed HA database, cache or queue, internal registry, backup platform.	Add space for image archives, restore samples, logs, and release bundles.
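
One way to sanity-check a band is to compare the summed container requests with the worker pool while reserving headroom. The sketch below is a minimal example under stated assumptions: the 30% headroom default and the `fits_band` helper are illustrative, and the request total of 7.5 vCPU and 21 GiB is a hypothetical input.

```python
def fits_band(req_cpu, req_mem_gib, nodes, node_cpu, node_mem_gib, headroom=0.3):
    """Return True when requests fit the worker pool after reserving headroom.

    headroom is the fraction of capacity kept free (assumed value, not a
    product recommendation).
    """
    usable_cpu = nodes * node_cpu * (1 - headroom)
    usable_mem = nodes * node_mem_gib * (1 - headroom)
    return req_cpu <= usable_cpu and req_mem_gib <= usable_mem


# Small production band: 3 worker nodes, each 8 vCPU and 32 GiB RAM.
print(fits_band(7.5, 21.0, nodes=3, node_cpu=8, node_mem_gib=32))
```

The check ignores per-node fragmentation and node failure. A stricter variant would run the same test with one node removed.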

Storage and I/O recommendations

Storage area	Recommended class	Planning guidance	Monitor
Database volume	SSD or high-performance block storage.	Start with the IOPS band in the sizing profile. Keep enough free space for indexes, migrations, backup staging, and restore tests.	IOPS saturation, latency, slow queries, lock wait, connection pressure.
Object storage	Customer object storage or S3-compatible service.	Capacity should cover source files, converted assets, documents, evidence, generated reports, retained versions, and lifecycle buffers.	Growth rate, large-object latency, failed uploads, lifecycle cleanup, restore samples.
Worker scratch	Fast local SSD or high-throughput ephemeral volume.	Model conversion and simulation workers need temporary space separate from object storage. Plan scratch size from the largest expected model plus derived files.	Temporary disk pressure, conversion duration, worker eviction, failed jobs.
Search or index volume	SSD persistent volume.	Allocate memory and disk together. Rebuild time should fit the maintenance window.	Query latency, index size, rebuild time, memory pressure.
Logs and audit records	Customer log platform or retained persistent storage.	Size by retention policy and export volume. High-control projects usually need separate audit retention.	Log growth, dropped logs, retention pressure, query time.
Backup target	Customer backup platform or object storage tier.	Backup throughput must fit the maintenance window. Include database, object storage, configuration, and release package evidence.	Backup duration, failed backup, restore duration, incomplete protected asset list.
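
The rule that backup throughput must fit the maintenance window reduces to simple arithmetic. The sketch below assumes a full backup of the whole protected volume with no compression or deduplication. The function names and the 2 TiB and 4 hour inputs are hypothetical.

```python
def required_throughput_mib_s(protected_gib, window_hours):
    """Sustained MiB/s needed to copy protected_gib within window_hours."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return protected_gib * 1024 / (window_hours * 3600)


def backup_fits(protected_gib, window_hours, target_mib_s):
    """True when the backup target's sustained throughput covers the need."""
    return target_mib_s >= required_throughput_mib_s(protected_gib, window_hours)


# Example: 2 TiB of protected data and a 4 hour maintenance window.
need = required_throughput_mib_s(2048, 4)
print(f"required: {need:.1f} MiB/s")
print("fits at 200 MiB/s:", backup_fits(2048, 4, 200))
```

Incremental backups lower the requirement, but restore tests usually move the full volume, so the same calculation is useful for the restore duration.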

I/O planning rules

Put database volumes on SSD-class storage with predictable latency.
Keep object storage optimized for large sequential upload and download traffic.
Give model conversion and simulation workers dedicated scratch storage so temporary files do not compete with database I/O.
Schedule large DFS sync jobs, model conversions, backups, and search reindexing in separate windows when the environment is small.
Track storage growth by data class: models, converted assets, documents, inspection evidence, logs, database, and backups.
Keep at least one restore sample in the validation plan before accepting the capacity baseline.
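
Tracking growth by data class makes it possible to estimate when a storage tier fills up. The sketch below is a minimal linear projection. The class names follow the list above, but every number and the `months_until_full` helper are hypothetical, and real growth is rarely perfectly linear.

```python
def months_until_full(current_gib, monthly_growth_gib, capacity_gib):
    """Estimate months until capacity is reached, assuming linear growth.

    current_gib and monthly_growth_gib map data class -> GiB.
    Returns 0.0 if already full and None if there is no net growth.
    """
    used = sum(current_gib.values())
    growth = sum(monthly_growth_gib.values())
    if used >= capacity_gib:
        return 0.0
    if growth <= 0:
        return None
    return (capacity_gib - used) / growth


# Hypothetical object-storage figures in GiB.
current = {"models": 200, "converted_assets": 150, "documents": 100,
           "inspection_evidence": 250, "logs": 50}
growth = {"models": 20, "converted_assets": 15, "documents": 5,
          "inspection_evidence": 50, "logs": 10}

print(months_until_full(current, growth, capacity_gib=2048))
```

Keeping the inputs per data class shows which class dominates growth, which is usually where a retention or lifecycle change has the most effect.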

Inputs

Input	Required detail	Capacity impact
Environments	Production, validation, training, disaster recovery, and lab environments.	Determines total cluster, VM, storage, backup, and monitoring footprint.
User workload	Named users, active users, peak concurrent sessions, user groups, site time zones, client types.	Drives web/API replicas, session load, network throughput, and support windows.
Scene and asset workload	Number of scenes, largest model size, model conversion frequency, media files, downloads, field-device cache behavior.	Drives object storage, model processing workers, cache, and backup volume.
DFS and integration workload	Source systems, connector count, sync cadence, batch size, retry policy, write-back requirements.	Drives connector workers, queue depth, database I/O, network routes, and source-system limits.
High-rate telemetry workload	Field protocol point count, sampling interval, burst behavior, retention window, and query pattern.	Drives time-series ingestion capacity, durable queue sizing, storage retention, and dashboard query load.
AI Agent workload	Workflow concurrency, tool-call volume, document retrieval, scheduled automation, approval queues.	Drives worker concurrency, queue capacity, database load, and optional private inference capacity.
Simulation or Physical AI workload	Simulation jobs, asset preparation, rendering, physics validation, robotics or training scenarios when in scope.	May require dedicated worker nodes, GPU-enabled nodes, larger storage, and longer validation windows.
ECM and evidence workload	Documents, SOPs, images, inspection evidence, audit records, retention period.	Drives object storage, database records, index size, backup window, and restore test scope.
Operations policy	Availability target, maintenance window, log retention, backup frequency, recovery objective.	Drives redundancy, monitoring, log storage, backup infrastructure, and restore process.
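
For the high-rate telemetry input, point count and sampling interval translate directly into an ingestion rate and a raw retention volume. The sketch below assumes steady sampling with no bursts and an uncompressed 16 bytes per sample. Both the byte size and the helper names are illustrative assumptions, not product figures.

```python
def samples_per_second(point_count, sampling_interval_s):
    """Steady-state ingestion rate for point_count points sampled every interval."""
    if sampling_interval_s <= 0:
        raise ValueError("sampling_interval_s must be positive")
    return point_count / sampling_interval_s


def retention_gib(point_count, sampling_interval_s, retention_days, bytes_per_sample=16):
    """Raw storage in GiB for the retention window (bytes_per_sample is assumed)."""
    rate = samples_per_second(point_count, sampling_interval_s)
    total_bytes = rate * 86_400 * retention_days * bytes_per_sample
    return total_bytes / 2**30


# Example: 50,000 points sampled every 10 seconds, retained for 30 days.
print(samples_per_second(50_000, 10))
print(round(retention_gib(50_000, 10, 30), 1))
```

Burst behavior and the query pattern still need separate measurement, since this estimate covers only the average write path.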

Planning steps

Select the sizing band that matches the product scope and expected workload.
Map enabled products to deployment units and identify which units need dedicated workers.
Fill the sizing worksheet for users, scenes, assets, integrations, AI Agent workflows, ECM documents, and retention requirements.
Define CPU requests, memory requests, limits, replicas, storage classes, persistent volumes, and namespace quotas.
Define database IOPS, object storage capacity, worker scratch size, search index size, log retention, and backup target throughput.
Separate steady workloads from burst workloads such as model conversion, scheduled synchronization, batch import, search indexing, and simulation jobs.
Define scaling triggers for replicas, worker count, storage expansion, database tuning, connector scheduling, and backup windows.
Run a validation workload with representative users, source records, scenes, documents, and client devices.
Include a high-rate telemetry validation run when BACnet, Modbus, SNMP, sensor, or streaming connector workloads are in scope.
Record the baseline capacity, known assumptions, headroom, review cadence, and owner for each resource domain.
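
Scaling triggers from the steps above can be recorded as explicit thresholds so they are reviewable alongside the baseline. The sketch below is one possible shape. The metric names, threshold values, and actions are all hypothetical and should come from the validation workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    metric: str       # hypothetical metric name from the monitoring platform
    threshold: float  # fire when the measurement is at or above this value
    action: str       # agreed response recorded in the capacity baseline


def fired(triggers, measurements):
    """Return the actions whose metric meets or exceeds its threshold."""
    return [
        t.action
        for t in triggers
        if measurements.get(t.metric, 0.0) >= t.threshold
    ]


triggers = [
    Trigger("api_cpu_utilization", 0.70, "add an API replica"),
    Trigger("queue_depth", 1000, "add a worker"),
    Trigger("object_storage_used_ratio", 0.80, "expand object storage"),
]

measurements = {
    "api_cpu_utilization": 0.82,
    "queue_depth": 120,
    "object_storage_used_ratio": 0.85,
}
print(fired(triggers, measurements))
```

Metrics missing from the measurements are treated as zero here. A production check would more likely flag a missing metric as its own alert.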

Sizing worksheet

Worksheet item	Record
Peak concurrent users	Business peak, site peak, client type, expected growth, validation sample.
Largest operational scene	Scene size, asset count, media count, target devices, download behavior.
Integration schedule	Source system, sync frequency, batch size, allowed window, retry policy.
AI Agent concurrency	Workflow type, scheduled runs, manual runs, tool-call volume, approval queue.
Storage growth	Object storage growth, database growth, log growth, retention period.
Backup and restore	Backup frequency, backup window, restore objective, restore sample set.
High availability	Replica policy, node spread, database availability pattern, maintenance window.
Optional GPU workload	Simulation, rendering, model processing, private inference, validation runtime.

Validation checklist

Representative users can complete target workflows during the expected peak window.
Connector jobs finish inside the approved synchronization window.
Model conversion, asset loading, and document access meet acceptance expectations.
AI Agent workflows and approval queues do not create unbounded backlog.
Database, queue, cache, and object storage metrics remain within the agreed operating range.
Backup completes inside the approved window and restore sampling succeeds.
Alerts exist for CPU, memory, pod restart, queue backlog, database connection pressure, storage growth, and backup failure.
The customer owner has approved the baseline and review cadence.
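
The backlog item in the checklist can be checked with a simple rate comparison: a queue stays bounded only while work arrives more slowly than the workers complete it. The sketch below assumes steady average rates measured during the validation run. The helper names and the example rates are hypothetical.

```python
def worker_utilization(arrivals_per_hour, workers, completions_per_worker_hour):
    """Fraction of total worker capacity consumed by incoming work."""
    capacity = workers * completions_per_worker_hour
    if capacity <= 0:
        raise ValueError("worker capacity must be positive")
    return arrivals_per_hour / capacity


def backlog_is_bounded(arrivals_per_hour, workers, completions_per_worker_hour):
    """True when average arrivals stay below average completion capacity."""
    return worker_utilization(
        arrivals_per_hour, workers, completions_per_worker_hour
    ) < 1.0


# Example: 120 workflows per hour, 2 workers, 80 completions per worker per hour.
print(worker_utilization(120, 2, 80))
print(backlog_is_bounded(120, 2, 80))
```

Utilization close to 1.0 still produces long waits during bursts, so the validation run should also record peak queue depth, not only the averages.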

Expected result

The expected output is a capacity baseline that includes product deployment units, workload assumptions, initial resource requests and limits, database and storage I/O assumptions, backup estimates, scaling triggers, validation evidence, and owners for future capacity reviews.

Troubleshooting capacity gaps

Symptom	Check
Users report slow pages during peak hours	Concurrent sessions, ingress capacity, API replicas, database latency, cache hit rate.
Connector jobs miss the sync window	Source-system limits, batch size, worker count, queue depth, retry policy, network route.
Model or asset tasks take too long	Worker resources, asset size, storage throughput, conversion queue, optional GPU worker need.
Storage grows faster than expected	Retention policy, duplicate uploads, log retention, imported file lifecycle, backup copies.
Backup overruns the maintenance window	Protected asset list, object storage volume, database size, backup target throughput, schedule.
Resource requests block deployment	Namespace quota, node capacity, storage class availability, OpenShift project limits, cluster policy.

Related pages

Use Deployment Models to choose the customer-side deployment pattern.
Use Container Deployment to implement the runtime.
Use Environment Readiness to prepare owners, network, identity, and support inputs.
Use Operations and Maintenance to review capacity after go-live.
