Introduction

SF Archive is a software‑defined archive platform designed for secure, long‑term preservation and management of digital assets across tape, disk, and object storage. It is built on the Microsoft .NET Core runtime and a relational database backend, and is deployed on modern 64‑bit Linux servers (e.g., Ubuntu 24.04 and later).

This document provides an overview of SF Archive’s architecture and the mechanisms it uses to help protect data integrity, control access to archival resources, and support secure operations in on‑premises and hybrid environments.


SF Archive Components

SF Archive consists of a small set of services that cooperate through a central database and standardized interfaces:

  • Control‑plane services for administration, API access, job orchestration, and workflow management.
  • Data‑plane services for actual data movement, tape robotics control, and storage scanning.


Control‑Plane Services

  • Web User Interface (NxUI)
    • Browser‑based management portal for monitoring and administering the archive system.
    • Provides views into system status, current and historical jobs, workflows, servers, groups, volumes, and drives.
    • Supports drill‑down into jobs and objects for troubleshooting and operational review.
  • API Services (NxREST and NxSocket)
    • NxREST exposes a REST‑style interface for integration with asset management systems and automation tools.
    • NxSocket provides a socket‑based XML interface compatible with commonly used legacy archive integrations.
    • These services accept authenticated requests to create, monitor, and manage archive and restore operations (see the sketch after this list).
  • Job Orchestrator (NxQueue)
    • Evaluates pending work and manages how jobs are scheduled across available servers, drives, and storage.
    • Enforces resource constraints such as media availability, drive type, free capacity, and configured policies.
    • Keeps a persistent record of job state in the database, enabling consistent behavior across restarts and failover scenarios.
  • Workflow Engine (NxWorkflow / NxWorkflowEngine)
    • Executes policies for scanning, copying, mirroring, and migrating data between source and target storage locations.
    • Uses checksums and metadata to detect changes, identify new or modified objects, and drive lifecycle actions.
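
For illustration, the following Python sketch shows how an automation tool might drive NxREST. The host, port, endpoint paths, payload fields, and credentials are assumptions made for this sketch, not the documented API surface:

    import requests

    # Hypothetical NxREST endpoint; host, port, paths, and payload fields
    # are illustrative assumptions, not the documented API surface.
    BASE_URL = "https://sfarchive.example.internal:8443/api"

    def submit_archive_job(session: requests.Session, asset_path: str, target_group: str) -> str:
        """Submit an archive job and return its job identifier."""
        response = session.post(
            f"{BASE_URL}/jobs",
            json={"type": "archive", "source": asset_path, "targetGroup": target_group},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()["jobId"]

    def poll_job(session: requests.Session, job_id: str) -> str:
        """Fetch the current state of a job (e.g., queued, running, done)."""
        response = session.get(f"{BASE_URL}/jobs/{job_id}", timeout=30)
        response.raise_for_status()
        return response.json()["state"]

    if __name__ == "__main__":
        with requests.Session() as s:
            s.auth = ("api-user", "api-secret")   # placeholder credentials
            job_id = submit_archive_job(s, "/mnt/media/project-x", "lto9-pool")
            print(job_id, poll_job(s, job_id))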


Data‑Plane Services

  • Core Data Mover (NxCore)
    • Performs the actual transfer of data between filesystems, tape volumes, and object storage targets.
    • Reads job definitions and parameters from the database, executes work, and continuously updates status and metrics.
    • Integrates with the tape subsystem and filesystem drivers provided by the underlying operating system.
  • Tape and Robotics Controller (NxChanger)
    • Manages tape libraries, including media loading/unloading, slot management, and drive selection.
    • Ensures only compatible tapes and drives are paired, and enforces media usage states (e.g., read‑only volumes).
  • SCSI / Storage Scanner (NxScsi)
    • Detects and inventories attached devices (tape drives, changers, and volumes).
    • Updates the central database to reflect current hardware and media state, which is then used by the orchestrator to plan jobs.


Data Integrity and Protection

SF Archive is designed around the principle that archive data must remain recoverable and verifiably consistent over time. It provides several mechanisms that help protect data integrity.


Structured Metadata and Multi‑Layered Model

SF Archive maintains a structured, database‑backed representation of archives:

  • Archives contain one or more assets.
  • Assets are logical containers for related objects (files).
  • Objects maintain metadata such as names, sizes, locations, and volume associations.

This model is stored in a small number of relational tables and is independent of any single storage technology. It allows SF Archive to:

  • Track where each object is physically stored (tape, disk, or object storage).
  • Reconstruct asset and archive membership even after hardware changes.
  • Support migration between different storage tiers and technologies without losing logical relationships.
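
A minimal sketch of this three‑layer model, using Python dataclasses in place of the relational tables (the field names are assumptions for illustration, not the actual schema):

    from dataclasses import dataclass, field

    # Illustrative in-memory mirror of the archive/asset/object hierarchy;
    # in SF Archive this lives in relational tables, and the field names
    # here are assumptions made for this sketch.

    @dataclass
    class ArchivedObject:
        name: str
        size_bytes: int
        volume_id: str        # tape, disk, or object-storage volume
        location: str         # position or key within the volume
        checksum: str

    @dataclass
    class Asset:
        asset_id: str
        objects: list[ArchivedObject] = field(default_factory=list)

    @dataclass
    class Archive:
        archive_id: str
        assets: list[Asset] = field(default_factory=list)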


Format‑Aware Media Handling

SF Archive works with the on‑media formats commonly used in archiving environments (for example, LTFS and vendor‑specific formats such as AXF). Each volume is tagged in the database with the format in which it was written.

During restore operations, the system:

  • Identifies the format of the target volume from its metadata.
  • Uses the appropriate read mode for that format.

This reduces the risk of misinterpreting media layouts and helps ensure that restored data matches what was originally written.
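
Conceptually, the restore path dispatches on the recorded format, along the lines of this sketch (the reader classes are placeholders, not SF Archive internals):

    # Sketch of format-aware restore dispatch; the reader classes are
    # placeholders standing in for real LTFS/AXF read paths.

    class LtfsReader:
        def read(self, object_location: str) -> bytes: ...

    class AxfReader:
        def read(self, object_location: str) -> bytes: ...

    READERS = {"LTFS": LtfsReader, "AXF": AxfReader}

    def open_reader(volume_format: str):
        """Pick the read mode matching the format recorded for the volume."""
        try:
            return READERS[volume_format]()
        except KeyError:
            raise ValueError(f"unknown on-media format: {volume_format!r}")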


Checksum‑Based Detection in Workflows

The workflow engine uses checksums as part of its indexing and change‑detection logic:

  • When scanning a filesystem or storage location, it calculates checksums for discovered objects.
  • It compares these values against previously recorded checksums to detect changes.
  • Based on these comparisons, workflows decide which objects need to be copied, mirrored, or migrated.

This supports:

  • Detection of modified or newly created files.
  • Consistent behavior across copy, mirror, and migration workflows.
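
A minimal sketch of this change‑detection loop, assuming SHA‑256 checksums and a dictionary standing in for the previously recorded values:

    import hashlib
    from pathlib import Path

    def file_checksum(path: Path) -> str:
        """Stream a file through SHA-256 (algorithm assumed for this sketch)."""
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def detect_changes(root: Path, recorded: dict[str, str]) -> list[Path]:
        """Return files that are new or whose checksum differs from the record."""
        changed = []
        for path in root.rglob("*"):
            if not path.is_file():
                continue
            if recorded.get(str(path)) != file_checksum(path):
                changed.append(path)   # candidate for copy/mirror/migrate
        return changed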


Media State and Safe Usage

Each storage volume is tracked with an operational state, for example:

  • Enabled – available for both read and write.
  • Read‑only – permitted for restores but excluded from new write operations.
  • Disabled / Unknown – not considered for any job.

The orchestrator and core data mover honor these states automatically. This provides:

  • Protection for “frozen” or regulatory volumes that must not be modified.
  • A reliable mechanism to mark problematic or out‑of‑service media as unavailable.
  • A consistent way to prevent accidental overwrites.
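
The effect of these states can be pictured as a simple guard, as in this sketch:

    from enum import Enum

    class VolumeState(Enum):
        ENABLED = "enabled"       # read and write
        READ_ONLY = "read-only"   # restores only
        DISABLED = "disabled"     # excluded from all jobs
        UNKNOWN = "unknown"

    def may_write(state: VolumeState) -> bool:
        """Only enabled volumes accept new write operations."""
        return state is VolumeState.ENABLED

    def may_read(state: VolumeState) -> bool:
        """Enabled and read-only volumes may serve restores."""
        return state in (VolumeState.ENABLED, VolumeState.READ_ONLY)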


Robust Logging and Failure Analysis

For each job, SF Archive writes detailed logs that include:

  • Job identifiers, timestamps, and involved resources.
  • Operational messages and error information from the archival services and underlying filesystems or drivers.

Logs are stored in a structured directory hierarchy by year and month, and each run of a job is logged separately. This supports:

  • Post‑incident analysis and auditing of archive and restore operations.
  • Troubleshooting of media, hardware, and configuration issues.
  • Reconstruction of job history without overwriting prior logs.
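
The per‑run layout can be pictured as follows; the log root and file‑name pattern are assumptions made for this sketch:

    from datetime import datetime, timezone
    from pathlib import Path

    LOG_ROOT = Path("/var/log/sfarchive")   # assumed root for this sketch

    def job_log_path(job_id: str, run_started: datetime) -> Path:
        """Place each job run in its own file under a year/month hierarchy."""
        return (LOG_ROOT
                / f"{run_started:%Y}"
                / f"{run_started:%m}"
                / f"job-{job_id}-{run_started:%Y%m%dT%H%M%SZ}.log")

    print(job_log_path("42871", datetime.now(timezone.utc)))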


Authentication, Authorization, and Access Control

SF Archive’s architecture separates user‑facing access control from the internal storage authentication required to access back‑end systems.


User Access to SF Archive

The SF Archive database includes tables that store user information and allow the application to authorize access to administrative functions and views. While the exact identity provider and login mechanism can be customized per deployment, common patterns include:

  • Restricting access to the Web UI and APIs at the network level (VPN, private networks, firewall rules).
  • Mapping SF Archive user records to roles that control which UI functions and administrative actions are permitted (sketched below).

This layered approach allows SF Archive to be deployed in environments that require:

  • Administrative separation of duties.
  • Role‑based access to configuration and operations.
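
As an illustration of the role mapping described above, the following sketch shows one way roles could gate actions; the role names and permissions are assumptions, not SF Archive’s actual authorization model:

    # Illustrative role-to-permission mapping; the role names and actions
    # are assumptions made for this sketch.

    ROLE_PERMISSIONS = {
        "viewer":   {"view_status", "view_jobs"},
        "operator": {"view_status", "view_jobs", "run_restore"},
        "admin":    {"view_status", "view_jobs", "run_restore", "configure"},
    }

    def is_permitted(user_roles: set[str], action: str) -> bool:
        """Allow an action if any of the user's roles grants it."""
        return any(action in ROLE_PERMISSIONS.get(role, set())
                   for role in user_roles)

    assert is_permitted({"operator"}, "run_restore")
    assert not is_permitted({"viewer"}, "configure")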


Storage Credential Management

Access to external storage systems (e.g., NAS, object stores) is handled via dedicated configuration entries:

  • Storage locations (such as S3 buckets or NAS paths) are defined centrally.
  • Authentication details (such as access keys or account usernames) are stored in controlled database tables designed for this purpose.
  • Archive services retrieve only the credentials they need at runtime, based on the job and target (sketched below).

This design keeps storage credentials in a single, auditable place and avoids embedding sensitive information in scripts, ad‑hoc tools, or per‑node configuration files. It also supports:

  • Separation between user identities and storage identities.
  • Alignment with external secret‑management practices if a customer wishes to integrate them.
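
The runtime lookup might resemble this sketch, with sqlite3 standing in for the actual relational backend and placeholder table and column names:

    # Sketch of per-job credential retrieval from the central database.
    # Table and column names are placeholders for illustration.

    import sqlite3   # stand-in for the actual relational backend

    def credentials_for_target(conn: sqlite3.Connection, target_id: str) -> dict:
        """Fetch only the credentials the current job's target requires."""
        row = conn.execute(
            "SELECT endpoint, access_key, secret_key "
            "FROM storage_credentials WHERE target_id = ?",
            (target_id,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no credentials configured for {target_id!r}")
        return {"endpoint": row[0], "access_key": row[1], "secret_key": row[2]}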


Network and Service Security

SF Archive is intended to run inside a customer’s secure infrastructure and to be protected by existing network and perimeter controls. Its architecture supports standard hardening approaches without requiring public network exposure.


Service Connectivity

Typical deployments include the following service endpoints:

  • Web UI – HTTP port used for administration and monitoring.
  • REST API – HTTP endpoint used by asset management systems and orchestration tools.
  • Socket API – TCP port used by certain legacy integrations.

Exact port numbers and exposure patterns can be configured by the customer or integrator. Common security practices include:

  • Restricting access to management and API ports to trusted subnets or VPNs.
  • Terminating TLS at a reverse proxy or load balancer in front of SF Archive services.
  • Monitoring API usage via existing logging and SIEM infrastructure.


Database Connectivity

The archive database runs either on one of the SF Archive nodes or on a dedicated server. Configuration parameters include:

  • Database host, database name, and service account credentials.
  • Options to require encrypted connections (e.g., SSL/TLS mode; see the sketch below).

Best practice is to:

  • Restrict database access to SF Archive servers only.
  • Use strong credentials and, where required, encrypted connections between application servers and the database.
  • Place the database on a hardened host with appropriate operating system security and backup policies.
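
Assuming a PostgreSQL backend purely for illustration (the actual backend may differ), requiring an encrypted application‑to‑database connection could look like this:

    import psycopg2   # illustrative; assumes a PostgreSQL backend

    # sslmode="verify-full" (or "require") with a CA bundle enforces an
    # encrypted, verified connection between application servers and the
    # database. Host, names, and paths are placeholders.
    conn = psycopg2.connect(
        host="db.archive.internal",
        dbname="sfarchive",
        user="sfarchive_svc",
        password="change-me",          # placeholder; use a managed secret
        sslmode="verify-full",
        sslrootcert="/etc/sfarchive/db-ca.pem",
    )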


Cluster and Device Communication

When configured as a cluster, multiple SF Archive nodes communicate with the shared database and with attached storage devices. Device discovery and tape handling are based on:

  • Standard system interfaces (e.g., SCSI) exposed by the operating system.
  • Controlled scanning and updating of device and media state in the database.

Security controls at this layer are typically provided by:

  • Host‑level access controls and service isolation on each SF Archive node.
  • Physical and logical security of the data center or machine rooms housing the tape libraries and servers.
  • Network segregation between archive infrastructure and general user networks.


Data Movement and Storage Security

SF Archive itself does not replace the security mechanisms of the underlying storage systems; rather, it is designed to work with them in a secure manner.


Tape and Library Operations

  • Tape libraries and drives are managed centrally by the tape controller service.
  • Device access is limited to the SF Archive services running on authorized nodes.
  • Media states (enabled, read‑only, disabled) are enforced before any data operation occurs.
  • In the event of tape or filesystem inconsistencies, SF Archive can invoke integrity checks on the media and report results via logs and administrative interfaces.

These capabilities help ensure that:

  • Only expected volumes are loaded and manipulated.
  • Read‑only or retired volumes are not inadvertently written to.
  • Operational issues affecting tape readability are detected and can be remediated.


Disk, NAS, and Object Storage

For disk‑based and network‑attached storage:

  • SF Archive relies on the host operating system’s mounting and permission model to access filesystems.
  • Typically, access is mediated by system accounts, export rules, and directory permissions configured by the customer.

For object storage:

  • SF Archive uses credentials and endpoints defined in its configuration to access buckets or containers.
  • Communications with these endpoints normally occur over secure protocols as provided by the underlying platform (sketched below).

In both cases, SF Archive respects:

  • Access controls enforced by the storage systems themselves.
  • Any encryption or key management policies configured at the storage or infrastructure level.
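
For an S3‑compatible target, access under centrally defined credentials and an HTTPS endpoint might resemble this sketch (the endpoint, bucket, keys, and paths are placeholders):

    import boto3

    # Endpoint and credentials would come from SF Archive's central
    # configuration in practice; the literal values here are placeholders.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://objects.example.internal",
        aws_access_key_id="ARCHIVE-KEY",
        aws_secret_access_key="ARCHIVE-SECRET",
    )

    # Writes and reads remain subject to the bucket's own access policy.
    s3.upload_file("/mnt/staging/asset-001.mxf", "archive-bucket", "asset-001.mxf")
    s3.download_file("archive-bucket", "asset-001.mxf", "/mnt/restore/asset-001.mxf")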


Operational Visibility and Governance

SF Archive supports operational visibility and governance through its UI, database, and logging:

  • The Web UI displays aggregate system health metrics (such as capacity usage and job status), along with detailed job and workflow histories.
  • The database maintains a history of jobs, media, and workflow executions, which can be queried for reporting and compliance.
  • Per‑job logs provide detailed insight into how data was moved, what media was involved, and what errors (if any) occurred.

Organizations can integrate SF Archive with their existing governance processes by:

  • Using database views or exports to feed reporting systems (an example query is sketched below).
  • Ingesting SF Archive logs into centralized logging or SIEM solutions.
  • Incorporating SF Archive job histories into change‑management and audit workflows.
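
As one example, a reporting system could pull job history with a query along these lines (the table and column names are assumptions, with sqlite3 standing in for the actual backend):

    import sqlite3   # stand-in for the actual relational backend

    def failed_jobs_last_month(conn: sqlite3.Connection) -> list[tuple]:
        """Illustrative compliance query; schema names are placeholders."""
        return conn.execute(
            "SELECT job_id, job_type, started_at, error_text "
            "FROM job_history "
            "WHERE state = 'failed' "
            "  AND started_at >= date('now', '-1 month') "
            "ORDER BY started_at"
        ).fetchall()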


Conclusion

SF Archive provides a flexible archive platform designed for secure operation within a customer’s infrastructure. It combines:

  • A modern service‑oriented architecture.
  • A structured metadata model for archives, assets, and objects.
  • Built‑in mechanisms for integrity checking, media state control, and robust logging.
  • A clear separation between user access, storage credentials, and internal system operations.