Shock Configuration Guide

This document provides a comprehensive guide to configuring the Shock server, including all configuration options, file formats, and command-line arguments.

Table of Contents

  1. Overview
  2. Configuration File
  3. Command-line Arguments
  4. Locations Configuration
  5. Types Configuration
  6. Data Migration and Caching
  7. Restore Functionality
  8. Examples

Overview

The Shock configuration system consists of several components:

  1. Main Configuration File: An INI-format file (typically shock-server.conf) that contains the core server settings
  2. Locations.yaml: Defines storage locations for data migration and caching
  3. Types.yaml: Defines node types and their priorities
  4. Command-line Arguments: Override settings in the configuration files

Configuration files are typically located in the /etc/shock.d/ directory; a different main configuration file can be specified with the -conf command-line argument.

Configuration File

The main configuration file uses INI format with sections and key-value pairs. Below are the available sections and options:

[Admin]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| email | string | "" | Administrator email address |
| users | string | "" | Comma-separated list of admin users |

[Anonymous]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| read | bool | true | Allow anonymous read access |
| write | bool | true | Allow anonymous write access |
| delete | bool | true | Allow anonymous delete access |

[Address]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| api-ip | string | "0.0.0.0" | IP address to bind the API server |
| api-port | int | 7445 | Port for the API server |

[External]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| api-url | string | "http://localhost" | External URL for the API |

[Auth]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| basic | bool | false | Enable basic authentication |
| globus_token_url | string | "" | Globus token URL for authentication |
| globus_profile_url | string | "" | Globus profile URL for authentication |
| oauth_urls | string | "" | Comma-separated list of OAuth URLs |
| oauth_bearers | string | "" | Comma-separated list of OAuth bearers |
| cache_timeout | int | 60 | Authentication cache timeout in minutes |
| use_auth | bool | true | Enable authentication (disable for debugging) |

[Runtime]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| expire_wait | int | 60 | Wait time for reaper in minutes |
| GOMAXPROCS | string | "" | Number of CPU cores to use (empty uses Go default) |
| max_revisions | int | 3 | Maximum number of node revisions to keep (values < 0 mean keep all) |

[Log]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| perf_log | bool | false | Enable performance logging |
| rotate | bool | true | Enable log rotation |
| logoutput | string | "both" | Log output destination: "console", "file", or "both" |
| trace | bool | false | Enable trace logging |
| debuglevel | int | 0 | Debug level (0-3) |

[Mongodb]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| attribute_indexes | string | "" | Comma-separated list of attribute indexes |
| database | string | "ShockDB" | MongoDB database name |
| hosts | string | "mongo" | MongoDB host(s) |
| password | string | "" | MongoDB password |
| user | string | "" | MongoDB username |

[Node-Indices]

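Custom node indices can be defined in this section. Each entry maps an index name to a comma-separated list of option:value pairs; the example below uses the options unique, dropDups, and sparse.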

Example:

[Node-Indices]
name=unique:true,dropDups:true,sparse:false
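To make the entry format concrete, the following is a minimal sketch of how such a line could be split into an index name and boolean options. It only illustrates the syntax shown above; the function name parse_index_options is hypothetical and this is not Shock's own parser.

```python
def parse_index_options(line: str) -> tuple[str, dict[str, bool]]:
    """Split 'name=opt:bool,opt:bool' into (name, {opt: bool}).

    Hypothetical helper for illustration; not part of Shock.
    """
    name, _, raw = line.partition("=")
    options: dict[str, bool] = {}
    for item in raw.split(","):
        if not item.strip():
            continue  # ignore empty segments such as a trailing comma
        key, _, value = item.partition(":")
        options[key.strip()] = value.strip().lower() == "true"
    return name.strip(), options


index_name, index_options = parse_index_options(
    "name=unique:true,dropDups:true,sparse:false"
)
print(index_name, index_options)
```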

[Paths]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| site | string | "/usr/local/shock/site" | Path to site files |
| data | string | "/usr/local/shock/data" | Path to data files |
| logs | string | "/var/log/shock" | Path to log files |
| local_paths | string | "/var/tmp" | Path to local temporary files |
| pidfile | string | "" | Path to PID file |

[Cache]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| cache_path | string | "" | Path to cache directory. If set, the system will function as a cache |
| cache_ttl | string | "24H" | Time before cached files are eligible for eviction. Formats: 30M (minutes), 24H (hours), 7D (days) |
| auto_upload | bool | false | Automatically upload files to the default remote location after creation |
| default_location | string | "" | Location ID (from Locations.yaml) used as the target for auto-upload |
| upload_workers | int | 3 | Number of concurrent upload workers for auto-upload |
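The cache_ttl format (a number followed by M, H, or D) can be illustrated with a small parser. This is a sketch under the assumption that only the three documented suffixes are valid and that they are upper-case; parse_cache_ttl is a hypothetical name, not a Shock function.

```python
import re
from datetime import timedelta

# Documented suffixes: M = minutes, H = hours, D = days.
_UNITS = {"M": "minutes", "H": "hours", "D": "days"}


def parse_cache_ttl(value: str) -> timedelta:
    """Convert a value such as '30M', '24H' or '7D' into a timedelta."""
    match = re.fullmatch(r"(\d+)([MHD])", value.strip())
    if match is None:
        raise ValueError(f"unsupported cache_ttl value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_cache_ttl("24H"))
```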

[Migrate]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| min_replica_count | int | 2 | Minimum number of locations required before enabling local Node file deletion |
| node_migration | bool | false | Enable node migration to remote locations |
| node_data_removal | bool | false | Enable removal of data for nodes with at least MIN_REPLICA_COUNT copies |
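The interaction between node_data_removal and min_replica_count can be read as a simple predicate. The sketch below encodes that reading; the function name and the replica_count argument are hypothetical, and the real check inside Shock may consider more state.

```python
def may_remove_local_data(
    replica_count: int,
    node_data_removal: bool,
    min_replica_count: int = 2,
) -> bool:
    """Return True if a node's local file could be deleted.

    Assumes removal needs node_data_removal enabled and at least
    min_replica_count copies of the node in remote locations.
    """
    return node_data_removal and replica_count >= min_replica_count


print(may_remove_local_data(replica_count=2, node_data_removal=True))
```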

[SSL]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enable | bool | false | Enable SSL |
| key | string | "" | Path to SSL key file |
| cert | string | "" | Path to SSL certificate file |

[Other]

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| reload | string | "" | Path or URL to shock data (WARNING: this will drop all current data) |
| conf | string | "shock-server.conf" | Path to config file |
| no_config | bool | false | Do not use config file |
| force_yes | bool | false | Force yes to all prompts |
| version | bool | false | Show version |
| fullhelp | bool | false | Show detailed usage without "--" prefixes |
| help | bool | false | Show usage |
| debug_auth | bool | false | Enable more detailed reasons for rejected auth (for debugging) |

Command-line Arguments

All configuration options can be overridden with command-line arguments. The format is:

shock-server --option=value

For example:

shock-server --conf=/path/to/shock-server.conf --api-port=8080
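The override order (command line first, then configuration file, then built-in default) can be sketched as follows. This uses Python's configparser purely for illustration; the cli_overrides dictionary is a hypothetical stand-in for parsed flags and does not reflect how Shock itself processes arguments.

```python
import configparser

CONFIG_TEXT = """
[Address]
api-ip = 0.0.0.0
api-port = 7445
"""


def effective_value(config, section, option, cli_overrides, default):
    """Resolve one option: command line beats file, file beats default."""
    if option in cli_overrides:
        return cli_overrides[option]
    return config.get(section, option, fallback=default)


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# Stand-in for "--api-port=8080" given on the command line.
cli_overrides = {"api-port": "8080"}

port = int(effective_value(config, "Address", "api-port", cli_overrides, "7445"))
ip = effective_value(config, "Address", "api-ip", cli_overrides, "0.0.0.0")
print(ip, port)
```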

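Commonly used command-line arguments include conf, no_config, version, help, and fullhelp; see the [Other] section above for the full list.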

Locations Configuration

The Locations.yaml file defines storage locations for data migration and caching. It is located in the same directory as the main configuration file.

Format

Locations:
  - ID: "location_id"
    Type: "location_type"
    Description: "description"
    URL: "url"
    AuthKey: "auth_key"
    SecretKey: "secret_key"
    Bucket: "bucket_name"
    Persistent: true/false
    Region: "region"
    Priority: priority_value
    MinPriority: min_priority_value
    Tier: tier_value
    Cost: cost_value
    # Additional type-specific fields

Common Fields

| Field | Description |
|-------|-------------|
| ID | Unique identifier for the location |
| Type | Type of storage location (S3, Shock, TSM, etc.) |
| Description | Human-readable description |
| URL | URL for the storage location |
| AuthKey | Authentication key |
| SecretKey | Secret key for authentication |
| Persistent | Whether this is a valid long-term storage location |
| Priority | Location priority for pushing files upstream (0 is lowest, 100 is highest) |
| MinPriority | Minimum node priority level for this location |
| Tier | Storage tier (0=cache, 3=SSD, 5=disk, 10=tape archive) |
| Cost | Cost per GB for this store (default=0) |

Type-Specific Fields

S3 Location

Bucket: "bucket_name"
Region: "region"

Azure Location

Account: "account_name"
Container: "container_name"

Google Cloud Location

Project: "project_name"

IRods Location

Zone: "zone"
User: "user"
Password: "password"
Hostname: "hostname"
Port: port_number

Glacier Location

Vault: "vault_name"

Example Locations.yaml

Locations:
  - ID: "S3"
    Type: "S3"
    Description: "Example S3 Service"
    URL: "https://s3.example.com"
    AuthKey: "some_key"
    SecretKey: "another_key"
    Bucket: "mybucket1"
    Persistent: true
    Region: "us-east-1"
    Priority: 0
    Tier: 5
    Cost: 0
    MinPriority: 7
  - ID: "S3SSD"
    Type: "S3"
    Description: "Example_S3_SSD Service"
    URL: "https://s3-ssd.example.com"
    AuthKey: "yet_another_key"
    SecretKey: "yet_another_nother_key"
    Bucket: "ssd"
    Persistent: true
    Region: "us-east-1"
    Priority: 0
    Tier: 3
    Cost: 0
  - ID: "shock"
    Type: "shock"
    Description: "shock service"
    URL: "shock.example.org"
    AuthKey: ""
    SecretKey: ""
    Prefix: ""
    Priority: 0
    Tier: 5
    Cost: 0
  - ID: "tsm"
    Type: "tsm_archive"
    Description: "archive service"
    URL: ""
    AuthKey: ""
    SecretKey: ""
    Prefix: ""
    Recoverycommand: "dsmc restore %ID% -latest"
    Priority: 0
    Tier: 10
    Cost: 0

Types Configuration

The Types.yaml file defines node types and their priorities. It is located in the same directory as the main configuration file.

Format

Types:
  - ID: "type_id"
    Description: "description"
    Priority: priority_value
    Data-Types:
      - data_type1
      - data_type2

Fields

| Field | Description |
|-------|-------------|
| ID | Unique identifier for the type |
| Description | Human-readable description |
| Priority | Priority value for the type (0 is lowest, higher values indicate higher priority) |
| Data-Types | List of data types associated with this type |

Example Types.yaml

Types:
  - ID: "default"
    Description: "default"
    Priority: 0
  - ID: "temp"
    Description: "temporary file"
    Priority: 0
  - ID: "metagenome"
    Description: "MG-RAST metagenome"
    Priority: 9
    Data-Types:
      - fa
      - fasta
      - fastq
      - fq
      - bam
      - sam
  - ID: "image"
    Description: "image file"
    Priority: 1
    Data-Types:
      - jpeg
      - jpg
      - gif
      - tif
      - png
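As an illustration of how these entries might be consulted, the sketch below looks up a type's Priority by its ID. The entries mirror the example above with Data-Types omitted; the fallback to the "default" entry for unknown IDs is an assumption, and type_priority is a hypothetical helper.

```python
# Entries mirrored from the example Types.yaml (Data-Types omitted).
TYPES = [
    {"ID": "default", "Priority": 0},
    {"ID": "temp", "Priority": 0},
    {"ID": "metagenome", "Priority": 9},
    {"ID": "image", "Priority": 1},
]


def type_priority(type_id: str, types=TYPES) -> int:
    """Return the Priority for a type ID."""
    by_id = {entry["ID"]: entry["Priority"] for entry in types}
    # Assumption: unknown type IDs fall back to the "default" entry.
    return by_id.get(type_id, by_id["default"])


print(type_priority("metagenome"))
```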

Data Migration and Caching

Shock supports data migration to remote locations and caching of data from remote locations.

Data Migration

Data migration is controlled by the options in the [Migrate] section: node_migration, node_data_removal, and min_replica_count.

When node_migration is enabled, Shock will attempt to migrate data to remote locations defined in Locations.yaml. The migration process is based on the following algorithm:

  1. Among the locations with the highest Priority, the one with the lowest Cost is used first
  2. For each node, the location's MinPriority is checked against the node's priority so that low-priority data such as temporary files is not moved to remote locations
  3. The Tier value describes the cost of staging the file back (lower tier values are faster)
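The selection rules above can be sketched as a small function. This is one reading of the description, not Shock's implementation: it assumes a node is eligible for a location when its priority is at least the location's MinPriority, and the location entries and the name pick_location are hypothetical.

```python
def pick_location(locations, node_priority):
    """Pick a migration target for a node, or None if nothing qualifies."""
    # Rule 2: skip locations whose MinPriority exceeds the node's priority.
    candidates = [
        loc for loc in locations if node_priority >= loc.get("MinPriority", 0)
    ]
    if not candidates:
        return None
    # Rule 1: keep the highest-Priority locations, then take the lowest Cost.
    top = max(loc.get("Priority", 0) for loc in candidates)
    best = [loc for loc in candidates if loc.get("Priority", 0) == top]
    return min(best, key=lambda loc: loc.get("Cost", 0))


# Hypothetical locations, for illustration only.
LOCATIONS = [
    {"ID": "fast", "Priority": 10, "Cost": 5, "MinPriority": 1, "Tier": 3},
    {"ID": "cheap", "Priority": 10, "Cost": 1, "MinPriority": 7, "Tier": 5},
    {"ID": "low", "Priority": 0, "Cost": 0, "MinPriority": 1, "Tier": 10},
]

print(pick_location(LOCATIONS, node_priority=9)["ID"])
```

With these sample entries, a node of priority 0 (such as the "temp" type in the Types.yaml example) matches no location and therefore stays local.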

Caching

Caching is controlled by the cache_path configuration option. If this option is set, Shock will function as a cache and attempt to download nodes present in MongoDB that are not present on local disk from one of the configured Locations.

When a node is requested and not found locally, Shock will:

  1. Check if the node exists in MongoDB
  2. If it does, check if it has a location entry pointing to a remote location
  3. Download the node data from the remote location
  4. Store it in the cache_path directory
  5. Serve the data to the client
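The lookup steps above can be sketched as follows. A plain dictionary stands in for MongoDB and a callable stands in for the remote download; both, along with the name fetch_node and the "locations" key, are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def fetch_node(node_id, cache_dir, metadata, download):
    """Return node data, filling the local cache on a miss."""
    local = cache_dir / node_id
    if local.exists():
        return local.read_bytes()  # already cached locally
    record = metadata.get(node_id)  # step 1: does the node exist?
    if record is None:
        raise KeyError(node_id)
    locations = record.get("locations", [])  # step 2: remote location entry?
    if not locations:
        raise FileNotFoundError(node_id)
    data = download(locations[0], node_id)  # step 3: download
    local.write_bytes(data)  # step 4: store under cache_path
    return data  # step 5: serve to the client


calls = []


def fake_download(location, node_id):
    """Hypothetical downloader that records each remote fetch."""
    calls.append((location, node_id))
    return b"payload"


with tempfile.TemporaryDirectory() as tmp:
    meta = {"node-1": {"locations": ["S3"]}}
    first = fetch_node("node-1", Path(tmp), meta, fake_download)
    second = fetch_node("node-1", Path(tmp), meta, fake_download)

print(len(calls))
```

The second request is served from the cache directory, so the downloader is invoked only once.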

Cached items are kept in the cache for the period set by cache_ttl (default 24H, that is, 24 hours); after that they become eligible for eviction.

Restore Functionality

Shock supports restoring data from archive locations such as tape storage. Restore is controlled by a per-node restore marker.

When a node is marked for restore, Shock will attempt to retrieve it from the archive location. This is particularly useful for tape-based storage systems like IBM Tivoli Storage Manager (TSM).

Restore Process

  1. A node is marked for restore using the SetRestore() method
  2. External scripts (like tsm_restore.sh) are used to retrieve the data from the archive location
  3. Once the data is restored, the UnSetRestore() method is called to indicate that the restore has been completed
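The three steps can be modelled as a small state change around an external retrieval. The sketch below is in Python for illustration only: the Node class, its restore field, and run_restore are hypothetical, with set_restore and unset_restore standing in for the SetRestore() and UnSetRestore() methods named above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a Shock node with a restore marker."""

    node_id: str
    restore: bool = False

    def set_restore(self):
        self.restore = True

    def unset_restore(self):
        self.restore = False


def run_restore(node, retrieve):
    """Mark the node, run the external retrieval, then clear the mark."""
    node.set_restore()  # step 1
    retrieve(node.node_id)  # step 2: e.g. an external restore script
    node.unset_restore()  # step 3: only reached if retrieval succeeded


seen = []


def fake_retrieve(node_id):
    """Record the marker state observed while retrieval is running."""
    seen.append((node_id, node.restore))


node = Node("abc")
run_restore(node, fake_retrieve)
print(node.restore)
```

If the retrieval raises an error, the marker stays set in this sketch, so the node remains flagged as awaiting restore.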

Examples

Basic Configuration

[Admin]
email = admin@example.com
users = admin1,admin2

[Address]
api-ip = 0.0.0.0
api-port = 7445

[Mongodb]
hosts = localhost
database = ShockDB

[Paths]
site = /usr/local/shock/site
data = /usr/local/shock/data
logs = /var/log/shock

Enabling Data Migration

[Migrate]
min_replica_count = 2
node_migration = true
node_data_removal = true

Enabling Caching

[Cache]
cache_path = /usr/local/shock/cache

Running Shock Server

To run the Shock server with a specific configuration file:

shock-server -conf /path/to/shock-server.conf

With Docker Compose (recommended):

docker-compose up -d

The included docker-compose.yml starts both Shock and MongoDB with the correct networking. For S3-compatible storage with MinIO, use docker-compose.minio.yml instead. See the building guide for details.

Data Migration Example

To enable data migration with a short expiration wait time:

shock-server --conf=/path/to/shock-server.conf --node_migration=true --expire_wait=1

This will start the Shock server with data migration enabled and set the expiration wait time to 1 minute, which is useful for testing as it avoids having to wait for hours until the NodeReaper starts moving files.