Storage
Geddes has a software-defined storage system that provides user-provisioned persistent data storage for container deployments.
Ceph is used to provide block, filesystem, and object storage on the Geddes Composable Platform. File storage provides an interface to access data in a file and folder hierarchy similar to Data Depot. Block storage is a flexible type of storage that is well suited to database workloads and generic container storage. Object storage is ideal for large unstructured data and features a REST-based API providing an S3-compatible endpoint that can be used with the preexisting ecosystem of S3 client tools.
Storage Classes
Geddes provides four different storage classes based on the access characteristics and performance needs of a workload. The performance classes should be used for workloads with high I/O requirements (e.g. databases, AI/ML).
- geddes-standard-singlenode - Block storage based on SSDs that can be accessed by a single node (Single-Node Read/Write).
- geddes-standard-multinode - File storage based on SSDs that can be accessed by multiple nodes (Many-Node Read/Write or Many-Node Read-Only).
- geddes-performance-singlenode - Block storage based on NVMe drives that can be accessed by a single node (Single-Node Read/Write).
- geddes-performance-multinode - File storage based on NVMe drives that can be accessed by multiple nodes (Many-Node Read/Write or Many-Node Read-Only).
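In Kubernetes terms, these access characteristics map to PVC access modes: Single-Node Read/Write is ReadWriteOnce, Many-Node Read/Write is ReadWriteMany, and Many-Node Read-Only is ReadOnlyMany. As a sketch, a PersistentVolumeClaim requesting one of these classes might look like the following (the name, namespace, and size are example values):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-volume        # example name
  namespace: mynamespace      # your Geddes namespace
spec:
  accessModes:
    - ReadWriteOnce           # Single-Node Read/Write
  storageClassName: geddes-performance-singlenode
  resources:
    requests:
      storage: 10Gi
```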
Block and Filesystem Storage Provisioning in Deployments
Block and Filesystem storage can both be provisioned in a similar way.
- While deploying a Workload, click the Storage tab and click Add Volume…
- Select “Create Persistent Volume Claim”
- Set a unique Persistent Volume Claim Name, e.g. “<username>-volume”
- Select a Storage Class. The default storage class is "geddes-standard-singlenode".
- Select an Access Mode. The "geddes-standard-singlenode" class only supports Single-Node Read/Write.
- Request an amount of storage in gigabytes
- Provide a Mount Point for the persistent volume, e.g. /data
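The same claim can also be consumed from a workload manifest rather than the Rancher form. A minimal sketch, assuming a claim named myusername-volume and the /data mount point from the steps above (all names are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  namespace: mynamespace      # your Geddes namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: ubuntu:22.04
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: data
              mountPath: /data            # the Mount Point from the form
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: myusername-volume  # the claim name from the steps above
```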
Copying Files to and from a Container
The kubectl cp command can be used to copy files into or out of a running container.

# get the id of the pod you want to copy to/from
kubectl -n <namespace> get pods
# copy from the local filesystem to a remote pod
kubectl cp /tmp/myfile <namespace>/<pod>:/tmp/myfile
# copy from a remote pod to the local filesystem
kubectl cp <namespace>/<pod>:/tmp/myfile /tmp/myfile

This method requires the tar executable to be present in your container, which is usually the case with Linux images. More info can be found in the kubectl docs.
Object Storage
Geddes provides S3-compatible object storage at the endpoint https://s3-prod.geddes.rcac.purdue.edu.
S3 access can be requested by contacting support. Access keys will be provided via Filelocker.
Accessing Object Storage
The S3 endpoint provided by Geddes can be accessed in multiple ways. Two popular options for interacting with S3 storage via the command line and GUI are listed below.
S3cmd is a free command-line tool for managing data in S3-compatible storage resources that works on Linux and macOS.
- Download: https://s3tools.org/download
- How-To Documentation: https://s3tools.org/s3cmd-howto
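For example, s3cmd can be pointed at the Geddes endpoint with a minimal ~/.s3cfg like the sketch below. The keys are placeholders (use the ones provided via Filelocker), and the host_bucket setting is an assumption that may need adjusting for your setup:

```
# ~/.s3cfg -- example values only
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
host_base = s3-prod.geddes.rcac.purdue.edu
host_bucket = s3-prod.geddes.rcac.purdue.edu
use_https = True
```

With that in place, s3cmd ls lists your buckets and s3cmd put myfile s3://mybucket/ uploads a file.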
Cyberduck is a free server and cloud storage browser that can be used on Windows and Mac.
- Launch Cyberduck
- Click + Open Connection at the top of the UI
- Select S3 from the dropdown menu
- Fill in the Server, Access Key ID, and Secret Access Key fields
- Click Connect
- You can now right-click to bring up a menu of actions that can be performed against the storage endpoint
Further information about using Cyberduck can be found on the Cyberduck documentation site.
Accessing and Mounting Depot
Contact support to request access. Make sure to provide the Geddes namespace that will be accessing Depot and the path to your user/lab Depot space. Once access has been approved and an admin has created the needed Persistent Volume for Depot, you can move on to the steps below.
The overall process is:

- Submit an access request. An admin will create the Persistent Volume needed to access your Depot space and will provide you with its name, pv-depot-<your-pv-name>.
- Create a Kubernetes secret for Depot username/password authentication.
- Create a Persistent Volume Claim via the Rancher UI or kubectl.
- Use that claim in your workloads/pods to mount Depot.
- From the Rancher UI, use the left navigation bar to select Storage > Secrets
- Click Create at the top right
- Select Opaque and fill out the form:
  - Make sure to select the namespace that will be accessing Depot
  - Name should be depot-credentials-<myusername>
  - Under the Data tab, click Add to create a second secret key field
  - Provide the key/values:
    - Key: username, Value: <yourUsername>
    - Key: password, Value: <yourPassword>
  - Click Create at the bottom right
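The same secret can also be created with kubectl instead of the Rancher UI. A sketch, assuming the naming convention above (stringData lets you supply the values without base64-encoding them):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: depot-credentials-myusername  # match the depot-credentials-<myusername> convention
  namespace: mynamespace              # the namespace that will be accessing Depot
type: Opaque
stringData:
  username: myusername
  password: mypassword
```

Apply it with kubectl apply -f depot-secret.yaml.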
From the Rancher UI, use the left navigation bar to select Storage > PersistentVolumeCLaims
-
Click Create at the top right and fill out the form
-
Make sure select the namespace that will be accessing depot
-
Name should be pvc-depot-<yourUsername>
-
Select Use an existing Persistent Volume
-
Use the dropdown to the immediate right to select pv-depot-<your pv name>
-
Click Customize in the form tab on the left
-
Select Many Nodes Read-Write
-
Click Create at the bottom right.
-
-
Create a yaml file i.e depot-pvc.yaml with the code below
apiVersion: v1 kind: PersistentVolumeClaim metadata: name: pvc-depot-<yourUsername> namespace: <namespace> spec: accessModes: - ReadWriteMany resources: requests: storage: 1Mi volumeName: pv-depot-<your pv name> storageClassName: ""
- Replace all the <yourUsername> and <namespace> placeholders with the appropriate values. Do not include the example angle brackets < > in your values.
- Apply the yaml with the command: kubectl apply -f depot-pvc.yaml
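Once the claim is bound, a workload mounts Depot by referencing it. A minimal sketch, assuming the claim name used above (the pod name, namespace, and mount path are example values):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: depot-test
  namespace: mynamespace     # the namespace that will be accessing Depot
spec:
  containers:
    - name: shell
      image: ubuntu:22.04
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: depot
          mountPath: /depot  # where Depot appears inside the container
  volumes:
    - name: depot
      persistentVolumeClaim:
        claimName: pvc-depot-myusername  # the claim created above
```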