DRAFT

The OURRstore system, an NSF-funded storage archive, is a very cost-effective way to archive data that needs to be kept securely.  OURRstore uses LTO tape media, which has a bit error rate 10 times lower than spinning disk and an expected lifetime of 8 to 15 years.  OURRstore increases the reliability and safety of storage by creating redundant copies of the data and periodically monitoring the health of the data in the system.  OURRstore procedures create at least 2 copies of archived data, though we strongly recommend the 3-copy option.  One copy stays in the OURRstore robot and is the 'online' copy, retrievable at any time.  A second copy is shipped back to UAMS, where we store it in a secure location, giving us an offsite backup of our data stored in OURRstore.  In a pinch, we can recover data from this copy.  The optional (but highly recommended) third copy is taken out of the robot and stored in a secure, environmentally controlled storage facility at the University of Oklahoma Health Sciences Campus.  This third copy allows the OURRstore team to generate replacements for the primary copy, should that be necessary, without the risk of shipping our backup copy to Oklahoma.

OURRstore is the least expensive, most reliable archival storage option available to UAMS basic science researchers.

Costs to use OURRstore

We are charged only for the media (extremely low cost compared to other options) plus a small additional amount for shipping tapes to and from Oklahoma.  At current pricing available to UAMS, this works out to a one-time charge of about $33.55 per TB for the recommended 3-copy option, for storage that should keep data safe for at least 8 years, possibly longer.  The less secure 2-copy option would cost $22.96 per TB at current pricing, for a minimum 8-year storage life.  We expect these prices to drop over time as we qualify less expensive vendors, media prices fall, and capacities rise.  Shipping the last batch of cartridges we ordered cost about $13.50 per cartridge via FedEx, including insurance and tracking, and the figures above include that estimated shipping charge.  If you decide to use OURRstore for archiving your data, please be prepared to reimburse us for the media and shipping costs.
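To budget for a particular dataset, multiply its size by the per-TB rate for the option you choose. The Python sketch below does only that arithmetic, using the two rates quoted above; the function name `estimate_cost` is hypothetical, and the rates will go stale as prices change, so check current pricing with the HPC admins before relying on the result.

```python
# Hypothetical helper: one-time OURRstore charge (media plus shipping) at the
# per-TB rates quoted above. Update RATES_PER_TB when pricing changes.
RATES_PER_TB = {
    3: 33.55,  # recommended 3-copy option, USD per TB
    2: 22.96,  # 2-copy option, USD per TB
}


def estimate_cost(terabytes: float, copies: int = 3) -> float:
    """Return the estimated one-time charge in USD, rounded to cents."""
    if copies not in RATES_PER_TB:
        raise ValueError("copies must be 2 or 3")
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    return round(terabytes * RATES_PER_TB[copies], 2)


if __name__ == "__main__":
    # Example: a hypothetical 10 TB dataset under each option.
    for n_copies in (3, 2):
        print(f"10 TB, {n_copies} copies: ${estimate_cost(10, n_copies):,.2f}")
```

Run as a script, it prints the estimate for a hypothetical 10 TB dataset under each option.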

...

Ideally, researchers should include archiving costs in their grant budgets.  This might be difficult for ongoing grants that did not budget for archiving, and pilot research projects also might not have a budget for archiving costs.  In some cases, departments might be willing to cover the costs.

Restrictions on data that can be archived on OURRstore

Because OURRstore is an NSF project, there are certain stipulations on the kind of data that can be placed in it.  OURRstore is intended for NON-CLINICAL STEM RESEARCH DATA that is NOT LEGALLY REGULATED.  Non-STEM data is currently FORBIDDEN on OURRstore, because OURRstore was funded by the NSF and non-STEM data is outside the NSF’s mandate.

  1. The data should be relatively static (i.e. does not change), as OURRstore is only intended as a secure, redundant archive, not a backup solution where one is making daily or weekly copies of changing data.  (You may use ROSS, the Research NAS, or a cloud option if you need backup.)
  2. The data must be STEM related data (Science, Technology, Engineering, Math).  NSF's definition of STEM includes physical sciences, biosciences, geosciences, engineering, mathematics, technology (for example, computer and information sciences), and social sciences.
  3. While the data may include deidentified human subject data, it may not be clinical research data (i.e. data directly related to patient care or clinical studies of human disease).  If the human research is basic science research, that is acceptable.
  4. Legally regulated data (for example, HIPAA, Controlled Unclassified Information, FDA clinical trial, ITAR/EAR, FERPA) is currently FORBIDDEN on OURRstore, per OURRstore's agreement with the NSF.
  5. If your files are subject to one or more Institutional Review Board (IRB) agreement(s) governing human subjects research, then it’s YOUR RESPONSIBILITY to ensure full compliance with your IRB agreement(s).

If you decide to use OURRstore for archiving your data, you must ensure that your data complies with the above rules.

How to request storage on OURRstore

If you have a means for covering the costs and agree to the data restrictions of OURRstore, please send an e-mail to hpcadmin@uams.edu confirming that you agree, and requesting access.  We will then work with you in archiving your data.

The mechanics of archiving data to OURRstore

Data stored on OURRstore should be collected into compressed archive files, preferably between 20 and 200 GB in size, for the best storage efficiency without excessive access times.  Currently, the absolute minimum size of an archive file is 1 GB and the absolute maximum size is 1 TB.  These archive files need to be created at UAMS prior to electronic transfer to the OURRstore system.  The initial transfer is disk to disk, so it goes pretty quickly.  Once the data is in the OURRstore disk cache, the OURRstore archive management software starts copying the data onto a media cartridge for safekeeping.  When a cartridge is full, the system makes a copy of the cartridge and ejects it, and the OURRstore team ships that copy back to us using the prepaid label that we provide.  We then store that copy offline in a secure location.  If the optional third copy is requested, the OURRstore system makes that third copy, which is ejected from the system and stored in an environmentally controlled location in Oklahoma.
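Before handing archive files over, it can help to check them against these limits. The following Python sketch classifies each file in a directory as too small, too large, acceptable, or within the ideal 20 to 200 GB range. It assumes decimal units (1 GB = 10^9 bytes), which you may want to confirm with the HPC admins, and the function names are hypothetical.

```python
import os
import sys

# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
GB = 10**9
TB = 10**12
HARD_MIN, HARD_MAX = 1 * GB, 1 * TB        # absolute limits quoted above
IDEAL_MIN, IDEAL_MAX = 20 * GB, 200 * GB   # recommended range


def classify_size(size_bytes):
    """Classify an archive file size against the OURRstore limits."""
    if size_bytes < HARD_MIN:
        return "too small"
    if size_bytes > HARD_MAX:
        return "too large"
    if IDEAL_MIN <= size_bytes <= IDEAL_MAX:
        return "ideal"
    return "acceptable"


def check_directory(path):
    """Return (name, size in bytes, classification) for each regular file in path."""
    with os.scandir(path) as entries:
        files = sorted((e for e in entries if e.is_file()), key=lambda e: e.name)
        return [(e.name, e.stat().st_size, classify_size(e.stat().st_size)) for e in files]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_sizes.py <directory of archive files>
    for name, size, verdict in check_directory(sys.argv[1]):
        print(f"{name}: {size / GB:.1f} GB, {verdict}")
```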

For collecting data into the archive files, we offer several options, listed here in order of simplicity for the user:

Simple option for data on Grace (possibly the research NAS)

For the simple option, all you need to do is collect your data to be archived into a sub-directory tree with just the files to be archived.  We suggest that you place the sub-directory under a parent directory named "ToBeArchived" in your home directory.  Please name (or rename) the sub-directory tree to be archived with the current date in "yyyy-mm-dd" format, for example "/home/john/ToBeArchived/2021-08-07/".  The sub-sub-directory tree under the dated subdirectory can be organized any way you see fit.  Please use the "mv" command, not "cp" or "rsync", since you eventually want the data to disappear from Grace once it is safely in OURRstore, and you don't want to run into a space crunch while organizing your archive directory.  (Remember, Grace's storage is only supposed to be a temporary holding place for running jobs.) 
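The staging step can also be scripted. This Python sketch creates the dated directory under ~/ToBeArchived and moves (rather than copies) the given paths into it, following the `mv` advice above; `stage_for_archiving` is a hypothetical name, not an HPC admin tool. Its demo runs in a temporary directory so that nothing in your real home directory is moved.

```python
import datetime
import shutil
import tempfile
from pathlib import Path


def stage_for_archiving(paths, base=None, when=None):
    """Move each path into <base>/<yyyy-mm-dd>/ and return that directory.

    base defaults to ~/ToBeArchived; when defaults to today's date.
    """
    base = Path(base) if base is not None else Path.home() / "ToBeArchived"
    when = when or datetime.date.today()
    dated = base / when.strftime("%Y-%m-%d")
    sources = [Path(p) for p in paths]
    names = [p.name for p in sources]
    if len(set(names)) != len(names):
        raise ValueError("two of the given paths share the same final name")
    # Check for clashes up front so that nothing is moved if any name is taken.
    clashes = [dated / n for n in names if (dated / n).exists()]
    if clashes:
        raise FileExistsError(f"already staged: {clashes}")
    dated.mkdir(parents=True, exist_ok=True)
    for src in sources:
        # Like `mv`: a rename on one filesystem, copy-then-delete across filesystems.
        shutil.move(str(src), str(dated / src.name))
    return dated


if __name__ == "__main__":
    # Demo in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        results = Path(tmp) / "results_run1"
        results.mkdir()
        (results / "data.csv").write_text("x,y\n1,2\n")
        staged = stage_for_archiving([results], base=Path(tmp) / "ToBeArchived",
                                     when=datetime.date(2021, 8, 7))
        for p in sorted(staged.rglob("*")):
            print(p.relative_to(tmp).as_posix())
```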

...

This method can also be used for data on the research NAS, assuming that the HPC admins can get access to that storage area on the NAS.  For data on the NAS, just create the "ToBeArchived" directory somewhere in your storage area, and let the HPC admin team know the storage area and path to the directory when you send the archive request e-mail.

Slightly less simple option for data on Grace (possibly the research NAS)

If you would rather create the compressed tar files yourself, feel free to do so, and then collect the archive files into the top level of your "ToBeArchived" subdirectory in your home directory.  In this option, you may use any method of your choosing (e.g. tar, zip, or some custom format) that can collect the data you want archived into files.  Ideally, the archive files should be between 20 and 200 GB in size, though OURRstore will accept anything between 1 GB and 1 TB.  We encourage you to use compression and encryption for efficiency and safety, but that is your choice.  If you do encrypt, please safeguard your encryption key, since you are likely the only one who knows it.  In this option, you are responsible for creating your own manifests of the contents of your archive files, if desired.  Archive file names must be globally unique: all of your archive files go into a single directory in OURRstore, so no archive file you create may share a name with any archive file you created previously.  Otherwise, you run the risk of losing the previous archive file.
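If you use tar, one way to meet the size and naming rules is sketched below with Python's standard `tarfile` module. This is an illustration under assumptions, not a prescribed procedure: the ~100 GB target, the `<prefix>-<timestamp>-<random>` naming scheme, and all function names are hypothetical. It greedily groups files into batches of roughly the target size, writes each batch to a gzip-compressed tar file, and writes a plain-text manifest of each archive's contents to a separate directory (a manifest placed in ToBeArchived would fall below the 1 GB minimum).

```python
import datetime
import tarfile
import uuid
from pathlib import Path

# Assumption: aim for roughly 100 GB per archive, inside the recommended 20-200 GB range.
TARGET_BYTES = 100 * 10**9


def scan(root):
    """Return (path, size in bytes) for every regular file under root, sorted by path."""
    return sorted((p, p.stat().st_size) for p in Path(root).rglob("*") if p.is_file())


def plan_batches(files, target=TARGET_BYTES):
    """Greedily group (path, size) pairs into consecutive batches of about `target` bytes.

    A single file larger than `target` ends up alone in its own batch.
    """
    batches, current, current_size = [], [], 0
    for path, size in files:
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def write_archive(paths, root, out_dir, manifest_dir, prefix):
    """Write paths to a uniquely named .tar.gz in out_dir and a manifest in manifest_dir.

    Returns (archive path, manifest path).
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    # Timestamp plus a random suffix makes a clash with an earlier archive very unlikely.
    base = f"{prefix}-{stamp}-{uuid.uuid4().hex[:8]}"
    archive = Path(out_dir) / f"{base}.tar.gz"
    manifest = Path(manifest_dir) / f"{base}.manifest.txt"
    if archive.exists() or manifest.exists():
        raise FileExistsError(base)
    # Store paths relative to root so the archive does not embed absolute paths.
    names = [Path(p).relative_to(root).as_posix() for p in paths]
    with tarfile.open(archive, "w:gz") as tar:
        for p, name in zip(paths, names):
            tar.add(p, arcname=name)
    manifest.write_text("\n".join(names) + "\n")
    return archive, manifest
```

A typical run would loop over `plan_batches(scan(root))` and call `write_archive(batch, root, Path.home() / "ToBeArchived", manifest_dir, "myproject")` for each batch. A file larger than 1 TB still needs to be split by other means, and the gzip compression used here is single-threaded, which can be slow at these sizes. Encryption, if you want it, would be a separate step.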

...

This method can also be used for data on the research NAS, assuming that the HPC admins can get access to that storage area on the NAS.  For data on the NAS, just create the "ToBeArchived" directory somewhere in your storage area, and let the HPC admin team know the storage area and path to the directory when you send the archive request e-mail.

Simple option for data on ROSS

For data on ROSS, the simplest way to archive that data is to collect the data to be archived into a bucket.  When it is ready for archiving, you can optionally change the bucket's permissions to read-only to minimize the chance of accidental changes.  Then send an e-mail to hpcadmin@uams.edu with the name of the bucket to archive.

...

If you prefer to create your own compressed archive files ready for OURRstore, you can simply create them and place them in a bucket.  In this case, when you e-mail the name of the bucket to hpcadmin@uams.edu, let them know that you have already generated the compressed archive files in the bucket.  Alternatively, you could use the "Slightly less simple option for data on Grace", temporarily moving copies of your archive files to Grace.

Complicated Option

To exercise the complicated option, you would need to approach the OURRstore team directly, sign agreements, and go through the mandatory training to get your own account on OURRstore.  You would then be responsible for following all of their rules, for purchasing your own media, for shipping to and from Oklahoma, for creating and tracking your own archive files, etc.  This option really is for the power user who wants full control over the process of archiving and retrieving data with minimal or no assistance from the HPC admin team.  This option is also appropriate for users whose data is on systems that the HPC Admins do not have access to.  While we do not encourage people to use this option, due to the complications and responsibility of getting your own account on OURRstore, it is available for those who prefer it.  For more information, see OURRstore: OU & Regional Research Store

Retrieving Data from OURRstore

How to retrieve archived data depends on which of the above options you used to archive it.

Simple option for data archived from Grace (possibly the research NAS)

When the HPC Admins archived data for you, they left manifests listing which files or objects are in which compressed tar file in your /home/archived directory.  You can search through those manifests (e.g. using grep) to find which archive file or files contain the data you are interested in.  Send an e-mail to hpcadmin@uams.edu with the name of the subdirectory tree that you would like retrieved and the name of the archive file or files it is in.  If you lost the manifests, don't fret: the HPC Admins kept a backup copy and can help.  In that case, still send an e-mail to hpcadmin@uams.edu with the name of the subdirectory tree that you would like retrieved and the approximate date it was archived, and the HPC admins will do their best to find the archive file names in their backup copies of the manifests.  However, since yours is the primary copy, please do not lose it, as there is always a chance that the backup copy gets lost as well.
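A grep over the manifest directory is usually all you need; for those who prefer a script, here is an equivalent Python sketch. It assumes, without our having confirmed the exact format, that each manifest is a text file listing one archived path per line and that the manifest's file name identifies its archive file; `find_in_manifests` is a hypothetical name.

```python
import sys
from pathlib import Path


def find_in_manifests(manifest_dir, pattern):
    """Map each manifest file name to its lines containing `pattern` (case-sensitive substring)."""
    hits = {}
    for manifest in sorted(Path(manifest_dir).iterdir()):
        if not manifest.is_file():
            continue
        with manifest.open(encoding="utf-8", errors="replace") as fh:
            matches = [line.rstrip("\n") for line in fh if pattern in line]
        if matches:
            hits[manifest.name] = matches
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_in_manifests.py <manifest directory> <text to find>
    for name, lines in find_in_manifests(sys.argv[1], sys.argv[2]).items():
        print(name)
        for line in lines:
            print("   ", line)
```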

Once the HPC admins receive your e-mail, they will pull the pertinent archive files from OURRstore and restore the archived data to your /home/<username>/RestoredArchives directory.  In general, your data should be restored within one business day (restores might not happen on weekends and holidays).  The HPC admins will then notify you by e-mail that your data has been restored.

Slightly less simple option for data archived from Grace (possibly the research NAS)

In this option, since you created the archive files yourself, the HPC Admins did not create manifest files.  It is up to you to keep track of what data is in which archive file.  When you want to retrieve one of your archive files, send an e-mail to hpcadmin@uams.edu with the names of the archive files that you would like retrieved.  

...

Once you are notified that your archive files have been restored, you may use whatever means you chose to pull data from those files.  If you compressed or encrypted the archive files before archiving them, you will need to decompress or decrypt them first.  Remember that the HPC Admins do not know your encryption key and cannot help if you have lost it.
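For tar-based archives, you usually don't need to unpack an entire archive file to get a few items back. This Python sketch, using the standard `tarfile` module, extracts only the members whose names start with a given prefix; it still reads through the archive but writes only the selected files. It assumes an unencrypted (or already decrypted) tar file, and the function name is hypothetical. On Python versions that provide it, the "data" extraction filter is used to reject unsafe member paths.

```python
import sys
import tarfile
from pathlib import Path


def extract_matching(archive, prefix, dest):
    """Extract only the members of a tar archive whose names start with prefix.

    Mode "r:*" autodetects uncompressed, gzip, bzip2 and xz tar files.
    Returns the list of extracted member names.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:
        members = [m for m in tar.getmembers() if m.name.startswith(prefix)]
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter blocks absolute paths and ".." escapes.
            tar.extractall(dest, members=members, filter="data")
        else:
            tar.extractall(dest, members=members)
    return [m.name for m in members]


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python extract_matching.py <archive> <name prefix> <destination directory>
    for name in extract_matching(*sys.argv[1:]):
        print(name)
```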

Simple option for data archived from ROSS

Simply send an e-mail to hpcadmin@uams.edu with the name of the bucket that you want restored, the name prefix of the objects that you want restored (leave the prefix blank if you want the entire bucket restored), and the namespace where the bucket should be located.

Once the HPC admins receive your e-mail, they will restore the objects into the bucket in the namespace, and notify you when it is ready.

Complicated Option

You are in complete control of your retrieval of archived data, since the archived data is on your OURRstore account.  The HPC admins are not involved.