...

Remember, the HPC Admins do not back up Grace's shared storage system, since it is only intended to be a staging or scratch area used to run HPC jobs.  In other words, the primary copy of data should be elsewhere, not on Grace.  ROSS is one option for keeping the primary copy of data.  In fact, UAMS purchased ROSS mainly to serve as the primary location for storing research data.

Data in ROSS starts out triple replicated on 3 different storage nodes, but is then converted to erasure coding, which provides resilience without major cost.  ROSS currently uses 12/16 erasure coding, meaning that for every 12 blocks of data stored it writes 16 blocks.  Up to four of those 16 blocks can be damaged and ROSS can still recover the data; only the loss of a fifth block would make the data unrecoverable.  In contrast, most RAID systems only have 1 or 2 drive redundancy.  ROSS scatters the 16 storage blocks across the storage nodes in a campus data center, improving the performance of data retrievals and further improving the resilience of the system.  Data can be accessed from any of the storage nodes.  Currently the two systems (UAMS and UARK) are isolated from each other, so replication between them is not possible, but plans to join the two are in progress.  Eventually all of the content in ROSS, regardless of which campus it physically lives on, would be accessible from either campus.  Of course, if data living on one campus is accessed from the other campus, the access will be somewhat slower due to network delays and bandwidth limitations between campuses.
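The trade-off between triple replication and 12/16 erasure coding can be illustrated with a little arithmetic.  The following is a minimal sketch, not a description of how ROSS is implemented; the function name `erasure_coding_profile` is hypothetical, and it assumes that any 12 of the 16 blocks are enough to rebuild the data, which matches the "up to four damaged blocks" behavior described above.

```python
def erasure_coding_profile(data_blocks: int, total_blocks: int) -> dict:
    """Summarize a scheme that writes total_blocks for every data_blocks stored.

    Assumption: any data_blocks of the total_blocks suffice to rebuild the
    data, so the number of tolerated losses equals the number of extra blocks.
    """
    if not 0 < data_blocks <= total_blocks:
        raise ValueError("need 0 < data_blocks <= total_blocks")
    extra_blocks = total_blocks - data_blocks
    return {
        "extra_blocks": extra_blocks,
        "tolerated_losses": extra_blocks,
        # Raw capacity consumed per unit of user data.
        "raw_per_usable": total_blocks / data_blocks,
    }


# 12/16 erasure coding, as described for ROSS.
ross = erasure_coding_profile(12, 16)
# Triple replication viewed the same way: 1 data block, 3 blocks written.
triple = erasure_coding_profile(1, 3)

print(ross["tolerated_losses"], round(ross["raw_per_usable"], 2))      # 4 1.33
print(triple["tolerated_losses"], round(triple["raw_per_usable"], 2))  # 2 3.0
```

Under these assumptions, erasure coding survives more damaged blocks than triple replication while consuming well under half the raw capacity.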

As mentioned, soon Grace will have an option for replicating data in Fayetteville.  However, even in this case I would not consider the copy in Fayetteville to be a true backup copy.  The replication is good for maintaining data that needs high availability and equivalent performance regardless of which campus it is accessed from.  Replication also doubles the storage cost, since it reduces available storage capacity at twice the rate that non-replicated storage does.  We still recommend that researchers keep backup or archive copies of data somewhere else, even if replication is turned on.

...

The College of Medicine (COM) Associate Dean of Research, in consultation with an advisory group at UAMS, recommended charging an up-front payment of $70 per TB for the use of ROSS, good for the 5 year life of the storage.  This is less than half the current replacement plus maintenance cost for the underlying hardware ($159 per TB).  If one wants replicated storage (i.e. one copy in Little Rock, one copy in Fayetteville), the cost would be double, or $140 per TB for the pair of copies.
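For budgeting purposes, the pricing above reduces to simple arithmetic.  The sketch below is only an illustration; the function and constant names are hypothetical, and the figures are the COM-recommended numbers quoted above, which may change.

```python
COM_RATE_PER_TB = 70        # USD, up-front, per copy, covers the 5 year life
HARDWARE_COST_PER_TB = 159  # USD, replacement plus maintenance, for comparison
STORAGE_LIFE_YEARS = 5


def ross_upfront_cost(terabytes: float, replicated: bool = False) -> float:
    """Estimate the up-front charge for a given amount of ROSS storage.

    Replicated storage keeps one copy in Little Rock and one in Fayetteville,
    so it is charged as two copies.
    """
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    copies = 2 if replicated else 1
    return terabytes * COM_RATE_PER_TB * copies


print(ross_upfront_cost(10))                         # 700
print(ross_upfront_cost(10, replicated=True))        # 1400
print(COM_RATE_PER_TB / STORAGE_LIFE_YEARS)          # 14.0 (USD per TB per year)
print(COM_RATE_PER_TB < HARDWARE_COST_PER_TB / 2)    # True
```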

...

Warning

Until pricing changes, users should be prepared for the up-front pricing of $70 per TB per copy set by the COM.  Although we are not collecting fees at the moment, retroactive fee collection could begin after the ARCC steering committee settles on the fee schedule.  Any storage in use at the time that ARCC sets the fee schedule would be charged the lower of the $70 fee imposed by COM or the new fees set by ARCC, for 5 years of storage, retaining the remainder of the 5 year life.  Additional storage would be charged at the fee schedule set by ARCC.
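The "lower of the two fees" rule can be sketched as follows.  This is an illustration only: the ARCC fee schedule has not been set, so the ARCC rate is a placeholder argument, and the sketch assumes that ARCC will quote its fee in the same unit as the COM fee (USD per TB per copy for 5 years).  The function names are hypothetical.

```python
COM_RATE_PER_TB_COPY = 70  # USD, up-front, for 5 years of storage


def fee_for_existing_storage(terabytes: float, copies: int,
                             arcc_rate_per_tb_copy: float) -> float:
    """Fee for storage already in use when ARCC sets its fee schedule.

    Existing storage is charged the lower of the COM fee and the ARCC fee.
    Assumption: both fees are expressed per TB per copy for 5 years.
    """
    rate = min(COM_RATE_PER_TB_COPY, arcc_rate_per_tb_copy)
    return terabytes * copies * rate


def fee_for_additional_storage(terabytes: float, copies: int,
                               arcc_rate_per_tb_copy: float) -> float:
    """Fee for storage added later, charged at the ARCC schedule."""
    return terabytes * copies * arcc_rate_per_tb_copy


# Placeholder ARCC rates, chosen only to show both branches of the rule.
print(fee_for_existing_storage(10, 1, arcc_rate_per_tb_copy=90))    # 700
print(fee_for_existing_storage(10, 1, arcc_rate_per_tb_copy=50))    # 500
print(fee_for_additional_storage(10, 1, arcc_rate_per_tb_copy=90))  # 900
```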

...

There are few restrictions on the types of data that can be stored on ROSS.  Almost anything is allowed, as long as the data storage complies with governmental, IRB, and UAMS rules and regulations.

Keep in mind that any campuses that are participants in ARCC, and by extension, the Arkansas Research Platform (ARP), have access to ROSS.  Unlike the UAMS Research NAS, which is locked down behind UAMS firewalls and hence only accessible inside the UAMS campus, ROSS is located in the ARCC Science DMZ, accessible by a number of campuses, both within Arkansas and potentially beyond.  As such, it is inappropriate to store fully identified patient (PHI) or personal (PII) data in ROSS, as it could be a violation of UAMS HIPAA or FERPA policies.  ROSS does have the ability to restrict access to buckets and to do server-side, data-at-rest encryption, but these capabilities have not been evaluated as to whether or not they are sufficient for HIPAA or FERPA compliance.  For now, ROSS should not be used for data that is regulated by HIPAA, FERPA, or any other governmental regulation.  De-identified human subject data is allowed.

...

Info
Before requesting access to ROSS, in addition to reading this article, please also read the Access to the Research Object Store System (ROSS) article on this wiki, since it describes key concepts needed to understand how storage is laid out and accessed. 

To initiate access to ROSS for a new personal or group (e.g. project, lab, department) namespace, please send a request via e-mail to HPCAdmin@uams.edu.  In your request, please indicate approximately how much storage you intend to use in ROSS for the requested namespace, divided into local and replicated amounts.  The HPC Admins will use this information in setting the initial quotas and for capacity planning.  You will be allowed to request increases in quota if they are needed and space is available.  We would also appreciate a brief description of what you will be using the storage for.  The "what it is used for" assists us in drumming up support (and possibly dollars) for expanding the system.

If you wish to access an existing group namespace as an object user, please contact the administrator of that namespace and ask to be added as an object user.

...