The Center for High Performance Computing offers the Research Object Storage System (nicknamed ROSS) as a low-cost option for archiving data.  Although there is a fee for using the object store, the cost of the storage is heavily subsidized by the university under current plans.

Each HPC user can request private space on the object store, tied to their HPC account, for personal use, for example to back up their home directory on Grace's cluster storage or to hold research data sets when they are not being actively used.  In addition, labs, departments, or other entities can request shared space that can be utilized by all of their users.  ROSS is based on the Dell EMC ECS (Elastic Cloud Storage) system and is now available for use.  Since ROSS is part of the HPC universe, it is jointly managed by the HPC administration team and UAMS IT: UAMS IT runs the data center and networking infrastructure in which ROSS sits, while the HPC team manages access to the ECS system itself.

  • Per the plan set out by Dean Morrison, archival storage on ROSS may be purchased for $70 per TB, with an expected lifespan of 5 years; no free storage is available.  Note that this cost is considerably less expensive than commercial cloud-based archival storage (e.g. Amazon Glacier).
    • The $70/TB charge is for the amount of storage reserved (i.e. the quota limit), and is not based on actual use.
    • Unlike some cloud providers, UAMS does not impose additional data transfer (access, egress or networking) fees for accessing the archival storage.
    • Please coordinate with Robin at DBMI to arrange an IDT to the storage core to pay for storage.
  • Researchers who sign up for an HPC account can request a 1 TB bucket for their personal use, controlled by quotas.  Send a request to the HPC Help Desk if you want a bucket.
  • Any entity or person that would like quotas over 1 TB may purchase additional capacity at the rate of $0.004 per GB per month, essentially what it costs us to provide the storage, based on a 5-year life cycle for disks and servers.  (A rough worked example follows this list.)
  • Note that Amazon Glacier pricing also starts at $0.004/GB/month; however, Amazon charges for retrieval of data from Glacier (e.g. $0.0025 per GB for slow bulk retrievals, plus $0.025 per 1,000 requests, plus outbound transfer charges ranging from $0.00 to $0.09 per GB depending on which network the data is going to).  We do not currently charge anything for transfers out, only for the storage itself.
  • Until the HPC Center accounts are set up, please coordinate with Robin to arrange an IDT to DBMI to pay for extended quota requests.
  • Once you’ve picked the size of storage that you want and have arranged the financial details with Robin, one of the HPC center administrators will set up a namespace and quotas for you.
  • As part of this setup, you may designate one or more namespace administrators who would manage users of and set permissions for your namespace, as well as create and maintain buckets.
  • Although the EMC ECS storage pools use 12+4 erasure coding (i.e. data is stored redundantly) to protect data against failures, there currently is no offsite backup.  (We are actively working to rectify this.)
    • Users who need offsite backup could, for example, send backup copies to Amazon Glacier, Box, or similar systems, with the hope of never having to retrieve them except in dire circumstances.  However, users are responsible for the offsite backup costs.
    • We are looking into an option that would allow researchers to send copies of their archival data to the NSF-sponsored OURRStore project, a write-once, read-seldom research archive at the University of Oklahoma in Norman.  If this pans out, users would only be charged for the media (currently LTO-7/8 tapes, at least 2, preferably 3).  Tapes with 9 TB capacity currently run $50 to $75 each (i.e. $100 to $225 for a set of 2-3, or about $12-25 per TB unformatted).  The expected lifetime of this media is 15 years, with a minimum expected lifetime of 8 years for the equipment needed to read the media.  Taking the media lifetime into account, this storage is about one third the cost of storage on ROSS.  However, OURRStore will not be in production until mid-2021, according to the current schedule.
  • Users who need automatic offsite backups or better file performance can still request space on the Research NAS that UAMS IT manages (i.e. the EMC ECS system is not the only game in town).
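
To make the quota pricing concrete, here is a rough worked example at the $0.004/GB/month rate quoted above.  It assumes, for simplicity, that 1 TB is billed as 1,000 GB and that the rate applies to the full reserved quota; the actual billing granularity may differ, so treat the figures as approximations.

    # Rough cost sketch for a reserved ROSS quota at the quoted $0.004/GB/month rate.
    # Assumes 1 TB = 1,000 GB and that the charge is on the reserved quota, not actual use.
    RATE_PER_GB_MONTH = 0.004  # dollars per GB per month

    def monthly_cost(quota_tb):
        """Monthly charge for a reserved quota of quota_tb terabytes."""
        return quota_tb * 1000 * RATE_PER_GB_MONTH

    for tb in (2, 10, 50):
        print(f"{tb:>3} TB quota: ${monthly_cost(tb):7.2f}/month, ${monthly_cost(tb) * 12:9.2f}/year")

For example, a 10 TB quota works out to about $40 per month, or roughly $480 per year, under these assumptions.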


Once you have your namespace in play, here are some hints for using it, in addition to the technical info about the object store explained in a separate article in this wiki.

  • Objects in the archive are stored in buckets that belong to namespaces.  Your namespace administrator may create as many buckets as desired within your namespace.  Note that it is recommended that a namespace have no more than 1000 buckets.
    • On an ECS system like ROSS, a bucket may be accessed with either the S3 or Swift protocol interchangeably.  Of course, certain features available in one object storage API might not be available on the other.
    • On bucket creation, your namespace administrator chooses a native format of either S3 or Swift.  However, either object storage API (S3 or Swift) can be used to access the bucket, subject to the limitations of the native format (cross-head support).
    • On bucket creation, we can also configure a bucket for file access (in addition to object access) using either the NFS or HDFS protocol.  Changing the file access option after bucket creation currently requires re-creating the bucket and copying data from the old to the new bucket.  Note that you do not lose object access by enabling file access, but enabling file access on a bucket may have some minor impacts on object access.  Note that there are other, potentially faster methods for POSIX-style file access to object storage that do not depend on enabling the NFS/HDFS access built into ROSS.
    • The ECS system also offers EMC-proprietary bucket formats (Atmos or CAS) which we are not actively supporting, and which do not offer cross-head support (i.e. you can’t access them with other protocols like S3 or Swift).  They are there for compatibility purposes for older systems/software, if needed. 
    • It is also possible to enable CIFS/SMB access to a bucket set up for file access.  (CIFS/SMB is often used for Windows directory shares.)  However, since CIFS/SMB access goes through a secondary server, performance likely suffers, so we do not recommend it for heavy use.  There are also tools available that can allow Windows users to access buckets as if they were mounted file systems.
  • Once your namespace administrator sets up a bucket, object users designated by your administrator may use the object APIs and, if you enabled file access, mount the bucket on any system inside the UAMS firewall.  (A short sketch of scripted access through the S3 API follows this list.)
    • It currently is not possible to access the EMC ECS system from outside the UAMS firewall.
  • Grace, the HPC cluster, has data movers that can assist in staging data to/from Grace’s cluster storage for running HPC jobs.  If you plan on using this feature, please discuss it with HPC staff so they can set it up for you.
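
For object users who want to script transfers, below is a minimal sketch of reaching a ROSS bucket through its S3-compatible API using Python and the boto3 library.  The endpoint URL, credentials, and bucket/object names are placeholders, not real values; your namespace administrator or the HPC staff will supply the actual S3 endpoint and object-user credentials, and (as noted above) the connection only works from inside the UAMS firewall.

    # Minimal sketch: access a ROSS bucket through its S3-compatible API with boto3.
    # The endpoint, credentials, and names below are hypothetical placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://ross.example.uams.edu:9021",  # placeholder ECS S3 endpoint
        aws_access_key_id="YOUR_OBJECT_USER",                # ECS object user name
        aws_secret_access_key="YOUR_SECRET_KEY",             # secret key from your administrator
    )

    # Archive a local file into the bucket.
    s3.upload_file("results.tar.gz", "my-archive-bucket", "project1/results.tar.gz")

    # List what the bucket holds under a prefix.
    listing = s3.list_objects_v2(Bucket="my-archive-bucket", Prefix="project1/")
    for obj in listing.get("Contents", []):
        print(obj["Key"], obj["Size"])

    # Retrieve the archived copy later.
    s3.download_file("my-archive-bucket", "project1/results.tar.gz", "results.tar.gz")

Most other S3-capable tools (for example the AWS CLI or rclone pointed at a custom endpoint) work the same way, needing only the endpoint URL and your object-user credentials.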