The Archive Tape Media Dilemma

Oil and gas explorers are pioneers of data-driven decision-making. Seismic data acquisition is often the first step in oil and gas exploration, and the industry has traditionally stored these vast quantities of survey data on tape because of its low cost. Increasingly, oil companies are recognising the potential to exploit their legacy data collections by using the scalability of cloud computing, including machine learning and advanced analytics, to aid their exploration activities. So how do you migrate potentially millions of legacy tapes, some in fragile condition and some in obscure or obsolete formats, to the cloud with predictable cost and timing? And how do you navigate the ‘data as a hostage’ commercial terms often imposed by offsite records management companies that are reluctant to relinquish your tapes, and with them the opportunity to charge you a monthly fee for those tapes to sit on their shelves?

Tape Ark shares the same vision as these oil and gas companies and has set its sights on getting data into more accessible platforms as efficiently and inexpensively as possible, while adding as much value to the data as possible. Tape Ark’s service model puts the needs of the end user first and encourages software vendors to focus on developing compelling SaaS products that attract customer loyalty, rather than non-collaborative, proprietary closed systems.

Tape Ark’s comprehensive Media Audit is a programme specifically designed to help oil and gas companies access their data holdings (both on tape and in private clouds) and plan the most cost-effective way to consolidate the data in readiness for well-informed decision-making.

Objectives of a Tape Media Audit

Tape Ark believes that oil and gas companies have adopted a range of industry standards for data formats (such as the SEG formats), databases (such as the PPDM data model) and much else, so that they can exchange data with one another during joint exploration or after M&A activity. These standards never extended to data storage, partly because tape was the principal storage medium, and partly because cloud, big data and analytics tools were either unavailable or not compelling enough to act on. Now, migrating collections of data to the cloud, especially from large tape-based media collections, is commercially viable, but it is also fraught with uncertainties such as:

  • How many tapes do I have?

  • What volume of data is there on my tapes?

  • Will my tapes be difficult to read?

  • Will there be data loss?

  • Will I read duplicate data unnecessarily?

  • What penalties will apply to picking my tapes from offsite storage?

  • How can I minimise exit costs from ‘private cloud’ storage?

  • Which cloud storage tiers provide the best balance of cost and service level?

  • How can I forecast the cloud storage costs?

If you can’t predict data volumes on tape, then you can’t predict the price of cloud-based storage, and being unable to predict your cloud storage costs is not a reasonable way to kick-start a digital innovation strategy.
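As a rough illustration of how uncertainty in volume carries straight through to cost, the sketch below turns a low and a high volume estimate into a monthly storage cost range. The tier names and per-terabyte prices are placeholder assumptions chosen for the arithmetic, not quotes from any cloud provider, and `monthly_cost_range` is a hypothetical helper.

```python
# Hypothetical monthly prices in USD per terabyte. Real cloud pricing varies
# by provider, region and tier and must be looked up separately.
ASSUMED_PRICE_PER_TB_MONTH = {"hot": 20.0, "cool": 10.0, "archive": 1.0}


def monthly_cost_range(low_tb, high_tb, tier):
    """Return the (low, high) monthly storage cost for a volume range in TB."""
    if low_tb < 0 or high_tb < low_tb:
        raise ValueError("expected 0 <= low_tb <= high_tb")
    price = ASSUMED_PRICE_PER_TB_MONTH[tier]
    return low_tb * price, high_tb * price


if __name__ == "__main__":
    # Without an audit, the volume estimate might span a wide range.
    for tier in ASSUMED_PRICE_PER_TB_MONTH:
        low, high = monthly_cost_range(500, 2500, tier)
        print(f"{tier}: {low:,.0f} to {high:,.0f} USD per month")
```

Whatever the real prices are, the width of the cost range is proportional to the width of the volume range, so narrowing the volume estimate is what makes the forecast usable.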

The Tape Media Audit Process

Tape Ark views the initial data audit as perhaps the most important part of a project, and one that will determine the long-term success of a cloud storage strategy. Almost all activities in a digital transformation to the cloud hinge on knowing what you have to migrate, and an audit done correctly should deliver significant, long-term benefits to the overall project. Tape Ark has performed data audits of significant legacy seismic collections across the globe over the past 18 years, the largest for a major Italian oil company with data on three continents. Geographic spread, numerous storage provider locations and vendors, and decades-old tapes give oil companies no shortage of concerns in understanding their collections. A comprehensive media audit should alleviate those concerns by delivering the following benefits:

  1. Confirm the exact volume of media with a high degree of accuracy.

  2. Confirm with a high degree of accuracy not only media counts but also the types of media in the collection, distinguishing between every type and sub-type of media, such as LTO1, LTO2, LTO3 and so on. Knowing whether an LTO tape is generation 1, 2, 3 or 6 is extremely important in predicting the volume of data likely to be recorded on it. As an example, LTO1 has a native capacity of 100 GB, while LTO6 has a native capacity of 2.5 TB (25 times as much). Knowing which tapes are which helps reduce surprises in cloud ingest timing and ongoing storage fees.

  3. Use automation and artificial intelligence to conduct the audit correctly and quickly, producing accurate, in-depth results that aid numerous activities moving forward.

  4. Provide a current and accurate stocktake of all media throughout the project via a project dashboard that also provides insights into the data volumes expected for cloud ingest, at-risk media, country of origin, etc.

  5. Use AI to immediately identify and characterise the most at-risk media, so that project planning can decide tape ingest priorities and sequence.

  6. Identify duplicate media across all tapes (even those on separate continents and in separate storage facilities) to reduce the potential of duplicate data being ingested into the cloud, reduce costs, and help increase price certainty.

  7. Allow a company to assign each media type to the most efficient transcription partner, based on that partner’s capacity for the media type. (For example, if a provider has no 9-track work on, you can send 9-track tapes there, getting the right kind of tapes to the right provider to speed up the workflow and reduce costs.)

  8. Save significant costs by not having to pay for ongoing storage during the project. The cost savings will likely cover the costs of the audit itself.
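The arithmetic behind benefit 2 can be sketched as follows: once the audit has counted tapes by LTO generation, multiplying each count by the native cartridge capacity gives an upper bound on the data volume. The native capacities below are the published figures for LTO1 to LTO6; the function name, the `fill_factor` parameter and the example counts are illustrative assumptions, since real tapes are rarely full and may hold compressed data.

```python
# Native (uncompressed) cartridge capacities in gigabytes per LTO generation.
LTO_NATIVE_CAPACITY_GB = {
    "LTO1": 100,
    "LTO2": 200,
    "LTO3": 400,
    "LTO4": 800,
    "LTO5": 1500,
    "LTO6": 2500,
}


def estimate_volume_tb(tape_counts, fill_factor=1.0):
    """Estimate total data volume in terabytes (1 TB = 1000 GB).

    tape_counts maps an LTO generation to a number of tapes. fill_factor is
    an assumed average fraction of each tape that actually holds data; 1.0
    gives the upper bound. Unknown media types raise KeyError.
    """
    if not 0.0 <= fill_factor <= 1.0:
        raise ValueError("fill_factor must be between 0 and 1")
    total_gb = sum(
        LTO_NATIVE_CAPACITY_GB[media] * count
        for media, count in tape_counts.items()
    )
    return total_gb * fill_factor / 1000


if __name__ == "__main__":
    # Hypothetical audit result: the same number of tapes, very different volumes.
    print(estimate_volume_tb({"LTO1": 1000}))  # all LTO1
    print(estimate_volume_tb({"LTO6": 1000}))  # all LTO6
```

A thousand tapes recorded as "LTO" with no generation could hold anything between those two figures, which is why the sub-type matters for ingest planning.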

The Tape Audit Workflow

Critically, Tape Ark recommends that the physical media audit is completed at its own storage locations. From extensive experience, Tape Ark knows that audits conducted at records management companies (tape vaults) are difficult and time-consuming, and incur unnecessary fees. Tape Ark proposes a different process, which aims to provide significant cost savings, a more rapid audit and a substantial benefit to the overall project. The suggested process is as follows:

  1. Ship all media from your current supplier(s) to any of Tape Ark’s locations in the EU and USA using secure chain of custody transport.

  2. Tapes will be securely warehoused during the audit and transcription process in a fully climate-controlled and fireproof storage area.

  3. The successful receipt of all media is logged through Tape Ark’s audit system which uses QR codes and photography and a workflow that is integrated with the incumbent media storage company.

  4. All tapes are barcoded and photographed, followed by the application of AI to process the images and build a detailed profile. The profile will include risk, age, media type, brand, geography, deduplication potential and physical condition.

  5. Tape Ark will record not only media type, but also the brand of media. Through our extensive history of reading aging media assets, we have come to know that the brand of a tape tends to be the deciding factor in how deterioration will affect the data. The AI tools will build a brand profile and automatically produce risk profiles for the data assets.

  6. All navigation tapes encountered in the audit will be transcribed (unless they are duplicates) so that they can immediately be imported into the GIS system to give a map-based view of the audit as well as a database view.

  7. Metadata capture from the media will be gathered to build a survey and line database that can be connected to the navigation transcribed during the audit process.

  8. During the Audit, we will also perform a duplication check before performing work or releasing media to transcription companies.

  9. The Audit will provide a single point of contact, which minimises duplication of transcription effort (so that two vendors do not unknowingly work on the same data).

  10. The Audit will produce the beginning of a PPDM database that can be extended during transcription and other parts of the project. The database can be run globally from a centralised system, so that data from every site where an audit is being performed can be compared together. In effect, many audits can run in various locations with a single view of progress and comparison of datasets.

  11. To further eliminate unnecessary transcription costs, Tape Ark will perform a comparison with the Norwegian (Diskos), United Kingdom (OGA), Australian (GA), Indonesian (DG) and other databases as directed by the client, in an effort to locate data that is already in the public domain and so avoid transcribing and storing data that is freely available from other sources.

  12. A no-cost or cost-offset plan is provided for customers currently storing data within private cloud environments.

  13. Ongoing physical media storage will be provided at a nominal cost while the audit is being undertaken and this will greatly assist in offsetting the withdrawal charges from the incumbent provider.

  14. On completion, a map view of the data, a database view and a comprehensive audit document are provided so that tape requests can be made during the project using the Tape Ark portal.
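The duplication check and risk-based sequencing described in the workflow can be pictured with a minimal sketch. It is not Tape Ark’s audit system: the `TapeRecord` fields, the choice of (survey, line) as the duplicate key, the numeric risk score and the rule of keeping the lowest-risk copy are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeRecord:
    barcode: str
    survey: str
    line: str
    location: str
    risk: int  # hypothetical score; higher means more at risk


def split_duplicates(records):
    """Return (keep, duplicates), keeping one tape per (survey, line) key."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.survey, rec.line)].append(rec)
    keep, duplicates = [], []
    for group in groups.values():
        # Assumption: keep the lowest-risk copy, as the one most likely to read.
        ordered = sorted(group, key=lambda r: (r.risk, r.barcode))
        keep.append(ordered[0])
        duplicates.extend(ordered[1:])
    return keep, duplicates


def ingest_order(records):
    """Order tapes so the most at-risk media are read first."""
    return sorted(records, key=lambda r: (-r.risk, r.barcode))


if __name__ == "__main__":
    # Hypothetical audit records from two storage sites.
    audit = [
        TapeRecord("T001", "S1", "L1", "Site A", risk=3),
        TapeRecord("T002", "S1", "L1", "Site B", risk=1),
        TapeRecord("T003", "S1", "L2", "Site A", risk=5),
    ]
    keep, duplicates = split_duplicates(audit)
    print([r.barcode for r in ingest_order(keep)])
    print([r.barcode for r in duplicates])
```

Because the key ignores location, copies held in separate facilities fall into the same group, which is the cross-site comparison a centralised audit database makes possible.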

Commercial Benefits

Tape Ark’s efficient media audit empowers oil and gas companies to understand the potential data volume before migrating media collections to the cloud, plan for the relevant cloud storage tiers, and sequence tape ingest based on known media risk profiles. Duplicate data is identified to reduce total cloud storage volume, and price certainty is achieved. Tape Ark’s low-cost media storage model finally gives oil and gas companies the ability to understand their valuable legacy collections by offsetting the exorbitant ‘picking’ and exit fees charged by the tape storage providers.

The Tape Ark media audit is applicable to all customers looking to migrate their legacy tape collections, regardless of which cloud platform the data will be migrated to, including Microsoft Azure, AWS, Google Cloud Platform and other cloud providers.
