Case Study
Most large global oil & gas exploration companies maintain legacy exploration data on physical backup tape. These tapes can sit in numerous warehouses around the world, where the data is not addressable by today’s AI and ML tools and becomes less recoverable as the tapes deteriorate. Added to this is the logistical complexity of managing such large collections of physical media. With some of the major exploration companies holding over 10 million tapes containing data from as early as the 1960s to the present day, it is no wonder that things can at times be difficult to locate.
To overcome these problems and leverage the true potential of their legacy data, a global oil & gas supermajor customer enlisted Tape Ark to assist in planning their migration of legacy data to cloud storage.
The Problem
The customer had a collection of tapes stored in Houston, TX and Perth, Western Australia. Several problems retrieving data had led the team to believe that their inventory listings of the stored tapes were incorrect. Without reliable information, the team could not accurately plan the data migration or forecast the ongoing costs of cloud storage. In addition, there was a particular parcel of data that they were especially keen to find but had been unable to locate for over 3 decades. The company needed a detailed review of its entire tape collection prior to migration, but did not have the resources or experience to perform this efficiently.
The Solution
Tape Ark, global experts in tape migration, performed a Comprehensive Media Audit (CMA) on a collection of over 2,000 pieces of media across 16 unique media types. The CMA gave the customer a detailed understanding of their collection without the cost of completely reading each tape. This prudent pre-migration activity helps to reduce the cost of cloud migration by revealing:
- duplicate datasets; so they can be excluded from the migration process
- data that is licensed, proprietary and owned; to enable informed decision making
- the data footprint; to estimate future cloud storage costs
- media risk profile; to prioritise deteriorating tapes
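As a rough illustration of how an audited footprint feeds a storage forecast, the following Python sketch counts each dataset once (so duplicates are excluded) and multiplies the remaining footprint by a storage price. The record fields, the sample values, and the price per GB are all hypothetical placeholders, not figures from this project or from any cloud provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeRecord:
    barcode: str      # hypothetical catalogue identifier for the tape
    dataset_id: str   # hypothetical identifier for the dataset on the tape
    size_gb: float    # data footprint recorded for the tape, in GB


def unique_footprint_gb(records):
    """Total footprint in GB, counting each dataset only once."""
    first_copy = {}
    for rec in records:
        # Keep the first copy seen; later tapes with the same dataset_id
        # are treated as duplicates and excluded from the total.
        first_copy.setdefault(rec.dataset_id, rec.size_gb)
    return sum(first_copy.values())


def monthly_storage_cost(footprint_gb, price_per_gb_month):
    """Estimated monthly storage cost for a given footprint."""
    return footprint_gb * price_per_gb_month


# Made-up sample catalogue: T003 duplicates the dataset held on T001.
records = [
    TapeRecord("T001", "ds-A", 500.0),
    TapeRecord("T002", "ds-B", 800.0),
    TapeRecord("T003", "ds-A", 500.0),
]

# Placeholder price, NOT a real tariff; substitute the provider's rate.
ASSUMED_PRICE_PER_GB_MONTH = 0.01

footprint = unique_footprint_gb(records)
estimate = monthly_storage_cost(footprint, ASSUMED_PRICE_PER_GB_MONTH)
print(f"Footprint after de-duplication: {footprint} GB")
print(f"Estimated monthly cost: {estimate:.2f}")
```

In practice, deciding that two tapes hold the same dataset would need more than a matching identifier, so this sketch only shows the arithmetic once duplicates have been flagged.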
The key steps in the process are set out below:
Tape Cataloguing – Tape Ark received the tapes at its Mass Ingest Facility in Perth. Upon receipt of the archive, a media audit was performed which included photographing each tape, reading RFID data chips, and applying a supplier QR code. This created a detailed catalogue of tapes and hard drives for quick retrieval, should they be needed mid-project, minimising disruption to the company’s operations. RFID data chips provided information regarding tape contents, and an accurate data footprint for each tape, critical for predicting future cloud storage costs.
Deep Dive – Tape Ark used Amazon Textract and Amazon Rekognition to automatically lift the handwritten material from the tape labels and create a deep index of the data. This included information such as data format, data type, location, survey name, project IDs and dates. Amazon QuickSight was used to create geographic maps of the data in the collection, based on the place names documented on the labels. Search criteria were set up to group data formats such as SEG-Y and SEG-D, and Amazon Rekognition was used to auto-classify media type, age of media, recording vendor, and even the brand of tape used, all without a Tape Ark operator typing on a keyboard. Tape Ark was also able to identify multiclient, copyrighted and licensed data, and other datasets of interest.
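The label-indexing idea can be sketched in Python as follows. This is not Tape Ark's pipeline, which the case study does not describe in detail. It assumes a response shaped like the output of Amazon Textract's text detection (a `Blocks` list whose `LINE` entries carry `Text` and `Confidence`), which in a real run would come from a call such as the boto3 Textract client's `detect_document_text`. The confidence threshold, the regular expressions, and the sample label text are illustrative assumptions.

```python
import re

# Assumed spellings of the formats as they might appear on a label.
FORMAT_PATTERNS = {
    "SEG-Y": re.compile(r"\bSEG[\s-]?Y\b", re.IGNORECASE),
    "SEG-D": re.compile(r"\bSEG[\s-]?D\b", re.IGNORECASE),
}


def label_lines(response, min_confidence=80.0):
    """Return the text of LINE blocks at or above min_confidence."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def detect_formats(lines):
    """Return the sorted names of the data formats mentioned in the lines."""
    text = " ".join(lines)
    return sorted(
        name for name, pattern in FORMAT_PATTERNS.items() if pattern.search(text)
    )


# Made-up response for illustration; a real one would come from Textract.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "SURVEY A 1984 SEGY", "Confidence": 93.5},
        {"BlockType": "LINE", "Text": "smudged", "Confidence": 41.0},
        {"BlockType": "WORD", "Text": "SURVEY", "Confidence": 95.0},
    ]
}

lines = label_lines(sample_response)
formats = detect_formats(lines)
print(lines, formats)
```

Dropping low-confidence lines trades recall for accuracy; faded or smudged labels would more likely need manual review than automatic indexing.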
The Outcome
The final result was an up-to-date inventory listing and a detailed report on the data collection, including a media risk profile. The customer was far better positioned to plan their upcoming migration now that they had certainty about the data footprint to be migrated and the tapes to prioritise. In addition, the missing data from the 1980s was found on a roll of microfiche, solving a problem the company had been working on for over 25 years! A fantastic result!