Data Validation

Data validation is a critical part of any data transcription or migration project. A full range of quality control and data validation tests must be carried out to confirm the integrity of the data sets.

In most circumstances, seismic exploration data sets are predominantly field and processed data.  Depending on the data type (stacked, non-stacked, field, seismic, shot gathers, processed) and particularly after data remastering, the data validation procedures and outputs will vary.

Each of these data types requires a different approach.  Remastering can involve the concatenation of data, conversion of data sets from multiplexed formats to demultiplexed formats and a variety of other data manipulations.  It is critical that these conversions be validated to be sure the data is correct.

To ensure accurate and successful validation and verification of the data migration or processing project, several QC procedures and checks should be performed on the output data. These include:

Media & Tape Validation

Aside from the correctness and completeness of the data format, the integrity of the new cartridge media should also be verified.  This would include ensuring the correct number of copies, checking labels for accuracy, photographing tapes and labels for cataloguing, loading tapes to ensure they can be read, and validating any RFID tags that contain tape statistics.

MD5 Quality Control - Data Duplication

MD5 checksums are used to provide the certainty our clients need for straight data duplication projects. An MD5 checksum is a 128-bit signature computed from the contents of each file that is copied. Identical files always produce the same signature, and the chance of two different files accidentally producing the same MD5 value is negligible, so a matching MD5 value between a source file and its copy is treated as confirmation that the copy is an exact duplicate.
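As an illustration only, the duplication check described above could be sketched in Python with the standard hashlib module. The function names md5_of_file and copies_match are hypothetical and are not taken from any Tape Ark tooling; the chunk size is an arbitrary choice.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large seismic files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, copy_path):
    """True when the source file and its copy have the same MD5 digest."""
    return md5_of_file(source_path) == md5_of_file(copy_path)
```

The hex digests returned by md5_of_file are the values that would appear in a per-tape summary listing.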

Quality Control Reports

The following QC products can be produced:

  • A summary report of tapes that have passed and failed QC and reasons for any deficiencies
  • A summary listing for each output tape (both copies) containing total files, file sizes, MD5 values, etc.
  • For SEGY output, a listing as an ASCII text file of the EBCDIC header for each SEGY file on each cartridge
  • Near trace and shot gathers as created from the cartridges being QC’ed in SEGY format with a free SEGY Viewer
  • A final written report
  • A complete set of tape images (photographs of the output media).
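For the EBCDIC header listing, a minimal sketch follows. It assumes the SEGY file starts with the standard 3200-byte textual header (40 lines of 80 characters) and that the header is encoded in EBCDIC code page 037; some files use a different code page or plain ASCII, so the encoding is a parameter. The function names are hypothetical.

```python
def read_textual_header(segy_path, encoding="cp037"):
    """Return the 3200-byte SEGY textual header as 40 lines of 80 characters.

    encoding="cp037" is an assumption; pass "ascii" or another EBCDIC
    code page if the file was written differently.
    """
    with open(segy_path, "rb") as handle:
        raw = handle.read(3200)
    if len(raw) < 3200:
        raise ValueError("file is too short to contain a SEGY textual header")
    text = raw.decode(encoding)
    return [text[i:i + 80].rstrip() for i in range(0, 3200, 80)]


def write_header_listing(segy_path, listing_path):
    """Write the decoded header to an ASCII text file, one card per line.
    Characters that have no ASCII equivalent are written as '?'."""
    lines = read_textual_header(segy_path)
    with open(listing_path, "w", encoding="ascii", errors="replace") as out:
        out.write("\n".join(lines) + "\n")
```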

Validation of Field Data Direct from the Boat

When a seismic survey is shot, some uncertainty can exist regarding the quality of newly acquired seismic data coming directly from the seismic vessel into storage.  Now that it is possible to record data directly into the cloud, entirely new processes can be run on the data in real time.  This allows companies to make adjustments to recording parameters while the survey is still being shot.  It is standard practice on most surveys that two copies of the data are created on the boat. Once these copies are delivered to land, one copy goes to a processing house for further analysis, and the other often goes directly into long term storage.  While this practice continues, with the advent of cloud acquisition there is no need to make two copies.  In fact, if directed into the cloud, the data will end up having three copies made, all free of charge.

In conventional acquisition to two copies on tape, the copy being processed is fully loaded and validated during processing, while the second copy goes directly into storage and is often never read to confirm that the data quality is acceptable. Years can pass before this second copy is read, and after that length of time it is often not possible to determine the cause of any data quality issues that are found.

Tape Ark offers a unique service to validate these tapes immediately after the data is acquired. Alternatively, the data can be validated in the cloud as required, either before it goes into long term offsite data storage or removing the need for that storage altogether.  We can provide detailed file by file, shot by shot reports of the readability of the tapes, the quality of the data, and the total volume of information stored. Problems found with the data quality can then be referred directly to the acquisition company for rapid correction.

For the QC of non-stack seismic data (shot gathers, CDP gathers, etc) we would:

  • Read all gather data from output tapes generated. Inspect the data to check they are loadable
  • Dump and list all the shot point/CDP content and any necessary information for every line/sail-line/swath/patch, including total number of traces per shot or per CDP
  • Compare the list result with the data list (supporting documents)
  • Write down any tape problem, discrepancy information or data missing/incomplete
  • Plot every 25 or 50 shots/CDPs, whichever is suitable, in SEGY image files.
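The listing of traces per shot or per CDP could be produced along the following lines. This is a sketch under several assumptions that real data may violate: the file is big-endian SEGY with no extended textual headers, every trace has the length given in the binary header, and the sample format is one of the codes in the small table below. The function name traces_per_ensemble is hypothetical.

```python
import struct
from collections import Counter

# Bytes per sample for a subset of SEGY data sample format codes
# (1 = IBM float, 2 = 4-byte int, 3 = 2-byte int, 5 = IEEE float, 8 = 1-byte int).
BYTES_PER_SAMPLE = {1: 4, 2: 4, 3: 2, 5: 4, 8: 1}


def traces_per_ensemble(segy_path, key_offset=8):
    """Count traces per shot or per CDP in a SEGY file.

    key_offset=8 reads the field record number (trace header bytes 9-12);
    key_offset=20 reads the CDP ensemble number (trace header bytes 21-24).
    """
    counts = Counter()
    with open(segy_path, "rb") as handle:
        handle.seek(3200)                     # skip the textual header
        binary_header = handle.read(400)
        if len(binary_header) < 400:
            raise ValueError("file is too short to contain a binary header")
        (samples_per_trace,) = struct.unpack(">H", binary_header[20:22])
        (format_code,) = struct.unpack(">H", binary_header[24:26])
        trace_bytes = samples_per_trace * BYTES_PER_SAMPLE[format_code]
        while True:
            header = handle.read(240)         # one trace header
            if len(header) < 240:
                break
            (key,) = struct.unpack(">i", header[key_offset:key_offset + 4])
            counts[key] += 1
            handle.seek(trace_bytes, 1)       # skip the trace samples
    return counts
```

The returned counts map each shot or CDP number to its trace count, which is the information needed for the comparison against the supporting documents. A truncated final trace is not detected by this sketch.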

For the QC of stacked data we would:

  • Read all data from output tapes generated
  • Inspect the lines to check they are loadable
  • Dump and list all the trace content of every line/Inline
  • Compare and check the list result with the data list (supporting documents)
  • Write down any tape problem, discrepancy information or data missing/incomplete
  • Plot each 2D line in SEGY image files. For 3D data, plot Inline data at a regular increment.
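The comparison of the tape listing against the supporting documents can be thought of as a simple reconciliation. The sketch below assumes both sides have already been reduced to a mapping from line name to trace count; that representation, and the function name compare_with_data_list, are illustrative assumptions rather than a description of an actual workflow.

```python
def compare_with_data_list(expected, observed):
    """Compare trace counts per line and return a list of discrepancies.

    expected: {line_name: trace_count} taken from the supporting documents
    observed: {line_name: trace_count} listed from the output tapes
    """
    problems = []
    for line in sorted(expected.keys() | observed.keys()):
        if line not in observed:
            problems.append(f"{line}: in data list but missing from tape")
        elif line not in expected:
            problems.append(f"{line}: on tape but not in data list")
        elif expected[line] != observed[line]:
            problems.append(
                f"{line}: expected {expected[line]} traces, found {observed[line]}"
            )
    return problems
```

An empty result means the tape contents agree with the data list; anything else is the discrepancy information to be written down.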

Field Seismic Data (Non-stacked) - the QC of field seismic data covers the following:

  • The record length and sample rate of the data will be checked and reported
  • Blank near trace gather gaps of more than 5 shots will be investigated (where the transcription contractor may have missed a tape, or lost data in the recovery of damaged or deteriorated tapes)
  • All data displays will be checked for anomalous noise levels which may suggest incorrect number format conversion, or patterns or structures of signal in the data that could not reasonably be explained by seismic acquisition activity
  • For SEGY data, EBCDIC headers will be checked for completeness and correctness
  • For SEGD field data, the structural integrity of the SEGD shots will be inspected to ensure their conformance with the SEG standard, and to ensure that no errors were introduced during the transcription process
  • All conversions from SEGA, B, C, D etc. to SEGY will be checked for correctness. (This requires that some non-converted data be made available, which is an important issue to confirm.)
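The gap check on near trace gathers can be illustrated with a short function. It assumes shot numbers are integers that normally increase by one; surveys with a different shot increment would need the threshold logic adjusted. The name find_gaps is hypothetical, and the same function applies equally to CDP numbers.

```python
def find_gaps(numbers, max_missing=5):
    """Return (last_seen, next_seen, missing_count) for every gap in which
    more than max_missing consecutive shot or CDP numbers are absent."""
    ordered = sorted(set(numbers))
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        missing = current - previous - 1
        if missing > max_missing:
            gaps.append((previous, current, missing))
    return gaps
```

For example, find_gaps([1, 2, 3, 10, 11, 14]) reports the gap between shots 3 and 10 (six missing shots) but not the gap between 11 and 14.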

Processed Data (Stacked) - the QC of processed seismic data covers the following:

  • The record length and sample rate of the data will be checked and reported
  • CDP gaps of more than 5 CDPs will be investigated. (Where the transcription contractor may have missed a tape, or lost data in the recovery of damaged or deteriorated tapes)
  • All data will be checked for anomalous noise levels which may suggest incorrect number format conversion, or patterns or structures of signal in the data that could not reasonably be explained by seismic acquisition/processing activity
  • EBCDIC and binary headers will be checked for completeness and correctness
  • The header dumps will be examined for validity and conformance with expected data according to the definitions of the SEG format specifications.
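As a sketch of the record length and sample rate check, the values can be read from the 400-byte SEGY binary header that follows the 3200-byte textual header. The example assumes a big-endian file and takes the record length as the number of samples multiplied by the sample interval; some conventions use one sample fewer. The function name is hypothetical.

```python
import struct


def sample_rate_and_record_length(segy_path):
    """Read the sample interval and samples per trace from the SEGY binary
    header and derive the record length in milliseconds."""
    with open(segy_path, "rb") as handle:
        handle.seek(3200)                     # skip the textual header
        binary_header = handle.read(400)
    if len(binary_header) < 400:
        raise ValueError("file is too short to contain a binary header")
    # File bytes 3217-3218: sample interval in microseconds.
    (interval_us,) = struct.unpack(">H", binary_header[16:18])
    # File bytes 3221-3222: number of samples per data trace.
    (samples,) = struct.unpack(">H", binary_header[20:22])
    return {
        "sample_interval_ms": interval_us / 1000.0,
        "samples_per_trace": samples,
        "record_length_ms": interval_us * samples / 1000.0,
    }
```

The reported values would then be compared with the acquisition or processing parameters given in the supporting documents.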