The notion that bigger is better is arguably never more fitting than when it comes to data.

Simply put – some things can only be done at scale.

With an estimated 1.5 billion backup tapes stored in offsite warehouses around the world, there is no shortage of data; the challenge is liberating that data from aging media into something accessible and fit for purpose.

Today’s cutting-edge technology allows us to combine fresh thinking with big data – but are we making the most of it?

Fuelling Big Data for a Better AI Outcome

Tape Ark President and CEO Guy C. Holmes spoke at the Intelligent Health Inspired AI virtual summit on May 27th, 2020. Addressing an audience of more than 450, he outlined the untapped opportunity that awaits organizations that access their historical data and pair it with cutting-edge technologies to fuel AI initiatives and achieve breakthrough discoveries.

The following is an abridged extract of Guy’s presentation. To watch the full 17-minute video, scroll to the bottom of this page or click here.

Historical Legacy Data

To understand the future we first must look to the past.

In her book Turning Men Into Stone, Criena Fitzgerald tells the story, from 1921, of Slavic immigrant Matti Dressa.

As an underground miner working in the Golden Mile in Western Australia, he was admired by his workmates for his strength and work ethic. Unfortunately, Dressa fell ill and was diagnosed with fibrosis of the lung and rheumatism – forcing him to retire and seek work elsewhere.

After being examined by physicians, Dressa tested positive for tuberculosis and never again found permanent employment. He was left to support a wife and four children on a pension of $2 per week until 1926, when he died at the age of 52.

Fast forward to the 1990s, when migrant laborers from China’s Hunan Province were employed as drillers to bore holes into the bedrock in Shenzhen to support the foundations of the city’s subway lines. The drillers wore little to no protection from the silica dust that surrounded them.

In the last decade, more than 100 former laborers from Hunan have died of silicosis, and a further 600 are suffering and slowly dying. These laborers are but a small fraction of the estimated 875,000 Chinese workers who have been diagnosed with pneumoconiosis, a broad class of lung disease associated with dust inhalation.

Liberating Data for Discovery

In the early 1900s, no historical records existed in Western Australia regarding silicosis prevention or treatment that could help physicians monitor, diagnose, or treat the illness.

However, in what would later become pivotal data, more than 80,000 chest x-rays were captured alongside routine medical checks, with the results handwritten on note cards no bigger than a Christmas card. Each card contained the name of the worker, but few other personal details to identify them.

These x-rays sat dormant for many years until 2005, when a team working in conjunction with physicians from the University of Western Australia painstakingly linked the various cards and x-rays together. The resulting digital archive represents one of the best collections of historical chest x-rays in the world, well suited to data-heavy AI and ML initiatives.

Applying New Technology to Big Data

In late 2019, Adam Lashley, a young data scientist from Florida, USA, created an auto-classifier for the x-ray images to automate the identification of silicosis from the legacy films. Whilst the project is not yet complete, early-stage detection is now more of a possibility than ever before.

If x-rays that are nearly a century old can lead to this outcome, it raises the question: what other discoveries, preventions, insights, and analyses could become available if valuable data were not sitting on tapes on a shelf, inaccessible to modern cloud-enabled tools?

X-rays, MRIs, CT scans, cardiac ultrasounds, catheter recordings, blood results, and ocular imaging from 1970 through to 2010 (when cloud access became more prevalent) sit completely idle. Together they represent the largest base layer of images and diagnostics that could feed the world’s machine learning models for years to come.

However, the opportunities for discovery also sit outside the health industry.

In the oil and gas space, for example, the subsea pipeline industry has discovered that, using its older collections of pipeline weld x-rays, it can run comparative analytics to look for changes in pipe weld integrity.

The Future

Knowledge is power – but only if the knowledge is accessible.

To move forward, we must look back, understand what data we have, and start thinking about how we use it – this is a big data frame of mind.

Our Commitment

Tape Ark is committed to doing data justice.

Find out how we can work with your organization to identify, preserve, and migrate your data to maximize its value for today, tomorrow, and beyond by reaching out to a member of our expert team.


Fitzgerald, C., 2016. Turning Men Into Stone. Hesperian Press.

Shih, G., 2019. They Built A Chinese Boomtown. It Left Them Dying Of Lung Disease With Nowhere To Turn. [online] The Washington Post.

Additional imagery sourced from Guy C. Holmes’s personal collection.