This article was written for our sponsor, UNC School of Information and Library Science.

In this digital age, you can store millions of documents on a device smaller than your thumb. But can you find that photo you want to share without scrolling through your phone for an hour? Can you locate last year’s sales report without clicking on folder after folder? Can you open that file with your research data from 10 years ago, or will the current software reject something created in an older version?

From flash drives to the cloud, we now have many options for saving vast amounts of data and digital files, but our ability to keep those files organized and accessible remains a challenge, and one that is growing every day.

“We are inundated with lots of digital data, whether in our personal life or our professional life. Many things in our daily lives are turning digital — from bank statements and tax filings to personal pictures,” said Arcot Rajasekar, a professor at the UNC School of Information and Library Science and chief scientist at the Renaissance Computing Institute. “We have been somewhat good at keeping paper-based information, but many of us are not trained on how to keep our digital artifacts for the long term.”

Helen Tibbo, professor and director of the Professional Science Master’s degree program in digital curation and management at UNC SILS, suggests a “life cycle approach” to data curation and management.

“We want the data to live its best life for as long as it’s meant to live,” said Tibbo, adding that some data should be disposed of after a certain amount of time.

For example, an email’s life cycle can last a minute before it’s deleted, whereas a birth certificate lasts a lifetime. With more than 95 million photos and videos being uploaded to Instagram every day, if we don’t learn better management, we’ll “drown in the digital deluge” as Rajasekar puts it.

The Importance of Data Management

Companies especially should consider the importance of curation and management as the efficiency, accessibility and profitability of their businesses rely on digital data more and more.

When curating and managing digital content, businesses should think about its reliability, discoverability and renderability. A file that you can’t find is a file lost. A file that you can’t make sense of is meaningless.

“Usability and understandability over time — those are things that we want to preserve, not for preservation sake, but for as long as the data can be useful,” Tibbo explained. “When people hear the word ‘data’ they are often thinking of Excel sheets with numbers. That’s one type of data, but you can also think of that word as any digital object. It could be a document, a manuscript, an image or a video.”

In 2018, the Facebook and Cambridge Analytica scandal highlighted the importance of digital data and privacy, and a week rarely passes without reports of a company exposing the personal information of its users, due to negligence or external attacks.

Rajasekar said that, as a scientist, he has witnessed many instances where data has been lost, or where the person who collected the data and knows its schema has moved on. Both technological and human roadblocks can lead to lost data if the data is not managed properly.

The 5 Vs of Digital Data

Big data is often categorized using the five Vs:

  • Volume – the amount and size of your data
  • Velocity – how fast the data is coming and from how many sources
  • Variety – the different data types and formats
  • Veracity – how trustworthy and high-quality the data is
  • Value – the level of importance of the data

Juggling these different components can be difficult, Rajasekar admits. “If we don’t do it effectively and efficiently, it can lead to disaster.”

Tibbo advised implementing a streamlined process from the get-go.

“At the ingest, when you’re pulling information, be very careful that you know where it came from and that it’s reliable,” she said. “Document how you do everything.”

Tibbo said one of the main problems with digital data is that it often isn’t processed, catalogued or described in a way that gives it longevity. Whether it’s stored on something like a floppy disk or CD that’s unusable with current technology, or it’s understandable, findable or accessible only to the person who created it, the data lacks usefulness.

Storage devices can also fail.

“All of those old CDs? The content on them is going corrupt,” Tibbo said. “It was supposed to last for 100 years, but of course, didn’t.”

Another problem many companies have is that data is created and stored in silos.

“If you’re in an organization where everybody’s collecting their own data, writing their reports, and then storing it on their hard drives or even some common drive — people often don’t really know it’s there,” she said. “It’s not described, and you’re never going to be able to use that very well in the future. It’s like throwing money away; a little bit like finding food in the back of your refrigerator after it has gone bad.”

Companies would do well to set up a digital filing system with a designated area (like a server) where things are stored, implement password and firewall protection, and make sure that all data includes metadata, meaning descriptions of the data itself.
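For readers who want to see what that kind of metadata might look like in practice, here is a minimal sketch in Python. It is an illustration only, not a method described by Tibbo or Rajasekar: the function names, the field names and the ".meta.json" sidecar convention are all assumptions made for this example. The idea is to record a short description, the source and the creator of a file alongside a checksum, so that someone else can later find the file, understand it and confirm it has not been corrupted.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path, description, source, creator):
    """Build a metadata record (a plain dict) describing one stored file.

    The field names here are hypothetical, chosen for illustration.
    """
    data = Path(path).read_bytes()
    return {
        "filename": Path(path).name,
        "description": description,  # what the file contains
        "source": source,            # where the data came from
        "creator": creator,          # who collected or wrote it
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # checksum of the contents
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(path, record):
    """Save the record next to the file as '<name>.meta.json' (assumed convention)."""
    sidecar = Path(str(path) + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def is_unchanged(path, record):
    """Return True if the file's current checksum matches the recorded one."""
    current = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return current == record["sha256"]
```

A team could call describe_file and write_sidecar whenever a report lands on the shared drive, then run is_unchanged from time to time to catch files that have silently gone bad. A real deployment would more likely use an established metadata standard and a catalog rather than loose JSON files.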

Added Tibbo, “If you do not directly put energy into maintaining digital content, it will suffer.”
