HOW BIG IS BIG DATA?
For computer scientists and data scientists, data is now as important as writing programs.
According to IBM, approximately 2.5 quintillion bytes (2.5 exabytes) of data are created daily, and 90% of the world’s data was created in the last two years. According to IDC, the global data supply will reach 175 zettabytes (equal to 175 trillion gigabytes or 175 billion terabytes) annually by 2025. Consider the following examples of various popular data measures.
- https://www.ibm.com/blogs/watson/2016/06/welcome-to-the-world-of-a-i/.
- https://www.networkworld.com/article/3325397/storage/idc-expect-175zettabytes-of-data-worldwide-by-2025.html.
Megabytes (MB)
One megabyte is about one million (actually 2^20) bytes. Many of the files we use on a daily basis require one or more MBs of storage. Some examples include:
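The gap between the decimal approximation ("about one million") and the binary value (2^20) can be checked directly; a quick sketch using only the Python standard library:

```python
# Storage-unit prefixes: marketing uses powers of 10, but the
# binary values are powers of 2 -- the gap grows with each unit.
for name, power in [('kilobyte', 10), ('megabyte', 20),
                    ('gigabyte', 30), ('terabyte', 40)]:
    print(f'1 {name} = 2**{power} = {2**power:,} bytes')

# 1 megabyte = 2**20 = 1,048,576 bytes (about 4.9% more than 10**6)
```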
- MP3 audio files—High-quality MP3s range from 1 to 2.4 MB per minute.
https://www.audiomountain.com/tech/audio-file-size.html.
- Photos—JPEG format photos taken on a digital camera can require about 8 to 10 MB per photo.
- Video—Smartphone cameras can record video at various resolutions. Each minute of video can require many megabytes of storage. For example, on one of our iPhones, the Camera settings app reports that 1080p video at 30 frames-per-second (FPS) requires 130 MB/minute and 4K video at 30 FPS requires 350 MB/minute.
Gigabytes (GB)
One gigabyte is about 1000 megabytes (actually 2^30 bytes). A dual-layer DVD can store up to 8.5 GB, which translates to:
https://en.wikipedia.org/wiki/DVD.
- as much as 141 hours of MP3 audio,
- approximately 1000 photos from a 16-megapixel camera,
- approximately 65 minutes of 1080p video at 30 FPS, or
- approximately 24 minutes of 4K video at 30 FPS.
The current highest-capacity Ultra HD Blu-ray discs can store up to 100 GB of video. Streaming a 4K movie can use between 7 and 10 GB per hour (highly compressed).
Terabytes (TB)
One terabyte is about 1000 gigabytes (actually 2^40 bytes). Recent disk drives for desktop computers come in sizes up to 15 TB, which is equivalent to:
https://www.zdnet.com/article/worldsbiggest-hard-drive-meet-western-digitals15tb-monster/.
- approximately 28 years of MP3 audio,
- approximately 1.68 million photos from a 16-megapixel camera,
- approximately 1,900 hours of 1080p video at 30 FPS and
- approximately 714 hours of 4K video at 30 FPS.
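The conversions in the two lists above follow from dividing each capacity by the per-minute storage rates quoted earlier (about 1 MB/minute for MP3, 350 MB/minute for 4K at 30 FPS, and roughly 8.5 MB per 16-megapixel photo). A quick sketch of that arithmetic:

```python
DVD_MB = 8.5 * 1_000        # dual-layer DVD: 8.5 GB in megabytes
DRIVE_MB = 15 * 1_000_000   # 15-TB disk drive in megabytes

def minutes_of(capacity_mb, mb_per_minute):
    """Minutes of media that fit in capacity_mb at the given rate."""
    return capacity_mb / mb_per_minute

print(minutes_of(DVD_MB, 1) / 60)               # ~141.7 hours of MP3
print(DVD_MB / 8.5)                             # ~1,000 photos
print(minutes_of(DRIVE_MB, 1) / 60 / 24 / 365)  # ~28.5 years of MP3
print(minutes_of(DRIVE_MB, 350) / 60)           # ~714 hours of 4K video
```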
Nimbus Data now has the largest solid-state drive (SSD) at 100 TB, which can store 6.67 times the 15-TB examples of audio, photos and video listed above.
Petabytes, Exabytes and Zettabytes
There are nearly four billion people online creating about 2.5 quintillion bytes of data each day—that’s 2500 petabytes (each petabyte is about 1000 terabytes) or 2.5 exabytes (each exabyte is about 1000 petabytes). According to a March 2016 AnalyticsWeek article, within five years there will be over 50 billion devices connected to the Internet (most of them through the Internet of Things, which we discuss in Sections 1.6.2 and 16.8) and by 2020 we’ll be producing 1.7 megabytes of new data every second for every person on the planet. At today’s numbers (approximately 7.7 billion people), that’s about
- 13 petabytes of new data per second,
- 780 petabytes per minute,
- 46,800 petabytes (46.8 exabytes) per hour and
- 1,123 exabytes per day—that’s 1.123 zettabytes (ZB) per day (each zettabyte is about 1000 exabytes).
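The per-second through per-day figures above follow from multiplying 1.7 MB per person per second by roughly 7.7 billion people. A quick sketch of the arithmetic (both inputs are the estimates quoted above):

```python
PB, ZB = 10**15, 10**21  # petabyte and zettabyte, in bytes (decimal)

# 1.7 MB per person per second, times ~7.7 billion people
bytes_per_second = 1.7e6 * 7.7e9

print(bytes_per_second / PB)            # ~13.1 petabytes per second
print(bytes_per_second * 60 / PB)       # ~785 petabytes per minute
print(bytes_per_second * 86_400 / ZB)   # ~1.13 zettabytes per day
```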
That’s the equivalent of over 5.5 million hours (over 600 years) of 4K video every day or approximately 116 billion photos every day!
Additional Big-Data Stats
For an entertaining real-time sense of big data, check out https://www.internetlivestats.com, with various statistics, including the numbers so far today of
- Google searches.
- Tweets.
- Videos viewed on YouTube.
- Photos uploaded on Instagram.
You can click each statistic to drill down for more information. For instance, they say over 250 billion tweets were sent in 2018.
Some other interesting big-data facts:
- Every hour, YouTube users upload 24,000 hours of video, and almost 1 billion hours of video are watched on YouTube every day.
https://www.brandwatch.com/blog/youtubestats/.
- Every second, there are 51,773 GB (51.773 TB) of Internet traffic, 7894 tweets sent, 64,332 Google searches and 72,029 YouTube videos viewed.
http://www.internetlivestats.com/onesecond.
- On Facebook each day there are 800 million “likes,” 60 million emojis are sent, and there are over two billion searches of the more than 2.5 trillion Facebook posts since the site’s inception.
https://mashable.com/2017/07/17/facebookworldemojiday/.
https://techcrunch.com/2016/07/27/facebookwillmakeyoutalk/.
- In June 2017, Will Marshall, CEO of Planet, said the company has 142 satellites that image the whole planet’s land mass once per day. They add one million images and seven TB of new data each day. Together with their partners, they’re using machine learning on that data to improve crop yields, see how many ships are in a given port and track deforestation. With respect to Amazon deforestation, he said: “Used to be we’d wake up after a few years and there’s a big hole in the Amazon. Now we can literally count every tree on the planet every day.”
https://www.bloomberg.com/news/videos/20170630/learning-from-planets-shoe-boxedsize-dsatellitesvideo, June 30, 2017.
Domo, Inc. has a nice infographic called “Data Never Sleeps 6.0” showing how much data is generated every minute, including:
https://www.domo.com/learn/dataneversleeps6.
- 473,400 tweets sent.
- 2,083,333 Snapchat photos shared.
- 97,222 hours of Netflix video viewed.
- 12,986,111 text messages sent.
- 49,380 Instagram posts.
- 176,220 Skype calls.
- 750,000 Spotify songs streamed.
- 3,877,140 Google searches.
- 4,333,560 YouTube videos watched.
Computing Power Over the Years
Data is getting more massive and so is the computing power for processing it. The performance of today’s processors is often measured in terms of FLOPS (floating-point operations per second). In the early to mid-1990s, the fastest supercomputer speeds were measured in gigaflops (10^9 FLOPS). By the late 1990s, Intel produced the first teraflop (10^12 FLOPS) supercomputers. In the early-to-mid 2000s, speeds reached hundreds of teraflops, then in 2008, IBM released the first petaflop (10^15 FLOPS) supercomputer. Currently, the fastest supercomputer—the IBM Summit, located at the Department of Energy’s (DOE) Oak Ridge National Laboratory (ORNL)—is capable of 122.3 petaflops.
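Each FLOPS prefix scales by a factor of 1,000, so Summit’s 122.3 petaflops can be restated in adjacent units; a quick sketch of the conversion:

```python
# FLOPS prefixes, each a factor of 1,000 larger than the last
GIGA, TERA, PETA, EXA = 10**9, 10**12, 10**15, 10**18

summit_flops = 122.3 * PETA         # IBM Summit: 122.3 petaflops

print(f'{summit_flops:.4g} FLOPS')  # about 1.223e+17 FLOPS
print(summit_flops / EXA)           # ~0.12 exaflops -- exascale is ~8x away
```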
Distributed computing can link thousands of personal computers via the Internet to produce even more FLOPS. In late 2016, the Folding@home network—a distributed network in which people volunteer their personal computers’ resources for use in disease research and drug design—was capable of over 100 petaflops. Companies like IBM are now working toward supercomputers capable of exaflops (10^18 FLOPS).
- https://en.wikipedia.org/wiki/Folding@home.
- https://en.wikipedia.org/wiki/FLOPS.
- https://www.ibm.com/blogs/research/2017/06/supercomputingweather-modelexascale/.
Quantum computers now under development theoretically could operate at 18,000,000,000,000,000,000 times the speed of today’s “conventional computers”! This number is so extraordinary that in one second, a quantum computer theoretically could do staggeringly more calculations than the total that have been done by all computers since the world’s first computer appeared. This almost unimaginable computing power could wreak havoc with blockchain-based cryptocurrencies like Bitcoin. Engineers are already rethinking blockchain to prepare for such massive increases in computing power.
- https://medium.com/@n.biedrzycki/onlygod-can-count-that-fast-the-world-ofquantum-computing-406a0a91fcf4.
- https://singularityhub.com/2017/11/05/isquantum-computing-an-existential-threatto-blockchain-technology/.
Supercomputing power has a history of eventually working its way down from research labs, where extraordinary amounts of money have been spent to achieve those performance numbers, into “reasonably priced” commercial computer systems and even desktop computers, laptops, tablets and smartphones.
Computing power’s cost continues to decline, especially with cloud computing. People used to ask the question, “How much computing power do I need on my system to deal with my peak processing needs?” Today, that thinking has shifted to “Can I quickly carve out on the cloud what I need temporarily for my most demanding computing chores?” You pay for only what you use to accomplish a given task.
Processing the World’s Data Requires Lots of Electricity
Data from the world’s Internet-connected devices is exploding, and processing that data requires tremendous amounts of energy. According to a recent article, energy use for processing data in 2015 was growing at 20% per year and consuming approximately three to five percent of the world’s power. The article says that total data-processing power consumption could reach 20% by 2025.
Another enormous electricity consumer is the blockchain-based cryptocurrency Bitcoin. Processing just one Bitcoin transaction uses approximately the same amount of energy as powering the average American home for a week! The energy use comes from the process Bitcoin “miners” use to prove that transaction data is valid.
According to some estimates, a year of Bitcoin transactions consumes more energy than many countries. Together, Bitcoin and Ethereum (another popular blockchain-based platform and cryptocurrency) consume more energy per year than Israel and almost as much as Greece.
Morgan Stanley predicted in 2018 that “the electricity consumption required to create cryptocurrencies this year could actually outpace the firm’s projected global electric vehicle demand—in 2025.” This situation is unsustainable, especially given the huge interest in blockchainbased applications, even beyond the cryptocurrency explosion. The blockchain community is working on fixes.
Big-Data Opportunities
The big-data explosion is likely to continue exponentially for years to come. With 50 billion computing devices on the horizon, we can only imagine how many more there will be over the next few decades. It’s crucial for businesses, governments, the military and even individuals to get a handle on all this data.
It’s interesting that some of the best writings about big data, data science, artificial intelligence and more are coming out of distinguished business organizations, such as J.P. Morgan and McKinsey. Big data’s appeal to big business is undeniable given the rapidly accelerating accomplishments. Many companies are making significant investments and getting valuable results through the technologies discussed in this article, such as big data, machine learning, deep learning and natural-language processing. This is forcing competitors to invest as well, rapidly increasing the need for computing professionals with data-science and computer-science experience. This growth is likely to continue for many years.