Big Data Architect’s Handbook

Volume

In earlier years, company data referred only to the data created by employees. Now, as the use of technology grows, it includes not only data created by employees but also data generated by the machines used by companies and their customers. Additionally, with the evolution of social media and other internet resources, people are posting and uploading enormous amounts of content: videos, photos, tweets, and so on. Just imagine: the world's population is about 7 billion, and almost 6 billion people have cell phones. A cell phone itself contains many sensors, such as a gyroscope, that generate data for each event, and this data is now being collected and analyzed.

When we talk about volume in a big data context, we mean an amount of data so massive, relative to the processing system, that it cannot be gathered, stored, and processed using traditional approaches. This includes both data at rest, which has already been collected, and streaming data, which is continuously being generated.

Take the example of Facebook. It has 2 billion active users who continuously use this social networking site to share statuses, photos, and videos, comment on each other's posts, react with likes and dislikes, and perform many other activities. According to statistics provided by Facebook, around 600 TB of data is ingested into Facebook's databases every day. The following graph represents the data volumes of previous years, the current situation, and where they are headed in the future:
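To get a feel for that scale, here is a quick back-of-envelope calculation in Python; the 600 TB/day figure comes from the text, and the conversion is simple arithmetic:

```python
# Back-of-envelope: what a 600 TB/day ingest rate means per second.
TB = 10**12                    # 1 terabyte in bytes (decimal convention)
daily_ingest = 600 * TB        # daily ingest figure quoted above
seconds_per_day = 24 * 60 * 60

rate = daily_ingest / seconds_per_day
print(f"Sustained ingest rate: {rate / 10**9:.1f} GB/s")  # ~6.9 GB/s
```

In other words, Facebook must absorb roughly 7 GB of new data every second, around the clock.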

Past, present and future data growth

Take another example: a jet airplane. One statistic shows that it generates about 10 TB of data for every hour of flight time. With thousands of flights each day, the amount of data generated can reach many petabytes per day.
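As a rough sketch of that estimate, the following calculation uses the 10 TB/hour figure from the text; the flight count and average flight duration are illustrative assumptions, not sourced statistics:

```python
# Rough estimate of daily aviation data, under illustrative assumptions.
TB_PER_FLIGHT_HOUR = 10       # figure quoted above
FLIGHTS_PER_DAY = 10_000      # assumed: "thousands of flights each day"
AVG_FLIGHT_HOURS = 2          # assumed average flight duration

daily_tb = TB_PER_FLIGHT_HOUR * FLIGHTS_PER_DAY * AVG_FLIGHT_HOURS
print(f"~{daily_tb // 1000} PB of flight data per day")  # ~200 PB/day
```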

In the last two years alone, an amount of data equal to 90% of all the data ever created has been generated. The world's data is doubling roughly every 1.2 years, and one survey estimates that 40 zettabytes of data will have been created by 2020.
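A fixed doubling period implies exponential growth. The short sketch below shows the growth factor that a 1.2-year doubling period implies over longer spans; only the doubling period comes from the text:

```python
# Growth implied by a fixed doubling period of 1.2 years.
DOUBLING_YEARS = 1.2  # figure quoted above

def growth_factor(years: float) -> float:
    """Multiplicative growth over a span, given the doubling period."""
    return 2 ** (years / DOUBLING_YEARS)

print(f"Growth over 5 years:  ~{growth_factor(5):.0f}x")   # ~18x
print(f"Growth over 10 years: ~{growth_factor(10):.0f}x")  # ~322x
```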

Not so long ago, generating such a massive amount of data was considered a problem, as the cost of storage was very high. Now that storage costs are decreasing, this is no longer an issue. Moreover, solutions such as Hadoop, along with algorithms that help ingest and process this massive amount of data, turn it into a valuable resource.
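The core idea Hadoop popularized is the MapReduce pattern: split the data, apply a map step to each piece, group the intermediate results by key, and aggregate them in a reduce step. The following toy word count sketches that pattern in plain Python; it illustrates the model only and is not Hadoop's actual API:

```python
from collections import defaultdict

# Toy illustration of the MapReduce pattern: map each record to key/value
# pairs, shuffle (group) by key, then reduce each key's values.
records = ["big data", "big volume", "data at rest"]

# Map: emit (word, 1) for every word in every record.
mapped = [(word, 1) for record in records for word in record.split()]

# Shuffle: group the emitted values by key.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce: sum the counts for each word.
counts = {word: sum(values) for word, values in grouped.items()}
print(counts)  # {'big': 2, 'data': 2, 'volume': 1, 'at': 1, 'rest': 1}
```

Because the map step is independent per record and the reduce step is independent per key, both can be distributed across many machines, which is what lets this approach scale to the volumes discussed above.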

The second characteristic of big data is velocity. Let's find out what this is.