Big data – how does it make a difference?

We have established an understanding of traditional systems such as BI: how they work, what they focus on, and where they fall short in handling the different characteristics of data. Let's now talk about big data solutions. Big data solutions focus on combining all the data dimensions that were previously ignored or considered of minimal value, taking all the available sources and types into consideration and analyzing them for different and difficult-to-identify patterns.

Big data solutions are not just about the data itself or its characteristics; they are also about affordability, making it easier for organizations to store all of their data for analysis, in real time if required. You may discover new insights and facts about your suppliers, customers, and competitors, or you may find the root cause of issues and potential risks that your organization faces.

Big data comprises both structured and unstructured datasets. This reduces the reliance on traditional relational database management systems, which are generally not designed to store or analyze unstructured data.

Scaling up a single server is not a solution either, no matter how powerful that server might be; there will always be a hard limit for each resource type. These limits will undoubtedly move upward over time, but data volumes are growing much faster. Most importantly, the cost of such a high-end server and its resources is relatively high. Big data solutions instead rely on clustered computing, which uses commodity hardware rather than high-end servers and can easily be scaled out or in by adding or removing machines. You can start with a few servers and add more as your data grows.
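The contrast between a single server's hard limit and a cluster that grows by adding machines can be illustrated with a little arithmetic. The following is only a minimal sketch: the capacity figures and the function names are hypothetical values chosen for illustration, not numbers taken from any real product.

```python
import math

# Hypothetical figures, chosen only to illustrate the argument.
SINGLE_SERVER_LIMIT_TB = 64  # assumed hard storage limit of one high-end server
NODE_CAPACITY_TB = 8         # assumed storage of one commodity node


def fits_on_single_server(data_tb):
    """Return True if the data still fits under the single server's hard limit."""
    return data_tb <= SINGLE_SERVER_LIMIT_TB


def nodes_needed(data_tb):
    """Return how many commodity nodes are needed to hold the data (at least one)."""
    return max(1, math.ceil(data_tb / NODE_CAPACITY_TB))


for data_tb in (10, 50, 100, 400):
    print(
        f"{data_tb} TB: single server ok = {fits_on_single_server(data_tb)}, "
        f"commodity nodes needed = {nodes_needed(data_tb)}"
    )
```

Once the data passes the assumed single-server limit, the only option on that side is a bigger machine, whereas the cluster side simply needs a larger node count.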

As for the data itself, big data solutions replicate it across multiple servers, commonly known as data nodes, according to the configured replication settings, which makes the data fault tolerant. If a data node fails, the affected task continues to run on a replica server that holds a copy of the same data. The big data solution handles this itself, without additional software development or operational effort. To keep the data consistent, every copy must be updated whenever the data changes.
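The replication and failover behavior described above can be sketched in a few lines. This is a simplified illustration rather than the placement logic of any particular big data product: the round-robin placement, the node and block names, and the functions `place_replicas` and `pick_node` are all assumptions made for the example.

```python
def place_replicas(block_ids, nodes, replication_factor):
    """Assign each block to `replication_factor` distinct nodes, round-robin.

    Real systems use more elaborate placement policies; round-robin is an
    assumption made here to keep the sketch short.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [
            nodes[(i + k) % len(nodes)] for k in range(replication_factor)
        ]
    return placement


def pick_node(block, placement, failed_nodes):
    """Return the first live node holding the block, or None if every replica is down."""
    for node in placement[block]:
        if node not in failed_nodes:
            return node
    return None


# Hypothetical cluster of four data nodes and three data blocks.
nodes = ["node-1", "node-2", "node-3", "node-4"]
placement = place_replicas(
    ["block-a", "block-b", "block-c"], nodes, replication_factor=3
)

# With node-1 down, work on block-a moves to another node holding a copy.
print(pick_node("block-a", placement, failed_nodes={"node-1"}))
```

A task loses access to a block only when every node holding a replica has failed, which is why a higher replication factor tolerates more simultaneous node failures at the cost of more storage.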

Distributed computing relies on commodity hardware with reasonable storage and computation power, which is much less expensive than a dedicated processing server with powerful hardware. This has made big data solutions highly cost-effective and allowed them to evolve in ways that were not feasible just a few years ago.