
RDDs, DataFrames, and Datasets
So how does Spark store and partition data during its computational processing? By default, Spark holds data in memory, which is a key reason it is such a fast processing engine. As of Spark 2.0, there are three APIs for representing data: resilient distributed datasets (RDDs), DataFrames, and Datasets.