data:image/s3,"s3://crabby-images/4812d/4812df4f4c989d958c17905d4b50787a57ba3370" alt="Mastering Apache Spark 2.x(Second Edition)"
上QQ阅读APP看书,第一时间看更新
Processing the text files
Using SparkContext, it is possible to load a text file in RDD using the textFile method. Additionally, the wholeTextFile method can read the contents of a directory to RDD. The following examples show you how a file, based on the local filesystem (file://) or HDFS (hdfs://), can be read to a Spark RDD. These examples show you that the data will be divided into six partitions for increased performance. The first two examples are the same as they both load a file from the Linux filesystem, whereas the last one resides in HDFS:
sc.textFile("/data/spark/tweets.txt",6)
sc.textFile("file:///data/spark/tweets.txt",6)
sc.textFile("hdfs://server1:4014/data/spark/tweets.txt",6)