Frank Kane's Taming Big Data with Apache Spark and Python

Sort and display the results

The code we use for sorting and displaying our results is all just straight-up Python code; there's nothing Spark-specific about it:

sortedResults = collections.OrderedDict(sorted(result.items())) 
for key, value in sortedResults.items(): 
    print("%s %i" % (key, value)) 

All it's doing is using the collections package from Python to create an ordered dictionary that sorts those results based on the key, which is the actual rating itself. Then it iterates through every key/value pair of those results and prints them out one at a time. So the output of this bit of code is going to be this:

1 2 
2 1 
3 2 

Rating 1 occurred twice, rating 2 occurred once, and rating 3 occurred twice; that's all that's going on here. So we're taking those original key/value pairs that were returned by countByValue here:

result = ratings.countByValue() 
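To get a feel for the shape of what countByValue hands back, here is a minimal sketch that simulates it with plain Python (no Spark needed); the sample ratings list is hypothetical, standing in for the RDD's contents:

```python
from collections import Counter

# Hypothetical sample of ratings, standing in for the values in the RDD
ratings = ["3", "1", "2", "1", "3"]

# countByValue on an RDD returns a dict-like mapping of each distinct
# value to the number of times it occurs; Counter gives the same shape
# for a plain Python list
result = dict(Counter(ratings))
print(result)  # {'3': 2, '1': 2, '2': 1}
```

The keys here are the rating strings themselves, which is why sorting on the key sorts the output by rating.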

We sort them using just this boilerplate Python code:

sortedResults = collections.OrderedDict(sorted(result.items())) 
for key, value in sortedResults.items(): 
    print("%s %i" % (key, value)) 
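If you want to try this boilerplate on its own, here is a self-contained version; the result dictionary is hypothetical, shaped like what countByValue would return:

```python
import collections

# Hypothetical result, shaped like the output of countByValue:
# rating -> number of occurrences
result = {"3": 2, "1": 2, "2": 1}

# sorted(result.items()) sorts the (key, value) pairs by key, and
# OrderedDict preserves that sorted order when we iterate
sortedResults = collections.OrderedDict(sorted(result.items()))
for key, value in sortedResults.items():
    print("%s %i" % (key, value))
# 1 2
# 2 1
# 3 2
```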

I realize you may be new to Python, but if you look at this, it's pretty self-explanatory. As you move on to more exercises in the future, you can use many of the examples I'm giving you to piece together what you need to construct your own programs. So keep these snippets of code in your toolbox for future use.

There we have it; it's just that easy! We only needed a handful of lines of code. Let's look at the program as a whole, all in one piece, within Canopy, and run it one more time to reiterate what's going on.