Python Web Scraping Cookbook

Getting ready

We will read a file named unicode.html from our local web server, located at http://localhost:8080/unicode.html. This file is UTF-8 encoded and contains several sets of characters from different parts of the encoding space. The page looks as follows in your browser:

The Page in the Browser

In an editor that supports UTF-8, we can see how the Cyrillic characters are rendered:

The HTML in an Editor

Code for the sample is in 02/06_unicode.py.
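
Before running the recipe, it may help to see what "different parts of the encoding space" means in practice. The following is a minimal sketch and not the book's 02/06_unicode.py: the sample characters, the string "Привет", and the helper utf8_length are assumptions chosen for illustration, since the actual contents of unicode.html are not reproduced here. It shows that UTF-8 uses one to four bytes per character depending on the code point, and that decoding the same bytes with the wrong codec yields garbled text.

```python
# Illustrative characters only; these are assumptions, not the contents of unicode.html.
samples = {
    "ASCII": "A",       # U+0041
    "Cyrillic": "Ж",    # U+0416
    "CJK": "中",         # U+4E2D
    "Emoji": "😀",       # U+1F600
}


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


for name, ch in samples.items():
    print(f"{name}: U+{ord(ch):04X} takes {utf8_length(ch)} byte(s) in UTF-8")

# Round trip: encoding to bytes and decoding with the same codec restores the text.
text = "Привет"
raw = text.encode("utf-8")
restored = raw.decode("utf-8")

# Decoding the same bytes as Latin-1 does not raise an error, but each byte
# becomes its own character, so the Cyrillic text turns into mojibake.
garbled = raw.decode("latin-1")

print(restored)
print(repr(garbled))
```

The same mismatch can occur when a scraper assumes the wrong encoding for a downloaded page, which is why it matters that unicode.html is known to be UTF-8.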