MySQL for Python
上QQ阅读APP看书,第一时间看更新

Project: A command-line search utility

This chapter has been about querying MySQL from Python. As a project to finish it out, we will build a command-line search utility. Rather than ask the user for the search term, we will expect the user to state the term as an argument for the command-line invocation.

Note

With a bit more code for this project, we could create a GUI for this program. GUI programming increases the complexity of the code. How much more complex it gets depends on the library being used, but it is nonetheless unnecessary for what we need to illustrate in this project. Until we are certain that we have a database and can connect to it, it is best to keep it simple.

Now, it is true that we could simply take input and feed it through MySQL for Python as a generic SELECT * statement. The logic for this bare bones implementation has been illustrated previously to a great extent. We can create something a bit more sophisticated.

The following characteristics should apply to our search utility:

  • The user calls it from the command line (that is, shell prompt)
  • The search term is defined at the time of calling
  • If the -t flag is issued, the following term is the table to be used; default is to search all tables
  • If the -f flag is issued, the output is formatted by table
  • If the -o flag is issued, the output is written to the given file

To illustrate, calling the application searchme.py, one should be able to call the application as follows:

./searchme.py -t menu -f -o output.txt query

This should search the table menu from the database fish for the term query, format the output, and write it to a file called output.txt. You may need to nuance this statement depending on your operating system's requirements. For example, Windows users should not include ./ before the program call. For more help with calling Python on Windows, see the Python Windows FAQ at the following URL: http://www.python.org/doc/faq/windows/.

Preparing a database for searching

For this project, however, we will leave behind the fish database for the moment and use the world sample database available from MySQL. For users of Linux, Unix, and Mac OS X, download the database from the following URL:

http://downloads.mysql.com/docs/world.sql.gz

To unpack this archive, simply issue the appropriate unpacking command:

gunzip world.sql.gz

Windows users, or users without the utility gunzip, should use the ZIP file:

http://downloads.mysql.com/docs/world.sql.zip

Then use an archive manager to unpack the ZIP file.

Regardless of your platform, you should then be left with a file world.sql. From the directory in which that file resides, log into MySQL. You first need to create a database world and then import the file.

  1. To create the database, type the following at the prompt:
    CREATE world;
  2. Then tell MySQL to use that database:
    USE world;
  3. Import the file with the following MySQL command:
    SOURCE world.sql;

MySQL will then populate the database world with three tables of data: City, Country, and CountryLanguage.

Planning your work, then working your plan

All of the flags mentioned as characteristics previously are nothing more than added functionality to the core function of searching. Our first order of business should therefore be to create well-abstracted search functionality. Then we can build on it to allow for the functionality of each of the given flags. So our plan of development may be summarized as follows:

  • Develop a well-abstracted search functionality.
  • Implement specification of the search term from the command-line.
  • Implement and incorporate the functionality of each flag in turn: -t, -f, and –o.

Develop a well-abstracted search functionality

Abstraction is the secret of all computer programming. It is what enables computer programs to be run for more than one single task. In the early days of computing, programs were written with very narrow applications. This was usually due to limits in the technology involved. In the modern day, the obtuseness of some languages still keeps them from being applied beyond certain domains (for example, BIOS systems). Even then, languages such as Forth are so difficult to follow that they are largely viewed as write-once-and-forget-about-it. Fortunately, Python offers us significant flexibility, and this flexibility is carried through in MySQL for Python. This allows us to create the infrastructure of a search while allowing us to specify select parts of it.

In this project, we will specify the host, database, and user information for the database connection. The rest of the query information, however, will be open to the user. First, however, we begin our program with a shebang (line):

#!/usr/bin/env python

This is a Linux shebang line that calls whichever Python interpreter is set for general use in the environmental variables of the shell. If we want a specific interpreter, we can naturally edit this and call the other interpreter directly. This format will also work on newer versions of Mac OS. If you are on Windows, you will naturally have to modify this line according to the directory structure of your Windows installation. A common Windows shebang line is:

#!c:/Python/python.exe -u

Ensure that you do not forget the trailing -u flag. This puts Python into an unbuffered mode on Windows systems.

Next, we import MySQL for Python. Until we are ready to add more functionality beyond a hard-coded search query, we should hold off on importing more modules.

import MySQLdb

Now we are ready to create a database connection. You will recall from the first chapter that the format for creating a database connection is:

[variable] = MySQLdb.connect(host="[hostname]", 
                             user="[username]", 
                             passwd="[password]", 
                             db="[database name]")

For the world database, using user user and password p4ssw0rd on the localhost, the invocation for the connection is:

mydb = MySQLdb.connect(host = 'localhost', 
                       user = 'user', 
                       passwd = 'p4ssw0rd', 
                       db = 'world')

We then must create the cursor:

cur = mydb.cursor()

We are then ready to construct a query infrastructure. To ensure as much flexibility as possible, we simply pull all of the variables out of the MySQL SELECT syntax and define them separately.

table =     'City'
column = 'Name'
term = 's%'
statement = """select * from %s where %s like '%s'""" %(table, column, term

This hardwiring of the search query allows us to test the connection before coding the rest of the function. By defining the variables discretely, we make it easier to change them to user-determined variables later.

Note

There may be a tendency here to insert user-determined variables immediately. With experience, it is possible to do this. However, if there are any doubts about the availability of the database, your best fallback position is to keep it simple and hardwired. This reduces the number of variables in making a connection and helps one to blackbox the situation, making troubleshooting much easier.

With the query constructed, we can execute it and get the results:

command = cur.execute(statement)
results = cur.fetchall()

You can then test the connection with the following code:

record_list = []
for record in results:
 record_list.append(record[0])

for i in xrange(0, len(record_list)):
 print "%s. %s" %(i+1, record_list[i])

The logic of this code is discussed previously in this chapter so we will pass over it here as understood.

If you execute the program, the output should scroll off the screen and you should get a list of 431 cities. The last entry should read as follows:

431. Santa Monica

Knowing that we can interact with the database, we can now go to the next step, specifying the search term from the command-line.

Specifying the search term from the command-line

Python allows you to receive a command-line argument using the sys module. If the only argument expected on the command-line is the name of the query, we could get by with code like this:

import sys
...
query = sys.argv[1]

Alas, life is seldom so simple. If we were to follow this route of development, all the flags mentioned previously for this program would be mandatory and have to be submitted every time the program ran. So, for the sample call on page 24, we would have to program for six fixed arguments everytime the program is called. Recall that the command read:

./searchme.py -t menu -f -o output.txt query

The arguments for sys.argv that would be required are:

  • 0 the command itself, naturally
  • 1 the flag -t
  • 2 the table name
  • 3 the -f flag
  • 4 the flag -o
  • 5 the output file name
  • 6 the search string for the query

If we pulled all the flags and left only the arguments in a set order, we would still have three arguments. This makes the program cumbersome and makes calls to it error-prone.

By far, the better way forward is to use the module optparse. Importing optparse incorporates high-level support for processing arguments at runtime. Another alternative is the getopt module, but this is not as robust or as easy to use.

For consistency's sake, let's edit the preamble of our Python program to import the module.

import optparse

The optparse module provides an OptionParser class that handles both option processing and error handling. It also provides a standardized interface similar to other programs (for example, -h for help). All for the meager cost of a module import.

To access all this option parsing goodness, we need to instantiate an object of the OptionParser class.

opt = optparse.OptionParser()

Adding arguments then is simply a matter of adding options using the add_option() method of the object. For our purposes, we need to follow a straightforward syntactic formula:

object.add_option("-[short flag option]", "--[long flag option]", action="store", type="string", dest = "[variable name under which to store the option]"

Note

We will keep to this formula for the sake of simplicity even when other options might seem more logical.

The optparse module is a very robust kit for option parsing, and the full syntax of it is beyond the scope of this book. A fuller discussion can be found in the online help() function or the Python documentation.

So to add an option for the query, we include this:

opt.add_option("-q", "--query", action="store", type="string", dest="term")

After all the options are assigned in the code, we then have to compile them.

opt, args = opt.parse_args()

The parse_args() method returns a tuple. The first of the two values, opt, is an object containing the values of all the options passed to the program. The second value, args, is a list of any remaining arguments.

These last three lines should precede the MySQLdb call to connect to the database. You may then change the assignment of the variable term.

term = opt.term

Whatever value is given in the dest argument of the add_option() method becomes an attribute of the OptionParser object. Therefore, you could also delete the term assignment line and edit the statement value to reflect opt.term. However, this makes the code more difficult to read by someone else or even by yourself six months down the line. Remember the second line of the Zen of Python:

Explicit is better than implicit

With that code written, you should be able to call the program with a -q flag and set the query from the command-line. If the program is called searchme.py, a sample query would be:

./searchme.py -q 'dubai'

The results should be singular:

1. Dubai

Knowing that it is a MySQL database, you could also pass MySQL metacharacters.

./project-cp.py -q 'm%i'

and get multiple hits in return:

1. Mallawi
2. Makati
3. Marawi
4. Malasiqui
5. Mati
6. Madurai
7. Malkajgiri
8. Morvi
9. Miyazaki
10. Maebashi
11. Moriguchi
12. Manzhouli
13. Moroni
14. Mbuji-Mayi
15. Matadi
16. Mexicali
17. Maradi
18. Maiduguri
19. Makurdi
20. Miaoli
21. Moshi
22. Mytiti
23. Miami

Note that your search will complete and return the same regardless of whether you use -q or --query.

Implementing and incorporating the other functions: -t, -f, and -o

Now we can add the other options to the program. Like the -q flag for preceding query, we can use -t to assign values for the table.

opt.add_option("-t", "--table", action="store", type="string", dest="table")

Unless you want to support multiple formats for the output, the format flag should be a Boolean value.

opt.add_option("-f", "--format", action="store_true", dest="format")

In order to store a Boolean variable, we set the default action as either store_true or store_false. The former is for setting the value to True when the flag is present; the converse is true for the latter. Because of the type of action applied to the option, the Boolean type of the variable is understood by Python.

We then assign the value of opt.table to table and opt.format to the variable format.

For formatting, we then set up a conditional clause that runs if the format flag is present. Given that we are formatting the output on one level, we can also revise the code so that the default is raw. The following code snippet should follow from where the value of record[0] is appended to column_list, taking the place of that loop as well as the for loop discussed in the section Planning your work, then working your plan (earlier in this chapter).

column_list = []
for record in results:
        column_list.append(record[0:])

if form is True:
        columns_query = """DESCRIBE %s""" %(table)
        columns_command = cur.execute(columns_query)
        headers = cur.fetchall()
        column_list = []
        for record in headers:
                column_list.append(record[0])

        output=""
        for record in results:
                output = output + "========================\n\n"
                for field_no in xrange(0, len(column_list)):
                        output = output + column_list[field_no]+ ": " + str(record[field_no]) + "\n"
                output = output + "\n"


else:
        output=[]
        for record in xrange(0, len(results)):

                output.append(results[record])
        output = ''.join(output)

Note that the subscript for record in the third line has been broadened to include the rest of the record.

At this stage, you can append an output statement at the end of the program to see its results.

print output

We are not yet at the end for this program as we also need to include facility for writing the output to a file of the user's choice.

Including an option for an output file

At this point, including an option for an output file is simply a matter of inserting the option into the list at the beginning of the program and then testing for it once output is finally saved. To add the option to the list managed by optparse, we simply insert this line after the last option:

opt.add_option("-o", "--output", action="store", type="string", dest="outfile")

Note that the output variable is not a requirement in our program. Therefore, one cannot assign it as automatically as the other variables were assigned. Instead, one must test for it, prefixing the object handle to its attribute. If a value has been assigned, then write the file. Otherwise, default output is STDOUT.

if opt.outfile:
        outfile = opt.outfile
        out = open(outfile, w)
        out.write(output)

else:   
        print output