Java Data Analysis

Relational database tables

In a relational database, we think of each dataset as a table, with each data point being a row in the table. The dataset's signature defines the columns of the table.

Here is an example of a relational database table. It has four rows and five columns, representing a dataset of four data points with five fields:

Because a database table is really a set of rows, the order of the rows is irrelevant, just as the order of the data points in any dataset is irrelevant. For the same reason, a database table may not contain duplicate rows and a dataset may not contain duplicate data points.
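The set semantics described above can be sketched in Java. This is a minimal illustration, not the book's own code: the class name RowSetDemo, the record PersonRow, and the helper tableOf are hypothetical, and the row values are made up. Only the field names follow the example table. It assumes Java 16 or later, because it uses records.

```java
import java.util.HashSet;
import java.util.Set;

final class RowSetDemo {

    // Hypothetical row type; a record gets value-based equals() and hashCode(),
    // so two rows with the same field values count as the same row.
    record PersonRow(long id, String lastName, String firstName, String sex, int age) {}

    // Builds a table as a set of rows; a duplicate row is simply not added again.
    static Set<PersonRow> tableOf(PersonRow... rows) {
        Set<PersonRow> table = new HashSet<>();
        for (PersonRow row : rows) {
            table.add(row); // add() returns false if an equal row is already present
        }
        return table;
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        PersonRow a = new PersonRow(1L, "Doe", "Ann", "F", 30);
        PersonRow b = new PersonRow(2L, "Roe", "Bob", "M", 41);

        Set<PersonRow> t1 = tableOf(a, b);
        Set<PersonRow> t2 = tableOf(b, a, a); // different order, one duplicate row

        System.out.println(t1.equals(t2)); // true: same rows, so the same table
        System.out.println(t2.size());     // 2: the duplicate was not stored
    }
}
```

A HashSet is used here precisely because it ignores insertion order and rejects equal elements, which mirrors the two properties stated in the text.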

Key fields

A dataset may specify that all values of a designated field be unique. Such a field is called a key field for the dataset. In the preceding example, the ID number field could serve as a key field.

When specifying a key field for a dataset (or a database table), the designer must anticipate what kind of data the dataset might hold. For example, the First name field in the preceding table would be a bad choice for a key field, because many people have the same first name.

Key values are used for searching. For example, if we want to know how old Paul Jones is, we would first find the data point (that is, row) whose ID is 602588410 and then check the age value for that point.
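One way to picture a key field in Java is a map from key value to row. The following is a sketch under stated assumptions: the class KeyFieldDemo, the record Person, and the methods insert and ageOf are hypothetical names, and the sex and age values are placeholders. Only the ID 602588410 and the name Paul Jones come from the text. It assumes Java 16 or later for records.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class KeyFieldDemo {

    // Hypothetical row type with the five fields of the example table.
    record Person(long id, String lastName, String firstName, String sex, int age) {}

    // Index from key value (ID) to the full row.
    private final Map<Long, Person> byId = new HashMap<>();

    // Inserts a row; returns false and leaves the table unchanged
    // if another row already uses the same ID.
    boolean insert(Person p) {
        return byId.putIfAbsent(p.id(), p) == null;
    }

    // Finds the row with the given ID and returns its age, if such a row exists.
    Optional<Integer> ageOf(long id) {
        return Optional.ofNullable(byId.get(id)).map(Person::age);
    }

    public static void main(String[] args) {
        KeyFieldDemo table = new KeyFieldDemo();
        // ID and name are from the text; "M" and 40 are placeholder values.
        table.insert(new Person(602588410L, "Jones", "Paul", "M", 40));

        System.out.println(table.ageOf(602588410L)); // Optional[40]
        // A second row with the same ID is rejected, whatever its other fields are.
        System.out.println(table.insert(new Person(602588410L, "Smith", "Paul", "M", 25))); // false
    }
}
```

Note that two people sharing the first name Paul cause no conflict here; only a repeated ID does, which is why ID is the better choice of key.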

A dataset may designate a subset of fields, instead of a single field, as its key. For example, in a dataset of geographical data, we might designate the Longitude and Latitude fields together as forming the key.
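A composite key can be sketched as a small value type that bundles the key fields. The names CompositeKeyDemo, GeoKey, and register below are hypothetical, and the coordinates are made up. Java 16 or later is assumed for records.

```java
import java.util.HashSet;
import java.util.Set;

final class CompositeKeyDemo {

    // The two fields together form the key; neither has to be unique on its own.
    // Record equality compares both components, using exact double comparison.
    record GeoKey(double longitude, double latitude) {}

    private final Set<GeoKey> keys = new HashSet<>();

    // Returns true if the key is new, false if a data point
    // with the same longitude AND latitude is already registered.
    boolean register(double longitude, double latitude) {
        return keys.add(new GeoKey(longitude, latitude));
    }

    public static void main(String[] args) {
        CompositeKeyDemo dataset = new CompositeKeyDemo();
        System.out.println(dataset.register(10.0, 20.0)); // true
        System.out.println(dataset.register(10.0, 30.0)); // true: same longitude, different latitude
        System.out.println(dataset.register(10.0, 20.0)); // false: the pair is already used
    }
}
```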

Key-value pairs

A dataset that has specified a subset of fields (or a single field) as its key is often thought of as a set of key-value pairs (KVPs). From this point of view, each data point has two parts: its key and its value for that key. (The term attribute-value pairs is also used.)

In the preceding example, the key is ID and the value is the combination of the remaining fields: Last name, First name, Sex, and Age.

In the previously mentioned geographical dataset, the key could be the pair Longitude, Latitude and the value could be Altitude, Average Temperature, and Average Precipitation.

We sometimes think of a key-value pair dataset as an input-output structure, with the key fields as input and the value fields as output. For example, in the geographical dataset, given a longitude and a latitude, the dataset returns the corresponding altitude, average temperature, and average precipitation.
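The input-output view can be sketched as a map whose keys and values are separate types. This is an illustration under assumptions, not a definitive design: GeoLookupDemo, GeoKey, GeoValue, put, and lookup are hypothetical names, the numbers are made up, and the text does not specify units. Java 16 or later is assumed for records.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class GeoLookupDemo {

    // Input side: the key fields.
    record GeoKey(double longitude, double latitude) {}

    // Output side: the value fields (units are not specified in the text).
    record GeoValue(double altitude, double averageTemperature, double averagePrecipitation) {}

    private final Map<GeoKey, GeoValue> data = new HashMap<>();

    // Stores one key-value pair, replacing any earlier value for the same key.
    void put(GeoKey key, GeoValue value) {
        data.put(key, value);
    }

    // Given a longitude and a latitude, returns the corresponding value fields,
    // or an empty Optional if the dataset has no data point for that key.
    Optional<GeoValue> lookup(double longitude, double latitude) {
        return Optional.ofNullable(data.get(new GeoKey(longitude, latitude)));
    }

    public static void main(String[] args) {
        GeoLookupDemo dataset = new GeoLookupDemo();
        // Made-up numbers, for illustration only.
        dataset.put(new GeoKey(10.0, 20.0), new GeoValue(100.0, 15.0, 800.0));

        System.out.println(dataset.lookup(10.0, 20.0).isPresent()); // true
        System.out.println(dataset.lookup(10.0, 21.0).isPresent()); // false
    }
}
```

Because the lookup relies on exact equality of the double components, a real dataset would likely need to round or quantize coordinates before using them as keys.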