Python Data Analysis (Second Edition)

NumPy numerical types

Python has an integer type, a float type, and a complex type; however, these are not sufficient for scientific calculations. In practice, we need more data types with varying precisions and, consequently, different storage sizes. For this reason, NumPy has many more data types. Most NumPy numerical type names end with a number, which gives the number of bits the type occupies. The following table (adapted from the NumPy user guide) presents an overview of NumPy numerical types:
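As a rough illustration of how the number in a type name relates to storage size and range, the following sketch inspects a handful of types. The selection of types is arbitrary and chosen only for the example:

```python
import numpy as np

# The trailing number in a type name is its width in bits;
# itemsize reports the same width in bytes.
for scalar_type in (np.int8, np.int16, np.int32, np.int64,
                    np.float16, np.float32, np.float64, np.complex128):
    dt = np.dtype(scalar_type)
    print(f"{dt.name:>10}: {dt.itemsize} byte(s) = {dt.itemsize * 8} bits")

# iinfo and finfo expose the representable range and precision.
int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
float32_info = np.finfo(np.float32)

print(int8_info.min, int8_info.max)      # -128 127
print(uint16_info.min, uint16_info.max)  # 0 65535
print(float32_info.bits)                 # 32
```

A smaller type saves memory at the cost of range or precision, which is why NumPy offers several widths of each kind.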

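For each data type, there exists a matching conversion function. Older releases also offered np.bool and np.float as aliases of the Python built-ins; np.float was removed in NumPy 1.24, so the examples here use np.bool_ and np.float64:

In: np.float64(42)
Out: 42.0
In: np.int8(42.0)
Out: 42
In: np.bool_(42)
Out: True
In: np.bool_(0)
Out: False
In: np.bool_(42.0)
Out: True
In: np.float64(True)
Out: 1.0
In: np.float64(False)
Out: 0.0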

Many functions have a data type argument, which is frequently optional:

In: np.arange(7, dtype= np.uint16) 
Out: array([0, 1, 2, 3, 4, 5, 6], dtype=uint16)
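The dtype argument is not specific to arange. The sketch below shows a few other creation functions that accept it, plus astype for converting an existing array. The variable names are made up for the example:

```python
import numpy as np

# Creation functions accept an explicit dtype.
zeros = np.zeros(3, dtype=np.int8)
from_list = np.array([1, 2, 3], dtype=np.float32)
spaced = np.linspace(0, 1, 5, dtype=np.float32)

# Without dtype, NumPy infers a type from the input values.
inferred = np.array([1.5, 2.5])

# astype returns a converted copy; the original array is unchanged.
as_uint16 = from_list.astype(np.uint16)

for name, arr in [("zeros", zeros), ("from_list", from_list),
                  ("spaced", spaced), ("inferred", inferred),
                  ("as_uint16", as_uint16)]:
    print(name, arr.dtype)
```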

Note that a complex number cannot be converted into an integer. Attempting to do so raises a TypeError. The old np.int name was an alias of Python's built-in int and was removed in NumPy 1.24, so the built-in is called directly here; the exact wording of the message depends on the Python version:

In: int(42.0 + 1.j)
Traceback (most recent call last):
<ipython-input-24-5c1cd108488d> in <module>()
----> 1 int(42.0 + 1.j)
TypeError: can't convert complex to int

The same goes for the conversion of a complex number into a floating point number. The j suffix marks the imaginary part of a complex number. In the other direction, you can convert a floating point number to a complex number; for example, complex(1.0). The real and imaginary parts of a complex number can be extracted with the real() and imag() functions, respectively.
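The conversions just described can be sketched as follows. The example uses a built-in Python complex value, and the variable names are illustrative only:

```python
import numpy as np

z = 42.0 + 1.0j  # a built-in Python complex number

# Function form and attribute form give the same parts.
real_part = np.real(z)
imag_part = np.imag(z)
same_real = z.real

# Float to complex always works; the imaginary part becomes zero.
promoted = complex(1.0)

# Complex to float is refused, so pick the part you want explicitly.
try:
    float(z)
    conversion_failed = False
except TypeError:
    conversion_failed = True
as_float = float(z.real)

# The same attributes work element-wise on arrays.
values = np.array([1 + 2j, 3 - 4j])
print(values.real, values.imag)
```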

Data type objects

Data type objects are instances of the numpy.dtype class. Every array has a data type; to be exact, all elements of a NumPy array share the same data type. The data type object can tell you the size of each element in bytes through the itemsize attribute of the dtype class (here, a is assumed to be an array of 64-bit values created earlier):

In: a.dtype.itemsize 
Out: 8 
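Since a is not defined in this section, here is a self-contained sketch with a hypothetical array of 64-bit integers. It also shows how itemsize relates to the total memory used by the array data:

```python
import numpy as np

# Hypothetical array; any 64-bit dtype gives itemsize 8.
a = np.arange(5, dtype=np.int64)

print(a.dtype.itemsize)          # 8 bytes per element
print(a.nbytes)                  # 40 bytes in total
print(a.size * a.dtype.itemsize) # same total, computed by hand

# Converting to a narrower type shrinks both numbers.
small = a.astype(np.int16)
print(small.dtype.itemsize, small.nbytes)  # 2 10
```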

Character codes

Character codes are included for backward compatibility with Numeric, the predecessor of NumPy. Their use is not recommended, but the codes are listed here because they appear in many places. You should use dtype objects instead. The following table lists several data types and the character codes related to them:

Take a look at the following code to produce an array of single precision floats:

In: np.arange(7, dtype='f')
Out: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.], dtype=float32) 

Likewise, the following code creates an array of complex numbers:

In: np.arange(7, dtype='D')
Out: array([ 0.+0.j,  1.+0.j,  2.+0.j,  3.+0.j,  4.+0.j,  5.+0.j,  6.+0.j])
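To make the recommendation concrete, the sketch below builds the same two arrays once with character codes and once with explicit NumPy types, then checks that the resulting data types match. The variable names are made up for the example:

```python
import numpy as np

# Character codes: terse, but the reader has to know the table.
singles = np.arange(7, dtype='f')
complexes = np.arange(7, dtype='D')

# Explicit types: the precision is visible in the name.
singles_explicit = np.arange(7, dtype=np.float32)
complexes_explicit = np.arange(7, dtype=np.complex128)

print(singles.dtype == singles_explicit.dtype)      # True
print(complexes.dtype == complexes_explicit.dtype)  # True

# The character code can still be recovered from a dtype object.
print(np.dtype(np.float32).char)     # f
print(np.dtype(np.complex128).char)  # D
```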

The dtype constructors

We have a variety of means to create data types. Take the case of floating point data (have a look at dtypeconstructors.py in this book's code bundle):

  • We can use the general Python float, as shown in the following lines of code:
            In: np.dtype(float) 
            Out: dtype('float64') 
    
  • We can specify a single precision float with a character code:
            In: np.dtype('f') 
            Out: dtype('float32') 
    
  • We can use a double precision float with a character code:
            In: np.dtype('d') 
            Out: dtype('float64') 
    
  • We can pass the dtype constructor a two-character code. The first character stands for the type; the second is a number giving the size of the type in bytes (2, 4, and 8 correspond to floats of 16, 32, and 64 bits, respectively):
            In: np.dtype('f8') 
            Out: dtype('float64') 
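The constructor forms listed above can be compared directly, because dtype objects support equality tests. This is a minimal sketch; the variable names are illustrative:

```python
import numpy as np

from_python_type = np.dtype(float)       # general Python float
from_char_code = np.dtype('d')           # double precision character code
from_sized_code = np.dtype('f8')         # type character plus size in bytes
from_name = np.dtype('float64')          # full type name
from_scalar_type = np.dtype(np.float64)  # NumPy scalar type

# All five spellings describe the same 64-bit float.
all_equal = (from_python_type == from_char_code == from_sized_code
             == from_name == from_scalar_type)
print(all_equal)  # True

# Changing the byte count changes the precision.
half = np.dtype('f2')
single = np.dtype('f4')
print(half.name, single.name)  # float16 float32
```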
    

    A (truncated) list of all the data type codes can be obtained by calling sctypeDict.keys(); the exact entries depend on the NumPy version and platform:

    In: np.sctypeDict.keys() 
    Out: dict_keys(['?', 0, 'byte', 'b', 1, 'ubyte', 'B', 2, 'short', 'h', 3, 'ushort', 'H', 4, 'i', 5, 'uint', 'I', 6, 'intp', 'p', 7, 'uintp', 'P', 8, 'long', 'l', 'L', 'longlong', 'q', 9, 'ulonglong', 'Q', 10, 'half', 'e', 23, 'f', 11, 'double', 'd', 12, 'longdouble', 'g', 13, 'cfloat', 'F', 14, 'cdouble', 'D', 15, 'clongdouble', 'G', 16, 'O', 17, 'S', 18, 'unicode', 'U', 19, 'void', 'V', 20, 'M', 21, 'm', 22, 'bool8', 'Bool', 'b1', 'float16', 'Float16', 'f2', 'float32', 'Float32', 'f4', 'float64', 'Float64', 'f8', 'float128', 'Float128', 'f16', 'complex64', 'Complex32', 'c8', 'complex128', 'Complex64', 'c16', 'complex256', 'Complex128', 'c32', 'object0', 'Object0', 'bytes0', 'Bytes0', 'str0', 'Str0', 'void0', 'Void0', 'datetime64', 'Datetime64', 'M8', 'timedelta64', 'Timedelta64', 'm8', 'int64', 'uint64', 'Int64', 'UInt64', 'i8', 'u8', 'int32', 'uint32', 'Int32', 'UInt32', 'i4', 'u4', 'int16', 'uint16', 'Int16', 'UInt16', 'i2', 'u2', 'int8', 'uint8', 'Int8', 'UInt8', 'i1', 'u1', 'complex_', 'int0', 'uint0', 'single', 'csingle', 'singlecomplex', 'float_', 'intc', 'uintc', 'int_', 'longfloat', 'clongfloat', 'longcomplex', 'bool_', 'unicode_', 'object_', 'bytes_', 'str_', 'string_', 'int', 'float', 'complex', 'bool', 'object', 'str', 'bytes', 'a']) 
    

The dtype attributes

The dtype class has a number of useful properties. For instance, we can get information about the character code of a data type through the properties of dtype:

In: t = np.dtype('float64')
In: t.char 
Out: 'd' 

The type attribute gives the scalar type of the array elements:

In: t.type 
Out: numpy.float64 
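The attributes shown so far can be collected in one place. The sketch below also reads kind, name, and itemsize, which the text above does not use at this point, and stores everything in a hypothetical summary dictionary:

```python
import numpy as np

t = np.dtype('float64')

summary = {
    'char': t.char,          # character code
    'kind': t.kind,          # general kind: 'f' means floating point
    'name': t.name,          # full type name
    'itemsize': t.itemsize,  # bytes per element
    'type': t.type,          # scalar type of the elements
}
for key, value in summary.items():
    print(key, value)

# t.type is a scalar type, so calling it builds a value of that type.
value = t.type(42)
print(type(value) is np.float64)  # True
```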

The str attribute of dtype gives a string representation of a data type. It begins with a character representing endianness, if appropriate, then a character code, followed by a number corresponding to the number of bytes that each array item needs. Endianness, here, refers to the way bytes are ordered inside a 32- or 64-bit word. In big-endian order, the most significant byte is stored first, indicated by >. In little-endian order, the least significant byte is stored first, indicated by <, as shown in the following lines of code:

In: t.str 
Out: '<f8'
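The output above comes from a little-endian machine. As a sketch of what the two byte orders mean in practice, the following code requests each order explicitly and looks at the raw bytes of the integer 1; the variable names are made up for the example:

```python
import sys
import numpy as np

big = np.dtype('>f8')     # explicitly big-endian 64-bit float
little = np.dtype('<f8')  # explicitly little-endian 64-bit float
print(big.str, little.str)  # >f8 <f8

# The integer 1 stored in four bytes, in each byte order.
big_bytes = np.array([1], dtype='>i4').tobytes()
little_bytes = np.array([1], dtype='<i4').tobytes()
print(big_bytes)     # b'\x00\x00\x00\x01'  (most significant byte first)
print(little_bytes)  # b'\x01\x00\x00\x00'  (least significant byte first)

# Without a prefix, NumPy uses the byte order of the current machine.
native = np.dtype('f8')
expected_prefix = '<' if sys.byteorder == 'little' else '>'
print(native.str == expected_prefix + 'f8')  # True
```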