Welcome to PnetCDF Python’s documentation!
Overview
PnetCDF-python is a Python interface to PnetCDF, a high-performance parallel I/O library for accessing netCDF files. The integration allows netCDF data to be manipulated, analyzed, and visualized with Python’s ecosystem of scientific computing libraries, which makes it useful for Python-based applications that require high-performance access to netCDF files.
Install from Source
Software Requirements
PnetCDF C library
Python 3.9 or later
Python libraries: numpy, mpi4py
Python libraries: Cython, setuptools (optional, for building from source)
Building PnetCDF C library
# download PnetCDF C library v1.12.3 (or later)
$ wget https://parallel-netcdf.github.io/Release/pnetcdf-1.12.3.tar.gz
$ tar -xf pnetcdf-1.12.3.tar.gz
$ cd pnetcdf-1.12.3
# configure
$ ./configure --prefix=/path/to/install-dir --enable-shared CC=mpicc
# build and install
$ make
$ make install
Building PnetCDF-python from source
# activate a virtual environment (optional)
# use Python 3.9 or later
$ python3 -m venv env
$ source env/bin/activate
$ pip install --upgrade pip
# install Python libraries
$ pip install numpy Cython setuptools
$ env CC=mpicc pip install --no-cache-dir mpi4py
# download PnetCDF-python source code
$ git clone git@github.com:yzanhua/pnetcdf-python.git
$ cd pnetcdf-python
# install PnetCDF-python
$ env CC=mpicc python3 setup.py build
$ env CC=mpicc python3 setup.py install
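Once the build steps above finish, it can help to confirm that the Python-side requirements can be located. The following is a minimal sketch using only the standard library. The helper name `check_requirements` is hypothetical, and `pncpy` is assumed to be the importable module name because that is the prefix used in the class references on this page.

```python
import importlib.util


def check_requirements(modules=("numpy", "mpi4py", "pncpy")):
    """Return a dict mapping each module name to True if it can be located."""
    # find_spec looks a top-level module up without executing it.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_requirements()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A module reported as missing usually means the virtual environment was not active when `pip install` or `setup.py install` was run.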
File
pncpy.File is a high-level object representing a netCDF file.
It provides a Pythonic interface to create, read, and write
a netCDF file. A File object serves as
the root container for dimensions, variables, and attributes.
Together they describe the meaning of data and relations among
data fields stored in a netCDF file.
- class pncpy.File
Bases:
object
- __init__(self, filename, format='64BIT_OFFSET', mode="w", Comm comm=None, Info info=None, **kwargs)
The constructor for pncpy.File.
- Parameters:
filename (str) – Name of the new file.
mode (str) – Access mode.
  r: Opens a file for reading, error if the file does not exist.
  w: Opens a file for writing, creates the file if it does not exist.
  x: Creates the file, returns an error if the file exists.
  a and r+: append, creates the file if it does not exist.
format (str) – underlying file format. Only relevant if mode is w or x.
  64BIT_OFFSET: NetCDF-2 format.
  64BIT_DATA: NetCDF-5 format.
comm (mpi4py.MPI.Comm or None) – MPI communicator to use for file access. None defaults to MPI_COMM_WORLD.
info (mpi4py.MPI.Info or None) – MPI info object to use for file access. None defaults to MPI_INFO_NULL.
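The rule that format only matters when a file is created can be captured in a few lines. This is an illustrative sketch only: `check_open_args` is a hypothetical helper, not part of pncpy, and it encodes nothing beyond the modes and formats listed in the parameter description above.

```python
VALID_MODES = {"r", "w", "x", "a", "r+"}
VALID_FORMATS = {"64BIT_OFFSET", "64BIT_DATA"}
CREATING_MODES = {"w", "x"}  # the modes for which `format` is described as relevant


def check_open_args(mode="w", format="64BIT_OFFSET"):
    """Validate a (mode, format) pair; return True if `format` would be relevant."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode not in CREATING_MODES:
        return False  # per the description above, format is not relevant here
    if format not in VALID_FORMATS:
        raise ValueError(f"unknown format: {format!r}")
    return True


print(check_open_args("w", "64BIT_DATA"))  # format is relevant
print(check_open_args("r"))                # format is not relevant
```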
- defineDim()
`defineDim(self, dimname, size=-1)` Creates a new dimension with the given dimname and size. size must be a positive integer or -1, which stands for “unlimited” (default is -1). Specifying a size of 0 also results in an unlimited dimension. The return value is the Dimension class instance describing the new dimension. To determine the current maximum size of the dimension, use the len function on the Dimension instance. To determine if a dimension is ‘unlimited’, use the Dimension.isunlimited method of the Dimension instance.
- defineVar(self, varname, nc_dtype, dimensions=(), fill_value=None)
Create a new variable with the given parameters.
- Parameters:
varname (str) – Name of the new variable.
nc_dtype (str or numpy.dtype) –
The datatype of the new variable. Supported string specifiers are:
  S1 or c for NC_CHAR
  i1 or b or B for NC_BYTE
  u1 for NC_UBYTE
  i2 or h or s for NC_SHORT
  u2 for NC_USHORT
  i4 or i or l for NC_INT
  u4 for NC_UINT
  i8 for NC_INT64
  u8 for NC_UINT64
  f4 or f for NC_FLOAT
  f8 or d for NC_DOUBLE
dimensions (tuple of str or pncpy.Dimension instances) – The dimensions of the new variable. An empty tuple means the variable is a scalar.
fill_value – The fill value of the new variable. Accepted values are:
  None: use the default fill value for the given datatype
  False: fill mode is turned off
  any other value: use the given value as fill value
- Returns:
The created variable.
- Return type:
pncpy.Variable
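The string specifiers accepted for nc_dtype can be read as a simple lookup from specifier to netCDF type. The table below is transcribed from the defineVar parameter description. The names `NC_TYPE_BY_SPECIFIER` and `nc_type_name` are hypothetical and exist only for this illustration. They are not pncpy APIs.

```python
# Specifier table transcribed from the nc_dtype parameter description.
NC_TYPE_BY_SPECIFIER = {
    "S1": "NC_CHAR", "c": "NC_CHAR",
    "i1": "NC_BYTE", "b": "NC_BYTE", "B": "NC_BYTE",
    "u1": "NC_UBYTE",
    "i2": "NC_SHORT", "h": "NC_SHORT", "s": "NC_SHORT",
    "u2": "NC_USHORT",
    "i4": "NC_INT", "i": "NC_INT", "l": "NC_INT",
    "u4": "NC_UINT",
    "i8": "NC_INT64",
    "u8": "NC_UINT64",
    "f4": "NC_FLOAT", "f": "NC_FLOAT",
    "f8": "NC_DOUBLE", "d": "NC_DOUBLE",
}


def nc_type_name(specifier):
    """Return the netCDF type name for a string specifier, or raise ValueError."""
    try:
        return NC_TYPE_BY_SPECIFIER[specifier]
    except KeyError:
        raise ValueError(f"unsupported specifier: {specifier!r}") from None


print(nc_type_name("i4"))
print(nc_type_name("d"))
```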
- delncattr()
`delncattr(self,name,value)`
delete a netCDF file attribute. Use if you need to delete a netCDF attribute with the same name as one of the reserved python attributes.
- filepath()
`filepath(self,encoding=None)`
Get the file system path (or the opendap URL) which was used to open/create the Dataset. Requires netcdf >= 4.1.2. The path is decoded into a string using sys.getfilesystemencoding() by default; this can be changed using the encoding kwarg.
- getncattr()
`getncattr(self,name)`
retrieve a netCDF dataset or group attribute. Use if you need to get a netCDF attribute with the same name as one of the reserved python attributes.
The optional kwarg encoding can be used to specify the character encoding of a string attribute (default is utf-8).
- ncattrs()
`ncattrs(self)`
return netCDF attribute names for this File in a list.
- renameAttribute()
`renameAttribute(self, oldname, newname)`
rename a File attribute named oldname to newname.
- setncattr()
`setncattr(self,name,value)`
set a netCDF file attribute using name,value pair. Use if you need to set a netCDF attribute with the same name as one of the reserved python attributes.
- sync()
`sync(self)`
Writes all buffered data in the File to the disk file.
Attribute
In the library, netCDF attributes can be created, accessed, and manipulated
using python dictionary-like syntax. A Pythonic interface for metadata operations
is provided both in the File class (for global attributes) and the Variable
class (for variable attributes).
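The setncattr, getncattr, and ncattrs methods exist because a netCDF attribute may share a name with a Python-side attribute of the object. The sketch below shows that separation with a stand-in class. `AttributeHolder` is hypothetical and is not pncpy code. It only mimics the method names documented on this page to show why the two namespaces need to be kept apart.

```python
class AttributeHolder:
    """Hypothetical stand-in for an object that carries netCDF attributes."""

    def __init__(self, name):
        self.name = name     # Python-side attribute (a reserved name)
        self._ncattrs = {}   # netCDF attributes are kept in a separate mapping

    def setncattr(self, name, value):
        self._ncattrs[name] = value

    def getncattr(self, name):
        return self._ncattrs[name]

    def ncattrs(self):
        return list(self._ncattrs)


var = AttributeHolder("temperature")
var.setncattr("units", "K")
# A netCDF attribute called "name" does not overwrite the Python attribute.
var.setncattr("name", "air temperature")

print(var.name)
print(var.getncattr("name"))
print(var.ncattrs())
```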
Dimension
Dimension defines the shape and structure of variables and stores
coordinate data for multidimensional arrays. The Dimension object,
which is also a key component of the File class, provides an interface
to create, access, and manipulate dimensions.
- class pncpy.Dimension
Bases:
object
- __init__(*args, **kwargs)
- getfile()
`getfile(self)`
return the file that this Dimension is a member of.
- isunlimited()
`isunlimited(self)`
returns True if the Dimension instance is unlimited, False otherwise.
- name
string name of Dimension instance
- size
current size of Dimension (calls len on Dimension instance)
Variable
Variable is a core component of a netCDF file representing an array
of data values organized along one or more dimensions, with associated
metadata in the form of attributes. The Variable object in the library
provides operations to read and write the data and metadata of a variable
within a netCDF file. Particularly, data mode operations have a flexible
interface, where reads and writes can be done through either explicit
function-call style methods or indexer-style (numpy-like) syntax.
- class pncpy.Variable
Bases:
object
A PnetCDF Variable is used to read and write netCDF data. It is analogous to a numpy array object. See Variable.__init__ for more details.
A list of attribute names corresponding to netCDF attributes defined for the variable can be obtained with the Variable.ncattrs method. These attributes can be created by assigning to an attribute of the Variable instance. A dictionary containing all the netCDF attribute name/value pairs is provided by the __dict__ attribute of a Variable instance.
The following class variables are read-only:
`dimensions`: A tuple containing the names of the dimensions associated with this variable.
`dtype`: A numpy dtype object describing the variable’s data type.
`ndim`: The number of variable dimensions.
`shape`: A tuple with the current shape (length of all dimensions).
`scale`: If True, scale_factor and add_offset are applied, and signed integer data is automatically converted to unsigned integer data if the _Unsigned attribute is set. Default is True, can be reset using Variable.set_auto_scale and Variable.set_auto_maskandscale methods.
`mask`: If True, data is automatically converted to/from masked arrays when missing values or fill values are present. Default is True, can be reset using Variable.set_auto_mask and Variable.set_auto_maskandscale methods. Only relevant for Variables with primitive or enum types (ignored for compound and vlen Variables).
`chartostring`: If True, data is automatically converted to/from character arrays to string arrays when the _Encoding variable attribute is set. Default is True, can be reset using Variable.set_auto_chartostring method.
`least_significant_digit`: Describes the power of ten of the smallest decimal place in the data that contains a reliable value. Data is truncated to this decimal place when it is assigned to the Variable instance. If None, the data is not truncated.
`__orthogonal_indexing__`: Always True. Indicates to client code that the object supports ‘orthogonal indexing’, which means that slices that are 1d arrays or lists slice along each dimension independently. This behavior is similar to Fortran or Matlab, but different from numpy.
`datatype`: numpy data type (for primitive data types) or VLType/CompoundType instance (for compound or vlen data types).
`name`: String name.
`size`: The number of stored elements.
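The difference between orthogonal indexing and numpy-style pairing of index lists is easiest to see on a small grid. The sketch below uses plain Python lists rather than pncpy or numpy, so it only illustrates the two selection rules. The variable names are chosen for this example.

```python
# A 3x4 grid whose entries encode their position: value = 10 * row + column.
grid = [[10 * r + c for c in range(4)] for r in range(3)]
rows, cols = [0, 2], [1, 3]

# Orthogonal indexing: every selected row is combined with every selected column.
orthogonal = [[grid[r][c] for c in cols] for r in rows]

# Pointwise pairing (how numpy treats two index lists): only (0, 1) and (2, 3).
pointwise = [grid[r][c] for r, c in zip(rows, cols)]

print(orthogonal)  # [[1, 3], [21, 23]]
print(pointwise)   # [1, 23]
```

With orthogonal indexing the result keeps one axis per index list, which is why a 2x2 block comes back instead of two scattered elements.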
- __init__()
`__init__(self, file, name, datatype, dimensions=(), endian='native', least_significant_digit=None, significant_digits=None, fill_value=None, **kwargs)`
Variable constructor.
`file`: File instance to associate with the variable.
`name`: Name of the variable.
`datatype`: Variable data type. Can be specified by providing a numpy dtype object, or a string that describes a numpy dtype object. Supported values, corresponding to str attribute of numpy dtype objects, include ‘f4’ (32-bit floating point), ‘f8’ (64-bit floating point), ‘i4’ (32-bit signed integer), ‘i2’ (16-bit signed integer), ‘i8’ (64-bit signed integer), ‘i1’ (8-bit signed integer), ‘u1’ (8-bit unsigned integer), ‘u2’ (16-bit unsigned integer), ‘u4’ (32-bit unsigned integer), ‘u8’ (64-bit unsigned integer), or ‘S1’ (single-character string).
`dimensions`: a tuple containing the variable’s Dimension instances (defined previously with defineDim). Default is an empty tuple which means the variable is a scalar (and therefore has no dimensions).
`least_significant_digit`: If this or significant_digits are specified, variable data will be truncated (quantized). In conjunction with compression=’zlib’ this produces ‘lossy’, but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Default is None, or no quantization.
`significant_digits`: New in version 1.6.0. As described for least_significant_digit except the number of significant digits retained is prescribed independent of the floating point exponent. Default None - no quantization done.
`fill_value`: If specified, the default netCDF _FillValue (the value that the variable gets filled with before any data is written to it) is replaced with this value. If fill_value is set to False, then the variable is not pre-filled.
*Note*: Variable instances should be created using the File.defineVar method of a File instance, not using this class directly.
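The quantization rule quoted for least_significant_digit, around(scale*data)/scale with scale = 2**bits, can be worked through by hand. The sketch below is an illustration under one stated assumption: bits is taken as the smallest integer for which 2**bits is at least 10**least_significant_digit, which reproduces the bits=4 case mentioned above. The function name `quantize` is hypothetical.

```python
import math


def quantize(value, least_significant_digit):
    """Quantize `value` so that about 10**-least_significant_digit precision is kept."""
    # Assumption: smallest bits such that 2**bits >= 10**least_significant_digit.
    bits = math.ceil(math.log2(10 ** least_significant_digit))
    scale = 2 ** bits
    return round(scale * value) / scale, bits


quantized, bits = quantize(3.14159, least_significant_digit=1)
print(bits)       # 4
print(quantized)  # 3.125
```

Here scale is 16, so 3.14159 becomes round(50.26544)/16 = 50/16 = 3.125, which is within 0.1 of the original value.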
- assignValue()
`assignValue(self, val)`
assign a value to a scalar variable. Provided for compatibility with Scientific.IO.NetCDF, can also be done by assigning to an Ellipsis slice ([...]).
- datatype
numpy data type
- delncattr()
`delncattr(self,name,value)`
delete a netCDF variable attribute. Use if you need to delete a netCDF attribute with the same name as one of the reserved python attributes.
- dimensions
get variable’s dimension names
- getValue()
`getValue(self)`
get the value of a scalar variable. Provided for compatibility with Scientific.IO.NetCDF, can also be done by slicing with an Ellipsis ([...]).
- get_dims()
`get_dims(self)`
return a tuple of Dimension instances associated with this Variable.
- getncattr()
`getncattr(self,name)`
retrieve a netCDF variable attribute. Use if you need to get a netCDF attribute with the same name as one of the reserved python attributes.
The optional kwarg encoding can be used to specify the character encoding of a string attribute (default is utf-8).
- name
string name of Variable instance
- ncattrs()
`ncattrs(self)`
return netCDF attribute names for this Variable in a list.
- renameAttribute()
`renameAttribute(self, oldname, newname)`
rename a Variable attribute named oldname to newname.
- set_auto_chartostring()
`set_auto_chartostring(self,chartostring)`
turn on or off automatic conversion of character variable data to and from numpy fixed length string arrays when the _Encoding variable attribute is set.
If chartostring is set to True, when data is read from a character variable (dtype = S1) that has an _Encoding attribute, it is converted to a numpy fixed length unicode string array (dtype = UN, where N is the length of the rightmost dimension of the variable). The value of _Encoding is the unicode encoding that is used to decode the bytes into strings.
When numpy string data is written to a variable it is converted back to individual bytes, with the number of bytes in each string equalling the rightmost dimension of the variable.
The default value of chartostring is True (automatic conversions are performed).
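The character-to-string conversion described for set_auto_chartostring amounts to joining the single bytes along the rightmost dimension and decoding them, and the reverse on write. The sketch below shows this with plain Python bytes rather than pncpy or numpy. The function names are hypothetical, and padding short strings with NUL bytes is an assumption that mirrors how numpy pads fixed-length byte strings.

```python
def chars_to_strings(char_rows, encoding="utf-8"):
    """Join each row of single bytes and decode it, dropping trailing NUL padding."""
    return [b"".join(row).rstrip(b"\x00").decode(encoding) for row in char_rows]


def strings_to_chars(strings, width, encoding="utf-8"):
    """Split each string into exactly `width` single bytes, NUL-padded on the right."""
    rows = []
    for text in strings:
        raw = text.encode(encoding)
        if len(raw) > width:
            raise ValueError(f"{text!r} does not fit in {width} bytes")
        padded = raw.ljust(width, b"\x00")
        rows.append([padded[i:i + 1] for i in range(width)])
    return rows


rows = strings_to_chars(["abc", "de"], width=3)
print(rows)
print(chars_to_strings(rows))
```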
- set_auto_mask()
`set_auto_mask(self,mask)`
turn on or off automatic conversion of variable data to and from masked arrays.
If mask is set to True, when data is read from a variable it is converted to a masked array if any of the values are exactly equal to either the netCDF _FillValue or the value specified by the missing_value variable attribute. The fill_value of the masked array is set to the missing_value attribute (if it exists), otherwise to the netCDF _FillValue attribute (which has a default value for each data type). If the variable has valid_min/valid_max and missing_value attributes, data outside the specified range will be masked. When data is written to a variable, the masked array is converted back to a regular numpy array by replacing all the masked values with the missing_value attribute of the variable (if it exists). If the variable has no missing_value attribute, the _FillValue is used instead.
The default value of mask is True (automatic conversions are performed).
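The read-side and write-side conversions described for set_auto_mask can be reproduced with numpy's masked-array module. This is a minimal sketch that uses only numpy, not pncpy. It assumes a hypothetical _FillValue of -999.0 and a variable with no missing_value attribute.

```python
import numpy as np

fill_value = -999.0  # hypothetical _FillValue for this illustration
raw = np.array([1.5, -999.0, 3.0, -999.0])

# Read side: values exactly equal to the fill value become masked.
masked = np.ma.masked_equal(raw, fill_value)

# Write side: masked entries are replaced by the fill value again.
restored = masked.filled(fill_value)

print(masked.mask.tolist())  # [False, True, False, True]
print(masked.count())        # 2
print(restored.tolist())     # [1.5, -999.0, 3.0, -999.0]
```

Reductions on the masked array, such as `masked.mean()`, skip the masked entries, which is the main practical benefit of leaving the conversion on.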
- set_auto_scale()
`set_auto_scale(self,scale)`
turn on or off automatic packing/unpacking of variable data using scale_factor and add_offset attributes. Also turns on and off automatic conversion of signed integer data to unsigned integer data if the variable has an _Unsigned attribute.
If scale is set to True, and the variable has a scale_factor or an add_offset attribute, then data read from that variable is unpacked using:
data = self.scale_factor*data + self.add_offset
When data is written to a variable it is packed using:
data = (data - self.add_offset)/self.scale_factor
If scale_factor is present, but add_offset is missing, add_offset is assumed to be zero. If add_offset is present, but scale_factor is missing, scale_factor is assumed to be one. For more information on how scale_factor and add_offset can be used to provide simple compression, see the [PSL metadata conventions](http://www.esrl.noaa.gov/psl/data/gridded/conventions/cdc_netcdf_standard.shtml).
In addition, if scale is set to True, and if the variable has an attribute _Unsigned set, and the variable has a signed integer data type, a view to the data is returned with the corresponding unsigned integer datatype. This convention is used by the netcdf-java library to save unsigned integer data in NETCDF3 or NETCDF4_CLASSIC files (since the NETCDF3 data model does not have unsigned integer data types).
The default value of scale is True (automatic conversions are performed).
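The packing and unpacking formulas given for set_auto_scale, and the _Unsigned reinterpretation, can be checked with a few numbers. The sketch below uses only the standard library. The attribute values 0.5 and 10.0 are hypothetical, chosen so that the arithmetic is exact.

```python
import struct

scale_factor = 0.5  # hypothetical attribute values
add_offset = 10.0


def unpack(stored):
    """Apply data = scale_factor * data + add_offset to each stored value."""
    return [scale_factor * v + add_offset for v in stored]


def pack(data):
    """Apply data = (data - add_offset) / scale_factor to each value."""
    return [(v - add_offset) / scale_factor for v in data]


stored = [0, 2, 5]
unpacked = unpack(stored)
repacked = pack(unpacked)
print(unpacked)  # [10.0, 11.0, 12.5]
print(repacked)  # [0.0, 2.0, 5.0]

# _Unsigned: the same byte read as signed (-1) or unsigned (255).
signed_byte = -1
unsigned_byte = struct.unpack("B", struct.pack("b", signed_byte))[0]
print(unsigned_byte)  # 255
```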
- setncattr()
`setncattr(self,name,value)`
set a netCDF variable attribute using name,value pair. Use if you need to set a netCDF attribute with the same name as one of the reserved python attributes.
- setncatts()
`setncatts(self,attdict)`
set a bunch of netCDF variable attributes at once using a python dictionary. This may be faster when setting a lot of attributes for a NETCDF3 formatted file, since nc_redef/nc_enddef is not called in between setting each attribute.
- shape
find current sizes of all variable dimensions
- size
Return the number of stored elements.
Compatibility with C
The following table lists the PnetCDF-C APIs that PnetCDF-python already implements and those still to be implemented.
Component | Implemented | To be implemented (priority [1]) |
---|---|---|
File API | ncmpi_strerror, ncmpi_strerrno, ncmpi_create, ncmpi_open/close, ncmpi_enddef/redef, ncmpi_sync, ncmpi_begin/end_indep_data, ncmpi_inq_path, ncmpi_inq, ncmpi_wait, ncmpi_wait_all | ncmpi_inq_libvers (3), ncmpi_set_fill (3), ncmpi_set_default_format (3), ncmpi_inq_put/get_size (3), ncmpi_delete (2), ncmpi_sync_numrecs (2), ncmpi_inq_file_info (3), ncmpi_inq_files_opened (3) |
Dimension API | ncmpi_def_dim, ncmpi_inq_ndims, ncmpi_inq_dimlen, ncmpi_inq_dim, ncmpi_inq_dimname | |
Attribute API | ncmpi_put/get_att_text, ncmpi_put/get_att, ncmpi_inq_att, ncmpi_inq_natts, ncmpi_inq_attname, ncmpi_rename_att, ncmpi_del_att | |
Variable API | ncmpi_def_var, ncmpi_def_var_fill, ncmpi_inq_varndims, ncmpi_inq_varname, ncmpi_put/get_vara, ncmpi_put/get_vars, ncmpi_put/get_var1, ncmpi_put/get_var, ncmpi_put/get_varn, ncmpi_put/get_varm, ncmpi_put/get_vara_all, ncmpi_put/get_vars_all, ncmpi_put/get_var1_all, ncmpi_put/get_var_all, ncmpi_put/get_varn_all, ncmpi_put/get_varm_all, ncmpi_iput/iget_var, ncmpi_iput/iget_vara | ncmpi_iput/iget_var1 (1), ncmpi_iput/iget_vars (1), ncmpi_iput/iget_varm (1), ncmpi_iput/iget_varn (2), ncmpi_bput/bget_var (1), ncmpi_bput/bget_var1 (1), ncmpi_bput/bget_vara (1), ncmpi_bput/bget_vars (1), ncmpi_bput/bget_varm (1), ncmpi_bput/bget_varn (2), ncmpi_wait/wait_all (1), ncmpi_inq_nreqs (1), ncmpi_inq_buffer_usage/size (1), ncmpi_cancel (1), ncmpi_fill_var_rec (2) |