Welcome to PnetCDF Python’s documentation!

Overview

PnetCDF-python is a Python interface to PnetCDF, a high-performance parallel I/O library for accessing netCDF files. This integration with Python allows for easy manipulation, analysis, and visualization of netCDF data using the rich ecosystem of Python’s scientific computing libraries, making it a valuable tool for Python-based applications that require high-performance access to netCDF files.

Install from Source

Software Requirements

  • PnetCDF C library

  • Python 3.9 or later

  • Python libraries: numpy, mpi4py

  • Python libraries: Cython, setuptools (optional, for building from source)

Building PnetCDF C library

# download PnetCDF C library v1.12.3 (or later)
$ wget https://parallel-netcdf.github.io/Release/pnetcdf-1.12.3.tar.gz

$ tar -xf pnetcdf-1.12.3.tar.gz
$ cd pnetcdf-1.12.3

# configure
$ ./configure --prefix=/path/to/install-dir --enable-shared CC=mpicc

# build and install
$ make
$ make install

Building PnetCDF-python from source

# activate a virtual environment (optional)
# use Python 3.9 or later
$ python3 -m venv env
$ source env/bin/activate
$ pip install --upgrade pip

# install Python libraries
$ pip install numpy Cython setuptools
$ env CC=mpicc pip install --no-cache-dir mpi4py

# download PnetCDF-python source code
$ git clone git@github.com:yzanhua/pnetcdf-python.git
$ cd pnetcdf-python

# install PnetCDF-python
$ env CC=mpicc python3 setup.py build
$ env CC=mpicc python3 setup.py install

File

pncpy.File is a high-level object representing a netCDF file, which provides a Pythonic interface to create, read and write within a netCDF file. A File object serves as the root container for dimensions, variables, and attributes. Together they describe the meaning of data and relations among data fields stored in a netCDF file.

class pncpy.File

Bases: object

__init__(self, filename, format='64BIT_OFFSET', mode="w", Comm comm=None, Info info=None, **kwargs)

The constructor for pncpy.File.

Parameters:
  • filename (str) – Name of the new file.

  • mode (str) –

    Access mode.

    • r: Opens a file for reading, error if the file does not exist.

    • w: Opens a file for writing, creates the file if it does not exist.

    • x: Creates the file, returns an error if the file exists.

    • a and r+: append, creates the file if it does not exist.

  • format (str) –

    Underlying file format. Only relevant if mode is w or x.

    • 64BIT_OFFSET: CDF-2 (64-bit offset) format.

    • 64BIT_DATA: CDF-5 (64-bit data) format.

  • comm (mpi4py.MPI.Comm or None) – MPI communicator to use for file access. None defaults to MPI_COMM_WORLD.

  • info (mpi4py.MPI.Info or None) – MPI info object to use for file access. None defaults to MPI_INFO_NULL.

defineDim()

`defineDim(self, dimname, size=-1)` Creates a new dimension with the given dimname and size. size must be a positive integer or -1, which stands for “unlimited” (default is -1). Specifying a size of 0 also results in an unlimited dimension. The return value is the Dimension class instance describing the new dimension. To determine the current maximum size of the dimension, use the len function on the Dimension instance. To determine if a dimension is ‘unlimited’, use the Dimension.isunlimited method of the Dimension instance.

defineVar(self, varname, nc_dtype, dimensions=(), fill_value=None)

Create a new variable with the given parameters.

Parameters:
  • varname (str) – Name of the new variable.

  • nc_dtype (str or numpy.dtype) –

    The datatype of the new variable. Supported string specifiers are:

    • S1 or c for NC_CHAR

    • i1 or b or B for NC_BYTE

    • u1 for NC_UBYTE

    • i2 or h or s for NC_SHORT

    • u2 for NC_USHORT

    • i4 or i or l for NC_INT

    • u4 for NC_UINT

    • i8 for NC_INT64

    • u8 for NC_UINT64

    • f4 or f for NC_FLOAT

    • f8 or d for NC_DOUBLE

  • dimensions (tuple of str or pncpy.Dimension instances) – The dimensions of the new variable. Empty tuple suggests a scalar.

  • fill_value

    The fill value of the new variable. Accepted values are:

    • None: use the default fill value for the given datatype

    • False: fill mode is turned off

    • any other value: use the given value as fill value

Returns:

The created variable.

Return type:

pncpy.Variable
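The define-mode calls documented above can be combined as in the following minimal sketch. It uses only the signatures given in this section (`File`, `defineDim`, `defineVar`, `setncattr`). The dimension, variable and file names are hypothetical, and the `close()` call in `main` is an assumption because it is not documented here.

```python
def define_schema(nc_file, n_cols=4):
    """Define two dimensions and one variable on an open, writable file.

    nc_file is expected to behave like pncpy.File; only the documented
    defineDim, defineVar and setncattr methods are used.
    """
    # size=-1 requests an unlimited dimension, as described for defineDim.
    time_dim = nc_file.defineDim("time", -1)
    col_dim = nc_file.defineDim("col", n_cols)

    # "f4" maps to NC_FLOAT; fill_value=None keeps the default fill value.
    temp = nc_file.defineVar("temperature", "f4", ("time", "col"),
                             fill_value=None)

    # A global (file-level) attribute.
    nc_file.setncattr("title", "example dataset")
    return time_dim, col_dim, temp


def main(path="example.nc"):
    # Imported lazily so define_schema can be reused without MPI installed.
    from mpi4py import MPI
    import pncpy

    f = pncpy.File(path, format="64BIT_DATA", mode="w",
                   comm=MPI.COMM_WORLD, info=None)
    define_schema(f)
    f.close()  # assumption: close() is not documented in this section
```

Calling `main()` under an MPI launcher (for example `mpiexec -n 4 python3 script.py`) would create the file collectively on all ranks of the communicator.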

delncattr()

`delncattr(self,name,value)`

delete a netCDF file attribute. Use if you need to delete a netCDF attribute with the same name as one of the reserved python attributes.

filepath()

`filepath(self,encoding=None)`

Get the file system path (or the opendap URL) which was used to open/create the Dataset. Requires netcdf >= 4.1.2. The path is decoded into a string using sys.getfilesystemencoding() by default, this can be changed using the encoding kwarg.

getncattr()

`getncattr(self,name)`

retrieve a netCDF dataset or group attribute. Use if you need to get a netCDF attribute with the same name as one of the reserved python attributes.

optional kwarg encoding can be used to specify the character encoding of a string attribute (default is utf-8).

ncattrs()

`ncattrs(self)`

return netCDF attribute names for this File in a list.

renameAttribute()

`renameAttribute(self, oldname, newname)`

rename a File attribute named oldname to newname.

setncattr()

`setncattr(self,name,value)`

set a netCDF file attribute using name,value pair. Use if you need to set a netCDF attribute with the same name as one of the reserved python attributes.
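As an illustration of the attribute methods, the helper below copies attributes from one object to another using only `ncattrs`, `getncattr` and `setncattr` as documented in this section. The helper name `copy_attributes` and its `skip` parameter are hypothetical and not part of the library.

```python
def copy_attributes(src, dst, skip=()):
    """Copy netCDF attributes from src to dst and return the copied names.

    src and dst may be any objects exposing the documented ncattrs,
    getncattr and setncattr methods (a File or a Variable).
    """
    copied = []
    for name in src.ncattrs():
        if name in skip:
            continue
        # getncattr/setncattr also work when the attribute name collides
        # with a reserved Python attribute such as "name" or "shape".
        dst.setncattr(name, src.getncattr(name))
        copied.append(name)
    return copied
```

Going through the explicit methods rather than attribute assignment avoids surprises for names that the Python objects already use.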

sync()

`sync(self)`

Writes all buffered data in the File to the disk file.

Attribute

In the library, netCDF attributes can be created, accessed, and manipulated using python dictionary-like syntax. A Pythonic interface for metadata operations is provided both in the File class (for global attributes) and the Variable class (for variable attributes).

Dimension

Dimension defines the shape and structure of variables and stores coordinate data for multidimensional arrays. The Dimension object, which is also a key component of File class, provides an interface to create, access and manipulate dimensions.

class pncpy.Dimension

Bases: object

__init__(*args, **kwargs)

getfile()

`getfile(self)`

return the file that this Dimension is a member of.

isunlimited()

`isunlimited(self)`

returns True if the Dimension instance is unlimited, False otherwise.

name

string name of Dimension instance

size

current size of Dimension (calls len on Dimension instance)

Variable

Variable is a core component of a netCDF file representing an array of data values organized along one or more dimensions, with associated metadata in the form of attributes. The Variable object in the library provides operations to read and write the data and metadata of a variable within a netCDF file. Particularly, data mode operations have a flexible interface, where reads and writes can be done through either explicit function-call style methods or indexer-style (numpy-like) syntax.

class pncpy.Variable

Bases: object

A PnetCDF Variable is used to read and write netCDF data. They are analogous to numpy array objects. See Variable.__init__ for more details.

A list of attribute names corresponding to netCDF attributes defined for the variable can be obtained with the Variable.ncattrs method. These attributes can be created by assigning to an attribute of the Variable instance. A dictionary containing all the netCDF attribute name/value pairs is provided by the __dict__ attribute of a Variable instance.

The following class variables are read-only:

`dimensions`: A tuple containing the names of the dimensions associated with this variable.

`dtype`: A numpy dtype object describing the variable’s data type.

`ndim`: The number of variable dimensions.

`shape`: A tuple with the current shape (length of all dimensions).

`scale`: If True, scale_factor and add_offset are applied, and signed integer data is automatically converted to unsigned integer data if the _Unsigned attribute is set. Default is True, can be reset using Variable.set_auto_scale and Variable.set_auto_maskandscale methods.

`mask`: If True, data is automatically converted to/from masked arrays when missing values or fill values are present. Default is True, can be reset using Variable.set_auto_mask and Variable.set_auto_maskandscale methods. Only relevant for Variables with primitive or enum types (ignored for compound and vlen Variables).

`chartostring`: If True, data is automatically converted to/from character arrays to string arrays when the _Encoding variable attribute is set. Default is True, can be reset using Variable.set_auto_chartostring method.

`least_significant_digit`: Describes the power of ten of the smallest decimal place in the data that contains a reliable value. Data is truncated to this decimal place when it is assigned to the Variable instance. If None, the data is not truncated.

`__orthogonal_indexing__`: Always True. Indicates to client code that the object supports ‘orthogonal indexing’, which means that slices that are 1d arrays or lists slice along each dimension independently. This behavior is similar to Fortran or Matlab, but different from numpy.

`datatype`: numpy data type (for primitive data types) or VLType/CompoundType instance (for compound or vlen data types).

`name`: String name.

`size`: The number of stored elements.
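The difference between orthogonal indexing and numpy's default fancy indexing, mentioned for `__orthogonal_indexing__` above, can be seen on a plain numpy array. This sketch does not touch a netCDF file; `np.ix_` is used only to reproduce the orthogonal behaviour in numpy.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
rows, cols = [0, 2], [1, 3]

# numpy fancy indexing pairs the index lists element by element,
# selecting a[0, 1] and a[2, 3].
paired = a[rows, cols]

# Orthogonal indexing applies each list to its own dimension,
# selecting the 2x2 block of rows {0, 2} and columns {1, 3}.
orthogonal = a[np.ix_(rows, cols)]

print(paired)      # shape (2,)
print(orthogonal)  # shape (2, 2)
```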

__init__()

`__init__(self, file, name, datatype, dimensions=(), endian='native', least_significant_digit=None, significant_digits=None, fill_value=None, **kwargs)`

Variable constructor.

`file`: File instance to associate with variable.

`name`: Name of the variable.

`datatype`: Variable data type. Can be specified by providing a numpy dtype object, or a string that describes a numpy dtype object. Supported values, corresponding to str attribute of numpy dtype objects, include ‘f4’ (32-bit floating point), ‘f8’ (64-bit floating point), ‘i4’ (32-bit signed integer), ‘i2’ (16-bit signed integer), ‘i8’ (64-bit signed integer), ‘i1’ (8-bit signed integer), ‘u1’ (8-bit unsigned integer), ‘u2’ (16-bit unsigned integer), ‘u4’ (32-bit unsigned integer), ‘u8’ (64-bit unsigned integer), or ‘S1’ (single-character string).

`dimensions`: a tuple containing the variable’s Dimension instances (defined previously with defineDim). Default is an empty tuple which means the variable is a scalar (and therefore has no dimensions).

`least_significant_digit`: If this or significant_digits are specified, variable data will be truncated (quantized). In conjunction with compression=’zlib’ this produces ‘lossy’, but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Default is None, or no quantization.

`significant_digits`: New in version 1.6.0. As described for least_significant_digit except the number of significant digits retained is prescribed independent of the floating point exponent. Default None - no quantization done.

`fill_value`: If specified, the default netCDF _FillValue (the value that the variable gets filled with before any data is written to it) is replaced with this value. If fill_value is set to False, then the variable is not pre-filled.

*Note*: Variable instances should be created using the File.defineVar method of a File instance, not using this class directly.
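The quantization formula given for `least_significant_digit` can be tried out with numpy alone. The sketch below is an illustration of that formula, not the library's implementation; the rule used to derive `bits` is an assumption chosen so that it yields bits=4 for a digit of 1, as stated above.

```python
import math

import numpy as np


def quantize(data, least_significant_digit):
    """Round data so a precision of 10**-least_significant_digit is kept."""
    # Smallest power of two at least as fine as the requested decimal
    # precision (assumed rule; gives bits=4 for least_significant_digit=1).
    bits = math.ceil(math.log2(10.0 ** least_significant_digit))
    scale = 2.0 ** bits
    return np.around(scale * np.asarray(data, dtype="f8")) / scale


values = np.array([0.1234, 3.14159, -2.71828])
quantized = quantize(values, 1)  # every value becomes a multiple of 1/16
```

Because the quantized values are multiples of a power of two, their trailing mantissa bits are zero, which is what makes subsequent compression more effective.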

assignValue()

`assignValue(self, val)`

assign a value to a scalar variable. Provided for compatibility with Scientific.IO.NetCDF, can also be done by assigning to an Ellipsis slice ([...]).

datatype

numpy data type

delncattr()

`delncattr(self,name,value)`

delete a netCDF variable attribute. Use if you need to delete a netCDF attribute with the same name as one of the reserved python attributes.

dimensions

get variable’s dimension names

getValue()

`getValue(self)`

get the value of a scalar variable. Provided for compatibility with Scientific.IO.NetCDF, can also be done by slicing with an Ellipsis ([...]).

get_dims()

`get_dims(self)`

return a tuple of Dimension instances associated with this Variable.

getncattr()

`getncattr(self,name)`

retrieve a netCDF variable attribute. Use if you need to get a netCDF attribute with the same name as one of the reserved python attributes.

optional kwarg encoding can be used to specify the character encoding of a string attribute (default is utf-8).

name

string name of Variable instance

ncattrs()

`ncattrs(self)`

return netCDF attribute names for this Variable in a list.

renameAttribute()

`renameAttribute(self, oldname, newname)`

rename a Variable attribute named oldname to newname.

set_auto_chartostring()

`set_auto_chartostring(self,chartostring)`

turn on or off automatic conversion of character variable data to and from numpy fixed length string arrays when the _Encoding variable attribute is set.

If chartostring is set to True, when data is read from a character variable (dtype = S1) that has an _Encoding attribute, it is converted to a numpy fixed length unicode string array (dtype = UN, where N is the length of the rightmost dimension of the variable). The value of _Encoding is the unicode encoding that is used to decode the bytes into strings.

When numpy string data is written to a variable it is converted back to individual bytes, with the number of bytes in each string equalling the rightmost dimension of the variable.

The default value of chartostring is True (automatic conversions are performed).
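The conversion between a character array and fixed-length strings can be sketched with numpy alone. This is an illustration of the idea rather than the library's code; the sample data and the utf-8 encoding are assumptions standing in for a variable's contents and its _Encoding attribute.

```python
import numpy as np

# A character variable holding 2 strings of up to 3 bytes each (dtype S1).
# Unused trailing positions hold the empty byte string (a NUL byte).
chars = np.array([[b"a", b"b", b"c"],
                  [b"d", b"e", b""]], dtype="S1")
n = chars.shape[-1]

# Read direction: join the rightmost dimension into S3 items, then decode.
joined = chars.view(f"S{n}")[..., 0]
strings = np.char.decode(joined, "utf-8")

# Write direction: encode, pad to n bytes, and split into single bytes.
encoded = np.char.encode(strings, "utf-8").astype(f"S{n}")
chars_again = encoded.view("S1").reshape(encoded.shape + (n,))
```

The round trip preserves the array exactly because each string is padded back to the length of the rightmost dimension before it is split.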

set_auto_mask()

`set_auto_mask(self,mask)`

turn on or off automatic conversion of variable data to and from masked arrays.

If mask is set to True, when data is read from a variable it is converted to a masked array if any of the values are exactly equal to either the netCDF _FillValue or the value specified by the missing_value variable attribute. The fill_value of the masked array is set to the missing_value attribute (if it exists), otherwise the netCDF _FillValue attribute (which has a default value for each data type). If the variable has no missing_value attribute, the _FillValue is used instead. If the variable has valid_min/valid_max and missing_value attributes, data outside the specified range will be masked. When data is written to a variable, the masked array is converted back to a regular numpy array by replacing all the masked values by the missing_value attribute of the variable (if it exists). If the variable has no missing_value attribute, the _FillValue is used instead.

The default value of mask is True (automatic conversions are performed).
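The masking behaviour described above corresponds to standard numpy masked-array operations. The following sketch reproduces both directions on an in-memory array; the fill value of -999.0 is a hypothetical stand-in for a variable's missing_value or _FillValue.

```python
import numpy as np

fill_value = -999.0  # hypothetical missing_value / _FillValue
raw = np.array([1.0, fill_value, 3.0])

# Read direction: values exactly equal to the fill value are masked.
masked = np.ma.masked_equal(raw, fill_value)

# Masked entries are ignored by reductions such as mean().
mean_of_valid = masked.mean()

# Write direction: masked entries are replaced by the fill value again.
restored = masked.filled(fill_value)
```

Turning automatic masking off would instead hand back the raw array, leaving the fill values in place for the caller to handle.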

set_auto_scale()

`set_auto_scale(self,scale)`

turn on or off automatic packing/unpacking of variable data using scale_factor and add_offset attributes. Also turns on and off automatic conversion of signed integer data to unsigned integer data if the variable has an _Unsigned attribute.

If scale is set to True, and the variable has a scale_factor or an add_offset attribute, then data read from that variable is unpacked using:

data = self.scale_factor*data + self.add_offset

When data is written to a variable it is packed using:

data = (data - self.add_offset)/self.scale_factor

If scale_factor is present, but add_offset is missing, add_offset is assumed zero. If add_offset is present, but scale_factor is missing, scale_factor is assumed to be one. For more information on how scale_factor and add_offset can be used to provide simple compression, see the [PSL metadata conventions](http://www.esrl.noaa.gov/psl/data/gridded/conventions/cdc_netcdf_standard.shtml).

In addition, if scale is set to True, and if the variable has an attribute _Unsigned set, and the variable has a signed integer data type, a view to the data is returned with the corresponding unsigned integer datatype. This convention is used by the netcdf-java library to save unsigned integer data in NETCDF3 or NETCDF4_CLASSIC files (since the NETCDF3 data model does not have unsigned integer data types).

The default value of scale is True (automatic conversions are performed).
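The packing and unpacking formulas, and the _Unsigned reinterpretation, can be checked with numpy on an in-memory array. In this sketch the attribute values are hypothetical, and rounding to the nearest integer before the cast is an assumption, since the formulas above only give the affine transform.

```python
import numpy as np

scale_factor = 0.01  # hypothetical attribute values
add_offset = 20.0


def pack(data, dtype="i2"):
    # Apply (data - add_offset) / scale_factor, then round (assumed) and
    # cast to the integer storage type.
    scaled = (np.asarray(data) - add_offset) / scale_factor
    return np.rint(scaled).astype(dtype)


def unpack(packed):
    # Apply scale_factor * data + add_offset.
    return scale_factor * packed + add_offset


temps = np.array([20.0, 21.5, 18.25])
packed = pack(temps)       # 16-bit integers
unpacked = unpack(packed)  # back to floating point

# _Unsigned convention: reinterpret signed bytes as unsigned, without a copy.
signed = np.array([-1, 0, 127], dtype="i1")
unsigned = signed.view("u1")
```

Packing is lossy in general: values are recovered only to within half of scale_factor, so the attributes should be chosen to match the precision of the data.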

setncattr()

`setncattr(self,name,value)`

set a netCDF variable attribute using name,value pair. Use if you need to set a netCDF attribute with the same name as one of the reserved python attributes.

setncatts()

`setncatts(self,attdict)`

set a bunch of netCDF variable attributes at once using a python dictionary. This may be faster when setting a lot of attributes for a NETCDF3 formatted file, since nc_redef/nc_enddef is not called in between setting each attribute.

shape

find current sizes of all variable dimensions

size

Return the number of stored elements.

Compatibility with C

The following table lists PnetCDF-python’s compatibility with the PnetCDF-C APIs.

| Component | Implemented | To be implemented (priority [1]) |
|---|---|---|
| File API | ncmpi_strerror, ncmpi_strerrno, ncmpi_create, ncmpi_open/close, ncmpi_enddef/redef, ncmpi_sync, ncmpi_begin/end_indep_data, ncmpi_inq_path, ncmpi_inq, ncmpi_wait, ncmpi_wait_all | ncmpi_inq_libvers (3), ncmpi_set_fill (3), ncmpi_set_default_format (3), ncmpi_inq_put/get_size (3), ncmpi_delete (2), ncmpi_sync_numrecs (2), ncmpi_inq_file_info (3), ncmpi_inq_files_opened (3) |
| Dimension API | ncmpi_def_dim, ncmpi_inq_ndims, ncmpi_inq_dimlen, ncmpi_inq_dim, ncmpi_inq_dimname | |
| Attribute API | ncmpi_put/get_att_text, ncmpi_put/get_att, ncmpi_inq_att, ncmpi_inq_natts, ncmpi_inq_attname, ncmpi_rename_att, ncmpi_del_att | |
| Variable API | ncmpi_def_var, ncmpi_def_var_fill, ncmpi_inq_varndims, ncmpi_inq_varname, ncmpi_put/get_vara, ncmpi_put/get_vars, ncmpi_put/get_var1, ncmpi_put/get_var, ncmpi_put/get_varn, ncmpi_put/get_varm, ncmpi_put/get_vara_all, ncmpi_put/get_vars_all, ncmpi_put/get_var1_all, ncmpi_put/get_var_all, ncmpi_put/get_varn_all, ncmpi_put/get_varm_all, ncmpi_iput/iget_var, ncmpi_iput/iget_vara | ncmpi_iput/iget_var1 (1), ncmpi_iput/iget_vars (1), ncmpi_iput/iget_varm (1), ncmpi_iput/iget_varn (2), ncmpi_bput/bget_var (1), ncmpi_bput/bget_var1 (1), ncmpi_bput/bget_vara (1), ncmpi_bput/bget_vars (1), ncmpi_bput/bget_varm (1), ncmpi_bput/bget_varn (2), ncmpi_wait/wait_all (1), ncmpi_inq_nreqs (1), ncmpi_inq_buffer_usage/size (1), ncmpi_cancel (1), ncmpi_fill_var_rec (2) |