Dataset overview

A Dataset is a collection of related variables. For example, when you load a file into PyGeode, a single Dataset is created containing all of the variables in that file.

Here’s an example of a Dataset, with a single variable:

>>> from pygeode.tutorial import t1
>>> print(t1)
<Dataset>:
Vars:
  Temp (lat,lon)  (31,60)
Axes:
  lat <Lat>      :  90 S to 90 N (31 values)
  lon <Lon>      :  0 E to 354 E (60 values)
Global Attributes:
  history        : Synthetic Temperature data generated by pygeode

This particular Dataset has a single variable inside (named Temp), defined over latitude and longitude. We can reference a particular Var object using a dot (.), treating it as a member of the Dataset:

>>> x = t1.Temp
>>> print(x)
<Var 'Temp'>:
  Units: K  Shape:  (lat,lon)  (31,60)
  Axes:
    lat <Lat>      :  90 S to 90 N (31 values)
    lon <Lon>      :  0 E to 354 E (60 values)
  Attributes:
    {}
  Type:  Add_Var (dtype="float64")

More examples can be found in tut_datasets.

class pygeode.Dataset

Attributes

Dataset.vars

A list of variables contained by the Dataset.

Dataset.vardict

A dictionary of variables contained by the Dataset, indexed by name.

Dataset.axes

A list of axes contained by the Dataset.

Dataset.atts

A dictionary of metadata associated with the dataset.

Methods

Dataset.__init__(vars[, atts, print_warnings])

Create a new Dataset from a list of variables.

Dataset.__getitem__(key)

Gets a variable or axis object.

Dataset.__call__(**kwargs)

Subsets all variables in this dataset.

Dataset.add(*vars)

Adds variables to the dataset.

Dataset.copy()

Creates a new copy of this dataset.

Dataset.load()

Loads data from all variables in the dataset.

Dataset.map(f, *args, **kwargs)

Calls a given function on every variable in the dataset.

Dataset.remove(*varnames)

Removes variables from the dataset.

Dataset.rename_vars([vardict])

Rename variables in dataset.

Dataset.replace_vars([vardict])

Replaces variables in the dataset.

Variable operations

Dataset.smooth(*args, **kwargs)

Returns new dataset calling Var.smooth on each variable.

Dataset.deriv(*args, **kwargs)

Returns new dataset calling Var.deriv on each variable.

Dataset.diff(*args, **kwargs)

Returns new dataset calling Var.diff on each variable.

Dataset.integrate(*args, **kwargs)

Returns new dataset calling Var.integrate on each variable.

Dataset.composite(*args, **kwargs)

Returns new dataset calling Var.composite on each variable.

Dataset.flatten(*args, **kwargs)

Returns new dataset calling Var.flatten on each variable.

Dataset.fft_smooth(*args, **kwargs)

Returns new dataset calling Var.fft_smooth on each variable.

Dataset.lag(*args, **kwargs)

Returns new dataset calling Var.lag on each variable.

Dataset.interpolate(*args, **kwargs)

Returns new dataset calling Var.interpolate on each variable.

Dataset.squeeze(*args, **kwargs)

Returns new dataset calling Var.squeeze on each variable.

Dataset.extend(*args, **kwargs)

Returns new dataset calling Var.extend on each variable.

Dataset.transpose(*args, **kwargs)

Returns new dataset calling Var.transpose on each variable.

Dataset.sorted(*args, **kwargs)

Returns new dataset calling Var.sorted on each variable.

Dataset.replace_axes(*args, **kwargs)

Returns new dataset calling Var.replace_axes on each variable.

Dataset.rename(*args, **kwargs)

Returns new dataset calling Var.rename on each variable.

Dataset.rename_axes(*args, **kwargs)

Returns new dataset calling Var.rename_axes on each variable.

Dataset.fill(*args, **kwargs)

Returns new dataset calling Var.fill on each variable.

Dataset.unfill(*args, **kwargs)

Returns new dataset calling Var.unfill on each variable.

Dataset.as_type(*args, **kwargs)

Returns new dataset calling Var.as_type on each variable.

Dataset Attributes

Dataset.vars list of Var instances

A list of variables contained by the Dataset. See Var class.

Dataset.vardict dictionary of Var instances

A dictionary of variables contained by the Dataset, indexed by name.

Dataset.axes list of Axis instances

A list of axes contained by the Dataset. See Axis class.

Dataset.atts dictionary

A dictionary of metadata associated with the dataset. Sometimes referred to as global attributes.

Dataset Methods

Dataset.__init__(vars, atts={}, print_warnings=True)

Create a new Dataset from a list of variables.

Parameters
vars: list

The list of Var objects to include.

atts: dictionary, optional

A dictionary of attributes available in the Dataset.atts attribute.

print_warnings: boolean, optional [True]

If True, print out warnings when variables and axes are renamed.

Returns
The new Dataset object.

Notes

Variable names and axis names must be unique. If multiple variables share the same name, they are renamed so that each name is unique. If variables have axes that share a name but are not otherwise identical, those axes are renamed as well. If any names are modified and print_warnings is True, a warning is displayed indicating how the objects were renamed.
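
To avoid relying on automatic renaming, name collisions can be detected before the Dataset is built. The following is a minimal sketch in plain Python; duplicated_names is a hypothetical helper (not part of PyGeode) and it works on a list of name strings rather than on Var objects.

```python
from collections import Counter


def duplicated_names(names):
    """Return the names that occur more than once, in first-seen order.

    Hypothetical helper for illustration; not a PyGeode function.
    """
    counts = Counter(names)
    seen = []
    for name in names:
        if counts[name] > 1 and name not in seen:
            seen.append(name)
    return seen


# 'Temp' and 'U' each appear twice, so both are reported.
dupes = duplicated_names(["Temp", "U", "Temp", "V", "U"])
```

An empty result means the names would not need to be modified on account of duplicate variable names.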

Dataset.__getitem__(key)

Gets a variable or axis object.

Parameters
key: string

Name of axis or variable to return.

Returns
Var or Axis object matching the requested name. Raises KeyError if no such member is found.
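
Because a missing name raises KeyError, optional lookups can be wrapped in a small helper. The sketch below is illustrative only: get_member is a hypothetical name, not part of PyGeode. It relies only on the container supporting [] and raising KeyError, so a plain dict stands in for a Dataset here.

```python
def get_member(container, key, default=None):
    """Return container[key], or `default` if the lookup raises KeyError.

    Hypothetical helper for illustration; not a PyGeode function.
    """
    try:
        return container[key]
    except KeyError:
        return default


# A plain dict stands in for a Dataset in this sketch.
members = {"Temp": "temperature variable", "lat": "latitude axis"}
found = get_member(members, "Temp")
missing = get_member(members, "Tmep", default="not found")  # misspelled key
```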

Dataset.__call__(**kwargs)

Subsets all variables in this dataset. Behaves in the same way as Var.__call__().

Parameters
slices: list of slices

See Var.__call__() for details.

Returns
Dataset

A new Dataset, in which all variables have been restricted to the specified domain.

See also

Var.__call__

Notes

Not all variables need to have the axes being sliced (any slice that doesn’t apply to a given variable is simply ignored). This is usually more convenient, but it does mean that if an axis name is misspelled (for example), the call will return successfully without performing any subsetting.
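
One way to guard against silently ignored keywords is to compare them with the known axis names before subsetting. This is a sketch under stated assumptions: check_axis_keywords is a hypothetical helper (not part of PyGeode), the axis names are supplied as plain strings, and every keyword is assumed to be a plain axis name. Any other keyword forms that Var.__call__ accepts would need extra handling.

```python
def check_axis_keywords(axis_names, **kwargs):
    """Raise KeyError if any keyword is not a known axis name.

    Hypothetical helper for illustration; not a PyGeode function.
    Assumes every keyword is a plain axis name.
    """
    unknown = sorted(set(kwargs) - set(axis_names))
    if unknown:
        raise KeyError("unknown axis name(s): " + ", ".join(unknown))
    return kwargs


axis_names = ["time", "pres", "lat", "lon"]

# A correctly spelled axis name passes through unchanged.
ok = check_axis_keywords(axis_names, lat=(-30, 30))

# A misspelled axis name is reported instead of being ignored.
try:
    check_axis_keywords(axis_names, lta=(-30, 30))
    caught = False
except KeyError:
    caught = True
```

The validated keywords could then be passed on to the dataset call itself.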

Dataset.add(*vars)

Adds variables to the dataset.

Parameters
*vars: Var objects

The variables to add.

Returns
A new Dataset with the variables added.

Notes

Name collisions are resolved by the same rules as in Dataset.__init__(). The addition operator is overloaded to do the same thing; in that case provide a list of the variables to add (the second example below also passes a single Var directly).

Examples

>>> from pygeode.tutorial import t1, t2
>>> print(t2.add(t1.Temp.rename('Temp2')))
<Dataset>:
Vars:
  Temp  (time,pres,lat,lon)  (3650,20,31,60)
  U     (time,pres,lat,lon)  (3650,20,31,60)
  Temp2 (lat,lon)            (31,60)
Axes:
  time <ModelTime365>:  Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
  pres <Pres>    :  1000 hPa to 50 hPa (20 values)
  lat <Lat>      :  90 S to 90 N (31 values)
  lon <Lon>      :  0 E to 354 E (60 values)
Global Attributes:
  history        : Synthetic Temperature and Wind data generated by pygeode
>>> print(t2 + t1.Temp.rename('Temp2'))
<Dataset>:
Vars:
  Temp  (time,pres,lat,lon)  (3650,20,31,60)
  U     (time,pres,lat,lon)  (3650,20,31,60)
  Temp2 (lat,lon)            (31,60)
Axes:
  time <ModelTime365>:  Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
  pres <Pres>    :  1000 hPa to 50 hPa (20 values)
  lat <Lat>      :  90 S to 90 N (31 values)
  lon <Lon>      :  0 E to 354 E (60 values)
Global Attributes:
  history        : Synthetic Temperature and Wind data generated by pygeode

Dataset.copy()

Creates a new copy of this dataset. New instances of the internal lists and dictionaries are created, but the variables still refer to the same Var objects.

Returns
A new Dataset object.
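
The copy semantics described above match a shallow copy in plain Python. The following sketch illustrates the idea with the standard copy module; FakeVar is a hypothetical stand-in for a Var object, and the lists and dictionaries here only mimic the internal containers of a Dataset.

```python
import copy


class FakeVar:
    """Stand-in for a Var object; only its identity matters here."""

    def __init__(self, name):
        self.name = name


temp = FakeVar("Temp")
original_vars = [temp]
original_vardict = {"Temp": temp}

# A shallow copy creates new containers...
copied_vars = copy.copy(original_vars)
copied_vardict = copy.copy(original_vardict)

# ...so changing the copied list leaves the original list untouched,
copied_vars.append(FakeVar("U"))

# ...while both containers still hold the very same variable object.
same_object = copied_vars[0] is original_vars[0]
```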

Dataset.load()

Loads data from all variables in the dataset.

Returns
A new Dataset in which Var.load() has been called on each variable, loading their data into memory.

Dataset.map(f, *args, **kwargs)

Calls a given function on every variable in the dataset.

Parameters
f: callable

Method to call. Must take the variable as its first argument, and return either a single variable, or None. Further positional and keyword arguments can be passed through args and kwargs.

args, kwargs: positional and keyword arguments

These are passed on to f.

Returns
A new Dataset with the results of the calls to f. f can return None; in that case no corresponding variable is included in the new Dataset object.
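
The behaviour described above (apply f to each variable, drop any None result) can be sketched in plain Python. This is not PyGeode's implementation: map_vars and keep_if_has_axis are hypothetical names, and each variable is represented by a simple (name, axes) tuple.

```python
def map_vars(variables, f, *args, **kwargs):
    """Apply f to each variable, skipping results that are None."""
    results = []
    for var in variables:
        out = f(var, *args, **kwargs)
        if out is not None:
            results.append(out)
    return results


def keep_if_has_axis(var, axis):
    """Return the variable if it is defined on `axis`, else None."""
    name, axes = var
    return var if axis in axes else None


# Each stand-in variable is a (name, axes) tuple.
variables = [
    ("Temp", ("time", "pres", "lat", "lon")),
    ("Temp2", ("lat", "lon")),
]

# 'Temp2' has no time axis, so f returns None and it is left out.
kept = map_vars(variables, keep_if_has_axis, "time")
```

Returning None from f therefore acts as a filter, which makes it possible to transform and select variables in a single pass.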

Dataset.remove(*varnames)

Removes variables from the dataset.

Parameters
*varnames: strings

The names of the variables to remove.

Returns
A new Dataset with the specified variables removed.

Notes

The subtraction operator is also overloaded to do the same thing; in that case provide a list of strings specifying the names of the variables to remove.

Examples

>>> from pygeode.tutorial import t2
>>> print(t2)
<Dataset>:
Vars:
  Temp (time,pres,lat,lon)  (3650,20,31,60)
  U    (time,pres,lat,lon)  (3650,20,31,60)
Axes:
  time <ModelTime365>:  Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
  pres <Pres>    :  1000 hPa to 50 hPa (20 values)
  lat <Lat>      :  90 S to 90 N (31 values)
  lon <Lon>      :  0 E to 354 E (60 values)
Global Attributes:
  history        : Synthetic Temperature and Wind data generated by pygeode
>>> print(t2.remove('Temp'))
<Dataset>:
Vars:
  U (time,pres,lat,lon)  (3650,20,31,60)
Axes:
  time <ModelTime365>:  Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
  pres <Pres>    :  1000 hPa to 50 hPa (20 values)
  lat <Lat>      :  90 S to 90 N (31 values)
  lon <Lon>      :  0 E to 354 E (60 values)
Global Attributes:
  history        : Synthetic Temperature and Wind data generated by pygeode
>>> print(t2 - ['U'])
<Dataset>:
Vars:
  Temp (time,pres,lat,lon)  (3650,20,31,60)
Axes:
  time <ModelTime365>:  Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
  pres <Pres>    :  1000 hPa to 50 hPa (20 values)
  lat <Lat>      :  90 S to 90 N (31 values)
  lon <Lon>      :  0 E to 354 E (60 values)
Global Attributes:
  history        : Synthetic Temperature and Wind data generated by pygeode
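
In the examples above, add, remove, + and - each return a new object, and t2 itself is printed unchanged afterwards. A toy model of that pattern is sketched below. NameSet is a hypothetical class (not part of PyGeode) that stores only variable names, and its operators accept lists, as described in the notes.

```python
class NameSet:
    """Toy immutable container of variable names (not a PyGeode class)."""

    def __init__(self, names):
        self.names = tuple(names)

    def add(self, *names):
        # Build a new container instead of modifying this one.
        return NameSet(self.names + names)

    def remove(self, *names):
        return NameSet(n for n in self.names if n not in names)

    def __add__(self, names):
        # `container + [a, b]` is shorthand for `container.add(a, b)`.
        return self.add(*names)

    def __sub__(self, names):
        # `container - [a, b]` is shorthand for `container.remove(a, b)`.
        return self.remove(*names)


base = NameSet(["Temp", "U"])
bigger = base + ["Temp2"]
smaller = base - ["U"]
```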

Dataset.rename_vars(vardict={}, **kwargs)

Rename variables in dataset. Variables to rename can be passed as keyword arguments, or as a dictionary.

Parameters
vardict: dictionary, optional

A dictionary with keys corresponding to the existing variables to rename and values giving their new names.

**kwargs: keyword arguments

One or more keyword arguments. The keywords are the existing variable names and the values are the new names to substitute.

Returns
A new Dataset object with the same contents but renamed variables.

Examples

>>> from pygeode.tutorial import t2
>>> print(t2)
<Dataset>:
Vars:
  Temp (time,pres,lat,lon)  (3650,20,31,60)
  U    (time,pres,lat,lon)  (3650,20,31,60)
Axes:
  time <ModelTime365>:  Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
  pres <Pres>    :  1000 hPa to 50 hPa (20 values)
  lat <Lat>      :  90 S to 90 N (31 values)
  lon <Lon>      :  0 E to 354 E (60 values)
Global Attributes:
  history        : Synthetic Temperature and Wind data generated by pygeode
>>> print(t2.rename_vars(Temp = 'T', U = 'Wind'))
<Dataset>:
Vars:
  T    (time,pres,lat,lon)  (3650,20,31,60)
  Wind (time,pres,lat,lon)  (3650,20,31,60)
Axes:
  time <ModelTime365>:  Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
  pres <Pres>    :  1000 hPa to 50 hPa (20 values)
  lat <Lat>      :  90 S to 90 N (31 values)
  lon <Lon>      :  0 E to 354 E (60 values)
Global Attributes:
  history        : Synthetic Temperature and Wind data generated by pygeode
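
The two calling styles (dictionary or keyword arguments) can be modelled in plain Python as follows. This is a sketch, not PyGeode's implementation: resolve_renames is a hypothetical helper that operates on name strings only, and merging the dictionary with the keyword arguments is an assumption of the sketch.

```python
def resolve_renames(names, vardict=None, **kwargs):
    """Return `names` with old names replaced by their new names.

    Hypothetical helper for illustration; not a PyGeode function.
    Merging `vardict` with `kwargs` is an assumption of this sketch.
    """
    mapping = dict(vardict or {})
    mapping.update(kwargs)
    # Names without an entry in the mapping are kept as they are.
    return [mapping.get(name, name) for name in names]


names = ["Temp", "U"]
by_keyword = resolve_renames(names, Temp="T", U="Wind")
by_dict = resolve_renames(names, {"Temp": "T", "U": "Wind"})
```

The dictionary form is the natural choice when an existing name is not a valid Python identifier and so cannot be written as a keyword.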

Dataset.replace_vars(vardict={}, **kwargs)

Replaces variables in the dataset.

Parameters
vardict: dictionary, optional

A dictionary with keys corresponding to the existing variables to replace and values giving the new Var instances.

**kwargs: keyword arguments

One or more keyword arguments. The keywords are the existing variable names and the values are the new Var instances.

Returns
A new Dataset with the specified variables replaced.

Examples

>>> from pygeode.tutorial import t1, t2
>>> print(t2.replace_vars(Temp = t1.Temp))
<Dataset>:
Vars:
  Temp (lat,lon)            (31,60)
  U    (time,pres,lat,lon)  (3650,20,31,60)
Axes:
  lat <Lat>      :  90 S to 90 N (31 values)
  lon <Lon>      :  0 E to 354 E (60 values)
  time <ModelTime365>:  Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
  pres <Pres>    :  1000 hPa to 50 hPa (20 values)
Global Attributes:
  history        : Synthetic Temperature and Wind data generated by pygeode