Dataset overview
Essentially, a Dataset is a collection of variables which are related to each other in some way. For example, when you load a file into PyGeode, a single Dataset is created, containing all variables from that file.
Here’s an example of a Dataset, with a single variable:
>>> from pygeode.tutorial import t1
>>> print(t1)
<Dataset>:
Vars:
Temp (lat,lon) (31,60)
Axes:
lat <Lat> : 90 S to 90 N (31 values)
lon <Lon> : 0 E to 354 E (60 values)
Global Attributes:
history : Synthetic Temperature data generated by pygeode
This particular Dataset has a single variable inside (named Temp), defined over latitude and longitude. We can reference a particular Var object using a dot (.), treating it as a member of the Dataset:
>>> x = t1.Temp
>>> print(x)
<Var 'Temp'>:
Units: K Shape: (lat,lon) (31,60)
Axes:
lat <Lat> : 90 S to 90 N (31 values)
lon <Lon> : 0 E to 354 E (60 values)
Attributes:
{}
Type: Add_Var (dtype="float64")
More examples can be found in tut_datasets.
- class pygeode.Dataset
Attributes
Dataset.vars: A list of variables contained by the Dataset.
Dataset.vardict: A dictionary of variables contained by the Dataset, indexed by name.
Dataset.axes: A list of axes contained by the Dataset.
Dataset.atts: A dictionary of metadata associated with the dataset.
Methods
Dataset.__init__(vars[, atts, print_warnings]): Create a new Dataset from a list of variables.
Dataset.__getitem__(key): Gets a variable or axis object.
Dataset.__call__(**kwargs): Subsets all variables in this dataset.
Dataset.add(*vars): Adds variables to the dataset.
Dataset.copy(): Creates a new copy of this dataset.
Dataset.load(): Loads data from all variables in the dataset.
Dataset.map(f, *args, **kwargs): Calls a given function on every variable in the dataset.
Dataset.remove(*varnames): Removes variables from the dataset.
Dataset.rename_vars([vardict]): Rename variables in dataset.
Dataset.replace_vars([vardict]): Replaces variables in the dataset.
Variable operations
Dataset.smooth(*args, **kwargs): Returns new dataset calling Var.smooth on each variable.
Dataset.deriv(*args, **kwargs): Returns new dataset calling Var.deriv on each variable.
Dataset.diff(*args, **kwargs): Returns new dataset calling Var.diff on each variable.
Dataset.integrate(*args, **kwargs): Returns new dataset calling Var.integrate on each variable.
Dataset.composite(*args, **kwargs): Returns new dataset calling Var.composite on each variable.
Dataset.flatten(*args, **kwargs): Returns new dataset calling Var.flatten on each variable.
Dataset.fft_smooth(*args, **kwargs): Returns new dataset calling Var.fft_smooth on each variable.
Dataset.lag(*args, **kwargs): Returns new dataset calling Var.lag on each variable.
Dataset.interpolate(*args, **kwargs): Returns new dataset calling Var.interpolate on each variable.
Dataset.squeeze(*args, **kwargs): Returns new dataset calling Var.squeeze on each variable.
Dataset.extend(*args, **kwargs): Returns new dataset calling Var.extend on each variable.
Dataset.transpose(*args, **kwargs): Returns new dataset calling Var.transpose on each variable.
Dataset.sorted(*args, **kwargs): Returns new dataset calling Var.sorted on each variable.
Dataset.replace_axes(*args, **kwargs): Returns new dataset calling Var.replace_axes on each variable.
Dataset.rename(*args, **kwargs): Returns new dataset calling Var.rename on each variable.
Dataset.rename_axes(*args, **kwargs): Returns new dataset calling Var.rename_axes on each variable.
Dataset.fill(*args, **kwargs): Returns new dataset calling Var.fill on each variable.
Dataset.unfill(*args, **kwargs): Returns new dataset calling Var.unfill on each variable.
Dataset.as_type(*args, **kwargs): Returns new dataset calling Var.as_type on each variable.
Dataset Attributes
- Dataset.vars: list of Var instances
A list of variables contained by the Dataset. See Var class.
- Dataset.vardict: dictionary of Var instances
A dictionary of variables contained by the Dataset, indexed by name.
- Dataset.axes: list of Axis instances
A list of axes contained by the Dataset. See Axis class.
- Dataset.atts: dictionary
A dictionary of metadata associated with the dataset. Sometimes referred to as global attributes.
Dataset Methods
- Dataset.__init__(vars, atts={}, print_warnings=True)
Create a new Dataset from a list of variables.
- Parameters
- vars: list
The list of Var objects to include.
- atts: dictionary, optional
A dictionary of attributes available in the Dataset.atts attribute.
- print_warnings: boolean, optional [True]
If True, print out warnings when variables and axes are renamed.
- Returns
- The new Dataset object.
Notes
Variable names and axis names must be unique. If multiple variables share the same name, they are renamed so that each name is unique. If variables have axes with the same name that are not otherwise identical, those axes are renamed as well. If any names are modified and print_warnings is True, a warning is displayed indicating how the objects have been renamed.
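Because colliding names are renamed rather than rejected, it can be useful to look for duplicates before building a Dataset. The sketch below is one way to do that. `duplicate_names` is a hypothetical helper, not part of PyGeode, and it assumes only that each variable exposes a `name` attribute; the `SimpleNamespace` objects stand in for real Var objects.

```python
from collections import Counter
from types import SimpleNamespace


def duplicate_names(variables):
    """Return the sorted names that occur more than once in `variables`."""
    counts = Counter(v.name for v in variables)
    return sorted(name for name, n in counts.items() if n > 1)


# Stand-ins for Var objects (assumption: real variables carry a `name`).
candidates = [
    SimpleNamespace(name="Temp"),
    SimpleNamespace(name="U"),
    SimpleNamespace(name="Temp"),
]

clashes = duplicate_names(candidates)
print(clashes)
```

An empty result means no variable names collide; it says nothing about axis names, which would need a similar check.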
- Dataset.__call__(**kwargs)
Subsets all variables in this dataset. Behaves in the same way as Var.__call__().
- Parameters
- slices: list of slices
See Var.__call__() for details.
- Returns
- Dataset
A new Dataset, in which all variables have been restricted to the specified domain.
Notes
Not all variables need to have the axes being sliced (any slice that doesn’t apply to a given variable is simply ignored). This is usually more convenient, but it does mean that if an axis name is misspelled (for example), the call will return successfully without performing any subsetting.
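One way to guard against the silent no-op described in the note is to compare the keywords against the axis names before subsetting. The following is a minimal sketch under stated assumptions: `unknown_axis_keywords` is a hypothetical helper, not a PyGeode function, and it assumes that each entry of `axes` exposes a `name` attribute. It treats every keyword as a plain axis name, so any other keyword forms accepted by Var.__call__() would be reported as unknown. A `SimpleNamespace` stands in for a real Dataset.

```python
from types import SimpleNamespace


def unknown_axis_keywords(dataset, **selection):
    """Return the selection keywords that match no axis name in `dataset`."""
    known = {axis.name for axis in dataset.axes}
    return sorted(key for key in selection if key not in known)


# Stand-in for a Dataset: only `axes` with named entries is needed here.
fake_dataset = SimpleNamespace(
    axes=[SimpleNamespace(name="lat"), SimpleNamespace(name="lon")]
)

# "lno" is a deliberate misspelling of "lon".
typos = unknown_axis_keywords(fake_dataset, lat=(0, 90), lno=(0, 180))
print(typos)
```

If the returned list is non-empty, the subsetting call would have ignored those keywords.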
- Dataset.add(*vars)
Adds variables to the dataset.
Notes
The same naming rules are applied in case of name collisions as in Dataset.__init__(). The addition operator is also overloaded to do the same thing; in that case provide a list of the variables to add.
Examples
>>> from pygeode.tutorial import t1, t2
>>> print(t2.add(t1.Temp.rename('Temp2')))
<Dataset>:
Vars:
Temp (time,pres,lat,lon) (3650,20,31,60)
U (time,pres,lat,lon) (3650,20,31,60)
Temp2 (lat,lon) (31,60)
Axes:
time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
pres <Pres> : 1000 hPa to 50 hPa (20 values)
lat <Lat> : 90 S to 90 N (31 values)
lon <Lon> : 0 E to 354 E (60 values)
Global Attributes:
history : Synthetic Temperature and Wind data generated by pygeode
>>> print(t2 + t1.Temp.rename('Temp2'))
<Dataset>:
Vars:
Temp (time,pres,lat,lon) (3650,20,31,60)
U (time,pres,lat,lon) (3650,20,31,60)
Temp2 (lat,lon) (31,60)
Axes:
time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
pres <Pres> : 1000 hPa to 50 hPa (20 values)
lat <Lat> : 90 S to 90 N (31 values)
lon <Lon> : 0 E to 354 E (60 values)
Global Attributes:
history : Synthetic Temperature and Wind data generated by pygeode
- Dataset.copy()
Creates a new copy of this dataset. New instances of the internal lists and dictionaries are created, but the variables still refer to the same Var objects.
- Returns
- A new Dataset object.
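The copy semantics described above (new containers, shared elements) can be illustrated with a toy model. `ToyDataset` is a hypothetical stand-in written for this illustration, not PyGeode's implementation, and plain dictionaries play the role of Var objects.

```python
class ToyDataset:
    """Minimal stand-in: holds a list of variables and a name lookup."""

    def __init__(self, variables):
        self.vars = list(variables)
        self.vardict = {v["name"]: v for v in self.vars}

    def copy(self):
        # New list and dictionary are built, but the elements are reused.
        return ToyDataset(self.vars)


temp = {"name": "Temp", "units": "K"}
original = ToyDataset([temp])
duplicate = original.copy()

separate_lists = duplicate.vars is not original.vars
shared_var = duplicate.vars[0] is original.vars[0]
print(separate_lists, shared_var)
```

In this model, appending to `duplicate.vars` leaves `original.vars` unchanged, while both datasets keep pointing at the same underlying variable.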
- Dataset.load()
Loads data from all variables in the dataset.
- Returns
- A new dataset in which Var.load() has been called on each variable, loading their data into memory.
- Dataset.map(f, *args, **kwargs)
Calls a given function on every variable in the dataset.
- Parameters
- f: callable
Method to call. Must take the variable as its first argument, and return either a single variable, or None. Further positional and keyword arguments can be passed through args and kwargs.
- args, kwargs: positional and keyword arguments
These are passed on to f.
- Returns
- A new Dataset with the results of the calls to f. f can return None; in that case no corresponding variable is included in the new Dataset object.
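The contract described above (call f on each variable, forward the extra arguments, drop any None results) can be modelled in a few lines. This is a sketch of the documented behaviour only, not PyGeode's source: `map_vars` and `keep_if_has_axis` are hypothetical names, and plain dictionaries stand in for Var objects.

```python
def map_vars(variables, f, *args, **kwargs):
    """Apply f to each variable and keep every result that is not None."""
    results = (f(v, *args, **kwargs) for v in variables)
    return [r for r in results if r is not None]


def keep_if_has_axis(var, axis_name):
    """Example callback: return var unchanged, or None to drop it."""
    return var if axis_name in var["axes"] else None


variables = [
    {"name": "Temp", "axes": ("time", "pres", "lat", "lon")},
    {"name": "Temp2", "axes": ("lat", "lon")},
]

# The extra positional argument "time" is forwarded to the callback.
kept = map_vars(variables, keep_if_has_axis, "time")
print([v["name"] for v in kept])
```

Returning None from the callback is what makes a mapped function usable as a filter as well as a transformation.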
- Dataset.remove(*varnames)
Removes variables from the dataset.
- Parameters
- *varnames: strings
The names of the variables to remove.
- Returns
- A new Dataset with the specified variables removed.
Notes
The subtraction operator is also overloaded to do the same thing; in that case provide a list of strings specifying the names of the variables to remove.
Examples
>>> from pygeode.tutorial import t2
>>> print(t2)
<Dataset>:
Vars:
Temp (time,pres,lat,lon) (3650,20,31,60)
U (time,pres,lat,lon) (3650,20,31,60)
Axes:
time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
pres <Pres> : 1000 hPa to 50 hPa (20 values)
lat <Lat> : 90 S to 90 N (31 values)
lon <Lon> : 0 E to 354 E (60 values)
Global Attributes:
history : Synthetic Temperature and Wind data generated by pygeode
>>> print(t2.remove('Temp'))
<Dataset>:
Vars:
U (time,pres,lat,lon) (3650,20,31,60)
Axes:
time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
pres <Pres> : 1000 hPa to 50 hPa (20 values)
lat <Lat> : 90 S to 90 N (31 values)
lon <Lon> : 0 E to 354 E (60 values)
Global Attributes:
history : Synthetic Temperature and Wind data generated by pygeode
>>> print(t2 - ['U'])
<Dataset>:
Vars:
Temp (time,pres,lat,lon) (3650,20,31,60)
Axes:
time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
pres <Pres> : 1000 hPa to 50 hPa (20 values)
lat <Lat> : 90 S to 90 N (31 values)
lon <Lon> : 0 E to 354 E (60 values)
Global Attributes:
history : Synthetic Temperature and Wind data generated by pygeode
- Dataset.rename_vars(vardict={}, **kwargs)
Rename variables in dataset. Variables to rename can be passed as keyword arguments, or as a dictionary.
- Parameters
- vardict: dictionary, optional
A dictionary with keys corresponding to the existing variables to rename and values giving their new names.
- **kwargs: keyword arguments
One or more keyword arguments. The keyword names are the existing variable names and the values are the new names to substitute.
- Returns
- A new Dataset object with the same contents but renamed variables.
Examples
>>> from pygeode.tutorial import t2
>>> print(t2)
<Dataset>:
Vars:
Temp (time,pres,lat,lon) (3650,20,31,60)
U (time,pres,lat,lon) (3650,20,31,60)
Axes:
time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
pres <Pres> : 1000 hPa to 50 hPa (20 values)
lat <Lat> : 90 S to 90 N (31 values)
lon <Lon> : 0 E to 354 E (60 values)
Global Attributes:
history : Synthetic Temperature and Wind data generated by pygeode
>>> print(t2.rename_vars(Temp = 'T', U = 'Wind'))
<Dataset>:
Vars:
T (time,pres,lat,lon) (3650,20,31,60)
Wind (time,pres,lat,lon) (3650,20,31,60)
Axes:
time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
pres <Pres> : 1000 hPa to 50 hPa (20 values)
lat <Lat> : 90 S to 90 N (31 values)
lon <Lon> : 0 E to 354 E (60 values)
Global Attributes:
history : Synthetic Temperature and Wind data generated by pygeode
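Since renames can arrive either as a dictionary or as keyword arguments, the two sources have to be combined into a single old-name to new-name mapping. The sketch below shows one plausible way to do that on plain lists of names. `merged_renames` and `renamed` are hypothetical helpers, not PyGeode functions, and letting keywords take precedence over dictionary entries is an assumption of this sketch rather than documented behaviour.

```python
def merged_renames(vardict=None, **kwargs):
    """Combine a rename dictionary with keyword arguments into one mapping."""
    mapping = dict(vardict or {})
    mapping.update(kwargs)  # assumption: keywords win over dictionary entries
    return mapping


def renamed(names, vardict=None, **kwargs):
    """Return `names` with every mapped name replaced by its new name."""
    mapping = merged_renames(vardict, **kwargs)
    return [mapping.get(name, name) for name in names]


# One rename given as a dictionary entry, the other as a keyword.
new_names = renamed(["Temp", "U"], {"Temp": "T"}, U="Wind")
print(new_names)
```

The dictionary form is the one to reach for when a variable name is not a valid Python identifier and so cannot be written as a keyword.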
- Dataset.replace_vars(vardict={}, **kwargs)
Replaces variables in the dataset. As in the example, each keyword names an existing variable and its value is the Var that replaces it.
- Returns
- A new Dataset with the specified variables replaced.
Examples
>>> from pygeode.tutorial import t1, t2
>>> print(t2.replace_vars(Temp = t1.Temp))
<Dataset>:
Vars:
Temp (lat,lon) (31,60)
U (time,pres,lat,lon) (3650,20,31,60)
Axes:
lat <Lat> : 90 S to 90 N (31 values)
lon <Lon> : 0 E to 354 E (60 values)
time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)
pres <Pres> : 1000 hPa to 50 hPa (20 values)
Global Attributes:
history : Synthetic Temperature and Wind data generated by pygeode