Example with ARGO data#

We first need to download some ARGO data for example.

[1]:
import requests
url = 'https://www.ncei.noaa.gov/data/oceans/argo/gadr/data/coriolis/69022/nodc_69022_prof.nc'
r = requests.get(url, allow_redirects=True)
with open('ARGO_example.nc', 'wb') as f:
    f.write(r.content)
[2]:
import xarray as xr
import cf_xarray
import gsw_xarray as gsw
[3]:
ds = xr.open_dataset('ARGO_example.nc')
[4]:
ds
[4]:
<xarray.Dataset>
Dimensions:                       (n_prof: 130, n_param: 3, n_levels: 56,
                                   n_calib: 1, n_history: 0)
Dimensions without coordinates: n_prof, n_param, n_levels, n_calib, n_history
Data variables: (12/65)
    data_type                     object ...
    format_version                object ...
    handbook_version              object ...
    reference_date_time           object ...
    date_creation                 object ...
    date_update                   object ...
    ...                            ...
    history_parameter             (n_history, n_prof) object ...
    history_start_pres            (n_history, n_prof) float32 ...
    history_stop_pres             (n_history, n_prof) float32 ...
    history_previous_value        (n_history, n_prof) float32 ...
    history_qctest                (n_history, n_prof) object ...
    crs                           int32 ...
Attributes: (12/49)
    title:                           Argo float vertical profile
    institution:                     FR GDAC
    source:                          Argo float
    history:                         2019-10-20T09:53:12Z boyer convAGDAC.f90...
    references:                      https://www.nodc.noaa.gov/argo/
    user_manual_version:             3.1
    ...                              ...
    time_coverage_end:               2005-09-19T02:13:46Z
    time_coverage_duration:          point
    time_coverage_resolution:        point
    gadr_ConventionVersion:          GADR-3.0
    gadr_program:                    convAGDAC.f90
    gadr_programVersion:             1.2

We can rely on cf-xarray to see what variables have standard names in our dataset:

[5]:
ds.cf
[5]:
Coordinates:
- CF Axes:   X, Y, Z, T: n/a

- CF Coordinates:   longitude, latitude, vertical, time: n/a

- Cell Measures:   area, volume: n/a

- Standard Names:   n/a

- Bounds:   n/a

Data Variables:
- Cell Measures:   area, volume: n/a

- Standard Names:   latitude: ['latitude']
                    longitude: ['longitude']
                    sea_water_pressure: ['pres', 'pres_adjusted']
                    sea_water_salinity: ['psal', 'psal_adjusted']
                    sea_water_temperature: ['temp', 'temp_adjusted']
                    time: ['juld']

- Bounds:   n/a

The dataset contains multiple time the same variable (e.g. ‘pres_adjusted’ and ‘pres’ both have the standard name ‘sea_water_pressure’). For the accessor to work, only 1 variable or each standard name must be present, explicitely stated when calling the function, or the gsw option set_cf_name_preference must be set. For this example we will only retain the adjusted variables. We set the global option, but we could also use context, i.e.

[6]:
with gsw.set_cf_name_preference(
    sea_water_pressure="pres_adjusted",
    sea_water_practical_salinity="psal_adjusted",
    sea_water_temperature="temp_adjusted"
):
    # do the computation
    pass

gsw.set_cf_name_preference(
    sea_water_pressure="pres_adjusted",
    sea_water_practical_salinity="psal_adjusted",
    sea_water_temperature="temp_adjusted"
)

# We can check the options we have set:
gsw.get_options()
[6]:
{'non_cf_name': {},
 'cf_name_preference': {'sea_water_pressure': 'pres_adjusted',
  'sea_water_practical_salinity': 'psal_adjusted',
  'sea_water_temperature': 'temp_adjusted'}}

In the following sections we will demonstrate each features of gsw-xarray. We will focus on computing the potential density anomaly.

Basic usage as drop in replacement of gsw#

[7]:
help(gsw.sigma0)
Help on function sigma0 in module gsw._wrapped_ufuncs:

sigma0(SA, CT)
    Calculates potential density anomaly with reference pressure of 0 dbar,
    this being this particular potential density minus 1000 kg/m^3.  This
    function has inputs of Absolute Salinity and Conservative Temperature.
    This function uses the computationally-efficient expression for
    specific volume in terms of SA, CT and p (Roquet et al., 2015).

    Parameters
    ----------
    SA : array-like
        Absolute Salinity, g/kg
    CT : array-like
        Conservative Temperature (ITS-90), degrees C

    Returns
    -------
    sigma0 : array-like, kg/m^3
        potential density anomaly with
        respect to a reference pressure of 0 dbar,
        that is, this potential density - 1000 kg/m^3.

We need Absolute Salinity and Conservative Temperature, so 1st we need to do some conversions:

[8]:
SA = gsw.SA_from_SP(SP=ds.psal_adjusted, p=ds.pres_adjusted, lon=ds.longitude, lat=ds.latitude)
CT = gsw.CT_from_t(SA=SA, t=ds.temp_adjusted, p=ds.pres_adjusted)
sigma0 = gsw.sigma0(SA=SA, CT=CT)
sigma0
[8]:
<xarray.DataArray 'sigma0' (n_prof: 130, n_levels: 56)>
array([[26.64758465, 26.64656536, 26.64667857, ..., 27.77078682,
        27.77467073, 27.77756609],
       [26.48498586, 26.48409899, 26.48432005, ..., 27.76600562,
        27.76941237, 27.76895082],
       [26.25518807, 26.25449663, 26.25488776, ..., 27.7603578 ,
        27.76478937, 27.76924889],
       ...,
       [25.31136543, 25.31674362, 25.42998838, ..., 27.77414777,
        27.76859076,         nan],
       [25.30657435, 25.30409728, 25.29751108, ..., 27.78551733,
        27.77778253,         nan],
       [25.43166004, 25.43155161, 25.43251587, ..., 27.7946225 ,
        27.79664872,         nan]])
Dimensions without coordinates: n_prof, n_levels
Attributes:
    standard_name:  sea_water_sigma_t
    units:          kg/m^3

Using Pint and pint-xarray to handle units#

[9]:
import pint_xarray
import cf_xarray.units
/home/docs/checkouts/readthedocs.org/user_builds/gsw-xarray/envs/latest/lib/python3.10/site-packages/cf_xarray/units.py:112: UserWarning: Import(s) unavailable to set up matplotlib support...skipping this portion of the setup.
  warnings.warn(
[10]:
ds_pint = ds.pint.quantify()
ds_pint
[10]:
<xarray.Dataset>
Dimensions:                       (n_prof: 130, n_param: 3, n_levels: 56,
                                   n_calib: 1, n_history: 0)
Dimensions without coordinates: n_prof, n_param, n_levels, n_calib, n_history
Data variables: (12/65)
    data_type                     object b'Argo profile    '
    format_version                object b'3.1 '
    handbook_version              object b'1.2 '
    reference_date_time           object b'19500101000000'
    date_creation                 object b'20040428194950'
    date_update                   object b'20190626110153'
    ...                            ...
    history_parameter             (n_history, n_prof) object
    history_start_pres            (n_history, n_prof) float32 [dbar]
    history_stop_pres             (n_history, n_prof) float32 [dbar]
    history_previous_value        (n_history, n_prof) float32
    history_qctest                (n_history, n_prof) object
    crs                           int32 -2147483647
Attributes: (12/49)
    title:                           Argo float vertical profile
    institution:                     FR GDAC
    source:                          Argo float
    history:                         2019-10-20T09:53:12Z boyer convAGDAC.f90...
    references:                      https://www.nodc.noaa.gov/argo/
    user_manual_version:             3.1
    ...                              ...
    time_coverage_end:               2005-09-19T02:13:46Z
    time_coverage_duration:          point
    time_coverage_resolution:        point
    gadr_ConventionVersion:          GADR-3.0
    gadr_program:                    convAGDAC.f90
    gadr_programVersion:             1.2

We compute again sigma0, using the ds_pint dataset, i.e. variables have a physical dimension

[11]:
SA = gsw.SA_from_SP(SP=ds_pint.psal_adjusted, p=ds_pint.pres_adjusted, lon=ds_pint.longitude, lat=ds_pint.latitude)
CT = gsw.CT_from_t(SA=SA, t=ds_pint.temp_adjusted, p=ds_pint.pres_adjusted)
sigma0 = gsw.sigma0(SA=SA, CT=CT)
sigma0
[11]:
<xarray.DataArray 'sigma0' (n_prof: 130, n_levels: 56)>
<Quantity([[26.64758465 26.64656536 26.64667857 ... 27.77078682 27.77467073
  27.77756609]
 [26.48498586 26.48409899 26.48432005 ... 27.76600562 27.76941237
  27.76895082]
 [26.25518807 26.25449663 26.25488776 ... 27.7603578  27.76478937
  27.76924889]
 ...
 [25.31136543 25.31674362 25.42998838 ... 27.77414777 27.76859076
          nan]
 [25.30657435 25.30409728 25.29751108 ... 27.78551733 27.77778253
          nan]
 [25.43166004 25.43155161 25.43251587 ... 27.7946225  27.79664872
          nan]], 'kilogram / meter ** 3')>
Dimensions without coordinates: n_prof, n_levels
Attributes:
    standard_name:  sea_water_sigma_t

gsw-xarray converts the units (if necessary) when using pint quantities:

[12]:
# start to convert the pressure into Pascal
pressure_in_pascal = ds_pint.pres_adjusted.pint.to('Pa')
pressure_in_pascal
[12]:
<xarray.DataArray 'pres_adjusted' (n_prof: 130, n_levels: 56)>
<Quantity([[7.0000e+03 5.8000e+04 1.5700e+05 ... 1.4253e+07 1.4752e+07 1.5013e+07]
 [2.7000e+04 5.5000e+04 1.5200e+05 ... 1.4249e+07 1.4751e+07 1.4920e+07]
 [3.6000e+04 5.4000e+04 1.5100e+05 ... 1.4249e+07 1.4749e+07 1.5041e+07]
 ...
 [4.5000e+04 1.2900e+05 2.2700e+05 ... 1.4726e+07 1.5011e+07        nan]
 [4.8000e+04 1.3100e+05 2.3100e+05 ... 1.4725e+07 1.4979e+07        nan]
 [4.5000e+04 1.3200e+05 2.2900e+05 ... 1.4726e+07 1.5021e+07        nan]], 'pascal')>
Dimensions without coordinates: n_prof, n_levels
Attributes:
    long_name:       Sea water pressure, equals 0 at sea-level
    standard_name:   sea_water_pressure
    valid_min:       0.0
    valid_max:       12000.0
    C_format:        %7.1f
    FORTRAN_format:  F7.1
    resolution:      1.0
    axis:            Z

Compute again density, using the pressure in Pascal. No worries as the conversion to dbar is automatic!

[13]:
SA = gsw.SA_from_SP(SP=ds_pint.psal_adjusted, p=pressure_in_pascal, lon=ds_pint.longitude, lat=ds_pint.latitude)
CT = gsw.CT_from_t(SA=SA, t=ds_pint.temp_adjusted, p=ds_pint.pres_adjusted)
sigma0 = gsw.sigma0(SA=SA, CT=CT)
sigma0
[13]:
<xarray.DataArray 'sigma0' (n_prof: 130, n_levels: 56)>
<Quantity([[26.64758465 26.64656536 26.64667857 ... 27.77078682 27.77467073
  27.77756609]
 [26.48498586 26.48409899 26.48432005 ... 27.76600562 27.76941237
  27.76895082]
 [26.25518807 26.25449663 26.25488776 ... 27.7603578  27.76478937
  27.76924889]
 ...
 [25.31136543 25.31674362 25.42998838 ... 27.77414777 27.76859076
          nan]
 [25.30657435 25.30409728 25.29751108 ... 27.78551733 27.77778253
          nan]
 [25.43166004 25.43155161 25.43251587 ... 27.7946225  27.79664872
          nan]], 'kilogram / meter ** 3')>
Dimensions without coordinates: n_prof, n_levels
Attributes:
    standard_name:  sea_water_sigma_t
[ ]:

Using the accessor to simplify the workflow#

Common case#

gsw-xarray adds ths gsw accessor to datasets. This accessor makes it easy to run the gsw functions on variables from a dataset.

A first solution is to use the accessor exactly as when using gsw:

[14]:
ds.gsw.SA_from_SP(SP=ds.psal_adjusted, p=ds.pres_adjusted, lon=ds.longitude, lat=ds.latitude)
[14]:
<xarray.DataArray 'SA' (n_prof: 130, n_levels: 56)>
array([[35.81755188, 35.81657471, 35.81659823, ..., 35.08564178,
        35.08162835, 35.0796239 ],
       [35.80757275, 35.8065829 , 35.80559104, ..., 35.09481071,
        35.09079742, 35.08778979],
       [36.03182228, 36.03082693, 36.0308492 , ..., 35.10502264,
        35.1010208 , 35.10101815],
       ...,
       [35.79041046, 35.77693492, 35.76322478, ..., 35.20835127,
        35.19176101,         nan],
       [35.78753964, 35.78651775, 35.77824776, ..., 35.25467841,
        35.22770745,         nan],
       [35.86339942, 35.86237223, 35.85926688, ..., 35.27570263,
        35.26426658,         nan]])
Dimensions without coordinates: n_prof, n_levels
Attributes:
    standard_name:  sea_water_absolute_salinity
    units:          g/kg

This is however not very useful… A better option is to simply give the name of the variables from the dataset:

[15]:
ds.gsw.SA_from_SP(SP='psal_adjusted', p='pres_adjusted', lon='longitude', lat='latitude')
[15]:
<xarray.DataArray 'SA' (n_prof: 130, n_levels: 56)>
array([[35.81755188, 35.81657471, 35.81659823, ..., 35.08564178,
        35.08162835, 35.0796239 ],
       [35.80757275, 35.8065829 , 35.80559104, ..., 35.09481071,
        35.09079742, 35.08778979],
       [36.03182228, 36.03082693, 36.0308492 , ..., 35.10502264,
        35.1010208 , 35.10101815],
       ...,
       [35.79041046, 35.77693492, 35.76322478, ..., 35.20835127,
        35.19176101,         nan],
       [35.78753964, 35.78651775, 35.77824776, ..., 35.25467841,
        35.22770745,         nan],
       [35.86339942, 35.86237223, 35.85926688, ..., 35.27570263,
        35.26426658,         nan]])
Dimensions without coordinates: n_prof, n_levels
Attributes:
    standard_name:  sea_water_absolute_salinity
    units:          g/kg

It is even possible to go 1 step further and rely on the usage of standard name! In this case, you don’t need to provide any argument for the variables with the proper standard name.

With this method, it is way faster to compute the density:

[16]:
# WITHOUT any detection
SA = gsw.SA_from_SP(SP=ds.psal_adjusted, p=ds.pres_adjusted, lon=ds.longitude, lat=ds.latitude)
CT = gsw.CT_from_t(SA=SA, t=ds.temp_adjusted, p=ds.pres_adjusted)
sigma0 = gsw.sigma0(SA=SA, CT=CT)

# WITH autodetection
ds = ds.merge(ds.gsw.SA_from_SP())
ds = ds.merge(ds.gsw.CT_from_t())
ds = ds.merge(ds.gsw.sigma0())

You can also use brackets to get either 1 or multiple variables computed:

[17]:
ds.gsw['sigma0'] # Returns a DataArray
ds.gsw[['sigma0']] # Returns a Dataset
ds.gsw[['sigma0', 'alpha', 'beta', 'sigma1', 'rho']] # With multiple outputs
sigma0
sigma0
sigma0
alpha
beta
sigma1
rho
[17]:
<xarray.Dataset>
Dimensions:  (n_prof: 130, n_levels: 56)
Dimensions without coordinates: n_prof, n_levels
Data variables:
    sigma0   (n_prof, n_levels) float64 26.65 26.65 26.65 ... 27.79 27.8 nan
    alpha    (n_prof, n_levels) float64 0.0002081 0.0002082 ... 0.0001458 nan
    beta     (n_prof, n_levels) float64 0.0007438 0.0007437 ... 0.0007505 nan
    sigma1   (n_prof, n_levels) float64 31.01 31.01 31.01 ... 32.37 32.38 nan
    rho      (n_prof, n_levels) float64 1.027e+03 1.027e+03 ... 1.035e+03 nan

Of course any kind of mixture between all the solutions is possible:

[18]:
# Give a value for SP
# Take p from dataset
# Automatically get lon and lat based on standard names
ds.gsw.SA_from_SP(SP=35, p='pres_adjusted')
[18]:
<xarray.DataArray 'SA' (n_prof: 130, n_levels: 56)>
array([[35.16540604, 35.16543255, 35.16545565, ..., 35.16805541,
        35.16807052, 35.16807842],
       [35.16541107, 35.16542513, 35.16544852, ..., 35.16805194,
        35.16806717, 35.1680723 ],
       [35.1654079 , 35.16541651, 35.16543824, ..., 35.16804504,
        35.16806022, 35.16806909],
       ...,
       [35.16533263, 35.16534798, 35.16535502, ..., 35.16820029,
        35.16823002,         nan],
       [35.16532498, 35.16533771, 35.16534623, ..., 35.16817466,
        35.16820349,         nan],
       [35.1653088 , 35.16532382, 35.16533425, ..., 35.16819766,
        35.16823758,         nan]])
Dimensions without coordinates: n_prof, n_levels
Attributes:
    standard_name:  sea_water_absolute_salinity
    units:          g/kg

And it is also possible to use automatic discovery of argument with pint datasets:

[19]:
ds_pint.gsw.SA_from_SP()
[19]:
<xarray.DataArray 'SA' (n_prof: 130, n_levels: 56)>
<Quantity([[35.81755188 35.81657471 35.81659823 ... 35.08564178 35.08162835
  35.0796239 ]
 [35.80757275 35.8065829  35.80559104 ... 35.09481071 35.09079742
  35.08778979]
 [36.03182228 36.03082693 36.0308492  ... 35.10502264 35.1010208
  35.10101815]
 ...
 [35.79041046 35.77693492 35.76322478 ... 35.20835127 35.19176101
          nan]
 [35.78753964 35.78651775 35.77824776 ... 35.25467841 35.22770745
          nan]
 [35.86339942 35.86237223 35.85926688 ... 35.27570263 35.26426658
          nan]], 'gram / kilogram')>
Dimensions without coordinates: n_prof, n_levels
Attributes:
    standard_name:  sea_water_absolute_salinity
[ ]:

Case with some argument without standard name#

Some functions have argument without any standard name. In this case, it is possible to refer to these arguments using gsw-xarray options.

Let’s take the function gsw.SP_salinometer that has 2 arguments: Rt without standard name, and t the in situ temperature.

A 1st option is to explicitely provide a value or the name from the dataset (we will create some fake data for the purpose of this example):

[20]:
ds['salinometer_Rt'] = 35
[21]:
ds.gsw.SP_salinometer(Rt='salinometer_Rt')
[21]:
<xarray.DataArray 'SP' (n_prof: 130, n_levels: 56)>
array([[14852.44767448, 14852.37691945, 14852.34156055, ...,
        15304.61363721, 15308.13947294, 15310.42597351],
       [14827.44354407, 14827.40896543, 14827.51267175, ...,
        15298.31168787, 15301.51050946, 15302.44067949],
       [14768.33013327, 14768.33013327, 14768.33013327, ...,
        15291.27112184, 15294.91511554, 15297.07554699],
       ...,
       [14673.397394  , 14675.12423965, 14689.35099574, ...,
        15258.27973762, 15261.63122146,            nan],
       [14673.09995293, 14672.86208721, 14672.77288815, ...,
        15246.89048002, 15252.84600545,            nan],
       [14680.80248747, 14680.83244267, 14681.16223826, ...,
        15243.57738327, 15248.32719727,            nan]])
Dimensions without coordinates: n_prof, n_levels
Attributes:
    standard_name:    sea_water_practical_salinity
    units:            1
    reference_scale:  PSS-78

A 2nd solution is to use gsw-xarray option with function set_non_cf_name. This way if you need to compute multiple times functions that use these arguments without a standard name, you only need to provide once the mapping.

[22]:
help(gsw.set_non_cf_name)
Help on class set_non_cf_name in module gsw_xarray._options:

class set_non_cf_name(set_options)
 |  set_non_cf_name(**kwargs)
 |
 |  Set `non_cf_name` options for gsw_xarray in a controlled context.
 |
 |  Parameters
 |  ----------
 |  Provide the name in dataset of arguments that don't have a standard name
 |  e.g. entropy='entropy_name_in_ds', SA_seaice='salinity_of_sea_ice_name_in_ds'
 |
 |  Using `set_non_cf_name` is equivalent to using `set_options`
 |  with argument 'set_non_cf_name', but is provided as a shorter method.
 |
 |  You can use `set_non_cf_name` either as a context manager (using `with`)
 |  or to set global options
 |
 |  Method resolution order:
 |      set_non_cf_name
 |      set_options
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __init__(self, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from set_options:
 |
 |  __enter__(self)
 |
 |  __exit__(self, type, value, traceback)
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from set_options:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)

Using a context manager:

[23]:
with gsw.set_non_cf_name(Rt='salinometer_Rt'):
    ds.gsw.SP_salinometer()

Or globally:

[24]:
gsw.set_non_cf_name(Rt='salinometer_Rt')
ds.gsw.SP_salinometer()
[24]:
<xarray.DataArray 'SP' (n_prof: 130, n_levels: 56)>
array([[14852.44767448, 14852.37691945, 14852.34156055, ...,
        15304.61363721, 15308.13947294, 15310.42597351],
       [14827.44354407, 14827.40896543, 14827.51267175, ...,
        15298.31168787, 15301.51050946, 15302.44067949],
       [14768.33013327, 14768.33013327, 14768.33013327, ...,
        15291.27112184, 15294.91511554, 15297.07554699],
       ...,
       [14673.397394  , 14675.12423965, 14689.35099574, ...,
        15258.27973762, 15261.63122146,            nan],
       [14673.09995293, 14672.86208721, 14672.77288815, ...,
        15246.89048002, 15252.84600545,            nan],
       [14680.80248747, 14680.83244267, 14681.16223826, ...,
        15243.57738327, 15248.32719727,            nan]])
Dimensions without coordinates: n_prof, n_levels
Attributes:
    standard_name:    sea_water_practical_salinity
    units:            1
    reference_scale:  PSS-78
[ ]:

Note on performance#

It is still very slightly faster to provide the arguments than rely on autodetect, however for large data sets this difference should be negligible compared to the internal computation time of the gsw functions.

[25]:
%%timeit
gsw.SA_from_SP(SP=ds.psal_adjusted, p=ds.pres_adjusted, lon=ds.longitude, lat=ds.latitude)
2.03 ms ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
[26]:
%%timeit
ds.gsw.SA_from_SP(SP='psal_adjusted', p='pres_adjusted', lon='longitude', lat='latitude')
2.13 ms ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
[27]:
%%timeit
ds.gsw.SA_from_SP()
2.14 ms ± 8.07 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Compare with upstream gsw

[28]:
import gsw as gsw_upstream
[29]:
%%timeit
gsw_upstream.SA_from_SP(SP=ds.psal_adjusted, p=ds.pres_adjusted, lon=ds.longitude, lat=ds.latitude)
1.74 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)