Example with ARGO data#
We first need to download some ARGO data for example.
[1]:
import requests
url = 'https://www.ncei.noaa.gov/data/oceans/argo/gadr/data/coriolis/69022/nodc_69022_prof.nc'
r = requests.get(url, allow_redirects=True)
with open('ARGO_example.nc', 'wb') as f:
f.write(r.content)
[2]:
import xarray as xr
import cf_xarray
import gsw_xarray as gsw
[3]:
ds = xr.open_dataset('ARGO_example.nc')
[4]:
ds
[4]:
<xarray.Dataset> Dimensions: (n_prof: 130, n_param: 3, n_levels: 56, n_calib: 1, n_history: 0) Dimensions without coordinates: n_prof, n_param, n_levels, n_calib, n_history Data variables: (12/65) data_type object ... format_version object ... handbook_version object ... reference_date_time object ... date_creation object ... date_update object ... ... ... history_parameter (n_history, n_prof) object ... history_start_pres (n_history, n_prof) float32 ... history_stop_pres (n_history, n_prof) float32 ... history_previous_value (n_history, n_prof) float32 ... history_qctest (n_history, n_prof) object ... crs int32 ... Attributes: (12/49) title: Argo float vertical profile institution: FR GDAC source: Argo float history: 2019-10-20T09:53:12Z boyer convAGDAC.f90... references: https://www.nodc.noaa.gov/argo/ user_manual_version: 3.1 ... ... time_coverage_end: 2005-09-19T02:13:46Z time_coverage_duration: point time_coverage_resolution: point gadr_ConventionVersion: GADR-3.0 gadr_program: convAGDAC.f90 gadr_programVersion: 1.2
We can rely on cf-xarray to see what variables have standard names in our dataset:
[5]:
ds.cf
[5]:
Coordinates:
- CF Axes: X, Y, Z, T: n/a
- CF Coordinates: longitude, latitude, vertical, time: n/a
- Cell Measures: area, volume: n/a
- Standard Names: n/a
- Bounds: n/a
Data Variables:
- Cell Measures: area, volume: n/a
- Standard Names: latitude: ['latitude']
longitude: ['longitude']
sea_water_pressure: ['pres', 'pres_adjusted']
sea_water_salinity: ['psal', 'psal_adjusted']
sea_water_temperature: ['temp', 'temp_adjusted']
time: ['juld']
- Bounds: n/a
The dataset contains multiple time the same variable (e.g. ‘pres_adjusted’ and ‘pres’ both have the standard name ‘sea_water_pressure’). For the accessor to work, only 1 variable or each standard name must be present, explicitely stated when calling the function, or the gsw option set_cf_name_preference
must be set. For this example we will only retain the adjusted variables. We set the global option, but we could also use context, i.e.
[6]:
with gsw.set_cf_name_preference(
sea_water_pressure="pres_adjusted",
sea_water_practical_salinity="psal_adjusted",
sea_water_temperature="temp_adjusted"
):
# do the computation
pass
gsw.set_cf_name_preference(
sea_water_pressure="pres_adjusted",
sea_water_practical_salinity="psal_adjusted",
sea_water_temperature="temp_adjusted"
)
# We can check the options we have set:
gsw.get_options()
[6]:
{'non_cf_name': {},
'cf_name_preference': {'sea_water_pressure': 'pres_adjusted',
'sea_water_practical_salinity': 'psal_adjusted',
'sea_water_temperature': 'temp_adjusted'}}
In the following sections we will demonstrate each features of gsw-xarray. We will focus on computing the potential density anomaly.
Basic usage as drop in replacement of gsw#
[7]:
help(gsw.sigma0)
Help on function sigma0 in module gsw._wrapped_ufuncs:
sigma0(SA, CT)
Calculates potential density anomaly with reference pressure of 0 dbar,
this being this particular potential density minus 1000 kg/m^3. This
function has inputs of Absolute Salinity and Conservative Temperature.
This function uses the computationally-efficient expression for
specific volume in terms of SA, CT and p (Roquet et al., 2015).
Parameters
----------
SA : array-like
Absolute Salinity, g/kg
CT : array-like
Conservative Temperature (ITS-90), degrees C
Returns
-------
sigma0 : array-like, kg/m^3
potential density anomaly with
respect to a reference pressure of 0 dbar,
that is, this potential density - 1000 kg/m^3.
We need Absolute Salinity and Conservative Temperature, so 1st we need to do some conversions:
[8]:
SA = gsw.SA_from_SP(SP=ds.psal_adjusted, p=ds.pres_adjusted, lon=ds.longitude, lat=ds.latitude)
CT = gsw.CT_from_t(SA=SA, t=ds.temp_adjusted, p=ds.pres_adjusted)
sigma0 = gsw.sigma0(SA=SA, CT=CT)
sigma0
[8]:
<xarray.DataArray 'sigma0' (n_prof: 130, n_levels: 56)> array([[26.64758465, 26.64656536, 26.64667857, ..., 27.77078682, 27.77467073, 27.77756609], [26.48498586, 26.48409899, 26.48432005, ..., 27.76600562, 27.76941237, 27.76895082], [26.25518807, 26.25449663, 26.25488776, ..., 27.7603578 , 27.76478937, 27.76924889], ..., [25.31136543, 25.31674362, 25.42998838, ..., 27.77414777, 27.76859076, nan], [25.30657435, 25.30409728, 25.29751108, ..., 27.78551733, 27.77778253, nan], [25.43166004, 25.43155161, 25.43251587, ..., 27.7946225 , 27.79664872, nan]]) Dimensions without coordinates: n_prof, n_levels Attributes: standard_name: sea_water_sigma_t units: kg/m^3
Using Pint and pint-xarray to handle units#
[9]:
import pint_xarray
import cf_xarray.units
/home/docs/checkouts/readthedocs.org/user_builds/gsw-xarray/envs/latest/lib/python3.10/site-packages/cf_xarray/units.py:112: UserWarning: Import(s) unavailable to set up matplotlib support...skipping this portion of the setup.
warnings.warn(
[10]:
ds_pint = ds.pint.quantify()
ds_pint
[10]:
<xarray.Dataset> Dimensions: (n_prof: 130, n_param: 3, n_levels: 56, n_calib: 1, n_history: 0) Dimensions without coordinates: n_prof, n_param, n_levels, n_calib, n_history Data variables: (12/65) data_type object b'Argo profile ' format_version object b'3.1 ' handbook_version object b'1.2 ' reference_date_time object b'19500101000000' date_creation object b'20040428194950' date_update object b'20190626110153' ... ... history_parameter (n_history, n_prof) object history_start_pres (n_history, n_prof) float32 [dbar] history_stop_pres (n_history, n_prof) float32 [dbar] history_previous_value (n_history, n_prof) float32 history_qctest (n_history, n_prof) object crs int32 -2147483647 Attributes: (12/49) title: Argo float vertical profile institution: FR GDAC source: Argo float history: 2019-10-20T09:53:12Z boyer convAGDAC.f90... references: https://www.nodc.noaa.gov/argo/ user_manual_version: 3.1 ... ... time_coverage_end: 2005-09-19T02:13:46Z time_coverage_duration: point time_coverage_resolution: point gadr_ConventionVersion: GADR-3.0 gadr_program: convAGDAC.f90 gadr_programVersion: 1.2
We compute again sigma0, using the ds_pint
dataset, i.e. variables have a physical dimension
[11]:
SA = gsw.SA_from_SP(SP=ds_pint.psal_adjusted, p=ds_pint.pres_adjusted, lon=ds_pint.longitude, lat=ds_pint.latitude)
CT = gsw.CT_from_t(SA=SA, t=ds_pint.temp_adjusted, p=ds_pint.pres_adjusted)
sigma0 = gsw.sigma0(SA=SA, CT=CT)
sigma0
[11]:
<xarray.DataArray 'sigma0' (n_prof: 130, n_levels: 56)> <Quantity([[26.64758465 26.64656536 26.64667857 ... 27.77078682 27.77467073 27.77756609] [26.48498586 26.48409899 26.48432005 ... 27.76600562 27.76941237 27.76895082] [26.25518807 26.25449663 26.25488776 ... 27.7603578 27.76478937 27.76924889] ... [25.31136543 25.31674362 25.42998838 ... 27.77414777 27.76859076 nan] [25.30657435 25.30409728 25.29751108 ... 27.78551733 27.77778253 nan] [25.43166004 25.43155161 25.43251587 ... 27.7946225 27.79664872 nan]], 'kilogram / meter ** 3')> Dimensions without coordinates: n_prof, n_levels Attributes: standard_name: sea_water_sigma_t
gsw-xarray converts the units (if necessary) when using pint quantities:
[12]:
# start to convert the pressure into Pascal
pressure_in_pascal = ds_pint.pres_adjusted.pint.to('Pa')
pressure_in_pascal
[12]:
<xarray.DataArray 'pres_adjusted' (n_prof: 130, n_levels: 56)> <Quantity([[7.0000e+03 5.8000e+04 1.5700e+05 ... 1.4253e+07 1.4752e+07 1.5013e+07] [2.7000e+04 5.5000e+04 1.5200e+05 ... 1.4249e+07 1.4751e+07 1.4920e+07] [3.6000e+04 5.4000e+04 1.5100e+05 ... 1.4249e+07 1.4749e+07 1.5041e+07] ... [4.5000e+04 1.2900e+05 2.2700e+05 ... 1.4726e+07 1.5011e+07 nan] [4.8000e+04 1.3100e+05 2.3100e+05 ... 1.4725e+07 1.4979e+07 nan] [4.5000e+04 1.3200e+05 2.2900e+05 ... 1.4726e+07 1.5021e+07 nan]], 'pascal')> Dimensions without coordinates: n_prof, n_levels Attributes: long_name: Sea water pressure, equals 0 at sea-level standard_name: sea_water_pressure valid_min: 0.0 valid_max: 12000.0 C_format: %7.1f FORTRAN_format: F7.1 resolution: 1.0 axis: Z
Compute again density, using the pressure in Pascal. No worries as the conversion to dbar is automatic!
[13]:
SA = gsw.SA_from_SP(SP=ds_pint.psal_adjusted, p=pressure_in_pascal, lon=ds_pint.longitude, lat=ds_pint.latitude)
CT = gsw.CT_from_t(SA=SA, t=ds_pint.temp_adjusted, p=ds_pint.pres_adjusted)
sigma0 = gsw.sigma0(SA=SA, CT=CT)
sigma0
[13]:
<xarray.DataArray 'sigma0' (n_prof: 130, n_levels: 56)> <Quantity([[26.64758465 26.64656536 26.64667857 ... 27.77078682 27.77467073 27.77756609] [26.48498586 26.48409899 26.48432005 ... 27.76600562 27.76941237 27.76895082] [26.25518807 26.25449663 26.25488776 ... 27.7603578 27.76478937 27.76924889] ... [25.31136543 25.31674362 25.42998838 ... 27.77414777 27.76859076 nan] [25.30657435 25.30409728 25.29751108 ... 27.78551733 27.77778253 nan] [25.43166004 25.43155161 25.43251587 ... 27.7946225 27.79664872 nan]], 'kilogram / meter ** 3')> Dimensions without coordinates: n_prof, n_levels Attributes: standard_name: sea_water_sigma_t
[ ]:
Using the accessor to simplify the workflow#
Common case#
gsw-xarray adds ths gsw
accessor to datasets. This accessor makes it easy to run the gsw functions on variables from a dataset.
A first solution is to use the accessor exactly as when using gsw:
[14]:
ds.gsw.SA_from_SP(SP=ds.psal_adjusted, p=ds.pres_adjusted, lon=ds.longitude, lat=ds.latitude)
[14]:
<xarray.DataArray 'SA' (n_prof: 130, n_levels: 56)> array([[35.81755188, 35.81657471, 35.81659823, ..., 35.08564178, 35.08162835, 35.0796239 ], [35.80757275, 35.8065829 , 35.80559104, ..., 35.09481071, 35.09079742, 35.08778979], [36.03182228, 36.03082693, 36.0308492 , ..., 35.10502264, 35.1010208 , 35.10101815], ..., [35.79041046, 35.77693492, 35.76322478, ..., 35.20835127, 35.19176101, nan], [35.78753964, 35.78651775, 35.77824776, ..., 35.25467841, 35.22770745, nan], [35.86339942, 35.86237223, 35.85926688, ..., 35.27570263, 35.26426658, nan]]) Dimensions without coordinates: n_prof, n_levels Attributes: standard_name: sea_water_absolute_salinity units: g/kg
This is however not very useful… A better option is to simply give the name of the variables from the dataset:
[15]:
ds.gsw.SA_from_SP(SP='psal_adjusted', p='pres_adjusted', lon='longitude', lat='latitude')
[15]:
<xarray.DataArray 'SA' (n_prof: 130, n_levels: 56)> array([[35.81755188, 35.81657471, 35.81659823, ..., 35.08564178, 35.08162835, 35.0796239 ], [35.80757275, 35.8065829 , 35.80559104, ..., 35.09481071, 35.09079742, 35.08778979], [36.03182228, 36.03082693, 36.0308492 , ..., 35.10502264, 35.1010208 , 35.10101815], ..., [35.79041046, 35.77693492, 35.76322478, ..., 35.20835127, 35.19176101, nan], [35.78753964, 35.78651775, 35.77824776, ..., 35.25467841, 35.22770745, nan], [35.86339942, 35.86237223, 35.85926688, ..., 35.27570263, 35.26426658, nan]]) Dimensions without coordinates: n_prof, n_levels Attributes: standard_name: sea_water_absolute_salinity units: g/kg
It is even possible to go 1 step further and rely on the usage of standard name! In this case, you don’t need to provide any argument for the variables with the proper standard name.
With this method, it is way faster to compute the density:
[16]:
# WITHOUT any detection
SA = gsw.SA_from_SP(SP=ds.psal_adjusted, p=ds.pres_adjusted, lon=ds.longitude, lat=ds.latitude)
CT = gsw.CT_from_t(SA=SA, t=ds.temp_adjusted, p=ds.pres_adjusted)
sigma0 = gsw.sigma0(SA=SA, CT=CT)
# WITH autodetection
ds = ds.merge(ds.gsw.SA_from_SP())
ds = ds.merge(ds.gsw.CT_from_t())
ds = ds.merge(ds.gsw.sigma0())
You can also use brackets to get either 1 or multiple variables computed:
[17]:
ds.gsw['sigma0'] # Returns a DataArray
ds.gsw[['sigma0']] # Returns a Dataset
ds.gsw[['sigma0', 'alpha', 'beta', 'sigma1', 'rho']] # With multiple outputs
sigma0
sigma0
sigma0
alpha
beta
sigma1
rho
[17]:
<xarray.Dataset> Dimensions: (n_prof: 130, n_levels: 56) Dimensions without coordinates: n_prof, n_levels Data variables: sigma0 (n_prof, n_levels) float64 26.65 26.65 26.65 ... 27.79 27.8 nan alpha (n_prof, n_levels) float64 0.0002081 0.0002082 ... 0.0001458 nan beta (n_prof, n_levels) float64 0.0007438 0.0007437 ... 0.0007505 nan sigma1 (n_prof, n_levels) float64 31.01 31.01 31.01 ... 32.37 32.38 nan rho (n_prof, n_levels) float64 1.027e+03 1.027e+03 ... 1.035e+03 nan
Of course any kind of mixture between all the solutions is possible:
[18]:
# Give a value for SP
# Take p from dataset
# Automatically get lon and lat based on standard names
ds.gsw.SA_from_SP(SP=35, p='pres_adjusted')
[18]:
<xarray.DataArray 'SA' (n_prof: 130, n_levels: 56)> array([[35.16540604, 35.16543255, 35.16545565, ..., 35.16805541, 35.16807052, 35.16807842], [35.16541107, 35.16542513, 35.16544852, ..., 35.16805194, 35.16806717, 35.1680723 ], [35.1654079 , 35.16541651, 35.16543824, ..., 35.16804504, 35.16806022, 35.16806909], ..., [35.16533263, 35.16534798, 35.16535502, ..., 35.16820029, 35.16823002, nan], [35.16532498, 35.16533771, 35.16534623, ..., 35.16817466, 35.16820349, nan], [35.1653088 , 35.16532382, 35.16533425, ..., 35.16819766, 35.16823758, nan]]) Dimensions without coordinates: n_prof, n_levels Attributes: standard_name: sea_water_absolute_salinity units: g/kg
And it is also possible to use automatic discovery of argument with pint datasets:
[19]:
ds_pint.gsw.SA_from_SP()
[19]:
<xarray.DataArray 'SA' (n_prof: 130, n_levels: 56)> <Quantity([[35.81755188 35.81657471 35.81659823 ... 35.08564178 35.08162835 35.0796239 ] [35.80757275 35.8065829 35.80559104 ... 35.09481071 35.09079742 35.08778979] [36.03182228 36.03082693 36.0308492 ... 35.10502264 35.1010208 35.10101815] ... [35.79041046 35.77693492 35.76322478 ... 35.20835127 35.19176101 nan] [35.78753964 35.78651775 35.77824776 ... 35.25467841 35.22770745 nan] [35.86339942 35.86237223 35.85926688 ... 35.27570263 35.26426658 nan]], 'gram / kilogram')> Dimensions without coordinates: n_prof, n_levels Attributes: standard_name: sea_water_absolute_salinity
[ ]:
Case with some argument without standard name#
Some functions have argument without any standard name. In this case, it is possible to refer to these arguments using gsw-xarray options.
Let’s take the function gsw.SP_salinometer
that has 2 arguments: Rt
without standard name, and t
the in situ temperature.
A 1st option is to explicitely provide a value or the name from the dataset (we will create some fake data for the purpose of this example):
[20]:
ds['salinometer_Rt'] = 35
[21]:
ds.gsw.SP_salinometer(Rt='salinometer_Rt')
[21]:
<xarray.DataArray 'SP' (n_prof: 130, n_levels: 56)> array([[14852.44767448, 14852.37691945, 14852.34156055, ..., 15304.61363721, 15308.13947294, 15310.42597351], [14827.44354407, 14827.40896543, 14827.51267175, ..., 15298.31168787, 15301.51050946, 15302.44067949], [14768.33013327, 14768.33013327, 14768.33013327, ..., 15291.27112184, 15294.91511554, 15297.07554699], ..., [14673.397394 , 14675.12423965, 14689.35099574, ..., 15258.27973762, 15261.63122146, nan], [14673.09995293, 14672.86208721, 14672.77288815, ..., 15246.89048002, 15252.84600545, nan], [14680.80248747, 14680.83244267, 14681.16223826, ..., 15243.57738327, 15248.32719727, nan]]) Dimensions without coordinates: n_prof, n_levels Attributes: standard_name: sea_water_practical_salinity units: 1 reference_scale: PSS-78
A 2nd solution is to use gsw-xarray option with function set_non_cf_name
. This way if you need to compute multiple times functions that use these arguments without a standard name, you only need to provide once the mapping.
[22]:
help(gsw.set_non_cf_name)
Help on class set_non_cf_name in module gsw_xarray._options:
class set_non_cf_name(set_options)
| set_non_cf_name(**kwargs)
|
| Set `non_cf_name` options for gsw_xarray in a controlled context.
|
| Parameters
| ----------
| Provide the name in dataset of arguments that don't have a standard name
| e.g. entropy='entropy_name_in_ds', SA_seaice='salinity_of_sea_ice_name_in_ds'
|
| Using `set_non_cf_name` is equivalent to using `set_options`
| with argument 'set_non_cf_name', but is provided as a shorter method.
|
| You can use `set_non_cf_name` either as a context manager (using `with`)
| or to set global options
|
| Method resolution order:
| set_non_cf_name
| set_options
| builtins.object
|
| Methods defined here:
|
| __init__(self, **kwargs)
| Initialize self. See help(type(self)) for accurate signature.
|
| ----------------------------------------------------------------------
| Methods inherited from set_options:
|
| __enter__(self)
|
| __exit__(self, type, value, traceback)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from set_options:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
Using a context manager:
[23]:
with gsw.set_non_cf_name(Rt='salinometer_Rt'):
ds.gsw.SP_salinometer()
Or globally:
[24]:
gsw.set_non_cf_name(Rt='salinometer_Rt')
ds.gsw.SP_salinometer()
[24]:
<xarray.DataArray 'SP' (n_prof: 130, n_levels: 56)> array([[14852.44767448, 14852.37691945, 14852.34156055, ..., 15304.61363721, 15308.13947294, 15310.42597351], [14827.44354407, 14827.40896543, 14827.51267175, ..., 15298.31168787, 15301.51050946, 15302.44067949], [14768.33013327, 14768.33013327, 14768.33013327, ..., 15291.27112184, 15294.91511554, 15297.07554699], ..., [14673.397394 , 14675.12423965, 14689.35099574, ..., 15258.27973762, 15261.63122146, nan], [14673.09995293, 14672.86208721, 14672.77288815, ..., 15246.89048002, 15252.84600545, nan], [14680.80248747, 14680.83244267, 14681.16223826, ..., 15243.57738327, 15248.32719727, nan]]) Dimensions without coordinates: n_prof, n_levels Attributes: standard_name: sea_water_practical_salinity units: 1 reference_scale: PSS-78
[ ]:
Note on performance#
It is still very slightly faster to provide the arguments than rely on autodetect, however for large data sets this difference should be negligible compared to the internal computation time of the gsw functions.
[25]:
%%timeit
gsw.SA_from_SP(SP=ds.psal_adjusted, p=ds.pres_adjusted, lon=ds.longitude, lat=ds.latitude)
2.03 ms ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
[26]:
%%timeit
ds.gsw.SA_from_SP(SP='psal_adjusted', p='pres_adjusted', lon='longitude', lat='latitude')
2.13 ms ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
[27]:
%%timeit
ds.gsw.SA_from_SP()
2.14 ms ± 8.07 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Compare with upstream gsw
[28]:
import gsw as gsw_upstream
[29]:
%%timeit
gsw_upstream.SA_from_SP(SP=ds.psal_adjusted, p=ds.pres_adjusted, lon=ds.longitude, lat=ds.latitude)
1.74 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)