How to write a single file according to the data reference syntax (DRS)¶
Here we document how to write a single file according to the input4MIPs DRS. This is an optional step in preparing files for submission to the input4MIPs collection and, ultimately, publication in the ESGF index. If you don't do it, the ESGF publishers will. However, doing it yourself means that you understand how your data is treated and have a copy of the file which is ultimately published to the ESGF. This document assumes that you already have a file that passes validation. If you don't have that yet, check out "How to validate a single file".
Note: Before you write your files according to the DRS, there are a few other steps you need to complete. See the instructions for data producers in the input4MIPs CVs repository. Don't forget to do those steps at some point too.
import tempfile
from pathlib import Path
import iris
import xarray as xr
starting_file = Path(
"fixed_CH4-em-biomassburning_input4MIPs_emissions_CMIP_CR-CMIP-0-2-0_gn_200001-201012.nc"
)
Starting point¶
We start from an existing file that passes validation. If you don't have that yet, check out "How to validate a single file".
start = xr.open_dataset(starting_file)
start.data_vars
Data variables:
CH4 (time, lat, lon) float64 152kB ...
time_bnds (time, bnds) datetime64[ns] 2kB ...
lat_bnds (lat, bnds) float64 192B ...
lon_bnds (lon, bnds) float64 192B ...
start_iris = iris.load(starting_file)
start_iris
| Biomass Burning Ch4 Flux (kg m-2 s-1) | time | latitude | longitude |
|---|---|---|---|
| Shape | 132 | 12 | 12 |
| Dimension coordinates | |||
| time | x | - | - |
| latitude | - | x | - |
| longitude | - | - | x |
| Cell methods | |||
| 0 | time: mean | ||
| Attributes | |||
| Conventions | 'CF-1.7' | ||
| activity_id | 'input4MIPs' | ||
| comment | 'Demo' | ||
| contact | 'zebedee.nicholls@climate-resource.com;malte.meinshausen@climate-resour ...' | ||
| creation_date | '2024-07-25T18:23:47Z' | ||
| data_structure | 'grid' | ||
| dataset_category | 'emissions' | ||
| external_variables | 'gridcellarea sources' | ||
| frequency | 'mon' | ||
| further_info_url | 'www.tbd.invalid' | ||
| grid_label | 'gn' | ||
| institution | 'Climate Resource' | ||
| institution_id | 'DRES' | ||
| license | 'The input4MIPs data linked to this entry is licensed under a Creative Commons ...' | ||
| mip_era | 'CMIP6Plus' | ||
| nominal_resolution | '25 km' | ||
| realm | 'atmos' | ||
| references | 'Reference to great paper' | ||
| source | 'Climate Resource demo' | ||
| source_id | 'CR-CMIP-0-2-0' | ||
| source_version | '0.2.0' | ||
| target_mip | 'CMIP' | ||
| title | 'Title here' | ||
| tracking_id | 'hdl:21.14100/e574dafc-aec5-4f51-94fd-af400bf5f36e' | ||
| variable_id | 'CH4' | ||
The CVs¶
A note before we continue.
In the commands below, you will notice an option, --cv-source.
This points to the source of the controlled vocabularies (CVs).
You can pick different sources for the CVs.
For example, you can load the CVs from local files,
or from the input4MIPs CVs GitHub
(or any other web source).
In this example, we're going to use a specific tagged version of the input4MIPs CVs GitHub (v6.6.0) so that nothing breaks, even if we make further changes to the CVs. For your own work, you will probably want to use one of:
- local files, e.g. --cv-source path/to/local/files
- the branch where you have added your information to the CVs, e.g. --cv-source https://raw.githubusercontent.com/PCMDI/input4MIPs_CVs/branch_name/CVs/
- a tagged version of the input4MIPs CVs GitHub, like we do below, e.g. --cv-source gh:tag-id
- the main branch of the input4MIPs CVs GitHub, --cv-source gh:main
- a specific commit in the input4MIPs CVs GitHub, e.g. --cv-source gh:commit-hash
This is definitely not the best documented feature of the library, so if anything is unclear, please raise an issue.
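To make the options above a little more concrete, here is a minimal sketch that assembles --cv-source values in Python. The helper name cv_source_for is hypothetical and is not part of input4mips-validation; the gh: shorthand and the raw GitHub URL pattern are simply copied from the list above.

```python
# URL pattern copied from the list above; the branch name is the part you replace.
RAW_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/PCMDI/input4MIPs_CVs/{branch}/CVs/"
)


def cv_source_for(kind: str, value: str) -> str:
    """Return a value for --cv-source (hypothetical helper, illustration only)."""
    if kind == "local":
        # A path to local CV files is passed through unchanged
        return value
    if kind == "branch-url":
        # Full URL to the CVs on a given branch
        return RAW_URL_TEMPLATE.format(branch=value)
    if kind == "gh":
        # Shorthand for a tag, the main branch or a commit hash
        return f"gh:{value}"
    raise ValueError(f"Unknown kind of CV source: {kind!r}")


print(cv_source_for("gh", "v6.6.0"))
print(cv_source_for("branch-url", "branch_name"))
```

The returned string is what you would pass after --cv-source on the command line.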
Write the file in the DRS¶
Writing the file in the DRS is very similar to validating a file.
We simply add the --write-in-drs option to our validate-file
call, then the file is automatically written in the DRS for us.
Below, we use our command-line interface. There is also a Python API, in case you want to do this directly from Python. Note that the logging is set up slightly differently in the Python API, so the messages shown by default are different. The behaviour is the same, and you can always adjust the logging to suit your own preferences.
TMP_DIR = Path(tempfile.mkdtemp())
tree_to_write_in = TMP_DIR / "how-to-write-a-single-file-in-drs"
!input4mips-validation \
validate-file \
--cv-source "gh:v6.6.0" \
{starting_file} \
--write-in-drs {tree_to_write_in}
4666 - 134897345468224 - 2025-10-01T14:42:03.507941+0000 - INFO_FILE - input4mips_validation.validation.file:file.py:105 - Creating validation results for file: docs/how-to-guides/fixed_CH4-em-biomassburning_input4MIPs_emissions_CMIP_CR-CMIP-0-2-0_gn_200001-201012.nc
4666 - 134897345468224 - 2025-10-01T14:42:03.517421+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Load controlled vocabularies to use during validation ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.578841+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Open data with `xr.open_dataset` ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.592973+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Load data with `iris.load` ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.599950+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Load data with `iris.load_cube` ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.600064+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.file:file.py:150 - Using the cf-checker to check docs/how-to-guides/fixed_CH4-em-biomassburning_input4MIPs_emissions_CMIP_CR-CMIP-0-2-0_gn_200001-201012.nc
4666 - 134897345468224 - 2025-10-01T14:42:03.873192+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Check data with cf-checker ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.911957+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'creation_date' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.912237+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'tracking_id' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.917134+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'frequency' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.917260+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'variable_id' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.917325+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'activity_id' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.917363+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'comment' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.917442+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'external_variables' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.917510+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'contact' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.917548+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'source_version' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.917633+0000 - INFO_FILE - input4mips_validation.validation.file:file.py:190 - Created validation results for file: docs/how-to-guides/fixed_CH4-em-biomassburning_input4MIPs_emissions_CMIP_CR-CMIP-0-2-0_gn_200001-201012.nc
4666 - 134897345468224 - 2025-10-01T14:42:03.918491+0000 - SUCCESS - input4mips_validation.cli:__init__.py:196 - File passed validation: docs/how-to-guides/fixed_CH4-em-biomassburning_input4MIPs_emissions_CMIP_CR-CMIP-0-2-0_gn_200001-201012.nc
4666 - 134897345468224 - 2025-10-01T14:42:03.955898+0000 - INFO - input4mips_validation.cli:__init__.py:229 - Re-writing docs/how-to-guides/fixed_CH4-em-biomassburning_input4MIPs_emissions_CMIP_CR-CMIP-0-2-0_gn_200001-201012.nc to /tmp/tmpx9yw4jgl/how-to-write-a-single-file-in-drs/input4MIPs/CMIP6Plus/CMIP/DRES/CR-CMIP-0-2-0/atmos/mon/CH4/gn/v20251001/CH4_input4MIPs_emissions_CMIP_CR-CMIP-0-2-0_gn_200001-201012.nc
4666 - 134897345468224 - 2025-10-01T14:42:03.957719+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'creation_date' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.957827+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'tracking_id' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.962372+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'frequency' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.962485+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'variable_id' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.962537+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'activity_id' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.962587+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'comment' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.962668+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'external_variables' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.962717+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'contact' attribute ran without error
4666 - 134897345468224 - 2025-10-01T14:42:03.962750+0000 - INFO_INDIVIDUAL_CHECK - input4mips_validation.validation.error_catching:error_catching.py:208 - Validate the 'source_version' attribute ran without error
/home/docs/checkouts/readthedocs.org/user_builds/input4mips-validation/conda/latest/lib/python3.9/site-packages/iris/fileformats/_nc_load_rules/helpers.py:486: _WarnComboIgnoringLoad: Skipping disallowed global attribute 'coordinates': 'coordinates' is not a permitted attribute
  warnings.warn(
4666 - 134897345468224 - 2025-10-01T14:42:04.008772+0000 - SUCCESS - input4mips_validation.cli:__init__.py:242 - File written according to the DRS in /tmp/tmpx9yw4jgl/how-to-write-a-single-file-in-drs/input4MIPs/CMIP6Plus/CMIP/DRES/CR-CMIP-0-2-0/atmos/mon/CH4/gn/v20251001/CH4_input4MIPs_emissions_CMIP_CR-CMIP-0-2-0_gn_200001-201012.nc
As the log output shows, the file is re-written with a full file path that matches the DRS. This file is then ready for submission to input4MIPs.
If it is of interest, we show some of the file's attributes below.
# Find the written file: we expect exactly one netCDF file in the tree
written_file = list(tree_to_write_in.rglob("*.nc"))
if len(written_file) != 1:
    msg = f"Found {written_file=}"
    raise AssertionError(msg)
written_file = written_file[0]
print(f"The file's name according to the DRS is {written_file.name}")
print(
"The file's path according to the DRS is "
f"{written_file.parent.relative_to(tree_to_write_in)}"
)
rewritten = xr.open_dataset(written_file)
print(f"The written file's tracking ID is {rewritten.attrs['tracking_id']}")
print(f"The written file's creation date is {rewritten.attrs['creation_date']}")
The file's name according to the DRS is CH4_input4MIPs_emissions_CMIP_CR-CMIP-0-2-0_gn_200001-201012.nc
The file's path according to the DRS is input4MIPs/CMIP6Plus/CMIP/DRES/CR-CMIP-0-2-0/atmos/mon/CH4/gn/v20251001
The written file's tracking ID is hdl:21.14100/d63b7e05-b888-4e55-92bf-83c088f14334
The written file's creation date is 2025-10-01T14:42:03Z
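To see how the written path relates to the file's global attributes, here is a minimal sketch that rebuilds the directory and file name shown above. The ordering of the components is an assumption inferred from this one example output, not taken from the DRS definition in the CVs, and the names drs_directory and drs_filename are hypothetical. The tool itself derives the path from the CVs, so treat this only as an aid to reading the output.

```python
from pathlib import PurePosixPath

# Global attributes copied from the example file shown earlier
attrs = {
    "activity_id": "input4MIPs",
    "mip_era": "CMIP6Plus",
    "target_mip": "CMIP",
    "institution_id": "DRES",
    "source_id": "CR-CMIP-0-2-0",
    "realm": "atmos",
    "frequency": "mon",
    "variable_id": "CH4",
    "grid_label": "gn",
    "dataset_category": "emissions",
}

# Assumption: component order inferred from the example output above
DIRECTORY_ORDER = (
    "activity_id", "mip_era", "target_mip", "institution_id", "source_id",
    "realm", "frequency", "variable_id", "grid_label",
)
FILENAME_ORDER = (
    "variable_id", "activity_id", "dataset_category", "target_mip",
    "source_id", "grid_label",
)


def drs_directory(attrs: dict, version: str) -> str:
    """Join the attribute values, then the version, into a directory."""
    parts = [attrs[key] for key in DIRECTORY_ORDER] + [version]
    return str(PurePosixPath(*parts))


def drs_filename(attrs: dict, time_range: str) -> str:
    """Join the attribute values and time range with underscores."""
    parts = [attrs[key] for key in FILENAME_ORDER] + [time_range]
    return "_".join(parts) + ".nc"


print(drs_directory(attrs, version="v20251001"))
print(drs_filename(attrs, time_range="200001-201012"))
```

In the example run, the version directory (v20251001) matches the date on which the file was re-written.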
Next steps¶
This procedure can be repeated over a number of files with a loop. We currently don't have a tool that does this for you, but are happy to receive requests for one in our issues. Having said that, if you've got this far, we assume you can write a loop in Python or bash :)
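For what it's worth, such a loop might look like the sketch below. It simply calls the same command-line interface as above via subprocess. The function names and the dry_run switch are hypothetical, and the sketch assumes that the command exits with a non-zero code when a file fails validation, which we have not verified here.

```python
import subprocess
from pathlib import Path


def build_command(file: Path, cv_source: str, tree_root: Path) -> list:
    """Build the same validate-file call as used above for one file."""
    return [
        "input4mips-validation",
        "validate-file",
        "--cv-source",
        cv_source,
        str(file),
        "--write-in-drs",
        str(tree_root),
    ]


def write_all_in_drs(files, cv_source, tree_root, dry_run=True):
    """Build (and, unless dry_run is set, run) one command per file."""
    commands = [build_command(file, cv_source, tree_root) for file in files]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so the loop stops at the first file that fails
            subprocess.run(command, check=True)
    return commands


# Dry run with made-up file names: only print what would be executed
for command in write_all_in_drs(
    [Path("file-a.nc"), Path("file-b.nc")], "gh:v6.6.0", Path("drs-tree")
):
    print(" ".join(command))
```

In practice, you would replace the made-up file names with something like sorted(Path("your-directory").glob("*.nc")) and pass dry_run=False once the printed commands look right.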
Once you have written your files, the next step is to upload them to LLNL's FTP server. Please see "How to upload to an FTP server".