BayLum: Specification of the YAML config file
Sebastian Kreutzer
Updated for BayLum package version >= 0.3.3.9000.13 (2024-09-23)
Source:vignettes/ConfigFile_Specification.Rmd
ConfigFile_Specification.Rmd
Background and scope
In earlier versions of 'BayLum'
measurement data and
settings had to be prepared using a multiple folder structure comprising
various CVS and BIN/BINX files in a very particular way. This concept
proved error-prone and left a lot of frustrated 'BayLum'
users behind, who sometimes spent hours trying to understand unclear
error messages and then realising that there was a typo in one of the
CSV files, or the folder structure was not precisely how
'BayLum'
expected it to be found. By switching to a
single-configuration file users have more options while the settings are
cleaner and less scatters over different files in numerous subfolders.
The parameter naming follows the naming convention used in the “old”
'BayLum'
CSV-files.
The purpose of this document is the specification and description of
the YAML (file ending
*.yml
) configuration file used by the function
create_DataFile()
to provide data input and settings to the
'BayLum'
modelling. The YAML file is an alternative and a
future replacement of the previous folder structure with various CSV
files required by the functions Generate_DataFile()
and
Generate_DateFile_MG()
.
Key concepts
The configuration file uses the YAML format, which uses
indention to nest different parameters. Please see the cited
documentation for details, for 'BayLum'
the following
features stick out:
- Only a single configuration file is needed
- The path of the file on your hard drive does not matter
- Measurement data can be stored wherever you like to have them, although it makes sense to pool them into one folder for the analysis
- One sample is one record in the YAML file. If data of one sample is scattered over different measurement files (e.g., BIN/BINX), one sample still has only one record.
Examples and detailed specifications
A single sample entry
A single sample entry appears as follows:
- sample: "samp1"
files:
- "/yourhardrive/yourfolder/sample_one.binx"
settings:
dose_points: null
dose_source: { value: 0.1535, error: 0.00005891 }
dose_env: { value: 2.512, error: 0.05626 }
rules:
beginSignal: 6
endSignal: 8
beginBackground: 50
endBackground: 55
beginTest: 6
endTest: 8
beginTestBackground: 50
endTestBackground: 55
inflatePercent: 0.027
nbOfLastCycleToRemove: 1
Each new record (aka sample) starts with a -
and the
indention as shown above. Furthermore:
- Indention rules need to be followed strictly
- Each record has three levels: top level, settings level, rules level and this nesting needs to be kept
- Only parameters shown in the examples are allowed, and the list of parameters need to be complete (i.e. you cannot just remove a parameter)
- Sample names need to be unique
Other than that, if you keep these simple rules in mind, you will
have an easy time preparing your 'BayLum'
analysis.
Multiple records
While the single record makes an easy case, you probably have more than one sample to be thrown into the modelling. Although the number of records is not limited, we keep it simple here; a two records entry (dots replace the entries as shown above):
# this is a coment for sample number one
- sample: "samp1"
files:
- "/yourhardrive/yourfolder/sample_one.binx"
settings:
dose_points: null
dose_source: { value: 0.1535, error: 0.00005891 }
dose_env: { value: 2.512, error: 0.05626 }
rules:
beginSignal: 6
.
.
.
nbOfLastCycleToRemove: 1
# this is a coment for sample number two
- sample: "samp2"
files: null
settings:
dose_points: null
dose_source: { value: 0.1535, error: 0.00005891 }
dose_env: { value: 2.512, error: 0.05626 }
rules:
beginSignal: 6
.
.
.
nbOfLastCycleToRemove: 1
As you can see from the example, you can also add comments to the
records, which start with #
. The two records also show
different entries for argument files
. In the first case, a
file path is given, while for record number two, files
is
set to null
. Both options are possible. In the first case,
the record specifies where the measurement data can be found. In the
second case it is assumed that an R object with the name
samp2
can be found in the global environment of your R
session.
Paramter specifcation
Top level (sample
, files
)
The top level has two parameters:
sample
This parameter specifies the name of the sample. This name must be unique and is ideally free of non-ASCII characters and white space.
files
This parameter can be null
(files
is the
only parameter that can be set to null
) or is followed by a
set of -
with the path to the measurement file given in
quotes. The number of entries under files
is not limited.
Example:
files:
- "/yourhardrive/yourfolder/sample_one_a.binx"
- "/yourhardrive/yourfolder/sample_one_b.binx"
If the entry is null
, the function
BayLum::create_DataFile()
that uses the settings from the
YAML file will assume that R objects with the name specified in
sample
are available in the global session environment. For
instance, files are imported and
treated with
Luminescence::read_BIN2R(...) |> subset(...)
or similar.
Setting files
to null
gives you all options to
pre-process your measurement data and is the recommended mode of
operation.
If files
comes with file path entries, then
BayLum::create_DataFile()
will try to import those files
using the appropriate import functions. This is very convenient,
however, except for minimal filtering (e.g., removing non-OSL and
non-IRSL curves), the measurement data remain untreated, and
BayLum::create_DataFile()
expects that all data are
complete (e.g., identical number of curves), without error and strictly
follow the SAR structure.
settings
level
The settings
level allows you to specify the dose rate
of your source used for the irradiation in Gy/s
(dose_source
) and the environmental dose rate in Gy/ka
(dose_env
). Each value needs to be provided with its
uncertainty, as shown in the example:
settings:
dose_points: null
dose_source: { value: 0.1535, error: 0.00005891 }
dose_env: { value: 2.512, error: 0.05626 }
Additionally, you can set specify the regeneration dose points (in
s). The default is null
, because irradiation times are
automatically extracted from the data by create_DataFile()
.
However, this information might be missing or, more likely, wrong and it
is very cumbersome to fix those numbers manually in the measurement
data. Therefore the dose points can be provided with the config
file:
settings:
dose_points: [10, 20, 50, 0, 10]
dose_source: { value: 0.1535, error: 0.00005891 }
dose_env: { value: 2.512, error: 0.05626 }
The example corresponds to 5 (five) regeneration dose points of 10 s, 20 s, …, 10 s.
Note
- You should add the values as you have specified them in the measurement sequence, except for the natural dose point (0 s) and the test dose points, which must not be added.
- The provided vector will be shortened automatically to fit the actual number of dose points.
- An error will be thrown if you provide not enough dose points
rules
level
The rules level enables you to provide a couple of parameters, which are used in Bayesian modelling.
PARAMETER | TYPE | COMMENT |
---|---|---|
beginSignal |
integer | Channel number start OSL signal integral () |
endSignal |
integer | Channel number end OSL signal integral () |
beginBackground |
integer | Channel number start OSL background integral () |
endBackground |
integer | Channel number end OSL background integral () |
beginTest |
integer | Channel number start OSL signal integral () |
endTest |
integer | Channel number end OSL signal integral () |
beginTestBackground |
integer | Channel number start OSL background integral () |
endTestBackground |
integer | Channel number end OSL background integral () |
inflatePercent |
double | Additional overdispersion value to inflate the uncertainty in percentage |
nbOfLastCycleToRemove |
integer | Number of SAR cycles to be removed from the measurement file |
Example:
rules:
beginSignal: 6
endSignal: 8
beginBackground: 50
endBackground: 55
beginTest: 6
endTest: 8
beginTestBackground: 50
endTestBackground: 55
inflatePercent: 0.027
nbOfLastCycleToRemove: 1
Please ensure that the set values correspond to your measurement data. For instance, if your OSL curve has only 100 channels (data points), it does not make sense to set larger integral settings (e.g., 1000), and such a setting will lead to an error. Integral values for () and () are usually set to identical values unless you have good reasons to use different integral settings.
Auto-generate the config file using
write_YAMLConfigFile()
To ease the generation of configuration files for many samples, you
can use the function write_YAMLConfigFile()
.
The function has two different operation modes, which are shown
below. Important is to note that the function does not seem to have
function parameters, because all parameters are extracted from a
reference file within the package. All parameters in the reference file
are allowed. However, you can only preset each parameter for all
records, except for the parameter sample
. The length of
this parameter (e.g.,
write_YAMLConfigFile(sample = c("a1", "a2))
) determines the
number of records in the configuration file output.
Show available parameters
In this mode, the function displays available parameters in the
terminal and returns a list that can be modified in R and then passed to
create_DataFile()
:
l <- write_YAMLConfigFile()
── Allowed function parameters (start) ─────────────────────────────────────────
sample
files
settings.dose_points
settings.dose_source.value
settings.dose_source.error
settings.dose_env.value
settings.dose_env.error
settings.rules.beginSignal
settings.rules.endSignal
settings.rules.beginBackground
settings.rules.endBackground
settings.rules.beginTest
settings.rules.endTest
settings.rules.beginTestBackground
settings.rules.endTestBackground
settings.rules.inflatePercent
settings.rules.nbOfLastCycleToRemove
── Allowed function parameters (end) ───────────────────────────────────────────
str(l)
List of 1
$ :List of 3
..$ sample : chr "reference"
..$ files : NULL
..$ settings:List of 4
.. ..$ dose_points: NULL
.. ..$ dose_source:List of 2
.. .. ..$ value: int 0
.. .. ..$ error: int 0
.. ..$ dose_env :List of 2
.. .. ..$ value: int 0
.. .. ..$ error: int 0
.. ..$ rules :List of 10
.. .. ..$ beginSignal : int 0
.. .. ..$ endSignal : int 0
.. .. ..$ beginBackground : int 0
.. .. ..$ endBackground : int 0
.. .. ..$ beginTest : int 0
.. .. ..$ endTest : int 0
.. .. ..$ beginTestBackground : int 0
.. .. ..$ endTestBackground : int 0
.. .. ..$ inflatePercent : int 0
.. .. ..$ nbOfLastCycleToRemove: int 0
Write YAML file
Alternatively, the function can be used to generate a config file with preset values. You can then modify the generated YAML file with any text-editor.
l <- write_YAMLConfigFile(output_file = "<your filepath>")
Internals
The YAML settings file is loaded and processed by the function
BayLum::create_DataFile()
using
yaml::read_yaml()
from the R package 'yaml'
.
yaml::read_yaml()
returns a list
on R, which
is then processed by BayLum::create_DataFile()
. Sometimes
it makes sense to modify the settings on the fly in R. To avoid import
and export of YAML files, BayLum::create_DataFile()
always
tries to process the input of the parameter
BayLum::create_DataFile(config_file, ...)
as a
list
before trying to load a YAML file from the hard drive.
While this option is usually unnecessary, this information may help in
more complex R scripts.