Title: | Read and Write CDISC Dataset JSON Files |
---|---|
Description: | Read, construct and write CDISC (Clinical Data Interchange Standards Consortium) Dataset JSON (JavaScript Object Notation) files, while validating per the Dataset JSON schema file, as described in CDISC (2023) <https://www.cdisc.org/standards/data-exchange/dataset-json>. |
Authors: | Mike Stackhouse [aut, cre] |
Maintainer: | Mike Stackhouse <[email protected]> |
License: | Apache License (>= 2) |
Version: | 0.3.0 |
Built: | 2025-01-29 19:39:31 UTC |
Source: | https://github.com/atorus-research/datasetjson |
Create the base object used to write a Dataset JSON file.
dataset_json(
  .data,
  file_oid = NULL,
  last_modified = NULL,
  originator = NULL,
  sys = NULL,
  sys_version = NULL,
  study = NULL,
  metadata_version = NULL,
  metadata_ref = NULL,
  item_oid = NULL,
  name = NULL,
  dataset_label = NULL,
  columns = NULL,
  version = "1.1.0"
)
.data |
Input data to contain within the Dataset JSON file. Written to the itemData parameter. |
file_oid |
fileOID parameter, defined as "A unique identifier for this file." (optional) |
last_modified |
The date/time the source database was last modified before creating the Dataset-JSON file (optional) |
originator |
originator parameter, defined as "The organization that generated the Dataset-JSON file." (optional) |
sys |
sourceSystem.name parameter, defined as "The computer system or database management system that is the source of the information in this file." (Optional, required if coupled with sys_version) |
sys_version |
sourceSystem.version parameter, defined as "The version of the sourceSystem" (Optional, required if coupled with sys) |
study |
Study OID value (optional) |
metadata_version |
Metadata version OID value (optional) |
metadata_ref |
Metadata reference (i.e. path to Define.xml) (optional) |
item_oid |
ID used to label dataset with the itemGroupData parameter. Defined as "Object of Datasets. Key value is a unique identifier for Dataset, corresponding to ItemGroupDef/@OID in Define-XML." |
name |
Dataset name |
dataset_label |
Dataset Label |
columns |
Variable level metadata for the Dataset JSON object. See details for format requirements. |
version |
The DatasetJSON version to use. Currently only 1.1.0 is supported. |
The columns parameter should be provided as a data frame based on the Dataset JSON Specification:
itemOID: string, required: Unique identifier for the variable that may also function as a foreign key to an ItemDef/@OID in an associated Define-XML file. See the ODM specification for OID considerations.
name: string, required: Variable name
label: string, required: Variable label
dataType: string, required: Logical data type of the variable. The dataType attribute represents the planned specificity of the data. See the ODM Data Formats specification for details.
targetDataType: string, optional: Indicates the data type into which the receiving system must transform the associated Dataset-JSON variable. The variable with the data type attribute of dataType must be converted into the targetDataType when transforming the Dataset-JSON dataset into a format for operational use (e.g., SAS dataset, R dataframe, loading into a system's data store). Only specify targetDataType when it is different from the dataType attribute or the JSON data type and the data needs to be transformed by the receiving system. See the Supported Column Data Type Combinations table for details on usage. See the User's Guide for additional information.
length: integer, optional: Specifies the number of characters allowed for the variable value when it is represented as a text.
displayFormat: string, optional: A SAS display format value used for data visualization of numeric float and date values.
keySequence: integer, optional: Indicates that this item is a key variable in the dataset structure. It also provides an ordering for the keys.
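The field list above can be made concrete with a small base R sketch. The OIDs, labels, and the `check_columns()` helper are hypothetical and not part of datasetjson, and the set of allowed `dataType` values is an assumption taken from the Dataset-JSON v1.1 specification rather than from this package.

```r
# Hypothetical column metadata for the built-in iris data frame.
# The OIDs and labels are invented for illustration only.
iris_columns <- data.frame(
  itemOID  = paste0("IT.IRIS.", names(iris)),
  name     = names(iris),
  label    = c("Sepal Length", "Sepal Width", "Petal Length",
               "Petal Width", "Species"),
  dataType = c("float", "float", "float", "float", "string"),
  length   = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, 10L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of datasetjson): return a character vector
# describing problems with the required fields, or character(0) if none.
check_columns <- function(columns) {
  required <- c("itemOID", "name", "label", "dataType")
  # Assumed list of data types, based on the Dataset-JSON v1.1 specification
  allowed <- c("string", "integer", "decimal", "float", "double",
               "boolean", "datetime", "date", "time", "URI")
  problems <- character(0)

  missing_fields <- setdiff(required, names(columns))
  if (length(missing_fields) > 0) {
    problems <- c(problems, paste("Missing required field:", missing_fields))
  }

  if ("dataType" %in% names(columns)) {
    bad_types <- setdiff(columns$dataType, allowed)
    if (length(bad_types) > 0) {
      problems <- c(problems, paste("Unknown dataType:", bad_types))
    }
  }
  problems
}

problems <- check_columns(iris_columns)
problems
```

A data frame shaped like `iris_columns` is what the columns argument expects. The iris_items object shipped with the package serves the same purpose in the examples.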
Note that Dataset JSON is currently on version 1.1.0, which incorporates findings from the pilot and feedback from the user community. Support for 1.0.0 has been deprecated.
A dataset_json object pertaining to the specified Dataset JSON version.
# Create a basic object
ds_json <- dataset_json(
  iris,
  file_oid = "/some/path",
  last_modified = "2023-02-15T10:23:15",
  originator = "Some Org",
  sys = "source system",
  sys_version = "1.0",
  study = "SOMESTUDY",
  metadata_version = "MDV.MSGv2.0.SDTMIG.3.3.SDTM.1.7",
  metadata_ref = "some/define.xml",
  item_oid = "IG.IRIS",
  name = "IRIS",
  dataset_label = "Iris",
  columns = iris_items
)

# Attach attributes directly
ds_json <- dataset_json(iris, columns = iris_items)
ds_json <- set_file_oid(ds_json, "/some/path")
ds_json <- set_last_modified(ds_json, "2025-01-21T13:34:50")
ds_json <- set_originator(ds_json, "Some Org")
ds_json <- set_source_system(ds_json, "source system", "1.0")
ds_json <- set_study_oid(ds_json, "SOMESTUDY")
ds_json <- set_metadata_ref(ds_json, "some/define.xml")
ds_json <- set_metadata_version(ds_json, "MDV.MSGv2.0.SDTMIG.3.3.SDTM.1.7")
ds_json <- set_item_oid(ds_json, "IG.IRIS")
ds_json <- set_dataset_name(ds_json, "Iris")
ds_json <- set_dataset_label(ds_json, "The Iris Dataset")
This function pulls out the column metadata from the datasetjson object attributes into a more user-friendly data.frame.
get_column_metadata(x)
x |
A datasetjson object |
A data frame containing the columns metadata
ds_json <- dataset_json(
  iris,
  item_oid = "IG.IRIS",
  name = "IRIS",
  dataset_label = "Iris",
  columns = iris_items
)
get_column_metadata(ds_json)
Example of the necessary variable metadata included in a Dataset JSON file based on the Iris data frame.
iris_items
A data frame with 5 rows and 6 columns:
Unique identifier for Variable. Must correspond to ItemDef/@OID in Define-XML.
Display format supports data visualization of numeric float and date values.
Label for Variable
Data type for Variable
Length for Variable
Indicates that this item is a key variable in the dataset structure. It also provides an ordering for the keys.
This function validates a Dataset JSON file against the Dataset JSON schema and, if valid, returns a datasetjson object. The Dataset JSON file can be either a file path on disk or a URL that points to the Dataset JSON file.
read_dataset_json(file, decimals_as_floats = FALSE)
file |
File path or URL of a Dataset JSON file |
decimals_as_floats |
Convert variables of "decimal" type to float |
The resulting dataframe contains the additional metadata available on the Dataset JSON file within the attributes to make this accessible to the user. Note that these attributes are only populated if available.
sourceSystem: The information system from which the content of this dataset was sourced, including system name and version.
datasetJSONVersion: The version of the Dataset-JSON standard used to create the dataset.
fileOID: A unique identifier for this dataset.
dbLastModifiedDateTime: The date/time the source database was last modified before creating the Dataset-JSON file.
originator: The organization that generated the Dataset-JSON dataset.
studyOID: Unique identifier for the study that may also function as a foreign key to a Study/@OID in an associated Define-XML document, or to any studyOID references that are used as keys in other documents.
metaDataVersionOID: Unique identifier for the metadata version that may also function as a foreign key to a MetaDataVersion/@OID in an associated Define-XML file.
metaDataRef: URI for the metadata file describing the dataset (e.g., a Define-XML file).
itemGroupOID: Unique identifier for the dataset that may also function as a foreign key to an ItemGroupDef/@OID in an associated Define-XML file.
name: The human-readable name for the dataset.
label: A short description of the dataset.
columns: An array of metadata objects that describe the dataset variables. See dataset_json() for further information on the contents of these fields.
A dataframe with additional attributes attached containing the DatasetJSON metadata.
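Because the metadata travels as attributes on the data frame, it can be collected with base R. The sketch below uses a hand-built stand-in for the object that read_dataset_json() returns, so the attribute values are invented and `get_metadata()` is a hypothetical helper that is not part of datasetjson. Note the use of `exact = TRUE`: without it, `attr(x, "name")` would partially match the built-in `names` attribute whenever no `name` attribute is set.

```r
# Stand-in for a data frame returned by read_dataset_json(); the attribute
# names follow the list above, the values are made up for illustration.
dat <- head(iris, 3)
attr(dat, "name") <- "IRIS"
attr(dat, "label") <- "Iris"
attr(dat, "itemGroupOID") <- "IG.IRIS"

# Fields to look for, taken from the attribute list above
metadata_fields <- c("datasetJSONVersion", "fileOID", "originator",
                     "studyOID", "itemGroupOID", "name", "label")

# Hypothetical helper: collect whichever metadata attributes are populated.
get_metadata <- function(x, fields) {
  values <- lapply(fields, function(f) attr(x, f, exact = TRUE))
  names(values) <- fields
  # Drop fields that are not present on the object
  Filter(Negate(is.null), values)
}

meta <- get_metadata(dat, metadata_fields)
str(meta)
```

The same helper should work on a real result, for example `get_metadata(read_dataset_json(js), metadata_fields)`, returning only the attributes that were available in the file.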
# Read from disk
## Not run:
dat <- read_dataset_json("path/to/file.json")

# Read file from URL
dat <- read_dataset_json('https://www.somesite.com/file.json')
## End(Not run)

# Read from an already imported character vector
ds_json <- dataset_json(
  iris,
  item_oid = "IG.IRIS",
  name = "IRIS",
  dataset_label = "Iris",
  columns = iris_items
)
js <- write_dataset_json(ds_json)
dat <- read_dataset_json(js)
This object is a character vector holding the schema for Dataset JSON version 1.1.0.
schema_1_1_0
A character vector with 1 element
Set information about the file, source system, study, and dataset used to generate the Dataset JSON object.
set_source_system(x, sys, sys_version)
set_originator(x, originator)
set_file_oid(x, file_oid)
set_study_oid(x, study)
set_metadata_version(x, metadata_version)
set_metadata_ref(x, metadata_ref)
set_item_oid(x, item_oid)
set_dataset_name(x, name)
set_dataset_label(x, dataset_label)
set_last_modified(x, last_modified)
x |
datasetjson object |
sys |
sourceSystem.name parameter, defined as "The computer system or database management system that is the source of the information in this file." (Optional, required if coupled with sys_version) |
sys_version |
sourceSystem.version parameter, defined as "The version of the sourceSystem" (Optional, required if coupled with sys) |
originator |
originator parameter, defined as "The organization that generated the Dataset-JSON file." (optional) |
file_oid |
fileOID parameter, defined as "A unique identifier for this file." (optional) |
study |
Study OID value (optional) |
metadata_version |
Metadata version OID value (optional) |
metadata_ref |
Metadata reference (i.e. path to Define.xml) (optional) |
item_oid |
ID used to label dataset with the itemGroupData parameter. Defined as "Object of Datasets. Key value is a unique identifier for Dataset, corresponding to ItemGroupDef/@OID in Define-XML." |
name |
Dataset name |
dataset_label |
Dataset Label |
last_modified |
The date/time the source database was last modified before creating the Dataset-JSON file (optional) |
The fileOID parameter should be structured following the description outlined in the ODM V2.0 specification: "FileOIDs should be universally unique if at all possible. One way to ensure this is to prefix every FileOID with an internet domain name owned by the creator of the ODM file or database (followed by a forward slash, "/"). For example, FileOID="BestPharmaceuticals.com/Study5894/1" might be a good way to denote the first file in a series for study 5894 from Best Pharmaceuticals."
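The domain-prefix convention quoted above can be sketched in a couple of lines of base R. `make_file_oid()` is a hypothetical helper, not part of datasetjson, and the domain, study, and sequence values are the ones from the ODM example.

```r
# Hypothetical helper: build a FileOID of the form domain/study/sequence
make_file_oid <- function(domain, study, sequence) {
  paste(domain, study, sequence, sep = "/")
}

file_oid <- make_file_oid("BestPharmaceuticals.com", "Study5894", 1)
file_oid
# "BestPharmaceuticals.com/Study5894/1"
```

The resulting string can then be supplied as the file_oid argument of set_file_oid() or dataset_json().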
datasetjson object
ds_json <- dataset_json(iris, columns = iris_items)
ds_json <- set_file_oid(ds_json, "/some/path")
ds_json <- set_last_modified(ds_json, "2025-01-21T13:34:50")
ds_json <- set_originator(ds_json, "Some Org")
ds_json <- set_source_system(ds_json, "source system", "1.0")
ds_json <- set_study_oid(ds_json, "SOMESTUDY")
ds_json <- set_metadata_ref(ds_json, "some/define.xml")
ds_json <- set_metadata_version(ds_json, "MDV.MSGv2.0.SDTMIG.3.3.SDTM.1.7")
ds_json <- set_item_oid(ds_json, "IG.IRIS")
ds_json <- set_dataset_name(ds_json, "Iris")
ds_json <- set_dataset_label(ds_json, "The Iris Dataset")
Using the columns element of the Dataset JSON file, assign the available metadata to individual columns.
set_variable_attributes(x)
x |
A datasetjson object |
A datasetjson object with attributes assigned to individual variables
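The general idea of moving column-level metadata onto individual variables can be illustrated in base R. This is only a rough sketch of the concept, not the package's implementation: the `columns` data frame is a hypothetical two-row subset, and the attribute names `label` and `dataType` are assumptions chosen for the example.

```r
# Hypothetical column metadata for two iris variables
columns <- data.frame(
  name     = c("Sepal.Length", "Species"),
  label    = c("Sepal Length", "Species"),
  dataType = c("float", "string"),
  stringsAsFactors = FALSE
)

dat <- iris[, columns$name]

# Copy each metadata row onto the matching variable as attributes
for (i in seq_len(nrow(columns))) {
  var <- columns$name[i]
  attr(dat[[var]], "label") <- columns$label[i]
  attr(dat[[var]], "dataType") <- columns$dataType[i]
}

attr(dat$Sepal.Length, "label", exact = TRUE)
attr(dat$Species, "dataType", exact = TRUE)
```

To see what set_variable_attributes() actually attaches, calling `attributes()` on a column of its result will list the names it uses.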
ds_json <- dataset_json(
  iris,
  item_oid = "IG.IRIS",
  name = "IRIS",
  dataset_label = "Iris",
  columns = iris_items
)
ds_json <- set_variable_attributes(ds_json)
This function calls jsonvalidate::json_validate() directly, with the parameters necessary to retrieve the error information of an invalid JSON file per the Dataset JSON schema.
validate_dataset_json(x)
x |
File path or URL of a Dataset JSON file, or a character vector holding JSON text |
A data frame
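To show what retrieving error information from jsonvalidate::json_validate() looks like, here is a minimal sketch with a toy schema. The schema and JSON strings are invented for illustration, and the choice of `engine = "ajv"` is an assumption. This is not the package's own code, which validates against the Dataset JSON schema instead.

```r
# Runs only when jsonvalidate is installed
if (requireNamespace("jsonvalidate", quietly = TRUE)) {
  # Toy schema (hypothetical): an object that must have a string "name"
  toy_schema <- '{
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}}
  }'

  # This document lacks the required "name" property
  bad_json <- '{"label": "Iris"}'

  # verbose = TRUE attaches the validation errors to the logical result
  result <- jsonvalidate::json_validate(
    bad_json, toy_schema,
    verbose = TRUE, engine = "ajv"
  )

  isTRUE(result)                    # FALSE: validation failed
  errors <- attr(result, "errors")  # data frame describing each failure
  print(errors)
}
```

With `verbose = TRUE`, a failed validation carries its details in the `errors` attribute, which is the kind of data frame a caller can inspect to locate the offending part of a file.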
## Not run:
validate_dataset_json('path/to/file.json')
validate_dataset_json('https://www.somesite.com/file.json')
## End(Not run)

ds_json <- dataset_json(
  iris,
  item_oid = "IG.IRIS",
  name = "IRIS",
  dataset_label = "Iris",
  columns = iris_items
)
js <- write_dataset_json(ds_json)
validate_dataset_json(js)
Write out a Dataset JSON file
write_dataset_json( x, file, pretty = FALSE, float_as_decimals = FALSE, digits = 16 )
x |
datasetjson object |
file |
File path to save Dataset JSON file |
pretty |
If TRUE, write with readable formatting. Note: The Dataset JSON standard prefers compressed formatting without line feeds. It is not recommended you use pretty printing for submission purposes. |
float_as_decimals |
If TRUE, convert float variables to "decimal" data type in the JSON output. This will manually convert the numeric values using the |
digits |
When using |
NULL when the file is written to disk; otherwise a character string containing the JSON.
# Write to character object
ds_json <- dataset_json(
  iris,
  item_oid = "IG.IRIS",
  name = "IRIS",
  dataset_label = "Iris",
  columns = iris_items
)
js <- write_dataset_json(ds_json)

# Write to disk
## Not run:
write_dataset_json(ds_json, "path/to/file.json")
## End(Not run)