Bulkload data from an XBRL repository, Part 2: Metadata and Storage of XBRL Repository
- 1: Introduction and Installation
- 2: Metadata and Storage
- 3: Configure Local File System Repository
- 4: Configure AWS S3 Repository
- 5: Run CellStore ETL Tool
How to Store Archives
An archive contains either an XBRL instance or a DTS.
Hence, a separate zip archive needs to be created for each instance or DTS.
Optionally, each archive (e.g. instance-b50-dpm-test.zip) can have an accompanying JSON metadata file sharing the same base name and location (i.e. instance-b50-dpm-test.json).
The metadata file contains additional information about the instance or DTS that is relevant for the import.
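The packaging convention above can be sketched in Python as follows. This is only an illustration: the helper name `write_archive_with_metadata` is hypothetical, and storing the files flat at the top level of the zip is an assumption, since the internal layout of an archive is not specified here.

```python
import json
import zipfile
from pathlib import Path


def write_archive_with_metadata(files, archive_path, metadata=None):
    """Zip the given files and optionally write a JSON metadata file next to the archive.

    Hypothetical helper. Assumption: files are stored flat inside the zip.
    """
    archive_path = Path(archive_path)
    archive_path.parent.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            file = Path(file)
            archive.write(file, arcname=file.name)

    if metadata is None:
        # The metadata file is optional.
        return archive_path, None

    # Same base name and location as the archive, but with a .json extension.
    metadata_path = archive_path.with_suffix(".json")
    metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return archive_path, metadata_path
```

Calling the helper with `archive_path="repo/instance-b50-dpm-test.zip"` and a metadata dictionary would produce `repo/instance-b50-dpm-test.zip` and `repo/instance-b50-dpm-test.json` side by side.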
Organize the Storage
You can maintain an XBRL data repository on:
- a local file system (see Part 3)
- AWS S3 (see Part 4)
The folder structure within the storage is flexible, so you can group archives in subfolders as you wish. When you run the ETL tool, you can select only certain subfolders to import. It therefore makes sense to organize the archives in a meaningful way (for example, by instance type or by period).
How the ETL Tool Works
The ETL Tool will recursively traverse through the storage structure (limited to an optional prefix).
For each zip archive found, the tool looks for a JSON metadata file with the same base name.
If a metadata file is found, it is loaded.
The tool then imports the archive either as a DTS (if a dts field is found in the metadata) or as an instance (if an instance field or no field at all is found in the metadata).
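The steps above can be sketched in Python for a local file system repository. This is not the ETL tool's actual code: the function name `find_archives` and its parameters are hypothetical, and the sketch only mirrors the described behavior (recursive traversal below an optional prefix, metadata lookup by base name, and the choice between a DTS and an instance import).

```python
import json
from pathlib import Path


def find_archives(root, prefix=""):
    """Yield (archive_path, metadata_or_None, kind) for every zip below root/prefix.

    Hypothetical sketch; `kind` is either "dts" or "instance".
    """
    start = Path(root) / prefix
    for archive in sorted(start.rglob("*.zip")):
        # The metadata file shares the archive's base name and location.
        metadata_path = archive.with_suffix(".json")
        metadata = None
        if metadata_path.is_file():
            metadata = json.loads(metadata_path.read_text(encoding="utf-8"))

        # A "dts" field selects a DTS import; everything else is an instance import.
        kind = "dts" if metadata is not None and "dts" in metadata else "instance"
        yield archive, metadata, kind
```

Passing a prefix such as `"10-K/2013"` restricts the traversal to that subfolder, which corresponds to importing only selected parts of the repository.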
If no metadata file is present for an archive, the ETL tool treats the archive as an instance archive. In addition, the parameters are auto-populated for both instance and DTS imports the same way the REST API would:
- Use a UUID, the instance file name or the archive file name as instanceId (behavior depends on the DEFAULT_INSTANCE_ID setting in the reportix.properties)
- Use the current timestamp as version
- Use the default system locale
- Use the default base url as defined in the reportix.properties
- By default automatically import the DTS with the instance
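As a rough illustration of these defaults, the following Python sketch builds the parameter set for an instance import without a metadata file. Everything in it is an assumption made for illustration: the function name, the mode names `"uuid"`, `"instance_file"` and `"archive_file"` (the actual values accepted by DEFAULT_INSTANCE_ID are not listed here), the timestamp format, and whether file extensions are kept in the instanceId.

```python
import locale
import uuid
from datetime import datetime, timezone
from pathlib import Path


def default_instance_parameters(archive_path, instance_file_name=None,
                                instance_id_mode="uuid", default_base_url=""):
    """Build default import parameters for an archive that has no metadata file.

    Hypothetical sketch; mode names and value formats are assumptions.
    """
    if instance_id_mode == "uuid":
        instance_id = str(uuid.uuid4())
    elif instance_id_mode == "instance_file":
        if instance_file_name is None:
            raise ValueError("instance_file_name is required for this mode")
        instance_id = instance_file_name
    elif instance_id_mode == "archive_file":
        instance_id = Path(archive_path).name
    else:
        raise ValueError(f"unknown instance_id_mode: {instance_id_mode!r}")

    return {
        "instanceId": instance_id,
        # Current timestamp as version (ISO 8601 in UTC is an assumed format).
        "version": datetime.now(timezone.utc).isoformat(),
        # System locale as reported by Python; may be None if none is configured.
        "locale": locale.getlocale()[0],
        # Stands in for the default base URL from reportix.properties.
        "baseUrl": default_base_url,
        # The DTS is imported together with the instance by default.
        "importDts": True,
    }
```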
Maintain Metadata for each Archive
Each XBRL instance file, together with any optional extension taxonomies, needs to be packaged in a single zip archive.
Alongside the zip archive, you can store a metadata file for each filing to set the parameters for the import.
This metadata file needs to reside in the same subfolder and have the same file name, except for the file extension.
For example, if you have a zip archive named 10-K/2013/aapl-20130928.zip, you would store the metadata in 10-K/2013/aapl-20130928.json.
The metadata file contains the same parameters as documented for the corresponding REST API import endpoints.
If the metadata file contains a field named dts, the archive will be treated as a DTS import.
In all other cases, the zip file will be imported as XBRL instance.
For example, the metadata file for an XBRL instance archive would contain:
{
  "instance": {
    "instanceId": "000119312513416534",
    "version": "2013-10-30",
    "baseUrl": "https://www.sec.gov/Archives/edgar/data/320193/000119312513416534/",
    "locale": "en_US",
    "importDts": true
  }
}
An example metadata file for importing a DTS:
{
  "dts": {
    "entryPointUrl": [
      "http://www.eba.europa.eu/eu/fr/xbrl/crr/fws/corep/its-2016-repxx/2016-02-01/mod/corep_lcr_ind.xsd",
      "http://www.eba.europa.eu/eu/fr/xbrl/crr/fws/corep/its-2016-repxx/2016-02-01/mod/corep_alm_con.xsd"
    ],
    "version": "20170816_etl",
    "baseUrl": "http://data.test.reportix.com.s3-eu-central-1.amazonaws.com/dpm/dts/",
    "locale": "en_GB"
  }
}
Maintain Directory-Wide Metadata for Several Archives
Instance and DTS archives can usually be organized in semantic groups, and the repository should reflect these groups physically (see the sections above), e.g. as directories and subdirectories. Often some metadata, such as the plugin or locale settings, applies to all archives in a group. For this reason the CellStore ETL tool allows directory-wide metadata settings.
Metadata specified in a file named reportix.json is applied to all archives in that directory and its subdirectories, regardless of whether they are imported as an instance or as a DTS.
If a subdirectory also contains a file called reportix.json, and this file contains a field with the same name as a field in a parent directory's reportix.json, then the setting in the subdirectory overrides the setting in the parent directory.
Likewise, archive-specific metadata overrides directory-wide settings.
Here is an example reportix.json file:
{
  "baseUrl": "http://reportix.com/dpm/",
  "locale": "en_GB",
  "plugin": "dpm.eba"
}
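The override order described in this section can be illustrated with a small Python sketch for a local file system repository. The function name `effective_metadata` is hypothetical, and one detail is an assumption: the flat fields from reportix.json are merged with the fields nested under the instance or dts object of the archive's own metadata file, since the exact merge behavior is not spelled out here.

```python
import json
from pathlib import Path


def effective_metadata(root, archive):
    """Return (kind, settings) for one archive after applying all overrides.

    Hypothetical sketch: parent reportix.json < subdirectory reportix.json
    < archive-specific metadata.
    """
    root = Path(root).resolve()
    archive = Path(archive).resolve()

    # Directories from the repository root down to the archive's own folder.
    directories = [root]
    for part in archive.parent.relative_to(root).parts:
        directories.append(directories[-1] / part)

    settings = {}
    for directory in directories:
        settings_file = directory / "reportix.json"
        if settings_file.is_file():
            # Deeper directories are applied later, so they override their parents.
            settings.update(json.loads(settings_file.read_text(encoding="utf-8")))

    kind = "instance"
    archive_metadata_file = archive.with_suffix(".json")
    if archive_metadata_file.is_file():
        archive_metadata = json.loads(archive_metadata_file.read_text(encoding="utf-8"))
        if "dts" in archive_metadata:
            kind = "dts"
        # Archive-specific fields win over every directory-wide setting.
        settings.update(archive_metadata.get(kind, {}))
    return kind, settings
```

With the example reportix.json above at the repository root, an archive whose own metadata sets only `locale` would keep the directory-wide `baseUrl` and `plugin` while using its own locale.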