Installing CellStore with Docker, Part 7: Additional Configuration and Optimization

These instructions are intended only for a first-time installation of CellStore. For updates, refer to the update instructions.

To make data accessible in a unified way, CellStore has to do some cleansing and alignment work. For optimization, it is also important to find the right balance between inheritance (i.e. sharing of taxonomy metadata) and redundancy. Both can be configured for optimal use.

Configure well-known Namespaces

To make XBRL data and metadata accessible in a unified way, as CellStore allows, the prefixes of QNames must be aligned. For example, the concept us-gaap:Assets might be named usgaap:Assets or US_GAAP:Assets in a different instance document, because the same namespace can be bound to arbitrary prefixes. Without alignment, the user would have to search for every variation of the concept name, which would undermine a core feature of CellStore: querying across different namespaces that share the same prefix.

Consequently, prefixes need to be aligned: for example, usgaap and US_GAAP should both be aligned to us-gaap. To do this, CellStore must be configured to know which namespaces are aligned to which prefix.

To configure the prefix alignment mechanism, create the file reportix.json in your %CELLSTORE_HOME% directory (the same directory that contains reportix.properties), or edit it if it already exists. Then add a knownNamespaces field:

{
  "knownNamespaces": [
    {
      "matches": [ "^http://fasb.org/us-gaap/\\d{4}-\\d{2}-\\d{2}$" ],
      "prefix": "us-gaap"
    },
    {
      "exact": [ "http://www.eurofiling.info/xbrl/ext/filing-indicators" ],
      "prefix": "find"
    }
  ]
}

As the example above shows:

  1. a namespace can be given either as an exact definition (exact) or as a regex match (matches)
  2. each known-namespace entry can map several actual namespaces to the same prefix

Capturing groups in a matches regular expression can be back-referenced in the prefix field with $<capturing group id>. The following example captures the gaap in the us-gaap namespace and resolves the prefix to us-gaap:

{
  "knownNamespaces": [
    {
      "matches": [ "^http://fasb.org/us-([a-z]+)/.*$" ],
      "prefix": "us-$1"
    }
  ]
}

CellStore maps the namespaces it knows from the configuration to prefixes. Exact matches always take priority over regex matches. If several definitions match within the same category (either exact or matches), the first match in document order within that category is used. If CellStore does not know a namespace, it uses the prefix as-is.

This can of course lead to confusion if different prefixes are used for one particular namespace. To prevent this from happening unnoticed, you can enable a strict mode in which CellStore raises an error if a namespace cannot be matched with any known namespace. To do so, add the following to the reportix.properties file:

REQUIRE_ALL_NAMESPACES_TO_BE_KNOWN=true

All prefixes can also be forced to lower case. If you prefer this behavior, add the following to the reportix.properties file:

ALIGN_PREFIXES_TO_LOWER_CASE=true
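
The resolution rules described in this section (exact before regex, first match in document order, fallback to the original prefix, optional strict mode and lower-casing) can be sketched in Python. This is only an illustration of the documented rules, not CellStore's actual implementation: the function names, the error type, and the use of Python's re.search are assumptions, and CellStore's regex dialect may differ in details.

```python
import re

# Mirrors the "knownNamespaces" array of reportix.json (entries taken from the examples above).
KNOWN_NAMESPACES = [
    {"matches": [r"^http://fasb.org/us-([a-z]+)/.*$"], "prefix": "us-$1"},
    {"exact": ["http://www.eurofiling.info/xbrl/ext/filing-indicators"], "prefix": "find"},
]


class UnknownNamespaceError(Exception):
    """Hypothetical error type standing in for the strict-mode failure."""


def expand_prefix(template, match):
    # Replace $1, $2, ... in the prefix template with the captured groups.
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))) or "", template)


def resolve_prefix(namespace, original_prefix, known=KNOWN_NAMESPACES,
                   require_known=False, lower_case=False):
    prefix = None
    # Pass 1: exact definitions have priority; first hit in document order wins.
    for entry in known:
        if namespace in entry.get("exact", []):
            prefix = entry["prefix"]
            break
    # Pass 2: regex definitions, again in document order.
    if prefix is None:
        for entry in known:
            for pattern in entry.get("matches", []):
                match = re.search(pattern, namespace)
                if match:
                    prefix = expand_prefix(entry["prefix"], match)
                    break
            if prefix is not None:
                break
    # Unknown namespace: fail in strict mode, otherwise keep the prefix as-is.
    if prefix is None:
        if require_known:  # corresponds to REQUIRE_ALL_NAMESPACES_TO_BE_KNOWN=true
            raise UnknownNamespaceError(namespace)
        prefix = original_prefix
    # Corresponds to ALIGN_PREFIXES_TO_LOWER_CASE=true.
    return prefix.lower() if lower_case else prefix
```

With this sketch, resolve_prefix("http://fasb.org/us-gaap/2015-01-31", "US_GAAP") yields us-gaap regardless of the prefix used in the instance document, while a namespace that matches no entry keeps its original prefix unless require_known is set.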

Configure Core DTS

XBRL is modular and extensible by design. Taxonomy maintainers therefore develop and publish taxonomy sets; for example, the IFRS Foundation publishes the IFRS taxonomy. Other taxonomy creators can import and extend these reusable taxonomies (called “Core DTS”). By default, CellStore imports a core DTS once for each extending DTS that imports it. In many cases this means that a lot of data is imported multiple times, especially for e.g. Edinet (JP-GAAP), SEC (US-GAAP, IFRS), or ESMA (IFRS) filings, all of which allow full extensibility.

Hence, core DTSs can be configured in CellStore so that they are imported only once and then shared across all extending DTSs. A DTS that imports a configured core DTS does not physically import it again, which reduces the storage of duplicated items.

To reduce redundancy and improve performance, create a reportix.json file that tells CellStore which DTSs to treat as core DTSs. This lets you choose a balanced mix of normalization and denormalization for optimal performance. Store the reportix.json file in the %CELLSTORE_HOME% directory (the same directory that contains reportix.properties).

Identifying core DTSs is most easily described by example. Note that DTSs are identified by their URL, which can be matched by a regex:

{
  "coreDts": {
      "jpcrp_cor": {
          "matches": "http://disclosure.edinet-fsa.go.jp/taxonomy/jpcrp/(\\d{4}-\\d{2}-\\d{2})/jpcrp_cor_\\1.xsd"
      },
      "jppfs_cor": {
          "matches": "http://disclosure.edinet-fsa.go.jp/taxonomy/jppfs/(\\d{4}-\\d{2}-\\d{2})/jppfs_cor_\\1.xsd"
      },
      "jpdei_cor": {
          "matches": "http://disclosure.edinet-fsa.go.jp/taxonomy/jpdei/(\\d{4}-\\d{2}-\\d{2})/jpdei_cor_\\1.xsd"
      }
  }
}
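
Before importing filings, it can be useful to check which URLs a coreDts pattern would actually accept. The following Python sketch tests entry-point URLs against the patterns above. It assumes that the whole URL must match (re.fullmatch) and that the \1 back-reference behaves as in Python's regex dialect; the function name and the sample URLs are hypothetical, and CellStore's own matching may differ.

```python
import re

# Patterns copied from the "coreDts" example; in JSON the backslashes are doubled.
CORE_DTS = {
    "jpcrp_cor": {"matches": r"http://disclosure.edinet-fsa.go.jp/taxonomy/jpcrp/(\d{4}-\d{2}-\d{2})/jpcrp_cor_\1.xsd"},
    "jppfs_cor": {"matches": r"http://disclosure.edinet-fsa.go.jp/taxonomy/jppfs/(\d{4}-\d{2}-\d{2})/jppfs_cor_\1.xsd"},
    "jpdei_cor": {"matches": r"http://disclosure.edinet-fsa.go.jp/taxonomy/jpdei/(\d{4}-\d{2}-\d{2})/jpdei_cor_\1.xsd"},
}


def find_core_dts(url, core_dts=CORE_DTS):
    """Return the name of the first configured core DTS whose pattern matches url, else None."""
    for name, entry in core_dts.items():
        if re.fullmatch(entry["matches"], url):
            return name
    return None


# Hypothetical sample URLs: \1 requires the date in the file name to equal the date in the path.
same_date = "http://disclosure.edinet-fsa.go.jp/taxonomy/jppfs/2019-11-01/jppfs_cor_2019-11-01.xsd"
mixed_date = "http://disclosure.edinet-fsa.go.jp/taxonomy/jppfs/2019-11-01/jppfs_cor_2018-02-28.xsd"
print(find_core_dts(same_date))   # jppfs_cor
print(find_core_dts(mixed_date))  # None
```

Because of the back-reference, each pattern covers every dated release of a taxonomy without listing the releases individually, while URLs whose path date and file-name date disagree are not treated as core DTSs.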