TVD Configuration
=================

.. contents::

The Tabular Vocabulary Definition CSV file must have a column headed **Type** and a column headed **URI**, which contain information on the type of RDF term being defined and its URI. A configuration file is used to determine how the entries under each column heading are processed. This section describes the default config file and how to customize it for specific needs.

Introduction
------------

The transformation of a Tabular Vocabulary Definition CSV file into RDF requires a YAML config file that has information about how column headings and entries should be handled. For example, entries under the **Type** column may be words such as "class", "property", "concept scheme" etc., and the entries in the **URI** column may be compact URIs (cURIs) such as "dct:title". In this case it is necessary to state that the types "class", "property" and "concept scheme" map to the URIs used for those types in RDF, for example ``rdfs:Class``, ``rdf:Property`` and ``skos:ConceptScheme``. It is also necessary to map each prefix used in the cURIs for the mapped types and the terms being described to their correct URI stems, for example, ``rdf`` maps to ``http://www.w3.org/1999/02/22-rdf-syntax-ns#``, ``dct`` maps to ``http://purl.org/dc/terms/`` and so on.

TVD2RDF comes with a default configuration file which is used if no other option is present. If a file called ``config.yaml`` is present in the working directory when ``tvd convert`` is run, then it is used in preference to the default. It is possible to use an alternative file by passing the ``-c`` argument from the command line: ``tvd convert -c «your-config-filename»``.

A CSV file containing namespace prefixes and base URIs can be used to supplement namespace information in the YAML configuration.

.. note:: TVD2RDF generally normalizes column headings and some values in a TVD so that variations in case and white space are ignored.
   Thus the values "Concept Scheme", "concept scheme", "ConceptScheme" would all be equivalent. Likewise, column headers such as "Related Term", "Related term", "related_term" and "Related-term" would all be equivalent. This is to maximize the readability of the TVD. It means that the use of case, white space and the characters ``-`` and ``_`` to distinguish between values must be avoided.

The Config File
---------------

The command ``tvd config`` can be used to display and save default configuration information for editing:

.. code-block:: console

   (venv) $ tvd config config.yaml
   Config written to file config.yaml.

If no filename is used, the configuration YAML is displayed in the terminal.

The default config YAML is as follows:

.. literalinclude:: ../../src/tvd2rdf/config.yaml
   :language: yaml
   :linenos:

.. note:: The use of quoted keys and values is optional, but useful as the value for ``splitters`` contains the new-line character [``\n``].

Taking this a section at a time:

namespaces
~~~~~~~~~~~

The first block is a simple listing of namespace prefix and URI pairs used to expand compact URIs (cURIs) of the form ``rdf:Property`` into the full URI. Namespace entries must be present for every cURI used in the ``config.yaml`` file. Namespace information for values entered into the TVD may be added to a custom ``config.yaml`` file (if not already present in the default).

.. note:: A supplementary list of prefixes and URIs for namespaces used in a project may be maintained as a CSV file with two columns, one headed **prefix** the other headed **URI**. Unlike namespaces added to ``config.yaml``, which over-write the default, these namespace entries will be added to those in the ``config.yaml`` file used. This may be used with the ``-ns`` option when running ``tvd convert``, e.g. ``$ tvd convert -ns namespaces.csv terms.csv``.
   This is useful when you want to use the same local or default config file for projects with different namespace requirements, and when you want to maintain the list of namespace references in the same tool as is used for editing the TVD.

Fields Map
~~~~~~~~~~

This section has the information that drives much of the translation. For every column heading present, except for **Type** and **URI**, it provides the URI for the property that will be used in the RDF term definition and the type of value that will be used (one of ``annotation``, ``datatype``, ``object`` or ``other``).

.. warning:: It is likely that a future version of TVD2RDF will change the list of recognized types of values to ``literal``, ``object`` and ``other``, as these are the distinctions that matter. It is also likely that the key used to specify this in YAML will change from ``type`` to ``value type`` in order to avoid confusion with the mandatory column heading **Type**.

The fields map is divided into sections depending on the type of term that is being defined. The first section has entries that are used for all types of term. Then there is a section that is used when a "class" is being defined, followed by sections for "concept", "concept scheme", "ontology" and "property". These types must correspond to the entries in the final section, the ``types map``, see below.

So, looking at the default configuration YAML, we can see that lines 3-5 (part of the ``all`` section) specify that a column headed **comment** will be translated into a value for the ``rdfs:comment`` property for all types of term being defined. On the other hand, there are several entries for **label**: see lines 28-30 in the ``class`` section; lines 35-37 in the ``concept`` section; lines 42-44 in the ``concept scheme`` section, and so on. This means that a column headed **label** will be translated using different RDF properties depending on the type of term being defined.
For a class, the ``rdfs:label`` property will be used; for a concept ``skos:prefLabel`` will be used; for a concept scheme ``dcterms:title`` will be used, and so on. The ability to use different RDF properties for similar attributes of different types of RDF term is useful in reducing the number of columns in a TVD that describes multiple types of term.

The default values may all be varied. For example, ``label`` could be listed in the ``all`` section and mapped to ``rdfs:label`` so that this property is always used, and other entries such as ``title`` and ``pref label`` could be added where appropriate. The ability to vary the column headings can be useful in making the intent of the columns more obvious (especially to those who understand the properties used to describe various attributes of RDF terms).

The entry against ``type`` for each field is used to determine whether the value under that heading in a TVD is encoded as a Literal or URI value, or in the case of the **Relationship**, **Related Term** pair of columns, processed in some other way.

Relationships map
~~~~~~~~~~~~~~~~~~

RDF terms defined in a TVD may be related to other terms, for example through a statement using ``rdfs:subPropertyOf``, ``rdfs:subClassOf`` or any of the ``skos:`` concept-to-concept relationships. The related term is entered in a column headed **Related Term**, and the nature of the relationship is entered in a column headed **Relationship**. The entries in the **Relationship** column should be simple key words that are mapped to the URI for the desired relationship in this block of the config.

Any prefixes used in the URI values of this section must appear in the ``namespaces`` section of the configuration YAML file.

Types Map
~~~~~~~~~~~

This is a mapping of names used for different types of RDF term that may be described to the URIs for those types. The names mapped here may be used in the **Type** column of a TVD and, optionally, for a block in the ``fields map`` section of the configuration YAML file.
Any prefixes used in the URI values of this section must appear in the ``namespaces`` section of the configuration YAML file.

Splitters
~~~~~~~~~~

Multiple URIs entered in a single cell indicate multiple values for the relevant attribute if they are separated by one of the characters in this string. The ``|`` character is the exception: it is used only to separate the characters listed. For example, if a property is defined as being a sub-property of several others, the URIs for the super-properties could be listed in the **Related Term** column, each on a new line.

.. note:: Providing multiple values only works where the entry in a column is a URI. This is because none of the separators can be guaranteed not to be contained in a Literal value.

.. warning:: The format of the value for this entry may change to a list in a future release.
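
The splitting behaviour described in this section can be sketched in Python. This is an illustration only, not the TVD2RDF implementation: the function name ``split_cell``, the example splitters string (a semicolon and a new-line character separated by ``|``) and the ``ex:`` prefix are all assumptions made for the example, and the default value in ``config.yaml`` may differ.

.. code-block:: python

   import re


   def split_cell(value, splitters=";|\n"):
       """Split a cell of URIs on any character listed in ``splitters``.

       ``splitters`` lists separator characters, themselves separated by "|".
       The default used here is hypothetical, not the TVD2RDF default.
       """
       # "|" only delimits the list of separator characters.
       separators = [s for s in splitters.split("|") if s]
       if not separators:
           # Nothing to split on: return the whole cell as a single value.
           return [value.strip()] if value.strip() else []
       # Build a regular expression that matches any one separator.
       pattern = "|".join(re.escape(s) for s in separators)
       # Split, trim surrounding white space and drop empty entries.
       return [part.strip() for part in re.split(pattern, value) if part.strip()]


   # A Related Term cell listing three hypothetical super-properties.
   cell = "ex:parentOne\nex:parentTwo; ex:parentThree"
   print(split_cell(cell))

Because a split of this kind is applied to the raw cell text, it is only safe for columns whose values are URIs, which is the restriction stated in the note above.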