TVD Configuration

The Tabular Vocabulary Definition CSV file must have a column headed Type and a column headed URI which contain information on the type of RDF term being defined and its URI. A configuration file is used to determine how the entries under each column heading are processed. This section describes the default config file and how to customize it for specific needs.

Introduction

The transformation of a Tabular Vocabulary Definition CSV file into RDF requires a YAML config file that has information about how column headings and entries should be handled. For example, entries under the Type column may be words such as “class”, “property”, “concept scheme” etc., and the entries in the URI column may be compact URIs (cURIs) such as “dct:title”. In this case it is necessary to state that the types “class”, “property” and “concept scheme” map to the URIs used for those types in RDF, for example rdfs:Class, rdf:Property and skos:ConceptScheme respectively. It is also necessary to map each prefix used in the cURIs, both for the mapped types and for the terms being described, to its correct URI stem: for example, rdf maps to http://www.w3.org/1999/02/22-rdf-syntax-ns#, dct maps to http://purl.org/dc/terms/ and so on.

TVD2RDF comes with a default configuration file which is used if no other option is present. If a file called config.yaml is present in the working directory when tvd convert is run, then it is used in preference to the default. It is possible to use an alternative file by passing the -c argument from the command line: tvd convert -c «your-config-filename». A CSV file containing namespace prefixes and base URIs can be used to supplement the namespace information in the YAML configuration.
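The order of preference described above can be sketched in Python. This is only an illustration, not TVD2RDF's own code: the function name choose_config is hypothetical, and the sketch assumes that a file named with -c takes priority over a config.yaml in the working directory, which the text implies but does not state outright.

```python
from pathlib import Path
from typing import Optional


def choose_config(cli_config: Optional[str], working_dir: Path) -> Optional[Path]:
    """Return the config file to load, or None to use the built-in default.

    Hypothetical helper: assumes -c beats a local config.yaml.
    """
    if cli_config:
        # A file named with -c on the command line.
        return Path(cli_config)
    local = working_dir / "config.yaml"
    if local.is_file():
        # A config.yaml in the working directory beats the default.
        return local
    # Nothing else found: fall back to the default configuration.
    return None
```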

Note

TVD2RDF generally normalizes column headings and some values in a TVD so that variations in case and white space are ignored. Thus the values “Concept Scheme”, “concept scheme” and “ConceptScheme” would all be equivalent. Likewise, column headers such as “Related Term”, “Related term”, “related_term” and “Related-term” would all be equivalent.

This is to maximize the readability of the TVD. It means that the use of case, white space and the characters - and _ to distinguish between values must be avoided.
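As a rough illustration of this kind of normalization, the following Python sketch lower-cases a heading or value and removes white space, hyphens and underscores. The function name normalize is hypothetical and the rule shown is an assumption that is consistent with the examples above; the normalization that TVD2RDF actually applies may differ in detail.

```python
import re


def normalize(text: str) -> str:
    """Drop white space, '-' and '_', then lower-case (assumed rule)."""
    return re.sub(r"[\s_-]+", "", text).lower()


# All spellings of a value collapse to the same key.
values = ["Concept Scheme", "concept scheme", "ConceptScheme"]
print({normalize(v) for v in values})
# {'conceptscheme'}

# The same holds for the column headers.
headers = ["Related Term", "Related term", "related_term", "Related-term"]
print({normalize(h) for h in headers})
# {'relatedterm'}
```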

The Config File

The command tvd config can be used to display the default configuration information, or to save it for editing:

(venv) $ tvd config config.yaml
Config written to file config.yaml.

If no filename is used, the configuration YAML is displayed in the terminal.

The default config YAML is as follows:

 1---
 2namespaces:
 3  dcterms: http://purl.org/dc/terms/
 4  dct: http://purl.org/dc/terms/
 5  dc11: http://purl.org/dc/elements/1.1/
 6  ex: https://example.org/terms#
 7  ex2: https://example.org/moreterms#
 8  owl: http://www.w3.org/2002/07/owl#
 9  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
10  rdfs: http://www.w3.org/2000/01/rdf-schema#
11  sdo: https://schema.org/
12  skos: http://www.w3.org/2004/02/skos/core#
13  xsd: http://www.w3.org/2001/XMLSchema#
14
15fields map:
16  all:
17    type:
18      uri: rdfs:type
19      type: other
20    uri:
21      uri: URIRef
22      type: other
23    notation:
24      uri: skos:notation
25      type: annotation
26    usage:
27      uri: skos:scopeNote
28      type: annotation
29    comment:
30      uri: rdfs:comment
31      type: annotation
32    relatedTerm:
33      uri: rdf:object
34      type: other
35    relationship:
36      uri: rdf:predicate
37      type: other
38    date:
39      uri: dcterms:date
40      type: datatype
41  ontology:
42    label:
43      uri: rdfs:label
44      type: annotation
45  class:
46    label:
47      uri: rdfs:label
48      type: annotation
49  property:
50    label:
51      uri: rdfs:label
52      type: annotation
53    definition:
54      uri: rdfs:comment
55      type: annotation
56    domainIncludes:
57      uri: sdo:domainIncludes
58      type: object
59    rangeIncludes:
60      uri: sdo:rangeIncludes
61      type: object
62  concept scheme:
63    label:
64      uri: dcterms:title
65      type: datatype
66    definition:
67      uri: dcterms:description
68      type: datatype
69  concept:
70    label:
71      uri: skos:prefLabel
72      type: annotation
73    definition:
74      uri: skos:definition
75      type: annotation
76
77relationships map:
78  broader: skos:broader
79  broadMatch: skos:broadMatch
80  closeMatch: skos:closeMatch
81  hasTopConcept: skos:hasTopConcept
82  inScheme: skos:inScheme
83  narrower: skos:narrower
84  narrowMatch: skos:narrowMatch
85  topConceptOf: skos:topConceptOf
86
87types map:
88  Property: rdf:Property
89  Class: rdfs:Class
90  Ontology: owl:Ontology
91  Concept Scheme: skos:ConceptScheme
92  Concept: skos:Concept
93
94splitters: ",\n|;\n|\n|,|;" # chars used to separate multiple entries in a cell.

Note

The use of quoted keys and values is optional, but quoting is useful for splitters because its value contains the new-line character (\n).

Taking this a section at a time:

namespaces

The first block is a simple listing of namespace prefix and URI pairs used to expand compact URIs (cURIs) of the form rdf:Property into the full URI.

It is necessary that namespace entries are present for every cURI used in the config.yaml file.
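The expansion of a cURI can be sketched in a few lines of Python. This is a minimal illustration using three entries from the default namespaces block; the function name expand_curie is hypothetical and is not part of TVD2RDF.

```python
# Three entries copied from the default namespaces block.
NAMESPACES = {
    "dct": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_curie(curie: str, namespaces: dict) -> str:
    """Replace the prefix of a cURI with its namespace URI."""
    prefix, sep, local_name = curie.partition(":")
    if not sep or prefix not in namespaces:
        # A missing namespace entry means the cURI cannot be expanded.
        raise KeyError(f"No namespace entry for the prefix in {curie!r}")
    return namespaces[prefix] + local_name


print(expand_curie("rdf:Property", NAMESPACES))
# http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
```

The KeyError branch shows why every prefix used in the configuration needs a namespace entry.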

Namespace information for values entered into the TVD may be added to a custom config.yaml file (if not already present in the default).

Note

A supplementary list of prefixes and URIs for namespaces used in a project may be maintained as a CSV file with two columns, one headed prefix and the other headed URI. Unlike namespaces added to config.yaml, which overwrite the default, these namespace entries will be added to those in the config.yaml file used.

This may be used with the -ns option when running tvd convert, e.g. $ tvd convert -ns namespaces.csv terms.csv.

This is useful when wanting to use the same local or default config file for projects with different namespace requirements, and when wanting to maintain the list of namespace references in the same tools as used for editing the TVD.
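The effect of a supplementary namespace CSV can be pictured with the following Python sketch. It is an illustration only: merge_namespaces is a hypothetical name, the column headings are assumed to be written exactly as prefix and URI, and letting a CSV entry replace a config entry with the same prefix is an assumption, since the behaviour for clashing prefixes is not described here.

```python
import csv
import io


def merge_namespaces(config_namespaces: dict, csv_file) -> dict:
    """Add prefix/URI rows from a CSV file to the config namespaces."""
    merged = dict(config_namespaces)  # leave the original mapping untouched
    for row in csv.DictReader(csv_file):
        # Assumption: a CSV row wins if the prefix is already defined.
        merged[row["prefix"].strip()] = row["URI"].strip()
    return merged


config_namespaces = {"skos": "http://www.w3.org/2004/02/skos/core#"}
# Stand-in for the contents of a namespaces.csv file.
namespaces_csv = io.StringIO("prefix,URI\nex,https://example.org/terms#\n")

merged = merge_namespaces(config_namespaces, namespaces_csv)
print(sorted(merged))
# ['ex', 'skos']
```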

Fields Map

This section has the information that drives much of the translation. For every column heading present, except for Type and URI, it provides the URI for the property that will be used in the RDF term definition and the type of value that will be used (one of annotation, datatype, object or other).

Warning

It is likely that a future version of TVD2RDF will change the list of recognized types of values to literal, object and other, as these are the distinctions that matter.

It is also likely that the key used to specify this in YAML will change from type to value type in order to avoid confusion with the mandatory column heading type.

The fields map is divided into sections depending on the type of term that is being defined. The first section has entries that are used for all types of term. Then there is a section that is used when an “ontology” is being defined, followed by sections for “class”, “property”, “concept scheme” and “concept”. These types must correspond to the entries in the final section, the types map (see below).

So, looking at the default configuration YAML, we can see that lines 29 to 31 (part of the all section) specify that a column headed comment will be translated into a value for the rdfs:comment property for all types of term being defined. On the other hand, there are several entries for label: see lines 46 to 48 in the class section, lines 70 to 72 in the concept section, lines 63 to 65 in the concept scheme section, and so on. This means that a column headed label will be translated using different RDF properties depending on the type of term being defined. For a class, the rdfs:label property will be used; for a concept, skos:prefLabel will be used; for a concept scheme, dcterms:title will be used, and so on.
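The way a column heading resolves to a property can be sketched as a two-step lookup in Python. This is an illustration built on a cut-down copy of the default fields map: lookup_field is a hypothetical name, and checking the type-specific section before the all section is an assumption, as the order of precedence is not stated here.

```python
# A cut-down copy of the default fields map.
FIELDS_MAP = {
    "all": {"comment": {"uri": "rdfs:comment", "type": "annotation"}},
    "class": {"label": {"uri": "rdfs:label", "type": "annotation"}},
    "concept": {"label": {"uri": "skos:prefLabel", "type": "annotation"}},
    "concept scheme": {"label": {"uri": "dcterms:title", "type": "datatype"}},
}


def lookup_field(term_type: str, heading: str, fields_map: dict) -> dict:
    """Find the entry for a column heading, given the type of term."""
    # Assumed order: the section for the term type first, then 'all'.
    for section in (term_type, "all"):
        entry = fields_map.get(section, {}).get(heading)
        if entry is not None:
            return entry
    raise KeyError(f"No mapping for heading {heading!r} and type {term_type!r}")


print(lookup_field("class", "label", FIELDS_MAP)["uri"])
# rdfs:label
print(lookup_field("concept", "label", FIELDS_MAP)["uri"])
# skos:prefLabel
print(lookup_field("concept", "comment", FIELDS_MAP)["uri"])
# rdfs:comment
```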

The ability to use different RDF properties for similar attributes of different types of RDF term is useful in reducing the number of columns in a TVD that describes multiple types of term.

The default values may all be varied. For example, label could be listed in the all section and set to rdfs:label if that property were always to be used, and other entries such as title and pref label could be added where appropriate.

The ability to vary the column headings can be useful in making the intent of the columns more obvious (especially to those who understand the properties used to describe various attributes of RDF terms).

The entry against type for each field is used to determine whether the value under that heading in a TVD is encoded as a Literal or URI value or, in the case of the Relationship and Related Term pair of columns, processed in some other way.

Relationships map

RDF terms defined in a TVD may be related to other terms, for example through a statement using the rdfs:subPropertyOf or rdfs:subClassOf or any of the skos: concept-to-concept relationships. The related term is entered in a column headed Related Term, and the nature of the relationship is entered in a column headed Relationship. The entries in the Relationship column should be simple key words that are mapped to the URI for the desired relationship in this block of the config.

Any prefixes used in the URI values of this section must appear in the namespaces section of the configuration YAML file.
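To illustrate how a Relationship and Related Term pair might be read, the following Python sketch turns one TVD row into a subject, predicate, object statement. It uses three entries from the default relationships map. The function name relationship_statement, the example row and the choice of the term's own URI as the subject are all assumptions made for the sake of the example.

```python
# Three entries copied from the default relationships map.
RELATIONSHIPS_MAP = {
    "broader": "skos:broader",
    "narrower": "skos:narrower",
    "inScheme": "skos:inScheme",
}


def relationship_statement(row: dict, relationships_map: dict) -> tuple:
    """Build a (subject, predicate, object) tuple of cURIs from a row."""
    # The keyword in the Relationship column selects the predicate.
    predicate = relationships_map[row["Relationship"]]
    # Assumption: the term being defined is the subject.
    return (row["URI"], predicate, row["Related Term"])


# A hypothetical TVD row, already read into a dict.
row = {
    "Type": "Concept",
    "URI": "ex:dog",
    "Relationship": "broader",
    "Related Term": "ex:mammal",
}
print(relationship_statement(row, RELATIONSHIPS_MAP))
# ('ex:dog', 'skos:broader', 'ex:mammal')
```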

Types Map

This is a mapping of names used for different types of RDF term that may be described to the URIs for those types. The names mapped here may be used in the Type column of a TVD and, optionally, for a block in the fields map section of the configuration YAML file.

Any prefixes used in the URI values of this section must appear in the namespaces section of the configuration YAML file.

Splitters

Multiple URIs in a cell can indicate multiple values for the relevant attribute if they are separated by one of the separators in this string. The | character is not itself a separator; it is used to divide the separators listed. For example, if a Property is defined as being a subPropertyOf several others, the URIs for the super properties could be listed in the Related Term column, each on a new line.

Note

Providing multiple values only works where the entry in a column is a URI.

This is because none of the separators can be guaranteed not to be contained in a Literal value.
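The effect of the default splitters value can be tried out in Python. This sketch assumes that the string is applied as a regular expression in which | separates the alternatives, which is how it reads, but it is not taken from the TVD2RDF source. The function name split_cell and the example cURIs are hypothetical.

```python
import re

# The default splitters value from the config YAML.
SPLITTERS = ",\n|;\n|\n|,|;"


def split_cell(cell: str, splitters: str = SPLITTERS) -> list:
    """Split a cell into separate entries, dropping empty ones."""
    parts = re.split(splitters, cell)
    return [part.strip() for part in parts if part.strip()]


# One cell holding four URIs, with a mix of separators.
cell = "ex:parentOne,\nex:parentTwo;ex:parentThree\nex:parentFour"
print(split_cell(cell))
# ['ex:parentOne', 'ex:parentTwo', 'ex:parentThree', 'ex:parentFour']
```

Applying the same split to a literal such as a comment containing commas would break it into fragments, which is why multiple values are only supported for URIs.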

Warning

The format of the value for this entry may change to a list in a future release.