TVD Configuration
The Tabular Vocabulary Definition CSV file must have a column headed Type and a column headed URI which contain information on the type of RDF term being defined and its URI. A configuration file is used to determine how the entries under each column heading are processed. This section describes the default config file and how to customize it for specific needs.
Introduction
The transformation of a Tabular Vocabulary Definition CSV file into RDF
requires a YAML config file that has information about how column headings
and entries should be handled. For example, entries under the Type
column may be words such as “class”, “property” or “concept scheme”, and
the entries in the URI column may be compact URIs (cURIs) such as
“dct:title”. In this case it is necessary to state that the types “class”,
“property” and “concept scheme” map to the URIs used for those types in
RDF, for example rdfs:Class, rdf:Property and skos:ConceptScheme.
It is also necessary to map each prefix used in the cURIs for the mapped
types and the terms being described to their correct URI stems, for
example, rdf maps to http://www.w3.org/1999/02/22-rdf-syntax-ns#,
dct maps to http://purl.org/dc/terms/ and so on.
TVD2RDF comes with a default configuration file which is used if no other
option is present. If a file called config.yaml is present in the working
directory when tvd convert is run, then it is used in preference to the
default. An alternative file can be used by passing the -c
argument on the command line: tvd convert -c «your-config-filename».
A CSV file containing namespace prefixes and base URIs can be used to supplement
the namespace information in the YAML configuration.
Note
TVD2RDF generally normalizes column headings and some values in a TVD so that variations in case and white space are ignored. Thus the values “Concept Scheme”, “concept scheme” and “ConceptScheme” would all be equivalent. Likewise, column headers such as “Related Term”, “Related term”, “related_term” and “Related-term” would all be equivalent.
This is to maximize the readability of the TVD. It means that case, white space and the characters - and _ must not be used to distinguish between values.
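The effect of this normalization can be illustrated with a short Python sketch. The normalize function below is hypothetical and is not taken from the TVD2RDF code; it simply assumes that white space, hyphens and underscores are dropped and the result is lower-cased, which is enough to make the variants listed above compare equal.

```python
import re


def normalize(text):
    """Hypothetical normalizer: drop white space, '-' and '_', then lower-case."""
    return re.sub(r"[\s_-]+", "", text).lower()


type_variants = ["Concept Scheme", "concept scheme", "ConceptScheme"]
heading_variants = ["Related Term", "Related term", "related_term", "Related-term"]

# Each group of variants collapses to a single normalized form.
print({normalize(v) for v in type_variants})     # {'conceptscheme'}
print({normalize(v) for v in heading_variants})  # {'relatedterm'}
```

Under such a scheme, two headings that differ only in case, spacing, hyphens or underscores cannot be told apart, which is why those features must not carry meaning in a TVD.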
The Config File
The command tvd config can be used to display or save the default configuration
information for editing:
(venv) $ tvd config config.yaml
Config written to file config.yaml.
If no filename is given, the configuration YAML is displayed in the terminal.
The default config YAML is as follows:
 1  ---
 2  namespaces:
 3    dcterms: http://purl.org/dc/terms/
 4    dct: http://purl.org/dc/terms/
 5    dc11: http://purl.org/dc/elements/1.1/
 6    ex: https://example.org/terms#
 7    ex2: https://example.org/moreterms#
 8    owl: http://www.w3.org/2002/07/owl#
 9    rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
10    rdfs: http://www.w3.org/2000/01/rdf-schema#
11    sdo: https://schema.org/
12    skos: http://www.w3.org/2004/02/skos/core#
13    xsd: http://www.w3.org/2001/XMLSchema#
14
15  fields map:
16    all:
17      type:
18        uri: rdfs:type
19        type: other
20      uri:
21        uri: URIRef
22        type: other
23      notation:
24        uri: skos:notation
25        type: annotation
26      usage:
27        uri: skos:scopeNote
28        type: annotation
29      comment:
30        uri: rdfs:comment
31        type: annotation
32      relatedTerm:
33        uri: rdf:object
34        type: other
35      relationship:
36        uri: rdf:predicate
37        type: other
38      date:
39        uri: dcterms:date
40        type: datatype
41    ontology:
42      label:
43        uri: rdfs:label
44        type: annotation
45    class:
46      label:
47        uri: rdfs:label
48        type: annotation
49    property:
50      label:
51        uri: rdfs:label
52        type: annotation
53      definition:
54        uri: rdfs:comment
55        type: annotation
56      domainIncludes:
57        uri: sdo:domainIncludes
58        type: object
59      rangeIncludes:
60        uri: sdo:rangeIncludes
61        type: object
62    concept scheme:
63      label:
64        uri: dcterms:title
65        type: datatype
66      definition:
67        uri: dcterms:description
68        type: datatype
69    concept:
70      label:
71        uri: skos:prefLabel
72        type: annotation
73      definition:
74        uri: skos:definition
75        type: annotation
76
77  relationships map:
78    broader: skos:broader
79    broadMatch: skos:broadMatch
80    closeMatch: skos:closeMatch
81    hasTopConcept: skos:hasTopConcept
82    inScheme: skos:inScheme
83    narrower: skos:narrower
84    narrowMatch: skos:narrowMatch
85    topConceptOf: skos:topConceptOf
86
87  types map:
88    Property: rdf:Property
89    Class: rdfs:Class
90    Ontology: owl:Ontology
91    Concept Scheme: skos:ConceptScheme
92    Concept: skos:Concept
93
94  splitters: ",\n|;\n|\n|,|;" # chars used to separate multiple entries in a cell.
Note
The use of quoted keys and values is optional, but useful as the value
for splitters contains the new-line character [\n]
Taking this a section at a time:
namespaces
The first block is a simple listing of namespace prefix and URI pairs used to expand compact URIs (cURIs) of the form rdf:Property into the full URI.
A namespace entry must be present for every prefix used in a cURI in the config.yaml file.
Namespace information for values entered into the TVD may be added to a custom config.yaml file (if not already present in the default).
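As an illustration of what expanding a cURI involves, the following Python sketch looks up the prefix in a dictionary of namespaces and appends the local name to the matching URI stem. The function expand_curie and the NAMESPACES dictionary are hypothetical names used only for this example; they are not part of TVD2RDF, which may handle expansion differently.

```python
# A subset of the default namespaces, copied from the listing above.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dct": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_curie(curie, namespaces):
    """Hypothetical helper: expand 'prefix:name' into a full URI."""
    prefix, sep, local_name = curie.partition(":")
    if not sep or prefix not in namespaces:
        # This is the failure that a missing namespace entry would cause.
        raise KeyError(f"No namespace declared for {curie!r}")
    return namespaces[prefix] + local_name


print(expand_curie("rdf:Property", NAMESPACES))
# http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
print(expand_curie("dct:title", NAMESPACES))
# http://purl.org/dc/terms/title
```

A cURI whose prefix has no entry, such as foaf:name here, cannot be expanded, which is why every prefix in use needs a namespace entry.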
Note
A supplementary list of prefixes and URIs for namespaces used in a project may be maintained as a CSV file with two columns, one headed prefix and the other headed URI. Unlike namespaces added to config.yaml, which overwrite the default, these namespace entries are added to those in the config.yaml file used.
The CSV file is passed with the -ns option when running tvd convert,
e.g. $ tvd convert -ns namespaces.csv terms.csv.
This is useful when the same local or default config file is used for projects with different namespace requirements, and when the list of namespace references is maintained in the same tools used for editing the TVD.
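The additive behaviour of a namespaces CSV can be sketched in Python as follows. This is an illustration only: add_csv_namespaces is a hypothetical function, the CSV content is made up for the example, and letting a CSV entry win when a prefix appears in both places is an assumption, since the behaviour on conflict is not stated here.

```python
import csv
import io

# Namespaces as they might be read from the config YAML (subset).
config_namespaces = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Stand-in for the contents of a file such as namespaces.csv.
namespaces_csv = io.StringIO(
    "prefix,URI\n"
    "foaf,http://xmlns.com/foaf/0.1/\n"
    "dcat,http://www.w3.org/ns/dcat#\n"
)


def add_csv_namespaces(namespaces, csv_file):
    """Hypothetical helper: return config namespaces plus those in the CSV."""
    merged = dict(namespaces)  # copy, so the config entries are kept
    for row in csv.DictReader(csv_file):
        merged[row["prefix"]] = row["URI"]  # assumption: CSV wins on conflict
    return merged


merged = add_csv_namespaces(config_namespaces, namespaces_csv)
print(sorted(merged))  # ['dcat', 'foaf', 'rdf', 'skos']
```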
Fields Map
This section has the information that drives much of the translation. For
every column heading present, except for Type and URI, the URI for
the property that will be used in the RDF term definition is provided and
the type of value that will be used (one of annotation, datatype,
object or other).
Warning
It is likely that a future version of TVD2RDF will change the list
of recognized types of values to literal, object and other,
as these are the distinctions that matter.
It is also likely that the key used to specify this in YAML will
change from type to value type in order to avoid confusion
with the mandatory column heading type.
The fields map is divided into sections depending on the type of term that
is being defined. The first section, all, has entries that are used for all types
of term. Then there are sections that are used when an “ontology”, “class”,
“property”, “concept scheme” or “concept” is being defined.
These types must correspond to the entries in the
types map, see below.
So, looking at the default configuration YAML, we can see that lines 29 to 31 (part
of the all section) specify that a column headed comment will be
translated into a value for the rdfs:comment property for
all types of term being defined. On the other hand, there are several entries
for label: see lines 46 to 48 in the class section; lines 70 to 72 in the
concept section; lines 63 to 65 in the concept scheme section, and so on.
This means that a column headed label will be translated using different RDF
properties depending on the type of term being defined. For a class, the
rdfs:label property will be used; for a concept skos:prefLabel will
be used; for a concept scheme dcterms:title will be used, and so on.
The ability to use different RDF properties for similar attributes of different types of RDF term is useful in reducing the number of columns in a TVD that describes multiple types of term.
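The lookup described above can be sketched in Python with a cut-down copy of the default fields map. The function property_for is hypothetical, and checking the type-specific section before falling back to the all section is an assumption made for this illustration rather than a documented rule.

```python
# A small subset of the default fields map, as nested dictionaries.
FIELDS_MAP = {
    "all": {
        "comment": {"uri": "rdfs:comment", "type": "annotation"},
    },
    "class": {
        "label": {"uri": "rdfs:label", "type": "annotation"},
    },
    "concept scheme": {
        "label": {"uri": "dcterms:title", "type": "datatype"},
    },
    "concept": {
        "label": {"uri": "skos:prefLabel", "type": "annotation"},
    },
}


def property_for(term_type, heading, fields_map):
    """Hypothetical lookup: entry for a column heading, given the term type."""
    specific = fields_map.get(term_type, {})
    if heading in specific:
        return specific[heading]
    # Assumption: fall back to entries shared by all types of term.
    return fields_map["all"].get(heading)  # None if the heading is unmapped


for term_type in ("class", "concept", "concept scheme"):
    print(term_type, "->", property_for(term_type, "label", FIELDS_MAP)["uri"])
print(property_for("concept", "comment", FIELDS_MAP)["uri"])  # rdfs:comment
```

The same label heading resolves to rdfs:label, skos:prefLabel or dcterms:title depending on the row's type, while comment resolves to rdfs:comment for every type.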
The default values may all be varied. For example, label could be listed in
the all section and set to rdfs:label if that were always used, and other entries
such as title and pref label could be added where appropriate.
The ability to vary the column headings can be useful in making the intent of the columns more obvious (especially to those who understand the properties used to describe various attributes of RDF terms).
The entry against type for each field is used to determine whether the
value under that heading in a TVD is encoded as a Literal or a URI value,
or, in the case of the Relationship and Related-term pair of columns,
processed in some other way.
Relationships map
RDF terms defined in a TVD may be related to other terms, for example
through a statement using the rdfs:subPropertyOf or rdfs:subClassOf
or any of the skos: concept-to-concept relationships. The related
term is entered in a column headed Related Term, and the nature of the
relationship is entered in a column headed Relationship.
The entries in the Relationship column should be simple key words that are
mapped to the URI for the desired relationship in this block of the config.
Any prefixes used in the URI values of this section must appear in the
namespaces section of the configuration YAML file.
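As a rough sketch of how a Relationship and Related Term pair might turn into a statement, the Python below maps the keyword to its predicate and returns a subject, predicate, object tuple of cURIs. The function relationship_statement, the sample row and the ex: terms are all hypothetical; the real tool produces RDF and may process these columns differently.

```python
# A subset of the default relationships map.
RELATIONSHIPS_MAP = {
    "broader": "skos:broader",
    "inScheme": "skos:inScheme",
}


def relationship_statement(row, relationships_map):
    """Hypothetical helper: build (subject, predicate, object) from a TVD row."""
    predicate = relationships_map[row["Relationship"]]
    return (row["URI"], predicate, row["Related Term"])


# One made-up TVD row, as a dictionary keyed by column heading.
row = {
    "Type": "Concept",
    "URI": "ex:Cat",
    "Relationship": "broader",
    "Related Term": "ex:Animal",
}

print(relationship_statement(row, RELATIONSHIPS_MAP))
# ('ex:Cat', 'skos:broader', 'ex:Animal')
```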
Types Map
This is a mapping of names used for
different types of RDF term that may be described to the URIs for
those types. The names mapped here may be used in the Type column
of a TVD and, optionally, for a block in the fields map section of the
configuration YAML file.
Any prefixes used in the URI values of this section must appear in the
namespaces section of the configuration YAML file.
Splitters
Multiple entries of URIs in a cell can indicate multiple values for the relevant attribute if separated by one of the characters in this string, excepting | which is used to separate the characters listed. For example, if a Property is defined as being a subPropertyOf several others, the URIs for the super properties could be listed in the Related Term column, each on a new line.
Note
Providing multiple values only works where the entry in a column is a URI.
This is because none of the separators can be guaranteed not to be contained in a Literal value.
Warning
The format of the value for this entry may change to a list in a future release.
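With the current string form, the default splitters value reads naturally as a regular expression in which | separates the alternatives. Treating it that way is an assumption made for this sketch, and split_cell is a hypothetical function rather than TVD2RDF's own code.

```python
import re

# Default value from the config; "\n" here is a real new-line character.
SPLITTERS = ",\n|;\n|\n|,|;"


def split_cell(cell, splitters=SPLITTERS):
    """Hypothetical helper: split a cell into separate URI entries."""
    parts = re.split(splitters, cell)
    # Trim surrounding white space and drop any empty entries.
    return [part.strip() for part in parts if part.strip()]


print(split_cell("ex:parentA\nex:parentB"))
# ['ex:parentA', 'ex:parentB']
print(split_cell("ex:parentA, ex:parentB;ex:parentC"))
# ['ex:parentA', 'ex:parentB', 'ex:parentC']
```

Listing the two-character alternatives first means that a comma or semicolon followed by a new line is consumed as a single separator rather than leaving an empty entry between two separators.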