...
First, a brief summary of the differences between URL, URI, and IRI. A Uniform Resource Locator (URL) specifies the location of a specific resource, so it is used to locate that resource. This includes addresses, i.e., references to content that may change constantly, such as an HTML page or a database. According to RFC 1738 [1] from 1994, only letters, digits, and a small set of special characters from the US-ASCII character set may be used unencoded in a URL, which represents a significant restriction for internationalized applications today. The term base URL denotes the common prefix under which an address with a path is divided into several sub-addresses, for example, http://my.baseurl.com/cat1/func2. The https scheme instead of http for the URL has a special meaning because it causes the transfer to be encrypted, usually between the web server and the browser.
The Uniform Resource Identifier (URI) is used to uniquely identify specific resources globally. On the web, these are, for example, specific files or interfaces to services. They consist of a URL followed by a file or service name, optionally with additional query parameters or hash values, such as http://my.domain/db/customers?id=7 or http://my.domain/db/product#46510; URLs are therefore a subset of URIs. According to RFC 3986 [2], URIs are subject to the same character set restrictions as URLs.
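The parts of such an address can be made visible with a few lines of code. The following is only a minimal sketch using Python's standard library; the helper name describe and the dictionary keys are chosen for illustration and do not come from any of the RFCs.

```python
from urllib.parse import urlsplit


def describe(address: str) -> dict:
    """Break an address into the parts discussed in the text."""
    parts = urlsplit(address)
    return {
        "scheme": parts.scheme,      # e.g. http or https
        "host": parts.netloc,        # server name
        "path": parts.path,          # file or service name
        "query": parts.query,        # optional query parameters
        "fragment": parts.fragment,  # optional hash value after '#'
    }


# The two example addresses from the text.
for address in ("http://my.domain/db/customers?id=7",
                "http://my.domain/db/product#46510"):
    print(describe(address))
```

For the first address the query part is id=7, for the second the fragment is 46510; the scheme, host, and path have the same structure in both cases.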
The third in the group is the IRI, the Internationalized Resource Identifier. It was standardized by the IETF (Internet Engineering Task Force) in 2005 to meet modern requirements for global use. Like the URI, it is used to uniquely identify a specific resource worldwide, but without the restriction to the ASCII character set. According to RFC 3987 [3], Unicode characters are allowed in IRIs with a few exceptions. In the context of the semantic web, IRIs are composed of a namespace, a separator, and a so-called local name. The # character has established itself as the de facto standard separator, even though a few others are still allowed. Analogous to URL and URI, the namespace should actually be called IRL, i.e., International Resource Locator, but this term has never caught on. A valid IRI for a resource in an ontology is, for example, http://ont.enapso.com/erp#product_1. The namespace here is http://ont.enapso.com/erp# and the local name is product_1, referred to simply as name or identifier in the rest of this article. In contrast to URLs, the https scheme is not only uncommon for IRIs; because an IRI serves purely as an identifier, it also does not cause any encryption - neither of the transport nor of the content.
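The decomposition of an IRI into namespace and local name can be sketched in a few lines. The function name split_iri is hypothetical, and the sketch assumes that the separator is counted as part of the namespace, as in the example above.

```python
def split_iri(iri: str, separator: str = "#") -> tuple:
    """Split an IRI into (namespace, local name) at the last separator."""
    head, sep, tail = iri.rpartition(separator)
    if not sep:
        raise ValueError(f"no {separator!r} separator found in {iri!r}")
    # The separator stays with the namespace, as in the example in the text.
    return head + sep, tail


namespace, local_name = split_iri("http://ont.enapso.com/erp#product_1")
print(namespace)   # http://ont.enapso.com/erp#
print(local_name)  # product_1
```

Ontologies that use another separator could pass it as the second argument, for example split_iri(iri, "/").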
...
For the interoperability of ontologies, it is considered good practice to use only those namespaces for which you are also the official domain owner. In the example ontology for this text, this could be, for example, http://ont.enapso.com/[...]. Although the validity of namespaces is currently not checked by name servers (DNS) or by an official institution when ontologies are published, using namespaces such as http://foo.bar/ risks errors in applications or for users, because the resources can no longer be uniquely referenced globally and therefore cannot be used in internationally composed knowledge graphs.
...
The abbreviation RDFS stands for Resource Description Framework Schema [5], a semantic extension of the RDF vocabulary specifically for data modeling. It contains mechanisms especially for grouping resources and their relationships to each other. It is comparable to the class system known from object-oriented programming (OOP), but with an essential extension: while OOP defines which properties a class has and may have, an RDF schema can specify for a property to which class an individual is automatically assigned if it contains or references that property. RDFS also supports subClassOf and subPropertyOf to organize classes and properties hierarchically. RDFS plus, finally, is an extended version of RDFS that supports symmetric, inverse, and transitive properties. These concepts, which have no counterpart in OOP, will be explained in more detail below. Many of the RDFS and RDFS plus components are also part of the even more expressive Web Ontology Language (OWL).
...
The abbreviation OWL stands for Web Ontology Language [6]. The language was specially designed for the semantic web to represent knowledge about objects and classes (as groups of objects) and their relationships to each other. Ontologies are based on RDF and OWL and can be read and modified with SPARQL as a query language. Reasoners support OWL.
...
An explicit assignment of an individual to one or more classes is done via type statements. Here, corresponding statements per individual determine its memberships:
enapsodnp:Max rdf:type enapsodnp:Person
enapsodnp:Max rdf:type enapsodnp:Freelancer
enapsodnp:Max rdf:type enapsodnp:Developer
For graph databases that hold many instances without extensive semantics, and that must be processed with simple CRUD operations (Create, Read, Update, and Delete) even without a reasoner, this is already a sufficient and practicable approach. A great advantage of this type of modeling is that, referring to the above example, all persons, or alternatively all freelancers or all developers, can now be queried very easily with a single command. As a further advantage, you can also query all individuals of the class Personnel. For this, however, you need a reasoner with RDFS support. RDFS defines the property subClassOf and thus allows the creation of class hierarchies or taxonomies:
enapsodnp:Employee rdfs:subClassOf enapsodnp:Personnel
enapsodnp:Freelancer rdfs:subClassOf enapsodnp:Personnel
An important realization from this is that it is now not necessary to explicitly define for each person that he or she is a member of the class Personnel, but that this is done implicitly and only once via the central subClassOf definition within the taxonomy. This task is performed by the reasoner. It uses two independent pieces of information, namely "Max is a member of Freelancer" and "Freelancer is a subclass of Personnel", and logically concludes: Max is a member of Personnel. This conclusion is called inference and represents one of the strengths of semantic databases. To query the knowledge efficiently, the database internally generates temporary triples and also manages them. This is also the reason why semantic graph databases require more memory than pure triple stores when inference is used intensively. However, exports can be performed with or without the inferred triples, for example, to convert OWL 2 ontologies with reasoning support into simple RDF graphs without losing the automatically generated additional information. You do not have to worry about managing this information yourself; the database with the reasoner takes care of it automatically.
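How a reasoner arrives at such a conclusion can be illustrated with a deliberately small sketch. It is not how a real semantic database is implemented: the triples from the text are simply held as Python tuples, and the helper names infer_types and members are invented for this illustration. The only rule applied is the subClassOf rule described above.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

# Explicit triples from the text as (subject, predicate, object) tuples.
explicit = {
    ("enapsodnp:Max", TYPE, "enapsodnp:Person"),
    ("enapsodnp:Max", TYPE, "enapsodnp:Freelancer"),
    ("enapsodnp:Max", TYPE, "enapsodnp:Developer"),
    ("enapsodnp:Employee", SUBCLASS, "enapsodnp:Personnel"),
    ("enapsodnp:Freelancer", SUBCLASS, "enapsodnp:Personnel"),
}


def infer_types(triples):
    """Return the triples plus all rdf:type triples implied by rdfs:subClassOf."""
    closure = set(triples)
    changed = True
    while changed:  # repeat until no new triple appears
        changed = False
        for s, p, cls in list(closure):
            if p != TYPE:
                continue
            for sub, q, sup in list(closure):
                if q == SUBCLASS and sub == cls and (s, TYPE, sup) not in closure:
                    closure.add((s, TYPE, sup))  # the inferred triple
                    changed = True
    return closure


def members(triples, cls):
    """List all individuals typed with the given class."""
    return sorted(s for s, p, o in triples if p == TYPE and o == cls)


inferred = infer_types(explicit)
print(members(explicit, "enapsodnp:Personnel"))  # [] without inference
print(members(inferred, "enapsodnp:Personnel"))  # ['enapsodnp:Max']
print(sorted(inferred - explicit))               # the additional triple
```

The difference between the two sets corresponds to the temporary triples mentioned above: they exist only because the reasoner derived them, and an export can include or omit them.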
...
Another special feature of semantic graph databases is the implicit classification of individuals using properties. Like all information in a graph, properties are also mapped via RDF triples, for example in the following form:
enapsodnp:EbnerVerlag rdf:type enapsodnp:Company
enapsodnp:SemanticDatabases rdf:type enapsodnp:Document
enapsodnp:EbnerVerlag enapsodnp:publishedArticle enapsodnp:SemanticDatabases
The individual EbnerVerlag is a member of the class Company and the individual SemanticDatabases belongs to the class Document. Suppose there were also the classes Publisher and Article. The property publishedArticle can be used to classify the subject, here EbnerVerlag, as well as the object, here SemanticDatabases. In RDF Schema, the so-called domain and range axioms were introduced for this purpose. Axioms describe facts, so the reasoner also considers them independently of the Open World Assumption. In particular, the range axiom is often confused with a value restriction, but this is not about restrictions or validations, but about the classification of individuals. While the domain axiom controls the classification of the subject, the range axiom determines the classification of the object. Here is an example:
enapsodnp:publishedArticle rdfs:domain enapsodnp:Publisher
enapsodnp:publishedArticle rdfs:range enapsodnp:Article
The first triple states that any subject that uses the property publishedArticle is a Publisher (axiom). The second triple expresses that any object referenced by this property is an Article. Through the following statement, EbnerVerlag thus implicitly becomes a member of the class Publisher and SemanticDatabases a member of the class Article:
enapsodnp:EbnerVerlag enapsodnp:publishedArticle enapsodnp:SemanticDatabases
Conversely, a query for Publisher also returns EbnerVerlag and a query for Article also returns SemanticDatabases, although this was not explicitly defined for either of the two individuals. What on the one hand is an extremely useful and welcome feature - after all, it saves many redundant declarations and thus ultimately a lot of maintenance effort - harbors a certain danger on the other hand. If, for example, the statement enapsodnp:EbnerVerlag enapsodnp:publishedArticle enapsodnp:Max is made, the reasoner automatically infers that Max is an Article. So a certain amount of care should be taken here. Since the reasoner does not automatically detect such semantic errors, they are difficult to find and fix later in extensive ontologies. The Shapes Constraint Language (SHACL [7]), which is based on RDF graphs, is a suitable tool to uncover type violations, among other things.
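The effect of the domain and range axioms, including the unwanted classification of Max, can be reproduced with a small sketch. As a simplification, it assumes at most one domain and one range per property; the function name infer_domain_range is hypothetical and the code only imitates this one aspect of a reasoner.

```python
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"
PUBLISHED = "enapsodnp:publishedArticle"

graph = {
    ("enapsodnp:EbnerVerlag", TYPE, "enapsodnp:Company"),
    ("enapsodnp:SemanticDatabases", TYPE, "enapsodnp:Document"),
    (PUBLISHED, DOMAIN, "enapsodnp:Publisher"),
    (PUBLISHED, RANGE, "enapsodnp:Article"),
    ("enapsodnp:EbnerVerlag", PUBLISHED, "enapsodnp:SemanticDatabases"),
    # The careless statement from the text:
    ("enapsodnp:EbnerVerlag", PUBLISHED, "enapsodnp:Max"),
}


def infer_domain_range(triples):
    """Add the rdf:type triples implied by rdfs:domain and rdfs:range."""
    # Assumption: one domain and one range per property.
    domains = {s: o for s, p, o in triples if p == DOMAIN}
    ranges = {s: o for s, p, o in triples if p == RANGE}
    result = set(triples)
    for s, p, o in triples:
        if p in domains:
            result.add((s, TYPE, domains[p]))  # classify the subject
        if p in ranges:
            result.add((o, TYPE, ranges[p]))   # classify the object
    return result


for triple in sorted(infer_domain_range(graph) - graph):
    print(triple)
```

Among the inferred triples is the classification of Max as an Article. Nothing in the inference step flags this as an error, which is exactly why a separate validation step is needed.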
...
While object properties are descriptions of relationships between individuals, data properties describe certain characteristics of a particular individual. They are comparable to the data fields of an object in OOP or value columns from SQL databases. In an RDF graph, data properties - like all other statements - are represented by triples. Example:
enapsodnp:EbnerVerlag enapsodnp:companyName "Ebner Media Group GmbH & Co. KG"^^xsd:string
...
In OWL2, a variety of data types are available for data properties [8]. The numeric ones include the following:
owl:real
owl:rational
xsd:decimal
xsd:integer
xsd:nonNegativeInteger
xsd:nonPositiveInteger
xsd:positiveInteger
xsd:negativeInteger
xsd:long
xsd:int
xsd:short
xsd:byte
xsd:unsignedLong
xsd:unsignedInt
xsd:unsignedShort
xsd:unsignedByte
xsd:double
xsd:float
The string types include the following variants:
xsd:string
xsd:normalizedString
xsd:token
xsd:language
xsd:Name
xsd:NCName
xsd:NMTOKEN
Boolean, binary, and time types:
xsd:boolean
xsd:hexBinary
xsd:base64Binary
xsd:dateTime
xsd:dateTimeStamp
The character sequence Hello World would be 48656c6c6f20576f726c64 in hexBinary format and SGVsbG8gV29ybGQ= in base64. The two date/time types are noted in ISO 8601 format [9]. The difference between the two is that for dateTime the specification of the time zone is optional, but for dateTimeStamp it is required. A valid dateTime value is, for example, 2020-02-15T12:45:00Z; a valid dateTimeStamp value is 2020-02-15T12:45:00-05:00. In OWL, it is also possible to create your own data types. A follow-up article will go into more detail on this topic and the benefits for applications.
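The two binary encodings and the time zone rule can be checked with Python's standard library. This is only a sketch: the regular expression is a simplified assumption that looks at the time zone suffix alone and does not validate the complete lexical form, and the name has_timezone is invented for illustration.

```python
import base64
import re

data = "Hello World".encode("utf-8")
print(data.hex())                       # 48656c6c6f20576f726c64
print(base64.b64encode(data).decode())  # SGVsbG8gV29ybGQ=

# Simplified check for a time zone suffix: "Z" or an offset such as -05:00.
TIMEZONE = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def has_timezone(lexical: str) -> bool:
    """True if the value ends with a time zone, as dateTimeStamp requires."""
    return TIMEZONE.search(lexical) is not None


for value in ("2020-02-15T12:45:00Z",
              "2020-02-15T12:45:00-05:00",
              "2020-02-15T12:45:00"):
    print(value, has_timezone(value))
```

The first two values carry a time zone and would therefore be acceptable for both types; the third, without a time zone, would only be acceptable as a dateTime.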
...