Semantic Web, Part 3 - From Model to Database
When creating a model for a graph database, the Protégé tool helps with axioms, data and object properties, individuals, annotations and internationalization. The foundations of the "Semantic Web" were already presented in two previous articles:
Part 1 covered the standards RDF and OWL, class hierarchies, taxonomies, data and object properties with their constraints and semantic metadata, individuals (the actual objects in a semantic graph database), reasoners that ensure data consistency, and inference that enables logical conclusions and machine learning (https://www.linkedin.com/pulse/working-knowledge-innotrade-gmbh-rnq0e/).
Part 2 introduced practical work with ontologies, i.e., creating a semantic data model using the Protégé software. It demonstrated how to manage classes, properties, individuals, and annotations with Protégé, implement schemas and classifications, use restrictions and the reasoner, and import and export models (https://www.linkedin.com/pulse/mastering-knowledge-graphs-getting-started-semantic-modelling-vooce).
This third part focuses on synchronizing these models with a semantic graph database and using them for knowledge management. It continues directly from the second part.
Protégé: A Mature Ontology Modeling Tool
Protégé is a highly mature tool and one of the most frequently used free tools for modeling ontologies. It is maintained by Stanford University and freely available on their website (https://protege.stanford.edu). The software has established itself as a modeling tool whose models, generated in various formats, are supported by all W3C-compliant semantic databases.
For simple testing purposes, the program offers a plug-in for SPARQL queries. As the name suggests, it can execute SPARQL 1.0-compliant SELECT queries. The plug-in does not (yet) support SPARQL 1.1 Update operations such as INSERT or DELETE.
Protégé clearly positions itself as a technical, UI-oriented modeling tool, leaving both operational use and programmatic data management to database and app vendors.
Global Axioms vs. Property Constraints
It's important to know that both Range and Domain axioms (see https://www.linkedin.com/pulse/working-knowledge-innotrade-gmbh-rnq0e) have a global effect when defined directly for properties, with corresponding implications for the ontology.
If, for example, you provide a specific property companyName with a domain axiom Company, the reasoner ensures that every individual that uses companyName automatically becomes a member of the Company class - without requiring an explicit class assignment such as:
dnp:Company_EbnerVerlag rdf:type dnp:Company
This is the desired and helpful side of inference.
If, on the other hand, you assign a generic property name with a domain axiom Product, all individuals that use this property also automatically become members of the Product class. Consequently, an individual representing a person that uses the name property would also become a Product object, which is logically correct but semantically nonsensical.
For generic properties with a general character, one should be cautious with Domain axioms to avoid later surprises through unexpected class assignments. A good practice is to use axioms as a tool for automatic classification of individuals - Range axioms for classifying objects and Domain axioms for classifying subjects - and not for validation or as (type) constraints.
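The global effect of a domain axiom can be illustrated outside of Protégé with a minimal Python sketch. It is not a real reasoner; it only applies the RDFS domain rule (if a property has the domain C, every subject that uses the property is typed as C) to a few triples held as plain tuples. The individual dnp:Person_A, the literal values and the helper infer_types are assumptions made for illustration.

```python
# Minimal sketch of the RDFS domain rule, not a real reasoner.
# Triples are plain (subject, predicate, object) tuples.

RDF_TYPE = "rdf:type"

# Domain axioms defined globally on the properties (assumed example data).
domain_axioms = {
    "dnp:companyName": "dnp:Company",  # specific property: helpful
    "dnp:name": "dnp:Product",         # generic property: risky
}

# Asserted triples; dnp:Person_A is a hypothetical individual.
asserted = {
    ("dnp:Company_EbnerVerlag", "dnp:companyName", "Ebner Verlag"),
    ("dnp:Person_A", "dnp:name", "Alexander Schulze"),
}


def infer_types(triples, domains):
    """Return the rdf:type triples implied by the domain axioms."""
    inferred = set()
    for subject, predicate, _ in triples:
        if predicate in domains:
            inferred.add((subject, RDF_TYPE, domains[predicate]))
    return inferred


inferred = infer_types(asserted, domain_axioms)
for triple in sorted(inferred):
    print(triple)
```

The company is classified as intended, while the person is classified as a Product simply because it uses the generic name property.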
If you want to specify locally for a class that a property should be restricted to a certain data type or value range, define this as a Type/Value Restriction at the class level. The article will go into more detail on this when specifying classes later.
Creating Object Properties
While data properties hold concrete values of various data types (so-called literals) for individuals, object properties reference other individuals. They are the basis for the relationships between the individuals in a graph, for example the relationship between an invoice and the associated customer.
In RDF, all entities, including all individuals and properties, are uniquely identified by their IRIs. The triples for the representation of relations are correspondingly simple. An example:
dnp:Invoice_A dnp:hasCustomer dnp:Customer_B
The object property hasCustomer links invoice A with customer B. To create such a property, click on the Object properties tab at the bottom of the Protégé UI. Like the data properties, all object properties in OWL are derived from owl:topObjectProperty. Select owl:topObjectProperty and click the Add sub property button. Protégé prompts you to enter the name of the object property (Fig. 1).
Classes - Classification vs. Schemas
The next step after creating the taxonomy, data and object properties is to specify the individual classes with their respective properties and constraints. Previously, we looked at how domain and range axioms are used in ontologies to automatically classify individuals, i.e. to automatically group individuals into members of one or more classes: implicit single or multiple classification.
Coming from the OOP world, however, one is tempted to regard the classes in an ontology not as groups of individuals, but as schemas, as a predefined framework of properties and their constraints, which the instances of the classes must adhere to. In general, however, semantic databases are initially schema-less, which means that each individual can have any properties. Individuals also do not necessarily have to have one or even several classifications, neither explicit ones using rdf:type nor implicit ones using domain and range axioms.
What is ideal for mapping complex real-world environments and knowledge domains due to its openness and freedom may seem impractical compared to the tables and collections from the SQL and NoSQL world. ERP systems, for example, usually have a large number of similar instances such as customers, products or invoices, all of which are subject to their own schemas. However, OWL also offers mechanisms for defining schemas, albeit from a slightly different perspective than perhaps expected.
Specifying Classes and Properties
While traditional schemas say, for example, "A person has a name", ontologies express this as "A person is a subclass of all things that have a name". This expression comes from Description Logic (DL) and sounds a bit confusing, but basically means the same thing. Since there is no concrete class Thing with a name, it is technically inheritance from a so-called anonymous class. Because multiple inheritance is an integral part of OWL, defining multiple properties per class according to this concept is not a problem, but an intended and established modeling practice.
Figure 2 shows the Product class with its data properties. In accordance with the DL/OWL conventions mentioned, these are not to be found in the properties, but in the SubClass Of area.
The data types are specified as so-called data restrictions (Fig. 3), a kind of type and value constraints. The information on the cardinality (the permitted number) of properties is an additional feature of OWL. Two aspects are important here:
Firstly, the restrictions - in contrast to the domain and range axioms - are applied at class level and not at property level. They therefore do not apply globally, but only to the corresponding class.
Secondly, unlike the axioms, restrictions do not lead to an automatic classification of the individuals of the class. They are primarily descriptive in nature and do not create any new triples, but are taken into account by the reasoner in the consistency checks.
Due to the Open World Assumption (OWA), however, this is not entirely comparable with a validation schema. If a minimum cardinality of 1 is specified in the example and an individual does not have a corresponding property, the reasoner will not recognize this as a violation, because the OWA states that everything that is not explicitly specified is simply unknown, but not necessarily false. The triple for the unknown property could be in a different ontology or database. SHACL (https://www.w3.org/TR/shacl/) can help here; more on this in a subsequent article.
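The difference between the open-world reading of a reasoner and the closed-world reading of a validator can be sketched in a few lines of Python. This is a simplified illustration and mimics neither an actual OWL reasoner nor a SHACL engine; the individual dnp:Product_1, its values and the two check functions are assumptions.

```python
# Sketch: the same minimum cardinality read under two assumptions.
# Simplified on purpose; not a real reasoner or SHACL engine.

# Hypothetical individual without a productName value.
triples = {
    ("dnp:Product_1", "rdf:type", "dnp:Product"),
    ("dnp:Product_1", "dnp:productCode", "BR-100"),
}


def count_values(triples, subject, prop):
    """Count the values asserted for one property of one subject."""
    return sum(1 for s, p, _ in triples if s == subject and p == prop)


def open_world_check(triples, subject, prop, min_count):
    """Open world: missing values are unknown, never a violation."""
    known = count_values(triples, subject, prop)
    return "satisfied" if known >= min_count else "unknown"


def closed_world_check(triples, subject, prop, min_count):
    """Closed world (validation style): missing values are violations."""
    known = count_values(triples, subject, prop)
    return "satisfied" if known >= min_count else "violation"


owa = open_world_check(triples, "dnp:Product_1", "dnp:productName", 1)
cwa = closed_world_check(triples, "dnp:Product_1", "dnp:productName", 1)
print(owa, cwa)
```

The same data yields "unknown" under the open-world reading and "violation" under the closed-world reading, which is why restrictions alone do not replace validation.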
The same applies to the object properties. The important difference is that the restriction filler offers a class in the taxonomy (the class hierarchy) for selection in the right-hand area instead of the data types. The cardinality specification affects the object properties in the same way as the data properties.
Creating Individuals
The classes with their restrictions have been created. Now it is the turn of the individuals, the actual data records in the ontology. In Protégé, switch to the Individuals by class tab and to the Direct Instances area at the bottom left. It shows all existing individuals of the class selected above, in Figure 4 using the example of Product.
Note that in the dotnetpro sample ontology for this article (https://github.com/innotrade/enapso-dotnetpro), all products are labeled and the Protégé renderer shows the labels in the screenshot. If the mouse pointer hovers over a product, the tooltip shows its complete IRI.
IRI assignment for the typically large number of individuals works differently than for classes and properties, which are unique per ontology. While classes and properties are usually given unique and human-readable identifiers, the uniqueness of the IRI per resource would no longer be guaranteed if there were several individuals with possibly the same name. An IRI dnp:AlexanderSchulze would inevitably lead to conflicts with an individual of the same name in the database.
A good practice for assigning IRIs for individuals is therefore the combination of class name and a UUID, a kind of primary key for the corresponding resource. Fortunately, Protégé supports the generation of IRIs with UUIDs quite conveniently.
To create a new individual, click on the Add button above the list and then on New Entity Options. Protégé offers a wizard that can be used to configure the assignment of the IRI (Figure 5).
Next, select the Autogenerated ID option and enter the desired prefix for the IRI below, in this case Product_. As soon as you enter a name for the new individual (Fig. 6), Protégé creates a new UUID with every letter you type and appends it to the selected prefix.
As can be seen in the screenshot, Protégé automatically creates a label for the relevant product in this case and displays it accordingly in the user interface. This technique therefore guarantees both the global uniqueness of individuals and legibility for the administrator.
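Outside of Protégé, the same naming scheme (class name plus UUID, accompanied by a readable label) could be sketched in Python as follows. The namespace http://example.org/dnp# and the helper names are assumptions; the underscore-separated UUID follows the IRI style used for products in the sample ontology.

```python
import uuid

# Assumed namespace for illustration; the article only shows the dnp: prefix.
NAMESPACE = "http://example.org/dnp#"


def new_individual_iri(class_name, namespace=NAMESPACE):
    """Build an IRI from the class name and a random UUID."""
    # Underscores instead of hyphens, as in the sample ontology's IRIs.
    unique_part = str(uuid.uuid4()).replace("-", "_")
    return f"{namespace}{class_name}_{unique_part}"


def new_individual(class_name, label, lang="en"):
    """Return the generated IRI together with a language-tagged label."""
    return {"iri": new_individual_iri(class_name), "label": (label, lang)}


first = new_individual("Product", "Blu-ray player, HD, including cable")
second = new_individual("Product", "Blu-ray player, HD, including cable")
print(first["iri"])
print(second["iri"])
```

Both individuals carry the same label but receive different IRIs, so equal names no longer lead to conflicts.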
Adding Annotations
Annotations are statements that can be attached to any entity (class, individual or property) without affecting its semantics. They can be used, for example, for comments, to specify authors or version numbers, but also for translations into different national languages. They are treated like data and can therefore be queried and manipulated via SPARQL. Reasoners do not take annotations into account.
To create an annotation for an entity, select the entity on the left-hand side of the Protégé UI. Its annotations then appear on the right in the tab of the same name. Figure 7 shows an example of the data property purchasePrice, for which two label and two comment annotations have already been created in German and English.
To create a new annotation, click the Add button; to change an existing one, use the Edit button behind the annotation in question. Protégé prompts you to enter the type of annotation and its value. Figure 8 shows the entry of documentation in English for the purchasePrice property in the form of a comment annotation.
This example makes it clear that metadata directly integrated into ontologies not only simplifies their cross-cultural documentation for developers, but also the maintenance of content for international target groups. OWL also makes it possible to create any number of other annotation types that can also be internationalized.
Collaboration and I18N for Ontologies
One feature of Protégé is particularly useful for collaborating on ontologies in international teams: the rendering of the identifiers of all entities is configurable. For example, you can determine whether you want to display the complete IRI, the named prefix notation or an annotation in the UI - in the case of the annotation, even in which language.
A good practice in terms of documentation and international usability of ontologies is therefore to provide each class, property and individual with at least one comment and one label annotation in English, ideally also in other languages, depending on the distribution of the team or the scope of the target group.
To configure the renderer, select the Renderer tab under Preferences in the Protégé main menu and then the Render by annotation property option. Figure 9 shows the settings dialog.
By clicking on Configure ... you can then specify which annotation type is to be used to display the interface and which language is to be used (Fig. 10).
The ontology can then be easily "translated" to the desired language, which also makes it very easy to maintain ontologies across cultures. Figure 11 shows an example of how the taxonomy is now rendered in German.
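The idea behind rendering by annotation and language can be sketched in Python. This is not Protégé's implementation; the label values, the language fallback order and the render helper are assumptions chosen for illustration.

```python
# Sketch of "render by annotation property" with a language preference.
# Labels and fallback behavior are assumptions, not Protégé internals.

labels = {
    "dnp:purchasePrice": [("purchase price", "en"), ("Einkaufspreis", "de")],
    "dnp:Product": [("Product", "en"), ("Produkt", "de")],
    "dnp:hasCustomer": [],  # entity without any label
}


def render(entity, labels, languages):
    """Return the first label that matches the preferred languages.

    Falls back to the prefixed name if no such label exists.
    """
    available = {lang: text for text, lang in labels.get(entity, [])}
    for lang in languages:
        if lang in available:
            return available[lang]
    return entity


german_view = [render(entity, labels, ["de", "en"]) for entity in labels]
english_view = [render(entity, labels, ["en"]) for entity in labels]
print(german_view)
print(english_view)
```

Switching the language list is all it takes to "translate" the view; entities without a label remain identifiable by their prefixed name.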
Since most of the identifiers for classes and properties in your ontology are probably recorded in English anyway, the use and maintenance of labels for each entity may seem cumbersome at first. However, if you manage a large number of individuals in Protégé, you will quickly come to appreciate the labels. Think of IRIs with UUIDs for products like the following:
Product_c2318c23_2db8_4c8b_9dbf_cd20970d7723
The actual product cannot be practically identified in the Protégé UI. However, if you provide it with a label, it will be displayed with this label. In the example ontology, it looks like this:
Blu-ray player, HD, including cable
What is still done manually in Protégé can easily be automated in applications. A good practice is to define a template for each class, on the basis of which the labels are automatically composed of certain fields of the respective individuals. For a product, for example, the label could be composed of the fields productCode and productName. More on this in a later article on programming with ontologies and semantic graph databases.
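A per-class label template could look like the following Python sketch. The template syntax, the product code BR-100 and the helper compose_label are assumptions made for illustration, not part of the sample ontology.

```python
# Sketch: compose labels from a per-class template.
# Template syntax and field values are assumptions for illustration.

LABEL_TEMPLATES = {
    "Product": "{productCode} - {productName}",
}


def compose_label(class_name, fields, templates=LABEL_TEMPLATES):
    """Fill the class template with the individual's field values."""
    return templates[class_name].format(**fields)


label = compose_label(
    "Product",
    {"productCode": "BR-100", "productName": "Blu-ray player, HD"},
)
print(label)
```

Keeping the templates in one place means that a changed label format only has to be adjusted once per class.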
Data Properties versus Annotations
Since the name of individuals is usually arbitrary and therefore neither semantically nor relationally relevant, the question arises as to whether annotations should be used for such information instead of data properties. Since values for annotations in SPARQL queries can be determined in a technically similar way to those of data properties, this seems legitimate, especially to avoid unnecessary redundancies.
For predominantly static and manually maintained lookup lists or enumerations, this is also perfectly justifiable. For dynamic datasets, however, consider the manual effort involved in creating mixed queries from annotations and properties. In addition, annotations are not subject to any restrictions, neither on their values nor on their cardinality (number), which ultimately limits the validation options of the apps and increases the risk of ambiguities or inconsistencies.
Queries against individuals of classes based purely on properties, on the other hand, can be easily generated programmatically and therefore automatically. Individuals can be validated programmatically against the property constraints and even SHACL shapes can be generated automatically based on this. This opens up enormous potential for automation and productivity increases, which will be discussed in more detail in a follow-up article.
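How such a query could be generated from a class and its property list is sketched below in Python. The namespace IRI, the property names and the helper build_select are assumptions; the sketch only assembles the query text and does not send it to a database.

```python
# Sketch: generate a SPARQL SELECT query from a class's property list.
# Namespace IRI and property names are assumptions for illustration.

PREFIX = "PREFIX dnp: <http://example.org/dnp#>"


def build_select(class_name, properties):
    """Build a SELECT query returning the given properties per individual."""
    variables = " ".join(f"?{prop}" for prop in properties)
    lines = [PREFIX, f"SELECT ?iri {variables}", "WHERE {"]
    lines.append(f"  ?iri a dnp:{class_name} .")
    for prop in properties:
        # OPTIONAL keeps individuals that lack a value for the property.
        lines.append(f"  OPTIONAL {{ ?iri dnp:{prop} ?{prop} . }}")
    lines.append("}")
    return "\n".join(lines)


query = build_select("Product", ["productCode", "productName"])
print(query)
```

Because every value comes from a property, the query can be derived entirely from the class definition, without special handling for annotations.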
Exporting Models
To conclude this article, let's take a look at how the models created in Protégé can be exported for use in W3C-compliant graph databases. A model can simply be saved from Protégé in one of the standard formats offered, such as RDF/XML, Turtle or JSON-LD. To do this, select the menu item File | Save As and choose the desired format.
Semantic graph databases such as GraphDB from Ontotext have various adapters that make importing the model much easier. In practice, it is advisable to manage the model and instances in separate sub-graphs. This makes it very easy to update the model graph in the database after changes in Protégé without jeopardizing the existing data.
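The benefit of separate sub-graphs can be illustrated with a small in-memory Python sketch. The graph IRIs, the triples and the helper replace_graph are assumptions; a real database would perform this step through its import tooling or SPARQL Update.

```python
# Sketch: model and instances kept in separate named graphs.
# Graph IRIs and triples are assumptions; this is an in-memory stand-in
# for what a graph database would do on import.

MODEL_GRAPH = "http://example.org/graph/model"
DATA_GRAPH = "http://example.org/graph/data"

store = {
    MODEL_GRAPH: {("dnp:Product", "rdf:type", "owl:Class")},
    DATA_GRAPH: {("dnp:Product_1", "rdf:type", "dnp:Product")},
}


def replace_graph(store, graph_iri, new_triples):
    """Swap the content of one named graph, leaving the others untouched."""
    store[graph_iri] = set(new_triples)


# Updated model as it might be exported from Protégé after a change.
updated_model = {
    ("dnp:Product", "rdf:type", "owl:Class"),
    ("dnp:Invoice", "rdf:type", "owl:Class"),
}

data_before = set(store[DATA_GRAPH])
replace_graph(store, MODEL_GRAPH, updated_model)
print(len(store[MODEL_GRAPH]), store[DATA_GRAPH] == data_before)
```

Replacing the model graph as a whole avoids having to work out which model triples changed, while the instance graph stays exactly as it was.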
Conclusion
Protégé is a sophisticated ontology modeling tool and an indispensable assistant for creating and maintaining W3C-compliant semantic data models and ontologies. It provides useful functions for managing classes, taxonomies, properties and constraints in a convenient and configurable UI for developers, without having to worry about the underlying RDF, RDFS and OWL triples and their representations in the various file formats.
Protégé is written in Java, so it runs on all platforms. A large number of plug-ins are available, including those for SPARQL, SHACL and different and also configurable reasoners. Protégé can therefore meet the wide range of requirements for developing an ontology, including annotations and prefixes, IRI automation and internationalization, right through to merging multiple ontologies in catalogs to create comprehensive knowledge graphs.
Protégé was designed as a semantic modeling tool with an import/export interface for all common W3C-compliant file formats. Protégé leaves the efficient operation of the actual database with millions and billions of triples to established manufacturers such as Ontotext or Stardog, just as the actual knowledge management and app development is left to the software and knowledge graph developers.
The next article will show you how to set up and operate a semantic graph database, as well as how to conveniently manage data and knowledge in it and make it available for applications. So stay tuned!
References
[1] Alexander Schulze, Working with Knowledge Instead of Data, Semantic Web Part 1, dotnetpro 4/2020, page 78 ff., http://www.dotnetpro.de/A2004Semantik
[2] Alexander Schulze, The Model Comes First, Semantic Web Part 2, dotnetpro 5/2020, page 96 ff., http://www.dotnetpro.de/A2005Semantik
(C) Copyright 2014-2024 INNOTRADE GmbH, Herzogenrath, NRW, Germany (all rights reserved)