Semantic Web, Part 3: From Model to Database
When creating a model for a graph database, the Protégé tool helps with axioms, data and object properties, individuals, annotations, and internationalization. The foundations of the "Semantic Web" were presented in two previous articles in dotnetpro issues 4/2020 and 5/2020:
Part 1 covered the standards RDF and OWL, class hierarchies, taxonomies, data and object properties with their constraints and semantic metadata, individuals (the actual objects in a semantic graph database), reasoners that ensure data consistency, and inference that enables logical conclusions and thus machine learning [1] (https://www.linkedin.com/pulse/working-knowledge-innotrade-gmbh-rnq0e/).
Part 2 introduced practical work with ontologies, i.e., creating a semantic data model using the Protégé software [2]. It demonstrated how to manage classes, properties, individuals, and annotations with Protégé, implement schemas and classifications, use restrictions and the reasoner, and import and export models (https://www.linkedin.com/pulse/mastering-knowledge-graphs-getting-started-semantic-modelling-vooce).
This third part focuses on synchronizing these models with a semantic graph database and using them for knowledge management. It continues directly from the second part.
...
It's important to know that both Range and Domain axioms (see [2] https://www.linkedin.com/pulse/working-knowledge-innotrade-gmbh-rnq0e) have a global effect when defined directly for properties, with corresponding implications for the ontology.
If, for example, you provide a specific property companyName with a domain axiom Company, the reasoner ensures that every individual that uses companyName automatically becomes a member of the Company class, without requiring an explicit class assignment such as
dnp:Company_EbnerVerlag rdf:type dnp:Company
This is the desired and helpful side of inference.
If, on the other hand, you assign a generic property name with a domain axiom Product, all individuals that use this property also automatically become members of the Product class. Consequently, an individual that represents a person and uses the name property would also become a Product object, which is logically correct but semantically nonsensical.
With generic properties, be cautious with Domain axioms to avoid later surprises from unexpected class assignments. A good practice is to use axioms as a tool for the automatic classification of individuals - Range axioms for classifying objects and Domain axioms for classifying subjects - and not for validation or as (type) constraints.
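The classification effect described above can be illustrated with a minimal sketch. It models triples as plain Python tuples and applies only the domain rule; it is not how a real OWL reasoner is implemented, and the individual dnp:Person_A and the literal values are hypothetical.

```python
# Triples are modeled as plain (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_domain_types(triples):
    """Return the rdf:type triples implied by domain axioms."""
    # Collect the domain classes declared for each property.
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    # Every subject that uses a property becomes a member of its domain classes.
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
    return inferred - set(triples)


triples = [
    ("dnp:companyName", RDFS_DOMAIN, "dnp:Company"),
    ("dnp:name", RDFS_DOMAIN, "dnp:Product"),
    ("dnp:Company_EbnerVerlag", "dnp:companyName", "Ebner Verlag"),  # assumed literal
    ("dnp:Person_A", "dnp:name", "Alexander"),  # hypothetical individual
]
inferred = infer_domain_types(triples)
```

In this sketch, dnp:Company_EbnerVerlag is classified as a Company as intended, while dnp:Person_A ends up as a Product simply because it uses the generic name property.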
...
While data properties contain concrete values of various data types (so-called literals) for individuals, object properties reference other individuals. They are the basis for the relationships between the individuals in a graph, for example, the relationship between an invoice and its associated customer.
...
In RDF, all entities, including all individuals and properties, are uniquely identified by their IRIs. The triples for the representation of relations are correspondingly simple. An example:
dnp:Invoice_A dnp:hasCustomer dnp:Customer_B
The object property hasCustomer links invoice A with customer B. To create such a property, click on the Object properties tab at the bottom of the Protégé UI. Like the data properties, all object properties in OWL are derived from owl:topObjectProperty. Select owl:topObjectProperty and click the Add sub property button. Protégé prompts you to enter the name of the object property (Fig. 1).
...
Classes - Classification vs. Schemas
The next step after creating the taxonomy and the data and object properties is to specify the individual classes with their respective properties and constraints.
Specifying Classes and Properties
For object properties, the same principles apply. Previously, we looked at how domain and range axioms are used in ontologies to automatically classify individuals, i.e. to automatically group individuals into members of one or more classes: implicit single or multiple classification.
Coming from the OOP world, however, one is tempted to regard the classes in an ontology not as groups of individuals, but as schemas: a predefined framework of properties and their constraints to which the instances of the classes must adhere. In general, however, semantic databases are initially schema-less, which means that each individual can have any properties. Nor do individuals need to have one or more classifications, whether explicit ones using rdf:type or implicit ones using domain and range axioms.
What is ideal for mapping complex real-world environments and knowledge domains due to its openness and freedom may seem impractical compared to the tables and collections from the SQL and NoSQL world, as ERP systems, for example, usually have a large number of similar instances such as customers, products, or invoices, all of which are subject to their own schemas. However, OWL also offers mechanisms for defining schemas, albeit from a slightly different perspective than perhaps expected.
While traditional schemas say, for example, "A person has a name", ontologies express this as "A person is a subclass of all things that have a name". This phrasing comes from Description Logic (DL) and sounds a bit confusing, but it basically means the same thing. Since there is no concrete class Thing with a name, this is technically inheritance from a so-called anonymous class. Because multiple inheritance is an integral part of OWL, defining multiple properties per class according to this concept is not only unproblematic, but also an intended and established modeling practice.
Figure 2 shows the Product class with its data properties. In accordance with the DL/OWL conventions mentioned, these are found not among the properties, but in the SubClass Of area.
...
The data types are specified as so-called data restrictions (Fig. 3), a kind of type and value constraint. The cardinality (the permitted number) of properties can be specified as an additional feature of OWL. Two aspects are important here:
Firstly, the restrictions - in contrast to the domain and range axioms - are applied at class level and not at property level. They therefore do not apply globally, but only to the corresponding class.
Secondly, unlike the axioms, restrictions do not lead to an automatic classification of the individuals of the class. They are primarily descriptive in nature and do not create any new triples, but are taken into account by the reasoner in the consistency checks.
...
Due to the Open World Assumption (OWA), however, this is not entirely comparable with a validation schema. If a minimum cardinality of 1 is specified in the example and an individual does not have a corresponding property, the reasoner will not recognize this as a violation, because the OWA states that everything that is not explicitly specified is simply unknown, but not necessarily false. The triple for the unknown property could be in a different ontology or database. SHACL (https://www.w3.org/TR/shacl/) can help here; more on this in a subsequent article.
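The difference between the two readings of a minimum cardinality can be sketched as follows. This is only an illustration under simplifying assumptions (triples as tuples, one triple per value); it implements neither an OWL reasoner nor SHACL, and the two product individuals are hypothetical.

```python
def min_cardinality_status(triples, individual, prop, min_count, closed_world):
    """Evaluate a minimum cardinality under an open- or closed-world reading."""
    count = sum(1 for s, p, _ in triples if s == individual and p == prop)
    if count >= min_count:
        return "satisfied"
    # Open world: the missing value may exist elsewhere, so it is only unknown.
    # Closed world: what is not stated is treated as absent, hence a violation.
    return "violation" if closed_world else "unknown"


triples = [
    ("dnp:Product_A", "dnp:productName", "Blu-ray player"),  # hypothetical data
    ("dnp:Product_B", "dnp:productCode", "BR-100"),          # no productName
]

open_view = min_cardinality_status(
    triples, "dnp:Product_B", "dnp:productName", 1, closed_world=False)
closed_view = min_cardinality_status(
    triples, "dnp:Product_B", "dnp:productName", 1, closed_world=True)
```

The open-world result corresponds to what a reasoner concludes, while the closed-world result corresponds to what a validation layer would typically report.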
The same applies to object properties. The important difference is that the restriction filler in the right-hand area offers the classes of the taxonomy (the class hierarchy) for selection instead of the data types. The cardinality specification affects object properties in the same way as data properties.
Creating Individuals
With the classes and their restrictions set up, it's time to create the individuals, the actual data records in the ontology. In Protégé, switch to the "Individuals by class" tab and to the "Direct Instances" area at the bottom left. It shows all existing individuals of the class selected above, as shown in Figure 4 for the Product class.
...
Note that in the "dotnetpro" sample ontology for this article (https://github.com/innotrade/enapso-dotnetpro), all products are labeled, and the Protégé renderer shows the labels in the screenshot. Hovering the mouse pointer over an item shows its complete IRI in the tooltip.
In contrast to classes and properties, which are unique per ontology, IRI assignment for a large number of individuals is, as expected, different. While classes and properties usually receive unique and human-readable identifiers, the uniqueness of the IRI per resource would no longer be guaranteed if there were several individuals with possibly the same name. An IRI dnp:AlexanderSchulze would inevitably lead to conflicts with an individual of the same name in the database.
A good practice for assigning IRIs to individuals is therefore to combine the class name with a UUID, a kind of primary key for the corresponding resource. Fortunately, Protégé supports generating IRIs with UUIDs quite conveniently.
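Outside of Protégé, the same naming scheme could be produced in application code. The following is a minimal sketch using Python's standard uuid module; the helper name new_individual_iri is hypothetical, and replacing the hyphens with underscores is an assumption modeled on the product IRIs used in this article.

```python
import uuid


def new_individual_iri(class_name, prefix="dnp:"):
    """Build an IRI from the class name and a random UUID (hypothetical helper)."""
    # uuid4() yields a random UUID; hyphens are replaced by underscores here.
    suffix = str(uuid.uuid4()).replace("-", "_")
    return f"{prefix}{class_name}_{suffix}"


iri = new_individual_iri("Product")
```

Each call returns a different IRI with the same readable class prefix, so two products with identical names no longer collide.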
...
To create a new individual, click on the Add button above the list and then on "New Entity Options". Protégé offers a wizard for configuring the IRI assignment (Figure 5).
...
Next, select the Autogenerated ID option and enter the desired prefix for the IRI below, in this case Product_. As soon as you enter a name for the new individual (Fig. 6), Protégé creates a new UUID with every letter you type and appends it to the selected prefix.
...
As can be seen in the screenshot, Protégé automatically creates a label for the product in this case and displays it accordingly in the user interface. This technique therefore ensures both the global uniqueness of individuals and readability for the administrator.
Adding Annotations
Annotations are statements that can be attached to any entity (class, individual, or property) without affecting its semantics. They can be used, for example, for comments, to specify authors or version numbers, or for translations into different languages. They are treated like data and can therefore be queried and manipulated via SPARQL. Reasoners do not take annotations into account.
...
To create an annotation for an entity, select the entity on the left-hand side of the Protégé UI. Its annotations then appear on the right in the "Annotations" tab. Figure 7 shows an example for the data property "purchasePrice", for which two label and two comment annotations have already been created in German and English.
...
To create a new annotation, click the Add button; to change an existing one, use the Edit button behind the annotation in question. Protégé prompts you to enter the type of annotation and its value. Figure 8 shows the entry of documentation in English for the purchasePrice property in the form of a comment annotation.
...
This example makes it clear that metadata directly integrated into ontologies not only simplifies their cross-cultural documentation for developers, but also the maintenance of content for international target groups. OWL also makes it possible to create any number of other annotation types that can also be internationalized.
Collaboration and I18N for Ontologies
One feature of Protégé is particularly useful for collaborating on ontologies in international teams: the rendering of the identifiers of all entities is configurable. For example, you can determine whether the UI displays the complete IRI, the named prefix notation, or an annotation - in the case of an annotation, even in which language.
...
A good practice in terms of documentation and international usability of ontologies is therefore to provide each class, property, and individual with at least one comment and one label annotation in English, ideally also in other languages, depending on the distribution of the team or the scope of the target group.
...
To configure the renderer, select the Renderer tab under Preferences in the Protégé main menu and then the Render by annotation property option. Figure 9 shows the settings dialog.
...
By clicking on Configure ... you can then specify which annotation type is used for display in the UI and in which language (Fig. 10).
...
The ontology can then be easily "translated" to the desired language, which also makes it very easy to maintain ontologies across cultures. Figure 11 shows an example of how the taxonomy is now rendered in German.
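The idea behind rendering by annotation can be sketched in a few lines. The following is only an illustration of a language preference with fallback, not Protégé's actual renderer; the function name render_label, the dictionary layout, and the label texts are assumptions.

```python
def render_label(iri, labels, languages=("de", "en")):
    """Pick the first label in the preferred languages, else fall back to the IRI."""
    by_language = labels.get(iri, {})
    for language in languages:
        if language in by_language:
            return by_language[language]
    return iri


# Hypothetical label annotations, keyed by IRI and language tag.
labels = {
    "dnp:purchasePrice": {"en": "purchase price", "de": "Einkaufspreis"},
    "dnp:Product": {"en": "Product"},
}

german_first = render_label("dnp:purchasePrice", labels)
english_fallback = render_label("dnp:Product", labels)
no_label = render_label("dnp:Invoice", labels)
```

An entity without any label is still displayed, just less readably, which is one reason to maintain at least an English label for every entity.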
...
Since most of the identifiers for classes and properties in your ontology are probably recorded in English anyway, the use and maintenance of labels for each entity may seem cumbersome at first. However, if you manage a large number of individuals in Protégé, you will quickly come to appreciate the labels. Think of IRIs with UUIDs for products like the following:
Product_c2318c23_2db8_4c8b_9dbf_cd20970d7723
With such an IRI alone, the actual product can hardly be identified in the Protégé UI. However, if you provide it with a label, it will be displayed with this label. In the example ontology, it looks like this:
Blu-ray player, HD, including cable
What is still done manually in Protégé can easily be automated in applications. A good practice is to define a template for each class, on the basis of which the labels are automatically composed from certain fields of the respective individuals; for a product, for example, from the fields productCode and productName. More on this in a later article on programming with ontologies and semantic graph databases.
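Such a label template could look like the following minimal sketch. The template syntax, the LABEL_TEMPLATES mapping, the helper compose_label, and the product code value are all assumptions for illustration; only the field names productCode and productName come from the article.

```python
# One label template per class; the format is a hypothetical convention.
LABEL_TEMPLATES = {
    "Product": "{productCode} - {productName}",
}


def compose_label(class_name, fields):
    """Compose a label for an individual from its fields via the class template."""
    return LABEL_TEMPLATES[class_name].format(**fields)


label = compose_label(
    "Product",
    {
        "productCode": "BR-100",  # hypothetical product code
        "productName": "Blu-ray player, HD, including cable",
    },
)
```

Because the template is defined once per class, all labels of that class stay consistent when individuals are created or updated programmatically.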
Data Properties versus Annotations
Since the name of an individual is usually arbitrary and therefore neither semantically nor relationally relevant, the question arises as to whether annotations should be used for such information instead of data properties. Since annotation values can be retrieved in SPARQL queries in a technically similar way to those of data properties, this seems legitimate, especially to avoid unnecessary redundancies.
For predominantly static and manually maintained lookup lists or enumerations, this is also perfectly justifiable. For dynamic data sets, however, consider the manual effort involved in creating mixed queries from annotations and properties. In addition, annotations are not subject to any restrictions - neither on their values nor on their cardinality (number) - which ultimately limits the validation options of the apps and increases the risk of ambiguities or inconsistencies.
Queries against individuals of classes based purely on properties, on the other hand, can be easily generated programmatically and therefore automatically. Individuals can be validated programmatically against the property constraints and even SHACL shapes can be generated automatically based on this. This opens up enormous potential for automation and productivity increases, which will be discussed in more detail in a follow-up article.
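How a purely property-based query might be generated is sketched below. The function build_select_query is hypothetical, and the namespace IRI http://example.org/dnp# is a placeholder, since the actual namespace behind the dnp: prefix is not given here. The generated query also assumes that the class membership is stated explicitly or materialized by the reasoner.

```python
def build_select_query(class_name, properties, namespace="http://example.org/dnp#"):
    """Generate a SPARQL SELECT for all individuals of a class and their properties."""
    variables = " ".join(f"?{name}" for name in properties)
    patterns = "\n".join(f"  ?ind dnp:{name} ?{name} ." for name in properties)
    return (
        f"PREFIX dnp: <{namespace}>\n"
        f"SELECT ?ind {variables}\n"
        "WHERE {\n"
        f"  ?ind a dnp:{class_name} .\n"
        f"{patterns}\n"
        "}"
    )


query = build_select_query("Product", ["productCode", "productName"])
```

Since the property list can be read from the class restrictions in the model, no query has to be written by hand when a class gains or loses a property.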
Exporting Models
To conclude this article, let's take a look at how the models created in Protégé can be exported for use in W3C-compliant graph databases. A model can simply be saved from Protégé in one of the standard formats offered, such as RDF/XML, Turtle, or JSON-LD. To do this, select the menu item File | Save As and choose the desired format.
Semantic graph databases such as GraphDB from Ontotext have various adapters that make importing the model much easier. In practice, it is advisable to manage the model and instances in separate sub-graphs. This makes it very easy to update the model graph in the database after changes in Protégé without jeopardizing the existing data.
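The benefit of separate sub-graphs can be shown with a small in-memory sketch. It stands in for a real graph database, where the same effect would be achieved with named graphs; the graph names, the helper replace_graph, and the sample triples are assumptions.

```python
MODEL_GRAPH = "urn:example:model"         # hypothetical graph names
INSTANCE_GRAPH = "urn:example:instances"


def replace_graph(store, graph_name, triples):
    """Drop and reload a single graph; all other graphs stay untouched."""
    store[graph_name] = set(triples)
    return store


store = {
    MODEL_GRAPH: {("dnp:Product", "rdf:type", "owl:Class")},
    INSTANCE_GRAPH: {("dnp:Product_1", "rdf:type", "dnp:Product")},
}

# A new export from the modeling tool replaces only the model graph.
updated_model = [
    ("dnp:Product", "rdf:type", "owl:Class"),
    ("dnp:Invoice", "rdf:type", "owl:Class"),
]
replace_graph(store, MODEL_GRAPH, updated_model)
```

After the update, the model graph contains the new class, while the instance graph still holds exactly the data it held before.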
Conclusion
Protégé is a mature tool for modeling ontologies and an indispensable assistant for creating and maintaining W3C-compliant semantic data models. It provides useful functions for managing classes, taxonomies, properties, and constraints in a convenient and configurable UI, so developers do not have to worry about the underlying RDF, RDFS, and OWL triples and their various representations in the different file formats.
Protégé is written in Java, so it runs on all common platforms. A large number of plug-ins are available, including plug-ins for SPARQL and SHACL as well as various configurable reasoners. Protégé can therefore meet the wide range of requirements for developing an ontology, from annotations and prefixes through IRI automation and internationalization to merging multiple ontologies in catalogs to create comprehensive knowledge graphs.
Protégé was designed as a semantic modeling tool with an import/export interface for all common W3C-compliant file formats. Protégé leaves the efficient operation of the actual database with millions and billions of triples to established manufacturers such as Ontotext or Stardog, just as the actual knowledge management and app development is left to the software and knowledge graph developers.
The next article will show you how to set up and operate a semantic graph database, as well as how to conveniently manage data and knowledge in it and make it available for applications. So stay tuned!
References
[1] Alexander Schulze, Working with Knowledge Instead of Data, Semantic Web Part 1, dotnetpro 4/2020, page 78 ff., http://www.dotnetpro.de/A2004Semantik
[2] Alexander Schulze, The Model Comes First, Semantic Web Part 2, dotnetpro 5/2020, page 96 ff., http://www.dotnetpro.de/A2005Semantik
...