Semantic Web, Part 3: From Model to Database

Creating a model for a graph database using Protégé: axioms, object properties, individuals, annotations, and internationalization.

The foundations of the "Semantic Web" were presented in dotnetpro in the previous issues 4/2020 and 5/2020:

Part 1 covered the standards RDF and OWL, class hierarchies, taxonomies, data and object properties with their constraints and semantic metadata, individuals (the actual objects in a semantic graph database), reasoners that ensure data consistency, and inference that enables logical conclusions and thus machine learning [1].
Part 2 introduced practical work with ontologies, i.e., creating a semantic data model using the Protégé software [2]. It demonstrated how to manage classes, properties, individuals, and annotations with Protégé, implement schemas and classifications, use restrictions and the reasoner, and import and export models.

This third part focuses on synchronizing these models with a semantic graph database and using them for knowledge management. It continues directly from the second part.

Protégé: A Mature Ontology Modeling Tool

Protégé is a highly mature tool and one of the most frequently used free applications for modeling ontologies. It is maintained by Stanford University and available free of charge on its website [3]. The software has established itself as a modeling tool whose generated models, in various formats, are supported by all W3C-compliant semantic databases.

For simple testing purposes, the program offers a plug-in for SPARQL queries. It can execute SPARQL 1.0-compliant SELECT queries. The plug-in does not (yet) support SPARQL 1.1 Update operations such as INSERT or DELETE.

Protégé clearly positions itself as a UI-oriented modeling tool, leaving both operational use and programmatic data management to database and app vendors.

Global Axioms vs. Property Constraints

It's important to know that both Range and Domain axioms (see [2]) have a global effect when defined directly for properties, with corresponding implications for the ontology.

[Example of global axioms and their effects]

For generic properties with a general character, one should be cautious with Domain axioms to avoid later surprises through unexpected class assignments. A good practice is to use axioms as a tool for automatic classification of individuals - Range axioms for classifying objects and Domain axioms for classifying subjects - and not for validation or as (type) constraints.
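The classifying effect of Domain and Range axioms can be sketched in a few lines of Python. This is only a simplified illustration of the two rules involved, not how a real reasoner is implemented, and all identifiers (ex:hasCustomer, ex:Invoice, ex:DeliveryNote_1, and so on) are hypothetical examples that do not come from the article's ontology.

```python
# Simplified sketch: derive rdf:type statements from domain and range axioms.
# Triples are plain (subject, predicate, object) tuples; all ex: names are
# hypothetical and only serve as an illustration.

RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def infer_types(triples):
    """Return the rdf:type triples implied by domain and range axioms."""
    triples = set(triples)
    inferred = set()
    for prop, axiom, cls in triples:
        if axiom not in (RDFS_DOMAIN, RDFS_RANGE):
            continue  # not an axiom, skip
        for s, p, o in triples:
            if p != prop:
                continue  # statement does not use the property of this axiom
            # Domain axioms classify the subject, Range axioms the object.
            target = s if axiom == RDFS_DOMAIN else o
            inferred.add((target, RDF_TYPE, cls))
    return inferred - triples


ontology = [
    ("ex:hasCustomer", RDFS_DOMAIN, "ex:Invoice"),
    ("ex:hasCustomer", RDFS_RANGE, "ex:Customer"),
    ("ex:Invoice_1", "ex:hasCustomer", "ex:Customer_1"),
    # A delivery note reuses the generic property ...
    ("ex:DeliveryNote_1", "ex:hasCustomer", "ex:Customer_1"),
]

for triple in sorted(infer_types(ontology)):
    print(triple)
```

In this sketch, ex:DeliveryNote_1 is classified as an ex:Invoice solely because it uses ex:hasCustomer. Nothing is rejected as invalid, which is exactly the kind of surprise a global Domain axiom on a generic property can cause.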

If you want to specify locally for a class that a property should be restricted to a certain data type or value range, define this as a Type/Value Restriction at the class level. The article will go into more detail on this when specifying classes later.

Creating Object Properties

While data properties contain concrete values of various data types (so-called literals) for individuals, object properties reference other individuals. They are the basis for relationships between individuals in a graph, for example, the relation between an invoice and its associated customer.

To create such a property, click on the Object properties tab at the bottom of the Protégé UI. Like data properties, all object properties in OWL are derived from owl:topObjectProperty. Select owl:topObjectProperty and click the Add sub property button. Protégé will prompt you to enter the name of the object property.

Classes - Classification vs. Schemas

The next step after creating taxonomy, data, and object properties is specifying individual classes with their respective properties and constraints.

[Explanation of classification in ontologies vs. traditional schemas]

Specifying Classes and Properties

For object properties, the same principles apply. The important difference is that the Restriction Filler in the right area offers a class in the taxonomy (the class hierarchy) for selection instead of data types. The specification of cardinality affects object properties in the same way as data properties.

Creating Individuals

With classes and their restrictions set up, it's time to create individuals, the actual data records in the ontology. In Protégé, switch to the "Individuals by class" tab and the "Direct Instances" area at the bottom left. It shows all existing individuals of the selected class, as shown in Image 4 for the Product class.

Note that in the dotnetpro example ontology for this article [5], all products are labeled, and the Protégé renderer displays these labels in the screenshot. Hovering the mouse over an item shows its full IRI in the tooltip.

IRI assignment works differently for individuals than for classes and properties. Classes and properties are unique per ontology and usually receive human-readable identifiers. Individuals, by contrast, may well share the same name, so an IRI derived from the name alone would no longer be guaranteed to be unique per resource.

A good practice for assigning IRIs to individuals is to combine the class name with a UUID, a kind of primary key for the corresponding resource. Fortunately, Protégé supports generating IRIs with UUIDs quite conveniently.

To create a new individual, click the Add button above the list and then on "New Entity Options". Protégé offers a wizard to configure the IRI assignment. Choose "Auto-generated ID" and enter the desired prefix for the IRI, such as "Product_". As you enter a name for the new individual, Protégé creates a new UUID with each typed letter and appends it to the chosen prefix.

This technique ensures both global uniqueness of individuals and readability for the administrator.
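When individuals are later created by an application rather than by hand, the same naming scheme can be reproduced programmatically. The following is a minimal sketch using Python's standard uuid module. The namespace http://example.org/ontology# and the function name new_individual_iri are assumptions for illustration, and the exact IRI format that Protégé generates may differ.

```python
import uuid

# Hypothetical namespace; replace it with the base IRI of your own ontology.
BASE_IRI = "http://example.org/ontology#"


def new_individual_iri(class_name, base_iri=BASE_IRI):
    """Build an IRI of the form <base IRI><class name>_<random UUID>."""
    return f"{base_iri}{class_name}_{uuid.uuid4()}"


first = new_individual_iri("Product")
second = new_individual_iri("Product")
print(first)
print(second)
```

Both IRIs start with the readable prefix Product_, while the random UUID suffix keeps them distinct even if the two products later carry the same label.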

Adding Annotations

Annotations are statements that can be attached to any entity (class, individual, or property) without affecting its semantics. They can be used for comments, for specifying authors or version numbers, or for translations into different languages. They are treated as data and can be queried and manipulated via SPARQL. Reasoners do not consider annotations.

To create an annotation for an entity, select it in the left area of the Protégé UI. Its annotations then appear in the "Annotations" tab on the right. Image 7 shows an example for the data property "purchasePrice", which already has two label and two comment annotations in German and English.

OWL also allows creating arbitrary additional and equally internationalizable annotation types.

Collaboration and I18N for Ontologies

A particularly useful feature of Protégé for collaborating on ontologies in international teams is the configurable rendering of identifiers for all entities. You can determine whether to display the complete IRI, the named prefix notation, or an annotation in the UI - in the case of annotations, even in which language.

A good practice for documentation and international usability of ontologies is to provide at least one comment and one label annotation in English for each class, property, and individual, ideally also in other languages depending on the team's distribution or the target audience's scope.
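The idea behind language-dependent rendering can be sketched as a simple lookup with fallback. This is an illustrative assumption about the principle, not Protégé's actual implementation. The function render_label and the label texts are hypothetical.

```python
def render_label(iri, labels, preferred_languages=("en",)):
    """Pick a display name for an entity.

    labels is a list of (text, language_tag) pairs, as they would result
    from label annotations. The first preferred language with a matching
    label wins. Without any match, the IRI itself is displayed.
    """
    for lang in preferred_languages:
        for text, label_lang in labels:
            if label_lang == lang:
                return text
    return iri


# Hypothetical label annotations for a data property.
purchase_price_labels = [("Einkaufspreis", "de"), ("purchase price", "en")]

print(render_label("ex:purchasePrice", purchase_price_labels, ("de", "en")))
print(render_label("ex:purchasePrice", purchase_price_labels, ("fr", "en")))
print(render_label("ex:purchasePrice", [], ("en",)))
```

The last call shows why at least one English label per entity is worthwhile: without it, team members only see the raw identifier.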

Data Properties versus Annotations

Since the name of individuals is usually arbitrary and therefore neither semantically nor relationally relevant, the question arises whether annotations should be used instead of data properties for such information. While this may be reasonable for predominantly static and manually maintained lookup lists or enumerations, consider the manual effort in creating mixed queries from annotations and properties for dynamic data sets.

In addition, annotations are not subject to any restrictions, neither on their values nor on their cardinality (the number of occurrences). This ultimately limits the validation options of apps and increases the risk of ambiguities or inconsistencies.

Exporting Models

To export models created in Protégé for use in W3C-compliant graph databases, simply save them in one of the offered standard formats such as RDF/XML, Turtle, or JSON-LD. Select the menu item File | Save As and choose the desired format.

Semantic graph databases like GraphDB from Ontotext have various adapters that greatly simplify the import of the model. For practical purposes, it is recommended to manage the model and instances in separate sub-graphs.
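One plausible benefit of separate graphs, assumed here rather than stated by any particular database vendor, is that a revised model can be re-imported without touching the instance data. The following sketch mimics this with an in-memory store. The class QuadStore, the graph names, and the example triples are hypothetical, and the code does not use the API of GraphDB or any other product.

```python
from collections import defaultdict

# Hypothetical graph names for the model and the instance data.
MODEL_GRAPH = "http://example.org/graph/model"
DATA_GRAPH = "http://example.org/graph/instances"


class QuadStore:
    """Tiny in-memory stand-in for a database with named graphs."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph, triple):
        self._graphs[graph].add(triple)

    def replace_graph(self, graph, triples):
        """Drop one graph and load new content; other graphs stay untouched."""
        self._graphs[graph] = set(triples)

    def triples(self, graph):
        return set(self._graphs[graph])


store = QuadStore()

# Initial import of the model exported from the modeling tool.
store.replace_graph(MODEL_GRAPH, [("ex:Product", "rdf:type", "owl:Class")])

# Instance data is written by applications into its own graph.
store.add(DATA_GRAPH, ("ex:Product_1", "rdf:type", "ex:Product"))

# A revised model replaces only the model graph.
store.replace_graph(MODEL_GRAPH, [
    ("ex:Product", "rdf:type", "owl:Class"),
    ("ex:Customer", "rdf:type", "owl:Class"),
])

print(len(store.triples(MODEL_GRAPH)), len(store.triples(DATA_GRAPH)))
```

After the second import, the model graph contains the revised classes while the instance graph still holds ex:Product_1 unchanged.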

Conclusion

Protégé is a mature tool for modeling ontologies and an indispensable assistant in creating and maintaining W3C-compliant semantic data models and ontologies. It offers useful functions for managing classes, taxonomies, properties, and constraints in a developer-friendly and configurable UI, without having to worry about the underlying RDF, RDFS, and OWL triples and their diverse representations in various file formats.

The next article in this series will show how to set up and operate a semantic graph database, as well as how to comfortably manage data and knowledge in it and make it available for applications. Stay tuned!

References

[1] Alexander Schulze, Working with Knowledge Instead of Data, Semantic Web Part 1, dotnetpro 4/2020, page 78 ff., http://www.dotnetpro.de/A2004Semantik
[2] Alexander Schulze, The Model Comes First, Semantic Web Part 2, dotnetpro 5/2020, page 96 ff., http://www.dotnetpro.de/A2005Semantik
[3] Protégé, https://protege.stanford.edu
[4] Wikipedia, SHACL, http://www.dotnetpro.de/SL2006Semantik1
[5] GitHub Enapso dotnetpro Repository, http://www.dotnetpro.de/SL2006Semantik2

Alexander Schulze, CEO of Innotrade GmbH, is an expert in semantic data management and business analytics. As an IT consultant, speaker, and author, he regularly reports on AI and knowledge management and their benefits. aschulze@innotrade.de
