The ontological commitments embedded in a representation language

Just as programming language preferences generate heated debates, so do, every now and then, languages for representing ontologies. Passionate dislikes of description logics or of the limitations of OWL are not unheard of, in favour of, say, Common Logic for more expressiveness and a different notation style, or of OBO because of its graph-based fundamentals, as is the objection that abusing UML Class Diagram syntax won’t do as an approximation of an OWL file. But what is really going on here? Are they practically all the same anyway, with modellers merely sticking with, and defending, what they know? If you could design your pet language, what would it look like?

The short answer is: they are not all the same and interchangeable. There are actually ontological commitments baked into the language, even though in most cases this is not explicitly stated as such. The ‘things’ one has in the language indicate what the fundamental building blocks are in the world (also called “epistemological primitives” [1]) and therewith assume some philosophical stance. For instance, a crisp vs vague world (say, plain OWL or a fuzzy variant thereof) or whether parthood is such a special relation that it deserves its own primitive next to class subsumption (like UML’s aggregation). Or maybe you want one type of class for things indicated with count nouns and another type of element for stuffs (substances generally denoted with mass nouns). This then raises the question of what sort of commitments are embedded in, or can go into, a language specification and have an underlying philosophical point of view. This, in turn, raises the question of which philosophical stances actually can have a knock-on effect on the specification or selection of an ontology language.

My collaborator, Pablo Fillottrani, and I tried to answer these questions in the paper entitled “An Analysis of Commitments in Ontology Language Design”, which was published late last year as part of the proceedings of the 11th Conference on Formal Ontology in Information Systems 2020 that was supposed to have been held in September 2020 in Bolzano, Italy. In the paper, we identified and analysed ontological commitments that are, or could have been, embedded in logics, and we showed which ones have been taken in well-known languages for representing ontologies and similar artefacts, such as OBO, SKOS, OWL 2 DL, DLRifd, and FOL. We organised them in four main categories: what the very fundamental furniture is (e.g., including roles or not, time), acknowledging refinements thereof (e.g., types of relations, types of classes), the logic’s interaction with natural language, and crisp vs various vagueness options. They are discussed over about 1/3 of the paper.

Obviously, engineering considerations can interfere in the design of the logic as well. They concern issues such as what the syntax should look like and whether scalability is an issue, but this is not the focus of the paper.

We did spend some time contextualising the language specification in an overall systematic engineering process of language design, which is summarised in the figure below (the paper focuses on the highlighted step).

(source: [2])

While such a process can be used for the design of a new logic, it also can be used for post hoc reconstructions of past design processes of extant logics and conceptual data modelling languages, and for choosing which one you want to use. At present, the documentation of the vast majority of published languages does not describe much of the ‘softer’ design rationales, though.

We played with the design process to illustrate how it can work out, availing also of our requirements catalogue for ontology languages and we analysed several popular ontology languages on their commitments, which can be summed up as in the table shown below, also taken from the paper:

(source: [2])

In a roundabout way, it also suggests some explanations as to why some of those transformation algorithms aren’t always working well; e.g., any UML-to-OWL or OBO-to-OWL transformation algorithm is trying to shoe-horn one ontological commitment into another, and that can only be approximated, at best. Things have to be dropped (e.g., roles, due to standard view vs positionalism) or cannot be enforced (e.g., labels, due to natural language layer vs embedding of it in the logic), and that’ll cause some hiccups here and there. Now you know why such transformations won’t ever work perfectly.
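To make the ‘roles have to be dropped’ point a bit more concrete, here is a minimal sketch in Python. It is not an actual UML-to-OWL algorithm and not taken from the paper: the classes `Association` and `ObjectProperty`, the function `to_object_property`, and the example names are all hypothetical, chosen only to show that a relationship with named roles (positionalist flavour) loses those names when squeezed into an ordered binary relation (standard view flavour).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical positionalist-style relationship: participants fill named roles."""
    name: str
    roles: tuple  # ((role_name, class_name), ...)


@dataclass(frozen=True)
class ObjectProperty:
    """Hypothetical standard-view-style binary relation: only argument order remains."""
    name: str
    domain: str
    range_: str


def to_object_property(assoc):
    """Approximate a binary association; return the property and the dropped role names."""
    if len(assoc.roles) != 2:
        raise ValueError("only binary associations are handled in this sketch")
    (role1, class1), (role2, class2) = assoc.roles
    # The role names have no counterpart in the target, so they are reported as lost.
    return ObjectProperty(assoc.name, class1, class2), [role1, role2]


# Made-up example: a 'teaches' association with two named roles.
teaches = Association("teaches", (("lecturer", "Person"), ("taughtCourse", "Course")))
prop, dropped = to_object_property(teaches)
print(prop)
print("dropped role names:", dropped)
```

The domain and range survive, but the information that a `Person` participates specifically as `lecturer` does not; a real transformation would have to approximate it some other way, if at all.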

Hopefully, all this will feed into a way to help with choosing a suitable language for the ontology you may want to develop, or assist with better understanding the language that you may be using, or perhaps provide new ideas for designing a new ontology language.

References

[1] Brachman, R., Schmolze, J. An overview of the KL-ONE Knowledge Representation System. Cognitive Science. 1985, 9:171–216.

[2] Fillottrani, P.R., Keet, C.M. An Analysis of Commitments in Ontology Language Design. Proc. of FOIS 2020. Brodaric, B. and Neuhaus, F. (Eds.). IOS Press. FAIA vol. 330, 46-60.

On computer program being a whole

Who cares whether some computer program is a whole, how, and why? Turns out, more people than you may think—and so should you, since it can be costly depending on the answer. Consider the following two scenarios: 1) you download a ‘pirated’ version of MS Office or Adobe Photoshop (the most popular ones still) and 2) you take the source code of a popular open source program, such as Notepad++, add a little code for some additional function, and put it up for sale only as an executable app called ‘Notepad++ extreme (NEXT)’ so as to try to earn money quickly. Are these actions legal?

In both cases, you’d break the law, but how many infringements took place, of the sort that you potentially could be fined for or face jail time for? For the piracy case, is that once for the MS Office suite, or for each program in the suite, or for each file created upon installing MS Office, or for each source code file that went into making the suite during software development? For the open source case, was that violating its GNU GPL open source licence once for the zipped and downloaded or cloned source code, or once for each file in the source code, of which there are hundreds? It is possible to construct similar questions for trade secret violations and patent infringements for programs, as well as other software artefacts, like illegal downloads of TV series episodes (going strong during COVID-19 lockdowns indeed). Just in case you think this sort of issue is merely hypothetical: recently, Arista paid Cisco $400 million for copyright damages and just before that, Zenimax got $500 million from Oculus (yes, the VR software) for trade secret violations, and Google vs Oracle is ongoing with “billions of dollars at stake”.

Let’s consider some principles first. To be able to determine the number of infringements, we first need to know whether a computer program is a whole or not and why, and if so, what’s ‘in’ (i.e., a part of it) and what’s ‘out’ (i.e., definitely not part of it). Spoiler alert: a computer program is a functional whole.

To get to that conclusion, I had to combine insights from theories of parthood (mereology), granularity, modularity, unity, and function and add a little more into the mix. The argumentation is available in two versions: a longer technical report [1], which I hope is readable by a wider audience, and a condensed version for a specialist audience [2] that was published in the Proceedings of the 11th Conference on Formal Ontology in Information Systems (FOIS’20) two weeks ago. Very briefly and informally, the state of affairs can be illustrated with the following picture:

(Source: adapted from [2])

This schematic representation shows, first, two levels of granularity: level 1 and level 2. At level 1, there’s some whole, like the a1 and a2 in the figure that could be referring to, say, a computer program, a module repository, an electorate, or a human body. At a more fine-grained level 2, there are different entities, which are in some way linked to the respective whole. This ‘link’ to the whole is indicated with the vertical dashed lines, and one can say that they are part of the whole. For the blue dots on the right residing at level 2, i.e., the parts of a1, there’s also a unifying relation among the parts, indicated with the solid lines with arrows, which makes a1 an integral whole. Moreover, for that sort of whole, it holds that if some object x (residing at level 2) is part of a1 then if there’s a y that is also part of a1, it participates in that unifying relation with x and vice versa (i.e., if y is in that unifying relation with x, then it must also be part of a1). For the computer program’s source code, that unifying relation can be the source tree graph.
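The condition on the parts of an integral whole can also be tried out on a toy example. The following is a minimal sketch in Python, not the formalisation from the report or the paper: the function names `reachable` and `is_integral_whole` and the file names are hypothetical, and I assume here that two files stand in the unifying relation when they are connected in the source tree graph, ignoring edge direction.

```python
def reachable(start, edges):
    """Objects connected to `start` via the given edges, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def is_integral_whole(parts, edges, universe):
    """True if, for every part x, another object y is a part exactly when
    y stands in the (assumed) unifying relation with x."""
    for x in parts:
        unified_with_x = reachable(x, edges) - {x}
        for y in universe - {x}:
            if (y in parts) != (y in unified_with_x):
                return False
    return True


# Made-up source tree: main.c includes util.h and io.h; readme.txt is unrelated.
edges = {("main.c", "util.h"), ("main.c", "io.h")}
universe = {"main.c", "util.h", "io.h", "readme.txt"}

print(is_integral_whole({"main.c", "util.h", "io.h"}, edges, universe))        # True
print(is_integral_whole({"main.c", "util.h", "readme.txt"}, edges, universe))  # False
```

The second call fails in both directions: `io.h` is unified with `main.c` but left out of the candidate whole, and `readme.txt` is claimed as a part without being unified with anything.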

There is some nitty gritty detail also involving the notion of function—a source code file contributes to doing something—and optional vs mandatory vs essential part that you can read about in the report or in the paper [1,2], covering the formalisation, more argumentation, and examples.

How would it pan out for the infringements? The Notepad++ exploitation scenario would simply be a case of one infringement in total for all the files needed to create the executable, not one for each source code file. This conclusion from the theory turns out to be remarkably in line with the GNU GPL’s explanation of their licence, while also providing a theoretical foundation for their intuition that there’s a difference between a mere aggregate where different things are bundled, loose coupling (e.g., sockets and pipes), and a single program (e.g., using function calls, being included in the same executable). The order of things perhaps should have been from there into the theory, but in practice, I did the analysis first and then ended up in a situation where I had to look up the GPL and its explanatory FAQ. On the bright side, in the other direction now: just in case someone wants to take on the copyleft principles of open source software, here are some theoretical foundations to support that there’s probably much less money to be gained than you might think.

For the MS Office suite case mentioned at the start, I’d need a look under the hood to determine how it ties together, and one may have to argue about the sameness of, or difference between, a suite and a program. The easier case is a self-standing app, like the third most pirated Windows app, Internet Download Manager: it is one whole and so one infringement.

It’s a pity that FOIS 2020 has been postponed to 2021, but at least I got to talk about some of this as an expert witness for a litigation case, and I managed to weave an exercise about the source tree with open source licences into the social issues and professional practice module I taught to some 750 students this past winter.

References

[1] Keet, C.M. Why a computer program is a functional whole. Technical report 2008.07273, arXiv. 21 July 2020. 25 pages.

[2] Keet, C.M. The computer program as a functional whole. Proc. of FOIS 2020. Brodaric, B. and Neuhaus, F. (Eds.). IOS Press. FAIA vol. 330, 216-230.