Language annotation on the Web with MoLA

The Web consists of very many resources in many languages and has information about even more. Sure, the majority of Internet users speak English, Chinese, or Spanish, but there are sites, pages, paragraphs, and documents in other languages and about lesser-known 'languoids' (language, dialect, variant, etc.), ranging from, say, a poem about the poor man's dinner written in an old Brabants dialect that used to be spoken in the south of the Netherlands, to the effects of mobile phones on Zimbabwean (cf. South African) isiNdebele [1]. How should that be annotated? Here's a complex use case of languoids for Old French:

(source: [2])

The extant multilingual Semantic Web models, such as the W3C's ontolex-lemon community standard, have outsourced that to 'the language tag comes from some place', as they focus on the word level and/or sentence level for the multilingual (semantic) Web. There are indeed standardisations of language tags. Notably, there are the ISO 639 codes (parts 1, 2, 3, and 5) for some 8000 languages—but there are more languoids that are not covered by the ISO list, with an estimated 8-15K or so currently spoken languoids and some 150K extinct ones. There are also Glottolog, Ethnologue, and MultiTree, which are more comprehensive in some respects, but limited and problematic in others. For instance, Glottolog—the best among them—still uses only broader/narrower-than relations, has artificial names for grouping languoids, has inconsistencies in modelling decisions, and is still incomplete in coverage.

My co-authors—Frances Gillis-Webber, also at UCT, and Sabine Tittel, with the Heidelberg Academy of Sciences and Humanities—and I aim to change that so as to allow for more comprehensive and more inclusive language tags and annotations on the Semantic Web.

In order to be able to do so, we developed a Model for Language Annotation (MoLA) that caters for relatively comprehensive languoid annotations and how languoids are related: it allows recording which languoid evolved from or was influenced by which other languoid, when and where it was spoken, its preferred and alternate names, what sort of lect it is (e.g., dialect, pidgin), and which dialect cluster or language family it is a member of, and it maintains backward compatibility with the ISO 639 codes.

The design approach was that of labour-intensive manual modelling, including competency questions, an extensive use case, and iterative development of the model at the conceptualisation stage using the Object-Role Modeling (ORM) language. This model was then formalised in OWL (well, most of it). It was tested on the competency questions and smaller use case scenarios, and validated with the large use case. A snippet for Spanish, sketched below, illustrates the idea, as the one for Old French gets quite lengthy (more details in [2]).
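A minimal, hypothetical sketch of such a languoid annotation, loaded with rdflib; the mola: class and property names below are made up for illustration and are not necessarily MoLA's actual vocabulary (see [2] for that):

```python
from rdflib import Graph

# Illustrative only: the mola: terms are hypothetical stand-ins for the
# actual MoLA vocabulary; "spa" is the ISO 639-3 code for Spanish.
spanish_ttl = """
@prefix mola: <http://example.org/mola#> .
@prefix ex:   <http://example.org/languoid/> .

ex:Spanish a mola:Language ;
    mola:hasPreferredName "Spanish"@en , "español"@es ;
    mola:hasAlternateName "Castilian"@en ;
    mola:isMemberOf ex:Romance ;
    mola:hasISO6393Code "spa" .
"""

g = Graph()
g.parse(data=spanish_ttl, format="turtle")
print(len(g), "triples")
```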

MoLA enhances Glottolog's model on several key points: it has proper relations between languoids (cf. mere BT/RT-style links), a languoid can be associated with zero or more regions, and it allows for multiple names of a languoid, both concurrently and over time.

This sounds like a smooth process, but there were a few modelling hurdles that had to be overcome. One of them is the level of granularity of analysis of a languoid. For instance, one could argue that isiXhosa is a language—it's one of the 11 official languages of South Africa—but also that it is a dialect cluster (i.e., a collection), as there are multiple dialects of isiXhosa. The case is similar for Old French, which is a language and a member of the Romance family of languages, but which also can be seen as a collection of dialects (e.g., Picard and Norman), and dialects, in turn, may have varieties (e.g., Artois and Santerre for Picard). On the bright side, this now can be represented and, because it is represented explicitly, it can be queried, such as "Which languoids are dialects of French that were spoken in the middle ages in France?" and "Which languoids are a member of Nguni?". The knowledge base still needs to be populated, though, so it won't work yet with all languoids.
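As an indication of how the second of those queries could be posed once such annotations exist, here is a small sketch with rdflib and SPARQL, again using hypothetical mola: names rather than the actual MoLA vocabulary:

```python
from rdflib import Graph

# Toy data: isiXhosa and isiZulu as members of the Nguni group
# (the mola: terms are illustrative, not the actual MoLA vocabulary).
data = """
@prefix mola: <http://example.org/mola#> .
@prefix ex:   <http://example.org/languoid/> .

ex:Nguni    a mola:DialectCluster .
ex:isiXhosa a mola:Language ; mola:isMemberOf ex:Nguni .
ex:isiZulu  a mola:Language ; mola:isMemberOf ex:Nguni .
ex:Picard   a mola:Dialect .
"""

g = Graph()
g.parse(data=data, format="turtle")

# "Which languoids are a member of Nguni?"
query = """
PREFIX mola: <http://example.org/mola#>
PREFIX ex:   <http://example.org/languoid/>
SELECT ?languoid WHERE { ?languoid mola:isMemberOf ex:Nguni . }
"""
for row in g.query(query):
    print(row.languoid)   # prints the URIs of isiXhosa and isiZulu
```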

More details can be found in the paper that was recently published [2]. It will be presented in a few weeks at the 1st Iberoamerican Conference on Knowledge Graphs and Semantic Web (KGSWC’19), in Villa Clara, Cuba, alongside 13 other papers on ontologies. The first author, Frances, is soon travelling to the ISWS summer school, so I will present it at KGSWC’19.

 

References

[1] Nkomo, D., Khumalo, L. Embracing the mobile phone technology: its social and linguistic impact with special reference to Zimbabwean Ndebele. African Identities, 10(2): 143-153.

[2] Gillis-Webber, F., Tittel, S., Keet, C.M. A Model for Language Annotations on the Web. 1st Iberoamerican Conference on Knowledge Graphs and Semantic Web (KGSWC'19). Springer CCIS. 23-30 June 2019, Villa Clara, Cuba.


About modelling styles in ontologies

As any modeller will know, there are pieces of information or knowledge that can be represented in different ways. For instance: representing 'marriage' as a class or as a 'married to' relationship, adding 'address' as an attribute or as a class in one's model, and whether 'employee' will be positioned as a subclass of 'person' or as a role that 'person' plays. In some cases there are good ontological arguments to represent it in one way or the other; in other cases, that's less clear; and in yet other cases, efficiency is king, so the most compact way of representing it is favoured. This leads to different design decisions in ontologies, which hampers ontology reuse and alignment and affects other tasks, such as evaluating competency questions over the ontology and verbalising ontologies.
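To make the first of those choices concrete, here is a minimal sketch of the two alternatives in OWL/Turtle, loaded with rdflib; the URIs are made up for illustration:

```python
from rdflib import Graph

# Style 1: 'marriage' reified as a class, with a role linking to the spouses.
marriage_as_class = """
@prefix :     <http://example.org/onto#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Person   a owl:Class .
:Marriage a owl:Class .
:hasSpouse a owl:ObjectProperty ;
    rdfs:domain :Marriage ; rdfs:range :Person .
"""

# Style 2: a direct, symmetric 'married to' relationship between persons.
marriage_as_relation = """
@prefix :     <http://example.org/onto#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Person    a owl:Class .
:marriedTo a owl:ObjectProperty , owl:SymmetricProperty ;
    rdfs:domain :Person ; rdfs:range :Person .
"""

for ttl in (marriage_as_class, marriage_as_relation):
    g = Graph()
    g.parse(data=ttl, format="turtle")
    print(len(g), "triples")
```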

When such choices are made consistently throughout the ontology, one may consider this to be a modelling style or representation style. If one then knows which style an ontology is in, it would simplify use and reuse of the ontology. But what exactly is a representation style?

While examples are easy to come by, shedding light on that intuitive notion turned out to be harder than it looked. My co-author Pablo Fillottrani and I tried to disentangle it nonetheless, by characterising the inherent features and the dimensions by which a style may differ. This resulted in 28 different traits for the 10 identified dimensions. For instance, the dimension "modular vs. monolithic" has three possible options: 1) 'Monolithic', where the ontology is stored in one file (no imports or mergers); 2) 'Modular, external', where at least one ontology is imported or merged and it kept its URI (e.g., importing DOLCE into one's domain ontology rather than re-creating it there); 3) 'Modular, internal', where there's at least one ontology import that is based on having carved up the domain, in the sense of decomposition (e.g., dividing up a pizzeria domain into pizzas and drinks; see the sketch below). Other dimensions include, among others, the granularity of relations (many or few), what the hierarchy looks like, and attributes/data properties.
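A minimal sketch of what that 'Modular, internal' trait looks like in practice, as owl:imports statements parsed with rdflib; the ontology URIs are made up for illustration:

```python
from rdflib import Graph

# A pizzeria ontology carved up into separately stored pizza and drinks
# modules; the importing file itself stays small.
pizzeria_ttl = """
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://example.org/pizzeria> a owl:Ontology ;
    owl:imports <http://example.org/pizzeria/pizzas> ,
                <http://example.org/pizzeria/drinks> .
"""

g = Graph()
g.parse(data=pizzeria_ttl, format="turtle")
for s, p, o in g:
    print(s, p, o)
```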

We tried to "eat our own dogfood" and applied the dimensions and traits to a set of 30 ontologies. This showed that it is feasible to do, although we needed two rounds to get to that stage—after the first round of parallel annotation, it turned out we had interpreted a few traits differently, and we needed to refine the number of traits and be more precise in their descriptions (which we did). Perhaps unsurprisingly, some tendencies were observed, and we could identify three easily recognisable types of ontologies, because most ontologies clearly had one or the other trait and similar values for sets of traits. Of course, there were also ontologies that were inherently "mixed", in the sense of having applied different and conflicting design decisions within the same ontology, or even including both options. Coding up the results, we generated two spider diagrams that visualise that difference (see [1]).

Details of the dimensions, traits, set-up and results of the evaluation, and discussion thereof have been published this week [1] and we’ll present it next month at the 1st Iberoamerican Conference on Knowledge Graphs and Semantic Web (KGSWC’19), in Villa Clara, Cuba, alongside 13 other papers on ontologies. I’m looking forward to it!

 

References

[1] Keet, C.M., Fillottrani, P.R. Dimensions Affecting Representation Styles in Ontologies. 1st Iberoamerican Conference on Knowledge Graphs and Semantic Web (KGSWC'19). Springer CCIS vol. 1029, 186-200. 24-28 June 2019, Villa Clara, Cuba.

Some experiences on making a textbook available

I made a textbook on ontology engineering available for free in July 2018. Meanwhile, I've had several "why did you do this and not go with a proper publisher??!?" questions. I had tried to answer that already in the textbook's FAQ, but it turns out that that short answer may be a bit too short after all. So, here follows a bit more about it.

The main question I tried to answer in the book’s FAQ was “Would it not have been better with a ‘proper publisher’?” and the answer to that was:

Probably. The layout would have looked better, for sure. There are several reasons why it isn’t. First and foremost, I think knowledge should be free, open, and shared. I also have benefited from material that has been made openly available, and I think it is fair to continue contributing to such sharing. Also, my current employer pays me sufficient to live from and I don’t think it would sell thousands of copies (needed for making a decent amount of money from a textbook), so setting up such a barrier of high costs for its use does not seem like a good idea. A minor consideration is that it would have taken much more time to publish, both due to the logistics and the additional reviewing (previous multi-author general textbook efforts led to nothing due to conflicting interests and lack of time, so I unlikely would ever satisfy all reviewers, if they would get around reading it), yet I need the book for the next OE installment I will teach soon.

Ontology Engineering (OE) is listed as an elective in the ACM curriculum guidelines. Yet, it's suited best for advanced undergraduate/postgraduate level because of the prerequisites (like knowing the basics of databases and conceptual modeling). This means there won't be big 800-student classes all over the world lining up for OE. I guess it would not go beyond some 500-1000 students/year throughout the world (50 classes of 10-20 computer science students), and surely not all classes would use the textbook. Let's say, optimistically, that 100 students/year would be asked to use the book.

With that low volume in mind, I did look up the cost of similar books in the same and similar fields with the 'regular' academic publishers. It doesn't look enticing for either the author or the student. For instance, this one from Springer and that one from IGI Global are both still >100 euro. for. the. eBook., and they're the cheap ones (not counting the 100-page 'silver bullet' book). Handbooks and similar on ontologies, e.g., this and that one, are offered for >200 euro (eBook). Admittedly, there's the odd topical book here and there that's cheaper, in the 50-70 euro range (still just the eBook), whereas other books (like these and those) are again >100 euro for a reason that is, to me, inexplicable (it's not page numbers). There's an option to publish a textbook with Springer in open access format, but that would cost me a lot of money, and UCT only has a fund for OA journal papers, not books (nor for conference papers, btw).

IOS Press does not fare much better. For instance, a softcover version in the Studies on the Semantic Web series, which is their cheapest range, would be about 70 euro due to the number of pages, which is over R1100 and so again above budget for most students in South Africa, where the going rate is that a book would need to be below about R600 for students to buy it. A plain eBook or softcover from IOS Press outside that series goes for about 100 euro again, i.e., around R1700 depending on the exchange rate—about three times the maximum acceptable price for a textbook.

The MIT Press BFO eBook is only R425 on takealot, yet considering other MIT Press textbooks there, with the size of the OE book it then would be around R600-700. Oxford University Press and its Cambridge counterpart—which, unlike MIT Press, I had checked out when deciding—are more expensive, again approaching 80-100 euro.

One that made me digress for a bit of exploration was Macmillan HE, which had an "Ada Lovelace day 2018" listing of books by female authors. A logics for CS book there was again at some 83 euros, although the softer area of knowledge management for information systems got a book down to 50 euros, and something more popular, like a book on linguistics published by its subsidiary "Red Globe Press", was down to even 'just' 35 euros. Trying to understand it more, Macmillan HE's "about us" revealed that "Macmillan International Higher Education is a division of Macmillan Education and part of the Springer Nature Group, publishers of Nature and Scientific American.", and it turns out Macmillan publishes through Red Globe Press. Or: it's all the same company, with different profit margins, and mostly those profit margins are too high to result in affordable textbooks, whichever subsidiary construction is used.

So, I had given up on the ‘proper publisher route’ on financial grounds, given that:

  • Any ontology engineering (OE) book will not sell large numbers of copies, so it will be expensive due to the relatively low sales volume, and I still would not make a substantial amount from royalties anyway.
  • Most of the money spent when buying a textbook from an established publisher goes to the coffers of the publisher (production costs etc. plus about 30-40% pure profit). Also, scholarships ought not to be indirect subsidy schemes for large-profit-margin publishers.
  • Most publishers would charge an amount of money for the book that would render the book too expensive for my own students. It’s bad enough when that happens with other textbooks when there’s no alternative, but here I do have direct and easy-to-realise agency to avoid such a situation.

Of course, there’s still the ‘knowledge should be free’ etc. argument, but this was to show that even if one were not to have that viewpoint, it’s still not a smart move to publish the textbook with the well-known academic publishers, even more so if the topic isn’t in the core undergraduate computer science curriculum.

Interestingly, after 'publishing' it on my website and listing it on OpenUCT and the Open Textbook Archive—I'm certainly not the only one who has done such a market analysis or has certain political convictions—one colleague pointed me to the non-profit College Publications, which aims to "break the monopoly that commercial publishers have", and another colleague pointed me to UCT Press. I contacted both, and the former responded. In the meantime, the book has been published by CP and is now also listed on Amazon for just $18 (about 16 euro) or some R250 for the paperback version—whilst the original pdf file is still freely available. Or: you pay for the production costs of the paperback, which has a slightly nicer layout and in which the errata I knew of at the time have been corrected.

I have noticed that some people don't take informal self-publishing seriously—ranking it even below the so-called 'vanity publishers' like Lulu—notwithstanding the archives that cater for it, the financial take on the matter, the knowledge-sharing argument, and the 'textbooks for development' in emerging economies angle of it. So, I guess no brownie points from them then and, on top of that, my publication record did, and does, take a hit. Yet, writing a book, as an activity, is a nice and rewarding change from just churning out more and more papers like a paper production machine, and I hope it will contribute to keeping the OE research area alive and lead to better ontologies in ontology-driven information systems. The textbook got its first two citations already, the feedback is mostly very positive, readers have shared it elsewhere (reddit, ungule.it, Open Libra, Ebooks directory, and other platforms), and I recently got some funding from the DOT4D project to improve the resources further (for things like another chapter, new exercises, some tools development to illuminate the theory, a proofreading contest, updating the slides for sharing, and such). So, overall, if I had to make the choice again now, I'd still do it the way I did. Also, I hope more textbook authors will start seeing self-publishing, or else non-profit publishing, as a good option. Last, the notion of open textbooks is gaining momentum, so you even could become a trendsetter and be fashionable 😉

A useful abstract relational model and SQL path queries

Whilst visiting David Toman at the University of Waterloo during my sabbatical earlier this year, one of the topics we looked into was their experiments on whether their SQLP—SQL with path queries, extended from [1]—would be better than plain SQL in terms of the time it takes to understand queries and the correctness in writing them. It turned out (in a user evaluation) that writing queries is faster with SQLP whilst maintaining accuracy. The really interesting aspect in all this from my perspective, however, was the so-called Abstract Relational Model (ARM), or: the modelling side of things rather than the querying, which the ARM makes easier. In simple terms, the ARM [1] is like the relational model, but with identifiers, which makes those path queries doable and mostly more succinct, and one can partition the relations into class-relationship-like models (approaching the look-and-feel of a conceptual model) or lump stuff together into relational-model-like models, as preferred. Interestingly, it turns out that the queries remain exactly the same regardless of whether one makes the ARM look more relational-like or ontology-like, which is called "invariance under vertical partitioning" in the paper [2]. Given all these nice things, there's now also an algorithm to go from the usual relational model to an ARM schema, so that even if one has legacy resources, it's possible to bump them up to this newer technology with more features and ease of use.
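To give a flavour of why paths help, here is a small, runnable sketch in Python with sqlite3: the plain SQL query needs explicit joins to navigate from an employee to the name of the manager's department, whereas a path-style query over a schema with identifiers can express the same navigation as a single attribute path. The path notation in the final comment is illustrative only, not the exact SQLP syntax.

```python
import sqlite3

# A toy relational schema with foreign keys (all names are made up).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
  CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
  CREATE TABLE employee   (emp_id INTEGER PRIMARY KEY, name TEXT,
                           manager_id INTEGER REFERENCES employee(emp_id),
                           dept_id INTEGER REFERENCES department(dept_id));
  INSERT INTO department VALUES (1, 'Ontology Engineering');
  INSERT INTO employee VALUES (10, 'Alice', NULL, 1);
  INSERT INTO employee VALUES (11, 'Bob', 10, 1);
""")

# Plain SQL: explicit joins to navigate employee -> manager -> department.
plain_sql = """
  SELECT d.name
  FROM employee e
  JOIN employee m ON e.manager_id = m.emp_id
  JOIN department d ON m.dept_id = d.dept_id
  WHERE e.name = 'Bob';
"""
print(cur.execute(plain_sql).fetchall())   # [('Ontology Engineering',)]

# With identifiers as in the ARM, a path-query flavour of the same question
# could be written roughly as (illustrative, not the exact SQLP notation):
#   SELECT e.manager_id.dept_id.name FROM employee e WHERE e.name = 'Bob';
```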

Our paper [2], which describes these details (invariance, RM-to-ARM, the evaluation) and is entitled "The Utility of the Abstract Relational Model and Attribute Paths in SQL", is being published as part of the proceedings of the 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW'18), which will be held in Nancy, France, in about two weeks.

This sort of Conceptual Model(like)-based Data Access (CoMoDA, if you will) may sound a bit like Ontology-Based Data Access (OBDA). Yes and no. Roughly, yes on the conceptual querying sort of thing (there's still room for quite some hair splitting there, though); no regarding the underlying theory and technology. The ARM doesn't pretend to be an ontology, but it easily has a reconstruction in a Description Logic language [3] (with n-aries! and identifiers!). SQLP is much more expressive than the unions of conjunctive queries one can pose in a typical OBDA setting, however, for it is full SQL plus those path queries. So, both the theory and the technology are different from the typical OBDA setting. Now, don't think I'm defecting from the research topic—I still have a whole chapter on OBDA in my textbook—but it's interesting to learn about and play with alternative approaches toward solutions to (at a high level) the same problem of trying to make querying for information easier and faster.

 

References

[1] Borgida, A., Toman, D., Weddell, G.E. On referring expressions in information systems derived from conceptual modelling. Proc. of ER’16. Springer LNCS, vol. 9974, 183-197.

[2] Ma, W., Keet, C.M., Olford, W., Toman, D., Weddell, G. The Utility of the Abstract Relational Model and Attribute Paths in SQL. 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW’18). Springer LNAI. (in print). 12-16 Nov. 2018, Nancy, France.

[3] Jacques, J.S., Toman, D., Weddell, G.E. Object-relational queries over CFDInc knowledge bases: OBDA for the SQL-Literate. Proc. of IJCAI’16. 1258-1264 (2016)

FOIS’18 conference report

Perhaps surprisingly to some, despite being the local organizer, I could attend all sessions of the 10th International Conference on Formal Ontology in Information Systems (FOIS'18) as a participant (cf. running around for last-minute things). It just wasn't as much of a trip as it usually is: only 15 minutes into town, to the Atlantic Imbizo conference venue, which is situated between the Clock Tower and the (award-winning) Zeitz MOCAA at Cape Town's V&A Waterfront. This blog post has turned into a longer post than intended—yet there's still so much left out to talk about—and it is divided up into sections on keynotes, presentations, ontologies, and the (ontologically inappropriate basket of) other things.

 

Keynotes

The first keynote was presented by (emeritus) professor of philosophy Peter Simons, from Trinity College Dublin and Universität Salzburg, on the ontology of aboutness (slides).

Peter Simons during his keynote talk

That may sound a bit abstract, but it is not unusual for an information system to have to record statements about something, such as different medical opinions, changes of policies, or plans or expectations, and we need a way to represent that and deal with it. Simons discussed several earlier proposals before presenting his own, which includes as main entities a bearer, act, time, act-type, mental content, mental content type, intentional object, referent, and referent type (slide 16), and then variants for pictorial and linguistic (speech and writing) representations. And, in closing, his advice: "Don't get involved in irrelevant philosophical disputes".

The second keynote was presented by Alessandro Oltramari, who works at the Bosch Research and Technology Centre in Pittsburgh, USA. He presented several of Bosch's projects that he was involved in and where ontologies are used in one way or another (slides). One of them was about knowledge-based intelligent IoT and another about an emergency assistant or, in business sales parlance, a "personal guardian angel" mobile device that has location awareness, safety information about those locations, a decision support system for alternate route computation, and automatic escalation. The ontologies used include the foundational ontology DOLCE, the domain ontology of semantic sensor networks (SSN) from the W3C, and specific schemas developed in-house. Another project, on a knowledge-based chatbot for healthcare policies, links up DOLCE, schema.org, and some in-house schemas with Highmark-specific information (and is not ashamed of using SKOS). On my question as to which methods and methodologies were used for the in-house ontology development, the (disappointing) answer was, unfortunately, only "DOLCE and OntoClean", but the former is neither a method nor a methodology (it implies a top-down approach), and the latter is some 15 years old, as if nothing has happened in ontology engineering in the meantime (more about that further below). Regardless, it was good to see that ontologies are being used in industry.

The third keynote (slides) was by Riichiro Mizoguchi from the Japan Advanced Institute of Science and Technology (JAIST), on a state-centric methodology, which I’ll leave for a separate post.

Riichiro Mizoguchi during his keynote talk.

 

Presentations

The report on the presentations easily could take up several pages, but I'll try to keep it short, lest this post never gets posted. The first session of the conference was on foundations. This included Antony Galton's assessment of the treatment of time in upper ontologies [1]. It was mildly entertaining in that it turned out that BFO would need abstract things for its treatment of time (which it doesn't have and doesn't like) and adheres to Newtonian physics cf. the latest scientific theories. It is definitely on my list of papers to read in more detail. Another paper-for-printing to read is Torsten Hahmann's work on mereotopology, which extends it to multidimensional space [2]. A nice bonus (though it ought not to be perceived as such) is that at least the theorems in the paper have been proved with Prover9 and Vampire (cf. having to double-check them manually). Laure Vieu presented a proposal for a graph-based approach to represent structure among the components of an entity [3], which is apparently different from the graph-based approach for representing molecules (within the Semantic Web context); I'll have to look at that in more detail, for it sounds like it might be of some use for the parts aspects of part-whole relations.

Besides such theoretical contributions that are rather distant from applications, there were two of note that were more clearly motivated by praxis. One was about the ontological foundations of competition and the sorts of competitive relations there are [4], which was presented by Tiago Prince Sales. The other one was presented by Pawel Garbacz, whose presentation conveyed more than the paper, so as to give a real feel of the problem (identity criteria for localities [5]), with complicating use cases extracted from a Polish history project. He presented some examples of changes and a proposal for how to identify a locality/settlement. For instance, settlements can get moved altogether, have a population-only move, split into two, be merged, be renamed and renamed again, be deserted by a population and then repopulated and renamed, and so on. When is it the same settlement and when is it another one? The paper [5] describes a first solution for identity criteria with an event-based approach to the identity of localities.

My presentation, on part-whole relations in Zulu language and culture [6], was scheduled in the 'applications' session; it received positive feedback and some pointers that may assist with future work.

 

The venue during a Q&A session

Ontologies

Besides presentations, there was a discussion session on "what constitutes a good ontology paper?" for the Applied Ontology journal. Seeing the ontology papers at FOIS now, they should have had such a session for FOIS as well. There are four papers in the proceedings describing OWL files: "Amnestic forgery" (AF, conceptual metaphors) [7] presented by Mehwish Alam, UNiCS for research and innovation policy [8] presented by Fernando Roda, SAREF4Health [9] presented by João Moreira, and religious and spiritual belief (ORSB) [10] presented by Stefan Schulz. Skimming through each paper: AF, UNiCS, and ORSB do not use a methodology explicitly and none of them uses existing methods, but they all do use a foundational or top-level ontology or the WordNet material, and then it's cool enough to get into FOIS, apparently. This is a bit disappointing. At least SAREF4Health presented a set of competency questions, a systematic approach and broader framework, and some evaluation, and ORSB reuses not only top-level and top-domain ontologies but also tests some patterns. AF and ORSB have some interest to them, as they address relatively novel modelling issues, and the ORSB discussion could be used more broadly for any "terms of dubious reference". UNiCS is not really an ontology but an information model or, at best, a conceptual data model (e.g., calling "SCOPUS subject" an ontology is pushing it a bit too far); it makes their OBDA scenario easier to realize, true, but that's a separate discussion. Fig. 1 of SAREF4Health doesn't look any better either: it has all the hallmarks of a plain UML class diagram (attributes with data types and such), with object diagram components attached, coloured in, and annotated with OntoUML. SAREF4Health's other downsides are things like "implementing the ontology as RDF", which just hurts to read (it is left implicit for AF, which is plugged into the LOD cloud), as is the download in Turtle format (cf. the required exchange syntax of OWL 2). That download isn't even available at the provided link when you click on it (copy-paste gets you in the right direction), but is [I think] in some github sub-directory that has a whole bunch of ttl files with neither head nor tail, one of which is called saref4health.ttl. On first inspection, it has plenty of data properties and data type use, the class-as-instance issue here and there (e.g., 'Rechargeable Lithium Polymer battery' as instance cf. class), other issues (e.g., a 'series' of measurements is not a subclass of a measurement), and very many classes directly subsumed by top, though some are knock-on effects from the imports.
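For readers less familiar with the class-vs-instance issue mentioned above, here is a minimal illustrative sketch in Turtle, loaded with rdflib; the URIs are made up and simplified, i.e., this is not SAREF4Health's actual content:

```python
from rdflib import Graph

# Anti-pattern: the battery *type* declared as an individual.
as_instance = """
@prefix :    <http://example.org/ex#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

:Battery a owl:Class .
:RechargeableLithiumPolymerBattery a owl:NamedIndividual , :Battery .
"""

# Better: the type as a subclass, so that concrete batteries in concrete
# devices can be its instances.
as_subclass = """
@prefix :     <http://example.org/ex#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Battery a owl:Class .
:RechargeableLithiumPolymerBattery a owl:Class ; rdfs:subClassOf :Battery .
:battery42 a :RechargeableLithiumPolymerBattery .
"""

for ttl in (as_instance, as_subclass):
    g = Graph()
    g.parse(data=ttl, format="turtle")
    print(len(g), "triples")
```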

And then ontologists at FOIS deplored that there are many domain ontologies that are of poor quality, and artifacts presented as ontologies that aren't. The FOIS reviewers themselves apparently can't even get their act together in the reviewing process, where artifacts that are sold as domain ontologies but aren't (UNiCS, SAREF4Health) not only make it through the reviewing process but, moreover, even get a best paper award from the PC chairs (SAREF4Health). The PC chairs wanted to make a political statement to communicate that FOIS accepts domain ontology papers. It is good that the FOIS topics are becoming less narrow, and I'm not saying they are pointless papers or lousy artifacts per se—they are useful reference papers, and UNiCS and SAREF4Health perform the application tasks they're supposed to be performing, which is a good thing. Maybe, collectively, ontology developers can't do better or don't need to do better w.r.t. applied ontology? Either way, once upon a time there were principles for what ontologies are; what happened to that? Also, there are multiple methodologies for domain ontology development, and there are a myriad of methods and tools, which have been mostly ignored. For instance, using one foundational ontology over another 'just because I know x' is neither a scientific nor a sound engineering approach. There are comparisons, requirements, and a mix of the two to help you figure out which one is the best to use; an early tool for that is ONSET, the ONtology Selection and Explanation Tool, developed by Zubeida Khan. To name one example.

Coincidentally, ontology engineering papers with such content do not, or only very rarely, make it into FOIS; but the fact that they don't (because they're typically not philosophical enough) doesn't mean they don't exist. Just in case a FOIS ontologist would like to explore methods, methodologies, and tools for ontology development: ESWC, EKAW, and K-CAP are good/top conferences covering such topics in whole or in part, and Chapter 5 of the ontology engineering textbook provides a sampling as well (as do some other sections in Block II). Considering my critical comments, one may ask whether my ontologies and ontology papers are any better, or anyone else's for that matter. Perhaps, perhaps not. You can check for yourself some of my recent papers on domain ontologies that I was involved in developing and that also have OWL files[1]; one paper was intended as a reference paper for the domain ontology [11], another was a bit of both domain ontology and some framework [12], and yet another turned into a core ontology [13] (v1, with the main categories; there's an updated version for the relations).

Anyway, returning to the first sentence of this section: the open forum discussion did not make it any clearer what the characteristics of a good ontology paper for the Applied Ontology journal (or FOIS, for that matter) would be. Mainly just Protégé screenshots certainly is not one of them, but opinions varied as to what would be. Going by the examples of the ontology papers that made it through: use of a top-level or foundational ontology and some modelling issues and solutions seem to be preferred, with evaluation and usage & uptake as nice-to-haves. Is developing a (domain) ontology science? That question wasn't answered unanimously; I think it was leaning towards a 'mostly no' w.r.t. applied ontology, but it may be if it's the first to solve a modelling issue. How to evaluate the ontology? Another question without a satisfactory answer. Overall, the criteria for an ontology paper—let alone for the ontology itself—are "TBD", and meanwhile one has to hope that one will get a supportive 'reviewer 2'.

 

Other

In case you have clicked through to one or more of the listed papers, you may have noticed that the FOIS'18 proceedings are Open Access—paid for by those who registered for the conference (it was included in the registration fee). I suppose the next FOIS organisers and the IAOA exec would like your opinion on that approach.

Mentors of the early career symposium papers

Besides the best paper award for SAREF4Health [9], there were two "distinguished paper awards", which went to the aforementioned paper on the graph-based approach to structural universals by Laure Vieu and Claudio Masolo [3] and to the foundational ontologies for units of measure by Michael Grüninger and co-authors [14]. The early career symposium went well and, from hearsay, they had a good social activity, too. There were lots of interesting conversations, networking, good food, and so on, and lots more to write about. There are also more photos.

Some of the postgraduate students and a recent PhD graduate in the spotlight at the closing ceremony, being thanked for chairing the sessions.

Last, but not least: the next FOIS in 2020 will be in Bolzano, Italy, as part of a ‘Bolzano summer of knowledge’ with more co-located conferences, workshops, and summer schools.

 

References

[1] Antony Galton. The treatment of time in upper ontologies. Proc. of FOIS’18. IOS Press, 306: 33-46.

[2] Torsten Hahmann. On Decomposition Operations in a Theory of Multidimensional Qualitative Space. Proc. of FOIS'18. IOS Press, 306: 173-186.

[3] Claudio Masolo, Laure Vieu. Graph-Based Approaches to Structural Universals and Complex States of Affairs. Proc. of FOIS’18. IOS Press, 306: 69-82.

[4] Tiago Prince Sales, Daniele Porello, Nicola Guarino, Giancarlo Guizzardi, John Mylopoulos. Ontological Foundations of Competition. Proc. of FOIS’18. IOS Press, 306: 96-112.

[5] Pawel Garbacz, Agnieszka Ławrynowicz, Bogumił Szady. Identity criteria for localities. Proc. of FOIS’18. IOS Press, 306: 47-56.

[6] C. Maria Keet, Langa Khumalo. On the Ontology of Part-Whole Relations in Zulu Language and Culture. Proc. of FOIS’18. IOS Press, 306: 225-238.

[7] Aldo Gangemi, Mehwish Alam, Valentina Presutti. Amnestic Forgery: An Ontology of Conceptual Metaphors. Proc. of FOIS’18. IOS Press, 306: 159-172.

[8] Alessandro Mosca, Fernando Roda, Guillem Rull. UNiCS – The Ontology for Research and Innovation Policy Making. Proc. of FOIS’18. IOS Press, 306: 200-210.

[9] João Moreira, Luís Ferreira Pires, Marten van Sinderen, Laura Daniele. SAREF4health: IoT Standard-Based Ontology-Driven Healthcare Systems. Proc. of FOIS’18. IOS Press, 306: 239-252.

[10] Stefan Schulz, Ludger Jansen. Towards an Ontology of Religious and Spiritual Belief. Proc. of FOIS’18. IOS Press, 306: 253-260.

[11] Keet, C.M., Lawrynowicz, A., d’Amato, C., Kalousis, A., Nguyen, P., Palma, R., Stevens, R., Hilario, M. The Data Mining OPtimization ontology. Web Semantics: Science, Services and Agents on the World Wide Web, 2015, 32:43-53.

[12] Chavula, C., Keet, C.M. An Orchestration Framework for Linguistic Task Ontologies. 9th Metadata and Semantics Research Conference (MTSR’15), Garoufallou, E. et al. (Eds.). Springer CCIS vol. 544, 3-14.

[13] Keet, C.M. A core ontology of macroscopic stuff. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW'14). K. Janowicz et al. (Eds.). 24-28 Nov 2014, Linköping, Sweden. Springer LNAI vol. 8876, 209-224.

[14] Michael Grüninger, Bahar Aameri, Carmen Chui, Torsten Hahmann, Yi Ru. Foundational Ontologies for Units of Measure. Proc. of FOIS’18. IOS Press, 306: 211-224.

[1] I have others developed as part of methods & tools research

ISAO 2018, Cape Town, ‘trip’ report

The Fourth Interdisciplinary School on Applied Ontology has just come to an end, after five days of lectures, mini-projects, a poster session, exercises, and social activities spread over six days from 10 to 15 September in Cape Town on the UCT campus. It’s not exactly fair to call this a ‘trip report’, as I was the local organizer and one of the lecturers, but it’s a brief recap ‘trip report kind of blog post’ nonetheless.

The scientific programme consisted of lectures and tutorials on a range of applied ontology topics.

The slides of the lectures reveal only part of the contents covered, though. There were useful group exercises and plenary discussions on the ontological analysis of medical terms, such as what a headache is, a tooth extraction, blood, or aspirin; an exercise on putting into practice the design process of a conceptual modelling language of one's liking (e.g., how to formalize flowcharts, including an ontological analysis of what those elements are and the ontological commitments embedded in a language); and trying to prove some theorems of parthood theories.

There was also a session with 2-minute ‘blitztalks’ by participants interested in briefly describing their ongoing research, which was followed by an interactive poster session.

It was the first time that an ISAO had mini-projects, which turned out to have better outcomes than I expected, considering the limited time available for them. Each group had to pick a term and investigate what it meant in the various disciplines; e.g., what does 'concept' or 'category' mean in psychology, ontology, data science, and linguistics, and 'function' in manufacturing, society, medicine, and anatomy? The presentations by each group at the end of the week were interesting, and most of the material presented there easily could be added to the IAOA Education wiki's term list (an activity in progress).

What was not a first-time activity was the Ontology Pub Quiz, which is a bit of a merger of scientific programme and social activity. We created a new version based on questions from several ISAO'18 lecturers and a few relevant questions created earlier (questions and answers; we did only questions 1-3 and 6-7). We tried a new format compared to the ISAO'16 quiz and the JOWO'17 quiz: each team had 5 minutes to answer a set of 5 questions, and another team marked the answers. This set-up was not as hectic as the other format, and it resulted in more within-team interaction cf. interaction among all participants. As in prior editions, some questions and answers were debatable (and there's still the plan to make note of that and fix it—or you could write an article about it, perhaps :)). The students of the winning team received two years' free IAOA membership (and chocolate for all team members), and the students of the other two teams received one year's free IAOA membership.

Impression of part of the poster session area, moving into the welcome reception

As with the three previous ISAO editions, there was also a social programme, which aimed to facilitate getting to know one another, networking, and having time for scientific conversations. On the first day, the poster session eased into a welcome reception (after a brief wine lapse in the coffee break before the blitztalks). The second day had an activity to stretch the legs after the lectures and before the mini-project work: a Bachata dance lesson by Angus Prince from Evolution Dance. Not everyone was eager at the start, but it turned out to be an enjoyable and entertaining hour. Wednesday was supposed to be a hike up the iconic Table Mountain, but of all the dry days we've had here in Cape Town, on that day it was cloudy and rainy, so an alternative plan of indoor chocolate tasting at the Biscuit Mill was devised and executed. Thursday evening was an evening off (from scheduled activities, at least), and on Friday early evening we had the pub quiz in the UCT Club (the campus pub). Although there was no official planning for Saturday afternoon after the morning lectures, there was again an attempt at Table Mountain, concluding the week.

The participants came from all over the world, including relatively many from Southern Africa, with participants also from Botswana and Mauritius besides several universities in South Africa (UCT, SUN, CUT). I hope everyone has learned something from the programme that is or will be of use, enjoyed the social programme, and made some useful new contacts and/or solidified existing ones. I look forward to seeing you all at the next ISAO or, better, FOIS, in 2020 in Bolzano, Italy.

Finally, as a non-trip-report comment from my local chairing viewpoint: special thanks go to the volunteers Zubeida Khan for the ISAO website, Zola Mahlaza and Michael Harrison for on-site assistance, and Sam Chetty for the IT admin.

Review of ‘The web was done by amateurs’ by Marco Aiello

Via one of those friend-of-a-friend likes on social media that popped up in my stream, I stumbled upon the recently published book "The web was done by amateurs" (there's also a related talk) by Marco Aiello, which piqued my interest concerning both the title and the author. I met Aiello once in Trento, when he and a colleague had a farewell party, with Aiello leaving for Groningen. He probably doesn't remember me, nor do I remember much of him—other than his lamentations about Italian academia and his going for greener pastures. It turns out he has done very well for himself academically, and his foray into writing for the general public has been, in my opinion, fairly successful with this book.

The short book—it easily can be read in a weekend—starts in the first part with historical notes on who did what for the Internet (the infrastructure) and on the multiple predecessor proposals and applications of hyperlinking across documents that Tim Berners-Lee (TBL) apparently was blissfully unaware of. It's surely a more interesting and useful read than the first Google hit, the few factoids from the W3C, or whatever Wikipedia page one can find with a simple search—or: it still pays off to read books in this day and age :). The second part is, for most readers perhaps, also still history: the 'birth' of the Web and the browser wars in the mid-1990s.

Part III is, in my opinion, the most fun to read: it discusses various extensions to the original design of TBL's Web that fix, or at least aim to fix, a shortcoming of the Web's basics, i.e., they're presented as "patches" to patch up a too basic—or: rank-amateur—design of the original Web. They are, among others, persistence with cookies to mimic statefulness for Web-based transactions (for, e.g., buying things on the Web), trying to get some executable instructions with Java (ActiveX, Flash), and web services (from CORBA and service-oriented computing to REST, the cloud, and such). Interestingly, they all originate in the 1990s, in the time of the browser wars.

There are more names in the distant and recent history of the Web than I knew of, so even I picked up a few things here and there. IIRC, they're all men, though. Surely there would be at least one woman worthy of mention? I probably ought to know, but didn't, so I searched the Web and easily stumbled upon the Internet Hall of Fame. That list includes Susan Estrada among the pioneers, who founded CERFnet, which "grew the network from 25 sites to hundreds of sites", and, after that, Anriette Esterhuysen and Nancy Hafkin for the network in Africa, Qiheng Hu for doing this for China, and Ida Holz for the same in Latin America (in 'global connections'). Web innovators specifically include Anne-Marie Eklund Löwinder for the DNS security extensions (DNSSEC, noted on p143 but not by its inventor's name) and Elizabeth Feinler for the "first query-based network host name and address (WHOIS) server"; moreover, "she and her group developed the top-level domain-naming scheme of .com, .edu, .gov, .mil, .org, and .net, which are still in use today".

One patch to the Web that I really missed in the overview of the early patches is "Web 2.0". I know that, technologically, it is a trivial extension to TBL's original proposal: the move from static web pages with 1:n communication from a content provider to many passive readers, to m:n communication with comment sections (fancy forms). Or: instead of the surfer being just a recipient of information, reading one webpage after another and thinking her own thing of it, she can respond and interact: the chatrooms, the article and blog comment features, and, in the 2000s, the likes of MySpace and Facebook. It got so many more people involved in it all.

Continuing with the book's content: cloud computing and the fog (Section 7.9) are from this millennium, as is what Aiello dubbed the "Mother of All Patches": the Semantic Web. Regarding the latter, early on in the book (pp. vii-viii) there is already an off-hand comment that does not bode well: "Chap. 8 on the Semantic Web is slightly more technical than the rest and can be safely skipped." (emphasis added). The way Chapter 8 is written, perhaps. Before discussing his main claim there, a few minor quibbles: it's the Web Ontology Language OWL, not "Ontology Web Language" (p105), and there's OWL 2 as the successor of the OWL of 2004. "RDF is a nifty combination of being a simple modeling language while also functioning as an expressive ontological language" (p104): no, RDF is for representing data, not really for modelling, and it most certainly would not be considered an ontology language (one can serialize an ontology in RDF/XML, but that's different). The class satisfiability example: no, that's not what it does, or: the simplification does not faithfully capture it; an example with a MammalFish that cannot have any instances (as a subclass of both Mammal and Fish, which are disjoint) would have been better (regardless of the real world).
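For the record, that satisfiability example looks as follows in OWL/Turtle (made-up URIs; parsing with rdflib only checks the syntax, whereas a DL reasoner such as HermiT or Pellet, e.g., from within Protégé, would report MammalFish as unsatisfiable, i.e., necessarily without instances):

```python
from rdflib import Graph

# MammalFish is declared a subclass of two disjoint classes; a DL reasoner
# would therefore flag it as an unsatisfiable class.
ttl = """
@prefix ex:   <http://example.org/zoo#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Mammal a owl:Class .
ex:Fish   a owl:Class ; owl:disjointWith ex:Mammal .
ex:MammalFish a owl:Class ;
    rdfs:subClassOf ex:Mammal , ex:Fish .
"""

g = Graph()
g.parse(data=ttl, format="turtle")
print(len(g), "triples parsed")
```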

The main claim of Aiello regarding the Semantic Web, however, is that it's time to throw in the towel, because there hasn't been widespread uptake of Semantic Web technologies on the Web even though it was proposed already around the turn of the millennium. I lean towards that as well and have reduced the time spent on it in my ontology engineering course over the years, but I don't want to throw out the baby with the bathwater just yet, for two reasons. First, scientific results tend to take a long time to trickle down. Second, I am not convinced that the 'semantic' part of the Web is the same level of end-user stuff as playing with HTML is. I still have an HTML book from 1997. It has instructions to "design your first page in 10 minutes!". I cannot recall if it was indeed <10 minutes, but it sure was fast back in 1998-1999 when I made my first pages, as a layperson not particularly interested in IT. I'm not sure if the whole semantics thing can be done even on the proverbial rainy Sunday afternoon, but the dumbed-down version with schema.org sort of works. This schema.org brings me to p110 of Aiello's book, which states that Google can make do with just statistics for optimal search results because of its sheer volume (so bye-bye Semantic Web). But it is not just stats-based: even Google is trying with schema.org and its "knowledge graph"; admittedly, it's extremely lightweight, but it's more than stats only. Perhaps the schema.org and knowledge graph sort of thing are to the Semantic Web what TBL's proposal for the Web was to, say, the fancier HyperCard.
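To illustrate that 'dumbed-down' route: the sketch below generates the kind of lightweight schema.org annotation a page author can paste into a web page as JSON-LD. The schema.org type and property names are real; the concrete values are made up for illustration.

```python
import json

# A minimal schema.org description of a book, wrapped in the <script> tag a
# web page would embed so that search engines can pick up the structured data.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "An Introduction to Ontology Engineering",
    "author": {"@type": "Person", "name": "C. Maria Keet"},
    "inLanguage": "en",
}

html_snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(book, indent=2)
    + "\n</script>"
)
print(html_snippet)
```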

I don’t know if people within the Semantic Web research community would think of its tooling as technologies for the general public. I suspect not. I consider the development and use of ontologies in ontology-driven information systems as part of the ‘back office’ technologies, notwithstanding my occasional attempts to explain to friends and family what sort of things I’m working on.

What I did find curious is that one of Aiello's arguments for the Semantic Web's failure was that "Using ontologies and defining what the meaning of a page is can be much more easily exploited by malicious users" (p110). It can be exploited, for sure, but statistics can go bad, very bad, too, especially regarding associations of search terms, the creepy amount of data collection on the Web, and the bias built into the machine learning algorithms. Search engine optimization is just the polite term for messing with 'honest' stats and algorithms. With the Semantic Web, it would be a conscious decision to mess around, and that's easily traceable, but with all the stats-based approaches, it sneakishly can creep in whilst trying to keep up the veneer of impartiality, which is harder to detect. If it were a choice between two technology evils, I prefer the honest bastard cf. being stabbed in the back. (That the users of the current Web are opting for the latter does not make it the lesser of two evils.)

As to two possible new patches (not in the book, and one can debate whether they are patches): time will tell whether the few recent calls for "decentralizing" the Web will take hold, or whether more fine-grained privacy, which also entails more fine-grained recording of events (e.g., TBL's Solid project), will. The app-fication discussion (Section 10.1) was an interesting one—I hardly use mobile apps and so am not really into it—and the lock-in it entails is indeed a cause for concern for the Web and all it offers. Another section in Chapter 10 is on IoT, which has sounded promising and potentially scary (what would the data-hungry ML algorithms of the Web infer from my fridge contents, and from that, about me??) for the past 10 years or so. Lastly, the final chapter has the tempting-to-read title "Should a new Web be designed?", but the answer is not a clear yes or no. Evolve, it will.

Would I have read the book if I weren't on sabbatical now? Probably still, on an otherwise 'lost time' intercontinental trip to a conference. So, overall, besides the occasional gap, and although one could quibble a bit here and there, the book is on the whole a nice read for any layperson interested in learning something about the ubiquitous Web, for any expert who's using only a little corner of it, and certainly for the younger generation, to get a feel for how the current Web came about and how technologies get shaped in praxis.