What about ethics and responsible data integration and data firewalls?

With another level 4 lockdown and a curfew from 9pm for most of July, I eventually gave in and decided to buy a TV, for some diversion with the national TV channels. In the process of buying, it turned out that here in South Africa, you have to have a valid, paid-up TV licence to be allowed to buy a TV. I had none yet. So there I was in the online shopping check-out on a Sunday evening, held up by a message that boiled down to ‘we don’t recognise your ID or passport number as having a TV licence’. As advances in the state’s information systems would have it, you can register for a TV licence online and pay by credit card to obtain one near-instantly. The interesting question from an IT perspective then was: how long would it take for the online retailer to know I had duly registered and paid for the licence? In other words: are the two systems integrated and, if so, how? It definitely is not based on a simple live SPARQL query from the retailer to a SPARQL endpoint of the TV licence database, as I still failed the retailer’s TV licence check immediately after payment and confirmation of the licence. Some time passed with refreshing the page, trying again, and writing a message to the retailer, perhaps 30-45 minutes or so. And then it worked! A periodic data push or pull it is then, either between the licence database and the retailer or within the state’s back-end system and any front-end query interface. Not bad, not bad at all.
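Purely to make that hypothetical concrete: if the retailer did perform a live check against a SPARQL endpoint of the licence database, the query could be as simple as an ASK. Here is a sketch in Python, where the vocabulary, predicate names, and ID number are all invented for illustration:

```python
# Hypothetical sketch of a live licence check: the retailer would send a
# SPARQL ASK query to an endpoint of the TV licence database. The
# vocabulary, predicate names, and ID number below are all invented.

def build_licence_check_query(id_number: str) -> str:
    """Build a SPARQL ASK query checking whether a paid-up licence
    is registered against the given ID number."""
    return f"""
PREFIX lic: <http://example.org/tvlicence#>
ASK {{
  ?licence lic:holderIdNumber "{id_number}" ;
           lic:status lic:PaidUp .
}}"""

query = build_licence_check_query("0123456789012")
```

An ASK query returns just true or false, which is all the retailer needs; that the check in practice lagged the payment by half an hour or more is what points at a periodic push or pull instead.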

One may question from a privacy viewpoint whether this is the right process. Why could I not simply be checked by, say, just TV licence number and surname, rather than having to hand over my ID or passport number? Should it even be the retailer’s responsibility to check whether their customer has paid the tax?

There are other places in the state’s systems where there’s some relatively advanced integration of data between the state and companies as well. Notably, the SA Revenue Service (SARS) system pulls data from any company you work for (or they submit it via some ETL process) and from any bank you’re banking with, to check whether you paid the right amount (if you owe them, they send the payment order straight to your bank, but you still have to click ‘approve’ online). No doubt it will help reduce fraud, and by making it easier to fill in tax forms, it likely will increase the amount collected and cause fewer errors that otherwise may be costly to fix. Clearly, the system amounts to reduced privacy, but it remains within the legal framework—it is the person trying to evade paying taxes who is breaking the law—and I support the notion of redistributive taxation and achieving that with as little admin as possible.

These examples do raise broader questions, though: when is data integration justified? Always? If not always, then when is it not? How to ensure that it won’t happen when it should not? Who regulates data integration, if anyone? Are there any guidelines or a checklist for doing it responsibly so that it at least won’t cause unintentional harm? Which steps in the data integration, if any, are crucial from a responsibility and ethical point of view?

No good answers

pretty picture of a selection of data integration tasks. source: https://datawarehouseinfo.com/wp-content/uploads/2018/10/data-integration-1024x1022.png

I did search the academic literature, but found only one paper mentioning that we should think about at least some of these sorts of questions [1]. There are plenty of ethics & Big Data papers (e.g., [2,3]), but those focus on the algorithms let loose on the data, and the consequences thereof, once the data has been integrated, rather than on the yes/no of integration or any of the preceding integration processes themselves. There are, among others, data cleaning, data harmonisation and the algorithms for that, schema-based integration (LAV, GAV, or GLAV), conceptual model-based integration, ontology-driven integration, possibly recurring ETL processes, and so on, and something may go wrong at each step, or a step may turn out to be the fine-grained crucial component of the ethical considerations. I devised one toy example in the context of ontology-based data access and integration where things would go wrong because of a bias [4] in the COVID-19 ontology that has data integration as its explicit purpose [5]. There are also informal [page offline dd 25-7-2021] descriptions of cases where things went wrong, such as the data integration issues with the City of Johannesburg that caused multiple riots in 2011, and no doubt there will be more.
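To make the schema-based options a little more concrete: in a global-as-view (GAV) setting, each relation of the global schema is defined as a query over the sources, so an error or bias in a single mapping silently propagates into every integrated answer. A toy sketch in Python, with all data and attribute names invented for the example:

```python
# Toy GAV (global-as-view) illustration: a relation of the global schema
# is defined as a query over the sources. All data, relation, and
# attribute names are invented for this example.

source_licences = [            # 'licence DB'
    {"id_no": "123", "status": "paid"},
]
source_customers = [           # 'retailer DB'
    {"id_no": "123", "name": "Alice"},
    {"id_no": "456", "name": "Bob"},
]

def global_licensed_customers():
    """Global relation LicensedCustomer(id_no, name), defined as a
    join over the two sources -- this function IS the GAV mapping."""
    paid = {r["id_no"] for r in source_licences if r["status"] == "paid"}
    return [c for c in source_customers if c["id_no"] in paid]
```

If, say, the `status` values in the licence source were dirty or the join key unreliable, every query over the global relation inherits that flaw, which is exactly why the preceding cleaning and mapping steps carry ethical weight of their own.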

Taking the ‘non-science’ route further to see if I could find something, I did find a few websites with some ‘best practices’ and ‘guidelines’ for data integration (e.g., here and here), with the brand new and most comprehensive set of data integration guidelines at end-user level being those by the UN’s ESCAP, which focus on data integration for statistics offices, on what to do and where errors may creep in [6]. But that’s all. No substantive hits for ‘ethics in data integration’ and similar searches in the academic literature. Maybe I’m searching in the wrong places. Wading through all the ‘data ethics’ papers to find the needle in the haystack may have to be done some other time. If you know of scientific literature that I missed specifically regarding data integration, I’d be most grateful if you’d let me know.

The ‘recurring reliables’ for issues: health and education

Meanwhile, to take a step toward an answer to at least a subset of the aforementioned questions, let me first mention two other recent cases, also from South Africa, although the second issue has happened in the Netherlands as well.

The first one is about healthcare data. I’m trying to get a SARS-CoV-2 vaccine. Registration for the age group I’m in opened on the 14th in the evening, and so I registered in the state’s electronic vaccination data system (EVDS), which is the basic requirement for getting a vaccine. The next day, it appeared that we could book a slot via the health insurer I’m a member of. Their database and the EVDS are definitely not integrated, and so my insurer spammed me for a while with online messages in red, via email, and via SMS, that I should register with the EVDS, even though I had already done so well before trying out their app.

Perhaps the health data are not integrated because it’s health; perhaps it was just time pressure to not delay the rollout of the SARS-CoV-2 vaccination programme. Some sectors, such as basic education and then the police, got loaded into the EVDS by the respective state department in one go via some ETL process, rather than people having to bother with individual registration. ID number, names, health insurance, dependants, home address, phone number, and whatnot that the EVDS asked for. And that regardless of whether you want the vaccine or not—though at least most people do. I don’t recall anyone having had a problem with that back-end process as such, aside from reported glitches in the basic education sector’s ETL process, with reports on missing foreign national teachers and on employees of independent schools who wanted in but weren’t included.

Both the IT systems for vaccination management and any app serving as a ‘pass’ for having been vaccinated enjoy some debate on privacy internationally. Should they be self-standing systems? If some integration is allowed, then with what? Should a healthcare provider or insurer be informed of the vaccination status of a member (and, consequently, act accordingly, whatever that may be), only if the member voluntarily discloses it (as with the vaccination scheduling app), or never? One’s employer? The movie theatre or mall you may want to enter? Perhaps airline companies want access to the vaccine database as well, and could choose to only let vaccinated people on their planes? The latter happens with other vaccinations for sure; e.g., yellow fever vaccination proof to enter SA from some countries, which the airline staff did ask for when I checked in in Argentina when travelling back to SA in 2012. That vaccination proof had gone into the physical yellow fever vaccination booklet that I carried with me; no app was involved in that process, ever. But now more things are digital. Must any such ‘covid-19 pass’ necessarily be digital? If so, who decides who, if anyone, will get access to the vaccination data, be it the EVDS data in SA or its homologous systems in other countries? To the best of my knowledge, no regulations exist yet. Since the EVDS is an IT system of the state, I presume they will decide. If they don’t, it will be up to the whims of each company, municipality, or province, and then it is bound to generate lots of confusion among people.

The other case, of a different nature, comes up in the news regularly; e.g., here, here, and here. It’s the tension that exists between children’s right to education and the paperwork to apply for a school. This runs into complications when they have an “undocumented” status, be it because of an absent birth certificate or because of their and their parents’ legal status in the country and the related ID documents or the absence thereof. It is forbidden for a school to contact Home Affairs to get the prospective pupil’s and their respective parents’/guardians’ status, and for Home Affairs to provide that data to the schools, let alone to integrate those two databases at the ministerial level. Essentially, it is an intentional ‘Chinese wall’ between the two databases: the right to education of a child trumps any possible violation of legality of stay in the country or missing paperwork of the child or their parents/guardians.

Notwithstanding, exclusive or exclusionary schools try to filter them out by other means, such as by demanding that sort of data when you want to apply for admission (here’s an example), compared to public schools, where evidence of an application for permission to stay suffices, or at least evidence of efforts to engage with Home Affairs will already do. When the law says ‘no’ to the integration, how can you guarantee it won’t happen, either through the software or by other means (like by de facto requiring, in an admission form, the relevant data stored in the Home Affairs database)? Policing it? People reporting it somewhere? Would requesting such information now be a violation of the Protection of Personal Information Act (POPIA) that came into force on the 1st of July, since it asks for more personal data than needed by law?

Regulatory aspects

These cases—TV licence, SARS (the tax, not the syndrome), vaccine database, school admissions—are just a few anecdotes. Data integration clearly is not always allowed, and where it is not, that has been a deliberate decision, because the outcome of integration is easy to predict and deemed unwanted. Notably, for the education case, it is the government who devised the policy of a regulatory Chinese wall between its own systems. The TV licence appears to lie at the other end of the spectrum. The broadcasting act of 1999 implicitly puts the onus on the seller of TVs: the licence is not a fee to watch public TV, it is what grants the licence holder the right to use a TV (article 27, if you must know), so if you don’t have the right to have one, then you can’t buy one. It’s analogous to having to be over 18 to buy alcohol, where the seller is held culpable if the buyer isn’t. That said, there are differences in what the seller requests from the customer: Makro requires the licence number only and asks for ID only if you can’t remember the licence number, so as to ‘help you find it’, whereas takealot demands both ID and licence in any case, and therewith perhaps asks for more than strictly needed. Either way, since any retailer thus should be able to access the licence information instantly to check whether you have the right to own a TV, it’s a bit as if “come in and take my data” is written all over the TV licence database. I haven’t seen any news articles about abuse.

For the SARS-CoV-2 vaccine and the EVDS data, there is, to the best of my knowledge, no specific regulation in place on access from the EVDS by third parties, other than that vaccination is voluntary and that there is SA’s version of the GDPR, the aforementioned POPIA, which is based on the GDPR principles. I haven’t seen much debate about organisations requiring vaccination, but they can make vaccination mandatory if they want to, from which it follows that there will have to be some data exchange, either between the EVDS and third parties or from the EVDS to the person and from there to the company. Would it then become another “come in and take my data”? We’ll cross that bridge when we come to it, I suppose; coverage is currently at about 10% of the population and not everyone who wants to could get vaccinated yet, so we’re still in limbo.

What could possibly go wrong with widespread access, as with the TV licence database? A lot, of course. There are the usual privacy and interoperability issues (also noted here), and there are calls, even in the laissez-faire USA, to put a framework in place to provide companies with “standards and bounds”. They are unlikely to be solved by the CommonPass of the Commons Project bottom-up initiative, since there are so many countries with so many rules on privacy and data sharing. Interoperability between some systems is one thing; one world-wide system is another cup of tea.

What all this boils down to is not unlike Moshe Vardi’s argument: there’s a need for more policy to reduce and avoid ethical issues in IT, AI, and computing, rather than that computing would be facing an ethics crisis [7]. His claim is that failures of policy cause problems and that the “remedy is public policy, in the form of laws and regulations”, not more “ethics outrage”. That is, presumably there’s no ethics crisis in the sense of a lack of understanding of ethical behaviour among computer scientists and their managers. Seeing each year how students’ arguments improve between the start of the ethics course and the essay and exam at the end, I’d argue that basic sensitisation is still needed, but on the whole, more and better policy could go a long way indeed.

More research on possible missteps in the various data integration processes would also be helpful, and that from a technical angle, as would learning from case studies and contextual inquiries [8], as well as a rigorous assessment of possible biases, as was done for software development processes [9]. Those outcomes then may end up as a set of guidelines for data integration practitioners and the companies they work for, and inform government in devising policies. For now, the ESCAP guidelines [6] probably will be of most use to a data integration practitioner. They won’t catch all biases and algorithmic issues & tools, and they assume one is allowed to integrate already, but they are a step in the direction of responsible data integration. I’ll think about it a bit more, too, and for the time being I won’t bother my students with writing an essay about the ethics of data integration just yet.

References

[1] Firmani, D., Tanca, L., Torlone, R. Data processing: reflection on ethics. International Workshop on Processing Information Ethically (PIE’19). CEUR-WS vol. 2417. 4 June 2019.

[2] Herschel, R., Miori, V.M. Ethics & Big Data. Technology in Society, 2017, 49:31‐36.

[3] Sax, M. Finders keepers, losers weepers. Ethics and Information Technology, 2016, 18: 25‐31.

[4] Keet, C.M. Bias in ontologies — a preliminary assessment. Technical Report, Arxiv.org, January 20, 2021. 10p

[5] He, Y., et al. CIDO: The Community-based Coronavirus Infectious Disease Ontology. In Hastings, J. and Loebe, F., eds., Proceedings of the 11th International Conference on Biomedical Ontologies, CEUR-WS vol. 2807. 2020.

[6] Economic and Social Commission for Asia and the Pacific (ESCAP). Asia-Pacific Guidelines to Data Integration for Official Statistics. Training manual. 15 April 2021.

[7] Vardi, M.Y. Are We Having An Ethical Crisis in Computing? Communications of the ACM, 2019, 62(1):7.

[8] McKeown, A., Cliffe, C., Arora, A. et al. Ethical challenges of integration across primary and secondary care: a qualitative and normative analysis. BMC Med Ethics 20, 42 (2019).

[9] R. Mohanani, I. Salman, B. Turhan, P. Rodriguez, P. Ralph, Cognitive biases in software engineering: A systematic mapping study, IEEE Transactions on Software Engineering, 46 (2020): 1318–1339.

NLG requirements for social robots in Sub-Saharan Africa

When the robots come rolling, or just trickling or seeping or slowly creeping, into daily life, I want them to be culturally aware, to give contextually relevant responses, and to do that in a language that the user can understand and speak well. Currently, they don’t. Since I work and live in South Africa, what does all that mean for the Southern African context? Would social robot use case scenarios be different here than in the Global North, where most of the robot research and development is happening, and if so, how? What is meant by contextually relevant responses? Which language(s) should the robot communicate in?

The question of which languages is the easiest to answer: those spoken in this region, which are mainly those in the Niger-Congo B [NCB] (aka ‘Bantu’) family of languages, and then also Portuguese, French, Afrikaans, and English. I’ve been working on theory and tools for NCB languages, and isiZulu in particular (and some isiXhosa and Runyankore), mainly as part of the two NRF-funded projects GeNI and MoReNL. However, if we don’t know how that human-robot interaction occurs in which setting, we won’t know whether the algorithms designed so far can also be used for that, which may well go beyond the ontology verbalisation, a patient’s medicine prescription generation, weather forecasts, or language learning exercises that we roughly have covered for the controlled language and natural language generation aspects of it.

So then what about those use case scenarios and contextually relevant responses? Let me first give an example of the latter. A few years ago, in one of the social issues and professional practice lectures I was teaching, I brought in the Amazon Echo to illustrate precisely that, as well as privacy issues with Alexa and digital assistants (‘robot secretaries’) in general. Upon asking “What is the EFF?”, the whole class—some 300 students present at the time—was expecting Alexa to respond with something like “The EFF is the Economic Freedom Fighters, a political party in South Africa”. Instead, Alexa fetched the international/US-based answer and responded with “The EFF is the Electronic Frontier Foundation”, which the class had never heard of and which doesn’t really do anything in South Africa (it does come up later on in the module nonetheless, btw). There’s plenty of online content about the EFF as a political party, yet Alexa chose to ignore that and prioritise information from elsewhere. Go figure with lots of other information that has limited online presence and doesn’t score high in the search engine results because there are fewer queries about it. How to get the right answer in those cases is not my problem (area of expertise), but I take that as a solved black box and zoom in on the natural language aspects: automatically generating a sentence that contains the answer, taken from some structured data or knowledge.

The other aspect of this instance is that the interactions both during and after the lecture were not 1:1 interactions of students with their own version of Siri or Cortana and the like, but eager and curious students came in teams, so a 1:m interaction. While that particular class is relatively large and was already split into two sessions, larger classes are not uncommon in several Sub-Saharan countries: for secondary school class sizes, the SADC average is 23.55 learners per class (the world average is 17), with the lowest in Botswana (13.8 learners) and the highest in Malawi, with a whopping 72.3 learners in a class, on average. An educational robot could well be a useful way to get out of that catch-22 and, given resource constraints, end up in a deployment scenario with a robot per study group, and that in a multilingual setting that permits code switching (going back and forth between different languages). While human-robot interaction experts still will need to do some contextual inquiries and such to get to the bottom of the exact requirements and sentences, this variation in use is on top of the hitherto known possible ways of using educational robots.

Going beyond this sort of informal chatter, I tried to structure it a bit and narrowed it down to a requirements analysis for the natural language generation aspects. After some contextualisation, I principally used two main use cases to elucidate natural language generation requirements and assessed those against key advances in research and technologies for NCB languages. Very, very briefly, any system will need to i) combine data-to-text and knowledge-to-text, ii) generate many more different types of sentences, including sentences for both written and spoken language in the NCB languages, which are grammatically rich and often agglutinating, and iii) process numbers, which is non-trivial for NCB languages because the surface realisation of a number depends on the noun class of the noun that is being counted. At present, no system out there can do all of that. A condensed version of the analysis was recently accepted as a paper entitled Natural Language Generation Requirements for Social Robots in Sub-Saharan Africa [1] for the IST-Africa’21 conference, and it will be presented there next week at the virtual event, in the ‘next generation computing’ session no less, on Wednesday the 12th of May.
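To illustrate point iii) with a minimal sketch: the stem for ‘two’ (-bili in isiZulu) takes a different concord depending on the noun class of the noun being counted, so a generator cannot realise a number without first looking up the noun class. The two-entry concord table below is a simplification for illustration only; a real realiser has to cover all noun classes and their phonological conditioning:

```python
# Minimal, simplified sketch of noun-class-dependent number realisation
# in isiZulu: the stem -bili ('two') takes a concord determined by the
# noun class of what is counted. Only two classes are included here,
# purely for illustration.

CONCORD_TWO = {2: "aba", 10: "ezim"}   # noun class -> concord for -bili
NOUN_CLASS = {"abantu": 2, "izinja": 10}  # plural noun -> noun class

def realise_two(noun: str) -> str:
    """Surface realisation of '<noun> two' in this toy fragment,
    e.g., 'people two' or 'dogs two'."""
    concord = CONCORD_TWO[NOUN_CLASS[noun]]
    return f"{noun} {concord}bili"
```

So ‘two people’ comes out as abantu ababili but ‘two dogs’ as izinja ezimbili: same numeral stem, different surface form, which is precisely why a plain template with a fixed word for ‘two’ cannot work.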

Screen grab of the recording of the conference presentation (link to recording available upon request)

Probably none of you has ever heard of this conference. IST-Africa is a yearly IT conference in Africa that aims to foster North-South and South-South networking, to promote academia->industry and academia->policy bridge-creation and knowledge transfer pipelines, and to build capacity for paper writing and presentation. The topics covered are distinctly of regional relevance and, according to its call for papers, the “Technical, Policy, Social Implications Papers must present analysis of early/final Research or Implementation Project Results, or business, government, or societal sector Case Study”.

Why should I even bother with an event like that? It’s good to sometimes reflect on the context and ponder the relevance of one’s research—after all, part of the university’s income (and thus my salary) and a large part of the research project funding I have received so far ultimately come from the taxpayers. South African taxpayers, to be more precise; not the taxpayers of the Global North. I can ‘advertise’, ahem, my research area and its progress to a regional audience. Also, I don’t expect that the average scientist in the Global North would care about HRI in Africa, and even less so for NCB languages, but the analysis needed to be done and papers equal brownie points. And if everyone thinks it better not to participate in something local or regional, it won’t ever become a vibrant network of research, applied research, and technology. I attended the event once, in 2018, when we had a paper on error correction for isiZulu spellcheckers, and from my researcher viewpoint, it was useful for networking and ‘shopping’ for interesting problems that I may be able to solve, based on other participants’ case studies and inquiries.

Time will tell whether attending that event then, and now this paper and online attendance, will be time wasted or well spent. Unlike the papers on the isiZulu spellcheckers, which reported research and concrete results that a tech company easily could take up (feel free to do so), this is a ‘fluffy’ paper, but exploring the use of robots in Africa was an interesting exercise: I learned a few things along the way, it will save other interested people time in the analysis phase, and hopefully it also will generate some interest and discussion about what sort of robots we’d want and what they could or should be doing to assist, rather than replace, humans.

p.s.: if you still were to think that there are no robots in Africa and deem all this to be irrelevant: besides robots in the automotive and mining industries by, e.g., Robotic Innovations and Robotic Handling Systems, there are robots in education (also in Cape Town, by RD-9), robot butlers in hotels that serve quarantined people with mild COVID-19 in Johannesburg, robots used for COVID-19 screening in Rwanda, and the Naledi personal banking app by Botlhale, to name but a few examples. Other tools are moving in that direction, such as, among others, Awezamed’s use of speech synthesis with (canned) text in isiZulu, isiXhosa and Afrikaans, and there’s of course my research group, where we look into knowledge-to-text generation in African languages.

References

[1] Keet, C.M. Natural Language Generation Requirements for Social Robots in Sub-Saharan Africa. IST-Africa 2021, 10-14 May 2021, online. in print.

The ontological commitments embedded in a representation language

Just like programming language preferences generate heated debates, this happens every now and then with languages to represent ontologies as well. Passionate dislikes of description logics or of the limitations of OWL are not unheard of, in favour of, say, Common Logic for more expressiveness and a different notation style, or of OBO because of its graph-based fundamentals, or the complaint that abuse of UML Class Diagram syntax won’t do as an approximation of an OWL file. But what is really going on here? Are they practically all just the same anyway, and do modellers merely stick with, and defend, what they know? If you could design your pet language, what would it look like?

The short answer is: they are not all the same and interchangeable. There are actually ontological commitments baked into a language, even though in most cases this is not explicitly stated as such. The ‘things’ one has in the language indicate what the fundamental building blocks in the world are (also called “epistemological primitives” [1]) and therewith assume some philosophical stance. For instance, a crisp vs. a vague world (say, plain OWL or a fuzzy variant thereof), or whether parthood is such a special relation that it deserves its own primitive next to class subsumption (alike UML’s aggregation). Or maybe you want one type of class for things indicated with count nouns and another type of element for stuffs (substances generally denoted with mass nouns). This raises the question of what sorts of commitments are embedded in, or can go into, a language specification and have an underlying philosophical point of view. That, in turn, raises the question of which philosophical stances actually can have a knock-on effect on the specification or selection of an ontology language.

My collaborator, Pablo Fillottrani, and I tried to answer these questions in the paper entitled An Analysis of Commitments in Ontology Language Design, which was published late last year as part of the proceedings of the 11th Conference on Formal Ontology in Information Systems 2020, which was supposed to have been held in September 2020 in Bolzano, Italy. In the paper, we identified and analysed ontological commitments that are, or could have been, embedded in logics, and we showed how they have been taken for well-known languages for representing ontologies and similar artefacts, such as OBO, SKOS, OWL 2 DL, DLRifd, and FOL. We organised them into four main categories: what the very fundamental furniture is (e.g., whether to include roles, or time), acknowledged refinements thereof (e.g., types of relations, types of classes), the logic’s interaction with natural language, and crisp vs. various vagueness options. They are discussed over about a third of the paper.

Obviously, engineering considerations can interfere in the design of the logic as well. They concern issues such as what the syntax should look like and whether scalability is an issue, but these are not the focus of the paper.

We did spend some time contextualising the language specification in an overall systematic engineering process of language design, which is summarised in the figure below (the paper focuses on the highlighted step).

(source: [2])

While such a process can be used for the design of a new logic, it can also be used for post hoc reconstructions of the past design processes of extant logics and conceptual data modelling languages, and for choosing which one you want to use. At present, though, the documentation of the vast majority of published languages does not describe much of the ‘softer’ design rationales.

We played with the design process to illustrate how it can work out, availing also of our requirements catalogue for ontology languages, and we analysed several popular ontology languages on their commitments, which can be summed up as in the table shown below, also taken from the paper:

(source: [2])

In a roundabout way, it also suggests some explanations as to why some of those transformation algorithms don’t always work well; e.g., any UML-to-OWL or OBO-to-OWL transformation algorithm is trying to shoe-horn one ontological commitment into another, and that can only be approximated, at best. Things have to be dropped (e.g., roles, due to standard view vs. positionalism) or cannot be enforced (e.g., labels, due to a natural language layer vs. embedding it in the logic), and that’ll cause some hiccups here and there. Now you know why, and why that won’t ever work well.
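The standard view vs. positionalism case can be sketched with a toy example: a positionalist relationship has named argument places (roles), whereas a standard-view predicate has only an argument order, so flattening the former into the latter irretrievably drops the role names. The relationship and role names below are invented for illustration:

```python
# Toy illustration of the information loss when flattening a
# positionalist relationship (named argument places, i.e., roles) into
# a standard-view predicate (argument order only). The relationship and
# role names are invented for this example.

positionalist = {
    "relationship": "enrolment",
    "roles": {"enrollee": "Student", "enrolled_in": "Course"},
}

def to_standard_view(rel: dict) -> tuple:
    """Approximate the relationship as Pred(arg1, arg2): the role
    names are gone and only the argument order remains."""
    return (rel["relationship"], *rel["roles"].values())

flattened = to_standard_view(positionalist)
```

A reverse transformation can only invent generic role names, not recover the original ones, which is one concrete way in which such conversions are approximations at best.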

Hopefully, all this will feed into a way to help choose a suitable language for the ontology one may want to develop, or assist with better understanding the language that you may be using, or perhaps spark new ideas for designing a new ontology language.

References

[1] Brachman R, Schmolze J. An overview of the KL-ONE Knowledge Representation System. Cognitive Science. 1985, 9:171–216.

[2] Fillottrani, P.R., Keet, C.M. An Analysis of Commitments in Ontology Language Design. Proc. of FOIS 2020. Brodaric, B. and Neuhaus, F. (Eds.). IOS Press. FAIA vol. 330, 46-60.

An architecture for Knowledge-driven Information and Data access: KnowID

Advanced, so-called ‘intelligent’, information systems may use an ontology or runtime-suitable conceptual data modelling techniques in the back end, combined with efficient data management. Such a set-up aims to provide a way to better support informed decision-making and data integration, among others. A major challenge in creating such systems is to figure out which components to design and put together to realise a ‘knowledge to data’ pipeline, since each component and process has trade-offs; see, e.g., the very recent overview of sub-topics and challenges [1]. A (very) high-level categorisation of the four principal approaches is shown in the following figure: put the knowledge and data together in the logical theory the AI way (left) or the database way (right), or bridge them by means of mappings or by means of transformations (centre two):

Figure 1. Outline of the four approaches to relate knowledge and data in a system. (Source: adapted from [6])

Among those variants, one can dig into considerations like which logic to design or choose in the AI-based “knowledge with (little) data” approach (e.g.: which OWL species? Common Logic? Other?), which type of database (relational, object-relational, or rather an RDF store), which query language to use or design, which reasoning services to support, and how expressive it all has to be and optimised for what purpose. None is best in all deployment scenarios. The AI-only one with, say, OWL 2 DL, is not scalable; the database-only one either lacks interesting reasoning services or supports few types of constraints.

Among the two in the middle, the “knowledge mapping data” approach is best known under the term ‘ontology-based data access’ (OBDA) and the Ontop system in particular [2], with its recent extension into ‘virtual knowledge graphs’ and the various use cases [3]. The distinguishing characteristic of its architecture is the mapping layer that bridges the knowledge to the data. In the “data transformation knowledge” approach, the idea is to link the knowledge to the data through a series of transformations. No such system was available yet. Considering the requirements for one, it turned out that a good few components were already available and that just one crucial set of transformations was needed to convincingly put it all together.

We did just that and devised a new knowledge-to-data architecture. We dub this the KnowID architecture (pronounced as ‘know it’), abbreviated from Knowledge-driven Information and Data access. KnowID adds novel transformation rules between suitably formalised EER diagrams as application ontology and Borgida, Toman & Weddell’s Abstract Relational Model with SQLP [4,5] to complete the pipeline (together with some other recently proposed components). Overall, it then looks like this:

Figure 2. Overview of the KnowID architecture (Source: adapted from [6])

Its details are described in the article entitled “KnowID: an architecture for efficient Knowledge-driven Information and Data access” [6], which was recently published in the Data Intelligence journal. In a nutshell: the logic-based EER diagram (with deductions materialised) is transformed into an abstract relational model (ARM), which is transformed into a traditional relational model and then onward to a database schema, where the original ‘background knowledge’ of the ARM is used for data completion (i.e., materializing the deductions w.r.t. the data), and then a query posed in SQLP (SQL + path queries) is answered over that ‘extended’ database.
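To give a flavour of the data completion step: the following is a minimal sketch in Python, with invented table names and subsumptions, of what materializing the deductions w.r.t. the data amounts to for a toy class hierarchy. KnowID itself operates on the database schema rather than on in-memory sets, so this is an illustration of the idea only, not the actual algorithm.

```python
# Sketch of KnowID-style data completion: propagate each row of a subclass
# table into its superclass tables, so that queries over the 'extended'
# database see all entailed facts. Table names are invented for illustration.

def complete_data(tables, subsumptions):
    """tables: name -> set of rows; subsumptions: list of (sub, sup) pairs."""
    extended = {name: set(rows) for name, rows in tables.items()}
    changed = True
    while changed:  # iterate to handle chains such as A <= B <= C
        changed = False
        for sub, sup in subsumptions:
            for row in list(extended.get(sub, ())):
                if row not in extended.setdefault(sup, set()):
                    extended[sup].add(row)
                    changed = True
    return extended

# Toy example: Manager is subsumed by Employee, Employee by Person.
tables = {"Manager": {("m1", "Ada")}, "Employee": {("e1", "Bob")}, "Person": set()}
subsumptions = [("Manager", "Employee"), ("Employee", "Person")]
ext = complete_data(tables, subsumptions)
# ("m1", "Ada") now also appears in Employee and Person
```

A query over `Person` in the extended database then returns the managers and employees as well, without any reasoning at query time.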

Besides the description of the architecture and the new transformation rules, the open access journal article also describes several examples and features a more detailed comparison of the four approaches shown in figure 1 above. Compared to other ontology-based data access approaches, KnowID’s key distinctive architectural features are that runtime use can avail of full SQL augmented with path queries and of the closed world assumption commonly used in information systems, and that it avoids a computationally costly mapping layer.

We are working on the implementation of the architecture. The transformation rules and corresponding algorithms were implemented last year [7], and two computer science honours students are currently finalising their 4th-year project, thereby contributing to the materialization and query formulation steps of the architecture. The latest results are available from the KnowID webpage. If you were to worry that it will suffer from link rot: the version associated with the Data Intelligence paper has been archived as supplementary material of the paper at [8]. The plan is, however, to steadily continue putting the pieces together to make a functional software system.

References

[1] Schneider, T., Šimkus, M. Ontologies and Data Management: A Brief Survey. Künstl Intell 34, 329–353 (2020).

[2] Calvanese, D., Cogrel, B., Komla-Ebri, S., Kontchakov, R., Lanti, D., Rezk, M., Rodriguez-Muro, M., Xiao, G.: Ontop: Answering SPARQL queries over relational databases. Semantic Web Journal, 2017, 8(3), 471-487.

[3] G. Xiao, L. Ding, B. Cogrel, & D. Calvanese. Virtual knowledge graphs: An overview of systems and use cases. Data Intelligence, 2019, 1, 201-223.

[4] A. Borgida, D. Toman & G.E. Weddell. On referring expressions in information systems derived from conceptual modeling. In: Proceedings of ER’16, 2016, pp. 183–197

[5] W. Ma, C.M. Keet, W. Oldford, D. Toman & G. Weddell. The utility of the abstract relational model and attribute paths in SQL. In: C. Faron Zucker, C. Ghidini, A. Napoli & Y. Toussaint (eds.) Proceedings of the 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW’18)), 2018, pp. 195–211.

[6] P.R. Fillottrani & C.M. Keet. KnowID: An architecture for efficient knowledge-driven information and data access. Data Intelligence, 2020 2(4), 487–512.

[7] Fillottrani, P.R., Jamieson, S., Keet, C.M. Connecting knowledge to data through transformations in KnowID: system description. Künstliche Intelligenz, 2020, 34, 373-379.

[8] Pablo Rubén Fillottrani, C. Maria Keet. KnowID. V1. Science Data Bank. http://www.dx.doi.org/10.11922/sciencedb.j00104.00015. (2020-09-30)

Digital Assistants and AMAs with configurable ethical theories

About a year ago, there was a bit of furore in the newspapers on digital assistants, like Amazon Echo’s Alexa, Apple’s Siri, or Microsoft’s Cortana, in a smart home to possibly snitch on you if you’re the marijuana-smoking family member [1,2]. This may be relevant if you live in a conservative state or country, where it is still illegal to do so. Behind it is a multi-agent system that would do some argumentation among the stakeholders (the kids, the parents, and the police). That example sure did get the students’ attention in the computer ethics class I taught last year. It did so too with an undergraduate student—double majoring in compsci and philosophy—who opted to do the independent research module. Instead of the multiple actor scenario, however, we considered it may be useful to equip such a digital assistant, or an artificial moral agent (AMA) more broadly, with multiple moral theories, so that a user would be able to select their preferred theory and let the AMA make the appropriate decision for her on whichever dilemma comes up. This seems preferable over an at-most-one-theory AMA.

For instance, there’s the “Mia the alcoholic” moral dilemma [3]: Mia is disabled and has a new model of carebot that can fetch her alcoholic drinks in the comfort of her home. At some point, she’s getting drunk but still orders the carebot to bring her one more tasty cocktail. Should the carebot comply? The answer depends on one’s ethical viewpoint. If you answered ‘yes’, you probably would not want to buy a carebot that would refuse to serve you, and vice versa. But how to make the AMA culturally and ethically flexible enough to be able to adjust to the user’s moral preferences?

The first step in that direction has now been made by that (undergrad) research student, George Rautenbach, whom I supervised. The first component is a three-layered approach, with at the top layer a ‘general ethical theory’ model (called Genet) that is expressive enough to model a specific ethical theory, such as utilitarianism, ethical egoism, or Divine Command Theory. This was done for those three and Kantianism, so as to cover a few differences: whether a theory is consequence-based or not, the possible ‘patients’ of the action, the sort of principles, possible thresholds, and such. These specific theories reside in the middle layer. Then there’s Mia’s egoism, the parents’ Kantian viewpoint about the marijuana, a train company’s utilitarianism to sort out the trolley problem, and so on, at the bottom layer; these are instantiations of the respective specific ethical theories in the middle layer.

The Genet model was evaluated by demonstrating that those four theories can be modelled with Genet and the individual theories were evaluated with a few use cases to show that the attributes stored are relevant and sufficient for those reasoning scenarios for the individuals. For instance, eventually, Mia’s egoism wouldn’t get her another drink fetched by the carebot, but as a Kantian, she would have been served.
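How switching the configured theory flips the outcome can be sketched as follows. This is a deliberately simplified, invented encoding in Python, not the actual Genet model or its (yet to be implemented) reasoning component; the theory readings in the comments are illustrative simplifications.

```python
# Hypothetical sketch of an AMA configurable with multiple ethical theories,
# applied to the "Mia the alcoholic" dilemma. The encodings below are
# invented simplifications for illustration, not the Genet model itself.

def carebot_serves_drink(theory, already_drunk, mia_orders_drink):
    if theory == "egoism":
        # An egoism reading for Mia: act in the owner's long-term
        # self-interest; more alcohol when already drunk harms her.
        return mia_orders_drink and not already_drunk
    if theory == "kantianism":
        # A Kantian reading that honours the explicit order of a
        # rational agent.
        return mia_orders_drink
    raise ValueError(f"theory not configured: {theory}")

# As in the evaluation: under her egoism Mia is refused another drink,
# whereas as a Kantian she would have been served.
refused = carebot_serves_drink("egoism", already_drunk=True, mia_orders_drink=True)
served = carebot_serves_drink("kantianism", already_drunk=True, mia_orders_drink=True)
```

The point of the sketch is merely that the dilemma's outcome is a function of the user-selected theory, which is what the three-layer design enables.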

The details are described in the technical report “Toward Equipping Artificial Moral Agents with multiple ethical theories” [4] and the models are also available for download as XML files and an OWL file. To get all this to work in a device, there’s still the actual reasoning component to implement (a few architectures exist for that) and for a user to figure out which theory they actually subscribe to so as to have the device configured accordingly. And of course, there is a range of ethical issues with digital assistants and AMAs, but that’s a topic perhaps better suited for the SIPP (FKA computer ethics) module in our compsci programme [5] and other departments.

 

p.s.: a genet is also an agile cat-like animal mostly living in Africa, just in case you were wondering about the abbreviation of the model.

 

References

[1] Swain, F. AIs could debate whether a smart assistant should snitch on you. New Scientist, 22 February 2019. Online: https://www.newscientist.com/article/2194613-ais-could-debatewhether-a-smart-assistant-should-snitch-on-you/ (last accessed: 5 March 2020).

[2] Liao, B., Slavkovik, M., van der Torre, L. Building Jiminy Cricket: An Architecture for Moral Agreements Among Stakeholders. ACM Conference on Artificial Intelligence, Ethics, and Society 2019, Hawaii, USA. Preprint: arXiv:1812.04741v2, 7 March 2019.

[3] Millar, J. An ethics evaluation tool for automating ethical decision-making in robots and self-driving cars. Applied Artificial Intelligence, 30(8):787–809, 2016.

[4] Rautenbach, G., Keet, C.M. Toward equipping Artificial Moral Agents with multiple ethical theories. University of Cape Town. arxiv:2003.00935, 2 March 2020.

[5] Computer Science Department. Social Issues and Professional Practice in IT & Computing. Lecture Notes. 6 December 2019.

Dancing algorithms and algorithms for dance apps

Browsing through teaching material a few years ago, I stumbled upon dancing algorithms, which illustrate common algorithms in computing using dance [1], and I couldn’t resist writing about them, since I used to dance folk dances. More of them have been developed in the meantime. The list further below has them sorted by algorithm and by dance style, with links to the videos on YouTube. Related ideas have also been used in mathematics teaching, such as teaching multiplication tables with Hip Hop singing and dancing in a class in Cape Town, dancing equations, mathsdance [2], and, stretching the scope a bit, rapping fractions elsewhere.
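For readers who have not seen the videos: bubble sort is the algorithm danced in several of the ones listed below, and it maps naturally onto a dance because each step of the algorithm is a local action by two adjacent dancers. A plain Python rendering, with the dance reading in the comments:

```python
# Bubble sort, as performed in several of the dance videos listed below:
# each comparison of two adjacent dancers is a face-off, and each swap
# is the pair changing places; after pass i, the largest i+1 dancers
# have 'bubbled' to their final positions at the end of the line.

def bubble_sort(dancers):
    dancers = list(dancers)
    for i in range(len(dancers) - 1):
        for j in range(len(dancers) - 1 - i):
            if dancers[j] > dancers[j + 1]:  # the pair dances a swap
                dancers[j], dancers[j + 1] = dancers[j + 1], dancers[j]
    return dancers

print(bubble_sort([5, 1, 4, 2, 3]))  # → [1, 2, 3, 4, 5]
```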

That brought me to the notion of algorithms for dancing, which takes a systematic and mathematical or computational approach to dance. For instance, there are the maths in salsa [3] and an ontology to describe some of dance [4], and a few more, which go beyond the hard-to-read Labanotation that is geared toward ballet but not pair dancing, let alone a four-couple dance [5] or, say, a rueda (multiple pairs in a circle, passing on partners). Since there was little for Salsa dance, I proposed a computer science & dance project last year, and three computer science honours students were interested and developed their Salsational Dance App. The outcome of their hard work is that there’s now a demonstrated-to-be-usable API for the data structure to describe moves (designed for beats that count in multiples of four), a grammar for the moves to construct valid sequences, and some visualization of the moves, therewith improving on the static information from ‘Salsa is good’ that counted as the baseline. The data structure is extensible to other dance styles beyond Salsa that count in multiples of four, such as Bachata (without the syncopations, true).

In my opinion, the grammar is the coolest component, since it is the most novel aspect from both a scientific and an engineering perspective, and it was technically the most challenging task of the project. The grammar’s expressiveness remained within a context-free grammar, which is computationally still doable. This may be because of the moves covered—the usual moves one learns during a Salsa beginners course—or it may hold in any case. The grammar has been tested on a series of test cases in the system, which all worked well (whether all theoretically physically feasible sequences feel comfortable to dance is a separate matter). The parsing is done by a JavaCC-generated parser, which carries out a formal verification to check whether the sequence of moves is valid, and it even does that on the fly. That is, when a user selects a move while planning a sequence of moves, it instantly re-computes which of the moves in the system can follow the one last selected, as can be seen in the following screenshot.
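The on-the-fly check can be sketched in miniature. The move names and succession rules below are invented for illustration, and this sketch is only a regular-grammar simplification of the idea; the actual app uses a JavaCC-generated parser over a richer, context-free grammar.

```python
# Hypothetical sketch of computing valid next moves while planning a
# sequence. FOLLOWS encodes which move may validly come after which;
# the moves and rules are invented, not the app's actual grammar.

FOLLOWS = {
    "basic step": {"right turn", "cross body lead", "basic step"},
    "right turn": {"basic step", "cross body lead"},
    "cross body lead": {"basic step"},
}

def valid_next_moves(sequence):
    """Moves the planner may offer after the last selected move."""
    if not sequence:
        return set(FOLLOWS)  # any known move can start a sequence
    return FOLLOWS.get(sequence[-1], set())

def is_valid_sequence(sequence):
    """Check every adjacent pair against the succession rules."""
    return all(b in FOLLOWS.get(a, set())
               for a, b in zip(sequence, sequence[1:]))
```

Each time the user picks a move, the planner calls `valid_next_moves` on the sequence so far and greys out everything else, which is the behaviour visible in the screenshot.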

source: http://projects.cs.uct.ac.za/honsproj/cgi-bin/view/2019/baijnath_chetty_marajh.zip/DEDANCE_website/images/parser/ui4.PNG

Screenshot of planning a sequence of moves.

The grammar thus even has a neat ‘wrapper’ in the form of an end-user usable tool, which was evaluated by several members of Evolution Dance Company in Cape Town. Special thanks go to its owner, Mr. Angus Prince, who served also as external expert on the project. Some more screenshots, the code, and the project papers by the three students—Alka Baijnath, Jordy Chetty, and Micara Marajh—are available from the CS honours project archive.

The project also showed that much more can be done, not just porting it to other dance styles, but also still for salsa. This concerns not only the grammar, but also how to encode moves in a user-friendly way and how to link it up to the graphics so that the ‘puppets’ will dance the selected sequence of moves, as well as meeting other requirements, such as a mobile app to serve as a ‘cheat sheet’ for quickly checking a move during a social dance evening, and choreography planning. Based on my own experiences goofing around writing down moves, the latter (choreography) seems less hard to realise (documenting it, at least) than the ‘any move’ scenario. Either way, the honours project topics are being finalised around now. Hopefully, there will be another 2-3 students interested in computing and dance this year, so we can build up a larger body of software tools and techniques to support dance.

 

Dancing algorithms by type

Sorting

– Quicksort as a Hungarian folk dance

– Bubble sort as a Hungarian dance, Bollywood style dance, and with synthetic music

– Shell sort as a Hungarian dance.

– Select sort as a gypsy/Roma dance

– Merge sort in ‘Transylvanian-Saxon’ dance style

– Insert sort in Romanian folk dance style

– Heap sort also as in Hungarian folk dance style

There are more sorting algorithms than these, though, so there’s plenty of choice to pick your own. A different artistic look at the algorithms is this one, with 15 sorts that almost sound like music (but not quite).

Searching

– Linear search in Flamenco style

– Binary search also in Flamenco style

Backtracking as a ballet performance

 

Dancing algorithms by dance style

European folk:

– Hungarian dance for the quicksort, bubble sort, shell sort, and heap sort;

– Roma (gypsy) dance for the select sort;

– Transylvanian-Saxon dance for the merge sort;

– Romanian dance for an insert sort.

Spanish folk: Flamenco dancing a binary search and a linear search.

Bollywood dance where students are dancing a bubble sort.

Classical (Ballet) for a backtracking algorithm.

Modern (synthetic music) where a class attempts to dance a bubble sort.

 

That’s all for now. If you make a choreography for an algorithm, get people to dance it, record it, and want to have the video listed here, feel free to contact me and I’ll add it.

 

References

[1] Zoltan Katai and Laszlo Toth. Technologically and artistically enhanced multi-sensory computer programming education. Teaching and Teacher Education, 2010, 26(2): 244-251.

[2] Stephen Ornes. Math Dance. Proceedings of the National Academy of Sciences of the United States of America 2013. 110(26): 10465-10465.

[3] Christine von Renesse and Volker Ecke. Mathematics and Salsa Dancing. Journal of Mathematics and the Arts, 2011, 5(1): 17-28.

[4] Katerina El Raheb and Yannis Ioannidis. A Labanotation based ontology for representing dance movement. In: Gesture and Sign Language in Human-Computer Interaction and Embodied Communication (GW’11). Springer, LNAI vol. 7206, 106-117. 2012.

[5] Michael R. Bush and Gary M. Roodman. Different partners, different places: mathematics applied to the construction of four-couple folk dances. Journal of Mathematics and the Arts, 2013, 7(1): 17-28.

DL notation plugin for Protégé 5.x

Once upon a time… the Protégé ontology development environment used Description Logic (DL) symbols and all was well—for some users at least. Then Manchester Syntax came along as the new kid on the block, using hearsay and opinion and some other authors’ preferences for an alternative rendering to the DL notation [1]. Subsequently, everyone who used Protégé was forced to deal with those new and untested keywords in the interface, like ‘some’ and ‘only’ and such, rather than the DL symbols. That had another unfortunate side effect: it hampers internationalisation, for it jumbles things up rather awkwardly when your ontology vocabulary is not in English, like, say, “jirafa come only (oja or ramita)”. Even in an English-as-first-language country, it turned out that, under a controlled set-up, the DL axiom rendering in Protégé fared well in a fairly large experiment when compared to the Protégé interface with its sort of Manchester syntax in the GUI [2], and the OWL 2 RL rules rendering also came out more positively in another (smaller) experiment [3]. Various HCI factors remain to be examined in more detail, though.

In the meantime, we didn’t fully reinstate the DL notation in Protégé in the way it was in Protégé v3.x from some 15 years ago, but with our new plugin, it will at least render the class expression in DL notation in the tool. This has the benefits that

  1. the modeller will receive immediate feedback during the authoring stage regarding a notation that may be more familiar to at least a knowledge engineer or expert modeller;
  2. it offers a natural language-independent rendering of the axioms with respect to the constructors, so that people may develop their ontology in their own language if they wish to do so, without being hampered by continuous code switching or the need for localisation; and
  3. it also may ease the transition from theory (logics) to implementation for ontology engineering novices.

Whether it needs to be integrated further among more components of the tabs and views in Protégé or other ODEs is also a question for HCI experts to answer. The code for the DL plugin is open source, so you could extend it if you wish to do so.

The plugin itself is a jar file that can simply be dragged into the plugin folder of a Protégé installation (5.x); see the github repo for details. To illustrate it briefly, after dragging the jar file into the plugin folder, open Protégé, and add it as a view:

Then, when you add some new axioms or load an ontology, select a class, and it will render all its axioms in DL notation, as shown in the following two screenshots from different ontologies:

For the sake of illustration, here’s the giraffe that eats only leaves or twigs, in the Spanish version of the African Wildlife Ontology:
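The kind of rewriting involved can be sketched at the string level as follows. This is a much-simplified illustration of the Manchester-keyword-to-DL-symbol correspondence; the real plugin walks the OWL class expression structure rather than doing text substitution, and covers many more constructors.

```python
# Simplified sketch of rendering Manchester syntax keywords as DL symbols:
# 'R some C' becomes ∃R.C, 'R only C' becomes ∀R.C, and the boolean and
# axiom keywords map to their usual symbols. String-level illustration only.
import re

def to_dl(expr):
    # quantifiers first, so the role name ends up after the symbol
    expr = re.sub(r"(\w+) some ", r"∃\1.", expr)
    expr = re.sub(r"(\w+) only ", r"∀\1.", expr)
    for kw, sym in [("SubClassOf", "⊑"), ("EquivalentTo", "≡"),
                    (" and ", " ⊓ "), (" or ", " ⊔ "), ("not ", "¬")]:
        expr = expr.replace(kw, sym)
    return expr

print(to_dl("jirafa SubClassOf come only (oja or ramita)"))
# → jirafa ⊑ ∀come.(oja ⊔ ramita)
```

That last line is the giraffe axiom from the Spanish African Wildlife Ontology: the constructors render language-independently, while the vocabulary stays in Spanish.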

The first version of the tool was developed by Michael Harrison and Larry Liu as part of their mini-project for the ontology engineering course in 2017, and it was brushed up for presentation beyond that just now by Michael Harrison (meanwhile an MSc student at CS@UCT), which was supported by a DOT4D grant to improve my textbook on ontology engineering and accompanying educational resources. We haven’t examined all possible ‘shapes’ that a class expression can take, but it definitely processes the commonly used features well. At the time of writing, we haven’t detected any errors.

p.s.: if you want your whole ontology exported at once in DL notation and to LaTeX, for the purpose of documentation generation, that is a different usage scenario and is already possible [4].

p.p.s.: if you want more DL notation, please let me know, and I’ll try to find more resources to make a v2 with more features.

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2006, CEUR-WS vol 216.

[2] E. Alharbi, J. Howse, G. Stapleton, A. Hamie and A. Touloumis. The efficacy of OWL and DL on user understanding of axioms and their entailments. The Semantic Web – ISWC 2017, C. d’Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudre-Mauroux, J. Sequeda, C. Lange and J. He (eds.). Springer 2017, pp20-36.

[3] M. K. Sarker, A. Krisnadhi, D. Carral and P. Hitzler, Rule-based OWL modeling with ROWLtab Protégé plugin. Proceedings of ESWC’17, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler and O. Hartig (eds.). Springer. 2017, pp 419-433.

[4] Cogan Shimizu, Pascal Hitzler, Matthew Horridge: Rendering OWL in Description Logic Syntax. ESWC (Satellite Events) 2017. Springer LNCS. pp109-113

Some experiences on making a textbook available

I made a textbook on ontology engineering available for free in July 2018. Meanwhile, I’ve had several “why did you do this and not go with a proper publisher??!?” reactions. I had tried to answer that already in the textbook’s FAQ. It turns out that that short answer may have been a bit too short after all. So, here follows a bit more about it.

The main question I tried to answer in the book’s FAQ was “Would it not have been better with a ‘proper publisher’?” and the answer to that was:

Probably. The layout would have looked better, for sure. There are several reasons why it isn’t. First and foremost, I think knowledge should be free, open, and shared. I also have benefited from material that has been made openly available, and I think it is fair to continue contributing to such sharing. Also, my current employer pays me sufficient to live from and I don’t think it would sell thousands of copies (needed for making a decent amount of money from a textbook), so setting up such a barrier of high costs for its use does not seem like a good idea. A minor consideration is that it would have taken much more time to publish, both due to the logistics and the additional reviewing (previous multi-author general textbook efforts led to nothing due to conflicting interests and lack of time, so I would be unlikely ever to satisfy all reviewers, if they got around to reading it), yet I need the book for the next OE installment I will teach soon.

Ontology Engineering (OE) is listed as an elective in the ACM curriculum guidelines. Yet, it’s best suited for advanced undergrad/postgrad level because of the prerequisites (like knowing the basics of databases and conceptual modeling). This means there won’t be big 800-student classes all over the world lining up for OE. I guess it would not go beyond some 500-1000 students/year throughout the world (50 classes of 10-20 computer science students), and surely not all classes would use the textbook. Let’s say, optimistically, that 100 students/year would be asked to use the book.

With that low volume in mind, I did look up the cost of similar books in the same and similar fields with the ‘regular’ academic publishers. It doesn’t look enticing for either the author or the student. For instance, this one from Springer and that one from IGI Global are still >100 euro. for. the. eBook., and they’re the cheap ones (not counting the 100-page ‘silver bullet’ book). Handbooks and similar works on ontologies, e.g., this and that one, are offered for >200 euro (eBook). Admittedly, there’s the odd topical book that’s cheaper, in the 50-70 euro range here and there (still just the eBook), or again >100 for a, to me, inexplicable reason (not page numbers) for other books (like these and those). There’s an option to publish a textbook with Springer in open access format, but that would cost me a lot of money, and UCT only has a fund for OA journal papers, not books (nor for conference papers, btw).

IOS Press does not fare much better. For instance, a softcover version in the studies on semantic web series, which is their cheapest range, would be about 70 euro due to the number of pages, which is over R1100, and so again above budget for most students in South Africa, where the going rate is that a book needs to be below about R600 for students to buy it. A plain eBook or softcover from IOS Press not in that series goes for about 100 euro, i.e., around R1700 depending on the exchange rate—about three times the maximum acceptable price for a textbook.

The MIT Press BFO eBook is only R425 on takealot, yet judging by other MIT Press textbooks there, a book the size of the OE book would be around R600-700. Oxford University Press and its Cambridge counterpart—which, unlike MIT Press, I had checked out when deciding—are more expensive, again approaching 80-100 euro.

One that made me digress for a bit of exploration was Macmillan HE, which had an “Ada Lovelace day 2018” listing books by female authors, but a logics for CS book was again at some 83 euros, although the softer area of knowledge management for information systems got a book down to 50 euros, and something more popular, like a book on linguistics published by its subsidiary “Red Globe Press”, was down to even ‘just’ 35 euros. Trying to understand it more, Macmillan HE’s “about us” revealed that “Macmillan International Higher Education is a division of Macmillan Education and part of the Springer Nature Group, publishers of Nature and Scientific American.” and it turns out Macmillan publishes through Red Globe Press. Or: it’s all the same company, with different profit margins, and mostly those profit margins are too high to result in affordable textbooks, whichever subsidiary construction is used.

So, I had given up on the ‘proper publisher route’ on financial grounds, given that:

  • Any ontology engineering (OE) book will not sell large amounts of copies, so it will be expensive due to relatively low sales volume and I still will not make a substantial amount from royalties anyway.
  • Most of the money spent when buying a textbook from an established publisher goes to the coffers of the publisher (production costs etc. + about 30-40% pure profit). Also, scholarships ought not to be indirect subsidy schemes for large-profit-margin publishers.
  • Most publishers would charge an amount of money for the book that would render the book too expensive for my own students. It’s bad enough when that happens with other textbooks when there’s no alternative, but here I do have direct and easy-to-realise agency to avoid such a situation.

Of course, there’s still the ‘knowledge should be free’ etc. argument, but this was to show that even if one were not to have that viewpoint, it’s still not a smart move to publish the textbook with the well-known academic publishers, even more so if the topic isn’t in the core undergraduate computer science curriculum.

Interestingly, after ‘publishing’ it on my website and listing it on OpenUCT and the Open Textbook Archive—I’m certainly not the only one who had done a market analysis or has certain political convictions—one colleague pointed me to the non-profit College Publications that aims to “break the monopoly that commercial publishers have” and another colleague pointed me to UCT press. I had contacted both, and the former responded. In the meantime, the book has been published by CP and is now also listed on Amazon for just $18 (about 16 euro) or some R250 for the paperback version—whilst the original pdf file is still freely available—or: you pay for production costs of the paperback, which has a slightly nicer layout and the errata I knew of at the time have been corrected.

I have noticed that some people don’t take the informal self publishing seriously—even below the so-called ‘vanity publishers’ like Lulu—notwithstanding the archives to cater for it, the financial take on the matter, the knowledge sharing argument, and the ‘textbooks for development’ in emerging economies angle of it. So, I guess no brownie points from them then and, on top of that, my publication record did, and does, take a hit. Yet, writing a book, as an activity, is a nice and rewarding change from just churning out more and more papers like a paper production machine, and I hope it will contribute to keeping the OE research area alive and lead to better ontologies in ontology-driven information systems. The textbook got its first two citations already, the feedback is mostly very positive, readers have shared it elsewhere (reddit, ungule.it, Open Libra, Ebooks directory, and other platforms), and I recently got some funding from the DOT4D project to improve the resources further (for things like another chapter, new exercises, some tools development to illuminate the theory, a proofreading contest, updating the slides for sharing, and such). So, overall, if I had to make the choice again now, I’d still do it again the way I did. Also, I hope more textbook authors will start seeing self-publishing, or else non-profit, as a good option. Last, the notion of open textbooks is gaining momentum, so you even could become a trendsetter and be fashionable 😉

A useful abstract relational model and SQL path queries

Whilst visiting David Toman at the University of Waterloo during my sabbatical earlier this year, one of the topics we looked into was their experiments on whether their SQLP—SQL with path queries, extended from [1]—would be better than plain SQL in terms of the time it takes to understand queries and the correctness in writing them. It turned out (in a user evaluation) that it’s faster with SQLP whilst maintaining accuracy. The really interesting aspect in all this from my perspective, however, was the so-called Abstract Relational Model (ARM), or: the modelling side of things rather than the querying, though the latter is made easier with the ARM. In simple terms, the ARM [1] is like the relational model, but with identifiers, which makes those path queries doable and mostly more succinct, and one can partition the relations into class-relationship-like models (approaching the look-and-feel of a conceptual model) or lump stuff together into relational-model-like models, as preferred. Interestingly, it turns out that the queries remain exactly the same regardless of whether one makes the ARM look more relational-like or ontology-like, which is called “invariance under vertical partitioning” in the paper [2]. Given all these nice things, there’s now also an algorithm to go from the usual relational model to an ARM schema, so that even if one has legacy resources, it’s possible to bump them up to this newer technology with more features and ease of use.
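To give a flavour of why paths over identifier-valued attributes make queries more succinct: a single attribute path stands for a chain of joins that one would otherwise write out by hand in plain SQL. The sketch below, with invented table and attribute names, expands such a path mechanically; it is an illustration of the idea, not the SQLP implementation.

```python
# Illustrative sketch: expand an attribute path over identifier-valued
# attributes into the equivalent chain of SQL joins. Table and attribute
# names are invented; each path step is assumed to be an identifier-valued
# attribute referencing a table of the same name (ARM-style).

def expand_path(start_table, path, final_attr):
    """Expand start_table.p1.p2...pn.final_attr into an SQL join chain."""
    sql = f"SELECT t{len(path)}.{final_attr} FROM {start_table} AS t0"
    prev = "t0"
    for i, step in enumerate(path, start=1):
        sql += f" JOIN {step} AS t{i} ON {prev}.{step} = t{i}.id"
        prev = f"t{i}"
    return sql

# A two-step path: the name of the company of an employee's department.
print(expand_path("Employee", ["department", "company"], "name"))
```

In SQLP one would write the path directly, while the plain-SQL user writes (and reads) the full join chain the function produces, which is where the speed and accuracy difference in the user evaluation plausibly comes from.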

Our paper [2] that describes these details (invariance, RM-to-ARM, the evaluation), entitled “The Utility of the Abstract Relational Model and Attribute Paths in SQL”, is being published as part of the proceedings of the 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW’18), which will be held in Nancy, France, in about two weeks.

This sort of Conceptual Model(like)-based Data Access (CoMoDA, if you will) may sound a bit like Ontology-Based Data Access (OBDA). Yes and no. Roughly, yes on the conceptual querying sort of thing (there’s still room for quite some hair splitting there, though); no regarding the ontology side of things. The ARM doesn’t pretend to be an ontology, but it easily has a reconstruction in a Description Logic language [3] (with n-aries! and identifiers!). SQLP is much more expressive than the unions of conjunctive queries one can pose in a typical OBDA setting, however, for it is full SQL + those path queries. So, both the theory and the technology are different from the typical OBDA setting. Now, don’t think I’m defecting on the research topics—I still have a whole chapter on OBDA in my textbook—but it’s interesting to learn about and play with alternative approaches toward solutions to (at a high level) the same problem of trying to make querying for information easier and faster.


References

[1] Borgida, A., Toman, D., Weddell, G.E. On referring expressions in information systems derived from conceptual modelling. Proc. of ER’16. Springer LNCS, vol. 9974, 183-197. (2016)

[2] Ma, W., Keet, C.M., Olford, W., Toman, D., Weddell, G. The Utility of the Abstract Relational Model and Attribute Paths in SQL. Proc. of the 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW’18). Springer LNAI. (in print). 12-16 Nov. 2018, Nancy, France.

[3] Jacques, J.S., Toman, D., Weddell, G.E. Object-relational queries over CFDInc knowledge bases: OBDA for the SQL-literate. Proc. of IJCAI’16. 1258-1264. (2016)

ISAO 2018, Cape Town, ‘trip’ report

The Fourth Interdisciplinary School on Applied Ontology has just come to an end, after five days of lectures, mini-projects, a poster session, exercises, and social activities spread over six days from 10 to 15 September in Cape Town on the UCT campus. It’s not exactly fair to call this a ‘trip report’, as I was the local organizer and one of the lecturers, but it’s a brief recap ‘trip report kind of blog post’ nonetheless.

The scientific programme consisted of lectures and tutorials on:

The linked slides (titles of the lectures, above) reveal only part of the contents covered, though. There were useful group exercises and plenary discussions on the ontological analysis of medical terms (such as what a headache is, or a tooth extraction, blood, or aspirin), an exercise on putting into practice the design process of a conceptual modelling language of one’s liking (e.g.: how to formalize flowcharts, including an ontological analysis of what those elements are and the ontological commitments embedded in a language), and trying to prove some theorems of parthood theories.

There was also a session with 2-minute ‘blitztalks’ by participants interested in briefly describing their ongoing research, which was followed by an interactive poster session.

It was the first time that an ISAO had mini-projects, which turned out to have better outcomes than I had expected, considering the limited time available for them. Each group had to pick a term and investigate what it meant in the various disciplines (task description); e.g.: what does ‘concept’ or ‘category’ mean in psychology, ontology, data science, and linguistics, and ‘function’ in manufacturing, society, medicine, and anatomy? The presentations at the end of the week by each group were interesting, and most of the material presented there could easily be added to the IAOA Education wiki’s term list (an activity in progress).

What was not a first-time activity was the Ontology Pub Quiz, which is a bit of a merger of scientific programme and social activity. We created a new version based on questions from several ISAO’18 lecturers and a few relevant questions created earlier (questions and answers; we did only questions 1-3,6-7). We tried a new format compared to the ISAO’16 quiz and JOWO’17 quiz: each team had 5 minutes to answer a set of 5 questions, and another team marked the answers. This set-up was not as hectic as the other format, and resulted in more interaction within teams rather than among all participants. As in prior editions, some questions and answers were debatable (and there’s still the plan to make note of those and fix them; or you could write an article about it, perhaps :)). The students of the winning team received 2 years free IAOA membership (and chocolate for all team members), and the students of the other two teams received one year free IAOA membership.

Impression of part of the poster session area, moving into the welcome reception

As with the three previous ISAO editions, there was also a social programme, which aimed to facilitate getting to know one another, networking, and having time for scientific conversations. On the first day, the poster session eased into a welcome reception (after a brief wine lapse in the coffee break before the blitztalks). The second day had an activity to stretch the legs after the lectures and before the mini-project work: a Bachata dance lesson by Angus Prince from Evolution Dance. Not everyone was eager at the start, but it turned out to be an enjoyable and entertaining hour. Wednesday was supposed to be a hike up the iconic Table Mountain, but of all the dry days we’ve had here in Cape Town, on that day it was cloudy and rainy, so an alternative plan of indoor chocolate tasting at the Biscuit Mill was devised and executed. Thursday evening was an evening off (from scheduled activities, at least), and on Friday early evening we had the pub quiz in the UCT club (the campus pub). Although there was no official planning for Saturday afternoon after the morning lectures, there was again an attempt at Table Mountain, concluding the week.

The participants came from all over the world, with relatively many from Southern Africa: besides several universities in South Africa (UCT, SUN, CUT), there were also participants from Botswana and Mauritius. I hope everyone has learned something from the programme that is or will be of use, enjoyed the social programme, and made some useful new contacts and/or solidified existing ones. I look forward to seeing you all at the next ISAO or, better, at FOIS 2020 in Bolzano, Italy.

Finally, as a non-trip-report comment from my local chairing viewpoint: special thanks go to the volunteers Zubeida Khan for the ISAO website, Zola Mahlaza and Michael Harrison for on-site assistance, and Sam Chetty for the IT admin.