A controlled language for competency questions

The formulation of so-called competency questions (CQs) at the start of the development of an ontology or a similar artefact is a recurring exercise in various ontology development methodologies. For instance, “Which animals are the predators of impalas?” that an African Wildlife ontology should be able to answer and “What are the main parsers for compilers?” that a software ontology may be able to answer. Yet, going by the small number of publicly available CQs, it seems like that many developers skip that step in the process. And it turned out that for those who at least try, a considerable number of purported CQs, actually aren’t at all, are mis-formulated for even having a chance for it to work smoothly (for, say, automated formalisation), or are grammatically incorrect (a depressing 1/3 of the sentences in our test set, to be more precise). Also, there’s no software support in guiding a modeller to formulate CQs, nor to actually do something with it, such as converting it automatically into SPARQL; hence, it is disjointed from the actual artefact under development, which doesn’t help uptake.

In an attempt to narrow this gap, we have developed a controlled natural language (CNL) called CLaRO: a Competency question Language for specifying Requirements for an Ontology, model, or specification [1] for CQs for ‘TBoxes’ (type-level information and knowledge, not instances). Advantages of a CNL for CQs include that it should be easier—or at least less hard—to formalise a CQ into a query over the model and to formulate a CQ in the first place. CLaRO more specifically operates at the language layer, so it deals with noun and verb phrases, rather that the primitives of a representation language and the predetermined modeling style that comes with it. It is also the first one that has been evaluated on coverage, which turned out to be good and better than earlier works on templates for CQs. To add more to it, we also made a basic tool that offers assistive authoring to write CQs (screencast).

We got there by availing of a recently published dataset of 234 CQs that had been analysed linguistically into patterns. We analysed those patterns, and that outcome informed the design of CLaRO. Given the size, this first version pf CLaRO is template-based, with core CQs and several variants, totalling to 134 templates. CLaRO was evaluated with a random sample from the original 234 CQs, a newly created set of CQs scrambled together for related work, and half of the Pizza CQs, as well as evaluated against templates presented elsewhere [2,3]. The results are summarised in the paper and discussed in more detail in a related longer technical report [4]. Here’s the nice table with the aggregate data:

Aggregated results for coverage of the three test sets. The best values are highlighted in italics. (CLaRO results are for the complete set of 134 templates) (source: based on [1])

Given the encouraging results, we also created a proof of concept CQ authoring tool, which both can assist in the authoring of CQs and may contribute to get a better idea of requirements for such a tool. One can use autocomplete so that it proposes a template, and then fill in a selected template, or just ignore it and write a free-form CQ, hit enter, and save it to a text file; the file can also be opened and CQs deleted. Here are a few screenshots on selecting and adding a CQ in the tool:

We will be presenting CLaRO at the 13th International Conference on Metadata and Semantics Research (MTSR’19) in Rome at the end of October. If you have any questions, comments, or suggestions, please don’t hesitate to contact us. The CNL specification in csv and XML formats, the evaluation data, and the tool with the source code are available from the CLaRO Github repo.

 

References:

[1] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: a Controlled Language for Authoring Competency Questions. 13th Metadata and Semantics Research Conference (MTSR’19). 28-31 Oct 2019, Rome, Italy. Springer CCIS. (in print)

[2] Ren, Y., Parvizi, A., Mellish, C., Pan, J.Z., van Deemter, K., Stevens, R.: Towards competency question-driven ontology authoring. In: Extended Semantic Web Conference (ESWC’14). LNCS, Springer (2014)

[3] Bezerra, C., Freitas, F., Santana, F.: Evaluating ontologies with competency questions. In: Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) – Volume 03. pp. 284-285. IEEE Computer Society, Washington, DC, USA (2013)

[4] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: A data-driven CNL for specifying competency questions. University of Cape Town. Technical Report. 17 July 2019. https://arxiv.org/abs/1907.07378

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.