A couple of OWL requirements for using ontologies in Indigenous Knowledge Management Systems

Knowledge about, say, long-established agricultural practices, culinary customs and typical dishes (and how their ingredients evolved over the centuries), medicinal plants, and so on falls under the term indigenous knowledge in South Africa, cultural heritage in Europe (that I wrote about earlier), and traditional knowledge in other countries. Whichever term you prefer, it’s that kind of knowledge that is at risk of being lost due to changes in society. There is consensus to preserve it somehow (and possibly make some money from it along the way). Given that there’s lots of it (hence, lots of data, information, and knowledge that has to be managed), computing and IT enter the picture.

For South Africa, this is managed through the large-scale project from the Department of Science & Technology’s NIKSO office that aims at building a “national recordal system” and an IT infrastructure (IKMS) to both store and access the indigenous knowledge. Setting up such a system consists of some typical software development themes (following consultation with stakeholders), such as the need for handling varied data formats (documents, images, audio), integration of the existing disparate databases and other IT resources in SA into the IKMS, availability of the information in all 11 official languages, the need for a citizen portal, and so on.

Some of the requirements smelled very much like a possible nice use case for Semantic Web Technologies so as to implement a really state-of-the-art infrastructure with enhanced capabilities compared to standard applications. Ronell Alberts, Thomas Fogwill and I assessed that when I was visiting CSIR-Meraka in August and September 2010 as one of the secondments from the EU FP7 Net2 Project. The assessment of possibilities of using semantic web technologies, including the assessment of maturity for off-the-shelf usage, was accepted at IST-Africa recently [1]. We focused on enhanced querying, semantic browsing, question answering, multilingual information access, knowledge generation, classification of information, formalisation of scientific knowledge & discovery, and knowledge-based data integration.

This we took a step further by zooming in on the ontologies-part of semantic web technologies for four of the usage scenarios, the selection of which was based on their potential for impact and maturity and inclusion into the IKMS. These are: ontology based querying and browsing; a natural language independent ontology for multilingual data access; support for collaborative knowledge generation; and the formalisation of IK for scientific discovery. More precisely, we investigated the requirements for ontology languages to meet the IKMS needs and how well they are met, if at all. A paper describing the details was just accepted for OWLED’12 [2].

In short: some of the required OWL features include representation of vagueness, mereotopology, modularisation, and extended support for internationalization (i.e., multilingualism) and annotation for collaborative ontology development. Thus, the first three put new requirements on the expressiveness of the OWL language itself, and the latter two formulate requirements akin to ‘usability’ extensions for OWL. To motivate it all, the paper first describes each topic with real examples and a few references to current research and tools, and then states the OWL requirements, taking the examples into account and generalizing from them; details can be found in the paper.

Hopefully there will be an extensive and useful response at OWLED’12, like the feedback we received at OWLED’07 and DL’07 on the requirements on automated reasoning for bio-ontologies [3]. Obviously, if you have a solution to one or more of the gaps that we had overlooked, please leave a comment or send me an email.

References

[1] Fogwill, T., Alberts, R., Keet, C.M. The potential for use of semantic web technologies in IK management systems. IST-Africa Conference 2012. May 9-11, Dar es Salaam, Tanzania.

[2] Alberts, R., Fogwill, T., Keet, C.M. Several Required OWL Features for Indigenous Knowledge Management Systems. 7th Workshop on OWL: Experiences and Directions (OWLED 2012). 27-28 May, Heraklion, Crete, Greece. CEUR-WS Vol-xxx. 12p.

[3] Keet, C.M., Roos, M., Marshall, M.S. A survey of requirements for automated reasoning services for bio-ontologies in OWL. Third international Workshop OWL: Experiences and Directions (OWLED 2007), 6-7 June 2007, Innsbruck, Austria. CEUR-WS Vol-258. 10p. This was described informally in an earlier post.

Preliminary results of the Theory of Computation survey

As you may remember from the post on making Theory of Computation (ToC) more lively, I taught ToC for the first time last year at UKZN, where it also was a new core course in the CS degree programme, i.e., the students and the system also had to get used to ToC. As usual, anything can be improved upon (if you think not: look harder; it always can, at least in theory). To commence with that in a solid way, we’ve decided first to collect some data to go beyond the familiar anecdotes.

Internationally, many stories make the rounds through the grapevine about ToC. Those stories revolve around, among others, it being a difficult subject for the students, low pass rates, the course being threatened with removal from the programme, and textbooks going out of print (e.g., Pearson does not want to make reprints of Hopcroft, Motwani & Ullman’s book unless they get single orders for more than 300 books, according to their rep for SA).

While the individual stories are true, how prevalent are they really? How widespread are ‘low pass rates’, and when is it ‘low’? What are the enrollment numbers elsewhere? Do they have problems in the university system? It being a new course in the programme here as a result of merging a 16-credit Formal Languages & Automata Theory and a 16-credit Algorithms & Complexity, what topics are really essential in a ToC course? Should it be a core course, and if so, in which year of the programme?

These are some of the questions we wanted answers to. To find out, there’s a (still ongoing) survey of ToC syllabi at various universities around the world and an opinion-survey to obtain data that cannot be found by just looking at syllabi, but that concerns the context around ToC, like enrollment numbers, pass rates, whether it should be in the programme vs. actually is in the programme, and so on. The opinion-survey was open from 16 March to 1 April (accessible here), and I’ve put the preliminary results online, as promised in the announcement. (A paper summarizing the results and integrating them with the results of the syllabi-survey is in the pipeline, but somehow it struck a chord: relatively many survey respondents wanted to know the results, and all the details can’t go in the page-limited paper anyway.)

In total, there were 77 people, mainly academics, who completed the survey, mostly from outside SA and covering all continents of the world. There’s the survey setup, results in digested format, discussion, and conclusions, as well as the raw data with aggregated numbers per answer for each question, and the list of ToC topics ordered by how essential they are considered to be. In short: the survey responses show an overwhelming agreement that ToC should be taught, and a majority prefers to have it in the 2nd or 3rd year of an undergraduate programme. It is taught at most of the institutions that the respondents are affiliated with, and the course is mostly solidly in the programme as a core course. About half of the respondents note there are issues with the course, for various reasons, including, but not limited to, low pass rates and low enrollment. Roughly half observe first-time pass rates below 60%, and for only 20% the pass rate exceeds 80%. Whilst noting that several respondents spread ToC content over more than one course or integrate it with other courses, there is agreement on the typical topics that are considered essential to ToC, covering regular and context-free languages (and grammars), automata (at least DFA, NFA, epsilon-NFA), Turing machines, undecidability, computability and complexity, where the subtopics covered vary a bit.

Several respondents also gave additional feedback and opinions via email. If you would like to do so too, drop me a line, or add it in the comments section here on the blog.