On making a Theory of Computation course more lively

Some 71 BSc in CS students at UKZN encountered the joys of automata, regular expressions, context free grammars, Turing machines, undecidability, and intractability. I’ll be teaching theory of computation again next year, and am looking for more ways to brighten up the material further in an attempt to increase the amount of students’ interest in the topic. The list below made into the course material only partially, but I intend to include more of it next year (given that, looking back at the time I was a student, I sure would have enjoyed the course more if at least some of this had made it into the course material back then).

The textbook for the course is Hopcroft, Motwani and Ullman’s Introduction to Automata Theory, Languages, and Computation from Stanford University [1]. Their slides and well as Diego Calvanese’s notes from the Free University of Bozen-Bolzano were useful in the preparation, but this is still fairly dry matter. To make it livelier, I had looked around for illustrative simulators, games and the like.

The nicest one I found when preparing the course are the AutoSim automaton simulator (which generated a lukewarm response, at best) and the CFG Grammar Editor made by Carl Burch at Hendrix College, but I may have missed even better ones. It is sort of entertaining for the answers to the exercises of chapters 2 and 5 of the Ullman text book, and in due time I’ll post them here. Recently, JFLAP came to my attention, developed at Duke University, which has lots of more features compared to AutoSim, like an implemented state minimization algorithm and conversions between DFAs NFAs and REs. It also has an accompanying textbook, but, alas, that book only covers languages and automata, not the undecidability and complexity that is also part of the course. There are many simulators of Turing machines around, from Java implementations to physical simulators (see, e.g., this list), with the Lego of Doom, developed by four students at Århus University, as most entertaining, but not exactly classroom material, and there are nice illustrative videos of the Turing Machine in the classic style by Mike Davey. If you have any suggestions, please leave them in the comments.

The intention was to brighten up the second part of the course also with some games and puzzles as examples, in addition to the standard examples of the Travelling Salesman Problem, Knapsack, Clique, SAT etc. The clique generated some interest, as it could be effectively related to Facebook (the students form a clique themselves: they have a facebook page for the course). But, overall, there wasn’t enough time to go into sufficient detail. The plan was to revisit Sudokus, which was a 2nd-year programming exercise but also happens to be NP-complete [2], like so many other games (among others: Solitaire, FreeCell, Same Game/Clickomania, etc.—mentioning this generated interest); check out, e.g., Wikipedia’s list of game complexity of 34 games, the separate list of NP-complete games and puzzles and PSPACE-complete ones and David Eppstein’s list with complexity of puzzles, solitaire and two-player games. The games generated interest. Then there are non-game fun problems, like the new Pancake Flipping Problem (sorting a stack of pancakes with one spatula) that is NP-hard [3], and certain ways of balloon twisting with equal length balloons is NP-complete [4]—both brought under my attention thanks to the Improbable Research blog. There are many more NP-complete problems that may (should?) interest the students, like more serious ones about efficiently allocating blood in a blood bank, timetabling etc. Likewise for EXPTIME problems, such as reasoning over UML Class Diagrams [5], which it did, to some extent. The Complexity Zoo, however, seemed to scare them off.

Even inclusion of some more context, old and new, may brighten it up, like the material about Alan Turing, Chomsky’s original paper [6] (tried this, didn’t generate interest), where the naming of NP-hard comes from [7] by Donald Knuth, an additional note on P and NP by Moshe Vardi [8], and blog posts about “Big problems with big iron”—or: the main gains are to be made in improving the algorithms, not the hardware—by Richard Lipton, or the SEP entry on the Church-Turing Thesis [9] for the philosophically-minded (tried this, without response).

For those of you studying or teaching a theory of computation course: this year’s midterm, quiz, and exam (with solutions!), can be downloaded from my website here, which might be of use.

References

[1] Hopcroft, J.E., Motwani, R. and Ullman, J.D. (2001/2007). Introduction to Automata Theory, Languages, and Computation, 2nd/3rd ed., Pearson Education.

[2] Takayuki Yato and Takahiro Seta. 2003. Complexity and Completeness of Finding Another Solution and Its Application to Puzzles. IEICE Trans. Fundamentals, E86-A (5):1052-1060.

[3] Laurent Bulteau, Guillaume Fertin and Irena Rusu. Pancake Flipping is Hard.  arXiv:1111.0434v1. (Submitted on 2 November 2011).

[4] Erik D. Demaine and Martin L. Demaine and Vi Hart. Computational balloon twisting: the theory of balloon polyhedra. In: Proceedings of the 20th Canadian Conference on Computational Geometry (CCCG 2008), Montréal, Québec, Canada, August 2008.

[5] Daniela Berardi, Diego Calvanese, and Giuseppe De Giacomo. Reasoning on UML class diagrams. Artificial Intelligence, 168(1-2):70-118, 2005.

[6] Chomsky, N. Three models for the description of language. IRE Transactions on Information Theory, 1956, (2): 113–124.

[7] Donald E. Knuth. A Terminological Proposal. SIGACT News, January 1974, pp13-18.

[8] Moshe Y. Vardi. On P, NP, and Computational Complexity. Communications of the ACM. Vol. 53 No. 11, Page 5, November 2010.

[9] Copeland, B. Jack. The Church-Turing Thesis. The Stanford Encyclopedia of Philosophy (Fall 2008 Edition), Edward N. Zalta (ed.), URL = <http://plato.stanford.edu/archives/fall2008/entries/church-turing/&gt;.

Advertisements

Book chapter on conceptual data modelling for biological data

My invited book chapter, entitled “Ontology-driven formal conceptual data modeling for biological data analysis” [1], recently got accepted for publication in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data, edited by Mourad Elloumi and Albert Y. Zomaya, and is scheduled for printing by Wiley early 2012.

All this started off with my BSc(Hons) in IT & Computing thesis back in 2003 and my first paper about the trials and tribulations of conceptual data modelling for bio-databases [2] (which is definitely not well-written, but has some valid points and has been cited a bit). In the meantime, much progress has been made on the topic, and I’ve learned, researched, and published a few things about it, too. So, what is the chapter about?

The main aspect is the ‘conceptual data modelling’ with EER, ORM, and UML Class Diagrams, i.e., concerning implementation-independent representations of the data to be managed for a specific application (hence, not ontologies for application-independence).

The adjective ‘formal’ is to point out that the conceptual modeling is not just about drawing boxes, roundtangles, and lines with some adornments, but there is a formal, logic-based, foundation. This is achieved with the formally defined CMcom conceptual data modeling language, which has the greatest common denominator between ORM, EER, and UML Class Diagrams. CMcom has, on the one hand, a mapping the Description Logic language DLRifd and, on the other hand, mappings to the icons in the diagrammatic languages. The nice aspect of this it that, at least in theory and to some extent in practice as well, one can subject it to automated reasoning to check consistency of the classes, of the whole conceptual data model, and derive implicit constraints (an example) or use it in ontology-based data access (an example and some slides on ‘COMODA’ [COnceptual MOdel-based Data Access], tailored to ORM and the horizontal gene transfer database as example).

Then there is the ‘ontology-driven’ component: Ontology and ontologies can aid in conceptual data modeling by providing solution to recurring modeling problems, an ontology can be used to generate several conceptual data models, and one can integrate (a section of) an ontology into a conceptual data model that is subsequently converted into data in database tables.

Last, but not least, it focuses on ‘biological data analysis’. A non-(biologist or bioinformatician) might be inclined to say that should not matter, but it does. Biological information is not as trivial as the typical database design toy examples like “Student is enrolled in Course”, but one has to dig deeper and figure out how to represent, e.g., catalysis, pathway information, the ecological niche. Moreover, it requires an answer to ‘which language features are ‘essential’ for the conceptual data modeling language?’ and if it isn’t included yet, how do we get it in? Some of such important features are n-aries (n>2) and the temporal dimension. The paper includes a proposal for more precisely representing catalysis, informed by ontology (mainly thanks to making the distinction between the role and its bearer), and shows how certain temporal information can be captured, which is illustrated by enhancing the model for SARS viral infection, among other examples.

The paper is not online yet, but I did put together some slides for the presentation at MAIS’11 reported on earlier, which might serve as a sneak preview of the 25-page book chapter, or you can contact me for the CRC.

References

[1] Keet, C.M. Ontology-driven formal conceptual data modeling for biological data analysis. In: Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data. Mourad Elloumi and Albert Y. Zomaya (Eds.). Wiley (in print).

[2] Keet, C.M. Biological data and conceptual modelling methods. Journal of Conceptual Modeling, Issue 29, October 2003. http://www.inconcept.com/jcm.