Granulate and Conquer

Or so goes the fancy tagline for a particular problem-solving methodology, which predominantly comprises applied mathematics and IT (soft computing) [1], and addresses to a lesser extent the philosophical and ontological aspects [2][3]. More comprehensively, the field of Granular Computing combines efforts from philosophy, Artificial Intelligence, machine learning, database theory and data mining, (applied) mathematics with fuzzy logic and rough sets, among others. Themes addressed for computational problem solving tend to emphasise quantitative aspects of granularity, whereas the others put a higher emphasis on the qualitative component of granularity.

The “granulate and conquer” then amounts to a nice methodology to manage your data, information and knowledge. Applications can be as diverse as:

  • Using clustering techniques to make sense of mRNA expression patterns in microarray data [4], in this case applied to gene expression data of the malarial parasite Plasmodium falciparum;
  • Access control models in computer security, where, as Lin summarizes in [1], for each object p Î V there is a granule B(p) Í U of objects that are in conflict; put differently: eventually, after taking into account the various access rights for the resources (like documents, folders etc.), the resulting granule contains the list of enemies. For the interested reader: details about all this goes under the term Chinese Wall Security Policy Model;
  • Individual student-tailored study feedback. A slightly outdated description of such a systems is given by McCalla and Greer [5], but anyone familiar with Computer-based Training and its ‘test exam’ facility makes use of this approach: after doing the test, for the wrong answers given, it tells you in which paragraph(s) of the study material you can find the explanation so that you don’t have to go through all the material again and re-do only those few sections that you didn’t understand sufficiently.

Although the applications are very diverse, there are some commonalities in the approaches and (oftentimes one-off) models created for a particular purpose. More specifically, the underlying semantics of how the granulation is done and the relation between the entities within a granular level (/granule/grain) is consistent – but there is not one single type of granule. A first attempt to categorise those types of granularity is made by [3], where a taxonomy is presented with seven leaf categories. The main distinctions made are between scale-dependent granularity and, for the lack of a better term, non-scale-dependent granularity. Further divisions include, among others, granulating according to some mathematical formula (e.g. seconds, minutes, hours, etc.), sorting by means of one type of (primitive) relation (e.g. [structural-]partOf, [spatially-]containedIn), and aggregation of the same collection of instances of one type that subsequently can be partitioned in various ways at lower levels of detail using semantic criteria where the entity at a lower level is a subtype of the type at the coarser-grained level (e.g. a collection of phone points and finer-grained land-line and mobile phone points). The distinctions described in the article can guide a conceptual modeller to better distinguish between the types of granularity when representing domain knowledge and can be of use to the software developer to improve applications that use granularity in one way or another.

Coincidentally, a conference on Granular Computing will take place within a few weeks, which will be held in Atlanta, USA, from 10 to 12 May 2006. There is little time left to register here.

References

[1] Lin, T.Y. Toward a Theory of Granular Computing. IEEE International Conference on Granular Computing (GrC06). 10-12 May 2006, Atlanta, USA. Draft online available

[2] Yao, Y.Y. Perspectives of Granular Computing. IEEE Conference on Granular Computing (GrC05), 1:85-90.

[3] Keet, C.M. A taxonomy of types of granularity. IEEE Conference in Granular Computing (GrC06). 10-12 May 2006, Atlanta, USA.

[4] Zhou, Y., Young, J.A., Santrosyan, A., Chen, K., Yan, S.F., Winzeler, E.A. In silico gene function prediction using ontology-based pattern identification. Bioinformatics, 2005 21(7):1237-1245.
Online information available at: http://carrier.gnf.org/publications/OPI

[5] McCalla, G.I., Greer, J.E. Granularity Hierarchies. Computers and Mathematics with Applications: Special Issue on Semantic Networks, 1992, 23:363-376.

Solving sudokus made simple

Sudokus seem to be the latest puzzle excitement in several countries, published daily in various (online) newspapers. My brother puts them in excel, which saves him from some adding up and he has added encouraging “OK” and “you have won!” etc., but it doesn’t make solving the Sudokus any easier. There are cleverer ways to do it. Knowing the ‘easy’ way maybe like cheating, but on the other hand, it also gives insight in the thought processes you go through when trying to solve them (apart from the sense of frustration an joy of course).

Alas, Helmut Simonis used Constraint Programming to get to the solution quickly [1]. How did he do it? Well, first you assume you want to solve the puzzle using constraint programming, so then you a) formulate the problem, b) make a model and write the program in e.g. ECLiPSe, c) add some constraints, a propagation scheme, and d) test with Sudokus from the The Times, Daily Telegraph, Minimum Sudoku, and from Puzzler. Here is how it goes.

A. To formulate the problem, we first define what a filled-in Sudoku square is. In plain English, the Sudoku square consists of 81 cells in a 9 by 9 square, where each cell contains some number that ranges from 1 to 9 and the cells in each row and each column as well as each of the nine 3 by 3 major blocks have different numbers throughout. To be pedantic, and swapping 3 for ‘n’: “A Sudoku square of order n consists of n4 variables formed into an n2 x n2 grid with values from 1 to n2 such that the entries in each row, each column and in each of the n2 major n x n blocks are alldifferent.”. The real Sudoku problem can then be defined as that it “consists of a partial assignment of the variables in a Sudoku square. The objective is to find a completion of the assignment which extends the partial assignment and satisfies the constraints.”. Then you want Sudokus that have only one solution, that don’t have redundant information, and should be solved without guessing. 

B. The model was written in ECLiPSe and the relevant code is in the article [1].

C. To strengthen the deduction by combining several of the original constraints (to make things more manageable for the program by linking up the various constraints), so-called redundant constraints are added. These involve how the row and column interact, how rows and blocks interact, how the three affect each other, and then “shaving”. The latter, shaving, is a constraint programming technique whereby the whole set of constraints is taken and variables tested; i.e. assign a number to a cell, if it fails then remove the lot from the domain. This helps to throw out inconsistent values before you even start for real. There are a bunch of “propagation schemes” that deal with the alldifferent constraint, that is, how to make sure that in each row, column, and block each of the numbers 1-9 occur only once.

D. The testing phase. With the previous points in place, you run the program. It could solve quite well all Sudokus except for the Daily Telegraph’s “tough” and “diabolical” Sudokus. To solve those, apparently we’d need more powerful reasoning for that, or some searching, or at least the above-mentioned shaving technique.There are two other curious things that came afore during testing. First, most of the published puzzles contain more than 10 redundant hints that you can do without with and still have that there is one unique solution. Second, it appeared that the grading of the difficulty of the Sudokus actually corresponds quite well with the real difficulty of solving the puzzle.

You can read more details and find links to further information and (Sudoku) puzzles in Simonis’ article. For those of you who are in or around Bolzano, your next chance to go to a seminar about constraint programming & solving puzzles is the 8th of June 2006, by Rosella Gennari.

[1] Simonis, H. Sudoku as a Constraint Problem. In: Proceedings of the 4th International Workshop on Modelling and Reformulating Constraint Satisfaction Problems. Hnich, B. Prosser, P., Smith, B. (ed.). 2005. pp. 13–27.

A first note

As if there is not yet enough data and information on the Internet. Like hundreds of thousands of other bloggers, I cherish the idea that more information won’t hurt and, moreover, I share the illusion with other bloggers that the information I add to the huge pile may occasionally be of some use to somebody, somewhere, some time.

Why not add the contents of the blog to my website? The tidbits of information that will be posted here don’t quite fit there. Websites, well researchers’ websites, generally contain information about their own research and teaching activities, other more or less static content that is generally too large to squeeze into a blog entry, and is pretty much uni-directional 1:n communication. Conversely, blog entries tend to be shorter, ‘dated’, with implicitly or explicitly worded opinion, and enables m:n interaction with comments and even discussions.

So, I’m giving it a try and see where it ends up. The same approach (ok, and then some) got me three degrees distinct disciplines, and to travel to many countries, a.o.t.
I hope you will find here something that piques your interest; and don’t hesitate to leave comments, suggestions, and criticism!