ACM ICPC 2014 solution to problem A – baggage

Some of you already know I was on-site coach for the UCT team at the ACM Inter Collegiate Programming Contest World Finals in Ekaterinburg, held from 22 to 26 June, 2014. The problems were unusually hard this year, and solving 4 out of the 10 problems already got teams into the bronze medal range [results]. The technical coach of the UCT team, Bruce Merry, has written an analysis of 6 problems on his blog (D, K, C, E, B, and F–update: now also I and G, J, L, and H), and over at TopCoder, SnapDragon discusses 4 of the 10 problems (A, E, F, G), except that SnapDragon does not provide a solution to A (and I don’t like his hint of brute-force code-and-try) [update 3-7: there’s description of his solution here now]. Googling, I couldn’t find someone else discussing the solution to problem A, so here’s mine, which I solved on the plane from Moscow to Frankfurt (among other activities, and one among the four flights we had to take to arrive back in Cape Town).

The problem

Stripping the “baggage problem” to its essentials: you have a sequence of alternating Bs and As, which has to be reordered to first all the As then all the Bs, and the reordering has to occur moving two letters at the same time, and remain adjacent. For instance, with an n = 4, we have 2n characters, BABABABA, and in the end after sorting, it then will be AAAABBBB, and this has to occur in the minimum amount of moves. N is randomly chosen, and is an integer between 3 and 100. The cells on your tape are numbered from 0 to -2n+1, so with our n = 4, from -7 to 8, and the first B is on position 1.

Update (3-7): This problem appears to be “Tait’s counter puzzle”, which Peter Guthrie Tait, a Scottish mathematical physicist, described in 1884 [see description] (thanks to Davi Duarte who mentioned it in the topcoder thread).

Solution

Informally, there are two parts to the algorithm: move around the As and Bs to pair them as AA and BB, which cost you n/2 moves if n is even and n/2-rounded up moves if n is odd, and then sort those pairs in the remainder to a total of n moves. The first part occurs alternating moving AB from the right to the left, starting at the last AB (position 2n-2) and then every other 4 to its left, and BA from left to right, starting from position 3 and every other 4 positions to the right (i.e., 7, 11, etc.), and then the BBs are ferried to the right from left to right, and the AAs from the right to the left, also alternating. N=3 is an exception.

One can do this with a neat mathematical proof, but I found out with pattern creation and recognition, visualizing the provided sample input and moves for n = 5 and n = 8, which can be done in 5 resp. 8 steps (given in the problem statement as sample output). Here’s a description of that approach.

First, devising one for 3 is trivial, using the following moves (the only option), using the position indicated with the position of the first letter, and highlighting the ones that will be moved next:

Start: . . . . . . BABABA
Move 2 to -1, which gives . . . . ABB . . ABA
Move 5 to 2, which gives . . . . ABBBAA . .
Move 3 to -3, which gives . . AAABBB . . . .

This already suggests that the minimum amount of moves will always be n, because it is n moves also for n = 5 and n = 8. Next, devising one for n = 4, and knowing the moves for n = 3 and n = 5, I tried to work it out for BABABABA, i.e., the first AB, like with n = 3, BABABABA, and BABABABA like with the n = 5 case. The last one is the only one that worked in 4 moves, starting with moving 6 to -1, again. Again, because for all ns so far, the first move is to -1.

Let’s put this to the test with n = 7, with the ones to be moved in bold and the empty cells indicated with dots:

..BABABABABABABA

i.e., move the last AB (thus, position 12) as this worked for n=4 and n=5, then

ABBABABABABAB..A

i.e., move the first BA after position 1 (thus, position 3)

ABBA..BABABABBAA

i.e., move the last AB before a pair (thus, position 8)

ABBAABBAB..ABBAA

i.e., move the first BA after position 1 (thus, position 5). Then sorting the BBs and AAs:

ABBAAB..BBAABBAA
A..AABBBBBAABBAA
AAAAABBBBB..BBAA
AAAAABBBBBBB..AA
AAAAAAABBBBBBB..

This was enough for me to have discovered the pattern of alternating for the matching and alternating for the sorting.

What is ‘nasty’ in the problem description, retrospectively and for the pattern-based approach vs. a neat maths-y proof at least, is that the pattern deviates for n = 3, and the provided sample output for n = 8, although in 8 steps, is an alternative solution, not one that one obtains with the algorithm. With the approach as mentioned above, we have for n = 8 the following (I did that one to double-check):

..BABABABABABABABA
ABBABABABABABAB..A
ABBA..BABABABABBAA
ABBAABBABAB..ABBAA
ABBAABBA..BBAABBAA
A..AABBABBBBAABBAA
AAAAABBABBBB..BBAA
AAAAA..ABBBBBBBBAA
AAAAAAAABBBBBBBB..

This confirmed it works. It may look a bit craft-y, but the patterns show beautifully with larger n, which are shown below for the even and odd case (made just for the blog post, not in solving it); squint your eyes if it’s not immediately clear.

N = 16

..BABABABABABABABABABABABABABABABA
ABBABABABABABABABABABABABABABAB..A
ABBA..BABABABABABABABABABABABABBAA
ABBAABBABABABABABABABABABAB..ABBAA
ABBAABBA..BABABABABABABABABBAABBAA
ABBAABBAABBABABABABABAB..ABBAABBAA
ABBAABBAABBA..BABABABABBAABBAABBAA
ABBAABBAABBAABBABAB..ABBAABBAABBAA
ABBAABBAABBAABBA..BBAABBAABBAABBAA
A..AABBAABBAABBABBBBAABBAABBAABBAA
AAAAABBAABBAABBABBBB..BBAABBAABBAA
AAAAA..AABBAABBABBBBBBBBAABBAABBAA
AAAAAAAAABBAABBABBBBBBBB..BBAABBAA
AAAAAAAAA..AABBABBBBBBBBBBBBAABBAA
AAAAAAAAAAAAABBABBBBBBBBBBBB..BBAA
AAAAAAAAAAAAA..ABBBBBBBBBBBBBBBBAA
AAAAAAAAAAAAAAAABBBBBBBBBBBBBBBB..

N = 17

..BABABABABABABABABABABABABABABABABA
ABBABABABABABABABABABABABABABABAB..A
ABBA..BABABABABABABABABABABABABABBAA
ABBAABBABABABABABABABABABABAB..ABBAA
ABBAABBA..BABABABABABABABABABBAABBAA
ABBAABBAABBABABABABABABAB..ABBAABBAA
ABBAABBAABBA..BABABABABABBAABBAABBAA
ABBAABBAABBAABBABABAB..ABBAABBAABBAA
ABBAABBAABBAABBA..BABBAABBAABBAABBAA
ABBAABBAABBAABBAABB..BAABBAABBAABBAA
A..AABBAABBAABBAABBBBBAABBAABBAABBAA
AAAAABBAABBAABBAABBBBB..BBAABBAABBAA
AAAAA..AABBAABBAABBBBBBBBBAABBAABBAA
AAAAAAAAABBAABBAABBBBBBBBB..BBAABBAA
AAAAAAAAA..AABBAABBBBBBBBBBBBBAABBAA
AAAAAAAAAAAAABBAABBBBBBBBBBBBB..BBAA
AAAAAAAAAAAAA..AABBBBBBBBBBBBBBBBBAA
AAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBB..

Neat. Writing the code is left as an exercise to the reader.

I’m looking forward to the other problems (except the crane balancing problem C, whose description I did not like, at all), the upcoming regionals and next year’s ICPC World Finals in Morocco (if not winning, then to have at least a great time like we had this time)!

Advertisements

First steps for isiZulu natural language generation

Yes, Google Translate English-isiZulu does exist, but it has many errors (some very funny) and there’s a lot more to Natural Language Generation (NLG) than machine translation, such as natural language-based query interfaces that has some AI behind it, and they are needed, too [1]. Why should one bother with isiZulu? Muendane has his lucid opinions about that [2], and in addition to that, it is the first language of about 23% of the population of South Africa (amounting to some 10 million people), about half can speak it, and it is a Bantu language, which is spoken by nearly 300 million people—what works for isiZulu grammar may well be transferrable to its related languages. Moreover, it being in a different language family than the more well-resourced languages, it can uncover some new problems to solve for NLG, and facilitate access to online information without the hurdle of having to learn English or French first, as is the case now in Sub-Saharan Africa.

The three principal approaches for NLG are canned text, templates, and grammars. While I knew from previous efforts [3] that the template-based approach is very well doable but has its limitations, and knowing some basic isiZulu, I guessed it might not work with the template-based approach but appealing if it would (for a range of reasons), that no single template could be identified so far was the other end of the spectrum. Put differently: we had to make a start with something resembling the foundations of a grammar engine.

Langa Khumalo, with the Linguistics program and director of the University Language Planning and Development Office at the University of KwaZulu-Natal, and I have been trying to come up with isiZulu NLG. We have patterns and algorithms for (‘simple’) universal and existential quantification, subsumption, negation (class disjointness), and conjunction; or: roughly OWL 2 EL and a restricted version of ALC. OWL 2 EL fist neatly with SNOMED CT, and therewith has the potential for interactive healthcare applications with the isiZulu healthcare terminologies that are being developed at UKZN.

The first results on isiZulu NLG are described in [4,5], which was not an act of salami-slicing, but we had more results than that fitted in a single paper. The first paper [4] will appear in the proceedings ofthe 4th workshop on Controlled Natural language (CNL’14), and is about finding those patterns and, for the options available, an attempt at figuring out which one would be best. The second paper [5], which will appear in the 8th International Web Rule Symposium (RuleML’14) conference proceedings, is more about devising the algorithms to make it work and how to actually generate those sentences. Langa and I plan to attend both events, so you can ask us about the details either in Prague (18-20 Aug) or Galway (20-22 Aug) in person. In the meantime, the CRCs of the papers are online (here and here).

Regarding the technical aspects, the main reasons why we cannot get away with devising templates to generate isiZulu controlled natural language is that isiZulu is non-trivial:

  • There is a whole system of noun classes: nouns are grouped in one of the 17 noun classes, each with their own peculiarities, which is illustrated in Figure 1, below;
  • Agglutination, informally: putting lots of bits and pieces together to make a word. A selection of those so-called ‘concords’ is included in Figure 2, below;
  • Phonological conditioned copulatives, meaning that the ‘is a’ depends on the term that comes after it (ng or y); and
  • Complex verb conjugation.
isiZulu noun classes with an example (source: [5]).

isiZulu noun classes with an example (source: [5]).

A selection of isiZulu concords (source: [5])

A selection of isiZulu concords (source: [5])

What does this mean for the verbalization? In English, we use ‘Each…’ or ‘For all…’ for the universal quantifier and it doesn’t matter over which noun it is quantified. In isiZulu, it does. Each noun class has its own ‘each’ and ‘for all’, and it is not acceptable (understandable) to use one for the wrong noun class. For disjointness, like “Cup is not a Glass” ({\sf Cup \sqsubseteq \neg Glass} in DL), in English we have the ‘is not a’ regardless what comes before or after the subsumption+negation, but in isiZulu, the copulative is omitted, the first noun (OWL class, if you will) brings in a so-called negative subject concord, the second noun brings in a pronominal, and they are glued together (e.g., Indebe akuyona Ingilazi, where the second word is composed of aku + yona), and to top it off, each noun class has its own concord and pronomial. A seemingly simple conjunction—just an ‘and’ in English—has to be divided into an and-when-it-is-used-in-an-enumeration and an and-when-it-is-a-connective, and when it is used in an enumeration, it depends on the first letter of the noun that comes after the ‘and’. Existential quantification is even more of a hassle. The table below shows a very brief summary comparing typical patterns in English with those for isiZulu.

A few DL symbols, their typical verbalization options in English, and an indication of possible patterns (source: [4])

A few DL symbols, their typical verbalization options in English, and an indication of possible patterns (source: [4])

We did ask isiZulu speakers which of the possible options they preferred (in a survey, with Limesurvey localized to isiZulu), but there wasn’t an overwhelming consistent agreement among them except for one of the options for existential quantification (the –dwa option), although there was more agreement among the linguists than among the non-linguists, possibly due to dialect influences (results can be found in [4]).

If you don’t feel like reading the two papers, but still would like to have some general overview and examples, you also can check out the slides of the CS colloquium I gave last week. I managed to ‘lure in’ also ICT4D people—and then smack them with a bit of logic and algorithms—but the other option, being talking about the other paper accepted at RuleML, probably would have had to be a ‘cookie colloquium’ to get anyone to attend (more about that paper in another post—it is fascinating, but possibly of less interest to a broader audience). If you want to skip the tedious bits and just get a feel of how one of the algorithms works out: check out the example starting on slide 63, which shows the steps to go from {\sf \forall x (uSolwazi(x) \rightarrow \exists y (ufundisa(x, y) \land Isifundo(y)))} in FOL, or {\sf uSolwazi \sqsubseteq \exists ufundisa.Isifundo} in DL (“Each professor teaches at least one course”, if the vocabulary were in English), to “Bonke oSolwazi bafundisa isifundo esisodwa”.

Clearly, a lot remains to be done.

References

[1] Alberts, R., Fogwill, T., Keet, C.M. Several Required OWL Features for Indigenous Knowledge Management Systems. 7th Workshop on OWL: Experiences and Directions (OWLED’12). 27-28 May, Heraklion, Crete, Greece. CEUR-WS Vol-849. 12p

[2] Muendane, N.M. I am an African. 2006, Soultalk CC.

[3] Jarrar, M., Keet, C.M., Dongilli, P. Multilingual verbalization of ORM conceptual models and axiomatized ontologies. STARLab Technical Report, Vrije Universiteit Brussels, Belgium. February 2006.

[4] Keet, C.M., Khumalo, L. Toward verbalizing logical theories in isiZulu. 4th Workshop on Controlled Natural Language (CNL’14), 20-22 August 2014, Galway, Ireland. Springer LNAI. (in press)

[5] Keet, C.M., Khumalo, L. Basics for a grammar engine to verbalize logical theories in isiZulu. 8th International Web Rule Symposium (RuleML’14), August 18-20, 2014, Prague, Czech Republic. Springer LNCS (in press).