More stuff: relating stuffs and amounts of stuff to their parts and portions

With all the protests going on in South Africa, writing this post is going to be a moment of detachment from it (well, I’m trying), for it concerns foundational aspects of ontologies with respect to “stuff”. Stuff is the philosophers’ funny term for those kinds of things that cannot be counted, or can be counted only in quantities, and that are generally referred to by mass nouns in natural language. For instance, water, gold, mayonnaise, oil, and wine are kinds of things, yet one can talk of individual objects of them only in quantities, like a glass of wine, a spoonful of mayonnaise, and a litre of oil. It is one thing to be able to say which types of stuff there are [1]; it is another matter how they relate to each other. The latter is described in the paper recently accepted at the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW’16), entitled “Relating some stuff to other stuff” [2].

Is something like that even relevant, when students are protesting for free education, among other demands? Yes. At the end of the day, it is part and parcel of a healthy environment to live in. For instance, one should be able to realise traceability in food and medicine supply chains, to foster good production processes and supply chains and to check compliance with them, so that you will not buy food that makes you ill or take medicines that are fake [3,4]. Such production processes and product logistics deal with ‘stuffs’ and their portions and parts that get separated and put together to make the final product. Current implementations have only underspecified ‘links’ (if at all) that don’t let one infer automatically what (or who) the culprit is. Existing theoretical accounts from philosophy and in domain ontologies are incomplete, so they wouldn’t help you further either. The research described in the paper solves this issue.

Seven relations for portions and stuff-parts were identified, which have a temporal dimension where needed. For instance, the upper half of the wine in your wine glass is a portion of the whole amount of wine in the glass, yet that amount of wine was a portion of the amount of wine in the bottle when you opened it, and yet it has as part some amount of alcohol. (Some readers may not find this example nice because it involves alcohol, but the Western Cape, where Cape Town is situated, is the wine region of the country.) The relations are structured in a little hierarchy, as informally depicted in the figure below.

Section of the basic taxonomy of part-whole relations of [5] (less relevant and irrelevant sections in grey or suppressed), extended with the stuff relations and their position in the hierarchy.


Their formal definitions are included in the paper.

Another aspect of the solution is that it distinguishes between 1) the extensional and the intensional level (like, between ‘an amount of wine’ and ‘wine’), because different constraints apply (following from the fact that the latter can be instantiated whereas the former cannot), and 2) the amount of stuff and the (repeatable) quantity, as one can have 1kg of many things.

Just theory isn’t good enough, though, for one would want to use it in some way to indeed get those benefits of traceability in the supply chains. After considering the implementation options (see paper for details), I settled for an extension to the Stuff Ontology core ontology that now also imports a special purpose module OMmini of the Ontology of Units of Measure (see also the Stuff Ontology page). The latter sounds easier than it turned out to be in practice, but that’s a topic for a different post. The module is there, and the links between the OMmin.owl and stuff.owl have been declared.

Although the implementation is atemporal in the end, it is still possible to do some automated reasoning for traceability. This is mainly through availing of property chains to approximate the relevant temporal aspects. For instance, with $\mathit{scatteredPortionOf} \circ \mathit{portionOf} \sqsubseteq \mathit{scatteredPortionOf}$ one can infer that the wine in my glass, being a scattered portion of the wine in bottle #1234 of organic Pinotage, which in turn was a portion of the amount of wine contained in cask #3 (with wine from wine farm X of Stellar Winery from the 2015 harvest), is a scattered portion of that amount of matter (that cask). Or take the (high-level) pharmaceutical supply chain from [4]: a portion (that is on a ‘pallet’) of the quantity of medicine produced by the manufacturer goes to the warehouse, of which a portion (in a ‘case’) goes to the distribution centre. From there, a portion ends up on the dispensing shelf, and someone buys it. Then the trace of any customer’s portion of medicine (i.e., regardless of the actual instance) can be inferred with the following chain: $\mathit{scatteredPortionOf} \circ \mathit{scatteredPortionOf} \circ \mathit{scatteredPortionOf} \sqsubseteq \mathit{scatteredPortionOf}$.
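To make the effect of such property chains a bit more tangible, here is a minimal sketch in Python of how a chain axiom lets one derive a new relation from a path of asserted ones. It is not the OWL implementation in stuff.owl, just plain forward chaining over triples; the individual names (wine_in_glass, pallet_portion, and so on) and the helper functions are made up for illustration, and only the two chains mentioned above are encoded.

```python
# Hypothetical encoding of the two property chains mentioned in the text:
# (sequence of properties along a path) -> property implied between its ends.
CHAINS = [
    (("scatteredPortionOf", "portionOf"), "scatteredPortionOf"),
    (("scatteredPortionOf",) * 3, "scatteredPortionOf"),
]


def paths(facts, chain):
    """Return all (start, end) pairs linked by the given sequence of properties."""
    frontier = {(s, o) for (s, p, o) in facts if p == chain[0]}
    for prop in chain[1:]:
        frontier = {
            (start, o)
            for (start, mid) in frontier
            for (s, p, o) in facts
            if s == mid and p == prop
        }
    return frontier


def saturate(triples, chains=CHAINS):
    """Apply the chain axioms repeatedly until nothing new can be derived."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for chain, implied in chains:
            for start, end in paths(facts, chain):
                if (start, implied, end) not in facts:
                    facts.add((start, implied, end))
                    changed = True
    return facts


# Made-up individuals for the wine and medicine examples.
asserted = {
    ("wine_in_glass", "scatteredPortionOf", "wine_in_bottle_1234"),
    ("wine_in_bottle_1234", "portionOf", "wine_in_cask_3"),
    ("customer_portion", "scatteredPortionOf", "case_portion"),
    ("case_portion", "scatteredPortionOf", "pallet_portion"),
    ("pallet_portion", "scatteredPortionOf", "manufacturer_batch"),
}
inferred = saturate(asserted)

print(("wine_in_glass", "scatteredPortionOf", "wine_in_cask_3") in inferred)
print(("customer_portion", "scatteredPortionOf", "manufacturer_batch") in inferred)
```

Both lookups succeed because a matching path exists. A pair that is only two scatteredPortionOf steps apart, such as case_portion and manufacturer_batch, is not inferred, since the sketch encodes the three-step chain exactly as written rather than assuming transitivity.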

Sure, the research presented hasn’t solved everything yet, but at least software developers now have a (better) way to automate traceability in supply chains. It also allows one to be more fine-grained in the analysis of where a culprit may be, so that there are fewer cases of needless scares. For instance, we know that when there’s an outbreak of Salmonella, we only have to trace where the batch of egg yolk went (typically in the tiramisu served in homes for the elderly), where it came from (which farm), and what it got mixed with in the production process, while the amounts of egg white on your lemon meringue would still be safe to eat even when it came from the same batch that had at least one infected egg.

I’ll be presenting the paper at EKAW’16 in November in Bologna, Italy, and hope to see you there! It’s not a good time of the year w.r.t. weather, but that’s counterbalanced by the beauty of the buildings and art works, and the actual venue room is in one of the historical buildings of the oldest university of Europe.

 

References

[1] Keet, C.M. A core ontology of macroscopic stuff. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linköping, Sweden. Springer LNAI vol. 8876, 209-224.

[2] Keet, C.M. Relating some stuff to other stuff. 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW’16). Springer LNAI, 19-23 November 2016, Bologna, Italy. (accepted)

[3] Donnelly, K.A.M. A short communication – meta data and semantics the industry interface: what does the food industry think are necessary elements for exchange? In: Proc. of Metadata and Semantics Research (MTSR’10). Springer CCIS vol. 108, 131-136.

[4] Solanki, M., Brewster, C. OntoPedigree: Modelling pedigrees for traceability in supply chains. Semantic Web Journal, 2016, 7(5), 483-491.

[5] Keet, C.M., Artale, A. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, 2008, 3(1-2):91-110.


My gender-balanced book reviews overall, yet with much fluctuation

In one of my random browsing moments, I stumbled upon a blog post by a writer whose son complained that so many of the books she was reading to him had women as protagonists. As it appeared, “only 27% of his books have a female protagonist, compared to 65% with a male protagonist.” She linked back to another post about a similar issue, but then for some TV documentary series called missed in history, where viewers complained that there were ‘too many women’ and that it was more like a herstory than a missed in history. Their tally of the series’ episodes was that they featured 45% men, 21% women, and 34% were ungendered. All this made me wonder how I fared in my yearly book review blog posts. Here’s the summary table with the counts for M, F, and both or neither:

 

| Year posted | Book | Nr M | Nr F | Both / neither | Pct F |
|---|---|---|---|---|---|
| 2012 | Long walk to freedom, terrific majesty, racist’s guide, end of poverty, persons in community, African renaissance, angina monologues, master’s ruse, black diamond, can he be the one | 4 | 3 | 3 | 33% |
| 2013 | Delusions of gender, tipping point, affluenza, hunger games, alchemist, eclipse, mieses karma | 2 | 3 | 2 | 43% |
| 2014 | Book of the dead, zen and the art of motorcycle maintenance, girl with the dragon tattoo, outliers, abu ghraib effect, nice girls don’t get the corner office | 2 | 1 | 3 | 17% |
| 2015 | Stoner, not a fairy tale, no time like the present, the time machine, 1001 nights, karma suture, god’s spy, david and goliath, dictator’s learning curve, MK | 4 | 2 | 4 | 20% |
| 2016 | Devil to pay, black widow society, the circle, accidental apprentice, moxyland, muh, big short, 17 contradictions | 2 | 4 | 2 | 50% |
| Total | | 14 | 13 | 14 | 32% |
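For those who want to redo the bean count, the percentages can be recomputed with a few lines of Python. This is a small sketch that assumes Pct F is simply the number of F books divided by all books reviewed that year, rounded to a whole percent; the names ROWS and pct_f are made up for the illustration. By that formula the 2012 row comes out at 30% rather than the 33% listed, while the other rows and the total match.

```python
# (Nr M, Nr F, both / neither) per year, copied from the table.
ROWS = {
    2012: (4, 3, 3),
    2013: (2, 3, 2),
    2014: (2, 1, 3),
    2015: (4, 2, 4),
    2016: (2, 4, 2),
}


def pct_f(m, f, other):
    """Share of books with a female protagonist, as a whole percentage."""
    return round(100 * f / (m + f + other))


for year, counts in ROWS.items():
    print(year, f"{pct_f(*counts)}%")

# Column totals: sum each of the three columns over all years.
totals = tuple(sum(column) for column in zip(*ROWS.values()))
print("Total", totals, f"{pct_f(*totals)}%")
```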

 

Actually, I did pretty well in the overall balance. It also shows that were I to have done a bean count for a single year only, the conclusion could have been very different. That said, I classified them from memory, and not by NLP of the text of the books, so the actual amount allotted to the main characters might differ. Related to this is the screenplay dialogue-based data-driven analysis of Hollywood movies, for which NLP was used. Their results show that even when there’s a female lead character, Hollywood manages to get men to speak more; e.g., The Little Mermaid (71% male) and The Hunger Games (55% male). Even the chick flick Clueless is 50-50. (The website has several nice interactive graphs based on lots of data, so you can check for yourself.) For the Hunger Games, though, the books do have Katniss think, do, and say more than in the movies.

A further caveat of the data is that these books are not the only ones I’ve read over the past five years, just the ones written about. Anyhow, I’m pleased to discover there is some balance in what I pick out to write about, compared to unconscious bias.

As a last note on the fiction novels listed above, there was a lot of talk online the past week about Lionel Shriver’s keynote in defence of fiction writers writing what they like and having had enough of the concept of ‘cultural appropriation’. Quite a few authors in the list above would be thrown on the pile of authors who ‘dared’ to imagine characters different from the box they probably would be put in. Yet, most of them still did a good job of making it a worthwhile read, such as Hugh Fitzgerald Ryan on Alice the Kyteler in ‘The devil to pay’, David Safier with Kim Lange in ‘Mieses Karma’, Stieg Larsson with ‘Girl with the dragon tattoo’, and Richard Patterson in ‘Eclipse’ about Nigeria. Rather: a terrible character or setting that misrepresents a minority or an oppressed, marginalised, or The Other group in a novel is an indication of bad writing, and the writer should educate him/herself better. For instance, JM Coetzee could come back to South Africa and learn a thing or two about the majority population here, and I hope for Zakes Mda that he’ll meet some women whom he can think favourably about and then reuse those experiences in a story. Anyway, even if the conceptually problematic anti-‘cultural appropriation’ police win out over the fiction writers, then I suppose I can count myself lucky living in South Africa, which, with its diversity, will have diverse novels to choose from (assuming they won’t go further overboard into dictating that I would be allowed to read only those novels that are designated to be appropriate for my (from the outside) assigned box).

UPDATE (20-9-2016): following the question on POC protagonists, here’s the table, where those books with a person (or group) of colour as a protagonist are italicised. Some notes on my counting: Angina monologues has three protagonists with 2 POCs so I still counted it, Hunger games’ Katniss is a POC in the books, Eclipse is arguable, abu ghraib effect is borderline, and Moxyland is an ensemble cast so I still counted that as well. Non-POC includes cows as well (Muh), hence that term was chosen rather than ‘white’, which POC is usually contrasted with. As can be seen, it varies quite a bit by year as well.

| Year posted | Book | POC (italics in the list) | Non-POC or N/A | Pct POC |
|---|---|---|---|---|
| 2012 | Long walk to freedom, terrific majesty, racist’s guide, end of poverty, persons in community, African renaissance, angina monologues, master’s ruse, black diamond, can he be the one | 8 | 2 | 80% |
| 2013 | Delusions of gender, tipping point, affluenza, hunger games, alchemist, eclipse, mieses karma | 2 | 5 | 29% |
| 2014 | Book of the dead, zen and the art of motorcycle maintenance, girl with the dragon tattoo, outliers, abu ghraib effect, nice girls don’t get the corner office | 2 | 4 | 33% |
| 2015 | Stoner, not a fairy tale, no time like the present, the time machine, 1001 nights, karma suture, god’s spy, david and goliath, dictator’s learning curve, MK | 4 | 6 | 40% |
| 2016 | Devil to pay, black widow society, the circle, accidental apprentice, moxyland, muh, big short, 17 contradictions | 3 | 5 | 38% |
| Total | | 19 | 22 | 46% |

 

Brief report on the INLG16 conference

Another long wait at the airport is being filled with writing up some of the 10 pages of notes I scribbled while attending the WebNLG’16 workshop and the 9th International Natural Language Generation conference 2016 (INLG’16), that were held from 6 to 10 September in Edinburgh, Scotland.

There were two keynote speakers, Yejin Choi and Vera Demberg, and several long and short presentations and a bunch of posters and demos, all of which had full or short papers in the (soon to appear) ACL proceedings online. My impression was that, overall, the ‘hot’ topics were image-to-text, summaries and simplification, and then some question generation and statistical approaches to NLG.

The talk by Yejin Choi was about sketch-to-text, or: pretty much anything to text, such as image captioning, recipe generation based on the ingredients, and one could even do it with sonnets. She used a range of techniques to achieve it, such as probabilistic CFGs and recurrent neural networks. Vera Demberg’s talk, on the other hand, was about psycholinguistics for NLG, starting from the ‘uniform information density hypothesis’ compared to surprisal words and grammatical errors and how that affects a person reading the text. It appears that there’s more pupil jitter when there’s a grammar error. The talk then moved on to how one can model and predict information density, for which there are syntactic, semantic, and event surprisal models. For instance, with the semantic one: ‘peter felled a tree’: then how predictable is ‘tree’, given that it’s already kind of entailed in the word ‘felled’? Some results were shown for the most likely fillers for, e.g., ‘serves’ as in ‘the waitress serves…’ and ‘the prisoner serves…’, which then could be used to find suitable word candidates in the sentence generation.
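As a small illustration of what ‘surprisal’ amounts to, here is a sketch in Python using the standard information-theoretic definition: the surprisal of a word is the negative base-2 logarithm of its probability given the context, so highly predictable words carry few bits. The probabilities below are invented for the ‘felled’ example and do not come from the talk, whose syntactic, semantic, and event models are far more elaborate.

```python
import math


def surprisal(probability):
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(probability)


# Invented conditional probabilities for the word after 'peter felled a ...'.
p_after_felled = {"tree": 0.8, "giant": 0.05}

for word, p in p_after_felled.items():
    print(f"{word}: {surprisal(p):.2f} bits")
```

With these made-up numbers, ‘tree’ has a much lower surprisal than ‘giant’, which is the sense in which it is already largely predictable from ‘felled’.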

The best paper award went to “Towards generating colour terms for referents in photographs: prefer the expected or the unexpected?”, by Sina Zarrieß and David Schlangen [1]. While the title might sound a bit obscure, the presentation was very clear. There is the colour spectrum, and people assign names to the colours, which one could take as RGB colour values for images. This is all well and good on the colour strip, but when a colour is put in the context of other colours and background knowledge, the colour name humans would use to describe that patch on an image isn’t always in line with the actual RGB colour. The authors approached the problem by viewing it as a multi-class classification problem and used a multi-layer perceptron with some top-down recalibration, and voilà, the software returns the intended colour most of the time. (Knowing the name of the colour, one then can go on trying to automatically annotate images with text.)

As for the other plenary presentations, I did make notes of all of them, but will select only a few due to time limitations. The presentation by Advaith Siddharthan on summarisation of news stories for children [2] was quite nice, as it needed three aspects together: summarising text (with NLG, not just repeating a few salient sentences), simplifying it with respect to children’s vocabulary, and editing out or rewording the harsh news bits. Another paper on summaries was presented by Sabita Acharya [3], which is likely to be relevant also to my student’s work on NLG for patient discharge notes [4]. Sabita focussed on trying to get doctors’ notes and plans of care into a format understandable by a layperson, and used the UMLS in the process. A different topic was NLG for automatically describing graphs to blind people, with grade-appropriate lexicons (4-5th grade learners and students) [5]. Kathy McCoy outlined how they were happy to remember their computer science classes and to see that they could use graph search to solve it, with its states, actions, and goals. They evaluated the generated text for the graphs, as many others did in their research, with crowdsourcing using Mechanical Turk. One other paper that is definitely on my post-conference reading list is the one about mereology and geographic entities for weather forecasts [6], which was presented by Rodrigo de Oliveira. For instance, ‘the south’ in a Scottish weather forecast refers to a different region than ‘the south’ of the UK as a whole, and the task was how to generate the right term for the intended region.


Our poster on generating sentences with part-whole relations in isiZulu.

My 1-minute lightning talk of Langa’s and my long paper [7] went well (one other speaker of the same session even resentfully noted afterward that I got all the accolades of the session), as did the poster and demo session afterward. The contents of the paper on part-whole relations in isiZulu were introduced in a previous post, and you can click on the thumbnail on the right for a png version of the poster (which has less text than the blog post). Note that the poster only highlights three part-whole relations from the 11 discussed in the paper.

ENLG and INLG will merge and become a yearly INLG, there is a SIG for NLG (www.siggen.org), and one of the ‘challenges’ for this upcoming year will be on generating text from RDF triples.

Irrelevant for the average reader, I suppose, was that there were some 92 attendees, most of whom attended the social dinner where there was a ceilidh (Scottish traditional music by a band with traditional dancing by the participants), where it was even possible to have many (traditional) couples for the couples dances. There was some overlap in attendees between CNL16 and INLG16, so while it was my first INLG it wasn’t all brand new, yet there were also new people to meet and network with. As a welcome surprise, it was even mostly dry and sunny during the conference days in the otherwise quite rainy Edinburgh.

 

References

(links TBA shortly—neither Google nor duckduckgo found their pdfs yet)

[1] Sina Zarrieß and David Schlangen. Towards generating colour terms for referents in photographs: prefer the expected or the unexpected? INLG’16. ACL, 246-255.

[2] Iain Macdonald and Advaith Siddharthan. Summarising news stories for children. INLG’16. ACL, 1-10.

[3] Sabita Acharya, Barbara Di Eugenio, Andrew D. Boyd, Karen Dunn Lopez, Richard Cameron, Gail M Keenan. Generating summaries of hospitalizations: A new metric to assess the complexity of medical terms and their definitions. INLG’16. ACL, 26-30.

[4] Joan Byamugisha, C. Maria Keet, Brian DeRenzi. Tense and aspect in Runyankore using a context-free grammar. INLG’16. ACL, 84-88.

[5] Priscilla Morales, Kathleen McCoy, and Sandra Carberry. Enabling text readability awareness during the micro planning phase of NLG applications. INLG’16. ACL, 121-131.

[6] Rodrigo de Oliveira, Somayajulu Sripada and Ehud Reiter. Absolute and relative properties in geographic referring expressions. INLG’16. ACL, 256-264.

[7] C. Maria Keet and Langa Khumalo. On the verbalization patterns of part-whole relations in isiZulu. INLG’16. ACL, 174-183.