A few notes and tips for forming new words

Recently, the COMMUTERM project was accepted, where we will use crowdsourcing to develop an isiZulu terminology for, first, computer science, and then in another discipline to test genericity of the approach and the tools. One of the components is that new words will have to be invented: while there are isiZulu words for the computer mouse (igundane), there is none so far for, say, ‘computational complexity of an algorithm’, or even ‘algorithm’ (though there’s a tentative candidate for the latter).

So, how would you go about inventing them? In a conversation about that and a less daunting example, the spreadsheet ‘table’, I asked whether the isiZulu word for table, itafula, could be reused. The answer was not just a “no” (a physical table is a very different kind of thing, so you can’t use the same word[1], and likewise in several other languages); it came with the addition, in a tone of embarrassment, that there weren’t that many isiZulu words and that “even itafula originates from Afrikaans”. I countered that loan words, modification, and adoption are the norm rather than the exception in many languages (well, at least the ones I know of), and gave a few sketchy examples.

What I hope to achieve here is to structure that somewhat with examples and ‘types’ of adoptions in an accessible way. As input I use my memory of a seminar I attended in 2006 about that very topic (the Language and Communication Technologies colloquia at the Free University of Bozen-Bolzano, where the computer science faculty operates in a trilingual mode) and the languages I have learned over the years. If you have better sources, I’d be grateful if you inform me about them, which, in turn, may improve the outcome of the COMMUTERM project. I will divide it into stages of adoption of a new word, and then describe and illustrate the patterns I know of that have been used to invent new ones.

Stages of adoption

There are different stages of adoption of a new word in a language, resulting from nearness to overlapping language regions and, these days, globalization. I am not talking of intentional usage of a foreign word, such as reading in an English-language text “spitting chewing gum on the street is Verboten”: English has a word for verboten (‘forbidden’), but the use of the German word is intended to convey a sense of ‘really strictly forbidden’. Instead, I am considering primarily the first one of three cases: 1) language X has some word abcd whereas language Y does not have a word for the entity but wants or needs it; 2) a speaker of language X does not speak Y well and makes a Y-ification of abcd, which somehow creeps into the language[2]; 3) there is an existing word for abcd in Y, but for some reason (whichever it may be), abcd is used anyway[3].

The first stage is just plain borrowing of abcd from X by Y; for instance, guerilla (Spanish, Sp.) or polder (Dutch, Ned.) or niche (French, Fr.) in the English language. Sometimes it remains at this stage, i.e., the loan word is adopted as one’s own as is, be this in the original meaning or not. Regarding the latter, you might find the following example mildly entertaining. We colloquially used the word ‘floppy’ as short-hand for ‘floppy disk’ in the Netherlands, but the ‘stiffy’ of ‘stiffy disk’ never really made it: there are translations of ‘stiffy’ into Dutch, but none that fits well, and we have the metric system, so inches were not an option either. In the time of their co-existence, we had to compare them in some way nevertheless, which ended up as grote floppy [disk/diskette] (‘big floppy’) versus kleine floppy [disk/diskette] (‘small floppy’), or zachte floppy [disk/diskette] (‘soft floppy’) versus harde floppy [disk/diskette] (‘hard floppy’, even in the urban dictionary). After the real floppy disk had its exit, the stiffy disk became a plain floppy (Ned., plural: floppies) used as a noun without adjective, or plain diskette (Ned.). Writing this now, it sounds like lunacy, but it made perfect sense back then and everybody understood what you meant with these terms.

The second stage is adaptation of the word, and this also may be the final stage. Adaptation leaves the word largely intact, but modifies it a little according to rules of word formation or grammar of Y. For instance, the verb ‘to browse’ is modified in Dutch with Dutch verb rules: the verb is now browsen and jij (you) browst, wij (we) browsen, etc., and the German (Ger.) Apfelstrudel is strudel di mele in Italian (It.) where Strudel is untranslatable and Apfel is mele.

The third stage, if it occurs, is complete adoption after adaptation or invention of a new word. For instance, the English (En.) ‘to educate’ has its origin in Latin (Lat.) educare, ‘democracy’ comes from the Greek demos kratos, and ‘cookie’ from koekje (Ned., a longer list). The direct import ‘taxi’ originates from Greek (supposedly, all words with an x in them are from the Greek language), and ‘contradiction in terms’ is a 1-to-1 translation from contradictio in terminis (Lat.). There are very many such words in English that have their origin in other languages, and there are plenty of etymological dictionaries you may like to check (e.g., word origin’s list with stories and etymonline with just a very brief note for each entry).

Different regions may, for one reason or another, stay in one stage or another with some word. For instance, in the USA, ‘kindergarten’ is a common term, whereas elsewhere ‘pre-school’ is used. I won’t consider all the why-this issues here, only what. What I have observed is that cultures differ in how fanatic they are about their vocabulary, from very much so to not at all. For instance, there is the Académie française, which is in charge of imposing, in a top-down fashion, French words for what would otherwise be loan words (e.g., the recent mot-dièse for ‘hashtag’), the Flemish are generally more inventive than the Dutch (e.g., helikopter (Ned.) vs. wentelwiek (Be.) for ‘helicopter’), and the speakers of Italian, Spanish, and German typically come up with their own words. However, comparing computing terms, this is not always the case: besturingssysteem (Ned., new word) versus sistema operativo (It., direct translation).

Types of changes

This is my attempt at structuring the ways of inventing adaptations and word inventions. I did glean a bit from [1], notably that it motivated me to add the distinction between ‘there is abcd in language X, now find one in Y’ versus the totally ab initio word creation, in the sense of ‘we created this new thingy as the first thingy in the world, now name it’. Within the COMMUTERM project, we mostly face the former, although some ideas on how people in other languages deal with the latter may be of help for the former if there is no feasible translation and you have to go back to the drawing board of word creation. I’ll go through them in the following order: more or less a translation, Y-ify a noun, Y-ify a verb, and word formation.

1-to-1 translation.

Direct translation of abcd in X to an existing word in Y, i.e., in both languages the new word or reuse of an existing word for another meaning happens in the exact same way. Examples:

  • (in computing) mouse (En.) – igundane (Zu.) – muis (Ned.) – topo (It.)
  • (in computing) memory (En.) – geheugen (Ned.) – memoria (It.)
  • email (En.) – correo electronico (Sp.)
  • database (En.) – base de datos (Sp.)
  • ontology (En.) – ontologie (Ned., Ger.) – ontologia (It., Sp.), although, in this case, English has taken it from philosophy, which has taken it from Latin.

There are many more terms also in computer science of which you (well, just the English-speakers) may think they are English but have a root in another language and English borrowed from that or adopted it fully. To back this up, just in case you were thinking everything comes from English: check out the etymology of, e.g., data (from datum (Lat.)), algorithm (after the Persian mathematician Al-Khwarizmi), to compute (from Latin), printer created as noun of print (from Old French preinte, which, in turn, comes from premere (Lat.)).

Almost a 1-to-1 translation.

It looks like a 1-to-1 translation of existing words, but there is a slight semantic difference, as if a nitpicking refinement occurred in the search for a translation that possibly indicates a slight difference in underlying meaning or perhaps it was felt unavoidable because a suitable equivalent was not available in Y. Examples:

  • (in computing) operating system (En.) – besturingssysteem (Ned.), where besturings- is literally the ‘steering’ of the system, not the ‘operating’.
  • (in computing) keyboard (En.) – toetsenbord (Ned.), i.e., literally, the keysboard, for there are multiple keys on a keyboard, not just one.
  • (in computing) save (En.) – opslaan/bewaren (Ned.) – speichern (Ger.), which means ‘to store’ in Dutch and German, not ‘to save’.
  • (in computing) file (En.) – documento (It.).

With respect to some offline comments I received, I’ll rephrase the latter point differently (perhaps too bluntly): if you cannot find an exact 1-to-1 translation but only some sort of approximation, then do not worry about that and do not put down your own language, as there are very many such cases with other languages. If you do not believe that, I can lend you a few of my bi-directional dictionaries to check: they are all inconsistent.

Partial translations.

Partial translations, I suspect, are due to compound forms where the component-words were introduced at different times or it has a readily available equivalent in Y. Examples:

  • Email address (En.) – indirizzo email (It.) – ikheli le-e-mail (Zu.)

Y-ify a noun from X.

This can be in two ways: 1) typically, change the beginning or ending of a noun to conform to the word forms/gender/alphabet of Y, 2) change the plural to adhere to the grammar for plurals of Y. One perhaps could count a third way as being the article used with it. Examples:

  • Radio (En.) – iRadio (Zu.), i.e., Zulufy a foreign word by putting an i– in front of the noun.
  • Computer (En.) – computadora (Sp.)
  • Between English and Romance languages, such as Italian and Spanish, there are quasi rules as well: nouns with -ción (Sp.) and -zione (It.) often end up as -tion in English (e.g., educa-) and -(a)dor/-(a)dora (Sp.) as -ter or -tor (e.g., investiga-).
  • Niche (En.) – nicchio (It., masculine) / nicchia (It., feminine). The nicchio ‘recess in the wall’ travelled to France, and back to Italy came the new concept of ‘niche of a species’, for which the original term was modified into nicchia (It.) to denote the conceptual distinction, i.e., a gender change. English took niche (Fr.) for both.
  • Preparations, arrangements (En.) are amalungiselelo (Zu.), but software settings, being similar in idea to arrangements but not the same, is isilungiselelo (Zu.), i.e., having changed noun class (from ama- to isi-).

On the other hand, I noticed that violating certain rules resulted in grumbling. The isiZulu interface of Google has idrayivu for the ‘drive’. Although the i- follows the same pattern as in the first item above, the few people I asked were not happy with it, because the word contains an r and isiZulu does not have the r in the alphabet.
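The noun pattern and the objection to idrayivu can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are made up for this example, and the set of disputed letters contains just the r mentioned by the people I asked, so it is not an authoritative description of isiZulu orthography.

```python
# Assumption taken from the text: the speakers asked objected to the letter r.
# This set is illustrative only, not an authoritative isiZulu alphabet.
DISPUTED_LETTERS = {"r"}


def zulufy_noun(noun):
    """Prefix i- to a borrowed noun, as in Radio -> iRadio."""
    return "i" + noun


def disputed_letters(word):
    """Return, in sorted order, the letters of `word` that speakers may object to."""
    return sorted(set(word.lower()) & DISPUTED_LETTERS)


print(zulufy_noun("Radio"))          # iRadio
print(disputed_letters("idrayivu"))  # ['r']
print(disputed_letters("itafula"))   # []
```

Note that iRadio itself would be flagged by the same check, which shows that such a rule of thumb can at most raise a candidate for discussion, not settle it.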

Y-ify a verb from X.

This is grammatically more elaborate to explain than the case for the nouns, because quite a few languages have a more structured grammar than English. Let me first give an example for the plain grammar rule, present tense, for ‘to speak’ in Spanish and isiZulu in the following table (omitting the you-formal).

 

                  Spanish                  isiZulu
                  hablar      root +       ukukhuluma    + root
I                 hablo       -o           ngikhuluma    ngi-
You (singular)    hablas      -as          ukhuluma      u-
He/she/it         habla       -a           ukhuluma      u-
We                hablamos    -amos        sikhuluma     si-
You (plural)      habláis     -áis         nikhuluma     ni-
They              hablan      -an          bakhuluma     ba-

So, for instance, we have the English verb ‘to program’ some application and in Spanish programar; then ‘we program’ in Spanish ends up as programamos, which results from the combination of the root, which is obtained by removing the -ar from the verb, and appending the correct ending to indicate the ‘we’, i.e., -amos. The progressive form is composed of the auxiliary verb estar (with its root est- + -amos for the ‘we’) together with the root + -ando for the gerund, and ‘we are programming’ is in Spanish thus estamos programando. Hypothetically, if ukuprogram were the verb for ‘to program’ in isiZulu, then ‘we program’ would be siprogram (it is not, though, see below).
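The rule can be written down as a small Python sketch. It assumes regular Spanish -ar verbs and the plain isiZulu present tense only; the function and dictionary names are hypothetical, and the forms of estar are listed explicitly because that verb is irregular outside the ‘we’ form.

```python
# Present-tense endings for regular Spanish -ar verbs (formal 'you' omitted).
SPANISH_AR_ENDINGS = {
    "I": "o", "you (singular)": "as", "he/she/it": "a",
    "we": "amos", "you (plural)": "áis", "they": "an",
}

# Present tense of the auxiliary estar, spelled out because it is irregular.
ESTAR_PRESENT = {
    "I": "estoy", "you (singular)": "estás", "he/she/it": "está",
    "we": "estamos", "you (plural)": "estáis", "they": "están",
}

# isiZulu subject prefixes, as in the table above.
ZULU_SUBJECT_PREFIXES = {
    "I": "ngi", "you (singular)": "u", "he/she/it": "u",
    "we": "si", "you (plural)": "ni", "they": "ba",
}


def spanish_root(infinitive):
    """Strip the -ar ending of a regular Spanish verb."""
    if not infinitive.endswith("ar"):
        raise ValueError("this sketch only handles -ar verbs")
    return infinitive[:-2]


def conjugate_spanish(infinitive, person):
    """Root + person ending, e.g. programar -> programamos."""
    return spanish_root(infinitive) + SPANISH_AR_ENDINGS[person]


def spanish_progressive(infinitive, person):
    """Form of estar + root + -ando, e.g. estamos programando."""
    return ESTAR_PRESENT[person] + " " + spanish_root(infinitive) + "ando"


def conjugate_zulu(infinitive, person):
    """Subject prefix + root, where the root is the verb without uku-."""
    if not infinitive.startswith("uku"):
        raise ValueError("expected an infinitive starting with uku-")
    return ZULU_SUBJECT_PREFIXES[person] + infinitive[3:]


print(conjugate_spanish("programar", "we"))    # programamos
print(spanish_progressive("programar", "we"))  # estamos programando
print(conjugate_zulu("ukukhuluma", "we"))      # sikhuluma
print(conjugate_zulu("ukuprogram", "we"))      # siprogram (hypothetical verb)
```

Once a borrowed verb has been given a Y-ified infinitive, the rest of the paradigm follows mechanically, which is part of why verbs are comparatively easy to adopt.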

Other examples of y-ifications/x-ifications (i.e., be this from X to Y or Y to X) are copiare (It.), copiar (Sp.), kopiëren (Ned.), to copy (En.), and studiare (It.), estudiar (Sp.), studeren (Ned.), to study (En.), where the Italian -are and Dutch -en are like the Spanish -ar and isiZulu uku- as above.

New terms for essentially different conceptualizations.

They are not direct translations or near-translations, but include also conceptually totally different ones (even though, loosely, they are translated as such). A reason why I include them as a separate option, is because here we are not even aiming at a translation, but it is intentionally different.

  • IT: Information Technology (En.) – EDV: Elektronische Datenverarbeitung (Ger.), which is, literally translated ‘electronic data processing’.
  • Computer Science (En.) – informatica (Ned., It.) – informática (Sp.) – Informatik (Ger.): literally, the science of computers (which it is not) versus the science of information (much closer to it).

New words, using a language’s features.

Germanic languages have the fun of putting words together to create a new word with a new meaning. Arabic and the Nguni languages are much more semantics-oriented, where the underlying idea of the stem can be reused for conceptually related entities. Examples (I looked up most in the dictionary):

  • -fund- (Zu.): something with studying/learning. ukufunda: to learn, read. umfundisi (high tones): teacher, umfundisi (low tones): preacher. imfundiso: teaching/doctrine. ulwazi lemfundo: education (note: the dictionary said imfundo: knowledge, but the English->isiZulu section says ulwazi, which I have heard before, ukwazi, and imfundiso; an example of just one of the myriad inconsistencies in bi-directional dictionaries).
  • -sebenza- (Zu.): something on working. ukusebenza: to work. umsebenzi: the work/job. abasebenzi: workers. alisebenzi: broken (not-working). insebenzo: wages (the fruit of one’s labour). uhlelokusebenza: software.

For English, a list of principles for word creation exists already, which I summarise here (with international examples added) to give you an idea, as they transfer over to several other languages as well.

Real compounding: joining words to make a new one: toothbrush and tablecloth. This is a very common feature of Germanic languages, and one of the more entertaining examples is Eisenbahnknotenpunkthinundherschieber (Ger.), which used to be an actual job title[4]. Uhlelokusebenza (Zu., ‘software’) sounds a lot like real compounding as well, based on -hlelo + -sebenza: the grammar/arrangement is working, or some such similar translation for the word components, which, to be honest, is a fabulous term compared to ‘software’ (En.). An approximation of compounding is putting a dash between the words, as in ‘smoke-free’ (which is in Dutch just one word: rookvrij). UPDATE (29-8-2013): I just discovered there is an entire article on compounding in isiZulu [2].

Conversion: changing ‘class’ in the sense of making a noun out of a verb or vice versa, which is very common in English; e.g., to print -> printer, to push -> pusher.

Affixation: adding a prefix or a suffix; e.g., making a noun out of an adjective by adding -ness (happiness), from noun to adjective by adding -al (regional), from verb to adjective by adding -able (drinkable). The same holds for other languages, with their specific affixes; e.g., -bar (Ger.), -baar (Ned.), and -bile (It.) do the same in those languages as -able in English, and likewise -heid (Ned.) and -heit (Ger.) work like the English -ness.

Other: clipping a longer word into a shorter version (flu for influenza), blending words together (smog, from smoke+fog).
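To make these principles concrete, here is a toy Python sketch of compounding, affixation, clipping, and blending as plain string operations. The function names are invented for this illustration, and the only spelling rule handled is the y -> i change seen in happiness; real word formation needs far more than this.

```python
def compound(*parts):
    """Real compounding: join words into one, e.g. tooth + brush."""
    return "".join(parts)


def add_suffix(base, suffix):
    """Affixation with one spelling rule: consonant + y becomes i (happy -> happi-)."""
    vowels = "aeiou"
    if (len(base) > 1 and base.endswith("y") and base[-2] not in vowels
            and not suffix.startswith("i")):
        base = base[:-1] + "i"
    return base + suffix


def clip(word, start, end):
    """Clipping: keep only the slice word[start:end] of a longer word."""
    return word[start:end]


def blend(first, second, keep_first, keep_second):
    """Blending: the head of one word plus the tail of another."""
    return first[:keep_first] + second[len(second) - keep_second:]


print(compound("tooth", "brush"))   # toothbrush
print(compound("rook", "vrij"))     # rookvrij
print(add_suffix("happy", "ness"))  # happiness
print(add_suffix("region", "al"))   # regional
print(clip("influenza", 2, 5))      # flu
print(blend("smoke", "fog", 2, 2))  # smog
```

The mechanics are trivial; choosing which parts to keep, and whether the result sounds right in the target language, is the part that still needs speakers.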

Closing remarks

Ukwakhuhlelo means programming (noun), where -hlelo is the root for ‘grammar’/‘arrangement’ (u-, izin-) and -ukwakh- relates to ‘to build’, i.e., based on compounding to form a new word. What can be modified to create a term for the verb ‘to program’? Following the basics for verb-ifying a noun by putting uku- in front of it, I would make a verb from the noun as ukukwakhuhlelo, but maybe you are more creative, like the inventor of isikhahlamezi, Thokozani Nene, was (‘fax’, and it sounds a lot nicer to the ear than one of the other translations the dictionary provides: ifeksi). Isikhahlamezi is an example of the kind of word creation where, as [1] notes, the purpose was not to create transparent output (recoverable from its origins, for there is none in this case), but to create a term with certain desired features that match word characteristics of the language, such as number of vowels and syllables.

As a last note on terms and given the readership of this blog, and having mentioned knowledge (ulwazi) before, which is easily memorizable, here it goes for ‘logic’, where the first term is easy to remember, but the other two require some practice to pronounce and remember: ilojiki; ukwazi ukuqonda nokuhlazulula ngohlelo izindaba; ukuhlela ngokulandelanisa.

Either way, I hope the range of options has given you some ideas for borrowing, adapting, and creating new words, which can give you a head start in the crowdsourcing game that we aim to launch late September/early October.

References

[1] Ronneberger-Siebold, E. On useful darkness: loss and destruction of transparency by linguistic change, borrowing, and word creation. Yearbook of Morphology 1999. Booij, G.E., Marle, J. (Eds.). Springer. 2001. V, pp97-120.

[2] Buthelezi, T.M. Exploring the Role of Conceptual Blending in Developing the Extension of Terminology in isiZulu Language. Alternation, 2008, 15(2):181-200.

Thanks to Charmaine and Nokubonga for the lively conversation about and suggestions for some of the isiZulu terms.


[1] Double-checking the spelling of itafula in the dictionary now, I noticed there is an entry “amathebula (arith. tables)” in the Scholar’s Zulu dictionary; what about that for the spreadsheet tables?

[2] An example of the latter may be the expression die Treppe herunter schendieren (going down the stairs), where schendieren is a germanification of the Italian scendere (thanks to my former colleague Andrea at FUB who mentioned this example).

[3] E.g., the Afrikaans word braai is used by speakers with English as home language in South Africa, even though elsewhere it is called barbecue.

[4] Eisen iron, bahn road, eisenbahn railway, knoten knots, punkt point, knotenpunkt crossroad or spaghetti junction, eisenbahnknotenpunkt railway point where the train can change tracks, hin to, und and, her fro, schieber pusher: Eisenbahnknotenpunkthinundherschieber is the guy who manually pushes the lever backward and forward so that the train moves onto the right railway track. In the late 1920s/early 1930s, it was the longest German word in use (which my grandfather had happened to learn in the few years he went to school in Germany before the family moved back to the Netherlands before WW II).

Ontologies and Knowledge bases lecture notes for 2013

The lecture notes for the ontologies and knowledge bases module (COMP720) for semester 2 in 2013 are now available online. I’ve updated them compared to last year’s installment (mentioned here): in addition to the regular changes, like updates to reflect the advances made in the past year in ontology engineering, better explanations in several sections, and more examples, it includes the DL primer by Markus Kroetzsch, Ian Horrocks and Frantisek Simancik (saving me the time writing about that; thanks!), more exercises, and answers to selected exercises.

As last year, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. They consist of three blocks: logic foundations, ontology engineering, and advanced topics. The logic foundations contain a recap of FOL, the DL primer and the basics of automated reasoning with the Description Logics with ALC, the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with top-down ontology development using foundational ontologies, then bottom-up ontology development to extract knowledge from ‘legacy’ representations, and finally (perhaps too briefly), methods and methodologies. The advanced topics are balanced in two directions, where the first one certainly will be covered and the second one if time permits: ontology-based data access applications (i.e., an ontology-driven information system) and temporal ontologies.

It is essentially still an evolving document, and relative completeness of sections varies slightly. Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.