Social impact issues with LLMs – a brief write-up of my list from the SIGdial’23 panel

The SIGdial 2023 organisers wanted a panel at the jointly held SIGdial 2023 and INLG 2023 conferences in Prague that took place last week. Svetlana Stoyanchev, as PC Chair in charge of it, proposed “Social impact of LLMs”. It was to follow the keynote talk by Ryan Lowe of OpenAI, the company behind the popular ChatGPT and also Whisper for speech, and he would participate as well. I ended up on the panel too (coming from the NLG angle of the matter), as did Ehud Reiter from the University of Aberdeen (UK) and Malihe Alikhani from Northeastern University (USA), with David Traum from the University of Southern California (USA) as moderator.

There was to be a 3-5 minute opening statement by each panel member, which I had duly prepared for, but that did not happen. What happened first was an unassuming one-liner with name, affiliation, and area of specialisation. It then proceeded with questions along the lines of “Can you provide your view on how LLMs benefit society?”, “What is more important: factualness or fluency?”, and “Which ethical concerns about LLMs are overstated?”.

I didn’t get on the panel for that sort of stuff. I was cajoled into saying ‘yes’ to the invitation because I had already compiled a partial list of social issues with LLMs. I’m teaching a module on “social issues and professional practice” to first-years in computer science at UCT and touched upon it in late July and early August at the start of the semester; I had also mentioned some of it at a research ethics workshop at UCT. Note that ‘issues’ can be interesting and provide ideas for new research projects, provided they’re not inherent limitations of the theory, method, technique, tool, or practice.

As preparation for the panel, I tried to structure the list into a taxonomy of sorts to maximise information density in the short time I thought I would have. So when the moderator opened the floor for questions from the audience and no-one queued up instantly, I jumped into the gap. It might help to get the audience into action, too, or so I thought. And someone had to state the unpleasantries and challenges. So here’s that taxonomy-like list of social issues I managed to mention (in a nutshell and still incomplete):

1. In creation of LLMs

1.1 Resource usage (sensu climate change issues):

1.1.1 Electricity use for the computations training the LLMs;

1.1.2 Water use, used in data centre cooling where the computation takes place.

1.2 Exacerbating disparities, in that the less well off can’t compete with the rich corporations in The North and end up crowded out and as consumers only (and possibly also some colonialism, as noted by the speech researchers on Maori w.r.t. OpenAI’s Whisper).

1.3 Data (text) collection, notably regarding:

1.3.1 IP/copyright issues of the text ingested to generate the LLM;

1.3.2 The lack of trust (or the angst) about what data went, and goes, into the LLMs (the ‘could be your emails, googledocs, sharepoint files etc.’); that no-one was asked whether they consented to their content being grabbed for that purpose; and, since some would have disapproved of inclusion if they could, the powerlessness in that it seems one can neither opt out nor verify that one’s text was excluded if opt-out were to be possible.

1.4 Psychological harm done unto the ‘garbage collectors’, such as the Kenyans in clickfarms, who are the manual labourers hired to remove the harmful content so that the system’s responses are clean and polite.

2. In content of LLMs

2.1 Bias, amplifying the bias in the source text the LLM trained on and that may be undesirable (e.g., gender).

2.2 Cultural imperialism:

2.2.1 Coverage/performance disparities. The LLM has ingested more from one region than another, so its output may not be relevant to the locale (say, to people in the RSA) or culture where it is used but rather output something that is applicable to people in the USA as if that were valid for the whole world;

2.2.2 Language. Whose language does it use in the interaction? On pushing out language varieties and dialects that are less well-represented in the training dataset, reducing diversity in expression, and steering towards homogenization.

3. In use of LLMs

3.1 Work:

3.1.1 It creates more work without getting extra resources for it; at least so far, it has created more work for, among others, us lecturers than it purportedly would save (as if we didn’t have enough to do already);

3.1.2 It puts people out of jobs; this is for many a novel computing technique and should be managed but isn’t.

3.2 Information-seeking behaviour affecting democracy. An LLM gives ‘one answer’, as opposed to equally easily accessible answer options that let one assess multiple sources as part of information-seeking in democratic discourse. That is problematic due to fabrications (‘hallucinations’) and its fickleness in dropping properties (content), and an LLM may be amenable to manipulation for use as a propaganda machine.

3.3 Learning avoidance. There’s a difference between using LLMs as time-saver when one has the skill versus skipping learning competencies at school and university, such as writing and summarisation of course material when learning a subject.

3.x [there surely is more but I didn’t even have enough time to elaborate on item 3.3 already.]

The list in my lecture and workshop slides also included issues with misinformation, disinformation, privacy, and the unclear culpability attribution when there are bugs in the code it generates, which I hadn’t gotten around to including due to time constraints.

I can very well imagine the list will change, not only ending up longer, but also that more research may solve some of the issues so they can be removed. For instance, currently, language varieties descend into getting mixed into one cocktail (they also did when David Traum tried with several Englishes), but it’s an interesting research question how one can (re)train an LLM to detect them in the training corpus and output them correctly, be this for written text or speech. It does not sound like an insurmountable problem to solve. Fear may be addressed with openness and education; policies might address some others.

Rotating Kafka head/disco ball in the city centre of Prague. (Source: I took it a few days before the INLG’23 conference)

While I was quickly going through my list, one attendee had walked over to the microphone and so I ended it at item 3.3. The question was about the impact of LLMs on the research community. The panel was called closed soon thereafter and lively comments followed when we all strolled into the conference welcome reception that took place at the same venue. I was pleased to hear those comments. More public debate in the panel session, however, would have been better for everyone compared to relegating it to the reception. Whether the muted response during the panel session was due to it having been a long day already—a great keynote talk by Emmanuel Dupoux, two long-paper sessions with interesting research, a poster session, and Ryan’s keynote—or due to it being recorded or for some other reason, I don’t know. Perhaps it is also up for debate whether it was wise to speak up. But no-one saying anything about some of the challenges with the social impact of LLMs in society was, in my view, not an acceptable option either.

To close off this blog post, I must note that there are more lists on social issues with LLMs and there’s quite some overlap between those resources and the taxonomy-like list described above. Among others: I can suggest you read this or that paper, or, if you’re short on time, have a look here or here that all have more explanatory text and references than this blog post.

What can you do when you have to stay at home?

Most people may not be used to having to stay at home. Due to a soccer (football) injury, I had to stay put for a long time, yet I hardly ever got bored (lonely, at times, yes, but doing things makes one forget about that, be content with one’s own company, and gain lots of new knowledge and experiences along the way). As a silver lining of that (and since I’m missing out on some social activities now as well), I’m compiling a (non-exhaustive) ‘what to do?’ list, which may give you some idea(s) to make good use of the time spent at home, besides working from home if you can or have to. The ideas are structured in three main categories: enriching the mind, being creative, and exercising the body, and there’s an ‘other’ category at the end.

 

Enrich the mind

 

Leisure reading

If you haven’t signed up for the library, or aren’t allowed to go there anymore, here are a few sources that may distract you from the flood of COVID-19 news bites:

  • Old novels for free: The Gutenberg project, where people have scanned and typed up old books.
  • Newer novels for free: here’s an index of free books, or search for ‘public domain books’ in your favourite search engine.

 

Learning

  • A new language to read, speak, and write. Currently, the most popular site for that is probably Duolingo. If you’re short on a dictionary: Wordreference is good for, at least, Spanish, Italian, and English, Leo for German<->English, and isiZulu.net for isiZulu<->English, to name but a few.
  • A programming language. There are very many free lessons, textbooks, and video lectures for young and old. If you have never done this before, try Python.
  • Dance. See ‘exercises’ below.
  • Some academic topic. There are several websites with legally free textbooks, such as the Open Textbook Archive, and there is a drive toward open educational resources at several universities, including UCT’s OpenUCT (which also has our departmental course notes on computer ethics), and there are many MOOCs.
  • Science experiments at home. Yes, some of those can be done at home, and they’re fun to do. A few suggestions: here (for kids, with household stuff), and here, or here, among many sites.

 

Be creative

 

Writing

  • Keeping a diary may sound boring, but we live in interesting times. What you’re experiencing now may easily be blurred by whatever comes next. Write it down, so you can look back and reflect on the pandemic later.
  • Write stories (though maybe don’t go down the road of apocalypses). You think you’re not creative enough for that? Then try to re-tell GoT to someone who hasn’t seen the series, or write a modern-day version of, say, Little Red Riding Hood or Romeo & Juliet.
  • Write about something else. For instance, writing this blog post took me as much time as I would otherwise have spent on two dance classes; this post took me three evenings + another 2-3 hours to write; and this series of posts eventually evolved into a textbook. Or you can add a few pages to Wikipedia.

 

Arts

These activities tend to call for lots of materials, but those shops are possibly closed already. The following list is an intersection of supermarket-materials and artsy creations.

  • Durable ‘bread’ figures with salt dough, for if you have no clay. Regular dough for bread perishes, but add lots of salt, and after baking it, it will remain good for years. The solid dough allows for many creations.
  • Food art with fruit and vegetables (and then eat it, of course); there are pictures for ideas, as well as YouTube videos.
  • Paper-folding and cutting to make decorations, like paper doll chains, origami, kirigami.
  • Painting with food paints or make your own paint. For instance, when cooking beetroot, the water turns very dark red-ish; don’t throw that away. If I recall correctly, onion gives yellow and spinach gives green. This can be used for, among others, painting eggs and water-colour painting on paper. Or take a tea sieve and a toothbrush, cut out a desired figurine, dip the toothbrush in the colour-water and scrape it against the sieve to create small irregular drops and splashes.

  • Life-size toilet roll elephant figures… or even toilet roll art (optionally with paper) 😉
  • Knitting, sewing and all that. For instance, take some clothes that don’t fit anymore and rework them into something new (trousers into shorts, t-shirt as a top, insert colourful bands on the sides).
  • Colourful thread art, which requires only a hammer, nails, and one or more colours of sewing thread.

 

Exercise that body

one of the many COVID-19 memes (source: passed by on FB)–Let’s try not to gain too much weight.

Barbie memes aside, it is very well possible to exercise at home, even if you have only about 1-2 square meters available. If you don’t: you get double the exercise by moving the furniture out of the way 🙂

  • Yoga and pilates. There are several websites with posters and sheets demonstrating moves.
  • Gym-free exercises, like running on the spot; making a ‘step’ from two piles of books and a plank and stepping up and down on it, or using the kitchen mini-ladder, or going up and down the stairs 20 times; push-ups, squats, crunches, etc. There are several websites with examples of such exercises. If you need weights but don’t have them: fill two 500ml bottles with water or sand. Even the NHS has a page for it, and there are many other sites with ideas.
  • Dance. True, for some dance styles, one needs a lot of space. Then again, think [back at/about] the clubs you frequent[ed]: they are crowded and there isn’t a lot of space, but you still manage(d) to dance and get tired. So, this is doable even with a small space available. For instance, the Kizomba World Project: while you’d be late for that now to submit a flashmob video, you still can practice it at home, using their instruction videos and dance together once all this is over. There are also websites with dance lessons (for-payment) and tons of free instruction videos on YouTube (e.g., for Salsa and Bachata—no partner? Search for ‘salsa shines’ or ‘bachata shines’ or footwork that can be done on your own, or try Bollywood or a belly dance workout [disclaimer: I did not watch these videos]).
  • Zumba in the living room?

 

Other

Ontologically an awful category, but well, they still are good for keeping you occupied:

 

If you have more low-cost ideas that require little resources: please put them in the comments section.

p.s.: I did a good number of the activities listed above, but not all—yet.

This blog is now 10 years old

Writing the title of this post does make me wonder how it happened. That blogs are still being read, WordPress that hosts it is still around, and I’m still in academia writing about research and other topics. Honestly, when I started dabbling in writing blog posts, I didn’t expect it to last this long, nor, when it celebrated 5 years, that another 5 would be added. Nor did I expect to end up persistently receiving typically over 1000 visitors/month, which is fairly popular given the blog’s topics, and surpass the 100 000th visitor some time last year. Admittedly, there are not a lot of comments, but then, nor do I comment a lot on other blogs (uncontrollable digital footprint and all that). So, I sat down and wrote a few reflections, which might be of use to someone thinking of starting a (science-oriented) blog or having a dip in posting.

Some pros

Having a blog is useful for learning to try to simplify one’s own research papers into a roughly presentable ‘sound bite’ that can be read in a few minutes. (I don’t get a lot of click-throughs to my papers though, so it may not help with getting people to actually read your papers.)

It is useful to push oneself to take notes during conferences and read those papers, and therewith also reflect on the conferences and workshops one attended (Which papers were actually interesting? Which ones might be useful for your own research? Did someone present a cool solution to a problem you knew of but hadn’t had [any/enough] time to solve, so that you are happy someone did?).

If you don’t know what to write about: present/discuss a paper you found interesting, or disagreed with. This helped me at the start to post and not let the blog fizzle out. Admittedly, I have plenty of accepted papers now so as to space it out to ‘market’ one per month. Nevertheless, giving ‘airtime’ to others has its merits, especially when they are in the area of your research, for it shows you actually do read other papers, critically. The offline thank-yous are just a bonus.

That said, it is nice to get feedback from other researchers and readers on the posts; and it really doesn’t matter whether that is left as a comment on the blog, by email, or in person. While they may be pleased I mentioned their paper in a post, I’m happy to know someone used up some of that extremely precious resource—time—to read what I wrote.

I like to think that my writing has improved over the years. If not that, then at least it now takes less time to write at that very same level.

 

Some caveats

So much for the good side that I could come up with. What about the negative side? Mainly, it does take up quite a lot of time to write up the posts. Writing one evening, reading and revising it the next day, and all the layout work for the post can take up several hours. Those hours could have been spent differently.

You won’t know upfront which posts will be ‘popular’; some posts that I liked aren’t popular at all, yet some that I thought were minor, are. I haven’t deciphered a correlation between effort put in to write a post and its popularity either. I write what I fancy writing about and then hope for the best. Looking at the statistics of page visits, popular-science posts have many, many more visits than posts about my research.

Even when trying to be polite when writing, reading some of the posts years hence, some things seem to be formulated more harshly than intended. There is the danger of pissing someone off, thereby feeding the rumour that blogs may be harmful.

Related to the latter is that there are some things I wanted, or even craved, to write about, but couldn’t due to the decision to have a non-anonymous blog. That said, even anonymous blogs can be ‘revealed’ (e.g., fsp), and some things just fester better through the grapevine.

You may have trolls. I did have them. This depended on the topic (such as positive posts about Cuba), but may also be immature students or a colleague with an axe (of the xenophobic, racist, and/or sexist type) to grind. Ignore them.

 

Final remarks

Does doing this blogging actually have any impact on the level of my ‘popularity’ or ‘standing’ (good or bad) in the research community? I don’t have the faintest idea.

Will I go on for another 10 years? I don’t know. For now, I still try to write at least 2 posts per month. I hope you stay with me, but I also know that interests change, so I will not hold abandoning keetblog against you 🙂. In fact, I am grateful you have taken, take, and/or will take the time to read the blog, and I hope you consider it time well spent, not wasted, or that it may have been effective for your structured procrastination.

8 years of keetblog

The 8-year anniversary swooshed by a few days ago, but it is really only complete today, as the first blog post with real content was published on April 18, 2006, about solving sudokus with constraint programming.

The top-post among the 186 posts (>9000 visits to that page alone) is still the introduction for two lectures on top-down and bottom-up ontology development that I wrote in November 2009 as part of the Semantic Web technologies MSc course at the Free University of Bolzano; anyone wishing to read an updated version: have a look at the 2014 lecture notes (its ‘Block II’). The post most commented on is about academia.edu, and then on my wish for a semantic search of insects.

The more ‘trivia’/fun ones—still having to do with science—are, I think, about the complexity of coffee and culinary evolution, but I may be biased (my first degree up to MSc was in food science). For some reason, there were more visitors reading about failing to recognize your own incompetence and some sneakiness of academia.edu than about food (and many other topics). Ah, well. A full list sorted by year is available on the list of blog posts page.

The frequency of posting is somewhat less than a few years ago and, consequently, the visits went down from about 1500/month during its heydays [well, years] to about 1000/month now, but that’s still not bad at all for a ‘dull’ blog, and I would like to thank you again, and even more so the fans (subscribers) and those of you who have taken the effort to like a post or to leave comments both online and offline! I hope it’s been an interesting read, or else enjoyable procrastination.

Preliminary list of isiZulu terms for computing and computer literacy

As part of the COMMUTERM project, we played around with isiZulu terminology development using “the” “traditional” way of terminology development (frankly, having read up on it, I don’t think there is an established methodology), which was interesting in itself already.

We have gathered relevant computing and computing literacy terms from extant resources, conducted a workshop with relative experts (typical way of doing it), executed two online surveys through an isiZulu-localised version of Limesurvey, and completed a voting experiment among computer literacy students. The results and analysis have been written up for a paper, but this will take some time to see the light of day (if it is accepted, that is). In the meantime, we do not want to ‘sit’ on the list that we have compiled: so far, there are 233 isiZulu terms from 8 resources for 146 entities. At the time of writing, this is the largest list of entities with isiZulu terms for the domain of computing and computer literacy.

The list is available in table format, sorted alphabetically by English term and sorted alphabetically by isiZulu term. Except for a few (very) glaring mistakes/typos, the list has not been curated in any way, so you have to use your own judgment. In fact, I don’t care which terms you’d prefer: I’m facilitating, not dictating.

Besides leaving a comment on this post or sending me an email if you have updates you’d like to share, there are other ways to share your knowledge of isiZulu computing and computer literacy terminology with the COMMUTERM project and/or the world, among others:

  • Contributing to the Limesurvey localization for isiZulu, so that not only the text in two existing surveys will be entirely in isiZulu, but also any survey and the back-end admin. Members of the African Languages department at UKZN are especially interested in this so that they will be able to use it for their research.
  • The computer literacy surveys are still open (100% isiZulu interface), so you can still choose to do either this one or that one (but not both).
  • Participate in the crowdsourcing game ([link TBA]), which will be launched in February, given that it is still summer holidays for the students at present.

2012 in review (WP blog stats summary)

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog (see link below). The number of visits is still impressive for the kind of blog this is, and feedback is on the increase. Hereby a big thank you to you all for visiting my blog and taking the effort to respond (both visibly online and in the offline comments I received)!

I wrote fewer posts in 2012 than in the previous two years, but I do have the intention to stick to the ‘at least 2 posts/month’ frequency for 2013.

 

Here’s an excerpt:

4,329 films were submitted to the 2012 Cannes Film Festival. This blog had 16,000 views in 2012. If each view were a film, this blog would power 4 Film Festivals

Click here to see the complete report.

My snapshots for why I do what I do

A type of conversation that occurs not infrequently goes like this:

  • Other person: “why are you here?”
  • Me: Uh?
  • Other person: “I mean, work at the university. You can earn so much more money when working in industry.”
  • Me: Ahh. Well, I have worked in industry for 3.5 years. It was fine for a while, but not enough…

Then I fill in the dots to a greater or lesser extent, depending on the occasion. Related to answering such questions is Anthony Finkelstein’s “why I do what I do” blogpost: it consists of snapshots of positive aspects and events that made him feel that being a professor in software engineering is all worthwhile, which is a nice idea to give small hints toward answering it. Here I compiled some of my ‘snapshots’ of positive aspects, pleasant events, and encouraging feedback that make me enjoy my job more than giving in to a latent thirst for money and possessions and going back to industry (but note that I reserve the right to change my mind again). In random order:

The excitement when you’re the first person in the whole world who solves some particular problem or discovers something hitherto unknown.

After having covered topics like relational algebra, SQL, and distributed databases in the lectures, a student comments, baffled, “I thought databases was just about playing a bit with MS Access, but there’s so much more to it. It’s really amazing!”

I got to see the Sydney Opera House—wanting to see it since I saw a slide of it in my last year of high school during art classes—right before presenting my paper at a top-ranked conference, and the university paid for the trip to the other end of the world.

“We are pleased to inform you that your paper “xxx” has been accepted for …”

I stumbled upon a paper related to my PhD thesis, stating they use my theory to solve the problem they had.

A fourth-year student emailed me at the end of the course that he’s impressed that I’m a caring lecturer also going beyond what I have to do, and that he has yet to meet someone like me.

Socializing with colleagues from different disciplines, and brainstorming about joining forces to research and devise solutions to fix the major problems in the world.

I traveled to Cuba to, upon invitation, teach a course in my research area to well-prepared and motivated students who were eager to learn. And an extension of one of the course’s projects even resulted in a joint paper.

A paper cites one of my papers as if it is the default/standard paper to cite on that topic.

Free access to most of the primary sources of scientific information regardless of the discipline.

I can investigate issues that I fancy looking into, and even can earn a living with it.

Seeing students surpass their own expectations and become aware of capabilities they didn’t think they had but actually do have.

Meeting up with colleagues and having stimulating conversations about pressing problems and known unknowns in our oh-so-relevant sub-sub-sub-field of our discipline, alternated with pub talk on the ‘tales from the trenches’ and nerdy trivia.

I know what the box is made of, what it does, and can make it compute what it should compute.

I travel to different countries and meet many people from all over the world, reconfirming time and again we are all very human, and live in and share this world together.

Thanks and Best Wishes for 2012

Many thanks to all of you dear readers for reading my blog and especially those who took the effort to leave comments, comments on comments, and provided off-line feedback. I hope you have found it was time well spent, or else enjoyable procrastination. According to the WordPress summary of 2011, the most visited new post was the one on the Essay on the non-violent personality, and there have been some 14000 visitors in 2011.

Given that it is that time of the year again for a little public reflection, my 2011 was, on the whole, very positive, with the move to South Africa and having commenced as Senior Lecturer at the University of KwaZulu-Natal. In addition to the usual productive research and teaching and community services, I’ve been catching up on (South) African politics, history, and socioeconomics, started learning isiZulu, and trying to get to understand this complex society here in SA.

As for the blog, the number of posts is lower compared to previous years (though with the topics just as varied), which is due to all those other activities and because I am writing lecture notes for this year’s ontologies and knowledge bases course, which otherwise would have made it piecemeal onto the blog (like these ones on ontology engineering). The deadline for those lecture notes is mid January, so stay tuned for upcoming updates.

I wish you all a happy, productive, and prosperous New Year! 

I’m counted

South Africa is conducting its once-in-10-years census from 9 to 31 October 2011. Today, when I was walking home from the supermarket, two census people walked in the street where I live looking for people to count, but seemingly not having much luck, as people were either not at home or unwilling. (Regarding the latter, Hayibo already has been poking fun at the news updates on Census 2011.) They were fine with questioning me on the street, as they had missed me earlier and it saved them walking back another 200m in the hot sun or making repeat visits in the hope I’d be home.

First, there were the usual questions, like name, age, marital status, country and province of birth, since when I have lived in South Africa, and so forth. And then the census person kind of anticipated my response on what my first language is. “English”. “No”. “You sound as if it is”. “Oh. It really isn’t”. So, ‘other’ was ticked, and English as second language. I protested slightly, as the first words and coherent sentence I uttered in a language other than my first were in French, and then German, and then English, and then Spanish, and then Italian, and then Zulu. The form did not cater for that. My mind started wandering off to database design and accuracy of the data. Ah, well.

Then there was the race question. Not that I have figured out how it works here, and after this event, even less so. For instance, some students who look definitely Mediterranean to me are proudly Indian, and some people who have a pale skin complexion assert vehemently they are Coloured. So I thought I’d better not bother trying to box anyone (including myself). But the question had to be answered. The census person read aloud the question: “Are you Black, Indian, or Coloured?”. “Uh, huh?”, turning my head to see the question on the sheet, which had five possible answers. Again, “Are you Black, Indian, or Coloured?”. “Uhm, I’m from Europe. European?”. “Ok, ‘other’”, which was ticked off, and that’s fine by me. Somehow, ‘White’, whilst being in the list, was, to her, not an option worth mentioning or considering ticking off, as apparently I am clearly not White. South Africans with as much of a melanin deficiency as me start their phrases every now and then with “we Europeans…[fill in anything that doesn’t hold for all Europeans]”; are they the Real Whites™? And, conversely, I am a Real European™, who is then, by definition, not White? Confusing.

The remaining questions were fairly standard, or sensible to ask in a country like South Africa (e.g., in the Netherlands, they would not ask whether I have piped water and am connected to the electricity grid; here, many still have to make do without). I am still wondering about the whole list of equipment though. That the census wants to know whether I have radio, TV, and Internet access at home makes sense in the light of information dissemination, but what’s so useful about knowing who has a DVD player? In the light of COP17 next month, it would have been nice if they had included ‘bicycle’ in the list, instead of only ‘motorcar’. There was no question about how one travels to work and how long it takes, although the answers could have been useful in the planning of the country’s infrastructure.

I got two barcode-stickers at the end: one for the door and one for my passport. The first one acts like the ‘no Jehovahs, Evangelists, door-to-door salesmen etc.’ stickers one can observe on several front doors in some European countries, and the second one is for cross-checking that I will not be counted twice or not at all. It’ll be interesting to see what the statisticians are going to do with all the data.

Questionable search terms for my blog

WordPress provides a range of blog statistics, including which search terms people used to arrive on my blog. Over the years, I have seen sensible, or at least explainable, search terms, and a bunch of funny or plain weird ones. The latter clearly demonstrate the limitations of string-based and statistical methods for web searches and, to some extent, that Internet users could do with some training on how to search for information.

The top searches of the past 5 years, each used more than 100 times (some far more than that), are: ontology, keet, aardappeleters, parallel processing operating system, ontologies, and philosophy of computer science. Then there are often-recurring strings that are quite similar but count as different hits, mainly about women’s achievements, failing to recognize one’s incompetence, granularity, and [computer science/ontology] with [medicine/ecology/philosophy/biology]. This is understandable given the topics I have blogged about.
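To illustrate how near-identical search strings end up as separate hits, here is a minimal sketch of merging them before counting. It is not how WordPress computes its statistics; the function names `normalise` and `merge_hits` and the hit counts in `sample` are made up for the illustration. It only ignores case, punctuation, and word order, so ‘ontology’ and ‘ontologies’ would still be counted separately.

```python
import re
from collections import Counter


def normalise(term: str) -> str:
    """Lowercase, drop punctuation, and sort the words so word order is ignored."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(words))


def merge_hits(hits: dict[str, int]) -> Counter:
    """Sum the hit counts of search strings that share a normalised key."""
    merged = Counter()
    for term, count in hits.items():
        merged[normalise(term)] += count
    return merged


# Invented counts, for illustration only (not the blog's real statistics).
sample = {
    "philosophy of computer science": 3,
    "Philosophy of Computer Science": 2,
    "computer science philosophy of": 1,
    "ontology": 5,
    "ontologies": 4,
}
merged = merge_hits(sample)
```

Merging singular and plural forms would need stemming or lemmatisation on top of this, which is exactly the kind of step where purely string-based methods start to guess.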

More interesting from a computing perspective are those that are sort of, or even plain, wrong, and the reasons why. The remainder of the post is devoted to a selection of the more curious ones that I collected intermittently over the past 2 months (in italics), with comments added to several of them (in plain text). They are divided into “search engines are not oracles”, “what were they thinking?”, “curious”, “plain wrong”, and “miscellanea”.

Search engines are not oracles!

  • should i be a scientist or an engineer
  • what should be done with the outcome of assessment and how to use the outcome of assessments. The announcement of my ESWC2011 paper comes up, but is unlikely to give the user the answer they were looking for (there aren’t that many people interested in experiments with foundational ontologies).
  • how useful is philosophy in computer science. This post on what philosophers say about computer science turns up when I searched for it, which does not deal with the usefulness, let alone the amount of usefulness of philosophy in computer science. The next search string is a bit more sensible in general and with respect to the blog post’s content:
  • is computer science a science by different philosophers
  • reasons for wildlife ontology development.  There are posts about the African Wildlife tutorial ontology and the IJMSO paper that has a list of reasons for developing an ontology, but they have not been put together to give you reasons for developing a wildlife ontology.
  • ecology lessons good? The post on ontologies for ecology turns up, not in the least bit answering the question—those authors learned valuable lessons using ontologies in ecology research.
  • do i read too much? and can you read too much. This post, in which I explore whether one can read ‘too much’, is on the first page of results; it is only slightly more skewed toward ‘answering’ the second search string than the first.

What were they thinking?

  • writers who do not read
  • too much work blog
  • undergraduate computer science research least publishable unit.  Since when do undergrads care about LPUs?
  • useful typology. The typology of bureaucracies turns up in my Google search results; if it is a useful one remains to be seen.
  • random structure of website. My blog was not on the first 5 pages of Google when I searched (but it is by now known that Google customizes the search results).
  • response to the dirty war. Which dirty war would that be? There are three posts on the response to the dirty war *index* that I have my opinion about (here, here, and here).
  • computational food. Perhaps the user was thinking about computation with data about food? The only one that might fit, sort of, is the post about culinary evolution. There are interesting hits on the first Google page, though, such as about computational models of microwave food processing and computational food engineering.
  • notify me if someone searches for me on google

Curious search terms, but somewhat understandable

  • non violent essay. An essay itself is never violent; there’s a post on the non-violent personality though.
  • incompetence blog. Uhmm… I fancy thinking this is not a blog about incompetence. There is a post about the Dunning-Kruger effect (on being incompetent and unaware of it).
  • incompetence not realize
  • methontology ping pong. Googling for it, this post comes up, which is unlikely to have served the user, because it covers realism-based ontologies and methodologies (such as methontology) and has a blog comment lamenting the “self contained ping pong matches among academics”.

Plain wrong hit

  • anatomical structure of an owl. This is a nice example of the limitations of string-based and statistical approaches compared to semantic searches.
  • salami techniques in information system. I googled it again, and my blog does not appear on the first 5 pages, and there is no post even remotely close to the search term.
  • slinging techniques. It is not on the first 5 pages of Google when I searched, and there is nothing about slinging techniques on any of the blog posts.

Miscellanea

  • ponder ontology. It appears that ponder is an object-oriented language to describe policies; I write about ontologies and do ponder about things, but have not put them together.
  • granular book. I did announce a book about granular computing, but not about books that may be granular.
  • ontologies funny photos. Are there funny photos of/about ontologies?
  • 4. dimension

The problem with listing these odd ones is that the search algorithms will not change in the very near future, and thus, due to this post, more people will be misdirected to my blog. But perhaps this manually assessed list of odd search terms might, some time, help in improving the algorithms and in summarizing the content the links point to.