Any semantic search for insects?

The draft of this post started with an example of a creepy insect living in Italy and, well, across the world in those locations where hygiene is not taken too seriously. But I will leave that be, so you can have a good night’s rest. Instead, I will take the example of an insect that I still cannot identify. It may still turn out to be a creepy one, but now I do have photos of it and it is living well away, outside in Ineke’s polytunnel near Limerick, Ireland. The problem is this: neither Ineke, nor Heidi, nor I know what it is, and we really still want to know. How to get the answer, i.e., how to find the species name of the specimen? I’ve tried several strategies: the ones that are practically possible did not do the job, and the one that would do the job does not exist. I’ll go through them in the remainder of the post and close with a few questions on what the most feasible strategy would/should/could be to eventually have a decent entomology [ornithology/nematology/etc.] knowledge base.


Specimen viewed from the top; can anyone ID this specimen?

Basic searches

None of us who were present at teatime in Ineke’s polytunnel, where we observed the insect, is an entomologist, nor do we have entomologist friends. The famous ‘bug man’ Ruud Kleinpaste is a fellow alumnus of Wageningen University, but we did not study there around the same time and I could not find an email address to bother him with a request to ID a specimen. None of us has an insect handbook either, and even if we had one, I, for one, would not want to flick through it when there is a perceived need to find the species of a specimen. Flicking through the insect book (and plant book, etc.) was an entertaining pastime when I was young, like reading the encyclopaedia and doing the dictionary game, but in this day and age I want to use the computer to find the answer. This is theoretically feasible but, as far as I am aware, not yet in practice.

To do image matching, I would need a very large data set in which each image is labelled with its species name, which I do not have; so the machine learning strategy will not work. There is an online browseable BugGuide for the US and Canada with lots of pictures that I clicked through for a while, but without finding the right picture. There are entomology databases that let me search by species name (here, here, and here), but not by properties of the insect; KONCHUR has a fancier search mechanism but covers insects in Japan, East Asia and the Pacific only (“orange leg AND black body” did not return any results).

Sure, I did a Google search on “image of a black insect with orange legs and stingy back”, hoping that someone else had already uploaded an image of another specimen of the same type of insect, annotated it with the same or very similar terms, and that someone with the right knowledge of insects (so not someone looking around for the answer like me) had taken the next step of adding the species name as well. With the many search phrases and the PageRank algorithm that Google relies upon in devising the search results [1], something might turn up; however, the actual results were unhelpful. Other people had similar requests without an answer; the body colours were swapped twice (orange bug with black legs); there were many unrelated insects, one of which does have orange legs, the Ichneumon wasp (Rhyssa persuasoria), but its legs are only partially orange, it has white dots on its body, and its back is not as stingy as that of our specimen (see picture of the Ichneumon wasp); and there were utterly irrelevant land and sea images. That’s about it for the first page of the Google query answer.

Semantic searches

Now, if there were a proper ontology of insects, and I mean not a bare taxonomic tree but one where the classes have properties and those properties have their ranges defined as well, then it would be a simple exercise of selecting the properties along the line of

adult insect
  AND length 2cm
  AND colour black
  AND has wingtype transparent
  [*AND body shape similar to a wasp*]
  AND leg colour orange
  AND rear body stingy
  AND location at least west Ireland

so that the reasoner (FaCT++, Pellet, and the like) would classify it near-instantly, or if the ontology were to be really large, then still within an hour or so (ignoring for a moment the [*AND body shape similar to a wasp*] because that requires a bit more work). It would be even funkier if that ontology were linked to a database of images of insects to cross-check it with the visuals. Even more so when such a database also were to have information about its habitat with feeding habits, principal role in the food web, and any diseases it may cause or transmit. Then one would also be able to start the search from another direction along the line of “give me all the insects that live in the west of Ireland” as a first step to narrow down the possible answers.
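To make the idea concrete, here is a minimal sketch in Python of the attribute-based selection described above. It is not an ontology with a description logic reasoner such as FaCT++ or Pellet; it is plain filtering over records, which is a much weaker stand-in. All field names and the three "hypothetical species" entries are invented placeholders for illustration, not real data, and the wasp-like body shape criterion is left out, as in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsectClass:
    """One entry of a hypothetical insect knowledge base."""
    name: str
    stage: str
    length_cm: tuple[float, float]  # (minimum, maximum) adult length
    body_colour: str
    wing_type: str
    leg_colour: str
    rear_body: str
    regions: frozenset[str]


# Placeholder entries only; a real knowledge base would be curated by entomologists.
KNOWLEDGE_BASE = [
    InsectClass("hypothetical species A", "adult", (1.5, 2.5), "black",
                "transparent", "orange", "stingy",
                frozenset({"west Ireland", "east Ireland"})),
    InsectClass("hypothetical species B", "adult", (1.5, 2.5), "orange",
                "transparent", "black", "stingy",
                frozenset({"west Ireland"})),
    InsectClass("hypothetical species C", "adult", (0.5, 1.0), "black",
                "transparent", "orange", "rounded",
                frozenset({"Japan"})),
]


def in_region(classes, region):
    """First narrowing step: keep only the insects known to live in the region."""
    return [c for c in classes if region in c.regions]


def matches(c, *, stage, length_cm, body_colour, wing_type, leg_colour, rear_body):
    """True if every observed property agrees with the entry (a plain AND)."""
    low, high = c.length_cm
    return (
        c.stage == stage
        and low <= length_cm <= high
        and c.body_colour == body_colour
        and c.wing_type == wing_type
        and c.leg_colour == leg_colour
        and c.rear_body == rear_body
    )


def identify(classes, region, **observed):
    """Names of all entries in the region that fit the observed properties."""
    return [c.name for c in in_region(classes, region) if matches(c, **observed)]


candidates = identify(
    KNOWLEDGE_BASE,
    "west Ireland",
    stage="adult",
    length_cm=2.0,
    body_colour="black",
    wing_type="transparent",
    leg_colour="orange",
    rear_body="stingy",
)
print(candidates)
```

With these placeholder entries, only the first one survives both the location filter and the property match. A real ontology would add what this sketch lacks: a class hierarchy, defined ranges for the properties, and a reasoner that can also classify on partial descriptions.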

Aside from the instance classification problem of this particular specimen, the question arises whether it would be up to

a) Google to work on their technology so as to be able to get the answer for me?

b) Entomologists to develop their domain ontology about insects and link it to some database with pictures and additional textual information to have indeed a properly searchable knowledge base?

c) Volunteer labour, like me having taken pictures and annotated each one with the physical characteristics, location, time of observation, etc. and categorise it as “bug” or “insect” or “insekt” or “insetto” to eventually have a grass-roots bugbase (that likely will have some imperfections with gaps in data fields and sloppy terminology)?

d) Everyone to buy insect books?

e) …?

Shelving option d, I am explicitly looking for a computational option, i.e., a, b, c, or e. I prefer a web-accessible version of option b, which can be done with scalable Semantic Web technologies; one only needs to find the money, time, and people to realise it.
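For comparison, the grass-roots route of option c could start from something as simple as the record below. This is only a sketch under assumptions: the field names, the synonym table, and the file name are hypothetical, and it merely illustrates the two imperfections mentioned in that option, i.e., gaps in data fields and sloppy terminology.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping of volunteer-supplied category labels to one term.
CATEGORY_SYNONYMS = {
    "bug": "insect",
    "insect": "insect",
    "insekt": "insect",
    "insetto": "insect",
}


@dataclass
class Observation:
    """One volunteer-annotated photo; most fields may be left empty."""
    photo_file: str
    category: str
    location: Optional[str] = None
    observed_on: Optional[str] = None
    body_colour: Optional[str] = None
    leg_colour: Optional[str] = None
    species: Optional[str] = None


def normalise_category(raw):
    """Map a free-text category to the shared term, or 'unknown' if not listed."""
    return CATEGORY_SYNONYMS.get(raw.strip().lower(), "unknown")


def missing_fields(obs):
    """List the fields the volunteer left empty, in declaration order."""
    return [name for name, value in vars(obs).items() if value is None]


obs = Observation(
    photo_file="specimen_top.jpg",  # hypothetical file name
    category="Insetto",
    location="near Limerick, Ireland",
    body_colour="black",
    leg_colour="orange",
)
print(normalise_category(obs.category))
print(missing_fields(obs))
```

Such records would still need someone with the right knowledge to fill in the species field, which is exactly the gap in the example of this post.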

Although I gave the example here with insects, the same story can be told for birds, worms, and so forth. When such searchable knowledge bases exist, it will not only save time for many lay people looking up the information and learning more about the flora and fauna around them, but I can imagine it will also make research a lot easier for interdisciplinary scientists who have to forage into knowledge of insects [/birds/worms/etc.] as well as for the entomologists [/ornithologists/nematologists/etc.] themselves.

specimen from the side


[1] Alon Halevy, Peter Norvig, and Fernando Pereira. The unreasonable effectiveness of data. IEEE Intelligent Systems, March/April 2009, 8-12.


15 responses to “Any semantic search for insects?”

  1. Thanks for the link; I was not aware of this service that “is [a] human powered identification of anything or anyone”. It won’t help getting the semantic search going, but at least the chances of ID-ing the specimen have increased :). I’ve posted the photo there just now (here), and will update the blogpost here once it’s identified.

    • The current best guess at id-this is of the family of Mud Dauber wasps, i.e., Sphecidae. There are many of them in the family, but I have not found a picture that matches the specimen above yet.

  2. I have had a very similar wish for a long time now. I think the first time I wrote about it was back here. Since I finished my PhD and started my new career as a wannabe startup founder / sys admin / webmaster / unemployed father to be, I’ve been thinking a lot about building an iPhone app to do pretty much exactly what you describe. Much of the data could be assembled and the reasoning required (to make something useful) isn’t fancy at all. There is already a great program called iBird for bird identification. I wish I wrote it ;). My thoughts were to bring all the data into Freebase and let them handle my reasoning cpu cycles…

    • yes, me too, but I never bothered writing it up until I had a nagging example for it.
      I checked out iBird, which is funky for being portable. It is based on the online WhatBird version and is for birds of North America only, but at least they let you browse by system-provided attributes instead of by species alone. Something like that for insects would have been a great help already.
      I agree it can be done (with or without Semantic Web technologies); there are only the mere tiny little hurdles of finances, time, and human commitment to overcome…

  3. I’m trying to put together a knowledge database of bugs and insects as we speak. The site is in its infancy but if you’d like to check it out it is at

    It has 3d views of each bug. Using flash, you can turn each little critter to see a 360 degree view.

    I’ve gone to Google, the Web Tree of Life project, and Wikipedia and they wouldn’t even return any emails, so I started in on my own.

    My site isn’t listed on Google yet and probably won’t be until I can drum up funding for its own domain name. (I work as a wedding photographer and would hate to lose business because my bug page happens to pop up first).

    • The photos are beautiful and the 360-degree turning feature is really nice! It sure would be valuable to have those pictures in an entomology knowledge base.

  4. The reason I mentioned Vince is that he understands databases, is interested in bugs, and runs the digital version of a massive natural history museum. They were talking about using Photosynth to rapidly assemble 360 degree views like you are talking about for millions of bugs in the museum collection.

  5. Admittedly, I haven’t read all the material on his extensive blog. I got sidetracked by a post about the ‘post-taxonomist era’ that we are apparently in (according to him and Rod Page’s presentation he discusses), its problems, and the notion of pro-amateur communities online. Maybe instances like Pierre’s spider and my wasp (??) give some motivation to keep going with the taxonomy efforts.
