Google searches, sneaky Academia.edu, and data duplication

Aside from this blog and Facebook, I recently signed up for Academia.edu, a Web 2.0ish site where researchers can connect with and follow each other academically. It was even so ‘smart’ that it could tell me which of my FB friends were already in the system without my explicitly giving it any information about my FB account, and it found most of my papers for me (with some noise, though). Setting aside the uncomfortable former aspect, the ‘finding and handling my papers for me’ is actually really sneaky. I’ll spend the remainder of this post on that, so you know what you’ll be letting yourself in for in case you sign up for it.

The first annoying thing is that if you let Academia.edu collect your papers automatically when you build up your profile there, which it seemingly does ‘intelligently’, it snatches your papers and either takes them from citeseer or puts them on scribd, even though, in my case, they are all on my own and the publisher’s websites, too. And then there’s some noise; e.g., it links to a pdf of the presentation you gave at the conference instead of the paper itself. Also, it does not provide full publication details (just the title), even though Academia.edu could easily be programmed to screen scrape them from any researcher’s website or, better, to ask me for a bib file.

And then there’s the real catch: when someone now searches for your papers, the Academia.edu URL to their version of the paper comes up higher in the Google ranking than either yours or the publisher’s. ‘Thanks’ to Academia.edu’s services, I now know which Google search terms visitors used before clicking through to which paper. So when on March 20 someone from an unknown country searched at 04:50am local time for “data information granularity bioinformatics”, the very first Google hit (yes, the rank is given in the stats as well) was the slides of my PhD defense on scribd, not my thesis on my website, which it should have been; and even if s/he had wanted to download the slides, clicking the “Download” button yields the complaint that “You must be logged in to download” (!). The slides were probably not what s/he was after (ditto for the visitor from Poland, who searched for this). There are many such misdirected instances. In general, such visitors would have had to search again to get the publication data and, where the wrong file was linked, to find the right file. It is correctable on Academia.edu, but only manually. Subsequently adding new papers is also a manual process with an impractical GUI.

And then I have not said anything yet about the scruffy page rendering by scribd. Besides, I never gave scribd approval to offer my work on or through their site (there are more malfunctioning aggregators, which is an issue of its own). In addition, it would not surprise me if that violated the delicately balanced copyright arrangements that exist for CS publications. UPDATE: the terms and conditions (d.d. 21-4-2011) say that “By displaying or publishing (“posting”) any Content on or through the Academia.edu Services, you hereby grant to Academia.edu a limited license to use, modify, publicly perform, publicly display, reproduce, and distribute such Content solely on and through the Academia.edu Services.” That does violate the delicately balanced copyright arrangements that exist for CS publications. The terms & conditions also say it is the Member’s own responsibility to get approval from the respective publishers/copyright holders.

Moreover, there is a preposterous “message” on the right-hand side of the search statistics: “Tip: To make your page appear higher up on Google: Link to your Academia.edu page from your department website; Upload more documents – papers, talks and a CV”. But my own site was higher up in the Google ranking before I signed up to your devious service! Honestly, I want to lure people to my site when they are interested in my contributions, not to a place where partial information is badly duplicated. Ok, this smells of ego-tripping, but my site is worth almost $17K according to WebsValue.com and has a page rank of 5; if I run out of money, I’ll have an asset to sell fairly easily without much disruption. Seriously though, this ‘rerouting’ of visitors away from the source toward some obscure other location on the Internet is obviously a more important issue at the institutional level. No sane university or research institute would want to have as policy to redirect visitors to any site other than its own when it comes to displaying the scientific impact its employees have made. If one is at a non-indexable institute that merely carries the title ‘university’ but is not one in substance, then perhaps Academia.edu helps with one’s visibility. But I am not at such an institute; UKZN is one of the 5 top research-intensive universities in South Africa.

So what can you do? Remove your papers from Academia.edu. This I did this morning, one by one. The sad thing is that “following” other researchers is, in theory at least, an easy way to be notified automatically of their new publications compared to manually checking their homepages regularly, but this is precisely what Academia.edu manages to mess up, badly.

I can envision a couple of mean scenarios for why anyone would have wanted to set up the site the way it is, such as first polluting Google rankings and then asking for a fee in the near future (after all, they already require you to log in to download the file, and a “fee” item is included in the terms & conditions file). The statistics they are gathering on who-follows-whom give better insight into research networks and their leaders than the more common citation-network analyses. Finding out which scientific papers and topics are ‘hot’ must be valuable material as well, and may become just as important as the rather imprecise ISI impact factor, which is quite useless for CS at the moment. One could also use the data for NLP and semantic annotations to offer, in the near future, indispensable academic semantic search facilities (at a price). And no doubt there are more scenarios.

In short, that was the end of that “Web 2.0” experiment for me.

(p.s.: just in case someone wants to see some proof: I did make several screenshots that I can share)

10 responses to this post.

  1. Posted by cattlehill on February 29, 2012 at 1:00 AM

    I spotted your blog via google. Search terms: “academia.edu copyright”. I agree with your statements and removed my research-papers, too.
    The benefits of this site are small and its GUI is still poor.
    However, if you are a “mobile element” in science and change institutions after funding ends, you may be difficult to track. academia.edu gives you the opportunity to create a universal personal link-hub to your current position.
    Again, I share your opinion that one should choose wisely what information to present on that site. This should be common sense anyway.

  2. Hi cattlehill,

    If “if you are a “mobile element” in science and change institutions after funding ended you may be difficult to track” were the only reason, then that can easily be fixed by other means: get a domain name and set up your own website, be it separately hosted on a friend’s server like I have (at http://www.meteck.org), or use the department’s user homepage directory and put a redirect from the chosen domain name to the temporary URL of your page in that directory.
    Regards,
    Maria

  3. I also found your post through googling “academia.edu pdf copyright”. Their pdf handling is pretty sneaky indeed. And – of course – they have their terms covering everything, but it is still a very bad practice, aimed at maximizing traffic. I prefer Mendeley.com as a 2.0 academia network. They ask permission for everything, and you don’t have the feeling they are being unethical.

  4. Hi,

    Thanks for the post. I agree. Initially, I was impressed by how Academia.edu found many of my publications. Unfortunately, there were duplicates and because I didn’t want ‘my’ profile looking messy, I felt I had to dedicate some time to go through each citation and add the missing detail/delete duplicates.

    Best wishes,
    Pete
    @threeprisoners

    • impressed/surprised, yes. But Google Scholar found (almost) all of them, too, and still does, automatically, with much less noise and more data; i.e., GS is better than academia.edu when it comes to finding papers. Also, GS at least links to the openly accessible copies as well, without any modification or new access restriction such as scribd imposes (the links I checked on my GS page were to the publisher’s website and to copies on citeseer, a co-author’s homepage, a workshop’s website etc., or to my homepage)

  5. Wow, that’s what I was searching for, what a stuff! existing here at this blog, thanks admin of this website.

  6. Posted by neri on February 26, 2013 at 2:09 AM

    Dear Keet,

    I found your article insightful, thank you for sharing your opinions.
    I am a graduate student and I have found Academia.edu’s service excellent so far, so that I was surprised by your article and I decided to verify your statements.
    I found out that what you say about the terms and conditions (T&C) is indeed literally true, but I believe that it may mislead the reader. This is the reason I decided to post this comment.

    First, you omitted that T&C also states that:

    “Academia.edu *does not claim any ownership rights* in the Content that you post to the Academia.edu Services”

    and that

    “you *continue to retain all ownership rights in such Content*, and you continue to have the right to use your Content in any way you choose”.

    This seems pretty fair.
    Moreover, even if the T&C states that you give Academia.edu a “limited license to use, modify, publicly perform, publicly display, reproduce, and distribute such Content”, this is true “solely on and through the Academia.edu Services”.

    This means that you authorize Academia to show your papers on Academia.edu (and only on Academia.edu), to let people download them from A.edu, to visualize them on their PC, print them, read them, etcetera. This seems the point of sharing a paper on Academia. So far so good!

    The right to “modify” seems more sneaky. Nevertheless, I think this may be determined by Academia’s need to convert some file formats, for instance powerpoint to pdf. (Another remark here: you forgot to post an update about Academia’s policy of uploading papers to scribd, which ended several months ago.) I have never heard of the content of any paper on Academia.edu being modified by Academia.edu, and I believe it would not be in the interest of Academia.edu’s staff to do so: remember that they do not have any ownership rights, and the putative “modified” copy would still be yours (and useless, therefore unlikely to exist).

    Lastly, the T&C states that the license is limited. This means that whenever you want you can revoke Academia’s rights on your paper. This seems a pretty good way to solve any problem with Academia’s service, if you find out something isn’t going the way you expected (even if I don’t understand what you fear Academia would do with your paper, if I have understood the T&C correctly so far).

    To conclude, I understand that Academia.edu’s high ranking on google might be annoying for a well-known professor – but this is also a great advantage for the “small fishes” like me, and it may represent a stimulus to a (virtually) democratic academic agora (I’m not saying this is necessarily positive – I am only suggesting a less dysphoric interpretation of this feature of Academia.edu).

    Please notify me if you believe I somehow misunderstood Academia.edu’s T&C, given that I am neither a native English speaker nor a law student.

    Best,

    Neri

  7. Dear Neri,
    Thank you for your extensive reply.
    Please note that the blogpost was written about 2 years ago, taking into account the then active T&C and how the site’s features were working then. Given that situation, I wrote the blog post and I was sufficiently disappointed to leave it aside indefinitely.
    If it has changed for the better, fine, as the idea of automatically following other researchers is itself good. But the bad first experiences don’t make me jump on board now, as it may well revert, and, still, as for myself, I’d rather direct people searching for me to my homepage than to Academia.edu.
    Regards,
    Maria

  8. Pretty nice post. I just stumbled upon your weblog and wished to say that I’ve truly enjoyed browsing your blog posts. In any case I’ll be subscribing to your feed and I hope you write again soon!
