Google searches, sneaky Academia.edu, and data duplication

Aside from this blog and Facebook, I recently signed up for Academia.edu, a Web 2.0ish site where researchers can connect and follow each other academically. It was even so ‘smart’ that it could tell me which of my FB friends were already in the system without my explicitly giving it any information about my FB account, and it found most of my papers for me (with some noise, though). Setting aside the uncomfortable former aspect, the ‘finding and handling my papers for me’ is actually really sneaky. I’ll spend the remainder of this post on that, so you know what you’ll be letting yourself in for in case you sign up for it.

The first annoying thing is that if you let Academia.edu collect your papers automatically when you build up your profile there, which it seemingly does ‘intelligently’, it snatches your papers and either takes them from citeseer or puts them on scribd, even though, in my case, they are all on my own and the publishers’ websites, too. And then there’s some noise; e.g., it links to a pdf of the presentation you gave at the conference instead of the paper itself. Also, it does not provide full publication details (just the title), even though Academia.edu could easily be programmed to screen-scrape those from any researcher’s website or, better, ask me for a bib file.

And then there’s the real catch: when someone now searches for your papers, the Academia.edu URL to their version of the paper comes up higher in the Google ranking than either yours or the publisher’s. ‘Thanks’ to Academia.edu’s services, I now know which Google search terms visitors used before clicking through to which paper. So when on March 20 someone from an unknown country searched at 04:50am local time for “data information granularity bioinformatics”, the very first Google hit (yes, the rank is given in the stats as well) was the slides of my PhD defense on scribd, not my thesis on my website, as it should have been; and even if s/he had wanted to download them, clicking the “Download” button produces the complaint “You must be logged in to download” (!). The slides were probably not what s/he was after (the same goes for the visitor from Poland, who searched for this). There are many such misdirected instances. In general, visitors essentially have to search again to get the publication data and, where a wrong file was linked, to find the right file. It is correctable on Academia.edu, but only manually. Subsequently adding new papers is also a manual process with an impractical GUI.

And then I have not said anything yet about the scruffy page rendering by scribd. Besides, I never gave scribd approval to offer my work on/through their site (there are more malfunctioning aggregators, which is an issue of its own). In addition, it would not surprise me if that violated the delicately balanced copyright arrangements that exist for CS publications. UPDATE: the terms and conditions (dated 21-4-2011) say that “By displaying or publishing (“posting”) any Content on or through the Academia.edu Services, you hereby grant to Academia.edu a limited license to use, modify, publicly perform, publicly display, reproduce, and distribute such Content solely on and through the Academia.edu Services.”. That does violate the delicately balanced copyright arrangements that exist for CS publications. The terms & conditions also say it is the Member’s own responsibility to get the approval from the respective publishers/copyright holders.

Moreover, there is a preposterous “message” on the right-hand side of the search statistics: “Tip: To make your page appear higher up on Google: Link to your Academia.edu page from your department website; Upload more documents – papers, talks and a CV”. But my own site was higher up in the Google ranking before I signed up to your devious service! Honestly, I want to lure people to my site when they are interested in my contributions, not to a place where partial information is badly duplicated. Ok, this smells of ego-tripping, but my site is worth almost $17K according to WebsValue.com and has a page rank of 5; if I run out of money, I’ll have an asset to sell fairly easily without much disruption. Seriously though, this ‘rerouting’ of visitors away from the source toward some obscure other location on the Internet is obviously a more important issue at the institutional level. No sane university or research institute would want a policy of redirecting visitors to any site other than its own when it comes to displaying the scientific impact its employees have made. If one is at a non-indexable institute that happens to carry the title ‘university’ but is not one in substance, then perhaps Academia.edu helps with your visibility. But I am not at such an institute; UKZN is one of the 5 top research-intensive universities in South Africa.

So what can you do? Remove your papers from Academia.edu. This I did this morning, one by one. The sad thing is that “following” other researchers is, in theory at least, an easier way to be notified automatically of their new publications than manually checking their homepages regularly, but this is precisely what Academia.edu manages to mess up, badly.

I can envision a couple of mean scenarios for why anyone would want to set up the site the way it is, such as first polluting Google rankings and then asking for a fee in the near future (after all, they already require you to log in to download a file, and a “fee” item is included in the terms & conditions file). The statistics they are gathering on who-follows-whom give better insight into research networks and their leaders than the more common citation-network analyses. Finding out which scientific papers and topics are ‘hot’ must be valuable material as well, and may become just as important as the rather imprecise ISI impact factor, which is quite useless for CS at the moment. One could also use the data for NLP and semantic annotations to offer, in the near future, indispensable academic semantic search facilities (at a price). And no doubt there are more scenarios.

In short, that was the end of that “Web 2.0” experiment for me.

(p.s.: just in case someone wants to see some proof: I did make several screenshots that I can share)

34 responses to “Google searches, sneaky Academia.edu, and data duplication”

  1. I spotted your blog via google. Search terms: “academia.edu copyright”. I agree with your statements and removed my research-papers, too.
    The benefits of this site are small and its GUI is still poor.
    However, if you are a “mobile element” in science and change institutions whenever funding ends, you may be difficult to track. academia.edu gives you the opportunity to create a universal personal link-hub to your current position.
    Again, I share your opinion that one should choose wisely what information to present on that site. This should be common sense anyway.

  2. Hi cattelhill,

    If
    “if you are a “mobile element” in science and change institutions after funding ended you may be difficult to track.”
    were the only reason, then that can easily be fixed by other means: get a domain name and set up your own website, be it separately hosted on a friend’s server like I have (at http://www.meteck.org), or use the department’s user homepage directory and put a redirect from the chosen domain name to the temporary URL of your page there.
    Regards,
    Maria

  3. I also found your post through googling “academia.edu pdf copyright”. Their pdf handling is pretty sneaky indeed. And – of course – they have their terms covering everything, but it is still a very bad practice, aimed at maximizing traffic. I prefer Mendeley.com as a 2.0 academia network. They ask permission for everything, and you don’t have the feeling they are being unethical.

  4. Hi,

    Thanks for the post. I agree. Initially, I was impressed by how Academia.edu found many of my publications. Unfortunately, there were duplicates and because I didn’t want ‘my’ profile looking messy, I felt I had to dedicate some time to go through each citation and add the missing detail/delete duplicates.

    Best wishes,
    Pete
    @threeprisoners

    • impressed/surprised, yes. But Google Scholar found (almost) all of them, too, and still does, automatically, with much less noise and more data; i.e., GS is better than academia.edu when it comes to finding papers. Also, GS at least also links to the openly accessible copies without any modification or new access restriction such as scribd imposes (the links I checked on my GS page were to the publisher’s website and to copies on citeseer, a co-author’s homepage, the workshop’s website etc., or to my homepage)

  5. Dear Keet,

    I found your article insightful, thank you for sharing your opinions.
    I am a graduate student and I have found Academia.edu’s service excellent so far, so that I was surprised by your article and I decided to verify your statements.
    I found out that what you say about the terms and conditions (T&C) is indeed literally true, but I believe that it may mislead the reader. This is the reason I decided to post this comment.

    First, you omitted that T&C also states that:

    “Academia.edu *does not claim any ownership rights* in the Content that you post to the Academia.edu Services”

    and that

    “you *continue to retain all ownership rights in such Content*, and you continue to have the right to use your Content in any way you choose”.

    This seems pretty fair.
    Moreover, even if the T&C states that you give Academia.edu a “limited license to use, modify, publicly perform, publicly display, reproduce, and distribute such Content”, this is true “solely on and through the Academia.edu Services”.

    This means that you authorize Academia to show your papers on Academia.edu (and only on Academia.edu), to let people download them from A.edu, to visualize them on their PC, print them, read them, etcetera. This seems the point of sharing a paper on Academia. So far so good!

    The right to “modify” seems more sneaky. Nevertheless, I think this may be determined by Academia’s need to convert some file formats, for instance powerpoint to pdf. (another remark here: you forgot to make an update about Academia’s policy of uploading papers to scribd – it ended several months ago). I have never heard of any paper on Academia.edu having its content modified by Academia.edu, and I believe it would not be in the interest of Academia.edu’s staff to do so – remember that they do not have any ownership rights, and the putative “modified” copy would still be yours (and useless – therefore unlikely to exist).

    Lastly, the T&C states that the license is limited. This means that whenever you want you can revoke Academia’s rights on your paper. This seems a pretty good way to solve any problem with Academia’s service, if you find out something isn’t going the way you expected (even if I don’t understand what you fear Academia would do with your paper, if I have understood the T&C correctly so far).

    To conclude, I understand that Academia.edu’s high ranking on google might be annoying for a well-known professor – but this is also a great advantage for the “small fishes” like me, and it may represent a stimulus to a (virtually) democratic academic agora (I’m not saying this is necessarily positive – I am only suggesting a less dysphoric interpretation of this feature of Academia.edu).

    Please notify me if you believe I somehow misunderstood Academia.edu’s T&C, given that I am neither a native English speaker nor a law student.

    Best,

    Neri

  6. Dear Neri,
    Thank you for your extensive reply.
    Please note that the blogpost was written about 2 years ago, taking into account the then active T&C and how the site’s features were working then. Given that situation, I wrote the blog post and I was sufficiently disappointed to leave it aside indefinitely.
    If it has changed for the better, fine, as the idea of automatically following other researchers is itself good. But the bad first experiences don’t make me want to jump on board now, as it may well revert, and, still, as for myself, I’d rather direct people searching for me to my homepage than to Academia.edu.
    Regards,
    Maria

  7. Pretty nice post. I just stumbled upon your weblog
    and wished to say that I’ve truly enjoyed browsing your blog posts. In any case I’ll be subscribing to your feed and I
    hope you write again soon!

  8. I had just started to be active on Academia, but was also reluctant to put my papers there. Then I stumbled upon this entry, read it, and thought: oh well… maybe putting up the titles of my papers without uploading the files should be fine. I despise things related to copyright infringement, plagiarism and the like, and really, reading your entry has made me more aware BEFORE uploading my stuff to Academia. I guess it’s just fine to have a website for building networks among scientists, but if the website gets sneaky (to use your term), it’s another thing. Thanks for the heads-up! 🙂

  9. Pingback: Google searches, sneaky Academia.edu, and data ...

  10. Great post. I literally just signed up for academia.edu, and the first thing they asked me to do was upload my publications. Naturally, the next thing I did was search ‘academia.edu copyright infringement’ and found this post. I have now deleted my account until I check out this situation in detail. Thanks for writing.

  11. I’ve been surfing online more than 3 hours today,
    yet I never found any article as interesting as yours. It was well worth it for me.
    In my opinion, if all webmasters and bloggers made content as good as yours, the web would be much more useful than ever before.

  12. Well, to add my two cents worth of opinion, Academia.edu is fine with me. I managed to congregate my publications under one roof, albeit not mine, but then there are always tradeoffs. And yes many colleagues and other interested persons downloaded the ones that interested them.
    I have not really searched the issue of searches but I am glad that Academia is found in Google. Otherwise where would I post my talks and make them accessible and available in web searches?
    Thanks for hosting my comment.

  13. Pingback: 8 years of keetblog | Keet blog

  14. Have you ever considered writing an e-book or guest authoring on other blogs?
    I have a blog centered on the same subjects you discuss and would really like to
    have you share some stories/information. I know my viewers would appreciate your work.
    If you are even remotely interested, feel free to shoot me an e mail.

  15. I’m no longer sure where you’re getting your information, but good topic.
    I need to spend a while learning much more or figuring out more.
    Thank you for the fantastic information; I had been on the lookout
    for this information for my mission.

  16. Pingback: Academia.edu Login - www.Academia.edu Sign In Page

  17. Is it really Academia.edu’s “fault” that their sites get to rank higher than the original sources of the papers? I would think they don’t have much influence on that (albeit they might consider it lucky), since it is Google’s search algorithms that determine which content “deserves” to rank highest.

    • There are ‘interesting’ things one can do from the SEO side, but my main point was that I (and I presume also any well-known university) don’t want Academia.edu showing my profile higher up than my own webpage resp. the university’s website due to their aggregate results, for several reasons mentioned in the post. Also, because an aggregate may come out higher by PageRank doesn’t imply it has better results for individuals and organizations (e.g., due to dirty or incomplete data Academia.edu has crawled).

    • afaik, no (but I don’t use academia.edu anymore). if not logged in, then certainly not, but you’ll see just IP address and all that comes with it (city etc.)

  18. Definitely imagine that which you stated. Your favorite reason appeared to
    be on the internet the simplest factor to
    bear in mind of. I say to you, I definitely get annoyed whilst folks think about issues that they just
    do not realize about. You controlled to hit the nail upon the highest and defined out the whole thing without having side effect, other people can take a
    signal. Will probably be again to get more. Thanks

  19. GIVEN THESE CRITICISMS, CAN YOU SUGGEST THE QUICKEST WAY TO GET YOUR PDF’S INTO GOOGLE SCHOLAR (OR ACADEMIA EVEN) WITHOUT SETTING UP A WEB PAGE? – I HAVE MADE PDFS OF OLD PAPERS (PRE 1978) THAT I WANT TO BE ACCESSIBLE.

    • one does not ‘post’ or ‘put’ them into google scholar; their bots crawl the web and their algorithms try to figure out which ones are the scientific papers. I’m not sure if researchgate is any better than academia.edu, but I’ve heard fewer complaints about them. Another option may be to put your papers on a preprint server such as arxiv (or similar, depending on your field of research). Also, registering with e.g. wordpress (that sets up the blog for you) is very easy to do, and you can then make a blog-as-homepage [choose ‘new page’ rather than ‘new post’].

  20. Pingback: Academia.edu UH OH | jschoolmann

  21. It’s really a cool and useful piece of info.
    I’m glad that you just shared this helpful information with
    us. Please keep us informed like this. Thank you for sharing.
