Google searches, sneaky Academia.edu, and data duplication

Posted on April 20, 2011 by keet

Aside from this blog and Facebook, I recently signed up for Academia.edu, a Web 2.0ish site where researchers can connect and follow each other academically. It was even so ‘smart’ that it could tell me which of my FB friends were already in the system without my explicitly giving it any information about my FB account, and it found most of my papers for me (with some noise, though). Setting aside the uncomfortable former aspect, the latter, ‘finding and handling my papers for me’, is actually really sneaky. I’ll spend the remainder of this post on that, so you know what you are letting yourself in for if you sign up for it.

The first annoying thing is that if you let Academia.edu collect your papers automatically when you build up your profile there, which it seemingly does ‘intelligently’, it snatches your papers and either takes them from citeseer or puts them on scribd, even though they are all on my website and the publisher’s website, too. Then there’s some noise; e.g., it links to a PDF of the presentation you gave at the conference instead of the paper itself. Also, it does not provide full publication details (just the title), even though Academia.edu could easily be programmed to screen-scrape them from any researcher’s website or, better, to ask me for a bib file.

And then there’s the real catch: when someone now searches for your papers, the Academia.edu URL to their version of the paper comes up higher in the Google ranking than either yours or the publisher’s. ‘Thanks’ to Academia.edu’s services, I now know which Google search terms people used to retrieve which paper. So when, on March 20, someone from an unknown country searched at 04:50am local time for “data information granularity bioinformatics”, the very first Google hit (yes, the rank is given in the stats as well) was the slides of my PhD defense on scribd, not my thesis on my website (which is what it should have returned); and even if s/he wanted to download them, clicking the “Download” button produced the complaint “You must be logged in to download” (!). The slides were probably not what s/he was after (idem ditto for the visitor from Poland who searched for this). There are many such misdirected instances. In general, they would essentially have had to search again to get the publication data and, where a wrong file was linked, to find the right file. It is correctable on Academia.edu, but only manually. Subsequently adding new papers is also a manual process with an impractical GUI.

And I have not yet said anything about the scruffy page rendering by scribd. Besides, I never gave scribd approval to offer my work on or through their site (there are more malfunctioning aggregators, which is an issue of its own). ~~In addition, it would not surprise me if that would violate the delicately balanced copyright arrangements that exist for CS publications~~. UPDATE: the terms and conditions (d.d. 21-4-2011) say that “By displaying or publishing (“posting”) any Content on or through the Academia.edu Services, you hereby grant to Academia.edu a limited license to use, modify, publicly perform, publicly display, reproduce, and distribute such Content solely on and through the Academia.edu Services.” That does violate the delicately balanced copyright arrangements that exist for CS publications. The terms & conditions also say it is the Member’s own responsibility to get approval from the respective publishers/copyright holders.

Moreover, there is a preposterous “message” on the right-hand side of the search statistics: “Tip: To make your page appear higher up on Google: Link to your Academia.edu page from your department website. Upload more documents – papers, talks and a CV”. But my own site was higher up in the Google ranking before I signed up to your devious service! Honestly, I want to lure people to my site when they are interested in my contributions, not to a place where partial information is badly duplicated. Ok, this smells of ego-tripping, but my site is worth almost $17K according to WebsValue.com and has a PageRank of 5; if I run out of money, I’ll have an asset to sell fairly easily without much disruption. Seriously though, this ‘rerouting’ of visitors away from the source toward some obscure other location on the Internet is obviously a more important issue at the institutional level. No sane university or research institute would want a policy of redirecting visitors to any site other than its own when it comes to displaying the scientific impact its employees have made. If one is at a non-indexable institute that only happens to carry the title ‘university’ but is not one in substance, then perhaps Academia.edu helps with one’s visibility. But I am not at such an institute; UKZN is one of the top 5 research-intensive universities in South Africa.

So what can you do? Remove your papers from Academia.edu. I did this this morning, one by one. The sad thing is that “following” other researchers is, in theory at least, an easier way to be notified automatically of their new publications than manually checking their homepages regularly, but this is precisely what Academia.edu manages to mess up, badly.

I can envision a couple of mean scenarios for why anyone would have wanted to set up the site the way it is, such as first polluting Google rankings and then asking for a fee in the near future (after all, they already require you to log in to download the file, and a “fee” item is included in the terms & conditions file). The statistics they are gathering on who-follows-whom give better insight into research networks and their leaders than the more common citation-network analyses. Finding out which scientific papers and topics are ‘hot’ must be valuable material as well, and may become just as important as the rather imprecise ISI impact factor, which is quite useless for CS at the moment. One could also use the data for NLP and semantic annotation to offer, in the near future, indispensable academic semantic search facilities (at a price). And no doubt there are more scenarios.

In short, that was the end of that “Web 2.0” experiment for me.

(p.s.: just in case someone wants to see some proof: I did make several screenshots that I can share)

36 responses to “Google searches, sneaky Academia.edu, and data duplication”

cattlehill says:

February 29, 2012 at 1:00 AM

I spotted your blog via Google. Search terms: “academia.edu copyright”. I agree with your statements and removed my research papers, too.
The benefits of this site are small and its GUI is still poor.
However, if you are a “mobile element” in science and change institutions after funding ended, you may be difficult to track. academia.edu gives you the opportunity to create a universal personal link-hub to your current position.
Again, I share your opinion that one should choose wisely what information is presented on that site. This should be common sense anyway.

keet says:

February 29, 2012 at 9:24 AM

Hi cattlehill,

If
“if you are a ‘mobile element’ in science and change institutions after funding ended you may be difficult to track.”
were the only reason, then that could easily be fixed by other means: get a domain name and set up your own website, either hosted separately on a friend’s server, as I have done (at http://www.meteck.org), or in your department’s user homepage directory, with a redirect from the chosen domain name to the temporary URL of your home page in that directory.
Regards,
Maria

Steven says:

March 6, 2012 at 12:10 PM

I also found your post through googling “academia.edu pdf copyright”. Their PDF handling is pretty sneaky indeed. And, of course, they have their terms covering everything, but it is still a very bad practice, aimed at maximizing traffic. I prefer Mendeley.com as a Web 2.0 academic network. They ask permission for everything, and you don’t have the feeling they are being unethical.

threeprisoners says:

July 16, 2012 at 12:50 PM

Hi,

Thanks for the post. I agree. Initially, I was impressed by how Academia.edu found many of my publications. Unfortunately, there were duplicates and because I didn’t want ‘my’ profile looking messy, I felt I had to dedicate some time to go through each citation and add the missing detail/delete duplicates.

Best wishes,
Pete
@threeprisoners

- keet says:
  
  July 18, 2012 at 3:06 PM
  
  Impressed/surprised, yes. But Google Scholar found (almost) all of them too, and still does, automatically, with much less noise and more data; i.e., GS is better than academia.edu when it comes to finding papers. Also, GS at least links to the openly accessible copies, without any modification or new access restriction such as scribd imposes (the links I checked on my GS page were to the publisher’s website and to copies on citeseer, a co-author’s homepage, a workshop’s website etc., or to my homepage).
  
  - threeprisoners says:
    
    July 20, 2012 at 12:18 AM
    
    Hi,
    
    Thanks, that’s useful to know. I’ll give GS a try.
    
    Best wishes,
    Pete
    @threeprisoners
neri says:

February 26, 2013 at 2:09 AM

Dear Keet,

I found your article insightful, thank you for sharing your opinions.
I am a graduate student and have found Academia.edu’s service excellent so far, so I was surprised by your article and decided to verify your statements.
I found out that what you say about the terms and conditions (T&C) is indeed literally true, but I believe that it may mislead the reader. This is the reason I decided to post this comment.

First, you omitted that T&C also states that:

“Academia.edu *does not claim any ownership rights* in the Content that you post to the Academia.edu Services”

and that

“you *continue to retain all ownership rights in such Content*, and you continue to have the right to use your Content in any way you choose”.

This seems pretty fair.
Moreover, even if the T&C states that you give Academia.edu a “limited license to use, modify, publicly perform, publicly display, reproduce, and distribute such Content”, this is true “solely on and through the Academia.edu Services”.

This means that you authorize Academia to show your papers on Academia.edu (and only on Academia.edu), to let people download them from A.edu, view them on their PCs, print them, read them, etcetera. This seems to be the point of sharing a paper on Academia. So far so good!

The right to “modify” seems sneakier. Nevertheless, I think this may be motivated by Academia’s need to convert some file formats, for instance PowerPoint to PDF. (Another remark here: you did not update the post about Academia’s policy of uploading papers to scribd; it ended several months ago.) I have never heard of any paper on Academia.edu having its content modified by Academia.edu, and I believe it would not be in the interest of Academia.edu’s staff to do so. Remember that they do not have any ownership rights, and the putative “modified” copy would still be yours (and useless, and therefore unlikely to exist).

Lastly, the T&C states that the license is limited. This means that you can revoke Academia’s rights to your paper whenever you want. This seems a pretty good way to solve any problem with Academia’s service, if you find out something isn’t going the way you expected (even if I don’t understand what you fear Academia would do with your paper, if I have understood the T&C correctly so far).

To conclude, I understand that Academia.edu’s high ranking on Google might be annoying for a well-known professor, but it is also a great advantage for “small fish” like me, and it may act as a stimulus toward a (virtually) democratic academic agora (I’m not saying this is necessarily positive; I am only suggesting a less dysphoric interpretation of this feature of Academia.edu).

Please let me know if you believe I have somehow misunderstood Academia.edu’s T&C, given that I am neither a native English speaker nor a law student.

Best,

Neri

keet says:

February 26, 2013 at 9:14 AM

Dear Neri,
Thank you for your extensive reply.
Please note that the blogpost was written about 2 years ago, taking into account the then active T&C and how the site’s features were working then. Given that situation, I wrote the blog post and I was sufficiently disappointed to leave it aside indefinitely.
If it has changed for the better, fine, as the idea of automatically following other researchers is itself good. But the bad first experiences don’t make me jump on board now, as it may well revert, and, still, as for myself, I’d rather direct people searching for me to my homepage than to Academia.edu.
Regards,
Maria

Dee says:

July 1, 2013 at 6:33 AM

I had just started to be quite active on Academia, but was also reluctant to put my papers there. Then I stumbled upon this entry, read it, and thought: oh well… maybe putting up the titles of my papers without uploading the files should be fine. I despise things related to copyright infringement, plagiarism and the like, and really, reading your entry has made me more aware BEFORE uploading my stuff to Academia. I guess it’s fine to have the website for building networks among scientists, but if the website gets sneaky (to use your term), it’s another thing. Thanks for the heads-up! 🙂

Pingback: Google searches, sneaky Academia.edu, and data ...
James says:

July 19, 2013 at 3:33 AM

Great post. I literally just signed up for academia.edu, and the first thing they asked me to do was upload my publications. Naturally, the next thing I did was search ‘academia.edu copyright infringement’ and found this post. I have now deleted my account until I check out this situation in detail. Thanks for writing.

Paul says:

November 8, 2013 at 11:02 PM

Well, to add my two cents’ worth of opinion, Academia.edu is fine with me. I managed to gather my publications under one roof, albeit not my own, but then there are always tradeoffs. And yes, many colleagues and other interested people downloaded the ones that interested them.
I have not really looked into the issue of searches, but I am glad that Academia is found by Google. Otherwise, where would I post my talks and make them accessible and available in web searches?
Thanks for hosting my comment.

Pingback: 8 years of keetblog | Keet blog
Pingback: Academia.edu Login - www.Academia.edu Sign In Page
Anna says:

November 23, 2014 at 5:43 AM

Is it really Academia.edu’s “fault” that their pages rank higher than the original sources of the papers? I would think they don’t have much influence on that (although they might consider it lucky), since it is Google’s search algorithms that determine which content “deserves” to rank highest.

- keet says:
  
  November 25, 2014 at 8:18 PM
  
  There are ‘interesting’ things one can do on the SEO side, but my main point was that I (and, I presume, any well-known university) don’t want Academia.edu showing my profile higher up than my own webpage (or, respectively, the university’s website) because of their aggregated results, for several reasons mentioned in the post. Also, the fact that an aggregator may rank higher by PageRank doesn’t imply that it gives better results for individuals and organizations (e.g., due to the dirty or incomplete data Academia.edu has crawled).
  
manie says:

April 5, 2015 at 12:03 AM

Can you tell WHO checked your profile, like their name?
What if they are not logged in?

- keet says:
  
  April 7, 2015 at 4:58 PM
  
  AFAIK, no (but I don’t use academia.edu anymore). If they are not logged in, then certainly not, but you’ll see just the IP address and all that comes with it (city etc.).
  
John Caddy says:

February 3, 2017 at 4:00 PM

Given these criticisms, can you suggest the quickest way to get your PDFs into Google Scholar (or even Academia) without setting up a web page? I have made PDFs of old papers (pre-1978) that I want to be accessible.

- keet says:
  
  February 6, 2017 at 8:46 AM
  
  One does not ‘post’ or ‘put’ them into Google Scholar; its bots crawl the web and its algorithms try to figure out which ones are scientific papers. I’m not sure if ResearchGate is any better than academia.edu, but I’ve heard fewer complaints about it. Another option may be to put your papers on a preprint server such as arXiv (or similar, depending on your field of research). Also, registering with, e.g., WordPress (which sets up the blog for you) is very easy, and you can then make a blog-as-homepage [choose ‘new page’ rather than ‘new post’].
  
Pingback: Academia.edu UH OH | jschoolmann
Francine says:

January 9, 2020 at 1:11 PM

I don’t belong to Academia.edu, but I received a message today that they were granted access to my Google account. How is this possible?

Pingback: A brief reflection on maintaining a blog for 15 years (going on 16) | Keet blog