Blog scraping at HCI? How would you have handled this?

by Janet Clarey on December 18, 2009

This is a post for bloggers…

I received a Google alert* hit on my name this morning that pointed to the Human Capital Institute (HCI).

(*I have a Google alert set up for my name and for links to various sites I write for so I can make sure I’m able to comment or <more likely> defend myself. If you have any kind of web presence, I think it’s important to have some way of monitoring what people are saying about you.)

Anyway…

I followed the link from the alert to HCI and noticed several blog posts (in their entirety) from this very blog. I was listed as the author and there’s a way to comment on their site. Not here. So I’m kind of like…the hell?

I fired off this via a contact form (and I’m not proud of it…questioned even putting it here):

I object to you taking content directly from MY COMPANY blog and placing it on YOUR site. It is especially problematic because comments can be made on YOUR site leaving me without an opportunity to properly monitor my content.
If you are looking to aggregate content from around the web, look at this example http://www.elearninglearning.com/. It provides a “read more” (does not scrape the entire blog post) and directs you to the author’s site to comment. If my content exists here because I am a member, let me know and I will cancel my account immediately. This is copyright infringement.

So I’m kind of an asshole for writing it with that tone; however, I had just returned from making two “Mom, I’m gonna be late!!!” trips to the school. And, I drove in PJ’s, a robe, a winter coat, a kids hat with a pom pom on top and some Uggs. Not pretty. Not at all. Not on either drive.

Now Human Capital Institute (HCI) is a well-respected organization…

HCI is the global association for talent management and new economy leadership, and a clearinghouse for best practices and new ideas. Our network of expert practitioners, Fortune 1000 and Global 2000 corporations, government agencies, global consultants and business schools contribute a stream of constantly-evolving information, the best of which is organized, analyzed and shared with members through HCI communities, research, education and events. And there’s more – a Center for Excellence, Research, Education, and Event.

I hate to call them blog scrapers…but read the Wikipedia definition of blog scraping and let me know what you think.

I received two emails from them. The first shortly after I contacted them:

Email #1: To another HCI person from someone in HCI marketing, cc me – “not sure who or how this should be reacted too. “B” may have sent it to someone also but wanted to be sure someone had seen it.”

And the second, at the end of my day…

Email #2: Please accept my apologies on behalf of HCI. We have complied with your request and removed your blog from the list of blogs that are fed into our site. We certainly meant no offense, on the contrary – we feel that your blog represents some of the most interesting work in the talent management space and we plan to promote the list of the 100 blogs we pull from as the HCI 100 Best Blogs – our members seem to appreciate this service and several of the bloggers on the list have expressed their thanks for the increased exposure. Unfortunately, the comment frame is uniquitous across all the articles and blog posts on our site and we can’t change that architecture. Best in your future endeavors.

(That last line stung because it is my version of ‘you ungrateful “rhymes with witch.”’)

So now I’m wondering. I mean it’s probably a good thing to be promoted by a prestigious organization. And, I’m sure I could’ve just subscribed via RSS and monitored my post that way…so, yes I’m second guessing myself. How would you have handled this? Would you go back and say…never mind, I was just being a wise ass or…is this blog scraping?

{ 1 trackback }

Should you add your blog to a B2B online blog community? — Spinning the Social Web
January 1, 2010 at 4:38 pm

{ 34 comments }

Chris Bailey December 18, 2009 at 5:54 pm

Hi Janet, I think you do have a good reason for your reaction.

One, it seems HCI should have asked your permission first, rather than assuming it was okay for them to aggregate your content. That's just common courtesy and good form and they failed there.

Two, having comments on your content located there (when you didn't even know it existed) is equally bad form. Like you said, how in the world are you supposed to engage with commenters when it's set up like that?

Three, getting beyond all of this is the fact that HCI really, really should not post the full content of your work on their site. A far better option would be to post snippets with full content redirected back to your site. If their objective is to deliver content in a way that honors your creative time and energy (as well as provide service to their own visitors), then this is likely the best option for making it happen.

Sad to see that HCI is so cavalier with how they build services online. It's not impressive for an organization that should know better. Let's hope they change their ways soon.

SamIAm December 18, 2009 at 6:08 pm

Janet, what makes you think this is a “prestigious” company? “Consulting Charlatans” is more like it, beginning with their ridiculous ripoff “Human Capital Strategist” certification program. Very attractive to HR managers desperate to build up their credibility. The very act of blogscraping, as you accurately call it, speaks to their real nature. I wonder if they have an Ethics program LOL!!

no bs December 18, 2009 at 6:29 pm

Total blog scraping. Your content is your content. Make sure you say so legally at the bottom of your blog with something along the lines of 'this content is mine, no using without permission, etc.' Eff 'em.

jclarey December 18, 2009 at 6:43 pm

Thanks for weighing in. I think I'll revisit the whole issue of what should be on my blog regarding content.

jclarey December 18, 2009 at 6:45 pm

Perhaps prestigious wasn't the right word…in my mind I was separating them from sploggers. Probably not the best description since I've had little interaction with them over the years. Thanks for commenting.

jclarey December 18, 2009 at 6:49 pm

Thank you Cathy. And agree re: monitoring your identity. It wouldn't be that big a deal to just take an excerpt and link back.

jclarey December 18, 2009 at 6:53 pm

Hi Chris – I'm happy to see my content when it is (1) aggregated by an excerpt and (2) when it's properly linked back to. I think if a company is taking other people's stuff in its entirety it gives the appearance that I write for them. I'd like to think they just didn't think it through. Thanks for commenting on this. I'm feeling better.

Philip Hutchison December 18, 2009 at 8:14 pm

Acting as a feed aggregator for a particular industry is one thing (as you mention, elearninglearning is a polite way to do this), but presenting your work as if you had written it for HCI and then letting other people comment without your awareness, consent, or participation is borderline criminal. ESPECIALLY for an institution that's supposed to be fostering academics! They should know better.

This is akin to finding a self-published magazine at a local book store and publishing its articles verbatim without asking for permission or paying royalties. I guess you can't call it plagiarism since they left your name on it, but it's still a form of theft.

On a lighter note, this line made me chuckle:
“And, I drove in PJ’s, a robe, a winter coat, a kids hat with a pom pom on top and some Uggs. Not pretty. Not at all.”

jclarey December 18, 2009 at 9:30 pm

Later I thought…is it really such a big deal to run upstairs and throw some jeans and a sweatshirt on? I'm sure it sent a message to the kids though. Next time I'll walk them in dressed like that. A walk of shame.

Matt Crosslin December 18, 2009 at 9:59 pm

I would have to agree – total scraping. And the whole comment thing – BS. I know there are WordPress plug-ins where you can import content but still link back to the original post for comments. In fact, I can't think of one good reason why they can't do that. And only one bad reason – they want to keep the traffic (and credit) on their site.

Is “uniquitous” a real word? Google and FF both seem to act like it is not…

Harold Jarche December 18, 2009 at 11:26 pm

My stuff is scraped and put all over the Web & I know that I have no control over it. I don't really care. HCI asked me if they could use my content and I responded that it was CC-Attrib licensed, so go ahead. I'm not crazy about the fact that there is a comment form on my posts on their site, but then I have no intention of responding to any comments there.

It's the Net & stuff gets copied. If my name is on it, I'm OK. It would be worse if nobody copied my stuff; then I would know that I'm boring ;-)

Cheers;
Harold

avilbeckford December 19, 2009 at 7:26 am

Janet,

I would say that you're caught between a rock and a hard place (smile). I hadn't thought about the comment aspect. I have Google alerts set up for my name, my company name, and my blog and I am often surprised where my blog ends up. I didn't know there was a name for it until you mentioned it (blog scraping).

I am a member of a few blog directories and every morning some grab my blog and I guess all their members' and use it to populate their website. Members can subscribe to my blog on their website, so they aren't actually my subscribers. It used to bother me but I let it go because my work is getting out there.

It would be very tricky for you to go back to HCI at this point. Avil @avilbeckford

Jeff McGowan December 19, 2009 at 7:59 am

I wholeheartedly agree that it is scraping, and I think your initial reaction was natural. If their intentions were so good, they probably would have checked with you before repackaging your work. That said – In hindsight, it might have been more productive for you to take a less aggressive approach in resolving the situation. This could have given you an opportunity to voice your objection and some time to cool off.

Of course – if we all knew how our decisions would turn out, we would never make a mistake! Bottom line – don't worry about it. They are in the wrong no matter how they try to make you feel. It was their mistake and their loss.

jclarey December 19, 2009 at 8:18 pm

You'll never have to worry about being boring! I have seen my stuff elsewhere but not like this. Appreciate your input on this…you're a long-timer in the edublogosphere so I'm sure you've experienced it all.

jclarey December 19, 2009 at 8:21 pm

Hi Avil-
It's something I guess I'll just have to not let bother me…as you and Harold do. You're right though…I guess the upside is that your work is out there more.

jclarey December 19, 2009 at 8:26 pm

I think my default is set to assertive. Not always a productive option. Although, sometimes perfect.

Thanks so much for your honesty. Feeling much better now : )

avilbeckford December 19, 2009 at 8:29 pm

At first I was stunned, but it's not that important to me. And I agree with what Harold says about it being terrible if no one copied your stuff. One question I ask myself though: are firms like HCI brilliant for aggregating many posts or just plain too lazy to create their own content? Avil

Rick Scott December 20, 2009 at 8:33 am

I have had some dealings with the company and agree with the earlier comment that “prestigious” isn't the right word. They have largely repackaged other information and training and are reselling it at high prices, all the while acting like it is somehow original or innovative. The “certifications” they sell are pretty much “management 101” courses that could be had much more cheaply elsewhere. What concerns me most about your experience is that one of their big selling points is a “library” full of articles (seriously? In 2009?). Now I wonder what is in those articles and where the content really came from. I am glad you called them on it and hope others will take this as a “caveat emptor” lesson in dealing with them.

Clark Quinn December 20, 2009 at 10:50 am

Janet, like Harold I usually am pretty relaxed about it, taking it as a compliment. The times I get my knickers in a knot are when they don't provide attribution (which has happened). And sometimes the sites have no person or contact associated!

Or I might feel ripped off if they add ads without adding any extra value (e.g. aggregate and advertise, hoping for a revenue stream on others' work). I haven't yet gone the advertising route, though I suppose my blog is implicitly an ad for Quinnovation. Interestingly, HCI is down right now and I can't check :) .

Mike Telesca December 21, 2009 at 8:10 am

I'm sure their members appreciate the content, and some of them will remember the authors of other scraped blogs and perhaps follow back or “go to the source.” I'm also certain other bloggers will willingly give up their material for the exposure HCI can provide.

This is supposed to be web 2.0, the “new age.” Connectivity is easier than it's ever been, and being connected can show that an entity is not stuck in the 1900's and merely publishing its newsletter on the web to reach its target audience. To simply say “we're not going to change our architecture” is equal to “this is the way it's been done, we see no need to change.”

It's all about “the new model” and making it work better for everyone. Join us in the new world, won't you?

jclarey December 21, 2009 at 8:25 am

Mike, at first I thought you were criticizing me for not being part of the web 2.0 “new age” as you put it. (Spend some time here, and I think you'll find that I'm very generous.) Your “we're not going to change the architecture” quote is directed toward the scraper, no? I'm not sure who you're asking to join the real world…can you clarify?

Janet Clarey December 21, 2009 at 8:33 am

I think the things that bug me about this (and not some of the other “republishing” activities) are:
-naming themselves as “source”
-allowing comments on their site
-full content posted
-no hyperlink on title

jclarey December 21, 2009 at 8:35 am

I think what bothers me most is the “source” is HCI, there is no hyperlink on the post title, and the comments stay on their site (leaving me with two options: not responding to comments or monitoring all the blogs they feature via RSS, not just my own).
[Image: http://brandon-hall.com/janetclarey/wp-content/uploads/2009/12/hci.jpg]

jclarey December 21, 2009 at 8:38 am

Thanks Rick.

johnstearns December 21, 2009 at 11:20 am

Well, Janet – since you know my views on blogging in general, you could probably guess that I'd say if you hang 'em out on the line, you have to expect someone might come by and take 'em away. Course, you being a seasoned blogger, you know the protocols for protecting copyright material. Bet when those kids you're chauffeuring around get old enough, you're gonna give 'em holy hell the first time a file disappears they didn't back up! “You SHOULDA KNOWN BETTER!”

Your…experience…brings to mind the old adage – no such thing as bad publicity, or, in bloggerworld, no bad thing about being quoted. If you make somebody mad enough to want to take you on, I'm sure they'll track you down.

I'd be more worried about possibly being seen by the president of the PTA in mufti.

jclarey December 21, 2009 at 11:28 am

You always make me smile John!

brian bauer December 22, 2009 at 2:09 pm

interesting reaction on your part. I personally think of my blog as entirely public, and I assume that whatever content I put there will be scraped, aggregated, borrowed, stolen, and otherwise abused. it is the internet after all. now, if I found that someone had represented themselves as the author of my work on their blog, I would not be happy, and would likely be somewhat creative in how I handled the situation. have a look at what Google is up against in France for book scanning. a content mashing monster against an entire country. we are but specks in the wind compared to that conflict of titans. doesn't make it right, but we need to pick our battles and save our energy for when it really counts.

fyi, here is my blog: blog.ontracktechnology.com

jclarey December 23, 2009 at 8:43 am

Thanks Brian. Added your blog to my reader.
It may seem an energy waster but you fight the larger battle incrementally.

yin wah kreher December 26, 2009 at 10:57 am

Janet, I'd probably have reacted the same way. Somehow I've this guardedness about intellectual property, copyright, attributing credit when it's due. Too much formal education messing up my head? I think it's always polite to at least have a hyperlink to the source. NLS would have said I'm so 'proper'. Is being 'nice' and doing things 'properly' and politely such a rarity now?

jclarey December 27, 2009 at 1:27 pm

Hi Yin…I hope doing the right thing isn't going out of style. I suspect this would bother the academics more…especially those viewing content from the outside (i.e., not producers).

Connie Malamed December 28, 2009 at 10:13 pm

Hi Janet,
This happens to me all the time with The eLearning Coach blog, even recently from a blog at the Bloomsburg University, where I would think they would know better. When it's a scraper site, you just expect it. When it happens from a respected organization, you have to wonder how they can be so clueless. I think these organizations should follow a policy of: 1) asking the author's permission, 2) giving proper and obvious attribution, 3) guiding people back to the original site for comments so we can interact. Perhaps you can help this organization improve their approach and policy.
Good luck.
Connie Malamed
http://theelearningcoach.com

jclarey December 29, 2009 at 6:59 am

Good idea Connie.

Candace January 13, 2010 at 5:31 pm

Janet, you just get cooler and cooler. Keep up the clever writing, regardless of where it ends up.

jclarey January 13, 2010 at 7:25 pm

Thanks Candace : )
