[personal profile] green_amber
I know I asked this before, but could someone remind me: if you are an online creator (e.g. the PvP people or Boing Boing, say), can you STOP your work being syndicated by RSS using particular code? If so, how? Does it have any drawbacks? Thanks!!! Could you allow some people to RSS it and not others? Would that effectively require password protection?

Date: 2006-01-03 02:45 pm (UTC)
From: [identity profile] sbisson.livejournal.com
Well, there's nothing to stop someone scraping your site HTML and building a feed from it.
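
To make the scraping point concrete, here is a minimal Python sketch of how little it takes to turn a page into a feed. The page layout (headlines as links inside `<h2>` tags) and the names `HeadlineParser` and `html_to_rss` are assumptions for illustration, not anything a particular site or tool uses.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class HeadlineParser(HTMLParser):
    """Collects (title, link) for every <a> found inside an <h2>.

    The <h2><a href=...> layout is a hypothetical page structure.
    """

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.href = None
        self.items = []  # list of (title, link) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False
        elif tag == "a":
            self.href = None

    def handle_data(self, data):
        # Only text inside an <h2><a href=...> counts as a headline.
        if self.in_h2 and self.href and data.strip():
            self.items.append((data.strip(), self.href))


def html_to_rss(html, site_title, site_link):
    """Build a bare-bones RSS 2.0 document from scraped headlines."""
    parser = HeadlineParser()
    parser.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = "Scraped feed"
    for title, link in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")
```

Nothing here needs the site's cooperation: any page a browser can read can be fed through something like this.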

If you're running RSS and want to use it to only distribute to specific folk you can just use standard HTTP authentication tools (after all, RSS is just XML over HTTP).
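
As a rough sketch of what "standard HTTP authentication" means for a feed, the Python below checks an HTTP Basic `Authorization` header before handing out the XML. The subscriber table, the function names and the placeholder feed body are all made up for illustration; a real site would normally let the web server handle this and would store hashed passwords.

```python
import base64
import hmac

# Hypothetical subscriber list, plain text only to keep the sketch short.
SUBSCRIBERS = {"alice": "s3cret"}

# Placeholder feed body; a real feed would carry channel and item elements.
FEED_XML = '<rss version="2.0"><channel><title>Private feed</title></channel></rss>'


def check_basic_auth(authorization_header):
    """Return True if the header carries valid HTTP Basic credentials."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(authorization_header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and bad UTF-8
        return False
    user, sep, password = decoded.partition(":")
    expected = SUBSCRIBERS.get(user)
    if not sep or expected is None:
        return False
    # Constant-time comparison avoids leaking how much of the password matched.
    return hmac.compare_digest(password.encode(), expected.encode())


def feed_response(authorization_header):
    """Return (status, headers, body) for a request to the feed URL."""
    if check_basic_auth(authorization_header):
        return 200, {"Content-Type": "application/rss+xml"}, FEED_XML
    # 401 plus WWW-Authenticate makes the feed reader prompt for credentials.
    return 401, {"WWW-Authenticate": 'Basic realm="feed"'}, ""
```

Basic credentials are only base64-encoded, so this wants HTTPS underneath, and the feed reader has to support authenticated feeds.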

Date: 2006-01-03 02:53 pm (UTC)
From: [identity profile] surliminal.livejournal.com
Well, there's nothing to stop someone scraping your site HTML and building a feed from it.

So there's no code you can insert, a bit like robots.txt, that stops people making an RSS feed out of your site? Can you build your site not in XML, just in HTTP?

Date: 2006-01-03 02:57 pm (UTC)
From: [identity profile] sbisson.livejournal.com
Not everyone respects robots.txt :-)

The thing is, once you have content in an open format like HTML, anyone can do anything with it. You'd need to put your site content in Flash or similar.

One option would be to build your site as a content-negotiated CMS and just block the IP addresses or HTTP user agents of scraping tools. That would work...

Date: 2006-01-03 03:02 pm (UTC)
From: [personal profile] drplokta
Not if the scraper is using a user agent like "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" and coming via an ADSL connection with dynamic IP addressing or a megaproxy farm from an ISP like AOL or NTL. I spend a fair amount of time trying to block screen-scraping spiders, and it's not a trivial exercise if they don't want to be blocked.
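
The blocking idea and its weakness can both be seen in a small Python sketch. The blocklists below are invented for illustration (the IP range is a documentation-only network), and `is_blocked` is a hypothetical helper, not part of any CMS.

```python
import ipaddress

# Hypothetical blocklists, purely for illustration.
BLOCKED_AGENT_SUBSTRINGS = ("wget", "curl", "python-urllib", "feedscraper")
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(user_agent, remote_addr):
    """Return True if the request matches a blocked user agent or network."""
    ua = (user_agent or "").lower()
    if any(fragment in ua for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return True
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in BLOCKED_NETWORKS)


# A tool that announces itself honestly is caught...
honest = is_blocked("Wget/1.10", "198.51.100.7")

# ...but the same tool sending a browser-like user agent from an
# unlisted address looks like any ordinary visitor.
spoofed = is_blocked("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)", "198.51.100.7")
```

The user agent is just a string the client chooses to send, so a filter like this only stops scrapers that do not bother to hide.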

Date: 2006-01-03 03:04 pm (UTC)
From: [identity profile] sbisson.livejournal.com
True.

The problem is that the bad guys have access to the same technologies as you do. It's like dealing with spam...

Date: 2006-01-03 03:05 pm (UTC)
From: [identity profile] surliminal.livejournal.com
Wow. Thanks guys. V helpful..

Date: 2006-01-03 03:00 pm (UTC)
From: [personal profile] drplokta
A robots.txt doesn't physically prevent anything from happening, it just politely asks robots not to index certain pages. It's like putting up a "keep out" sign on an unfenced piece of land -- people know they're not supposed to trespass, but there's no physical barrier preventing them.
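
To show the "keep out sign" point in code, here is a short Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the URLs and the bot name are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all robots to stay out of /archive/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(user_agent, url):
    """What a well-behaved robot asks itself before requesting a URL."""
    return parser.can_fetch(user_agent, url)
```

The check runs entirely on the robot's side. A scraper that never calls anything like `polite_fetch_allowed` can still request `/archive/` pages, and the server will serve them unless something else stops it.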

HTTP is a protocol, not a markup language -- I assume you meant HTML. XML or HTML makes no difference: anything that is human-readable is also machine-readable unless you hide it behind something that needs human-level pattern-matching skills, like a CAPTCHA image.

Date: 2006-01-03 03:38 pm (UTC)
From: [personal profile] andrewducker
You can block pictures from being viewed inline from elsewhere, but that's about your lot.
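
Blocking inline images from elsewhere usually means checking the `Referer` header on image requests. A minimal Python sketch of that check follows, assuming a hypothetical site at example.com; in practice this is more often done in the web server's configuration than in application code.

```python
from urllib.parse import urlparse

# Hypothetical hostnames that are allowed to embed the site's images.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def allow_image_request(referer):
    """Decide whether to serve an image based on the Referer header."""
    if not referer:
        # Some browsers and proxies omit the header, so refusing
        # empty referers would break images for legitimate readers.
        return True
    host = urlparse(referer).hostname
    return host in ALLOWED_HOSTS
```

Like the user agent, the referer is supplied by the client, so this stops casual hotlinking rather than a determined copier.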
