snabelen.no is one of many independent Mastodon servers you can use to take part in the decentralized social web.
A Norwegian home for the decentralized microblogging platform.


#scraping

6 posts · 5 participants · 0 posts today

If you use an AI browser like OpenAI's, which reads along with you and remembers everything, does that count as scraping? And if this isn't scraping, then enforcing a scraping ban against AI seems hard to me. In the near future, users will simply ask their browser to do some "browsing work" for them in the background.
tweakers.net/nieuws/241004/bro

Tweakers · Browsers from OpenAI and Perplexity bypass paywalls on behalf of users · By Arnoud Wokke

Open source, the foundation of modern software development, is cracking.
The reason: AI companies are scraping entire registries, and enterprise CI/CD systems hammer servers with wasteful, uncached requests.

An open letter to the industry, written by stewards of public infrastructure openssf.org/blog/2025/09/23/op

openssf.org · Open Infrastructure is Not Free: A Joint Statement on Sustainable Stewardship – Open Source Security Foundation
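The "wasteful, uncached requests" complaint above has a standard mitigation: HTTP conditional requests, where a client sends the ETag it already has and the registry answers 304 Not Modified instead of re-sending the artifact. A minimal sketch of that flow, assuming an in-memory cache and a stand-in registry function (both names are illustrative, not any real registry's API):

```python
# Sketch: conditional fetching so repeated CI runs don't re-download
# unchanged registry artifacts. `server` is a stand-in callable here.

def conditional_fetch(url, cache, server):
    """Return (status, body), consulting the cache via If-None-Match."""
    headers = {}
    cached = cache.get(url)
    if cached:
        headers["If-None-Match"] = cached["etag"]
    status, etag, body = server(url, headers)
    if status == 304:                    # Not Modified: reuse cached body
        return 304, cached["body"]
    cache[url] = {"etag": etag, "body": body}
    return status, body

def fake_registry(url, headers):
    """Stand-in for a package registry that honors ETags."""
    etag, body = '"v1"', b"package-tarball-bytes"
    if headers.get("If-None-Match") == etag:
        return 304, etag, None           # body not re-sent over the wire
    return 200, etag, body
```

On the second fetch of the same URL, only headers cross the wire; this is exactly the caching that the letter says enterprise CI/CD pipelines skip.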

1/2 You know it: they are scraping every text and book to ingest them into AI models. Large companies don't care about intellectual property rights or remuneration.

But did you know that they do not simply lend the books out, or afterwards give them away to children, people in need, or libraries? They are torn apart, destroyed, thrown away. arstechnica.com/ai/2025/06/ant

To show Anthropic's degree of contempt:

[Image: hundreds of books in chaotic order]
Ars Technica · Anthropic destroyed millions of print books to build its AI models · By Benj Edwards

Since people are dunking on Meta again, I'll share one tidbit: when @jonah and I were investigating some performance issues, I noticed that Meta-ExternalAgent was scraping /auth/sign_up and one specific invite link with different `accept` parameters (which indicate acceptance of the rules). However, because Mastodon returns 200 (and shows the rules again) on invalid `accept` parameters, the crawler just keeps going...
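The behavior described above, a 200 on an invalid parameter, gives a retrying crawler no signal to stop. A minimal sketch of the alternative the post implies: reject bad `accept` values with a 4xx so the client gets an unambiguous error. This is not Mastodon's actual code; the handler and the accepted-value set are assumptions for illustration.

```python
# Sketch (not Mastodon's real implementation): validate the sign-up
# `accept` query parameter and return 422 instead of 200 on junk values,
# so a crawler cycling through parameter variants stops getting "success".

VALID_ACCEPT = {"1", "true"}        # assumed set of accepted values

def handle_sign_up(params):
    """Return an HTTP status code for /auth/sign_up given query params."""
    accept = params.get("accept")
    if accept is not None and accept not in VALID_ACCEPT:
        return 422                  # Unprocessable Content: don't retry
    return 200                      # show the rules / proceed as usual
```

Whether 422 actually deters a given bot is another question, but at least it stops rewarding invalid requests with a 200.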

Replied in thread

@FediPact

601 instances of us.archive.org

also 858 URLs containing .gov, of which 614 are gov.xx, i.e. government sites of other countries. That leaves 244 US government sites, including consumer.ftc.gov, houstontx.gov, webharvest.gov (a NARA site), emergency.cdc.gov, ...
I get that US government works are not copyrighted, but still, talk about freeloading.

The only positive is that this may be the most comprehensive list of websites existing today.

#Meta #AI #scraping

When it comes to AI scraping, could you tf not with all that? Pay-per-packet could be the future if some people can't control themselves.

The more you scrape, the more operators have to pay, which should fund better infrastructure and drive costs down, but instead it could turn the internet into a truly "transactional" network.

Suddenly the internet is run on pay-per-packet... hell hath arrived. Granted, this fringe scenario is a bit hyperbolic, but still.