edicted

Stack Overflow AI Scraper

An interesting tale of WEB2 vs WEB3

Stack Overflow, a legendary internet forum for programmers and developers, is coming under heavy fire from its users after it announced it was partnering with OpenAI to scrape the site's forum posts to train ChatGPT. Many users are removing or editing their questions and answers to prevent them from being used to train AI — decisions which have been punished with bans from the site's moderators.

I actually find this a little bit funny and sad at the same time. Stack Overflow is a site I've used many times over to answer programming questions I had at the time. I've used it a couple more times recently, too, for more specific questions that basic tutorials can't handle. Well, apparently... users on Stack Overflow seem to think that their data belongs to them, which, of course, it doesn't. We can think of SO as a very technical type of social media site. And as we all know, social media companies own all the data; if you mess with the company and do something that makes them lose money, you're banned. End of argument.

This false belief that Stack Overflow posts somehow belong to the users who put their blood, sweat, and tears into the site to be helpful and cultivate their reputations led many to be angry that AI would scrape their data and cut them out of the equation. After all, if some random AI is going to rehash the data, how will the user who actually answered the question get any credit for their troubles? What a pickle!

So, out of protest, people started deleting their own highly ranked posts and got suspended until they changed them back... and if they didn't change them back, the posts just got changed back anyway. Whoopsie!

Ben continues in his thread, "[The moderator crackdown is] just a reminder that anything you post on any of these platforms can and will be used for profit. It's just a matter of time until all your messages on Discord, Twitter etc. are scraped, fed into a model and sold back to you."

Oh... are we finally starting to get it now?

Funny that it's taken people this long. If I had to guess, I'd say that a lot of the highly ranked users on Stack Overflow are the exact type of people who would take one look at crypto and be like, "Well, obviously that's a scam." I've encountered many of these old-school devs who litter the corporate world. They refuse to see the potential and only see the fallout. Well, it looks like they'll be forced to change their minds sooner or later at the rate this is all going.

Users are also asking why ChatGPT could not simply share the source of the answers it will dispense in this new partnership, both citing its sources and adding credibility to the tool. Of course, this would reveal how the sausage of LLMs is made, and would not look like the shiny, super-smart generative AI assistant of the future promised to users and investors.

LoL!

We don't want to show end users that all Large Language Models are actually just counterfeiting plagiarists built on the backs of others doing the work... so instead we're just going to pretend that the AI came up with the answer on its own. Amazing logic, that. I mean, let's be honest: they own that data, so they can do whatever they want with it.

Site moderators preventing high-popularity posts from being deleted is legally above-board. Angry users claim they are enabled to delete their own content from the site through the "right to forget," a common name for a legal right most effectively codified into law through the EU's General Data Protection Regulation (GDPR).

Wow, these users have no shame.

They want to exploit a law designed to protect people from forever having their dirty laundry posted on the internet in order to delete genuinely helpful data so that AI can't monetize it. I can't believe I'm saying this, but I'm on the side of the corporations on this one. This is a completely hypocritical stance to take as a user. On the one hand, you say you want the AI to credit your work, but on the other, you're trying to leverage a law designed for the exact opposite of that. This is honestly despicable, childish behavior. Sorry you didn't read the contract you signed. Take the loss and move on. It's time for these users to learn a hard lesson.

Users who disagree with having their content scraped by ChatGPT are particularly outraged by Stack Overflow's rapid flip-flop on its policy concerning generative AI. For years, the site had a standing policy that prevented the use of generative AI in writing or rewording any questions or answers posted. Moderators were allowed and encouraged to use AI-detection software when reviewing posts.

Beginning last week, however, the company began a rapid about-face in its public policy towards AI.

Wow, are people only just now learning that corporations change their minds when they can make money? These Stack Overflow users really are autistic, aren't they?
And I mean that in the most respectful way. We all love an autist around here, especially in crypto. I am not here to blame the "victim".
Perhaps there's another way?

Stack is not alone in reversing a principled stance on AI for profit; Valve also silently removed its AI-art ban on Steam, allowing over 1,000 AI-powered games to flood the storefront. Stack Overflow's partnership with OpenAI also follows the LLM company's recent push for increased partnerships and marquee deals, including their major announcement of a $100 billion datacenter to be built with Microsoft.

THIS IS THE FUTURE

DEAL WITH IT.

This is a tidal wave; it cannot be stopped. People need to stop fighting the obvious path of least resistance and learn to pivot and adapt to the new environment. LLMs and AI aren't going away; they are spiraling outward as far as they possibly can. Calling corporations hypocrites because they change their minds when money is involved is ironically hypocritical in itself. That's EXACTLY WHAT THEY DO EVERY TIME. Please, stop acting shocked. It's a bad look. We aren't that naïve.

And this is actually not a problem that WEB3 solves.

In fact... one might argue that data in WEB3 is even easier to scrape because all the data is public to everyone. At least with a centralized WEB2 corporation, it's their decision whether to make such a move. In crypto, anyone could move in and monetize the data in any way they see fit.

The difference with WEB3 is that we can actually get paid up front for our contributions. Combine that with the fact that if someone tries to privatize the profits from this data, anyone else can come along and make that same data public for all to see, which completely undercuts the private model and makes it far riskier to pursue. On WEB3, there's no central party to send a cease-and-desist order to. These variables may prove useful going forward.

Conclusion

Are Stack Overflow users completely out of touch with reality and the impending direction of modern technology? It seems like they are, which is really ironic considering it's a technical site used mostly by software developers who should obviously know better than that. I guess nobody cares until they're on the receiving end. Go figure.

Will WEB3 solve this issue... or make it even worse? I believe that Hive is a particularly useful solution to a situation like this. After all, how many crypto social media sites can we actually get paid on? There's been a lot of hype, but they've all been failures. Sometimes it feels like we are the only survivors in a river of carnage.

Stack Overflow AI Scraper was published on 19 May 2024 and last updated on 19 May 2024.