Reddit Showing How Much Data Is Worth
Most of us are aware that the world is hungry for data. This comes up every time something like ChatGPT is discussed. In fact, we are entering a period where we face a data shortage.
How can that be when we are producing record amounts of data, and more every year?
This is one of those situations where demand simply outpaces supply. The need for data to train artificial intelligence systems is growing at an incredible pace, and that is pushing companies to look for more.
We are starting to get some idea of how valuable it is in monetary terms.
Reddit Agrees To $60 Million Per Year Deal
Reddit is not the largest social media company. When we mention social media platforms, its name rarely comes up. Instead, we talk about Facebook, Instagram, X, YouTube, and TikTok. Heck, even Snap is talked about more.
That said, Reddit is a legitimate platform with an estimated 50 million daily users, making it the 10th most popular social media network. It was started in 2005, which makes it around 20 years old. Source
In this era, that makes the database the company houses very valuable. How much is it worth?
Evidently, simply to train an AI system, it is worth $60 million per year.
Of course, here is the kicker: this does not have to be a one-time deal. It can be repeated.
The company behind Reddit still owns the data. It is giving this other entity access while retaining ownership. Unless the contract included a clause in which Reddit agreed not to let anyone else use the data, it can simply replicate this agreement elsewhere.
What this means is that databases have enormous value.
Of course, this should come as no surprise.
Web 3.0 Needs To Learn This Lesson
Each day, we see billions flocking to Web 2.0 platforms and enriching their owners. Web 3.0 promises a new structure in this regard. Yet we still see it lacking.
There are many reasons for this. However, as I have stated in the past, there is no way Web 3.0 is going to succeed if it depends upon Web 2.0 data. Put another way, we have to stop feeding those entities.
Many within cryptocurrency understand that each time they remove a dollar from the banking system and move it to crypto, there is a small shift. With each such move, the crypto world gets stronger.
Why, then, do people not see this with data? It is really simple. The values are starting to emerge, and yet people are still residing on Web 2.0.
By now, many are aware that the NY Times sued OpenAI over the unauthorized use of its data. We will see if the Times can win this one, but the message is clear: data can be monetized in ways not seen before.
What Is In The Database?
There was a credit card commercial some time back that asked, "What's in your wallet?" We can adapt this to Web 3.0 by asking, "What is in the database?" Essentially, what does Web 3.0 have that is worth anything?
In my mind, this is massively overlooked. Distributed ledger technology basically generates databases that look like those of banks. Notice that we do not see technology companies seeking to use the data from Bank of America or JPMorgan. Financial data doesn't have much value outside a short window, and it certainly isn't very useful for machine learning engines.
Social media platforms, on the other hand, seem ideal. This is where Google, X, and Facebook (Meta) excel. They have a growing amount of their own data available to them, and each day users simply add more to the database.
Of course, much of what is in there is nonsense. That said, the data is useful in two ways.
The first is how language is used. For this, most of the data is applicable, since it shows how people interact, and the systems can learn some of the nuances of language.
The second is information. If we want to ask a chatbot which years Winston Churchill lived, that data has to be available. Obviously, scraping the Internet would provide this. However, that is getting harder as companies, including Reddit, lock down their APIs.
A well-developed database is a gold mine. This is a basic area where Web 3.0 can develop value in a short period of time. Unfortunately, it seems most do not focus on this.
Reddit is now cashing in on it. Over the next couple of years, if the platform remains strong, its data will become more valuable as the database grows.
It is unlikely the need for data will disappear.
This is something Web 3.0 has to consider.
Posted Using InLeo Alpha
As a doctor, I have found “ChatGPT” very helpful for my research and studies this year.
I believe it’s the future and it’s here to elevate minds.
The fact that we still enrich Web2 is leading to the breakdown of Web3.
I think it’s inevitable because Web2 still serves its purpose very well.
There is little doubt that, as these become more specialized, they will be a major boon to certain industries. We will see many of them tailored to specific jobs or fields.
Reddit has a lot of active users even though it is unknown to many people. Honestly, many of us don't know about it, whereas Facebook, Twitter, and other platforms are popular with us. Thanks for the nice information.
Reddit has a strong following. Being among the top 10 largest social media apps is still a huge success. And it has been around for a long time, meaning the data accumulated is enormous.
This is something we have to consider on Web 3.0.
That figure is actually huge for a year, though. Well, let's see how it eventually plays out. I believe Reddit will grow into something even greater as the years go by.
If you are training an AI to be a troll, I think this is an ideal arrangement. Reddit is, if I may borrow from Obi-Wan Kenobi, "a wretched hive of scum and villainy". I think they will learn about garbage in, garbage out.
Haha, I have to agree with you on that. I actually found my way to Hive because of the growing toxicity of that platform. If they plan to use its content to train AI, I can't help but remember Microsoft's Tay chatbot, which went live on Twitter for a few hours before Microsoft had to take it down for making racist comments and promoting misinformation.
That isn't exactly how it works, in my understanding. These LLMs learn the nuances of language, so I don't think your point holds.
Understanding technology means tossing aside biases and grasping what is taking place regardless of personal opinions about certain platforms or individuals.
I think that bolsters the idea. So much of the awfulness of Reddit does use subtle phrases and puns in the comments. They can be quite clever at times. Maybe they're looking to outsnark Grok.
Blah blah blah. Would you pay $60 million per year for 'their chat data'? I wouldn't.
Then you would not have an LLM and, if you were a tech company, would likely be out of business in a few years.
Are you planning on being a tech company? I care more about being a person than about companies and their business of invading my private space.
Indeed, data is life! Therefore, as we navigate the evolving digital landscape, it's imperative to recognize and leverage the transformative power of data effectively.
I have to say that, out of all the social apps, Reddit is actually one of the best. Sure, it takes more time, but that extra time means better, cleaner information and a rather solid platform. AI would most likely dig up a treasure trove of data from it.
The quality of the application means little. It is the amount and quality of the data.
Reddit has "conversations," which are helpful in training LLMs and teaching them about language. That is the struggle they all have.
Don’t forget it was Reddit users who triggered the short squeeze on GameStop. That brought Wall Street down a peg or two.
I fail to see the long term implications of that. It was a nice story but changed nothing.
Well, the implication is that the greedy banks have to keep one eye on the rearview mirror before they do anything. It slows them down a bit even when there is nothing there. They don’t want to get burned again.
Not in the least, and it wasn't banks that were affected. It was hedge funds.
And the one bankrolling it, Point72, paid its owner $1.8 billion last year.
So I disagree. What you state really has no basis in reality.
Ftfy.
Public databases make whatever you put in them available for all to see, for free.
The value of content on Hive lies in how much attention it brings.
We can't stop people from scraping public posts.
Yep. That is very true, and it is the game changer. Sure, the large entities could scrape it too, but it does provide access to smaller ones.
And if we start putting more into public databases, it will not be relegated only to the major companies that can pay for access.
Control freaks gotta control.™
Yep. And that nets them money which ends up giving them more power.
Until we turn our backs on them and begin to cooperate rather than fight like dogs.
We have to have workers, we don't have to have dollars.™
https://theanarchistlibrary.org/library/petr-kropotkin-the-conquest-of-bread
If the accounting department fell into a lake, the line workers would never know the difference until they went to the store and everything was free.
All this requires is that folks be willing to continue doing the work they are already doing but for a fraction of its true value in dollars.
Adults can solve this locally.
We don't have to have banksters in bunkers telling us how things are gonna be.
This is the first time I'm hearing of this, and I was wondering why they need to pay Reddit when the info is publicly available. After reading, I understand it better now. Things like the NYT lawsuit can happen. But I was wondering about the option of securing the APIs. Is that possible with Hive? The blockchain is publicly available, but could we do the same thing?
Anyone can set up an API, so no, that won't work.
And why would you want to limit anyone using the data on a public network? That is the point, it is open to anyone.
I guess to prevent OpenAI or other AI tools from easily using the data that we have.
I'm happy Web3 is here. Over the years on Web2, the rich have been getting richer with our data.
Of course, the data Reddit is selling is user information. Once again, Web2 proves that if the product is free, then the users are the product. Will Reddit share any of that $60 million with its users? Probably not.
This is a false narrative that people like to believe. It is not user information; it is company information.
It is that simple. Users created it, by choice. Nobody forced people onto Reddit to post for years on end. But people do.
As long as people play victim and claim these tech companies are unfair because they "stole" data that is on their own network, those individuals are screwed.
Even today, knowing what these companies are doing, people flock to these applications. Look at all the people "on Hive" who spend their days on Twitter and Reddit feeding those databases.
It is true, they're doing it by choice. I didn't say they're being forced to do it. But people don't see that information as valuable. They've been trained to give it up for free, then the company monetizes it. Many of the users don't even know it's happening, sadly. Or don't care.
I would say for years, it was the first part. Now it is the second. There is plenty of info out there about it but people run to X and Facebook. Even those on Hive do so.
Yeah, I agree. I think people see a net benefit to an extended reach with their social media posts, even if they have to give up some of their personal data. The question remains, will the scales ever tip the other direction?
I believe it can. Network effects work in reverse. As more activity is taken away, the impact is greater than one-for-one.
But you need people to start pulling away, especially those who are essentially hubs.
True. But I think it will take longer than the move from Web1 to Web2, which took essentially 8-14 years depending on when you pinpoint the beginning of Web2. I would say the transition culminated with the advent of YouTube in 2005.
If we begin counting when the Bitcoin whitepaper was published and count 21 years from there, we're looking at 2028 at the earliest, but most likely sometime around 2032-35, before we start to see a mass migration over to Web3. That would be the beginning of the early mass adoption phase.
The other likely date is the first use of the term Web3, in 2014. That's about the time any real consensus developed in the crypto community that a new layer of the web was on the rise. Counting from there, it could be 2034 at the earliest before the early mass adopters start to make the move.
To me, I would say the start date for Web3 was smart contract technology. That was the real game changer (after decentralized consensus from Bitcoin).
Hard to create much without those contracts.
We're pretty close to the same time period. Smart contracts were created in the 1990s, but they didn't start becoming useful until Ethereum launched in 2015. So, that could be the date when we start counting. Still looking at 2030-2035 time frame for early mass adopters to begin to migrate.
That would make sense based upon history.
One question is whether adoption could accelerate. The general public is more technologically advanced than it was 20-25 years ago. While human behavior changes more slowly than technology, that gap might be closing somewhat.
While it's true that people are more technologically advanced, there is much more resistance to Web3 than there was to Web2. In fact, there was hardly any resistance at all to Web2. Everyone knew that user-generated content was a step up. But Web2 brought us massive silos, and those who support the siloed web system are going to resist anything that threatens to tear down those silos. That includes the silo owners, shareholders, and end users who are making money from those silos. That's a lot of people who were very pro-Web2 in 2004 and 2005 when Facebook and YouTube came online. In general, people will always resist any threat to their entrenched power structures.
Yeah, I guess I have heard about Reddit before, but I didn't know much about it.