Hive: Vital To The Democratization Of Data
Sometimes it does not take long for things to start going as we project.
What is Hive's main utility? Many will dispute this, yet I think it is clear:
Hive's main role is the democratization of data. A decentralized database that natively stores text data is what makes this possible.
Why is this important?
There is a war taking place and it is over data. Companies, especially startups, are out there looking for data. They are turning to scraping sites, something that platform owners are fighting.
This is only going to get worse as time passes.
In this article we will discuss another move along these lines and why something like Hive is becoming crucial.
The Democratization of Data
Many thought Elon was an idiot for spending $44 billion on Twitter. There were roars of laughter when advertisers left and people put estimates on the value. He lost a cool $20 billion (estimated) in about a year.
What was overlooked was the database he acquired. Since 2006, the company has been saving every tweet that was posted on its site. This has added up to an enormous trove of data.
In a world of LLMs, this is like gold (or, if you prefer, Bitcoin).
Every month, roughly half a billion people keep adding to Musk's treasure chest. They do the same for Zuckerberg (Meta), Google, and Reddit.
Here is where we have to be clear: the data we provide is theirs.
Many like to claim these companies stole people's data. They did not.
People making this assertion refuse to take responsibility. The truth is individuals opt to give their data away. Individuals were not forced to post on Twitter or Facebook. Instagram photo sharing is a choice. Nobody was pressured into uploading a video on YouTube.
We give the data to these companies willingly.
Of course, now that people are learning about AI and how it comes about, what are they doing? Still feeding the same beast by heading to the same platforms on a daily basis.
When it comes to "data being the new oil", however accurate that assessment might be, it is obvious the public wants to keep providing it to Big Tech.
Unfortunately, this is creating a system that is even more closed off than what we presently have.
Reddit Fighting AI Crawlers
Reddit is another site that was fed by users over the last two decades. Since 2005, people have voluntarily kept adding to the database.
This is now being monetized by the company, which only recently went public. There was a deal with Google for $60 million to give them access to the data.
Again, this belongs to the company even though millions of people provided it.
Reddit is now taking steps to ensure that unauthorized players are not pulling from its oil field.
Reddit announced on Tuesday that it’s updating its Robots Exclusion protocol (robots.txt file), which tells automated web bots whether they are permitted to crawl a site.
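For anyone who has not worked with robots.txt, here is a minimal sketch of how a well-behaved crawler checks it before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt content, the bot names, and the URL are made up for illustration; this is not Reddit's actual file. The helper names build_parser and may_crawl are also hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; NOT Reddit's real file.
# One named bot is allowed everywhere, every other bot is shut out.
EXAMPLE_ROBOTS_TXT = """\
User-agent: LicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(EXAMPLE_ROBOTS_TXT)
    url = "https://example.com/r/some-community/"  # placeholder URL
    for agent in ("LicensedBot", "UnknownStartupBot"):
        print(agent, "allowed:", may_crawl(parser, agent, url))
```

Under these made-up rules the licensed bot gets through and the unknown bot does not. Keep in mind that robots.txt is only a request: it works when the crawler chooses to honor it.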
It is also taking some further measures:
Along with the updated robots.txt file, Reddit will continue rate-limiting and blocking unknown bots and crawlers from accessing its platform. The company told TechCrunch that bots and crawlers will be rate-limited or blocked if they don’t abide by Reddit’s Public Content policy and don’t have an agreement with the platform.
This is the future.
Only those who can pay are going to be able to access the data.
Consider the business model employed. These platforms gathered the data over the decades, which was provided by millions of users. As technology advanced, the value of this data kept growing, as it became more than just a tool for targeted advertising. Suddenly, companies needed it.
Now it is up for sale (rental) to feed into LLM training.
Of course, to a company like Google, this is a trivial hurdle. To that entity, $60 million is a rounding error. The same is true for many of the other major players.
Where this is a problem is with startups. What about those companies that have the ability to train these models, perhaps using a different approach, yet lack access to the data?
Basically, they are screwed.
The Freeing of Information
The Internet was a massive step towards freedom.
Pre-Internet, we lived in a time where there were purveyors of information. Companies were actually the ones who doled it out. Examples of this are news, encyclopedias, and road maps. Entities actually published the information that people had to pay for.
It all changed with the Internet. No longer did you require the newspaper to tell you what was happening. People were posting news all over the place.
This went on for a while until we realized there was a new sheriff in town. We went from one set of corporations to Big Tech.
The aforementioned companies along with the likes of Amazon, Spotify, and PayPal took over.
When it comes to information, i.e. data, we see the same situation. It is being sold, just not to the general public. In fact, it is the same public that is providing it, and companies are selling it to AI firms.
Can anyone see how catastrophic this could be?
Hive Provides An Answer
With Hive, we have a decentralized blockchain that is a text database. Anything can be posted and stored on the servers. Unlike Web 2.0 platforms, the servers are not controlled by any single entity. Also, the data is available for anyone to utilize.
This is what is meant by the "democratization of data".
A startup is free to set up an API and engage with the data however desired. It can all be scraped and used for any purpose. Nobody owns the data, ergo nobody can prohibit its use.
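To make this concrete, here is a rough sketch of pulling recent posts from a Hive API node using only the Python standard library. It assumes the public node at https://api.hive.blog and the condenser_api.get_discussions_by_created JSON-RPC method as I understand them from the Hive developer documentation; check the current docs before relying on either. The function names (build_request, fetch_recent_posts, extract_text) are hypothetical helpers for this example.

```python
import json
import urllib.request

# Assumption: a public Hive API node. Any node exposing the same
# JSON-RPC interface could be substituted here.
HIVE_API_URL = "https://api.hive.blog"


def build_request(tag: str, limit: int = 5) -> dict:
    """Build a JSON-RPC 2.0 payload asking for the newest posts under a tag."""
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_discussions_by_created",
        "params": [{"tag": tag, "limit": limit}],
        "id": 1,
    }


def fetch_recent_posts(tag: str, limit: int = 5, url: str = HIVE_API_URL) -> list:
    """POST the request to the node and return the list of posts it sends back."""
    payload = json.dumps(build_request(tag, limit)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    return body.get("result", [])


def extract_text(posts: list) -> list:
    """Keep only the author, title and body text of each post."""
    return [
        {
            "author": post.get("author", ""),
            "title": post.get("title", ""),
            "body": post.get("body", ""),
        }
        for post in posts
    ]


if __name__ == "__main__":
    # Needs network access; no API key or agreement with anyone is involved.
    for post in extract_text(fetch_recent_posts("hive", limit=3)):
        print(post["author"], "-", post["title"])
```

The point is less the specific call and more what is missing: no API key, no licensing deal, no robots.txt standing in the way. A startup that wants more control can run its own node rather than depend on a public one.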
Over the next couple of years, this is going to take on added importance.
Presently, data is just one barrier to entry. The biggest obstacle is the fact that the amount of compute required to train something like Llama3 is enormous. We see the orders that X.ai is placing for NVIDIA H200. The amount of money quickly runs into the billions.
That said, many predict that training something like Llama3 will cost around $10K in a couple of years. Consider the impact of a startup constructing an LLM of this nature for that type of money. Suddenly, a company that raises a few million can be in the game.
At least that is the case from the processing standpoint. But what about data?
Here is where we circle back to the democratization of data. If the data that is on the Internet is locked down by the different entities, we are looking at a situation where these startups are dead on arrival. Even with the compute, if there is nothing to feed the model, it goes hungry.
This is a very important point to consider.
I absolutely love the idea of Hive democratizing data, brother. It feels empowering to think that startups can access and use data freely without the big players gatekeeping everything like they always do. It's a step towards true information freedom and I'm proud to be part of it in its early stages.
"If the data that is on the Internet is locked down by the different entities, we are looking at a situation where these start ups are dead on arrival." I am just imagining what these big centralized tech companies would look like in the future with such data. They would become trillion-dollar companies, leaving their users broke.
I agree. Data should be available to all, especially if it is open to public viewing and came from multiple people. Companies like Reddit and Twitter profiting from people's data without any type of compensation is wild. At least with Hive, posts and comments can earn something for the individual.