The Data-Oil Analogy: What It Means For Hive

"Data is the new oil".

This is something we have all heard. But what does it mean? How does it apply in the world of AI? More importantly, what does it mean for blockchain?

Due to recent discussions, some are trying to draw analogies. What does it mean to add data to a public database? What, exactly, are we doing?

These are some of the questions we will answer in this article.


Image generated by Grok

The Data-Oil Analogy

There was a time, not long ago, when "data is the new oil" was incomplete. While it was accurate in a general sense, it was lacking in utility.

For data to be of use, it needs structure. Random bits of information are meaningless. In a business context, handing a company massive amounts of raw data is useless. The data requires structuring so that the entity can apply it to its ongoing operations.

Hence, most of the data that was available was worthless. It was akin to a dry well.

This all changed with the emergence of generative AI combined with vector databases. This union brought the data-oil analogy to the forefront.

When training a model, large amounts of data are run through a neural network. Through the training and post-training process, a model emerges that people can utilize. At that point, the data is only as current as the moment it was amassed.

To provide up-to-date information, a vector database is added. The best way to think about this is as additional files the model can access. Before providing a response, the model checks the vector database to get more information. This helps to reduce hallucinations while also providing current answers.
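The lookup step described above can be sketched in a few lines. This is only an illustration under loud assumptions: `embed`, `retrieve`, and `build_prompt` are hypothetical names, and a simple word-count vector stands in for a learned embedding, which in a real system would capture meaning rather than shared words.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    """Return the k stored documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents):
    """Prepend the retrieved text so the model can answer from fresh data."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical "recent posts" that were added after the model was trained.
documents = [
    "the home team won the game last night 3 to 1",
    "oil prices rose on tuesday",
    "a new blockchain wallet was released",
]
```

The model itself never changes here. Only the store of documents grows, which is why fresh posts translate into fresh answers.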

It is also why the major social media companies, Meta, X, and Google, have an advantage. Their models can provide up-to-the-moment answers due to the fact that people are adding data every second of the day. They are not dependent upon the next iteration of the model being trained on the latest data.

When using a Grok or a Llama, we can ask it about the score of a game last night. Even if the model was trained months ago, that data is available.

Of course, this is all under the control of the individual companies.

Oil Miners?

If we are going to look at the analogy, what are people doing when they are adding to a database such as Hive? Are people miners in this instance?

The answer is possibly. Data generation and extraction are two different phases.

If we are looking at oil, drillers are the ones who extract the oil from the ground. Adding data to a public blockchain is not doing this. The "drillers" in this instance are the application developers who utilize the data. They are pulling the resource to which they "acquire" the rights. Naturally, since this is democratized data, anyone has that right.

Those who generate the data, i.e. social media users, can be thought of as nature. They are the ones filling the oil pit.

Go back to the process above. The vector databases being run by Big Tech are enhanced each time someone adds a video, post, or image. The users are the ones providing more oil to these companies, which are the "drillers" since they extract it.

Actually, we can go one step further.

When we utilize a chatbot, we are increasing the "oil well" those companies have. Each time we prompt something, we get a response. Where is that data located? Naturally, it resides on the company's servers, which the company then feeds into the next iteration of training.

OpenAI struggles to gain relevant data because it lacks the social media component. However, it runs one of the most utilized chatbots, with prompts numbering in the billions. Each response is "oil" for them to use.

Hive Database

Hive offers something radically different.

With this blockchain, we are dealing with a public network that anyone can write to. That means adding data requires only minimal access, which comes from having the base coin staked rather than paying transaction fees.

Here we have a major component in public database creation. Many of the early blockchain networks were focused upon finance. Because of this, transaction fees were simply built in. This went unnoticed because fees of this sort are commonplace within finance.

That is not the case with social media. Nor is it that way with data storage. Most entities do not incur a transaction fee each time one of their employees hits "save". There is, however, a cost. Storage is not free.

For this reason, the design of Hive is to use staking for access. Using a rechargeable token system, usage is quantified. The ability to write (store data) is contingent upon the amount of coins staked.
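A rechargeable, stake-based allowance of this kind can be sketched as follows. This is a minimal illustration, not Hive's actual implementation: the `Account` class, the credits-per-coin ratio, and the five-day recharge window are all placeholder assumptions chosen to make the idea concrete.

```python
from dataclasses import dataclass

REGEN_SECONDS = 5 * 24 * 60 * 60  # assumed time to recharge from empty to full
CREDITS_PER_COIN = 1_000          # hypothetical credits granted per staked coin


@dataclass
class Account:
    staked: float      # coins staked by the account
    credits: float     # credits currently available
    last_update: int   # unix time of the last recharge calculation

    @property
    def max_credits(self):
        """The ceiling scales with stake: more coins staked, more writes."""
        return self.staked * CREDITS_PER_COIN

    def recharge(self, now):
        """Regenerate credits linearly with elapsed time, capped at the max."""
        elapsed = max(0, now - self.last_update)
        regenerated = self.max_credits * elapsed / REGEN_SECONDS
        self.credits = min(self.max_credits, self.credits + regenerated)
        self.last_update = now

    def write(self, cost, now):
        """Spend credits to store data; no fee is charged, only capacity."""
        self.recharge(now)
        if cost > self.credits:
            return False
        self.credits -= cost
        return True
```

The point of the design is that a write costs capacity rather than coins. An account that runs out simply waits for the allowance to refill, or stakes more to raise its ceiling.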

There is another crucial feature: the network will store any text, not just financial transactions. Full-length articles can be stored, something that is vital to the aforementioned vector database. The relationships in these databases are not based upon keyword association. Entire documents are indexed and embedded in such a way that they are correlated to other pieces of data.

This also means the idea of quality data is out the window. This is a concept that many still try to apply. Vector databases excel at taking nonsensical garbage (which most of social media is) and making it applicable. Something as simple as the announcement of the death of a person is correlated through the indexing and embedding system.

Major Oil Find

Anything related to the oil business is based upon oil finds. Where are the fields and how big are they?

Of course, this is vital for nations. Those with large oil deposits are in a strong position geopolitically. Their economies tend to have a baseline since oil has value around the world. Even if the country is not drilling itself, the rights fees paid by oil companies are substantial.

The wildcatters are interested in finding the oil. Naturally, they want to find as big a well as possible. It is not cost effective to find a well and only get three or four days of output. Thus, the more oil in a well, the more valuable it is.

Data is no different. According to many, we are quickly moving towards a time when data, at least online, is going to run out. There is a lot of untapped data, yet it is not open. Companies keep their data behind private walls, not sharing it with others.

X is no longer allowing anyone to scrape the entire database. Neither is Reddit. The same is true for many other websites. They are locking the data down.

Hive has the solution. Because it is a public database, anyone can set up an API and pull the data. This means that increasing the value simply requires having more data available.
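Pulling that data might look something like the sketch below. It assumes a public node at `https://api.hive.blog` that speaks JSON-RPC and exposes the `condenser_api.get_discussions_by_created` method; both should be checked against the Hive developer documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

NODE_URL = "https://api.hive.blog"  # assumed public API node


def build_request(tag, limit=5):
    """Build a JSON-RPC payload asking for the newest posts under a tag."""
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_discussions_by_created",
        "params": [{"tag": tag, "limit": limit}],
        "id": 1,
    }


def fetch_recent_posts(tag, limit=5, url=NODE_URL):
    """Send the request to the node and return the list of posts."""
    body = json.dumps(build_request(tag, limit)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp).get("result", [])


def extract_text(posts):
    """Keep only the title and body: the raw 'oil' to be indexed later."""
    return [f"{p.get('title', '')}\n{p.get('body', '')}" for p in posts]
```

With network access, `extract_text(fetch_recent_posts("hive"))` would return plain text ready to be embedded. No account, API key, or permission is involved in this sketch, which is the contrast with the locked-down platforms mentioned above.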

Here is where users are like nature: they are simply increasing the size of the "oil well".

As that grows, others can follow up and look to "drill" for the oil. This can be done repeatedly, by anyone interested. They will set up their own vector databases to improve the "quality" of the oil. In other words, they refine it as they see fit.

When it comes to the democratization of data, Hive is certainly one network that can contribute. There is already almost 9 years' worth of data. Like any other user-contributed platform, more is added each day, increasing the amount of data available.



Posted Using InLeo Alpha



10 comments

This is something everyone should know about, maybe not very deeply, but we all need to be aware of how the world we're living in, and interacting with, works.
So thanks a lot for sharing this knowledge every day.

"Knowledge is power." This adage is immortal. Now, information is readily available. Gone are the days when one had to search and look around for information. Data is at our fingertips, and AI has exponentially expedited this process. !BBH !LOLZ !PIZZA


Generative AI is the output of cognitive power. That is actually what is being created.

Consider the implications of that.


Thinking of this, I have deduced that this piece is highly constructive and intellectual. You have revealed a most useful secret. Cheers!


"Data is the new oil" has been overmarketed, though. I worked for a social enterprise that was focused on IT many years ago and we had so much data.

Getting people to pay for it is another thing, though. You often find that all the money in data at the start came from political parties (identifying voters), marketing companies, and ad revenue. Ad revenue remains the core of this data.

Government agencies tend to shy away from data because data can show failures and increase costs.


It was overmarketed due to the fact that people were running around providing unstructured data. With advancements, we see the ability for databases to handle this.

The companies that succeed in the future will be the ones with data access. There will be the haves and have nots.


I still see a lot of unstructured data, which leads to people saying "Data can be misleading". Data cannot be misleading; people are misleading and misusing data.

We need accountability and transparency when it comes to data.
