LeoAI: Basic Unit - Tokens


The basic unit of LeoAI, at least from an input perspective, is the token.

This is something we are accustomed to thanks to cryptocurrency. However, that usage is an offshoot of what tokens traditionally are.

In computer science, a token is a unit of data that stands in for something else. It could be a word, a character, or a symbol.
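To make that concrete, here is a minimal sketch of word-level tokenization in Python. Production models use subword schemes such as BPE, so treat this as an illustration of the idea, not how LeoAI actually tokenizes text.

```python
# A minimal sketch of tokenization: splitting raw text into tokens.
# Real models use subword tokenizers (BPE, WordPiece); this word-level
# splitter only illustrates the idea of a token as a unit of data.
import re

def tokenize(text: str) -> list[str]:
    # Keep runs of word characters as tokens and punctuation as
    # separate single-character tokens.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("LeoAI turns posts into tokens."))
# ['LeoAI', 'turns', 'posts', 'into', 'tokens', '.']
```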

We often discuss data. With AI models, the key is to accumulate as much data as possible. That is the starting point. From there, the data has to be structured and then used to train models.

A number of ways exist to go about this, with developers utilizing different methods. Whatever the approach, it always follows a similar path.

LeoAI is no exception. For this reason, we have to stress the basic unit and keep harping on it.



By now, most of us are familiar with the saying "garbage in, garbage out." This dates back to the early days of computing.

Today, it is no longer as accurate. Naturally, developers prefer to have better datasets. That said, the quest for volume is overtaking that preference.

For this reason, methods were developed to take unstructured ("garbage") data and make it useful for AI training. Hence, the focus on quantity wins out.
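As a rough illustration of that kind of method, here is a hedged sketch of a cleaning pass in Python that turns raw, markup-laden text into deduplicated training records. Real pipelines add language detection, quality filters, and deduplication at far larger scale; everything here is an assumption for illustration.

```python
# A sketch of turning unstructured ("garbage") text into training-ready
# data: strip markup, normalize whitespace, drop near-empty and
# duplicate records.
import html
import re

def clean(raw: str) -> str:
    text = html.unescape(raw)                 # decode HTML entities
    text = re.sub(r"<[^>]+>", " ", text)      # strip tags
    text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
    return text

docs = ["<p>LeoAI &amp; tokens</p>", "  LeoAI &amp; tokens ", ""]
seen, dataset = set(), []
for doc in map(clean, docs):
    if len(doc) > 10 and doc not in seen:  # drop short / duplicate docs
        seen.add(doc)
        dataset.append(doc)

print(dataset)  # ['LeoAI & tokens']
```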

This is important for LeoAI. By talking in tokens, we can quantify what we are dealing with.
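For example, here is one way to count tokens using OpenAI's open-source tiktoken library. The encoding name is just one common choice; LeoAI's own tokenizer would likely produce different counts.

```python
# Quantifying text in tokens with tiktoken (pip install tiktoken).
# This is an illustration; other tokenizers count differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Every post on the network adds tokens to the training pool."
token_ids = enc.encode(text)
print(len(token_ids))  # number of tokens this text contributes
```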

A model like Llama 3 was trained on roughly 15 trillion tokens. That is a great deal of data. Of course, few have the compute to process that much data, outside of a handful of companies.

For the rest, including LeoAI, the numbers are much smaller. In spite of that, it is still crucial to keep producing as many tokens as possible. When democratizing data, we want the database to be as large as possible.

Of course, other databases, i.e. permissionless blockchains, need to do the same. This is not the sole responsibility of one chain.

Improving LeoAI

Social media platforms have a major advantage in the AI game. They are continually fed new data through a stream of posts throughout the day. This new data can be added to a vector database, which is tied to the model being utilized.
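As a sketch of that pattern, the snippet below embeds new posts as they arrive and stores them in a toy in-memory vector store that can be searched by similarity. The embedding model and helper functions are illustrative assumptions, not LeoAI's actual stack.

```python
# A hedged sketch of streaming posts into a vector store: embed each
# new post, keep (text, embedding) pairs, and retrieve by cosine
# similarity. An in-memory list stands in for a real vector database.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
store: list[tuple[str, np.ndarray]] = []  # (post text, embedding)

def ingest(post: str) -> None:
    # Called for each new post as it streams in throughout the day.
    store.append((post, model.encode(post)))

def search(query: str, k: int = 3) -> list[str]:
    # Rank stored posts by cosine similarity to the query embedding.
    q = model.encode(query)
    scored = sorted(
        store,
        key=lambda item: float(
            np.dot(q, item[1])
            / (np.linalg.norm(q) * np.linalg.norm(item[1]))
        ),
        reverse=True,
    )
    return [post for post, _ in scored[:k]]

ingest("LeoAI needs more tokens in the Hive database.")
ingest("Vector databases keep models current without retraining.")
print(search("How does LeoAI stay up to date?"))
```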

LeoAI is being constructed in this manner. For this reason, it is important to create as many tokens as we can. Put another way, we need to fill the Hive database in a big way.

Trillions is a big number. It means we have a lot of work to do even to reach billions. As stated, these systems require more data with each level of training.

Ultimately, the more data that is fed in, the better the results will be. People often attack AI models for their responses, and rightly so. However, with something like LeoAI, we have the ability to train it on what we desire.

That means we are responsible for the results we receive. If we fail to fill the database, the information that comes out will be lacking.

Here is where humans have control over the direction AI systems take. The majority of the training is still dependent upon human data. This is something that will continue for some time.

With LeoAI, it is even more so since the model will be much smaller than what Meta or Google create.

Posted Using INLEO



3 comments

I see, so this statement is for the AI that needs to learn.

Don’t mess with humans. We are fragile.

😂🤣


The emphasis on data quantity and quality for improving LeoAI really highlights our role in shaping AI outcomes. It reminds us that our contributions are crucial in driving meaningful advancements in AI, especially for smaller models. Let's harness that responsibility to create a more informed and responsive system!


If you want it to come out of the database, then it is important to get it in there.
