Is AI about to run out of data? The history of oil says no

Is the AI bubble about to burst? Every day that the share prices of semiconductor champion Nvidia and the so-called “Fab Five” technology giants (Microsoft, Apple, Alphabet, Amazon and Meta) fail to regain their mid-year highs, more people ask this question.

It wouldn’t be the first time in financial history that enthusiasm around a new technology led investors to bid the value of companies selling it up to unsustainable levels, and then pull back. Political uncertainty surrounding the US election is increasing the likelihood of a selloff, as Donald Trump voices his lingering resentments against Big Tech and his ambivalence toward Taiwan, where the semiconductors essential to artificial intelligence are mainly manufactured.

The deeper question is whether AI can deliver the kind of surprising long-term value that the Internet has. If you had invested in Amazon at the end of 1999, your stake would have been down more than 90% by early 2001. But today it would be up more than 4,000%.

A chorus of skeptics now loudly claims that AI progress is about to hit a brick wall. Models like GPT-4 and Gemini have already absorbed most of the Internet’s data for training, the story goes, and won’t have the data they need to become much smarter.

However, history gives us a strong reason to doubt the doubters. In fact, we think they’re likely to end up in the same unfortunate place as those who in 2001 cast aspersions on the future of Jeff Bezos’s disorganized online bookstore.

The generative AI revolution has breathed new life into the TED-ready aphorism: “data is the new oil.” But when LinkedIn influencers bring up that 2006 quote from British businessman Clive Humby, most of them don’t get it. Data is like oil, but not just in the easy sense that each is the essential resource that defines a technological era. As futurist Ray Kurzweil notes, the key is that both data and oil vary greatly in the difficulty—and therefore the cost—of extracting and refining them.

Some oil is light crude just below the surface, which gushes out if you dig a deep enough hole. Other oil is trapped far beneath the earth or locked in sedimentary shale rock, and requires deep drilling and elaborate hydraulic fracturing or high-temperature pyrolysis to be usable. When oil prices were low before the 1973 embargo, only the cheapest sources were economically viable to exploit. But during periods of rising prices over the following decades, producers had an incentive to use increasingly expensive means to unlock new reserves.

The same dynamic applies to data, which, after all, is the plural of the Latin word datum. Some data exists in organized datasets: labeled, annotated, verified and free to download in a common file format. But most data is buried deeper. It may be on poorly digitized handwritten pages; it may consist of terabytes of raw video or audio, with no labels on the relevant features; it may be rife with inaccuracies and measurement errors or distorted by human biases. And most data is not on the public internet.

An estimated 96% to 99.8% of all online data is inaccessible to search engines: for example, paywalled media, password-protected corporate databases, legal documents and medical records, and an exponentially growing volume of private cloud storage. Furthermore, the vast majority of printed material has still never been digitized: around 90% for high-value collections such as the Smithsonian and the UK National Archives, and probably a much larger proportion across all the world’s archives.

However, by far the biggest unexplored category is the information that is not currently captured, from the hand movements of surgeons in the operating room to the subtle expressions of actors on a Broadway stage.

During the first decade after massive amounts of data became the key to training next-generation AI, commercial applications were very limited. It therefore made sense for technology companies to collect only the cheapest data sources. But the launch of OpenAI’s ChatGPT in 2022 changed everything. Now, tech titans around the world are engaged in a frantic race to turn theoretical AI advances into consumer products worth billions. Many millions of users now pay around $20 a month for access to premium AI models produced by Google, OpenAI and Anthropic. But this is a pittance compared to the economic value that will be unlocked by future models capable of reliably performing professional tasks such as legal writing, computer programming, medical diagnosis, financial analysis and scientific research.

Skeptics are right that the industry is about to run out of cheap data. However, as smarter models enable wider adoption of AI for profitable use cases, powerful incentives will drive drilling into increasingly expensive data sources, whose proven reserves are orders of magnitude greater than those that have been used until now. This is already catalyzing a new training-data sector, as companies like Scale AI, Sama and Labelbox specialize in the digital refinement needed to make less accessible data usable.

This is also an opportunity for data owners. Many companies and nonprofits have mountains of proprietary data that are gathering dust today but could be used to drive the next generation of AI innovations. OpenAI has now spent hundreds of millions of dollars licensing training data, signing blockbuster deals with Shutterstock and the Associated Press to access their archives. Just as there was speculation over mineral rights during previous oil booms, we may soon see a rise in the number of data brokers finding and licensing data in hopes of profiting when AI companies catch up.

Like the geopolitical race for oil, competition for high-quality data could also affect superpower politics. Countries’ national privacy laws affect the availability of new training data for their technology ecosystems. The European Union’s 2016 General Data Protection Regulation leaves Europe’s nascent AI sector facing an uphill climb to international competitiveness, while China’s expansive surveillance state allows Chinese companies to access larger, richer data sets than those that can be tapped in America. Given the military and economic imperatives to stay ahead of Chinese AI labs, Western companies may be forced to look abroad for data sources not available at home.

However, just as alternative energy is rapidly eroding the dominance of fossil fuels, new AI development techniques could reduce the industry’s dependence on massive amounts of data. Leading labs are now working to perfect techniques known as “synthetic data” generation and “self-play,” which allow AI to create its own training data. And although AI models currently learn several orders of magnitude less efficiently than humans, as models develop more advanced reasoning, they will likely be able to improve their capabilities with much less data.

There are legitimate questions about how long AI’s recent intense progress can be sustained. Despite the enormous long-term potential, the short-term market bubble will likely burst before AI is smart enough to live up to the intense hype. But just as generations of “peak oil” predictions have been thwarted by new extraction methods, we shouldn’t bet on AI failing due to data scarcity.



This story originally appeared on Time.com.