Adobe’s ‘ethical’ Firefly AI was trained on Midjourney images


(Bloomberg) — When Adobe Inc. launched its Firefly imaging software last year, the company said the artificial intelligence model was trained primarily on Adobe Stock, its database of hundreds of millions of licensed images. Firefly, Adobe said, was a “commercially safe” alternative to competitors like Midjourney, which learned by scraping photos from the Internet.

But behind the scenes, Adobe also relied in part on AI-generated content to train Firefly, including images from those same AI rivals. In numerous presentations and public posts about how Firefly is safer than the competition because of its training data, Adobe never made clear that its model had actually been trained in part on images from some of those same competitors.

Massive amounts of data are needed to train the AI models underlying popular content creation products, and AI companies face increasing scrutiny over their use of copyrighted material in this process. Companies like Midjourney, OpenAI (maker of Dall-E) and Stability AI (maker of Stable Diffusion) have built their media generation models with datasets that pull images from across the internet, a practice that has sparked outrage and lawsuits from artists.

“This shows the obscurity of the definition of responsible AI and also illustrates the difficulty of avoiding, if not legal problems, then social, cultural or ethical problems with the content generated,” said Luke Stark, an assistant professor at Western University in Ontario who studies the social and ethical impacts of AI.

Adobe’s decision to build Firefly with content that the company owns the rights to and that is in the public domain was intended to differentiate its AI imaging tool in the fast-growing market for generative artificial intelligence. The company promoted it as a more ethical and legally correct option for customers interested in creating images with just a few words but wary of potential copyright issues. It will not generate content based on the intellectual property of other people or brands, Adobe said, and will also avoid the production of harmful images.

AI-generated content entered Firefly’s training set because creators were allowed to upload millions of images made with other companies’ technology to Adobe’s stock marketplace. “Generative AI images from the Adobe Stock collection are a small part of the Firefly training dataset,” Adobe representative Michelle Haarhoff wrote in September in a Discord group for photographers and artists who contribute to the marketplace.

Adobe said that a relatively small share (about 5%) of the images used to train its AI tool were generated by other AI platforms. “Every image uploaded to Adobe Stock, including a small subset of AI-generated images, goes through a rigorous moderation process to ensure it does not include IP, trademarks, recognizable characters or logos, or names of reference artists,” a company spokesperson said.

Criticism of the practice has come from within the company: since Firefly’s earliest days, there have been internal disagreements over the ethics and optics of feeding AI-generated images into the model, according to several employees familiar with its development who asked not to be identified because the discussions were private. Some have suggested culling AI-generated images from the system over time, but one of the people said there are no current plans to do so.

Adobe has attacked competitors over their data collection practices. Other models are built on “open” data, Chief Strategy Officer Scott Belsky said last year. One way Firefly is better than OpenAI’s comparable model is because it shows respect for the creative community by training only on licensed or freely available data, Adobe states on its website. And in a blog post last March titled “Responsible Innovation in the Era of Generative AI,” General Counsel Dana Rao pointed out that generative AI “is only as good as the data it is trained on.”

“Training on diverse, curated datasets inherently gives your model a competitive advantage when it comes to producing commercially safe and ethical results,” he wrote, while also highlighting that Adobe trained Firefly on Adobe Stock images, openly licensed content and public domain content where the copyright has expired.

“Our enterprise customers came to us when we launched Firefly and said, ‘We love what you’re doing and we really appreciate that you’re not stealing all of our intellectual property on the open internet,’” Ashley Still, a senior vice president at Adobe, said earlier this month at a Bloomberg Intelligence event.

Yet Adobe never made clear publicly that Firefly was trained in part on images from competing tools that are supposedly less ethical. The company did, however, share these details in at least two online discussion groups it maintains on Discord, one for Adobe Stock and another dedicated to Firefly, according to messages seen by Bloomberg.

In March 2023, Adobe revealed Firefly as a “beta” product. That month, Raúl Cerón, who works with the Adobe Stock community, posted on Discord that the company did not plan to use generated images to train the next public version of Firefly.

“Once we are out of beta, we will have a new training database for it, leaving the Gen AI content out,” he wrote in a post in June.

When Adobe announced the public launch of Firefly on September 13, the company also paid a special “Firefly bonus” to Adobe Stock contributors “whose content was used to train the first Firefly commercial model.” Contributors who used generative AI were among those who received the bonus payment, according to a Discord message from Mat Hayward, who also works with the Adobe Stock community.

The AI-generated images in Adobe Stock “improve our dataset training model, and we decided to include this content in the commercial version of Firefly,” Hayward wrote.

Brian Penny, a writer and stock image contributor who has uploaded thousands of AI-generated images (most made with Midjourney) to Adobe Stock, was surprised to receive the bonus. He had assumed that as an AI contributor he would not be eligible. Despite the financial gain, Penny thinks the decision to train Firefly on content like his was a bad one and said the company should be more upfront about how it trains its image-generation software.

“They need to be ethical, they need to be more transparent, they need to do more,” he said.

Adobe Stock’s library has grown since it began formally accepting AI content in late 2022. Today, about 57 million images, or roughly 14% of the total, are tagged as AI-generated. Artists submitting AI images must specify that the work was created using the technology, although they do not need to say which tool they used. To bulk up its AI training set, Adobe also offered to pay employees to submit large numbers of photos for AI training, such as images of bananas or flags.

Training on AI-generated content probably wouldn’t make Adobe’s Firefly image generator any less commercially safe, and the company isn’t required to disclose what it trains on as long as it isn’t misleading consumers, said Rebecca Tushnet, a Harvard professor who focuses on copyright and advertising law. But training on AI images, such as those created by Midjourney, undermines the idea that Firefly is distinct from competing services, she said.

“Adobe basically wants to position itself as a superior alternative, but it also wants really cheap inputs, and AI is a great way to get cheap inputs,” she said.

©2024 Bloomberg LP