Fairgen ‘boosts’ survey results using synthetic data and AI-generated responses


Polls have been used to gain insights into populations, products, and public opinion since time immemorial. And while methodologies may have changed over the millennia, one thing has remained constant: the need for people, lots of people.

But what if you can’t find enough people to build a sample group large enough to generate meaningful results? Or what if you could find enough people, but budget constraints limit the number you can hire and interview?

This is where Fairgen wants to help. The Israeli startup today launches a platform that uses “statistical AI” to generate synthetic data that it says is as good as the real thing. The company is also announcing a fresh $5.5 million in funding from Maverick Ventures Israel, The Creator Fund, Tal Ventures, Ignia and a handful of angel investors, bringing its total raised since inception to $8 million.

‘Fake data’

Data may be the lifeblood of AI, but it has also been the bedrock of market research forever. So when the two worlds collide, as they do in Fairgen’s world, the need for quality data becomes a little more pronounced.

Founded in Tel Aviv, Israel in 2021, Fairgen was previously focused on combating bias in AI. But at the end of 2022, the company pivoted to a new product, Fairboost, which is now exiting beta.

Fairboost promises to “boost” a smaller dataset by up to three times, enabling more granular insights into niches that would otherwise be too difficult or expensive to reach. With it, companies can train a deep machine learning model for each dataset they upload to the Fairgen platform, with the AI learning statistical patterns across the survey’s different segments.

The concept of “synthetic data” – data created artificially rather than drawn from real-world events – is not new. Its roots go back to the early days of computing, when it was used to test software and algorithms and to simulate processes. But synthetic data as we understand it today has taken on a life of its own, especially with the advent of machine learning, where it is increasingly used to train models. Artificially generated data that contains no sensitive information can address both data scarcity problems and data privacy concerns.

Fairgen is the latest startup to bet on synthetic data, with market research as its main target. It’s important to note that Fairgen doesn’t conjure data out of thin air, nor does it throw millions of historical surveys into an AI-powered cauldron – market researchers need to run a survey on a small sample of their target market, and from that, Fairgen learns patterns for expanding the sample. The company says it can guarantee at least a two-fold increase over the original sample, and on average it achieves a three-fold increase.

This way, Fairgen can establish that someone in a certain age group and/or income bracket is more inclined to answer a question in a certain way, or combine any number of data points to extrapolate from the original dataset. It’s essentially about generating what Fairgen co-founder and CEO Samuel Cohen calls “stronger, more robust data segments, with a smaller margin of error.”

“The main finding was that people are becoming increasingly diverse – brands need to adapt to this and understand their customer segments,” Cohen explained to TechCrunch. “The segments are very different – Generation Z thinks differently than older people. And being able to have this market understanding at the segment level costs a lot of money and requires a lot of time and operational resources. That’s when I realized where the problem was. We knew synthetic data had a role to play in this.”

An obvious criticism – which the company admits it has faced – is that this all sounds like a huge shortcut to having to go into the field, interview real people and collect real opinions.

Surely any underrepresented group should be concerned that their real voices are being replaced by, well, fake voices?

“Every customer we talk to in research has huge blind spots – completely hard-to-reach audiences,” Fairgen’s head of growth, Fernando Zatz, told TechCrunch. “They actually lose projects because there aren’t enough people available, especially in an increasingly diverse world with a lot of market segmentation. They have a minimum number [of respondents], and if they don’t reach that number, they don’t sell the insights.”

Fairgen is not the only company applying generative AI to market research. Qualtrics said last year it was investing $500 million over four years to bring generative AI to its platform, albeit with a focus on qualitative research. Either way, it’s further proof that synthetic data is here to stay.

But validating the results will play an important role in convincing people that this is the real deal and not a cost-cutting measure that produces suboptimal results. Fairgen does this by comparing a “real” upsample with a “synthetic” one – it takes a small sample from the dataset, extrapolates it, and puts it side by side with reality.
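The validation idea described above can be sketched in a few lines. This is purely illustrative: the bootstrap resampling below is a naive stand-in, not Fairgen’s proprietary model, and all the numbers (1,000 respondents, a 30% “yes” rate) are invented for the example. The point is the side-by-side test itself: hold out a full survey as ground truth, boost a small sample, and compare the distributions.

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical full survey: 1,000 respondents answering a yes/no question,
# with a true 30% "yes" rate. This plays the role of "reality" in the test.
full_survey = ["yes" if random.random() < 0.30 else "no" for _ in range(1000)]

# Suppose the researcher only fielded a small sample of 100 respondents.
small_sample = full_survey[:100]

# Naive stand-in for a boost: bootstrap-resample the small sample up to
# 300 rows. (Fairgen's actual statistical model is not public.)
boosted = [random.choice(small_sample) for _ in range(300)]

def yes_rate(responses):
    """Share of 'yes' answers in a list of responses."""
    return Counter(responses)["yes"] / len(responses)

# Side-by-side comparison: the boosted sample's distribution should
# track the full survey's distribution for the boost to be credible.
print(f"full survey yes-rate:   {yes_rate(full_survey):.2f}")
print(f"small sample yes-rate:  {yes_rate(small_sample):.2f}")
print(f"boosted sample yes-rate: {yes_rate(boosted):.2f}")
```

A real validation would repeat this across many questions and segments and report error margins, which is presumably what Cohen means by running “exactly the same type of testing” with every client.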

“With every client we sign up, we do exactly the same type of testing,” Cohen said.

Statistically speaking

Cohen has a master’s degree in statistical science from the University of Oxford and a PhD in machine learning from UCL, part of which involved a nine-month stint as a research scientist at Meta.

One of the company’s co-founders is its president, Benny Schnaider, who previously worked in the enterprise software space, with four exits to his name: Ravello to Oracle for $500 million in 2016; Qumranet to Red Hat for $107 million in 2008; P-Cube to Cisco for $200 million in 2004; and Pentacom to Cisco for $118 million in 2000.

And then there is Emmanuel Candès, professor of statistics and electrical engineering at Stanford University, who serves as Fairgen’s chief scientific advisor.

This business and mathematical backbone is a major selling point for a company trying to convince the world that fake data can be as good as real data if applied correctly. It is also how the company can clearly articulate the limits of its technology – how big samples need to be to achieve optimal boosts.

According to Cohen, ideally they need at least 300 real respondents for a survey, and from that, Fairboost can increase the size of a segment that makes up no more than 15% of the broader survey.

“Below 15%, we can guarantee an average boost of 3x after validating it with hundreds of parallel tests,” said Cohen. “Statistically, the gains are less dramatic above 15%. The data already shows good confidence levels, and our synthetic respondents can only match it or bring a marginal increase. In business terms, there’s also no problem above 15% – brands can already learn from these groups; they are only stuck at the niche level.”
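To make the thresholds concrete, here is the arithmetic implied by Cohen’s numbers: with the minimum of 300 real respondents, a niche that makes up exactly 15% of the survey contains 45 real respondents, and a 3x boost would yield a 135-row segment. These figures are just a worked example of the stated rules, not numbers Fairgen has published.

```python
# Worked numbers for the thresholds Cohen describes (illustrative only).
min_respondents = 300   # minimum real sample Fairgen says it needs
segment_share = 0.15    # segment must be at most 15% of the survey
boost_factor = 3        # average boost Cohen cites below 15%

segment_size = int(min_respondents * segment_share)  # real respondents in the niche
boosted_size = segment_size * boost_factor           # segment size after the boost

print(f"real niche respondents: {segment_size}")  # 45
print(f"boosted niche size: {boosted_size}")      # 135
```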

The non-LLM factor

It is important to note that Fairgen does not use large language models (LLMs), and its platform does not generate responses in “plain English” à la ChatGPT. The reason is that an LLM would draw on learnings from a myriad of other data sources outside the parameters of the study, increasing the chance of introducing biases incompatible with quantitative research.

Fairgen deals with statistical models and tabular data, and its training depends exclusively on the data contained in the loaded dataset. This effectively allows market researchers to generate new, synthetic respondents by extrapolating from adjacent segments in the survey.

“We don’t use any LLM for a very simple reason: if we were to pre-train on many [other] polls, that would just convey misinformation,” Cohen said. “Because there would be cases where you’d learn something from other research, and we don’t want that. It’s all about reliability.”

In terms of business model, Fairgen is sold as SaaS, with companies uploading their surveys in any structured format (.CSV or .SAV) to Fairgen’s cloud-based platform. According to Cohen, it takes up to 20 minutes to train the model on the survey data provided, depending on the number of questions. The user then selects a “segment” (a subset of respondents who share certain characteristics) – for example, “Gen Z working in industry x” – and Fairgen delivers a new file structured identically to the original training file, with exactly the same questions, just new rows.
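The workflow above – structured file in, identically structured file out with new synthetic rows for a chosen segment – can be sketched as follows. The column names, survey values, and the trivial sample-with-replacement “model” are all invented for illustration; Fairgen’s actual statistical model is proprietary and not shown here.

```python
import csv
import io
import random

random.seed(0)

# Hypothetical survey export: same shape going in and coming out,
# as the article describes -- identical columns, just new rows.
survey_csv = """age_group,sector,q1_satisfaction
gen_z,retail,4
gen_z,retail,5
millennial,finance,3
"""

rows = list(csv.DictReader(io.StringIO(survey_csv)))

# Step 1: select a segment, e.g. "Gen Z working in retail".
segment = [r for r in rows if r["age_group"] == "gen_z" and r["sector"] == "retail"]

# Step 2: generate synthetic respondents for that segment. Here we just
# sample observed rows with replacement (a stand-in, NOT Fairgen's model),
# producing a 2x boost of the segment.
synthetic = [dict(random.choice(segment)) for _ in range(2 * len(segment))]

# Step 3: write a file with exactly the same columns as the input.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(synthetic)
print(out.getvalue())
```

The output file keeps the original header (`age_group,sector,q1_satisfaction`) so it can flow straight back into the researcher’s existing analysis tooling, which is the practical appeal of the identical-structure design.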

Fairgen is being used by BVA and French polling and market research company IFOP, which have already integrated the startup’s technology into their services. IFOP, which is a little like Gallup in the U.S., is using Fairgen for polling in the European elections, though Cohen thinks it could also end up being used in the U.S. elections later this year.

“IFOP is basically our seal of approval, because they’ve been around for about 100 years,” Cohen said. “They validated the technology and were our original design partner. We are also testing or already integrating with some of the biggest market research companies in the world, which I’m not allowed to talk about yet.”
