Austrian AI company MOSTLY AI has launched a bold new initiative — a $100,000 global challenge to create the most accurate and efficient synthetic dataset derived from real-world data.
The open competition invites developers, researchers, and innovators from around the world to demonstrate how synthetic data can outperform traditional anonymisation methods. Entries will be judged on four key metrics: privacy, accuracy, usability, and computational efficiency. The winning code will be open-sourced, reinforcing the company’s mission to advance transparency and innovation within the synthetic data ecosystem.
MOSTLY AI’s own privacy-preserving platform already sets the benchmark in the field, producing high-fidelity datasets that mirror real data without exposing sensitive information, which enables safe, large-scale AI and machine learning development across sectors. Backed by over $25 million in funding and trusted by major institutions including Citibank, the U.S. Department of Homeland Security, and Erste Group, the company continues to position Austria as a leading hub for synthetic data innovation.
Speaking about the initiative, Alexandra Ebert, Chief AI & Data Democratisation Officer at MOSTLY AI, compared the concept to the Netflix Prize, which famously accelerated progress in recommendation algorithms nearly two decades ago.
“We wanted to do something that hasn’t been attempted at this scale for years — to inspire the next wave of synthetic data innovation,” said Ebert.
As global attention intensifies around AI privacy, security, and fairness, synthetic data is increasingly seen as a critical enabler of responsible development. Governments, corporations, and startups alike are recognising its power to replace fragile anonymisation with mathematically robust alternatives.
Ebert adds:
“Synthetic data can accelerate healthcare research, climate science, and unlock innovation for startups by providing access to granular, privacy-safe data. It’s about moving beyond simplified toy datasets and into real-world complexity.”
The competition features two core categories:
Flat Data Challenge: static datasets, such as customer records.
Sequential Data Challenge: dynamic datasets, such as transaction streams or location data — a far greater technical challenge.
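To make the difference between the two categories concrete, here is a minimal Python sketch. It is an illustration only: the field names (customer_id, t, amount), the sample rows, and the helper group_sequences are hypothetical and do not come from the competition's datasets or from MOSTLY AI's tooling. It shows that flat data has one independent row per subject, while sequential data has several time-ordered rows per subject that must be kept together and in order.

```python
from collections import defaultdict

# Hypothetical flat data: one independent row per customer.
flat_rows = [
    {"customer_id": 1, "age": 34, "country": "AT"},
    {"customer_id": 2, "age": 51, "country": "DE"},
]

# Hypothetical sequential data: several time-stamped rows per customer,
# arriving in no particular order.
events = [
    {"customer_id": 1, "t": 2, "amount": 40.0},
    {"customer_id": 2, "t": 1, "amount": 15.5},
    {"customer_id": 1, "t": 1, "amount": 12.0},
    {"customer_id": 1, "t": 3, "amount": 7.25},
]


def group_sequences(rows, key="customer_id", order="t"):
    """Group event rows by subject and sort each group by time."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return {k: sorted(v, key=lambda r: r[order]) for k, v in grouped.items()}


sequences = group_sequences(events)

# Sequence lengths vary per subject, unlike flat data (always one row each).
lengths = {k: len(v) for k, v in sequences.items()}

# The ordered amounts for customer 1 form a single sequence.
amounts_1 = [r["amount"] for r in sequences[1]]

print(lengths)
print(amounts_1)
```

A synthetic version of the flat table only has to reproduce the relationships between columns within a row. A synthetic version of the event table also has to reproduce how many events each subject has and how each event depends on the ones before it, which is one reason the sequential category is considered harder.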
Participation has already surged among early-career data scientists and students worldwide, particularly in the Global South. While the $100,000 prize may not rival corporate AI salaries, it is a meaningful catalyst for talent, creativity, and open scientific progress.