Strategies for the use of synthetic data

by Allan Tan
October 21, 2024
Photo by Google DeepMind: https://www.pexels.com/photo/an-artist-s-illustration-of-artificial-intelligence-ai-this-image-depicts-how-ai-could-adapt-to-an-infinite-amount-of-uses-it-was-created-by-nidia-dias-as-part-of-the-visualising-ai-pr-17485657/

Arguably a byproduct of the increased use of, or intent to use, generative AI (GenAI), synthetic data addresses the shortage of data for training AI algorithms while enhancing security and privacy. It allows organisations to avoid collecting sensitive information, which helps them comply with stringent privacy regulations.

This is particularly crucial in sectors like healthcare and finance, where data protection is paramount. Rena Bhattacharyya, chief analyst and practice lead for Enterprise Technology and Services at GlobalData, comments that utilising synthetic data allows firms to conduct risk evaluations, fraud prevention, and predictive analytics without exposing real user data.

This reduces the risk of data breaches and enhances operational efficiency, making synthetic data a secure alternative for various applications across industries.

Synthetic data risks

But as history tells us, new technologies often come with new risks that have yet to be discovered.

Zeid Khater, analyst at Forrester, suggests one such risk might arise from misrepresentation. This happens when up-sampling for an attribute or element that is missing from the data for some real-world reason. “Simply augmenting a missing element (like a demographic group) might bias your sample or undermine the factors that initially led to the absence of a particular group in your dataset to begin with,” he explains.
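The up-sampling risk can be illustrated with a minimal Python sketch. The toy rows, the `group` and `spend` fields and the `naive_upsample` helper are all hypothetical and are not drawn from Forrester's analysis. The sketch balances an under-represented group by duplicating its few real rows, then shows that the group now looks as large as the majority while carrying no new information.

```python
import random
from collections import Counter

def naive_upsample(rows, group_key, group, target_count, seed=0):
    """Duplicate existing rows of `group` (with replacement) up to target_count."""
    rng = random.Random(seed)
    members = [r for r in rows if r[group_key] == group]
    if not members:
        raise ValueError("cannot up-sample a group with no real examples")
    shortfall = max(0, target_count - len(members))
    extra = [dict(rng.choice(members)) for _ in range(shortfall)]
    return rows + extra

# Hypothetical toy data: group "B" is under-represented (3 of 103 rows).
rows = [{"group": "A", "spend": 100 + i} for i in range(100)]
rows += [{"group": "B", "spend": s} for s in (20, 25, 400)]

balanced = naive_upsample(rows, "group", "B", target_count=100)

counts = Counter(r["group"] for r in balanced)
distinct_b_before = {r["spend"] for r in rows if r["group"] == "B"}
distinct_b_after = {r["spend"] for r in balanced if r["group"] == "B"}

print(counts)  # both groups now have the same number of rows
print(len(distinct_b_before), len(distinct_b_after))  # distinct values for B are unchanged
```

Any model trained on `balanced` would treat three observations as if they were a hundred, which is one way the bias Khater describes can enter a dataset.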

He also posits dimensionality as potentially posing a problem. “For high-dimensional data, particularly for structured data, there can be accuracy and reliability issues often associated with what data scientists refer to as ‘the curse of dimensionality’,” he continues.
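One commonly cited symptom of the curse of dimensionality is that distances between points become hard to tell apart as the number of columns grows. The short Python sketch below is an illustration under assumed conditions (uniformly random points, a hypothetical `relative_contrast` helper), not a measurement from any real dataset. It compares the gap between the nearest and farthest neighbour of a query point at different dimensionalities.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest

# The contrast shrinks as dimensionality grows, so "similar" and
# "dissimilar" records become harder to distinguish.
for dim in (2, 20, 200):
    print(dim, round(relative_contrast(dim), 2))
```

When nearest and farthest neighbours sit at almost the same distance, a generator has less signal to learn which combinations of attributes are realistic, which is one plausible source of the accuracy and reliability issues mentioned above.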

Role of synthetic data in AI model training

While academia has been using it for nearly 30 years, synthetic data has only entered mainstream commercial use in recent years.

According to Khater, the idea to generate synthetic data as a tool for broadening access to sensitive microdata was proposed for the first time three decades ago. “While first applications of the idea emerged around the turn of the century, the approach gained momentum over the last ten years, stimulated at least in parts by some recent developments in computer science,” he continues.

Gartner estimated that only 1% of data was synthetic as of 2021, and predicted that the figure would grow to 60% by 2024.

Khater says that, when used in combination with a small amount of high-quality real data, synthetic data has been shown to produce higher-performing models (see Microsoft’s research on its use of synthetic data to train its phi model: “Textbooks Are All You Need”, arXiv:2306.11644).

He also believes that synthetic data will continue to be used as training data, for prompt engineering and retrieval-augmented generation (RAG) testing to ensure the outputs are working as designed, and for applications such as driver-monitoring systems (images of drivers falling asleep at the wheel, used to train systems that alert drowsy drivers), digital twins, simulation testing and more.

“The ease of generating synthetic data via GenAI (GANs and VAEs) has made it increasingly more popular to rely on synthetic data for these use cases and similar ones,” says Khater.

Approaches to synthetic data generation

While users can always build their own synthetic data, Khater reckons organisations will face multiple challenges related to dimensionality.

“There are many new data and platforms that act as standalone vendors including Gretel, Tonic, DataCebo, Franz, MostlyAI, and Mockaroo. These provide either the data itself or a platform to integrate synthetic data creation and usage inside your existing tech stack,” says Khater.

He also points to others that specialise in combining the data with LLMs to augment market research and product development, such as DayOne Strategy and Synthetic Human/Fantasy AI (“AI Synthetic Humans: Revolutionising User Research & Team Collaboration”, Fantasy Interactive, synthetic-humans.ai). “You may also find that your existing Customer Analytics service providers can do this for you as well such as Fractal or Tredence,” suggests Khater.

Use synthetic data while staying compliant

Asked how organisations can leverage synthetic data while complying with data privacy regulations, Khater argues that synthetic data is in itself compliant because in most cases it can’t be traced back to the original from which it was synthesised (see Gretel’s Differential Privacy).
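Differential privacy is often introduced through the Laplace mechanism, which adds calibrated random noise to a query result so that no single record can be inferred from it. The Python sketch below is a textbook illustration only. It does not describe how Gretel or any other vendor implements differential privacy, and the patient records, `private_count` helper and `epsilon` value are assumptions chosen for the example.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """Noisy count. A counting query has sensitivity 1, so scale = 1 / epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(0)

# Hypothetical records: 1,000 patients, roughly 3% with a rare condition.
patients = [{"rare_condition": i % 33 == 0} for i in range(1000)]

noisy = private_count(patients, lambda r: r["rare_condition"], epsilon=1.0, rng=rng)
print(round(noisy, 1))  # close to the true count, but not exactly equal to it
```

A smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy, which is the trade-off any differentially private generator has to manage.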

However, he cautions that in some cases even obtaining an original dataset from which to create synthetic data is problematic, because it either doesn’t exist (rare diseases) or is highly regulated (financial or patient data).

“Some data scientists have gotten around this by manually building smaller datasets of roughly 200 rows or so, then have them validated by subject matter experts or others with access to the original data to confirm statistical accuracy and augment off of that,” continues Khater. One example is the paper “MedWGAN-based synthetic dataset generation for Uveitis pathology” (ScienceDirect).
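The seed-and-augment workflow can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not the MedWGAN approach: it fits an independent normal distribution to each column of a hypothetical 200-row seed and samples from it, so it ignores correlations between columns. The column names `age` and `dose_mg` and the `augment` helper are assumptions made for illustration.

```python
import random
import statistics

def fit_column(values):
    """Summarise one numeric column by its mean and standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def augment(seed_rows, n_rows, seed=0):
    """Sample synthetic rows from independent normal fits of each seed column."""
    rng = random.Random(seed)
    columns = list(seed_rows[0].keys())
    fits = {c: fit_column([r[c] for r in seed_rows]) for c in columns}
    return [
        {c: rng.gauss(*fits[c]) for c in columns}
        for _ in range(n_rows)
    ]

# Hypothetical 200-row seed standing in for an expert-validated dataset.
seed_rng = random.Random(42)
seed_rows = [
    {"age": seed_rng.gauss(50, 10), "dose_mg": seed_rng.gauss(5, 1)}
    for _ in range(200)
]

synthetic = augment(seed_rows, n_rows=5000)

# Compare summary statistics of the seed and the augmented data.
for col in ("age", "dose_mg"):
    seed_mean = statistics.mean(r[col] for r in seed_rows)
    synth_mean = statistics.mean(r[col] for r in synthetic)
    print(col, round(seed_mean, 2), round(synth_mean, 2))
```

The comparison at the end mirrors the validation step in the quote: the augmented data is only as trustworthy as the statistics of the seed it was built from.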

For best results

Given the growing roster of vendors offering solutions, enterprises will need to identify which GenAI techniques will provide the best results for specific data needs. How should organisations evaluate the effectiveness of such approaches?

Khater says the starting point is business intent. He notes that GANs (generative adversarial networks) often generate the most realistic data, but warns that they offer little control over the resultant dataset. “On the other hand, VAEs (variational autoencoders) provide some control through the manipulation of ‘latent space’ (in simple terms, a compressed version of the original dataset that holds all its essential dimensions) but tend to be less accurate than GANs,” he comments.

“Your business intent and use case will determine which method makes the most sense based on those criteria. In some instances, rules-based synthetic data might still be leveraged for maximum control, though it is usually too simplistic and therefore not useful for complex data relationships,” he elaborates.
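A rules-based generator of the kind Khater mentions might look like the following Python sketch. The payment schema, the currency and channel lists, and the two business rules are hypothetical and serve only to show the idea. Every record satisfies the rules by construction, which gives maximum control, but the generator knows nothing about relationships that nobody wrote down.

```python
import random

# Hypothetical business rules for a payment record (illustrative only).
CURRENCIES = ("SGD", "HKD", "USD")
CHANNELS = ("card", "transfer", "wallet")

def make_transaction(rng, txn_id):
    """Build one synthetic transaction that satisfies every rule by construction."""
    channel = rng.choice(CHANNELS)
    # Rule: wallet payments are capped at 500; other channels at 10,000.
    cap = 500 if channel == "wallet" else 10_000
    amount = round(rng.uniform(1, cap), 2)
    return {
        "id": f"TXN-{txn_id:06d}",
        "currency": rng.choice(CURRENCIES),
        "channel": channel,
        "amount": amount,
        # Rule: transactions above 5,000 are always flagged for review.
        "needs_review": amount > 5_000,
    }

def generate(n, seed=0):
    """Generate n transactions reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [make_transaction(rng, i) for i in range(1, n + 1)]

transactions = generate(1000)
print(transactions[0])
```

Because amounts are drawn uniformly, the output lacks the skew, seasonality and cross-field correlations of real payment data, which is the simplicity limitation noted in the quote.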

Measuring the impact

Alexander Linden, VP analyst at Gartner, says synthetic data makes AI possible where a lack of data would otherwise make it unusable because of bias or an inability to recognise rare or unprecedented scenarios.

“Real-world data is happenstance and does not contain all permutations of conditions or events possible in the real world. Synthetic data can counter this by generating data at the edges, or for conditions not yet seen,” he explains.

When measuring the impact of synthetic data on AI initiatives, Khater suggests looking at speed to value. “Access to data might typically slow down data for insights if stringently governed in an organisation or in instances where there is no data, low-quality data, or not enough data,” he elaborates.

He also suggests measuring speed along with compliance. “You can move faster without the fear of regulatory backlash. And of course, model performance via benchmarking,” he concludes.

Allan Tan

Allan is Group Editor-in-Chief for CXOCIETY writing for FutureIoT, FutureCIO and FutureCFO. He supports content marketing engagements for CXOCIETY clients, as well as moderates senior-level discussions and speaks at events.

Previous Roles

He served as Group Editor-in-Chief for Questex Asia concurrent to the Regional Content and Strategy Director role. He was the Director of Technology Practice at Hill+Knowlton in Hong Kong and Director of Client Services at EBA Communications. He also served as Marketing Director for Asia at Hitachi Data Systems and served as Country Sales Manager for HDS’ Philippines. Other sales roles include Encore Computer and First International Computer. He was a Senior Industry Analyst at Dataquest (Gartner Group) covering IT Professional Services for Asia-Pacific. He moved to Hong Kong as a Network Specialist and later MIS Manager at Imagineering/Tech Pacific. He holds a Bachelor of Science in Electronics and Communications Engineering degree and is a certified PICK programmer.
