ELI5: What is synthetic data?

Mar 12, 2026

Synthetic data is fake information created to look and act like real information, but it doesn't come from real people or events.
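
For readers who like to see it in code, here is a minimal sketch of one simple way to make synthetic numbers that "act like" real ones. The heights in real_heights are made up for illustration, the helper name make_synthetic is hypothetical, and drawing from a bell curve (normal distribution) is an assumption; real synthetic-data tools can be far more elaborate.

```python
import random
import statistics

# Hypothetical "real" measurements (made up for this sketch): heights in cm.
real_heights = [150.0, 155.0, 160.0, 165.0, 170.0, 175.0, 180.0]


def make_synthetic(real_values, count, seed=0):
    """Draw new values from a normal distribution fitted to the real ones.

    Assumption: the real values are roughly bell-shaped.
    """
    mean = statistics.mean(real_values)
    spread = statistics.stdev(real_values)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(mean, spread) for _ in range(count)]


synthetic_heights = make_synthetic(real_heights, count=1000)

# The synthetic values belong to no real person, but their average
# should land close to the average of the real values.
print(round(statistics.mean(real_heights), 1))
print(round(statistics.mean(synthetic_heights), 1))
```

None of the 1,000 synthetic heights was measured from an actual person, yet as a group they have about the same average and spread as the real list.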

Think of it like this: Imagine you want to learn how to bake cookies, but you don't want to waste your ingredients practicing.

  • Instead of using real flour, sugar, and eggs, you use play dough.
  • The play dough stands in for synthetic data. It behaves enough like the real ingredients (real data) that you can practice your cookie-making steps without using up anything valuable.
  • Once you're good at using the play dough, you can then use real ingredients to bake real cookies.

Another example: Let's say you want to teach a computer to recognize cats in pictures.
  • Instead of showing the computer thousands of real pictures of cats (which can be hard to collect and might have privacy issues), you can create synthetic images of cats using a computer program.
  • These synthetic cats aren't real, but they look like cats.
  • The computer can learn to recognize cats using these synthetic images, and if the images are realistic enough, it can then recognize real cats in real pictures.
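
The cat example can be sketched in a few lines of code. This is a toy under loud assumptions: each "picture" is shrunk to two made-up numbers instead of real pixels, the helper names (fake_picture, centroid, looks_like_cat) are hypothetical, and the four "real" pictures are hand-written stand-ins. The computer learns only from synthetic examples and is then checked on the stand-ins.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def fake_picture(center, spread=0.1):
    """A stand-in for an image: two made-up feature values near `center`."""
    return (rng.gauss(center[0], spread), rng.gauss(center[1], spread))


# Synthetic training set: no real photos involved.
synthetic_cats = [fake_picture((0.8, 0.7)) for _ in range(200)]
synthetic_others = [fake_picture((0.3, 0.2)) for _ in range(200)]


def centroid(points):
    """Average position of a group of points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# "Training": remember where the typical cat and the typical non-cat sit.
cat_center = centroid(synthetic_cats)
other_center = centroid(synthetic_others)


def looks_like_cat(picture):
    """Say 'cat' if the picture is closer to the typical synthetic cat."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return dist2(picture, cat_center) < dist2(picture, other_center)


# Hypothetical stand-ins for real pictures, each with its true label.
real_pictures = [
    ((0.75, 0.65), True),
    ((0.85, 0.80), True),
    ((0.25, 0.30), False),
    ((0.40, 0.10), False),
]

correct = sum(looks_like_cat(p) == label for p, label in real_pictures)
print(f"{correct} of {len(real_pictures)} stand-in real pictures classified correctly")
```

This works only because the synthetic cats were built to resemble the stand-in real ones. If synthetic data looks too different from the real thing, what the computer learns may not carry over.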

So, synthetic data is a safe and often easier way to practice, train, or test things without using real information that might be expensive, difficult to get, or private. It's like using a practice dummy instead of a real person when learning self-defense.