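Do you have questions about what synthetic data is, or whether synthetic data is right for you? In this tutorial, we're going to cover what synthetic data is, when you should use it, how it compares with encryption and anonymization, and how to try it on your own data.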
Let's start with a definition: synthetic data is artificial data that is designed to be similar in structure to real data, but without any sensitive information. It is generated based on a model of how we expect the data to be distributed. Since it does not include any real data, it eliminates the need for masking or subsetting, and can be used safely for testing and sharing.
With Gradio, synthetic data is generated based on the real data that you collect. This is valuable because you keep getting value from the data you collect while still preserving privacy (more on that ahead!).
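To make the idea of "a model of how we expect the data to be distributed" concrete, here is a minimal sketch in plain Python. It is not Gradio's API or method: the records, the function names (`fit_column_model`, `sample_synthetic`), and the choice of one independent normal distribution per column are all assumptions made for illustration.

```python
import random
import statistics

# Hypothetical "real" records: (age, annual_spend). Illustrative values only.
real_rows = [(34, 1200.0), (45, 2300.0), (29, 800.0), (52, 3100.0), (41, 1900.0)]


def fit_column_model(rows):
    """Estimate a (mean, standard deviation) pair for each numeric column."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample_synthetic(model, n, seed=0):
    """Draw n synthetic rows, sampling each column from its own normal distribution."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in model) for _ in range(n)]


# Fit the toy model to the real rows, then generate rows that contain no real record.
model = fit_column_model(real_rows)
synthetic_rows = sample_synthetic(model, n=1000)
```

The synthetic rows share the per-column mean and spread of the real rows, so simple aggregate analyses give similar answers. Because this sketch samples each column independently, it discards correlations between columns (for example, between age and spend), and it offers no formal privacy guarantee. A production generator would need to model the joint structure of the data.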
When you need to work with sensitive data, you should be using synthetic data instead of real data.
For example, if you are sharing customer data with your development team, there is a risk that the data might leak, compromising your customers' privacy. Such exposures carry financial and reputational risks that your company can easily avoid by using synthetic data, which Gradio can generate on demand to mimic the structure and characteristics of real data. Here are some common reasons why a company like yours might use synthetic data:
Is synthetic data the only tool for protecting sensitive data? No. You may be familiar with encryption, in which your data is converted into ciphertext that hides the information's true meaning from anyone without the key. The main limitation of encryption is that the data must be decrypted before it can be used, which makes it awkward for downstream analysis or machine learning and exposes the original values again once they are decrypted. Encryption is a good tool for data at rest, not data at work.
Alternatively, there are classical anonymization methods that remove certain kinds of sensitive information or metadata from data. While sometimes useful, anonymization is a blunt instrument: it often removes much of the value of the information as well. Synthetic data can often provide a more fine-grained approach that allows you to preserve the value in information while still mitigating privacy and security-related risks.
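The sketch below illustrates how removing sensitive fields can also remove analytic value. It is a toy example: the records, the `SENSITIVE` set, and the helper names (`suppress`, `mean_spend_by`) are hypothetical, and column suppression is only one simple form of anonymization.

```python
from collections import defaultdict

# Hypothetical customer records. Illustrative values only.
records = [
    {"name": "A. Jones", "zip": "94103", "spend": 120.0},
    {"name": "B. Smith", "zip": "94103", "spend": 80.0},
    {"name": "C. Lee", "zip": "10001", "spend": 200.0},
]

# Fields treated as sensitive in this example (an assumption).
SENSITIVE = {"name", "zip"}


def suppress(rows, sensitive):
    """Anonymize by dropping every sensitive field from every record."""
    return [{k: v for k, v in row.items() if k not in sensitive} for row in rows]


def mean_spend_by(rows, key):
    """Average spend per group, or None if the grouping field is missing."""
    groups = defaultdict(list)
    for row in rows:
        if key not in row:
            return None  # the grouping field was removed, so the analysis is impossible
        groups[row[key]].append(row["spend"])
    return {group: sum(values) / len(values) for group, values in groups.items()}


anonymized = suppress(records, SENSITIVE)
by_zip_original = mean_spend_by(records, "zip")       # per-region averages are available
by_zip_anonymized = mean_spend_by(anonymized, "zip")  # the regional analysis is lost
```

Once the `zip` field is suppressed, any per-region analysis is no longer possible, even though only the overall spend figures remain exposed. That trade-off is the motivation for generating synthetic values for sensitive fields instead of deleting them.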
As mentioned earlier, one of the key advantages of synthetic data is that it allows you to preserve the value of your machine learning or analytics pipelines while simply swapping out your original sensitive data for synthetic data. In a variety of experiments, synthetic data has consistently preserved more of that value than other anonymization methods.
Will synthetic data work for you? The best way to find out is to try our simple APIs yourself. Our team is here to show you additional experiments and help you try synthetic data on your own dataset.