Differential privacy is the gold-standard definition of privacy protection. The SmartNoise project connects theoretical solutions from the academic community with practical lessons learned from real-world deployments, to make differential privacy broadly accessible. Specifically, we provide people who work with sensitive data the following building blocks, with implementations based on vetted and mature differential privacy research:
A pluggable, open-source library of differentially private algorithms and mechanisms for releasing privacy-preserving queries and statistics.
APIs for defining an analysis and a validator for evaluating these analyses and composing the total privacy loss on a dataset.
An SDK with tools that allow researchers and analysts to:
Use a SQL dialect to create differentially private results over tabular data stores
Host a service that composes queries from heterogeneous differential privacy modules (including non-SQL ones) against a shared privacy budget
Perform stochastic testing of privacy algorithms against differential privacy modules
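To illustrate the kind of mechanism these building blocks are built around, here is a minimal, self-contained sketch of the Laplace mechanism in plain Python. This is an illustration only, not SmartNoise's Rust implementation or its API:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Release true_value perturbed with Laplace(sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5                      # uniform on [-0.5, 0.5)
    # Inverse-CDF sample from the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# A counting query has sensitivity 1: adding or removing one record
# changes the true count by at most 1.
ages = [23, 45, 31, 62, 38, 27, 51]
dp_count = laplace_mechanism(len(ages), sensitivity=1.0, epsilon=1.0,
                             rng=random.Random(0))
```

Smaller epsilon means a larger noise scale and therefore stronger privacy at the cost of accuracy; the noise is zero-mean, so noisy counts are unbiased.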
The mechanisms library provides a fast, memory-safe native runtime for validating and running differentially private analyses.
Differentially private computations are specified as an analysis graph that can be validated and executed to produce differentially private releases of data. Releases include metadata about accuracy of outputs and the complete privacy cost of the analysis.
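To make the idea concrete, here is a toy sketch of an analysis as a collection of noisy computations whose privacy losses add up under basic sequential composition. The class and method names below are hypothetical illustrations, not the SmartNoise API:

```python
import math
import random

class Analysis:
    """Toy analysis graph (illustration only; not the SmartNoise API)."""

    def __init__(self, rng=None):
        self.nodes = []              # (name, true_value, sensitivity, epsilon)
        self.rng = rng or random.Random()

    def dp_count(self, data, epsilon):
        # Counting queries have sensitivity 1.
        self.nodes.append(("count", float(len(data)), 1.0, epsilon))

    def dp_sum(self, data, lower, upper, epsilon):
        # Clamping bounds each record's contribution, fixing the sensitivity.
        clamped = [min(max(x, lower), upper) for x in data]
        sens = float(max(abs(lower), abs(upper)))
        self.nodes.append(("sum", float(sum(clamped)), sens, epsilon))

    def total_epsilon(self):
        # Basic sequential composition: privacy losses add up.
        return sum(eps for _, _, _, eps in self.nodes)

    def release(self):
        # Execute every node, perturbing each result with Laplace noise.
        out = {}
        for name, value, sens, eps in self.nodes:
            scale = sens / eps
            u = self.rng.random() - 0.5
            out[name] = value - scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        return out

# Build and release a small analysis over a toy dataset.
ages = [23, 45, 31, 62, 38]
analysis = Analysis(random.Random(1))
analysis.dp_count(ages, epsilon=0.5)
analysis.dp_sum(ages, lower=0, upper=100, epsilon=0.5)
release = analysis.release()
```

Validating the graph before execution (checking bounds, sensitivities, and the composed budget) is what lets the total privacy cost be reported alongside the release.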
We recommend reading the Microsoft SmartNoise Differential Privacy Machine Learning Case Studies, which further describes applications of differential privacy using SmartNoise.
The SmartNoise mechanisms as well as the validator and runtime are written in Rust to take advantage of this language’s memory safety. In practice, SmartNoise users work with the SmartNoise Python package which includes the core Rust functionality as well as an SDK.
The diagram below shows how the SmartNoise GitHub repositories interconnect.
smartnoise-sdk - The SDK includes tools built upon the Python bindings.
In addition, the smartnoise-samples repository includes a set of example notebooks, ranging from demonstrations of basic functionality and utility to creating a synthetic dataset with high utility for machine learning.
In practice, full SmartNoise functionality is available through a single Python package, compatible with Python 3.6 through 3.8:
pip install opendp-smartnoise
The best way to get started with SmartNoise is to review and try the examples in the smartnoise-samples repository, which include:
Sample Analysis Notebooks - In addition to a brief tutorial, there are examples of histograms, differentially private covariance, how dataset size and privacy-loss parameter selection impact utility, and working with unknown dataset sizes.
Attack Notebooks - Walk-throughs of how SmartNoise mitigates basic attacks as well as a database reconstruction attack.
SQL Data Access - Code examples and notebooks showing how to issue SQL queries against CSV files, database engines, and Spark clusters.
SmartNoise Whitepaper Demo Notebooks - Based on the whitepaper Microsoft SmartNoise Differential Privacy Machine Learning Case Studies, these notebooks demonstrate supervised machine learning with differential privacy, creating a synthetic dataset with high utility for machine learning, creating DP releases with histograms, and protecting against a reidentification attack.
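As a stdlib-only sketch of the idea behind the SQL data access examples, the snippet below runs an aggregate query against an in-memory SQLite table and perturbs the result with Laplace noise. The `dp_sql_count` helper is hypothetical and assumes the query is a COUNT with sensitivity 1; SmartNoise's actual SQL layer instead parses the query, derives sensitivities, and rewrites the plan:

```python
import math
import random
import sqlite3

# Stand-in for a CSV-backed data store: a small table in in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?)",
                 [(a,) for a in (23, 45, 31, 62, 38, 27)])

def dp_sql_count(conn, sql, epsilon, rng=random):
    """Run an aggregate SQL query and perturb the result with Laplace noise.

    Hypothetical helper: assumes the query is a COUNT (sensitivity 1).
    """
    true_value = conn.execute(sql).fetchone()[0]
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

noisy_count = dp_sql_count(conn, "SELECT COUNT(*) FROM people",
                           epsilon=1.0, rng=random.Random(0))
```

The same pattern generalizes to other engines: the true aggregate comes from the store, and only the noisy value is released to the analyst.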