opendp.smartnoise.sql package

class opendp.smartnoise.sql.PrivateReader(reader, metadata, epsilon_per_column=1.0, delta=1e-15, *ignore, privacy=None)[source]

Bases: opendp.smartnoise.reader.base.Reader

Executes SQL queries against tabular data sources and returns differentially private results.

PrivateReader should be created using the from_connection factory method. For example, using pyodbc:

meta = 'datasets/PUMS.yaml'
conn = pyodbc.connect(dsn)
privacy = Privacy(epsilon=0.1, delta=1/10000)
reader = PrivateReader.from_connection(conn, metadata=meta, privacy=privacy)

result = reader.execute('SELECT COUNT(*) AS n FROM PUMS.PUMS GROUP BY educ')

or using a Pandas dataframe:

meta = 'datasets/PUMS.yaml'
csv = 'datasets/PUMS.csv'
pums = pd.read_csv(csv)

privacy = Privacy(epsilon=0.1, delta=1/10000)
reader = PrivateReader.from_connection(pums, metadata=meta, privacy=privacy)

result = reader.execute('SELECT COUNT(*) AS n FROM PUMS.PUMS GROUP BY educ')
property engine

The engine being used by this private reader.

df = pd.read_csv(‘datasets/PUMS.csv’) reader = PrivateReader.from_connection(df, metadata=meta, privacy=privacy) assert(reader.engine == ‘pandas’)

execute(query_string, accuracy: bool = False)[source]

Executes a query and returns a recordset that is differentially private.

Follows ODBC and DB_API convention of consuming query as a string and returning recordset as tuples. This is useful for cases where existing DB_API clients want to swap out API calls with minimal changes.

Parameters

query_string – A query string in SQL syntax

Returns

A recordset structured as an array of tuples, where each tuple represents a row, and each item in the tuple is typed. The first row should contain column names.

execute_with_accuracy(query_string: str)[source]
execute_with_accuracy_df(query_string: str, *ignore, privacy: bool = False)[source]
classmethod from_connection(conn, *ignore, privacy, metadata, engine=None, **kwargs)[source]

Create a private reader over an established SQL connection. If engine is not passed in, the engine will be automatically detected.

Parameters
  • conn – An established database connection. Can be pyodbc, psycopg2, SparkSession, Pandas DataFrame, or Presto.

  • privacy – A Privacy object with epsilon, delta, and other privacy properties. Keyword-only.

  • metadata – The metadata describing the database. Metadata documentation is here. Keyword-only.

  • engine – Optional keyword-only argument that can be used to specify engine-specific rules if automatic detection fails. This should only be necessary when using an uncommon database or middleware.

Returns

A PrivateReader object initialized to process queries against the supplied connection, using the supplied Privacy properties.

get_budget_multiplier(query_string)[source]

Analyzes the query and tells how many differentially private mechanism invocations will be required to execute the query. SUM and COUNT both use 1, while MEAN (composed of a SUM divided by a COUNT) uses 2. VAR and STDDEV use 3.

mul = priv.get_budget_multiplier(‘SELECT AVG(age) FROM PUMS.PUMS GROUP BY sex’) assert(mul == 2)

get_privacy_cost(query_string)[source]

Estimates the epsilon and delta cost for running the given query.

parse_query_string(query_string)opendp.smartnoise._ast.ast.Query[source]

Parse a query string using this PrivateReader’s metadata, returning a Query from the AST.

reader = PrivateReader.from_connection(pums, metadata=meta, privacy=privacy) query_string = ‘SELECT STDDEV(age) AS age FROM PUMS.PUMS’ query = reader.parse_query_string(query_string) age_node = query.xpath_first(“//NamedExpression[@name=’age’]”) dot = age_node.visualize() # visualize the formula in the AST dot.render(‘age’, view=True, cleanup=True)

class opendp.smartnoise.sql.QueryParser(metadata=None)[source]

Bases: object

parse_expression(expression_string)[source]
parse_named_expressions(expression_string)[source]
parse_only(query_string)[source]
queries(query_string, metadata=None)[source]
query(query_string, metadata=None)[source]
start_parser(stream)[source]

Database Readers

class opendp.smartnoise.sql.PandasReader(df=None, metadata=None, conn=None, **kwargs)[source]

Bases: opendp.smartnoise.sql.reader.base.SqlReader

ENGINE = 'Pandas'
db_name()[source]

Get the database associated with this connection

execute(query, *ignore, accuracy: bool = False)[source]

Executes a raw SQL string against the database and returns tuples for rows. This will NOT fix the query to target the specific SQL dialect. Call execute_typed to fix dialect.

class opendp.smartnoise.sql.PostgresReader(host=None, database=None, user=None, password=None, port=None, conn=None, **kwargs)[source]

Bases: opendp.smartnoise.sql.reader.base.SqlReader

A dumb pipe that gets a rowset back from a database using a SQL string, and converts types to some useful subset

ENGINE = 'Postgres'
execute(query, *ignore, accuracy: bool = False)[source]
switch_database(dbname)[source]
class opendp.smartnoise.sql.SqlServerReader(host=None, database=None, user=None, password=None, port=None, conn=None)[source]

Bases: opendp.smartnoise.sql.reader.base.SqlReader

A dumb pipe that gets a rowset back from a database using a SQL string, and converts types to some useful subset

ENGINE = 'SqlServer'
db_name()[source]
execute(query, *ignore, accuracy: bool = False)[source]
switch_database(dbname)[source]
update_connection_string()[source]
class opendp.smartnoise.sql.PrestoReader(host=None, database=None, user=None, password=None, port=None, conn=None, **kwargs)[source]

Bases: opendp.smartnoise.sql.reader.base.SqlReader

A dumb pipe that gets a rowset back from a database using a SQL string, and converts types to some useful subset

ENGINE = 'Presto'
db_name()[source]
execute(query, *ignore, accuracy: bool = False)[source]

Executes a raw SQL string against the database and returns tuples for rows. This will NOT fix the query to target the specific SQL dialect. Call execute_typed to fix dialect.

switch_database(dbname)[source]
update_connection_string()[source]