opendp.smartnoise.core.base package

class opendp.smartnoise.core.base.Analysis(*, dynamic=True, eager=False, neighboring='substitute', group_size=1, filter_level='public', protect_floating_point=True, protect_elapsed_time=False, protect_sensitivity=True, stack_traces=True, strict_parameter_checks=False)[source]

Bases: object

Top-level class that contains a definition of privacy and collection of statistics. This class tracks cumulative privacy usage for all components within.

The dynamic flag makes the library easier to use, because multiple batches may be strung together before calling release(). However, it exposes execution to potential side-channel timing attacks. Disable it if side channels are a concern.

The eager flag makes the library easier to debug, because stack traces pass through malformed components. As a library user, it may be useful to enable eager and find a small, similar public dataset to help shape your analysis. Building an analysis with a large dataset and eager enabled is not recommended, because the analysis is re-executed for each additional node.
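The cost of eager mode on large datasets can be illustrated with a toy cost model. This is a sketch under a stated assumption, not the library's code: it assumes that adding the k-th node re-executes all k nodes present so far. The function names are hypothetical.

```python
def eager_evaluations(num_nodes: int) -> int:
    """Count node evaluations in a toy model of eager mode.

    Assumption: adding the k-th node re-executes all k nodes present so far.
    """
    return sum(range(1, num_nodes + 1))


def deferred_evaluations(num_nodes: int) -> int:
    """Count node evaluations when a single release runs after all nodes are added."""
    return num_nodes


for n in (10, 100):
    print(n, eager_evaluations(n), deferred_evaluations(n))
```

Under this model the eager count grows quadratically with the number of nodes, which is why a small public dataset is the better fit for eager debugging.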

filter_level determines what data is included in the release:

  • public: only newly released public data is included in the release

  • public_and_prior: also retains private values previously included in the release

  • all: includes all evaluations from all nodes, which is useful for system debugging

There are several arguments for enabling/disabling individual protections.

  • protect_floating_point (enabled by default):
    • if enabled, disables the runtime if the runtime was not compiled against mpfr

    • if enabled, prevents the usage of the laplace and gaussian mechanisms

    • if enabled, noise-addition statistics on floating point numbers default to the snapping mechanism

  • protect_sensitivity (enabled by default):
    • if enabled, users may not pass custom sensitivities to mechanisms

  • protect_elapsed_time (disabled by default):
    • if enabled, forces all computations to run in constant time, regardless of the private dataset

    • WARNING: this feature is still in development. Some components (like resize) may still have different execution times on neighboring datasets.

  • strict_parameter_checks (disabled by default):
    • if enabled, analyses may not consume more than epsilon=1, or a delta greater than a value proportional to the number of records

  • stack_traces (enabled by default):
    • Disable stack traces to limit the amount of private information leaked should an error be encountered.

    • This only turns off stack traces from the runtime; the rest of the library is not affected.

    • The library does not account for any epsilon consumed by errors.

Parameters
  • dynamic – flag for enabling dynamic validation

  • eager – release every time a component is added

  • neighboring – may be substitute or add_remove

  • group_size – number of individuals to protect simultaneously

  • filter_level – may be public, public_and_prior or all

  • protect_floating_point – enable for protection against floating point attacks

  • protect_elapsed_time – enable for protection against side-channel timing attacks

  • protect_sensitivity – disable to pass custom sensitivities

  • stack_traces – set to False to suppress potentially sensitive stack traces

  • strict_parameter_checks – enable this to fail when some soft privacy violations are detected

add_component(component, value=None, value_format=None, value_public=False)[source]

Every component must be contained in an analysis.

Parameters
  • component – The description of computation

  • value – Optionally, the result of the computation.

  • value_format – Optionally, the format of the result of the computation: one of array, indexmap, jagged

  • value_public – Set to True if the value is considered public

clean()[source]

Remove all nodes from the analysis that do not have public descendants with released values.

This can be helpful to clear away components that fail property checks.
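The pruning rule can be sketched on a toy graph. This is an illustration only, not the library's implementation: the graph is a plain dict mapping each node id to the ids of its input nodes, and the sketch assumes that a released public node counts as its own descendant. The function and variable names are hypothetical.

```python
from typing import Dict, List, Set


def clean(arguments: Dict[int, List[int]], released_public: Set[int]) -> Dict[int, List[int]]:
    """Return a copy of the graph keeping only nodes that feed a released public node.

    arguments maps each node id to the ids of the nodes it takes as inputs.
    Assumption: a released public node counts as its own descendant.
    """
    keep: Set[int] = set()
    stack = [node_id for node_id in released_public if node_id in arguments]
    while stack:
        node_id = stack.pop()
        if node_id in keep:
            continue
        keep.add(node_id)
        # Walk upstream: everything this node consumes has a released descendant.
        stack.extend(arguments[node_id])
    return {node_id: inputs for node_id, inputs in arguments.items() if node_id in keep}


# 0 -> 1 -> 2 (released), and 0 -> 3 (failed property checks, never released)
graph = {0: [], 1: [0], 2: [1], 3: [0]}
cleaned = clean(graph, released_public={2})
print(sorted(cleaned))  # [0, 1, 2]
```

Node 3 is dropped because nothing downstream of it was released, while nodes 0 and 1 survive because they feed node 2.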

enter()[source]

Set the current analysis as active. This allows building analyses outside of context managers, in a REPL environment. All new Components will be attributed to the entered analysis.

exit()[source]

Set the current analysis as inactive.

Components constructed after exit() will no longer be attributed to the previously active analysis.
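The attribution behaviour of enter() and exit() can be modelled with a small stand-in. ToyAnalysis and ToyComponent are hypothetical classes written for this sketch; they are not the library's classes. The sketch assumes a single module-level "active analysis" slot, whereas the real implementation may track nesting differently. Unlike the library, which requires every component to belong to an analysis, the toy allows an unattributed component so the effect of exit() is visible.

```python
from typing import List, Optional


class ToyAnalysis:
    """Hypothetical stand-in for an analysis that can be made active."""

    _active: Optional["ToyAnalysis"] = None  # the currently entered analysis, if any

    def __init__(self) -> None:
        self.components: List["ToyComponent"] = []

    def enter(self) -> None:
        ToyAnalysis._active = self

    def exit(self) -> None:
        if ToyAnalysis._active is self:
            ToyAnalysis._active = None

    # The context-manager protocol simply delegates to enter() and exit().
    def __enter__(self) -> "ToyAnalysis":
        self.enter()
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.exit()


class ToyComponent:
    """Hypothetical component that registers itself with the active analysis."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.analysis = ToyAnalysis._active
        if self.analysis is not None:
            self.analysis.components.append(self)


analysis = ToyAnalysis()
analysis.enter()                 # REPL style: no `with` block needed
inside = ToyComponent("mean")
analysis.exit()
outside = ToyComponent("count")  # no longer attributed to `analysis`

print([c.name for c in analysis.components])  # ['mean']
```

The same object works in a `with` block, since `__enter__` and `__exit__` call enter() and exit().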

plot()[source]

Visual utility for graphing the analysis. Each component is a node, and arguments are edges. networkx and matplotlib are required and must be installed separately.

print_warnings()[source]

Print internal warnings about nodes that failed while running the graph dynamically.

property privacy_usage

Compute the overall privacy usage of an analysis. This function is data agnostic. It calls the validator rust FFI with protobuf objects.

Returns

A privacy usage response
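As a rough mental model of cumulative usage, the sketch below sums per-component usages under basic sequential composition. This is an assumption made for illustration: the validator computes the real figure and may use different accounting. The dict layout with 'epsilon' and 'delta' keys and the function name are also hypothetical.

```python
from typing import Dict, Iterable


def total_privacy_usage(usages: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Sum (epsilon, delta) pairs under basic sequential composition.

    Assumption: each usage is a dict with an 'epsilon' key and an optional 'delta' key.
    """
    total = {"epsilon": 0.0, "delta": 0.0}
    for usage in usages:
        total["epsilon"] += usage["epsilon"]
        total["delta"] += usage.get("delta", 0.0)
    return total


print(total_privacy_usage([{"epsilon": 0.5}, {"epsilon": 0.25, "delta": 1e-6}]))
```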

release()[source]

Evaluate an analysis and release the differentially private results. This function touches private data. It calls the runtime rust FFI with protobuf objects.

The response is stored internally in the analysis instance, and all further components are computed in the next batch.

report()[source]

FFI helper. Generate a JSON string with a summary/report of the Analysis and Release. This function is data agnostic. It calls the validator rust FFI with protobuf objects.

Returns

parsed JSON array of summaries of releases

update_properties(component_ids=None, suppress_warnings=False)[source]

If new nodes have been added or there has been a release, recompute the properties for all of the components.

validate()[source]

Check if an analysis is differentially private, given a set of released values. This function is data agnostic. It calls the validator rust FFI with protobuf objects.

Returns

A success or failure response

class opendp.smartnoise.core.base.Component(name: str, arguments: Optional[dict] = None, options: Optional[dict] = None, constraints: Optional[dict] = None, value=None, value_format=None, value_public=False)[source]

Bases: object

Representation for the most atomic computation. There are helpers to construct these in components.py.

The helper functions return instances of this class. This class facilitates accessing releases, extending the graph, and viewing static properties.

Many components are linked together to form an analysis graph.

Parameters
  • name – The id of the component. A list of component ids is available at: https://opendifferentialprivacy.github.io/smartnoise-core/doc/smartnoise_validator/docs/components/index.html

  • arguments – Inputs to the component that come from prior nodes on the graph.

  • options – Inputs to the component that are passed directly via protobuf.

  • constraints – Additional modifiers on data inputs, like data_lower, or left_categories.

  • value – A value that is already known about the data, to be stored in the release.

  • value_format – The format of the value, one of array, jagged, indexmap

property actual_privacy_usage

If a component is designed to potentially consume less privacy budget than it was allotted, this provides the reduced value.

Returns

A privacy usage

property categories

view the statically derived category set

property data_type

view the statically derived data type

property dimensionality

view the statically derived dimensionality (number of axes)

from_accuracy(value, alpha)[source]

Retrieve the privacy usage necessary such that the true value differs from the estimate by at most “value amount” with (1 - alpha)100% confidence

get_accuracy(alpha, privacy_usage=None)[source]

Retrieve the accuracy for the values released by the component. The true value differs from the estimate by at most “accuracy amount” with (1 - alpha)100% confidence.
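The relationship between accuracy, alpha and privacy usage can be worked through for one specific case. The sketch below assumes plain Laplace noise with scale sensitivity / epsilon, for which P(|noise| > a) = exp(-a * epsilon / sensitivity). It is not the library's conversion: the result depends on the mechanism in use, and with protect_floating_point enabled the library defaults to the snapping mechanism, so its figures will differ. The function names are hypothetical.

```python
import math


def laplace_accuracy(sensitivity: float, epsilon: float, alpha: float) -> float:
    """Half-width a such that P(|noise| > a) = alpha for Laplace noise of scale sensitivity / epsilon."""
    return (sensitivity / epsilon) * math.log(1.0 / alpha)


def laplace_epsilon(sensitivity: float, accuracy: float, alpha: float) -> float:
    """Inverse of laplace_accuracy: the epsilon that achieves the requested accuracy."""
    return sensitivity * math.log(1.0 / alpha) / accuracy


accuracy = laplace_accuracy(sensitivity=1.0, epsilon=0.5, alpha=0.05)
print(round(accuracy, 3))
print(round(laplace_epsilon(sensitivity=1.0, accuracy=accuracy, alpha=0.05), 3))
```

The two functions are inverses of each other, mirroring how get_accuracy maps a privacy usage to an accuracy and from_accuracy maps an accuracy back to a privacy usage.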

get_parents()[source]

List all nodes that use this node as a dependency/argument.

Returns

{[node_id]: [parent]}

property lower

view the statically derived lower bound on the data

property nullity

view the statically derived nullity property on the data

property num_columns

view the statically derived number of columns

property num_records

view the statically derived number of records

static of(value, value_format=None, public=True) → Optional[opendp.smartnoise.core.base.Component][source]

Given an array, list of lists, or dictionary, attempt to wrap it in a component and place the value in the release. Loose literals are public by default. This is an alternative constructor for a Literal; the value is not wrapped in a Literal component if it is None or already a Component.

Parameters
  • value – The value to be wrapped.

  • value_format – must be one of array, indexmap, jagged

  • public – Loose literals are by default public.

Returns

A Literal component with the value attached to the parent analysis’ release.
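The wrap-or-pass-through behaviour described above can be sketched with a stand-in class. ToyComponent is hypothetical and written for this illustration; it does not attach anything to a release and it omits value_format. It only mirrors the three documented cases: None, an existing component, and a loose literal.

```python
from typing import Any, Optional


class ToyComponent:
    """Hypothetical stand-in for a graph component."""

    def __init__(self, name: str, value: Any = None, value_public: bool = False) -> None:
        self.name = name
        self.value = value
        self.value_public = value_public

    @staticmethod
    def of(value: Any, public: bool = True) -> Optional["ToyComponent"]:
        if value is None:
            return None                      # nothing to wrap
        if isinstance(value, ToyComponent):
            return value                     # already a component: pass through
        return ToyComponent("Literal", value=value, value_public=public)


existing = ToyComponent("Mean")
literal = ToyComponent.of([1, 2, 3])
print(ToyComponent.of(None), ToyComponent.of(existing) is existing, literal.name)
```

This pattern lets calling code accept either raw values or components without checking the type at every call site.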

property partition_keys

property properties

view protobuf representing all known properties of the component

property releasable

check if the data from this component is releasable/public

set(value)[source]

property upper

view the statically derived upper bound on the data

property value

Retrieve the released values from the analysis’ release. If this returns None, then this node is not releasable.

Returns

The value stored in the release corresponding to this node

class opendp.smartnoise.core.base.Dataset(*, path=None, skip_row=True, column_names=None, value=None, value_format=None, value_type=None, value_columns=None, public=False)[source]

Bases: object

Datasets represent a single tabular resource. Datasets are assumed to be private, and may be loaded from csv files or as literal arrays.

Parameters
  • path – Path to a csv file on the filesystem. It is assumed that the csv file is well-formed.

  • skip_row – Set to True if the first row is the csv header. The csv header is always ignored.

  • column_names – Alternatively, the set of column names in the data resource.

  • value – Alternatively, a literal value/array to pass via protobuf

  • value_format – If ambiguous, the data format of the value (either array, indexmap or jagged)

  • public – Whether to flag the data in the dataset as public. Data is private by default.