opendp.smartnoise.core.components package

Warning, this file is autogenerated by code_generation.py. Don’t modify this file manually. (Generated: 2021-01-04 16:18:07.740468)

opendp.smartnoise.core.components.abs(data, **kwargs)[source]

Abs Component

Absolute value of data.

Parameters
  • data – Atomic types must be of type float or integer.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.add(left, right, **kwargs)[source]

Add Component

Mathematical addition. Value types of arguments must match.

Parameters
  • left – Left value to add. Must be of type float or integer.

  • right – Right value to add. Must be of type float or integer.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.cast(data, atomic_type, true_label=None, lower=None, upper=None, **kwargs)[source]

Cast Component

Cast data to an atomic type.

Parameters
  • data – Data to be cast to another type.

  • true_label – Positive class (class to be mapped to true) for each column. Used only if casting to bool.

  • lower – Minimum allowable imputation value. Used only if casting to i64.

  • upper – Maximum allowable imputation value. Used only if casting to i64.

  • atomic_type – Type to which data should be cast. One of [string, int, bool, float]

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.clamp(data, lower=None, upper=None, categories=None, null_value=None, **kwargs)[source]

Clamp Component

Clamps data to the provided bounds.

If data are numeric, clamping maps elements outside the interval [lower, upper] to the closer endpoint. If data are categorical, clamping maps elements not in the categories argument to the associated null_value. Clamping categorical data sets the categories property to categories followed by null_value in the last position.

Parameters
  • data – Data to be clamped.

  • lower – Desired lower bound for each column of the data. Used only if categories is None.

  • upper – Desired upper bound for each column of the data. Used only if categories is None.

  • categories – The set of categories you want to be represented for each column of the data, or None.

  • null_value – The value to which elements not included in categories will be mapped for each column of the data. Used only if categories is not None.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Clamped data.
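
The two clamping behaviours described above can be sketched in plain Python. This is only an illustration of the semantics, not the library's implementation; the helper names clamp_numeric and clamp_categorical are hypothetical.

```python
def clamp_numeric(values, lower, upper):
    """Map each element outside [lower, upper] to the closer endpoint."""
    return [min(max(v, lower), upper) for v in values]


def clamp_categorical(values, categories, null_value):
    """Map each element that is not in `categories` to `null_value`."""
    allowed = set(categories)
    return [v if v in allowed else null_value for v in values]


print(clamp_numeric([-5, 3, 120], lower=0, upper=100))
print(clamp_categorical(["a", "z", "b"], ["a", "b"], "other"))
```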

opendp.smartnoise.core.components.column_bind(arguments)[source]

ColumnBind Component

Bind arguments as columns of an array to produce a larger array

Parameters

arguments – dictionary of arguments to supply to the function

Returns

opendp.smartnoise.core.components.count(data, distinct=False, **kwargs)[source]

Count Component

Returns the number of rows in the data.

Parameters
  • data

  • distinct – Set to true to count the number of unique members in the data instead of the number of rows.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Row count.

opendp.smartnoise.core.components.covariance(data=None, left=None, right=None, finite_sample_correction=True, **kwargs)[source]

Covariance Component

Calculate covariance.

If data argument is provided as a 2D array, calculate covariance matrix. Otherwise, left and right 1D arrays are used to calculate a cross-covariance matrix between elements of the two arrays.

Parameters
  • data – 2D data array used to construct covariance matrix.

  • left – Left data array used to calculate cross-covariance matrix. Used only if data not provided.

  • right – Right data array used to calculate cross-covariance matrix. Used only if data not provided.

  • finite_sample_correction – Whether or not to use the finite sample correction (Bessel’s correction).

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Flattened covariance or cross-covariance matrix.
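
As a rough illustration of what finite_sample_correction changes, the sketch below computes the covariance of two 1D lists in plain Python. It is not the library's code, and the function name sample_covariance is hypothetical. With the correction the sum of cross-deviations is divided by n - 1, otherwise by n.

```python
def sample_covariance(left, right, finite_sample_correction=True):
    """Covariance of two equal-length sequences of numbers."""
    if len(left) != len(right) or len(left) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(left)
    mean_left = sum(left) / n
    mean_right = sum(right) / n
    cross = sum((l - mean_left) * (r - mean_right) for l, r in zip(left, right))
    # Bessel's correction divides by n - 1 instead of n.
    return cross / (n - 1 if finite_sample_correction else n)


print(sample_covariance([1, 2, 3, 4], [2, 4, 6, 8]))
print(sample_covariance([1, 2, 3, 4], [2, 4, 6, 8], finite_sample_correction=False))
```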

opendp.smartnoise.core.components.digitize(data, edges, null_value=None, inclusive_left=True, **kwargs)[source]

Digitize Component

Maps data to bins.

Bins will be of the form [lower, upper) or (lower, upper]. The null value is the final category.

Parameters
  • data – Data to be binned.

  • edges – Values representing the edges of bins. Edges must be sorted, and may not contain duplicates.

  • null_value – Value to which to map if there is no valid bin (e.g. if the element falls outside the bin range). The null value is the final category.

  • inclusive_left – Whether or not the left edge of the bin is inclusive, i.e. the bins are of the form [lower, upper).

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns
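
The effect of inclusive_left on bin membership can be sketched with the standard bisect module. This is a hedged illustration, not the library's implementation: the helper name digitize_sketch is hypothetical, and it assumes the output is a bin index, with null_value returned when no bin applies.

```python
from bisect import bisect_left, bisect_right


def digitize_sketch(values, edges, null_value, inclusive_left=True):
    """Return the bin index of each value, or null_value if no bin applies.

    `edges` must be sorted and free of duplicates.
    """
    n_bins = len(edges) - 1
    result = []
    for v in values:
        if inclusive_left:
            # Bins are [lower, upper): a value equal to an edge opens a new bin.
            idx = bisect_right(edges, v) - 1
        else:
            # Bins are (lower, upper]: a value equal to an edge closes a bin.
            idx = bisect_left(edges, v) - 1
        result.append(idx if 0 <= idx < n_bins else null_value)
    return result


print(digitize_sketch([0, 5, 10, 20, -1], [0, 10, 20], null_value=-1))
print(digitize_sketch([0, 5, 10, 20, -1], [0, 10, 20], null_value=-1, inclusive_left=False))
```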

opendp.smartnoise.core.components.divide(left, right, **kwargs)[source]

Divide Component

Parameters
  • left – Atomic type must match right

  • right – Atomic type must match left

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.dp_count(data, lower=0, upper=None, distinct=False, mechanism='SimpleGeometric', privacy_usage=None, **kwargs)[source]

DPCount Component

Returns a differentially private row count.

Parameters
  • data

  • lower – Estimated minimum possible value of the statistic. Useful to help bound elapsed time when sampling for the geometric mechanism. Required for the snapping mechanism.

  • upper – Estimated maximum possible value of the statistic. Useful to help bound elapsed time when sampling for the geometric mechanism. Required for the snapping mechanism.

  • distinct – Set to true to count the number of unique members in the data instead of the number of rows.

  • mechanism – Privatizing mechanism to use. One of [SimpleGeometric, Laplace, Snapping, Gaussian, AnalyticGaussian]. Only SimpleGeometric is accepted if floating-point protections are enabled.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Atomic data type value must be float. Example value: {'epsilon': 0.5}

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private row count.
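
To make the role of lower, upper and epsilon concrete, here is a minimal sketch of a count perturbed with two-sided geometric noise, assuming a sensitivity of 1. It is not the library's SimpleGeometric implementation: it samples with ordinary floating-point arithmetic, which the library's mechanism is designed to avoid, so treat it as a teaching aid only. The names two_sided_geometric and noisy_count are hypothetical.

```python
import math
import random


def two_sided_geometric(epsilon, rng):
    """Integer noise with P(k) proportional to exp(-epsilon * |k|)."""
    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1]
        return math.floor(-math.log(u) / epsilon)
    # The difference of two i.i.d. geometric draws is two-sided geometric.
    return geometric() - geometric()


def noisy_count(rows, epsilon, lower=0, upper=None, rng=None):
    """Row count plus geometric noise, clamped to [lower, upper]."""
    rng = rng or random.Random()
    release = len(rows) + two_sided_geometric(epsilon, rng)
    if upper is not None:
        release = min(release, upper)
    return max(release, lower)


print(noisy_count(list(range(100)), epsilon=0.5, rng=random.Random(0)))
```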

opendp.smartnoise.core.components.dp_covariance(left=None, right=None, data=None, lower=None, upper=None, mechanism='Automatic', privacy_usage=None, finite_sample_correction=True, **kwargs)[source]

DPCovariance Component

Calculate differentially private covariance.

If data argument is provided as a 2D array, calculate covariance matrix. Otherwise, left and right 1D arrays are used to calculate a cross-covariance matrix between elements of the two arrays.

Parameters
  • left – Left data array used to calculate cross-covariance matrix. Used only if data not provided.

  • right – Right data array used to calculate cross-covariance matrix. Used only if data not provided.

  • data – 2D data array used to construct covariance matrix.

  • lower – Estimated minimum possible value of the statistic. Only useful for the snapping mechanism.

  • upper – Estimated maximum possible value of the statistic. Only useful for the snapping mechanism.

  • mechanism – Privatizing mechanism to use. One of [Laplace, Snapping, Gaussian, AnalyticGaussian]

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Atomic data type value must be float. Example value: {'epsilon': 0.5}

  • finite_sample_correction – Whether or not to use the finite sample correction (Bessel’s correction).

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Flattened covariance or cross-covariance matrix.

opendp.smartnoise.core.components.dp_gumbel_median(data, lower, upper, enforce_constant_time=True, privacy_usage=None, **kwargs)[source]

DPGumbelMedian Component

Returns differentially private estimates of the median of each column of the data.

Parameters
  • data

  • lower – Minimum candidate value for the median.

  • upper – Maximum candidate value for the median.

  • enforce_constant_time – Enforce constant time for median

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private estimates of the median of each column of the data.

opendp.smartnoise.core.components.dp_histogram(data, edges=None, categories=None, null_value=None, lower=0, upper=None, inclusive_left=True, mechanism='SimpleGeometric', privacy_usage=None, **kwargs)[source]

DPHistogram Component

Returns a differentially private histogram over user-defined categories. The final cell contains the counts for null values (outside the set of categories).

Parameters
  • data – Atomic type must be numeric.

  • edges – Set of edges to bin continuous-valued data. Used only if data are of continuous nature.

  • categories – Set of categories in data. Used only if data are of categorical nature.

  • null_value – The value to which elements not included in categories will be mapped for each column of the data. Used only if categories is not None. The null value is the final category, so counts for the null category are at the end of the vector of counts.

  • lower – Estimated minimum possible value of bin counts. Useful to help bound elapsed time when sampling for the geometric mechanism. Required for the snapping mechanism.

  • upper – Estimated maximum possible value of bin counts. Useful to help bound elapsed time when sampling for the geometric mechanism. Required for the snapping mechanism.

  • inclusive_left – Whether or not the left edge of the bin is inclusive. If true, bins are of the form [lower, upper). Otherwise, bins are of the form (lower, upper]. Used only if data are of continuous nature.

  • mechanism – Privatizing mechanism to use. One of [SimpleGeometric, Laplace, Snapping, Gaussian, AnalyticGaussian]. Only SimpleGeometric is accepted if floating-point protections are enabled.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private histogram.
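
The layout of the returned vector, with the null cell in the final position, can be sketched as follows. The snippet shows only the exact (non-private) counts that a mechanism would then perturb; it is not the library's code, and the name categorical_counts is hypothetical.

```python
def categorical_counts(values, categories):
    """Count values per category; the last cell collects everything else."""
    index = {category: i for i, category in enumerate(categories)}
    null_cell = len(categories)
    counts = [0] * (len(categories) + 1)
    for v in values:
        counts[index.get(v, null_cell)] += 1
    return counts


# "z" is not a listed category, so it is counted in the final (null) cell.
print(categorical_counts(["a", "b", "a", "z"], ["a", "b"]))
```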

opendp.smartnoise.core.components.dp_linear_regression(data_x, data_y, k=None, lower_slope=None, upper_slope=None, lower_intercept=None, upper_intercept=None, implementation='theil-sen-k-match', privacy_usage=None, **kwargs)[source]

DPLinearRegression Component

Returns differentially private estimates of the slope and intercept.

Parameters
  • data_x – Predictor variable

  • data_y – Target variable

  • k – Number of matchings. Memory usage is quadratic in k.

  • lower_slope – Estimated minimum possible value of the slope.

  • upper_slope – Estimated maximum possible value of the slope.

  • lower_intercept – Estimated minimum possible value of the intercept.

  • upper_intercept – Estimated maximum possible value of the intercept.

  • implementation – Theil-Sen implementation to use. One of [theil-sen, theil-sen-k-match]

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private estimate of the slope and intercept of the line fit to the data.
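
For readers unfamiliar with Theil-Sen, the following sketch shows the non-private estimator that the implementation option refers to: the slope is the median of all pairwise slopes. Taking the intercept as the median of y - slope * x is one common variant and an assumption here. The privatization step and the k-match sampling are not shown, and the function name theil_sen is hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Non-private Theil-Sen fit: returns (slope, intercept)."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # skip pairs with undefined slope
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


print(theil_sen([0, 1, 2, 3], [1, 3, 5, 7]))
```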

opendp.smartnoise.core.components.dp_maximum(data, candidates=None, lower=None, upper=None, mechanism='Automatic', privacy_usage=None, **kwargs)[source]

DPMaximum Component

Returns differentially private estimates of the maximum elements of each column of the data.

Parameters
  • data

  • candidates – Set from which the Exponential mechanism will return an element. Type must match with atomic type of data. This value must be column-conformable with data. Only useful for Exponential mechanism.

  • lower – Estimated minimum possible value of the statistic. Only useful for the snapping mechanism.

  • upper – Estimated maximum possible value of the statistic. Only useful for the snapping mechanism.

  • mechanism – Privatizing mechanism to use. Value must be one of [Automatic, Laplace, Snapping, Gaussian, AnalyticGaussian]

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Atomic data type value must be float. Example value: {'epsilon': 0.5}

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private estimates of the maximum elements of the data.

opendp.smartnoise.core.components.dp_mean(data, lower=None, upper=None, implementation='resize', mechanism='Automatic', privacy_usage=None, **kwargs)[source]

DPMean Component

Returns differentially private estimates of the means of each column of the data.

Parameters
  • data – Atomic type must be numeric.

  • lower – Estimated minimum possible value of the statistic. Only useful for the snapping mechanism.

  • upper – Estimated maximum possible value of the statistic. Only useful for the snapping mechanism.

  • implementation – Privatizing algorithm to use. One of [resize, plug-in]

  • mechanism – Privatizing mechanism to use. One of [Laplace, Snapping, Gaussian, AnalyticGaussian].

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Atomic data type value must be float. Example value: {'epsilon': 0.5}

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private estimate of the mean of each column of the data.
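
A minimal sketch of a Laplace-noised mean is given below to show how the data bounds and epsilon interact. It assumes a known row count n and a sensitivity of (upper - lower) / n after clamping, which is a textbook choice and not necessarily what the resize or plug-in implementations use. The names laplace_noise and noisy_mean are hypothetical, and the floating-point sampling is for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) draw as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_mean(values, lower, upper, epsilon, rng=None):
    """Mean of values clamped to [lower, upper], plus Laplace noise."""
    rng = rng or random.Random()
    clamped = [min(max(v, lower), upper) for v in values]
    n = len(clamped)
    sensitivity = (upper - lower) / n  # assumed: one row changes, n is fixed
    return sum(clamped) / n + laplace_noise(sensitivity / epsilon, rng)


print(noisy_mean([10.0, 20.0, 30.0], lower=0.0, upper=100.0, epsilon=0.5))
```

A smaller epsilon or a wider [lower, upper] interval increases the noise scale, which is why tight data bounds matter for accuracy.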

opendp.smartnoise.core.components.dp_median(data, candidates=None, lower=None, upper=None, mechanism='Automatic', privacy_usage=None, interpolation='midpoint', **kwargs)[source]

DPMedian Component

Returns differentially private estimates of the median of each column of the data.

Parameters
  • data – Atomic type must be numeric. For Gumbel mechanism, must be limited to a single column of data.

  • candidates – Set from which the Exponential mechanism will return an element. Type must match with atomic type of data. This value must be column-conformable with data. Only useful for Exponential mechanism.

  • lower – Estimated minimum possible value of the statistic. Only useful for the snapping mechanism.

  • upper – Estimated maximum possible value of the statistic. Only useful for the snapping mechanism.

  • mechanism – Privatizing mechanism to use. Value must be one of [Automatic, Exponential, Laplace, Snapping, Gaussian, AnalyticGaussian, Gumbel]. Automatic chooses Exponential if candidates provided, otherwise chooses Laplace.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. For Gumbel mechanism, must be limited to a single column of data. Atomic data type value must be float. Example value: {'epsilon': 0.5}

  • interpolation – Interpolation strategy. One of [lower, upper, midpoint, nearest, linear]

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private estimates of the median of each column of the data.

opendp.smartnoise.core.components.dp_minimum(data, candidates=None, lower=None, upper=None, mechanism='Automatic', privacy_usage=None, **kwargs)[source]

DPMinimum Component

Returns differentially private estimates of the minimum elements of each column of the data.

Parameters
  • data

  • candidates – Set from which the Exponential mechanism will return an element. Type must match with atomic type of data. This value must be column-conformable with data. Only useful for Exponential mechanism.

  • lower – Estimated minimum possible value of the statistic. Only useful for the snapping mechanism.

  • upper – Estimated maximum possible value of the statistic. Only useful for the snapping mechanism.

  • mechanism – Privatizing mechanism to use. Value must be one of [Automatic, Exponential, Laplace, Snapping, Gaussian, AnalyticGaussian]. Automatic chooses Exponential if candidates provided.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Atomic data type value must be float. Example value: {'epsilon': 0.5}

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private estimates of the minimum elements of the data.

opendp.smartnoise.core.components.dp_quantile(data, alpha, candidates=None, lower=None, upper=None, mechanism='Automatic', privacy_usage=None, interpolation='midpoint', **kwargs)[source]

DPQuantile Component

Returns differentially private estimates of specified quantiles for each column of the data.

Parameters
  • data – Atomic type must be numeric.

  • candidates – Set from which the Exponential mechanism will return an element. Type must match with atomic type of data. This value must be column-conformable with data. Only useful for Exponential mechanism.

  • lower – Estimated minimum possible value of the statistic. Only useful for the snapping mechanism.

  • upper – Estimated maximum possible value of the statistic. Only useful for the snapping mechanism.

  • alpha – Desired quantiles, defined on [0,1].

  • mechanism – Privatizing mechanism to use. Value must be one of [Automatic, Exponential, Laplace, Snapping, Gaussian, AnalyticGaussian]. Automatic chooses Exponential if candidates provided.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Atomic data type value must be float. Example value: {'epsilon': 0.5}

  • interpolation – Interpolation strategy. One of [lower, upper, midpoint, nearest, linear]

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private estimate of the quantile.
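
The interpolation options can be illustrated with a non-private quantile. The definitions below follow the convention used by NumPy (fractional position alpha * (n - 1) in the sorted data); whether the library uses exactly these definitions is an assumption. The function name quantile_sketch is hypothetical.

```python
import math


def quantile_sketch(values, alpha, interpolation="midpoint"):
    """Non-private quantile of `values` for alpha in [0, 1]."""
    if not values or not 0 <= alpha <= 1:
        raise ValueError("need non-empty data and alpha in [0, 1]")
    ordered = sorted(values)
    position = alpha * (len(ordered) - 1)  # fractional index into sorted data
    lo, hi = math.floor(position), math.ceil(position)
    below, above = ordered[lo], ordered[hi]
    if interpolation == "lower":
        return below
    if interpolation == "upper":
        return above
    if interpolation == "midpoint":
        return (below + above) / 2
    if interpolation == "nearest":
        return ordered[round(position)]
    if interpolation == "linear":
        return below + (position - lo) * (above - below)
    raise ValueError(f"unknown interpolation: {interpolation}")


for strategy in ("lower", "upper", "midpoint", "linear"):
    print(strategy, quantile_sketch([1, 2, 3, 4], 0.5, strategy))
```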

opendp.smartnoise.core.components.dp_raw_moment(data, order, lower=None, upper=None, mechanism='Automatic', privacy_usage=None, **kwargs)[source]

DPRawMoment Component

Returns differentially private sample estimate of a raw moment for each column of the data.

Parameters
  • data – Data for which you would like the kth raw moments. Atomic data type must be float.

  • lower – Estimated minimum possible value of the statistic. Only useful for the snapping mechanism.

  • upper – Estimated maximum possible value of the statistic. Only useful for the snapping mechanism.

  • order – Integer statistical moment indicator.

  • mechanism – Privatizing mechanism to use. Value must be one of [Automatic, Laplace, Snapping, Gaussian, AnalyticGaussian].

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Atomic data type value must be float. Example value: {'epsilon': 0.5}

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private sample estimate of kth raw moment for each column of the data.
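
For reference, the non-private quantity being estimated is the kth raw sample moment, the average of x**k over the column. A one-function sketch, with the hypothetical name raw_moment:

```python
def raw_moment(values, order):
    """kth raw sample moment: the mean of v ** order over the data."""
    if not values:
        raise ValueError("need at least one value")
    return sum(v ** order for v in values) / len(values)


print(raw_moment([1.0, 2.0, 3.0], order=1))  # first raw moment is the mean
print(raw_moment([1.0, 2.0, 3.0], order=2))
```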

opendp.smartnoise.core.components.dp_sum(data, lower=None, upper=None, mechanism='Automatic', privacy_usage=None, **kwargs)[source]

DPSum Component

Returns differentially private estimates of the sums of each column of the data.

Parameters
  • data

  • lower – Estimated minimum possible value of the statistic, on integral data. Useful to help bound elapsed time when sampling for the geometric mechanism. Useful for the snapping mechanism.

  • upper – Estimated maximum possible value of the statistic, on integral data. Useful to help bound elapsed time when sampling for the geometric mechanism. Useful for the snapping mechanism.

  • mechanism – Privatizing mechanism to use. Value must be one of [Automatic, Laplace, Gaussian, AnalyticGaussian, SimpleGeometric]. Automatic chooses based on the input data type.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Atomic data type value must be float. Example value: {'epsilon': 0.5}

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private sum over elements for each column of the data.

opendp.smartnoise.core.components.dp_variance(data, lower=None, upper=None, mechanism='Automatic', privacy_usage=None, finite_sample_correction=True, **kwargs)[source]

DPVariance Component

Returns a differentially private estimate of the variance for each column of the data.

Parameters
  • data

  • lower – Estimated minimum possible value of the statistic. Only useful for the snapping mechanism. Atomic data type must be float.

  • upper – Estimated maximum possible value of the statistic. Only useful for the snapping mechanism. Atomic data type must be float.

  • mechanism – Privatizing mechanism to use. Value must be one of [Laplace, Snapping, Gaussian, AnalyticGaussian].

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Atomic data type value must be float. Example value: {'epsilon': 0.5}

  • finite_sample_correction – Whether or not to use the finite sample correction (Bessel’s correction).

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Differentially private sample variance for each column of the data.

opendp.smartnoise.core.components.equal(left, right, **kwargs)[source]

Equal Component

Parameters
  • left – Atomic type must match right

  • right – Atomic type must match left

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.exponential_mechanism(utilities, candidates, sensitivity=None, privacy_usage=None, **kwargs)[source]

ExponentialMechanism Component

Returns an element from a finite set with probability relative to its utility.

Parameters
  • utilities – Respective scores for each candidate. Total number of records must match candidates.

  • candidates – Set from which the Exponential mechanism will return an element. Total number of records must match utilities.

  • sensitivity – Override the sensitivity computed by the library. Rejected unless protect_sensitivity is disabled.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Length of privacy_usage must be exactly one.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Element from the candidate set selected via the Exponential mechanism.
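
The selection rule can be sketched with the textbook form of the exponential mechanism, in which a candidate with utility u is chosen with probability proportional to exp(epsilon * u / (2 * sensitivity)). This is a hedged illustration rather than the library's implementation; the name select_candidate and the explicit epsilon and sensitivity arguments are hypothetical.

```python
import math
import random


def select_candidate(utilities, candidates, epsilon, sensitivity=1.0, rng=None):
    """Pick one candidate with probability increasing in its utility."""
    if len(utilities) != len(candidates) or not candidates:
        raise ValueError("utilities and candidates must be non-empty and match")
    rng = rng or random.Random()
    best = max(utilities)
    # Subtracting the best utility avoids overflow and leaves ratios unchanged.
    weights = [math.exp(epsilon * (u - best) / (2 * sensitivity)) for u in utilities]
    return rng.choices(candidates, weights=weights, k=1)[0]


print(select_candidate([1.0, 3.0, 2.0], ["a", "b", "c"], epsilon=0.5))
```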

opendp.smartnoise.core.components.filter(data, mask, **kwargs)[source]

Filter Component

Filters data down into only the desired rows.

Parameters
  • data

  • mask – Boolean mask giving whether or not each row should be kept. Example value: data['age'] == '4'

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Data with only the desired rows.
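
The relationship between the mask and the retained rows can be sketched in plain Python. The rows and the helper name filter_rows are hypothetical; the mask mirrors the example value above by comparing an age column to the string '4'.

```python
def filter_rows(rows, mask):
    """Keep only the rows whose mask entry is true."""
    if len(rows) != len(mask):
        raise ValueError("mask must have one entry per row")
    return [row for row, keep in zip(rows, mask) if keep]


rows = [{"age": "4"}, {"age": "7"}, {"age": "4"}]
mask = [row["age"] == "4" for row in rows]
print(filter_rows(rows, mask))
```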

opendp.smartnoise.core.components.gaussian_mechanism(data, sensitivity=None, privacy_usage=None, analytic=True, **kwargs)[source]

GaussianMechanism Component

Privatizes a result by returning it perturbed with Gaussian noise.

Parameters
  • data – Result to be released privately via the Gaussian mechanism. Atomic type must be numeric.

  • sensitivity – Override the sensitivity computed by the library. Rejected unless protect_sensitivity is disabled.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release.

  • analytic – Set to enable use of the analytic gaussian mechanism.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Original data perturbed with Gaussian noise.
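
As background for the analytic flag, the sketch below shows the classical (non-analytic) Gaussian calibration, where the standard deviation is sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon for 0 < epsilon < 1. The analytic Gaussian mechanism calibrates the noise numerically and is not shown. The function names and the explicit epsilon and delta arguments are hypothetical, and this is not the library's implementation.

```python
import math
import random


def classical_gaussian_sigma(sensitivity, epsilon, delta):
    """Noise standard deviation for the classical Gaussian mechanism."""
    if not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("classical calibration needs 0 < epsilon, delta < 1")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def add_gaussian_noise(value, sensitivity, epsilon, delta, rng=None):
    """Return value perturbed with zero-mean Gaussian noise."""
    rng = rng or random.Random()
    sigma = classical_gaussian_sigma(sensitivity, epsilon, delta)
    return value + rng.gauss(0.0, sigma)


print(classical_gaussian_sigma(sensitivity=1.0, epsilon=0.5, delta=1e-5))
```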

opendp.smartnoise.core.components.greater_than(left, right, **kwargs)[source]

GreaterThan Component

Parameters
  • left – Atomic values must be numeric and of the same type. Type must match right.

  • right – Atomic values must be numeric and of the same type. Type must match left.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.histogram(data, edges=None, categories=None, null_value=None, inclusive_left=True, **kwargs)[source]

Histogram Component

Parameters
  • data

  • edges – Set of edges to bin continuous-valued data. Used only if data are of continuous nature. Must have a value if categories not specified.

  • categories – Set of categories in data. Used only if data are of categorical nature. Must have a value if edges not specified.

  • null_value – The value to which elements not included in categories will be mapped for each column of the data. Used only if categories is not None.

  • inclusive_left – Whether or not the left edge of the bin is inclusive. If true, bins are of the form [lower, upper). Otherwise, bins are of the form (lower, upper]. Used only if data are of continuous nature.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.impute(data, lower=None, upper=None, categories=None, null_values=None, weights=None, distribution=None, shift=None, scale=None, **kwargs)[source]

Impute Component

Replaces null values with draws from a specified distribution.

If the categories argument is provided, the data are considered to be categorical regardless of atomic type, and the elements listed in null_values will be replaced with elements drawn from categories according to weights.

If the categories argument is not provided, the data are considered to be numeric and elements that are f64::NAN will be replaced according to the specified distribution.

Parameters
  • data – The data for which null values will be imputed.

  • lower – A lower bound on data elements for each column. Used only if categories is None.

  • upper – An upper bound on data elements for each column. Used only if categories is None.

  • categories – The set of categories you want to be represented for each column of the data, if the data is categorical. Atomic type must match atomic type of data.

  • null_values – The set of values that are considered null for each column of the data, if the data is categorical. Atomic type must match atomic type of data.

  • weights – Optional. The weight of each category when imputing. Uniform weights are used if not specified.

  • distribution – The distribution to be used when imputing records. Used only if categories is None.

  • shift – The expectation of the Gaussian distribution to be used for imputation. Used only if distribution is Gaussian.

  • scale – The standard deviation of the Gaussian distribution to be used for imputation. Used only if distribution is Gaussian.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Data with null values replaced by imputed values.
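
The numeric case can be sketched as replacing each NaN with a draw from a distribution bounded by lower and upper. The snippet assumes a uniform distribution and is only an illustration of the behaviour described above; the name impute_uniform is hypothetical.

```python
import math
import random


def impute_uniform(values, lower, upper, rng=None):
    """Replace NaN entries with uniform draws from [lower, upper]."""
    rng = rng or random.Random()
    return [rng.uniform(lower, upper) if math.isnan(v) else v for v in values]


print(impute_uniform([1.0, float("nan"), 3.0], lower=0.0, upper=10.0))
```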

opendp.smartnoise.core.components.index(data, names=None, indices=None, mask=None, **kwargs)[source]

Index Component

Index into data frames, partitions and arrays to retrieve homogeneously typed contiguous arrays

Parameters
  • data

  • names

  • indices

  • mask

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.laplace_mechanism(data, sensitivity=None, privacy_usage=None, **kwargs)[source]

LaplaceMechanism Component

Privatizes a result by returning it perturbed with Laplace noise.

Parameters
  • data – True value to be released privately via the Laplace mechanism.

  • sensitivity – Override the sensitivity computed by the library. Rejected unless protect_sensitivity is disabled.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Original data perturbed with Laplace noise.

opendp.smartnoise.core.components.less_than(left, right, **kwargs)[source]

LessThan Component

Parameters
  • left – Atomic type must be numeric, and match with atomic type of right.

  • right – Atomic type must be numeric, and match with atomic type of left.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.literal(**kwargs)[source]

Literal Component

Parameters

kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.log(data, base=2.71828, **kwargs)[source]

Log Component

Parameters
  • data – Atomic type must be float.

  • base

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.logical_and(left, right, **kwargs)[source]

And Component

Parameters
  • left – Left argument for the logical AND.

  • right – Right argument for the logical AND.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Logical AND of left and right.

opendp.smartnoise.core.components.logical_or(left, right, **kwargs)[source]

Or Component

The left and right arguments must share the same data type.

Parameters
  • left

  • right

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.map(arguments, component)[source]

Map Component

Apply Component to each data partition.

Parameters
  • arguments – dictionary of arguments to supply to the function

  • component

Returns

opendp.smartnoise.core.components.materialize(column_names, file_path, public=False, skip_row=True, **kwargs)[source]

Materialize Component

Load a tabular frame from a data source

Parameters
  • column_names

  • public

  • skip_row – when set, skip the first line (header) in a csv

  • file_path – Path to the file on the system. File format must be CSV.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.maximum(data, candidates=None, **kwargs)[source]

Maximum Component

Find the maximum value of each column in the data.

Parameters
  • data – Data for which you want the maximum value in each column.

  • candidates – Set from which the Exponential mechanism will return an element.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Maximum of each column in the data.

opendp.smartnoise.core.components.mean(data, **kwargs)[source]

Mean Component

Calculates the arithmetic mean of each column in the provided data.

Parameters
  • data

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Arithmetic mean for each column of the data in question.

opendp.smartnoise.core.components.median(data, candidates=None, **kwargs)[source]

Median Component

Find the median value of each column in the data.

Parameters
  • data – Data for which you want the median value in each column.

  • candidates – Set from which to compute scores for the Exponential mechanism.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Median of each column in the data.

opendp.smartnoise.core.components.minimum(data, candidates=None, **kwargs)[source]

Minimum Component

Find the minimum value of each column in the data.

Parameters
  • data – Data for which you want the minimum value in each column.

  • candidates – Set from which the Exponential mechanism will return an element.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Minimum of each column in the data.

opendp.smartnoise.core.components.modulo(left, right, **kwargs)[source]

Modulo Component

Parameters
  • left – Atomic type must be numeric. Atomic type must match right.

  • right – Atomic type must be numeric. Atomic type must match left.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.multiply(left, right, **kwargs)[source]

Multiply Component

Parameters
  • left – Atomic type must be numeric. Atomic type must match right.

  • right – Atomic type must be numeric. Atomic type must match left.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.negate(data, **kwargs)[source]

Negate Component

Parameters
  • data – Atomic type must be boolean.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.negative(data, **kwargs)[source]

Negative Component

Parameters
  • data – Atomic type must be numeric.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.partition(data, num_partitions=None, by=None, **kwargs)[source]

Partition Component

Split the rows of data either into k equally sized partitions or by the categories of a vector.

Parameters
  • data – Must be a dataframe or an array

  • num_partitions

  • by

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.power(data, radical, **kwargs)[source]

Power Component

Parameters
  • data – Atomic types must be numeric and homogeneous.

  • radical – Atomic values may not be negative.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.quantile(data, alpha, candidates=None, interpolation='midpoint', **kwargs)[source]

Quantile Component

Get values corresponding to specified quantiles for each column of the data.

Parameters
  • data – Atomic type must be numeric.

  • candidates – Set from which the Exponential mechanism will return an element. Type must match with atomic type of data. This value must be column-conformable with data.

  • alpha – Desired quantiles, defined on [0,1]. Examples: 0: min, 0.5: median, 1: max

  • interpolation – Interpolation strategy. One of [lower, upper, midpoint, nearest, linear]

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Quantile values for each column.
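
The interpolation options can be easier to compare on a small example. The sketch below computes a non-private quantile of a single column under each strategy. The function name column_quantile and the positioning rule alpha * (n - 1) are assumptions borrowed from common numerical libraries; the source does not state which convention the library uses, nor how ties are broken for nearest.

```python
import math


def column_quantile(column, alpha, interpolation="midpoint"):
    """Non-private quantile of one column, for illustration only."""
    if not column:
        raise ValueError("column must not be empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    values = sorted(column)
    position = alpha * (len(values) - 1)  # fractional index into the sorted data
    lo, hi = math.floor(position), math.ceil(position)
    fraction = position - lo
    if interpolation == "lower":
        return values[lo]
    if interpolation == "upper":
        return values[hi]
    if interpolation == "midpoint":
        return (values[lo] + values[hi]) / 2
    if interpolation == "nearest":
        # Ties (fraction == 0.5) go to the lower neighbor in this sketch.
        return values[lo] if fraction <= 0.5 else values[hi]
    if interpolation == "linear":
        return values[lo] + fraction * (values[hi] - values[lo])
    raise ValueError(f"unknown interpolation: {interpolation}")


if __name__ == "__main__":
    for strategy in ("lower", "upper", "midpoint", "nearest", "linear"):
        print(strategy, column_quantile([4, 1, 3, 2], 0.5, strategy))
```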

opendp.smartnoise.core.components.raw_moment(data, order, **kwargs)[source]

RawMoment Component

Returns sample estimate of kth raw moment for each column of the data.

Parameters
  • data – Data for which you would like the kth raw moments. Atomic data type must be float.

  • order – Indicate the kth integer statistical moment.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

kth raw sample moment for each column.
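
For reference, the kth raw sample moment of a column x_1, ..., x_n is the average of x_i raised to the power k. The sketch below shows that definition in plain Python; the function name raw_moment_column is hypothetical and the code involves no privacy mechanism.

```python
def raw_moment_column(column, order):
    """kth raw sample moment of one column: mean of x ** order."""
    if not column:
        raise ValueError("column must not be empty")
    return sum(x ** order for x in column) / len(column)


if __name__ == "__main__":
    # The first raw moment is the arithmetic mean.
    print(raw_moment_column([1.0, 2.0, 3.0], 1))
    print(raw_moment_column([1.0, 2.0, 3.0], 2))
```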

opendp.smartnoise.core.components.reshape(data, shape, symmetric=False, layout='row', **kwargs)[source]

Reshape Component

Reshapes a row vector into a matrix.

Parameters
  • data – Vector of data to stack into a matrix. An Indexmap of matrices will be emitted if multiple rows are provided.

  • symmetric – Set if data are elements from the upper triangle of a symmetric matrix.

  • layout – Consecutive elements of either the row or column reside next to each other. Note that multi-row inputs are reshaped to partitional outputs, having one matrix per partition.

  • shape – The shape of the output matrix. Dimensionality may not be greater than 2.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Reshape of data.
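
The difference between the two layout values can be sketched on a single row vector. The helper below is a hypothetical plain-Python stand-in (the name reshape_vector is not part of the library) that assumes layout='row' fills the matrix one row at a time and layout='column' fills it one column at a time. It does not cover the symmetric option or multi-row inputs.

```python
def reshape_vector(vector, shape, layout="row"):
    """Stack a flat vector into a rows x cols matrix (list of lists)."""
    rows, cols = shape
    if rows * cols != len(vector):
        raise ValueError("shape does not match the number of elements")
    if layout == "row":
        # Consecutive elements sit next to each other within a row.
        return [list(vector[r * cols:(r + 1) * cols]) for r in range(rows)]
    if layout == "column":
        # Consecutive elements sit next to each other within a column.
        return [[vector[c * rows + r] for c in range(cols)] for r in range(rows)]
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    print(reshape_vector([1, 2, 3, 4, 5, 6], (2, 3), "row"))
    print(reshape_vector([1, 2, 3, 4, 5, 6], (2, 3), "column"))
```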

opendp.smartnoise.core.components.resize(data, number_rows=None, number_columns=None, lower=None, upper=None, categories=None, weights=None, distribution=None, shift=None, scale=None, sample_proportion=None, minimum_rows=None, **kwargs)[source]

Resize Component

Resizes the data in question to be consistent with a provided sample size, n.

The library does not, in general, assume that the sample size of the data being analyzed is known. This introduces a number of problems around how to calculate statistics that are a function of the sample size.

To address this problem, the library asks the user to provide n, an estimate of the true sample size based on their own beliefs about the data or a previous differentially private count of the number of rows in the data. This component then either subsamples or appends to the data in order to make it consistent with the provided n.

Note that either the lower/upper arguments or the categories argument must be provided, or the corresponding properties must already be known on data.

Note that if a categories constraint is used, data are treated as categorical regardless of atomic type.

Parameters
  • data – The data to be resized. Atomic type of data must match atomic type of categories. If categories not populated, data are treated as numeric and any necessary imputation is done according to a continuous distribution.

  • number_rows – An estimate of the number of rows in the data. This could be the guess of the user, or the result of a DP release. Cannot be set with minimum_rows.

  • number_columns – An estimate of the number of columns in the data. This must be the guess of the user, if not previously known (optional). A non-empty value must be positive. A non-empty value is incompatible with an attempt to resize the number of columns and results in an error.

  • lower – A lower bound on data elements for each column. This value must be less than upper.

  • upper – An upper bound on data elements for each column. This value must be greater than lower.

  • categories – The set of categories you want to be represented for each column of the data, if the data is categorical. Atomic type of data must match atomic type of categories.

  • weights – Optional. The weight of each category when imputing. Uniform weights are used if not specified.

  • distribution – The distribution to be used when imputing records.

  • shift – The expectation of the Gaussian distribution used for imputation (used only if distribution = Gaussian).

  • scale – The standard deviation of the Gaussian distribution used for imputation (used only if distribution = Gaussian).

  • sample_proportion – The proportion of underlying data that may be used to construct the new data. May be > 1.

  • minimum_rows – Only add synthetic data if the actual row count is less than this number. No sampling is performed. Cannot be set with number_rows.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

A resized version of data consistent with the provided n.
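
The subsample-or-append behavior described above can be sketched for one numeric column. This is a minimal illustration under stated assumptions: the helper name resize_column is hypothetical, imputed values are drawn uniformly from [lower, upper], and options such as sample_proportion, minimum_rows, weights, and the Gaussian distribution are ignored.

```python
import random


def resize_column(column, n, lower, upper, rng=None):
    """Return a list of exactly n values built from one numeric column."""
    rng = rng or random.Random()
    if lower >= upper:
        raise ValueError("lower must be less than upper")
    if len(column) >= n:
        # Too many rows: subsample without replacement.
        return rng.sample(list(column), n)
    # Too few rows: append values imputed uniformly within the bounds.
    imputed = [rng.uniform(lower, upper) for _ in range(n - len(column))]
    return list(column) + imputed


if __name__ == "__main__":
    print(resize_column([1.0, 2.0, 3.0, 4.0, 5.0], 3, 0.0, 10.0))
    print(resize_column([1.0, 2.0], 5, 0.0, 10.0))
```

Because the output always has n rows, downstream statistics that depend on the sample size can treat n as known.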

opendp.smartnoise.core.components.row_max(left, right, **kwargs)[source]

RowMax Component

Returns the maximum of the left and right arguments, per row. Note that left and right arguments must share the same data types.

Parameters
  • left – Member data type must match that of right.

  • right – Member data type must match that of left.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.row_min(left, right, **kwargs)[source]

RowMin Component

Returns the minimum of the left and right arguments, per row. Note that left and right arguments must share the same data types.

Parameters
  • left – Member data type must match that of right.

  • right – Member data type must match that of left.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.simple_geometric_mechanism(data, lower=None, upper=None, sensitivity=None, privacy_usage=None, **kwargs)[source]

SimpleGeometricMechanism Component

Privatizes a result by returning it perturbed with Geometric noise.

Parameters
  • data – Result to be released privately via the Geometric mechanism. Member data type must be integer.

  • lower – Lower bound of the statistic to be privatized. Member data type must be integer.

  • upper – Upper bound of the statistic to be privatized. Member data type must be integer.

  • sensitivity – Override the sensitivity computed by the library. Rejected unless protect_sensitivity is disabled.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release. Values of zero or less, and values of greater than one, will result in warnings.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Original data perturbed with Geometric noise.
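
To make the idea of Geometric noise concrete, the sketch below adds two-sided geometric noise to an integer and clamps the result to [lower, upper]. The names two_sided_geometric and geometric_release are hypothetical, as are the plain epsilon argument and the default sensitivity of 1. The floating-point sampling used here is for illustration only and should not be assumed to match the library's mechanism or its safeguards.

```python
import math
import random


def two_sided_geometric(scale, rng):
    """Difference of two geometric draws with ratio exp(-1 / scale)."""
    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return math.floor(-math.log(u) * scale)
    return geometric() - geometric()


def geometric_release(value, lower, upper, epsilon, sensitivity=1, rng=None):
    """Perturb an integer with two-sided geometric noise, then clamp."""
    rng = rng or random.Random()
    noisy = value + two_sided_geometric(sensitivity / epsilon, rng)
    return min(max(noisy, lower), upper)


if __name__ == "__main__":
    print(geometric_release(42, lower=0, upper=100, epsilon=0.5))
```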

opendp.smartnoise.core.components.snapping_mechanism(data, lower=None, upper=None, binding_probability=None, sensitivity=None, privacy_usage=None, **kwargs)[source]

SnappingMechanism Component

Privatizes a result by returning it perturbed via the Snapping mechanism. This mechanism is generally intended for non-integer numerical data. Note that snapping may not operate on integers when floating-point protections are enabled. For this situation, use the geometric mechanism instead.

Parameters
  • data – Result to be released privately via the Snapping mechanism. Array members must be of type float or of type integer.

  • lower – Estimated minimum possible value of the data. Only useful for the snapping mechanism. This argument is required.

  • upper – Estimated maximum possible value of the statistic. Only useful for the snapping mechanism. This argument is required.

  • binding_probability – Upper bound on probability that final clamp binds. Must be within [0, 1).

  • sensitivity – Override the sensitivity computed by the library. Rejected unless protect_sensitivity is disabled.

  • privacy_usage – Object describing the type and amount of privacy to be used for the mechanism release.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Original data perturbed via the Snapping mechanism.

opendp.smartnoise.core.components.subtract(left, right, **kwargs)[source]

Subtract Component

Mathematical subtraction. Value types of arguments must match.

Parameters
  • left – Value from which to subtract. Must be of type float or integer.

  • right – Value to subtract. Must be of type float or integer.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.sum(data, **kwargs)[source]

Sum Component

Calculates the sum of each column of the data. Data must be of type float or integer.

Parameters
  • data – Data for which you want the sum of each column.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Sum of each column of the data.

opendp.smartnoise.core.components.theil_sen(data_x, data_y, implementation='theil-sen-k-match', k=0, **kwargs)[source]

TheilSen Component

Returns slope and intercept estimates for point pairs.

Parameters
  • data_x – Value(s) from the first coordinate axis.

  • data_y – Value(s) from the second coordinate axis.

  • implementation – Theil-Sen implementation to use. One of [theil-sen, theil-sen-k-match]

  • k – Number of trials to run for Theil-Sen K Match.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

All slope and intercept estimates for point pairs.
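
The pairwise estimates can be illustrated with the classical, non-private Theil-Sen construction: every pair of points with distinct x values yields one slope and one intercept. The helper name pairwise_estimates is hypothetical. The sketch enumerates all pairs, so it does not model the theil-sen-k-match variant or the k parameter.

```python
from itertools import combinations


def pairwise_estimates(data_x, data_y):
    """Slope and intercept of the line through every pair of points."""
    estimates = []
    for (x1, y1), (x2, y2) in combinations(zip(data_x, data_y), 2):
        if x1 == x2:
            continue  # slope is undefined for a vertical pair
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        estimates.append((slope, intercept))
    return estimates


if __name__ == "__main__":
    for slope, intercept in pairwise_estimates([0.0, 1.0, 2.0], [1.0, 3.0, 6.0]):
        print(slope, intercept)
```

A robust fit is then typically obtained by taking the median of the slopes and the median of the intercepts.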

opendp.smartnoise.core.components.to_bool(data, true_label, **kwargs)[source]

ToBool Component

Cast data to a bool atomic type.

Parameters
  • data – Data to be cast to Boolean type.

  • true_label – Positive class (class to be mapped to true) for each column.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

opendp.smartnoise.core.components.to_dataframe(data, names, **kwargs)[source]

ToDataframe Component

Name columns of an array to produce a Dataframe with the specified names. Typically used when partitioning a dataframe with preprocessed columns.

Parameters
  • data – ndarray (structured or homogeneous), Iterable, dict, or DataFrame

  • names – Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Dataframe in target language, for example pandas.DataFrame (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).

opendp.smartnoise.core.components.to_float(data, **kwargs)[source]

ToFloat Component

Cast data to a float atomic type.

Parameters
  • data – Data to be cast to float.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Array containing the converted float value(s).

opendp.smartnoise.core.components.to_int(data, lower, upper, **kwargs)[source]

ToInt Component

Cast data to an int atomic type.

Parameters
  • data – Data to be cast to integer type.

  • lower – Minimum allowable imputation value. Integers cannot represent null, so values that cannot be parsed are imputed.

  • upper – Maximum allowable imputation value.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Array containing the converted integer value(s).
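
Since integers cannot represent null, unparseable entries have to be replaced with some value between lower and upper. The sketch below shows one way this could work for a single column. The helper name to_int_column is hypothetical, and both the uniform draw and the inclusive upper bound are assumptions; the source does not say how the library picks the imputed value.

```python
import random


def to_int_column(column, lower, upper, rng=None):
    """Cast each entry to int, imputing entries that cannot be parsed."""
    rng = rng or random.Random()
    result = []
    for item in column:
        try:
            result.append(int(item))
        except (TypeError, ValueError):
            # Assumed behavior: impute uniformly from [lower, upper], inclusive.
            result.append(rng.randint(lower, upper))
    return result


if __name__ == "__main__":
    print(to_int_column(["3", "x", None, "7"], lower=0, upper=10))
```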

opendp.smartnoise.core.components.to_string(data, **kwargs)[source]

ToString Component

Cast data to a string atomic type.

Parameters
  • data – Data to be cast to string type.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Array containing the converted string value(s).

opendp.smartnoise.core.components.union(arguments, flatten=True)[source]

Union Component

Union the arrays in the arguments into one array.

Parameters
  • arguments – Dictionary of arguments to supply to the function.

  • flatten – When set, the output is an array. When unset, the output is an indexmap of arrays.

Returns

Array (or indexmap of arrays) containing item(s) representing the concatenation of all partitions.

opendp.smartnoise.core.components.variance(data, finite_sample_correction=True, **kwargs)[source]

Variance Component

Calculates the sample variance for each column of the data.

Parameters
  • data

  • finite_sample_correction – Whether or not to use the finite sample correction (Bessel’s correction) to correct the bias in the estimation of the population variance.

  • kwargs – data bounds of the form [argument]_[bound]=[lower | upper | categories | …]

Returns

Sample variance for each column of the data.
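
The effect of finite_sample_correction can be seen in a small non-private sketch: with the correction the sum of squared deviations is divided by n - 1, and without it by n. The helper name sample_variance is hypothetical and operates on a single column.

```python
def sample_variance(column, finite_sample_correction=True):
    """Variance of one column, with or without Bessel's correction."""
    n = len(column)
    if n == 0 or (finite_sample_correction and n < 2):
        raise ValueError("not enough observations")
    mean = sum(column) / n
    squared_deviations = sum((x - mean) ** 2 for x in column)
    divisor = n - 1 if finite_sample_correction else n
    return squared_deviations / divisor


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    print(sample_variance(data))
    print(sample_variance(data, finite_sample_correction=False))
```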