opendp.smartnoise.synthesizers package

class opendp.smartnoise.synthesizers.QUAILSynthesizer(epsilon, dp_synthesizer, dp_classifier, target, test_size=0.2, seed=None, eps_split=0.9)

Bases: opendp.smartnoise.synthesizers.base.SDGYMBaseSynthesizer
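The epsilon and eps_split parameters suggest that the total privacy budget is divided between the DP synthesizer and the DP classifier. The helper below is a minimal sketch of such a split under sequential composition, where the two spends add up to epsilon. It assumes, without confirmation from this page, that eps_split is the fraction given to the synthesizer; `split_budget` is a hypothetical name and not part of the library.

```python
def split_budget(epsilon, eps_split=0.9):
    """Hypothetical helper: divide a total epsilon between two DP mechanisms.

    ASSUMPTION: eps_split is the synthesizer's share and the remainder goes to
    the classifier. Under sequential composition the two spends sum to epsilon.
    """
    if not 0.0 < eps_split < 1.0:
        # Either mechanism receiving zero budget would make it unusable.
        raise ValueError("eps_split must lie strictly between 0 and 1")
    synth_eps = epsilon * eps_split
    clf_eps = epsilon - synth_eps
    return synth_eps, clf_eps
```

With epsilon=3.0 and the default eps_split=0.9, this assumed split gives roughly 2.7 to the synthesizer and 0.3 to the classifier.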

fit(data, categorical_columns=None, ordinal_columns=None)

Fit the synthesizer model on the data.

Parameters
  • data (pd.DataFrame) – The data for fitting the synthesizer model.

  • categorical_columns (list[str], optional) – List of column names for categorical columns, defaults to None

  • ordinal_columns (list[str], optional) – List of column names for ordinal columns, defaults to None

Returns

None. The fitted state is stored on the synthesizer instance and used by later calls to sample().

Return type

None

sample(samples, categorical_columns=None, ordinal_columns=None)

Sample from the synthesizer model.

Parameters
  • samples (int) – The number of samples to create

  • categorical_columns (list[str], optional) – List of column names for categorical columns, defaults to None

  • ordinal_columns (list[str], optional) – List of column names for ordinal columns, defaults to None

Returns

Dataframe containing the generated data samples.

Return type

pd.DataFrame
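Both synthesizers on this page expose the fit/sample interface inherited from SDGYMBaseSynthesizer. To illustrate that call shape without the library, here is a hypothetical toy class (not QUAIL, and not part of smartnoise) that fits independent Laplace-noised column counts over a public, caller-supplied domain and samples from them. It uses lists of dicts instead of pd.DataFrame so that it runs with the standard library alone.

```python
import random
from collections import Counter


class ToyMarginalSynthesizer:
    """Hypothetical toy synthesizer mirroring only the fit/sample call shape.

    NOT part of smartnoise. Each column gets an independent Laplace-noised
    histogram over a public domain, so columns are sampled independently.
    """

    def __init__(self, epsilon, domains, seed=None):
        self.epsilon = epsilon
        self.domains = domains  # column name -> list of allowed values (public)
        self.rng = random.Random(seed)
        self.marginals = {}

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials is Laplace(0, scale).
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    def fit(self, data, categorical_columns=None, ordinal_columns=None):
        # data: list of dicts. The column lists are accepted but unused here.
        # One record changes one count per column by 1, so spending
        # epsilon / k on each of k columns totals epsilon (sequential composition).
        eps_col = self.epsilon / len(self.domains)
        for col, values in self.domains.items():
            counts = Counter(row[col] for row in data)
            # Clipping negatives to zero is post-processing and costs no budget.
            self.marginals[col] = [max(0.0, counts[v] + self._laplace(1 / eps_col))
                                   for v in values]

    def sample(self, samples, categorical_columns=None, ordinal_columns=None):
        rows = [{} for _ in range(samples)]
        for col, values in self.domains.items():
            weights = self.marginals[col]
            if sum(weights) == 0:
                weights = None  # every noisy count clipped to zero: sample uniformly
            draws = self.rng.choices(values, weights=weights, k=samples)
            for row, value in zip(rows, draws):
                row[col] = value
        return rows
```

Because each column is sampled independently, this toy discards correlations between columns.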

class opendp.smartnoise.synthesizers.MWEMSynthesizer(epsilon, q_count=400, iterations=30, mult_weights_iterations=20, splits=[], split_factor=None, max_bin_count=500, custom_bin_count={})

Bases: opendp.smartnoise.synthesizers.base.SDGYMBaseSynthesizer

fit(data, categorical_columns=None, ordinal_columns=None)

Fit the synthesizer model on the data.

Parameters
  • data (pd.DataFrame) – The data for fitting the synthesizer model.

  • categorical_columns (list[str], optional) – List of column names for categorical columns, defaults to None

  • ordinal_columns (list[str], optional) – List of column names for ordinal columns, defaults to None

Returns

None. The fitted state is stored on the synthesizer instance and used by later calls to sample().

Return type

None
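MWEM works on a histogram of the data rather than on raw rows, and mwem() returns the original histogram alongside the synthetic one. A minimal sketch of building such a histogram, assuming every column has already been discretized to integer codes 0..k-1 (the max_bin_count and custom_bin_count parameters hint that the library bins columns, but this page does not say how). `build_histogram` is a hypothetical helper, not a library function.

```python
import math


def build_histogram(records, domain_sizes):
    """Hypothetical helper: count records over every combination of column codes.

    records: iterable of tuples of integer codes, one code per column.
    domain_sizes: number of possible codes for each column.
    Returns a flat list of counts in row-major order (last column varies fastest).
    """
    counts = [0] * math.prod(domain_sizes)
    for record in records:
        index = 0
        for value, size in zip(record, domain_sizes):
            if not 0 <= value < size:
                raise ValueError(f"code {value} is outside a domain of size {size}")
            index = index * size + value  # mixed-radix encoding of the cell
        counts[index] += 1
    return counts
```

For example, build_histogram([(0, 1), (1, 2), (0, 1)], [2, 3]) returns [0, 2, 0, 0, 0, 1]. The number of cells is the product of the domain sizes, so it grows multiplicatively with each added column.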

mwem()

Runner for the MWEM algorithm. Initializes the synthetic histogram, then updates it for self.iterations rounds: in each round, a query drawn from the initialized query store is selected with the exponential mechanism, measured, and used to update the synthetic histogram via multiplicative weights.

Returns

synth_hist, self.histogram: synth_hist is the synthetic data histogram and self.histogram is the original data histogram.

Return type

np.ndarray, np.ndarray
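The loop described above can be sketched on a one-dimensional histogram with range queries. This is a hypothetical toy in the style of standard MWEM, not the smartnoise code: `toy_mwem`, the even per-round budget split, the range-query store, and the reuse of the iterations and mult_weights_iterations names are all assumptions. It also treats the total record count n as public, which is an assumption of this sketch.

```python
import math
import random


def laplace(rng, scale):
    # The difference of two i.i.d. Exponential(rate=1/scale) draws is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def toy_mwem(histogram, queries, epsilon, iterations=30,
             mult_weights_iterations=20, seed=None):
    """Hypothetical 1-D MWEM sketch, not the smartnoise implementation.

    histogram: true counts per bin (positive total, treated as public).
    queries: (lo, hi) pairs, each counting bins lo..hi-1.
    Returns (synth_hist, histogram), mirroring the documented return pair.
    """
    rng = random.Random(seed)
    n = sum(histogram)
    synth_hist = [n / len(histogram)] * len(histogram)  # uniform starting point
    eps_round = epsilon / (2 * iterations)  # per round: half to select, half to measure

    def answer(hist, query):
        lo, hi = query
        return sum(hist[lo:hi])

    measurements = []
    for _ in range(iterations):
        # Exponential mechanism: prefer queries the synthetic data answers worst.
        # A counting query has sensitivity 1; subtracting the max avoids overflow.
        scores = [abs(answer(histogram, q) - answer(synth_hist, q)) for q in queries]
        top = max(scores)
        weights = [math.exp(eps_round * (s - top) / 2) for s in scores]
        chosen = rng.choices(queries, weights=weights)[0]
        # Laplace mechanism: noisy answer to the chosen query.
        noisy = answer(histogram, chosen) + laplace(rng, 1 / eps_round)
        measurements.append((chosen, noisy))
        # Multiplicative weights: replay every measurement several times.
        for _ in range(mult_weights_iterations):
            for (lo, hi), m in measurements:
                error = m - answer(synth_hist, (lo, hi))
                for i in range(lo, hi):
                    synth_hist[i] *= math.exp(error / (2 * n))
                total = sum(synth_hist)
                synth_hist = [x * n / total for x in synth_hist]  # keep mass equal to n
    return synth_hist, histogram
```

Replaying all past measurements in the inner loop lets later updates refine earlier ones without spending extra budget, since it only reuses already-released noisy answers.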

sample(samples, categorical_columns=None, ordinal_columns=None)

Sample from the synthesizer model.

Parameters
  • samples (int) – The number of samples to create

  • categorical_columns (list[str], optional) – List of column names for categorical columns, defaults to None

  • ordinal_columns (list[str], optional) – List of column names for ordinal columns, defaults to None

Returns

Dataframe containing the generated data samples.

Return type

pd.DataFrame
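To produce records, a histogram-based synthesizer can draw cells in proportion to the synthetic counts and decode each cell back into column values. The sketch below assumes the histogram is a flat, row-major array over columns with integer codes 0..k-1. `sample_from_histogram` is a hypothetical helper, and it returns tuples rather than the pd.DataFrame documented above.

```python
import random


def sample_from_histogram(synth_hist, domain_sizes, samples, seed=None):
    """Hypothetical helper: draw records in proportion to histogram mass.

    Decodes each flat row-major cell index back into one integer code per column.
    """
    rng = random.Random(seed)
    cells = rng.choices(range(len(synth_hist)), weights=synth_hist, k=samples)
    rows = []
    for index in cells:
        codes = []
        for size in reversed(domain_sizes):  # last column varies fastest
            index, code = divmod(index, size)
            codes.append(code)
        rows.append(tuple(reversed(codes)))
    return rows
```

Sampling is post-processing of the synthetic histogram, so under standard differential privacy it consumes no additional budget.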