pybdm package

Submodules

pybdm.algorithms module

Algorithms based on BDM objects.

class pybdm.algorithms.PerturbationExperiment(bdm, X=None, metric='bdm')[source]

Bases: object

Perturbation experiment class.

A perturbation experiment studies how BDM / entropy changes under perturbations applied to the underlying dataset. This is the main tool for detecting parts of a system that have causal significance, as opposed to parts that are mere noise.

Parts which, when perturbed, yield a negative contribution to the overall complexity are likely to be important for the system, since their removal makes it more noisy. On the other hand, parts that yield a positive contribution to the overall complexity are likely to be noise, since they elongate the system’s description length.

bdm

BDM object. It has to be configured properly to handle the dataset that is to be studied.

Type

BDM

X

Dataset for perturbation analysis. May be set later.

Type

array_like (optional)

metric

Which metric to use for perturbing.

Type

{‘bdm’, ‘ent’}

See also

pybdm.bdm.BDM

BDM computations

Examples

>>> import numpy as np
>>> from pybdm import BDM, PerturbationExperiment
>>> X = np.random.randint(0, 2, (100, 100))
>>> bdm = BDM(ndim=2)
>>> pe = PerturbationExperiment(bdm, metric='bdm')
>>> pe.set_data(X)
>>> idx = np.argwhere(X) # Perturb only ones (1 --> 0)
>>> delta_bdm = pe.run(idx)
>>> len(delta_bdm) == idx.shape[0]
True

More examples can be found in Usage.
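
Following the interpretation above, the deltas returned by run() can be used directly to screen elements. A minimal sketch continuing the example (variable names come from the example above):

>>> significant = idx[delta_bdm < 0]   # perturbing these decreases complexity
>>> noise = idx[delta_bdm > 0]         # perturbing these increases complexity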

property ndim

Getter for the number of data axes.

perturb(idx, value=-1, keep_changes=False)[source]

Perturb element of the dataset.

Parameters
  • idx (tuple) – Index tuple of an element.

  • value (int or callable or None) – Value to assign. If negative, then the new value is randomly selected from the set of other possible values. For binary data this is just a bit flip, and no random number generation is involved in the process.

  • keep_changes (bool) – If True then changes in the dataset are persistent, so each perturbation step depends on the previous ones.

Returns

BDM value change.

Return type

float

Examples

>>> import numpy as np
>>> from pybdm import BDM, PerturbationExperiment
>>> bdm = BDM(ndim=1)
>>> X = np.ones((30, ), dtype=int)
>>> perturbation = PerturbationExperiment(bdm, X)
>>> perturbation.perturb((10, ), -1) 
26.91763012739709
run(idx=None, values=None, keep_changes=False)[source]

Run perturbation experiment.

Parameters
  • idx (array_like or None) – Numpy integer array providing indices (as rows) of elements to perturb. If None then all elements are perturbed.

  • values (array_like or None) – Values to assign during perturbation. Negative values correspond to changing a value to another randomly selected symbol from the alphabet. If None then all values are assigned this way. If set then its dimensions must agree with the dimensions of idx (they are horizontally stacked).

  • keep_changes (bool) – If True then changes in the dataset are persistent, so each perturbation step depends on the previous ones.

Returns

1D float array with perturbation values.

Return type

array_like

Examples

>>> import numpy as np
>>> from pybdm import BDM, PerturbationExperiment
>>> bdm = BDM(ndim=1)
>>> X = np.ones((30, ), dtype=int)
>>> perturbation = PerturbationExperiment(bdm, X)
>>> changes = np.array([10, 20])
>>> perturbation.run(changes) 
array([26.91763013, 27.34823681])
set_data(X)[source]

Set dataset for the perturbation experiment.

Parameters

X (array_like) – Dataset to perturb.

property shape

Data shape getter.

property size

Data size getter.

pybdm.bdm module

Block Decomposition Method

The BDM class provides a top-level interface for configuring an instance of the block decomposition method and for running the actual computations that approximate the algorithmic complexity of given datasets.

The configuration step is necessary for specifying the dimensionality of allowed datasets, the reference CTM data, boundary conditions for block decomposition, etc. This is why BDM is implemented in an object-oriented fashion: an instance is first configured properly and then exposes the public method BDM.bdm() for computing approximated complexity via BDM.
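
For instance, a single configured instance can be reused across many datasets of the same kind. A minimal sketch (data checking is disabled on repeated calls for speed, as described under BDM.bdm() below):

>>> import numpy as np
>>> from pybdm import BDM
>>> bdm = BDM(ndim=2)   # configured once for binary 2D data
>>> datasets = [np.random.randint(0, 2, (100, 100)) for _ in range(10)]
>>> values = [bdm.bdm(X, check_data=False) for X in datasets]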

class pybdm.bdm.BDM(ndim, nsymbols=2, shape=None, partition=<class 'pybdm.partitions.PartitionIgnore'>, ctmname=None, warn_if_missing_ctm=True, raise_if_zero=True, **kwds)[source]

Bases: object

Block decomposition method.

ndim

Number of dimensions of target dataset objects. Positive integer.

Type

int

nsymbols

Number of symbols in the alphabet.

Type

int

partition

Partition algorithm class. The class is instantiated with the shape attribute (determined automatically if not passed) and any other attributes passed via **kwds.

Type

Partition class

ctmname

Name of the CTM dataset. If None then a CTM dataset is selected automatically based on ndim and nsymbols.

Type

str

warn_if_missing_ctm

Should BDMRuntimeWarning be sent when a CTM value is missing. Some CTM values may be missing for larger alphabets, as it is computationally infeasible to explore the entire space of parts. Missing CTM values are imputed with the mean CTM complexity over all parts of a given shape. This can also be disabled globally with the global option of the same name, i.e. pybdm.options.set(warn_if_missing_ctm=False).

Type

bool

raise_if_zero

Should an error be raised if the BDM value is zero. A zero value indicates that a dataset may have incorrect dimensions. This can also be disabled globally with the global option of the same name, i.e. pybdm.options.set(raise_if_zero=False).

Type

bool

Notes

Block decomposition method in PyBDM is implemented in an object-oriented fashion. This design choice was dictated by the fact that BDM can not be applied willy-nilly to any dataset, but has to be configured for a particular type of data (e.g. binary matrices). Hence, it is more convenient to first configure and instantiate a particular instance of BDM and then apply it freely to data, instead of passing a lot of arguments at every call.

BDM also has a natural structure corresponding to the so-called split-apply-combine strategy in data analysis. First, a large dataset is decomposed into smaller blocks for which precomputed CTM values can be looked up efficiently. Then the CTM values for the blocks are aggregated in a theory-informed way into a global approximation of the complexity of the full dataset. Thus, BDM computations naturally decompose into four stages (a sketch of the full pipeline follows the list below):

  1. Partition (decomposition) stage. First, a dataset is decomposed into blocks. This is done by the decompose() method. The method itself depends on the partition attribute, which points to a pybdm.partitions object that implements and configures a particular variant of the decomposition algorithm. A detailed description of the available algorithms can be found in Theory & Design.

  2. Lookup stage. At this stage CTM values for blocks are looked up. This is when the CTM reference dataset is used. It is implemented in the lookup() method.

  3. Count stage. Unique dataset blocks are counted and arranged in an efficient data structure together with their CTM values.

  4. Aggregate stage. Final BDM value is computed based on block counter data structure.
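
The public bdm() method runs all four stages at once, but they can also be invoked step by step. A minimal sketch of the full pipeline (the counter and the final value follow from the count() and compute_bdm() examples below):

>>> import numpy as np
>>> from pybdm import BDM
>>> bdm = BDM(ndim=1)
>>> X = np.ones((24, ), dtype=int)
>>> blocks = bdm.decompose(X)          # stage 1: partition into blocks
>>> ctms = bdm.lookup(blocks)          # stage 2: look up CTM values
>>> counter = bdm.count(ctms)          # stage 3: count unique blocks
>>> value = bdm.compute_bdm(counter)   # stage 4: aggregate into one value

Here counter is the same Counter as in the count() example below, and the aggregate stage adds log2 of each block’s multiplicity to its CTM value, as can be seen in the compute_bdm() example.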

See also

pybdm.ctmdata

available CTM datasets

pybdm.partitions

available partition and boundary condition classes

bdm(X, normalized=False, check_data=True)[source]

Approximate complexity of a dataset with BDM.

Parameters
  • X (array_like) – Dataset representation as a numpy.ndarray. Number of axes must agree with the ndim attribute.

  • normalized (bool) – Should BDM be normalized to be in the [0, 1] range.

  • check_data (bool) – Should data format be checked. May be disabled to gain some speed when calling multiple times.

Returns

Approximate algorithmic complexity.

Return type

float

Raises
  • TypeError – If X is not an integer array and check_data=True.

  • ValueError – If X has more than nsymbols unique values and check_data=True.

  • ValueError – If X has symbols outside of the 0 to nsymbols-1 range and check_data=True.

  • ValueError – If computed BDM value is 0 and raise_if_zero is True.

Notes

Detailed description can be found in Theory & Design.

Examples

>>> import numpy as np
>>> from pybdm import BDM
>>> bdm = BDM(ndim=2)
>>> bdm.bdm(np.ones((12, 12), dtype=int)) 
25.176631293734488
compute_bdm(*counters)[source]

Approximate Kolmogorov complexity based on the BDM formula.

Parameters

*counters – Counter objects grouping object keys and occurrences.

Returns

Approximate algorithmic complexity.

Return type

float

Notes

Detailed description can be found in Theory & Design.

Examples

>>> from collections import Counter
>>> from pybdm import BDM
>>> bdm = BDM(ndim=1)
>>> c1 = Counter([('111111111111', 1.95207842085224e-08)])
>>> c2 = Counter([('111111111111', 1.95207842085224e-08)])
>>> bdm.compute_bdm(c1, c2) 
1.000000019520784
compute_ent(*counters)[source]

Compute block entropy from counter objects.

Parameters

*counters – Counter objects grouping object keys and occurrences.

Returns

Block entropy in base 2.

Return type

float

Examples

>>> from collections import Counter
>>> from pybdm import BDM
>>> bdm = BDM(ndim=1)
>>> c1 = Counter([('111111111111', 1.95207842085224e-08)])
>>> c2 = Counter([('000000000000', 1.95207842085224e-08)])
>>> bdm.compute_ent(c1, c2) 
1.0
count(ctms)[source]

Count unique blocks.

Parameters

ctms (sequence of 2-tuples) – Ordered 1D sequence of string keys and CTM values.

Returns

Set of unique blocks with their CTM values and numbers of occurrences.

Return type

Counter

Examples

>>> import numpy as np
>>> from pybdm import BDM
>>> bdm = BDM(ndim=1)
>>> data = np.ones((24, ), dtype=int)
>>> parts = bdm.decompose(data)
>>> ctms = bdm.lookup(parts)
>>> bdm.count(ctms) 
Counter({('111111111111', 25.610413747641715): 2})
decompose(X)[source]

Decompose a dataset into blocks.

Parameters

X (array_like) – Dataset of arbitrary dimensionality represented as a Numpy array.

Yields

array_like – Dataset blocks.

Raises

AttributeError – If blocks’ shape and dataset’s shape have different numbers of axes.

Acknowledgments

Special thanks go to Paweł Weroński for the help with the design of the non-recursive partition algorithm.

Examples

>>> import numpy as np
>>> from pybdm import BDM
>>> bdm = BDM(ndim=2, shape=(2, 2))
>>> [ x for x in bdm.decompose(np.ones((4, 3), dtype=int)) ]
[array([[1, 1],
       [1, 1]]), array([[1, 1],
       [1, 1]])]
decompose_and_count(X, lookup_ctm=True)[source]

Decompose and count blocks.

Parameters
  • X (array_like) – Dataset representation as a numpy.ndarray. Number of axes must agree with the ndim attribute.

  • lookup_ctm (bool) – Should CTM values be looked up.

Returns

Lookup table mapping 2-tuples of string keys and CTM values to numbers of occurrences.

Return type

collections.Counter

Notes

This is equivalent to calling decompose(), lookup() and count().

Examples

>>> import numpy as np
>>> from pybdm import BDM
>>> bdm = BDM(ndim=1)
>>> bdm.decompose_and_count(np.ones((12, ), dtype=int)) 
Counter({('111111111111', 25.610413747641715): 1})
ent(X, normalized=False, check_data=True)[source]

Block entropy of a dataset.

Parameters
  • X (array_like) – Dataset representation as a numpy.ndarray. Number of axes must agree with the ndim attribute.

  • normalized (bool) – Should entropy be normalized to be in the [0, 1] range.

  • check_data (bool) – Should data format be checked. May be disabled to gain some speed when calling multiple times.

Returns

Block entropy in base 2.

Return type

float

Raises
  • TypeError – If X is not an integer array and check_data=True.

  • ValueError – If X has more than nsymbols unique values and check_data=True.

  • ValueError – If X has symbols outside of the 0 to nsymbols-1 range and check_data=True.

Examples

>>> import numpy as np
>>> from pybdm import BDM
>>> bdm = BDM(ndim=2)
>>> bdm.ent(np.ones((12, 12), dtype=int)) 
0.0
lookup(blocks, lookup_ctm=True)[source]

Lookup CTM values for blocks in a reference dataset.

Parameters
  • blocks (sequence) – Ordered sequence of dataset parts.

  • lookup_ctm (bool) – Should CTM values be looked up.

Yields

tuple – 2-tuple with string representation of a dataset part and its CTM value.

Raises

KeyError – If key of an object can not be found in the reference CTM lookup table.

Warns

BDMRuntimeWarning – If warn_if_missing_ctm=True and there is no precomputed CTM value for a part during the lookup stage. This can always be disabled with the global option of the same name.

Examples

>>> import numpy as np
>>> from pybdm import BDM
>>> bdm = BDM(ndim=1)
>>> data = np.ones((12, ), dtype=int)
>>> parts = bdm.decompose(data)
>>> [ x for x in bdm.lookup(parts) ] 
[('111111111111', 25.610413747641715)]
nbdm(X, **kwds)[source]

Alias for normalized BDM.

Other arguments are passed as keywords.

See also

bdm()

BDM method

nent(X, **kwds)[source]

Alias for normalized block entropy.

Other arguments are passed as keywords.

See also

ent()

block entropy method
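
Assuming the aliases simply forward to bdm() and ent() with normalized=True, as the descriptions indicate, the following sketch should hold:

>>> import numpy as np
>>> from pybdm import BDM
>>> bdm = BDM(ndim=2)
>>> X = np.random.randint(0, 2, (12, 12))
>>> bdm.nbdm(X) == bdm.bdm(X, normalized=True)
True
>>> bdm.nent(X) == bdm.ent(X, normalized=True)
True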

pybdm.encoding module

Encoding and decoding of arrays with a fixed number of unique symbols.

While computing BDM, dataset blocks have to be encoded into simple hashable objects, such as strings or integers, for efficient lookup of CTM values from reference datasets.

Currently string-based keys are used in CTM datasets. However, this may be changed to integer keys in the future in order to lower the memory footprint.

Integer encoding can also be used for easy generation of objects of fixed dimensionality, as each such object over a fixed, finite alphabet of symbols can be uniquely mapped to an integer code.
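
A minimal sketch of both representations (the values match the function examples below): the string key is what CTM lookup tables currently use, while the integer code gives a reversible mapping:

>>> import numpy as np
>>> from pybdm.encoding import string_from_array, encode_sequence, decode_sequence
>>> x = np.array([1, 0, 0])
>>> string_from_array(x)   # string key, as used in CTM datasets
'100'
>>> encode_sequence(x)     # integer code of the same sequence
4
>>> decode_sequence(4)     # inverse mapping
array([1, 0, 0])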

pybdm.encoding.array_from_string(x, shape, cast_to=<class 'int'>)[source]

Make array from string code.

Parameters
  • x (str) – String code.

  • shape (tuple) – Desired shape of the output array.

  • cast_to (type or None) – Cast array to given type. No casting if None. Defaults to integer type.

Returns

Array encoded in the string code.

Return type

array_like

Examples

>>> from pybdm.encoding import array_from_string
>>> array_from_string('1010', shape=(4,))
array([1, 0, 1, 0])
>>> array_from_string('1000', shape=(2, 2))
array([[1, 0],
       [0, 0]])
pybdm.encoding.decode_array(code, shape, base=2, **kwds)[source]

Decode array of integer-symbols from a sequence code.

Parameters
  • code (int) – Non-negative integer.

  • shape (tuple of ints) – Expected array shape.

  • base (int) – Encoding base.

  • **kwds – Keyword arguments passed to numpy.reshape().

Returns

Numpy array.

Return type

array_like

pybdm.encoding.decode_sequence(code, base=2, min_length=None)[source]

Decode sequence from a sequence code.

Parameters
  • code (int) – Non-negative integer.

  • base (int) – Encoding base. Should be equal to the number of unique symbols in the alphabet.

  • min_length (int or None) – Minimal number of represented digits (symbols). Use the shortest representation if None.

Returns

1D Numpy array.

Return type

array_like

Examples

>>> from pybdm.encoding import decode_sequence
>>> decode_sequence(4)
array([1, 0, 0])
pybdm.encoding.encode_array(x, base=2, **kwds)[source]

Encode array of integer-symbols.

Parameters
  • x ((N, k) array_like) – Array of integer symbols.

  • base (int) – Encoding base.

  • **kwds – Keyword arguments passed to numpy.ravel().

Returns

Integer code of an array.

Return type

int
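
encode_array() flattens its input (via numpy.ravel()) and encodes the resulting sequence, so together with decode_array() it should give a shape-aware round trip. A minimal sketch of this assumption (the code follows from encode_sequence() below, since the array ravels to [1, 0, 0, 0]):

>>> import numpy as np
>>> from pybdm.encoding import encode_array, decode_array
>>> x = np.array([[1, 0], [0, 0]])
>>> encode_array(x)   # ravels to [1, 0, 0, 0], i.e. 8 in base 2
8
>>> decode_array(8, shape=(2, 2))
array([[1, 0],
       [0, 0]])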

pybdm.encoding.encode_sequence(seq, base=2)[source]

Encode sequence of integer-symbols.

Parameters
  • seq ((N, ) array_like) – Sequence of integer symbols represented as 1D Numpy array.

  • base (int) – Encoding base. Should be equal to the number of unique symbols in the alphabet.

Returns

Integer code of a sequence.

Return type

int

Raises
  • AttributeError – If seq is not 1D.

  • TypeError – If seq is not of integer type.

  • ValueError – If seq contains values which are negative or beyond the size of the alphabet (encoding base).

Examples

>>> import numpy as np
>>> from pybdm.encoding import encode_sequence
>>> encode_sequence(np.array([1, 0, 0]))
4
pybdm.encoding.normalize_array(X)[source]

Normalize array so symbols are consecutively mapped to 0, 1, 2, …

Parameters

X (array_like) – Numpy array of arbitrary dimensions.

Returns

Numpy array of the same dimensions with mapped symbols.

Return type

array_like

Examples

>>> import numpy as np
>>> from pybdm.encoding import normalize_array
>>> X = np.array([1, 2, 3], dtype=int)
>>> normalize_array(X)
array([0, 1, 2])
>>> X = np.array([[1,2],[2,1]], dtype=int)
>>> normalize_array(X)
array([[0, 1],
       [1, 0]])
pybdm.encoding.normalize_key(key)[source]

Normalize part key so symbols are consecutively mapped to 0, 1, 2, …

Parameters

key (str) – Part key as returned by string_from_array().

Returns

Normalized key with mapped symbols.

Return type

str

Examples

>>> from pybdm.encoding import normalize_key
>>> normalize_key('123')
'012'
>>> normalize_key('40524')
'01230'
pybdm.encoding.string_from_array(arr)[source]

Encode an array as a string code.

Parameters

arr ((N, k) array_like) – Numpy array.

Returns

String code of an array.

Return type

str

Examples

>>> import numpy as np
>>> from pybdm.encoding import string_from_array
>>> string_from_array(np.array([1, 0, 0]))
'100'
>>> string_from_array(np.array([[1,0], [3,4]]))
'1034'

pybdm.exceptions module

PyBDM exception and warning classes.

exception pybdm.exceptions.BDMConfigurationError[source]

Bases: AttributeError

General BDM configuration error.

exception pybdm.exceptions.BDMRuntimeWarning[source]

Bases: RuntimeWarning

General BDM related runtime warning class.

exception pybdm.exceptions.CTMDatasetNotFoundError[source]

Bases: LookupError

Missing CTM exception class.

pybdm.options module

Global package options.

pybdm.options.warn_if_missing_ctm

Should warnings for missing CTM values be sent.

Type

bool

pybdm.options.raise_if_zero

Should an error be raised in the case of a zero BDM value, which is usually indicative of malformed data.

Type

bool

pybdm.options.get(name=None)[source]

Get option value or options dict.

Parameters

name (str or None) – If None then a copy of the options dict is returned. If str then the value of the given option is returned.

See also

set()

description of the available global options

Raises

KeyError – If name does not give a proper option name.

pybdm.options.set(warn_if_missing_ctm=None, raise_if_zero=None)[source]

Set global package options.

Parameters
  • warn_if_missing_ctm (bool) – Should warnings for missing CTM values be sent.

  • raise_if_zero (bool) – Should error be raised in the case of zero BDM value, which is usually indicative of malformed data.
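
A minimal usage sketch:

>>> import pybdm
>>> pybdm.options.set(warn_if_missing_ctm=False)
>>> pybdm.options.get('warn_if_missing_ctm')
False
>>> opts = pybdm.options.get()   # copy of the full options dict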

pybdm.partitions module

Partition algorithm classes.

Partition algorithms are used during the decomposition stage of BDM (see Theory & Design and pybdm.bdm), in which datasets are sliced into blocks of appropriate sizes.

Decomposition can be done in multiple ways that handle boundaries differently. This is why partition algorithms have to be properly configured, so that it is well-specified exactly which approach is used.
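
For example, an alternative partition algorithm is selected by passing its class to the BDM constructor; extra keyword arguments such as shift are forwarded to the partition class. A sketch based on the BDM constructor signature:

>>> import numpy as np
>>> from pybdm import BDM
>>> from pybdm.partitions import PartitionCorrelated
>>> bdm = BDM(ndim=2, partition=PartitionCorrelated, shift=1)
>>> X = np.random.randint(0, 2, (100, 100))
>>> value = bdm.bdm(X)   # BDM with overlapping (correlated) decomposition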

class pybdm.partitions.PartitionCorrelated(shape, shift=1)[source]

Bases: pybdm.partitions.PartitionIgnore

Partition with the ‘correlated’ boundary condition.

shape

Part shape.

Type

tuple

shift

Shift parameter for the sliding window.

Type

int (positive)

Notes

See Theory & Design for a detailed description.

Raises

AttributeError – If shift is not positive.

decompose(X)[source]

Decompose with the ‘correlated’ boundary.

_Partition.decompose(X)

Decompose a dataset into blocks.

Parameters

X (array_like) – Dataset of arbitrary dimensionality represented as a Numpy array.

Yields

array_like – Dataset blocks.

name = 'correlated'
property params
class pybdm.partitions.PartitionIgnore(shape)[source]

Bases: pybdm.partitions._Partition

Partition with the ‘ignore’ boundary condition.

shape

Part shape.

Type

tuple

Notes

See Theory & Design for a detailed description.

decompose(X)[source]

Decompose with the ‘ignore’ boundary.

_Partition.decompose(X)

Decompose a dataset into blocks.

Parameters

X (array_like) – Dataset of arbitrary dimensionality represented as a Numpy array.

Yields

array_like – Dataset blocks.

name = 'ignore'
class pybdm.partitions.PartitionRecursive(shape, min_length=2)[source]

Bases: pybdm.partitions._Partition

Partition with the ‘recursive’ boundary condition.

shape

Part shape.

Type

tuple

min_length

Minimum part length. Non-negative. For multidimensional objects it specifies the minimum length along any single dimension.

Type

int

Notes

See Theory & Design for a detailed description.

decompose(X)[source]

Decompose with the ‘recursive’ boundary.

_Partition.decompose(X)

Decompose a dataset into blocks.

Parameters

X (array_like) – Dataset of arbitrary dimensionality represented as a Numpy array.

Yields

array_like – Dataset blocks.

name = 'recursive'
property params

pybdm.utils module

Utility functions.

pybdm.utils.decompose_dataset(X, shape, shift=0)[source]

Decompose a dataset into blocks.

Parameters
  • X (array_like) – Dataset represented as a Numpy array.

  • shape (tuple) – Slice shape.

  • shift (int) – Shift value for slicing. Non-overlapping slicing if non-positive.

Yields

array_like – Dataset blocks.

Examples

>>> import numpy as np
>>> from pybdm.utils import decompose_dataset
>>> X = np.ones((5, 3), dtype=int)
>>> [ x for x in decompose_dataset(X, (3, 3)) ]
[array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]), array([[1, 1, 1],
       [1, 1, 1]])]
pybdm.utils.get_ctm_dataset(name)[source]

Get CTM dataset by name.

This function uses a global cache, so each CTM dataset is loaded to the memory only once.

Parameters

name (str) – Name of a dataset.

Returns

CTM lookup table.

Return type

dict

Raises

ValueError – If a non-existent CTM dataset is requested.
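
A minimal sketch using one of the names returned by list_ctm_datasets() below:

>>> from pybdm.utils import get_ctm_dataset
>>> ctm = get_ctm_dataset('CTM-B2-D12')   # binary sequences of length 12
>>> isinstance(ctm, dict)
True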

pybdm.utils.iter_part_shapes(X, shape, shift=0)[source]

Iterate over part shapes induced by slicing.

Parameters
  • X (array_like) – Dataset represented as a Numpy array.

  • shape (tuple) – Slice shape.

  • shift (int) – Shift value for slicing. Non-overlapping slicing if non-positive.

Yields

tuple – Part shapes.

Examples

>>> import numpy as np
>>> from pybdm.utils import iter_part_shapes
>>> X = np.ones((5, 3), dtype=int)
>>> [ x for x in iter_part_shapes(X, (3, 3)) ]
[(3, 3), (2, 3)]
pybdm.utils.iter_slices(X, shape, shift=0)[source]

Iterate over slice indices of a dataset.

Slicing is done in a way that ensures that only pieces on the boundaries of the sliced dataset can have leftovers with regard to the specified shape.

Parameters
  • X (array_like) – Dataset represented as a Numpy array.

  • shape (tuple) – Slice shape.

  • shift (int) – Shift value for slicing. Non-overlapping slicing if non-positive.

Yields

slice – Slice indices.

Examples

>>> import numpy as np
>>> from pybdm.utils import iter_slices
>>> X = np.ones((5, 3), dtype=int)
>>> [ x for x in iter_slices(X, (3, 3)) ]
[(slice(0, 3, None), slice(0, 3, None)), (slice(3, 5, None), slice(0, 3, None))]
pybdm.utils.list_ctm_datasets()[source]

Get a list of available precomputed CTM datasets.

Examples

>>> from pybdm.utils import list_ctm_datasets
>>> list_ctm_datasets()
['CTM-B2-D12', 'CTM-B2-D4x4', 'CTM-B4-D12', 'CTM-B5-D12', 'CTM-B6-D12', 'CTM-B9-D12']
pybdm.utils.prod(seq)[source]

Product of a sequence of numbers.

Parameters

seq (sequence) – A sequence of numbers.

Returns

Product of numbers.

Return type

float or int

Notes

This is defined as:

\[\prod_{i=1}^n x_i\]
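
For example:

>>> from pybdm.utils import prod
>>> prod([2, 3, 4])
24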

Module contents

PyBDM: Block Decomposition Method

This package provides the pybdm.BDM class for computing approximated algorithmic complexity of arbitrarily large binary 1D and 2D arrays, as well as 1D arrays with 4, 5, 6 or 9 unique symbols, based on the Block Decomposition Method (BDM). The theory and the design of the package are described in Theory & Design.
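
For example, a minimal sketch for a 1D dataset over a 4-symbol alphabet (the matching CTM reference data, here CTM-B4-D12, is selected automatically from ndim and nsymbols):

>>> import numpy as np
>>> from pybdm import BDM
>>> X = np.random.randint(0, 4, (1000, ))
>>> bdm = BDM(ndim=1, nsymbols=4)
>>> value = bdm.bdm(X)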