Core Classes
These are the standard classes for computing statistics on datasets. If the data contains NaN values, every sample (row) that contains one is removed in its entirety.
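The row-removal rule described above can be pictured with plain NumPy. This is a minimal sketch, not the library's code: `drop_nan_rows` is a hypothetical helper name, and it assumes 2D input with samples along axis 0.

```python
import numpy as np

def drop_nan_rows(batch):
    """Return only the rows of a 2D batch that contain no NaN (hypothetical helper)."""
    batch = np.asarray(batch, dtype=float)
    valid = ~np.isnan(batch).any(axis=1)  # True for rows without any NaN
    return batch[valid]

batch = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, 6.0]])
clean = drop_nan_rows(batch)  # the middle row is discarded as a whole
```

Note that the valid value 4.0 in the discarded row no longer contributes to the second column, which is the practical consequence of removing whole samples.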
- class batchstats.BatchSum(axis=0)
Class for calculating the sum of batches of data.
The algorithm is a simple cumulative sum: each time a new batch is added, its sum is added to the running total.
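The cumulative-sum idea can be sketched in a few lines of NumPy. `RunningSum` is a hypothetical illustration that reduces over axis 0 only; it is not the batchstats implementation and skips NaN handling.

```python
import numpy as np

class RunningSum:
    """Illustrative accumulator (hypothetical, not the batchstats implementation)."""

    def __init__(self):
        self.total = None  # no batch seen yet

    def update(self, batch):
        batch_sum = np.asarray(batch).sum(axis=0)
        # the first batch initializes the total, later batches are added to it
        self.total = batch_sum if self.total is None else self.total + batch_sum
        return self

rs = RunningSum()
rs.update(np.array([[1, 2], [3, 4]])).update(np.array([[5, 6], [7, 8]]))
```

Only the running total is stored, so memory use does not grow with the number of batches.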
    import numpy as np
    from batchstats import BatchSum

    # create some data
    data1 = np.array([[1, 2], [3, 4]])
    data2 = np.array([[5, 6], [7, 8]])

    # create a BatchSum object
    bs = BatchSum()

    # update with the first batch
    bs.update_batch(data1)

    # update with the second batch
    bs.update_batch(data2)

    # get the sum
    total_sum = bs()

    # verify the result
    expected_sum = np.array([16, 20])
    np.testing.assert_allclose(total_sum, expected_sum)

Example with multiple axes and data > 2 dimensions

    # create some 3d data
    data1 = np.arange(24).reshape(2, 3, 4)
    data2 = np.arange(24, 48).reshape(2, 3, 4)

    # create a BatchSum object to sum over the last two axes
    bs = BatchSum(axis=(1, 2))

    # update with the first batch
    bs.update_batch(data1)

    # update with the second batch
    bs.update_batch(data2)

    # get the sum
    total_sum = bs()

    # verify the result
    expected_sum = data1.sum(axis=(1, 2)) + data2.sum(axis=(1, 2))
    np.testing.assert_allclose(total_sum, expected_sum)

- __call__() → ndarray
Calculate the sum.
- Returns:
Sum of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0)
- class batchstats.BatchMean(axis=0)
Class for calculating the mean of batches of data.
The algorithm uses an incremental mean calculation. The new mean is computed from the previous mean, the new data, and the number of samples.
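One common way to write such an incremental update is shown below. This is a sketch under assumptions: `update_mean` is a hypothetical function, samples lie along axis 0, and the library's actual formula may be arranged differently.

```python
import numpy as np

def update_mean(mean, n, batch):
    """Merge a batch into a running mean; returns (new_mean, new_n). Hypothetical helper."""
    batch = np.asarray(batch, dtype=float)
    m = batch.shape[0]                # number of samples in the new batch
    batch_mean = batch.mean(axis=0)
    if n == 0:                        # first batch: nothing to merge with
        return batch_mean, m
    total = n + m
    # shift the old mean toward the batch mean, weighted by the batch's share
    return mean + (batch_mean - mean) * (m / total), total

mean, n = None, 0
for batch in (np.array([[1, 2], [3, 4]]), np.array([[5, 6], [7, 8]])):
    mean, n = update_mean(mean, n, batch)
```

Writing the update as a correction to the previous mean avoids keeping a large running sum, which helps when values are big or batches are numerous.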
    import numpy as np
    from batchstats import BatchMean

    # create some data
    data1 = np.array([[1, 2], [3, 4]])
    data2 = np.array([[5, 6], [7, 8]])

    # create a BatchMean object
    bm = BatchMean()

    # update with the first batch
    bm.update_batch(data1)

    # update with the second batch
    bm.update_batch(data2)

    # get the mean
    total_mean = bm()

    # verify the result
    expected_mean = np.array([4., 5.])
    np.testing.assert_allclose(total_mean, expected_mean)

Example with multiple axes and data > 2 dimensions

    # create some 3d data
    data1 = np.arange(24).reshape(2, 3, 4)
    data2 = np.arange(24, 48).reshape(2, 3, 4)

    # create a BatchMean object to get the mean over the last two axes
    bm = BatchMean(axis=(1, 2))

    # update with the first batch
    bm.update_batch(data1)

    # update with the second batch
    bm.update_batch(data2)

    # get the mean
    total_mean = bm()

    # verify the result
    d = np.concatenate((data1, data2))
    expected_mean = d.mean(axis=(1, 2))
    np.testing.assert_allclose(total_mean, expected_mean)

- __call__() → ndarray
Calculate the mean.
- Returns:
Mean of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0)
- class batchstats.BatchVar(axis=0, ddof=0)
Class for calculating the variance of batches of data.
The algorithm is an implementation of Welford’s online algorithm for computing variance. It is numerically stable and avoids a two-pass approach.
- Parameters:
ddof (int, optional) – Delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of elements. Defaults to 0.
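A batch-wise form of Welford's update (the pairwise merge usually attributed to Chan et al.) can be sketched as follows. The function name `update_moments` and the state layout `(n, mean, m2)` are assumptions made for illustration, not the library's internals.

```python
import numpy as np

def update_moments(n, mean, m2, batch):
    """Merge a batch into (count, mean, sum of squared deviations). Hypothetical helper."""
    batch = np.asarray(batch, dtype=float)
    m = batch.shape[0]
    b_mean = batch.mean(axis=0)
    b_m2 = ((batch - b_mean) ** 2).sum(axis=0)  # squared deviations within the batch
    if n == 0:                                  # first batch initializes the state
        return m, b_mean, b_m2
    total = n + m
    delta = b_mean - mean
    new_mean = mean + delta * (m / total)
    # within-group terms plus a correction for the gap between the two means
    new_m2 = m2 + b_m2 + delta ** 2 * (n * m / total)
    return total, new_mean, new_m2

n, mean, m2 = 0, None, None
for batch in (np.array([[1, 2], [3, 4]]), np.array([[5, 6], [7, 8]])):
    n, mean, m2 = update_moments(n, mean, m2, batch)

ddof = 0
variance = m2 / (n - ddof)  # divisor is N - ddof
```

Because the squared deviations are always taken around a mean, the update avoids the cancellation that affects the naive "sum of squares minus square of sums" formula.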
    import numpy as np
    from batchstats import BatchVar

    # create some data
    data1 = np.array([[1, 2], [3, 4]])
    data2 = np.array([[5, 6], [7, 8]])

    # create a BatchVar object
    bv = BatchVar()

    # update with the first batch
    bv.update_batch(data1)

    # update with the second batch
    bv.update_batch(data2)

    # get the variance
    total_var = bv()

    # verify the result
    expected_var = np.array([5., 5.])
    np.testing.assert_allclose(total_var, expected_var)

Example with multiple axes and data > 2 dimensions

    # create some 3d data
    data1 = np.arange(24).reshape(2, 3, 4)
    data2 = np.arange(24, 48).reshape(2, 3, 4)

    # create a BatchVar object to get the var over the last two axes
    bv = BatchVar(axis=(1, 2))

    # update with the first batch
    bv.update_batch(data1)

    # update with the second batch
    bv.update_batch(data2)

    # get the var
    total_var = bv()

    # verify the result
    d = np.concatenate((data1, data2))
    expected_var = d.var(axis=(1, 2))
    np.testing.assert_allclose(total_var, expected_var)

- __call__() → ndarray
Calculate the variance.
- Returns:
Variance of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0, ddof=0)
- class batchstats.BatchStd(axis=0, ddof=0)
Class for calculating the standard deviation of batches of data.
This class uses BatchVar internally and takes the square root of the result.
- Parameters:
ddof (int, optional) – Delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of elements. Defaults to 0.
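The relationship between the variance, `ddof`, and the standard deviation can be checked with plain NumPy. The snippet below is only an illustration of the N - ddof divisor on a small array; the variable names are hypothetical.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
n = data.shape[0]
m2 = ((data - data.mean(axis=0)) ** 2).sum(axis=0)  # sum of squared deviations

std_population = np.sqrt(m2 / (n - 0))  # ddof=0: divide by N
std_sample = np.sqrt(m2 / (n - 1))      # ddof=1: divide by N - 1
```

With `ddof=0` the result matches `data.std(axis=0)`, and with `ddof=1` it matches `data.std(axis=0, ddof=1)`.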
    import numpy as np
    from batchstats import BatchStd

    # create some data
    data1 = np.array([[1, 2], [3, 4]])
    data2 = np.array([[5, 6], [7, 8]])

    # create a BatchStd object
    bs = BatchStd()

    # update with the first batch
    bs.update_batch(data1)

    # update with the second batch
    bs.update_batch(data2)

    # get the standard deviation
    total_std = bs()

    # verify the result
    expected_std = np.array([2.23606798, 2.23606798])
    np.testing.assert_allclose(total_std, expected_std)

Example with multiple axes and data > 2 dimensions

    # create some 3d data
    data1 = np.arange(24).reshape(2, 3, 4)
    data2 = np.arange(24, 48).reshape(2, 3, 4)

    # create a BatchStd object to get the std over the last two axes
    bs = BatchStd(axis=(1, 2))

    # update with the first batch
    bs.update_batch(data1)

    # update with the second batch
    bs.update_batch(data2)

    # get the std
    total_std = bs()

    # verify the result
    d = np.concatenate((data1, data2))
    expected_std = d.std(axis=(1, 2))
    np.testing.assert_allclose(total_std, expected_std)

- __call__() → ndarray
Calculate the standard deviation.
- Returns:
Standard deviation of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0, ddof=0)
- update_batch(batch, assume_valid=False)
Update the standard deviation with a new batch of data.
- Parameters:
batch (numpy.ndarray) – Input batch.
assume_valid (bool, optional) – If True, assumes all elements in the batch are valid. Default is False.
- Returns:
Updated BatchStd object.
- Return type:
BatchStd
- class batchstats.BatchCov(ddof=0)
Class for calculating the covariance of batches of data.
The algorithm is an implementation of an online covariance calculation. It is numerically stable and avoids a two-pass approach.
- Parameters:
ddof (int, optional) – Delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of elements. Defaults to 0.
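An online covariance can be maintained with the same merge idea used for the variance, applied to a co-moment matrix. The sketch below is an assumption about the general technique, not the library's code; `update_comoment` and its state tuple are hypothetical names, and passing the same batch for both inputs yields the ordinary covariance matrix.

```python
import numpy as np

def update_comoment(n, mean_x, mean_y, c, x, y):
    """Merge paired batches x, y into (count, mean_x, mean_y, co-moment). Hypothetical helper."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.shape[0]
    bx, by = x.mean(axis=0), y.mean(axis=0)
    bc = (x - bx).T @ (y - by)          # co-moment within the batch
    if n == 0:                          # first batch initializes the state
        return m, bx, by, bc
    total = n + m
    dx, dy = bx - mean_x, by - mean_y
    # within-group co-moments plus a correction for the shift between means
    c = c + bc + np.outer(dx, dy) * (n * m / total)
    return total, mean_x + dx * (m / total), mean_y + dy * (m / total), c

data1 = np.array([[1, 2], [3, 4]])
data2 = np.array([[5, 6], [7, 8]])

n, mean_x, mean_y, c = 0, None, None, None
for batch in (data1, data2):
    n, mean_x, mean_y, c = update_comoment(n, mean_x, mean_y, c, batch, batch)

ddof = 0
cov = c / (n - ddof)  # divisor is N - ddof
```

Passing two different arrays with the same number of rows produces a cross-covariance matrix of shape (features of x, features of y).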
    import numpy as np
    from batchstats import BatchCov

    # create some data
    data1 = np.array([[1, 2], [3, 4]])
    data2 = np.array([[5, 6], [7, 8]])

    # create a BatchCov object
    bc = BatchCov()

    # update with the first batch
    bc.update_batch(data1)

    # update with the second batch
    bc.update_batch(data2)

    # get the covariance
    total_cov = bc()

    # verify the result
    expected_cov = np.array([[5., 5.], [5., 5.]])
    np.testing.assert_allclose(total_cov, expected_cov)

- __call__() → ndarray
Calculate the covariance.
- Returns:
Covariance of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(ddof=0)
- update_batch(batch1, batch2=None, assume_valid=False)
Update the covariance with new batches of data.
- Parameters:
batch1 (numpy.ndarray) – Input batch 1.
batch2 (numpy.ndarray, optional) – Input batch 2. Default is None.
assume_valid (bool, optional) – If True, assumes all elements in the batches are valid. Default is False.
- Returns:
Updated BatchCov object.
- Return type:
BatchCov
- class batchstats.BatchMin(axis=0)
Class for calculating the minimum of batches of data.
The algorithm keeps track of the element-wise minimum. When a new batch is added, the element-wise minimum of the current minimum and the new batch’s minimum is computed.
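In plain NumPy, the update described above amounts to reducing each batch and then combining with `np.minimum`. This is a minimal sketch over axis 0 with hypothetical variable names, not the library's implementation.

```python
import numpy as np

current_min = None
for batch in (np.array([[1, 8], [3, 4]]), np.array([[5, 6], [7, 2]])):
    batch_min = batch.min(axis=0)  # minimum within the batch, per column
    # the first batch initializes the state, later batches are combined element-wise
    current_min = batch_min if current_min is None else np.minimum(current_min, batch_min)
```

The same pattern with `max` and `np.maximum` tracks a running maximum.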
    import numpy as np
    from batchstats import BatchMin

    # create some data
    data1 = np.array([[1, 8], [3, 4]])
    data2 = np.array([[5, 6], [7, 2]])

    # create a BatchMin object
    bm = BatchMin()

    # update with the first batch
    bm.update_batch(data1)

    # update with the second batch
    bm.update_batch(data2)

    # get the minimum
    total_min = bm()

    # verify the result
    expected_min = np.array([1, 2])
    np.testing.assert_allclose(total_min, expected_min)

Example with multiple axes and data > 2 dimensions

    # create some 3d data
    data1 = np.arange(24).reshape(2, 3, 4)
    data2 = np.arange(24, 48).reshape(2, 3, 4)

    # create a BatchMin object to get the min over the last two axes
    bm = BatchMin(axis=(1, 2))

    # update with the first batch
    bm.update_batch(data1)

    # update with the second batch
    bm.update_batch(data2)

    # get the min
    total_min = bm()

    # verify the result
    expected_min = np.minimum(data1.min(axis=(1, 2)), data2.min(axis=(1, 2)))
    np.testing.assert_allclose(total_min, expected_min)

- __call__() → ndarray
Calculate the minimum.
- Returns:
Minimum of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0)
- class batchstats.BatchMax(axis=0)
Class for calculating the maximum of batches of data.
The algorithm keeps track of the element-wise maximum. When a new batch is added, the element-wise maximum of the current maximum and the new batch’s maximum is computed.
    import numpy as np
    from batchstats import BatchMax

    # create some data
    data1 = np.array([[1, 8], [3, 4]])
    data2 = np.array([[5, 6], [7, 2]])

    # create a BatchMax object
    bm = BatchMax()

    # update with the first batch
    bm.update_batch(data1)

    # update with the second batch
    bm.update_batch(data2)

    # get the maximum
    total_max = bm()

    # verify the result
    expected_max = np.array([7, 8])
    np.testing.assert_allclose(total_max, expected_max)

Example with multiple axes and data > 2 dimensions

    # create some 3d data
    data1 = np.arange(24).reshape(2, 3, 4)
    data2 = np.arange(24, 48).reshape(2, 3, 4)

    # create a BatchMax object to get the max over the last two axes
    bm = BatchMax(axis=(1, 2))

    # update with the first batch
    bm.update_batch(data1)

    # update with the second batch
    bm.update_batch(data2)

    # get the max
    total_max = bm()

    # verify the result
    expected_max = np.maximum(data1.max(axis=(1, 2)), data2.max(axis=(1, 2)))
    np.testing.assert_allclose(total_max, expected_max)

- __call__() → ndarray
Calculate the maximum.
- Returns:
Maximum of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0)
- class batchstats.BatchPeakToPeak(axis=0)
Class for calculating the peak-to-peak (max - min) of batches of data.
This class uses BatchMax and BatchMin internally to keep track of the element-wise maximum and minimum values. The peak-to-peak value is the difference between the maximum and minimum.
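The reason for tracking both extremes, rather than a running peak-to-peak value, is that per-batch ranges cannot be merged on their own. The following NumPy sketch (hypothetical variable names, axis 0 only) illustrates the idea without using the library.

```python
import numpy as np

data1 = np.array([[1, 8], [3, 4]])
data2 = np.array([[5, 6], [7, 2]])

lo = hi = None
for batch in (data1, data2):
    b_lo, b_hi = batch.min(axis=0), batch.max(axis=0)
    lo = b_lo if lo is None else np.minimum(lo, b_lo)  # running element-wise minimum
    hi = b_hi if hi is None else np.maximum(hi, b_hi)  # running element-wise maximum

ptp = hi - lo  # peak-to-peak is only computed when the result is requested
```

Here each batch has a per-column range of `[2, 4]`, yet the overall range is `[6, 6]`, so the two running extremes are the state that has to be kept.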
    import numpy as np
    from batchstats import BatchPeakToPeak

    # create some data
    data1 = np.array([[1, 8], [3, 4]])
    data2 = np.array([[5, 6], [7, 2]])

    # create a BatchPeakToPeak object
    bpp = BatchPeakToPeak()

    # update with the first batch
    bpp.update_batch(data1)

    # update with the second batch
    bpp.update_batch(data2)

    # get the peak-to-peak
    total_ptp = bpp()

    # verify the result
    expected_ptp = np.array([6, 6])
    np.testing.assert_allclose(total_ptp, expected_ptp)

Example with multiple axes and data > 2 dimensions

    # create some 3d data
    data1 = np.arange(24).reshape(2, 3, 4)
    data2 = np.arange(24, 48).reshape(2, 3, 4)

    # create a BatchPeakToPeak object to get the ptp over the last two axes
    bpp = BatchPeakToPeak(axis=(1, 2))

    # update with the first batch
    bpp.update_batch(data1)

    # update with the second batch
    bpp.update_batch(data2)

    # get the ptp
    total_ptp = bpp()

    # verify the result
    d = np.concatenate((data1, data2))
    expected_ptp = d.max(axis=(1, 2)) - d.min(axis=(1, 2))
    np.testing.assert_allclose(total_ptp, expected_ptp)

- __call__() → ndarray
Calculate the peak-to-peak.
- Returns:
Peak-to-peak of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0)
- update_batch(batch, assume_valid=False)
Update the peak-to-peak with a new batch of data.
- Parameters:
batch (numpy.ndarray) – Input batch.
assume_valid (bool, optional) – If True, assumes all elements in the batch are valid. Default is False.
- Returns:
Updated BatchPeakToPeak object.
- Return type:
BatchPeakToPeak