NaN-handling Classes
These classes handle datasets that contain NaN values, in the same spirit as NumPy's nan* functions (np.nansum, np.nanmean, np.nanmin, np.nanmax). Each statistic is computed over the non-NaN values only.
BatchNanSum
- class batchstats.nanstats.BatchNanSum(axis=0)
Class for calculating the sum of batches of data that can contain NaN values.
The algorithm is a simple cumulative sum, ignoring NaN values.
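The cumulative idea described above can be sketched with plain NumPy. This is only an illustration under stated assumptions: `running_nansum` is a hypothetical helper, not part of batchstats, and the library's actual implementation may differ.

```python
import numpy as np


def running_nansum(batches, axis=0):
    """Hypothetical helper: accumulate a NaN-ignoring sum over batches."""
    total = None
    for batch in batches:
        # np.nansum treats NaN entries as zero within the batch
        partial = np.nansum(batch, axis=axis)
        # add the per-batch sum to the running total
        total = partial if total is None else total + partial
    return total


data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])
result = running_nansum([data1, data2])
```

Only the running total is kept between batches, so memory use does not grow with the number of batches seen.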
    import numpy as np
    from batchstats import BatchNanSum

    # create some data with NaNs
    data1 = np.array([[1, 2], [3, np.nan]])
    data2 = np.array([[5, 6], [np.nan, 8]])

    # create a BatchNanSum object
    bns = BatchNanSum()
    # update with the first batch
    bns.update_batch(data1)
    # update with the second batch
    bns.update_batch(data2)
    # get the sum
    total_sum = bns()

    # verify the result
    expected_sum = np.array([9., 16.])
    np.testing.assert_allclose(total_sum, expected_sum)

Example with multiple axes and data > 2 dimensions

    # create some 3d data with NaNs
    data1 = np.arange(24).reshape(2, 3, 4).astype(float)
    data1[0, 1, 1] = np.nan
    data2 = np.arange(24, 48).reshape(2, 3, 4).astype(float)
    data2[1, 2, 0] = np.nan

    # create a BatchNanSum object to sum over the last two axes
    bns = BatchNanSum(axis=(1, 2))
    # update with the first batch
    bns.update_batch(data1)
    # update with the second batch
    bns.update_batch(data2)
    # get the sum
    total_sum = bns()

    # verify the result
    d = np.concatenate((data1, data2))
    expected_sum = np.nansum(d, axis=(1,2))
    np.testing.assert_allclose(total_sum, expected_sum)

- __call__()
Calculate the sum of the batches that can contain NaN values.
- Returns:
Sum of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0)
Initialize the BatchNanSum object.
- update_batch(batch)
Update the sum with a new batch of data that can contain NaN values.
- Parameters:
batch (numpy.ndarray) – Input batch.
- Returns:
Updated BatchNanSum object.
- Return type:
BatchNanSum
BatchNanMean
- class batchstats.nanstats.BatchNanMean(axis=0)
Class for calculating the mean of batches of data that can contain NaN values.
The algorithm uses BatchNanSum to compute the sum and the number of valid samples, then divides the sum by the number of samples to get the mean.
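The sum-and-count approach described above can be sketched with plain NumPy. The helper name `running_nanmean` is hypothetical and the code is an illustration of the idea, not the library's implementation.

```python
import numpy as np


def running_nanmean(batches, axis=0):
    """Hypothetical helper: NaN-ignoring mean from a running sum and count."""
    total = None
    count = None
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        # sum of the valid (non-NaN) entries in this batch
        partial_sum = np.nansum(batch, axis=axis)
        # number of valid (non-NaN) entries in this batch
        partial_count = (~np.isnan(batch)).sum(axis=axis)
        if total is None:
            total, count = partial_sum, partial_count
        else:
            total = total + partial_sum
            count = count + partial_count
    # assumes every output position has at least one valid sample
    return total / count


data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])
result = running_nanmean([data1, data2])
```

Tracking the valid-sample count separately is what makes the result match a mean over the concatenated data: a plain average of per-batch means would weight batches incorrectly when they contain different numbers of NaNs.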
    import numpy as np
    from batchstats import BatchNanMean

    # create some data with NaNs
    data1 = np.array([[1, 2], [3, np.nan]])
    data2 = np.array([[5, 6], [np.nan, 8]])

    # create a BatchNanMean object
    bnm = BatchNanMean()
    # update with the first batch
    bnm.update_batch(data1)
    # update with the second batch
    bnm.update_batch(data2)
    # get the mean
    total_mean = bnm()

    # verify the result
    expected_mean = np.array([3., 5.33333333])
    np.testing.assert_allclose(total_mean, expected_mean)

Example with multiple axes and data > 2 dimensions

    # create some 3d data with NaNs
    data1 = np.arange(24).reshape(2, 3, 4).astype(float)
    data1[0, 1, 1] = np.nan
    data2 = np.arange(24, 48).reshape(2, 3, 4).astype(float)
    data2[1, 2, 0] = np.nan

    # create a BatchNanMean object to get the mean over the last two axes
    bnm = BatchNanMean(axis=(1, 2))
    # update with the first batch
    bnm.update_batch(data1)
    # update with the second batch
    bnm.update_batch(data2)
    # get the mean
    total_mean = bnm()

    # verify the result
    d = np.concatenate((data1, data2))
    expected_mean = np.nanmean(d, axis=(1,2))
    np.testing.assert_allclose(total_mean, expected_mean)

- __call__()
Calculate the mean of the batches that can contain NaN values.
- Returns:
Mean of the batches.
- Return type:
numpy.ndarray
- __init__(axis=0)
Initialize the BatchNanMean object.
- update_batch(batch)
Update the mean with a new batch of data that can contain NaN values.
- Parameters:
batch (numpy.ndarray) – Input batch.
- Returns:
Updated BatchNanMean object.
- Return type:
BatchNanMean
BatchNanMin
- class batchstats.nanstats.BatchNanMin(axis=0)
Class for calculating the minimum of batches of data that can contain NaN values.
The algorithm keeps track of the element-wise minimum. When a new batch is added, the element-wise minimum between the current minimum and the new batch’s minimum is computed, ignoring NaNs.
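The update rule described above can be sketched with plain NumPy. `running_nanmin` is a hypothetical helper for illustration, and using `np.fmin` for the merge step is an assumption of this sketch rather than a statement about the library's internals.

```python
import numpy as np


def running_nanmin(batches, axis=0):
    """Hypothetical helper: NaN-ignoring running minimum over batches."""
    current = None
    for batch in batches:
        # minimum of this batch, skipping NaN entries
        batch_min = np.nanmin(batch, axis=axis)
        if current is None:
            current = batch_min
        else:
            # np.fmin returns the non-NaN operand when only one is NaN
            current = np.fmin(current, batch_min)
    return current


data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])
result = running_nanmin([data1, data2])
```

`np.fmin` is used for the merge instead of `np.minimum` because `np.minimum` propagates NaN, which would let a single all-NaN position in one batch overwrite a valid minimum found earlier.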
    import numpy as np
    from batchstats import BatchNanMin

    # create some data with NaNs
    data1 = np.array([[1, 2], [3, np.nan]])
    data2 = np.array([[5, 6], [np.nan, 8]])

    # create a BatchNanMin object
    bnm = BatchNanMin()
    # update with the first batch
    bnm.update_batch(data1)
    # update with the second batch
    bnm.update_batch(data2)
    # get the minimum
    total_min = bnm()

    # verify the result
    expected_min = np.array([1., 2.])
    np.testing.assert_allclose(total_min, expected_min)

Example with multiple axes and data > 2 dimensions

    # create some 3d data with NaNs
    data1 = np.arange(24).reshape(2, 3, 4).astype(float)
    data1[0, 1, 1] = np.nan
    data2 = np.arange(24, 48).reshape(2, 3, 4).astype(float)
    data2[1, 2, 0] = np.nan

    # create a BatchNanMin object to get the min over the last two axes
    bnm = BatchNanMin(axis=(1, 2))
    # update with the first batch
    bnm.update_batch(data1)
    # update with the second batch
    bnm.update_batch(data2)
    # get the min
    total_min = bnm()

    # verify the result
    d = np.concatenate((data1, data2))
    expected_min = np.nanmin(d, axis=(1,2))
    np.testing.assert_allclose(total_min, expected_min)

- __call__() → ndarray
Calculate the minimum.
- Returns:
Minimum of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0)
Initialize the BatchNanMin object.
- update_batch(batch)
Update the minimum with a new batch of data that can contain NaN values.
- Parameters:
batch (numpy.ndarray) – Input batch.
- Returns:
Updated BatchNanMin object.
- Return type:
BatchNanMin
BatchNanMax
- class batchstats.nanstats.BatchNanMax(axis=0)
Class for calculating the maximum of batches of data that can contain NaN values.
The algorithm keeps track of the element-wise maximum. When a new batch is added, the element-wise maximum between the current maximum and the new batch’s maximum is computed, ignoring NaNs.
    import numpy as np
    from batchstats import BatchNanMax

    # create some data with NaNs
    data1 = np.array([[1, 2], [3, np.nan]])
    data2 = np.array([[5, 6], [np.nan, 8]])

    # create a BatchNanMax object
    bnm = BatchNanMax()
    # update with the first batch
    bnm.update_batch(data1)
    # update with the second batch
    bnm.update_batch(data2)
    # get the maximum
    total_max = bnm()

    # verify the result
    expected_max = np.array([5., 8.])
    np.testing.assert_allclose(total_max, expected_max)

Example with multiple axes and data > 2 dimensions

    # create some 3d data with NaNs
    data1 = np.arange(24).reshape(2, 3, 4).astype(float)
    data1[0, 1, 1] = np.nan
    data2 = np.arange(24, 48).reshape(2, 3, 4).astype(float)
    data2[1, 2, 0] = np.nan

    # create a BatchNanMax object to get the max over the last two axes
    bnm = BatchNanMax(axis=(1, 2))
    # update with the first batch
    bnm.update_batch(data1)
    # update with the second batch
    bnm.update_batch(data2)
    # get the max
    total_max = bnm()

    # verify the result
    d = np.concatenate((data1, data2))
    expected_max = np.nanmax(d, axis=(1,2))
    np.testing.assert_allclose(total_max, expected_max)

- __call__() → ndarray
Calculate the maximum.
- Returns:
Maximum of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0)
Initialize the BatchNanMax object.
- update_batch(batch)
Update the maximum with a new batch of data that can contain NaN values.
- Parameters:
batch (numpy.ndarray) – Input batch.
- Returns:
Updated BatchNanMax object.
- Return type:
BatchNanMax
BatchNanPeakToPeak
- class batchstats.nanstats.BatchNanPeakToPeak(axis=0)
Class for calculating the peak-to-peak (max - min) of batches of data that can contain NaN values.
This class uses BatchNanMax and BatchNanMin internally to keep track of the element-wise maximum and minimum values. The peak-to-peak value is the difference between the maximum and minimum.
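The composition described above can be sketched with plain NumPy by tracking a running maximum and a running minimum side by side. `running_nanptp` is a hypothetical helper for illustration and does not reproduce the library's internal classes.

```python
import numpy as np


def running_nanptp(batches, axis=0):
    """Hypothetical helper: NaN-ignoring peak-to-peak over batches."""
    cur_max = None
    cur_min = None
    for batch in batches:
        # per-batch extrema, skipping NaN entries
        batch_max = np.nanmax(batch, axis=axis)
        batch_min = np.nanmin(batch, axis=axis)
        if cur_max is None:
            cur_max, cur_min = batch_max, batch_min
        else:
            # fmax/fmin keep the non-NaN operand when only one is NaN
            cur_max = np.fmax(cur_max, batch_max)
            cur_min = np.fmin(cur_min, batch_min)
    # peak-to-peak is the difference of the two tracked extrema
    return cur_max - cur_min


data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])
result = running_nanptp([data1, data2])
```

The difference is taken only at the end: the peak-to-peak of each batch cannot be combined across batches directly, whereas the two extrema can.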
    import numpy as np
    from batchstats import BatchNanPeakToPeak

    # create some data with NaNs
    data1 = np.array([[1, 2], [3, np.nan]])
    data2 = np.array([[5, 6], [np.nan, 8]])

    # create a BatchNanPeakToPeak object
    bnpp = BatchNanPeakToPeak()
    # update with the first batch
    bnpp.update_batch(data1)
    # update with the second batch
    bnpp.update_batch(data2)
    # get the peak-to-peak
    total_ptp = bnpp()

    # verify the result
    expected_ptp = np.array([4., 6.])
    np.testing.assert_allclose(total_ptp, expected_ptp)

Example with multiple axes and data > 2 dimensions

    # create some 3d data with NaNs
    data1 = np.arange(24).reshape(2, 3, 4).astype(float)
    data1[0, 1, 1] = np.nan
    data2 = np.arange(24, 48).reshape(2, 3, 4).astype(float)
    data2[1, 2, 0] = np.nan

    # create a BatchNanPeakToPeak object to get the ptp over the last two axes
    bnpp = BatchNanPeakToPeak(axis=(1, 2))
    # update with the first batch
    bnpp.update_batch(data1)
    # update with the second batch
    bnpp.update_batch(data2)
    # get the ptp
    total_ptp = bnpp()

    # verify the result
    d = np.concatenate((data1, data2))
    expected_ptp = np.nanmax(d, axis=(1,2)) - np.nanmin(d, axis=(1,2))
    np.testing.assert_allclose(total_ptp, expected_ptp)

- __call__() → ndarray
Calculate the peak-to-peak.
- Returns:
Peak-to-peak of the batches.
- Return type:
numpy.ndarray
- Raises:
NoValidSamplesError – If no valid samples are available.
- __init__(axis=0)
Initialize the BatchNanPeakToPeak object.
- update_batch(batch)
Update the peak-to-peak with a new batch of data that can contain NaN values.
- Parameters:
batch (numpy.ndarray) – Input batch.
- Returns:
Updated BatchNanPeakToPeak object.
- Return type:
BatchNanPeakToPeak