NaN-handling Classes

These classes compute statistics over batches of data that may contain NaN values. Like NumPy’s nan* functions (np.nansum, np.nanmean, np.nanmin, np.nanmax), they ignore NaN entries instead of letting them propagate into the result.
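For reference, the snippet below shows how NumPy’s own nan* functions treat NaNs on a small array, including a column that has no valid value at all. It uses NumPy only and makes no claim about how batchstats handles such a column.

```python
import warnings

import numpy as np

x = np.array([[1.0, np.nan],
              [3.0, np.nan]])

# NaNs are skipped; column 1 contains no valid value at all
col_sum = np.nansum(x, axis=0)  # [4.0, 0.0]: an all-NaN slice sums to 0
col_max = np.nanmax(x[:, 0])    # 3.0

with warnings.catch_warnings():
    # nanmean/nanmin return NaN for an all-NaN slice and emit a RuntimeWarning
    warnings.simplefilter("ignore", RuntimeWarning)
    col_mean = np.nanmean(x, axis=0)  # [2.0, nan]
    col_min = np.nanmin(x, axis=0)    # [1.0, nan]
```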

class batchstats.BatchNanSum(axis=0)

Class for calculating the sum of batches of data that can contain NaN values.

The algorithm is a simple cumulative sum over all batches, with NaN values ignored.
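The following is a minimal NumPy sketch that illustrates the idea. RunningNanSum is a hypothetical name, and this is not the batchstats implementation. The sketch reduces each batch over axis with np.nansum and adds the partial results together. It does not reproduce NoValidSamplesError: it raises a plain ValueError only when no batch has been added yet.

```python
import numpy as np


class RunningNanSum:
    """Hypothetical sketch of a NaN-ignoring cumulative sum (not the batchstats code)."""

    def __init__(self, axis=0):
        self.axis = axis
        self.total = None  # running sum, shaped like one reduced batch

    def update_batch(self, batch):
        # Reduce the batch over `axis`; np.nansum treats NaN entries as 0
        partial = np.nansum(np.asarray(batch, dtype=float), axis=self.axis)
        self.total = partial if self.total is None else self.total + partial
        return self  # returning self allows chained calls

    def __call__(self):
        if self.total is None:
            raise ValueError("no batch has been added yet")
        return self.total


data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])
result = RunningNanSum().update_batch(data1).update_batch(data2)()
# result → [9., 16.]
```

The library's own usage of BatchNanSum is shown next.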

import numpy as np
from batchstats import BatchNanSum

# create some data with NaNs
data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])

# create a BatchNanSum object
bns = BatchNanSum()

# update with the first batch
bns.update_batch(data1)

# update with the second batch
bns.update_batch(data2)

# get the sum
total_sum = bns()

# verify the result
expected_sum = np.array([9., 16.])
np.testing.assert_allclose(total_sum, expected_sum)

Example with multiple axes and data with more than two dimensions

# create some 3d data with NaNs
data1 = np.arange(24).reshape(2, 3, 4).astype(float)
data1[0, 1, 1] = np.nan
data2 = np.arange(24, 48).reshape(2, 3, 4).astype(float)
data2[1, 2, 0] = np.nan


# create a BatchNanSum object to sum over the last two axes
bns = BatchNanSum(axis=(1, 2))

# update with the first batch
bns.update_batch(data1)

# update with the second batch
bns.update_batch(data2)

# get the sum
total_sum = bns()

# verify the result
# the batches are combined over the reduced axes, so join them along axis 1
d = np.concatenate((data1, data2), axis=1)
expected_sum = np.nansum(d, axis=(1, 2))
np.testing.assert_allclose(total_sum, expected_sum)

__call__()

Calculate the sum of the batches that can contain NaN values.

Returns:

Sum of the batches.

Return type:

numpy.ndarray

Raises:

NoValidSamplesError – If no valid samples are available.

__init__(axis=0)

Initialize the BatchNanSum object.

update_batch(batch)

Update the sum with a new batch of data that can contain NaN values.

Parameters:

batch (numpy.ndarray) – Input batch.

Returns:

Updated BatchNanSum object.

Return type:

BatchNanSum

class batchstats.BatchNanMean(axis=0)

Class for calculating the mean of batches of data that can contain NaN values.

The algorithm uses BatchNanSum to compute the sum and also tracks the number of valid (non-NaN) samples. The mean is the sum divided, element by element, by that number of valid samples.
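Below is a minimal sketch of the sum-and-count idea. It keeps one running np.nansum and one running count of non-NaN entries for each output element. RunningNanMean is a hypothetical name and is not part of batchstats.

```python
import numpy as np


class RunningNanMean:
    """Hypothetical sketch: NaN-ignoring mean from a running sum and a running count."""

    def __init__(self, axis=0):
        self.axis = axis
        self.total = None
        self.count = None

    def update_batch(self, batch):
        batch = np.asarray(batch, dtype=float)
        # Sum of the non-NaN entries and how many there are, per output element
        partial_sum = np.nansum(batch, axis=self.axis)
        partial_count = (~np.isnan(batch)).sum(axis=self.axis)
        if self.total is None:
            self.total, self.count = partial_sum, partial_count
        else:
            self.total = self.total + partial_sum
            self.count = self.count + partial_count
        return self

    def __call__(self):
        # Divide by the number of valid samples; a count of 0 would yield NaN here
        return self.total / self.count


data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])
mean = RunningNanMean().update_batch(data1).update_batch(data2)()
# mean → [3., 5.33333333]
```

Tracking counts is what makes the batch-wise result exact. For the same data1 and data2, the per-batch means are [2, 2] and [5, 7]. Averaging them would give [3.5, 4.5] instead of [3, 5.333...], because each batch contributes a different number of valid values.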

import numpy as np
from batchstats import BatchNanMean

# create some data with NaNs
data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])

# create a BatchNanMean object
bnm = BatchNanMean()

# update with the first batch
bnm.update_batch(data1)

# update with the second batch
bnm.update_batch(data2)

# get the mean
total_mean = bnm()

# verify the result
expected_mean = np.array([3., 5.33333333])
np.testing.assert_allclose(total_mean, expected_mean)

Example with multiple axes and data with more than two dimensions

# create some 3d data with NaNs
data1 = np.arange(24).reshape(2, 3, 4).astype(float)
data1[0, 1, 1] = np.nan
data2 = np.arange(24, 48).reshape(2, 3, 4).astype(float)
data2[1, 2, 0] = np.nan

# create a BatchNanMean object to get the mean over the last two axes
bnm = BatchNanMean(axis=(1, 2))

# update with the first batch
bnm.update_batch(data1)

# update with the second batch
bnm.update_batch(data2)

# get the mean
total_mean = bnm()

# verify the result
# the batches are combined over the reduced axes, so join them along axis 1
d = np.concatenate((data1, data2), axis=1)
expected_mean = np.nanmean(d, axis=(1, 2))
np.testing.assert_allclose(total_mean, expected_mean)

__call__()

Calculate the mean of the batches that can contain NaN values.

Returns:

Mean of the batches.

Return type:

numpy.ndarray

__init__(axis=0)

Initialize the BatchNanMean object.

update_batch(batch)

Update the mean with a new batch of data that can contain NaN values.

Parameters:

batch (numpy.ndarray) – Input batch.

Returns:

Updated BatchNanMean object.

Return type:

BatchNanMean

class batchstats.BatchNanMin(axis=0)

Class for calculating the minimum of batches of data that can contain NaN values.

The algorithm keeps track of the element-wise minimum. When a new batch is added, its minimum is computed with NaNs ignored. The running minimum is then replaced by the element-wise minimum of the current minimum and the new batch’s minimum.
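Here is a minimal sketch of the running-minimum update. It takes np.nanmin of each batch and merges the result into the running value with np.fmin. RunningNanMin is hypothetical and is not the batchstats class. The sketch uses np.fmin because, unlike np.minimum, it returns the non-NaN operand when only one side is NaN. As a result, an element that is all-NaN in one batch keeps the running minimum found in the other batches.

```python
import warnings

import numpy as np


class RunningNanMin:
    """Hypothetical sketch of a NaN-ignoring running minimum (not the batchstats code)."""

    def __init__(self, axis=0):
        self.axis = axis
        self.current = None  # running element-wise minimum

    def update_batch(self, batch):
        batch = np.asarray(batch, dtype=float)
        with warnings.catch_warnings():
            # nanmin warns and returns NaN for a slice that is entirely NaN
            warnings.simplefilter("ignore", RuntimeWarning)
            batch_min = np.nanmin(batch, axis=self.axis)
        if self.current is None:
            self.current = batch_min
        else:
            # fmin returns the non-NaN operand when exactly one side is NaN
            self.current = np.fmin(self.current, batch_min)
        return self

    def __call__(self):
        return self.current


data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])
total_min = RunningNanMin().update_batch(data1).update_batch(data2)()
# total_min → [1., 2.]

# A column that is all-NaN in one batch keeps the value found in the other batch
mixed = RunningNanMin().update_batch([[np.nan, 4.0]]).update_batch([[2.0, np.nan]])()
# mixed → [2., 4.]
```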

import numpy as np
from batchstats import BatchNanMin

# create some data with NaNs
data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])

# create a BatchNanMin object
bnm = BatchNanMin()

# update with the first batch
bnm.update_batch(data1)

# update with the second batch
bnm.update_batch(data2)

# get the minimum
total_min = bnm()

# verify the result
expected_min = np.array([1., 2.])
np.testing.assert_allclose(total_min, expected_min)

Example with multiple axes and data with more than two dimensions

# create some 3d data with NaNs
data1 = np.arange(24).reshape(2, 3, 4).astype(float)
data1[0, 1, 1] = np.nan
data2 = np.arange(24, 48).reshape(2, 3, 4).astype(float)
data2[1, 2, 0] = np.nan

# create a BatchNanMin object to get the min over the last two axes
bnm = BatchNanMin(axis=(1, 2))

# update with the first batch
bnm.update_batch(data1)

# update with the second batch
bnm.update_batch(data2)

# get the min
total_min = bnm()

# verify the result
# the batches are combined over the reduced axes, so join them along axis 1
d = np.concatenate((data1, data2), axis=1)
expected_min = np.nanmin(d, axis=(1, 2))
np.testing.assert_allclose(total_min, expected_min)

__call__() → ndarray

Calculate the minimum.

Returns:

Minimum of the batches.

Return type:

numpy.ndarray

Raises:

NoValidSamplesError – If no valid samples are available.

__init__(axis=0)

Initialize the BatchNanMin object.

update_batch(batch)

Update the minimum with a new batch of data that can contain NaN values.

Parameters:

batch (numpy.ndarray) – Input batch.

Returns:

Updated BatchNanMin object.

Return type:

BatchNanMin

class batchstats.BatchNanMax(axis=0)

Class for calculating the maximum of batches of data that can contain NaN values.

The algorithm keeps track of the element-wise maximum. When a new batch is added, its maximum is computed with NaNs ignored. The running maximum is then replaced by the element-wise maximum of the current maximum and the new batch’s maximum.
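The same update pattern applies to the maximum. The hypothetical helper update_running_max (not part of batchstats) folds one batch into a running maximum using np.nanmax and np.fmax. The last lines show why np.fmax is used instead of np.maximum.

```python
import warnings

import numpy as np


def update_running_max(running, batch, axis=0):
    """Hypothetical helper: fold one batch into a NaN-ignoring running maximum."""
    with warnings.catch_warnings():
        # nanmax warns and returns NaN for a slice that is entirely NaN
        warnings.simplefilter("ignore", RuntimeWarning)
        batch_max = np.nanmax(np.asarray(batch, dtype=float), axis=axis)
    if running is None:
        return batch_max
    # fmax keeps the non-NaN operand; np.maximum would propagate NaN instead
    return np.fmax(running, batch_max)


running = None
for batch in (np.array([[1, 2], [3, np.nan]]), np.array([[5, 6], [np.nan, 8]])):
    running = update_running_max(running, batch)
# running → [5., 8.]

# Why fmax rather than maximum: an element that is NaN so far must not stay NaN
propagated = np.maximum(np.array([5.0, np.nan]), np.array([4.0, 8.0]))  # [5.0, nan]
skipped = np.fmax(np.array([5.0, np.nan]), np.array([4.0, 8.0]))        # [5.0, 8.0]
```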

import numpy as np
from batchstats import BatchNanMax

# create some data with NaNs
data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])

# create a BatchNanMax object
bnm = BatchNanMax()

# update with the first batch
bnm.update_batch(data1)

# update with the second batch
bnm.update_batch(data2)

# get the maximum
total_max = bnm()

# verify the result
expected_max = np.array([5., 8.])
np.testing.assert_allclose(total_max, expected_max)

Example with multiple axes and data with more than two dimensions

# create some 3d data with NaNs
data1 = np.arange(24).reshape(2, 3, 4).astype(float)
data1[0, 1, 1] = np.nan
data2 = np.arange(24, 48).reshape(2, 3, 4).astype(float)
data2[1, 2, 0] = np.nan

# create a BatchNanMax object to get the max over the last two axes
bnm = BatchNanMax(axis=(1, 2))

# update with the first batch
bnm.update_batch(data1)

# update with the second batch
bnm.update_batch(data2)

# get the max
total_max = bnm()

# verify the result
# the batches are combined over the reduced axes, so join them along axis 1
d = np.concatenate((data1, data2), axis=1)
expected_max = np.nanmax(d, axis=(1, 2))
np.testing.assert_allclose(total_max, expected_max)

__call__() → ndarray

Calculate the maximum.

Returns:

Maximum of the batches.

Return type:

numpy.ndarray

Raises:

NoValidSamplesError – If no valid samples are available.

__init__(axis=0)

Initialize the BatchNanMax object.

update_batch(batch)

Update the maximum with a new batch of data that can contain NaN values.

Parameters:

batch (numpy.ndarray) – Input batch.

Returns:

Updated BatchNanMax object.

Return type:

BatchNanMax

class batchstats.BatchNanPeakToPeak(axis=0)

Class for calculating the peak-to-peak (max - min) of batches of data that can contain NaN values.

This class uses BatchNanMax and BatchNanMin internally to keep track of the element-wise maximum and minimum values. The peak-to-peak value is the difference between the maximum and minimum.
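The following is a minimal sketch of this composition. It computes np.nanmax and np.nanmin for each batch, then merges them into running values with np.fmax and np.fmin. RunningNanPeakToPeak is a hypothetical name and is not the batchstats class.

```python
import warnings

import numpy as np


class RunningNanPeakToPeak:
    """Hypothetical sketch: NaN-ignoring peak-to-peak from a running max and min."""

    def __init__(self, axis=0):
        self.axis = axis
        self.hi = None  # running element-wise maximum
        self.lo = None  # running element-wise minimum

    def update_batch(self, batch):
        batch = np.asarray(batch, dtype=float)
        with warnings.catch_warnings():
            # nanmax/nanmin warn and return NaN for all-NaN slices
            warnings.simplefilter("ignore", RuntimeWarning)
            batch_hi = np.nanmax(batch, axis=self.axis)
            batch_lo = np.nanmin(batch, axis=self.axis)
        if self.hi is None:
            self.hi, self.lo = batch_hi, batch_lo
        else:
            # fmax/fmin keep the non-NaN operand, so all-NaN slices do not erase progress
            self.hi = np.fmax(self.hi, batch_hi)
            self.lo = np.fmin(self.lo, batch_lo)
        return self

    def __call__(self):
        return self.hi - self.lo


data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])
ptp = RunningNanPeakToPeak().update_batch(data1).update_batch(data2)()
# ptp → [4., 6.]
```

The extremes have to be tracked separately because peak-to-peak values cannot be combined across batches. For data1 and data2, the per-batch peak-to-peak values are [2, 0] and [0, 2], yet the overall result is [4, 6].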

import numpy as np
from batchstats import BatchNanPeakToPeak

# create some data with NaNs
data1 = np.array([[1, 2], [3, np.nan]])
data2 = np.array([[5, 6], [np.nan, 8]])

# create a BatchNanPeakToPeak object
bnpp = BatchNanPeakToPeak()

# update with the first batch
bnpp.update_batch(data1)

# update with the second batch
bnpp.update_batch(data2)

# get the peak-to-peak
total_ptp = bnpp()

# verify the result
expected_ptp = np.array([4., 6.])
np.testing.assert_allclose(total_ptp, expected_ptp)

Example with multiple axes and data with more than two dimensions

# create some 3d data with NaNs
data1 = np.arange(24).reshape(2, 3, 4).astype(float)
data1[0, 1, 1] = np.nan
data2 = np.arange(24, 48).reshape(2, 3, 4).astype(float)
data2[1, 2, 0] = np.nan

# create a BatchNanPeakToPeak object to get the ptp over the last two axes
bnpp = BatchNanPeakToPeak(axis=(1, 2))

# update with the first batch
bnpp.update_batch(data1)

# update with the second batch
bnpp.update_batch(data2)

# get the ptp
total_ptp = bnpp()

# verify the result
# the batches are combined over the reduced axes, so join them along axis 1
d = np.concatenate((data1, data2), axis=1)
expected_ptp = np.nanmax(d, axis=(1, 2)) - np.nanmin(d, axis=(1, 2))
np.testing.assert_allclose(total_ptp, expected_ptp)

__call__() → ndarray

Calculate the peak-to-peak.

Returns:

Peak-to-peak of the batches.

Return type:

numpy.ndarray

Raises:

NoValidSamplesError – If no valid samples are available.

__init__(axis=0)

Initialize the BatchNanPeakToPeak object.

update_batch(batch)

Update the peak-to-peak with a new batch of data that can contain NaN values.

Parameters:

batch (numpy.ndarray) – Input batch.

Returns:

Updated BatchNanPeakToPeak object.

Return type:

BatchNanPeakToPeak