jess.scipy_cupy package

Submodules

jess.scipy_cupy.stats module

CuPy versions of SciPy functions.

jess.scipy_cupy.stats.combined(array: cupy._core.core.ndarray, axis: int = 0, fisher: bool = True, bias: bool = True, nan_policy: Optional[str] = 'propagate', winsorize_args: Optional[Tuple] = None)

Compute the kurtosis (Fisher or Pearson) and skewness of a dataset. Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution. If bias is False, then the kurtosis is calculated using k-statistics to eliminate bias coming from biased moment estimators.

For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution. The function scipy.stats.skewtest can be used to determine whether the skewness value is close enough to zero, statistically speaking.

Parameters

array : ndarray

Input array.

axis : int or None, optional

Axis along which the skewness and kurtosis are calculated. Default is 0. If None, compute over the whole array.

fisher : bool, optional

If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).

bias : bool, optional

If False, then the calculations are corrected for statistical bias.

nan_policy : {‘propagate’, ‘raise’, ‘omit’, None}, optional

Defines how to handle input that contains NaN. The following options are available (default is ‘propagate’):

‘propagate’: returns nan

‘raise’: throws an error

‘omit’: performs the calculations ignoring nan values

None: Don’t check for nans, omit

winsorize_args : tuple or None, optional

If not None, the input array is passed to winsorize with these arguments; see jess.scipy_cupy.stats.winsorize.

Returns

skewness : ndarray

The skewness of values along an axis, returning 0 where all values are equal.

kurtosis : ndarray

The kurtosis of values along an axis. If all values are equal, return -3 for Fisher’s definition and 0 for Pearson’s definition.

Notes

The sample skewness is computed as the Fisher-Pearson coefficient of skewness, i.e.

\[ g_1 = \frac{m_3}{m_2^{3/2}} \]

where

\[ m_i = \frac{1}{N}\sum_{n=1}^{N} \left(x[n] - \bar{x}\right)^i \]

is the biased sample \(i\)-th central moment, and \(\bar{x}\) is the sample mean. If bias is False, the calculations are corrected for bias and the value computed is the adjusted Fisher-Pearson standardized moment coefficient, i.e.

\[ G_1 = \frac{k_3}{k_2^{3/2}} = \frac{\sqrt{N(N-1)}}{N-2}\,\frac{m_3}{m_2^{3/2}}. \]
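The moment formulas above can be checked with a short sketch. It is not the jess implementation: the name skew_kurtosis is hypothetical, it uses CuPy when installed and falls back to NumPy otherwise (assuming both provide the calls used here), it only accepts an integer axis, and unlike the documented behaviour it does not special-case constant input (it simply divides by \(m_2 = 0\)). The bias-corrected kurtosis uses the standard k-statistic formula, which is assumed to correspond to the bias=False behaviour described above.

```python
import math

# Hypothetical sketch, not the jess implementation.
try:
    import cupy as xp  # GPU arrays when CuPy is installed
except ImportError:
    import numpy as xp  # assumed to provide the same calls used below


def skew_kurtosis(a, axis=0, fisher=True, bias=True):
    """Return (skewness, kurtosis) along an integer axis from central moments."""
    a = xp.asarray(a, dtype=xp.float64)
    n = a.shape[axis]
    dev = a - a.mean(axis=axis, keepdims=True)
    # Biased sample central moments m_i = mean((x - xbar) ** i)
    m2 = (dev**2).mean(axis=axis)
    m3 = (dev**3).mean(axis=axis)
    m4 = (dev**4).mean(axis=axis)
    skew = m3 / m2**1.5  # g_1
    kurt = m4 / m2**2    # Pearson kurtosis: 3.0 for a normal distribution
    if not bias:
        # Adjusted Fisher-Pearson coefficient G_1 (needs n > 2)
        skew = skew * math.sqrt(n * (n - 1)) / (n - 2)
        # k-statistic excess kurtosis (needs n > 3), shifted back to Pearson form
        kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (kurt - 3.0) + 6.0) + 3.0
    if fisher:
        kurt = kurt - 3.0  # Fisher definition: 0.0 for a normal distribution
    return skew, kurt


s, k = skew_kurtosis([2, 8, 0, 4, 1, 9, 9, 0], fisher=False)
print(float(s), float(k))  # approximately 0.26506 and 1.33400
```

With the default fisher=True, the same data gives an excess kurtosis of about -1.666, i.e. the Pearson value minus 3.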

References

[1] Zwillinger, D. and Kokoska, S. (2000). CRC Standard Probability and Statistics Tables and Formulae. Chapman & Hall: New York. Section 2.2.24.1.

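Examples

>>> import cupy as cp
>>> from jess.scipy_cupy.stats import combined
>>> combined(cp.array([1, 2, 3, 4, 5], dtype=cp.float64), fisher=False)
(0.0, 1.7)
>>> combined(cp.array([2, 8, 0, 4, 1, 9, 9, 0], dtype=cp.float64), fisher=False)
(0.2650554122698573, 1.333998924716149)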

jess.scipy_cupy.stats.iqr_med(x: cupy._core.core.ndarray, axis: Optional[int] = None, rng: Union[Tuple, List] = (25, 75), scale: Union[float, str] = 1.0, nan_policy: Optional[str] = 'propagate', interpolation: str = 'linear', keepdims: bool = False) → cupy._core.core.ndarray

Compute the interquartile range of the data along the specified axis, and also return the median. The interquartile range (IQR) is the difference between the 75th and 25th percentiles of the data. It is a measure of dispersion similar to the standard deviation or variance, but much more robust against outliers [2]. The rng parameter allows this function to compute percentile ranges other than the actual IQR. For example, setting rng=(0, 100) is equivalent to numpy.ptp. The IQR of an empty array is nan.

Parameters

x : cupy.ndarray

Input array or object that can be converted to an array.

axis : int or sequence of int, optional

Axis along which the range is computed. The default is to compute the IQR for the entire array.

rng : two-element sequence of floats in the range [0, 100], optional

Percentiles over which to compute the range. Each must be between 0 and 100, inclusive. The default is the true IQR: (25, 75). The order of the elements is not important.

scale : scalar or str, optional

The numerical value of scale will be divided out of the final result. The following string values are recognized:

‘raw’ : No scaling, just return the raw IQR. Deprecated! Use scale=1 instead.

‘normal’ : Scale by \(2 \sqrt{2} erf^{-1}(\frac{1}{2}) \approx 1.349\).

The default is 1.0. The use of scale='raw' is deprecated. Array-like scale is also allowed, as long as it broadcasts correctly to the output such that out / scale is a valid operation. The output dimensions depend on the input array x, the axis argument, and the keepdims flag.

nan_policy : {‘propagate’, ‘raise’, ‘omit’, None}, optional

Defines how to handle input that contains NaN. The following options are available (default is ‘propagate’):

‘propagate’: returns nan

‘raise’: throws an error

‘omit’: performs the calculations ignoring nan values

None: Don’t check for nans, uses cp.percentile

interpolation : {‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}, optional

Specifies the interpolation method to use when the percentile boundaries lie between two data points i and j. The following options are available (default is ‘linear’):

‘linear’: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

‘lower’: i.

‘higher’: j.

‘nearest’: i or j whichever is nearest.

‘midpoint’: (i + j) / 2.

keepdims : bool, optional

If this is set to True, the reduced axes are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array x.
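The five interpolation rules listed above can be illustrated on plain Python lists. The helper percentile_1d below is hypothetical and not part of jess; it places the q-th percentile at the fractional index (n - 1) * q / 100 of the sorted data (the convention behind NumPy's default linear method), and it resolves ‘nearest’ by rounding that index half-to-even with Python's round, an assumption that may not match every NumPy or CuPy version at exact ties.

```python
import math


def percentile_1d(values, q, interpolation="linear"):
    """Hypothetical helper: q-th percentile (0 <= q <= 100) of a 1-D sequence."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0  # fractional index into the sorted data
    i = data[math.floor(pos)]          # data point at or just below pos
    j = data[math.ceil(pos)]           # data point at or just above pos
    fraction = pos - math.floor(pos)
    if interpolation == "linear":
        return i + (j - i) * fraction
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "nearest":
        return data[round(pos)]        # exact ties go to the even index
    if interpolation == "midpoint":
        return (i + j) / 2
    raise ValueError(f"unknown interpolation: {interpolation!r}")


for method in ("linear", "lower", "higher", "nearest", "midpoint"):
    print(method, percentile_1d([1, 2, 3, 4], 25, method))
```

For q=25 on [1, 2, 3, 4] the fractional index is 0.75, so i=1, j=2 and the loop prints 1.75, 1, 2, 2 and 1.5 for linear, lower, higher, nearest and midpoint respectively.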

Returns

iqr : scalar or ndarray

If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than np.float64, then the output data-type is np.float64. Otherwise, the output data-type is the same as that of the input.

See Also

numpy.std, numpy.var

Notes

Adapted from https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.iqr.html, modified to also return the median.
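Because iqr_med is described as scipy.stats.iqr modified to also return the median, both quantities can come from a single percentile call. The sketch below is hypothetical: the name iqr_and_median, the (iqr, median) return order, and the omission of nan_policy, interpolation and keepdims are assumptions, and it expects NaN-free input. It uses CuPy when installed and NumPy otherwise, and it uses the identity \(2\sqrt{2}\,\mathrm{erf}^{-1}(1/2) = 2\,\Phi^{-1}(0.75)\) for the ‘normal’ scale.

```python
from statistics import NormalDist

try:
    import cupy as xp
except ImportError:
    import numpy as xp

# 2 * sqrt(2) * erfinv(1/2) == 2 * Phi^{-1}(0.75), about 1.349: the IQR of a unit normal
NORMAL_IQR = 2 * NormalDist().inv_cdf(0.75)


def iqr_and_median(x, axis=None, rng=(25, 75), scale=1.0):
    """Hypothetical sketch: return (IQR / scale, median) along axis."""
    x = xp.asarray(x, dtype=xp.float64)
    lo, hi = sorted(rng)  # the order of the rng elements does not matter
    q = xp.percentile(x, xp.asarray([lo, 50.0, hi], dtype=xp.float64), axis=axis)
    if isinstance(scale, str):
        if scale != "normal":
            raise ValueError("only 'normal' is handled in this sketch")
        scale = NORMAL_IQR
    return (q[2] - q[0]) / scale, q[1]


x = xp.asarray([[10, 7, 4], [3, 2, 1]])
print(iqr_and_median(x))          # IQR 4.0, median 3.5
print(iqr_and_median(x, axis=0))  # IQRs 3.5, 2.5, 1.5; medians 6.5, 4.5, 2.5
```

Computing the median in the same call reuses the sort that the percentile computation already needs, which is presumably the point of combining the two.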

This function is heavily dependent on the version of NumPy that is installed. Versions greater than 1.11.0b3 are highly recommended, as they include a number of enhancements and fixes to numpy.percentile and numpy.nanpercentile that affect the operation of this function. The following modifications apply:

Below 1.10.0: nan_policy is poorly defined. The default behavior of numpy.percentile is used for ‘propagate’. This is a hybrid of ‘omit’ and ‘propagate’ that mostly yields a skewed version of ‘omit’, since NaNs are sorted to the end of the data. A warning is raised if there are NaNs in the data.

Below 1.9.0: numpy.nanpercentile does not exist. This means that numpy.percentile is used regardless of nan_policy and a warning is issued. See the previous item for a description of the behavior.

Below 1.9.0: keepdims and interpolation are not supported. The keywords are ignored with a warning if supplied with non-default values. However, multiple axes are still supported.

References

[1] “Interquartile range”, https://en.wikipedia.org/wiki/Interquartile_range

[2] “Robust measures of scale”, https://en.wikipedia.org/wiki/Robust_measures_of_scale

[3] “Quantile”, https://en.wikipedia.org/wiki/Quantile

Examples

>>> import numpy as np
>>> from scipy.stats import iqr
>>> x = np.array([[10, 7, 4], [3, 2, 1]])
>>> x
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> iqr(x)
4.0
>>> iqr(x, axis=0)
array([3.5, 2.5, 1.5])
>>> iqr(x, axis=1)
array([3., 1.])
>>> iqr(x, axis=1, keepdims=True)
array([[3.],
       [1.]])

jess.scipy_cupy.stats.median_abs_deviation(x, axis=0, center=<function median>, scale=1.0, nan_policy='propagate')

Compute the median absolute deviation of the data along the given axis. The median absolute deviation (MAD) [1] computes the median over the absolute deviations from the median. It is a measure of dispersion similar to the standard deviation, but more robust to outliers [2]. The MAD of an empty array is nan.

Parameters

x : array_like

Input array or object that can be converted to an array.

axis : int or None, optional

Axis along which the MAD is computed. Default is 0. If None, compute the MAD over the entire array.

center : callable, optional

A function that returns the central value. The default is to use the median. Any user-defined function must have the signature func(arr, axis).

scale : scalar or str, optional

The numerical value of scale will be divided out of the final result. The default is 1.0. The string “normal” is also accepted, and results in scale being the standard normal quantile function (inverse CDF) evaluated at 0.75, which is approximately 0.67449. Array-like scale is also allowed, as long as it broadcasts correctly to the output such that out / scale is a valid operation. The output dimensions depend on the input array x and the axis argument.

nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional

Defines how to handle input that contains NaN. The following options are available (default is ‘propagate’):

‘propagate’: returns nan

‘raise’: throws an error

‘omit’: performs the calculations ignoring nan values

Returns

mad : scalar or ndarray

If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than np.float64, then the output data-type is np.float64. Otherwise, the output data-type is the same as that of the input.

See Also

numpy.std, numpy.var, numpy.median, scipy.stats.iqr, scipy.stats.tmean, scipy.stats.tstd, scipy.stats.tvar

Notes

The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in center=np.mean will calculate the MAD around the mean; it will not calculate the mean absolute deviation. The input array may contain inf, but if center returns inf, the corresponding MAD for that data will be nan.

References

[1] “Median absolute deviation”, https://en.wikipedia.org/wiki/Median_absolute_deviation

[2] “Robust measures of scale”, https://en.wikipedia.org/wiki/Robust_measures_of_scale
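The computation described above (the median of the absolute deviations from a central value, divided by scale) can be sketched as follows. mad_sketch is a hypothetical name, not the jess function; it omits nan_policy and empty-array handling and uses CuPy when installed, NumPy otherwise. For scale='normal' it divides by the standard normal quantile at 0.75 (about 0.67449), so for Gaussian data the result estimates the standard deviation.

```python
from statistics import NormalDist

try:
    import cupy as xp
except ImportError:
    import numpy as xp


def mad_sketch(x, axis=0, center=xp.median, scale=1.0):
    """Hypothetical sketch: median(|x - center(x)|) / scale along axis."""
    x = xp.asarray(x, dtype=xp.float64)
    if axis is None:
        x, axis = x.ravel(), 0  # reduce over the flattened array
    if isinstance(scale, str):
        if scale != "normal":
            raise ValueError("only 'normal' is handled in this sketch")
        scale = NormalDist().inv_cdf(0.75)  # about 0.67449
    c = center(x, axis)  # central value, called as func(arr, axis)
    dev = xp.abs(x - xp.expand_dims(c, axis))
    # center only moves the reference point; the reduction is always a median
    return xp.median(dev, axis=axis) / scale


x = xp.asarray([[10, 7, 4], [3, 2, 1]])
print(mad_sketch(x))                              # values 3.5, 2.5, 1.5
print(mad_sketch(x, axis=None))                   # 2.0
print(mad_sketch(x, axis=None, center=xp.mean))   # 2.5, the MAD around the mean
```

The last call illustrates the note above: with center=xp.mean the deviations are taken from the mean (4.5), but the final reduction is still a median, not a mean.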

Examples

When comparing the behavior of median_abs_deviation with np.std, the latter is affected when we change a single value of an array to have an outlier value while the MAD hardly changes:

>>> import numpy as np
>>> from scipy import stats
>>> x = stats.norm.rvs(size=100, scale=1, random_state=123456)
>>> x.std()
0.9973906394005013
>>> stats.median_abs_deviation(x)
0.82832610097857
>>> x[0] = 345.6
>>> x.std()
34.42304872314415
>>> stats.median_abs_deviation(x)
0.8323442311590675

Axis handling example:

>>> x = np.array([[10, 7, 4], [3, 2, 1]])
>>> x
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> stats.median_abs_deviation(x)
array([3.5, 2.5, 1.5])
>>> stats.median_abs_deviation(x, axis=None)
2.0

Scale normal example:

>>> x = stats.norm.rvs(size=1000000, scale=2, random_state=123456)
>>> stats.median_abs_deviation(x)
1.3487398527041636
>>> stats.median_abs_deviation(x, scale='normal')
1.9996446978061115

jess.scipy_cupy.stats.median_abs_deviation_med(x: cupy._core.core.ndarray, axis: int = 0, center: object = <function median>, scale: typing.Union[float, str] = 1.0, nan_policy: str = 'propagate')

Compute the median absolute deviation of the data along the given axis, and also return the center. The median absolute deviation (MAD) [1] computes the median over the absolute deviations from the median. It is a measure of dispersion similar to the standard deviation, but more robust to outliers [2]. The MAD of an empty array is nan.

Parameters

x : array_like

Input array or object that can be converted to an array.

axis : int or None, optional

Axis along which the MAD is computed. Default is 0. If None, compute the MAD over the entire array.

center : callable, optional

A function that returns the central value. The default is to use the median. Any user-defined function must have the signature func(arr, axis).

scale : scalar or str, optional

The numerical value of scale will be divided out of the final result. The default is 1.0. The string “normal” is also accepted, and results in scale being the standard normal quantile function (inverse CDF) evaluated at 0.75, which is approximately 0.67449. Array-like scale is also allowed, as long as it broadcasts correctly to the output such that out / scale is a valid operation. The output dimensions depend on the input array x and the axis argument.

nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional

Defines how to handle input that contains NaN. The following options are available (default is ‘propagate’):

‘propagate’: returns nan

‘raise’: throws an error

‘omit’: performs the calculations ignoring nan values

Returns

mad : scalar or ndarray

If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than np.float64, then the output data-type is np.float64. Otherwise, the output data-type is the same as that of the input.

center

The centers of each array (the values returned by center).

See Also

numpy.std, numpy.var, numpy.median, scipy.stats.iqr, scipy.stats.tmean, scipy.stats.tstd, scipy.stats.tvar

Notes

Modified from scipy.stats.median_abs_deviation.

The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in center=np.mean will calculate the MAD around the mean; it will not calculate the mean absolute deviation. The input array may contain inf, but if center returns inf, the corresponding MAD for that data will be nan.

References

[1] “Median absolute deviation”, https://en.wikipedia.org/wiki/Median_absolute_deviation

[2] “Robust measures of scale”, https://en.wikipedia.org/wiki/Robust_measures_of_scale

Examples

When comparing the behavior of median_abs_deviation with np.std, the latter is affected when we change a single value of an array to have an outlier value while the MAD hardly changes:

>>> import numpy as np
>>> from scipy import stats
>>> x = stats.norm.rvs(size=100, scale=1, random_state=123456)
>>> x.std()
0.9973906394005013
>>> stats.median_abs_deviation(x)
0.82832610097857
>>> x[0] = 345.6
>>> x.std()
34.42304872314415
>>> stats.median_abs_deviation(x)
0.8323442311590675

Axis handling example:

>>> x = np.array([[10, 7, 4], [3, 2, 1]])
>>> x
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> stats.median_abs_deviation(x)
array([3.5, 2.5, 1.5])
>>> stats.median_abs_deviation(x, axis=None)
2.0

Scale normal example:

>>> x = stats.norm.rvs(size=1000000, scale=2, random_state=123456)
>>> stats.median_abs_deviation(x)
1.3487398527041636
>>> stats.median_abs_deviation(x, scale='normal')
1.9996446978061115

jess.scipy_cupy.stats.winsorize(array: cupy._core.core.ndarray, sigma: float, chans_per_fit: int, nan_policy: Optional[str]) → cupy._core.core.ndarray

Winsorize an array by clipping values that lie more than sigma times the noise above the fit. The trend is fitted using the median fitter. The noise is estimated from the difference between the array and the fit using the IQR.

Args:

array - Array to Winsorize, processed in place.

sigma - Sigma to clip at

chans_per_fit - Channels per fitting order, see: jess.fitters.median_fitter

nan_policy - NaN policy; if None, the IQR does not check for NaNs

Returns:

Winsorized array
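The procedure described above (fit a trend, estimate the noise from the IQR of the residuals, then clip values more than sigma noise units above the trend) can be sketched for a 1-D array as below. This is not the jess implementation: jess.fitters.median_fitter is replaced by a hypothetical stand-in, blockwise_median_trend, that uses the median of each block of chans_per_fit samples as a piecewise-constant trend. Converting the IQR to a standard deviation by dividing by about 1.349 (the IQR of a unit normal) is likewise an assumption, and nan_policy is not handled.

```python
from statistics import NormalDist

try:
    import cupy as xp
except ImportError:
    import numpy as xp

IQR_TO_STD = 2 * NormalDist().inv_cdf(0.75)  # about 1.349: IQR of a unit normal


def blockwise_median_trend(y, chans_per_fit):
    """Hypothetical stand-in for jess.fitters.median_fitter."""
    trend = xp.empty_like(y)
    for start in range(0, y.shape[0], chans_per_fit):
        stop = start + chans_per_fit
        trend[start:stop] = xp.median(y[start:stop])
    return trend


def winsorize_sketch(y, sigma, chans_per_fit):
    """Clip, in place, values more than sigma noise units above the trend."""
    trend = blockwise_median_trend(y, chans_per_fit)
    q25, q75 = xp.percentile(y - trend, xp.asarray([25.0, 75.0]))
    noise = (q75 - q25) / IQR_TO_STD  # robust standard-deviation estimate
    xp.minimum(y, trend + sigma * noise, out=y)  # only the upper side is clipped
    return y


y = xp.asarray([float(i % 5) for i in range(100)])
y[10] = 1000.0  # a single bright outlier
winsorize_sketch(y, sigma=3.0, chans_per_fit=20)
print(float(y[10]))  # about 6.45, i.e. 2 + 3 * 2 / 1.349
```

In this example every block has a median of 2 and the residual IQR is 2, so the outlier is clipped to the trend plus three noise units while the ordinary values (0 to 4) are left untouched.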
