jess.scipy_cupy package¶
Submodules¶
jess.scipy_cupy.stats module¶
Cupy versions of scipy functions.
- jess.scipy_cupy.stats.combined(array: cupy._core.core.ndarray, axis: int = 0, fisher: bool = True, bias: bool = True, nan_policy: Optional[str] = 'propagate', winsorize_args: Optional[Tuple] = None)¶
Compute the kurtosis (Fisher or Pearson) and skewness of a dataset. Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution. If bias is False, then the kurtosis is calculated using k-statistics to eliminate bias coming from biased moment estimators.
For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution. The function skewtest can be used to determine if the skewness value is close enough to zero, statistically speaking.
Parameters
----------
- array : ndarray
Input array.
- axis : int or None, optional
Axis along which the skewness and kurtosis are calculated. Default is 0. If None, compute over the whole array.
- bias : bool, optional
If False, then the calculations are corrected for statistical bias.
- fisher : bool, optional
If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).
- nan_policy : {‘propagate’, ‘raise’, ‘omit’, None}, optional
Defines how to handle input that contains nan. The following options are available (default is ‘propagate’):
‘propagate’: returns nan
‘raise’: throws an error
‘omit’: performs the calculations ignoring nan values
None: skips the check for nans
- winsorize_args : tuple or None, optional
If not None, these arguments are passed to winsorize; see stats.winsorize.
Returns
-------
- skewness : ndarray
The skewness of values along an axis, returning 0 where all values are equal.
- kurtosis : ndarray
The kurtosis of values along an axis. If all values are equal, return -3 for Fisher’s definition and 0 for Pearson’s definition.
Notes
-----
The sample skewness is computed as the Fisher-Pearson coefficient of skewness, i.e.
.. math::
    g_1=\frac{m_3}{m_2^{3/2}}
where
.. math::
    m_i=\frac{1}{N}\sum_{n=1}^N(x[n]-\bar{x})^i
is the biased sample \(i\texttt{th}\) central moment, and \(\bar{x}\) is the sample mean. If bias is False, the calculations are corrected for bias and the value computed is the adjusted Fisher-Pearson standardized moment coefficient, i.e.
.. math::
    G_1=\frac{k_3}{k_2^{3/2}}=\frac{\sqrt{N(N-1)}}{N-2}\frac{m_3}{m_2^{3/2}}.
References
----------
.. [1] Zwillinger, D. and Kokoska, S. (2000). CRC Standard Probability and Statistics Tables and Formulae. Chapman & Hall: New York. Section 2.2.24.1
Examples
--------
>>> from jess.scipy_cupy.stats import combined
>>> combined([1, 2, 3, 4, 5])
(0.0, 1.7)
>>> combined([2, 8, 0, 4, 1, 9, 9, 0])
(0.2650554122698573, 1.333998924716149)
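The moment definitions in the notes can be checked with a few lines of plain Python. This is a minimal sketch using only the standard library, not the CuPy implementation in this module: the helper names `central_moment` and `skew_kurtosis` are hypothetical, the input is a 1-D sequence, and only the skewness receives the bias correction (the \(G_1\) formula), so `bias=False` needs more than two values.

```python
import math


def central_moment(values, order):
    """Biased sample central moment m_i = (1/N) * sum((x - mean)**i)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def skew_kurtosis(values, fisher=True, bias=True):
    """Return (skewness, kurtosis) of a 1-D sequence (hypothetical helper)."""
    n = len(values)
    m2 = central_moment(values, 2)
    if m2 == 0:
        # All values equal: mirror the return values documented above.
        return 0.0, (-3.0 if fisher else 0.0)
    skew = central_moment(values, 3) / m2 ** 1.5      # g_1
    if not bias:
        # Adjusted Fisher-Pearson coefficient G_1 (requires n > 2).
        skew *= math.sqrt(n * (n - 1)) / (n - 2)
    kurt = central_moment(values, 4) / m2 ** 2        # Pearson definition
    if fisher:
        kurt -= 3.0                                   # normal distribution -> 0.0
    return skew, kurt


# Pearson kurtosis of a small sample; prints roughly 0.26506 1.33400
print(*skew_kurtosis([2, 8, 0, 4, 1, 9, 9, 0], fisher=False))
```

With `fisher=False` the sketch gives the Pearson kurtosis; switching to the default `fisher=True` subtracts 3.0 from the second value.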
- jess.scipy_cupy.stats.iqr_med(x: cupy._core.core.ndarray, axis: Optional[int] = None, rng: Union[Tuple, List] = (25, 75), scale: Union[float, str] = 1.0, nan_policy: Optional[str] = 'propagate', interpolation: str = 'linear', keepdims: bool = False) cupy._core.core.ndarray ¶
Compute the interquartile range of the data along the specified axis. The interquartile range (IQR) is the difference between the 75th and 25th percentiles of the data. It is a measure of dispersion similar to standard deviation or variance, but is much more robust against outliers [2]_. The rng parameter allows this function to compute percentile ranges other than the actual IQR. For example, setting rng=(0, 100) is equivalent to numpy.ptp. The IQR of an empty array is np.nan.
.. versionadded:: 0.18.0
Parameters
----------
- x : array_like
Input array or object that can be converted to an array.
- axis : int or sequence of int, optional
Axis along which the range is computed. The default is to compute the IQR for the entire array.
- rng : two-element sequence of floats in the range [0, 100], optional
Percentiles over which to compute the range. Each must be between 0 and 100, inclusive. The default is the true IQR: (25, 75). The order of the elements is not important.
- scale : scalar or str, optional
The numerical value of scale will be divided out of the final result. The following string values are recognized:
‘raw’ : No scaling, just return the raw IQR. Deprecated! Use scale=1 instead.
‘normal’ : Scale by \(2 \sqrt{2} \operatorname{erf}^{-1}(\frac{1}{2}) \approx 1.349\).
The default is 1.0. The use of scale=’raw’ is deprecated. Array-like scale is also allowed, as long as it broadcasts correctly to the output such that out / scale is a valid operation. The output dimensions depend on the input array, x, the axis argument, and the keepdims flag.
- nan_policy : {‘propagate’, ‘raise’, ‘omit’, None}, optional
Defines how to handle when input contains nan. The following options are available (default is ‘propagate’):
‘propagate’: returns nan
‘raise’: throws an error
‘omit’: performs the calculations ignoring nan values
None: Don’t check for nans, uses cp.percentile
- interpolation : {‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}, optional
Specifies the interpolation method to use when the percentile boundaries lie between two data points i and j. The following options are available (default is ‘linear’):
‘linear’: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
‘lower’: i.
‘higher’: j.
‘nearest’: i or j whichever is nearest.
‘midpoint’: (i + j) / 2.
- keepdims : bool, optional
If this is set to True, the reduced axes are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array x.
Returns
-------
- iqr : scalar or ndarray
If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than np.float64, then the output data-type is np.float64. Otherwise, the output data-type is the same as that of the input.
See Also
--------
numpy.std, numpy.var
Notes
-----
Adapted from https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.iqr.html, modified to also return the median.
This function is heavily dependent on the version of numpy that is installed. Versions greater than 1.11.0b3 are highly recommended, as they include a number of enhancements and fixes to numpy.percentile and numpy.nanpercentile that affect the operation of this function. The following modifications apply:
- Below 1.10.0: nan_policy is poorly defined.
The default behavior of numpy.percentile is used for ‘propagate’. This is a hybrid of ‘omit’ and ‘propagate’ that mostly yields a skewed version of ‘omit’ since NaNs are sorted to the end of the data. A warning is raised if there are NaNs in the data.
- Below 1.9.0: numpy.nanpercentile does not exist.
This means that numpy.percentile is used regardless of nan_policy and a warning is issued. See previous item for a description of the behavior.
- Below 1.9.0: keepdims and interpolation are not supported.
The keywords get ignored with a warning if supplied with non-default values. However, multiple axes are still supported.
References
----------
.. [1] “Interquartile range”, https://en.wikipedia.org/wiki/Interquartile_range
.. [2] “Robust measures of scale”, https://en.wikipedia.org/wiki/Robust_measures_of_scale
.. [3] “Quantile”, https://en.wikipedia.org/wiki/Quantile
Examples
--------
>>> import numpy as np
>>> from scipy.stats import iqr
>>> x = np.array([[10, 7, 4], [3, 2, 1]])
>>> x
array([[10, 7, 4],
       [ 3, 2, 1]])
>>> iqr(x)
4.0
>>> iqr(x, axis=0)
array([ 3.5, 2.5, 1.5])
>>> iqr(x, axis=1)
array([ 3., 1.])
>>> iqr(x, axis=1, keepdims=True)
array([[ 3.],
       [ 1.]])
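The percentile interpolation rules and the ‘normal’ scaling described above can be illustrated without NumPy or CuPy. The following is a rough sketch under stated assumptions: `percentile` and `iqr_and_median` are hypothetical names, the input is a flat Python sequence, NaN handling is left out, and the `(iqr, median)` return order is assumed for illustration rather than taken from `iqr_med`.

```python
import math
from statistics import NormalDist


def percentile(values, q, interpolation="linear"):
    """q-th percentile (0 to 100) of a 1-D sequence, using the documented modes."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0      # fractional index into the sorted data
    i, j = math.floor(pos), math.ceil(pos)  # neighbouring data points i and j
    fraction = pos - i
    if interpolation == "linear":
        return data[i] + (data[j] - data[i]) * fraction
    if interpolation == "lower":
        return data[i]
    if interpolation == "higher":
        return data[j]
    if interpolation == "nearest":
        return data[round(pos)]
    if interpolation == "midpoint":
        return (data[i] + data[j]) / 2
    raise ValueError(f"unknown interpolation: {interpolation!r}")


def iqr_and_median(values, rng=(25, 75), scale=1.0, interpolation="linear"):
    """Return (percentile range / scale, median); the order is an assumption."""
    if scale == "normal":
        # 2 * sqrt(2) * erfinv(1/2) equals twice the normal quantile at 0.75.
        scale = 2 * NormalDist().inv_cdf(0.75)
    low, high = sorted(rng)                # element order of rng does not matter
    spread = percentile(values, high, interpolation) - percentile(values, low, interpolation)
    return spread / scale, percentile(values, 50, interpolation)


flat = [10, 7, 4, 3, 2, 1]
print(iqr_and_median(flat))                 # (4.0, 3.5)
print(iqr_and_median(flat, rng=(0, 100)))   # (9.0, 3.5), the peak-to-peak range
```

The first result agrees with the `iqr(x)` value of 4.0 in the example above, since `axis=None` flattens the array before the percentiles are taken.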
- jess.scipy_cupy.stats.median_abs_deviation(x, axis=0, center=<function median>, scale=1.0, nan_policy='propagate')¶
Compute the median absolute deviation of the data along the given axis. The median absolute deviation (MAD, [1]_) computes the median over the absolute deviations from the median. It is a measure of dispersion similar to the standard deviation but more robust to outliers [2]_. The MAD of an empty array is np.nan.
.. versionadded:: 1.5.0
Parameters
----------
- x : array_like
Input array or object that can be converted to an array.
- axis : int or None, optional
Axis along which the range is computed. Default is 0. If None, compute the MAD over the entire array.
- center : callable, optional
A function that will return the central value. The default is to use np.median. Any user defined function used will need to have the function signature func(arr, axis).
- scale : scalar or str, optional
The numerical value of scale will be divided out of the final result. The default is 1.0. The string “normal” is also accepted, and results in scale being the standard normal quantile function (inverse CDF) evaluated at 0.75, which is approximately 0.67449. Array-like scale is also allowed, as long as it broadcasts correctly to the output such that out / scale is a valid operation. The output dimensions depend on the input array, x, and the axis argument.
- nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional
Defines how to handle when input contains nan. The following options are available (default is ‘propagate’):
‘propagate’: returns nan
‘raise’: throws an error
‘omit’: performs the calculations ignoring nan values
Returns
-------
- mad : scalar or ndarray
If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than np.float64, then the output data-type is np.float64. Otherwise, the output data-type is the same as that of the input.
See Also
--------
numpy.std, numpy.var, numpy.median, scipy.stats.iqr, scipy.stats.tmean, scipy.stats.tstd, scipy.stats.tvar
Notes
-----
The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in center=np.mean will calculate the MAD around the mean; it will not calculate the mean absolute deviation. The input array may contain inf, but if center returns inf, the corresponding MAD for that data will be nan.
References
----------
.. [1] “Median absolute deviation”
.. [2] “Robust measures of scale”, https://en.wikipedia.org/wiki/Robust_measures_of_scale
Examples
--------
When comparing the behavior of median_abs_deviation with np.std, the latter is affected when we change a single value of an array to have an outlier value while the MAD hardly changes:
>>> import numpy as np
>>> from scipy import stats
>>> x = stats.norm.rvs(size=100, scale=1, random_state=123456)
>>> x.std()
0.9973906394005013
>>> stats.median_abs_deviation(x)
0.82832610097857
>>> x[0] = 345.6
>>> x.std()
34.42304872314415
>>> stats.median_abs_deviation(x)
0.8323442311590675
Axis handling example:
>>> x = np.array([[10, 7, 4], [3, 2, 1]])
>>> x
array([[10, 7, 4],
       [ 3, 2, 1]])
>>> stats.median_abs_deviation(x)
array([3.5, 2.5, 1.5])
>>> stats.median_abs_deviation(x, axis=None)
2.0
Scale normal example:
>>> x = stats.norm.rvs(size=1000000, scale=2, random_state=123456)
>>> stats.median_abs_deviation(x)
1.3487398527041636
>>> stats.median_abs_deviation(x, scale='normal')
1.9996446978061115
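The definition of the MAD and its robustness to a single outlier can also be seen in a small standard-library sketch. It is an illustration only, not this module's CuPy code: the function below works on a flat sequence, its `center` callable takes just the data (not the `func(arr, axis)` signature documented above), and NaN handling is omitted.

```python
from statistics import NormalDist, median, pstdev


def median_abs_deviation(values, center=median, scale=1.0):
    """MAD of a 1-D sequence: median of |x - center(x)|, divided by scale."""
    if scale == "normal":
        # Standard normal quantile at 0.75, approximately 0.67449.
        scale = NormalDist().inv_cdf(0.75)
    mid = center(values)
    return median([abs(v - mid) for v in values]) / scale


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
spiked = [1.0, 2.0, 3.0, 4.0, 500.0]   # same data with one outlier

# The standard deviation grows by two orders of magnitude; the MAD stays at 1.0.
print(pstdev(clean), median_abs_deviation(clean))
print(pstdev(spiked), median_abs_deviation(spiked))
```

Passing `scale="normal"` divides by about 0.67449, which makes the MAD comparable to the standard deviation for normally distributed data.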
- jess.scipy_cupy.stats.median_abs_deviation_med(x: cupy._core.core.ndarray, axis: int = 0, center: object = <function median>, scale: typing.Union[float, str] = 1.0, nan_policy: str = 'propagate')¶
Compute the median absolute deviation of the data along the given axis. The median absolute deviation (MAD, [1]_) computes the median over the absolute deviations from the median. It is a measure of dispersion similar to the standard deviation but more robust to outliers [2]_. The MAD of an empty array is np.nan.
.. versionadded:: 1.5.0
Parameters
----------
- x : array_like
Input array or object that can be converted to an array.
- axis : int or None, optional
Axis along which the range is computed. Default is 0. If None, compute the MAD over the entire array.
- center : callable, optional
A function that will return the central value. The default is to use np.median. Any user defined function used will need to have the function signature func(arr, axis).
- scale : scalar or str, optional
The numerical value of scale will be divided out of the final result. The default is 1.0. The string “normal” is also accepted, and results in scale being the standard normal quantile function (inverse CDF) evaluated at 0.75, which is approximately 0.67449. Array-like scale is also allowed, as long as it broadcasts correctly to the output such that out / scale is a valid operation. The output dimensions depend on the input array, x, and the axis argument.
- nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional
Defines how to handle when input contains nan. The following options are available (default is ‘propagate’):
‘propagate’: returns nan
‘raise’: throws an error
‘omit’: performs the calculations ignoring nan values
Returns
-------
- mad : scalar or ndarray
If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than np.float64, then the output data-type is np.float64. Otherwise, the output data-type is the same as that of the input.
- center
The centers of each array.
See Also
--------
numpy.std, numpy.var, numpy.median, scipy.stats.iqr, scipy.stats.tmean, scipy.stats.tstd, scipy.stats.tvar
Notes
-----
Modified from scipy.stats.median_abs_deviation.
The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in center=np.mean will calculate the MAD around the mean; it will not calculate the mean absolute deviation. The input array may contain inf, but if center returns inf, the corresponding MAD for that data will be nan.
References
----------
.. [1] “Median absolute deviation”
.. [2] “Robust measures of scale”, https://en.wikipedia.org/wiki/Robust_measures_of_scale
Examples
--------
When comparing the behavior of median_abs_deviation with np.std, the latter is affected when we change a single value of an array to have an outlier value while the MAD hardly changes:
>>> import numpy as np
>>> from scipy import stats
>>> x = stats.norm.rvs(size=100, scale=1, random_state=123456)
>>> x.std()
0.9973906394005013
>>> stats.median_abs_deviation(x)
0.82832610097857
>>> x[0] = 345.6
>>> x.std()
34.42304872314415
>>> stats.median_abs_deviation(x)
0.8323442311590675
Axis handling example:
>>> x = np.array([[10, 7, 4], [3, 2, 1]])
>>> x
array([[10, 7, 4],
       [ 3, 2, 1]])
>>> stats.median_abs_deviation(x)
array([3.5, 2.5, 1.5])
>>> stats.median_abs_deviation(x, axis=None)
2.0
Scale normal example:
>>> x = stats.norm.rvs(size=1000000, scale=2, random_state=123456)
>>> stats.median_abs_deviation(x)
1.3487398527041636
>>> stats.median_abs_deviation(x, scale='normal')
1.9996446978061115
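Since this variant also returns the center, a column-wise (axis=0) illustration may help show what the two outputs look like together. The sketch below is plain Python under assumptions: `mad_with_center` is a hypothetical helper operating on a list of rows, the center is always the median, and the `(mad, center)` order follows the order in which the return values are listed above.

```python
from statistics import median


def mad_with_center(rows):
    """Column-wise (axis=0) MAD and median of a 2-D list of rows."""
    columns = list(zip(*rows))                 # transpose rows into columns
    centers = [median(col) for col in columns]
    mads = [
        median([abs(v - c) for v in col])      # median of absolute deviations
        for col, c in zip(columns, centers)
    ]
    return mads, centers


mads, centers = mad_with_center([[10, 7, 4], [3, 2, 1]])
print(mads)      # [3.5, 2.5, 1.5]
print(centers)   # [6.5, 4.5, 2.5]
```

Returning both values avoids a second pass over the data when the median is needed later, for example to express samples as deviations from the center in units of the MAD.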
- jess.scipy_cupy.stats.winsorize(array: cupy._core.core.ndarray, sigma: float, chans_per_fit: int, nan_policy: Optional[str]) cupy._core.core.ndarray ¶
Winsorize an array by clipping values sigma above the fit. The trend is fitted using the median fitter. The noise is calculated from the difference between the array and the fit using IQR.
- Args:
array - Array to winsorize, processed in place.
sigma - Sigma to clip at
chans_per_fit - Channels per fitting order, see jess.fitters.median_fitter
nan_policy - nan policy, if None IQR doesn’t check for nans
- Returns:
winsorized array
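The clip-around-a-fit idea can be sketched in plain Python, with several loud assumptions. The trend here is a constant (the median of the data), standing in for `jess.fitters.median_fitter`, whose behavior is not described in this section, so `chans_per_fit` has no counterpart. The IQR is converted to a noise estimate with the ‘normal’ scale of about 1.349, and values are clipped on both sides of the fit; neither choice is confirmed by the text above. `winsorize_sketch` and `_percentile` are hypothetical names, and the sketch returns a new list instead of working in place.

```python
from statistics import NormalDist, median


def _percentile(sorted_data, q):
    """Linearly interpolated q-th percentile of already sorted data."""
    pos = (len(sorted_data) - 1) * q / 100.0
    i = int(pos)
    j = min(i + 1, len(sorted_data) - 1)
    return sorted_data[i] + (sorted_data[j] - sorted_data[i]) * (pos - i)


def winsorize_sketch(values, sigma):
    """Clip values lying more than sigma noise units away from a constant fit."""
    fit = median(values)                                  # assumed stand-in for the trend
    residuals = sorted(v - fit for v in values)
    iqr = _percentile(residuals, 75) - _percentile(residuals, 25)
    noise = iqr / (2 * NormalDist().inv_cdf(0.75))        # assumed 'normal' IQR scaling
    low, high = fit - sigma * noise, fit + sigma * noise  # assumed two-sided clip
    return [min(max(v, low), high) for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(winsorize_sketch(data, sigma=3))   # the outlier is pulled down to the upper limit
```

Because the noise estimate comes from the IQR of the residuals, the single outlier barely affects the clipping limits, which is the point of using a robust scale here.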