java.lang.Object

org.hipparchus.distribution.continuous.AbstractRealDistribution

org.hipparchus.stat.fitting.EmpiricalDistribution

All Implemented Interfaces: Serializable, RealDistribution

public class EmpiricalDistribution extends AbstractRealDistribution

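Represents an empirical probability distribution: a probability distribution derived from observed data without making any assumptions about the functional form of the population distribution that the data come from.

An EmpiricalDistribution maintains data structures, called distribution digests, that describe empirical distributions and support the following operations:

loading the distribution from a file of observed data values
dividing the input data into "bin ranges" and reporting bin frequency counts (data for a histogram)
reporting univariate statistics describing the full set of data values as well as the observations within each bin
generating random values from the distribution

Applications can use EmpiricalDistribution to build grouped frequency histograms representing the input data or to generate random values "like" those in the input file, i.e., the values generated will follow the distribution of the values in the file.

The implementation uses what amounts to the Variable Kernel Method with Gaussian smoothing:

Digesting the input file

Pass the file once to compute min and max.
Divide the range from min to max into binCount "bins".
Pass the data file again, computing bin counts and univariate statistics (mean, standard deviation) for each of the bins.
Divide the interval (0,1) into subintervals associated with the bins, with the length of a bin's subinterval proportional to its count.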
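
The two passes over the data can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the library's code: the class name BinningSketch and its fields are hypothetical, the data is held in an array rather than read from a file, equal-width bins closed on the right are assumed, and only counts and means are tracked (the standard deviation is left out to keep the sketch short).

```java
import java.util.Arrays;

/** Illustrative two-pass binning (hypothetical class, not the Hipparchus implementation). */
final class BinningSketch {
    final double min;
    final double max;
    final long[] counts;   // number of observations per bin
    final double[] means;  // mean per bin (NaN for an empty bin)

    BinningSketch(double[] data, int binCount) {
        // Pass 1: minimum and maximum of the data (throws if data is empty).
        min = Arrays.stream(data).min().getAsDouble();
        max = Arrays.stream(data).max().getAsDouble();
        double width = (max - min) / binCount;

        // Pass 2: bin counts and per-bin sums.
        counts = new long[binCount];
        double[] sums = new double[binCount];
        for (double v : data) {
            int i = binIndex(v, min, width, binCount);
            counts[i]++;
            sums[i] += v;
        }
        means = new double[binCount];
        for (int i = 0; i < binCount; i++) {
            means[i] = counts[i] == 0 ? Double.NaN : sums[i] / counts[i];
        }
    }

    /** Index of the right-closed bin containing v; the first bin also contains min. */
    static int binIndex(double v, double min, double width, int binCount) {
        if (width == 0.0) {
            return 0; // all values identical: everything lands in the first bin
        }
        int i = (int) Math.ceil((v - min) / width) - 1;
        return Math.min(Math.max(i, 0), binCount - 1);
    }

    public static void main(String[] args) {
        BinningSketch s = new BinningSketch(new double[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 3);
        System.out.println(Arrays.toString(s.counts));
        System.out.println(Arrays.toString(s.means));
    }
}
```

Per-bin counts are the histogram data; the final step of the digest (the subintervals of (0,1)) only needs these counts.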

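Generating random values from the distribution

Generate a uniformly distributed value in (0,1).
Select the subinterval to which the value belongs.
Generate a random Gaussian value with mean = mean of the associated bin and std dev = std dev of the associated bin.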
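
A literal reading of these three steps might look like the sketch below. It is an assumption-laden illustration rather than the library's generator: the class SamplingSketch, the method nextValue and its array parameters are hypothetical names, java.util.Random stands in for the RandomGenerator, and nextDouble() draws from [0,1) rather than the open interval (0,1).

```java
import java.util.Random;

/** Illustrative sampling from binned data (hypothetical class, not the Hipparchus implementation). */
final class SamplingSketch {

    /**
     * generatorUpperBounds[i] is the upper end of the subinterval of [0,1] for bin i;
     * binMeans[i] and binStdDevs[i] describe the observations in bin i.
     */
    static double nextValue(Random rng, double[] generatorUpperBounds,
                            double[] binMeans, double[] binStdDevs) {
        // Step 1: uniform value in [0, 1).
        double u = rng.nextDouble();

        // Step 2: first subinterval whose upper bound is not below u.
        int i = 0;
        while (i < generatorUpperBounds.length - 1 && u > generatorUpperBounds[i]) {
            i++;
        }

        // Step 3: Gaussian deviate with the mean and std dev of the selected bin.
        return binMeans[i] + binStdDevs[i] * rng.nextGaussian();
    }

    public static void main(String[] args) {
        Random rng = new Random(12345L);
        double[] bounds = {0.4, 0.7, 1.0};
        double[] means = {2.5, 6.0, 9.0};
        double[] stdDevs = {1.0, 0.8, 0.8};
        for (int n = 0; n < 5; n++) {
            System.out.println(nextValue(rng, bounds, means, stdDevs));
        }
    }
}
```

Because a subinterval's length is proportional to its bin count, a bin holding 30% of the observations is selected about 30% of the time.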

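EmpiricalDistribution implements the RealDistribution interface as follows. Given x within the range of values in the dataset, let B be the bin containing x and let K be the within-bin kernel for B. Let P(B-) be the sum of the probabilities of the bins below B, let K(B) be the mass of B under K (i.e., the integral of the kernel density over B), and let K(B-) be K evaluated at the lower endpoint of B. Then set P(X <= x) = P(B-) + P(B) * [K(x) - K(B-)] / K(B), where K(x) is the kernel distribution function evaluated at x. This results in a cdf that matches the grouped frequency distribution at the bin endpoints and interpolates within bins using within-bin kernels.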

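Usage notes:

The binCount is set by default to 1000. A good rule of thumb is to set the bin count to approximately the length of the input file divided by 10.
The input file must be a plain text file containing one valid numeric entry per line.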


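Field Summary

Fields

| Modifier and Type | Field | Description |
|---|---|---|
| static final int | DEFAULT_BIN_COUNT | Default bin count. |
| protected final RandomDataGenerator | randomData | RandomDataGenerator instance to use in repeated calls to getNextValue(). |

Fields inherited from class org.hipparchus.distribution.continuous.AbstractRealDistribution
DEFAULT_SOLVER_ABSOLUTE_ACCURACY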
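Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| EmpiricalDistribution() | Creates a new EmpiricalDistribution with the default bin count. |
| EmpiricalDistribution(int binCount) | Creates a new EmpiricalDistribution with the specified bin count. |
| EmpiricalDistribution(int binCount, RandomGenerator generator) | Creates a new EmpiricalDistribution with the specified bin count, using the provided RandomGenerator as the source of random data. |
| EmpiricalDistribution(RandomGenerator generator) | Creates a new EmpiricalDistribution with the default bin count, using the provided RandomGenerator as the source of random data. |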
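Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| double | cumulativeProbability(double x) | For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). |
| double | density(double x) | Returns the probability density function (PDF) of this distribution evaluated at the specified point x. |
| int | getBinCount() | Returns the number of bins. |
| List<StreamingStatistics> | getBinStats() | Returns a List of StreamingStatistics instances containing statistics describing the values in each of the bins. |
| double[] | getGeneratorUpperBounds() | Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. |
| protected RealDistribution | getKernel(StreamingStatistics bStats) | The within-bin smoothing kernel. |
| double | getNextValue() | Generates a random value from this distribution. |
| double | getNumericalMean() | Use this method to get the numerical value of the mean of this distribution. |
| double | getNumericalVariance() | Use this method to get the numerical value of the variance of this distribution. |
| StatisticalSummary | getSampleStats() | Returns a StatisticalSummary describing this distribution. |
| double | getSupportLowerBound() | Access the lower bound of the support. |
| double | getSupportUpperBound() | Access the upper bound of the support. |
| double[] | getUpperBounds() | Returns a fresh copy of the array of upper bounds for the bins. |
| double | inverseCumulativeProbability(double p) | Computes the quantile function of this distribution. |
| boolean | isLoaded() | Property indicating whether or not the distribution has been loaded. |
| boolean | isSupportConnected() | Use this method to get information about whether the support is connected, i.e. whether all values between the lower and upper bound of the support are included in the support. |
| void | load(double[] in) | Computes the empirical distribution from the provided array of numbers. |
| void | load(File file) | Computes the empirical distribution from the input file. |
| void | load(URL url) | Computes the empirical distribution using data read from a URL. |
| void | reSeed(long seed) | Reseeds the random number generator used by getNextValue(). |
| void | reseedRandomGenerator(long seed) | Reseed the underlying PRNG. |

Methods inherited from class org.hipparchus.distribution.continuous.AbstractRealDistribution
getSolverAbsoluteAccuracy, logDensity, probability

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait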

Field Detail
- DEFAULT_BIN_COUNT
  
  public static final int DEFAULT_BIN_COUNT
  
  Default bin count.
  See Also:
  
  Constant Field Values
- randomData
  
  protected final RandomDataGenerator randomData
  
  RandomDataGenerator instance to use in repeated calls to getNextValue().
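Constructor Detail
- EmpiricalDistribution
  
  public EmpiricalDistribution()
  
  Creates a new EmpiricalDistribution with the default bin count.
- EmpiricalDistribution
  
  public EmpiricalDistribution(int binCount)
  
  Creates a new EmpiricalDistribution with the specified bin count.
  
  Parameters:
  
  binCount - number of bins. Must be strictly positive.
  
  Throws:
  
  MathIllegalArgumentException - if binCount <= 0.
- EmpiricalDistribution
  
  public EmpiricalDistribution(int binCount, RandomGenerator generator)
  
  Creates a new EmpiricalDistribution with the specified bin count, using the provided RandomGenerator as the source of random data.
  
  Parameters:
  
  binCount - number of bins. Must be strictly positive.
  
  generator - random data generator (may be null, resulting in the default JDK generator being used)
  
  Throws:
  
  MathIllegalArgumentException - if binCount <= 0.
- EmpiricalDistribution
  
  public EmpiricalDistribution(RandomGenerator generator)
  
  Creates a new EmpiricalDistribution with the default bin count, using the provided RandomGenerator as the source of random data.
  
  Parameters:
  
  generator - random data generator (may be null, resulting in the default JDK generator being used)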
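Method Detail
- load
  
  public void load(double[] in) throws NullArgumentException
  
  Computes the empirical distribution from the provided array of numbers.
  
  Parameters:
  
  in - the input data array
  
  Throws:
  
  NullArgumentException - if in is null
- load
  
  public void load(URL url) throws IOException, MathIllegalArgumentException, NullArgumentException
  
  Computes the empirical distribution using data read from a URL.
  The input file must be an ASCII text file containing one valid numeric entry per line.
  
  Parameters:
  
  url - URL of the input file
  
  Throws:
  
  IOException - if an IO error occurs
  
  NullArgumentException - if url is null
  
  MathIllegalArgumentException - if the URL contains no data
- load
  
  public void load(File file) throws IOException, NullArgumentException
  
  Computes the empirical distribution from the input file.
  The input file must be an ASCII text file containing one valid numeric entry per line.
  
  Parameters:
  
  file - the input file
  
  Throws:
  
  IOException - if an IO error occurs
  
  NullArgumentException - if file is null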
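
To make the expected input format concrete, the sketch below turns lines holding one numeric entry each into an array of the kind load(double[] in) accepts. It is an assumption, not the library's loader: NumericLinesSketch and parseLines are hypothetical names, and how the real load methods treat blank or malformed lines is not stated on this page, so the sketch simply lets Double.parseDouble reject them.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative parsing of one-number-per-line input (hypothetical class). */
final class NumericLinesSketch {

    /** Parses each line as a double; throws NumberFormatException on a blank or malformed line. */
    static double[] parseLines(List<String> lines) {
        return lines.stream()
                    .mapToDouble(line -> Double.parseDouble(line.trim()))
                    .toArray();
    }

    public static void main(String[] args) {
        // The lines could equally come from java.nio.file.Files.readAllLines(path).
        double[] values = parseLines(Arrays.asList("1.5", "-2", "3e2"));
        System.out.println(Arrays.toString(values));
    }
}
```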
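- getNextValue
  
  public double getNextValue() throws MathIllegalStateException
  Generates a random value from this distribution. Preconditions:
  
  the distribution must be loaded before invoking this method
  Returns:
  
  the random value.
  
  Throws:
  
  MathIllegalStateException - if the distribution has not been loaded
- getSampleStats
  
  public StatisticalSummary getSampleStats()
  Returns a StatisticalSummary describing this distribution. Preconditions:
  
  the distribution must be loaded before invoking this method
  Returns:
  
  the sample statistics
  
  Throws:
  
  IllegalStateException - if the distribution has not been loaded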
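- getBinCount
  
  public int getBinCount()
  
  Returns the number of bins.
  
  Returns:
  
  the number of bins.
- getBinStats
  
  public List<StreamingStatistics> getBinStats()
  
  Returns a List of StreamingStatistics instances containing statistics describing the values in each of the bins. The list is indexed on the bin number.
  
  Returns:
  
  List of bin statistics.
- getUpperBounds
  
  public double[] getUpperBounds()
  
  Returns a fresh copy of the array of upper bounds for the bins. Bins are:
  [min,upperBounds[0]],(upperBounds[0],upperBounds[1]],..., (upperBounds[binCount-2], upperBounds[binCount-1] = max].
  
  Returns:
  
  array of bin upper bounds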
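
The bin layout above can be illustrated with a small sketch that builds such an array and looks up the bin containing a value. The names UpperBoundsSketch, upperBounds and findBin are hypothetical, equal-width bins with max > min are assumed, and the lookup uses a binary search chosen for illustration; nothing here is taken from the library's source.

```java
import java.util.Arrays;

/** Illustrative bin upper bounds and lookup (hypothetical class, not the Hipparchus implementation). */
final class UpperBoundsSketch {

    /** Upper bounds of binCount equal-width bins covering [min, max]. */
    static double[] upperBounds(double min, double max, int binCount) {
        double width = (max - min) / binCount;
        double[] bounds = new double[binCount];
        for (int i = 0; i < binCount - 1; i++) {
            bounds[i] = min + width * (i + 1);
        }
        bounds[binCount - 1] = max; // the last bound is exactly max
        return bounds;
    }

    /** Index of the bin containing x, assuming min <= x <= max and strictly increasing bounds. */
    static int findBin(double[] upperBounds, double x) {
        int pos = Arrays.binarySearch(upperBounds, x);
        // Exact hit: x equals an upper bound and belongs to that right-closed bin.
        // Miss: binarySearch returns -(insertionPoint) - 1, where the insertion
        // point is the index of the first bound greater than x.
        int i = pos >= 0 ? pos : -pos - 1;
        return Math.min(i, upperBounds.length - 1);
    }

    public static void main(String[] args) {
        double[] bounds = upperBounds(0.0, 6.0, 3);
        System.out.println(Arrays.toString(bounds));
        System.out.println(findBin(bounds, 2.5));
    }
}
```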
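- getGeneratorUpperBounds
  
  public double[] getGeneratorUpperBounds()
  Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. Subintervals correspond to bins with lengths proportional to bin counts.
  Preconditions:
  
  the distribution must be loaded before invoking this method
  Returns:
  
  array of upper bounds of subintervals used in data generation
  
  Throws:
  
  NullPointerException - unless a load method has been called beforehand.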
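
Since each subinterval's length is proportional to its bin count, the upper bounds are the cumulative relative frequencies of the bins. A minimal sketch, assuming the counts are already available as a long[] (GeneratorBoundsSketch and its method are hypothetical names, not library API):

```java
import java.util.Arrays;

/** Illustrative cumulative bounds on [0,1] (hypothetical class, not the Hipparchus implementation). */
final class GeneratorBoundsSketch {

    /** Entry i is the fraction of all observations falling in bins 0..i. */
    static double[] generatorUpperBounds(long[] binCounts) {
        long total = 0;
        for (long c : binCounts) {
            total += c;
        }
        double[] bounds = new double[binCounts.length];
        long running = 0;
        for (int i = 0; i < binCounts.length; i++) {
            running += binCounts[i];
            bounds[i] = (double) running / total;
        }
        return bounds; // the last entry is 1.0 whenever total > 0
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(generatorUpperBounds(new long[] {4, 3, 3})));
    }
}
```

An empty bin contributes a zero-length subinterval, so its upper bound repeats the previous one.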
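- isLoaded
  
  public boolean isLoaded()
  
  Property indicating whether or not the distribution has been loaded.
  
  Returns:
  
  true if the distribution has been loaded
- reSeed
  
  public void reSeed(long seed)
  
  Reseeds the random number generator used by getNextValue().
  
  Parameters:
  
  seed - random generator seed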
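- density
  
  public double density(double x)
  Returns the probability density function (PDF) of this distribution evaluated at the specified point x. In general, the PDF is the derivative of the CDF. If the derivative does not exist at x, then an appropriate replacement should be returned, e.g. Double.POSITIVE_INFINITY, Double.NaN, or the limit inferior or limit superior of the difference quotient.
  Returns the kernel density normalized so that its integral over each bin equals the bin mass.
  
  Algorithm description:
  
  Find the bin B that x belongs to.
  
  Compute K(B) = the mass of B with respect to the within-bin kernel (i.e., the integral of the kernel density over B).
  
  Return k(x) * P(B) / K(B), where k is the within-bin kernel density and P(B) is the mass of B.
  Parameters:
  
  x - the point at which the PDF is evaluated
  
  Returns:
  
  the value of the probability density function at point x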
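
Written out, with f denoting the returned density (a symbol introduced here for illustration), the normalization in the last step gives a density whose integral over B is the bin mass P(B):

```latex
f(x) = k(x)\,\frac{P(B)}{K(B)}, \qquad K(B) = \int_{B} k(t)\,dt
\quad\Longrightarrow\quad
\int_{B} f(x)\,dx = \frac{P(B)}{K(B)} \int_{B} k(x)\,dx = P(B).
```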
- cumulativeProbability
  
  public double cumulativeProbability(double x)
  For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
  Algorithm description:
  
  Find the bin B that x belongs to.
  
  Compute P(B) = the mass of B and P(B-) = the combined mass of the bins below B.
  
  Compute K(B) = the probability mass of B with respect to the within-bin kernel and K(B-) = the kernel distribution evaluated at the lower endpoint of B
  
  Return P(B-) + P(B) * [K(x) - K(B-)] / K(B) where K(x) is the within-bin kernel distribution function evaluated at x.
  
  If K is a constant distribution, we return P(B-) + P(B) (counting the full mass of B).
  Parameters:
  
  x - the point at which the CDF is evaluated
  
  Returns:
  
  the probability that a random variable with this distribution takes a value less than or equal to x
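
The interpolation step can be sketched with the kernel distribution function passed in as a plain function. This is a hedged illustration only: BinCdfSketch and its parameter names are hypothetical, the bin B and the masses P(B-) and P(B) are assumed to have been determined already, and the constant-kernel special case is left out.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative within-bin CDF interpolation (hypothetical class, not the Hipparchus implementation). */
final class BinCdfSketch {

    /**
     * @param x         point inside the bin [binLower, binUpper]
     * @param pBelow    P(B-), combined mass of the bins below B
     * @param pBin      P(B), mass of bin B
     * @param kernelCdf distribution function of the within-bin kernel
     */
    static double cdf(double x, double binLower, double binUpper,
                      double pBelow, double pBin, DoubleUnaryOperator kernelCdf) {
        double kLower = kernelCdf.applyAsDouble(binLower);          // K(B-)
        double kMass = kernelCdf.applyAsDouble(binUpper) - kLower;  // K(B)
        return pBelow + pBin * (kernelCdf.applyAsDouble(x) - kLower) / kMass;
    }

    public static void main(String[] args) {
        // Uniform kernel on [0,1], whose distribution function is the identity there.
        System.out.println(cdf(0.4, 0.2, 0.6, 0.3, 0.5, t -> t));
    }
}
```

At the lower endpoint of B the result is P(B-) and at the upper endpoint it is P(B-) + P(B), so the curve agrees with the grouped frequency distribution at the bin endpoints.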
- inverseCumulativeProbability
  
  public double inverseCumulativeProbability(double p) throws MathIllegalArgumentException
  Computes the quantile function of this distribution. For a random variable X distributed according to this distribution, the returned value is
  
  inf{x in R | P(X<=x) >= p} for 0 < p <= 1,
  
  inf{x in R | P(X<=x) > 0} for p = 0.
  
  The default implementation returns
  
  RealDistribution.getSupportLowerBound() for p = 0,
  
  RealDistribution.getSupportUpperBound() for p = 1.
  
  Algorithm description:
  
  Find the smallest i such that the sum of the masses of the bins through i is at least p.
  
  Let K be the within-bin kernel distribution for bin i.
  Let K(B) be the mass of B under K.
  Let K(B-) be K evaluated at the lower endpoint of B (the combined mass of the bins below B under K).
  Let P(B) be the probability of bin i.
  Let P(B-) be the sum of the bin masses below bin i.
  Let pCrit = p - P(B-)
  
  Return the inverse of K evaluated at
  K(B-) + pCrit * K(B) / P(B)
  Specified by:
  
  inverseCumulativeProbability in interface RealDistribution
  
  Overrides:
  
  inverseCumulativeProbability in class AbstractRealDistribution
  
  Parameters:
  
  p - the cumulative probability
  
  Returns:
  
  the smallest p-quantile of this distribution (largest 0-quantile for p = 0)
  
  Throws:
  
  MathIllegalArgumentException - if p < 0 or p > 1
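
The algorithm description for the quantile function can be sketched in two small steps: choose the bin, then invert the kernel within it. Everything named here (BinQuantileSketch, findBin, quantileInBin and their parameters) is hypothetical, and the kernel's inverse distribution function is supplied by the caller; this is an illustration under those assumptions rather than the library's code.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative quantile computation over bins (hypothetical class, not the Hipparchus implementation). */
final class BinQuantileSketch {

    /** Smallest i such that the sum of the masses of bins 0..i is at least p. */
    static int findBin(double p, double[] binMass) {
        double cumulative = 0.0;
        for (int i = 0; i < binMass.length; i++) {
            cumulative += binMass[i];
            if (cumulative >= p) {
                return i;
            }
        }
        return binMass.length - 1; // guards against rounding when p is close to 1
    }

    /**
     * @param pBelow P(B-), sum of the bin masses below the selected bin
     * @param pBin   P(B), mass of the selected bin
     * @param kLower K(B-), kernel distribution function at the bin's lower endpoint
     * @param kMass  K(B), mass of the bin under the kernel
     */
    static double quantileInBin(double p, double pBelow, double pBin,
                                double kLower, double kMass,
                                DoubleUnaryOperator kernelInverseCdf) {
        double pCrit = p - pBelow;
        return kernelInverseCdf.applyAsDouble(kLower + pCrit * kMass / pBin);
    }

    public static void main(String[] args) {
        double[] binMass = {0.3, 0.5, 0.2};
        System.out.println(findBin(0.55, binMass));
        // Uniform kernel on [0,1]: the inverse distribution function is the identity.
        System.out.println(quantileInBin(0.55, 0.3, 0.5, 0.2, 0.4, t -> t));
    }
}
```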
- getNumericalMean
  
  public double getNumericalMean()
  
  Use this method to get the numerical value of the mean of this distribution.
  
  Returns:
  
  the mean or Double.NaN if it is not defined
- getNumericalVariance
  
  public double getNumericalVariance()
  
  Use this method to get the numerical value of the variance of this distribution.
  
  Returns:
  
  the variance (possibly Double.POSITIVE_INFINITY as for certain cases in TDistribution) or Double.NaN if it is not defined
- getSupportLowerBound
  
  public double getSupportLowerBound()
  
  Access the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0). In other words, this method must return
  inf {x in R | P(X <= x) > 0}.
  
  Returns:
  
  lower bound of the support (might be Double.NEGATIVE_INFINITY)
- getSupportUpperBound
  
  public double getSupportUpperBound()
  
  Access the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1). In other words, this method must return
  inf {x in R | P(X <= x) = 1}.
  
  Returns:
  
  upper bound of the support (might be Double.POSITIVE_INFINITY)
- isSupportConnected
  
  public boolean isSupportConnected()
  
  Use this method to get information about whether the support is connected, i.e. whether all values between the lower and upper bound of the support are included in the support.
  
  Returns:
  
  whether the support is connected or not
- reseedRandomGenerator
  
  public void reseedRandomGenerator(long seed)
  
  Reseed the underlying PRNG.
  
  Parameters:
  
  seed - new seed value
- getKernel
  
  protected RealDistribution getKernel(StreamingStatistics bStats)
  
  The within-bin smoothing kernel. Returns a Gaussian distribution parameterized by bStats, unless the bin contains fewer than 2 observations, in which case a constant distribution is returned.
  
  Parameters:
  
  bStats - summary statistics for the bin
  
  Returns:
  
  within-bin kernel parameterized by bStats
