Type I and Type II Errors: A Collection


Answers to Statistics, 4th Edition (Jia Junping)

Give some examples of applications of statistics: 1. Using statistics to identify an author: for papers of disputed authorship, statistical measures can be used to infer who wrote them. 2. Using statistics to make an important discovery: the number of vertebrae in eels varies little across different sea areas, which led to the inference that the eels of all the different sea areas breed in one common place in the ocean. 3. Predicting the Challenger space-shuttle disaster.

Name some fields in which statistics is applied: 1. corporate development strategy; 2. product quality management; 3. market research; 4. financial analysis; 5. economic forecasting.

How do you understand the subject matter of statistics? 1. The basic subject matter of statistics comprises statistical objects, statistical methods, and statistical regularities.

2. The statistical object is the subject of statistical research, known as the statistical population.

3. The main methods of statistical research include mass observation, quantitative analysis, sampling inference, and experimentation.

4. Statistical regularities are the essential characteristics and developmental patterns of objective phenomena, expressed through quantitative indicators and revealed by mass observation and comprehensive analysis.

Give examples of categorical, ordinal, and numerical variables. Categorical variable: a variable whose values are distinct categories, e.g. "gender" takes the values "male" or "female"; "the industry a firm belongs to" takes values such as "manufacturing", "retail", or "tourism"; "the school a student belongs to" might be "business school", "law school", and so on. Ordinal variable: a categorical variable whose categories have a natural order, e.g. exam results graded as excellent, good, fair, pass, or fail, or a person's attitude toward something classified as approve, neutral, or oppose.

Here, "exam grade" and "attitude" are ordinal variables.

Numerical variable: a variable whose observations can be recorded as numbers, e.g. "a firm's sales revenue", "living expenses", or "the number of points showing on a rolled die".
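
As a small illustration of the three kinds of variable, the sketch below (Python, assuming the pandas library is available; the column names and values are invented) maps categorical and ordinal variables onto pandas's unordered and ordered categorical types, while the numerical variable stays a plain numeric column:

    import pandas as pd

    # Invented example data: one categorical, one ordinal, one numerical column.
    df = pd.DataFrame({
        "industry": ["manufacturing", "retail", "tourism", "retail"],
        "grade": ["excellent", "good", "pass", "good"],
        "sales": [120.5, 80.0, 45.3, 96.1],
    })

    # Categorical variable: distinct categories with no inherent order.
    df["industry"] = pd.Categorical(df["industry"])

    # Ordinal variable: the same construct, but ordered=True with an explicit ranking.
    df["grade"] = pd.Categorical(
        df["grade"],
        categories=["fail", "pass", "fair", "good", "excellent"],
        ordered=True,
    )

    print(df.dtypes)             # "sales" stays a float column (numerical variable)
    print(df["grade"] > "pass")  # order comparisons are meaningful only for ordinal data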

What are the charting methods for qualitative and for quantitative data? 1. Qualitative data: bar chart, Pareto chart, pie chart, doughnut chart. 2. Quantitative data: (a) distribution of grouped data: histogram; (b) distribution of ungrouped data: stem-and-leaf plot, box plot, vertical-line plot, error-bar plot; (c) relationship between two variables: scatter plot; (d) comparing the similarity of several samples: radar chart and profile plot.

How does a histogram differ from a bar chart? 1. In a bar chart each rectangle represents one category and its width carries no meaning, whereas in a histogram the width of each rectangle represents the class interval of the group.

2. Because grouped data are continuous, the rectangles of a histogram are usually drawn adjoining one another, whereas the bars of a bar chart are drawn separated.
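
Both distinctions show up directly in plotting code. In the sketch below (Python, assuming Matplotlib and NumPy; all data are invented), the bar chart draws separated bars whose width is arbitrary, while the histogram draws adjoining rectangles whose width is the class interval:

    import numpy as np
    import matplotlib.pyplot as plt

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))

    # Bar chart: one bar per category; the width carries no meaning,
    # so the bars are drawn with gaps between them.
    ax1.bar(["manufacturing", "retail", "tourism"], [23, 17, 9], width=0.5)
    ax1.set_title("Bar chart (categorical data)")

    # Histogram: each rectangle's width is the class interval, so adjacent
    # rectangles touch, reflecting the continuity of grouped data.
    data = np.random.default_rng(0).normal(loc=50, scale=10, size=500)
    ax2.hist(data, bins=10, edgecolor="black")
    ax2.set_title("Histogram (numerical data)")

    plt.tight_layout()
    plt.show()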

Type I and Type II Errors in Statistical Hypothesis Testing

Hypothesis testing in statistics is an inferential method used to assess the relationship between sample data and a hypothesized population parameter.

When carrying out a hypothesis test, two kinds of mistaken judgment are possible: the Type I error and the Type II error.

This article describes both errors and the role they play in hypothesis testing.

I. Type I errors

A Type I error is the mistake of rejecting the null hypothesis when it is actually true.

In other words, when no real difference exists, we wrongly conclude that a significant difference does exist.

The probability of committing a Type I error is called the significance level (α) and is usually set at 0.01 or 0.05.

In a hypothesis test we first state a null hypothesis (H0), for example that two samples or populations do not differ. We then compute the p-value of the sample data (the observed significance level) and use it to decide whether to reject the null hypothesis. If the p-value is smaller than the chosen significance level, we reject the null hypothesis and conclude that a significant difference exists. That conclusion, however, may be wrong; when it is, a Type I error has occurred.

The probability of a Type I error can in principle be controlled, normally through the choice of significance level. A smaller significance level reduces the probability of a Type I error, but it also raises the probability of a Type II error.
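
To make the meaning of α concrete, here is a small simulation sketch (Python, assuming NumPy and SciPy; the sample size and replication count are arbitrary). It runs many t tests on data for which H0 is true by construction, so every rejection is a Type I error:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    alpha = 0.05
    n_tests = 10_000
    rejections = 0

    for _ in range(n_tests):
        # Both samples come from the same distribution, so H0 is true by construction.
        x = rng.normal(loc=0.0, scale=1.0, size=30)
        y = rng.normal(loc=0.0, scale=1.0, size=30)
        _, p = stats.ttest_ind(x, y)
        if p < alpha:            # decision rule: reject H0 when p < alpha
            rejections += 1

    # Every rejection above is, by construction, a Type I error, so the
    # observed rejection rate should sit close to alpha.
    print(f"empirical Type I error rate: {rejections / n_tests:.4f} (alpha = {alpha})")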

II. Type II errors

A Type II error is the mistake of accepting (failing to reject) the null hypothesis when it is actually false.

In other words, when a real difference exists, we fail to conclude that a significant difference exists.

The probability of committing a Type II error is denoted β, and it is usually hard to determine. It depends on factors such as the sample size, the effect size, and the significance level. When the sample is small, the probability of a Type II error can be high; a small effect size, or a stringent (small) significance level, likewise increases it.

To keep the probability of a Type II error low, one can increase the sample size, be explicit about the effect size of interest, and choose the significance level appropriately.
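
β can be estimated the same way, by simulating data for which H0 is false. In the sketch below (Python with NumPy and SciPy; the half-standard-deviation effect and the sample sizes are arbitrary illustration values), β shrinks and power grows as the sample grows:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, effect = 0.05, 0.5     # true difference of 0.5 standard deviations
    n_tests = 5_000

    for n in (10, 30, 100):
        misses = 0
        for _ in range(n_tests):
            x = rng.normal(0.0, 1.0, size=n)
            y = rng.normal(effect, 1.0, size=n)   # H0 is false by construction
            _, p = stats.ttest_ind(x, y)
            if p >= alpha:        # failing to reject a false H0 is a Type II error
                misses += 1
        beta = misses / n_tests
        print(f"n = {n:>3}: estimated beta = {beta:.3f}, power = {1 - beta:.3f}")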

III. Balancing Type I and Type II errors

When running a hypothesis test, we want to keep the probability of a Type I error under control while making the probability of a Type II error as small as possible. In general, the Type I error probability (α) and the Type II error probability (β) constrain one another. Lowering the significance level to reduce Type I errors tends to raise the probability of Type II errors; conversely, raising the significance level to reduce Type II errors raises the probability of Type I errors.
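
For a simple one-sided z test this trade-off can be computed exactly: beta = Phi(z(1-alpha) - delta*sqrt(n)/sigma), where z(1-alpha) is a normal quantile. A sketch (Python with SciPy; the effect size delta, sigma, and n are made-up illustration values) showing β rising as α shrinks:

    import math
    from scipy import stats

    def beta_one_sided_z(alpha, delta, sigma, n):
        """Type II error rate of a one-sided z test of H0: mu <= 0,
        when the true mean is delta and sigma is known."""
        z_crit = stats.norm.ppf(1 - alpha)        # critical value set by alpha
        return stats.norm.cdf(z_crit - delta * math.sqrt(n) / sigma)

    # With the effect size and sample fixed, shrinking alpha inflates beta.
    for alpha in (0.10, 0.05, 0.01):
        b = beta_one_sided_z(alpha=alpha, delta=0.5, sigma=1.0, n=25)
        print(f"alpha = {alpha:.2f} -> beta = {b:.3f}")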

Type I and Type II Errors for Confidence Intervals

Preface

This article is in two parts: the first covers the definition and application of confidence intervals; the second covers the two types of error associated with them.

I. Confidence intervals
A confidence interval is an interval estimate of a population parameter, constructed from sample statistics.

In statistics, the confidence interval of a probability sample is an interval estimate of some parameter of that sample's population.

A confidence interval expresses the degree to which the parameter's true value has a given probability of falling around the measurement result; it states how credible the measured value of the parameter is, i.e. the "given probability" just mentioned.
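
As a minimal sketch of the idea (Python with NumPy and SciPy; the data are synthetic), a t-based 95% confidence interval for a mean can be computed as follows:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    sample = rng.normal(loc=10.0, scale=2.0, size=40)  # synthetic data

    mean = sample.mean()
    sem = stats.sem(sample)        # standard error of the mean
    df = len(sample) - 1           # degrees of freedom for the t distribution

    # 95% confidence interval for the population mean, based on t.
    low, high = stats.t.interval(0.95, df, loc=mean, scale=sem)

    # Interpretation: if this whole procedure were repeated many times,
    # about 95% of the intervals produced would cover the true mean.
    print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")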

II. Error types

Type I error: the null hypothesis is true, yet it is rejected.

Type II error: the null hypothesis is false, yet it is not rejected.

Relationship: (1) α and β are probabilities defined under two different premises, so α + β need not equal 1; this is the most important point about how the two errors relate. (2) Other things being equal, α and β cannot both decrease (or both increase) at once; as one falls, the other rises.

Which error should we fear more, Type I or Type II? Answering from a risk-control perspective, I would say that letting a bad actor in (a Type II error) is more serious than rejecting a good customer (a Type I error).

Term Explanation: Errors of the First Kind

An erroneous term explanation arises when, owing to differing cultural backgrounds and language environments, a noun carries different concepts or meanings in two different languages.

Such erroneous term explanations can have serious consequences: they not only impede accurate communication but can also leave both parties misunderstanding one another.

An error of the first kind occurs when a noun used in different places within the same language carries different concepts or meanings. In this kind of error, even two synonyms may mean different things in different settings or contexts. For example, in English the definition of a near-synonym of "jacket", such as "coat", may differ from that of "jacket" itself, even though the two are often used interchangeably. This kind of error can lead to misunderstanding because the two parties read the same word differently.

An error of the second kind occurs when nouns used in different languages carry different concepts or meanings. Even where the nouns in two languages have comparable definitions, they may carry different concepts and meanings in their different cultural backgrounds, which can keep speakers of the two languages from communicating properly. For example, the English noun "shoe" refers to footwear, while a similar-sounding word in another language may refer to something else entirely, such as a watch. This kind of error can create communication problems and so produce confusion and misunderstanding.

Of course, an erroneous term explanation may also arise simply from unfamiliar vocabulary: one party uses a word the other does not know, and the meaning fails to come across. This, too, can leave both parties confused and at cross purposes.

Appreciating the importance of erroneous term explanations, and strengthening cultural and linguistic exchange, can effectively resolve the problems above. Through continued learning and communication, the two parties can come to understand one another and clear up erroneous term explanations. In addition, correct language use should be emphasized, so as to avoid linguistic confusion and misunderstanding.

In short, erroneous term explanations can have serious consequences; they deserve attention, and effective measures should be taken to address them. Through accurate communication and familiarity with the relevant cultural background and language environment, similar errors can be reduced, raising the efficiency and quality of communication.

Statistical Knowledge: Type I and Type II Errors

Type I and type II errors:
(α) the error of rejecting a "correct" null hypothesis, and
(β) the error of not rejecting a "false" null hypothesis.

In 1930, they [Neyman and Pearson; see Etymology below] elaborated on these two sources of error, remarking that "in testing hypotheses two considerations must be kept in view, (1) we must be able to reduce the chance of rejecting a true hypothesis to as low a value as desired; (2) the test must be so devised that it will reject the hypothesis tested when it is likely to be false".[1]

When an observer makes a Type I error in evaluating a sample against its parent population, s/he is mistakenly thinking that a statistical difference exists when in truth there is no statistical difference (or, to put it another way, the null hypothesis is true but was mistakenly rejected). For example, imagine that a pregnancy test has produced a "positive" result (indicating that the woman taking the test is pregnant); if the woman is actually not pregnant, then we say the test produced a "false positive". A Type II error, or a "false negative", is the error of failing to reject a null hypothesis when the alternative hypothesis is the true state of nature. For example, a Type II error occurs if a pregnancy test reports "negative" when the woman is, in fact, pregnant.

Statistical error vs. systematic error

Scientists recognize two different sorts of error.[2]

Statistical error: Type I and Type II

Statisticians speak of two significant sorts of statistical error. The context is that there is a "null hypothesis" which corresponds to a presumed default "state of nature", e.g., that an individual is free of disease, that an accused is innocent, or that a potential login candidate is not authorized. Corresponding to the null hypothesis is an "alternative hypothesis" which corresponds to the opposite situation, that is, that the individual has the disease, that the accused is guilty, or that the login candidate is an authorized user. The goal is to determine accurately whether the null hypothesis can be discarded in favor of the alternative. A test of some sort is conducted (a blood test, a legal trial, a login attempt) and data is obtained. The result of the test may be negative (that is, it does not indicate disease, guilt, or authorized identity), or it may be positive (that is, it may indicate disease, guilt, or identity). If the result of the test does not correspond with the actual state of nature, an error has occurred; if it does correspond, a correct decision has been made. There are two kinds of error, classified as "Type I error" and "Type II error", depending upon which hypothesis has incorrectly been identified as the true state of nature.

Type I error

A Type I error, also known as an "error of the first kind", an α error, or a "false positive", is the error of rejecting a null hypothesis when it is actually true. Plainly speaking, it occurs when we observe a difference when in truth there is none. A Type I error can be viewed as the error of excessive skepticism.

Type II error

A Type II error, also known as an "error of the second kind", a β error, or a "false negative", is the error of failing to reject a null hypothesis when it is in fact false. In other words, this is the error of failing to observe a difference when in truth there is one. A Type II error can be viewed as the error of excessive gullibility.

See "Various proposals for further extension", below, for additional terminology.

Understanding Type I and Type II errors

Hypothesis testing is the art of testing whether a variation between two sample distributions can be explained by chance or not. In many practical applications Type I errors are more delicate than Type II errors, so care is usually focused on minimizing the occurrence of this statistical error. Suppose the probability of a Type I error is 1% or 5%; then there is a 1% or 5% chance that the observed variation is not real. This is called the level of significance. While 1% or 5% might be an acceptable level of significance for one application, a different application can require a very different level. For example, the standard goal of six sigma is to achieve exactness by 4.5 standard deviations above or below the mean, so that for a normally distributed process only 3.4 parts per million are allowed to be deficient. The probability of a Type I error is generally denoted by the Greek letter alpha.

In more common parlance, a Type I error can usually be interpreted as a false alarm, insufficient specificity, or perhaps an encounter with fool's gold. A Type II error could similarly be interpreted as an oversight, a lapse in attention, or inadequate sensitivity.

Etymology

In 1928, Jerzy Neyman (1894-1981) and Egon Pearson (1895-1980), both eminent statisticians, discussed the problems associated with "deciding whether or not a particular sample may be judged as likely to have been randomly drawn from a certain population" (1928/1967, p.1); and, as Florence Nightingale David remarked, "it is necessary to remember the adjective 'random' [in the term 'random sample'] should apply to the method of drawing the sample and not to the sample itself" (1949, p.28).

They identified "two sources of error", namely:
(a) the error of rejecting a hypothesis that should have been accepted, and
(b) the error of accepting a hypothesis that should have been rejected (1928/1967, p.31).

In 1930, they elaborated on these two sources of error, remarking that "in testing hypotheses two considerations must be kept in view, (1) we must be able to reduce the chance of rejecting a true hypothesis to as low a value as desired; (2) the test must be so devised that it will reject the hypothesis tested when it is likely to be false" (1930/1967, p.100).

In 1933, they observed that these "problems are rarely presented in such a form that we can discriminate with certainty between the true and false hypothesis" (p.187). They also noted that, in deciding whether to accept or reject a particular hypothesis amongst a "set of alternative hypotheses" (p.201), it was easy to make an error, "[and] these errors will be of two kinds: (I) we reject H0 [i.e., the hypothesis to be tested] when it is true, (II) we accept H0 when some alternative hypothesis Hi is true" (1933/1967, p.187).

In all of the papers co-written by Neyman and Pearson the expression H0 always signifies "the hypothesis to be tested" (see, for example, 1933/1967, p.186). In the same paper[4] they call these two sources of error errors of type I and errors of type II respectively.[5]

Statistical treatment

Definitions: Type I and type II errors

Over time, the notion of these two sources of error has been universally accepted. They are now routinely known as type I errors and type II errors, and for obvious reasons they are very often referred to as false positives and false negatives respectively. The terms are now commonly applied in a much wider and far more general sense than Neyman and Pearson's original specific usage, as follows:

Type I error (the "false positive"): the error of rejecting the null hypothesis given that it is actually true; e.g., a court finding a person guilty of a crime that they did not actually commit.

Type II error (the "false negative"): the error of failing to reject the null hypothesis given that the alternative hypothesis is actually true; e.g., a court finding a person not guilty of a crime that they did actually commit.

These examples illustrate the ambiguity, which is one of the dangers of this wider use: they assume the speaker is testing for guilt; they could also be used in reverse, as testing for innocence; or two tests could be involved, one for guilt, the other for innocence. (This ambiguity is one reason for the Scottish legal system's third possible verdict: not proven.)

The conditions can be tabulated as follows (the original article tabulated three worked examples, infectious-disease test results, testing for guilty/not-guilty, and testing for innocent/not-innocent with the sense reversed, which are not reproduced here):

                            Null hypothesis true    Null hypothesis false
    Reject null hypothesis  Type I error            correct decision
                            (false positive)        (true positive)
    Fail to reject null     correct decision        Type II error
    hypothesis              (true negative)         (false negative)

Note that, when referring to test results, the terms true and false are used in two different ways: the state of the actual condition (true = present versus false = absent), and the accuracy or inaccuracy of the test result (true positive, false positive, true negative, false negative). This is confusing to some readers. To clarify the examples above, we have used present/absent rather than true/false to refer to the actual condition being tested.

False positive rate

The false positive rate is the proportion of negative instances that were erroneously reported as being positive. It is equal to 1 minus the specificity of the test. This is equivalent to saying the false positive rate is equal to the significance level.[6]

It is standard practice for statisticians to conduct tests in order to determine whether or not a "speculative hypothesis" concerning the observed phenomena of the world (or its inhabitants) can be supported. The results of such testing determine whether a particular set of results agrees reasonably (or does not agree) with the speculated hypothesis.

By statistical convention it is always assumed that the speculated hypothesis is wrong, and that the so-called "null hypothesis" that the observed phenomena simply occur by chance (and that, as a consequence, the speculated agent has no effect) is right; the test will determine whether this hypothesis is right or wrong. This is why the hypothesis under test is often called the null hypothesis (most likely coined by Fisher (1935, p.19)): it is this hypothesis that is to be either nullified or not nullified by the test. When the null hypothesis is nullified, it is possible to conclude that the data support the "alternative hypothesis" (which is the original speculated one).

The consistent application by statisticians of Neyman and Pearson's convention of representing "the hypothesis to be tested" (or "the hypothesis to be nullified") with the expression H0 has led to circumstances where many understand the term "the null hypothesis" as meaning "the nil hypothesis" — a statement that the results in question have arisen through chance. This is not necessarily the case — the key restriction, as per Fisher (1966), is that "the null hypothesis must be exact, that is free from vagueness and ambiguity, because it must supply the basis of the 'problem of distribution,' of which the test of significance is the solution."[9] As a consequence of this, in experimental science the null hypothesis is generally a statement that a particular treatment has no effect; in observational science, it is that there is no difference between the value of a particular measured variable and that of an experimental prediction.

The extent to which the test in question shows that the "speculated hypothesis" has (or has not) been nullified is called its significance level; and the higher the significance level, the less likely it is that the phenomena in question could have been produced by chance alone. The British statistician Sir Ronald Aylmer Fisher (1890-1962) stressed that the null hypothesis "is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis" (1935, p.19).

Bayes's theorem

The probability that an observed positive result is a false positive (as contrasted with an observed positive result being a true positive) may be calculated using Bayes's theorem. The key concept of Bayes's theorem is that the true rates of false positives and false negatives are not a function of the accuracy of the test alone, but also of the actual rate or frequency of occurrence within the test population; and, often, the more powerful issue is the actual rate of the condition within the sample being tested.

Various proposals for further extension

Since the paired notions of Type I errors (or "false positives") and Type II errors (or "false negatives") that were introduced by Neyman and Pearson are now widely used, their choice of terminology ("errors of the first kind" and "errors of the second kind") has led others to suppose that certain sorts of mistake that they have identified might be an "error of the third kind", "fourth kind", etc.[10] None of these proposed categories has met with any sort of wide acceptance. The following is a brief account of some of these proposals.

David. Florence Nightingale David (1909-1993),[3] a sometime colleague of both Neyman and Pearson at University College London, making a humorous aside at the end of her 1947 paper, suggested that, in the case of her own research, perhaps Neyman and Pearson's "two sources of error" could be extended to a third: "I have been concerned here with trying to explain what I believe to be the basic ideas [of my 'theory of the conditional power functions'], and to forestall possible criticism that I am falling into error (of the third kind) and am choosing the test falsely to suit the significance of the sample" (1947, p.339).

Mosteller. In 1948, Frederick Mosteller (1916-2006)[11] argued that a "third kind of error" was required to describe circumstances he had observed, namely:
Type I error: "rejecting the null hypothesis when it is true".
Type II error: "accepting the null hypothesis when it is false".
Type III error: "correctly rejecting the null hypothesis for the wrong reason" (1948, p.61).

Kaiser. In his 1966 paper, Henry F. Kaiser (1927-1992) extended Mosteller's classification such that an error of the third kind entailed an incorrect decision of direction following a rejected two-tailed test of hypothesis. In his discussion (1966, pp.162-163), Kaiser also speaks of α errors, β errors, and γ errors for type I, type II, and type III errors respectively.

Kimball. In 1957, Allyn W. Kimball, a statistician with the Oak Ridge National Laboratory, proposed a different kind of error to stand beside "the first and second types of error in the theory of testing hypotheses". Kimball defined this new "error of the third kind" as "the error committed by giving the right answer to the wrong problem" (1957, p.134). The mathematician Richard Hamming (1915-1998) expressed his view that "it is better to solve the right problem the wrong way than to solve the wrong problem the right way". The famous Harvard economist Howard Raiffa describes an occasion when he, too, "fell into the trap of working on the wrong problem" (1968, pp.264-265).[12]

Mitroff and Featheringham. In 1974, Ian Mitroff and Tom Featheringham extended Kimball's category, arguing that "one of the most important determinants of a problem's solution is how that problem has been represented or formulated in the first place". They defined type III errors as either "the error... of having solved the wrong problem... when one should have solved the right problem" or "the error... [of] choosing the wrong problem representation... when one should have... chosen the right problem representation" (1974, p.383).

Raiffa. In 1969, the Harvard economist Howard Raiffa jokingly suggested "a candidate for the error of the fourth kind: solving the right problem too late" (1968, p.264).

Marascuilo and Levin. In 1970, Marascuilo and Levin proposed a "fourth kind of error", a "Type IV error", which they defined in a Mosteller-like manner as the mistake of "the incorrect interpretation of a correctly rejected hypothesis"; which, they suggested, was the equivalent of "a physician's correct diagnosis of an ailment followed by the prescription of a wrong medicine" (1970, p.398).

Usage examples

Statistical tests always involve a trade-off between:
(a) the acceptable level of false positives (in which a non-match is declared to be a match), and
(b) the acceptable level of false negatives (in which an actual match is not detected).

A threshold value can be varied to make the test more restrictive or more sensitive, with the more restrictive tests increasing the risk of rejecting true positives and the more sensitive tests increasing the risk of accepting false positives.

Computers

The notions of "false positives" and "false negatives" have a wide currency in the realm of computers and computer applications.

Computer security

Security vulnerabilities are an important consideration in the task of keeping all computer data safe, while maintaining access to that data for appropriate users (see computer security, computer insecurity). Moulton (1983) stresses the importance of:
avoiding the type I errors (or false positives) that classify authorized users as imposters, and
avoiding the type II errors (or false negatives) that classify imposters as authorized users (1983, p.125).

Spam filtering

A false positive occurs when "spam filtering" or "spam blocking" techniques wrongly classify a legitimate email message as spam and, as a result, interfere with its delivery. While most anti-spam tactics can block or filter a high percentage of unwanted emails, doing so without creating significant false-positive results is a much more demanding task. A false negative occurs when a spam email is not detected as spam but is classified as "non-spam". A low number of false negatives is an indicator of the efficiency of "spam filtering" methods.

Biometrics

False positive (type I): False Accept Rate (FAR) or False Match Rate (FMR).
False negative (type II): False Reject Rate (FRR) or False Non-match Rate (FNMR).

The FAR may also be an abbreviation for the false alarm rate, depending on whether the biometric system is designed to allow access or to recognize suspects. The FAR is considered to be a measure of the security of the system, while the FRR measures the inconvenience level for users. For many systems, the FRR is largely caused by low-quality images, due to incorrect positioning or illumination. The terminology FMR/FNMR is sometimes preferred to FAR/FRR because the former measure the rates for each biometric comparison, while the latter measure the application performance (i.e. three tries may be permitted).

Several limitations should be noted for the use of these measures for biometric systems:
(a) The system performance depends dramatically on the composition of the test database.
(b) The system performance measured in this way is the zero-effort error rate. Attackers prepared to use active techniques such as spoofing will increase the FAR.
(c) Such error rates only apply properly to biometric verification (or one-to-one matching) systems. The performance of biometric identification or watch-list systems is measured with other indices (such as the cumulative match curve (CMC)).

Medical screening and testing

Screening involves relatively cheap tests that are given to large populations, none of whom manifest any clinical indication of disease (e.g., Pap smears). Testing involves far more expensive, often invasive, procedures that are given only to those who manifest some clinical indication of disease, and is most often applied to confirm a suspected diagnosis.

If a test with an appreciable false negative rate is used to test a population with a true occurrence rate of 70%, many of the "negatives" detected by the test will be false (see Bayes's theorem).

False positives can also produce serious and counter-intuitive problems when the condition being searched for is rare, as in screening. If a test has a false positive rate of one in ten thousand, but only one in a million samples (or people) is a true positive, most of the "positives" detected by that test will be false.[17]

Paranormal investigation

The notion of a false positive has been adopted by those who investigate paranormal or ghost phenomena to describe a photograph, or recording, or some other evidence that incorrectly appears to have a paranormal origin; in this usage, a false positive is a disproven piece of media "evidence" (image, movie, audio recording, etc.) that has a normal explanation.[18]
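
The point made under "Bayes's theorem" above, that real-world false-positive rates depend on the base rate as much as on test accuracy, can be reproduced in a few lines. A sketch (Python; the prevalence, sensitivity, and specificity figures are invented for illustration):

    def posterior_positive(prevalence, sensitivity, specificity):
        """P(condition present | test positive), via Bayes's theorem."""
        p_pos_given_present = sensitivity
        p_pos_given_absent = 1.0 - specificity    # the false positive rate
        p_pos = (prevalence * p_pos_given_present
                 + (1.0 - prevalence) * p_pos_given_absent)
        return prevalence * p_pos_given_present / p_pos

    # A rare condition (1 in 1,000) and a seemingly accurate test: most
    # positives are still false positives, as the screening discussion warns.
    print(posterior_positive(prevalence=0.001, sensitivity=0.99, specificity=0.99))
    # prints roughly 0.09: only about 9% of positives are true positives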

Explaining and Evaluating Type I and Type II Errors in Reports on Statistical Hypothesis Tests

Statistical hypothesis testing is an inferential method for reaching judgments about a population, or a characteristic of one, from observed data. In this process we evaluate the sample data and draw conclusions from it. Because a sample is only part of the population, however, two kinds of mistaken judgment are possible when we do so: Type I errors and Type II errors. This article explains and evaluates both in detail.

I. The concept and impact of Type I errors

In statistical hypothesis testing we usually fix a critical value; when the computed test statistic exceeds it, we reject the null hypothesis and declare the result statistically significant. A Type I error, also called an α error, is the mistaken rejection of the null hypothesis when it is actually true.

A Type I error means our conclusion is wrong: the null hypothesis has been rejected in error. Such an error can drive wrong decisions and mislead later research. For this reason, in hypothesis testing the probability of a Type I error is usually capped at a small value, typically 0.05 or 0.01.
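
The critical-value form of the decision rule just described can be written out directly. A sketch (Python with SciPy; the sample mean, sigma, and n are invented illustration values) for a two-sided z test:

    import math
    from scipy import stats

    def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
        """Two-sided z test of H0: mu = mu0, using the critical-value rule."""
        z = (sample_mean - mu0) / (sigma / math.sqrt(n))
        z_crit = stats.norm.ppf(1 - alpha / 2)    # critical value for this alpha
        return abs(z) > z_crit, z, z_crit         # True means "reject H0"

    reject, z, z_crit = two_sided_z_test(sample_mean=10.6, mu0=10.0,
                                         sigma=2.0, n=50, alpha=0.05)
    print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")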

II. The concept and impact of Type II errors

Mirroring the Type I error, a Type II error, also called a β error, is the mistaken acceptance of the null hypothesis when it is actually false. A Type II error means we fail to detect a real difference or effect, so we may miss an important relationship and be misled in our conclusions. As with Type I errors, we want the probability of a Type II error held to a small value; in practice the usual target is a β of about 0.20 or less (i.e. power of at least 0.80), a looser bound than is typical for α.

III. Factors that influence Type I and Type II errors

In statistical hypothesis testing, the occurrence of Type I and Type II errors depends on the following factors (a power sketch follows this list):

1. Sample size: the larger the sample, the more complete our picture of the population; for a given significance level, a larger sample lowers the probability of a Type II error.

2. Significance level: the significance level (α) is what controls the probability of a Type I error; setting it to a smaller value lowers that probability.

3. Effect size: the effect size is the magnitude of the difference between populations; broadly speaking, the larger the difference, the easier it is to detect, and so the lower the probability of a Type II error.

4. Statistical power: power is the probability of correctly rejecting the null hypothesis when it is false; raising power lowers the probability of a Type II error.
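
A rough sense of how these factors interact can be had from the normal approximation below (a Python sketch with SciPy; the effect size and sigma are invented values, and the negligible far tail of the two-sided test is ignored):

    import math
    from scipy import stats

    def approx_power(delta, sigma, n, alpha=0.05):
        """Approximate power of a two-sided one-sample z test against a
        true mean shift of delta (the negligible far-tail term is ignored)."""
        z_crit = stats.norm.ppf(1 - alpha / 2)
        return stats.norm.cdf(delta * math.sqrt(n) / sigma - z_crit)

    # Larger samples give higher power (lower Type II error) for a fixed
    # effect size and a fixed significance level.
    for n in (10, 20, 50, 100, 200):
        print(f"n = {n:>3}: approximate power = {approx_power(delta=0.3, sigma=1.0, n=n):.3f}")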

What Are Type I and Type II Errors?

The difference between Type I and Type II errors:

A Type I error rejects a null hypothesis H0 that is actually true; that is, it wrongly declares a difference. This mistake of "discarding the true" is called a Type I error.

Its probability is the significance level of the test, denoted α. α may be one-tailed or two-tailed. It can be set according to the purpose of the study and is usually 0.05, meaning that if H0 is in fact true, then in theory an average of 5 out of every 100 such tests will wrongly reject it.

A Type II error accepts a null hypothesis H0 that is actually false; that is, it wrongly declares no difference. This mistake of "accepting the false" is called a Type II error. Its probability is denoted β, and its exact size is difficult to estimate.

The two are related: with the sample size fixed, the smaller α is, the larger β becomes, and vice versa. β can therefore be controlled indirectly through the choice of α. The only way to reduce α and β at the same time is to increase the sample size.

In statistics, 1 − β is called the power of the test: the probability that, when a difference between two populations really exists, a test at level α will detect it. A short sample-size sketch follows.
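
To see concretely how tightening either error requirement drives the sample size up, here is a closed-form sketch for a two-sided one-sample z test (Python with SciPy; delta and sigma are illustration values):

    import math
    from scipy import stats

    def required_n(delta, sigma, alpha=0.05, power=0.80):
        """Smallest n for a two-sided one-sample z test to reach the requested
        power (1 - beta) against a true mean difference of delta."""
        z_a = stats.norm.ppf(1 - alpha / 2)   # controls the Type I error
        z_b = stats.norm.ppf(power)           # controls the Type II error
        return math.ceil(((z_a + z_b) * sigma / delta) ** 2)

    # Tightening either error requirement pushes the sample size up.
    print(required_n(delta=0.5, sigma=1.0, alpha=0.05, power=0.80))  # about 32
    print(required_n(delta=0.5, sigma=1.0, alpha=0.01, power=0.90))  # about 60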

In practice, one should weigh which of the two errors is more consequential when choosing the significance level.

The Two Types of Error

6. What two types of error can occur in hypothesis testing, and how can both be avoided?

Answer: When testing the null hypothesis H0, two types of error are possible:

1) A Type I error occurs when H0 is true but is rejected: the mistake of "discarding the true". The probability of a Type I error is governed mainly by the significance level α and by the sample size.

2) A Type II error occurs when H0 is false but is accepted: the mistake of "accepting the false". The probability of a Type II error is influenced by several factors: see items (1)-(4) in the biostatistics textbook (1st edition, p.56, second line from the bottom, through p.57, line 2; or 2nd edition, p.62).

The following methods help avoid both types of error:

1) Control experimental error: reduce the standard error by increasing the sample size and the number of replicates.

2) Improve experimental quality: strictly select experimental material with uniform conditions, adopt a sound experimental design, and observe the principle of a single controlled difference during operation.

3) Choose an appropriate significance level α: it should not be set too loosely; for ordinary experiments α = 0.05, since the larger α is, the more easily a Type I error occurs. If, however, the experiment is costly, demands high precision, cannot be repeated, or its conclusions carry major consequences, a stricter significance level should be chosen.

4) When the test result is close to the null hypothesis but has not reached the significance level, do not rush to accept or reject H0; repeat the experiment instead.
