Non-normal Data and Capability Analysis (非正态数据转换及过程能力分析)


Automotive R&D and Manufacturing Terminology (汽车研发与制造专业术语汇总)

TTO Tool Try-Out 工装设备试运行
J1 Job 1 整车投产
DFMEA Design Failure Mode Effects Analysis 设计故障模式影响分析
DVP Design Verification Plan 设计验证计划
DVP&R Design Verification Plan & Report 设计验证计划和结果
FMEA Failure Mode Effects Analysis 故障模式影响分析
FPDS Ford Product Development System 福特产品开发系统
GYR Green-Yellow-Red 绿-黄-红
MRD Material Required Date 物料要求到厂日
OTT OK-To-Tool 可以开模
TKO Tooling Kick-Off 工装启动
OEM Original Equipment Manufacturer 原始设备制造商
FtF/F2F Face To Face 面对面会议
PV Production Validation 产品验证
OTS Off-Tooling-Sample 完全工装样件
QOS Quality Operating System 质量运作体系
TS-16949 Technical Specification 16949 技术规范-16949
APQP Advanced Product Quality Planning 先期产品质量计划
IPD In Plant Date 进厂日
PPM Parts per Million 零件的百万分比率(适用于供应商不合格零件)
PPAP Production Part Approval Process 生产件批准程序
Pre-PV Pre-Production Validation 产品预先验证
1PP First Phase of Production Prove-Out 第一次试生产
3C Customer(顾客导向)、Competition(竞争导向)、Competence(专长导向)
4S Sale(销售)、Sparepart(零配件)、Service(服务)、Survey(信息反馈)
5S 整理、整顿、清扫、清洁、素养
8D 8 Discipline 解决问题八步法
ABS Anti-lock Braking System 防抱死制动系统
AIAG Automotive Industry Action Group 美国汽车工业行动集团
ANPQP Alliance New Product Quality Procedure
Apportionment 分配
Backlite Windshield 后窗玻璃
Benchmark Data 样件资料
BMW Bavarian Motor Works 宝马
Certified Purchasing Manager 认证采购经理人制度
CB Confirmation Build 确认样车制造
CC Change Cut-Off 设计变更冻结
CC/SC Critical/Significant Characteristic 关键/重要特性
CCR Concern & Countermeasure Request
CCT Cross Company Team
Characteristics Matrix 特性矩阵图
COD Cash on Delivery 货到付款(预付货款为 T/T in advance)
CP1 Confirmation Prototype 1st 第一次确认样车
CP2 Confirmation Prototype 2nd 第二次确认样车
Cpk 过程能力指数(Cpk = Zmin/3)
CPO Complementary Parts Order
Craftsmanship 精致工艺
Cross-functional teams 跨功能小组
CUV Car-Based Utility Vehicle

8D 步骤:
D1 信息收集;
D2 建立8D小组;
D3 制定临时的围堵措施,避免不良品流出;
D4 定义和证实根本原因,避免再发;
D5 根据根本原因制定永久措施;
D6 执行和确认永久措施;
D7 预防再发,实施永久措施;
D8 认可团队和个人的贡献。

Data Analysis English Test Questions and Answers (数据分析英语试题及答案)

Part I. Multiple Choice (2 points each, 10 points total)

1. Which of the following is not a common data type in data analysis?
   A. Numerical  B. Categorical  C. Textual  D. Binary

2. What is the process of transforming raw data into an understandable format called?
   A. Data cleaning  B. Data transformation  C. Data mining  D. Data visualization

3. In data analysis, what does the term "variance" refer to?
   A. The average of the data points  B. The spread of the data points around the mean  C. The sum of the data points  D. The highest value in the data set

4. Which statistical measure is used to determine the central tendency of a data set?
   A. Mode  B. Median  C. Mean  D. All of the above

5. What is the purpose of using a correlation coefficient in data analysis?
   A. To measure the strength and direction of a linear relationship between two variables  B. To calculate the mean of the data points  C. To identify outliers in the data set  D. To predict future data points

Part II. Fill in the Blanks (2 points each, 10 points total)

6. The process of identifying and correcting (or removing) errors and inconsistencies in data is known as ________.
7. A type of data that can be ordered or ranked is called ________ data.
8. The ________ is a statistical measure that shows the average of a data set.
9. A ________ is a graphical representation of data that uses bars to show comparisons among categories.
10. When two variables move in opposite directions, the correlation between them is ________.

Part III. Short Answer (5 points each, 20 points total)

11. Explain the difference between descriptive and inferential statistics.
12. What is the significance of a p-value in hypothesis testing?
13. Describe the concept of data normalization and its importance in data analysis.
14. How can data visualization help in understanding complex data sets?

Part IV. Calculation (10 points each, 20 points total)

15. Given a data set with the following values: 10, 12, 15, 18, 20, calculate the mean and standard deviation.
16. If a data analyst wants to compare the performance of two different marketing campaigns, what type of statistical test might they use, and why?

Part V. Case Analysis (15 points each, 30 points total)

17. A company wants to analyze the sales data of its products over the last year. What steps should the data analyst take to prepare the data for analysis?
18. Discuss the ethical considerations a data analyst should keep in mind when handling sensitive customer data.

Answers

Part I: 1. D  2. B  3. B  4. D  5. A

Part II: 6. Data cleaning  7. Ordinal  8. Mean  9. Bar chart  10. Negative

Part III:
11. Descriptive statistics summarize and describe the features of a data set, while inferential statistics make predictions or inferences about a population based on a sample.
12. A p-value indicates the probability of observing the data, or something more extreme, if the null hypothesis is true. A small p-value suggests that the observed data is unlikely under the null hypothesis, leading to its rejection.
13. Data normalization is the process of scaling data to a common scale. It is important because it allows for meaningful comparisons between variables and can improve the performance of certain algorithms.
14. Data visualization can help in understanding complex data sets by providing a visual representation of the data, making it easier to identify patterns, trends, and outliers.

Part IV:
15. Mean = (10 + 12 + 15 + 18 + 20) / 5 = 15. Standard deviation = sqrt[sum((xi - mean)^2) / N] = sqrt[(25 + 9 + 0 + 9 + 25) / 5] = sqrt(13.6) ≈ 3.69.
16. A t-test (or ANOVA, for more than two groups) might be used to compare the means of the two campaigns, as these tests can determine whether there is a statistically significant difference between the groups.

Part V:
17. The data analyst should first clean the data by removing any errors or inconsistencies. Then they should transform the data into a suitable format for analysis, such as creating a time series for monthly sales. They might also normalize the data if necessary and perform exploratory data analysis to identify any patterns or trends.
18. A data analyst should ensure the confidentiality and privacy of customer data, comply with relevant data protection laws, and obtain consent where required. They should also be transparent about how the data will be used and take steps to prevent any potential misuse of the data.
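Question 15's computation can be verified in a few lines of Python (using the population standard deviation, matching the formula in the answer key):

```python
import math

values = [10, 12, 15, 18, 20]
mean = sum(values) / len(values)
# Population variance: average squared deviation from the mean
variance = sum((x - mean) ** 2 for x in values) / len(values)
std_dev = math.sqrt(variance)

print(mean, round(std_dev, 2))  # 15.0 3.69
```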

Common QA Terms: English-Chinese Reference (QA常用术语中英对照表格)

English (英文) Chinese (中文)
acceptable quality 允收品質
acceptable reliability level (ARL) 允收可靠度水準
acceptance 允收或驗收
acceptance control chart 驗收管制圖
acceptance line 允收線
acceptance quality level (AQL) 允收品質水準
acceptance sampling inspection 驗收抽樣檢驗
acceptance test 驗收試驗
acceptance zone 允收區域
affinity diagram 親和圖
alarm 警告
ANOVA (analysis of variance) 變異數分析表
appraisal cost 評估成本
assignable cause 異常因素
assurance 保證
attribute data 計數值數據
availability 可用性
average fraction inspected (AFI) 平均受檢比率
average outgoing quality (AOQ) 平均出廠品質
average outgoing quality limit (AOQL) 平均出廠品質界限
average sample number (ASN) 平均樣本數
average total inspection (ATI) 平均總檢驗數
bathtub curve 浴缸曲線
bias 偏差
binomial distribution 二項分配
bottle neck technique 瓶頸技術
brainstorming (BS) 腦力激蕩術
central limit theorem 中央極限定理
central line (CL) 中心線
central tendency 中心傾向
chain sampling inspection (CHSP) 鏈鎖抽樣計劃
chance cause 機遇因素
characteristic 特性
characteristic value 特性值
coefficient of variation (CV) 變異系數
company-wide quality control (CWQC) 全公司品質管理
confirm 確認
conforming 合格
conforming article 合格品
conformity 符合
constant failure period 恆常故障(失效)期
consumer's risk (CR) 消費者冒險率
consumer's risk point (CRP) 消費者冒險點
consumer's risk quality (CRQ) 消費者冒險品質
continuous sampling inspection 連續抽樣檢驗
control 管制
control chart 管制圖
controllable assignable cause 欲控異常因素
corrective maintenance 修復性維護
critical failure 致命性故障
cumulative distribution function 累積分配函數
customer satisfaction index (CSI) 顧客滿意指數
CUSUM 累積和管制圖
defect 瑕疵
defective / defective unit 不良品
defective fraction 不良率
defects per hundred units 百件缺點數
dendrogram 枝叉圖
dependability 可恃性
detailed inspection / complete inspection 全數檢驗
development 發展
diagnosis 診斷
discrimination ratio 判別率
dispersion 離勢
double sampling inspection 雙次抽樣檢驗
down time 不能工作時間
early failure period 早期故障(失效)期
error of alarm missing 漏發警報的錯誤
error of false alarm 虛發警報的錯誤
error-resistant 抗錯誤
estimate 估計
external failure cost 外部故障成本
failure 故障
failure mechanism 失效機制
failure mode 故障(失效)模式
failure mode and effects analysis (FMEA) 失效模式與效應分析
failure rate 故障(失效)率
fault tree analysis (FTA) 故障樹分析
fault-tolerant 容故障
final assessment 最后評估
final sample 最終樣本
finite population 有限母體
fishbone diagram 魚骨圖
Gantt chart 甘特圖
gauge 量規
grade 品級
histogram 直方圖
human factor 人員因素
human reliability 人員可靠性
individual 個體或個別值
infinite population 無限母體
inspection 檢驗
100% inspection 100%檢驗
inspection level 檢驗水準
inspection lot 檢驗批
internal failure cost 內部故障成本
International Organization for Standardization (ISO) 國際標準化組織
Ishikawa diagram 石川圖
ISO 9000 family of international standards ISO 9000系列國際標準
least square method 最小平方法
least squares estimate 最小平方估計值
level of significance 顯著水準
life 壽命
life cycle theory 生命周期理論
limiting number 界限數
limiting quality 極限品質
limiting quality level (LQL) 極限品質水準
location 位置
log normal distribution 對數正態分配
lot / batch 批
lot size 批量
lot tolerance failure rate (LTFR) 批容許故障率(批容許失效率)
lot tolerance fraction defective (LTFD) 批容許不良率
lot tolerance percent defective (LTPD) 批容許不良(百分)率
lower control line (LCL) 管制下限
maintainability 維護性
maintenance 維修
management by fact 以事實進行管理
man-machine analysis 人機分析
mean life between failures 平均故障間隔壽命
mean life to failure 平均故障前壽命
mean time between failures 平均故障間隔時間
mean time between maintenance (MTBM) 平均維護間隔時間
mean time to failure (MTTF) 平均故障前時間
mean time to repair (MTTR) 平均修復時間
measure 量度
measurement 測量
median 中位數
milestone 里程碑
mode 眾數
modulus 模數
multiple sampling inspection 多次抽樣檢驗
multi-stage sampling 多段抽樣
new seven tools and techniques 新七種工具(新七大手法)
nomography 圖算法
non-conforming 不合格
non-conforming article 不合格品
non-conforming product 不合格產品
nonconformity 不符合
non-defective unit 良品
normal distribution 常態分配
normal inspection 正常檢驗
normal random variable 常態隨機變數
old seven tools and techniques 舊七種工具(舊七大手法)
operating characteristic curve (OC曲線) 操作特性曲線
operating characteristic function (OC函數) 操作特性函數
operational availability 操作可用性
organizational structure 組織結構
parts per billion (PPB) 十億分率
percent defective 不良百分率
pilot production lot 試驗生產批(先導生產批)
Poisson 卜氏
Poisson distribution 卜氏分配
population 母體
population distribution 母體分配
population size 母體大小
precision 精密度
prevention cost 預防成本
preventive maintenance 預防性維護
principal component analysis 主成份分析法
probability density function 機率密度函數
probability ratio sequential test (PRST) 逐次試驗計劃(又稱機率比值逐次試驗)
probability sampling 機率抽樣
procedure 程序
process 過程
process average 制程平均
process capability 制程能力
process capability index 制程能力指數
process decision program chart (PDPC) 過程決策方案圖
process level 制程水準
producer's risk 生產者冒險率
producer's risk point (PRP) 生產者冒險點
producer's risk quality (PRQ) 生產者冒險品質
product liability 產品責任
program evaluation and review technique (PERT) 計劃評核術
qualification test 鑒定試驗
quality 品質
quality assurance (QA) 品質保證
quality audit 品質稽核
quality cause 品質因素
quality chart 品質表
quality control (QC) 品質管制
quality control and diagnosis theory 品質管制與診斷理論
quality control theory 品質管制理論
quality improvement 品質改善
quality loop 品質環圈
quality losses 品質損失
quality management (QM) 品質管理
quality planning 品質策劃
quality policy 品質方針
quality system 品質系統
quality-related costs 品質成本
random failure 偶發性故障
random failure period 偶然故障(失效)期
range 全距
rational subgroup 合理分組
real time diagnosis 實時診斷
rejection 拒收
rejection line 拒收線
rejection zone 拒收區域
relevant failure 相關故障
reliability 可靠性
reliability acceptance test 可靠度驗收試驗
reliability function 可靠度函數
reliability growth test (RGT) 可靠度成長試驗
reliability qualification test 可靠度鑒定試驗
review 評審
risk 冒險
road map 路線圖
robust design 穩健性設計
root square transformation 平方根轉換
run 連
run length 連長
sample 樣本
sample size 樣本大小
sampling 抽樣
sampling by attributes 計數值抽樣
sampling by variables 計量值抽樣
sampling distribution 抽樣分配
sampling inspection 抽樣檢驗
sampling plan 抽樣檢驗計劃
sampling scheme 抽樣方案
sampling unit 樣本單位
scale 尺度
screening inspection 篩選檢驗
screening test 篩選試驗
sensitivity analysis 靈敏度分析
sequential analysis 逐次分析
sequential sampling inspection 逐次抽樣檢驗
service quality loop 服務品質環圈
service specification 服務規格
services 服務
short run 短期生產
signal to noise ratio SN比
simple random sampling 簡單隨機抽樣
single attribute sampling plan 單次計數值抽樣計劃
single sampling inspection 單次抽樣檢驗
skewness 偏態(歪斜度)
skip-lot sampling inspection 跳批抽樣計劃
specification 規範
spread 分散
standard deviation 標準差
standard transformation 標準轉換
state of statistical control 統計管制狀態
statistic 統計量
statistical diagnosis 統計診斷
statistical process control (SPC) 統計制程管制
statistical quality control (SQC) 統計品質管理
stratification 分層
stratified sampling 分層抽樣
sum of squares 偏差平方和
tangible product 有形產品
Taylor's series 泰勒級數
technical diagnosis 技術診斷
testability 測試性
the significant few versus the trivial many 重要的少數與無緊要的多數
total quality management (TQM) 全面品質管理
transformation 變換
tree diagram 樹形圖
trend 趨勢
true value 真值
two-stage sampling 兩段抽樣
type I error 第I型錯誤
type II error 第II型錯誤
uncontrollable assignable cause 非控異常因素
upper bound 上界
upper control line (UCL) 管制上限
validation 確認
variable data 計量值數據
variance 變異數
variation 變異
verification 驗證
wear-out failure period 磨耗故障(失效)期
weight 權數
world-class quality 世界級品質
zero defect (ZD) 零缺點(ZD小組)
zero defect program 無缺點計劃

Process Capability Analysis with Minitab (运用Minitab进行过程能力分析)

Process Capability Overview

Once a process is in statistical control, that is, once production is reasonably stable, you will probably want to know its process capability: its ability to meet specification limits and produce good parts.

You can assess process capability by comparing the width of the process variation with the width of the specification limits.

The process should be in control before you evaluate its capability; otherwise, the capability estimates you obtain will be incorrect.

You can draw capability histograms and capability plots to evaluate process capability; these graphs help you assess the distribution of the data and verify that the process is in control.

You can also compute capability indices, which are ratios of the specification tolerance to the natural process variation.

Capability indices are a simple way of assessing process capability.

Because they are unitless, capability statistics can be used to compare different processes.
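As a minimal sketch of these indices in Python (hypothetical data and specification limits; Cp is the tolerance divided by six standard deviations, and Cpk additionally accounts for how well the process is centered):

```python
import statistics

# Hypothetical measurements and specification limits (illustrative only)
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]
lsl, usl = 9.0, 11.0

mean = statistics.mean(data)
sigma = statistics.stdev(data)  # overall (sample) standard deviation

cp = (usl - lsl) / (6 * sigma)                   # potential capability
cpk = min(usl - mean, mean - lsl) / (3 * sigma)  # penalizes off-center processes

print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")  # Cp = 1.83, Cpk = 1.83
```

Note that Minitab distinguishes a within-subgroup standard deviation (for Cp/Cpk) from the overall one (for Pp/Ppk); this sketch uses a single overall estimate for brevity.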

1. Choosing a capability command

Minitab provides a number of different capability analysis commands; you can choose the appropriate command based on the nature of your data and its distribution.

You can perform capability analyses for:
⏹ normal or Weibull probability models (for measurement data)
⏹ normal data that is likely to come from a population with substantial between-subgroup variation
⏹ binomial or Poisson probability models (for attribute or count data)
Note: if your data is badly skewed, you can apply a Box-Cox transformation or use the Weibull probability model.

When performing a capability analysis, choosing the correct distribution is essential.

For example, Minitab provides capability analyses based on both normal and Weibull probability models.

The commands that use a normal probability model provide a more complete set of statistics, but your data must approximately follow a normal distribution for those statistics to be appropriate.

For example, Capability Analysis (Normal) estimates the expected PPM using a normal probability model.

The interpretation of these statistics rests on two assumptions: the data come from a stable process, and they approximately follow a normal distribution.

Similarly, Capability Analysis (Weibull) calculates PPM using the Weibull distribution.

In both cases, the validity of the statistics depends on the validity of the assumed distribution.

If the data is badly skewed, probabilities based on the normal distribution will give a poor estimate of the actual out-of-spec probability.

In that case, transform the data so that it more closely follows a normal distribution, or choose a different probability model for the data.
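Both remedies can be sketched with SciPy (all data here is hypothetical; `scipy.stats.boxcox` estimates the transformation parameter by maximum likelihood, and `weibull_min.fit` with `floc=0` fits a Weibull model whose lower bound is fixed at zero):

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed, strictly positive data standing in for a
# skewed process characteristic (values are illustrative only)
rng = np.random.default_rng(1)
data = rng.weibull(1.5, 300) * 10 + 0.01

# Option 1: Box-Cox transformation toward normality (lambda fit by MLE)
transformed, lam = stats.boxcox(data)

# Option 2: fit a Weibull probability model directly, bounded at zero
shape, loc, scale = stats.weibull_min.fit(data, floc=0)

print(f"Box-Cox lambda = {lam:.2f}")
print(f"Weibull shape = {shape:.2f}, scale = {scale:.2f}")
```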


SPC Technical Vocabulary (spc专业词汇)

quality improvement 质量改进
quality control and improvement 质量控制及改进
statistical 统计学
reliability 可靠性
conformance to standards 符合标准
characteristic 特性,性能
regression analysis 回归分析
random 随机的
rectifying inspection 挑选型检验
systematic reduction of variability 减少系统性波动
acceptance sampling 验收抽样
total quality management 全面质量管理
company-wide quality control 全公司质量控制
total quality assurance 全面质量保证
quality standards and registration 质量标准和注册
process control 过程控制
quality system 质量体系
internal audits 内部审核

Chapter 2 Vocabulary (第二章专业词汇)
binomial distribution 二项分布
mean 平均值
variance 方差
sample fraction defective 样本不合格品率
sample fraction nonconforming 样本不合格品率
Poisson distribution 泊松分布
stem-and-leaf plot 茎叶图
frequency distribution and histogram 频率分布和直方图
box plot 箱线图
probability distributions 概率分布
hypergeometric distribution 超几何分布
Pascal and related distributions 帕斯卡及其相关分布
normal distribution 正态分布
exponential distribution 指数分布
the first quartile 第一四分位数
the third quartile 第三四分位数
interquartile range 四分位距
sample mean 样本均值
sample variance 样本方差
sample standard deviation 样本标准差
sample median 样本中位值
mode 众数
continuous distributions 连续分布
discrete distributions 离散分布
Bernoulli trials 伯努利试验(或贝努利试验)

Chapter 3 Vocabulary (第三章专业词汇)
statistical process control (SPC) 统计过程控制
check sheet 检查表
Pareto chart 排列图
cause-and-effect diagram 因果图
defect concentration diagram 缺陷位置图
scatter diagram 散布图
control chart 控制图
in statistical control 处于统计控制状态
assignable causes 非随机原因,可查明的原因
standard deviation 标准差
average time to signal (ATS) 平均报警时间(过程发生变化后平均发信号时间)
average run length (ARL) 平均链长;ATS = ARL × h(h 为抽样间隔时间)
false alarms 误发警报
missing alarms 漏发警报
positive correlation 正相关
causality 因果关系
capability 能力(第四章中该词通常指过程能力)
trial control limits 试验用控制限
specification limits 规范限,规格限
current control 当前(生产)控制
X bar and R chart 均值-极差控制图
in control 受控(状态)
out of control 失控(状态)
process variability 过程波动
unbiased estimator 无偏估计量
departures 偏离
variable sample size 可变样本容量
exhibited 呈现
recompute 重新计算
parameter 参数
equation 等式,公式
standard values 标准值(指过程参数)
the process mean 过程均值
make process modifications 过程改进
cyclic patterns 周期性(变化)模式
a shift in process level 过程水平发生偏移
standard normal cumulative distribution function 标准正态累积分布函数
quality characteristic 质量特性
range 极差
nonconforming 不符合,不合格
nominal value 标称值
subgroup 子组
rational subgroup 合理子组
range method 极差法
weighted average approach 加权平均法
the moving range 移动极差
control chart for individual measurements 单值控制图
operating-characteristic curves 操作特性曲线
over control 过度控制

Chapter 5 Vocabulary (第五章专业词汇)
fraction nonconforming 不合格品率
target value 目标值
variable-width control limits 可变宽度控制界限
individual sample 每个样品
specific sample size 特定样本大小
the upper control limit 控制上限
the lower control limit 控制下限
square root 平方根
estimate of the standard deviation 标准偏差估计
average sample size 平均样本容量
approximate set of control limits 近似的一组控制限
the standardized control chart 标准化控制图(通用控制图)
nonrandom 非随机的
nonconformities per unit 单位不合格数
the preliminary data 原始数据
the average number of nonconformities per unit 平均单位不合格数
the number of inspection units 检验单位个数
variable control limits 可变控制限
center line 中心线
process fraction nonconforming 过程不合格品率
不合格品率控制图计算公式:(原文为公式图,此处从略)
nonconformity 不符合、不合格

Chapter 6 Vocabulary (第六章专业词汇)
process capability analysis 过程能力分析
probability plot 概率图
process capability ratio (PCR) 过程能力指数
off-center process (分布中心)偏离公差中心的过程
confidence interval 置信区间
uniformity 一致性
quality characteristic 质量特性
product characteristic 产品特性
tolerance 公差
vendor 供方
designed experiments 实验设计
chi-square distribution 卡方分布
process performance indices 过程性能指数
normal probability plot 正态概率图
variables 计量,计量值(注意:variable 亦有"可变的,变量"之意)

Chapter 7 Vocabulary (第七章专业词汇)
sampling plan 抽样方案
sampling scheme 抽样计划
acceptance sampling 验收抽样
items 项目,产品
liability risk 可靠性风险、责任风险
lots 批
lot-by-lot 逐批
attributes 计数,计数值
single-sampling plan 一次抽样方案
acceptance number 接收数
inspection 检验
OC curve (operating characteristic curve) 操作特性曲线
probability of acceptance 接收概率
discriminatory power 判别力、鉴别力(指判别批质量好坏的能力)
acceptable quality limit (AQL) 接收质量限
lot tolerance percent defective (LTPD) 批允许不合格品率
rejectable quality level (RQL) 拒收质量水平
limiting quality level (LQL) 极限质量水平
probability distribution 概率分布
finite size 有限(样本)容量
lot fraction defective 批不合格品率
fixed percentage 固定百分比
double sampling 二次抽样
a final lot dispositioning decision 批的最终处置决定
the fraction defective 不合格品率
sample size code letter 样本大小字母
tightened inspection 加严检验
nonconformities per 100 items 每百单位产品不合格数
lot size 批量
100% inspection 全数检验
rejection number 拒绝数
reduced inspection 放宽检验
skip-lot sampling 跳批抽样
sampling procedures 抽样程序
defective 不合格品
average sample number curve 平均样本量曲线
double-sampling plan 二次抽样方案
curtailed inspection 截尾检验
multiple-sampling plan 多次抽样方案
disposition decision (批)处置决定
subsequent sample 后续样本
specified values 规定值
sequential-sampling plan 序贯抽样方案
sampling procedures for inspection by attributes 计数抽样检验程序
continuing series of lots 连续多批
LQ 极限质量
poor lot 劣质批
lots in isolation 孤立批
percent nonconforming (in a sample) (样本)不合格品百分数
responsible authority 负责部门
limiting quality 极限质量
isolated lot inspection 孤立批检验
skip-lot sampling procedures 跳批抽样程序
sentence 判别
audit tool 审核工具
accept with no inspection 免检
average outgoing quality (AOQ) 平均检出质量
rectifying inspection 挑选型抽样方案
lot sentencing 批的判断
random sampling 随机抽样
ideal OC curve 理想OC曲线
the producer's risk point 生产方风险点
the consumer's risk point 使用方风险点

P96-P98
trial control limits 试验用控制限
current control 实时控制
X bar and R chart 均值-极差控制图
statistical background 统计背景
in control 受控
out of control 不受控
process variability 过程波动

Is Your Data Normally Distributed? (你的数据是正态分布吗?)

Are You Sure Your Data Is Normal?

Most processes are not normally distributed. Most Six Sigma tools, however, assume normality. Choose the right tool for your next process analysis.

Most processes, particularly those involving life data and reliability, are not normally distributed. Most Six Sigma and process capability tools, however, assume normality. Only by verifying data normality and selecting the appropriate data analysis method will the results be accurate. This article discusses the non-normality issue and helps the reader understand some options for analyzing non-normal data.

Introduction

Some years ago, some statisticians held the belief that when processes were not normally distributed, there was something "wrong" with the process, or even that the process was "out of control." In their view, the purpose of the control chart was to determine when processes were non-normal so they could be "corrected" and returned to normality. Most statisticians and quality practitioners today would recognize that there is nothing inherently normal (pun intended) about the normal distribution, and its use in statistics is only due to its simplicity. It is well defined, so it is convenient to assume normality when the errors associated with that assumption would be minor. In fact, most efforts made in the interest of quality improvement lead to non-normal processes, since they try to narrow the distribution using process stops. Similarly, nature itself can impose stops on a process, such as a service process whose waiting time is physically bounded at the lower end by zero. The design of a waiting process would move the process as close to zero as economically possible, causing the process mode, median and average to move towards zero. This process would tend towards non-normality, regardless of whether it is stable or unstable.

Many processes do not follow the normal distribution.
Some examples of non-normal distributions include:

- Cycle time
- Calls per hour
- Customer waiting time
- Straightness
- Perpendicularity
- Shrinkage
- And many others

To illustrate the concept, consider a data set of cycle times for a process (Table 1). The lower limit of the process is zero and the upper limit is 30 days. Using the Table 1 data, the process capability can be calculated; the results are displayed in Figure 2.

Table 1: Cycle Time Data
19 11 30 24 24 20 28 27 26 2017 53 32 33 39 26 20 48 21 3436 43 42 43 41 40 35 24 21 2322 20 25 39 26 53 19 13 27 2835 11 42 38 32 27 24 22 18 1717 15 15 9 51 26 25 13 47 3752 17 4 8 9 18 16 5 49 118 31 14 31 24 19 41 2 25 18 7 16 34 21 9 14 31 16 145 2 10 42 21 15 21 11 15 229 32 56 48 27 24 21 24 31 3331 15 40 27 24 22 14 13 13 1414 43 37 18 17 47 10 13 14 228 54 8 25 8 19 18 9 3 3221 16 6 36 36 9 21 7 28 2820 17 25 15 21 10 11 6 4 821 23 22 5 5 21 15 13 14 612 34 15 14 7 6 9 14 23 187 10 14 26 12 28 30 26 34 1425 17 13 18 19 21 27 27 23 1312 2 2 24 35 12 28

If you observe the normal distribution curve for both the within and overall performance, you will see that the curve extends beyond zero and estimates the long-term PPM below zero as 36465.67. Is this applicable for a process that is bounded by zero? Using the normal distribution to calculate process capability actually penalizes this process, because it assumes data points outside the lower specification limit (below zero) when it is not possible for them to occur.

The first step in data analysis should be to verify that the process is normal. If the process is determined to be non-normal, various other analysis methods must be employed to handle and understand the non-normal data.

For the above data, calculating the basic statistics indicates whether the data is normal or not. Figure 3 below indicates that the data is not normal. The P value of zero and the histogram help confirm that the data is not normal.
Also, the fact that the process is bounded by zero is an important point to consider.

The most common methods for handling non-normal data are:

- Sub-group averaging
- Segmenting data
- Transforming data
- Using different distributions
- Non-parametric statistics

Sub-Group Averaging
- Averaging the sub-groups (recommended size greater than 4) usually produces a normal distribution
- This is often done with control charts
- Works on the central limit theorem
- The more skewed the data, the more samples are needed

Segmenting Data
- Data sets can often be segmented into smaller groups by stratification of the data
- These groups can then be examined for normality
- Once segmented, non-normal data sets often become groups of normal data sets

Transforming Data
- Box-Cox transformations of data
- Logit transformation for yes/no data
- Truncating distributions for hard limits (like the data set presented here)
- Application of meaningful transformations
- Transformations do not always work

Using Different Distributions
- Weibull distributions
- Log normal
- Exponential
- Extreme value
- Logistic
- Log logistic

Non-Parametric Statistics
- Used for statistical tests when data is not normal
- Tests use medians rather than means
- Most often used when the sample sizes of the groups being compared are less than 100, but just as valid for larger sample sizes
- For larger sample sizes, the central limit theorem often allows you to use regular comparison tests

When performing statistical tests on data, it is important to realize that many statistical tests assume normality. If you have non-normal data, there are non-parametric equivalent statistical tests that should be employed.
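The verification step described above can be sketched with SciPy (a hypothetical skewed sample stands in for the cycle-time data; the article's own figures come from Minitab):

```python
import numpy as np
from scipy import stats

# Hypothetical skewed sample standing in for the cycle-time data
rng = np.random.default_rng(7)
cycle_time = rng.weibull(1.6, 200) * 25

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution
w_stat, p_value = stats.shapiro(cycle_time)
print(f"W = {w_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Normality rejected: transform the data or use a non-normal model")
```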
Table 2 below summarizes the statistical tests to use with normal process data, as well as their non-parametric equivalents.

Table 2: Common Statistical Tests for Normal and Non-Parametric Data

Assumes Normality               | No Assumption Required
One sample Z test               | One sample Sign
One sample t test               | One sample Wilcoxon
Two sample t test               | Mann-Whitney
One way ANOVA                   | Kruskal-Wallis; Mood's Median
Randomized Block (Two way ANOVA)| The Friedman Test

If we look back at our Table 1 data set, where zero was the hard limit, we can illustrate what tests might be employed when dealing with non-normal data.

Setting a Hard Limit

If we set a hard limit at zero and re-run the process capability, the results are presented in Figure 4. Figure 4 now indicates that the long-term PPM is 249535.66, as opposed to 286011.34 in Figure 2. This illustrates that quantification can be more accurate by first understanding whether the distribution is normal or non-normal.

Weibull Distribution

If we take this analysis a step further, we can determine which non-normal distribution is the best fit. Figure 5 displays various distributions overlaid on the data. We can see that the Weibull distribution is the best fit for the data. Knowing that the Weibull distribution is a good fit for the data, we can then recalculate the process capability. Figure 6 shows that a Weibull model with the lower bound at zero would produce a PPM of 233244.81. This estimate is far more accurate than the earlier estimate from the bounded normal distribution.

Box-Cox Transformation

The other method that we discussed earlier was transformation of the data. The Box-Cox transformation can be used to convert the data to a normal distribution, which then allows the process capability to be easily determined. Figure 7 indicates that a lambda of 0.5 is most appropriate. This lambda is equivalent to the square root of the data. After applying this transformation, the process capability is presented in Figure 8.
This transformation of the data estimates the PPM to be 227113.29, which is very close to the estimate provided by the Weibull modeling.

We have seen three different methods for estimating the appropriate process capability when the data comes from a non-normal source: setting a hard limit on a normal distribution, using a Weibull distribution, and using the Box-Cox transformation.

Sub-Group Averaging

Now let us assume that the data is collected in time sequence with a subgroup size of one. The X bar R chart with subgroups cannot be used. Had the data been collected in subgroups, the central limit theorem would come in handy and the data would have exhibited normality. If we use the Individual Moving Range chart -- which is the more appropriate chart to use -- Table 3 displays the results.

Table 3: I/MR for Cycle Time

Test Results for I Chart
TEST 1. One point more than 3.00 sigmas from center line. Test failed at points: 12, 36, 55, 61, 103, 132
TEST 2. 9 points in a row on same side of center line. Test failed at points: 28
TEST 3. 6 points in a row all increasing or all decreasing. Test failed at points: 49, 50
TEST 5. 2 out of 3 points more than 2 sigmas from center line (on one side of CL). Test failed at points: 23, 24, 25, 61, 80, 104, 203
TEST 6. 4 out of 5 points more than 1 sigma from center line (on one side of CL). Test failed at points: 15, 22, 23, 24, 25, 26, 27, 45, 82, 159, 160
TEST 8. 8 points in a row more than 1 sigma from center line (above and below CL). Test failed at points: 27

Test Results for MR Chart
TEST 1. One point more than 3.00 sigmas from center line. Test failed at points: 12, 55, 62, 69, 70, 78, 127, 132, 133
TEST 2. 9 points in a row on same side of center line. Test failed at points: 52, 53, 54, 200, 201, 202, 203

Figure 9 indicates that the process is plagued by special causes. If we focus only on those points that are beyond the three-sigma limits on the I chart, we find the following data points flagged as special causes: TEST 1, one point more than 3.00 sigmas from center line, failed at points 12, 36, 55, 61, 103 and 132.

The primary assumption in the Figure 9 control chart is that the data is normal. If we plot the I-MR chart after applying the Box-Cox transformation as above, the chart looks much different (see Figure 10).

Table 4: I/MR for Cycle Time (after Box-Cox transformation)

Test Results for I Chart
TEST 1. One point more than 3.00 sigmas from center line. Test failed at points: 80
TEST 2. 9 points in a row on same side of center line. Test failed at points: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 110, 111
TEST 3. 6 points in a row all increasing or all decreasing. Test failed at points: 49, 50
TEST 5. 2 out of 3 points more than 2 sigmas from center line (on one side of CL). Test failed at points: 61, 80, 92, 104, 165, 203
TEST 6. 4 out of 5 points more than 1 sigma from center line (on one side of CL). Test failed at points: 15, 22, 23, 24, 25, 26, 27, 45, 82, 113, 159, 160
TEST 8. 8 points in a row more than 1 sigma from center line (above and below CL). Test failed at points: 27

Test Results for MR Chart
TEST 1. One point more than 3.00 sigmas from center line. Test failed at points: 55, 69, 78, 80, 132, 133, 140
TEST 2. 9 points in a row on same side of center line. Test failed at points: 29, 30, 31, 32, 33, 52, 53, 54, 200, 201

If we study the number of points outside the three-sigma limits on the I chart, we note that the test now fails at only one point -- number 80 -- and not at points 12, 36, 55, 61, 103 and 132 as indicated by the Figure 9 chart earlier. In fact, if you study the results of the other failure points, you realize the serious consequences of assuming normality: doing so might cause you to react to common causes as special causes, which would lead to tampering with the process.
It is important to note that for the X bar R chart the problem with normality is not serious, due to the central limit theorem, but in the case of the Individual Moving Range chart there can be serious consequences.

Now let us consider a case where we would like to test whether the mean cycle time of this process is 20 days. If we assume the data is normal and run a one-sample t test to confirm this hypothesis, Table 5 displays the results.

Table 5: One-Sample T - Cycle Time
Test of mu = 20 vs mu not = 20
Variable      N    Mean    StDev   SE Mean
turn around   207  21.787  12.135  0.843
Variable      95.0% CI           T     P
turn around   (20.125, 23.450)   2.12  0.035

Based on the above statistics, one would pronounce at an alpha risk of 5% that the mean of the data set is different from 20. If instead we account for the fact that the data is not normal, we would run the one-sample Wilcoxon test, which is based on medians rather than means, and we would obtain the results found in Table 6.

Table 6: Wilcoxon Signed Rank Test: Cycle Time
Test of median = 20.00 versus median not = 20.00
           N    N for Test   Wilcoxon Statistic   P      Estimated Median
turn aro   207  202          11163.5              0.273  21.00

The Wilcoxon test indicates that the null hypothesis (the median is equal to 20) is not rejected; there is no statistical evidence that the median is different from 20.

The above example illustrates that assuming the data is normal and applying statistical tests accordingly is dangerous. A better strategy in data analysis is to verify the normality assumption first and then, based on the results, use an appropriate data analysis method.
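The contrast between Tables 5 and 6 can be sketched with SciPy (hypothetical skewed data with n = 207, echoing the article's example; `wilcoxon` tests the symmetry of the differences about zero, which stands in for the median test here):

```python
import numpy as np
from scipy import stats

# Hypothetical skewed sample with n = 207, standing in for the cycle-time data
rng = np.random.default_rng(3)
cycle_time = rng.weibull(1.6, 207) * 24

t_stat, t_p = stats.ttest_1samp(cycle_time, popmean=20)  # assumes normality
w_stat, w_p = stats.wilcoxon(cycle_time - 20)  # signed-rank test vs 20

print(f"one-sample t: p = {t_p:.3f}")
print(f"Wilcoxon signed-rank: p = {w_p:.3f}")
```

With skewed data the two tests can disagree, which is exactly the point the article makes with its Minitab output.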

Describing "No Statistical Difference" in English
No Statistical Difference: A Technical and Interpretive Exploration

In scientific research, statistical significance is a cornerstone for drawing conclusions and making informed decisions. When two or more groups are compared, the researcher often aims to determine whether there is a statistically significant difference between them. When the results indicate no statistical difference, however, that finding can be just as important as a significant one, because it provides valuable insight into the nature of the data and the underlying phenomena.

Understanding Statistical Significance

Before delving into the implications of no statistical difference, it is crucial to understand what statistical significance means. Statistical significance is the likelihood that an observed effect or difference between groups is not due to chance but instead reflects a genuine real-world difference. It is typically determined using statistical tests, such as the t-test or ANOVA, and is expressed as a p-value: the probability of obtaining the observed results, or more extreme results, under the null hypothesis (that there is no difference between groups).

Interpreting No Statistical Difference

When the results of a statistical test indicate no significant difference between groups, it means the observed differences are not large enough to confidently reject the null hypothesis. This does not necessarily mean there is no difference between the groups in the real world; it only means the data do not provide enough evidence to support one.

There can be several reasons for not observing a statistically significant difference. First, the sample size may be too small to detect a difference, even if one exists. Second, the effect size may be small, meaning the difference between groups is not large enough to be detected with the given sample size.
Additionally, the variance within the groups may be high, obscuring any potential differences.

Implications of No Statistical Difference

The absence of statistical significance does not mean the research question is answered conclusively. Here are some implications to consider:

1. Limited conclusion: Without statistical significance, one cannot confidently claim a real-world difference between the groups. The results are inconclusive and require further investigation.
2. Need for replication: Replicating the study with a larger sample size or under different conditions may help clarify whether a difference exists. Replication is crucial in science to ensure the reliability of findings.
3. Exploring other variables: The absence of a statistically significant difference may suggest that other variables or factors are influencing the observed outcomes. Future research can explore these potential confounders or moderators.
4. Practical implications: Even without statistical significance, there may still be practical implications for the target population or industry. For example, small but consistent effects may still have a meaningful impact on real-world outcomes.
5. Research hypotheses: The absence of statistical significance may require researchers to reevaluate their hypotheses or consider alternative explanations for the observed data.

Conclusion

In summary, the absence of statistical significance can be as informative as finding a significant difference. It provides an opportunity to reflect on the nature of the data, consider alternative explanations, and plan future research. By acknowledging the limitations of statistical tests and interpreting results in the context of the overall research objectives, researchers can draw more comprehensive and nuanced conclusions from their data.
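The sample-size point above can be made concrete with a quick power calculation. This is an illustrative sketch, not part of the original text, using a normal approximation for a two-group comparison with an assumed true effect of 0.3 standard deviations.

```python
import numpy as np
from scipy import stats

effect = 0.3       # assumed true group difference, in standard-deviation units
powers = {}
for n in (20, 500):
    se = np.sqrt(2 / n)                              # SE of the mean difference
    powers[n] = stats.norm.sf(1.96 - effect / se)    # approx. two-sided power
    print(f"n = {n:3d} per group: approximate power = {powers[n]:.2f}")
```

With 20 per group the study detects the real difference only about 16% of the time, so "no statistical difference" is the expected outcome even though a difference truly exists; with 500 per group the same difference is almost always detected.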

Stat>Quality Tools>Individual Distribution Identification
Distribution Identification Plot
Probability Plot of Time

[Probability plot panels with 95% confidence intervals. Goodness-of-fit tests:
Logistic: AD = 3.715, P < 0.005
Loglogistic: AD = 0.498, P = 0.169
3-parameter loglogistic: AD = 0.322, P = *
Johnson transformation: AD = 0.267, P = 0.680]
1. Test for normality.
2. Choose the best transformation (if needed).
3. Perform the process capability analysis.
If the data are nonnormal:
4. Determine the best-fitting distribution.
5. Perform the process capability analysis using that distribution.
6. Compare the results.
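These steps can be sketched with scipy standing in for Minitab. The data are simulated, and the Shapiro-Wilk test stands in for the Anderson-Darling test used elsewhere in the module.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.lognormal(mean=2.0, sigma=0.6, size=150)   # right-skewed sample

_, p_raw = stats.shapiro(x)          # step 1: test normality
if p_raw < 0.05:                     # nonnormal: try a Box-Cox transformation
    xt, lam = stats.boxcox(x)        # step 2: best power transformation
    _, p_t = stats.shapiro(xt)       # re-test before the capability analysis
    print(f"raw p = {p_raw:.4f}, lambda = {lam:.2f}, transformed p = {p_t:.4f}")
```

If the transformed data pass the normality test, the usual normal-theory capability analysis can proceed on the transformed scale.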
Thank you!

Thank you for participating; see you in the next class!
[Probability plot panels of Time: 3-parameter loglogistic (x-axis: Time minus threshold, log scale) and normal fits, each with 95% confidence intervals; Percent on the y-axis.]
Nonnormal Data Transformation and Process Capability Analysis
Nonnormal Data
About This Module
Tools designed for normally distributed data depend on the data actually being close to normal and stable (free of special causes). This module examines methods that work for nonnormal data.
What We Will Learn…
1. Why we need normal data
2. How to test data for normality
   - all of the data
   - stratified data
3. How to transform continuous data to make them normal
   - types of transformations
   - choosing a transformation
   - the Box-Cox method
4. Process capability analysis for nonnormal data
Time Series Plot
Graph > Time Series Plot
Time Series Plot of Time
[Time series plot of Time: values roughly 0 to 60 on the y-axis, observation index 1 to 100 on the x-axis.]
Conclusion: Neither a factor (Decision or Zone) nor time has shifted the data. Therefore, it is reasonable to conclude that decision time is nonnormal.
Check whether the data fit a known distribution
Examining Stratified Data
Graph>Dot Plot
Minitab Output
Dot Plot of Time
[Dot plot of Time stratified by Decision (Approved/Rejected) and by Zone (East/North/South/West), Time from 8 to 56. Neither stratification factor shows clear evidence of a shift.]
Other Normality Test Methods?
Normality Test
Probability Plot of Time

[Normal probability plot with 95% confidence interval. Mean 12.31, StDev 9.656, N 100, AD 5.738, P < 0.005; Percent on the y-axis, Time from -30 to 30 on the x-axis.]
Nonnormal Data
Not all data are normal
Data with a long tail in only one direction are said to be skewed.
Consequences of Using Nonnormal Data
Example: Calculating Process Sigma from the Normal Curve
To determine process sigma, find the defect area beyond the specification limits (look up the nonconforming rate in a Z table). If the data are nonnormal, the estimated defect area will be wrong, and this method will yield an incorrect process sigma value.
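A minimal sketch of that danger with simulated right-skewed data: the Z-table (normal-theory) tail area disagrees with the defect fraction actually present in the data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.lognormal(mean=2.5, sigma=0.4, size=10_000)   # skewed "process" data
usl = 25.0                                               # upper spec limit (illustrative)

# Normal-theory estimate: the Z-table lookup described above
z = (usl - data.mean()) / data.std(ddof=1)
p_normal = stats.norm.sf(z)

# Defect fraction observed directly in the data
p_actual = (data > usl).mean()
print(f"normal-theory estimate: {p_normal:.4%}, actual fraction: {p_actual:.4%}")
```

With this right-skewed sample the normal-theory estimate understates the true defect area, so a process sigma computed from it would be optimistic.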
1. Plot a histogram of decision time.
2. Construct a normal probability plot of decision time.
3. Is decision time normally distributed?
4. Examine the time series plot and the stratified dot plots to determine whether:
   - a shift occurred at a particular time, or
   - another factor causes the skew.
Histogram
Histogram of Time

[Frequency on the y-axis (0 to 40), Time on the x-axis (0 to 50).]
Normality Test
Stat>Basic Stat>Normality Test
Look for factors that skew the data.
Always plot the data as a time series:
- look for shifts in the process mean or variability
- these may cause skew
Always stratify the data by other factors:
- a particular group may cause skew
Case Study: Decision Time
Objective: Use Minitab to check a data set for normality.
Background: A commercial loan operation collects data on the time taken to reach a decision. Data are available for 100 loan applications; times are recorded in days. Eighteen days is the maximum time allowed for a decision.
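The case study's normality check can be sketched outside Minitab. The 100 decision times below are simulated stand-ins; scipy's `anderson` computes the same Anderson-Darling statistic that Minitab's normality test reports.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
times = rng.lognormal(mean=2.3, sigma=0.7, size=100)   # stand-in decision times (days)

result = stats.anderson(times, dist='norm')
print(f"AD statistic = {result.statistic:.3f}")
for sl, cv in zip(result.significance_level, result.critical_values):
    print(f"  reject normality at the {sl}% level if AD > {cv:.3f}")
```

A large AD statistic relative to the critical values, as with the skewed data here, indicates the decision times are not normal.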
Shape-Changing Transformations
BOX-COX
Analyzing the Results
Test whether the data are normal after the transformation
Test Results
Normality Test
Probability Plot of Before vs. After Improvement

[Normal probability plots with 95% confidence intervals; Percent on the y-axis, data from 490 to 570 on the x-axis.]

Variable   Mean    StDev   N    AD      P
Before     522.5   6       8    0.208   0.792
After      531.6   9.778   10   0.257   0.638
Using Lambda
Box-Cox Transformation
The Box-Cox power transformation raises Y to the power λ. The power family includes the following:
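In the usual parameterization the transform is (Y^λ - 1)/λ for λ ≠ 0 and ln Y for λ = 0; common members are λ = 2 (square), 0.5 (square root), 0 (log), and -1 (reciprocal). A sketch of estimating λ with scipy, on simulated right-skewed data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
y = rng.lognormal(mean=1.0, sigma=0.5, size=200)   # positive, right-skewed sample

yt, lam = stats.boxcox(y)    # yt = (y**lam - 1)/lam, or ln(y) when lam == 0
print(f"estimated lambda = {lam:.2f}")
```

Because the sample is lognormal, the estimated λ should land near 0, i.e. close to a pure log transform.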
Capability Analysis
Your Turn…
Queue-time data were collected at a bank; the upper limit is 3.5 minutes. The data are in the "Bank Queue" column.
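One way to approach the exercise outside Minitab, sketched with simulated stand-in queue times and an assumed Weibull model (one of the distributions Minitab's nonnormal capability analysis offers):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
queue = rng.weibull(1.5, size=300) * 1.6   # stand-in queue times, minutes
usl = 3.5                                  # upper limit from the exercise

# Fit a Weibull with the threshold fixed at zero, then estimate the
# fraction of queue times beyond the upper limit
shape, loc, scale = stats.weibull_min.fit(queue, floc=0)
p_over = stats.weibull_min.sf(usl, shape, loc=loc, scale=scale)
print(f"estimated fraction over {usl} minutes: {p_over:.3%}")
```

The tail probability from the fitted distribution, rather than from a normal curve, is what a nonnormal capability analysis reports.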
[Probability plot of Time after the Johnson transformation: Percent on the y-axis (0.1 to 99.9), transformed values from -4 to 4 on the x-axis.]
Plotting Process Capability
Stat>Quality Tools>Process Capability>Process Capability Analysis Nonnormal
Capability Analysis
What Is a Transformation?
"Linear" Transformations
A linear transformation has the form Y = aX + b. You can:
- multiply the data by a constant,
- add a constant to the data, or
- do both.
Notes on linear transformations:
- Multiplying by or adding a constant does not affect the shape (distribution) of the data; it only changes the scale.
- Changing an exponent or applying a trigonometric function, by contrast, does change the shape of the data.
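This distinction is easy to verify numerically: a linear transform aX + b leaves skewness unchanged, while a nonlinear transform such as the log does not. A quick sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.exponential(scale=2.0, size=5000)   # exponential data, skewness near 2

print(f"skew(x)       = {stats.skew(x):.3f}")
print(f"skew(3x + 10) = {stats.skew(3 * x + 10):.3f}")  # linear: shape unchanged
print(f"skew(log x)   = {stats.skew(np.log(x)):.3f}")   # nonlinear: shape changes
```

This is why linear rescaling can never fix nonnormality: a shape-changing (nonlinear) transformation such as Box-Cox is required.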
Methods Affected by Nonnormal Data
Method: Consequence with nonnormal data
- Process sigma calculation: incorrect process sigma value
- Individuals control chart: falsely detects some special causes and misses other signals
- Hypothesis testing: incorrect conclusions about differences between groups
- Regression analysis: wrong factors identified as important; poor predictive ability
- Design of experiments: incorrect conclusions about important factors; poor predictive ability
Caution!
Sometimes special causes make normal data appear skewed.