Research on Probability and Statistics Education
A Discussion of the Task-Driven Teaching Method in the Teaching of Probability Theory and Mathematical Statistics

[Abstract] This paper discusses the task-driven teaching method in the teaching of Probability Theory and Mathematical Statistics, introducing the rationale for adopting the method and the concrete steps for implementing it. Task-driven teaching can stimulate students' interest in learning, improve teaching effectiveness, and develop students' ability to apply knowledge to solve problems.
[Keywords] probability and statistics; task-driven teaching; teaching effectiveness. [Funding] Teaching reform projects of Chuanshan College, University of South China (2012CY008, 2012CY009). [CLC number] O21 [Document code] A [Article ID] 2095-3089(2013)04-0139-02
Teaching Reform of the "Probability Theory and Mathematical Statistics" Course

In recent years, with the rapid development of data science and artificial intelligence, probability theory and mathematical statistics have become vitally important in a wide range of fields.
The teaching of probability theory and mathematical statistics in higher education has therefore received growing attention.
To better meet students' needs, many universities have made active attempts at teaching reform.
I. Making the course content more practical
To help students better understand and apply probability theory and mathematical statistics in daily life and in various fields, many universities have begun to redesign the course content to make it more practice-oriented.
For example, the concepts and formulas in the course can be illustrated with concrete examples and worked through in class, helping students master the relevant knowledge.
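For instance, one possible in-class worked example of this kind (a hypothetical illustration; the text itself does not give a specific one) checks the textbook formulas E(X) = sum of x*p(x) and Var(X) = E(X^2) - (E(X))^2 for a fair die against a quick simulation:

```python
import random

# A fair six-sided die: values and their probabilities.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# Theoretical mean and variance from the definitions
# E(X) = sum(x * p) and Var(X) = E(X^2) - (E(X))^2.
mean = sum(x * p for x, p in zip(values, probs))
variance = sum(x**2 * p for x, p in zip(values, probs)) - mean**2
print(f"theoretical mean = {mean:.4f}, variance = {variance:.4f}")

# Empirical check: simulate 100,000 rolls and compare.
rolls = [random.choice(values) for _ in range(100_000)]
emp_mean = sum(rolls) / len(rolls)
emp_var = sum((r - emp_mean) ** 2 for r in rolls) / len(rolls)
print(f"empirical   mean = {emp_mean:.4f}, variance = {emp_var:.4f}")
```

Seeing the simulated values settle close to 3.5 and about 2.92 gives students an immediate, concrete check on the formulas they have just derived.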
II. Improving teaching quality
Accurately assessing how well students have actually mastered the material is essential for improving teaching quality.
Many universities therefore promote classroom interaction, strengthening communication between teachers and students, improving the classroom atmosphere, and encouraging students to participate actively.
Teaching practice can also be built into the course, for example through field surveys and practice reports, so that students gain a deeper understanding of the material.
III. Emphasizing innovation and research
As science and technology advance, the range of applications of probability theory and mathematical statistics keeps expanding.
Research oriented toward new application areas is therefore another direction for teaching reform.
Many universities now encourage teachers to carry out teaching research and to develop new teaching approaches that are both feasible and practical, giving students a better learning experience.
In short, the reform of probability and statistics teaching must take students' actual needs and future development trends seriously.
Teaching reform that emphasizes practical relevance and innovation can significantly improve teaching quality and student satisfaction.
In the future, universities and teachers should further refine and optimize course design and delivery to provide students with a better educational experience.
Exploration and Practice of Teaching Reform in Probability Theory and Mathematical Statistics under the Concept of Curriculum-Based Ideological and Political Education

In the teaching of probability theory and mathematical statistics, teachers should dig deeply into the ideological and political elements contained in the course and weave them naturally into the course content. For example, when explaining random events and their probabilities, real-life examples of random events such as weather forecasts and lottery draws can be introduced, while emphasizing the fairness and uncertainty inherent in probability, guiding students to form a correct view of probability and to reason about chance rationally.
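As one concrete instance of the kind of everyday example meant here (the lottery format below, 6 numbers drawn from 33, is a hypothetical illustration chosen only for the arithmetic), the probability of winning can be computed directly from combinations:

```latex
% Probability of matching all 6 numbers drawn from 33 (hypothetical lottery format).
P(\text{win}) = \frac{1}{\binom{33}{6}}
             = \frac{6!\,27!}{33!}
             = \frac{1}{1\,107\,568} \approx 9.0\times 10^{-7}.
```

Working through such a number in class makes the "uncertainty" of a lottery tangible and opens a natural discussion of rational expectations.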
2. Innovating teaching methods to improve the effectiveness of ideological and political education
III. Reform of the practical component
Probability theory and mathematical statistics is a highly practical subject, so reform of its practical component is indispensable. Students' practical ability and scientific literacy can be raised by organizing social surveys, data-analysis activities, and similar work. Students can also be invited into teachers' research projects to cultivate their sense of innovation and their ability to work in teams.
IV. Reform of the assessment system
Under the concept of curriculum-based ideological and political education, assessment reform should likewise integrate ideological and political education with subject knowledge. A diversified assessment system can be adopted, covering everyday performance, homework completion, examination results, and other dimensions, so as to evaluate students' abilities and qualities comprehensively. Second-classroom activities, academic salons, and similar formats can further enrich assessment and strengthen its guiding role.
1. Dig deeper for ideological and political elements and integrate them more naturally and effectively.
2. Strengthen the training and continued learning of the teaching staff to raise teachers' capacity for ideological and political education and their overall quality.
3. Continuously track students' learning progress and feedback, and keep adjusting and optimizing teaching strategies.
Reference content
In higher education, curriculum-based ideological and political education is an important educational concept that aims to integrate ideological and political education into specialized courses in order to cultivate students' moral character and scientific literacy. Probability theory and mathematical statistics is an important university mathematics course and a foundational course for many majors. This presentation explores ways to reform the teaching of probability theory and mathematical statistics under this concept.
Innovation in teaching methods is the key to teaching reform under this concept. Teachers can adopt case-based teaching, group discussion, social practice, and other approaches to guide students to think actively and participate fully. For example, teachers can organize students to conduct a social survey and analyze the collected data with the tools of probability and statistics, cultivating both their skills in statistical analysis and their sense of social responsibility.
Research on the Effectiveness of Multimedia Teaching in Probability Theory and Mathematical Statistics

Higher Education Forum, No. 7, July 2009. Received: 2009-03-17; revised: 2009-04-07. Funding: fourth-batch project of the New Century Guangxi Higher Education Teaching Reform Project ("Eleventh Five-Year Plan"), "Research and Practice on Teaching Reform of the Probability Theory and Mathematical Statistics Course" (2008C028).
About the author: HUANG Ganji (1972- ), male, from Qinzhou, Guangxi; lecturer, master's degree; engaged in the teaching and application of probability theory and mathematical statistics.
Research on the Effectiveness of Multimedia Teaching in Probability Theory and Mathematical Statistics. HUANG Ganji, YIN Changming (School of Mathematics and Information Science, Guangxi University, Nanning, Guangxi 530004, China). Abstract: Drawing on the authors' teaching practice and the characteristics of the probability theory and mathematical statistics course, this paper discusses how to achieve effective teaching in a multimedia environment, from courseware development to the design of teaching modes and methods.
Keywords: probability theory and mathematical statistics; multimedia; effective teaching. CLC number: G642.0; Document code: A; Article ID: 1671-9719(2009)07-0059-03. With the rapid development of modern multimedia information technology and the introduction of Ministry of Education policies that encourage university teachers to adopt modern educational ideas, reform traditional teaching models, and cultivate innovative talent, multimedia technology is being used ever more widely in teaching.
Probability theory and mathematical statistics is a foundational course for university students in science, engineering, economics, and management.
Learning it well not only lays the groundwork for many subsequent specialized courses; more importantly, it cultivates students' mathematical literacy and skills and improves their ability to apply mathematical ideas and ways of thinking to practical problems.
Because the course is rich in content, abstract, and strongly application-oriented, students have long regarded it as relatively difficult.
Multimedia teaching, as an advanced instructional tool, combines text and images as well as motion and stillness, is vivid and intuitive, and carries a large amount of information, providing a new stage for mathematics, the "gymnastics of thought".
Multimedia teaching has been accepted and embraced by many university teachers of probability theory and mathematical statistics; exploring how to use multimedia to teach the course effectively is therefore of great significance for improving teaching quality and cultivating innovative talent.
I. Selecting, adapting, or producing high-quality courseware. As the saying goes, "To do a good job, one must first sharpen one's tools."
High-quality courseware is a precondition for effective teaching.
Courseware is the platform that presents the teaching content and embodies the teacher's instructional design; it is the bridge in the classroom between the teacher's teaching and the students' learning, so its quality matters greatly. Given the characteristics of the probability theory and mathematical statistics course, courseware can be prepared according to the following principles.
Research Proposal for a Comparative Study of "Statistics and Probability" in Primary School Mathematics Textbooks

1. Research background. Statistics and probability is an important part of primary school mathematics teaching and one of the topics most closely connected with everyday life.
In modern society, knowledge of statistics and probability is everywhere and is widely applied in daily life, business, scientific research, social administration, and other fields.
Learning statistics and probability therefore helps primary school students improve their overall competence and develops their practical application skills and logical thinking, which makes it highly significant.
At present, primary school mathematics textbooks in different regions differ in how they approach statistics and probability, in the content they cover, and in the results they achieve.
Comparing the textbooks and investigating the reasons for these differences is therefore important for designing better teaching plans and improving primary school students' learning of statistics and probability.
2. Research content. This study will compare the statistics and probability sections of primary school mathematics textbooks from different regions, examine differences in teaching approach, content, and effectiveness, and analyze the causes from the perspectives of textbook design and teaching management.
The specific research content includes: 1. Textbook analysis.
Analyze the statistics and probability sections of textbooks from different regions, comparing the ordering of material, its difficulty, the coverage of knowledge points, and other aspects.
2. Survey research.
Use questionnaires, classroom observation, and other methods to learn how primary school teachers in different regions teach statistics and probability, what problems arise in the process, and how these affect learning outcomes.
3. Summary of experience.
Based on the comparative analysis and survey results, summarize the strengths and weaknesses of the different textbooks' teaching approaches and strategies, and explore effective ways to improve students' learning outcomes.
3. Significance of the research. This study examines the current state of, and problems in, primary school statistics and probability teaching from the perspectives of textbook organization and teaching management, and offers guidance and suggestions for improving learning outcomes. Its significance includes: 1. Deepening the understanding of different textbooks' teaching approaches.
Comparing the approaches of different textbooks and summarizing their strengths and weaknesses can help teachers design better teaching plans.
2. Improving teaching effectiveness.
Studying the problems in primary school statistics and probability teaching in depth, and exploring effective ways to address them, helps cultivate students' practical application skills and logical thinking.
3. Providing a reference for textbook development.
Comparing the differences between textbooks can inform the development of textbooks that better meet actual needs.
4. Research methods. This study will use questionnaires, classroom observation, and other methods, as follows: 1. Analyze the statistics and probability sections of primary school mathematics textbooks from different regions.
Research on Teaching Reform of the Probability Theory and Mathematical Statistics Course Based on Big-Data Analysis Capability

With the continuous innovation and development of science and technology in China, probability theory and mathematical statistics, applied flexibly, can help us discover patterns in massive data sets and mine the latent value of the data.
By bringing big-data analysis techniques into the probability theory and mathematical statistics course, we can explore the relationships among different data and show what the data mean.
Using the deductive and inductive ideas that probability and statistics apply to random data in order to analyze the connections within massive data sets makes the link between theory and practice clearer and helps students master the subject more easily.
This teaching approach is based on the author's years of teaching experience and understanding of big data.
It also takes account of students' actual developmental needs and is in keeping with the ideas of the digital era.
Against this background, the paper proposes teaching reform strategies for the probability theory and mathematical statistics course based on big-data analysis capability.
These strategies are intended to serve as a reference for the future development of probability and statistics courses and to help students adapt to the demands of the digital era.
I. Teaching objectives of the course. The teaching reform of a probability theory and mathematical statistics course based on big-data analysis capability aims to keep pace with developments in big data, artificial intelligence, and other information technologies, to cultivate students' ability to mine the value of data, and to enable them to apply big-data techniques across a wide range of industries.
The teaching objectives are reflected at several levels. First, integrate discipline-specific big-data application cases: for teachers and students in different majors, collect and organize big-data application cases relevant to each major and weave them organically into the course, so that students can apply theoretical knowledge to real problems.
Second, adopt a blended online and offline teaching model: by offering micro-lectures, MOOCs, and other online learning resources alongside offline theoretical instruction, students can broaden their horizons, open up their thinking, and improve their capacity for independent learning.
Third, establish a diversified assessment system: at present, assessment in many universities' probability theory and mathematical statistics courses focuses mainly on familiarity with the theory, with little or no assessment of practical application.
When reforming course assessment, one can borrow the format of mathematical modeling, requiring students to apply the relevant probability and statistics theory in a written report, so as to demonstrate their mastery of the material and strengthen their practical skills.
Fourth, integrate statistical software: bring statistical software into the probability theory and mathematical statistics course.
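What follows is a minimal sketch of what such integration could look like in class, assuming Python with NumPy and SciPy as the statistical software and an invented sample of twelve exam scores; the text itself does not prescribe a particular package or data set.

```python
import numpy as np
from scipy import stats

# Illustrative sample, e.g. twelve exam scores (hypothetical data).
scores = np.array([78, 85, 62, 90, 74, 88, 95, 70, 81, 77, 84, 69])

# Descriptive statistics covered in the course.
mean = scores.mean()
sd = scores.std(ddof=1)          # sample standard deviation
print(f"mean = {mean:.2f}, sample s.d. = {sd:.2f}")

# 95% confidence interval for the population mean (t distribution).
n = len(scores)
ci = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sd / np.sqrt(n))
print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")

# A one-sample t test of H0: mu = 75 against the two-sided alternative.
t_stat, p_value = stats.ttest_1samp(scores, popmean=75)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Having students reproduce by software the quantities they first compute by hand links the theoretical content directly to the data-analysis practice the reform aims at.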
Exploring Teaching Methods for Probability and Statistics in High School Mathematics under the New Curriculum Standards

With the rollout of the new curriculum standards, teaching methods must be adjusted and improved accordingly.
High school mathematics is a required subject, and its teaching methods in particular must keep pace with the times in order to meet the requirements of the new standards.
Probability and statistics is an important part of high school mathematics, and how to teach it under the new standards has become a hot topic in current teaching research.
This paper analyzes and discusses teaching methods for high school probability and statistics under the new standards from both a theoretical and a practical perspective.
I. Theoretical discussion. 1. Changes in teaching philosophy under the new standards. The rollout of the new curriculum standards brings a profound shift in teaching philosophy.
The new standards advance the idea of quality-oriented education, emphasizing students' all-round development and the cultivation of their capacity for independent learning.
Against this background, traditional teaching methods can no longer fully meet the needs of education and instruction.
The teaching of probability and statistics must pay more attention to developing students' critical thinking, problem-solving skills, and cooperative learning.
Teachers should stimulate students' interest, encourage them to take part in class and present their own views, and use inquiry-based learning and similar approaches to guide students to think independently and to identify and solve problems.
2. Adjusting the teaching content. Under the new standards, the content of probability and statistics teaching also needs to be adjusted.
Traditional teaching of the subject has centered on formulas and definitions and leaned toward computation, with too little real-world application or inquiry-based learning.
The new standards call for stronger practical application skills and for developing students' mathematical thinking and creativity; the teaching content should therefore be tied to real life and scientific research, guiding students to explore, discover, and solve problems on their own and to analyze and apply what they have learned in authentic situations.
3. Innovating teaching methods. Teaching methods likewise need to be renewed under the new standards.
Traditional teaching is teacher-centered, with students passively receiving knowledge, whereas the new standards require teaching methods that emphasize students' central role and independent learning.
Teachers should employ a variety of methods, such as heuristic teaching, problem-driven teaching, and cooperative learning, to encourage active participation, initiative, and creativity and to build students' capacity for independent learning.
II. Practical discussion. 1. Introducing probability and statistics concepts through examples. In teaching probability and statistics, teachers can introduce concepts through examples drawn from real life, letting students perceive and understand the concepts and methods of probability and statistics in authentic situations.
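One way to realize this in class (a hypothetical sketch; the text does not specify a particular activity) is a short simulation in which students watch the relative frequency of heads settle near 0.5 as the number of coin tosses grows:

```python
import random

random.seed(2024)           # fixed seed so the whole class sees the same output
checkpoints = [10, 100, 1_000, 10_000, 100_000]

heads = 0
tosses = 0
for n in checkpoints:
    # keep tossing the coin until we reach n tosses in total
    while tosses < n:
        heads += random.randint(0, 1)   # 1 = heads, 0 = tails
        tosses += 1
    print(f"after {n:>6} tosses: relative frequency of heads = {heads / n:.4f}")
```

The printed frequencies drift toward 0.5, giving students an experiential entry point to the frequency interpretation of probability before the formal definitions are introduced.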
Suggestions and a Survey Concerning the "Statistics and Probability" Content of the Junior High School Curriculum

We therefore suggest that the curriculum standards explicitly require: through cases, let students experience different types of data. In concrete teaching, one can provide an information table for one class of students (including gender, age, height, weight, physical education test results, and scores on academic tests). Such a table contains qualitative (categorical) data such as male/female, ordinal data expressing grades such as excellent/good, and quantitative numerical data such as height and weight. Students read the table and are guided to consider: What data appear in the table? Into which types can they be divided? What are the characteristics of each type? In real life, can one type of data be converted into another, and, using this example, how would such a conversion be carried out?
1.2 Methods of data collection. Regarding data collection, the Curriculum Standards (2011 edition) explicitly require students to "experience the process of data collection" and to "appreciate the necessity of sampling and understand simple random sampling through examples" [2]. The General High School Mathematics Curriculum Standards (Experimental) require that "in the process of solving statistical problems, students learn to draw samples from a population by simple random sampling and, through the analysis of examples, understand stratified sampling and systematic sampling" [3]. This positioning assumes that simple random sampling is relatively simple while stratified and systematic sampling are more complex. Logically this may seem reasonable: systematic sampling requires numbering every individual in the population, and stratified sampling requires dividing the sample into strata. From an operational point of view, however, it is not necessarily so. When the population is indeterminate or infinite, simple random sampling cannot be used at all, and even when the population is finite but large it is difficult to carry out. In real life, stratified and systematic sampling are sometimes far more convenient. For example, to draw 50 students from a school's 10 classes for a survey, the classes and grades form natural strata, so stratified sampling can be used; and to draw 5 of the 50 students in a given class, systematic sampling is very convenient: randomly choose one digit from 0 to 9 (say 3), and the students whose ID numbers end in that digit are selected. By letting students experience only simple random sampling through examples, the Curriculum Standards (2011 edition) actually tie students' hands, make many survey activities impossible to carry out, and reduce "experiencing the process of data collection" to a formality and an empty phrase. We suggest adding to the standards a requirement to "experience, through cases, sampling methods such as simple random sampling, systematic sampling, and stratified sampling", so that in actual survey activities students can choose a sampling method suited to the situation, carry out the survey, and genuinely develop the ability to collect data. In concrete teaching, a specific situation can be presented. For instance: to learn how far the whole student body approves of a certain school regulation, a questionnaire has been designed; students are asked to propose concrete ways of carrying out the survey and to explain the characteristics of each, and, building on the students' exchange, the teacher identifies the census, simple random sampling, systematic sampling, and stratified sampling. Students will not necessarily come up with all of these schemes on their own; the teacher can guide them to consider the strengths and weaknesses of the schemes already proposed and thereby lead them to new ones. For instance, once simple random sampling has come up, guide students to notice that such a sample treats the whole school as a single pool, whereas the school has dozens of classes across grades 7, 8, and 9, and the sample should ideally respect grades and classes; can the quota be split among the classes? This leads to stratified sampling. Continuing, if exactly 5 students are to be drawn from each class, must 5 ID numbers be generated at random separately for every class of 50 students, or can the 5 numbers be obtained more quickly? This leads to systematic sampling. (A code sketch of these three sampling schemes is given at the end of this excerpt.)
1.3 Charts and graphs. For statistical charts, the Curriculum Standards (2011 edition) require that students "can construct pie charts and use statistical charts to describe data intuitively and effectively" and "can draw frequency histograms and use them to interpret the information in data" [2]. Together with the primary-level requirement to "recognize bar charts, pie charts, and line charts, and use bar charts and line charts to represent data intuitively and effectively" [2], the overall requirement for statistical graphs in compulsory education is to "be able to read them and to construct them"; the graphs in question include bar charts, line charts, pie charts, and frequency histograms. In real life, however, statistical graphics are often far more varied and flexible, frequently built as variations on these basic charts: "compound" bar charts that directly compare two data sets, the "ring" (doughnut) chart as a variant of the pie chart, and charts designed around a real context that are more vivid, intuitive, or even deliberately "mutated" (see Figure 2). Current textbooks, following the curriculum standards strictly, introduce only the relatively standard bar, pie, line, and histogram forms and rarely present more varied kinds of statistical graphics, which is clearly out of step with real-world needs.
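The following sketch makes the three sampling schemes of section 1.2 concrete; the roster of 10 classes with 50 students each and the use of Python are illustrative assumptions that mirror the example in the text, not part of the original report.

```python
import random

random.seed(7)

# Hypothetical roster: 10 classes, each with students numbered 1..50.
roster = [(c, s) for c in range(1, 11) for s in range(1, 51)]  # (class, student id)

# 1) Simple random sampling: 50 students drawn from the whole school.
srs = random.sample(roster, 50)

# 2) Stratified sampling: classes are the strata, 5 students drawn per class.
stratified = []
for c in range(1, 11):
    in_class = [p for p in roster if p[0] == c]
    stratified.extend(random.sample(in_class, 5))

# 3) Systematic sampling: pick one digit 0-9 and take the students whose
#    id ends in that digit (5 per class of 50, as in the text's example).
digit = random.randint(0, 9)
systematic = [p for p in roster if p[1] % 10 == digit]

print(len(srs), len(stratified), len(systematic), "chosen digit =", digit)
```

Each scheme returns 50 students, which lets a class compare the three samples directly and discuss when each method is convenient.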
Contributions from the study of the history of statistics in understanding students' difficulties for the comprehension of the variance
Michael Kourkoulos, Constantinos Tzanakis
Department of Education, University of Crete, 74100 Rethymnon, Crete, Greece
mkourk@edc.uoc.gr, tzanakis@edc.uoc.gr
1. Introduction
Since the 1980s, didactical studies have pointed out that students encounter important difficulties in understanding variation and its parameters (e.g. Mevarech 1983, Loosen et al. 1985, Huck et al. 1986, Batanero et al. 1994, Shaughnessy 1992, 1999). Nevertheless, as Baker (2004a, p.16), Reading (2004) and others remark, not much attention was given to variation in didactical research before the end of the 1990s. Only recently have there been systematic studies on the development of students' conception of variation (e.g. Torok and Watson 2000, Watson et al. 2003, Baker 2004b, Reading 2004, Reading and Shaughnessy 2004, Canada 2006, Garfield & Ben-Zvi 2007, especially pp.382-386). The research work of delMas & Liu points out that college students have important conceptual difficulties in understanding and coordinating even the simplest of the underlying foundational concepts of the standard deviation, and that considerable, well-organized teaching work is needed in order to improve the comprehension and coordination of these concepts (delMas & Liu 2005). To understand the variance and the standard deviation (s.d.), students' reasoning has to correspond to the highest level of the developmental hierarchy established by Reading & Shaughnessy (2004), which concerns the description of variation. This is also compatible with Mooney's corresponding developmental hierarchy (Mooney 2002, pp.36-37).1
Today's students are not the only ones for whom the variance appears to be a complex and difficult notion. The historical analysis of statistics points out that a long, multifarious and conceptually complex path was followed before a deep understanding of variance was achieved (Stigler 1986, Porter 1986, Tzanakis & Kourkoulos 2006). Examining didactically the historical development of the concept of variance can be useful in its teaching for several reasons of more general value, which in the case of statistics are especially pertinent. This historical development was related to several different domains, and students may appreciate their interrelation, and the fact that fruitful research in a scientific domain does not stand in isolation from similar activities in other domains. In addition, it is possible to identify the motivations behind the introduction of the concept of variance through the study of examples that served as prototypes in its historical development and which, when didactically reconstructed, may help students to understand it. In fact, history provides a vast reservoir of relevant questions, problems and expositions which may be valuable both in terms of their content and in their potential to motivate, interest and engage the learner. Didactical activities designed and/or inspired by history may be used to get students involved in, and hence become more aware of, the creative process of "doing mathematics". As we describe later (section 5), students may do "guided research work" in this context.
Moreover, the historical analysis may help to appreciate conceptual difficulties and epistemological obstacles that are worthy of more attention since they may bear some similarity with students' difficulties, and hence provide clues for explaining some of those difficulties (cf. Tzanakis & Arcavi 2000, section 7.2). Such a historical approach can be particularly fruitful for a complex notion like variance, in a domain (statistics) in which both large-scale teaching and didactical research are relatively recent (Baker 2004a, ch.4).
[Footnote 1: Reading & Shaughnessy's hierarchy concerns the types of description and measures of variation used by the students, classified according to their cognitive complexity. A refinement of this hierarchy, based on the SOLO taxonomy (Biggs & Collis 1991, Pegg 2003), is proposed by Reading (2004). Mooney's developmental hierarchy is part of a broader classification of students' statistical thinking, presented in Mooney 2002; it concerns mainly the correctness and validity of students' descriptions and measures of spread.]
This paper aims to make clear that the historical analysis of the development of basic statistical concepts related to variation reveals the importance of physical examples in this context, implicitly suggests their possible didactical relevance, and points out that examples from the social sciences are definitely more complicated and should therefore be selected and treated with adequate care, especially in introductory statistics courses. Therefore, in section 2 we present some didactically relevant selected elements of the historical development of the concept of variance, and in the next sections we comment on them from a didactical point of view, also using data from our previous experimental teaching work (Kourkoulos & Tzanakis 2003a,b, Kourkoulos et al. 2006, Tzanakis & Kourkoulos 2006).
2. Historical aspects of the development of the statistical concept of variance
During the 18th century, probabilistic thinking and the treatment of data in astronomy and geodesy followed distinct paths. The convergence and synthesis of these paths, culminating with the works of Gauss and Laplace (from 1809 to 1812), required important developments in both domains, as well as overcoming deep conceptual barriers (Stigler 1986, part I; Kolmogorov & Yushkevich 1992, ch.4; Maistrov 1974, §§III.9, III.10). The discovery of the normal distribution by De Moivre as an approximation to the binomial distribution, Laplace's work on the approximation of probability distributions, culminating in 1810 with the proof of the central limit theorem, and his works on inverse probability and error functions, aiming at statistical inference (Smith 1959, pp.566-575; Laplace 1886/1812, pp.309-327, English translation in Smith 1959, pp.588-604), are key elements of the evolution in probability necessary for the convergence and synthesis mentioned above (Stigler 1986, chs.2-4). On the other hand, the important development of methods for combining observations in the second half of the 18th century, culminating in 1805 with Legendre's publication of the least-squares method, was the essential element in the evolution of data treatment in astronomy and geodesy necessary for the aforementioned convergence to become possible. The development of these methods was enriched by important insights in mechanics and mathematics and by extended acquaintance with the characteristics of the data under consideration.
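For later reference, the least-squares criterion discussed in this section can be written in modern notation as follows; this is a restatement for the reader's convenience, not the notation of Legendre or Gauss. The special case of repeated direct measurements of a single quantity is the one in which, as noted later in the text, the arithmetic mean is recovered.

```latex
% Least-squares criterion in modern notation (a restatement, not the 1805 formulation):
\min_{\theta}\; S(\theta) = \sum_{i=1}^{n} e_i(\theta)^2 .

% Special case: n direct measurements x_1,\dots,x_n of a single quantity a.
S(a) = \sum_{i=1}^{n} (x_i - a)^2, \qquad
\frac{dS}{da} = -2\sum_{i=1}^{n}(x_i - a) = 0
\;\Longrightarrow\;
\hat{a} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```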
Before Gauss and Laplace’s synthesis,there was no appeal to formal probability theory in developing and establishing these methods,2 although, some limited but essential intuitive probabilistic notions were used. Since observed measures that contain random measurement errors had to be combined, it was considered reasonable to assume that (i) equilibrium centers of sets of observed measures (i.e. averages, centers of gravity) are the most likely values of the correct measures; (ii) positive errors should (most probably) compensate negative ones; and (iii) a line of best fit should minimize the total amount of (weighted, or not) errors’ absolute values. These intuitive probabilistic ideas were enhanced by their compatibility with fundamental mechanical models, and scientists’ acquaintance with their data characteristics; they were further established by the success of these methods in main problems of astronomy and geodesy (Stigler 1986, ch.1).In the evolution of probability, the use of variance and standard deviation (s.d.) appears closely connected to the normal distribution. De Moivre was the first to use a parameter equal to twice the s.d.in his work of 1733, where the normal distribution appears as an approximation to the binomial distribution(Smith 1959, pp.566-575).3 Because of this approximation, in this work he measured distances from the center of the 2Though important works have been done in probability concerning error functions before 1810 (Stigler 1986, ch.3; Henry 2001 pp.51-52 table 9), it had not been possible to use them in methods of treating real data in astronomy and geodesy (Stigler 1986, ch.1).3 This was, also, the first appearance of the normal distribution.symmetrical binomial as multiples of n(where n is the total number of trials). In the 2nd edition of his The Doctrine of Chances (1738) he goes further and explains clearly that n is the unit that should be used for measuring the distances from the center of the distribution and he introduced the term modulus4 for this unit n) (Smith 1959 p.572,Stigler 1986, ch.2, particularly pp.80-85). The interest of using the s.d. (or a multiple of it) as a dispersion parameter was increased as other probability distributions were found to be approximated well by the normal distribution (mainly by Laplace; see Stigler 1986 ch.3; but Lagrange’s memoir of 1776 played also a significant role, ibid pp.117, 118). This approach culminated with Laplace’s formulation and proof of the central limit theorem in 1810 (Laplace 1898a/1810), and the Gauss-Laplace synthesis (1809-1812), which determined a very large category of probabilistic phenomena in which the natural way for measuring distances from the center is by using the s.d. (or a multiple of it) as unit of measurement. 5An interesting relevant parameter that enjoyed popularity during the 19th century is the“probable error”, which is equal to 0,6745s.d. It was introduced by Bessel before 1820 (Stigler 1986 p.230 footnote 5), and played the role of s.d. in many works of this period. The probable error is that multiple of the s.d. that would correspond to the distance from the mean to a quartile if the distribution were normal. 
An interesting characteristic of the“probable error” is that, although determined by the s.d., it conserves, through the assumption of normality, a close conceptual relation with the interquartile range, which is another basic aggregate of dispersion, easier to understand than the s.d.In the combination of observations in geodesy and astronomy, a first6 significant use of squared distances appeared in Legendre’s work of 18057. In this work he also explained that his use of the sum of squared distances leads to a general method for treating problems concerning the combination of inconsistent observations (the method of least squares). Legendre used three main arguments to convince for the importance of his method: (i) the method of least squares satisfies the criterion of minimizing the total amount of weighted errors8, a criterion then generally accepted; (ii) the solution thus found establishes “a kind of equilibrium among the errors” and reveals the center around which the results of observations arrange themselves; (iii) it is a general and easy-to-apply method. 94The term modulus was used later by Bravais (1846) as a term for the scale parameter of a normal distribution and by Edgeworth for the square root of twice the variance; hence Edgeworth’s modulus was equal to De Moivre modulus divided by √2 (Stigler 1986, p.83; Stigler 1999 p.103;Walker 1931). The term “standard deviation” was introduced by Pearson at the end of the 19th century (Porter 1986 p.13;Baker 2004, p.70; David 1995).5In the 18th century works, error distributions were proposed, which used squared distances from the center of the distribution: Lambert (1765) examined the error function φ(x)= 1/2 √(1- x2) (flattened semicircle). Lagrange in his 1776 memoir examined a family of distributions of the mean error, φ(x), that are proportional to the quantity p2-x2. The error function φ(x)= a2-x2 is also examined by Daniel Bernoulli, in his 1778 memoir (Stigler 1986, ch.3 pp.110, 117, Baker 2004, pp.75, 76). These geometrically motivated distributions, satisfy the basic criteria requested in that period for an error distribution, namely (i) φ(x) is symmetric with zero average; (ii) φ(x) decreases to the right and left of the average; and (iii) φ(x)=0 beyond a certain distance from the average, or at least it is very small and tends to 0. As the examined distributions satisfied these criteria, at that period they could be considered as legitimate candidates of good error distributions. However, neither the initial works, nor later ones reveal any significant domain, or categories of practical situations in which the use of these error distributions leads to efficient treatments.6There was a priority dispute between Gauss and Legendre (Stigler 1999 ch.17; Stigler 1986 145,146). 
Although this issue may not be entirely settled, it seems clear that both Gauss and Legendre conceived the method independently and that Legendre’s significant contribution was that he realized the generality and power of the method and formulated it in a way that attracted the attention of the scientific community (Stigler 1999, p.331).7Sur la méthode des moindres carrés, reproduced in part in Smith 1959, pp.576-579.8Considering that the weighting coefficients are equal (or proportional) to the errors.9Legendre remarks that there is some arbitrariness in any chosen way to let errors influence aggregate equations (Stigler 1986 p.13).This suggests that he thought that there was no absolutely indisputable reason for choosing the criterion of least squares, although after this remark,he defended firmly his method: “Of all the principles that can beBefore Legendre’s least-squares method, there were other important works in the 2nd half of the 18th century on the treatment of inconsistent observations in astronomy and geodesy, where simpler measures were used for measuring deviations (errors): first order relative and absolute deviations (Boscovich’s method,presented in his works of 1757, 1760, 1770),as well as, weighted deviations (Laplace’s “method of situation” in1799). An earlier, but also influential method was Mayer’s method of 1750 (amended by Laplace in his work of 1787). According to this method, in situations in which more initial (linear) equations than unknowns exist, and these equations are inconsistent (because they are obtained from observed values having errors of measurement), equations were weighted in a simple way (each equation was multiplied by 1, 0 or -1) and then added, in order to obtain an aggregate equation; the final solution was found by solving a system of such aggregate equations10 (Stigler 1986, 31-55).These widely used methods were the conceptual background that allowed Legendre to conceive his method. Thus, the emergence of the least-squares method appears as a natural evolution of previously existing methods of data treatment, rather than as a jump, or discontinuity in their evolution due mainly to one man’s genius. Within the conceptual context formed by the previously existing methods, the least squares method appears as another way of weighting errors, whose important advantages were initially supported by Legendre with theoretical and practical arguments (and later on, by the Gauss-Laplace synthesis and the accumulated experience from its use). 11The simplicity and generality of this method, the interest in the results of the treated examples, and Legendre’s arguments and clarity of presentation were decisive for his method to attract the interest of scientists in astronomy and geodesy from the outset12. The method was gradually disseminated in continental Europe and England so that, by the end of 1825, it had become a standard and widely used tool proposed for this purpose, I think there is none more general, more exact or easier to apply, than that which we have used in this work; it consist of making the sum of the squares of the errors minimum. By this method, a kind of equilibrium is established among the errors, since it prevents the extremes from dominating, is appropriate for revealing the state of the system which more nearly approach the truth.” (Smith 1959 p.577). Then he explained that (i) if there is a perfect match, the method will find it and (ii) the arithmetic mean is a special case of the solutions found with this method. 
After that he explained that the center of gravity of several equal masses in space, as well as, the center of gravity of a solid body, are also a special case of the solutions found with the method; then, by analogy to the center of gravity he concluded: “We see, then, that the method of least squares reveals to us, in a fashion, the center aboud which all the results furnished by experiments tend to distribute themselves, in such a manner as to make their deviations from it as small as possible.” (ibid p.579). It is explicit in these quotations that Legendre considered an analogy between the properties of the solution obtained by his method and properties of mechanical equilibrium (Stigler pp 11-15, 55-61).Conceptually, this analogy was an important convincing element, especially in this post-Newtonian era, where, for example, the basis of the theoretical framework of astronomy and geodesy was Newtonian and classical mechanics.10The underlying principle of the method is that the system of aggregate equations is more stable (less sensitive to measurement errors) than the systems obtained from the initials equations, if adequate weightings of initial equations are chosen. (Thus, weighting was chosen by taking under consideration simple criteria of mechanical equilibrium, at least for the values of the more important of the involved statistical variables, as well as, other criteria specific to the examined situation). In this early method, errors’ measures are not expressed in the mathematical treatment and properties of errors’ distribution are not directly discussed. Annihilation of the influence of errors is realised through the use of equilibrium criteria, often of an ad hoc and context-dependent character. In fact, at the conceptual level, in this method the key issue is stability, rather than the explicit discussion and treatment of instability factors. In this respect, Boscovich method constitute a significant advance, since the measure of errors is explicit, properties of errors’ are clearly expressed and constitute the key point of the whole treatment for obtaining aggregate equations. 11That a few years earlier (1801), Gauss used the same method for the determination of the orbit of Ceres, the first asteroid ever discovered, is an additional indication that the least-squares method was the outcome of a natural evolution of pre-existing methods of data treatment in geodesy and astronomy (Gauss 1996/1821 p.III; cf. footnote 6 and references therein).12In the same year (1805) the method was presented in the Traité de géodésie by L. Puissant and the next year it was presented in Germany by von Lindenau in von Zach’s astronomical journal (Stigler 1986 p.15).in both disciplines, although there was some resistance and explicit objections13.We have noticed that in Legendre’s initial work there is no explicit appeal to probability for founding the method of least squares (Stigler 1986 pp.11-15, 55-61), despite the fact that important works had already been done on error functions, inverse probability and statistical inference(Stigler 1986, ch.3). 
For an interpretative probabilistic framework to explain this method, two key elements were still needed: (a) Gauss’ brilliant result in 1809, that the normal distribution is the adequate choice of an error function under apparently quite plausible conditions (Gauss 1996/1809, pp.65-76)1415, and (b) Laplace’s central limit theorem, which allowed him to provide in his works of 1810, 1811 and 1812better explanations for Gauss’ choice and to point out a large family of situations where the normal distribution was an appropriate error function16 (Laplace 1898b/1810, 1898c/1811, 1886/1812 pp.309-327, Stigler 1986 pp.139-148).Gauss and Laplace’s works (from 1809 to 1812) constitute, both a main synthesis of the distinct evolutionary paths followed in the 18th century in probabilities and in the treatment of data in astronomy and geodesy, and the next step in understanding the significance of the sum of squares distances, hence of variance.Concerning the use of the sums of absolute deviations and the sums of squared deviations it is worth noting the following from the works of Laplace and Gauss:In his works of 1810, 1811, 1812, Laplace considered that the basic criterion for selecting a best estimate value for an unknown parameter sought is: to chose as an estimate that value which minimize the posterior expected absolute error (that he called “l’erreur moyen à craindre” - “the mean error to be feared”); and it seems to have no doubt about his criterion17. n these works, he explained that in the examined cases, involving normal distribution, the least square method leads to the same estimate value as his criterion. In the work of 1810 he adds explicitly that in the examined cases this estimate is also the “most probable” (it corresponds to the mode of the posterior distribution), and thus it also satisfies this criterion of choosing a best estimate, which was used by Daniel Bernoulli, Euler and Gauss; however he continues that in the general case his criterion is more appropriate (Laplace 1898b/1810p 352).Gauss in his work of 1821 agrees with Laplace’s idea that the best estimate should be the one which 13The Mayer - Laplace method is simpler and demands much less labour than the least-squares method, therefore it enjoyed popularity until the mid 19th century, even though it is less accurate (Stigler 1986, p. 38-39).As late as 1832, Bowditch was recommending Boscovich’s method, which involves first-order relative and absolute deviations, over least squares, because it attributes less weight to defective observations (Stigler 1986, p.55).14 Gauss relied on the widely spread idea at that time that the arithmetic mean was a very good way for combining observations’ results. He admitted as an axiom that the most probable value of a single unknown quantity observed several times under the same circumstances is the arithmetic mean of the observations, and he proved that, if so, the probability law of the errors of observations has to be a normal distribution. He proved then that in the more general case, this errors’ distribution leads to the method of least squares as the method that provides the most probable estimates of the sought parameters (Stigler 1986 pp.140-143).15In 1808 R. Adrain has also obtained the normal distribution as an appropriate error function, but his work went largely unnoticed (Maistrov 1974, pp.149-150, Stigler 1978).16E.g. in his work of 1810 (Laplace, 1898b/1810)he explained that when the measurements’ errors are aggregates (e.g. 
sums, or averages) of a large number of commonly distributed elementary errors, the normal distribution approximates their distribution via the central limit theorem, and strengthened the conclusion by proving that in this case the solution provided by the least-squares method was not only the most probable, but also the most accurate one (in the sense that it minimizes the posterior expected error; Stigler 1986, pp.143-146, 201-202; Maistrov 1974, p.147; Kolmogorov & Yushkevich 1992, p.225).
[Footnote 17: For example, in his work of 1810 he writes: « Pour déterminer le point de l'axe des abscisses où l'on doit fixer le milieu entre les résultats des observations n, n', n'', … nous observerons que ce point est celui où l'écart de la vérité est un minimum ; or, de même que, dans la théorie des probabilités, on évalue la perte à craindre en multipliant chaque perte que l'on peut éprouver par sa probabilité, et en faisant une somme de tous ces produits, de même on aura la valeur de l'écart à craindre en multipliant chaque écart de la vérité, ou chaque erreur, abstraction faite du signe, par sa probabilité, et en faisant une somme de tous ces produits. » (our emphasis; Laplace 1898b/1810, p.351).]
minimizes "l'erreur moyen à craindre" (Gauss 1996/1821, pp.11-13); however, he determines "l'erreur moyen à craindre" (m) differently: for Gauss, m should be the square root of the expected squared error, m² = ∫_{−∞}^{+∞} x²φ(x)dx, where x is an error and φ(x) is "the relative facility of the error x" (in modern terminology, the probability density of the error x).18 This difference in the definition of "the mean error to be feared" in fact changed the criterion for the best estimate value. Gauss discussed his choice of "the mean error to be feared" at some length (ibid, p.12):
- Initially he admits that there is an element of arbitrariness in the determination of "the mean error to be feared" ("Si l'on objecte que cette convention est arbitraire et ne semble pas nécessaire nous en convenons volontiers. La question qui nous occupe a, dans sa nature même, quelque chose de vague et ne peut être bien précisée que par un principe jusqu'à un point arbitraire." (ibid, p.12)).
- Then he establishes an analogy, much like Laplace (footnote 17), between the determination of a quantity through observations and a game of chance where there is "a loss to fear and no gain to expect". In the context of this analogy, each error, positive or negative, corresponds to a loss of the truth, and thus the expected loss is the sum of the products of the possible losses multiplied by their respective probabilities of occurring. But what loss corresponds to each error? According to Gauss, it is this point which is unclear and needs to be settled by a partially arbitrary convention ("Mais quelle perte doit on assimiler à une erreur déterminée? C'est ce qui n'est pas clair en soi; cette détermination dépend en partie de notre volonté.", ibid, p.12). The only restriction he considered is that each positive or negative error should correspond to a loss and not to a gain, and he concludes that among all functions satisfying this condition "it seems natural to choose the simplest, which is doubtlessly, the square of the error" ("il semble naturel de choisir la plus simple, qui est sans contredit, le carré de l'erreur"); ibid, p.12.
- Then Gauss considers Laplace's choice for that function: the error's absolute value ("l'erreur elle-même prise positivement", i.e. the error itself taken positively).
He considered that Laplace’s choice was equally arbitrary to his own, but admitted that it was also equally legitimate, and concluded that his choice was recommended because of the “generality and the simplicity of its consequences”.-What he meant by the “generality and the simplicity of its consequences” becomes clear later in the text. Based on probabilistic arguments, Gauss concluded that if his definition of “the mean error to be feared” (and thus his criterion of best estimate) is accepted, then the method of least squares provides “les combinaisons le plus avantageuses des observations” (the most advantageous combinations of observations) even in the cases that there is a small number of observations and errors’ probability law is any such law, and not necessarily a normal distribution (ibid pp. 21-26). So he extended the cases for which the method of least squares can be considered as preferable beyond those in which the normal distribution is involved, directly or through the central limit theorem. This generalised result offers a unified solution, and thus simplifies the whole problem of the treatment of observations, if his criterion is accepted; this result is known as the Gauss-Markov theorem (Stigler 1986, p 148).Gauss argumentation, that the square of errors is a simple function and thus adequate to be used to define “the mean error to be feared”, may be a convincing argumentation for an educated audience, but that it is simpler than the absolute value of the error it is much less convincing; for example, Laplace thought differently on this issue (see also footnote 13). Most likely, it was the aposteriori appreciation of the expected squared error (the “generality and the simplicity of its consequences”) that mainly convinced Gauss to prefer the use of squared errors than the use of absolute errors.18He writes “Nous ne limitons pas, du reste, cette dénomination [l’erreur moyen à craindre] au résultat immédiat des observations, mais nous l’entendons, au contraire, à toute grandeur qui peut s’en déduire d’un manière quelconque” (ibid p13). So, “l’erreur moyen à craindre”, thus defined, is used not only for the errors of directs observations but also for aggregate errors as well as posterior errors’ distributions.。
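In modern notation, the two competing definitions of "the mean error to be feared" discussed above correspond to two risk functions with different minimizers; this restatement is ours, not Gauss's or Laplace's own formulation.

```latex
% Laplace: the expected absolute error, minimized over c by the median of the distribution.
R_L(c) = \mathbb{E}\,\lvert X - c \rvert
       = \int_{-\infty}^{+\infty} \lvert x - c \rvert\, \varphi(x)\, dx,
\qquad \arg\min_c R_L(c) = \operatorname{median}(X).

% Gauss: m is the root of the expected squared error, minimized over c by the mean.
m^2(c) = \mathbb{E}\,(X - c)^2
       = \int_{-\infty}^{+\infty} (x - c)^2\, \varphi(x)\, dx,
\qquad \arg\min_c m^2(c) = \mathbb{E}(X).
```

For a symmetric distribution the two minimizers coincide, which is consistent with both criteria leading to the same least-squares estimates in the normal case discussed in the text.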