Word Statistics

English Vocabulary for Statistics

统计英语词汇abscissa 横坐标absence rate 缺勤率absolute number 绝对数absolute value 绝对值accident error 偶然误差accumulated frequency 累积频数alternative hypothesis 对立假设analysis of data 分析资料analysis of variance(ANOVA) 方差分析arith-log paper 算术对数纸arithmetic mean 算术均数arithmetic weighted mean 加权算术均数assumed mean 假定均数asymmetry coefficient 偏度系数average deviation 平均差average 平均数bar chart 直条图、条图bias 偏性binomial distribution 二项分布biometrics 生物统计学bivariate normal population 双变量正态总体box plot 盒须图cartogram 统计图case fatality rate(or case mortality) 病死率census 普查central tendency 集中趋势chi-sguare(X2) test 卡方检验class boundaries 组界class interval 组距class limits 组限class midpoint 组中点class width 组距classification 分组、分类cluster sampling 整群抽样coefficient of correlation 相关系数coefficient of regression 回归系数coefficient of variability(or coefficieut of variation) 变异系数coefficient of variation 变异系数collection of data 收集资料column 列（栏）combinative table 组合表combined standard deviation 合并标准差combined variance(or poolled variance) 合并方差complete survey 全面调查completely correlation 完全相关completely random design 完全随机设计confidence interval 可信区间，置信区间confidence level 可信水平，置信水平confidence limit 可信限，置信限consistent estimator 一致点估计式constituent ratio 构成比，结构相对数contingency table 列联表continuity 连续性continuous 连续的control group 对照组control 对照coordinate 坐标correction for continuity 连续性校正correction for grouping 归组校正correction number 校正数correction value 校正值correlation analysis 相关分析correlation coefficient 相关系数correlation 相关，联系critical value 临界值cumulative frequency (以下) 累加次数cumulative frequency polygon 累加次数多边形图cumulative frequency 累积频Ddata数据default默认的definition定义deleted residual剔除残差density function密度函数dependent variable因变量description描述deviations差异df.(degree of *******) 自由度diagnostic诊断dimension维discrete variable离散变量discriminant function判别函数discriminatory analysis半别分析distance距离distribution分布Eequal相等effects of interaction交互效应efficiency有效性eigenvalue特征值equal size等含量equation方程error误差estimate估计estimation of parameters参数估计estimations估计量evaluate衡量exact value精确值expectation 期望expected 
value期望值exponential指数的exponential distributon指数分布extreme value极值Ff-text f检验facto*因素，因子**ctor analysis因素分析factor score因子得分factorial designs析因分析factorial experiment析因试验failure rate失效率fit拟合fitted line拟合线fitted value拟合值fixed model固定模型fixed variable固定变量frequency频数function函数Ggamma distribution 伽玛分布geometric mean 几何均值group 组growth curve 生长曲线Hharmomic mean 调和均值hererogeneity不齐性histogram 直方图homogeneity 齐性homogeneity of variance 方差齐性hypothesis 假设hypothesis test 假设检验IIndependence 独立independent-samples 独立样本independent variable 自变量index 指数index of correlation 相关指数interaction 交互作用interclass correlation 组内相关interval estimate 区间估计intraclass correlation 组间inverse 倒数的iterate 迭代KKernel 核kolmogorov-smirnov test 柯尔莫哥洛夫-斯米诺夫检验kurtosis 峰度Llarge sample problem大样本问题layer 层least significant difference 最小显著差数least-square estimation 最小二乘估计least-square method最小二乘法level 水平level of significance显著性水平leverage value 中心化杠杆值life 寿命life test 寿命试验likelihood function似然函数likelihood ratio test 似然比检验linear线性的linear estimator 线性估计linear model 线性模型linear regression 线性回归linear relation线性关系linear term 线性项logarithmic 对数的Mm-estimator m估计main effect 主效应maintainability维护度matrix 矩阵maximum 最大值maximum likelihood estimation 极大似然估计mean fine between failure 平均无故障工作时间mean squared deviation(MSD) 均方差mean sum of square 均方和measure 衡量media 中位数minimum 最小值missing values 缺失值mixed model 混合模型mode 众数model 模型montecarle method 蒙特卡罗法moving average 移动平均值multicollinearity多元共线性multiple comparison 多重比较multiple correlation 多重相关multiple correlation coefficient 多元相关系数multiple regression analysis 多元回归分析multiple regression equatiom多元回归方程multiple response 多响应multivariate analysis 多元分析Nnegative relationship 负相关nonadditively不可加性nonlinear 非线性nonlinear regression 非线性回归nonparametric tests 非参数检验normal distribution 正太分布null hypotOone-sample 单样本one-tailed test 单侧检验one-way anova单方向分析one-way classification 单向分类optimal 优化的optimum allocation 最优配制order 排序order statistics 次序统计量origin 原点orthogonal 正交的outliers 异常值Pp-p plot pp概率图paried-sample成对样本paired 
observation 成对观测数据parameter 参数partial correlation 偏相关partial correlation coefficient 偏相关系数partial regression coefficient 偏回归系数percent 百分比percentiles 百分位数pie chart 饼图point estimate 点估计poisson distribution 泊松分布polynomial curve 多项式曲线polynomial regression 多项式回归polynomials 多项式positive relationship 正相关power 幂predict 预测predicted value 预测值prediction intervals 预测区间principal component analysis 主成分分析proability概率probit analysis 概率分析proportion 比例QQ-Q polt QQ概率图Quadratic 二次的Quadratic term 二次项Quality control 质量控制Quantitative 数量的，度量的Quartiles 四分位数RRandom 随机的Random number 随机数Random sampling 随机取样Random seed 随机数种子Random variable 随机变量Randomizatiom随机化Range 极差Rank 秩rank correlation 秩相关rank statistic 秩统计量rector 向量regression analysis 回归分析regression coefficient 回归系数regression line 回归线reject 拒绝rejection region 拒绝域relationship 关系reliability 可靠性reliability analysis 可靠性分析reliability test 可靠性试验repeated 重复的report 报告，报表residual 残差residual sum of squares剩余平方和response 响应risk function 风险函数robustness 稳健性root mean square 标准差row 行run 游程run teststatistical quality control 统计质量管理std.residual标准残差stem-and-leaf plot 茎叶图stepwise regression analysis 逐步回归stimulus 刺激strong assumption 强假设stud.deleted residual 学生化剔除残差stud.residual学生化残差subsamples次级样本sufficient statistic充分统计量sum 和sum of square 平方和summary 概括Tt-distibution t分布t-test t检验table 表test criterion 检验判据test for linearity线性检验test of goodness of fit拟合优度检验test pf independence 齐性检验test rules 独立性检验test statistics 检验法则testing function 检验统计量test 检验time series 时间序列tolerance limits 容许限total 总共transformation 转换treatment 处理trimmed mean 截尾均值true value 真值two-tailed test 双尾检验Uunbalanced 不平衡的unbiased estimation 无偏估计unbiasedness 不偏性unequal size 不等含量uniform distribution 均匀分布Vvalue of estimator 估计值variable 变量variance 方差variance components 方差分量variance ratio 方差比various 不同的Wweight 加权weighted average 加权平均值wishart distribution 维夏分布within groups 组内的z-score z分数。

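Word Statistics: Word Frequency Counting

Step 1: Output the N most frequent English words in a single file.

Function 1: Output every distinct word in the file, sorted by occurrence count from highest to lowest; words with the same count are sorted in dictionary order.

Function 2: Given a directory, run the count on each file in that directory.

Function 3: Given a directory, recursively traverse all of its subdirectories and count the words in every file found.

Function 4: Output the n most frequent words. For example, the program asks how many top entries to report and the user enters 10.

The program then prints the 10 most frequent words.

When no number is given, the frequencies of all words are listed by default.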
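The sorting rule in Function 1 and the top-n cut-off in Function 4 can be sketched as follows. This is a minimal illustration, not the assignment's reference solution: the class name `WordFrequency`, the method names, and the choice to treat a word as a run of ASCII letters compared case-insensitively are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; all names here are illustrative.
final class WordFrequency {
    /** Counts words, where a word is assumed to be a run of ASCII letters (case-insensitive). */
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Sorts by count (descending), then dictionary order; n <= 0 means "list all words". */
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return (n > 0 && n < entries.size()) ? entries.subList(0, n) : entries;
    }

    public static void main(String[] args) {
        // Prints "the 3" and then "and 2".
        for (Map.Entry<String, Integer> e : topN(count("the cat and the dog and the bird"), 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Sorting the entry list once with a two-level comparison keeps the tie-break rule in one place; directory traversal (Functions 2 and 3) would simply call `count` on each file's contents.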

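Step 2: Support stop words. In a novel, the most frequent words are usually ones such as "a", "it", "the", "and", and "this". Prepare a stop-word file (a stop list) and skip these words when counting.

We call this file "stopwords.txt".

Step 3: What if we want to see which phrases are common? First define a phrase: "two or more English words separated only by spaces". For example:

hello world //this is a phrase
hello, world //this is not a phrase

Phrases with the same frequency are sorted in dictionary order.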
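A rough sketch of the stop-word filter and of phrase counting is shown below. It is only an illustration under several assumptions: the stop words are passed in as an already-loaded, lowercase set (reading "stopwords.txt" is left out), only two-word phrases are counted, and the class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper class; all names here are illustrative.
final class PhraseCounter {
    /** Counts words, skipping any word found in the (lowercase) stop-word set. */
    static Map<String, Integer> countWords(String text, Set<String> stopWords) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty() && !stopWords.contains(w)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Counts two-word phrases: adjacent words separated only by spaces. */
    static Map<String, Integer> countTwoWordPhrases(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Any character other than a letter or a space (comma, newline, ...) ends a phrase.
        for (String run : text.toLowerCase(Locale.ROOT).split("[^a-z ]+")) {
            String[] words = run.trim().split(" +");
            for (int i = 0; i + 1 < words.length; i++) {
                counts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "and"));
        System.out.println(countWords("The cat and the dog", stop));          // {cat=1, dog=1}
        System.out.println(countTwoWordPhrases("hello world, hello world")); // {hello world=2}
    }
}
```

Splitting first on everything that is neither a letter nor a space is what makes "hello, world" fall into two separate runs, so it never produces a phrase.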

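Step 4: Unify verb forms before counting.

We want to find common words and phrases, but English verbs change with tense and voice, so the same word or phrase ends up being counted as several different ones.

How can this be solved? Assume we have a text file in which every line has the form: base form, variant 1, variant 2, ..., with the words separated by spaces.

For example, the verb TAKE has the following forms: take takes took taken taking. When implementing the functions above, we want an option that counts every variant of a verb under its base form.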
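One possible way to apply such a verb file is to build a map from every variant to its base form and look each word up before counting. The sketch below assumes the file has already been read into a list of lines; the class `VerbNormalizer` and its methods are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; all names here are illustrative.
final class VerbNormalizer {
    /** Builds a variant -> base-form map from lines such as "take takes took taken taking". */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> toBase = new HashMap<>();
        for (String line : lines) {
            String[] forms = line.trim().toLowerCase(Locale.ROOT).split("\\s+");
            if (forms[0].isEmpty()) {
                continue; // blank line
            }
            for (String form : forms) {
                toBase.put(form, forms[0]); // the first word on the line is the base form
            }
        }
        return toBase;
    }

    /** Returns the base form if the word is a known variant, otherwise the lowercased word. */
    static String normalize(String word, Map<String, String> toBase) {
        String w = word.toLowerCase(Locale.ROOT);
        return toBase.getOrDefault(w, w);
    }

    public static void main(String[] args) {
        Map<String, String> toBase = parse(Arrays.asList("take takes took taken taking"));
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : "She took it and he takes it".split("[^A-Za-z]+")) {
            if (!w.isEmpty()) {
                counts.merge(normalize(w, toBase), 1, Integer::sum);
            }
        }
        System.out.println(counts); // {and=1, he=1, it=2, she=1, take=2}
    }
}
```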

Feature: support for normalizing verb forms.

Experiment code:

package sy0509_ZiMu;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class sy0509 {
    public static void main(String[] args) throws IOException {
        DecimalFormat df = new DecimalFormat("######0.00"); // two decimal places

        // Read the whole GBK-encoded file into memory.
        StringBuffer sb = new StringBuffer();
        FileInputStream fip = new FileInputStream("D:\\飘英文版.txt");
        InputStreamReader reader = new InputStreamReader(fip, "gbk");
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        reader.close();
        fip.close();
        String A = sb.toString();
        System.out.println(A);

        // Count each of the 26 letters, ignoring case.
        char[] Z = "abcdefghijklmnopqrstuvwxyz".toCharArray();
        int[] X = new int[26];
        for (int i = 0; i < A.length(); i++) {
            char lower = ch(A.charAt(i));
            if (lower >= 'a' && lower <= 'z') {
                X[lower - 'a']++;
            }
        }

        // Bubble sort by count, descending, keeping each letter next to its count.
        for (int i = 0; i < 25; i++) {
            for (int k = 0; k < 25 - i; k++) {
                if (X[k] < X[k + 1]) {
                    int temp2 = X[k];
                    X[k] = X[k + 1];
                    X[k + 1] = temp2;
                    char temp3 = Z[k];
                    Z[k] = Z[k + 1];
                    Z[k + 1] = temp3;
                }
            }
        }

        System.out.println("Letter counts in this text, sorted in descending order:");
        double sum = 0;
        for (int i = 0; i < 26; i++) {
            System.out.println(Z[i] + " count: " + X[i]);
            sum = sum + X[i];
        }
        for (int i = 0; i < 26; i++) {
            double jkl = X[i] / sum * 100;
            System.out.println(Z[i] + " frequency: " + df.format(jkl) + "%");
        }

        // Count the words and print them by count, descending.
        System.out.println("//////////////////////////////");
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(StatList(A).entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        for (Map.Entry<String, Integer> e : entries) {
            System.out.println("Word: " + e.getKey() + ", occurrences: " + e.getValue());
        }
    }

    // Converts an uppercase ASCII letter to lowercase; other characters are returned unchanged.
    static char ch(char c) {
        if (c >= 'A' && c <= 'Z') {
            c += 32;
        }
        return c;
    }

    // Builds a hash table that maps each word to its number of occurrences.
    static Map<String, Integer> StatList(String str) {
        HashMap<String, Integer> has = new HashMap<String, Integer>();
        String[] slist = str.split("\\W+");
        for (int i = 0; i < slist.length; i++) {
            if (slist[i].isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a separator
            }
            if (!has.containsKey(slist[i])) { // first time this word is seen
                has.put(slist[i], 1);
            } else { // already seen: add 1 to its count
                has.put(slist[i], has.get(slist[i]) + 1);
            }
        }
        return has;
    }
}

Experiment results:

English Words in Statistics

Statistics involves the collection, analysis, interpretation, and presentation of data. It includes various methods for organizing and summarizing information, as well as techniques for making inferences and predictions based on data. Statistical concepts such as probability, correlation, and regression are used to understand the relationships between variables and to make informed decisions. In addition, statistical tools such as hypothesis testing and confidence intervals are essential for validating findings and drawing conclusions from data. Overall, statistics plays a crucial role in fields such as science, economics, sociology, and business, because it provides insight and evidence for making informed decisions and solving problems.

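Per-Word Statistics and Inflection Handling

Found in dictionary: [laughter], total frequency: 3
Found in dictionary: [is], total frequency: 1
Found in dictionary: [difficult], total frequency: 1
Found in dictionary: [to], total frequency: 3
Found in dictionary: [a], total frequency: 3
Found in dictionary: [good], total frequency: 1
Found in dictionary: [laugh], total frequency: 2
[filness] --> [filch] similarity: 166
[filness] --> [file] similarity: 1110
[filness] --> [filename] similarity: 226
[filness] --> [filial] similarity: 142
[filness] --> [filibuster] similarity: 111
[claims] --> [claimant] similarity: 250
--claims final choice: claim
Found in dictionary: [to], total frequency: 2
Found in dictionary: [the], total frequency: 3
Found in dictionary: [contrary], total frequency: 1
Not in dictionary: [laughing]
Found in dictionary: [laugh], total frequency: 1

Not in dictionary: [Section]
Found in dictionary: [section], total frequency: 1
--lowercased to [section]: found
Found in dictionary: [I], total frequency: 1
Not in dictionary: [Use]
Found in dictionary: [use], total frequency: 1
--lowercased to [use]: found
Found in dictionary: [of], total frequency: 1
Not in dictionary: [But]
Found in dictionary: [but], total frequency: 1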

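Counting the Actual Usage Frequency of English Words

Counting how often English words are actually used is a large and complex task, because it requires analyzing a vast amount of text from around the world.

Some published word-frequency lists can serve as references, such as COCA (Corpus of Contemporary American English) and the BNC (British National Corpus). Both were built by collecting and analyzing large amounts of text of different types.

These lists usually record how often each word appears in different text types, such as news, fiction, technical writing, and academic writing.

In general, high-frequency words (such as "the", "and", and "is") appear often in every text type, while low-frequency words (such as specialized terms) may appear only in particular text types.

Note that these lists cover only specific language regions and text types, so they cannot fully represent how English is used worldwide.

In addition, word frequencies change over time and with context, so the lists need to be updated and adjusted regularly.

In short, counting the actual usage frequency of English words is a complex task that depends on large-scale text analysis to obtain accurate and reliable data.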

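Word Statistics Program in C++

Problem description: Literary researchers need to count how many times certain words occur in an English novel and where they occur. Write a text-statistics system that does this.

It is called the "Literary Research Assistant".

Requirements. Input: a text file and a set of words.

Output: the number of occurrences of each word and the line numbers where it occurs (a line in which the word occurs twice is reported only once).

Key points: (1) The text is non-empty and is stored as a file.

(2) A word is a sequence of letters with no spaces inside; matching is case-insensitive.

(3) A word to be counted never spans lines; it either starts at the beginning of a line or is preceded by a space.

(4) The data structure is a two-dimensional linked list: the word nodes are linked into one list, the line numbers of each word form their own list, and each word node serves as the head node of its line-number list.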
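The required output (occurrence count plus distinct line numbers per word) can be illustrated with a small sketch. It is written in Java rather than C++ and uses a map of per-word records in place of the hand-built two-dimensional linked list, so it shows the bookkeeping only, not the prescribed data structure. The names `WordLocator` and `Hit` are hypothetical, and words are assumed to be runs of ASCII letters separated by any non-letter, which is looser than rule (3).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; all names here are illustrative.
final class WordLocator {
    /** Per-word record: total occurrences and the distinct line numbers (1-based). */
    static final class Hit {
        int count;
        final List<Integer> lines = new ArrayList<>();
    }

    static Map<String, Hit> locate(List<String> textLines, List<String> targets) {
        Map<String, Hit> hits = new LinkedHashMap<>();
        for (String t : targets) {
            hits.put(t.toLowerCase(Locale.ROOT), new Hit());
        }
        for (int i = 0; i < textLines.size(); i++) {
            int lineNo = i + 1;
            for (String w : textLines.get(i).toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                Hit h = hits.get(w);
                if (h == null) {
                    continue; // not one of the words being tracked
                }
                h.count++;
                // Lines are visited in order, so a repeat on the same line is always the last entry.
                if (h.lines.isEmpty() || h.lines.get(h.lines.size() - 1) != lineNo) {
                    h.lines.add(lineNo);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("To be or not to be", "that is the question", "to be");
        Map<String, Hit> hits = locate(text, Arrays.asList("to", "question"));
        for (Map.Entry<String, Hit> e : hits.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().count + " time(s), lines " + e.getValue().lines);
        }
        // to: 3 time(s), lines [1, 3]
        // question: 1 time(s), lines [2]
    }
}
```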

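Requirements analysis. User requirements: the user can use the program to look up and count the occurrences and positions of specific words in an English article.

Functional requirements: the user can enter words to query their occurrence counts and positions; the program displays the query results correctly; the user can choose whether to continue querying after each output; the results of one query are recorded in a two-dimensional linked list.

Outline design: To meet the design requirements, the program stores the word nodes and their position information in a two-dimensional linked list.

Abstract data types:

struct node {
    int col;     // line coordinate
    int row;     // column coordinate within that line
    node* next;  // pointer to the next coordinate node
};               // coordinate node of a word

struct Node {
    char words[20];  // the word, as a character array
    node* ptr;       // pointer to the word's coordinate nodes
    Node* next;      // pointer to the next word node
    int num;         // number of characters in the word
};                   // word node

class TLink {
public:
    TLink() { head = NULL; }  // constructor
    ~TLink()                  // destructor
    {
        while( head != NULL )
        {
            Node* temp;
            temp = head;
            head = head -> next;
            delete temp;
        }
    }
    void Insert( char* Item );
    // Precondition: the parameter Item[] is a character array.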

Word Statistics: Featured Selection

This lab test was divided into many small parts. Because of my limited ability, I completed only the first three.

The first part counts the occurrences and percentage of each of the 26 English letters in the text and sorts them in descending order:

package piao;

import java.io.BufferedReader;
import java.io.FileReader;
import java.text.NumberFormat;

public class text0 {
    public static void main(String[] args) throws Exception {
        BufferedReader br = new BufferedReader(new FileReader("D:\\java/eclipse/测试/piao.txt"));
        int[] count = new int[26];
        int c;
        while ((c = br.read()) != -1) {
            if (c >= 'A' && c <= 'Z') {
                count[c - 'A']++;
            }
            if (c >= 'a' && c <= 'z') {
                count[c - 'a']++;
            }
        }
        br.close();
        Print(count, Paixu(count));
    }

    // Returns the letter indices (0 = 'a') ordered by count, descending.
    // Sorting indices instead of the counts keeps every count attached to its letter.
    public static int[] Paixu(int[] count) {
        int size = count.length;
        int[] order = new int[size];
        for (int i = 0; i < size; i++) {
            order[i] = i;
        }
        for (int i = 0; i < size - 1; i++) {
            for (int j = i + 1; j < size; j++) {
                if (count[order[i]] < count[order[j]]) {
                    int temp = order[j];
                    order[j] = order[i];
                    order[i] = temp;
                }
            }
        }
        return order;
    }

    public static void Print(int[] count, int[] order) {
        NumberFormat numberFormat = NumberFormat.getInstance();
        numberFormat.setMaximumFractionDigits(2); // at most two digits after the decimal point
        int sum = 0;
        for (int i = 0; i < count.length; i++) {
            sum = count[i] + sum;
        }
        for (int idx : order) {
            if (count[idx] > 0) {
                char lowerCase = (char) (idx + 97);
                String percent = numberFormat.format((float) count[idx] / (float) sum * 100);
                System.out.println(lowerCase + "(" + count[idx] + ")" + "(" + percent + "%)");
            }
        }
    }
}

The second part counts the occurrences of every word and sorts them in descending order:

package piao;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class text1 {
    public static void main(String[] args) throws Exception {
        BufferedReader re = new BufferedReader(new FileReader("D:\\java/eclipse/测试/piao.txt"));
        StringBuffer buffer = new StringBuffer();
        String line = null;
        while ((line = re.readLine()) != null) {
            buffer.append(line).append('\n'); // keep a separator so words on adjacent lines do not merge
        }
        re.close();
        Pattern expression = Pattern.compile("[a-zA-Z]+"); // regular expression that matches a word
        Matcher matcher = expression.matcher(buffer.toString());
        Map<String, Integer> map = new TreeMap<String, Integer>();
        while (matcher.find()) {                 // another word was matched
            String word = matcher.group();       // the word is the key of the tree map
            if (map.containsKey(word)) {         // the word has been seen before
                map.put(word, map.get(word) + 1);
            } else {
                map.put(word, 1);                // first occurrence: add it to the map
            }
        }
        List<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() { // ascending by count
            public int compare(Map.Entry<String, Integer> left, Map.Entry<String, Integer> right) {
                return (left.getValue()).compareTo(right.getValue());
            }
        });
        // The list is ascending, so print it from the end to get descending order.
        for (int i = list.size() - 1; i >= 0; i--) {
            System.out.println(list.get(i).getKey() + " :" + list.get(i).getValue());
        }
    }
}

The third part lets the user choose how many of the most frequent words to print:

package piao;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class text2 {
    public static void main(String[] args) throws Exception {
        BufferedReader re = new BufferedReader(new FileReader("D:\\java/eclipse/测试/piao.txt"));
        StringBuffer buffer = new StringBuffer();
        String line = null;
        while ((line = re.readLine()) != null) {
            buffer.append(line).append('\n'); // keep a separator so words on adjacent lines do not merge
        }
        re.close();
        Pattern expression = Pattern.compile("[a-zA-Z]+"); // regular expression that matches a word
        Matcher matcher = expression.matcher(buffer.toString());
        Map<String, Integer> map = new TreeMap<String, Integer>();
        while (matcher.find()) {
            String word = matcher.group();
            if (map.containsKey(word)) {
                map.put(word, map.get(word) + 1);
            } else {
                map.put(word, 1);
            }
        }
        List<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() { // ascending by count
            public int compare(Map.Entry<String, Integer> left, Map.Entry<String, Integer> right) {
                return (left.getValue()).compareTo(right.getValue());
            }
        });
        @SuppressWarnings("resource")
        Scanner in = new Scanner(System.in);
        System.out.println("Enter n to print the n most frequent words:");
        int n = in.nextInt();
        int last = list.size() - 1;
        // Stop at index 0 so that an n larger than the vocabulary does not run off the list.
        for (int i = last; i >= 0 && i > last - n; i--) {
            System.out.println(list.get(i).getKey() + " :" + list.get(i).getValue());
        }
    }
}

English Verbs Used in Statistics

In the realm of statistics, the verb "to calculate" often takes center stage, as it is the fundamental action in determining averages and sums. "To analyze" is another key verb in statistical work, encompassing the process of breaking down data to understand trends and patterns. "To correlate" is used when we find relationships between different sets of data, a crucial step in statistical studies. "To infer" is the act of drawing conclusions from the data, often based on probabilities and estimates. "To predict" involves using statistical methods to forecast future outcomes based on existing data. "To sample" is the process of selecting a subset of a larger population for study, a common technique in statistics. "To aggregate" refers to combining data points into a single figure or category for easier analysis. "To normalize" is the process of adjusting data to a common scale, making it easier to compare across different sets. Lastly, "to extrapolate" is the act of extending the range of a function or statistical series beyond the original range of data.


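Jiangxi University of Science and Technology Lab Report

Department: Mechanical and Electrical Engineering. Class: Mechatronics 11 (2). Name: Yang Jinqi (杨锦其). Student ID: 11212203. Course: Data Structures. Instructor: Liu Tingcang (刘廷苍).

Lab topic: Counting the words in a text

I. Purpose: A text can be viewed as a sequence of characters in which the meaningful characters are separated by spaces into individual words.

Design an algorithm that counts the number of words in the text.

II. Content: (1) the text to be processed can be read from the keyboard; (2) any text content can be read, including English and Chinese characters; (3) design an algorithm that counts the words in the text; (4) analyze the time performance of the algorithm.

III. Design and coding. 1. Background: A counter, count, holds the number of words in the text.

While reading and checking the characters one by one, the program must tell whether the current character is a space.

A character that is not a space must be part of some word; the role of a space is to separate words.

Even when the current character is not a space, whether it starts a new word depends on whether the previous character was a space: the counter is incremented only when the current character is the first character of a word.

If a non-space character follows a space (or is the first character of the text), it starts a new word.

If the reading position is already inside a word, the character does not start a new word.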
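The counting rule described above fits in a few lines. The following is a minimal sketch in Java (the report's own program is in C++); the class and method names are made up, and treating every whitespace character, not only the space, as a separator is an assumption.

```java
// Hypothetical helper class; all names here are illustrative.
final class WordCounter {
    /** Counts words in one pass: a word starts at a non-space character that follows a space. */
    static int countWords(String text) {
        int count = 0;
        boolean previousWasSpace = true; // the start of the text behaves like "after a space"
        for (int i = 0; i < text.length(); i++) {
            boolean isSpace = Character.isWhitespace(text.charAt(i));
            if (!isSpace && previousWasSpace) {
                count++; // first character of a new word
            }
            previousWasSpace = isSpace;
        }
        return count;
    }

    public static void main(String[] args) {
        // "school,I" contains no space, so it counts as a single word here.
        System.out.println(countWords("I want to go to school,I want to have a good future!")); // 12
    }
}
```

Each character is examined exactly once, so the running time is linear in the length of the text. A counter that also treats punctuation as a separator would split "school,I" in two and report one more word for the same sentence.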

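2. Analysis: After the user enters the words to search for, each word is inserted into a linked list. When input ends, the program searches the text for the words in the list.

The program scans the text array in order. When it reaches a space, it marks the end of one scanned word and records the number of letters in it, then walks the linked list: if an entry has the same number of letters, the two are compared; otherwise the search moves on to the next entry, until the end of the list.

The program then scans the next word in the text array and repeats the same comparison against the list, until the whole array has been scanned.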

3. Detailed code

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <stdio.h>
using namespace std;

struct node {
    int col;     // line counter (incremented at every newline)
    int row;     // position of the word within its line
    node* next;
};

struct Node {
    char words[20];  // the word
    node* ptr;       // head of this word's position list
    Node* next;      // next word node
    int num;         // number of characters in the word
};

class TLink {
public:
    TLink() { head = NULL; }
    ~TLink() { Clear(); }
    void Clear();  // frees every word node together with its position list
    void Insert( char* Item );
    void calcute( char* szFile, int size );
    Node* gethead();
private:
    Node* head;
};

char A_to_a( char alp );
void showwindow();
void show_text();
void input();

int i = 0;          // number of characters stored in szFile
char szFile[2000];  // text buffer
TLink link;

void TLink::Clear()
{
    while( head != NULL ) {
        Node* temp = head;
        head = head -> next;
        while( temp -> ptr != NULL ) {
            node* pos = temp -> ptr;
            temp -> ptr = pos -> next;
            delete pos;
        }
        delete temp;
    }
}

void TLink::Insert( char* Item )
{
    if( Item[0] == '\0' )  // ignore empty input
        return;
    int flag = 0;
    Node* temp = new Node;
    int i = 0;
    while( Item[i] != '\0' && i < 19 ) {  // words[] holds at most 19 characters
        temp -> words[i] = Item[i];
        ++ i;
    }
    temp -> num = i;
    temp -> words[i] = '\0';
    // Check this list (not a fresh local one) for a case-insensitive duplicate.
    Node* ptrr = head;
    while( ptrr != NULL ) {
        if( ptrr -> num == temp -> num ) {
            int n;
            for( n = 0; n < i; ++ n )
                if( A_to_a( ptrr -> words[n] ) != A_to_a( Item[n] ) )
                    break;
            if( n == i ) {
                flag = 1;
                break;
            }
        }
        ptrr = ptrr -> next;
    }
    if( flag != 1 ) {
        temp -> ptr = NULL;
        temp -> next = NULL;
        Node* Temp = head;
        if( head == NULL ) {
            head = temp;
        } else {
            while( Temp -> next != NULL )
                Temp = Temp -> next;
            Temp -> next = temp;
        }
    } else
        delete temp;
}

/*****************************************************************/
char A_to_a( char alp )  // converts an uppercase letter to lowercase
{
    if( ( alp >= 'A' ) && ( alp <= 'Z' ) )
        alp = alp + 32;
    return alp;
}

/*****************************************************************/
void TLink::calcute( char* szFile, int size )
{
    int i = 0;    // index of the next character to scan
    int col = 1;  // line counter
    int row = 0;  // word position within the current line
    int count;    // number of extra spaces between two words
    while( i < size ) {
        int j = 0;  // length of the current word
        while( ( szFile[i] >= 'a' && szFile[i] <= 'z' ) || ( szFile[i] >= 'A' && szFile[i] <= 'Z' ) ) {
            ++ i;
            ++ j;
        }
        Node* ptrr = head;
        while( ptrr != NULL ) {
            if( ptrr -> num == j ) {
                int n;
                for( n = 0; n < j; ++ n )  // compare exactly j letters
                    if( A_to_a( ptrr -> words[n] ) != A_to_a( szFile[i - j + n] ) )
                        break;
                if( n == j ) {  // match: append the position to the word's list
                    node* temp = new node;
                    temp -> col = col;
                    temp -> row = row;
                    temp -> next = NULL;
                    node* Temp = ptrr -> ptr;
                    if( ptrr -> ptr == NULL ) {
                        ptrr -> ptr = temp;
                    } else {
                        while( Temp -> next != NULL )
                            Temp = Temp -> next;
                        Temp -> next = temp;
                    }
                }
            }
            ptrr = ptrr -> next;
        }
        if( szFile[i] == ' ' || szFile[i] == '\n' ) {
            count = -1;
            while( szFile[i] == ' ' ) {
                ++ i;
                ++ row;    // advance the word position in the line
                ++ count;  // extra spaces between words
            }
            row = row - count;
            if( szFile[i] == '\n' ) {
                ++ col;   // a newline starts the next line
                ++ i;
                row = 0;  // reset the word position
            }
        } else
            ++ i;
    }
    cout << endl;
}

/****************************************************************/
Node* TLink::gethead()
{
    return head;
}

/********************************************************/
void showwindow()
{
    Node* curptr = link.gethead();
    while( curptr != NULL ) {
        int word_num = 0;
        cout << curptr -> words << endl;
        if( curptr -> ptr == NULL )
            cout << "\nWord not found, or the input is invalid!" << endl;
        else
            cout << "Positions (line, column): ";
        // Walk the position list with a local pointer so the list itself is not lost.
        for( node* pos = curptr -> ptr; pos != NULL; pos = pos -> next ) {
            cout << "(" << pos -> col - 1 << "," << pos -> row << ") ";
            word_num ++;
        }
        cout << endl;
        cout << "\nThe word occurs " << word_num << " time(s)!" << endl;
        curptr = curptr -> next;
    }
}

/*************************************************************/
void show_text()
{
    ifstream fin;
    fin.open( "1.txt" );
    if( fin.fail() ) {
        cout << "Input file opening failed.\n";
        exit( 1 );
    }
    char next;
    int x = 0, yy = 0;  // x: word count; yy: length of the current run of non-letters
    fin.get( next );
    while( !fin.eof() && i < 1999 ) {
        if( !( ( next >= 65 && next <= 90 ) || ( next <= 122 && next >= 97 ) ) )
            yy = yy + 1;
        else
            yy = 0;
        if( yy == 1 )  // the first non-letter after a word closes that word
            ++ x;
        szFile[i] = next;
        ++ i;
        fin.get( next );
    }
    szFile[i] = '\0';
    for( int k = 0; k < i; ++ k )
        cout << szFile[k];
    cout << "\n\n*****Total words: " << x << endl;
}

/**********************************************************************/
void input()
{
    char Item[40];  // temporary buffer for one word
    char in;        // current input character
    char ans;       // whether to run another query
    do {
        link.Clear();
        cout << "\nEnter the words to count (end with #):" << endl;
        // cin.get() keeps spaces and newlines, so they act as word separators.
        cin.get( in );
        int flag = 1;
        while( true ) {
            if( in == '#' )
                break;
            int m = 0;
            while( ( in >= 65 && in <= 90 ) || ( in <= 122 && in >= 97 ) ) {
                if( m < 39 ) {
                    Item[m] = in;
                    ++ m;
                }
                cin.get( in );
                if( in == '#' ) {
                    flag = 0;
                    break;
                }
            }
            Item[m] = '\0';
            link.Insert( Item );
            if( flag == 0 )
                break;
            cin.get( in );
        }
        if( link.gethead() == NULL )
            cout << "No words were entered!" << endl;
        else {
            link.calcute( szFile, i );
            showwindow();
        }
        cout << "Continue? (Y/y or N/n):";
        cin >> ans;
    } while( ( ans != 'n' ) && ( ans != 'N' ) );
}

int putall( char* aa )
{
    char Item[40];  // temporary buffer for one word
    char in;        // current character taken from aa
    int sd = 0;     // read position in aa
    do {
        link.Clear();
        in = aa[sd];
        if( aa[sd] == '#' ) break;
        sd ++;
        int flag = 1;
        while( true ) {
            if( in == ' ' )
                break;
            int m = 0;
            while( ( in >= 65 && in <= 90 ) || ( in <= 122 && in >= 97 ) ) {
                if( m < 39 ) {
                    Item[m] = in;
                    ++ m;
                }
                in = aa[sd];
                if( aa[sd] == '#' ) break;
                sd ++;
                if( in == ' ' ) {
                    flag = 0;
                    break;
                }
            }
            Item[m] = '\0';
            link.Insert( Item );
            if( flag == 0 )
                break;
            in = aa[sd];
            if( aa[sd] == '#' ) break;
            sd ++;
        }
        if( link.gethead() == NULL )
            cout << "No words were entered!" << endl;
        else {
            link.calcute( szFile, i );
            showwindow();
        }
    } while( aa[sd] != '#' );
    return 0;
}

int fo( char* aa )
{
    FILE* fp1;
    char ch;
    if( ( fp1 = fopen( "1.txt", "w" ) ) == NULL ) {
        printf( "Cannot open file strike any key exit!" );
        getchar();
        return 0;
    }
    printf( "Enter a paragraph (end with #):\n" );
    ch = getchar();
    int i = 0;
    while( ( aa[i] = ch ) != '#' ) {
        fputc( ch, fp1 );
        i ++;
        ch = getchar();
    }
    fputc( ch, fp1 );
    cout << endl << "The input is:" << endl;
    fclose( fp1 );
    return 0;
}

int main()
{
    char aa[10000] = {'a'}, yn;
    fo( aa );
    show_text();
    input();
    cout << "Show the count and positions of every word? (yes: y/Y, no: n/N):";
    cin >> yn;
    if( ( yn != 'n' ) && ( yn != 'N' ) )
        putall( aa );
    return 0;
}

IV. Running and debugging

1. Input data: I want to go to school,I want to have a good future!

2. Output data: ******Total words: 13

V. Summary and reflections

This lab taught me to combine the theory I have learned to analyze a problem independently and to put a complete program together, and it improved my ability to apply what I have learned and to solve problems.