I.J. Information Technology and Computer Science, 2018, 1, 24-31

Published Online January 2018 in MECS (http://www.mecs-press.org/)

DOI: 10.5815/ijitcs.2018.01.03

An Ensemble Model using a BabelNet Enriched Document Space for Twitter Sentiment Classification

Semih Sevim

Kocaeli University, Computer Engineering Department, Kocaeli, TURKEY

E-mail: semih.sevimm@https://www.360docs.net/doc/e68218018.html,

Sevinç İlhan Omurca and Ekin Ekinci

Kocaeli University, Computer Engineering Department, Kocaeli, TURKEY

E-mail: silhan@kocaeli.edu.tr, ekin.ekinci@kocaeli.edu.tr

Received: 22 September 2017; Accepted: 07 November 2017; Published: 08 January 2018

Abstract—With the widespread use of social media in our daily lives, user reviews have emerged as an impactful factor in numerous fields, including understanding consumer attitudes, determining political tendencies, and revealing the strengths or weaknesses of many different organizations. Today, people chat with their friends, carry out social relations, shop and follow many current events through social media. However, social media limits the size of user messages, so users generally express their opinions by using emoticons, abbreviations, slang, and symbols instead of words. This situation makes the sentiment classification of social media texts more complex. In this paper a sentiment classification model for Twitter messages is proposed to overcome this difficulty. In the proposed model, the short messages are first expanded with BabelNet, which is a concept network. Then the expanded and the original forms of the messages are included in an ensemble learning model. Finally, we compared our ensemble model with traditional classification algorithms and observed that the F-measure value is increased.

Index Terms—Twitter sentiment classification, ensemble learning, semantic enrichment, BabelNet.

I. INTRODUCTION

Today, people mostly prefer to communicate and socialize on the Web. As a result, social media websites such as Facebook, Twitter, LinkedIn, YouTube and Instagram have exploded in popularity, with a tremendous impact on society in several ways. For example, in many elections around the world social media has played an important role. Many companies measure their customer satisfaction via social media postings and adjust the advertisement of their products accordingly to develop a better sales strategy. Apart from these, social media websites can be used to improve people's career and business prospects. In summary, social media is a major source of information for enhancing strategies ranging from business and health to politics and the education system in the competitive world. Thus, to obtain the public's view from these media, there is a need for an automated analysis system.

The sentiment classification task is extremely useful in social media analysis as it allows us to capture the common opinion behind social media messages, and it is broadly studied in the literature [1]. Indeed, sentiment classification is the task of learning whether a positive or negative opinion has been expressed about a certain product, organization, event or individual from opinionated documents, and it requires in-depth analysis.

In recent years, sentiment classification from social networks has become necessary because they contain a huge amount of opinionated documents. In social networks, users usually write short texts which are also rife with emoticons, abbreviations, slang and symbols to express their opinions. The main characteristic of a short text is its length, which is no longer than 200 characters. Sparsity and lack of standardization are the two main drawbacks of short text messages. If a text sample is short, its features are very sparse, and these limited features frequently contain non-standard terms, noise terms and abbreviations [2]. With such a text collection, sentiment classification becomes a more complex problem because short text examples naturally do not provide sufficient statistics or associations for classifiers. On the other hand, text classification methods generally use word statistics and associations to provide a judgment about which terms are similar or fall into which classes. As a result, short-text classification has become a crucial research area in social media analysis [3].

To address the above mentioned issues, several approaches for short text classification have been presented in the literature. Faguo et al. [2] proposed short text classification based on statistics and rules. Zhou et al. [4] devised a hybrid model of character-level and word-level features based on a recurrent neural network with long short-term memory in order to reduce the impact of word segmentation and improve the overall performance of Chinese short text classification. Wang et al. [5] used better feature space selection to overcome the sparseness problem of short text classification. Sriram et al. [6] expanded the features of Twitter texts by extracting additional author information to classify tweets with better accuracy. For short texts in Chinese, Wensen et al. [7] used Wikipedia concepts and Word2vec to expand the features. Chen et al. [8] used LDA and the K-Nearest Neighbor algorithm to improve short text classification. Sang et al. [9] expanded short texts using word embeddings for classification.

Original social media messages can be preprocessed and expanded in several ways, as presented above. After these initial processes the social media messages become inputs to machine learning algorithms. Among the machine learning methods, ensemble learning allows us not only to combine multiple methods both for the preprocessing and the classification stages but also to achieve a better hypothesis. The underlying intuition of ensemble learning is to build a strong learner from a set of weak learners. In this work an ensemble learning system is proposed to perform sentiment classification for Twitter messages, which are limited to 140 characters in length. With these kinds of short messages, language is evolving faster than ever before. In text mining systems, extracting features from text, figuring out which features are relevant, and selecting an accurate classification algorithm are fundamental questions [10]. So the accurate representation of short text has become an important step of any text mining task for social media.

In this paper, the first key point of the proposed ensemble learning model is the representation of short text messages and the second one is deciding on a suitable classifier. For the first key point, three subsets of the original datasets are established to construct the extended feature space. The first one is constituted only by the raw data, the second one is constituted by adding the concepts obtained from BabelNet to the raw data, and the third one is constituted by the concepts obtained from BabelNet together with only the terms extended by BabelNet. For the second key point, we use the same base learners within each ensemble, which is called a homogeneous classifier ensemble. The well established classification algorithms Naïve Bayes (NB), Support Vector Machines (SVM) and Decision Tree (DT) are selected respectively as the weak learners of our three different ensemble learning models. The experimental results, which are obtained on six public Twitter datasets, demonstrate that our proposed method outperforms existing classification methods in the literature.

The remainder of the paper is organized as follows. In Section 2, related work is summarized. In Section 3, the applied text preprocessing and expansion methods are defined. In Section 4, the methodological background of the classification algorithms is given, and in Section 5 the proposed ensemble model is described. In Section 6 the experimental results are reported, and finally in Section 7 we summarize the main points of our work and discuss the contributions.

II. RELATED WORK

In recent years, analyzing opinion oriented human behavior has attracted the attention of researchers in artificial intelligence and natural language processing. In the literature, there are many studies on the design of methods for sentiment classification of text; however, the number of studies which use ensembles of classifiers is limited. Akhilesh et al. [11] used the Naive Bayes method to analyze Assembly Election data from Twitter users. Indrajit et al. [12] classified tweets into three categories, news, sport and movies, by using topic modelling, which is an unsupervised machine learning method. Deebha et al. [13] formed an opinion mining model according to the polarity of words and examined the effect of negation words. Xia et al. [14] proposed an ensemble technique for sentiment classification by integrating different feature sets and classification algorithms to boost the overall performance. Li et al. [15] used a heterogeneous ensemble learning method for Chinese sentiment classification. Su et al. [16] presented an ensemble model for sentiment classification of reviews, and their experiments showed that stacking is consistently effective compared with majority voting. Filho and Pardo [17] combined rule-based, lexicon-based and machine learning approaches for Twitter sentiment classification. Hassan et al. [18] proposed the bootstrapping ensemble framework (BPEF), which provided better performance for classifying Twitter sentiments. Wang et al. [19] compared the performance of bagging, boosting, and random subspace methods based on Naive Bayes, Maximum Entropy, Decision Tree, K Nearest Neighbor, and Support Vector Machine learners for sentiment classification, and showed that random subspace gives the best comparative results. Fersini et al. [20] proposed a novel ensemble method based on Bayesian Model Averaging for sentiment analysis of user reviews and showed that their model outperformed other approaches such as bagging. Chalothorn and Ellman [21] introduced an ensemble model for Twitter sentiment analysis formed by the majority voting of four base learners: Support Vector Machine, Naive Bayes, SentiStrength and Stacking. Wang et al. [19] and Xia et al. [14] showed that ensemble methods provide an effective way to represent short Twitter texts in different ways and also improve the performance of individual base learners for sentiment classification. Cotelo et al. [22] combined the structural information and textual concepts of tweets with direct combination, stacked generalization and Multiple Pipeline Stacked Generalization methods. Perikos and Hatzilygeroudis [23] presented a sentiment analysis system for automatic recognition of emotions in text, using an ensemble model based on a Naïve Bayes learner, a Maximum Entropy learner and a knowledge-based tool. Corrêa Jr. et al. [24] proposed a tweet classification system based on an ensemble of different text representations to obtain better classification performance and used Support Vector Machines and Logistic Regression as base learners.

III. FEATURE ENGINEERING

A. Text Preprocessing

To perform short text classification, some preprocessing steps should first be carried out. The most prominent characteristic of short text classification systems, such as sentiment classification on Twitter, is the shortness of the sample texts. Besides, symbols, emoticons, character repetitions, hashtags, capital letters and abbreviations are very common in these kinds of texts. In this case, a lexical normalization step is needed as one of the preprocessing steps. The lexical normalization process translates nonstandard messages into standard English, and for this aim we have used a text normalization dataset generated by the Computer Science Department of The University of Texas at Dallas [25]. While an original tweet in the Hobbit dataset reads "my feet r killin n im dead but the hobbit was soooo good i love !!", after lexical normalization this tweet becomes "my foot be kill and in dead but the hobbit be so good love". In another normalization example, a sample tweet in StsGold such as "@wale I'm gonna havta temp stop fllwing u while ur talkin abt kobe bc I loveeeeeeee him & I'm taking it personal and I like lebron 2." is converted to "i going to have to temporary stop flow you while you talk about kobe because i love he camp am take it personal and like lebron to".

In our study, as another preprocessing step we have removed emoticons from the tweet messages, since they are regarded as noisy labels. We have also removed URLs, user mentions and retweet markers, as well as punctuation and digits, which are not meaningful for our classification task. Then, all capital letters were converted to lowercase. Consequently, an input text for our classification system consists only of hashtags and the preprocessed body text. After these steps, stemming is applied to the body text by using the Stanford NLP tools [26].
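A minimal sketch of these cleanup steps is given below in Python, using only regular expressions. The patterns and the example tweet are illustrative; the dictionary-based lexical normalization and the Stanford stemming step described above are not reproduced here.

```python
import re

def clean_tweet(text):
    """Illustrative cleanup: drop URLs, retweet markers, user mentions,
    punctuation, digits and emoticons; keep hashtag words; lowercase."""
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"\bRT\b", " ", text)             # retweet markers
    text = re.sub(r"@\w+", " ", text)               # user mentions
    text = text.replace("#", " ")                   # keep the word, drop the hashtag symbol
    text = re.sub(r"[^A-Za-z\s]", " ", text)        # punctuation, digits, emoticons
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("RT @wale I'm gonna havta stop fllwing u!! http://t.co/x #hobbit"))
```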

B. Concepts Generation with BabelNet

For a more accurate classification system, the restricted feature space of the sample input messages is extended by using the BabelNet repository, which is the largest multilingual semantic network and encyclopedic dictionary, composed of concepts from WordNet and Wikipedia [27]. Concepts are defined as units of knowledge, each with a unique meaning. In our proposed system, each sentence in a sample message is processed by using the BabelFy API [28], and concepts from the BabelNet repository are added to the feature space as new features of the related sentence. However, the stop words are kept in their original form and are not extended with related concepts.

For a natural language processing (NLP) system, Word Sense Disambiguation (WSD) is a very challenging task on the way to developing an accurate system. Expanding the term space with term-related concepts can be a way of handling ambiguity. For example, "I am taking aspirin for my cold" and "weather is cold today" are two sentences which contain the term "cold". In the first sentence the term "cold" refers to a disease, whereas in the second it refers to an environmental condition. When we ask BabelFy for the concepts of these two sentences, it gives a description like "A mild viral infection involving the nose and respiratory passages" for the term "cold" in the first sentence and a description like "Having a low or inadequate temperature or feeling a sensation of coldness or having been made cold" for the term "cold" in the second sentence. Besides, term-related concepts increase classifier accuracy by generating a joint feature space for terms which have common senses.
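The concept generation step can be sketched as a call to the BabelFy web service. The endpoint, parameters and response field below follow the publicly documented BabelFy HTTP API as we understand it; they are assumptions for illustration rather than the authors' exact code, and a registered API key is required.

```python
import requests

BABELFY_URL = "https://babelfy.io/v1/disambiguate"   # assumed public BabelFy endpoint

def expand_with_concepts(sentence, api_key):
    """Return the tokens of the sentence plus the BabelNet synset ids BabelFy finds in it."""
    params = {"text": sentence, "lang": "EN", "key": api_key}
    annotations = requests.get(BABELFY_URL, params=params).json()
    # 'babelSynsetID' is the field name we assume for the linked BabelNet concept.
    concepts = [a["babelSynsetID"] for a in annotations]
    return sentence.split() + concepts

# expand_with_concepts("I am taking aspirin for my cold", "YOUR_API_KEY")
```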

C. Feature Weighting

The term frequency-inverse document frequency (tf-idf) weighting scheme, which is very commonly used in text mining applications, is used to capture the distinctiveness or importance of a term in a document collection [29]. While tf computes the frequency of each term in a specific document, idf computes the importance of a term for the document collection. Here, the document collection can be viewed as a matrix $D = [d_{ij}]_{m \times n}$, where $n$ denotes the number of documents and $m$ denotes the number of distinct terms in the whole collection; $d_{ij}$ represents the weight of term $t_i$ in document $d_j$. The formula of tf-idf is shown in equation (1):

$d_{ij} = tf_{ij} \times idf_{ij} = tf_{ij} \times \log\frac{N}{n_i}$  (1)

where $tf_{ij}$ equals the number of times $t_i$ appears in document $d_j$, $N$ is the number of documents, and $t_i$ occurs in $n_i$ documents. We use the standard tf-idf scheme to form the input for the machine learning algorithms, so each tweet collection is converted to a feature matrix and then classification is performed.
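As a concrete illustration of this weighting step, the following sketch builds the tf-idf feature matrix with scikit-learn's TfidfVectorizer. This is one common implementation of the scheme in equation (1) (scikit-learn applies a smoothed variant of idf), not necessarily the tooling the authors used; the second example tweet is a placeholder, the first is the normalized Hobbit example from above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "my foot be kill and in dead but the hobbit be so good love",
    "the hobbit be bore and too long",
]

vectorizer = TfidfVectorizer()            # tf * idf weighting, one column per distinct term
X = vectorizer.fit_transform(tweets)      # sparse (number of documents x number of terms) matrix

print(X.shape)
print(vectorizer.get_feature_names_out()[:5])
```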

IV. CLASSIFICATION ALGORITHMS

Classification is a supervised learning task and one of the major issues in text mining applications. The task of classification is to assign new unobserved data to one of several predefined classes by learning the relationship between inputs and outputs (class labels) from available data [30]. Using the available data, which is called the training data, a classification model is constructed, and new unobserved data, which is called the test data, is used to validate this model. In the literature there are many algorithms for performing classification in the text mining area. However, to yield better classification performance, it is important to use a proper algorithm. In this paper the NB, SVM, and DT classifiers are used due to their ability to classify text data.

A. Naïve Bayes

With its simplicity and high classification accuracy, NB has gained great importance and is applied in many domains including sentiment classification [31]. The assumption behind NB, which is called class conditional independence, is that all words are independent of each other within each class. The assumption of class conditional independence provides high accuracy and speed when the number of features is large. Naïve Bayes uses Bayes' theorem to compute the probability of every instance for each predefined class and assigns each instance to one of the predefined classes. Based on class conditional independence, the computation of the probability can be presented as follows:

$P(X|C_i) = \prod_{k=1}^{n} P(X_k|C_i)$  (2)

where $X = \{X_1, X_2, \dots, X_n\}$ is the feature vector composed of words and $C_1, C_2, \dots, C_m$ are the class labels.

B. Sequential Minimal Optimization

Sequential Minimal Optimization (SMO), suggested by Platt [32], is the WEKA implementation of the Support Vector Machine (SVM); it breaks the training problem into quadratic programming (QP) sub-problems by using heuristics. SMO is an iterative algorithm and chooses the smallest possible sub-problem to solve at every step. The QP can be given as follows [33]:

$\max_{\alpha} \; \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j k(X_i, X_j) \alpha_i \alpha_j$  (3)

subject to

$0 \le \alpha_i \le c, \quad i = 1, \dots, l$  (4)

$\sum_{i=1}^{l} y_i \alpha_i = 0$  (5)

In Equation (3), $k(X_i, X_j)$ is the kernel function and $\alpha_i$ is a Lagrange multiplier; in Equation (4), $c$ is the regularization parameter. The equations above should satisfy the Karush-Kuhn-Tucker (KKT) conditions in Equation (6), and $Q_{ij} = y_i y_j k(X_i, X_j)$ is positive semi-definite.

$\alpha_i = 0 \Rightarrow y_i f(X_i) \ge 1$
$0 < \alpha_i < c \Rightarrow y_i f(X_i) = 1$  (6)
$\alpha_i = c \Rightarrow y_i f(X_i) \le 1$

Due to its easy implementation, speed and better scaling properties, SMO has gained great attention for sentiment classification problems [34].

C. Decision Tree

Based on the divide-and-conquer approach, a decision tree is defined as a tree-structured classifier [35]. There are different algorithms for generating decision trees such as C4.5, ID3, J48, AD Tree, BF Tree, CART and many more. In this study, Quinlan's C4.5 implementation J48, which is provided by the open-source software WEKA, is preferred. Information gain is calculated to select the best splitting attribute. The J48 decision tree supports a variety of input data types, for example nominal, textual and numeric, and yields higher performance for sentiment classification tasks [36].

V. ENSEMBLE LEARNING SYSTEM

Machine learning methods have been extensively used in sentiment analysis due to their predictive performance. While ordinary machine learning approaches try to learn one hypothesis from the training data, ensemble methods construct a set of hypotheses and combine them for prediction. Ensemble learning uses a set of models, called base learners, in order to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone [19]. A labeled dataset for training, the base inducer which forms a classifier, the diversity generator which is responsible for generating diverse classifiers, and the combiner which is responsible for combining the results of the base classifiers are the four building blocks that constitute a typical ensemble model [37]. Furthermore, to achieve an optimal ensemble model, accuracy and diversity are two vital requirements [38]. The more diverse the base learners are, the more accurate the ensemble model becomes, because diversified classifiers lead to uncorrelated classifications and improve the classification accuracy [39,40].

Methods for building ensembles can be divided along two main dimensions: how the classifiers are combined and how the learning process is done [39]. As for the first dimension, when combining classifiers with weights, a fixed or dynamically determined weight is assigned to each learner. Majority voting is the most popular weighted combining method for base learners. In this method, a new instance is labeled according to the most frequent vote obtained from the base learners [37]. Further, in weighted combination, the predictions of the base classifiers are linearly aggregated. When combining classifiers with meta learning techniques, the predictions of the classifiers are used as features of a meta learning model. According to [14], among trained methods the weighted combination shows a slight superiority compared to meta-classifiers and is also simpler for the sentiment classification task. Therefore we used a majority voting model instead of a meta learning model in our experiments.

The majority voting model combines the predictions from different classifiers using a simple fixed rule which counts the predictions of the base classifiers and assigns the input to the class predicted most often. By this rule, with an ensemble model which has $k = 1, \dots, K$ base classifiers, the class label of the jth data point is determined as in equation (7) [43]:

$c_j = \arg\max_{c} \sum_{k=1}^{K} I(c_{kj} = c)$  (7)

where $I(\cdot)$ is the indicator function and $c_{kj}$ is the prediction of the kth base classifier for the jth data point.
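Equation (7) translates directly into code: for each candidate class, the indicator function counts how many of the K base classifiers predicted it, and the most voted class wins. The snippet below is a generic sketch of that rule, not the authors' implementation.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one class label per base classifier (K entries).
    Returns argmax_c of sum_k I(c_kj = c), i.e. the most frequently predicted label."""
    return Counter(predictions).most_common(1)[0][0]

print(majority_vote(["positive", "negative", "positive"]))   # -> positive
```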

In the literature, ensemble learning methods can be divided into two categories: instance partitioning methods such as Bagging and Boosting, and feature partitioning methods such as Random Subspace [42]. There is no global criterion for selecting a certain ensemble technique to improve the performance of a sentiment analysis task [43]. Thus, as for the second dimension, the learning process in this study is realized with the Bagging method, and well known classification algorithms are selected as the weak learners of the ensemble model.

In the sentiment classification of short texts, the main issue for improving the accuracy of an ensemble model is how diversely the input texts are represented, rather than which classifiers are chosen as base learners. Therefore, in this work three different homogeneous classifier ensembles are formed, and the well known classifiers NB, SVM and DT are applied as the homogeneous classifiers of these models respectively. The proposed ensemble model is shown in Fig. 1.

Fig.1. Proposed Ensemble Model

As seen from Fig. 1, three new training datasets for the three base learners are generated from the original input data. The original data form (raw data) is kept as the training data of the first base learner. For the second one, the raw data and the concept words obtained from BabelNet are jointly used. For the third one, the concept words obtained from BabelNet and only the terms which are expanded by BabelNet are jointly used; the terms which are not covered by BabelNet are ignored, so only the most important terms and their concept extensions constitute the feature space. In this way, the diversity of the ensemble model is provided by the base learners. In the next phase, the selected classification algorithms are used as base learners. Finally, to obtain the class label for an input message, the classification outputs of the three homogeneous classifiers are combined by the majority voting method.
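The overall flow of Fig. 1 can be sketched as follows: each tweet is rendered in three views (raw text, raw text plus BabelNet concepts, and concepts plus only the extended terms), one copy of the same base classifier is trained per view, and the three predictions are merged by majority voting. Everything in this sketch, including the placeholder CONCEPT_ tokens and the choice of Naïve Bayes as the homogeneous base learner, is illustrative; SVM or a decision tree could be substituted in the same way.

```python
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def train_view(texts, labels):
    """Fit one tf-idf vectorizer and one homogeneous base learner on a single view of the data."""
    vec = TfidfVectorizer()
    clf = MultinomialNB().fit(vec.fit_transform(texts), labels)
    return vec, clf

def ensemble_predict(models, views_of_tweet):
    """views_of_tweet: the same tweet as raw text, raw+concepts, and concepts+extended terms."""
    votes = [clf.predict(vec.transform([view]))[0]
             for (vec, clf), view in zip(models, views_of_tweet)]
    return Counter(votes).most_common(1)[0][0]   # majority voting over the three base learners

# Toy data: labels are positive/negative, CONCEPT_* tokens stand in for BabelNet concept ids.
raw      = ["the hobbit be so good love", "the movie be bore and long"]
enriched = ["the hobbit be so good love CONCEPT_film CONCEPT_love", "the movie be bore and long CONCEPT_film"]
concepts = ["hobbit good love CONCEPT_film CONCEPT_love", "movie bore CONCEPT_film"]
labels   = ["positive", "negative"]

models = [train_view(view_texts, labels) for view_texts in (raw, enriched, concepts)]
print(ensemble_predict(models, ["the hobbit be good",
                                "the hobbit be good CONCEPT_film",
                                "hobbit good CONCEPT_film"]))
```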

VI. EXPERIMENTAL RESULTS

A. Datasets

Table 1. Description of Datasets

We have employed six datasets which are widely evaluated in similar studies to assess the performance of the proposed ensemble model. All datasets are in English, publicly available and non-encoded, and contain tweets which were collected through the Twitter API and labeled as negative or positive by using a labeling tool. Table 1 summarizes the positive and negative tweet counts and the theme of each dataset.

B. The Results Evaluation

In the first stage of the proposed ensemble model, we have generated three training sets to provide diversity. The three columns of Table 2 describe the feature spaces of these training sets respectively. The Hobbit dataset originally consists of 954 terms, and this original form is used as the first training dataset of the proposed ensemble model. The 2182 terms obtained from the 954 original terms plus their related concepts are used as the features of the second training dataset. By eliminating the non-extended terms of the original dataset, 1992 features are used for the third training dataset. With these three feature sets, diversity for the proposed ensemble model is provided.

Table 2. Feature Extensions

After forming the feature sets by extracting the semantically meaningful terms from the Twitter messages, model selection for the training process must be carried out. In this step, the well known classification algorithms NB, SVM and C4.5 are selected as the weak learners of the ensemble model. In the NB, SVM, C4.5 and ensemble experiments, 80% of the input data is used as the training set and the remaining 20% is used as the test set. The F-measure results using 10-fold cross validation are reported in Table 3.
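A hedged sketch of how such scores could be produced is given below: scikit-learn's cross_val_score with an F-measure scorer yields per-fold values whose mean corresponds to the kind of figures reported in Table 3. The pipeline, data and parameters are placeholders, not the authors' WEKA-based experimental setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder corpus: in the paper this would be the preprocessed tweets and their labels.
texts  = ["good movie love it", "bad movie hate it", "so good", "so bad"] * 10
labels = ["positive", "negative", "positive", "negative"] * 10

model  = make_pipeline(TfidfVectorizer(), MultinomialNB())
scores = cross_val_score(model, texts, labels, cv=10, scoring="f1_macro")   # 10-fold F-measure
print(scores.mean())
```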

Table 3. Classification results

Bold values in Table 3 indicate the best classification performances. When we examine the results, it is clear that the proposed ensemble model presents an overall superior performance compared with any of the individual classifiers used. Only on the UMICH dataset does the ensemble fail to improve on the base classifiers. The accuracy of the base classifiers for this dataset is already high (0.991 or 0.96), so they cannot really be considered weak classifiers for this dataset.

When the datasets are examined in terms of theme, it can clearly be seen that Sts-Gold and Sts-Test span several domains. This causes a decline in classification performance because of domain-dependent sentiment words, whose polarity, like that of "short", changes from domain to domain. When the review sentence "The battery life is very short" refers to the "battery life" of a smartphone, the sentiment word "short" has negative polarity. The sentence "The response time of the game is very short." is about the "response time" of a game, and "short" has positive polarity in this review. The effect of domain-dependent sentiment words shows up in the classification results in Table 3: for Sts-Gold and Sts-Test the proposed ensemble model cannot increase the classification results as much as it does for the remaining datasets.

When an ensemble model is compared to other classification models, it quite likely provides superior performance due to the combination of weak classifiers. The main drawback of an ensemble model is that it is time consuming. However, if the time consuming tasks such as model selection and training can be performed offline, then the ensemble model is highly recommended due to its superior accuracy.

VII. CONCLUSION

In this paper, we have proposed an ensemble model to classify English Twitter messages according to sentiment by using a BabelNet enriched document space. We indicated the need for such enrichment methods for short text classification problems, since the obtained results showed that enriching the representation with BabelNet concepts is an adequate way to represent the main concepts related to the terms. The embedded knowledge extracted from BabelNet contributed to the overall performance of a short text classification system. Another main contribution of this study is that we propose a more accurate classifier by applying an ensemble model whose sub-datasets are constituted not only by the terms in the original tweets but also by the concepts of these terms. We compared our ensemble model with widely used classification algorithms such as NB, SVM and C4.5. The comparative results obtained from six public datasets confirm that our ensemble model provides a more accurate classifier for short text messages.

REFERENCES

[1] S. Sun, C. Luo and J. Chen, "A Review of Natural Language Processing Techniques for Opinion Mining Systems", Information Fusion, vol. 36, pp. 10-25, 2017. doi:10.1016/j.inffus.2016.10.004
[2] Z. Faguo, Z. Fan, Y. Bingru and Y. Xingang, "Research on Short Text Classification Algorithm Based on Statistics and Rules," 2010 Third International Symposium on Electronic Commerce and Security, Guangzhou, pp. 3-7, 2010. doi:10.1109/ISECS.2010.9
[3] I. Taksa, S. Zelikovitz and A. Spink, "Using Web Search Logs to Identify Query Classification Terms," Fourth International Conference on Information Technology (ITNG '07), Las Vegas, NV, pp. 469-474, 2007. doi:10.1109/ITNG.2007.202
[4] Y. Zhou, B. Xu, J. Xu, L. Yang, C. Li and B. Xu, "Compositional Recurrent Neural Networks for Chinese Short Text Classification," 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), Omaha, NE, pp. 137-144, 2016. doi:10.1109/WI.2016.0029
[5] M. Wang, L. Lin and F. Wang, "Improving Short Text Classification through Better Feature Space Selection", 2013 Ninth International Conference on Computational Intelligence and Security, Leshan, pp. 120-124, 2013. doi:10.1109/CIS.2013.32
[6] B. Sriram, D. Fuhry, E. Demir and M. Demirbas, "Short Text Classification in Twitter to Improve Information Filtering", Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 841-842, 2010. doi:10.1145/1835449.1835643
[7] L. Wensen, C. Zewen, W. Jun and W. Xiaoyi, "Short Text Classification Based on Wikipedia and Word2vec", 2016 2nd IEEE International Conference on Computer and Communications (ICCC), Chengdu, pp. 1195-1200, 2016. doi:10.1109/CompComm.2016.792489
[8] Q. Chen, L. Yao and J. Yang, "Short Text Classification Based on LDA Topic Model", 2016 International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, pp. 749-753, 2016. doi:10.1109/ICALIP.2016.7846525
[9] L. Sang, F. Xie, X. Liu and X. Wu, "WEFEST: Word Embedding Feature Extension for Short Text Classification", 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, pp. 677-683, 2016. doi:10.1109/ICDMW.2016.0101
[10] A. Agarwal, B. Xie, I. Vovsha, O. Rambow and R. Passonneau, "Sentiment Analysis of Twitter Data", Proceedings of the Workshop on Languages in Social Media, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 30-38, 2011.
[11] A. K. Singh, D. K. Gupta and R. M. Singh, "Sentiment Analysis of Twitter User Data on Punjab Legislative Assembly Election", I.J. Modern Education and Computer Science, vol. 9, pp. 60-68, 2017. doi:10.5815/ijmecs.2017.09.07
[12] I. Mukherjee, S. Sahana and P. K. Mahanti, "An Improved Information Retrieval Approach to Short Text Classification", I.J. Information Engineering and Electronic Business, vol. 4, pp. 31-37, 2017. doi:10.5815/ijieeb.2017.04.05
[13] D. Mumtaz and B. Ahuja, "A Lexical Approach for Opinion Mining in Twitter", I.J. Education and Management Engineering, vol. 4, pp. 20-29, 2016. doi:10.5815/ijeme.2016.04.03
[14] R. Xia, C. Zong and S. Li, "Ensemble of Feature Sets and Classification Algorithms for Sentiment Classification", Information Sciences, vol. 181(6), pp. 1138-1152, 2011. doi:10.1016/j.ins.2010.11.023
[15] W. Li, W. Wang and Y. Chen, "Heterogeneous Ensemble Learning for Chinese Sentiment Classification", Journal of Information and Computational Science, 9(15), pp. 4551-4558, 2012.
[16] Y. Su, Y. Zhang, D. Ji, Y. Wang and H. Wu, "Ensemble Learning for Sentiment Classification", 13th Chinese Conference on Chinese Lexical Semantics (CLSW'12), Berlin, pp. 84-93, 2012. doi:10.1007/978-3-642-36337-5_10
[17] P. P. B. Filho and T. A. S. Pardo, "NILC_USP: A Hybrid System for Sentiment Analysis in Twitter Messages", SemEval 2013, Atlanta, Georgia, pp. 568-572, 2013.
[18] A. Hassan, A. Abbasi and D. Zeng, "Twitter Sentiment Analysis: A Bootstrap Ensemble Framework", SocialCom, Alexandria, VA, pp. 357-364, 2013. doi:10.1109/SocialCom.2013.56
[19] G. Wang, J. Sun, J. Ma, K. Xu and J. Gu, "Sentiment Classification: The Contribution of Ensemble Learning", Decision Support Systems, vol. 57, pp. 77-93, 2014. doi:10.1016/j.dss.2013.08.002
[20] E. Fersini, E. Messina and F. A. Pozzi, "Sentiment Analysis: Bayesian Ensemble Learning", Decision Support Systems, vol. 68, pp. 26-38, 2014. doi:10.1016/j.dss.2014.10.004
[21] T. Chalothorn and J. Ellman, "Simple Approaches of Sentiment Analysis via Ensemble Learning", Information Science and Applications, Lecture Notes in Electrical Engineering, pp. 631-639, 2015. doi:10.1007/978-3-662-46578-3_74
[22] J. M. Cotelo, F. L. Cruz, F. Enríquez and J. A. Troyano, "Tweet Categorization by Combining Content and Structural Knowledge", Information Fusion, vol. 31, pp. 54-64, 2016. doi:10.1016/j.inffus.2016.01.002
[23] I. Perikos and I. Hatzilygeroudis, "Recognizing Emotions in Text Using Ensemble of Classifiers", Engineering Applications of Artificial Intelligence, vol. 51, pp. 191-201, 2016. doi:10.1016/j.engappai.2016.01.012
[24] E. A. Corrêa Jr., V. Q. Marinho and L. B. dos Santos, "NILC-USP at SemEval-2017 Task 4: A Multi-view Ensemble for Twitter Sentiment Analysis", arXiv:1704.02263, 2017.
[25] Text Normalization Dataset, Computer Science Department, The University of Texas at Dallas, https://www.hlt.utdallas.edu/~yangl/data/Text_Norm_Data_Release_Fei_Liu/, Accessed July 2017.
[26] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard and D. McClosky, "The Stanford CoreNLP Natural Language Processing Toolkit", Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60, 2014. doi:10.3115/v1/P14-5010
[27] R. Navigli and S. P. Ponzetto, "BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network", Artificial Intelligence, vol. 193, pp. 217-250, 2012. doi:10.1016/j.artint.2012.07.001
[28] A. Moro, F. Cecconi and R. Navigli, "Multilingual Word Sense Disambiguation and Entity Linking for Everybody", Proc. of the 13th International Semantic Web Conference, Posters and Demonstrations (ISWC 2014), Riva del Garda, Italy, pp. 25-28, 2014.
[29] S. İ. Omurca, S. Baş and E. Ekinci, "An Efficient Document Categorization Approach for Turkish Based Texts", International Journal of Intelligent Systems and Applications in Engineering, 3(1), pp. 7-13, 2015. doi:10.18201/ijisae.94177
[30] S. İ. Omurca and E. Ekinci, "An Alternative Evaluation of Post Traumatic Stress Disorder with Machine Learning Methods," 2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA), Madrid, pp. 1-7, 2015. doi:10.1109/INISTA.2015.7276754
[31] H. Parveen and S. Pandey, "Sentiment Analysis on Twitter Data-set using Naive Bayes Algorithm," 2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), Bangalore, pp. 416-419, 2016. doi:10.1109/ICATCCT.2016.7912034
[32] J. Jenkins, W. Nick, K. Roy, A. Esterline and J. Bloch, "Author Identification using Sequential Minimal Optimization", SoutheastCon 2016, Norfolk, VA, pp. 1-2, 2016. doi:10.1109/SECON.2016.7506654
[33] L. Yan, Y. Zhang, Y. He, S. Gao et al., "Hazardous Traffic Event Detection Using Markov Blanket and Sequential Minimal Optimization (MB-SMO)", Sensors, Basel, Switzerland, 16(7), 2016. doi:10.3390/s1*******
[34] M. Gamon, "Sentiment Classification on Customer Feedback Data: Noisy Data, Large Feature Vectors, and the Role of Linguistic Analysis", Proceedings of the 20th International Conference on Computational Linguistics, 2004. doi:10.3115/1220355.1220476
[35] E. Ekinci and H. Takçı, "Elektronik Postaların Adli Analizinde Yazar Analizi Tekniklerinin Kullanılması", 2012.
[36] P. C. S. Njølstad, L. S. Høysæter, W. Wei and J. A. Gulla, "Evaluating Feature Sets and Classifiers for Sentiment Analysis of Financial News", 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Warsaw, pp. 71-78, 2014. doi:10.1109/WI-IAT.2014.82
[37] L. Rokach, "Ensemble-based Classifiers", Artificial Intelligence Review, vol. 33, pp. 1-39, 2010. doi:10.1007/s10462-009-9124-7
[38] T. Windeatt and G. Ardeshir, "Decision Tree Simplification for Classifier Ensembles", International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), 18(5), pp. 749-776, 2004. doi:10.1142/S021800140400340X
[39] K. Tumer and J. Ghosh, "Error Correlation and Error Reduction in Ensemble Classifiers", Connection Science, special issue on combining artificial neural networks: ensemble approaches, 8(3-4), pp. 385-404, 1996. doi:10.1080/095400996116839
[40] X. Hu, "Using Rough Sets Theory and Database Operations to Construct a Good Ensemble of Classifiers for Data Mining Applications", Proceedings 2001 IEEE International Conference on Data Mining, San Jose, CA, pp. 233-240, 2001. doi:10.1109/ICDM.2001.989524
[41] L. Rokach, "Ensemble Methods for Classifiers", The Data Mining and Knowledge Discovery Handbook, pp. 957-980, 2005. doi:10.1007/0-387-25465-X_45
[42] Z.-H. Zhou, "Ensemble Methods: Foundations and Algorithms", Chapman & Hall, 2012.
[43] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada and C. A. Iglesias, "Enhancing Deep Learning Sentiment Analysis with Ensemble Techniques in Social Applications", Expert Systems with Applications, vol. 77, pp. 236-246, 2017. doi:10.1016/j.eswa.2017.02.00

Authors' Profiles

Semih Sevim is a postgraduate student at the Kocaeli University Computer Engineering Department in Turkey. He is interested in data mining, natural language processing and sentiment analysis.

Sevinç İlhan Omurca is an associate professor at the Kocaeli University Computer Engineering Department in Turkey. She earned her Ph.D. at the Kocaeli University Electronics and Communication Engineering Department. Her main research interests include text mining, sentiment analysis, natural language processing, machine learning and data mining.

Ekin Ekinci is a Research Assistant in the Computer Engineering Department at Kocaeli University in Turkey. She received her B.Eng. in Computer Engineering from Çanakkale Onsekiz Mart University in 2009 and her M.E. in Computer Engineering from Gebze Technical University in 2012. She is currently working towards a Ph.D. degree in Computer Engineering at Kocaeli University. Her main research interests include text mining, sentiment analysis, natural language processing and machine learning.

How to cite this paper: Semih Sevim, Sevinç İlhan Omurca, Ekin Ekinci, "An Ensemble Model using a BabelNet Enriched Document Space for Twitter Sentiment Classification", International Journal of Information Technology and Computer Science (IJITCS), Vol.10, No.1, pp.24-31, 2018. DOI: 10.5815/ijitcs.2018.01.03
