
Semi-supervised Model-based Document Clustering: A Comparative Study

Shi Zhong

Department of Computer Science and Engineering

Florida Atlantic University

Boca Raton, FL 33431

Abstract

Semi-supervised learning has become an attractive methodology for improving classification models and is often viewed as using unlabeled data to aid supervised learning. However, it can also be viewed as using labeled data to help clustering, namely, semi-supervised clustering. Viewing semi-supervised learning from a clustering angle is useful in practical situations when the set of labels available in labeled data is not complete, i.e., unlabeled data contain new classes that are not present in labeled data. This paper analyzes several multinomial model-based semi-supervised document clustering methods under a principled model-based clustering framework. The framework naturally leads to a deterministic annealing extension of existing semi-supervised clustering approaches. We compare three (slightly) different semi-supervised approaches for clustering documents: Seeded damnl, Constrained damnl, and Feedback-based damnl, where damnl stands for multinomial model-based deterministic annealing algorithm. The first two are extensions of the seeded k-means and constrained k-means algorithms studied by Basu et al. (2002); the last one is motivated by Cohn et al. (2003). Through empirical experiments on text datasets, we show that: (a) deterministic annealing can often significantly improve the performance of semi-supervised clustering; (b) the constrained approach is the best when available labels are complete, whereas the feedback-based approach excels when available labels are incomplete.

Keywords: Semi-supervised Clustering, Seeded Clustering, Constrained Clustering, Clustering with Feedback, Model-based Clustering, Deterministic Annealing

1 Introduction

Learning with both labeled and unlabeled data, also called semi-supervised learning or transductive learning, has recently been studied by many researchers with great interest (Seeger, 2001), mainly as a way of exploiting information in unlabeled data to enhance the performance of a classification model (traditionally trained using only labeled data). A variety of semi-supervised algorithms have been proposed, including co-training (Blum & Mitchell, 1998), the transductive support vector machine (Joachims, 1999), entropy minimization (Guerrero-Curieses & Cid-Sueiro, 2000), semi-supervised EM (Nigam et al., 2000), graph-based approaches (Blum & Chawla, 2001; Zhu et al., 2003), and clustering-based approaches (Zeng et al., 2003). They have been successfully used in diverse applications such as text classification or categorization, terrain classification (Guerrero-Curieses & Cid-Sueiro, 2000), gesture recognition (Wu & Huang, 2000), and content-based image retrieval (Dong & Bhanu, 2003).

Despite the empirical successes, negative results have been reported (Nigam, 2001), and the usefulness of unlabeled data and transductive learning algorithms has not been appreciated by theoretical studies. For example, Castelli and Cover (1996) proved that unlabeled data are exponentially less valuable than labeled data in classification problems, even though the proof comes with strong assumptions that the input distribution is known completely and that all class-conditional distributions can be learned from unlabeled data. Using Fisher information matrices to measure the asymptotic efficiency of parameter estimation for classification models, Zhang and Oles (2000) theoretically and experimentally questioned the usefulness and reliability of transductive SVMs for semi-supervised learning. Cozman et al. (2003) recently presented an asymptotic bias-variance analysis of semi-supervised learning with mixture models. They showed that unlabeled data will always help if the parametric model used is "correct" or has low bias, meaning that the true model is contained in the parametric model family. If the model is not "correct", additional unlabeled data may hurt classification performance. Their analysis is limited due to the assumptions that all classes are available in the labeled data and that the distributions for both labeled and unlabeled data are the same.

These results remind us to be careful in applying semi-supervised learning to real-world applications: in a text categorization task, we might not have all categories in the labeled data; in a network intrusion detection system, we will certainly encounter new attack types never seen before. Furthermore, new classes may follow different models than those used to characterize labeled data. Since semi-supervised classification models are not well positioned to detect new classes, we are motivated to look at semi-supervised clustering, which exploits labeled data to enhance clustering results on unlabeled data. Semi-supervised clustering can be used to discover new classes in unlabeled data in addition to assigning appropriate unlabeled data instances to existing categories.

In semi-supervised clustering, labeled data can be used as initial seeds (Basu et al., 2002), constraints (Wagstaff et al., 2001), or feedback (Cohn et al., 2003). All these existing approaches are based on model-based clustering (Zhong & Ghosh, 2003), where each cluster is represented by its "centroid". Seeded approaches use labeled data only to help initialize cluster centroids; constrained approaches keep the grouping of labeled data unchanged (as fixed constraints) throughout the clustering process; feedback-based approaches first run a regular clustering process and then adjust the resulting clusters based on labeled data. Basu et al. (2002) compared the first two approaches on text documents based on spherical k-means (Dhillon & Modha, 2001). They observed that the constrained version fares at least as well as the seeded version.

On the application side, much existing work has focused on document classification or clustering, which has become an increasingly important technique for (un)supervised document organization, automatic topic extraction, and fast information retrieval or filtering. For example, a web search engine often returns thousands of pages in response to a broad query, making it difficult for users to browse or to identify relevant information. Clustering methods can be used to automatically group the retrieved documents into a list of meaningful categories. Similarly, a large database of documents can be pre-classified or pre-clustered to facilitate query processing by searching only the clusters that are close to the query (Jardine & van Rijsbergen, 1971). In this paper, we focus on semi-supervised clustering of text documents for improving the organization of document collections based on a set of labeled documents.

This paper presents deterministic annealing (DA) extensions of the three semi-supervised clustering methods (seeded clustering, constrained clustering, and feedback clustering) under a model-based clustering framework (Zhong & Ghosh, 2003) and compares their performance on real text datasets with multinomial models. Our experimental results show that deterministic annealing (Rose, 1998) can significantly boost the performance of semi-supervised clustering.

Our comparative study also shows that, for multinomial model-based clustering of documents, the constrained approach performs better than the other two when the classes in labeled data are complete, whereas the feedback-based approach is superior when the labeled data contain only a partial set of text categories. To our knowledge, the semi-supervised DA algorithms are new, and the comparison between feedback-based approaches and the other two (seeded and constrained) has not been done before.

The organization of this paper is as follows. Section 2 briefly describes multinomial model-based document clustering with deterministic annealing. Section 3 analyzes three enhanced semi-supervised clustering algorithms based on multinomial models and deterministic annealing. Section 4 presents a comparative study and discusses clustering results on several text datasets. Section 5 discusses some related work, followed by concluding remarks in Section 6.

2 Multinomial Model-based Deterministic Annealing

In Zhong and Ghosh (2003), we presented a unified analysis of model-based partitional clustering from a DA point of view. In this section we shall first summarize model-based DA clustering and then describe the multinomial models used for clustering documents. Some properties and results of multinomial model-based DA clustering will also be discussed.

2.1 Model-based DA Clustering

In model-based clustering, one estimates K models (Λ = {λ_1, ..., λ_K}) from N data objects, with each model representing a cluster. Let X be a random variable characterizing the set of data objects, and Y a random variable characterizing the set of cluster indices. Let the joint probability between X and Y be P(x, y). Defining a prior entropy H(Y) as H(Y) = -Σ_y P(y) log P(y) and an average posterior entropy H(Y|X) as H(Y|X) = -Σ_x P(x) Σ_y P(y|x) log P(y|x), we aim to maximize the expected log-likelihood with entropy constraints

L = E_{P(x,y)}[log P(x|λ_y)] + T · H(Y|X) - T · H(Y)
  = Σ_x P(x) Σ_y P(y|x) log P(x|λ_y) - T · I(X;Y),        (1)

where I(X;Y) is the mutual information between X and Y. The parameter T is a Lagrange multiplier used to trade off between maximizing the expected log-likelihood E_{P(x,y)}[log P(x|λ_y)] and minimizing the mutual information between X and Y (i.e., compressing the data X into clusters Y as much as possible). If we fix H(Y), minimizing I(X;Y) is equivalent to maximizing the average posterior entropy H(Y|X), or maximizing the randomness of the data assignment. The prior P(x) is usually set to be the constant 1/N. As N goes to infinity, the sample average approaches the expected log-likelihood asymptotically.

Maximizing (1) over P(y|x) and λ_y leads to a generic model-based partitional clustering algorithm (Zhong & Ghosh, 2003) that iterates between the following two steps:

E-step:

P(y|x) = P(y) P(x|λ_y)^β / Σ_{y′} P(y′) P(x|λ_{y′})^β,   β = 1/T;        (2)

M-step:

λ_y = arg max_λ Σ_x P(y|x) log P(x|λ).        (3)

1. Initialization: initialize model parameters Λ and set β = β_0;

2. Optimization: optimize (1) by iterating between (2) and (3) until convergence;

3. Annealing: lower temperature by setting β^(new) = β^(old)/α; go to step 4 if β > β_f, otherwise go back to step 2;

4. For each data object x_n, set y_n = arg max_y P(y|x_n).

Figure 1: Deterministic annealing algorithm for model-based clustering.

It can be shown (Zhong & Ghosh, 2003) that setting T = 0 leads to a generic k-means clustering algorithm and setting T = 1 leads to the standard mixture-of-models EM clustering algorithm (Banfield & Raftery, 1993). In other words, k-means and EM clustering correspond to two different stages in the model-based DA clustering process (T = 0 and T = 1, respectively).
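To make the two steps concrete, the following is a minimal Python sketch of the generic DA clustering loop. It is an illustration only, not the authors' implementation (which was written in Matlab): the function names `da_estep`, `da_cluster`, `fit`, and `loglik`, the fixed number of EM iterations per temperature, and the schedule values β_0 = 0.5, β_f = 100, α = 0.9 are all assumptions of the sketch.

```python
import numpy as np

def da_estep(log_lik, log_prior, beta):
    """E-step (2): P(y|x) proportional to P(y) * P(x|lambda_y)^beta, beta = 1/T.

    log_lik: (N, K) array of log P(x_n | lambda_k); log_prior: (K,) array of log P(y)."""
    s = log_prior + beta * log_lik           # unnormalized log posteriors
    s -= s.max(axis=1, keepdims=True)        # stabilize before exponentiating
    post = np.exp(s)
    return post / post.sum(axis=1, keepdims=True)

def da_cluster(X, K, fit, loglik, beta0=0.5, betaf=100.0, alpha=0.9, em_iters=20):
    """Generic model-based DA clustering (Figure 1): anneal beta = 1/T upward,
    alternating the E-step (2) with a model-specific M-step (3) given by `fit`.
    The schedule values and the fixed EM iteration count (standing in for a
    convergence test) are assumed, not taken from the paper."""
    N = len(X)
    post = np.random.dirichlet(np.ones(K), size=N)   # soft random initialization
    beta = beta0
    while beta <= betaf:
        for _ in range(em_iters):
            models, log_prior = fit(X, post)                    # M-step (3)
            post = da_estep(loglik(X, models), log_prior, beta) # E-step (2)
        beta /= alpha                        # lower the temperature (alpha < 1)
    return post.argmax(axis=1), models       # final hard assignments (step 4)
```

At low β every object is spread nearly uniformly over clusters; as β grows past 1 the assignments harden, passing through the EM regime (β = 1) toward k-means-like behavior.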

2.2 Multinomial Models

In Zhong and Ghosh (2005), we compared three different probabilistic models for clustering text documents: multivariate Bernoulli, multinomial, and von Mises-Fisher (vMF) (Mardia, 1975). The Bernoulli model was found to be the least suitable model due to its limited binary representation of documents. For regular mixture-of-models clustering, vMF models slightly outperform multinomial models. With the addition of deterministic annealing, however, multinomial models perform comparably with vMF models. Here is why we use multinomial models instead of the vMF models underlying the spherical k-means algorithm (Banerjee et al., 2003): even though the spherical k-means algorithm is simple and efficient, the mixture-of-vMFs, a soft version of spherical k-means, involves the Bessel function in its parameterization and requires intensive computation even for approximated parameter estimation (Banerjee et al., 2003). A deterministic annealing extension would be computationally even more complicated and would mask the original purpose of this paper. A standard description of multinomial models is available in many statistics or probability books (e.g., Stark & Woods, 1994); here we briefly discuss it in the context of clustering text documents.

A traditional vector space representation is used for text documents, i.e., each document is represented as a high-dimensional vector of "word" counts in the document. The "word" here is used in a broad sense, since it may represent individual words, stemmed words, tokenized words, or short phrases. The dimensionality of document vectors equals the vocabulary size.

Based on the naïve Bayes assumption, a multinomial model for cluster y represents a document x by a multinomial distribution of the words in the vocabulary, P(x|λ_y) = Π_i P_y(i)^{x(i)}, where x(i) is the i-th dimension of document vector x, indicating the number of occurrences of the i-th word in document x. To accommodate documents of different lengths, we use a normalized (log-)likelihood measure

log P̃(x|λ_y) = (1/|x|) Σ_i x(i) log P_y(i),        (4)

where |x| = Σ_i x(i) is the length of document x. The corresponding M-step update for the word distribution, with Laplacian smoothing, is

P_y(i) = (1 + Σ_x P(y|x) x(i)) / (Σ_j (1 + Σ_x P(y|x) x(j))).        (5)

The resulting average normalized log-likelihood objective is

(1/N) Σ_x log P̃(x|λ_{y(x)}),        (6)

where the P̃ notation is defined in (4). This provides another good reason for our choice of multinomial models in this paper.

2.3 Multinomial Model-based DA Clustering (damnl)

Substituting the generic M-step in the model-based DA clustering algorithm (Figure 1) with (5) gives a multinomial model-based DA clustering algorithm, abbreviated as damnl. The normalized log-likelihood measure (4) is used since it accommodates different document lengths and leads to a stable annealing process in our experiments.
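A sketch of the two damnl-specific pieces, the normalized log-likelihood (4) and the smoothed M-step (5), is given below. The helper names are hypothetical; the add-one smoothing follows (5), and everything else (array layout, the guard against empty documents) is an assumption of the sketch.

```python
import numpy as np

def mnl_norm_loglik(X, logP):
    """Normalized log-likelihood (4): (1/|x|) * sum_i x(i) log P_y(i),
    with |x| = sum_i x(i) the document length.

    X: (N, V) term-count matrix; logP: (K, V) log word distributions."""
    lengths = np.maximum(X.sum(axis=1, keepdims=True), 1)  # guard for empty docs
    return (X @ logP.T) / lengths

def mnl_mstep(X, post):
    """Smoothed M-step (5):
    P_y(i) = (1 + sum_x P(y|x) x(i)) / sum_j (1 + sum_x P(y|x) x(j))."""
    counts = post.T @ X + 1.0                    # (K, V), Laplacian smoothing
    P = counts / counts.sum(axis=1, keepdims=True)
    log_prior = np.log(post.mean(axis=0))        # cluster priors P(y)
    return np.log(P), log_prior
```

With these two functions, the earlier generic loop plays the role of damnl, e.g., `da_cluster(X, K, fit=mnl_mstep, loglik=mnl_norm_loglik)`.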

After some algebraic manipulation, the objective function of damnl can be written as

L = -Σ_x D_KL(P_x ‖ P_{y(x)}) - T · I(X;Y) + Σ_x H(P_x),        (7)

where the last term is a constant. This leads to the following connection to the Information Bottleneck method.

Connection to Information Bottleneck

In this section we show that, when applied to clustering, the Information Bottleneck (IB) method (Tishby et al., 1999) can be seen as a special case of model-based DA clustering with the underlying probabilistic models being multinomial models. This was mentioned by Slonim and Weiss (2003) when they explored the relationship between the maximum likelihood formulation and the information bottleneck.

The IB method aims to minimize the objective function

F = I(X;Y) - β I(Z;Y)
  = I(X;Y) + β (I(Z;X) - I(Z;Y)) - β I(Z;X)
  = I(X;Y) + β E_{P(x,y)}[D_KL(P(z|x) ‖ P(z|y))] - β I(Z;X).        (8)

It trades off between minimizing the mutual information between the data X and the compressed clusters Y and preserving the mutual information between Y and a third variable Z. Both X and Z are fixed data, but Y represents the cluster structure that one tries to find. The last term in (8) can be treated as a constant w.r.t. Y and thus w.r.t. the clustering algorithm. It is easy to see that minimizing (8) is equivalent to maximizing (7), with β being the inverse of the temperature T and Z being a random variable representing the word dimension. This relationship was also noted in Banerjee et al. (2004).

2.4 Annealing Results for damnl

To see the annealing effect of the damnl algorithm, we plot the training curves (Figure 2) for both the average (normalized) log-likelihood objective (6) and the average normalized entropy of the posterior distribution,

H̄_post = -(1/(N log K)) Σ_n Σ_y P(y|x_n) log P(y|x_n).

Figure 2: Training curves for (a) average log-likelihood and (b) average (normalized) posterior entropy, for document clustering results using multinomial model-based deterministic annealing on the six datasets described in Table 1.

3 Semi-supervised Model-based DA Clustering

In this section, we first present three semi-supervised clustering algorithms under the generic model-based DA clustering framework (Section 2.1) and then discuss the weaknesses and strengths of each approach.

Figure 3 shows the first semi-supervised model-based clustering algorithm: seeded DA clustering. The clustering process differs from regular model-based DA clustering (Figure 1) only in Step 1. The basic idea is to use labeled data to help initialize the model parameters: we partition the data into a pre-specified number of clusters while keeping data instances that share a label in the labeled set within the same cluster.

The second algorithm, constrained DA clustering, is shown in Figure 4. In addition to using labeled data to help initialization, this algorithm also constrains the assignment of labeled data instances to be hard in the E-step (Step 2a): each labeled instance must stay with its label (initial cluster) with probability 1. This algorithm is basically the semi-supervised EM algorithm (Nigam et al., 2000; Dong & Bhanu, 2003) wrapped in an annealing process, with the E-step parameterized by a gradually decreasing temperature.
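The only change from the regular E-step is the clamping in Step 2a of Figure 4 (shown further below), which a short sketch makes explicit. It reuses the hypothetical `da_estep` from the Section 2.1 sketch; the encoding of unlabeled objects with -1 is likewise an assumption of the sketch.

```python
def constrained_estep(log_lik, log_prior, beta, labels):
    """E-step of constrained DA clustering (Step 2a of Figure 4): labeled objects
    are clamped to their given cluster with probability 1; unlabeled objects
    follow the regular annealed update (2).

    labels: length-N integer array holding the cluster index for labeled
    objects and -1 for unlabeled ones (an assumed convention)."""
    post = da_estep(log_lik, log_prior, beta)   # regular soft assignments
    for n, y in enumerate(labels):
        if y >= 0:                              # hard constraint for labeled data
            post[n, :] = 0.0
            post[n, y] = 1.0
    return post
```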

Finally, the feedback DA clustering algorithm is shown in Figure 5. It is basically the combination of the regular DA clustering algorithm (Figure 1) and a constrained clustering algorithm, with a feedback step (Step 2) in between. The basic idea is to first treat all data as unlabeled and do a regular DA clustering, then take the labeled data as feedback to adjust the cluster structure. We caution that there could be several possible heuristic designs. We chose the feedback step used here for its simplicity, since our main purpose is to demonstrate the usefulness of feedback methods, not to find the best feedback strategy. For each class k in the labeled data, we find the corresponding cluster that contains more instances of class k than any other cluster and put all labeled instances of class k into this cluster. In the case where the corresponding cluster is already occupied by another labeled class, we create a new cluster and put all labeled instances of class k into the new cluster. After the adjustment is done, we fix the cluster labels for labeled data instances and run the constrained clustering algorithm (for which we use the algorithm in Figure 4

Algorithm: seeded DA clustering

Input: A set of N data objects containing N_l labeled data objects X^(l) = {x_1, ..., x_{N_l}} with N_l labels Y^(l) = {y^(l)_1, ..., y^(l)_{N_l}}, y^(l) ∈ S_l ⊆ {1, ..., K}, and N_u unlabeled data objects X^(u) = {x_{N_l+1}, ..., x_N}; model structure Λ = {λ_1, ..., λ_K}; temperature decreasing rate α, 0 < α < 1, and initial inverse temperature β_0 and final β_f.

Output: Trained models Λ and a partition of the data objects given by the cluster identity vector Y = {y_1, ..., y_N}, y_n ∈ {1, ..., K}.

Steps:

1. Initialization: randomly sample N_u labels from {1, ..., K} to form Y^(u), initialize Y = Y^(l) ∪ Y^(u), train initial model parameters λ_k = arg max_λ Σ_{n: y_n = k} log P(x_n|λ), ∀k ∈ {1, ..., K}, and set β = β_0;

2. Optimization: optimize (1) by iterating between (2) and (3) until convergence;

3. Annealing: lower temperature by setting β^(new) = β^(old)/α; go to step 4 if β > β_f, otherwise go back to step 2;

4. For each data object x_n, set y_n = arg max_y log P(x_n|λ_y).

Figure 3: Seeded DA clustering algorithm.

with a fixed high β).

The three algorithms presented above are generic and can be used with different models. Plugging multinomial models into the algorithms, we get three semi-supervised multinomial model-based DA clustering algorithms, abbreviated as Seeded damnl, Constrained damnl, and Feedback damnl, respectively. For comparison, we also construct three regular semi-supervised clustering algorithms using mixture-of-multinomials at a fixed low temperature (i.e., high β). While k-means type algorithms (e.g., seeded k-means, etc.) could be constructed, to reuse the code for the above three algorithms, we simply fix β at 100 in the algorithms and get three semi-supervised mixture-of-multinomials algorithms. They are named Seeded mixmnl, Constrained mixmnl, and Feedback mixmnl, respectively. According to the results in Figure 2, setting β = 100 makes the posterior data assignment very close to the k-means assignment, since each data instance goes to its "closest" cluster with high probability. The Constrained mixmnl algorithm is used in Step 3 of the Feedback damnl algorithm (Figure 5).

Intuitively, the Seeded damnl algorithm cannot take advantage of labeled data, since the DA process starts the global search at a high temperature where each data instance goes to every cluster with equal probability, which means the initial partition constructed with the help of labeled data may not matter. In fact, one feature of deterministic annealing is its insensitivity to initialization (Rose, 1998). However, starting the seeded approach directly at a low temperature might work, since seeding clusters with labeled data can position the local search at a good starting point. The results shown in Basu et al. (2002) and our results in the next section support this analysis. While the DA technique cannot help the seeded approach, we expect that it will benefit the constrained and feedback approaches. Considering the difference between the constrained approach and the feedback approach, we hypothesize that

• The constrained approach should perform better when the available labels are complete and the amount of labeled data is reasonable. Through the DA process, the available labels will guide/bias the search towards a "correct" partitioning.

Algorithm: constrained DA clustering

Input: A set of N data objects containing N_l labeled data objects X^(l) = {x_1, ..., x_{N_l}} with N_l labels Y^(l) = {y^(l)_1, ..., y^(l)_{N_l}}, y^(l) ∈ S_l ⊆ {1, ..., K}, and N_u unlabeled data objects X^(u) = {x_{N_l+1}, ..., x_N}; model structure Λ = {λ_1, ..., λ_K}; temperature decreasing rate α, 0 < α < 1, and initial inverse temperature β_0 and final β_f.

Output: Trained models Λ and a partition of the data objects given by the cluster identity vector Y = {y_1, ..., y_N}, y_n ∈ {1, ..., K}.

Steps:

1. Initialization: randomly sample N_u labels from {1, ..., K} to form Y^(u), initialize Y = Y^(l) ∪ Y^(u), train initial model parameters λ_k = arg max_λ Σ_{n: y_n = k} log P(x_n|λ), ∀k ∈ {1, ..., K}, and set β = β_0;

2. Optimization: optimize (1) by iterating between the following two steps until convergence:

2a. For n ∈ {1, ..., N_l}, set P(y|x_n) to be 1 if y = y^(l)_n and 0 otherwise; for n ∈ {N_l+1, ..., N}, update P(y|x_n) according to (2);

2b. Update model parameters according to (3);

3. Annealing: lower temperature by setting β^(new) = β^(old)/α; go to step 4 if β > β_f, otherwise go back to step 2;

4. For each data object x_n, set y_n = arg max_y log P(x_n|λ_y).

Figure 4: Constrained DA clustering algorithm.

• The feedback approach may be more appropriate when the available labels are incomplete, since it starts without the bias of labeled data and is expected to cover all potential labels better than the constrained approach.

4 Experimental Results

4.1 Datasets

We used the 20-newsgroups data and several datasets from the CLUTO toolkit (Karypis, 2002). These datasets provide a good representation of different characteristics: the number of documents ranges from 414 to 19949, the number of words from 6429 to 43586, the number of classes from 4 to 20, and the balance from 0.046 to 0.998. The balance of a dataset is defined as the ratio of the number of documents in the smallest class to the number of documents in the largest class, so a value close to 1 indicates a very balanced dataset and a value close to 0 signifies the opposite. A summary of all the datasets used in this paper is shown in Table 1.

The NG20 dataset is a collection of 20,000 messages collected from 20 different usenet newsgroups, 1,000 messages from each. We preprocessed the raw dataset using the Bow toolkit (McCallum, 1996), including chopping off headers and removing stop words as well as words that occur in fewer than three documents. In the resulting dataset, each document is represented by a 43,586-dimensional sparse vector, and there are a total of 19,949 documents (after empty documents were removed).

Algorithm: feedback DA clustering

Input: A set of N data objects containing N_l labeled data objects X^(l) = {x_1, ..., x_{N_l}} with N_l labels Y^(l) = {y^(l)_1, ..., y^(l)_{N_l}}, y^(l) ∈ S_l ⊆ {1, ..., K}, and N_u unlabeled data objects X^(u) = {x_{N_l+1}, ..., x_N}; model structure Λ = {λ_1, ..., λ_K}; temperature decreasing rate α, 0 < α < 1, and initial inverse temperature β_0 and final β_f.

Output: Trained models Λ and a partition of the data objects given by the cluster identity vector Y = {y_1, ..., y_N}, y_n ∈ {1, ..., K}.

Steps:

1. Run the model-based DA clustering algorithm (Figure 1) with random initialization;

2. Match the available labels in the labeled data with the clusters from Step 1:

2a. Tag all clusters as "unoccupied";

2b. For k ∈ S_l, let X^(l)_k = {x^(l)_n | y^(l)_n = k}. Find the cluster j that contains more data instances in X^(l)_k than any other cluster;

2c. If cluster j is unoccupied, set y^(l)_n = j, ∀x_n ∈ X^(l)_k; otherwise increase the number of clusters by one, K = K + 1, and set y^(l)_n = K, ∀x_n ∈ X^(l)_k;

3. Run the constrained clustering algorithm (e.g., the algorithm in Figure 4 with a fixed high β) using the Y^(l) updated in Step 2.

Figure 5: Feedback DA clustering algorithm.
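The matching heuristic of Step 2 translates almost directly into code. The following sketch assumes hard (Step-1) cluster assignments for the labeled objects and 0-based cluster indices; the function name and input layout are hypothetical, and it illustrates Steps 2a through 2c rather than reproducing the authors' Matlab code.

```python
import numpy as np

def feedback_match(cluster_of, y_l, K):
    """Step 2 of Figure 5: for each labeled class k, find the cluster holding
    the most instances of k (2b); claim that cluster if unoccupied, otherwise
    open a new cluster (2c). Returns adjusted labels and the (possibly grown) K.

    cluster_of: Step-1 cluster assignments of the N_l labeled objects;
    y_l: their class labels."""
    cluster_of, y_l = np.asarray(cluster_of), np.asarray(y_l)
    fixed = np.empty(len(y_l), dtype=int)
    occupied = set()                                          # 2a
    for k in np.unique(y_l):
        members = cluster_of[y_l == k]
        j = int(np.bincount(members, minlength=K).argmax())   # 2b
        if j not in occupied:
            occupied.add(j)
            fixed[y_l == k] = j                               # 2c: claim cluster j
        else:
            fixed[y_l == k] = K                               # 2c: open a new cluster
            K += 1
    return fixed, K
```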

All the datasets associated with the CLUTO toolkit have already been preprocessed (Zhao & Karypis, 2001) in approximately the same way, and we further removed those words that appear in two or fewer documents. The classic dataset was obtained by combining the CACM, CISI, CRANFIELD, and MEDLINE abstracts that were used in the past to evaluate various information retrieval systems. The ohscal dataset was from the OHSUMED collection (Hersh et al., 1994). It contains 11,162 documents from the following ten categories: antibodies, carcinoma, DNA, in-vitro, molecular sequence data, pregnancy, prognosis, receptors, risk factors, and tomography. The k1b dataset is from the WebACE project (Han et al., 1998). Each document corresponds to a web page listed in the subject hierarchy of Yahoo! (www.yahoo.com). The tr11 dataset is derived

Table 1: Summary of the datasets used in this paper (N: number of documents; K: number of classes).

Data      N      K   Balance
NG20      19949  20  0.991
classic   7094   4   0.323
ohscal    11162  10  0.437
k1b       2340   6   0.043
tr11      414    9   0.046
la12      6279   6   0.289

from TREC collections (trec.nist.gov).

4.2 Evaluation Criteria

Clustering evaluation criteria can be based on internal or external measures. An internal measure is often the same as the objective function that a clustering algorithm explicitly optimizes, which in this paper is the average log-likelihood (6). For document clustering, external measures are more commonly used, since typically the benchmark documents' category labels are known (but of course not used in the clustering process). Examples of external measures include the confusion matrix, classification accuracy, the F1 measure, average purity, average entropy, and mutual information (Ghosh, 2003).

In the simplest scenario, where the number of clusters equals the number of categories and their one-to-one correspondence can be established, any of these external measures can be fruitfully applied. However, when the number of clusters differs from the number of original classes, the confusion matrix is hard to read and the accuracy difficult or impossible to calculate. It has been argued that the mutual information I(Y; Ŷ) between a random variable Y, governing the cluster labels, and a random variable Ŷ, governing the class labels, is a superior measure compared to purity or entropy (Strehl & Ghosh, 2002; Dom, 2001). Moreover, when normalized to lie in the range [0, 1], this measure becomes relatively impartial to K. There are several choices for normalization based on the entropies H(Y) and H(Ŷ). We shall follow the definition of normalized mutual information

(NMI) using the geometric mean,

NMI = I(Y; Ŷ) / √(H(Y) · H(Ŷ)),

as given in (Strehl & Ghosh, 2002). In practice, we use the sample estimate

NMI = [Σ_{h,l} n_{h,l} log(n · n_{h,l} / (n_h n_l))] / √[(Σ_h n_h log(n_h/n)) (Σ_l n_l log(n_l/n))],

where n_{h,l} is the number of documents in class h as well as in cluster l, n_h the number of documents in class h, n_l the number of documents in cluster l, and n the total number of documents.

• KKZ: This scheme is adapted from the initialization method used in Katsavounidis et al. (1994), which has been shown to be one of the best initialization methods for k-means clustering (He et al., 2004). Our modified version works as follows: we initialize the first cluster model using the document that is most distant to the global multinomial model. For each subsequent model, we use the document vector that is most distant to its closest existing cluster. The distance to a cluster is measured by KL-divergence. This method is deterministic, in contrast to the previous three.

We compared the four initialization schemes for the mixmnl and damnl algorithms and found that the best scheme is KKZ for mixmnl and Random for damnl (detailed results on the six datasets are not shown here, both for simplicity and because we are more interested in the results for the semi-supervised algorithms). Most of the time, the Perturb and Marginal schemes are no better than the Random approach. Table 2 shows the running time for the different initialization schemes; the last row is the theoretical computational complexity.

Table 2: Running time results for four different initialization schemes. d is the data dimensionality (i.e., number of terms) and N_nz the number of nonzeros in a term-document matrix.

Data        Random  Perturb  Marginal  KKZ
k1b         0       0.018    0.485     0.359
ng20        0       0.049    0.91      0.998
la12        0       0.048    1.244     0.248
Complexity  O(N_nz + Kd)

In the box plots in Figures 6 and 7, the box has lines at the lower quartile (25%), median (50%), and upper quartile (75%) values; the whiskers are lines extending from each end of the box to show the extent of the rest of the data (5% and 95%).


Figure 6: Comparing NMI results for two initialization schemes (Random and KKZ) for Feedback mixmnl on six datasets.

We consider two scenarios for constructing the labeled set:

• Complete labels: all document classes are randomly sampled. We construct a series of training datasets by randomly sampling (without replacement) 2.5%, 5%, 7.5%, 10%, 20%, ..., and 90% of all documents as the labeled set and the rest as the unlabeled set. The sampling is not stratified, i.e., the percentage of documents in the labeled set may be different for each class.

• Incomplete labels: some classes are not available in the labeled set. We pick documents from only half of all classes. We first randomly choose half of all classes and then sample (non-stratified, without replacement) 2.5%, 5%, 7.5%, 10%, 20%, ..., and 90% of the documents in the selected (half of all) classes as the labeled set.

For each algorithm and each percentage setting, we repeat the random sampling process ten times and report the average and standard deviation of the NMI values for the clustering results. For better readability, we only show NMI results up to 60% in the figures in the next section.
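The two sampling protocols are simple to state in code; the sketch below mirrors the description above (the fixed random seed and function names are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)   # assumed fixed seed, for illustration only

def split_complete(y, frac):
    """Complete labels: sample `frac` of all documents (not stratified, without
    replacement) as the labeled set; the rest are unlabeled."""
    idx = rng.permutation(len(y))
    n_l = int(frac * len(y))
    return idx[:n_l], idx[n_l:]              # labeled indices, unlabeled indices

def split_incomplete(y, frac):
    """Incomplete labels: randomly keep half of the classes, then sample `frac`
    of the documents in those classes as the labeled set."""
    y = np.asarray(y)
    classes = rng.permutation(np.unique(y))
    kept = classes[: len(classes) // 2]      # random half of all classes
    pool = np.flatnonzero(np.isin(y, kept))
    labeled = rng.choice(pool, size=int(frac * len(pool)), replace=False)
    unlabeled = np.setdiff1d(np.arange(len(y)), labeled)
    return labeled, unlabeled
```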


Figure 7: Comparing NMI results for two initialization schemes (Random and KKZ) for Feedback damnl on six datasets.

4.5 Results Analysis

Complete Labels Scenario

Figures 9–12 show the NMI results for the complete labels scenario. The first three figures compare semi-supervised mixmnl and semi-supervised damnl algorithms for the seeded approach, the constrained approach, and the feedback approach, respectively. The results for the Seeded damnl algorithm are relatively flat across different amounts of labeled data, confirming the previous discussion on the weak effect of labeled data on seeded DA clustering. In contrast, a clearer upward trend can be seen for the Seeded mixmnl algorithm as the fraction of labeled data increases. The Seeded mixmnl algorithm also performs significantly better than Seeded damnl in most situations, except for the tr11 dataset and when the fraction of labeled data is low. The reason for the exception on the tr11 dataset may be that tr11 is small and unbalanced, so the amount of labeled data for some classes may be too small or even near empty, causing the Seeded mixmnl algorithm to get stuck in a bad local solution quickly. When the fraction of labeled data is small, the initial models are less accurate, causing Seeded mixmnl to converge to a worse solution than Seeded damnl.

Both the constrained and feedback approaches show clear benefits from having more labeled data: the NMI values increase as the percentage of labeled instances grows, as shown in Figures 10 and 11. It is also evident that the DA versions perform at least as well as the mixmnl versions, and sometimes significantly better. The only exception is for the constrained approach on the classic dataset when the percentage of labeled data is 10% or lower, where Constrained damnl performs significantly worse than Constrained mixmnl. Upon closer examination, we find that the DA version reaches better average log-likelihoods even as it gives lower NMI values in this case. We suspect that the DA process finds a better local maximum of the objective function but the resulting partition does not conform to the given labels. This will be further investigated in our future work.

We compare the three semi-supervised approaches together in Figure 12. For the seeded approach, both Seeded mixmnl and Seeded damnl are selected; for the other two approaches, Constrained damnl and Feedback damnl are selected, since they outperform their mixmnl counterparts. The Constrained damnl algorithm is clearly the winner, producing significantly higher NMI values in most cases. The only exception is on the classic dataset with a small amount of labeled data, as discussed above. Feedback damnl is overall better than Seeded mixmnl, with superior performance on three datasets (classic, k1b, and ng20) and mixed results on the remaining three (ohscal, tr11, and la12). Except on the tr11 dataset, Seeded mixmnl is the second best algorithm when the fraction of labeled data is 5% or higher.

Incomplete Labels Scenario

Figures 13–16 show the NMI results for the incomplete labels scenario. There is generally no strong upward trend, except on the ng20 and tr11 datasets and mainly for the mixmnl approaches; most other curves are relatively flat. This indicates that the benefit of labeled data in the incomplete labels scenario is small.

The Seeded mixmnl algorithm outperforms Seeded damnl only on the classic dataset. When the fraction of labeled data is less than 10%, Seeded damnl is always comparable to or significantly better than Seeded mixmnl. This observation matches the one seen in the complete labels scenario.

Mixed results are observed when comparing Constrained damnl with Constrained mixmnl: the former wins on the ng20 and tr11 datasets but loses on the others (Figure 14). For the feedback approaches, Feedback damnl fares at least as well as Feedback mixmnl and significantly better on most datasets (classic, ng20, tr11, and la12), as shown in Figure 15.

The three semi-supervised approaches are compared in Figure 16. The Constrained mixmnl algorithm delivers performance very similar to its seeded counterpart; this can be seen in Figure 16, where the curves for Seeded mixmnl and Constrained mixmnl almost completely overlap. For the constrained approach, both Constrained mixmnl and Constrained damnl are selected because of the mixed results in Figure 14; for the feedback approach, Feedback damnl is selected due to its better performance over its mixmnl counterpart. Overall, Feedback damnl is the best algorithm and the only one with consistently superior performance across all six datasets. The other algorithms usually perform comparably with Feedback damnl on two or three (out of five) datasets but significantly worse on the others.

In summary, the experimental results match favorably with the hypotheses discussed in Section 3 and encourage us to further explore feedback-type approaches in real-world adaptive environments.

Time Complexity

Some running time results are given in Figure 8 for the largest dataset, ng20. They are measured on a 2.4 GHz Pentium-4 PC running Windows XP, with the algorithms written in Matlab. The damnl algorithms are about four to ten times slower than the mixmnl ones. This could be improved, for example, by using fewer iterations at each temperature. Such improvements will be investigated in the future.

Figure 8: Running time results for the ng20 dataset.

While the running times of all mixmnl algorithms go down as the fraction of labeled data grows, Constrained damnl is the only one in the damnl category with this trend. This is not surprising, since the mixmnl algorithms will likely converge faster for a higher fraction of labeled data. On the other hand, Seeded damnl and Feedback damnl use labeled information only at the beginning or the end of a damnl process and thus cannot take advantage of this faster convergence throughout the whole DA process.

5 Related Work

The constrained approach studied in this paper is most related to the semi-supervised EM algorithm presented in Nigam et al. (2000), where the classes of unlabeled data are treated as missing data and the labeled data are used as E-step constraints in the EM algorithm. Nigam et al. (2000) presented the algorithm, however, in a semi-supervised classification setting and assumed that the class categories in the labeled data are complete. In contrast, Dong and Bhanu (2003) employed the semi-supervised EM algorithm in a clustering setting and extended it to include cannot-be-in-certain-cluster constraints for image retrieval applications.

Wagstaff et al. (2001) proposed a constrained k-means algorithm to integrate background knowledge into the clustering process. They considered two types of constraints: must-link (two data instances must be in the same cluster) and cannot-link (two data instances cannot be in the same cluster). These constraints are different from labeled data constraints and more similar to the constraints used in Dong and Bhanu (2003). However, the constrained k-means algorithm is heuristic, and the EM approach adopted by Dong and Bhanu (2003) seems to be a more principled approach.

The must-link and cannot-link types of knowledge can also be used as feedback in an interactive clustering process (Cohn et al., 2003). Our feedback-based approach differs in the type of feedback: we use the class labels in labeled data, instead of must-link and cannot-link pairs, as feedback constraints. Also, unlike our feedback-based approaches, Cohn et al. (2003) employed feedback information to adjust the distance metrics used for clustering. Similar work on semi-supervised clustering through learning distance metrics appeared in Chang and Yeung (2004) and in Basu et al. (2004). These methods mainly addressed pairwise constraint-type feedback, which is sometimes argued to be a more realistic assumption than available cluster labels. Although we focused on labels, our feedback-based algorithm (Figure 5) can be adapted to take pairwise constraints into account. This extension will be investigated in our future work.

Basu et al. (2002) compared seeded spherical k-means and constrained spherical k-means for clustering documents and showed that the constrained version performs better. Our results supported the same conclusion, but for multinomial models and using deterministic annealing extensions.

Even in a semi-supervised classification setting, clustering has been used to help increase the amount of labeled data. For example, Zeng et al. (2003) showed that a clustering-based strategy excels when the amount of labeled data is very small, according to experimental results on text classification.

6 Conclusion

We have presented deterministic annealing extensions of three semi-supervised clustering approaches based on a model-based DA clustering framework, with multinomial distributions representing document clusters. Experimental results on several text datasets show that: (a) the annealing process can often significantly improve semi-supervised clustering results; (b) the constrained approach is superior when the available labels are complete, while the feedback-based approach should be selected if the labels are incomplete. In real-world applications (e.g., dynamic web page clustering, gene expression clustering, and intrusion detection), where new concepts/classes are likely not covered in a small set of labeled data, we expect the feedback-based approach to be a better choice.

We are aware that the specific seeded and feedback-based algorithms used in our experiments are heuristic and just one of many possible designs. They can be improved, but we doubt that this would significantly change the conclusions drawn in this paper. In the future, we plan to incorporate pairwise constraints (Wagstaff et al., 2001; Basu et al., 2002) into our feedback-based approaches.

In our experiments, all algorithms are batch methods; that is, model parameters are updated once for each iteration through the whole dataset. Online parameter updates (on visiting each individual data instance) can be employed to improve performance and are useful in a data stream environment (Zhong, 2005). For example, Nigam and Ghani (2000) reported that incremental approaches work better than iterative (batch) approaches. Developing incremental versions of the algorithms studied in this paper seems to be a viable future direction.

Another interesting (and natural) direction is to automate the finding of new classes in unlabeled data. Two existing methods are likely relevant: multi-clustering (Friedman et al., 2001) and the conditional information bottleneck (Gondek & Hofmann, 2003). The former extends the information bottleneck method to generate two or more clusterings of the same data that are as orthogonal as possible. The multi-clustering idea may be used in the semi-supervised setting such that one clustering is supervised by labeled data and the others seek new class labels. The conditional information bottleneck directly targets finding clusters that have low mutual information with existing labels, and thus may be suitable for discovering new classes.

Acknowledgments

We thank David DeMaris and anonymous reviewers for helpful comments.

References

Banerjee, A., Dhillon, I., Ghosh, J., & Merugu, S. (2004). An information theoretic analysis of maximum likelihood mixture estimation for exponential families. Proc. 21st Int. Conf. Machine Learning (pp. 57–64). Banff, Canada.

Banerjee, A., Dhillon, I., Sra, S., & Ghosh, J. (2003). Generative model-based clustering of directional data. Proc. 9th ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining (pp. 19–28).

Banfield, J. D., & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49, 803–821.

Basu, S., Banerjee, A., & Mooney, R. (2002). Semi-supervised clustering by seeding. Proc. 19th Int. Conf. Machine Learning (pp. 19–26). Sydney, Australia.

Basu, S., Bilenko, M., & Mooney, R. J. (2004). A probabilistic framework for semi-supervised clustering. Proc. 10th ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining (pp. 59–68). Seattle, WA.

Blum, A., & Chawla, S. (2001). Learning from labeled and unlabeled data using graph mincuts. Proc. 18th Int. Conf. Machine Learning (pp. 19–26).

Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. The 11th Annual Conf. Computational Learning Theory (pp. 92–100).

Castelli, V., & Cover, T. M. (1996). The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter. IEEE Trans. Inform. Theory, 42, 2102–2117.

Chang, H., & Yeung, D.-Y. (2004). Locally linear metric adaptation for semi-supervised clustering. Proc. 21st Int. Conf. Machine Learning (pp. 153–160). Banff, Canada.

Cohn, D., Caruana, R., & McCallum, A. (2003). Semi-supervised clustering with user feedback (Technical Report TR2003-1892). Cornell University.

Cozman, F. G., Cohen, I., & Cirelo, M. C. (2003). Semi-supervised learning of mixture models. Proc. 20th Int. Conf. Machine Learning (pp. 106–113).

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39, 1–38.

Dhillon, I., & Guan, Y. (2003). Information theoretic clustering of sparse co-occurrence data. Proc. IEEE Int. Conf. Data Mining (pp. 517–520). Melbourne, FL.

Dhillon, I. S., Mallela, S., & Kumar, R. (2002). Enhanced word clustering for hierarchical text classification. Proc. 8th ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining (pp. 446–455).

Dhillon, I. S., & Modha, D. S. (2001). Concept decompositions for large sparse text data using clustering. Machine Learning, 42, 143–175.

Dom, B. (2001). An information-theoretic external cluster-validity measure (Technical Report RJ10219). IBM.

Dong, A., & Bhanu, B. (2003). A new semi-supervised EM algorithm for image retrieval. Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition (pp. 662–667). Madison, WI.

Friedman, N., Mosenzon, O., Slonim, N., & Tishby, N. (2001). Multivariate information bottleneck. Proc. 17th Conf. Uncertainty in Artificial Intelligence.

Ghosh, J. (2003). Scalable clustering. In N. Ye (Ed.), Handbook of data mining (pp. 341 ff.). Lawrence Erlbaum Assoc.

Gondek, D., & Hofmann, T. (2003). Conditional information bottleneck clustering. IEEE Int. Conf. Data Mining, Workshop on Clustering Large Datasets (pp. 36–42). Melbourne, FL.

Guerrero-Curieses, A., & Cid-Sueiro, J. (2000). An entropy minimization principle for semi-supervised terrain classification. Proc. IEEE Int. Conf. Image Processing (pp. 312–315).

Han, E. H., Boley, D., Gini, M., Gross, R., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., & Moore, J. (1998). WebACE: A web agent for document categorization and exploration. Proc. 2nd Int. Conf. Autonomous Agents (pp. 408–415).

He, J., Lan, M., Tan, C.-L., Sung, S.-Y., & Low, H.-B. (2004). Initialization of cluster refinement algorithms: A review and comparative study. Proc. IEEE Int. Joint Conf. Neural Networks (pp. 297–302).

Hersh, W., Buckley, C., Leone, T. J., & Hickam, D. (1994). OHSUMED: An interactive retrieval evaluation and new large test collection for research. Proc. ACM SIGIR (pp. 192–201).

Jardine, N., & van Rijsbergen, C. J. (1971). The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7, 217–240.

Joachims, T. (1999). Transductive inference for text classification using support vector machines. Proc. 16th Int. Conf. Machine Learning (pp. 200–209).

Karypis, G. (2002). CLUTO: A clustering toolkit. Dept. of Computer Science, University of Minnesota. www.cs.umn.edu/~karypis/cluto/.

Katsavounidis, I., Kuo, C., & Zhang, Z. (1994). A new initialization technique for generalized Lloyd iteration. IEEE Signal Processing Letters, 1, 144–146.

Mardia, K. V. (1975). Statistics of directional data. J. Royal Statistical Society, Series B (Methodological), 37, 349–393.

McCallum, A. K. (1996). Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. Available at www.cs.cmu.edu/~mccallum/bow.

Meila, M., & Heckerman, D. (2001). An experimental comparison of model-based clustering methods. Machine Learning, 42, 9–29.

Nigam, K. (2001). Using unlabeled data to improve text classification. Doctoral dissertation, School of Computer Science, Carnegie Mellon University.

Nigam, K., & Ghani, R. (2000). Understanding the behavior of co-training. KDD Workshop on Text Mining.

Nigam, K., McCallum, A., Thrun, S., & Mitchell, T. (2000). Text classification from labeled and unlabeled documents using EM. Machine Learning, 39, 103–134.

Rose, K. (1998). Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proc. IEEE, 86, 2210–2239.

Seeger, M. (2001). Learning with labeled and unlabeled data (Technical Report). Institute for Adaptive and Neural Computation, University of Edinburgh.

Slonim, N., & Weiss, Y. (2003). Maximum likelihood and the information bottleneck. Advances in Neural Information Processing Systems 15 (pp. 335–342). Cambridge, MA: MIT Press.

Stark, H., & Woods, J. W. (1994). Probability, random processes, and estimation theory for engineers. Englewood Cliffs, New Jersey: Prentice Hall.

Strehl, A., & Ghosh, J. (2002). Cluster ensembles: A knowledge reuse framework for combining partitions. Journal of Machine Learning Research, 3, 583–617.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. Proc. 37th Annual Allerton Conf. Communication, Control and Computing (pp. 368–377).

Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained k-means clustering with background knowledge. Proc. 18th Int. Conf. Machine Learning (pp. 577–584).

Wu, Y., & Huang, T. S. (2000). Self-supervised learning for visual tracking and recognition of human hand. Proc. 17th National Conference on Artificial Intelligence (pp. 243–248).

Zeng, H.-J., Wang, X.-H., Chen, Z., Lu, H., & Ma, W.-Y. (2003). CBC: Clustering based text classification requiring minimal labeled data. Proc. IEEE Int. Conf. Data Mining (pp. 443–450). Melbourne, FL.

Zhang, T., & Oles, F. (2000). A probabilistic analysis on the value of unlabeled data for classification problems. Proc. 17th Int. Conf. Machine Learning (pp. 1191–1198).

Zhao, Y., & Karypis, G. (2001). Criterion functions for document clustering: Experiments and analysis (Technical Report #01-40). Department of Computer Science, University of Minnesota.

Zhong, S. (2005). Efficient online spherical k-means clustering. Proc. IEEE Int. Joint Conf. Neural Networks. Montreal, Canada. To appear.

Zhong, S., & Ghosh, J. (2003). A unified framework for model-based clustering. Journal of Machine Learning Research, 4, 1001–1037.

Zhong, S., & Ghosh, J. (2005). Generative model-based document clustering: A comparative study. Knowledge and Information Systems: An International Journal. In press, published online first.

Zhu, X., Ghahramani, Z., & Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. Proc. 20th Int. Conf. Machine Learning (pp. 912–919). Washington, DC.

职工信息管理系统程序设计

.. 引言 通过大一下学期对C语言的学习,了解到了很多C语言的相关知识。学习的过程有很多困惑但是当自己能够独立的看懂,能够独立的完成一个简单的程序时,心中就会收获无限的喜悦和成就感。我可以看懂一些简单的程序,编写一些简单的程序,更多的是学会了一种思想——编程,它让我去思考很多日常生活中的事物是怎样通过一个个小小的函数实现其功能的,激发我探究的兴趣,更让我认真学习C语言的程序设计。 C语言是在国内外广泛使用的一种计算机语言。C语言简洁紧凑、使用灵活方便、运算符丰富、适用范围大、可移植性好。它既具有高级语言的特点,又具有汇编语言的特点。它可以作为系统设计语言,编写工作系统应用程序,也可以作为应用程序设计语言,编写不依赖计算机硬件的应用程序。 在这次的课程设计中我将通过亲自设计程序,让自己熟悉C语言操作,更熟练的掌握C语句。初步体会编程的过程,在不断的调试中获得

最为完整的程序,为将来的程序深入学习打下基础和培养兴趣。 1 功能简介和设计要求 1.1 程序功能简介 可以向文件中录入、删除、添加、查询职工信息,也可以从文件中导出来浏览 1.2 程序设计要求 职工信息包括职工号、姓名、性别、年龄、学历、工资、住址、电话等(职工号不重复)。试设计职工信息管理系统,使之能提供以下功能: 系统以菜单方式工作 职工信息录入功能(职工信息用文件保存)--输入 职工信息浏览功能--输出 查询信息功能:(至少一种查询方式) --算法 按职工号查询 按学历查询 按电话查询 职工信息删除、添加功能

2 程序总体设计框图 :用键盘输入职工信息 :将信息写入指定文本文件 :将信息导出放在结构数组em 中 :将所有信息显示在屏幕上 :输入职工号显示信息 :输入名字显示信息 :输入学历显示信息 :删除原有的职工信息 :添加新的职工信息 3 主要函数介绍 主函数:main() 显示系统工作菜单,罗列该系统所有功能。先声明所有将会调用到的函数名。再运用选择函数switch 即可根据使用者所输入的功能代号进入对应的功能程序。亮点:定义一个全局变量*p 和全局变量a。其中

职工信息管理系统课程设计报告(定版)

面向对象课程设计报告课程设计题目:职工信息管理系统 姓名: 班级: 学号: 指导老师: 2011年11月8日

目录 摘要 (3) 第一章绪论 (4) 1.1面向对象C++语言程序设计 (4) 1.2职工信息管理系统 (4) 1.3程序编译环境 (4) 第二章职工信息管理系统需求分析 (5) 2.1编写目的 (5) 2.2需求概述 (5) 2.3需求说明 (6) 第三章:详细设计 (7) 3.1编写目的 (7) 3.2详细设计 (10) 第四章:源程序编码及实现 (11) 4.1程序源代码 (16) 4.2程序实现结果 (17) 第五章:系统测试 (18) 第六章:结束语 (21) 参考文献: (21)

摘要 在当今社会,互联网空间的发展,给人们的工作和生活带来了极大的便利和高效,信息化,电子化已经成为节约运营成本,提高工作效率的首选。当前大量企业的职工管理尚处于手工作业阶段,不但效率低下,还常常因为管理的不慎而出现纰漏。随着信息技术在管理上越来越深入而广泛的应用,管理信息系统的实施在技术上已逐步成熟。管理信息系统是一个不断发展的新型学科,任何一个单位要生存要发展,要高效率地把内部活动有机地组织起来,就必须建立与自身特点相适应的管理信息系统。 本程序是一个关于职工信息管理的系统,通过这个系统用户可以简捷、方便的对职工信息进行增加、修改、添加、查询、浏览等功能,它不仅可以帮助企业单位达到员工管理办公自动化、节约管理成本、更能达到提高企业单位工作效率的目的。 关键词:职工,信息管理,系统,程序 一.绪论

1.1面向对象C++语言程序设计 C++是种面向对象的程序设计语言,它是在C语言基础上发展起来的。虽然它不是最早的面向对象的程序设计语言,但是它是目前使出比较广泛的面向对象的程序设计语言。 什么是面向对象?简单地说,它和面向过程—样都是软件开发的一种方法。但是它与面向过程不同,面向对象是—种运用对象、类、继承、封装、聚合、消息传递、多态性等概念来构造系统的软件开发方法。 1.2职工信息管理系统 在当今信息技术高速发展的时代,企业单位迫切需要这样一个系统:它能高效的管理企业或单位内部所有员工的个人信息,并能正确快速的对系统的使用者的操作做出回应,以提高效率,降低成本。该系统能够满足以上的要求,使用户可以快速准确的管理员工的信息。 1.3程序编译环境 使用Visual C++ 6.0集成环境来编辑、编译并运行程序。Visual C++ 不仅仅是一个编译器。它是一个全面的应用程序开发环境,使用它你充分利用具有面向对象特性的C++ 来开发出专业级的Windows 应用程序。为了能充分利用这些特性,你必须理解C++ 程序设计语言。

职工信息管理系统(C语言)

课程设计(论文) 题目名称职工信息管理系统 课程名称C语言程序课程设计 学生姓名刘丹 学号1241302028 系、专业信息工程系、计算机科学与技术专业指导教师黄磊 2013年6月 6 日

目录 1 前言 (1) 2 需求分析 (1) 2.1 课程设计目的 (1) 2.2 课程设计任务 (2) 2.3 设计环境 (2) 2.4 开发语言 (2) 3 分析和设计 (3) 3.1 模块设计 (4) 3.2 系统流程图 (4) 3.3 主要模块的流程图 (5) 4 具体代码实现 (7) 5 课程设计总结 (9) 5.1程序运行结果/ 预期运行结果 (9) 5.2 课程设计体会 (13) 参考文献 (14) 致谢 (14)

1 前言 编写一个程序来处理职工信息管理系统。通过结构体数组来存放输入的每一位职工的记录(包括工号、姓名、性别、年龄、学历、工资、住址、电话等),然后将其录入的职工信息以文件形式保存。然后输入名字、工号、学历查询该同学的信息,并且对其进行浏览、查询、修改、删除等基本操作,建立职工信息管理的文件。 2 需求分析 1、程序结构合理 2、界面比较美观 3、最好使用结构、体指针 4、输入时有提示,输出美观,整齐 职工信息由工号、姓名、性别、年龄、学历、工资、住址、电话等构成。 功能要求: (1)系统以菜单方式工作 (2)职工信息录入功能(职工信息用文件保存)——输入 (3)职工信息浏览功能——输出 (4)查询和排序功能:(至少一种查询方式)——算法 (5)按职工号查询 (6)按学历查询等 (7)职工信息删除、添加功能 2.1 课程设计目的 学生在教师指导下运用所学课程的知识来研究、解决一些具有一定综合性问题的专业课题。通过课程设计(论文),将课本上的理论知识和实际有机结合起

#员工管理信息系统的设计与实现

计算机科学和工程学院 课程设计报告 题目全称:员工管理信息系统的设计和实现—岗位和薪金信息管理 学生学号:2606005011姓名:李伟德 指导老师:刘勇国职称:副教授 指导老师评语: 签字: 课程设计成绩: 设计过程表现设计报告质量总分 一、实验室名称:计算机学院软件实验室 二、实验项目名称:员工管理信息系统的设计和实现—岗位和薪 金信息管理 三、实验学时:32 四、实验原理: 员工管理信息系统是由员工管理,部门管理,岗位管理以及薪金管理四部分组成。系统前台采用Visual Stdio 2005 工具开发而成,开发语言是C#程序设计语言,主要是因为C#是微软为.NET平台量身定做的编程语言,它是一种现代面向对象程序设计语言,使程序员能够快速地在.NET平台上开发种类丰富的使用程序,它继承了C++和Java的语法,去掉了C++中的许多复杂和容易引起问题的东西,是由C和C++发展而来的一种“简单、高效、面向对象、类型安全”的程序设计语言,其综合了Visual Basic的高效率和C++的强大功能。 系统后台的数据库采用Miscrosoft Access 2003数据库,主要依据是考虑到系统的数据规模并不大,如果用SQL Server 2005等数据库会造成浪费,而且维护起来比较难。而Access数据库是一个轻量级的数据库,其具有简单,方便的特性,已经满足我们的需求。 五、实验目的: 1.使学生掌握数据库的实现原理,了解SQL的查询命令,并能在实践中使用。

2.使学生学会使用C#语言进行程序设计,了解Vistual Stdio 2005 的开发工具的原理, 并设计出实际可行的项目。 3.加强学生的动手能力,把课堂上学到得东西,融入到实际的项目,达到学以致用的目的。 4.锻炼学生的思维能力,使学生能够领略计算机编程的实现方法,达到举一反三的效果。 六、实验内容: 在员工信息管理系统中完成“岗位”和“薪金”信息管理功能。 岗位信息管理功能包括: 1. 添加岗位:可以添加岗位名称,岗位描述等信息。 2. 删除岗位:可以删除岗位名称,岗位描述等信息。 3. 修改岗位:可以修改指定岗位的岗位名称,岗位描述等信息。 4. 查询岗位:可以查询指定岗位的岗位名称,岗位描述等信息。 薪金信息管理功能包括: 1. 添加员工薪金信息:可以添加员工姓名,月份,备注,薪金等信息。 2. 删除员工薪金信息:可以删除指定员工的姓名,月份,备注,薪金等信息。 3. 修改员工薪金信息:可以修改指定员工的姓名,月份,备注,薪金等信息。 4. 查询员工薪金信息:可以查询指定员工的薪金等信息。 七、实验器材(设备、元器件): 1.一台Windows XP平台或以上的PC机; 2.Vistual Stdio 2005开发软件及Microsoft ACCESS2003数据库软件; 八、实验步骤: 1、设计系统结构组成 系统提供了一套员工综合信息管理平台,使得系统管理人员对公司的岗位进行分类,进而确定各个岗位所对应的部门信息,在已有部门信息的基础上能够对所有员工信息进行分类管理。主要功能有:岗位设置、员工个人信息管理、员工所属部门信息管理、员工薪金信息管理。 系统模块设计划分如下: 员工薪金信息模块:可以删除、添加、修改和查询员工薪金信息; 岗位设置模块:可以删除、添加、修改和查询岗位; 它们之间既是相互联系同时又是彼此独立的,整个框架结构如图1所示。

职工信息管理系统

职工信息管理系统 LG GROUP system office room 【LGA16H-LGYY-LGUA8Q8-LGA162】

信息科学与技术学院 程序设计基础课程设计报告 题目名称:职工信息管理系统 学生姓名:董吉华 学号:189 专业班级:电子信息工程1班 指导教师:郭理 2017年 12月 30日 目录 一.课程设计题目与要求 (3) 设计题目 (3) 设计要求 (3) 二.总体设计 (4) 总体功能框架 (4) 数据结构概要设计 (5) 三.详细设计 (6) 数据结构详细设计 (6) 系统功能详细设计 (7)

主函数 (7) 主界面函数 (9) 输入函数 (11) 输出函数 (12) 查找函数 (14) 排序函数 (16) 删除或修改函数 (18) 结束函数 (20) 四.运行结果 (21) 主界面 (21) 主菜单界面 (23) 录入职工信息界面 (24) 五.课程设计总结 (34) 编程中的问题及解决方法 (34) 小结 (34) 心得体会 (34) 程序设计方法 (35) 参考文献 (35) 《职工信息管理系统》 一.课程设计题目与要求 设计题目 职工信息管理系统

设计要求 职工信息包括职工号、姓名、性别、年龄、学历、工资、住址、电话等(职工号不重复)。试设计一职工信息管理系统,使之能提供以下功能:系统以菜单方式工作 (2)职工信息录入功能(职工信息用文件保存) (3)职工信息浏览功能 (4)查询和排序功能:(至少一种查询方式) 按工资查询 按学历查询等 (5)职工信息删除、修改功能 二.总体设计 总体功能框架 实现航班信息的输入,航班信息的输出,航班信息的查找,订票系统,退票系统功能 三.

一个项目设计:职工信息管理系统

#include #include #include #include #include #define N 100 struct employee//职工基本情况 { int num; //工号 char name[10]; //姓名 int sex; //性别 int position; //职位 int age; //年龄 int cult; //学历 int salary; //工资 int state; //健康情况 long tel; //联系电话 char adr[50]; //住址 }em[N]; int num[N]={0}; struct employee newem; void mainmenu(); //主菜单 void input(); //输入模块 void display(); //显示模块 void del(); //删除模块 void add(); //添加模块 void count(); //统计模块 void change(); //修改模块 int changeposition(); //修改职位 int changecult(); //修改学历 int changesalary(); //修改工资 int changestate(); //修改身体状况void changmany(); //修改多项信息void print(); //打印函数 void select(); //查询模块 void numselect(); //按工号查询void nameselect(); //按姓名查询

员工信息管理系统需求分析报告

1 引言 1.1 背景 随着社会的发展,人类科技文明的进步,企业为人类生活所创造的财富是巨大的,企业在社会经济所起到的重要作用更无法估量的。并且随着我国与国际上先进的现代化企业的接轨,如:合资,独资企业的不断涌现,新型企业内部对其自身现代化信息管理的水准的要求也在不断提升。因此,不同的企业都需要有适合自己管理规范标准的企业“员工管理系统” ,从而达到提高企业的管理水平、提高经济效益为社会、为人类服务的目的。另外,事业单位拥有“员工管理系统”可以科学、全面、高效进行人事管理水平。因此,针对事业单位所开发的“员工管理系统” ,也可以是功能全面地实用的“人事管理系统”。 1.2 目的 学习使用Java设计与开发“员工信息管理系统”,能把多所学到的Java6 技术、数据库技术更好的进行融合,让学生在Eclipse 开发平台上进行一次有意义的实战开发演戏。在此系统的设计过程中,学生可以充分展示个人的发散思维以及小组集体的创造力,从而达到开发别具风格与特色的“员工管理系统”。使学生在此综合实训过程中达到学会学习软件设计的目的,达到培养自身综合素质的能力。 为下一阶段的学习,也为走向社会工作岗位奠定良好的基础。 1.3 意义 编写此篇文档的主要意义是让使用该系统的人可以清晰地明白该系统的主要功能,使用户可以合理的应用该系统,减少由于用户的不当操做给该系统所带来的危害。 1.4 参考文献 《Java 学习笔记》编著:林信良出版社:清华大学出版社 《软件需求工程》编著:毋国庆、梁正平、袁梦霆、李勇华出版社:机械工业出版社

2项目概述 2.1总体功能描述 员工信息管理系统是对员工信息的管理,其中包括对新员工信息的录入, 对在职员工信息进行修改,删除,查询。整个项目大致划分为增加员工基本 信息,修改员工基本信息,删除员工信息,查询员工基本信息这四大模块, 也是整个项目的核心。 功能模块 2.2用户特点 员工工资管理系统面向企业,属于企业信息管理的一部分。操作本软件 的工作人员只需具备基本的计算机知识,而系统的维护人员需要具备Eclipse 和数据库的相关知识。 2.3假定和约束 本程序在开发的过程中,分为技术实现和软件工程两大部分。两部分都 有侧重点,若技术支持出现故障或疑难问题无法解决、程序开发出现偏差, 会延误工程进度,影响工程的按期完成。若软件工程陈述出现问题,部分描 述含糊不清,则会影响系统的完整性与可继承性。在管理方面,如管理者没 有预见性,对出现的问题无法提出可行的解决手段,都会影响开发模块之间 的互动,从而影响工程的顺利开展,导致工程无法按期开工。 管理各部门及员工 个人信息 请假 查看考勤 支岀报销 工资发放 员工信息维护 奖罚管理 请假考勤管理 数据维护 系统维护

企业职工信息管理系统

企业职工信息管理系统 EWIMSystem(Enterprise Workers Information Manager System) 目录 第一章绪论 (3) 1.1 相关背景 (3) 1.2 开发目的 (3) 1.3 论文内容 (3) 1.4 意义 (3) 1.5 分工 (4) 第二章系统需求分析 (5) 2.1系统功能需求分析 (5) 2.2辅助功能需求分析 (6) 2.2.1打印报表.............................................................................. 错误!未定义书签。 2.2.2修改密码 (6) 2.3 软件的运行环境 (6) 2.3.1 硬件平台 (6) 2.3.2 软件平台 (6) 2.3.3 开发环境 (6) 第三章系统功能设计与实现 (7) 3.1 系统目标设计 (7) 3.2 数据库分析与设计 (9) 3.2.1数据库表设计 (9) 3.2.2数据库表关系图 (12) 3.3 系统功能概要设计 (13) 3.4 系统功能详细设计 (14)

企业职工信息管理系统 摘要 随着科技的不断发展,企业的不断壮大,传统的企业人事管理主要以人工为主,人工管理既费力、费时,又容易出现错误,严重制约了企业员工管理的实施,目前人工管理已不能满足市场的需要,所以建立现代化的智能化的企业职工信息管理系统势在必行。这样可以提高企业的管理效率,同时减轻了人事部门的工作量,使原本复杂和枯燥无味的工作变得简单而轻松。 企业职工信息管理系统是一个基于C/S模式的管理系统。 关键字:企业职工信息管理系统,C/S模式

职工信息管理系统

职工信息管理系统 1.可行性分析 在当今社会,互联网的发展,给人们的工作和生活带来了极大的便利和高效,信息化,电子化已经成为节约运营成本,提高工作效率的首选。 当前大量企业的员工管理尚处于手工作业阶段,不但效率低下,还常常因为管理的不慎而出现纰漏。因此部分企业需求,设计企业员工信息管理系统,以帮助企业达到员工管理办公自动化、节约管理成本、提高企业工作效率的目的。员工信息管理系统主要对企业员工的信息进行集中管理,方便企业建立一个完善的、强大的员工信息数据库,它是以SQL2000数据库作为开发平台,使用java编写程序、完成数据输入、修改、存储、调用查询等功能。并使用SQL 2000数据库形成数据,进行数据存储。本项目开发计划旨在明确规范开发过程,保证项目质量,统一小组成员对项目的理解,并对其开发工作提供指导;同时还作为项目通过评审的依据。并说明该软件开发项目的实现在技术上、经济上和社会因素上的可行性,评述为了合理地达到开发目标可供选择的各种可能实施方案,说明并论证所选定实施方案的理由。 1.1 技术可行性 根据用户提出的系统功能、性能及实现系统的各项约束条件,根据新系统目标,来衡量所需技术是否具备。本系统主要采用数据库管理方法,服务器选用MySQL Server 数据库,他是它是目前能处理所有中小型系统最方便的流行数据库,它能够处理大量数据,同时保持数据的完整性并提供许多高级管理功能。它的灵活性、安全性和易用性为数据库编程提供了良好的条件。硬件方面,该系统短小精悍对赢家没有太大要求,只要能够运行windows操作系统就可以很好的运行该软件。 1.2操作可行性 由系统分系可以看出本系统的开发在技术上具有可行性。首先系统对于服务器端和客户端所要求的软、硬件的最低配置现在大多数的用户用机都能达到。本系统对管理人员和用户没有任何的特殊要求,实际操作基本上以鼠标操作为主并辅以少量的键盘操作,操作方式很方便。因此该项目具有良好的易用性。用户只要具备简单的应用计算机的能力无论学历,无论背景,均可以使用本系统,用户界面上的按钮的功能明确,用户一看就可以了解怎么使用本系统,以及本系统能够完成的功能,因此本系统在操作上是可行的。 1.3经济可行性 估算新系统的成本效益分析,其中包括估计项目开发的成本,开发费用和今后的运行、维护费用,估计新系统将获得的效益,估算开发成本是否回高于项目预期的全部经费。并且,分析系统开发是否会对其它产品或利润带来一定影响。本系统作为一个课程设计,没有必要考虑维护费用,以及本系统可获得的效益等问题。 1.4法律及社会效益方面的可行性

C Programming Design Report: Employee Information Management System

Employee Information Management System
Problem Requirements
Design Goals
Overall Design
Detailed Design
Debugging and Testing
Source Code
Summary

Employee Information Management Program

I. Problem Requirements
1. Problem description: design a system to manage employee information. Employee information includes employee ID, name, sex, age, education level, salary, address, telephone, etc. (employee IDs are not duplicated).
2. Requirements: the system must provide the following functions:
1) menu-driven operation;
2) browsing employee records;
3) querying employee records (query modes: by education level and by employee ID);
4) deleting employee records;
5) modifying employee records;
6) entering employee records;
with employee records stored in a text file.

II. Design Goals

Since the requirements put employee information in a file, the program must provide file input and output operations; since employee records must be browsed, it must provide display, search, and sort operations; and it must provide a keyboard-driven menu for selecting functions.

III. Overall Design
From the analysis above, the system divides into these modules: input, modify, delete, search, and display.
1. Employee information management system
1.1 Enter records
1.2 Query records
1.3 Delete records
1.4 Modify records

IV. Detailed Design
1. Main function: the main function is kept simple, providing only the calls to the input, processing, and output functions; the functional modules are selected through a menu.

menu();
int a;
char b;
printf("Select an option\n");
scanf("%d", &a);
exa:
switch (a) {
case 1:
    printf("Enter employee records\n");
    printf("\n");
    input();

    break;
case 2:
    printf("Browse employee records\n");
    printf("\n");
    display();
    break;
case 3:
    printf("Query employee records\n");
    printf("\n");
    search();
    break;
case 4:
    printf("Modify employee records\n");
    printf("\n");
    xiugai();    /* "xiugai" = modify */
    break;
case 5:
    printf("Delete employee records\n");
    printf("\n");
    del();
    break;
/*
case 6:
    printf("Add employee records\n");
    printf("\n");
    add();
    break;
*/
case 6:
    exit(0);
    break;
default:
    break;
}
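The requirements above put the records in a text file, which input() and the other modules would read and write. A minimal, self-contained sketch of a file-backed input() under that assumption; the struct layout and the file name employee.txt are illustrative, not from the source:

#include <stdio.h>

struct employee {
    int  num;        /* employee ID */
    char name[10];   /* name */
    char sex[4];     /* sex */
    int  age;        /* age */
    char cult[10];   /* education level */
    int  salary;     /* salary */
    char adr[50];    /* address */
    char tel[15];    /* telephone */
};

/* Read one record from stdin and append it to the data file. */
void input(void)
{
    struct employee e;
    FILE *fp = fopen("employee.txt", "a");   /* hypothetical file name */
    if (fp == NULL) {
        printf("Cannot open file\n");
        return;
    }
    printf("Enter: ID name sex age education salary address telephone\n");
    scanf("%d%9s%3s%d%9s%d%49s%14s",
          &e.num, e.name, e.sex, &e.age, e.cult, &e.salary, e.adr, e.tel);
    fprintf(fp, "%d %s %s %d %s %d %s %s\n",
            e.num, e.name, e.sex, e.age, e.cult, e.salary, e.adr, e.tel);
    fclose(fp);
}

Appending one whitespace-separated line per record keeps browse and search simple: each module can re-read the file with fscanf in the same field order.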

Employee Information Management System

Employee Information Management System

Abstract
With the rapid development of computers, their application has become very widespread, and they play an important role in people's production, life, work, and study. Consider a modern company with several thousand employees: how is such a vast archive of employee information to be managed? A fully functional employee information management system becomes indispensable. This thesis details the development of the system under the ASP.NET framework, adopting the strategy of "comprehensive planning from the top down, application development from the bottom up"; it puts forward the basic objectives of querying, managing, and updating employee and department information, and expounds the system's structural design and functional design scientifically and rigorously from the angle of software engineering. Everything from querying to managing employee information is automated, which raises work efficiency.

The system adopts a B/S (browser/server) structural design, giving the enterprise's personnel department an easy-to-operate, widely applicable, extensible personnel management system that makes managing the enterprise's employees more convenient. The benefit of this network-based management is that the information of the enterprise's many employees can be managed dynamically: modification, addition, and deletion are all very convenient, with no need to keep a huge physical archive and hunt for material through piles of documents, which reduces the chance of errors in this important work.

Drawing on the author's practice of designing and developing a small-to-medium employee information management system, this thesis sets out the functions a personnel management package should have and their design and implementation, mainly in three areas: 1. querying employee and department information; 2. managing employee and department information (adding, deleting, and modifying); 3. selecting the employee with the best work performance each month.

Keywords: employee information management, ASP.NET, B/S

Abstract
With the rapid development of computers, their application has become very extensive, and they play an important role in people's production, life, work, and study. A modern company may possess a staff of several thousand; how is such a huge archive of employee information to be managed? An employee information management system with complete functions then becomes indispensable. This thesis introduces the detailed course of developing such a system under the ASP.NET framework, adopting the strategy of "comprehensive planning from the top down, application development from the bottom up"; it puts forward the basic objectives of querying, managing, and updating employee and department information, and expounds the system's structural design and functional design scientifically and rigorously from the angle of software engineering.

Employee Information Management System Design (C Language)

Programming Course Design Report
Employee Information Management System Design

Major: Computer Science and Technology (Software Engineering (NIIT))
Student Name:
Class:
Student ID:
Advisor:
Completion Date: July 2011

Contents
1 Course Design Goals
2 Course Design Contents
3 Design Flowchart
4 Source Code Listing
5 Summary

The Design of an Employee Information Management System

1 Course Design Goals
1. Deepen understanding of the material in the C Programming course and master the methods and steps for developing C applications;
2. Further master the ability to design programs in C;
3. Further understand and apply the ideas and methods of structured program design;
4. Initially master the basic method of developing a small practical system;
5. Learn the basic method of debugging a longer program;
6. Learn to express algorithms with flowcharts or N-S diagrams;
7. Master the ability to write design and development documents (the course design report).

2 Course Design Contents
Design an employee information management system. Employee information includes employee ID, name, sex, age, education level, salary, address, telephone, etc. (employee IDs are not duplicated). The system must provide the following functions:
1. the system works through menus;
2. entering employee records (saved in a file) -- input;
3. browsing employee records -- output;
4. query and sort functions (at least one query mode) -- algorithms, e.g. query by salary or query by education level (see the sort sketch below);
5. deleting and modifying employee records (optional).

3 Design Flowchart

[Design flowchart not reproduced in the source.]
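Requirement 4 of the contents above asks for query and sort functions. A minimal, self-contained sketch of sorting records by salary with the standard library's qsort; the record layout and sample data are illustrative, not from the source:

#include <stdio.h>
#include <stdlib.h>

struct worker {
    char name[20];
    int  num;       /* employee ID */
    int  salary;    /* salary */
};

/* Comparator for qsort: descending by salary, written overflow-safe. */
static int by_salary_desc(const void *a, const void *b)
{
    const struct worker *wa = a, *wb = b;
    return (wb->salary > wa->salary) - (wb->salary < wa->salary);
}

int main(void)
{
    struct worker w[3] = {
        { "Zhang", 101, 3200 },
        { "Li",    102, 4100 },
        { "Wang",  103, 2800 },
    };
    qsort(w, 3, sizeof w[0], by_salary_desc);    /* sort in place */
    for (int i = 0; i < 3; i++)
        printf("%-8s %d %d\n", w[i].name, w[i].num, w[i].salary);
    return 0;
}

In the full program the array would be filled by reading the data file rather than from literals; the comparator is the only piece that changes between sort keys.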

(1) Menu function

void menu()    /* menu function */
{
    printf("      ☆☆☆ Computer Science and Technology, Software Technology ☆☆☆\n");
    printf("\n");
    printf("                ∮1010704422 杨婷婷∮\n");
    printf("\n");
    printf("    ****************** Employee Information Management ******************\n");
    printf("    1. Enter employee records");
    printf("    2. Browse employee records\n");
    printf("    3. Query employee records");
    printf("    4. Delete employee records\n");
    printf("    5. Add employee records");
    printf("    6. Modify employee records\n");
    printf("    7. Quit\n");
    printf("    ************************ Thank you ************************\n");
    printf("\n");
    printf("\n");
}

(2) Entering employee records

void append()    /* fp and one are globals declared elsewhere in the program */
{
    if ((fp = fopen("worker.xls", "a")) == NULL) {
        printf("\nCannot open the file!");
        exit(1);    /* exit() requires a status argument */
    }
    printf("\nEnter the new employee's record (name, ID, sex, age, education, position, salary, telephone, address):\n");
    scanf("%s%s%s%s%s%s%s%s%s", one.name, one.num, one.sex, one.age, one.record,
          one.position, one.wanges, one.tel, one.addr);
    fprintf(fp, "%-10s%-8s%-5s%-5s%-10s%-8s%-8s%-10s%-15s\n", one.name, one.num,
            one.sex, one.age, one.record, one.position, one.wanges, one.tel, one.addr);
    fclose(fp);
}

(3) Querying employee records; lookup is by employee ID, name, and so on

void search()
{
    int l;
    printf("\t\t\t\t* Search by name:      press 1 *\n");
    printf("\t\t\t\t* Search by education: press 2 *\n");
    printf("\t\t\t\t* Search by ID:        press 3 *\n");
    scanf("%d", &l);
    if (l > 0 && l < 4) {
        switch (l) {
        /* ... the three lookup branches are truncated in the source ... */
        }
    }
}
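The truncated switch above would dispatch to one lookup routine per search key. A minimal sketch of the by-ID branch, assuming records were written by append() in the whitespace-separated layout shown there; the helper name and field widths are assumptions, not from the source:

#include <stdio.h>
#include <string.h>

/* Scan worker.xls line by line and print every record whose ID matches. */
void search_by_num(const char *target)
{
    char name[20], num[20], sex[8], age[8], record[20],
         position[20], wanges[20], tel[20], addr[40];
    FILE *fp = fopen("worker.xls", "r");
    int found = 0;
    if (fp == NULL) {
        printf("Cannot open the file!\n");
        return;
    }
    while (fscanf(fp, "%19s%19s%7s%7s%19s%19s%19s%19s%39s",
                  name, num, sex, age, record, position,
                  wanges, tel, addr) == 9) {
        if (strcmp(num, target) == 0) {
            printf("%s %s %s %s %s %s %s %s %s\n",
                   name, num, sex, age, record, position, wanges, tel, addr);
            found = 1;
        }
    }
    if (!found)
        printf("No matching record.\n");
    fclose(fp);
}

The by-name and by-education branches would be identical except for which field is compared against the search key.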

Human Resource Management System: Requirements Analysis Report

Human Resource Management System: Requirements Analysis Report

1. Requirements Elicitation and Analysis
1.1 Business Requirements
With the development of computer, network, and information technology, office systems are becoming more systematic, scientific, and networked. A networked office-automation system is an office solution built on rapidly developing computing and networking whose main goals are information exchange and sharing and support for collaborative work. This system manages the company's human resources, giving HR staff a simple, reliable, user-friendly tool that is easy to manage and use; it manages all HR data uniformly, avoids duplicated data storage and processing, raises work efficiency, and reduces the complexity of data handling.

1.2 User Requirements
The HR management system acts as a connecting bridge within the enterprise: through its information links to the other management modules, it drives the whole enterprise organically and efficiently, so that work in every part of the enterprise proceeds more smoothly thanks to the system's efficiency and simplicity.
For the enterprise: effectively manage employee information; add, delete, and modify employee records; pay salaries; track attendance; and handle recruiting.
For employees: each employee can view their own information and query salary payments and professional-title evaluations.

1.3 Functional Requirements
The functions this system implements divide mainly into the following modules:

A. information input module; B. user query module; C. system maintenance module; D. output and display module; E. attendance module; F. recruiting module.

This system integrates several functions in one application; users complete the desired tasks simply by following the prompts and entering the required information with mouse and keyboard. All prompts are displayed in Chinese for ease of use.

The system's main functions are:
A. Information input module: regular employees and administrators enter login and query conditions over the network.
B. Query module: retrieve the stored records (salary, bonus, etc.) against the query condition (employee ID) a regular employee enters.
C. System maintenance module: maintainers change their personal passwords, keep the database up to date, add new records and remove expired ones, and query usage statistics for the system.
D. Output and display module: display the retrieved result set, paginating large result sets; validate the query conditions regular employees enter and report errors; and report the success or failure of every administrator maintenance operation.
E. Attendance module: record and summarize employees' clock-in/clock-out and attendance.
F. Recruiting module: display recruiting information and keep it up to date.

1.4 Non-functional Requirements
1. The system must run strictly under the configured security and permission mechanism and effectively keep unauthorized users out.

C Programming: Employee Information Management System

C Language Course Design

C Language Course Design Assignment

I. Topic: Employee Information Management System

II. Goals and Requirements
Goals: master the basics of C and editing skills; basically grasp the ideas and methods of structured program design.
Requirements: design an employee information management system that provides the following functions:
1. an interface for invoking each function; the main interface and each function's screens should be as clear and attractive as possible;
2. input: enter employee records (saved in a file), several records at a time;
3. browse: display all employee records;
4. search: (a) by employee ID, (b) by education level, and (c) by telephone number, displaying the matching records;
5. delete: delete an employee's record given the employee's name (see the sketch following this assignment);
6. add: add a new employee's record;
7. modify: modify an employee's record given the employee's name;
8. quit the employee information management system.

III. Information Description
Employee information includes employee ID, name, sex, age, education level, salary, address, telephone, etc.

IV. Solution Approach
1. First do requirements analysis to clarify the system's functions and tasks;
2. In the overall design, fix the module structure, divide the functional modules, and allocate the required functions to the unit modules; determine the connections between modules, the data structures, file structures, and database schema, and the testing methods and strategy;
3. In the detailed design, choose an algorithm for each module and describe its detailed process with a suitable tool (a flowchart); fix each module's data structures and interface details, including the interfaces to the outside of the system, the user interface, and the interfaces to the other internal modules;
4. Write the C code from the analysis.
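Requirement 5 deletes a record keyed by the employee's name. A minimal sketch of the usual file-rewrite approach: copy every non-matching record to a temporary file, then swap it in. The file names and the one-record-per-line layout are assumptions, not from the assignment:

#include <stdio.h>
#include <string.h>

/* Delete the record whose first field (the name) matches `target`.
   Records are assumed to be one whitespace-separated line each.
   Returns 1 if a record was removed, 0 if none matched, -1 on I/O error. */
int delete_by_name(const char *target)
{
    char line[256], name[64];
    FILE *in  = fopen("worker.txt", "r");    /* hypothetical data file */
    FILE *out = fopen("worker.tmp", "w");
    int removed = 0;
    if (in == NULL || out == NULL) {
        if (in)  fclose(in);
        if (out) fclose(out);
        return -1;
    }
    while (fgets(line, sizeof line, in) != NULL) {
        if (sscanf(line, "%63s", name) == 1 && strcmp(name, target) == 0) {
            removed = 1;                     /* skip the matching record */
            continue;
        }
        fputs(line, out);                    /* keep everything else */
    }
    fclose(in);
    fclose(out);
    remove("worker.txt");
    rename("worker.tmp", "worker.txt");
    return removed;
}

Rewriting through a temporary file keeps the data file intact if the program is interrupted mid-deletion; the modify function of requirement 7 can reuse the same pattern, writing an edited line instead of skipping it.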

V. Schedule
The course design lasts two weeks, completed in five stages:
1. Analysis and design. Study and research the problem independently under the teacher's guidance, clarify the design requirements, and find a realistic approach, following the steps of requirements analysis, overall design, and detailed design. This stage is completed in the first 1-2 days;
2. Coding and debugging. Write the C code from the design and debug it until the functions the topic requires work. This stage is completed in days 3-7;
3. Summary report. Summarize the design work and write the course design document, covering requirements analysis, overall design, detailed design, coding, and the steps and content of testing. This stage is completed in days 8-9;
4. Assessment.

VI. Course Design Summary
The final report must include requirements analysis, overall design, detailed design, coding (with the programming steps written out in detail), the steps and content of testing, a course design summary, and references.

VII. References
C程序设计 (C Programming), 3rd ed., Tan Haoqiang, Tsinghua University Press.
C程序设计题解与上机指导 (C Programming: Solutions and Lab Guide), 3rd ed., Tan Haoqiang, Tsinghua University Press.

Company Business Management System Report

Contents
I. Overview
§1.1 Project Background
§1.2 Project Goals
II. Requirements Analysis
§2.1 Business Description
§2.2 Functional Requirements
§2.2.1 Basic-Unit Management
§2.2.1.1 Employee Information Management
§2.2.1.2 Department Information Management
§2.2.1.3 Application Type Management
§2.2.1.4 Application Status Management
§2.2.1.5 Education Level Management
§2.2.1.6 Marital Status Management
§2.2.2 Operator Management
§2.2.3 Application Information Management
§2.3 Performance Requirements
§2.3.1 Hardware Requirements
§2.3.2 Software Requirements
III. System Functional Module Division
§3.1 System Module Design
IV. Database Design
§4.1 Entity and Attribute Diagram (E-R Diagram)
§4.2 Relational Schema Design
V. Detailed Design
§5.1 Login Module Design
§5.2 Main Interface Module Design
§5.3 Basic-Unit Settings Module Design
§5.4 Operator Module Design
§5.5 Application Information Management Module Design

§5.6 System Maintenance Module Design
§5.7 Printing Module Design
VI. Summary
VII. Issues Encountered
VIII. References
IX. User Manual

I. Overview
§1.1 Project Background
A company is computerizing its business workflow. The company has several departments, each with several staff members, and much of its business requires review, approval, supervision, and inspection. This system was designed to monitor each of these workflows.

§1.2 Project Goals
The company business management system manages the company's business information. Its main functions include basic-unit management, operator management, and application management. Basic-unit management covers employee information, departments, application types, application statuses, education levels, and marital status; application management covers adding (submitting), deleting, querying, approving, printing, and exporting application information.

II. Requirements Analysis
§2.1 Business Description
A company employee submits applications of various kinds, such as leave requests or equipment purchases, through "add application." Each application must then pass a first-level and a second-level review, the second-level review being the final one. First- and second-level review rights are assigned according to the magnitude of a privilege value.

§2.2 Functional Requirements
§2.2.1 Basic-Unit Management
§2.2.1.1 Employee Information Management
● add employee records
● modify employee records
● delete employee records
● query employee records
● preview, print, and export employee records

Enterprise Employee Management System

Company Employee Management System
Design Specification

School: School of Information Engineering
Class: Computer Science 11 (Network)
Student ID and Name:

Introduction
In today's surplus talent market, enterprises' demand for capable people keeps growing, internal personnel changes and departmental restructuring are accelerating, and traditional paper personnel files can no longer keep pace with staff turnover, forcing enterprises to adopt new methods for managing employee information. The rapid progress of science and technology has changed human life enormously, and the fast development of computing has spread computer applications widely through every industry. The arrival of the information age is an irresistible trend, and human civilization is entering a new era. Employee management systems, with their advantages of convenience, speed, low cost, and environmental friendliness, are therefore gradually entering every industry and field, freeing management from the traditional approach: they raise efficiency, lighten the formerly heavy workloads of staff, speed up the updating of information, and let enterprise management learn employees' information at the first moment, so as to deploy employees further.

At the same time, to meet the needs of modern enterprise operation and development, to raise work efficiency, guarantee the quality of employee information management, and help the enterprise set good operating policies and make decisions quickly and accurately, it is necessary to develop an employee information management system.

Contents
Introduction
I. Requirements Analysis
1.1 Purpose
1.2 Background
1.3 Definitions
1.4 References
II. Program Structure
2.1 System Structure Diagram
2.2 System Flowchart
2.3 Database E-R Diagram
III. Module Design
3.1 Login Screen
3.2 Main Screen
3.3 System Management Screen
3.4 Employee Basic Information Management Screen
3.5 Basic Information Query Screen
3.6 Sidebar Password-Change Screen
3.7 Department Information Query Screen
IV. Design Summary

Enterprise Employee Information Management System

Undergraduate Thesis
The Design and Implementation of an Enterprise Employee Information Management System
EMPLOYEE INFORMATION MANAGEMENT SYSTEM DESIGN AND IMPLEMENTATION

School (Department):
Major and Class:
Student Name:
Advisor:
May 25, 2012

The Design and Implementation of an Enterprise Employee Information Management System

Abstract
The Internet is developing ever faster, bringing great convenience and efficiency to people's work and life; informatization and digitization have become the first choice for cutting operating costs and raising work efficiency. In today's fast-advancing information technology, managing the staff of every department with the help of computers through an employee information management system offers personalized service to the enterprise's human resources managers and, at the same time, a measure of convenience to the enterprise's employees.

The system has several strengths: complete functionality, simple and convenient operation, a user-friendly interface, and thorough security and confidentiality settings. It greatly lightens the workload of operators and users and raises both the efficiency of enterprise employee information management and the enterprise's level of informatization.

Starting from a preliminary survey of the employee information management system, this thesis introduces in detail its requirements analysis and data-flow analysis, and then presents the overall structural design, the data structure and database design, the input/output design, and so on.

Keywords: J2EE, Mysql, struts2, enterprise employee information management

EMPLOYEE INFORMATION MANAGEMENT SYSTEM DESIGN AND IMPLEMENTATION

ABSTRACT
Nowadays the Internet is developing rapidly, bringing people's work and life great convenience and efficiency. With today's fast advances in information technology, an employee information management system that manages the staff of every department through computers provides personalized service for corporate human resources managers and also a measure of convenience for the enterprise's employees.

This system has various strengths: its functionality is complete, it is convenient to use, its interface is user-friendly, and its security and confidentiality settings are thorough, which greatly reduces the workload of operators and users and raises both the efficiency of enterprise employee information management and the enterprise's level of informatization.

Starting from a preliminary survey of the employee information management system, the thesis introduces in detail its requirements analysis and data-flow analysis, followed by the overall structure design, the data structure and database design, the input/output design, and so on.

KEYWORDS: J2EE, Mysql, struts2, Employee information management
