The ESO Nearby Abell Cluster Survey. XII. The mass and mass-to-light-ratio profiles of rich clusters



Extreme Ultraviolet Emission from Abell 4059


arXiv:astro-ph/0011274v1 14 Nov 2000

EXTREME ULTRAVIOLET EMISSION FROM ABELL 4059

Thomas W. Berghöfer(1,2), Stuart Bowyer(1), and Eric Korpela(1)
(1) Space Sciences Laboratory, University of California, Berkeley, CA 94720-7450, USA

ABSTRACT

We present the results of a search for Extreme Ultraviolet emission in A4059, a cluster with an X-ray emitting cluster gas. Our analysis of Extreme Ultraviolet Explorer (EUVE) Deep Survey observations of this cluster shows that it is associated with diffuse EUV emission. Outside the central 2 arcmin radius the entire EUV emission detected is explained by the low energy tail of the X-ray emitting gas. Within the central 2 arcmin region of the cluster we find a deficit of EUV emission compared to that expected from the X-ray gas. This flux deficit is discussed in the context of the cluster's cooling flow. The results derived for A4059 are compared to EUVE results obtained for other clusters such as Coma, Virgo, A1795, and A2199. As part of the study we have carried out a detailed investigation of the stability of the EUVE Deep Survey detector background. Based on long integrations of blank sky over 27 months we disprove claims of substantial time dependent changes in the detector background by R. Lieu and coworkers. We also show, contrary to the claim of R. Lieu and coworkers, that the images obtained with the detector are independent of the pulse height threshold of the detector over a substantial range of threshold settings.

1. Introduction

A4059 at ∼300 Mpc (z = 0.0487) is well studied in the X-ray, radio, and optical. It is relatively compact and is classified as richness class 1 (Schwartz et al. 1991 and references therein). It is dominated by a central cD galaxy, ESO 349-G010, and contains the radio galaxy PKS 2354-35 (Taylor, Barton, & Ge 1994). Taylor et al. (1994)

2. Data and Data Analysis

A4059 was observed on two separate occasions. The first observation was carried out in June 1998 and provided 39,373 s of data; the second observation was made in November 1998 and provided 93,473 s of data. During the observation in June 1998 the cluster center was placed near the boresight of the Deep Survey Telescope, about two arcmin away from the known dead spot of the detector. In order to avoid the dead spot and to minimize other detector position dependent effects on the detected EUV emission, the second observation of A4059 in November 1998 was carried out at two detector offset positions. During the first part of the second observation, 48,856 s of data was obtained ≈10′ to the right of the optical axis; the cluster was then observed for another 44,617 s ≈10′ to the left of the optical axis. A comparison of the cluster's radial EUV profiles obtained during the different EUVE observations shows that the effect of the dead spot during the June 1998 observation is small and not visible within the statistical error bars.

The reduction of the data was carried out with the EUV package built in IRAF. We intended to employ the analysis methods described in Bowyer, Berghöfer & Korpela (1999) and Berghöfer, Bowyer & Korpela (2000) in this work.
However,several aspects of these procedures have been questioned by Lieu et al.(1999).We note that the criticisms of Lieu et al.were made as declarative statements,and were unsupported by any analysis.Nonetheless,we have examined their points in detail.In specific,Lieu et al.claim that the character of the images obtained with the Deep Survey Telescope are dependent upon the lower level pulse height cutoffemployed,and that the cutoffthreshold for valid data varies substan-tially over the face of the detector.In addition, these authors claim that the background for obser-vations with the Deep Survey Telescope must be taken nearly simultaneously with the observation itself since this background varies over time.To examine the claim that the pulse-heightthreshold level setting varies widely over the face of the detector,wefirst assembled363Ks of data from a variety of blankfields.We then examined the effect of varying the lower level threshold on the summed image of thesefields.In specific,we constructed an image taken with a lower level set-ting of2700(a detector control number that is lin-early related to the voltage of the pulse-height cut-offsetting).We then constructed an image taken with a lower level cutoffof1280.We compared these two images using a cross correlation analy-sis and found the two had a correlation coefficient of0.97.Hence the use of a lower level threshold cutoffthat varies by well over a factor of two has essentially no effect on thefinal images and us-ing any threshold within this range provides valid data.We next studied the claim that the background data must be taken almost simultaneously with data from thefield to be studied because the sta-bility of the Deep Survey Telescope detector sen-sitivity function varies over time.We found that virtually every blankfield observation made with the EUVE Deep Survey Telescope is wellfit with a two-parameter profile.One profile is aflat back-ground that varies from observation to observa-tion(though typically by less than a factor of two).This reflects the different charged particle backgrounds encountered at different times.Such background changes are well known from other de-tectors in space-borne instruments.We note,for example,that there are several types of slowly varying background in the ROSAT PSPC data that are routinely accounted for in the analysis of data obtained with that instrument.The second profile needed to parameterize a blankfield is the detector sensitivity variation over the face of the detector.This profile accounts for telescope vignetting,obscuration by support structures in the detector window,and other fac-tors.To study the stability of this parameter,we added a number of blankfields,individually ob-tained over a period of27months,to obtain a to-tal data set of363Ks of blank sky.We subtracted the appropriateflat background determined from highly obscured regions at the outer most parts of thefield of view near thefilter frames from each of thesefields.We then convolved the data with a32 pixel wide Gaussian corresponding to the instru-ment resolution.We show the result in Figure1a.A previous assemblage of425Ks of data obtained from a different set of blankfields and processed in the same manor is shown as Figure1of Bowyer et al.(1999)and is reproduced here as Figure1b.We compared these two data sets and found these two images are correlated at the97%level over the active area of the Deep Survey Telescope detector.Given that the statistical errors in each element in each data set are about1.5%,this is 
consistent with there being no difference betweenthe images.We also employed a separate test to check the similarity of these two images.We com-puted the reducedχ2of thefit of one image to the other and obtained a result of1.05,again indicat-ing that the two data sets are identical to within the statistical uncertainties.Because this profile is stable there is no need to obtain a contemporaneous background for a particular individual observation.By combining a number of blankfield observations,this effective detector sensitivity profile can be established to any level of desired statistical certainty.We note that a similar type of correction for the variation of the detector sensitivity over the field of view is routinely used to correct ROSAT PSPC observations where the detector sensitiv-ity variations in that instrument are incorporated into an effective area exposure map(Snowden et al 1994).A similar procedure is also routinely used in ground based observations of astronomicalfields with CCD detectors where the procedure is desig-nated as“flatfielding”.In consideration of this demonstration of the validity of our analysis procedure,we began our analysis of the EUV data on A4059.Wefirst ex-amined the pulse-height spectrum of the detected events to exclude background events with pulse-heights significantly larger or lower than the source events.Events with low pulse-heights are primar-ily noise events,and events with very large pulse heights are primarily cosmic ray events.The pho-ton events are concentrated in a Gaussian profile on top of a noise background that rises exponen-tially towards lower pulse-heights.A demonstra-tion of this effect is shown in Figure2.Here we show a smoothed pulse height distribution(PHD) of data from a77Ks blankfield observation.The dashed line shows the PHD of the non-photonic background for data obtained from highly ob-scured regions of the detector at the outermost parts of thefield of view.The sold line represents the PHD of events obtained over the entire por-tion of the detector exposed to the celestialsky.Fig.1.—The EUVE Deep Survey Telescope De-tector Sensitivity over the face of the detector. Figure1a shows363Ks of data obtained from a number of blankfield observations over a period of27months.Figure1b shows a separate assem-blage of425Ks obtained from a different set of blankfields processed in the same manner.Both plots provide contour levels of95%,90%,85%, 75%,and65%.The two data sets are identical to within the statistical errors.The curves have been normalized to the detector area of the samples.The difference between the two curves represents the PHD of photon events. 
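The image comparison described above (the cross-correlation of the two summed blank-field images, and the reduced chi-square of the fit of one to the other) can be illustrated with a short Python/NumPy sketch. This is an illustrative reconstruction, not the pipeline actually used; the function name and arguments are ours.

import numpy as np

def compare_background_maps(map_a, map_b, err_a, err_b):
    """Compare two detector sensitivity maps over the active detector area.
    Returns the linear correlation coefficient of the two images and the
    reduced chi-square of the fit of one image to the other, with the
    statistical errors of both maps added in quadrature."""
    a, b = map_a.ravel(), map_b.ravel()
    corr = np.corrcoef(a, b)[0, 1]                    # cross-correlation coefficient
    sigma2 = err_a.ravel() ** 2 + err_b.ravel() ** 2  # combined variance per element
    chi2 = np.sum((a - b) ** 2 / sigma2)
    dof = a.size - 1                                  # degrees of freedom of the fit
    return corr, chi2 / dof

A correlation coefficient near 1 together with a reduced chi-square near 1, as quoted in the text (0.97 and 1.05), indicates that the two maps agree to within their statistical uncertainties.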
In the case of the June 1998 A4059 data set we excluded events with pulse heights below 2,500 and above 12,500. For the November observation we excluded events below 2,500 and above 13,500.

Corrections for the detector dead time and telemetry limitations were then applied and raw DS EUV images were produced for the observations at the three different detector positions. For each of these images a flat non-photonic background, determined from highly obscured regions at the outermost parts of the field of view near the filter frames as discussed above, was subtracted from the image. We computed azimuthally averaged radial emission profiles of A4059 centered on the respective detector positions of the cluster from the individual EUV Deep Survey images.

The detector sensitivity map (or flat field) that we constructed from the assemblage of 788 Ks of blank field observations was then used to determine radial sensitivity profiles for the three detector positions where the cluster was centered during the observations. These radial sensitivity profiles were then fit to the respective profiles of the distinct cluster observations at radii between 15–20′. In all three cases, the June 1998 observation and the two parts of the November 1998 observation, the fit of the sensitivity map to the observations provides an excellent representation of the data at radii larger than ≈5′.

In Figure 3 we show the combined observed azimuthally averaged radial EUV emission profile of the cluster A4059. The background and its statistical error are shown as a gray shaded area. Here we have used the entire 788 Ks background data set. Given this extensive data set, the statistical uncertainties in this background are small.

The data in this figure show diffuse EUV emission in A4059 that extends to a radius of 4 to 5′. Beyond this point the radial emission profile matches the detector sensitivity function, showing there is no EUV emission at larger radii.

We next determined the EUV contribution of the low energy tail of the X-ray emitting cluster gas in A4059. To this end we analyzed 5,439 s of ROSAT PSPC archival data on this cluster. Using standard procedures implemented in the EXSAS

Fig. 2.—The pulse height distribution of background and photon events in the Deep Survey Telescope detector (counts per pixel per pulse-height unit versus pulse height). The dashed line shows the non-photonic background of the detector obtained from regions in which celestial EUV photons are obscured. The solid line shows the distribution over the unobscured portion of the detector that views the sky. The difference between the two represents the pulse height distribution of photon events.

Fig.
3.—The azimuthally averaged radial EUV emission profile of A4059.The background ob-tained from788Ks of background and its statisti-cal errors are shown as grey shaded regions.software package,we produced a cluster image in the ROSAT hard energy band(PSPC channels51 to201)and constructed an azimuthally averaged radial emission X-ray profile centered on the clus-ter position.We employed the cluster X-ray gas tempera-ture profile derived by Huang&Sarazin(1998),the abundance profile from ASCA measurements (Ohashi1995),and the MEKAL plasma code to simulate ROSAT PSPC to EUVE DS counts con-version factors.The correction for the interven-ing absorption of the Galactic interstellar medium (ISM)was carried out using an interstellar hy-drogen absorption column of N H=1.06±0.1×1020cm−2(Murphy et al.2000)and an absorption model including cross sections and ionization ra-tios for the ISM as described in Bowyer,Bergh¨o fer &Korpela(1999).Note that the ISM column em-ployed,based on new radio measurements,is lower than that used in the analysis of Huang&Sarazin (1998).We conclude that the effect of the some-what lower ISM column and the improved ISM cross sections on the X-ray temperatures derived by Huang&Sarazin is small.However,an accu-rate modeling of the foreground absorption is re-quired in our work in order to obtain an accurate conversion from ROSAT PSPC counts into EUVE DS counts.For the range of X-ray temperatures(1.8to 4.0keV)and element abundances of0.29to0.63 times the solar value as established for A4059by ASCA observations(Ohashi1995),we determine the ROSAT PSPC hard band to EUVE Deep Sur-vey counts conversion factor fell between115to 130.Employing these values as limits and us-ing the azimuthally averaged radial X-ray emis-sion profile derived from the PSPC hard energy band,we derived upper and lower limits for the EUV emission from the X-ray emitting gas in the EUVE Deep Survey band pass.In Figure4we show the EUV emission from the X-ray emitting gas as shaded regions with the un-certainties in this emission indicated by the size of the shaded region.We also show the EUV emission in the cluster as derived from the data displayed in Fig.3;we have subtracted the back-ground from the signal and show the emission with the errors in the signal and background added in quadrature.Within the central2′bins the EUV emission has a deficit compared to theexpected Fig.4.—The EUV emission from the cluster ob-tained from the data displayed in Figure3.The statistical errors in the background and signal are added in quadrature.We also show the EUV emis-sion from the X-ray gas.There is no evidence of excess EUV emission from the cluster to within the statistical uncertainties.emission from the diffuse X-ray gas.At larger radii the EUV emission is consistent with the EUV con-tribution of the X-ray emitting cluster gas.3.DiscussionThe data displayed in Figure4shows there is no excess EUV emission in this cluster.There is, in fact,an EUV deficit in the innermost two arc minute region of the cluster.This deficit is similar to the EUV deficit found by Bowyer et al.1999in the inner core of A1795and A2199.This effect is due to absorption by partially ionized material in the coolingflow material as has been discussed in Allen et al.(1996)and Bowyer et al.(1999).Given that A4059appears to have achieved a relaxed state and has a coolingflow,characteris-tics similar to those in A1795and A2199,which Bowyer et al.(1999)showed did not posses an excess EUV emission,it is perhaps not surprising that we did notfind excess 
EUV emission in this cluster.On the other hand,A4059does contain a radio galaxy.Since Bowyer&Bergh¨o fer(1998) have shown the EUV excess in the Coma Clusteris non-thermal and Bergh¨o fer et al.(2000)have shown that radio emission may be at least indi-rectly associated with the EUV emission in the Virgo Cluster,it is reasonable to speculate that EUV emission would be present in this cluster as well.Since it is not,there must be some other mechanism involved in the production of thisflux, either as the underlying cause,or as a correlative requirement.In any attempt to understand the phenomena of EUV excess in clusters of galaxies,one must confront the fact that Lieu et al.(1999)claimed to have found excess EUV emission from A2199 while Bowyer et al(1999)did not.This difference is clearly due to the different analyses procedures employed.Lieu et al.claim the pulse height threshold varies widely in the EUVE DS detector and that this is of fundamental importance.We have dis-cussed this issue at length earlier in this paper and show that it is incorrect.Their second claim is that a background observation must be made nearly contemporaneously with thefield observa-tion because of the instability of the EUVE Deep Survey telescope background.We discussed this point earlier in this paper and show that this point is also incorrect.An additional problem in the Lieu et al.analy-sis is introduced by the small background data set these authors employed.In their Figure4they claim the radial emission profile shows the emis-sion extends out to almost30′.However,their background observation is about onefifth as long as their cluster observation and consequently the errors in this background measurement are sub-stantially larger than those for the cluster obser-vation.It is an elementary statistical requirement that in order to optimize the measurement of a weak signal embedded in a background with a sim-ilar intensity,both the signal and the signal plus the background must have data sets of similar size. 
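As a purely illustrative Poisson estimate of this requirement (the numbers are generic, not taken from either analysis): if the cluster field yields S + B counts in an exposure t_s and the background field yields B_b counts in an exposure t_b = t_s/5, the background must be scaled by t_s/t_b = 5 before subtraction, so it enters the variance of the net signal as (t_s/t_b)² Var[B_b] = 25 B_b ≈ 5B. The short background exposure therefore contributes roughly five times more variance to the net signal than an equally long background exposure would.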
This is obviously not the case and the large sta-tistical uncertainties in the background as shown in Figure4a of Lieu et al.clearly dominate any attempt at establishing a signal even out to three arcmin.This can be verified by comparing the er-ror bars of the background in Fig4a with those of the signal plus background shown in Figure4b, although this comparison would be easier if the background and its uncertainties were not omit-ted in Figure4b.The only difference between our analysis and the analysis of Lieu et al.that has any effect is their use of a wavelet analysis to establish their detector sensitivity profile orflatfield.The pro-file they obtain using a wavelet analysis is clearly different than ours as can be seen by compar-ing Figure1of Lieu et al.showing their back-ground with Figure1herein showing our profile.A wavelet analysis is a complex procedure and may well produce surprising results if inappropri-ately employed.Given the lack of transparency in their description of the use of a wavelet anal-ysis,it is impossible to determine the source of effects seen in their Figure1.We note that in an extensive anaysis of clusters of galaxies,Mohr et al.(1999)find that wavelet analyses of clusters of galaxies by Durret et al(1994),provide results that are different from virtually all other authors; they also comment that it is”difficult to discern the source of the disagreement.”One possibility is that the large edge effects seen in the background in our Figure1are adding substantial power to their template.We have asked Lieu(2000,private communication)if this has occurred,but he has stated that he will not discuss this issue with us.An apparent confirmation of a diffuse EUV emission from A2199is provided by Kaastra et al.(1999)who use BeppoSAX and EUVE data in their analysis.These authors do not describe their reduction of the EUVE data so we cannot com-ment on that part of their analysis.However,these authors claim the BeppoSAX data show an excess EUV emission in this cluster.However,in analyz-ing the BeppoSAX data these authors use a curi-ous mix of in-flight observational data and ground-based calibration data to determine the detector sensitivity function.The cosmic X-ray and parti-cle background was obtained from in-flight data of emptyfiles taken at high Galactic latitude.How-ever,this in-flight data was not used to determine the effective area,point-spread function,strong back obscuration and vignetting of the detector. 
Instead these authors use a function that is based on response matrices derived from ground-based calibrations and ray trace codes. The lack of transparency of this procedure makes it difficult for an outside person to establish the validity of this process. We do note, however, that it was the use of a ground-based detector response function for the background of the EUVE detector that resulted in the errors in the original reports of EUV excesses in clusters of galaxies. It was only when in-flight background data were employed that the true extent of the EUV excess could be established.

It is superficially curious that excess EUV emission has been found in the Virgo Cluster by Lieu et al. (1996a) and Berghöfer et al. (2000), and in the Coma Cluster by Lieu et al. (1996b) and Bowyer et al. (1999), using different methods of analysis. Upon reflection it is clear that this is because the emission in both these clusters is sufficiently intense and extended that both groups obtain clear evidence for an excess. Nonetheless, the results obtained using these different methods differ in detail.

While there is agreement that there is an EUV excess in both Virgo and Coma, this emission is more complex than previously imagined. In the Virgo Cluster EUV emission is associated with the core and jet of the central galaxy M87. Additionally, in the vicinity of M87, diffuse emission is observed out to a distance of 13′. The spatial distribution of this flux is incompatible with thermal plasma emission originating from a gravitationally bound gas. Furthermore, the diffuse EUV emission is not directly correlated with either the X-ray or radio emission in the cluster (Berghöfer et al. 2000).

New data on the Coma Cluster show that the EUV emission is not only associated with the main cluster but is also present in the subcluster to the northwest (Korpela et al., in progress). A thermal source for the emission in the cluster can be ruled out since the emission is not consistent with a 1/r² gravitational potential. Nonetheless, the emission is spatially intermixed with the high temperature thermal X-ray emission.

4. Conclusion

We analyzed unpublished EUVE Deep Survey observations of the cluster of galaxies A4059. In order to test the integrity of our results for this cluster and to test the validity of our background model for the EUVE Deep Survey instrument, we also analyzed 363 Ks of blank field observations and compared this to a previous assemblage of 425 Ks of blank sky. Using statistical tests of these two backgrounds we definitively confirm the stability of the instrument's background and disprove the claims of time dependent changes in the detector background made by Lieu et al. (1999). Further analysis shows that for a wide range of threshold cutoffs the pulse-height threshold level setting has essentially no effect on the final EUVE Deep Survey images, contrary to another claim made by Lieu et al. (1999).

Our analysis of the EUVE observations of the cluster of galaxies A4059 shows that this cluster exhibits diffuse EUV emission. However, the emission over most of the cluster is that expected from the low energy contribution of the X-ray emitting cluster gas. The EUV emission in the central 2 arcmin of the cluster shows an EUV deficit. The observed level of EUV flux is extremely sensitive to absorption effects, and the flux deficit demonstrates intrinsic absorption of the X-ray emitting cluster gas. Together with A1795 and A2199 (Bowyer et al. 1999), A4059 is the third example of a cluster of galaxies with EUV absorption due to cooler gas in the existing central cooling flow.
This work was supported in part by NASA cooperative agreement NCC5-138 and an EUVE Guest Observer Mini-Grant. TWB was supported in part by a Feodor Lynen Fellowship of the Alexander-von-Humboldt-Stiftung.

REFERENCES

Allen, S., Fabian, A., Edge, A. C., Bautz, M., Furuzawa, A., & Tawara, Y. 1996, MNRAS, 283, 263
Berghöfer, T. W., Bowyer, S., & Korpela, E. J. 2000, ApJ, 535, 615
Bowyer, S., & Berghöfer, T. W. 1998, ApJ, 506, 502
Bowyer, S., Berghöfer, T. W., & Korpela, E. J. 1999, ApJ, 526, 592
Durret, F., Gerbal, D., Lachieze-Rey, M., & Sadat, R. 1994, A&A, 287, 733
Edge, A. C., & Stewart, G. C. 1991, MNRAS, 252, 414
Edge, A. C., Stewart, G. C., & Fabian, A. C. 1992, MNRAS, 258, 177
Huang, Z., & Sarazin, C. L. 1998, ApJ, 496, 728
Kaastra, J. S., Lieu, R., Mittaz, J. P. D., Bleeker, J., Mewe, R., Colafrancesco, S., & Lockman, F. 1999, ApJ, 519, L119
Lieu, R., Mittaz, J. P. D., Bowyer, S., Breen, J., Lockman, F. J., Murphy, E. M., & Hwang, C.-Y. 1996a, Science, 274, 1335
Lieu, R., Mittaz, J. P. D., Bowyer, S., Lockman, F. J., Hwang, C.-Y., & Schmitt, J. H. M. M. 1996b, ApJ, 458, L5
Lieu, R., Bonamente, M., Mittaz, J. P. D., Durret, F., Dos Santos, S., & Kaastra, J. S. 1999, ApJ, 527, L77
Mohr, J., Mathiesen, B., & Evrard, A. 1999, ApJ, 517, 627
Murphy, E., Sebach, K., & Lockman, F. 2000, ApJS, in preparation
Ohashi, T. 1995, in Dark Matter, ed. S. S. Holt & C. L. Bennett (New York: AIP), 255
Schwartz, D. A., Bradt, H. V., Remillard, R. A., & Tuohy, I. R. 1991, ApJ, 376, 424
Snowden, S. L., McCammon, D., Burrows, D. N., & Mendenhall, J. A. 1994, ApJ, 424, 714
Taylor, G. B., Barton, E. J., & Ge, J. 1994, AJ, 107, 1942

Detection of Weak Lensing in the Fields of Luminous Radiosources


arXiv:astro-ph/9507076v1 20 Jul 1995

A&A manuscript no. (will be inserted by hand later)

ous galaxy clumps distributed in the Large Scale Structures of the Universe (hereafter LSS) if a substantial fraction of them have almost the critical surface mass density. In fact, the excess of QSOs and radiosources around the Zwicky, the Abell and the ROSAT clusters reported recently (BSH94, SS95c) already supports the idea that cluster-like structures may play a significant role in magnifying a fraction of bright quasars. If this hypothesis is true, these massive deflectors, not yet detected in the visible, could show up through their weak lensing effects on the background galaxies.

The gravitational weak lensing analysis has recently proved to be a promising technique to map the projected mass around clusters of galaxies (KS93, BMF94, FKSW94, SEF94). Far from the centers of such mass condensations, background galaxies are weakly stretched perpendicular to the gradient of the gravitational field. With the high surface density of background galaxies up to V = 27.5 (≈43 faint sources per square arcminute with V > 25) the local shear (or polarization of the images) can be recovered from the measurement of the image distortion of weakly lensed background galaxies averaged over a sky aperture with a typical radius of 30 arcsec. The implicit assumption that the magnification matrix is constant over the scanning aperture is not always valid, and this observational limitation will be discussed later.

The shear technique was also used with success to detect large unknown deflectors in front of the doubly imaged quasar Q2345+007 (BFK+93). This QSO pair has an abnormally high angular separation, though no strong galaxy lens is visible in its neighbourhood. The shear pattern revealed the presence of a cluster mass off-centered at one arcminute north-east from the double quasar, which contributes to the large angular separation. Further ultra-deep photometric observations in the visible and the near infrared have a posteriori confirmed the presence of the cluster centered on the center of the shear pattern and detected a small associated clump of galaxies as well, just on the QSO line of sight. Both lensing agents are at a redshift larger than 0.7 (MDFFB94, FTBG94, PMLB+95). The predictive capability of the weak lensing was quite remarkable since it a priori provided a better signature of the presence of a distant cluster than the actual overdensity of galaxies, which in the case of Q2345+007 was almost undetectable without a deep "multicolor" analysis.

On the theoretical side, numerical simulations in standard adhesion HDM or CDM models (BS92a) can predict the occurrence of quasar magnification. They have shown that the large magnifications are correlated with the highest amplitudes of the shear, which intuitively means that the largest weak lensing magnifications are in the immediate vicinity of dense mass condensations. For serendipity fields they found from their simulations that at least 6% of background sources should have a shear larger than 5%.
However,for a subsample of rather bright radiosources or QSOs the probability should be larger,so that we can reasonably expect quasarfields with a shear pattern above the detection level.Since we can detect shear as faint as3%(BMF94), both observational and theoretical arguments convince us to start a survey of the presence of weak shear around sev-eral bright radiosources.In practice,mapping the shear re-quires exceptional subarcsecond seeing(<0.8arcsec.)and long exposure times,typically4hours in V with a four meter class telescope.Observations of a large unbiased selected sample of QSOs will demand several years and before promoting the idea of a large survey we decided to probe a few bright QSOfields where a magnification bias is more likely.In this paper,we report on a preliminary tests at CFHT and ESO offive sources at z≈1.The analysis of the shape parameters and the shear is based on the?bon-net95technical paper,with some improvements to mea-sure very weak ellipticities.Due to instrumental difficulties only one,Q1622+238,was observed at CFHT.Neverthe-less,we found a strong shear pattern in the immediate vicinity of the quasar quite similar to the shear detected in the QSO lens Q2345+007(BFK+93).The QSO is mag-nified by a previously unknown distant cluster of galax-ies.The four other QSOs were observed with the imaging camera SUSI at the NTT with a significantly lower instru-mental distortion but with a smallerfield of view.In this case the limited size of the camera makes the mapping of strong deflector like in Q1622+238harder.However, with the high image quality of SUSI it is possible to see on the images a clear correlation between the amplitude and direction of the shear and the presence of foreground overdensities of galaxies.Some of them are responsible for a magnification bias of the QSO.By comparing the preliminary observations at CFHT and ESO we discuss important observational issues, namely the need for a perfect control of image quality and a largefield of view.We also show that invisible masses as-sociated with groups and poor clusters of galaxies can be seen through their weak lensing effect with NTT at ESO. These groups of galaxies may explain the origin of a large angular correlation between the distribution of distant ra-diosources(z>1)and the distribution of low redshift galaxies(z<0.3)The study of the correlation between the local shear and nearby overdensity of foreground galaxies (masses)will be investigated in following papers after new spectrophotometric observations of the lensing groups. 2.Selection and observations of the sourcesThe double magnification bias hypothesis maximises the probability of a lensing effect for luminous distant sources (BvR91).Therefore whenever possible we try to select sources that are both bright in radio(F>2Jy,V<18). 
We also looked at quasars with absorption lines at lower redshift,to know if some intervening matter on the lines of sight is present.The QSOs are chosen at nearly theB.Fort et al.:Detection of Weak Lensing in the Fields of Luminous Radiosources3 objectα50δ50m V zflux Tel./Instr.exp.numb.seeingtimefiles(arcsec.)Table1.Observational data for the5QSOsfields.The V magnitude stars.The radioflux is the5009MHz value from the 1Jy catalogue.The total exposure time corresponds to the coaddition of several individual images with30-45minutes exposure time.The seeing is the FWHM of stars on the composite imagemean redshift of the faint background galaxies(z from0.8to1.)used as an optical template to map the shear offoreground deflectors.So far,we have observed5QSOs atredshift about1with a V magnitude and radioflux in therange from17to19and1.7to3.85respectively(Table1).Except Q1622+238(z=0.97)which was suspected tohave a faint group of galaxies nearby(HRV91),the4othercandidates(PKS0135-247,PKS1508-05,PKS1741-03,and3C446.0)have been only selected from the?hewitt87,andthe?veronveron85catalogues,choosing those objects withgood visibility during the observing runs.The V magni-tude of each QSO was determined with an accuracy betterthan0.05mag.rms from faint?landolt92calibration stars(Table1).The observations started simultaneously in June1994at the ESO/NTT with SUSI and at CFHT with FOCAM,both with excellent seeing conditions(<0.8”)and stabletransparency.For the second run at ESO in November1994,only one of the two nights has good seeing condi-tions for the observation of PKS0135-247.We used the1024×1024TeK and the2048×2048LORAL CCDs with15micron pixel,which correspond to0.13”/pixel at theNTT and0.205”/pixel at CFHT,and typicalfields of viewof2’and7’respectively.In both cases we used a standardshift and add observing technique with30to45min expo-sures.The resultingfield of view is given in table2.The total exposure was between16500and23700seconds in V(Table1).The focusing was carefully checked between each individual exposure.After prereduction of the data with the IRAF software package,all frames were coadded leading to a composite image with an effective seeing of 0.78”at CFHT and0.66”-0.78”at NTT(Table1).Al-though the seeing was good at CFHT we are faced with a major difficulty when trying to get a point spread function for stars(seeing disk)with small anisotropic deviations from circularity less than b/a=0.05in every direction). This limitation on the measurement of the weak shear am-plitude will be discussed more explicitly in the following section.3.Measurement of the shearThe measurements of the shear patterns have been ob-tained from an average of the centered second order mo-Fig. 2.Histogram of the independent measurements of the axis ratio b/a in all thefields with a scanning aperture of30 arcsec.radius.The peak around0.99is representative of the noise level that defines a threshold of amplitude detection near 0.985.menta as computed by Bonnet and Mellier(1995)of all individual galaxies in a square aperture(scanning aper-ture size:57+3/-5arcsec.)containing at least25faint galaxies with V between25and27.5(Table2).Because very elongated objects increase the dispersion of the mea-surement of the averaged shape parameters(see Bonnet and Mellier1995,Fig.4),and blended galaxies give wrong ellipticities,we rejected these objects from the samples. 
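The moment-based shape measurement referred to above can be illustrated with the following generic Python/NumPy sketch. It is not the actual Bonnet & Mellier (1995) implementation (their weighting, centring and seeing corrections are omitted), and the function names are ours.

import numpy as np

def ellipticity_from_moments(cutout):
    """Centred second-order moments of a background-subtracted galaxy cutout
    and the derived ellipticity (polarization) components (e1, e2)."""
    cutout = np.asarray(cutout, dtype=float)
    y, x = np.indices(cutout.shape)
    flux = cutout.sum()
    xc, yc = (cutout * x).sum() / flux, (cutout * y).sum() / flux
    qxx = (cutout * (x - xc) ** 2).sum() / flux
    qyy = (cutout * (y - yc) ** 2).sum() / flux
    qxy = (cutout * (x - xc) * (y - yc)).sum() / flux
    denom = qxx + qyy
    return (qxx - qyy) / denom, 2.0 * qxy / denom

def aperture_polarization(cutouts):
    """Average the ellipticity components of the galaxies in one scanning
    aperture; the modulus and position angle of the mean give the local
    apparent shear amplitude and direction."""
    e = np.array([ellipticity_from_moments(c) for c in cutouts])
    e1, e2 = e.mean(axis=0)
    amplitude = np.hypot(e1, e2)
    angle = 0.5 * np.degrees(np.arctan2(e2, e1))   # position angle in degrees
    return amplitude, angle

In this sketch the aperture estimate is simply the unweighted mean of the two ellipticity components over the 25 or more galaxies in the aperture.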
The direction of the polarization of background galaxies is plotted on each QSOfield(Figures3b,3d,4b,5b,6b) at the barycentre of the25background galaxies that are used to calculate the averaged shear.Each plot has the4 B.Fort et al.:Detection of Weak Lensing in the Fields of Luminous RadiosourcesFig.1.Figure1a:NTT Field of view of PKS1741which was used as a star template to study the instrumental distortion of the SUSI camera.Figure2b:plot of the apparent residual”shear amplitude”of the stars on5points of thefield where the galaxy shears are determined in other NTT images;figures4,5,6same amplitude scale for comparison between images and the instrumental distortion found from a starfield anal-ysis(Figure1b).This explains why the mapping is not rigorously made with a regular step between each polar-ization vector on thefigures.The small step variation re-flects the inhomogeneity of the distribution of background sources.For the exceptional shear pattern of Q1622+238, a plot with a smaller sampling in boxes of22arcsec.gives a good view of the coherence of the shear(Figure3b).All other maps are given with a one arcminute box,includ-ingfigure3d,so that each measurement of the shear is completely independent.For quantitative study the coor-dinates of each measurement are given on table3with the value of the apparent amplitude1−b/a and the direction of the shear.The ellipticity e=1−b/a given in Table3 is drawn on the variousfields with the same scale.A description of the technique used to map the shear can be found in?bonnet95.We have only improved when necessary the method to correct the instrumental distor-tion in order to detect apparent shear on the CCD images down to a level of about2.0%(Figure2).Notice that we call here”apparent shear”the observed shear on the im-age which is not corrected for seeing effects and which is averaged within the scanning aperture.To achieve this goal we observed at NTT,in similar conditions as other radiosources,thefield of PKS1741-03which contains ap-proximately26±6stars per square arcminute(Figure 1a,b).After a mapping of the instrumental distortion of stars we have seen that prior to applying the original?bon-net95method,it is possible to restore an ideal circular see-ing disk with a gaussian distribution of energy for stars in thefield(pseudo deconvolution).The correction almost gives conservation of the seeing effective radius with:s=√B.Fort et al.:Detection of Weak Lensing in the Fields of Luminous Radiosources5ture(Figure1b).However we verify with the PKS1741-03field that the restoration of the circularity of the spread function can give a residual”polarization”of stars in the field as low as1−<b/a>=0.0009±0.0048(dispersion).In fact the restoration of the point spread function ap-peared to be more difficult with CFHT images because of a higher level of instrumental distortion whose origin is not yet completely determined:guiding errors,atmospheric dispersion,larger mechanicalflexure of a non-azimuthal telescope,3Hz natural oscillation of the telescope(P. 
Couturier,private communication),optical caustic of the parabolic mirror,and indeed greater difficulties in getting excellent image quality on a largerfield.Thus,the level of instrumental distortion measured on stars is currently 1−<b/a>=0.08-0.12with complex deviations from a circular shape.After the restoration of an ideal seeing spread function we are able to bring the shear accuracy of CFHT images to a level of0.03.But like the classical measurement of light polarization it should be far better to start the observations with a level of instrumental po-larization as low as possible.In summary we are now able to reach the intrinsic lim-itation of Bonnet&Mellier’s method on the measurement of the shear amplitude at NTT with a typical resolution of about60arcsec.diameter(25-30faint galaxies per res-olution element)with a rms error of about0.015(Figure 2).Below this value the determination of the amplitude of the shear is meaningless although the direction may still be valid.At CFHT the detectivity is almost two times less but thefield is larger.We are currently developing meth-ods to correct the instrumental distortion at the same level we get with the NTT.This effort is necessary for future programmes with the VLT which would be aimed toward the mapping of Large Scale Structures(shear of0.01)witha lower spatial resolution(>10arcminute apertures).4.ResultsIn this section we discuss the significance of the shear pat-tern in each QSOfield and the eventual correlation with the isopleth or isodensity curves of background galaxies with20<V<24.5.For a fair comparison both the iso-pleth(surface density numbers)or isoluminosity curves (isopleth weighted by individual luminosity)are smoothed with a gaussianfilter having nearly the resolution of the shear map(40”FWHM).1.Q1622+238A coherent and nearly elliptical shear pattern is de-tected with an apparent amplitude0.025±0.015at a distance ranging from50”to105”of the QSO(Figure 3b).The center of the shear can be calculated with the centering algorithm described by Bonnet&Mel-lier.The inner ellipses infigure3b show the position of the center at the1,2and3σconfidence level.It co-incides with a cluster of galaxies identified on the deepV image10arcsec South-East from the QSO(Figure 3c).The external contour of the isopleth map infig-ures3c corresponds to a density excess of galaxies of twice the averaged values on thefield for a30arc-sec circular aperture.The isoluminosity map shows a light concentration even more compact than the num-ber density map.About70%of the galaxies of the condensation have a narrow magnitude range between V=24and24.5and are concentrated around a bright galaxy with V=21.22±0.02.This is typical for a clus-ter of galaxies.A short exposure in the I band gives a corresponding magnitude I=19.3±0.1for the bright central galaxy.A simple use of the magnitude-redshift relationship from a Hubble diagramme and the(V−I) colors of the galaxy suggest a redshift larger than0.5.By assuming such a redshiftObjectfield Ng/N G Mag(pixels)/(arcsec.)range Table2.Table2:Number Ng of(background)galaxies from V=22to24.5which are used to trace isopleth and number N G of(distant)galaxies from V=25to27.5.detected on each observedfieldit is possible to mimic the shear map with a deflec-tor velocity dispersion of at least500km/sec.Aftera correction for the seeing effect with the?bonnet95diagram and taking into account the local shear of the lens at the exact location of the QSO we can estimate that the magnification bias could be exceptionally high in this 
case(>0.75magnitude).Further spectropho-tometric observations of thefield are needed to get a better description of the lens.It is even possible that multiply imaged galaxies are present at the center of this newly discovered cluster.2.PKS1741-03Thisfirst NTTfield was chosen for a dedicated study of the instrumental distortion of the SUSI instrument.Indeed it is crowded with stars and the mapping of the isopleth was not done due to large areas of the sky occulted by bright stars.The center of thefield of PKS1741-03shows a faint compact group of galaxies(marked g onfig1a).A de-tailed investigation of the alignment of individual faint galaxies nearby shows that a few have almost orthora-dial orientation to the center of the group.The ampli-tude of the”apparent”shear on thefig1b is low prob-ably because it rotates within the scanning aperture around a deflector having an equivalent velocity dis-6 B.Fort et al.:Detection of Weak Lensing in the Fields of Luminous RadiosourcesFig.3.Figure3a:CFHTfield of view of Q1622+238in V.North is at the top.Figure3b:Shear map of Q1622+238with a resolution step of22arcsec.The ellipses shows the position of the center of the central shear with the1,2,3σconfidence level. The center almost coincides with a distant cluster clearly visible onfigure3c.persion lower than400km/s.Outside the box the ap-parent shear is already below the1−<b/a>=0.015 threshold level and it is not possible to detect the cir-cular shear at distance from the group larger than one arcmin.This remark is important because it illustrates the limitation of the method in detecting lenses with a1−<b/a>=0.015on angular scales smaller than the scanning aperture.Therefore a low amplitude of the shear on the scanning aperture could be the actual signature of a small deflector rather than a sky area with a low shear!Although the compact group is only 30arcsec South-East of the QSO it might contribute toa weak lensing of PKS1741-03but it is difficult to geta rough estimate of the amplitude of the magnificationbias.3.PKS1508-05This is the second bright radiosource of the sample.At one arcminute North-West there is also a group arounda bright galaxy(G)which could be responsible for alarge shear.This distant group or cluster may con-tribute to a weak magnification by itself,but there is also a small clump of galaxies in the close vicinity of the radiosource with the brightest member at a distance of 8arcseconds only.The situation is similar to the case of the multiple QSO2345+007(BFK+93).This couldbe the dominant lensing agent which provides a larger magnification bias,especially if the nearby cluster has already provided a substantial part of the critical pro-jected mass density.4.3C446The radiosource is among the faintest in the optical (table1).There is a loose group of galaxies at40arc-sec South-West from the QSO.The orientation of the shear with respect to the group of galaxies can be reproduced with a rough2D simulation(Hue95)al-though atfirst look it was not so convincing as the PKS0135-247case.The lensing configuration could be similar to PKS1508-05with a secondary lensing agentG near the QSO(fig6a,b).Surprisingly there is also alarge shear amplitude which is not apparently linked to an overdensity of galaxies in V in the North-East corner.In such a case it is important to confirm the re-sult with an I image to detect possible distant groups at a redshift between0.5and0.7.A contrario it is important to mention that the shear is almost null in the North-West area of thefield which actually has no 
galaxy excess visible in V(fig6b).B.Fort et al.:Detection of Weak Lensing in the Fields of Luminous Radiosources7 Fig.4.Figure3c:Zoom at the center of thefield of view of Q1622+238.The distant cluster around the bright central elliptical galaxy E is clearly identified on this very deep V image.Figure3d:Shear map of Q1622+238with a resolution step of60arcsec. similar to the resolution on other NTTfields.The ellipses shows the position of the center of the central shear with the1,2,3σconfidence level.The center almost coincides with a distant cluster clearly visible onfigure3c.5.DiscussionDue to observational limitations on the visibility of ra-diosources during the observations the selection criteria were actually very loose as compared with what we have proposed in Section2for a large survey.The results we present here must be considered as a sub-sample of QSOs with a moderate possible bias.Nevertheless,for at least 3of the sources there are some lensing agents which are associated with foreground groups or clusters of galaxies that are detected and correlated with the shearfield.For the2other cases the signature of a lensing effect is not clear but cannot be discarded from the measurements.All the radiosources may have a magnification bias enhanced by a smaller clump on the line of sight or even an(unseen) foreground galaxy lying a few arcsec from the radiosource (compound lens similar to PKS1508).The occurrence of coherent shear associated with groups in thefield of the radiosources is surprisingly high.This might mean that a lot of groups or poor clusters which are not yet identified contain a substantial part of the hidden mass of LSS of the Universe below z=0.8.Some of them responsible for the observed apparent shear may be the most massive pro-genitor clumps of rich clusters still undergoing merging.Although these qualitative results already represent a fair amount of observing time we are now quite convinced that all of thesefields should be reobserved,in particular in the I and K bands,to assess the nature of the deflec-tors.Spectroscopic observation of the brightest members of each clump is also necessary to determine the redshift of the putative deflectors.This is an indispensable step to connect the shear pattern to a quantitative amount of lensing mass and to link the polarization map with some dynamical parameters of visible matter,such as the ve-locity dispersion for each deflector,or possibly the X-ray emissivity.at the present time,we are only able to say that there is a tendency for a correlation between the shear and light overdensity(FM94).From the modelling point of view,simulations have been done and reproduce fairly well the direction of the shear pattern with a distribution of mass that follows most of the light distribution given by the isopleth or isolumi-nosity contour of the groups in thefields.Some of these condensations do not play any role at all and are probably too distant to deflect the light beams.Unfortunately,in order to make accurate modelling it is necessary to have a good estimate of the seeing effect on the amplitude of the shear by comparing with HST referencefields,and good redshift determinations as well of the possible lenses to get their gravitational weight in thefield.It is also impor-tant to consider more carefully the effect of convolution8 B.Fort et al.:Detection of Weak Lensing in the Fields of Luminous RadiosourcesFig.5.Figure4a:NTTfield of view for PK0135.North is at the top.Note the group of galaxies around g1,g2,g3and g4 responsible 
for a coherent shear visible onfigure4bFig. 6.Figure5a:NTTfield of view for PKS1508.Note the North-West group of galaxies near the brighter elliptical E responsible for a larger amplitude of the shear onfigure5b and the small clump of galaxies g right on the line of sight of the QSO.of the actual local shear which varies at smaller scales than the size of the scanning beam(presently about one arcminute size).This work is now being done but is also waiting for more observational data to actually start to study the gravitational mass distribution of groups and poor clusters of galaxies in thefield of radiosources.6.ConclusionThe shear patterns observed in thefields offive bright QSOs,and the previous detection of a cluster shear in Q2345+007(BFK+93)provide strong arguments in fa-vor of the?bartelmann93b hypothesis to explain the large scale correlation between radiosources and foreground galaxies.The LSS could be strongly structured by nu-merous condensations of masses associated with groups of galaxies.These groups produce significant weak lensing effects that can be detected.A rough estimate of the mag-nification bias is given by the polarization maps around these radiosources.It could sometimes be higher than half a magnitude and even much more with the help of an in-dividual galaxy deflector at a few arcsec.of the QSO line of sight.The results we report here also show that we can study with the weak shear analysis the distribution of density peaks of(dark)massive gravitational structures (ieσ>500km/s)and characterise their association with overdensities of galaxies at moderate redshift(z from0.2 to0.7).A complete survey of a large sample of radiosource fields will have strong cosmological interest for the two as-pects we mentioned above.Furthermore,the method can be used to probe the intervening masses which are associ-ated with the absorption lines in QSOs or to explain the unusually high luminosity of distant sources like the ultra-luminous sources IR10214+24526(SLR+95)or the most distant radio-galaxy8C1435+635(z=4.25;LMR+94).Therefore we plead for the continuation of systematic measurements of the shear around a sample of bright ra-diosources randomly selected with the double magnifica-tion bias procedure(BvR91).Our veryfirst attempt en-countered some unexpected obstacles related to the lim-itedfield of view of CCDs or the correction of instrumen-tal distortion.It seems that they can be overcome in the near future.We have good hopes that smooth distribu-tions of mass associated with larger scale structures likeB.Fort et al.:Detection of Weak Lensing in the Fields of Luminous Radiosources9 Fig.7.Figures6a:NTTfield of3C446.Note onfigure6b the shear pattern relatively to the isopleth of possible foreground groups and the galaxies g on the line of sight of the QSOfilaments and wall structures could be observed with a dedicated widefield instrument that minimizes all instru-mental and observational systematics,or still better with a Lunar Transit telescope(FMV95). Acknowledgements We thank P.Schneider,N.Kaiser,R. Ellis,G.Monet,S.D’Odorico,J.Bergeron and P.Cou-turier for their enthusiastic support and for useful discus-sions for the preparation of the observations.The data obtained at ESO with the NTT would probably not have been so excellent without the particular care of P.Gitton for the control of the image quality with the active mirror. We also thank P.Gitton for his helpful comments and T. 
Brigdes for a careful reading of the manuscript and the en-glish corrections.This work was supported by grants from the GdR Cosmologie and from the European Community (Human Capital and Mobility ERBCHRXCT920001). ReferencesDar.A.Nucl.Phys.B.Proc.Suppl.,28A:321,1992.G.O.Abell.ApJS,3:211,1958.J.M.Alimi, F.R.Bouchet,R.Pellat,J.F.Sygnet,andF.Moutarde.Ap.J,354:3–12,1990.G.O.Abell,Jr.Corwin,H.G.,and R.P.Olowin.ApJS,70:1,1989.E.Aurell,U.Frisch,J.Lutsko,and M.Vergassola.J.FluidMech.,238:467–486,1992.M.-C.Angonin,F.Hammer,and O.Le F`e vre.In L.Nieser R.Kayser,T.Schramm,editor,in Gravitational Lenses.Springer,1992.V.I.Arnold.Singularities of Caustics and Wave Fronts.Kluwer,Dordrecht,The Netherlands,1990.A.Arag´o n-Salamanca,R.S.Ellis,and R.M.Sharples.MN-RAS,248:128,1991.V.I.Arnol’d,S.F.Shandarin,and Ya.B.Zeldovich.Geophys.Astrophys.Fluid Dynamics,20:111–130,1982.m.Math.Phys.,1993.M.Avellaneda and m.Math.Phys.,1993. Fort B.In G.Giacomelli A.Renzini Third ESO/CERN symposium.M.Caffo,R.Fanti,editor, Astronomy,Cosmology and Fundamental Physics.Kluwer Academic Publisher,1989.N.A.Bahcall.Ap.J.,287:926,1984.M.Bartelmann.A&A,276:9,1993.M.C.Begelman and R.D.Blandford.Nat,330:46,1987.J.M.Bardeen,J.R.Bond,N.Kaiser,and A.S.Szalay.Ap.J., 304:15–61,1986.N.A.Bahcall and R.Cen.Ap.J.,407:L49–52,1993.T.J.Broadhurst,R.S.Ellis,and T.Shanks.MNRAS,235:827, 1988.H.Bonnet,B.Fort,J.-P.Kneib,Y.Mellier,and G.Soucail.A&A,280:L7,1993.U.G.Briel,J.P.Henry,and H.B¨o hringer.A&A,259:L31, 1992.R.D.Blandford and M.Jaroszy´n ski.ApJ,246:2,1981.R.D.Blandford and C.S.Kochanek.ApJ,321:658,1987.。

A Filter-Based Clustering Algorithm


Journal of Tianjin University, Vol. 43, No. 10, Oct. 2010, pp. 884-889
Received: 2009-06-02; revised: 2009-10-20. Supported by the China Postdoctoral Science Foundation (20090450767) and the Science and Technology Development Fund of Tianjin Higher Education Institutions (20080810). About the author: ZHANG Qiang (b. 1972), male, Ph.D., lecturer. Corresponding author: ZHANG Qiang, zq8622@

A Filter-Based Clustering Algorithm
ZHANG Qiang, LI Cheng, WU Teng-fei
(State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China)

Abstract: To overcome the problems of low accuracy and low efficiency in clustering analysis, a clustering algorithm called FBCLUS is proposed. First, the convolution theorem and the Fourier transform are used to design frequency filters that suppress noise; second, single-threshold and multi-threshold amplitude filters are introduced to remove noise and to separate regions of interest of different densities; third, a mathematical morphology operator is designed to extract the clusters. Experiments show that FBCLUS can detect arbitrarily shaped clusters, is very efficient with a complexity of O(N), can distinguish clusters of different densities, is insensitive to large amounts of noise, and is not sensitive to the grid size. FBCLUS achieves high clustering accuracy and efficiency.
Keywords: frequency filter; amplitude filter; mathematical morphology; clustering
CLC number: TP391; Document code: A; Article ID: 0493-2137(2010)10-0884-06

Clustering analysis is an important unsupervised learning method with a wide range of applications. In image processing, clustering is used for automatic image segmentation and face recognition. In geophysics, it is used to process satellite remote-sensing data. In biomedicine, it is used to analyse gene-expression relationships. In text processing, it is used for the automatic classification of documents on the Internet.

Current clustering algorithms fall into four main categories: (1) partitioning methods, e.g. refs. [1-2], whose drawbacks are the difficulty of choosing the parameter K and sensitivity to noise; (2) hierarchical methods, e.g. refs. [3-5], whose clustering quality is comparatively low; (3) density-based methods, e.g. refs. [6-7], which are computationally expensive, unsuitable for large data sets and overly sensitive to their parameters; (4) grid-based methods, such as CLUGD [8-9], which are overly sensitive to the grid size, produce cluster boundaries that are either horizontal or vertical, and give results of low accuracy.

To address these problems, this paper proposes a filter-based clustering algorithm (FBCLUS). Its main contributions are: (1) a frequency-filtering method, based on the convolution theorem and the Fourier transform, to suppress noise; (2) single-threshold and multi-threshold amplitude-filtering methods to remove noise and to extract regions of interest (regions that may contain clusters) of different densities; (3) a new mathematical morphology operator that extracts clusters from the regions of interest. FBCLUS has the following properties: (1) it can find clusters of arbitrary shape; (2) its computational complexity is linear in the number of data objects; (3) it can distinguish clusters of different density; (4) it is robust to noise; (5) it is tolerant of the grid size; (6) the clustering result is independent of the order in which the data are input.

1 The FBCLUS clustering algorithm

FBCLUS borrows its ideas from digital signal processing: filtering is used to reduce the noise level, and mathematical morphology is used to extract the clusters. FBCLUS has three steps: (1) partition the data space into a grid; (2) apply frequency filtering and amplitude filtering; (3) discover the clusters with a mathematical morphology operator.

The first step divides the data space into equally spaced cells, each of which stores the number of data objects it contains. If the data set is viewed as a multi-dimensional signal, the number of objects in a cell indicates the signal strength at that cell; filtering and morphological processing of this signal finally yield the clusters. Gridding the data space condenses the information and speeds up the subsequent processing. Choosing a reasonable cell size (or grid density) strongly affects the later steps; existing grid-based clustering algorithms are very sensitive to the cell size, and changing it seriously affects the clustering result. Because FBCLUS uses mask-based filtering and morphological clustering, it is tolerant of the cell size.

1.1 Frequency filtering

Data obtained in practice usually contain many noise objects, which both degrade the clustering accuracy and increase the computational load, so they should be filtered out. If the data set is viewed as a multi-dimensional signal, cluster regions and noise regions exhibit different characteristics [10]: in cluster regions the signal is dominated by low frequencies, with large amplitude and high energy; in noise regions the signal is dominated by high frequencies, with small amplitude and low energy. This paper therefore filters the data set with both a frequency filter and an amplitude filter to improve the clustering accuracy. First, a low-pass frequency filter removes the high-frequency signal produced by noise; then an amplitude filter removes the remaining low-amplitude noise. After the two filtering passes the noise level drops markedly, which guarantees high-quality clusters. To reduce the cost of frequency filtering, a combined spatial-domain/frequency-domain filtering scheme is proposed that avoids transforming and inverse-transforming the whole space. Experiments show that this method achieves good accuracy and efficiency.

Frequency filtering can be applied directly to the signal, or the signal can first be transformed (by a Fourier transform, a wavelet transform, etc.) and the transformed result filtered. The former is called spatial-domain (time-domain) filtering, because the signal stays in its original domain; the latter is called frequency-domain filtering, because the transformed space is the frequency domain. Spatial-domain filtering is cheap because no transform is needed, but the frequency content of the signal is not exposed, so designing the filter is difficult. Frequency-domain filtering makes the filter design simple and intuitive, because the signal is expressed directly as a function of frequency, but transforming the data set is expensive. Ref. [11] used frequency-domain filtering for clustering, with a wavelet transform and a low-pass filter; it considered only the frequency characteristics of the regions, not their amplitude. Its advantages are strong noise resistance and high accuracy; its drawback is that the wavelet transform and inverse transform of the whole data space are computationally costly.

The frequency-filtering method proposed here combines the advantages of spatial-domain and frequency-domain filtering while avoiding their drawbacks, giving high accuracy at reduced cost. The core idea is: design a low-pass filter in the frequency domain; use the convolution theorem and the inverse Fourier transform to obtain the corresponding low-pass filter in the spatial domain; and perform the filtering in the spatial domain. The advantages are: (1) a high-quality filter can be designed conveniently in the frequency domain, and the corresponding spatial-domain filter obtained by the inverse Fourier transform has the same performance, which is guaranteed by the convolution theorem; (2) filtering directly with the spatial-domain filter avoids the Fourier transform and inverse transform of the whole space and so reduces the computation. The design of the frequency-domain low-pass filter, the derivation of the corresponding spatial-domain filter, and the spatial-domain filtering are discussed in detail below.

The mathematical model of filtering in the frequency domain is

    G(u, v) = H(u, v) F(u, v)

where F(u, v) is the Fourier transform of the data set f(x, y), i.e. the object to be filtered; H(u, v) is the frequency-domain filter function; and G(u, v) is the filtered result. The key question is how to determine the frequency filter H(u, v). It is designed by the prototype method: first the filter type is chosen, then its parameters. The great advantage of the prototype method is that existing filter theory is mature, so a filter meeting the requirements is easy to design. Three prototypes are considered: the ideal low-pass filter, the Butterworth low-pass filter and the Gaussian low-pass filter [12]. These three are chosen because their frequency-response curves cover the whole range from a sharp cut-off (ideal low-pass) to a smooth roll-off (Gaussian low-pass), with the Butterworth low-pass filter as the transitional type between the two extremes.

After the filter has been designed in the frequency domain it must be transformed to the spatial domain, where the filtering is performed. The convolution theorem provides the theoretical basis: a product in the frequency domain corresponds to a convolution in the spatial domain,

    F(u, v) H(u, v)  ⇔  f(x, y) * h(x, y)

The required spatial-domain filter h(x, y) is obtained from the convolution theorem and the inverse Fourier transform, and the spatial data can then be filtered directly. Since the spatial domain is a grid of R × T cells, the discrete form of the convolution is used:

    f(x, y) * h(x, y) = (1/RT) Σ_{i=0}^{R-1} Σ_{j=0}^{T-1} f(i, j) h(x−i, y−j)

where the support of h(x, y) is R × T cells. To speed up the processing, h(x, y) should be made small; ref. [12] points out that spatial-domain filtering with a small h(x, y) is more efficient than frequency-domain filtering while giving equally good results. A filter of size r × t is therefore used, and its convolution with the grid is implemented with a mask:

    f(x, y) * h(x, y) = (1/rt) Σ_{i=0}^{r-1} Σ_{j=0}^{t-1} f(i, j) h(x−i, y−j) = w_1 f_1 + w_2 f_2 + ⋯ + w_{rt} f_{rt} = Σ_{i=1}^{rt} w_i f_i

where the w_i are the mask coefficients and f_i is the signal strength (the number of data objects) of the corresponding cell. Convolution with a mask is carried out by moving the mask over the grid cell by cell; at each cell (x, y) the filter response is the sum of the products of the mask coefficients and the corresponding cell values. For a 3 × 3 mask (Fig. 1) the response at cell (x, y) is

    f(x, y) * h(x, y) = w_{−1,−1} f(x−1, y−1) + ⋯ + w_{0,0} f(x, y) + ⋯ + w_{1,1} f(x+1, y+1)

where the cell (x, y) is aligned with the mask centre w_{0,0}. For an r × t mask, both r and t must be odd.

Fig. 1  A 3 × 3 mask (coefficients w_{−1,−1}, w_{−1,0}, w_{−1,1}; w_{0,−1}, w_{0,0}, w_{0,1}; w_{1,−1}, w_{1,0}, w_{1,1})

After frequency filtering the high-frequency noise is strongly attenuated, but some residual noise remains; an amplitude filter is used to lower the noise level further.
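As an illustration of this design route, here is a minimal Python/NumPy sketch assuming a Gaussian low-pass prototype; the mask size r = t = 5, the cut-off d0 and the function names are example choices, not values prescribed by the paper.

import numpy as np
from scipy.signal import convolve2d

def gaussian_lowpass_mask(R, T, d0, r=5, t=5):
    """Design a Gaussian low-pass filter H(u,v) on an R x T frequency grid,
    transform it to the spatial domain by an inverse FFT, and crop a small
    r x t mask around the central peak."""
    u = np.fft.fftfreq(R)[:, None]            # frequency coordinates in cycles/cell
    v = np.fft.fftfreq(T)[None, :]
    H = np.exp(-(u ** 2 + v ** 2) / (2.0 * d0 ** 2))   # Gaussian low-pass, cut-off d0
    h = np.fft.fftshift(np.real(np.fft.ifft2(H)))      # corresponding spatial filter
    cy, cx = R // 2, T // 2
    mask = h[cy - r // 2: cy + r // 2 + 1, cx - t // 2: cx + t // 2 + 1]
    return mask / mask.sum()                  # normalise so total counts are preserved

def frequency_filter(cell_counts, mask):
    """Convolve the grid of per-cell object counts with the small spatial mask."""
    return convolve2d(cell_counts, mask, mode="same", boundary="symm")

For example, mask = gaussian_lowpass_mask(256, 256, d0=0.1) followed by frequency_filter(cell_counts, mask) smooths a 256 × 256 grid of cell counts with a 5 × 5 spatial mask, without transforming the full grid.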
1.1  Frequency filtering

Data obtained in practice usually contain a large number of noise objects, which not only reduce the clustering accuracy but also increase the computational cost, so they should be filtered out. If the data set is viewed as a multi-dimensional signal, the regions occupied by clusters and by noise show different characteristics [10]: in cluster regions the signal is dominated by low frequencies, with large amplitude and high energy; in noise regions the signal is dominated by high frequencies, with small amplitude and low energy. This paper therefore filters the data set with two kinds of filters, a frequency filter and an amplitude filter, to improve the clustering accuracy. First, a frequency low-pass filter removes the high-frequency signal corresponding to noise; then an amplitude filter removes the residual low-amplitude noise. After the two filtering passes the noise level drops markedly, which guarantees the quality of the extracted clusters. To reduce the computational cost of frequency filtering, a combined time-domain/frequency-domain filtering method is proposed that avoids transforming the whole space back and forth. Experiments show that the method achieves good accuracy and efficiency.

Frequency filtering can be applied to the signal directly, or the signal can first be transformed (e.g. by a Fourier or wavelet transform) and the transformed result filtered. The former is called time-domain filtering, because the signal lives in the time domain; the latter is called frequency-domain filtering, because the transformed space is the frequency domain. The advantage of filtering in the time domain is its low computational cost, since no transform of the space is needed; however, the frequency characteristics of the signal are not exposed in the time domain, which makes the filter hard to design. The advantage of filtering in the frequency domain is that the filter design is simple and intuitive, because the signal is expressed directly as a function of frequency; the drawback is that the data set must be transformed, which is computationally expensive. Ref. [11] used frequency-domain filtering for clustering: a wavelet transform maps the data from the time domain to the frequency domain, where a low-pass filter performs the filtering. Ref. [11] considered only the frequency characteristics of the different regions and ignored their amplitude characteristics; the algorithm is robust to noise and accurate, but it must apply the wavelet transform and its inverse to the whole data space, which is expensive.

The frequency filtering method proposed here combines the advantages of time-domain and frequency-domain filtering while avoiding their drawbacks: it is accurate and it reduces the computational cost. The core idea is: design a low-pass filter in the frequency domain; using the convolution theorem, obtain the corresponding low-pass filter in the time domain by an inverse Fourier transform; and perform the filtering in the time domain. Its advantages are: (1) a high-quality filter can be designed conveniently in the frequency domain, and the corresponding time-domain filter obtained by the inverse Fourier transform has the same performance, which is guaranteed by the convolution theorem; (2) filtering directly with the time-domain filter avoids the Fourier transform and inverse transform of the whole space, which reduces the computational cost. The design of the low-pass filter in the frequency domain, the derivation of the corresponding time-domain filter, and the time-domain filtering itself are discussed in detail below.

The mathematical model of filtering in the frequency domain is

  G(u, v) = H(u, v) F(u, v)

where F(u, v) is the Fourier transform of the data set f(x, y), i.e. the object to be filtered; H(u, v) is the filter function in the frequency domain, which performs the filtering; and G(u, v) is the result of filtering F(u, v). The key problem is how to determine the frequency filter H(u, v). The filter is designed from a prototype: first the filter type is chosen, then its parameters. The big advantage of the prototype approach is that the existing filter theory is mature, so a filter that meets the requirements can be designed easily. Three prototypes are considered: the ideal low-pass filter, the Butterworth low-pass filter and the Gaussian low-pass filter [12]. These three are chosen because their frequency-response curves cover the whole range from sharp and steep (the ideal low-pass filter) to smooth and gentle (the Gaussian low-pass filter); the ideal and Gaussian low-pass filters are the two extremes, and the Butterworth low-pass filter is the transition type in between.

After the filter has been designed in the frequency domain, it must be transformed into the time domain so that the filtering can be carried out there. The convolution theorem provides the theoretical basis: a product in the frequency domain corresponds to a convolution in the time domain,

  F(u, v) H(u, v)  ⇔  f(x, y) ∗ h(x, y)

From the convolution theorem and the inverse Fourier transform the required time-domain filter h(x, y) is obtained, so the data can be filtered directly in the time domain. Since the time-domain space is a grid of size R × T, the discrete form of the convolution is used:

  f(x, y) ∗ h(x, y) = (1/(RT)) Σ_{i=0}^{R−1} Σ_{j=0}^{T−1} f(i, j) h(x − i, y − j)

where the size (the support) of h(x, y) is R × T cells. To increase the processing speed, h(x, y) should be made small; Ref. [12] points out that time-domain filtering with a small h(x, y) is more efficient than frequency-domain filtering while still giving a good filtering result. A filter of size r × t is therefore adopted, and the convolution with the grid is implemented with a template (mask):

  f(x, y) ∗ h(x, y) = (1/(rt)) Σ_{i=0}^{r−1} Σ_{j=0}^{t−1} f(i, j) h(x − i, y − j) = w_1 f_1 + w_2 f_2 + … + w_{rt} f_{rt} = Σ_{i=1}^{rt} w_i f_i

where w_i are the mask coefficients and f_i is the signal strength of the cell corresponding to that coefficient (i.e. the number of data objects contained in the cell). Computing the convolution with a template means moving the template over the grid cell by cell; at every cell (x, y) the filter response is given by the sum of the products of the mask coefficients with the corresponding cells. For a 3 × 3 template (see Fig. 1) the convolution result at cell (x, y) is

  f(x, y) ∗ h(x, y) = w_{−1,−1} f(x−1, y−1) + … + w_{0,0} f(x, y) + … + w_{1,1} f(x+1, y+1)

where cell (x, y) is required to coincide with the template centre w_{0,0}. For an r × t template, both r and t must be odd.

[Fig. 1  The 3 × 3 template: the grid of mask coefficients w_{−1,−1}, w_{−1,0}, w_{−1,1}; w_{0,−1}, w_{0,0}, w_{0,1}; w_{1,−1}, w_{1,0}, w_{1,1}]

After frequency filtering the high-frequency noise is strongly attenuated, but some residue remains; an amplitude filtering method is therefore used to lower the noise level further.
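The following sketch (Python/NumPy/SciPy, our own illustration rather than the paper's implementation; the cutoff and template size are illustrative parameter choices) shows the sequence just described: design a Gaussian low-pass H(u, v) in the frequency domain, obtain the corresponding spatial filter by an inverse Fourier transform, truncate it to a small r × t template, and convolve the template with the grid.

# Illustrative sketch of the frequency-filtering step (section 1.1).
import numpy as np
from scipy.signal import convolve2d

def gaussian_lowpass_template(grid_shape, cutoff=10.0, template_size=5):
    R, T = grid_shape
    u = np.fft.fftfreq(R) * R                 # frequency coordinates u, v
    v = np.fft.fftfreq(T) * T
    U, V = np.meshgrid(u, v, indexing="ij")
    H = np.exp(-(U**2 + V**2) / (2.0 * cutoff**2))   # Gaussian low-pass H(u, v)
    h = np.fft.fftshift(np.real(np.fft.ifft2(H)))    # h(x, y) via the convolution theorem
    c_r, c_t, k = R // 2, T // 2, template_size // 2
    template = h[c_r - k:c_r + k + 1, c_t - k:c_t + k + 1]   # small r x t mask
    return template / template.sum()          # normalise the mask weights w_i

def frequency_filter(grid, cutoff=10.0, template_size=5):
    template = gaussian_lowpass_template(grid.shape, cutoff, template_size)
    # convolution carried out in the "time" (spatial) domain with the small template
    return convolve2d(grid, template, mode="same", boundary="symm")

# filtered = frequency_filter(grid)           # `grid` from the previous sketch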
1.2  Amplitude filtering

After frequency filtering, the signal strength in the noise regions is clearly lower than in the cluster regions, so the difference in signal amplitude can be used to remove the remaining noise. The idea of amplitude filtering is to choose an amplitude threshold ξ and treat every region whose amplitude is below ξ as noise to be removed. Mathematically,

  cell(x, y) ∈ ROI  if I_frefiltered(x, y) ≥ ξ
  cell(x, y) ∉ ROI  if I_frefiltered(x, y) < ξ

where cell(x, y) is the cell centred at (x, y), I_frefiltered(x, y) is the signal strength of cell(x, y) after frequency filtering, and ROI (regions of interest) denotes the regions considered in the subsequent clustering. If the signal strength of cell(x, y) is not lower than the amplitude threshold ξ, the cell may belong to some cluster and needs further analysis; otherwise its signal is too weak, it is regarded as noise, and it is removed.

The key issue in amplitude filtering is how to choose a reasonable threshold ξ; here the threshold is selected from a histogram. The signal amplitude after frequency filtering is generally a non-negative real number with a continuous distribution. To build the histogram it is discretized by uniform quantization into non-negative integers. The percentage of cells at each amplitude value is then computed: if the total number of non-empty cells is M and the number of cells with amplitude value i is m_i, the percentage of cells with amplitude i is

  P_i = m_i / M

The amplitude histogram is drawn with the amplitude value I on the horizontal axis and P_i on the vertical axis. The histogram generally shows several peaks and valleys; since the noise signal strength is very low, the amplitude value at the first clear valley is taken as the threshold ξ.

After this amplitude filtering the whole space is divided into an ROI region (cells that may belong to some cluster) and a noise region, so this form of amplitude filtering can also be viewed as a binary partition. To improve the accuracy of the clustering analysis further, a multi-threshold amplitude filtering method is also proposed, corresponding to a multi-way partition. Its core idea is: in the multi-peak amplitude histogram, each peak and its surrounding bulge reflects a particular distribution of signal strength, and hence a particular density of data points; the valley between two peaks is the separation point between two different density distributions. Taking the amplitude values at these valleys as thresholds partitions the data space into regions of different density, each of which may correspond to a different physical nature. The multi-threshold formulation is

  cell(x, y) ∈ ROI_0  if I_frefiltered(x, y) ≤ ξ_0
  cell(x, y) ∈ ROI_1  if ξ_0 < I_frefiltered(x, y) ≤ ξ_1
  …
  cell(x, y) ∈ ROI_i  if ξ_{i−1} < I_frefiltered(x, y) ≤ ξ_i
  …
  cell(x, y) ∈ ROI_z  if ξ_{z−1} < I_frefiltered(x, y) ≤ ξ_z

where 0 ≤ ξ_0 < ξ_1 < … < ξ_i < … < ξ_z is the chosen set of thresholds. The regions ROI_1 to ROI_z are regions of interest of different density; each corresponds to a different ROI and is processed separately to discover clusters of that density. The region ROI_0 is the noise region and is removed. Through multi-threshold amplitude filtering, FBCLUS is able to discover clusters of different density.

1.3  Mathematical morphology clustering

After frequency and amplitude filtering, the ROI regions are obtained; with binary amplitude filtering there is a single ROI, while with multi-threshold filtering there are several ROIs of different density, which are processed separately. The next task is to extract from every ROI the clusters formed by connected cells. A mathematical morphology operator is proposed to perform this search; its pseudocode is given in the Algorithm below, where ⊕ denotes the dilation operation and Δ is the structuring element of the dilation. The operator works as follows. Step 1: take the ROI as the seed set seeds, and pick from it an arbitrary unprocessed cell p as the starting point X_i^0 of the i-th iteration. Step 2: dilate from the starting point, and keep dilating with the newly obtained cells as centres until no new cell is found; to keep the dilation inside the ROI, the newly obtained cells are intersected with seeds. Step 3: assign the result of the dilation to a new cluster C_i and remove the processed cells from the seed set; if the seed set is not empty, return to Step 1 for the (i+1)-th iteration. This step also removes clusters whose volume is too small. Step 4: when the iteration ends, output the clusters.

Algorithm
  i = 1
  seeds = ROI
  WHILE (seeds ≠ ∅)
      take any p ∈ seeds, X_i^0 = p
      k = 1
      X_i^k = (X_i^{k−1} ⊕ Δ) ∩ seeds
      add = X_i^k − X_i^{k−1}
      WHILE (add ≠ ∅)
          k = k + 1
          X_i^k = (X_i^{k−1} ∪ (add ⊕ Δ)) ∩ seeds
          add = X_i^k − X_i^{k−1}
      END WHILE
      C_i = X_i^k
      seeds = seeds − (C_i ∩ seeds)
      if C_i is too small then ignore it
      i = i + 1
  END WHILE
  OUTPUT {C_i}
  END

The proposed mathematical morphology operator itself has a certain noise-removal capability: it mainly eliminates small aggregates of high-density cells, as illustrated in Fig. 2. The small black dots in Fig. 2(a) are aggregates of a few high-density cells, and the morphology operator removes them (Fig. 2(b)). These dots appear because, to improve efficiency, small templates are used for both the frequency and the amplitude filtering, which weakens the resistance to locally dense noise. However, the morphology operator can only distinguish differences in volume, not differences in density, so it must be combined with frequency and amplitude filtering to obtain good results.

[Fig. 2  Noise-removal effect of the new mathematical morphology operator: (a) before noise removal; (b) after noise removal]
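As an illustration of sections 1.2 and 1.3 together, the sketch below (Python/NumPy/SciPy, our own code rather than the paper's Java implementation; the threshold choice, the structuring element and the minimum cluster size are illustrative assumptions) thresholds the filtered grid into an ROI and then grows clusters by repeated dilation restricted to the ROI, mirroring the Algorithm above with scipy.ndimage.binary_dilation playing the role of ⊕.

# Illustrative sketch of amplitude filtering + dilation-based cluster extraction.
import numpy as np
from scipy.ndimage import binary_dilation

def amplitude_filter(filtered_grid, threshold):
    # Single-threshold amplitude filter: cells at or above `threshold` form the ROI.
    return filtered_grid >= threshold

def extract_clusters(roi, struct=np.ones((3, 3), dtype=bool), min_cells=5):
    # Grow clusters by repeated dilation restricted to the ROI (the seed set).
    seeds = roi.copy()
    clusters = []
    while seeds.any():
        start = tuple(np.argwhere(seeds)[0])       # an arbitrary unprocessed seed cell
        current = np.zeros_like(seeds)
        current[start] = True
        while True:
            grown = binary_dilation(current, structure=struct) & seeds
            if grown.sum() == current.sum():       # no new cell found: stop growing
                break
            current = grown
        seeds &= ~current                          # remove the processed cells
        if current.sum() >= min_cells:             # drop clusters that are too small
            clusters.append(current)
    return clusters

# Example (continuing the previous sketches):
# roi = amplitude_filter(filtered, threshold=np.percentile(filtered, 90))
# clusters = extract_clusters(roi)
# print(len(clusters), "clusters found")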
1.4  Choosing the algorithm parameters

The size of the filter template strongly affects the computational cost: the larger the template, the higher the cost. Ref. [12] points out that with a small template the cost of the convolution is clearly lower than that of the Fourier transform. The template size usually has to be determined experimentally; experience shows that it is generally sufficient if the template covers at least ten data points in the dense regions. In some cases a small template fails to remove locally dense noise completely (see Fig. 2); this is handled by the proposed mathematical morphology operator.

The size of the structuring element of the dilation determines how strongly the algorithm glues cells together when extracting clusters. When the grid is very fine, the cluster regions are relatively scattered and a larger structuring element should be used; conversely, when the grid is coarse, the cluster regions are relatively compact and a smaller structuring element should be used. The size can be determined by trial: start with a small structuring element and enlarge it gradually; the number of extracted clusters decreases as it grows, and when further enlargement hardly changes the number of clusters a suitable size has been found. For the multiple ROI regions produced by multi-threshold amplitude filtering, the densities of the different cluster classes differ considerably, so the structuring element size is determined separately for each ROI before the clusters are extracted; this keeps the result correct even when the samples within a class are relatively sparse.

For the shape of the filter template and of the structuring element, a rectangle is chosen in low dimensions to guarantee high clustering quality, and a cross shape in high dimensions to reduce the computational cost: the number of cells in a rectangular template or structuring element grows exponentially with the dimension, whereas the number of cells in a cross grows only linearly, so the cross shape keeps high-dimensional processing affordable.

1.5  Performance analysis

For convenience, let N be the number of objects in the data set, M the number of non-empty cells, W the number of cells in the filter template, R the total number of ROI cells produced by the filtering, and D the number of cells in the dilation structuring element. The first step of FBCLUS, gridding the space, has complexity O(N); the second step, filtering the space, has complexity O(MW); the third step, the mathematical morphology processing, has complexity O(RD). The overall complexity is O(N + MW + RD). Usually MW < N and RD < N, so the overall complexity is approximately O(N), linear in the amount of data.

2  Experiments

2.1  Accuracy test

This test examines the ability of the algorithm to find clusters of arbitrary shape and its robustness to noise. The test data set, which contains noise objects, and the clustering result are shown in Fig. 3.

[Fig. 3  Data set and clustering result of the accuracy test: (a) data set; (b) clustering result]

2.2  Grid-adaptability test

This test examines the adaptability of the algorithm to the cell size. The test data set is shown in Fig. 4(a); clustering is performed with two grids, a 50 × 50 grid and a 1000 × 1000 grid, whose cell edge lengths differ by a factor of 20. Existing grid-based clustering algorithms generally work properly only on the 50 × 50 grid. For example, the CLUGD [8] algorithm considers only the connectivity of adjacent cells when extracting clusters; when the grid is too fine, the high-density cells become separated by empty cells, the connectivity is destroyed, and the algorithm fails. FBCLUS obtains the clustering result of Fig. 4(b) with both grids.

[Fig. 4  Data set and clustering result of the grid-adaptability test: (a) data set; (b) clustering result]

2.3  Clustering with obstacles

Clustering with obstacles is used to test the extensibility of the algorithm. The test data set and the clustering result are shown in Fig. 5; the data set contains two obstacles, and the clustering algorithm must separate the clusters that are split by an obstacle. In this test the cells containing obstacles are marked as dilation-forbidden regions, and a 3 × 3 cross-shaped structuring element is used to prevent leakage through the forbidden regions. The result shows that the algorithm separates the cluster on the left that is completely split by the obstacle.

[Fig. 5  Data set and clustering result of the obstacle test: (a) data set; (b) clustering result]

2.4  Efficiency test

This test measures the efficiency of the algorithm. The data sets are produced by a random-number generator; each data set contains 5 clusters and 10% noise. The test machine is a Shenzhou Chengyun M715D; FBCLUS and DBSCAN [13] are implemented in Java. The processing times are listed in Tab. 1 and show that FBCLUS is clearly faster than DBSCAN.

Tab. 1  Run time of DBSCAN and FBCLUS
  Number of objects    DBSCAN/s    FBCLUS/s
  100 000              69.2         0.87
  200 000              195.3        1.71
  300 000              576.8        2.53
  500 000              1 277.9      4.37

3  Conclusions

The filter-based clustering algorithm FBCLUS designed in this paper uses a frequency filter and an amplitude filter to remove noise and obtain the regions of interest (ROI), and a new mathematical morphology operator to extract the clusters from the ROI. Theoretical analysis and the test results show that FBCLUS can discover clusters of arbitrary shape; its computational complexity is linear in the amount of data; it can distinguish clusters of different density; it is robust to noise; and it adapts reasonably well to the grid size.

References:
[1] Har-Peled S, Kushal A. Smaller coresets for k-median and k-means clustering [J]. Discrete and Computational Geometry, 2007, 37(1): 3-19.
[2] La Rosa P S, Nehorai A, Eswaran H, et al. Detection of uterine MMG contractions using a multiple change point estimator and the k-means cluster algorithm [J]. IEEE Transactions on Biomedical Engineering, 2008, 55(2): 453-467.
[3] Loewenstein Y, Portugaly E, Fromer M, et al. Efficient algorithms for accurate hierarchical clustering of huge datasets: Tackling the entire protein space [J]. Bioinformatics, 2008, 24(13): i41-i49.
[4] Lee H E, Park K H, Bien Z Z. Iterative fuzzy clustering algorithm with supervision to construct probabilistic fuzzy rule base from numerical data [J]. IEEE Transactions on Fuzzy Systems, 2008, 16(1): 263-277.
[5] Santos J M, de Sa J M, Alexandre L A, et al. LEGClust: A clustering algorithm based on layered entropic subgraphs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(1): 62-75.
[6] Duan Lian, Xu Lida, Guo Feng, et al. A local-density based spatial clustering algorithm with noise [J]. Information Systems, 2007, 32: 978-986.
[7] Li Jianhua, Behjat L, Kennings A. Net cluster: A net-reduction-based clustering preprocessing algorithm for partitioning and placement [J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2007, 26(4): 669-679.
[8] Sun Zhiwei, Zhao Zheng. A fast clustering algorithm based on grid and density [C]// 2005 Canadian Conference on Electrical and Computer Engineering. Saskatoon, Sask., Canada: IEEE Press, 2005: 2276-2279.
[9] Rasheed T, Javaid U, Meddour D E, et al. An efficient stable clustering algorithm for scalable mobile multi-hop networks [C]// 4th IEEE Consumer Communications and Networking Conference. Las Vegas, USA, 2007: 89-94.
[10] Han Jiawei, Kamber M. Data Mining: Concepts and Techniques [M]. 2nd ed. San Francisco, CA: Morgan Kaufmann Publishers, 2006.
[11] Zhan Liqiang, Liu Daxin, Zhang Jianpei. Time series subsequence clustering based on wavelet filters [J]. Computer Engineering and Applications, 2007, 43(10): 4-7 (in Chinese).
[12] Gonzalez R C, Woods R E. Digital Image Processing [M]. 3rd ed. New Jersey, USA: Pearson Prentice Hall, 2008.
[13] Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases [C]// Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland: ACM Press, 1996: 226-231.

AXIONS AND OTHER VERY LIGHT BOSONS, PART III (EXPERIMENTAL LIMITS)


AXIONS AND OTHER VERY LIGHT BOSONS, PART III(EXPERIMENTAL LIMITS)(by C.Hagmann,K.van Bibber,and L.J.Rosenberg)In this section we review the experimental methodology and limits on light axions and light pseudoscalars in gen-eral.(A comprehensive overview of axion theory is given by H.Murayama in the Part I of this Review,whose notation we follow[1].)Within its scope are searches where the axion is assumed to be dark matter,searches where the Sun is presumed to be a source of axions,and purely laboratory experiments.We restrict the discussion to axions of mass m A<O(eV),as the al-lowed range for the axion mass is nominally10−6<m A<10−2 eV.Experimental work in this range predominantly has been through the axion-photon coupling g Aγ,to which the present review is confined.As discussed in Part II of this Review by G.Raffelt,the lower bound derives from a cosmological overclo-sure argument,and the upper bound from SN1987A[2].Limits from stellar evolution overlap seamlessly above that,connecting with accelerator-based limits which ruled out the original axion. There it was assumed that the Peccei-Quinn symmetry-breaking scale was the electroweak scale,i.e.,f A∼250GeV,implying axions of mass m A∼O(100keV).These earlier limits from nuclear transitions,particle decays,etc.,while not discussed here,are included in the Listings.While the axion mass is well determined by the Peccei-Quinn scale,i.e.,m A=0.62eV(107GeV/f A),the axion-photon coupling g Aγis not:g Aγ=(α/πf A)gγ,with gγ= (E/N−1.92)/2,where E/N is a model-dependent number.It is noteworthy however,that two quite distinct models lead to axion-photon couplings which are not very different.For the case of axions imbedded in Grand Unified Theories,the DFSZ axion[3],gγ=0.37,whereas in one popular implementation of the“hadronic”class of axions,the KSVZ axion[4],gγ=−0.96. The Lagrangian L=g AγE·BφA,withφA the axionfield, permits the conversion of an axion into a single real photon in an external electromagneticfield,i.e.,a Primakoffinteraction. 
In the case of relativistic axions, k_γ − k_A ∼ m_A²/2ω ≪ ω, pertinent to several experiments below, coherent axion-photon mixing in long magnetic fields results in significant conversion probability even for very weakly coupled axions [5]. Below are discussed several experimental techniques constraining g_Aγ, and their results. Also included are recent but yet-unpublished results, and projected sensitivities for experiments soon to be upgraded.

III.1. Microwave cavity experiments: Possibly the most promising avenue to the discovery of the axion presumes that axions constitute a significant fraction of the dark matter halo of our galaxy. The maximum likelihood density for the Cold Dark Matter (CDM) component of our galactic halo is ρ_CDM = 7.5 × 10⁻²⁵ g/cm³ (450 MeV/cm³) [6]. That the CDM halo is in fact made of axions (rather than e.g. WIMPs) is in principle an independent assumption; however, should very light axions exist they would almost necessarily be cosmologically abundant [2]. As shown by Sikivie [7], halo axions may be detected by their resonant conversion into a quasi-monochromatic microwave signal in a high-Q cavity permeated by a strong magnetic field. The cavity is tunable and the signal is maximum when the frequency ν = m_A (1 + O(10⁻⁶)), the width of the peak representing the virial distribution of thermalized axions in the galactic gravitational potential. The signal may possess ultra-fine structure due to axions recently fallen into the galaxy and not yet thermalized [8]. The feasibility of the technique was established in early experiments of small sensitive volume, V = O(1 liter) [9,10], with High Electron Mobility Transistor (HEMT) amplifiers, which set limits on axions in the mass range 4.5 < m_A < 16.3 µeV, but at power sensitivity levels 2–3 orders of magnitude too high to see KSVZ and DFSZ axions (the conversion power P_{A→γ} ∝ g_Aγ²). A recent large-scale experiment (B ∼ 7.5 T, V ∼ 200 liter) has achieved sensitivity to KSVZ axions over a narrow mass range 2.77 < m_A < 3.3 µeV, and continues to take data [11]. The exclusion regions shown in Fig. 1 for Refs. [9–12] are all normalized to the best-fit Cold Dark Matter density ρ_CDM = 7.5 × 10⁻²⁵ g/cm³ (450 MeV/cm³), and 90% CL. Recent developments in DC SQUID amplifiers [12] and Rydberg atom single-quantum detectors [13] promise dramatic improvements in noise temperature, which will enable rapid scanning of the axion mass range at or below the DFSZ limit. The region of the microwave cavity experiments is shown in detail in Fig. 2.

[Figure 1: Exclusion region in mass vs. axion-photon coupling (m_A, g_Aγ) for various experiments. The limit set by globular cluster Horizontal Branch Stars ("HB Stars") is shown for Ref. 2.]

[Figure 2: Exclusion region from the microwave cavity experiments, where the plot is flattened by presenting (g_Aγ/m_A)² vs. m_A [eV]. The first-generation experiments (Rochester-BNL-FNAL, "RBF" [9]; University of Florida, "UF" [10]) and the US large-scale experiment in progress ("US" [11]) are all HEMT-based. Shown also is the full mass range to be covered by the latter experiment (shaded line), and the improved sensitivity when upgraded with DC SQUID amplifiers [12] (shaded dashed line). The expected performance of the Kyoto experiment based on a Rydberg atom single-quantum receiver (dotted line) is also shown [13].]

III.2. Telescope search for eV axions: For axions of mass greater than about 10⁻¹ eV, their cosmological abundance is no longer dominated by vacuum misalignment or string radiation mechanisms, but rather by thermal production. Their contribution to the critical density is small, Ω ∼ 0.01 (m_A/eV).
However, the spontaneous-decay lifetime of axions, τ(A → 2γ) ∼ 10²⁵ sec (m_A/eV)⁻⁵, while irrelevant for µeV axions, is short enough to afford a powerful constraint on such thermally produced axions in the eV range, by looking for a quasi-monochromatic photon line from galactic clusters. This line, corrected for Doppler shift, would be at half the axion mass and its width would be consistent with the observed virial motion, typically Δλ/λ ∼ 10⁻². The expected line intensity would be of the order I_A ∼ 10⁻¹⁷ (m_A/3 eV)⁷ erg cm⁻² arcsec⁻² Å⁻¹ sec⁻¹ for DFSZ axions, comparable to the continuum night emission. The conservative assumption is made that the relative density of thermal axions fallen into the cluster gravitational potential reflects their overall cosmological abundance. A search for thermal axions in three rich Abell clusters was carried out at Kitt Peak National Laboratory [14]; no such line was observed between 3100–8300 Å (m_A = 3–8 eV) after "on-off field" subtraction of the atmospheric molecular background spectra. A limit everywhere stronger than g_Aγ < 10⁻¹⁰ GeV⁻¹ is set, which is seen from Fig. 1 to easily exclude DFSZ axions throughout the mass range.
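As a quick illustration of the two scalings just quoted (our own back-of-the-envelope numbers, not figures taken from the review), the following lines evaluate the decay lifetime and the expected line intensity for a hypothetical 5 eV axion:

# Back-of-the-envelope check of the scalings above for an illustrative m_A = 5 eV.
m_A = 5.0                                  # axion mass in eV
tau = 1e25 * (m_A / 1.0) ** -5             # lifetime in seconds, ~3.2e21 s
I_A = 1e-17 * (m_A / 3.0) ** 7             # DFSZ line intensity, ~3.6e-16 erg cm^-2 arcsec^-2 A^-1 s^-1
print(f"tau ~ {tau:.1e} s,  I_A ~ {I_A:.1e}")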
Axions were excluded for g Aγ<3.6×10−9GeV−1for m A< 0.03eV,and g Aγ<7.7×10−9GeV−1for0.03eV<m A<0.11 eV(95%CL).A more ambitious experiment has recently been commissioned,using a superconducting magnet on a telescope mount to track the Sun continuously.A preliminary exclusion limit of g Aγ<6×10−10GeV−1(95%CL)has been set for m A<0.03eV[17].Another search for solar axions has been carried out,using a single crystal germanium detector.It exploits the coherent conversion of axions into photons when their angle of incidence satisfies a Bragg condition with a crystalline plane.Analysis of 1.94kg-yr of data from a1kg germanium detector yields abound of g Aγ<2.7×10−9GeV−1(95%CL),independent of mass up to m A∼1keV[18].III.4.Photon regeneration(“invisible light shining through walls”):Photons propagating through a transverse field(with E B)may convert into axions.For light axions with m2A l/2ω 2π,where l is the length of the magnetic field,the axion beam produced is colinear and coherent with the photon beam,and the conversion probabilityΠis given byΠ∼(1/4)(g AγBl)2.An ideal implementation for this limit is a laser beam propagating down a long,superconducting dipole magnet like those for high-energy physics accelerators. If another such dipole magnet is set up in line with the first,with an optical barrier interposed between them,then photons may be regenerated from the pure axion beam in the second magnet and detected[19].The overall probability P(γ→A→γ)=Π2.Such an experiment has been carried out,utilizing two magnets of length l=4.4m and B=3.7T. Axions with mass m A<10−3eV,and g Aγ>6.7×10−7GeV−1 were excluded at95%CL[20,21].With sufficient effort,limits comparable to those from stellar evolution would be achievable. Due to the g4Aγrate suppression however,it does not seem feasible to reach standard axion couplings.III.5.Polarization experiments:The existence of axions can affect the polarization of light propagating through a transverse magneticfield in two ways[22].First,as the E component,but not the E⊥component will be depleted by the production of real axions,there will be in general a small rotation of the polarization vector of linearly polarized light. 
This effect will be a constant for all sufficiently light m A such that the oscillation length is much longer than the magnet (m2A l/2ω 2π).For heavier axions,the effect oscillates and diminishes with increasing m A,and vanishes for m A>ω.The second effect is birefringence of the vacuum,again because there can be a mixing of virtual axions in the E state,but not for the E⊥state.This will lead to light which is initially linearly polarized becoming elliptically polarized.Higher-order QED also induces vacuum birefringence,and is much stronger thanthe contribution due to axions.A search for both polarization-rotation and induced ellipticity has been carried out with the same magnets described in Sec.(III.4)above[21,23].As in the case of photon regeneration,the observables are boosted linearly by the number of passes the laser beam makes in an optical cavity within the magnet.The polarization-rotation resulted in a stronger limit than that from ellipticity,g Aγ< 3.6×10−7GeV−1(95%CL)for m A<5×10−4eV.The limits from ellipticity are better at higher masses,as they fall offsmoothly and do not terminate at m A.There are two experiments in construction with greatly improved sensitivity which while still far from being able to detect standard axions, should measure the QED“light-by-light”contribution for the first time[24,25].The overall envelope for limits from the laser-based experiments in Sec.(III.4)and Sec.(III.5)is shown schematically in Fig.1.References1.H.Murayama,Part I(Theory)of this Review.2.G.Raffelt,Part II(Astrophysical Constraints)of thisReview.3.M.Dine et al.,Phys.Lett.B104,199(1981);A.Zhitnitsky,Sov.J.Nucl.Phys.31,260(1980).4.J.Kim,Phys.Rev.Lett.43,103(1979);M.Shifman et al.,Nucl.Phys.B166,493(1980).5.G.Raffelt and L.Stodolsky,Phys.Rev.D37,1237(1988).6. E.Gates et al.,Ap.J.449,123(1995).7.P.Sikivie,Phys.Rev.Lett.51,1415(1983);52(E),695(1984);Phys.Rev.D32,2988(1985).8.P.Sikivie and J.Ipser,Phys.Lett.B291,288(1992);P.Sikivie et al.,Phys.Rev.Lett.75,2911(1995).9.S.DePanfilis et al.,Phys.Rev.Lett.59,839(1987);W.Wuensch et al.,Phys.Rev.D40,3153(1989).10. C.Hagmann et al.,Phys.Rev.D42,1297(1990).11. C.Hagmann et al.,Phys.Rev.Lett.80,2043(1998).12.M.M¨u ck et al.,to be published in Appl.Phys.Lett.13.I.Ogawa et al.,Proceedings II.RESCEU Conference on“Dark Matter in the Universe and its Direct Detection,”p.175,Universal Academy Press,ed.M.Minowa(1997).14.M.Bershady et al.,Phys.Rev.Lett.66,1398(1991);M.Ressell,Phys.Rev.D44,3001(1991).15.K.van Bibber et al.,Phys.Rev.D39,2089(1989).16. zarus et al.,Phys.Rev.Lett.69,2333(1992).17.M.Minowa,Proceedings International Workshop Non-Accelerator New Physics,Dubna(1997),and private com-munication(1998).18. F.Avignone III et al.,ibid.19.K.van Bibber et al.,Phys.Rev.Lett.59,759(1987).A similar proposal has been made for exactly masslesspseudoscalars:A.Ansel’m,Sov.J.Nucl.Phys.42,936 (1985).20.G.Ruoso et al.,Z.Phys.C56,505(1992).21.R.Cameron et al.,Phys.Rev.D47,3707(1993).22.L.Maiani et al.,Phys.Lett.B175,359(1986).23.Y.Semertzidis et al.,Phys.Rev.Lett.64,2988(1990).24.S.Lee et al.,Fermilab proposal E-877(1995).25. D.Bakalov et al.,Quantum Semiclass.Opt.10,239(1998).。

The optical/near-IR spectral energy distribution of the GRB 000210 host galaxy

Abstract. We report on UBVRIZJsHKs-band photometry of the dark GRB 000210 host galaxy. Fitting a grid of spectral templates to its Spectral Energy Distribution (SED), we derived a photometric redshift (z = 0.842 +0.054 −0.042) which is in excellent agreement with the spectroscopic one (z = 0.8463 ± 0.0002; Piro et al. 2002). The best fit to the SED is obtained with a blue starburst template with an age of 0.181 +0.037 −0.026 Gyr. We discuss the implications of the inferred low value of A_V and the age of the dominant stellar population for the non-detection of the GRB 000210 optical afterglow.
arXiv:astro-ph/0301564v1 29 Jan 2003
GAIA Spectroscopy, Science and Technology, ASP Conference Series, Vol. XXX, 2002, U. Munari ed.
The optical/near-IR spectral energy distribution of the GRB 000210 host galaxy

A survey of Clustering Algorithms

14A survey of Clustering AlgorithmsLior RokachDepartment of Information Systems EngineeringBen-Gurion University of the Negevliorrk@bgu.ac.ilSummary.This chapter presents a tutorial overview of the main clustering methods used in Data Mining.The goal is to provide a self-contained review of the concepts and the mathemat-ics underlying clustering techniques.The chapter begins by providing measures and criteria that are used for determining whether two objects are similar or dissimilar.Then the clustering methods are presented,divided into:hierarchical,partitioning,density-based,model-based,grid-based,and soft-computing methods.Following the methods,the challenges of perform-ing clustering in large data sets are discussed.Finally,the chapter presents how to determine the number of clusters.Key words:Clustering,K-means,Intra-cluster homogeneity,Inter-cluster separa-bility,14.1IntroductionClustering and classification are both fundamental tasks in Data Mining.Classifi-cation is used mostly as a supervised learning method,clustering for unsupervised learning (some clustering models are for both).The goal of clustering is descriptive,that of classification is predictive (Veyssieres and Plant,1998).Since the goal of clus-tering is to discover a new set of categories,the new groups are of interest in them-selves,and their assessment is intrinsic.In classification tasks,however,an important part of the assessment is extrinsic,since the groups must reflect some reference set of classes.“Understanding our world requires conceptualizing the similarities and differences between the entities that compose it”(Tyron and Bailey,1970).Clustering groups data instances into subsets in such a manner that similar in-stances are grouped together,while different instances belong to different groups.The instances are thereby organized into an efficient representation that character-izes the population being sampled.Formally,the clustering structure is represented as a set of subsets C =C 1,...,C k of S ,such that:S = k i =1C i and C i ∩C j =/0for i =j .Consequently,any instance in S belongs to exactly one and only one subset.O. Maimon, L. 
Rokach (eds.), Data Mining and Knowledge Discovery Handbook , 2nd ed., DOI 10.1007/978-0-387-09823-4_14, © Springer Science+Business Media, LLC 2010270Lior RokachClustering of objects is as ancient as the human need for describing the salient characteristics of men and objects and identifying them with a type.Therefore,it embraces various scientific disciplines:from mathematics and statistics to biology and genetics,each of which uses different terms to describe the topologies formed using this analysis.From biological “taxonomies”,to medical “syndromes”and ge-netic “genotypes”to manufacturing ”group technology”—the problem is identical:forming categories of entities and assigning individuals to the proper groups within it.14.2Distance MeasuresSince clustering is the grouping of similar instances/objects,some sort of measure that can determine whether two objects are similar or dissimilar is required.There are two main type of measures used to estimate this relation:distance measures and similarity measures.Many clustering methods use distance measures to determine the similarity or dissimilarity between any pair of objects.It is useful to denote the distance between two instances x i and x j as:d(x i ,x j ).A valid distance measure should be symmetric and obtains its minimum value (usually zero)in case of identical vectors.The dis-tance measure is called a metric distance measure if it also satisfies the following properties:1.Triangle inequality d(x i ,x k )≤d(x i ,x j )+d(x j ,x k )∀x i ,x j ,x k ∈S .2.d(x i ,x j )=0⇒x i =x j ∀x i ,x j ∈S .14.2.1Minkowski:Distance Measures for Numeric AttributesGiven two p -dimensional instances,x i =(x i 1,x i 2,...,x ip )and x j =(x j 1,x j 2,...,x jp ),The distance between the two data instances can be calculated using the Minkowski metric (Han and Kamber,2001):d (x i ,x j )=( x i 1−x j 1 g + x i 2−x j 2 g +...+ x ip −x jp g )1/gThe commonly used Euclidean distance between two objects is achieved when g =2.Given g =1,the sum of absolute paraxial distances (Manhattan metric)is obtained,and with g=∞one gets the greatest of the paraxial distances (Chebychev metric).The measurement unit used can affect the clustering analysis.To avoid the de-pendence on the choice of measurement units,the data should be standardized.Stan-dardizing measurements attempts to give all variables an equal weight.However,if each variable is assigned with a weight according to its importance,then the weighted distance can be computed as:d (x i ,x j )=(w 1 x i 1−x j 1 g +w 2 x i 2−x j 2 g +...+w p x ip −x jp g )1/g where w i ∈[0,∞)14A survey of Clustering Algorithms271 14.2.2Distance Measures for Binary AttributesThe distance measure described in the last section may be easily computed for continuous-valued attributes.In the case of instances described by categorical,bi-nary,ordinal or mixed type attributes,the distance measure should be revised.In the case of binary attributes,the distance between objects may be calculated based on a contingency table.A binary attribute is symmetric if both of its states are equally valuable.In that case,using the simple matching coefficient can assess dissimilarity between two objects:d(x i,x j)=r+sq+r+s+twhere q is the number of attributes that equal1for both objects;t is the number of attributes that equal0for both objects;and s and r are the number of attributes that are unequal for both objects.A binary attribute is asymmetric,if its states are not equally important(usually the positive outcome is considered more important).In this case,the 
denominator ignores the unimportant negative matches(t).This is called the Jaccard coefficient:d(x i,x j)=r+s q+r+s14.2.3Distance Measures for Nominal AttributesWhen the attributes are nominal,two main approaches may be used:1.Simple matching:d(x i,x j)=p−m pwhere p is the total number of attributes and m is the number of matches.2.Creating a binary attribute for each state of each nominal attribute and computingtheir dissimilarity as described above.14.2.4Distance Metrics for Ordinal AttributesWhen the attributes are ordinal,the sequence of the values is meaningful.In such cases,the attributes can be treated as numeric ones after mapping their range onto [0,1].Such mapping may be carried out as follows:z i,n=r i,n−1 M n−1where z i,n is the standardized value of attribute a n of object i.r i,n is that value before standardization,and M n is the upper limit of the domain of attribute a n(assuming the lower limit is1).272Lior Rokach14.2.5Distance Metrics for Mixed-Type AttributesIn the cases where the instances are characterized by attributes of mixed-type,one may calculate the distance by combining the methods mentioned above.For instance, when calculating the distance between instances i and j using a metric such as the Euclidean distance,one may calculate the difference between nominal and binary attributes as0or1(“match”or“mismatch”,respectively),and the difference between numeric attributes as the difference between their normalized values.The square of each such difference will be added to the total distance.Such calculation is employed in many clustering algorithms presented below.The dissimilarity d(x i,x j)between two instances,containing p attributes of mixedtypes,is defined as:d(x i,x j)=p∑n=1δ(n)i j d(n)i jp∑n=1δ(n)i jwhere the indicatorδ(n)i j=0if one of the values is missing.The contribution of at-tribute n to the distance between the two objects d(n)(x i,x j)is computed according toits type:•If the attribute is binary or categorical,d(n)(x i,x j)=0if x in=x jn,otherwise d(n)(x i,x j)=1.•If the attribute is continuous-valued,d(n)i j=|x in−x jn|max h x hn−min h x hn ,where h runs over allnon-missing objects for attribute n.•If the attribute is ordinal,the standardized values of the attribute are computed first and then,z i,n is treated as continuous-valued.14.3Similarity FunctionsAn alternative concept to that of the distance is the similarity function s(x i,x j)that compares the two vectors x i and x j(Duda et al.,2001).This function should be symmetrical(namely s(x i,x j)=s(x j,x i))and have a large value when x i and x j are somehow“similar”and constitute the largest value for identical vectors.A similarity function where the target range is[0,1]is called a dichotomous simi-larity function.In fact,the methods described in the previous sections for calculating the“distances”in the case of binary and nominal attributes may be considered as similarity functions,rather than distances.14.3.1Cosine MeasureWhen the angle between the two vectors is a meaningful measure of their similarity, the normalized inner product may be an appropriate similarity measure:14A survey of Clustering Algorithms273s(x i,x j)=x T i·x j x i ·x j14.3.2Pearson Correlation MeasureThe normalized Pearson correlation is defined as:s(x i,x j)=(x i−¯x i)T·(x j−¯x j) x i−¯x i ·x j−¯x jwhere¯x i denotes the average feature value of x over all dimensions.14.3.3Extended Jaccard MeasureThe extended Jaccard measure was presented by(Strehl and Ghosh,2000)and it isdefined as:s(x i,x j)=x T i·x jx i 2+x j2−x T i·x 
j14.3.4Dice Coefficient MeasureThe dice coefficient measure is similar to the extended Jaccard measure and it isdefined as:s(x i,x j)=2x T i·x j x i 2+x j214.4Evaluation Criteria MeasuresEvaluating if a certain clustering is good or not is a problematic and controversial issue.In fact Bonner(1964)was thefirst to argue that there is no universal defini-tion for what is a good clustering.The evaluation remains mostly in the eye of the beholder.Nevertheless,several evaluation criteria have been developed in the litera-ture.These criteria are usually divided into two categories:Internal and External. 14.4.1Internal Quality CriteriaInternal quality metrics usually measure the compactness of the clusters using some similarity measure.It usually measures the intra-cluster homogeneity,the inter-cluster separability or a combination of these two.It does not use any external information beside the data itself.274Lior RokachSum of Squared Error(SSE)SSE is the simplest and most widely used criterion measure for clustering.It is cal-culated as:SSE=K∑k=1∑∀x i∈C kx i−μk 2where C k is the set of instances in cluster k;μk is the vector mean of cluster k.The components ofμk are calculated as:μk,j=1N k∑∀x i∈C kx i,jwhere N k=|C k|is the number of instances belonging to cluster k.Clustering methods that minimize the SSE criterion are often called minimum variance partitions,since by simple algebraic manipulation the SSE criterion may bewritten as:SSE=12K∑k=1N k¯S kwhere:¯Sk=1N2k∑x i,x j∈C kx i−x j2(C k=cluster k)The SSE criterion function is suitable for cases in which the clusters form com-pact clouds that are well separated from one another(Duda et al.,2001).Other Minimum Variance CriteriaAdditional minimum criteria to SSE may be produced by replacing the value of S k with expressions such as:¯Sk=1N2k∑x i,x j∈C ks(x i,x j)or:¯Sk=minx i,x j∈C ks(x i,x j)14A survey of Clustering Algorithms275 Scatter CriteriaThe scalar scatter criteria are derived from the scatter matrices,reflecting the within-cluster scatter,the between-cluster scatter and their summation—the total scatter matrix.For the k th cluster,the scatter matrix may be calculated as:S k=∑x∈C k(x−μk)(x−μk)TThe within-cluster scatter matrix is calculated as the summation of the last definitionover all clusters:S W=K ∑k=1S kThe between-cluster scatter matrix may be calculated as:S B=K∑k=1N k(μk−μ)(μk−μ)Twhereμis the total mean vector and is defined as:μ=1mK∑k=1N kμkThe total scatter matrix should be calculated as:S T=∑x∈C1,C2,...,C K(x−μ)(x−μ)TThree scalar criteria may be derived from S W,S B and S T:•The trace criterion—the sum of the diagonal elements of a matrix.Minimizing the trace of S W is similar to minimizing SSE and is therefore acceptable.This criterion,representing the within-cluster scatter,is calculated as:J e=tr[S W]=K∑k=1∑x∈C kx−μk 2Another criterion,which may be maximized,is the between cluster criterion:tr[S B]=K∑k=1N k μk−μ 2•The determinant criterion—the determinant of a scatter matrix roughly measures the square of the scattering volume.Since S B will be singu-lar if the number of clusters is less than or equal to the dimensionality,or if m−c is less than the dimensionality,its determinant is not an appropriate criterion.If we assume that SW is nonsingular,the determinant criterion function using this matrix may be employed:276Lior RokachJ d=|S W|=K ∑k=1S k•The invariant criterion—the eigenvaluesλ1,λ2,...,λd ofS−1W S Bare the basic linear invariants of the scatter matrices.Good partitions are ones for which the nonzero eigenvalues 
are large.As a result,several criteria may be derived including the eigenvalues.Three such criteria are:1.tr[S−1W S B]=d ∑i=1λi2.J f=tr[S−1T S W]=d∑i=111+λi3.|S W||S T|=d∏i=111+λiCondorcet’s CriterionAnother appropriate approach is to apply the Condorcet’s solution(1785)to the rank-ing problem(Marcotorchino and Michaud,1979).In this case the criterion is calcu-lated as following:∑C i∈C∑x j,x k∈C ix j=x ks(x j,x k)+∑C i∈C∑x j∈C i;x k/∈C id(x j,x k)where s(x j,x k)and d(x j,x k)measure the similarity and distance of the vectors x j and x k.The C-CriterionThe C-criterion(Fortier and Solomon,1996)is an extension of Condorcet’s criterion and is defined as:∑C i∈C∑x j,x k∈C ix j=x k(s(x j,x k)−γ)+∑C i∈C∑x j∈C i;x k/∈C i(γ−s(x j,x k))whereγis a threshold value.Category Utility MetricThe category utility(Gluck and Corter,1985)is defined as the increase of the ex-pected number of feature values that can be correctly predicted given a certain clus-tering.This metric is useful for problems that contain a relatively small number of nominal features each having small cardinality.14A survey of Clustering Algorithms277 Edge Cut MetricsIn some cases it is useful to represent the clustering problem as an edge cut minimiza-tion problem.In such instances the quality is measured as the ratio of the remaining edge weights to the total precut edge weights.If there is no restriction on the size of the clusters,finding the optimal value is easy.Thus the min-cut measure is revised to penalize imbalanced structures.14.4.2External Quality CriteriaExternal measures can be useful for examining whether the structure of the clusters match to some predefined classification of the instances.Mutual Information Based MeasureThe mutual information criterion can be used as an external measure for clustering (Strehl et al.,2000).The measure for m instances clustered using C={C1,...,C g} and referring to the target attribute y whose domain is dom(y)={c1,...,c k}is de-fined as follows:C=2mg∑l=1k∑h=1m l,h log g·km l,h·mm.,l·m l,.where m l,h indicate the number of instances that are in cluster C l and also in class c h. 
m.,h denotes the total number of instances in the class c h.Similarly,m l,.indicates the number of instances in cluster C l.Precision-Recall MeasureThe precision-recall measure from information retrieval can be used as an external measure for evaluating clusters.The cluster is viewed as the results of a query for a specific class.Precision is the fraction of correctly retrieved instances,while re-call is the fraction of correctly retrieved instances out of all matching instances.A combined F-measure can be useful for evaluating a clustering structure(Larsen and Aone,1999).Rand IndexThe Rand index(Rand,1971)is a simple criterion used to compare an induced clus-tering structure(C1)with a given clustering structure(C2).Let a be the number of pairs of instances that are assigned to the same cluster in C1and in the same cluster in C2;b be the number of pairs of instances that are in the same cluster in C1,but not in the same cluster in C2;c be the number of pairs of instances that are in the same cluster in C2,but not in the same cluster in C1;and d be the number of pairs of instances that are assigned to different clusters in C1and C2.The quantities a and d278Lior Rokachcan be interpreted as agreements,and b and c as disagreements.The Rand index isdefined as:RAND=a+da+b+c+dThe Rand index lies between0and1.When the two partitions agree perfectly,the Rand index is1.A problem with the Rand index is that its expected value of two random cluster-ing does not take a constant value(such as zero).Hubert and Arabie(1985)suggest an adjusted Rand index that overcomes this disadvantage.14.5Clustering MethodsIn this section we describe the most well-known clustering algorithms.The main reason for having many clustering methods is the fact that the notion of“cluster”is not precisely defined(Estivill-Castro,2000).Consequently many clustering methods have been developed,each of which uses a different induction principle.Farley and Raftery(1998)suggest dividing the clustering methods into two main groups:hier-archical and partitioning methods.Han and Kamber(2001)suggest categorizing the methods into additional three main categories:density-based methods,model-based clustering and grid-based methods.An alternative categorization based on the in-duction principle of the various clustering methods is presented in(Estivill-Castro, 2000).14.5.1Hierarchical MethodsThese methods construct the clusters by recursively partitioning the instances in ei-ther a top-down or bottom-up fashion.These methods can be sub-divided as follow-ing:•Agglomerative hierarchical clustering—Each object initially represents a clus-ter of its own.Then clusters are successively merged until the desired cluster structure is obtained.•Divisive hierarchical clustering—All objects initially belong to one cluster.Then the cluster is divided into sub-clusters,which are successively divided into their own sub-clusters.This process continues until the desired cluster structure is obtained.The result of the hierarchical methods is a dendrogram,representing the nested grouping of objects and similarity levels at which groupings change.A clustering of the data objects is obtained by cutting the dendrogram at the desired similarity level.The merging or division of clusters is performed according to some similarity measure,chosen so as to optimize some criterion(such as a sum of squares).The hierarchical clustering methods could be further divided according to the manner that the similarity measure is calculated(Jain et al.,1999):•Single-link 
clustering(also called the connectedness,the minimum method or the nearest neighbor method)—methods that consider the distance between two clusters to be equal to the shortest distance from any member of one cluster to any member of the other cluster.If the data consist of similarities,the similarity between a pair of clusters is considered to be equal to the greatest simi-larity from any member of one cluster to any member of the other cluster(Sneath and Sokal,1973).•Complete-link clustering(also called the diameter,the maximum method or the furthest neighbor method)-methods that consider the distance between two clusters to be equal to the longest distance from any member of one cluster to any member of the other cluster(King,1967).•Average-link clustering(also called minimum variance method)-methods that consider the distance between two clusters to be equal to the average distance from any member of one cluster to any member of the other cluster.Such clus-tering algorithms may be found in(Ward,1963)and(Murtagh,1984).The disadvantages of the single-link clustering and the average-link clustering can be summarized as follows(Guha et al.,1998):•Single-link clustering has a drawback known as the“chaining effect“:A few points that form a bridge between two clusters cause the single-link clustering to unify these two clusters into one.•Average-link clustering may cause elongated clusters to split and for portions of neighboring elongated clusters to merge.The complete-link clustering methods usually produce more compact clusters and more useful hierarchies than the single-link clustering methods,yet the single-link methods are more versatile.Generally,hierarchical methods are characterized with the following strengths:•Versatility—The single-link methods,for example,maintain good performance on data sets containing non-isotropic clusters,including well-separated,chain-like and concentric clusters.•Multiple partitions—hierarchical methods produce not one partition,but mul-tiple nested partitions,which allow different users to choose different partitions, according to the desired similarity level.The hierarchical partition is presented using the dendrogram.The main disadvantages of the hierarchical methods are:•Inability to scale well—The time complexity of hierarchical algorithms is at least O(m2)(where m is the total number of instances),which is non-linear with the number of objects.Clustering a large number of objects using a hierarchical algorithm is also characterized by huge I/O costs.•Hierarchical methods can never undo what was done ly there is no back-tracking capability.14.5.2Partitioning MethodsPartitioning methods relocate instances by moving them from one cluster to another,starting from an initial partitioning.Such methods typically require that the numberof clusters will be pre-set by the user.To achieve global optimality in partitioned-based clustering,an exhaustive enumeration process of all possible partitions is re-quired.Because this is not feasible,certain greedy heuristics are used in the form ofiterative ly,a relocation method iteratively relocates points be-tween the k clusters.The following subsections present various types of partitioningmethods.Error Minimization AlgorithmsThese algorithms,which tend to work well with isolated and compact clusters,arethe most intuitive and frequently used methods.The basic idea is tofind a cluster-ing structure that minimizes a certain error criterion which measures the“distance”of each instance to its representative value.The most 
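Before the pseudo-code of Fig. 14.1 below, the following minimal NumPy sketch (our own illustration, not code from this chapter; the random initialization and the fixed iteration cap are illustrative choices) shows the same assignment/update loop in executable form:

# Minimal K-means (Lloyd-style) sketch corresponding to the description above.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # initial cluster centers
    for _ in range(n_iter):
        # assignment step: each instance goes to its nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: each center becomes the mean of the instances assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):   # converged: relocation no longer helps
            break
        centers = new_centers
    return labels, centers

# labels, centers = kmeans(np.random.default_rng(1).normal(size=(500, 2)), k=3)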
well-known criterion is theSum of Squared Error(SSE),which measures the total squared Euclidian distanceof instances to their representative values.SSE may be globally optimized by ex-haustively enumerating all partitions,which is very time-consuming,or by giving anapproximate solution(not necessarily leading to a global minimum)using heuristics.The latter option is the most common alternative.The simplest and most commonly used algorithm,employing a squared errorcriterion is the K-means algorithm.This algorithm partitions the data into K clusters (C1,C2,...,C K),represented by their centers or means.The center of each cluster is calculated as the mean of all the instances belonging to that cluster.Figure14.1presents the pseudo-code of the K-means algorithm.The algorithmstarts with an initial set of cluster centers,chosen at random or according to someheuristic procedure.In each iteration,each instance is assigned to its nearest clustercenter according to the Euclidean distance between the two.Then the cluster centersare re-calculated.The center of each cluster is calculated as the mean of all the instances belongingto that cluster:μk=1N kN k ∑q=1x qwhere N k is the number of instances belonging to cluster k andμk is the mean of the cluster k.A number of convergence conditions are possible.For example,the search may stop when the partitioning error is not reduced by the relocation of the centers.This indicates that the present partition is locally optimal.Other stopping criteria can be used also such as exceeding a pre-defined number of iterations.The K-means algorithm may be viewed as a gradient-decent procedure,which begins with an initial set of K cluster-centers and iteratively updates it so as to de-crease the error function.Input:S(instance set),K(number of cluster)Output:clusters1:Initialize K cluster centers.2:while termination condition is not satisfied do3:Assign instances to the closest cluster center.4:Update cluster centers based on the assignment.5:end whileFig.14.1.K-means Algorithm.A rigorous proof of thefinite convergence of the K-means type algorithms is given in(Selim and Ismail,1984).The complexity of T iterations of the K-means algorithm performed on a sample size of m instances,each characterized by N at-tributes,is:O(T∗K∗m∗N).This linear complexity is one of the reasons for the popularity of the K-means algorithms.Even if the number of instances is substantially large(which often is the case nowadays),this algorithm is computationally attractive.Thus,the K-means algorithm has an advantage in comparison to other clustering methods(e.g.hierar-chical clustering methods),which have non-linear complexity.Other reasons for the algorithm’s popularity are its ease of interpretation,simplic-ity of implementation,speed of convergence and adaptability to sparse data(Dhillon and Modha,2001).The Achilles heel of the K-means algorithm involves the selection of the ini-tial partition.The algorithm is very sensitive to this selection,which may make the difference between global and local minimum.Being a typical partitioning algorithm,the K-means algorithm works well only on data sets having isotropic clusters,and is not as versatile as single link algorithms, for instance.In addition,this algorithm is sensitive to noisy data and outliers(a single out-lier can increase the squared error dramatically);it is applicable only when mean is defined(namely,for numeric attributes);and it requires the number of clusters in advance,which is not trivial when no prior knowledge is available.The 
use of the K-means algorithm is often limited to numeric attributes.Haung (1998)presented the K-prototypes algorithm,which is based on the K-means al-gorithm but removes numeric data limitations while preserving its efficiency.The algorithm clusters objects with numeric and categorical attributes in a way similar to the K-means algorithm.The similarity measure on numeric attributes is the square Euclidean distance;the similarity measure on the categorical attributes is the number of mismatches between objects and the cluster prototypes.Another partitioning algorithm,which attempts to minimize the SSE is the K-medoids or PAM(partition around medoids—(Kaufmann and Rousseeuw,1987)). This algorithm is very similar to the K-means algorithm.It differs from the latter mainly in its representation of the different clusters.Each cluster is represented by the most centric object in the cluster,rather than by the implicit mean that may not belong to the cluster.The K-medoids method is more robust than the K-means algorithm in the pres-ence of noise and outliers because a medoid is less influenced by outliers or other extreme values than a mean.However,its processing is more costly than the K-means method.Both methods require the user to specify K,the number of clusters.Other error criteria can be used instead of the SSE.Estivill-Castro(2000)ana-lyzed the total absolute error ly,instead of summing up the squared error,he suggests to summing up the absolute error.While this criterion is superior in regard to robustness,it requires more computational effort.Graph-Theoretic ClusteringGraph theoretic methods are methods that produce clusters via graphs.The edges of the graph connect the instances represented as nodes.A well-known graph-theoretic algorithm is based on the Minimal Spanning Tree—MST(Zahn,1971).Inconsis-tent edges are edges whose weight(in the case of clustering-length)is significantly larger than the average of nearby edge lengths.Another graph-theoretic approach constructs graphs based on limited neighborhood sets(Urquhart,1982).There is also a relation between hierarchical methods and graph theoretic clus-tering:•Single-link clusters are subgraphs of the MST of the data instances.Each sub-graph is a connected component,namely a set of instances in which each instance is connected to at least one other member of the set,so that the set is maximal with respect to this property.These subgraphs are formed according to some similarity threshold.•Complete-link clusters are maximal complete subgraphs,formed using a similar-ity threshold.A maximal complete subgraph is a subgraph such that each node is connected to every other node in the subgraph and the set is maximal with respect to this property.14.5.3Density-based MethodsDensity-based methods assume that the points that belong to each cluster are drawn from a specific probability distribution(Banfield and Raftery,1993).The overall distribution of the data is assumed to be a mixture of several distributions.The aim of these methods is to identify the clusters and their distribution param-eters.These methods are designed for discovering clusters of arbitrary shape which are not necessarily convex,namely:x i,x j∈C kThis does not necessarily imply that:α·x i+(1−α)·x j∈C kThe idea is to continue growing the given cluster as long as the density(number of objects or data points)in the neighborhood exceeds some ly,the。

Survey of clustering data mining techniques

A Survey of Clustering Data Mining TechniquesPavel BerkhinYahoo!,Inc.pberkhin@Summary.Clustering is the division of data into groups of similar objects.It dis-regards some details in exchange for data simplifirmally,clustering can be viewed as data modeling concisely summarizing the data,and,therefore,it re-lates to many disciplines from statistics to numerical analysis.Clustering plays an important role in a broad range of applications,from information retrieval to CRM. Such applications usually deal with large datasets and many attributes.Exploration of such data is a subject of data mining.This survey concentrates on clustering algorithms from a data mining perspective.1IntroductionThe goal of this survey is to provide a comprehensive review of different clus-tering techniques in data mining.Clustering is a division of data into groups of similar objects.Each group,called a cluster,consists of objects that are similar to one another and dissimilar to objects of other groups.When repre-senting data with fewer clusters necessarily loses certainfine details(akin to lossy data compression),but achieves simplification.It represents many data objects by few clusters,and hence,it models data by its clusters.Data mod-eling puts clustering in a historical perspective rooted in mathematics,sta-tistics,and numerical analysis.From a machine learning perspective clusters correspond to hidden patterns,the search for clusters is unsupervised learn-ing,and the resulting system represents a data concept.Therefore,clustering is unsupervised learning of a hidden data concept.Data mining applications add to a general picture three complications:(a)large databases,(b)many attributes,(c)attributes of different types.This imposes on a data analysis se-vere computational requirements.Data mining applications include scientific data exploration,information retrieval,text mining,spatial databases,Web analysis,CRM,marketing,medical diagnostics,computational biology,and many others.They present real challenges to classic clustering algorithms. 
These challenges led to the emergence of powerful broadly applicable data2Pavel Berkhinmining clustering methods developed on the foundation of classic techniques.They are subject of this survey.1.1NotationsTo fix the context and clarify terminology,consider a dataset X consisting of data points (i.e.,objects ,instances ,cases ,patterns ,tuples ,transactions )x i =(x i 1,···,x id ),i =1:N ,in attribute space A ,where each component x il ∈A l ,l =1:d ,is a numerical or nominal categorical attribute (i.e.,feature ,variable ,dimension ,component ,field ).For a discussion of attribute data types see [106].Such point-by-attribute data format conceptually corresponds to a N ×d matrix and is used by a majority of algorithms reviewed below.However,data of other formats,such as variable length sequences and heterogeneous data,are not uncommon.The simplest subset in an attribute space is a direct Cartesian product of sub-ranges C = C l ⊂A ,C l ⊂A l ,called a segment (i.e.,cube ,cell ,region ).A unit is an elementary segment whose sub-ranges consist of a single category value,or of a small numerical bin.Describing the numbers of data points per every unit represents an extreme case of clustering,a histogram .This is a very expensive representation,and not a very revealing er driven segmentation is another commonly used practice in data exploration that utilizes expert knowledge regarding the importance of certain sub-domains.Unlike segmentation,clustering is assumed to be automatic,and so it is a machine learning technique.The ultimate goal of clustering is to assign points to a finite system of k subsets (clusters).Usually (but not always)subsets do not intersect,and their union is equal to a full dataset with the possible exception of outliersX =C 1 ··· C k C outliers ,C i C j =0,i =j.1.2Clustering Bibliography at GlanceGeneral references regarding clustering include [110],[205],[116],[131],[63],[72],[165],[119],[75],[141],[107],[91].A very good introduction to contem-porary data mining clustering techniques can be found in the textbook [106].There is a close relationship between clustering and many other fields.Clustering has always been used in statistics [10]and science [158].The clas-sic introduction into pattern recognition framework is given in [64].Typical applications include speech and character recognition.Machine learning clus-tering algorithms were applied to image segmentation and computer vision[117].For statistical approaches to pattern recognition see [56]and [85].Clus-tering can be viewed as a density estimation problem.This is the subject of traditional multivariate statistical estimation [197].Clustering is also widelyA Survey of Clustering Data Mining Techniques3 used for data compression in image processing,which is also known as vec-tor quantization[89].Datafitting in numerical analysis provides still another venue in data modeling[53].This survey’s emphasis is on clustering in data mining.Such clustering is characterized by large datasets with many attributes of different types. 
Though we do not even try to review particular applications,many important ideas are related to the specificfields.Clustering in data mining was brought to life by intense developments in information retrieval and text mining[52], [206],[58],spatial database applications,for example,GIS or astronomical data,[223],[189],[68],sequence and heterogeneous data analysis[43],Web applications[48],[111],[81],DNA analysis in computational biology[23],and many others.They resulted in a large amount of application-specific devel-opments,but also in some general techniques.These techniques and classic clustering algorithms that relate to them are surveyed below.1.3Plan of Further PresentationClassification of clustering algorithms is neither straightforward,nor canoni-cal.In reality,different classes of algorithms overlap.Traditionally clustering techniques are broadly divided in hierarchical and partitioning.Hierarchical clustering is further subdivided into agglomerative and divisive.The basics of hierarchical clustering include Lance-Williams formula,idea of conceptual clustering,now classic algorithms SLINK,COBWEB,as well as newer algo-rithms CURE and CHAMELEON.We survey these algorithms in the section Hierarchical Clustering.While hierarchical algorithms gradually(dis)assemble points into clusters (as crystals grow),partitioning algorithms learn clusters directly.In doing so they try to discover clusters either by iteratively relocating points between subsets,or by identifying areas heavily populated with data.Algorithms of thefirst kind are called Partitioning Relocation Clustering. They are further classified into probabilistic clustering(EM framework,al-gorithms SNOB,AUTOCLASS,MCLUST),k-medoids methods(algorithms PAM,CLARA,CLARANS,and its extension),and k-means methods(differ-ent schemes,initialization,optimization,harmonic means,extensions).Such methods concentrate on how well pointsfit into their clusters and tend to build clusters of proper convex shapes.Partitioning algorithms of the second type are surveyed in the section Density-Based Partitioning.They attempt to discover dense connected com-ponents of data,which areflexible in terms of their shape.Density-based connectivity is used in the algorithms DBSCAN,OPTICS,DBCLASD,while the algorithm DENCLUE exploits space density functions.These algorithms are less sensitive to outliers and can discover clusters of irregular shape.They usually work with low-dimensional numerical data,known as spatial data. 
Spatial objects could include not only points,but also geometrically extended objects(algorithm GDBSCAN).4Pavel BerkhinSome algorithms work with data indirectly by constructing summaries of data over the attribute space subsets.They perform space segmentation and then aggregate appropriate segments.We discuss them in the section Grid-Based Methods.They frequently use hierarchical agglomeration as one phase of processing.Algorithms BANG,STING,WaveCluster,and FC are discussed in this section.Grid-based methods are fast and handle outliers well.Grid-based methodology is also used as an intermediate step in many other algorithms (for example,CLIQUE,MAFIA).Categorical data is intimately connected with transactional databases.The concept of a similarity alone is not sufficient for clustering such data.The idea of categorical data co-occurrence comes to the rescue.The algorithms ROCK,SNN,and CACTUS are surveyed in the section Co-Occurrence of Categorical Data.The situation gets even more aggravated with the growth of the number of items involved.To help with this problem the effort is shifted from data clustering to pre-clustering of items or categorical attribute values. Development based on hyper-graph partitioning and the algorithm STIRR exemplify this approach.Many other clustering techniques are developed,primarily in machine learning,that either have theoretical significance,are used traditionally out-side the data mining community,or do notfit in previously outlined categories. The boundary is blurred.In the section Other Developments we discuss the emerging direction of constraint-based clustering,the important researchfield of graph partitioning,and the relationship of clustering to supervised learning, gradient descent,artificial neural networks,and evolutionary methods.Data Mining primarily works with large databases.Clustering large datasets presents scalability problems reviewed in the section Scalability and VLDB Extensions.Here we talk about algorithms like DIGNET,about BIRCH and other data squashing techniques,and about Hoffding or Chernoffbounds.Another trait of real-life data is high dimensionality.Corresponding de-velopments are surveyed in the section Clustering High Dimensional Data. 
The trouble comes from a decrease in metric separation when the dimension grows.One approach to dimensionality reduction uses attributes transforma-tions(DFT,PCA,wavelets).Another way to address the problem is through subspace clustering(algorithms CLIQUE,MAFIA,ENCLUS,OPTIGRID, PROCLUS,ORCLUS).Still another approach clusters attributes in groups and uses their derived proxies to cluster objects.This double clustering is known as co-clustering.Issues common to different clustering methods are overviewed in the sec-tion General Algorithmic Issues.We talk about assessment of results,de-termination of appropriate number of clusters to build,data preprocessing, proximity measures,and handling of outliers.For reader’s convenience we provide a classification of clustering algorithms closely followed by this survey:•Hierarchical MethodsA Survey of Clustering Data Mining Techniques5Agglomerative AlgorithmsDivisive Algorithms•Partitioning Relocation MethodsProbabilistic ClusteringK-medoids MethodsK-means Methods•Density-Based Partitioning MethodsDensity-Based Connectivity ClusteringDensity Functions Clustering•Grid-Based Methods•Methods Based on Co-Occurrence of Categorical Data•Other Clustering TechniquesConstraint-Based ClusteringGraph PartitioningClustering Algorithms and Supervised LearningClustering Algorithms in Machine Learning•Scalable Clustering Algorithms•Algorithms For High Dimensional DataSubspace ClusteringCo-Clustering Techniques1.4Important IssuesThe properties of clustering algorithms we are primarily concerned with in data mining include:•Type of attributes algorithm can handle•Scalability to large datasets•Ability to work with high dimensional data•Ability tofind clusters of irregular shape•Handling outliers•Time complexity(we frequently simply use the term complexity)•Data order dependency•Labeling or assignment(hard or strict vs.soft or fuzzy)•Reliance on a priori knowledge and user defined parameters •Interpretability of resultsRealistically,with every algorithm we discuss only some of these properties. 
The list is in no way exhaustive.For example,as appropriate,we also discuss algorithms ability to work in pre-defined memory buffer,to restart,and to provide an intermediate solution.6Pavel Berkhin2Hierarchical ClusteringHierarchical clustering builds a cluster hierarchy or a tree of clusters,also known as a dendrogram.Every cluster node contains child clusters;sibling clusters partition the points covered by their common parent.Such an ap-proach allows exploring data on different levels of granularity.Hierarchical clustering methods are categorized into agglomerative(bottom-up)and divi-sive(top-down)[116],[131].An agglomerative clustering starts with one-point (singleton)clusters and recursively merges two or more of the most similar clusters.A divisive clustering starts with a single cluster containing all data points and recursively splits the most appropriate cluster.The process contin-ues until a stopping criterion(frequently,the requested number k of clusters) is achieved.Advantages of hierarchical clustering include:•Flexibility regarding the level of granularity•Ease of handling any form of similarity or distance•Applicability to any attribute typesDisadvantages of hierarchical clustering are related to:•Vagueness of termination criteria•Most hierarchical algorithms do not revisit(intermediate)clusters once constructed.The classic approaches to hierarchical clustering are presented in the sub-section Linkage Metrics.Hierarchical clustering based on linkage metrics re-sults in clusters of proper(convex)shapes.Active contemporary efforts to build cluster systems that incorporate our intuitive concept of clusters as con-nected components of arbitrary shape,including the algorithms CURE and CHAMELEON,are surveyed in the subsection Hierarchical Clusters of Arbi-trary Shapes.Divisive techniques based on binary taxonomies are presented in the subsection Binary Divisive Partitioning.The subsection Other Devel-opments contains information related to incremental learning,model-based clustering,and cluster refinement.In hierarchical clustering our regular point-by-attribute data representa-tion frequently is of secondary importance.Instead,hierarchical clustering frequently deals with the N×N matrix of distances(dissimilarities)or sim-ilarities between training points sometimes called a connectivity matrix.So-called linkage metrics are constructed from elements of this matrix.The re-quirement of keeping a connectivity matrix in memory is unrealistic.To relax this limitation different techniques are used to sparsify(introduce zeros into) the connectivity matrix.This can be done by omitting entries smaller than a certain threshold,by using only a certain subset of data representatives,or by keeping with each point only a certain number of its nearest neighbors(for nearest neighbor chains see[177]).Notice that the way we process the original (dis)similarity matrix and construct a linkage metric reflects our a priori ideas about the data model.A Survey of Clustering Data Mining Techniques7With the(sparsified)connectivity matrix we can associate the weighted connectivity graph G(X,E)whose vertices X are data points,and edges E and their weights are defined by the connectivity matrix.This establishes a connection between hierarchical clustering and graph partitioning.One of the most striking developments in hierarchical clustering is the algorithm BIRCH.It is discussed in the section Scalable VLDB Extensions.Hierarchical clustering initializes a cluster system as a set of singleton 
clusters(agglomerative case)or a single cluster of all points(divisive case) and proceeds iteratively merging or splitting the most appropriate cluster(s) until the stopping criterion is achieved.The appropriateness of a cluster(s) for merging or splitting depends on the(dis)similarity of cluster(s)elements. This reflects a general presumption that clusters consist of similar points.An important example of dissimilarity between two points is the distance between them.To merge or split subsets of points rather than individual points,the dis-tance between individual points has to be generalized to the distance between subsets.Such a derived proximity measure is called a linkage metric.The type of a linkage metric significantly affects hierarchical algorithms,because it re-flects a particular concept of closeness and connectivity.Major inter-cluster linkage metrics[171],[177]include single link,average link,and complete link. The underlying dissimilarity measure(usually,distance)is computed for every pair of nodes with one node in thefirst set and another node in the second set.A specific operation such as minimum(single link),average(average link),or maximum(complete link)is applied to pair-wise dissimilarity measures:d(C1,C2)=Op{d(x,y),x∈C1,y∈C2}Early examples include the algorithm SLINK[199],which implements single link(Op=min),Voorhees’method[215],which implements average link (Op=Avr),and the algorithm CLINK[55],which implements complete link (Op=max).It is related to the problem offinding the Euclidean minimal spanning tree[224]and has O(N2)complexity.The methods using inter-cluster distances defined in terms of pairs of nodes(one in each respective cluster)are called graph methods.They do not use any cluster representation other than a set of points.This name naturally relates to the connectivity graph G(X,E)introduced above,because every data partition corresponds to a graph partition.Such methods can be augmented by so-called geometric methods in which a cluster is represented by its central point.Under the assumption of numerical attributes,the center point is defined as a centroid or an average of two cluster centroids subject to agglomeration.It results in centroid,median,and minimum variance linkage metrics.All of the above linkage metrics can be derived from the Lance-Williams updating formula[145],d(C iC j,C k)=a(i)d(C i,C k)+a(j)d(C j,C k)+b·d(C i,C j)+c|d(C i,C k)−d(C j,C k)|.8Pavel BerkhinHere a,b,c are coefficients corresponding to a particular linkage.This formula expresses a linkage metric between a union of the two clusters and the third cluster in terms of underlying nodes.The Lance-Williams formula is crucial to making the dis(similarity)computations feasible.Surveys of linkage metrics can be found in [170][54].When distance is used as a base measure,linkage metrics capture inter-cluster proximity.However,a similarity-based view that results in intra-cluster connectivity considerations is also used,for example,in the original average link agglomeration (Group-Average Method)[116].Under reasonable assumptions,such as reducibility condition (graph meth-ods satisfy this condition),linkage metrics methods suffer from O N 2 time complexity [177].Despite the unfavorable time complexity,these algorithms are widely used.As an example,the algorithm AGNES (AGlomerative NESt-ing)[131]is used in S-Plus.When the connectivity N ×N matrix is sparsified,graph methods directly dealing with the connectivity graph G can be used.In particular,hierarchical divisive MST (Minimum Spanning 
Tree)algorithm is based on graph parti-tioning [116].2.1Hierarchical Clusters of Arbitrary ShapesFor spatial data,linkage metrics based on Euclidean distance naturally gener-ate clusters of convex shapes.Meanwhile,visual inspection of spatial images frequently discovers clusters with curvy appearance.Guha et al.[99]introduced the hierarchical agglomerative clustering algo-rithm CURE (Clustering Using REpresentatives).This algorithm has a num-ber of novel features of general importance.It takes special steps to handle outliers and to provide labeling in assignment stage.It also uses two techniques to achieve scalability:data sampling (section 8),and data partitioning.CURE creates p partitions,so that fine granularity clusters are constructed in parti-tions first.A major feature of CURE is that it represents a cluster by a fixed number,c ,of points scattered around it.The distance between two clusters used in the agglomerative process is the minimum of distances between two scattered representatives.Therefore,CURE takes a middle approach between the graph (all-points)methods and the geometric (one centroid)methods.Single and average link closeness are replaced by representatives’aggregate closeness.Selecting representatives scattered around a cluster makes it pos-sible to cover non-spherical shapes.As before,agglomeration continues until the requested number k of clusters is achieved.CURE employs one additional trick:originally selected scattered points are shrunk to the geometric centroid of the cluster by a user-specified factor α.Shrinkage suppresses the affect of outliers;outliers happen to be located further from the cluster centroid than the other scattered representatives.CURE is capable of finding clusters of different shapes and sizes,and it is insensitive to outliers.Because CURE uses sampling,estimation of its complexity is not straightforward.For low-dimensional data authors provide a complexity estimate of O (N 2sample )definedA Survey of Clustering Data Mining Techniques9 in terms of a sample size.More exact bounds depend on input parameters: shrink factorα,number of representative points c,number of partitions p,and a sample size.Figure1(a)illustrates agglomeration in CURE.Three clusters, each with three representatives,are shown before and after the merge and shrinkage.Two closest representatives are connected.While the algorithm CURE works with numerical attributes(particularly low dimensional spatial data),the algorithm ROCK developed by the same researchers[100]targets hierarchical agglomerative clustering for categorical attributes.It is reviewed in the section Co-Occurrence of Categorical Data.The hierarchical agglomerative algorithm CHAMELEON[127]uses the connectivity graph G corresponding to the K-nearest neighbor model spar-sification of the connectivity matrix:the edges of K most similar points to any given point are preserved,the rest are pruned.CHAMELEON has two stages.In thefirst stage small tight clusters are built to ignite the second stage.This involves a graph partitioning[129].In the second stage agglomer-ative process is performed.It utilizes measures of relative inter-connectivity RI(C i,C j)and relative closeness RC(C i,C j);both are locally normalized by internal interconnectivity and closeness of clusters C i and C j.In this sense the modeling is dynamic:it depends on data locally.Normalization involves certain non-obvious graph operations[129].CHAMELEON relies heavily on graph partitioning implemented in the library HMETIS(see the section6). 
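Because the classic linkage metrics are all special cases of the Lance-Williams update, a compact illustration is possible. The Python sketch below is our own illustration, not code from SLINK, CLINK, AGNES or CURE; the name `lance_williams_agglomerate` is invented. It runs a naive O(N³) agglomeration in which single, complete and average link differ only in the coefficients (a(i), a(j), b, c).

```python
import numpy as np

# Lance-Williams coefficients (a_i, a_j, b, c) for three classic linkages.
LW_COEFFS = {
    "single":   lambda ni, nj: (0.5, 0.5, 0.0, -0.5),
    "complete": lambda ni, nj: (0.5, 0.5, 0.0,  0.5),
    "average":  lambda ni, nj: (ni / (ni + nj), nj / (ni + nj), 0.0, 0.0),
}

def lance_williams_agglomerate(points, k, linkage="average"):
    """Merge singleton clusters until k clusters remain; returns index lists."""
    pts = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(pts))]
    # Initial dissimilarity matrix: Euclidean distances between the points.
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    active = list(range(len(pts)))
    while len(active) > k:
        # Pick the most similar pair of active clusters.
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda ab: d[ab])
        ai, aj, b, c = LW_COEFFS[linkage](len(clusters[i]), len(clusters[j]))
        # Lance-Williams update: d(Ci U Cj, Ck) from d(Ci,Ck), d(Cj,Ck), d(Ci,Cj).
        for kk in active:
            if kk in (i, j):
                continue
            d[i, kk] = d[kk, i] = (ai * d[i, kk] + aj * d[j, kk]
                                   + b * d[i, j] + c * abs(d[i, kk] - d[j, kk]))
        clusters[i] = clusters[i] + clusters[j]
        active.remove(j)
    return [clusters[a] for a in active]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
    print([sorted(c)[:5] for c in lance_williams_agglomerate(X, k=2)])
```

With the coefficient table swapped in, the same loop reproduces min (single), max (complete) and group-average behaviour, which is the point of the Lance-Williams formula.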
Agglomerative process depends on user provided thresholds.A decision to merge is made based on the combinationRI(C i,C j)·RC(C i,C j)αof local measures.The algorithm does not depend on assumptions about the data model.It has been proven tofind clusters of different shapes,densities, and sizes in2D(two-dimensional)space.It has a complexity of O(Nm+ Nlog(N)+m2log(m),where m is the number of sub-clusters built during the first initialization phase.Figure1(b)(analogous to the one in[127])clarifies the difference with CURE.It presents a choice of four clusters(a)-(d)for a merge.While CURE would merge clusters(a)and(b),CHAMELEON makes intuitively better choice of merging(c)and(d).2.2Binary Divisive PartitioningIn linguistics,information retrieval,and document clustering applications bi-nary taxonomies are very useful.Linear algebra methods,based on singular value decomposition(SVD)are used for this purpose in collaborativefilter-ing and information retrieval[26].Application of SVD to hierarchical divisive clustering of document collections resulted in the PDDP(Principal Direction Divisive Partitioning)algorithm[31].In our notations,object x is a docu-ment,l th attribute corresponds to a word(index term),and a matrix X entry x il is a measure(e.g.TF-IDF)of l-term frequency in a document x.PDDP constructs SVD decomposition of the matrix10Pavel Berkhin(a)Algorithm CURE (b)Algorithm CHAMELEONFig.1.Agglomeration in Clusters of Arbitrary Shapes(X −e ¯x ),¯x =1Ni =1:N x i ,e =(1,...,1)T .This algorithm bisects data in Euclidean space by a hyperplane that passes through data centroid orthogonal to the eigenvector with the largest singular value.A k -way split is also possible if the k largest singular values are consid-ered.Bisecting is a good way to categorize documents and it yields a binary tree.When k -means (2-means)is used for bisecting,the dividing hyperplane is orthogonal to the line connecting the two centroids.The comparative study of SVD vs.k -means approaches [191]can be used for further references.Hier-archical divisive bisecting k -means was proven [206]to be preferable to PDDP for document clustering.While PDDP or 2-means are concerned with how to split a cluster,the problem of which cluster to split is also important.Simple strategies are:(1)split each node at a given level,(2)split the cluster with highest cardinality,and,(3)split the cluster with the largest intra-cluster variance.All three strategies have problems.For a more detailed analysis of this subject and better strategies,see [192].2.3Other DevelopmentsOne of early agglomerative clustering algorithms,Ward’s method [222],is based not on linkage metric,but on an objective function used in k -means.The merger decision is viewed in terms of its effect on the objective function.The popular hierarchical clustering algorithm for categorical data COB-WEB [77]has two very important qualities.First,it utilizes incremental learn-ing.Instead of following divisive or agglomerative approaches,it dynamically builds a dendrogram by processing one data point at a time.Second,COB-WEB is an example of conceptual or model-based learning.This means that each cluster is considered as a model that can be described intrinsically,rather than as a collection of points assigned to it.COBWEB’s dendrogram is calleda classification tree.Each tree node(cluster)C is associated with the condi-tional probabilities for categorical attribute-values pairs,P r(x l=νlp|C),l=1:d,p=1:|A l|.This easily can be recognized as a C-specific Na¨ıve Bayes classifier.During 
the classification tree construction,every new point is descended along the tree and the tree is potentially updated(by an insert/split/merge/create op-eration).Decisions are based on the category utility[49]CU{C1,...,C k}=1j=1:kCU(C j)CU(C j)=l,p(P r(x l=νlp|C j)2−(P r(x l=νlp)2.Category utility is similar to the GINI index.It rewards clusters C j for in-creases in predictability of the categorical attribute valuesνlp.Being incre-mental,COBWEB is fast with a complexity of O(tN),though it depends non-linearly on tree characteristics packed into a constant t.There is a similar incremental hierarchical algorithm for all numerical attributes called CLAS-SIT[88].CLASSIT associates normal distributions with cluster nodes.Both algorithms can result in highly unbalanced trees.Chiu et al.[47]proposed another conceptual or model-based approach to hierarchical clustering.This development contains several different use-ful features,such as the extension of scalability preprocessing to categori-cal attributes,outliers handling,and a two-step strategy for monitoring the number of clusters including BIC(defined below).A model associated with a cluster covers both numerical and categorical attributes and constitutes a blend of Gaussian and multinomial models.Denote corresponding multivari-ate parameters byθ.With every cluster C we associate a logarithm of its (classification)likelihoodl C=x i∈Clog(p(x i|θ))The algorithm uses maximum likelihood estimates for parameterθ.The dis-tance between two clusters is defined(instead of linkage metric)as a decrease in log-likelihoodd(C1,C2)=l C1+l C2−l C1∪C2caused by merging of the two clusters under consideration.The agglomerative process continues until the stopping criterion is satisfied.As such,determina-tion of the best k is automatic.This algorithm has the commercial implemen-tation(in SPSS Clementine).The complexity of the algorithm is linear in N for the summarization phase.Traditional hierarchical clustering does not change points membership in once assigned clusters due to its greedy approach:after a merge or a split is selected it is not refined.Though COBWEB does reconsider its decisions,its。

Scaling laws in X-ray Galaxy Clusters at redshift between 0.4 and 1.3

Fig. 1. Correlations between the best-fit parameters of the β-model: higher values of the core radius, r_c, correspond to higher estimates of β. No redshift-segregation is evident.

[…] on [Ω_m (1 + z)³ + 1 − Ω_m]^{1/2} (for a flat cosmology with matter density Ω_m), and on the overdensity Δ_z account for the fact that all the quantities are estimated at a given overdensity Δ_z with respect to the critical density estimated at redshift z, ρ_{c,z} = 3 H_z² / (8πG). Hydrodynamical simulations (e.g. Evrard, Metzler, Navarro 1996, Bryan & Norman 1998, Thomas et al. 2001, Bialek et al. 2001, Borgani et al. 2002 and references therein) and observational analyses (e.g. from Mushotzky 1984, Edge & Stewart 1991 to the more recent work of Allen & Fabian 1998, Markevitch 1998, Arnaud & Evrard 1999, Nevalainen et al. 2000, Finoguenov et al. 2001, Ettori et al. 2002) of the best-studied correlations between X-ray luminosity, total gravitating mass and gas temperature show that significant deviations exist between observations and the expectations based on self-similar scaling: (i) a difference of the order of 30–40 per cent in the normalization of the M_tot − T_gas relation (Horner et al. 1999, Nevalainen et al. 2000); (ii) steeper slopes of the L_bol–T, M_gas–T and, possibly, M_tot–T relations, in particular when low temperature (T_gas < 3 keV) systems are also considered; (iii) excess entropy in the central regions of clusters and groups, with respect to the S ∝ T expectation (e.g., Ponman et al. 1999, 2003). Such deviations are currently interpreted as evidence for non-gravitational processes, such as extra heating and radiative cooling, that affect the assembly of the baryons in the cluster potential well (e.g. Cavaliere, Menci & Tozzi 1999, Bower et al. 2001, Tozzi & Norman 2001, Babul et al. 2002, Voit et al. 2002). The same processes that determine the shape of the local relations should have affected also the evolution of the scaling laws with the cosmic epoch. However, due to the objective […]
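A short numerical sketch (ours, not from the paper) of the cosmological factor [Ω_m(1 + z)³ + 1 − Ω_m]^{1/2}, often written E_z, and of the redshift-dependent critical density ρ_{c,z} = 3H_z²/(8πG) that enters these scaling relations; the Ω_m and H_0 values are assumed for illustration.

```python
import numpy as np

def e_z(z, omega_m=0.3):
    """H_z/H_0 = [Omega_m (1+z)^3 + 1 - Omega_m]^(1/2) for a flat cosmology."""
    return np.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def rho_crit_z(z, h0=70.0, omega_m=0.3):
    """Critical density 3 H_z^2 / (8 pi G) in g/cm^3 (h0 in km/s/Mpc)."""
    G = 6.674e-8                                     # cgs
    hz = h0 * e_z(z, omega_m) * 1.0e5 / 3.086e24     # km/s/Mpc -> 1/s
    return 3.0 * hz ** 2 / (8.0 * np.pi * G)

print(e_z(0.4), e_z(1.3))                 # growth of E_z over 0.4 < z < 1.3
print(rho_crit_z(1.3) / rho_crit_z(0.0))  # critical density relative to z = 0
```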

Cluster analysis ("clusteranaly") lecture slides, November 2002

… where D² is the squared Euclidean distance, and n_J is the number of samples contained in each class J.
(6) Flexible-beta method (flexible-beta method)

[Diagram: clusters K and L are merged into a new cluster M; the distance from M to another cluster J is then updated.]

A variant of the average linkage method:

D²_MJ = (1 − β)·(n_K/n_M · D²_KJ + n_L/n_M · D²_LJ) + β·D²_KL,   with β < 1; the SAS software presets β to −0.25.

Options: user-fixed number of classes; read/write cluster seed points; ANOVA table, initial cluster seeds, etc.
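A minimal Python rendering of the flexible-beta update above (the function name is ours; β = −0.25 mirrors the SAS default quoted on the slide):

```python
def flexible_beta_update(d2_kj, d2_lj, d2_kl, n_k, n_l, beta=-0.25):
    """Squared distance from the merged cluster M = K + L to cluster J,
    using the flexible-beta rule on the slide (SAS default beta = -0.25)."""
    n_m = n_k + n_l
    return ((1.0 - beta) * (n_k / n_m * d2_kj + n_l / n_m * d2_lj)
            + beta * d2_kl)

# Example: K and L are equally far from J; a negative beta pushes the merged
# cluster slightly further from J than the plain average-linkage value (4.0).
print(flexible_beta_update(d2_kj=4.0, d2_lj=4.0, d2_kl=1.0, n_k=5, n_l=5))
```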
(2) Cluster analysis in SAS

Clustering of observations (samples): PROC CLUSTER, with options such as PSEUDO, RSQUARE, STD and METHOD= (AVE | AVERAGE, CEN | CENTROID, COM | COMPLETE, DEN | DENSITY, EML, FLE | FLEXIBLE, MCQ | MCQUITTY, MED | MEDIAN, SIN | SINGLE, …).

β < 1; usually β is taken between −1 and 0.
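A rough Python/scipy analogue of the PROC CLUSTER call sketched above — an illustration, not SAS code. scipy's `linkage` covers only part of the SAS METHOD= list (no FLEXIBLE, DENSITY or EML here; its 'weighted' option corresponds to McQuitty's method), and the toy data are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 1, (15, 3)), rng.normal(4, 1, (15, 3))])
data_std = zscore(data, axis=0)                      # analogue of the STD option

for method in ("single", "complete", "average", "weighted",
               "centroid", "median", "ward"):
    tree = linkage(data_std, method=method)          # analogue of METHOD=
    labels = fcluster(tree, t=2, criterion="maxclust")
    print(method, np.bincount(labels)[1:])           # cluster sizes
```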
(5) Average linkage method (average linkage between groups)

[Diagram: clusters K and L are merged into a new cluster M.]

SPSS uses this as its default method, under the name between-groups linkage:

D²_MJ = (n_K/n_M)·D²_KJ + (n_L/n_M)·D²_LJ

… the direction of the icicle plot.
Method: the clustering method, the proximity (similarity/distance) measure, and the standardization transformation of the variables.
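The standardization transformation mentioned above is usually a z-score; a minimal sketch (ours, with made-up numbers):

```python
import numpy as np

def standardize(x):
    """Z-score transform: subtract the column mean and divide by the column
    standard deviation, so every variable gets equal weight in the proximity
    measure."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

heights_weights = np.array([[170.0, 60.0], [180.0, 80.0], [160.0, 55.0]])
print(standardize(heights_weights))
```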

a r X i v :a s t r o -p h /0310060v 1 2 O c t 2003The ESO Nearby Abell Cluster Survey 1XII.The mass and mass-to-light-ratio profiles of rich clustersPeter KatgertSterrewacht Leiden,Postbus 9513,Niels Bohrweg 2,2300RA Leiden,The Netherlandskatgert@strw.leidenuniv.nlAndrea Biviano INAF/Osservatorio Astronomico di Trieste,Via Tiepolo 11,I-34131Trieste,Italy biviano@ts.astro.it and Alain Mazure OAMP/Laboratoire Astrophysique de Marseille,Traverse du Siphon,Les Trois Lucs,BP 8,F-13376Marseille Cedex,France alain.mazure@oamp.fr ABSTRACT We determine the mass profile of an ensemble cluster built from 3056galaxies in 59nearby clusters observed in the ESO Nearby Abell Cluster Survey.The mass profile is derived from the distribution and kinematics of the Early-type (elliptical and S0)galaxies only,with projected distances from the centers of their clusters ≤1.5r 200.These galaxies are most likely to meet the conditions for the application of the Jeansequation,since they are the oldest cluster population,and are thus quite likely to be in dynamical equilibrium with the cluster potential.In addition,the assumption that the Early-type galaxies have isotropic orbits is supported by the shape of their velocity distribution.For galaxies of other types (the brightest ellipticals with M R ≤−22+5log h ,and the early and late spirals)these assumptions are much less likely to be satisfied.For the determination of the mass profile we also exclude Early-type galaxies in subclusters.Application of the Jeans equation yields a non-parametric estimate of the cumulative mass profile M (<r ),which has a logarithmic slope of −2.4±0.4in the density profile at r 200(approximately the virial radius).We compare our result with several analytical models from the literature,and we estimate their best-fit parameters from a comparison of observed and predicted velocity-dispersion profiles.We obtain acceptable solutions for all models (NFW,Moore et al.1999,softened isothermal sphere,and Burkert 1995).Our data do not provide compelling evidence for the existenceof a core;as a matter of fact,the best-fitting core models have core-radii well below100h−1kpc.The upper limit we put on the size of the core-radius provides a constraintfor the scattering cross-section of dark matter particles.The total-mass density appearsto be traced remarkably well by the luminosity density of the Early-type galaxies.Onthe contrary,the luminosity density of the brightest ellipticals increases faster towardsthe center than the mass density,while the luminosity density profiles of the early andlate spirals are somewhatflatter than the mass density profile.Subject headings:Galaxies:clusters:general–Galaxies:kinematics and dynamics–Cosmology:observations1.IntroductionMany attempts have been made to determine the amount and distribution of dark matter in clusters,since Zwicky(1933,1937)and Smith(1936)concluded that the mass implied by the sum of the luminosities of the galaxies falls short of the total mass by as much as a factor of10.In recent years,cosmological simulations have shown that dark matter halos have a universal density profile(Navarro,Frenk,&White1996,1997;NFW hereafter),but its precise form is still debated (Moore et al.1999,M99hereafter),while the’universality’of the mass density profile has also been questioned(Jing&Suto2000;Thomas et al.2001;Ricotti2002).The issue is critical,since knowledge of the total mass(visible and dark,baryonic and non-baryonic)of clusters,and of its distribution(also in relation to the light distribution),gives 
important clues about the formation process of the clusters(e.g.Crone,Evrard,&Richstone1994;Jing et al.1995),and of the galaxies in them(e.g.Mamon2000),as well as on the nature of dark matter(see,e.g.,Natarajan et al. 2002a).The dark matter can be‘weighed’and‘imaged’in three ways.First,the gravitational lensing of distant objects yields an estimate of the projected mass distribution in the cluster(see e.g. Tyson,Wenk,&Valdes1990;Squires et al.1996).The mass density profile can subsequently be derived assuming the geometry of the cluster.Lensing mostly works for clusters at intermediate and large distances since the lensing equation is unfavourable for nearby clusters,and only few nearby cluster lenses are known(see e.g.Campusano,Kneib,&Hardy1998).Some results from gravitational lensing observations are consistent with the NFW profile(e.g.Athreya et al.2002; Clowe&Schneider2001),while others favourflatter mass density profiles,such as the isothermal sphere(see Fischer&Tyson1997;Sand,Treu,&Ellis2002;Shapiro&Iliev2000;Taylor et al. 1998).A second method uses the distribution of the hot,X-ray emitting,gas in the cluster potential, and the radial variation of the temperature of the gas(see e.g.Hughes1989;Ettori,De Grandi,& Molendi2002).Recently,temperature maps have become available for a sizable number of clusters (see e.g.Markevitch et al.1998;Irwin&Bregman2000;De Grandi&Molendi2002).However, the uncertainties in the X-ray temperature profiles are still substantial(see e.g.Irwin,Bregman, &Evrard1999;Kikuchi et al.1999),with corresponding uncertainties in the total mass and the mass profile.Moreover,X-ray observations in general sample only the inner cluster regions,even with the new class of X-ray satellites(see,e.g.,Pratt&Arnaud2002).Cooling can also play a rˆo le,although it is probably not dominant(Suginohara&Ostriker1998).Most recent results from X-ray observations indicate consistency with the NFW or the M99profile(e.g.Allen,Ettori, &Fabian2001;Allen,Schmidt,&Fabian2002;Markevitch et al.1999;Pratt&Arnaud2002; Tamura et al.2000),but aflatter mass density profile like that of the isothermal sphere seems to be preferred in some clusters(Arieli&Rephaeli2003;Ettori et al.2002).Generally,the distribution of the the X-ray emitting gas is found to beflatter than that of the total cluster mass(e.g.Allen et al.2001,2002;Markevitch&Vikhlinin1997;Nevalainen,Markevitch,&Forman1999,Pratt& Arnaud2002).The third,and most traditional way to probe the dark matter content and its distribution in clusters is through the kinematics and spatial distribution of’tracer particles’moving in the cluster potential.The virial theorem applied to the galaxy population gives an estimate of the total mass but,since the data in general comprise only the central part of a cluster,projection effects and possible anisotropy of the velocity distribution must be taken into account for an accurate estimate (see e.g.Heisler,Tremaine,&Bahcall1985;The&White1986).In addition,the virial theorem assumes that mass follows light and if this assuption is not justified that can have a large effect on a cluster mass estimate(see,e.g.,Merritt1987).Recently,Diaferio&Geller(1997;see also Diaferio1999)used the caustics in the plane of line-of-sight velocities vs.clustercentric distances to estimate the amplitude of the velocityfield in the infall region.This yields an estimate of the escape velocity at several distances,and thus of the mass distribution.This method has the advantage of being free of the assumption of dynamical 
equilibrium,and hence can be(and has been)used to constrain the cluster mass distribution well beyond the virialization region(Geller,Diaferio,&Kurtz1999;Reisenegger et al.2000;Rines et al. 2000,2001,2002;Biviano&Girardi2003).However,systematic uncertainties limit the accuracy in the mass determination to∼50%(Diaferio1999).Results obtained with this method indicate consistency with the NFW or M99profiles,or strongly constrain or rule out the isothermal sphere model.The most detailed form of the’kinematical weighing’uses the distribution and kinematics of the galaxies to estimate the mass distribution,through application of the full Jeans equation of stellar dynamics(see,e.g.,Binney&Tremaine1987,BT hereafter).This method requires knowledge of the orbital characteristics of the galaxies.Afirst analysis along these lines was made for the Coma cluster by Kent&Gunn(1982).Merritt(1987)used the same data to estimate the orbitalanisotropy of the galaxies for various assumptions about the dependence of the mass-to-light ratio M/L on radius.The shape of the mass profile of a cluster would follow directly from the projected luminosity density profile of the galaxies if the mass-to-light ratio M/L were constant.Evidence in favour of a constant M/L comes from the analyses of Carlberg,Yee,&Ellingson(1997a),van der Marel et al.(2000),and Rines et al.(2001).Carlberg et al.(2001)insteadfind an increasing M/L-profile in groups from the CNOC2survey,while both Rines et al.(2000)and Biviano&Girardi(2003) argue for a decreasing M/L-profile.The shape of the cluster M/L-profile seems to depend on which class of cluster galaxies is used to measure the light profile.This is hardly surprising,since the various types of galaxies are known to have different projected distributions.These differences are the result of the morphology-density relation,first described by Oemler(1974)and Melnick&Sargent(1977),and described more fully by Dressler(1980)and,lately,Thomas&Katgert(2003,paper X of this series).In clusters,this relation produces differences in the radial distribution of the various galaxy classes,and those are related to differences in the kinematics.Evidence for the relation between spatial distribution and kinematics for different cluster galaxy populations was e.g.found in the Coma cluster(Colless &Dunn1996),in CNOC clusters at z≈0.3by Carlberg et al.(1997b),in clusters observed in the ESO Nearby Abell Cluster Survey(ENACS,hereafter;de Theije&Katgert1999,paper VI; Biviano et al.2002,paper XI),as well as in other clusters(see,e.g.,Mohr et al.1996;Adami, Biviano,&Mazure1998a).Galaxies with emission lines(ELG)provide an extreme example of the effect.The ELG are less centrally concentrated and have a higher r.m.s line-of-sight velocity than the galaxies without emission lines.This was clearly demonstrated by Biviano et al.(1997,paper III),using75ENACS clusters.To study the mass profile in detail,one must combine the data for many galaxy systems,as was done by e.g.,Biviano et al.(1992),Carlberg et al.(1997a,1997b,1997c,2001),van der Marel et al.(2000),Biviano&Girardi(2003),and in papers III,VI,VII(Adami et al.1998b),IX(Thomas, Hartendorp,&Katgert2003),X,and XI of this series.For an‘ensemble’cluster,built from14 clusters with redshifts between about0.2and0.5,with a total of1150CNOC redshifts,Carlberg et al.(1997c)found that the number density profile is consistent with the NFW mass density profile. 
This result was confirmed with a more detailed analysis by van der Marel et al.(2000).Using the ENACS dataset,in paper VII we found that the number density profile of a composite cluster of29 nearby ACO clusters with smooth projected galaxy distributions did not show the central cusp of the NFW mass profile,in particular when the brightest galaxies are removed from the sample.This could mean that M/L increases towards the center.Recently,Biviano&Girardi(2003)analysed an’ensemble’cluster,built from43nearby clusters observed in the2dF Galaxy Redshift Survey (Colless et al.2001).From that sample,which has only three clusters(A957,A978and A2734)in common with the present sample,they concluded that both cuspy profiles of the NFW and M99 form,and profiles with a core are acceptable,as long as the core radius is sufficiently small.The above results indicate the importance of a study of the mass profile and of the radial dependence of M/L.In this paper we present such a study,based on a sample of rich nearby clusters observed in the ENACS.This paper is organised as follows.In§2we summarize the data.In§3we justify our choice to use the Early-type galaxies that are not in substructures as tracers of the potential.In§4we present the number-density and velocity-dispersion profiles of the Early-type galaxies and we describe how we obtained a non-parametric estimate of the cluster mass profile via direct solution of the Jeans equation.In§5we compare our result with models,and we derive the best-fit models from a comparison of the observed and predicted velocity-dispersion profiles.In§6we derive the radial dependence of the M/L-ratio,for all galaxies together and for the Early-type galaxies.In§7we discuss our results,which are summarized in§8,where we also give our conclusions.In Appendix A we describe our method of interloper rejection.In Appendix B we detail the methods by which we determined the number-density,luminosity-density, and velocity-dispersion profiles.In Appendix C we review and discuss the basic assumptions made in the determination of the mass profile.2.The dataOur determination of the mass profile of rich clusters is based on data obtained in the context of the ENACS.The multi-objectfiber spectroscopy with the3.6-m telescope at La Silla is described in Katgert et al.(1996,1998,papers I and V of this series,respectively).In those papers,the photometry of the5634galaxies in107rich,nearby(z 0.1)Abell clusters is also discussed.After the spectroscopic survey was done,CCD images were obtained with the Dutch92-cm telescope at La Silla for2295ENACS galaxies.These have yielded morphological types(Thomas2003,paper VIII),which were used to refine and recalibrate the galaxy classification based on the ENACS spectra,as carried out previously in paper VI.The CCD images also yielded structural parameters, through a decomposition of the brightness profiles into bulge and disk contributions(paper IX).The ENACS morphological types were supplemented with morphological types from the liter-ature,and subsequently combined with the spectral types into a single classification scheme.This has provided galaxy types for4884ENACS galaxies,of which56%are morphological,35%are spectroscopic,6%are a combination of morphological an spectral types,and the remaining3% were special in that they had an early morphological type(E or S0)but showed emission lines in the spectrum.These galaxy types were used to study the morphology-radius and morphology-density relations(paper X).They also form the basis of the study of 
morphology and luminosity segregation which uses velocities as well as positions(paper XI).In combining the data for an ensemble of many clusters,all projected clustercentric distances were expressed in terms of r200,as derived from the global velocity dispersion(see e.g.Carlberg et al.1997c).This ensures that we avoid,as much as possible,mixing inner virialized cluster regions with external non-virialized cluster regions.Similarly,all galaxy velocities,relative to the mean velocity of their parent cluster,were expressed in terms of the global velocity dispersion of theirparent cluster,σp.The ensemble cluster effectively represents each of our clusters if these form a homologous set,and if we adopted the correct scaling.An indication that clusters form a homologous set comes fromthe existence of a fundamental plane that relates some of the cluster global properties(Schaefferet al.1993;Adami et al.1998c,paper IV of this series).As shown by Beisbart,Valdarnini,&Buchert(2001)clusters with substructure tend to deviate from the fundamental plane,therebyviolating homology.The exclusion of all clusters with even a minor amount of substructure wouldhave greatly reduced the size of the ensemble cluster.Instead,by eliminating the galaxies that intheir respective clusters are in substructures,we have tried to reduce the effects of substructure inour analysis as much as possible.In paper XI we combined the data for59clusters with z<0.1,each with at least20membergalaxies with ENACS redshifts,and with galaxy types for at least80%of the members(see Ta-ble A.1in paper XI).The ensemble cluster contains3056member galaxies,for2948(or96%)ofwhich a galaxy type is known.The rejection of interlopers was based on the method of den Hartog&Katgert(1996),which is summarized in Appendix A.In this ensemble cluster it was found thatgalaxies in substructure have different phase-space distributions from those outside substructure,and that this is true for all galaxy types.The analysis of the mass profile cannot be based on the sample of galaxies within substructurebecause the members of substructure are orbiting together,so that their kinematics does not onlyprobe the large-scale properties of the cluster potential.For the galaxies outside substructure,it wasfound that there are4galaxy classes that must be distinguished because they have different phase-space distributions.These4classes are:(i)the brightest ellipticals(with M R≤−22+5log h), which we refer to as’E br’in the following,(ii)the other ellipticals together with the S0galaxies(referred to as’Early-type’galaxies,in the following),(iii)the early spirals(Sa–Sb),which we referto as’S e’in the following,and(iv)the late spirals and irregulars(Sbc–Ir)together with the ELG(except those with early morphology),globally referred to as’S l’in what follows.In Table1we show the number of cluster members outside substructure in the59clusters,ineach of the4galaxy classes defined above.As explained in Appendix B,in building the numberdensity profiles we need to correct for sampling incompleteness.In order to keep the correctionfactor small,galaxies located in poorly-sampled regions are not used in the present analysis,andare not counted in Table1.Table1:The numbers of galaxies outside substructureE br Early S e S l341129177328Fig.1.—The velocity distribution of the Early-type class,with the best-fitting Gaussian(dashed-line),and the best-fitting Gauss-Hermite polynomial(dash-dotted line)superposed.3.The Early-type galaxies as tracers of the mass profileSince we want to derive 
the mass profile of the ensemble cluster by application of the Jeans equation of stellar dynamics,we must assess the suitability of each of the4galaxy classes to serve as tracers of the potential.To begin with,suitable tracers should be in equilibrium with the cluster potential.However,application of the Jeans equation also requires that the orbital characteristics are known or,in other words:that we have information on the(an-)isotropy of the velocity distribution.For that reason,and in view of the results in papers III and VI,it is very unlikely that the S l can be used,since the analyses in these papers suggest that they may be onfirst-approach orbits towards the central regions.This could still mean that they are in equilibrium with the potential, but as we have no a priori knowledge of their orbits,except that they are unlikely to be isotropic, these galaxies do not qualify as good tracers.The E br could,for all that we know,have an isotropic velocity distribution.However,it is unlikely that,as a class,they satisfy the basic assumption underlying the Jeans equation,namely that it describes a collisionless particlefluid.The usual interpretation that the brightest ellipticals have been directed to the central regions through dynamical friction after which they have grown at the expense of other galaxies might make them unfit for estimates of the mass profile.As to the choice between the two remaining classes,the relative statistical weights clearly point to the use of the Early-type galaxies as tracers of the mass profile.However,in addition to this pragmatic argument there are other,more fundamental reasons for this choice.The most important one is that the stellar population in most ellipticals is quite old,and evidence is mounting that most of the ellipticals formed before they entered the cluster.This should have allowed them to settle in the potential and become good tracers of it.As was shown in paper XI,the(R,v)-distribution of the S0galaxies is very similar to that of the ellipticals.Atfirst,this may appear somewhat surprising.Although the stellar population of many S0galaxies is probably as old as that of the ellipticals,it is generally believed that a significant fraction of S0’s has formed rather recently through the transformation of early spirals by impulsive encounters.Apparently,the present rate of formation of S0’s is sufficiently low that the bulk of the transformation of early spirals has taken place sufficiently long ago,so that the phase space distribution of the S0’s has relaxed to that of the older population of E’s.Use of the Early-type galaxies requires that we know,or can make an educated guess about, the anisotropy of their velocity distribution.We will assume that their orbits are isotropic,and this assumption can be checked to some extent because the distribution of relative radial velocities depends on the anisotropy of the3-D velocity distribution,as was shown by Merritt(1987)and van der Marel et al.(2000).By calculating the even-order Gauss-Hermite coefficients for the velocity distribution of the CNOC1survey,van der Marel et al.(2000)showed that they could constrain the range of allowed values of the velocity anisotropy,from a comparison with dynamical models.Fig.2.—Top:The best LOWESS estimate(solid line)of I(R)of the Early-type galaxies,within the1-σconfidence interval determined from bootstrap resamplings(dashed lines).Bottom:The best estimate(solid line)ofν(r),extrapolated to r=6.67r200,within the1-σconfidence interval determined from bootstrap 
resamplings (dashed lines).

Fig. 3.— Top: The best LOWESS estimate (heavy line) of σ_p(R) of the Early-type galaxies, with 68% confidence levels (dashed lines), together with binned estimates. Bottom: The best estimate (heavy line) of ⟨v_r²⟩(r), with 68% confidence levels.

The velocity distribution of the Early-type galaxies in our ensemble cluster is shown in Fig. 1, with the best-fitting Gaussian and the best-fitting Gauss-Hermite polynomial superposed. It appears that the deviations from a Gaussian are very small. This is confirmed by the values of h4 and h6, which are −0.016 and 0.005. Although we refrain from constructing plausible distribution functions, the projected number density – and therefore also the 3-D number density – is sufficiently close to that used by van der Marel et al. (2000) that we can conclude from their Fig. 8 that our assumption β(r) = 1 − ⟨v_t²⟩(r)/⟨v_r²⟩(r) ≡ 0 for the E+S0 class is very plausible (as a matter of fact we conclude that −0.6 ≲ β ≲ 0.1, or, equivalently, 0.8 […]).

[…]

I(R) σ_p²(R) = 2 ∫_R^∞ [1 − β(r) R²/r²] ν(r) ⟨v_r²⟩(r) r dr / √(r² − R²),   (1)

where, as before,

β(r) ≡ 1 − ⟨v_t²⟩(r)/⟨v_r²⟩(r)   (2)

and ⟨v_r²⟩(r), ⟨v_t²⟩(r) are the mean squared components of the radial and tangential velocity. For the special case β(r) ≡ 0 the solution is:

⟨v_r²⟩(r) = −[1/(π ν(r))] ∫_r^∞ d[I(R) σ_p²(R)]/dR · dR / √(R² − r²).   (3)

The ⟨v_r²⟩(r) profile and c.l. resulting from I(R), σ_p(R), and the c.l. on σ_p(R) (see Appendix B for details) are shown in the lower panel of Fig. 3. The mass profile then follows directly from the Jeans equation (see, e.g., BT):

M(<r) = −(r ⟨v_r²⟩ / G) [d ln ν / d ln r + d ln ⟨v_r²⟩ / d ln r].   (4)

[…]

ρ_NFW(r) = ρ_0 / [(r/r_s)(1 + r/r_s)²]   (5)

ρ_M99(r) = ρ_0 / [(r/r_M)^{3/2} (1 + (r/r_M)^{3/2})]   (6)

ρ_SIS(r) = ρ_0 / [1 + (r/r_c)²]   (7)

ρ_Burkert(r) = ρ_0 / [(1 + r/r_0)(1 + (r/r_0)²)]   (8)

(Since the median velocity dispersion of our cluster sample is 699 km s⁻¹, and the median value of r200 is 1.2 h⁻¹ Mpc, one mass unit corresponds to ≈ 1.4 × 10¹⁴ h⁻¹ M⊙.)

Fig. 4.— Top: The mass profile M(<r) calculated from the Jeans equation, using the Early-type galaxies as tracers of the potential (heavy solid line). Isotropic orbits were assumed, and the approximate 68% confidence region is indicated by the shading. The mass scale is in arbitrary units. Also shown are the best-fit mass models of the NFW (long dashes), M99 (short dashes), SIS (dotted line) and Burkert (dash-dotted line) types. Note that the best-fit mass models were not derived from fits to M(<r), but from a comparison of the observed and predicted velocity-dispersion profiles (see §5). Bottom: the mass density profile ρ(r) derived by differentiating our observed M(<r) (heavy solid line) compared with the 4 models (same coding as above).

[…] to the early and late spirals (again, assuming isotropy). The S_e mass profile is not very different from that in Fig. 4, but the S_l mass profile is considerably steeper, which supports our assumption.
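The steps in Eqs. (4)–(8) translate almost directly into code. The Python sketch below is an illustration written for this text, not the authors' pipeline: it assumes already-smoothed, tabulated ν(r) and ⟨v_r²⟩(r) (here replaced by toy profiles rather than the LOWESS/Abel-inverted ones of the paper), works in arbitrary units with G = 1, and uses invented function names.

```python
import numpy as np

G = 1.0  # arbitrary units

def jeans_mass(r, nu, vr2):
    """M(<r) = -(r <v_r^2>/G) [dln nu/dln r + dln <v_r^2>/dln r]  (Eq. 4),
    for an isotropic (beta = 0) tracer population."""
    lnr = np.log(r)
    dln_nu = np.gradient(np.log(nu), lnr)
    dln_vr2 = np.gradient(np.log(vr2), lnr)
    return -(r * vr2 / G) * (dln_nu + dln_vr2)

def rho_nfw(r, rho0, rs):        # Eq. (5)
    return rho0 / ((r / rs) * (1.0 + r / rs) ** 2)

def rho_m99(r, rho0, rm):        # Eq. (6)
    return rho0 / ((r / rm) ** 1.5 * (1.0 + (r / rm) ** 1.5))

def rho_sis(r, rho0, rc):        # Eq. (7), softened isothermal sphere
    return rho0 / (1.0 + (r / rc) ** 2)

def rho_burkert(r, rho0, r0):    # Eq. (8)
    return rho0 / ((1.0 + r / r0) * (1.0 + (r / r0) ** 2))

if __name__ == "__main__":
    r = np.logspace(-2, 0.2, 200)                    # radii in units of r_200
    nu = (r / 0.25) ** -1 * (1 + r / 0.25) ** -2     # toy tracer number density
    vr2 = 0.5 / (1.0 + r)                            # toy <v_r^2>(r)
    m = jeans_mass(r, nu, vr2)
    print("M(<r_200) in these toy units:", m[-1])
    print("NFW density at 0.1 r_200:", rho_nfw(0.1, rho0=1.0, rs=0.25))
```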
In a forthcoming paper (Biviano & Katgert, 2004), we discuss in detail the dynamical equilibrium (including the orbital anisotropies) of the E_br, S_e and S_l classes, in the potential derived here from the data of the Early-type galaxies.

5. Comparison with model mass-profiles

In order to compare our observed mass profile with models from the literature, we work in the domain of observables, viz. in σ_p(R). This is possible because one can solve for ⟨v_r²⟩(r) using the observed ν(r), an assumed mass profile M(<r), and an assumed β(r), as follows:

ν(r) ⟨v_r²⟩(r) = G ∫_r^∞ ν(ξ) M(<ξ) (ξ/r)^{2β} ξ^{−2} dξ.   (9)

This solution, given by van der Marel (1994), is a special case of a more general solution of the Jeans equation which was developed by Bacon, Simien, & Monnet (1983) for building dynamical models of elliptical galaxies. For β = 0 the above equation reduces to:

ν(r) ⟨v_r²⟩(r) = G ∫_r^∞ ν(x) M(<x) x^{−2} dx.   (10)

Given a model mass profile, M(<r), and an observed number-density profile, ν(r), it is thus possible to compute model projected velocity-dispersion profiles, through eq. 10 and the usual Abel relation. The comparison between the model and the observed velocity-dispersion profile yields a value for χ² by which we measure the acceptability of the assumed mass-profile model.

A straightforward determination of the χ² value requires the observed profile to have independent data-points. In this comparison we therefore use a binned σ_p(R)-profile, rather than the LOWESS profile. Only the data points within the virial radius r200 are considered for the fit, since galaxies at larger radii might not yet have relaxed to dynamical equilibrium. The uncertainty in the observed ν(R), determined from the bootstrap resamplings of I(R) (see Appendix B.1), is taken into account in the χ²-analysis. In practice, for each bootstrap resampling of I(R) we have a corresponding ν(r), and hence a different model velocity dispersion profile, obtained via eq. 10. This allows us to compute an r.m.s. for each point of the model velocity dispersion profile. The model r.m.s. is then added in quadrature to the uncertainty of the observed σ_p(R) to give the full uncertainty to be used in the χ²-analysis.

We checked that our results are robust w.r.t. different choices of both the binning radii for σ_p(R), and the way in which I(R) is extrapolated to large radii to yield ν(r) through the Abel inversion (see Appendix B). Different choices of the σ_p(R) binning and of the I(R) extrapolation affect the best-fit values of the mass model parameters by 15% and 10%, respectively.

Fig. 5.— The observed velocity-dispersion profile of the Early-type galaxies (filled circles with error bars) and the four velocity-dispersion profiles predicted for the 4 mass models: NFW (long dashes), M99 (short dashes), SIS (dotted line) and Burkert (dash-dotted line). Note that the fits do not include the outermost point (open circles).

We consider two 'cuspy' model mass profiles (NFW and M99), and two profiles with a 'core' (SIS and that of Burkert 1995) – see §3. All four model mass profiles provide acceptable fits to the observed σ_p(R), although the models of NFW and Burkert provide marginally better fits than the M99 and, in particular, the SIS models. More specifically, using the χ² statistics, we estimate the rejection probabilities of the best-fit NFW, M99, SIS and Burkert models to be 19%, 28%, 72% and 11%, respectively. Note that the χ² values were calculated for the points with R/r200 < 1.0, because inclusion of data beyond the virial radius does not seem very sensible. As a matter of fact, if the outermost point is included all 4 models provide fits of similar quality.

The 68% c.l. range for the NFW-profile scale parameter is 0.15 ≤ r_s/r200 ≤ 0.40, with a best-fit value of r_s = 0.25 r200. In terms of the concentration parameter, the best-fit value is c ≡ r200/r_s = 4 (+2.7/−1.5). The 68% c.l. range for the M99-profile scale parameter is 0.30 ≤ r_M/r200 ≤ 0.75 with a best-fit value of r_M = 0.45 r200. On the other hand, for no value of the scale parameter r_c is the SIS model acceptable at the 68% c.l. or better. The best fit is obtained for r_c = 0.02 r200 and values r_c > 0.075 r200 are rejected at >99% c.l. The 68% c.l. range for the Burkert scale parameter r_0 is 0.10 ≤ r_0/r200 ≤ 0.18 with a best-fit value of r_0 = 0.15 r200. Values r_0 > 0.25 r200 are rejected at >99% c.l. The allowed range for the Burkert scale parameter seems larger than for the SIS scale parameter. Note, however, that the two scale parameters do not have the same meaning. If we define the core radius as the clustercentric distance where the density falls below half its central value, r_{ρ0/2}, this corresponds to r_c in the SIS model, and to ∼0.5 r_0 in the Burkert model. We conclude that the core models (and in particular the SIS model) that fit our data have such small core-radii that they very much resemble the two core-less models.

In Fig. 5 we show the observed (binned) σ_p(R) of the Early-type class, together with the velocity dispersion profiles predicted from the best-fit NFW, M99, SIS and Burkert M(<r) models. For the sake of clarity, we do not show the uncertainties in the model velocity dispersion profiles related to the uncertainties in ν(r). These are however much smaller than the uncertainties of the observed velocity dispersion profile. From Fig. 5 it can be seen that the best-fit NFW and M99 models predict almost indistinguishable velocity dispersion profiles. On the other hand, fitting the inner part of σ_p(R) requires such a small core radius for the SIS model, that the model velocity dispersion profile flattens already at r ≈ 0.4 r200, while the observed profile continues to drop. Formally, the best-fit Burkert model also provides the best fit to the observations, as it reproduces the broad maximum in σ_p(R) better than do the other three models. However, we are somewhat suspicious of the very strong decrease of σ_p(R) below r ≈ 0.1 r200, although it is only 2.5σ lower than the LOWESS estimate of σ_p(R).

6. The radial variation of the mass-to-light ratio M/L

As mentioned in the Introduction, there is not yet a clear, unambiguous result for the dependence of M/L on radius in the literature, and therefore we have also used our data to derive M/L(r). The luminosity-density profile L_all(r) was determined from the projected luminosity density […]
