Network Anomaly Detection Against Frequent Episodes of Internet Connections


Anomaly Detection via Optimal Symbolic Observation of Physical Processes


Anomaly Detection via Optimal Symbolic Observation of Physical Processes*

Humberto E. Garcia and Tae-Sic Yoo
Sensor, Control, and Decision Systems Group
Idaho National Laboratory, Idaho Falls, ID 83415-6180
{humberto.garcia}@

* This work was supported by the U.S. Department of Energy contract DE-AC07-05ID14517.

Abstract

The paper introduces a symbolic, discrete-event approach for online anomaly detection. The approach uses automata representations of the underlying physical process to make anomaly occurrence determinations. Automata may represent a discrete-event formulation of the operation of the monitored system during both normal and abnormal conditions. Automata may also be constructed from generated symbol sequences associated with parametric variations of equipment. This collection of automata represents the symbolic behavior of the underlying physical process and can be used as a pattern for anomaly detection. Within the possible behavior, there is a special sub-behavior whose occurrence must be detected. The special behavior may be specified by the occurrence of special events representing deviations of anomalous behaviors from the nominal behavior. These intermittent or non-persistent events or anomalies may occur repeatedly. An observation mask is then defined, characterizing the actual observation configuration available for collecting symbolic process data. The analysis task is to determine whether this observation configuration is capable of detecting the specified anomalies. The assessment is accomplished by evaluating several observability notions, such as detectability and diagnosability. To this end, polynomial-time, computationally efficient verification algorithms have been developed. The synthesis of optimal observation masks can also be conducted to suggest an appropriate observation configuration guaranteeing the detection of anomalies and to construct associated monitoring agents for performing the specified on-line condition monitoring task. The proposed discrete-event approach and supporting techniques for anomaly detection via optimal symbolic observation of physical processes are briefly presented and illustrated with examples.

1 Introduction

Condition monitoring and anomaly detection are essential not only for the prevention of cascading failures but also for the assurance of acceptable operational dynamics and the improvement of process reliability, availability, performance, and cost. Anomaly detection is also a key element for strengthening nuclear non-proliferation objectives and for deploying advanced proliferation detection measures such as nuclear operations accountability [2]. An anomaly can be defined as a deviation from a system's nominal behavior. Three types of anomalies are considered here. First, anomalies may be associated with parametric or non-parametric changes evolving in system components. Lubricant viscosity changes, bearing damage, and structural fatigue are examples in this category. Second, anomalies may be associated with violations executed during operations that are contrary to demanded operability specifications. For example, a specification may be to avoid starting a given pump when its associated down-stream valve is closed. If a command is sent to start the pump when its valve is closed, this condition needs to be monitored and reported (and possibly aborted). Third, anomalies may be associated with the occurrence of special events or behaviors. One relevant field is failure analysis, in which special events are identified as faults. Other examples of special behaviors include (permanent) failures, execution of critical events, reaching unstable states, or, more generally, meeting formal specifications defining anomalies or special behaviors.

To detect anomalies (including the three types mentioned above), not only is a set of sensors (i.e., a sensor configuration) needed to retrieve process data, but also an observer to integrate and analyze the collected process information. Thus, optimizing sensor configurations and rigorously synthesizing their corresponding observers are important design goals in on-line condition monitoring. Recently, significant attention has been given to anomaly detection and fault analysis; see for example [1-10] and their references. The definition of diagnosability based on failure-event specifications was first introduced in [8]. Variations to the initial definition in [8] have been proposed recently. Failure states are introduced in [10] and the notion of diagnosability is accordingly redefined. The issue of diagnosing repeatedly and the associated notion of [1,∞]-diagnosability are first introduced in [5], along with a polynomial algorithm for checking it. To improve the complexity of previously reported algorithms, which severely restricts their applicability, methods and an associated tool have been developed that utilize the approach introduced in [9] for checking [1,∞]-diagnosability with reduced complexity. Recently, techniques in symbolic time series analysis [7] have been proposed to reformulate the problem of anomaly detection from a time series setup to a discrete event framework, upon which the above developed algorithms can be utilized. This transformation allows one to deal with complex processes and information systems in a more efficient manner by abstracting monitored systems/signals into simpler and rigorous mathematical representations.

This paper builds upon the above efforts to introduce a rigorous methodology for optimizing sensor configurations and synthesizing associated observers meeting given system property requirements regarding on-line condition monitoring. Applications include supervisory observation and event/anomaly detection.

2 Problem Statement

In anomaly detection applications, the objective is to detect abnormal conditions occurring within the monitored system by analyzing observable process data. To this end, models are often constructed to characterize normal and abnormal behaviors. A model may represent the possible and unacceptable operational dynamics of a monitored process. Finite state machine (FSM) representations of system components (e.g., tanks, valves, and pumps) can be formulated and composed to describe relevant operations of the integrated system (e.g., a nuclear fuel reprocessing installation). For example, operations models may define the expected changes in the state of a tank based on the states of associated valves and pumps. Operations models may also represent entity-flow descriptions defined for a given routing network regarding possible and special item transfers (e.g., violations or critical movements). Models may also be generated from symbolic strings characterizing variations in representative parameters associated with process signals. In this case, time series data from a signal may be symbolized into discrete symbolic strings. This symbolization may be accomplished using the wavelet transform, for example. In particular, coefficients of the wavelet transform of the time-domain signal are utilized for symbol generation instead of directly using the time series data [7].
Variations in the monitored signal are thus detected as variations of its associated wavelet coefficients. From the symbol sequences, an FSM model can then be constructed. Methods have been proposed for encoding the underlying process dynamics from observed time series data and for constructing FSM models from symbolic sequences. Within the scope of this paper, the mentioned models are formulated as discrete event systems (DES) in order to describe their dynamics at a higher level of abstraction, reduce computational complexity, and benefit from a developed mathematical framework suitable for computing optimal sensor configurations and synthesizing their corresponding observers.

In either symbolic time series analysis or event/specification violation detection, the objective is to detect whether a special event or an operability specification violation has occurred by recording and analyzing observable events. System behavior is often divided into two mutually exclusive components, namely, the special behavior of interest (which needs to be detected) and the ordinary behavior (which does not need to be reported). To accomplish the task of online anomaly detection, two design elements must be addressed.
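To make the symbol-sequence-to-FSM step concrete, the sketch below builds a small state machine whose states are recent symbol windows (a D-Markov-style construction in the spirit of the symbolic dynamics literature cited above). The `depth` parameter and the toy alternating sequence are illustrative assumptions, not the paper's actual construction.

```python
from collections import defaultdict

def build_fsm(symbols, depth=2):
    """Estimate a finite-state model from a symbol sequence.

    States are the length-`depth` symbol windows observed in the
    sequence; transitions record which symbol moves the process
    from one window to the next, with occurrence counts.
    """
    transitions = defaultdict(dict)              # state -> {symbol: next_state}
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(symbols) - depth):
        state = tuple(symbols[i:i + depth])
        sym = symbols[i + depth]
        transitions[state][sym] = tuple(symbols[i + 1:i + depth + 1])
        counts[state][sym] += 1
    return transitions, counts

def is_anomalous(transitions, state, sym):
    """Flag a symbol never seen from this state in the nominal model."""
    return sym not in transitions.get(state, {})

# Nominal behavior: alternating 'a'/'b' symbols, standing in for a
# symbolized sensor signal.
nominal = list("abababababab")
fsm, _ = build_fsm(nominal, depth=2)

print(is_anomalous(fsm, ("a", "b"), "a"))  # False: seen in nominal data
print(is_anomalous(fsm, ("a", "b"), "b"))  # True: deviation from the pattern
```

An observer can then run this check online, flagging any transition absent from the nominal automaton as a candidate anomaly.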
The first element is the identification of the observational information required by an observer to determine whether a special event or an operability specification violation has occurred. The second element is the construction of the associated observer algorithm that automatically integrates and analyzes collected data to assess system condition. To improve information management and cost, the design goal is to construct a monitoring observer with a detection capability that relies not only on current measurements but also on recorded knowledge built from past observations. It is then important to rigorously assess whether the monitored DES is intrinsically observable for a given sensor configuration and special behavior of interest. Otherwise, the task is to identify optimized observation configurations that meet given observability property requirements. The related cost functional may be based on different design criteria, such as costs and implementation difficulties of the considered sensor technologies.

3 Proposed Anomaly Detection Approach

A methodology and associated tool have been developed to identify optimal sensor configurations and associated observers for detecting anomalies. The developed framework first requires formal descriptions of the given monitored DES, (observability) property requirements, and observational constraints, as shown in Fig. 1. Property requirements may include meeting detectability (e.g., [8]) and/or supervisory observability (e.g., [6]) objectives, for example. Given these descriptions, optimized observational configurations and associated algorithms for data integration and analysis can be systematically computed that meet the specified property requirements. To formalize the monitored process, a DES model G must be constructed defining how system states change due to event occurrences. Other design elements are requested by the developed framework according to the optimization task at hand.

[Figure 1: Flow chart of the developed sensor optimization framework]

For example, in the case of designing observers for determining whether given operability specifications are being met during operations, one element must be specified, namely, the set of operability specifications S that should be preserved at all times (the intrinsic observability property P here is supervisory observability). Similarly, in the case of designing sensor configurations for event detection applications, two elements must be specified, namely, the set of anomalies or special events S requiring detection and the intrinsic observability property P (i.e., detectability or diagnosability) regarding S. To formalize observational constraints, a cost functional C should be included indicating the costs associated with observation devices. Given G, S, P, and C, the design task is to compute an observational configuration or observation mask M that guarantees P of S with respect to G, while optimizing C. This mask M defines an underlying observational configuration required to assure the observability of anomalies or the detection of operability violations. After a suitable observation mask M has been computed, the implementation task is to construct an observer O that will guarantee P of S by observing G via the observation mask M.

The use of the proposed methodology in computing optimized sensor configurations for anomaly detection can be summarized as follows. For verification, the developed technology assesses whether a given observation configuration assures the observability of special behaviors within possible system behaviors (Fig. 2(a)). For design, the methodology identifies, for each event, which attributes need to be observed and suggests an optimal observation configuration meeting the specified on-line condition monitoring requirements (Fig. 2(b)).

[Figure 2: Use of the developed framework for event detection applications. (a) Verification. (b) Design]

4 Observability in Anomaly Detection

4.1 Preliminaries

Denote by G the FSM model of the monitored system, with G = {X, Σ, δ, x0}, where X is a finite set of states, Σ is a finite set of event labels, δ: X × Σ → X is a partial transition function, and x0 ∈ X is the initial state of the system. The symbol ε denotes the silent event or the empty trace. This model G accounts for both the ordinary (non-special) and special behavior of the monitored system. To model observational limitations, an observation mask function M: Σ → Δ ∪ {ε} is introduced, where Δ is the set of observed symbols.

4.2 Definitions

Let S denote the set of either operability specifications, which should be met, or special events, which should be detected. In the case of event detection, special events can occur repeatedly, so they need to be detected repeatedly. It is assumed that events in S are not fully observable, because otherwise they could be detected/diagnosed trivially.

Under supervisory observability, the interest is in signaling the occurrence of violations of operability specifications. Under detectability, the interest is in signaling the occurrence of special events, but without explicitly indicating which event exactly has occurred. Diagnosability is a refined case of detectability, where the interest often is in exact event identification. The developed mathematical framework can be used to evaluate different system properties. To illustrate, let us assume we are interested in the event detectability property termed [1,∞]-diagnosability (defined next) of a given monitored system. The proposed methodology then utilizes the polynomial algorithm described in [9] for checking this notion. Other notions can also be checked, including the observability of a given system regarding operability specifications, for example.

Definition 1 ((Uniformly bounded delay) [1,∞]-diagnosability [5,9]). A symbolic string (or language L) generated by a monitored system G is said to be uniformly [1,∞]-diagnosable with respect to a mask function M and a special-event partition Π_s on S if the following holds:

(∃ n_d ∈ N)(∀ i ∈ Π_s)(∀ s ∈ L)(∀ t ∈ L/s) [ |t| ≥ n_d ⇒ D∞ ],

where N is the set of non-negative integers and the condition D∞ is given by:

D∞: (∀ w ∈ M⁻¹M(st) ∩ L) [ N^i_w ≥ N^i_s ].

The above definition assumes the following notation. For all Σ_si ∈ Π_s and a trace s ∈ L, let N^i_s denote the number of events in s that belong to the special event type Σ_si (or i for simplicity). The post-language L/s is the set of possible suffixes of a trace s, i.e., L/s := { t ∈ Σ* : st ∈ L }.

4.3 Optimal Sensor Configurations

The problem of selecting an optimal mask function is studied in [4]. Assuming a mask-monotonicity property, it introduces two algorithms for computing an optimal mask function. However, these algorithms assume that a sensor set supporting the mask function can always be found, which may not be true in practice. Given the above considerations, the developed framework utilizes instead the algorithm introduced in [1]. This algorithm searches the sensor set space rather than the mask function space. The computed sensor set induces a mask function naturally, so it does not suffer from the issue of realizing the mask function.

4.4 Implementing Symbolic Observation

The design task leads to a twofold objective: i) to compute objective-driven sensor configurations that optimize given information costs, and ii) to construct formal observers that guarantee the detectability of special events, specification violations, or anomalies in general. The key design issue is then the management of sensor deployments. After computing an acceptable M that guarantees the desired property requirement (e.g., supervisory observability, detectability, or diagnosability) using the optimization algorithm of Fig. 1, an associated observer O is constructed. In event detection applications, for example, the observer algorithm will integrate and analyze observed event information (or measurements) and report the occurrences of special events. In supervisory control applications, the observer estimates the system state and determines whether events executed by the monitored system violate given operability specifications.

To implement the observer, either an offline or an online design approach may be used for its construction. Under an offline design approach, the deterministic automaton representation of the observer is constructed a priori, a task that may be of high computational complexity. To overcome this computational complexity, an online approach may be used instead, as proposed in [5]. Further improving on [5] regarding computational complexity, the developed framework utilizes an improved version of the algorithm reported in [9]. The proposed mathematical construction of observers can thus guarantee the fulfilment of given observability requirements regarding the detection of anomalies.
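To make the observer idea concrete, the following minimal sketch (not the paper's algorithm) propagates a set of state estimates through an observation mask M. The toy model G, the event names, and the single special event `fault` are invented for illustration. Because the special event is unobservable under this mask and an ordinary path always remains possible, the estimate never becomes certain that the fault occurred — exactly the kind of deficiency the verification algorithms above are designed to expose.

```python
# Transitions of a toy model G: (state, event) -> next state.
G = {
    ("s0", "start"): "s1",
    ("s0", "fault"): "f0",      # hypothetical special (anomalous) event
    ("f0", "start"): "f1",
    ("s1", "stop"): "s0",
    ("f1", "stop"): "f0",
}
SPECIAL = {"fault"}
# Observation mask M: event -> observed symbol (None = unobservable).
M = {"start": "START", "stop": "STOP", "fault": None}

def unobservable_reach(estimates):
    """Extend each (state, special-occurred) estimate along
    unobservable events, tracking whether a special event fired."""
    frontier, closed = list(estimates), set(estimates)
    while frontier:
        state, special = frontier.pop()
        for (src, ev), dst in G.items():
            if src == state and M[ev] is None:
                item = (dst, special or ev in SPECIAL)
                if item not in closed:
                    closed.add(item)
                    frontier.append(item)
    return closed

def observe(estimates, symbol):
    """One observer step: consume an observed symbol and update
    the set of (state, special-occurred) estimates."""
    nxt = set()
    for state, special in unobservable_reach(estimates):
        for (src, ev), dst in G.items():
            if src == state and M[ev] == symbol:
                nxt.add((dst, special))
    return unobservable_reach(nxt)

est = unobservable_reach({("s0", False)})
for sym in ["START", "STOP"]:
    est = observe(est, sym)
# The anomaly is certain only once every surviving estimate carries
# the special flag; here an ordinary estimate always survives.
print(all(special for _, special in est))  # False: this mask cannot certify the fault
```

Adding an observable event that only the faulty branch can execute would let the flagged estimates eventually dominate, which is what a mask satisfying the diagnosability property guarantees.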
5 Illustrative Applications

To illustrate the notion of anomaly detection via optimal symbolic observation of physical processes, an application in specification violation detection and another in event detection are briefly introduced next. Due to page limitations, no application of the proposed approach to symbolic time series analysis is discussed.

5.1 Specification Violation Detection

Consider the monitored system illustrated in Fig. 3. This system consists of a pump, a tank, two valves, and interconnecting pipes. The monitored system may represent a portion of a nuclear fuel reprocessing facility, for example. The basic operation of this system is as follows. With Valve 1 open and Valve 2 closed, the pump starts and operates in order to fill the tank by pumping a fluid from an up-stream reservoir (not shown). When the tank is full, the pump should stop, Valve 1 should close, and Valve 2 should open until the tank is emptied; the cycle then repeats. Assume that there is the need to monitor the system and detect the possible violation of three operability specifications. In particular, Spec. 1 delineates that the pump should not start when Valve 1 is closed; Spec. 2 delineates that Valve 1 should not be closed when the pump is running; and Spec. 3 delineates the basic system operation described earlier. The synthesis task is to compute an optimized sensor configuration and associated observer to conduct this anomaly detection. To this end, DES models of each component (i.e., pump, tank, Valve 1, Valve 2) and their interactions are constructed. FSMs of the concerned specifications are also formulated. The developed framework then automatically determines minimal sets of events (and associated observers) that need to be observed to achieve the desired on-line condition monitoring task. For example, using the proposed methodology, it was determined that Valve 2 does not need to be observed (hence no sensor for Valve 2 is needed) in order for the monitoring system to make a determination on whether a specification violation has occurred.

[Figure 3: Monitored system under specification violation detection]

5.2 Event/Anomaly Detection

Consider the monitored system illustrated in Fig. 4(a). This system consists of one input port, I1, four internal stations, S_i, i = 1, 2, 3, 4, and two output ports, O1 and O2. This system may represent a nuclear reprocessing facility or a nuclear power plant site, for example. Two authorized routes, (1) and (2), are identified in Fig. 4(a). Under route (1), an item should enter the monitored system through the input port I1, move sequentially to locations S1 and S3, and move either to location S2 or S4; if it goes to S2, then an item may either exit through the output port O2 or continue to location S4; if at location S4, it should exit through the output port O1. Under route (2), an item should enter the monitored system through the input port I1, move sequentially to locations S1, S2, and S3; it may then exit through the output port O2 or continue to location S4, from which it should exit through the output port O2. Besides the normal (non-special) item movements shown, assume that the two item transfer anomalies labeled with an S (for special) in Fig. 4(a) (i.e., 1S and 2S) are also possible.

[Figure 4: Monitored system and ad hoc sensor placement solution. (a) Monitored system. (b) Ad hoc sensor placement solution]

The design objective is to identify observation configurations (i.e., sets of sensors and locations) M that provide sufficient tracking information to an observer O for detecting the occurrence of any anomaly defined in S. For comparison, Fig. 4(b) illustrates a sensor configuration that would allow an observer to immediately detect any anomaly after its occurrence. Three sensor types are shown for retrieving item movement data. "Circle," "square," and "triangle" sensors provide current item locations, previous item locations, and item types, respectively. This configuration may result from conducting an ad hoc design, without a rigorous analysis of the anomaly detection problem at hand. It is desired to determine whether there are other (objective-driven) sensor configurations with reduced information requirements and optimal information management. To this end, the possible-behavior model G of the system illustrated in Fig. 4(a) is constructed. The monitoring goal P regarding the set of special events S is also specified. Finally, an information cost criterion C is formulated. The developed framework is then invoked to compute an observation mask M that optimizes C and meets P. Fig. 5 illustrates optimized sensor configurations and the reduction in the observational requirement M that may be obtained when selecting detectability rather than diagnosability of S as the observability goal P. The imposed cost objective C is to reduce information requirements and preferably exclude sensors that communicate an item's previous locations (i.e., avoid using square sensors). Fig. 6 shows the effect of sensor reliability on the required sensor configurations for meeting a given detection confidence requirement. In particular, Fig. 6 suggests that as the reliability of circle sensors (implemented as motion sensors, for example) decreases, more sensors may be required to meet the specified observability requirements. While the monitored system used in this example corresponds to an item-flow process, the DES model G used could also have been a high-level representation of any other physical process.

Numerous simulations were conducted with different M and corresponding O for given P and C, under both event and specification violation detection applications. As guaranteed by the mathematical setting of the developed framework, the observer was always capable of meeting the given observability requirements.

[Figure 5: Optimized sensor placements, case of reliable sensors. (a) Diagnosability. (b) Detectability]

[Figure 6: Optimized sensor placements, case of unreliable sensors. (a) Sensor reliability ≥ 60%. (b) 40% ≤ sensor reliability ≤ 60%]

6 Conclusion

An approach to anomaly detection via optimal symbolic observation of physical processes was presented. A symbolic, discrete-event reformulation of the problem of anomaly detection is suggested to deal with system complexities and to utilize a rigorous framework in which optimal sensor configurations and associated observers for on-line condition monitoring can be synthesized. The proposed methodology can thus be used to answer the question of how to optimally instrument a given monitored system. This design and implementation approach opens the possibility for information management optimization to reduce costs, decrease intrusiveness, and enhance automation, for example. Furthermore, it provides rich analysis capability (enabling optimization, sensitivity, what-if, and vulnerability analysis), guarantees mathematical consistency and intended monitoring performance, yields a systematic method to deal with system complexity, and enables portability of condition monitoring. As briefly mentioned here, future research involves the extension of the proposed approach into the symbolic time series analysis paradigm.
References

[1] H. E. Garcia and T. Yoo, "Model-based detection of routing events in discrete flow networks," Automatica, 41:583-594, 2005.
[2] H. E. Garcia and T. Yoo, "Option: a software package to design and implement optimized safeguards sensor configurations," in Proc. 45th INMM Annual Meeting, Orlando, FL, Jul. 18-22, 2004.
[3] H. E. Garcia and T. Yoo, "A methodology for detecting routing events in discrete flow networks," in Proc. 2004 American Control Conf., 2004.
[4] S. Jiang, R. Kumar, and H. E. Garcia, "Optimal sensor selection for discrete event systems with partial observation," IEEE Trans. Autom. Control, 48(3):369-381, 2003.
[5] S. Jiang, R. Kumar, and H. E. Garcia, "Diagnosis of repeated/intermittent failures in discrete event systems," IEEE Trans. Robotics and Automation, 19(2):310-323, 2003.
[6] F. Lin and W. M. Wonham, "On observability of discrete-event systems," Information Sciences, 44(3):173-198, 1988.
[7] A. Ray, "Symbolic dynamic analysis of complex systems for anomaly detection," Signal Processing, 84:1115-1130, 2004.
[8] M. Sampath, R. Sengupta, K. Sinnamohideen, S. Lafortune, and D. Teneketzis, "Diagnosability of discrete event systems," IEEE Trans. Autom. Control, 40(9):1555-1575, 1995.
[9] T. Yoo and H. E. Garcia, "Event diagnosis of discrete event systems with uniformly and nonuniformly bounded diagnosis delays," in Proc. 2004 American Control Conf., 2004.
[10] S. H. Zad, "Fault diagnosis in discrete event and hybrid systems," Ph.D. thesis, University of Toronto, 1999.

Network Anomaly Detection Based on a Temporal Convolutional Neural Network


Communications Technology, Vol. 54, No. 3, Mar. 2021, pp. 705-710. doi: 10.3969/j.issn.1002-0802.2021.03.028
Citation: TAN Tian, YE Qian, SUN Yanjie. Network Anomaly Detection Based on Temporal Convolutional Neural Network [J]. Communications Technology, 2021, 54(3): 705-710.

Network Anomaly Detection Based on Temporal Convolutional Neural Network

TAN Tian, YE Qian, SUN Yanjie
(Hangzhou DPtech Information Technologies Co., Ltd., Hangzhou 310051, Zhejiang, China)

Abstract: With the rapid development of computer network technology and corresponding network applications, computer networks have penetrated into all aspects of people's lives. While greatly facilitating and enriching daily life, they also bring many security problems: their expansibility and openness have enabled computer networks and related technologies to develop rapidly, but also make networks vulnerable to attack. Network anomaly detection is a very important problem in the field of network security; it uses data analysis to identify attacks or abnormal behaviors in the network. Researchers have proposed many anomaly detection methods to improve the detection capabilities of network anomaly detection systems; however, due to the complexity and variability of network environments, many methods achieve good results only in a specific network environment. In order to adapt automatically to different network environments, a network anomaly detection scheme based on the Temporal Convolutional Network (TCN) is proposed. It uses the powerful modeling capability of neural networks to model network data online and detect abnormal behaviors in time.

Keywords: computer network; attack; network security; anomaly detection; TCN
CLC number: TP393.08. Document code: A. Article ID: 1002-0802(2021)-03-0705-06
Received: 2020-11-12; revised: 2021-02-20.

Figure 1: Causal convolution diagram. In Fig. 1, each neuron in a given layer corresponds to one time step, and its receptive field covers the current time step together with several preceding time steps.
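A causal convolution of the kind shown in Fig. 1 can be sketched in a few lines: left-padding the input keeps the output the same length while ensuring the output at time t depends only on the current and earlier inputs. The two-tap filter and dilation handling below are illustrative, not the paper's network.

```python
import numpy as np

def causal_conv1d(x, w, dilation=1):
    """Causal 1-D convolution: output at time t depends only on
    x[t], x[t-d], x[t-2d], ... (left padding preserves the length)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(6, dtype=float)          # toy input signal 0..5
y = causal_conv1d(x, w=[0.5, 0.5])     # two-tap moving average
print(y)  # y[t] averages x[t] and x[t-1]: [0, 0.5, 1.5, 2.5, 3.5, 4.5]
```

Stacking such layers with growing dilation (1, 2, 4, ...) gives the exponentially growing receptive field that lets a TCN model long traffic histories.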

Anomaly Detection


Andrew Ng
Aircraft engines motivating example: 10,000 good (normal) engines; 20 flawed (anomalous) engines.
Training set: 6,000 good engines. Cross validation: 2,000 good engines (y = 0), 10 anomalous (y = 1). Test: 2,000 good engines (y = 0), 10 anomalous (y = 1).
• Monitoring machines in a data center
Anomaly detection
Choosing what features to use
Machine Learning
Non-Gaussian features
Error analysis for anomaly detection. Want p(x) large for normal examples x and p(x) small for anomalous examples x. Most common problem: p(x) is comparable (say, both large) for normal and anomalous examples.
Anomaly if p(x) < ε.
Anomaly detection example
Anomaly detection
Developing and evaluating an anomaly detection system
The importance of real-number evaluation. When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating the algorithm. Assume we have some labeled data, of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous). Training set: x(1), ..., x(m) (assume these are normal examples / not anomalous). Cross validation set: (x_cv(1), y_cv(1)), ..., (x_cv(m_cv), y_cv(m_cv)). Test set: (x_test(1), y_test(1)), ..., (x_test(m_test), y_test(m_test)).
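The evaluation recipe above (fit p(x) on normal data only, then pick ε on the labeled cross-validation set) can be sketched as follows. The synthetic 2-D data stands in for the engine features; the independent-Gaussian density and F1-based threshold search follow the course's recipe, while the specific data shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussian(X):
    """Estimate per-feature mean and variance from normal examples."""
    return X.mean(axis=0), X.var(axis=0)

def p(X, mu, var):
    """Independent-Gaussian density p(x) = prod_j N(x_j; mu_j, var_j)."""
    return np.prod(np.exp(-(X - mu) ** 2 / (2 * var))
                   / np.sqrt(2 * np.pi * var), axis=1)

# Synthetic stand-in for the engine data: train on normal only.
X_train = rng.normal(0, 1, size=(6000, 2))
X_cv = np.vstack([rng.normal(0, 1, size=(2000, 2)),
                  rng.normal(6, 1, size=(10, 2))])   # injected anomalies
y_cv = np.r_[np.zeros(2000), np.ones(10)]

mu, var = fit_gaussian(X_train)
p_cv = p(X_cv, mu, var)

# Pick the epsilon that maximizes F1 on the cross-validation set.
best_eps, best_f1 = None, -1.0
for eps in np.unique(p_cv):
    pred = (p_cv < eps).astype(float)
    tp = np.sum((pred == 1) & (y_cv == 1))
    prec = tp / max(pred.sum(), 1)
    rec = tp / y_cv.sum()
    f1 = 2 * prec * rec / max(prec + rec, 1e-12)
    if f1 > best_f1:
        best_eps, best_f1 = eps, f1

print(best_f1)  # near 1.0 on this well-separated synthetic data
```

Sweeping candidate thresholds over the observed p-values (rather than a fixed grid) guarantees the best achievable F1 on the CV set is found.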

Network Anomalous Traffic Intrusion Detection Method Based on Cluster Analysis


Technology and Information (Science and Informatization), January 2023

Network Anomalous Traffic Intrusion Detection Method Based on Cluster Analysis

Chen Xiao-yan
(Puyang City Public Security Bureau Intelligence Command Center, Puyang 457000, Henan Province, China)

Abstract: In order to improve the detection speed and accuracy of network anomalous traffic intrusion detection and meet the current needs of network traffic detection, this paper studies a network anomalous traffic intrusion detection method based on a cluster analysis algorithm. Specifically, traffic is collected and classified, similarity is calculated based on cluster analysis, and intruding network traffic is detected. Experiments show that, compared with the traditional method, the FART K-means cluster analysis detection method proposed in this paper improves accuracy by 12.6% and reduces running time by 4.3 s, which meets the design requirements and shows good practical application effects.

Keywords: cluster analysis; network traffic; anomalous traffic; intrusion detection

Introduction

Network interaction has increasingly become an indispensable part of human life.

Anomaly Detection: A Survey

This work was supported by NASA under award NNX08AC36A, NSF grant number CNS-0551551, NSF ITR Grant ACI-0325949, NSF IIS-0713227, and NSF Grant IIS-0308264. Access to computing facilities was provided by the Digital Technology Consortium. Author's address: V. Chandola, Computer Science Department, University of Minnesota, 200 Union St., Minneapolis, MN 55414. © 2009 ACM 0360-0300/2009/07-ART15. DOI: 10.1145/1541880.1541882

Chaotic RBF Neural Network Anomaly Detection Algorithm


混沌RBF神经网络异常检测算法翁鹤;皮德常【摘要】针对传统神经网络异常检测算法的准确率问题,文中将混沌和RBF( Radial Basis Function)神经网络相结合,既可利用混沌的随机性、初值敏感性等特点,也可发挥RBF神经网络大规模并行处理、自组织自适应性等功能。

The chaotic time series is reconstructed in phase space to obtain phase-space vectors, which serve as the input of the RBF neural network. The RBF network builds a fitting function for the electric-load sequence, on which further prediction is based; the deviation between the predicted value and the true value then determines whether the monitored signal is anomalous.

Experimental results show that, compared with other algorithms, the method achieves higher prediction accuracy and better anomaly detection capability.

Abstract (English): For the accuracy problem of traditional neural-network anomaly detection algorithms, we propose a method combining chaos with an RBF (Radial Basis Function) neural network, which exploits the randomness and initial-value sensitivity of chaos as well as the large-scale parallel processing, self-organizing, and adaptive capability of RBF neural networks. The chaotic time series is reconstructed in phase space, and the resulting phase-space vectors serve as the network input, from which the RBF network builds a fitting function for the electricity-load sequence. One-step prediction is then performed in the reconstructed phase space, and the deviation between the predicted and true values determines whether the detected signal is anomalous. Experimental results show that this method offers better prediction accuracy and anomaly detection capability.

Journal: Computer Technology and Development
Year (volume), issue: 2014 (000) 007
Pages: 5 (pp. 29-33)
Keywords: electric load; phase-space reconstruction; chaotic time series; RBF neural network; anomaly detection
Authors: Weng He; Pi De-chang
Affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu, China
Language: Chinese
CLC number: TP183

0 Introduction
With the rapid development of the information industry, the scale of data collected and stored in production and daily life has grown from GB toward TB and PB levels, and this big data conceals large numbers of anomalous records and outliers.
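The phase-space reconstruction step the abstract relies on (time-delay embedding of a scalar load series into vectors that feed the RBF network) can be sketched minimally as follows. The delay tau and embedding dimension m here are illustrative, not the paper's tuned values, and the toy load series is hypothetical.

```python
def delay_embed(series, m=3, tau=2):
    """Time-delay embedding: map a scalar series x[t] to vectors
    (x[t], x[t+tau], ..., x[t+(m-1)*tau]), the standard phase-space
    reconstruction used as neural-network input."""
    n = len(series) - (m - 1) * tau
    return [tuple(series[t + j * tau] for j in range(m)) for t in range(n)]

# Toy load series; in the paper's scheme each embedded vector is fed to the
# RBF network, which predicts the next value, and a large deviation
# |prediction - truth| flags the signal as anomalous.
load = [100, 102, 98, 101, 99, 103, 97, 160, 100, 101]
vectors = delay_embed(load, m=3, tau=2)
print(len(vectors), vectors[0])  # 6 vectors; the first is (100, 98, 99)
```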

Network Traffic Anomaly Detection Based on Random Projection and Clustering

Computer Simulation, Vol. 36, No. 3, March 2019. Article ID: 1006-9348 (2019) 03-0289-05

Network Traffic Anomaly Detection Based on Random Projection and Clustering
LIU Ya-ting1, WANG Yong-cheng2, JIANG Yan-fei1, GU Yuan-tao1
(1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; 2. Southwest Electronics and Telecommunication Technology Research Institute, Chengdu 610041, Sichuan, China)

Abstract: This paper studies network traffic anomaly detection in the field of network security.

To address the problems of traditional anomaly detection algorithms, namely poor real-time performance, strong requirements on the data distribution, low true-positive rates, and high false-positive rates, a new method is adopted that combines a sliding window, multiple random projections, and an unsupervised clustering algorithm.

Random projection is used to aggregate network packets and obtain a time series for each monitored object; K-means++ clustering is applied to the traffic series within each sliding window to produce several candidate anomaly sets; and the intersection of these candidate sets yields the final set of anomalous objects.

Simulation results show that the improved algorithm achieves a high detection rate and a low false-alarm rate and can detect anomalous network data in real time.

Keywords: network traffic anomaly detection; random projection; clustering; time series; intersection
CLC number: TP393.08    Document code: B

Network Traffic Anomaly Detection Based on Random Projection and Clustering
LIU Ya-ting1, WANG Yong-cheng2, JIANG Yan-fei1, GU Yuan-tao1
(1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; 2. Southwest Electronics and Telecommunication Technology Research Institute, Chengdu 610041, Sichuan, China)

ABSTRACT: In this paper, network traffic anomaly detection for network security is researched. To solve the problems of traditional anomaly detection methods, including poor real-time performance, high requirements on the data distribution, low true-positive rates, and high false-positive rates, a new combined method is adopted that integrates a sliding time window, multiple random projections, and an unsupervised clustering algorithm. We first aggregate network traffic using random projection to obtain a time series per object. Then, K-means++ clustering detection is applied to the traffic series of each sliding window to produce multiple alarm sets. We next exploit the intersection operation to determine the final anomaly set. Based on the MAWILab dataset, we experimented and concluded that the new detection method achieves a higher true-positive rate and a lower false-positive rate and can detect network anomalies in real time.

KEYWORDS: Network traffic anomaly detection; random projection; clustering; time series; intersection

1 Introduction
With the rapid development and spread of Internet technology, network security problems have become increasingly prominent; large numbers of threats and hidden dangers such as Trojans, worms, and DDoS attacks disrupt the normal functioning of society and sustained economic development.
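The sliding-window/intersection stage of the abstract can be sketched as below. A trivial mean-rate threshold detector stands in for the paper's K-means++ stage, and the host names, window sizes, and traffic numbers are all hypothetical; the point is only the structure: one alarm set per window, intersected so that an object is reported only if it looks anomalous in every window.

```python
def window_alarms(series_by_host, window, step, detect):
    """Run a per-window detector and return one alarm set per window."""
    alarm_sets = []
    length = len(next(iter(series_by_host.values())))
    for start in range(0, length - window + 1, step):
        view = {h: s[start:start + window] for h, s in series_by_host.items()}
        alarm_sets.append(detect(view))
    return alarm_sets

def simple_detector(view):
    """Stand-in for the per-window clustering stage: flag hosts whose
    mean rate is more than twice the window-wide mean."""
    means = {h: sum(s) / len(s) for h, s in view.items()}
    overall = sum(means.values()) / len(means)
    return {h for h, m in means.items() if m > 2 * overall}

series = {
    "10.0.0.1": [5, 6, 5, 7, 6, 5, 6, 5],
    "10.0.0.2": [4, 5, 6, 5, 4, 6, 5, 4],
    "10.0.0.3": [300, 320, 310, 305, 330, 315, 320, 310],  # DDoS-like object
}
alarms = window_alarms(series, window=4, step=2, detect=simple_detector)
final = set.intersection(*alarms)  # an anomaly must persist in every window
print(final)  # -> {'10.0.0.3'}
```

The intersection is what drives the low false-positive rate the paper claims: a host that spikes in a single window is dropped unless it is flagged in all of them.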

Impact of packet sampling on portscan detection


stub network. For example, the study in [3] suggests that a global view of the traffic could better capture the scanning patterns. Finally, a stub network such as an enterprise may outsource the detection task to its upstream provider due to lack of resources or expertise. The impact of sampling has been extensively studied in terms of well-known statistical metrics, e.g., mean rate and flow size distribution, from the perspective of determining the volume characteristics of the traffic as a whole [4]–[7]. However, anomaly detection (e.g., worm scan detection) often depends on a diverse set of metrics such as address access pattern, connection status, and distinct per-source behaviors. How packet sampling impacts these traffic features has not been previously addressed. This paper presents a first attempt to address this important open question: does packet sampling distort or lose pertinent information from the original traffic profile that affects the effectiveness of existing anomaly detection techniques? If so, by how much? There is a rich literature on two general approaches to anomaly detection: specialized detection algorithms that target specific types of anomalies, and generalized traffic profiling algorithms. Example target-specific algorithms include [8]–[10], designed primarily to detect portscans. On the other hand, traffic classification algorithms such as [11], [12] are generalized algorithms that do not target a specific anomaly; instead they classify different traffic features and raise alarm flags when they detect large variations. Algorithms from both categories typically assume the availability of detailed packet payload, e.g., at the network edge. However, it is not clear how their performance is impacted if the same solutions utilize only sampled packet header data. We note that it is clearly infeasible to perform an exhaustive study on the impact of sampling for every anomaly detection algorithm presented in the literature.
Instead, we focus on one common class of non-volume-based anomalies, portscans, which cause increasing security concerns. We choose representative algorithms aimed at portscan detection from the two categories of detection algorithms mentioned above. Specifically, this paper presents a detailed study that quantifies the effect of packet sampling on two target-specific and one traffic-profiling algorithm: (a) Threshold Random Walk (TRW) [9], (b) Time Access Pattern Scheme (TAPS) [10], and (c) entropy-based behavior modeling proposed recently [11]. TRW performs stateful analysis of the traffic to identify connection status, while TAPS exploits knowledge of the "connection patterns" of scanners. The general traffic profiling algorithm computes entropy values of each of the four "features" of the IP header in order to identify "significant flows" and capture abrupt changes in the feature set. We believe that these algorithms cover a wide range of anomaly detection approaches.
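The entropy-based profiling idea referenced above reduces to computing the Shannon entropy of the empirical distribution of each header field and watching for abrupt changes. A minimal sketch, with hypothetical flow tuples (a scanner sweeping destination ports inflates the port-field entropy):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of a header field."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical flow headers: (src_ip, dst_ip, dst_port).
normal = [("10.0.0.1", "10.0.0.9", 80)] * 6 + [("10.0.0.2", "10.0.0.9", 443)] * 6
# A scanner sweeping many destination ports from a single source:
scan = [("10.0.0.66", "10.0.0.9", p) for p in range(1, 13)]

h_normal = entropy([f[2] for f in normal])        # two equally likely ports -> 1 bit
h_scan = entropy([f[2] for f in normal + scan])   # port-field entropy jumps
print(round(h_normal, 2), round(h_scan, 2))
```

A profiling detector would track such per-feature entropies over time windows and alarm on large deviations; how sampling dilutes the scan flows, and hence this entropy shift, is exactly the question the study above investigates.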

Anomaly Detection Clustering



Neural Network Analysis: Neural networks have the ability to learn; after training on suitable intrusion and normal data, they can recognize anomalous behavior. This technique is already widely used in credit-card fraud detection.

Advantages and Disadvantages of Anomaly Detection

Advantages of anomaly detection:
The main advantage of anomaly detection is that it does not require building a database entry and a remedy for every attack signature, so the database grows slowly and matching runs faster than in misuse detection. Anomaly detection mainly uses learning techniques to model user behavior; only a data model of a normal user is needed for comparison, which saves the time of defining and entering signatures.
Anomaly Detection Clustering
The Rise of Anomaly Detection

According to reports from the US computer network emergency response and coordination center (CERT/CC), attack incidents have grown exponentially over the past several years. The most common intrusion-detection approach today is misuse detection, which builds attack patterns from previously known events and then matches traffic against them to find behavior that deviates from normal behavior. Its drawback is that the signature database or detection system must be updated constantly; if a current attack does not exist in the attack-pattern database, that behavior cannot be detected. Because of this limitation, combining data mining methods with anomaly detection has recently attracted broad attention and research.
Data Clustering and Labeling Concept Diagram

Clustering algorithms are themselves unsupervised learning methods, so the information contained in each cluster, or what it represents, cannot be known directly; as the data clustering and labeling concept diagram shows, the clustering result alone cannot judge the behavior pattern of test data. Therefore, building the system still requires a labeling technique that marks each cluster as normal or attack. This set of labeled clusters becomes the core of the anomaly detection system in our experiments: the labeled clusters are matched against test data to predict its behavior pattern.
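The labeling step described above can be sketched with a common heuristic (only a sketch, and the threshold is hypothetical): clusters holding at least a given fraction of all records are labeled "normal", small clusters "attack", on the assumption that attack traffic is the minority.

```python
def label_clusters(labels, normal_fraction=0.2):
    """Tag each cluster 'normal' if it holds at least normal_fraction of
    all records, else 'attack' (assumes attacks are rare in the data)."""
    total = len(labels)
    tags = {}
    for c in set(labels):
        share = labels.count(c) / total
        tags[c] = "normal" if share >= normal_fraction else "attack"
    return tags

def classify(nearest_cluster, tags):
    """Predict a test record's behavior from its nearest cluster's tag."""
    return tags[nearest_cluster]

# Toy clustering result: 95 records fell into cluster 0, 5 into cluster 1.
labels = [0] * 95 + [1] * 5
tags = label_clusters(labels)
print(tags)               # {0: 'normal', 1: 'attack'}
print(classify(1, tags))  # a record nearest to cluster 1 -> 'attack'
```

Real systems refine this with labeled samples or per-cluster statistics, since a large cluster of identical attack flows (e.g., a flood) would violate the minority assumption.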

Research on Machine-Learning-Based Network Anomaly Detection and Prevention

Research on Machine-Learning-Based Network Anomaly Detection and Prevention

Chapter 1: Introduction
With the rapid development of the Internet, network security has become a significant concern for individuals, organizations, and governments. The increasing number of cyber-attacks has highlighted the necessity for proactive measures to detect and prevent network anomalies. In recent years, machine learning techniques have shown great potential in addressing this issue. This research aims to explore the application of machine learning in detecting and preventing network anomalies.

Chapter 2: Network Anomalies

2.1 Definition and Types of Network Anomalies
Network anomalies refer to abnormal patterns or behaviors that deviate from the expected normal state of a network. They can be categorized into various types, such as network intrusion, denial of service (DoS) attacks, network traffic anomalies, and network-based malware.

2.2 Challenges in Network Anomaly Detection
Detecting network anomalies poses several challenges due to the increasing complexity and diversity of cyber-attacks. These challenges include the high volume and velocity of network data, the presence of unknown and evolving anomalies, and the necessity for real-time detection without affecting network performance.

Chapter 3: Machine Learning Techniques for Network Anomaly Detection

3.1 Supervised Learning
Supervised learning algorithms utilize labeled training data to train a model that can classify network traffic as normal or anomalous. Popular techniques include support vector machines (SVM), random forests, and neural networks. These algorithms can achieve high accuracy but rely heavily on labeled data, which can be costly and time-consuming to obtain.

3.2 Unsupervised Learning
Unsupervised learning algorithms aim to detect anomalies without prior knowledge of normal and abnormal instances. Clustering algorithms, such as k-means and DBSCAN, can group similar instances together and identify outliers as potential anomalies.
However, they may generate false positives and struggle to differentiate between different types of anomalies.

3.3 Reinforcement Learning
Reinforcement learning algorithms learn from interactions with an environment to make informed decisions. In the context of network anomaly detection, reinforcement learning can be applied to create adaptive and evolving models that can dynamically update anomaly detection strategies to handle new and evolving network attacks.

Chapter 4: Feature Selection and Extraction

4.1 Network Data Representation
Network data can be represented in various formats, such as raw packet-level data or aggregated flow-level data. Feature selection is critical to extract relevant information from the data and reduce dimensionality. Techniques like principal component analysis (PCA), information gain, and genetic algorithms can be used to select informative features.

4.2 Feature Engineering
Feature engineering involves transforming and creating new features that better represent the underlying patterns of network traffic. Statistical measures, domain knowledge, and expert insights are leveraged to engineer features that capture the characteristics of normal and abnormal network behavior.

Chapter 5: Evaluation Metrics and Performance Analysis

5.1 Evaluation Metrics
To assess the performance of network anomaly detection systems, various evaluation metrics are employed, including accuracy, precision, recall, and F1 score. A comprehensive evaluation framework helps researchers and practitioners compare different approaches and select the most suitable one for specific network environments.

5.2 Performance Analysis
Real-world network datasets are used to evaluate the performance of machine learning algorithms for network anomaly detection.
Comparative analysis and statistical methods are employed to analyze the results, identify limitations, and propose improvements for future research.

Chapter 6: Network Anomaly Prevention
While detection is crucial, prevention plays an equally important role in network security. This chapter explores different preventive measures, including access control, firewalls, intrusion prevention systems, and network segmentation. Machine learning techniques can be integrated into these preventive measures to enhance their effectiveness and adaptability in detecting and responding to emerging threats.

Chapter 7: Conclusion
In conclusion, machine learning offers promising solutions for network anomaly detection and prevention. The combination of supervised, unsupervised, and reinforcement learning techniques, along with effective feature selection and engineering, can significantly improve the accuracy and efficiency of detecting network anomalies. However, continuous research and development are needed to address the evolving nature of network attacks and enhance the overall resilience of network security systems.


Network Anomaly Detection Against Frequent Episodes of Internet Connections

Min Qin and Kai Hwang
Internet and Grid Computing Laboratory
University of Southern California, Los Angeles, CA 90089
{mqin, kaihwang} @

Abstract: New datamining techniques are developed for generating frequent episode rules of traffic events. These episode rules are used to distinguish anomalous sequences of TCP, UDP, or ICMP connections from normal traffic episodes. Fundamental rule pruning techniques are introduced to reduce the search space by 40-70%. Our approach accelerates the entire process of machine learning and profile matching. The new detection scheme was tested over real-life Internet trace data at USC, mixed with 10 days of the MIT/LL intrusive attack data set. Our anomaly detection scheme achieves a detection rate of up to 47% for DoS, R2L, and port-scanning attacks. These results demonstrate an average improvement of 51% over the use of association rules alone. We experienced 10 to 20 false alarms over 200 network attacks using scanning windows from 100 sec to 2 hours. Our scheme detects many novel attacks that cannot be detected by Snort, including Smurf, Apache2, Guesstelnet, etc. This anomaly-based intrusion detection scheme can be used jointly with a signature-based IDS to achieve even higher detection efficiency.

Index Terms: Network security, intrusion detection, datamining, anomaly detection, connection episodes, false alarms, Internet traffic, and distributed computing

1. Introduction

In order to distinguish between intrusive and normal network traffic, new datamining algorithms are developed to generate frequent episode rules (FERs) [19] from audit Internet records. An episode is represented by a sequence of Internet connections. Mannila and Toivonen [20] first introduced the concept of frequent episodes with minimal occurrences. The association rules for datamining capture intra-record connections, while FERs detect inter-record patterns.
With huge audit records, datamining generates many long FERs with redundancy or repetitions. We remove the ineffective episode rules to accelerate the anomaly detection process.

* Manuscript submitted to RAID 2004, March 26, 2004. The research is supported by NSF/ITR Grant ACI-0325409. Corresponding author is Kai Hwang via email kaihwang@. Min Qin can be reached at mqin@.

The NSS Group [23] in the UK evaluated various commercial IDSs from security companies. Gaffney et al. [9] proposed a decision-theoretic approach to evaluate IDSs. A method for reducing the false alarm rate of an IDS was introduced by Axelsson [2]. Integrating access control and intrusion detection was introduced by Ryutov et al. [28]. Other recent studies on IDSs can be found in Burroughs et al. [6], Gopalakrishna [10], Ranum [26], and Sekar et al. [29]. We choose a datamining approach guided by the previous work reported by Barbara et al. [3] and Lee et al. [16].

In Lee's JAM project, axis and reference attributes are used to constrain the rule generation and to capture some temporal features. The system uses RIPPER [7] to build classifiers that detect attack signatures. JAM is essentially a misuse-modeled IDS. Fan et al. [8] extended Lee's work by introducing artificial anomalies to discover accurate boundaries between known attack classes and unknown anomalies. Bridges et al. [5] apply fuzzy rules to the problem of intrusion detection.

The ADAM project [3][4] offers a datamining framework for detecting network intrusions. Unlike JAM, ADAM is an anomaly-based detection system. ADAM uses a sliding window to scan frequent associations in TCP connections. These associations are compared with normal traffic profiles. ADAM has the ability to detect novel attacks through a pseudo-Bayes estimator with a low false alarm rate.

We test the new detection method against real-life Internet traffic trace data mixed with MIT Lincoln Lab intrusion datasets. This mixed NetAttack dataset was generated locally at USC.
We demonstrate the advantages of using the rule pruning techniques to reduce the search space. Our method differs from Lee's scheme and the ADAM system in using a new base-support datamining scheme. Our new FER-matching methodology is based on rule pruning and simplification to reduce the overhead of searching large rule databases.

The rest of the paper is organized as follows. In Section 2, we introduce basic techniques for mining audit data; axis and reference attributes are revisited. Section 3 presents an anomaly-based IDS architecture using a new datamining scheme; the USC NetAttack dataset is described there, and we introduce a base-support algorithm to generate useful FERs. Our new FER generation algorithm compares favorably with the level-wise algorithm developed by Lee et al. [17]. In Section 4, we present new pruning techniques to eliminate ineffective episode rules; proofs of these rules can be found in Qin and Hwang [25]. In Section 5, the experimental results are reported in terms of intrusion detection rate and false alarm rate, and we compare our results with those obtained by using Snort in the same network environment. Finally, we summarize the original contributions and make a few suggestions for further research.

2. Internet Connection Episode Rules

The tasks of datamining are described by association rules or by frequent episode rules. An association rule aims at finding interesting intra-relationships inside a single connection record. An FER describes the inter-relationships among multiple connection records in a sequence. In this sense, the FER is more powerful for characterizing traffic episodes than association rules alone. Formal definitions of these two types of rules are given below.

2.1 Frequent Episode Rules vs. Association Rules

Let T be a set of traffic connections and A be a set of attributes defined over T. For example, A consists of {timestamp, duration, service, srchost, desthost} for TCP connections.
Let I be a set of attribute-value pairs defined over A. For example, I = {timestamp = 10 sec, duration = 1 sec, service = http, srchost = 128.125.1.1, desthost = 128.125.1.10} for a typical http connection. We call each attribute-value pair an item. Any subset of I is called an itemset, representing a subset of characteristics of the connections. Thus, an itemset consists of one or more items, surrounded by a pair of parentheses, such as (timestamp = 10 sec, duration = 1 sec). An episode is represented by an ordered sequence of itemsets. Each itemset represents all traffic connections satisfying all the items listed within the parentheses.

Let X be a traffic itemset under evaluation. The support value for X, denoted Support(X), is defined as the percentage of connection records in T that satisfy X. For example, X = (timestamp = 10 sec, duration = 1 sec) is an itemset and Y = (service = http) is another itemset. In this example, the intersection is empty: X ∩ Y = ∅. The union of the two itemsets, X ∪ Y = (timestamp = 10 sec, duration = 1 sec, service = http), represents the characteristics of the three traffic attributes listed.

Association Rules: An association rule is defined between two disjoint traffic itemsets X and Y, with X ∩ Y = ∅. The rule is denoted by:

    X → Y (c, s)    (1.a)

The association rule is characterized by a support value s and a confidence level c. These are probabilities of the corresponding traffic events, defined by:

    s = Support(X ∪ Y)  and  c = Support(X ∪ Y) / Support(X)    (1.b)

Both s and c are fractions calculated directly from the above Support functions. Given below is an example association rule for an http connection:

    (service = http) → (duration = 1) (0.8, 0.1)

The rule indicates that 80% of all the http connections have a duration of less than one second.
There are 10% of all network connections that are initiated from http requests with a duration of less than one second.

Frequent Episode Rules: In general, an FER is expressed by:

    L1, L2, …, Ln → R1, …, Rm (c, s, window)    (2.a)

where Li (1 ≤ i ≤ n) and Rj (1 ≤ j ≤ m) are ordered itemsets in a traffic record set T. We call L1, L2, …, Ln the LHS (left-hand side) episode and R1, …, Rm the RHS (right-hand side) episode of the rule. Note that all itemsets are sequentially ordered; that is, L1, L2, …, Ln, R1, …, Rm must occur in the order listed. However, other itemsets could be embedded within our episode sequence. We define the support and confidence of rule (2.a) by the following two expressions:

    s = Support(L1 ∪ L2 ∪ … ∪ Ln ∪ R1 ∪ … ∪ Rm) ≥ s0    (2.b)

    c = Support(L1 ∪ L2 ∪ … ∪ Ln ∪ R1 ∪ … ∪ Rm) / Support(L1 ∪ L2 ∪ … ∪ Ln) ≥ c0    (2.c)

An example FER is given below for a sequence of network events:

    (service = authentication) → (service = smtp) (service = smtp) (0.6, 0.1, 2 sec)

This rule specifies an authentication event. If the authentication service is requested at time t, there is a confidence level of c = 60% that two smtp services will follow before the time t + w, where the event window is w = 2 sec. The support of the three traffic events (service = authentication), (service = smtp), (service = smtp) accounts for 10% of all network connections.

Here we consider the minimal occurrence, introduced by Mannila et al. [20], of the episode sequence in the entire traffic stream. The support value s is defined as the percentage of occurrences of the episode within the parentheses out of the total number of traffic records audited. The confidence level c is the probability of the minimal occurrence of the joint episodes out of the LHS episode. Both parameters are lower bounded by s0 and c0, the minimum support value and the minimum confidence level, respectively. The window size is an upper bound on the time duration of the entire episode sequence.
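The association-rule quantities of Eq. (1.b) can be computed directly over a set of connection records. A minimal sketch, with toy records shaped after the paper's attributes (the counts are hypothetical, chosen to reproduce the (0.8, 0.1)-style example above with different numbers):

```python
def support(records, itemset):
    """Support(X): fraction of connection records matching every
    attribute-value pair in the itemset X."""
    hits = sum(all(r.get(k) == v for k, v in itemset.items()) for r in records)
    return hits / len(records)

def assoc_rule(records, lhs, rhs):
    """Return (confidence, support) of the rule lhs -> rhs per Eq. (1.b):
    s = Support(lhs U rhs), c = Support(lhs U rhs) / Support(lhs)."""
    union = {**lhs, **rhs}
    s = support(records, union)
    c = s / support(records, lhs)
    return c, s

# Toy connection records using the paper's attribute names.
records = (
    [{"service": "http", "duration": 1}] * 8 +
    [{"service": "http", "duration": 5}] * 2 +
    [{"service": "smtp", "duration": 1}] * 10
)
c, s = assoc_rule(records, {"service": "http"}, {"duration": 1})
print(c, s)  # 0.8, 0.4: 80% of http connections last 1 s; 40% of all records
```

FER support and confidence (Eqs. (2.b)-(2.c)) extend this from single records to minimal occurrences of ordered itemset sequences inside a time window, which requires a sequence scan rather than the per-record match shown here.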
The traffic connections on both sides of an FER need not be disjoint in an episode sequence of events. Episode rules can be used to characterize attacks. The SYN flood attack is specified by the following episode rule:

    (service = http, flag = S0) (service = http, flag = S0) → (service = http, flag = S0)

where the event (service = http, flag = S0) is an association. Flag S0 signals only the SYN packet being seen in a particular connection. The combination of associations and FERs reveals useful information on normal and intrusive behaviors. These rules can be applied to build an IDS to defend against both known and unknown attacks.

2.2 Axis Attributes vs. Reference Attributes

For all network connections, we use the IDS tool Bro [24] to extract some key features, summarized in Table 1. Because the FER generation does not take any domain-specific knowledge into consideration, many ineffective or useless FERs are generated. How to eliminate the useless rules is a major problem in traffic datamining for effective rule generation.

For example, the association rule srcbytes = 200 → destbytes = 300 is of little interest to the intrusion detection process, since the number of bytes sent by the source (srcbytes) and destination (destbytes) is irrelevant to the threat conditions. Lee et al. [17] introduced the concepts of axis attributes and reference attributes to constrain the generation of redundant rules. All itemsets in an FER must be built only with axis attributes. Axis attributes are independent of the attacks being detected. The choice of axis attributes reduces the number of FERs generated.

According to Lee, axis attributes are selected from essential attributes such as srchost (source host), desthost (destination host), srcport (source port), and service (destination port). These attributes are essential to identify a connection. Different combinations of the essential attributes form the axis attributes.
We consider the connection flags as essential attributes, since some flags are rare in daily network traffic. However, the flag has to be combined with at least one other essential attribute to form a bundle of axis attributes. In order to detect various attacks, different combinations of the axis attributes should be tried. The reference attributes demand that itemsets have exactly the same reference value.

Table 1. Key Features Extracted from Internet Connections

    Feature Name  Description
    Timestamp     Time when the first packet of the connection is seen
    Duration      Length of the connection in seconds; ignored for UDP packets
    Srchost       IP address of the source host
    Srcport       Port number of the source host
    Srcbyte       Number of bytes sent by the source host
    Destbyte      Number of bytes sent by the destination host
    Desthost      IP address of the destination host
    Destport      Port number of the destination host
    Flag          Connection status flag. Typical flag values:
                  SF: both SYN and FIN packets are known for a connection
                  S0: only the SYN packet was seen in a TCP connection
                  REJ: the connection was rejected by the destination
    Urgent        Number of urgent flags in the connection
    Frag_Error    Number of fragment errors in the connection

3. Datamining for Anomaly Intrusion Detection

Our long-term goal is to build an intelligent IDS that helps secure any distributed computing infrastructure such as a computational Grid system. The system can detect not only known intrusion patterns but also novel, unknown intrusions. To achieve this objective, we use datamining to profile frequent network patterns for detecting the anomalies.

3.1 The Network Datamining Architecture

Figure 1 shows the three major components of our IDS: the episode rule mining engine, the anomaly detection engine, and the alarm generation engine. We apply the normal profile database to construct the anomaly detection engine. Alarm generation is beyond the scope of this report.
In order to correctly detect intrusion patterns, we extract two levels of information from the raw audit data of the network traffic. Although connection-level information is used to fight flood and scan attacks, it can detect only a small portion of the attacks.

Figure 1. Our datamining architecture for anomaly-based intrusion detection

We have generated a NetAttack dataset to evaluate our Internet datamining scheme for anomaly intrusion detection. This benchmark is obtained by mixing locally captured Internet trace files at USC [12] with the DARPA 1999 datasets generated at MIT Lincoln Lab [15][16]. McHugh [21] criticized the DARPA data for its superficial background traffic and the unreliable accuracy yielded by tuning an IDS toward the target attacks.

According to a recent analysis by Mahoney and Chan [19], the attack-free training data of the DARPA evaluation lacks attributes covering TCP SYN regularity, source address spectrum, checksums, packet header information, etc. However, these attributes do exist in the attack dataset. Mahoney and Chan developed the tools needed to mix real Internet traffic data with the MIT/LL dataset.

The USC trace file is obtained from an ISP through the Los Nettos network in Southern California. It contains 4 gigabytes of real traffic data, including both attacks and regular background traffic. The typical daytime load of Los Nettos was 38K packets/sec, with a packet drop rate around 0.04%. The USC trace file does not contain packet payloads. We stretched the 40-minute trace file to inter-mix with the MIT/LL traffic files collected in weeks 1, 3, 4 and 5. The stretched data has a connections-per-second rate similar to the MIT/LL data. The USC trace data simply adds real-life background traffic to the MIT/LL data, forming the NetAttack dataset.

3.2 A Base-Support Network Datamining Scheme

Most mining techniques exclude infrequent traffic patterns. This makes the IDS ineffective in detecting rare network events.
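The base-support idea that addresses this, formalized in Algorithm 1 and Eq. (3) below, can be sketched in a few lines: an itemset is kept when its support, normalized by the support of its embedded axis itemset, exceeds a threshold f0, so rare services are judged relative to their own frequency rather than to the whole traffic. The records and threshold here are hypothetical.

```python
def support(records, itemset):
    """Fraction of records matching every attribute-value pair in itemset."""
    hits = sum(all(r.get(k) == v for k, v in itemset.items()) for r in records)
    return hits / len(records)

def base_support_keep(records, itemset, axis_keys, f0):
    """Keep itemset X only if Support(X) / S(X) >= f0, where S(X) is the
    support of X's axis-attribute projection (Eq. (3) below)."""
    axis_itemset = {k: v for k, v in itemset.items() if k in axis_keys}
    return support(records, itemset) / support(records, axis_itemset) >= f0

# Toy records: auth traffic is rare overall (10%), but this auth pattern
# covers all of the auth traffic, so it should survive the filter.
records = [{"service": "http", "flag": "SF"}] * 90 + \
          [{"service": "auth", "flag": "SF"}] * 10
x = {"service": "auth", "flag": "SF"}
# Absolute support is only 0.1, yet relative to its axis itemset it is 1.0.
print(base_support_keep(records, x, axis_keys={"service"}, f0=0.5))  # True
```

A fixed global minimum support would have discarded this pattern outright, which is precisely the weakness of plain level-wise mining that the base-support scheme is designed to avoid.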
For example, authentication is infrequent in common traffic. If we lower the support threshold, then a large number of uninteresting patterns associated with frequent services will be discovered. We introduce a new base-support traffic datamining process to handle this problem. The process is specified in Algorithm 1. Our method improves on the level-wise algorithm developed by Lee et al. [17].

Algorithm 1: Base-Support Traffic Datamining
Input:  Base-support threshold f0, all axis attributes, and
        the set T of all network connections
Output: New frequent episode rules generated to add into the existing rule set L
Begin
    For each axis itemset X in T, calculate Support(X) using Eq. (1.b)
    Scan the traffic set T to form the rule set L = { itemset Y | f(Y) ≥ f0 }
    While (there are new episode rules generated) do
        Generate a new episode I1, I2, …, In for L that satisfies
            Support(I1, I2, …, In) ≥ f0 × Min{ S(Ii) }
        Generate an FER from the episode I1, I2, …, In with
            confidence c ≥ c0 as in Eq. (2.c), and add the FER into rule set L
    EndWhile
End

In using Lee's algorithm, one must iteratively lower the minimum support value. Initially, a high minimum support value is chosen to find the episodes related to high-frequency axis attribute values. The procedure then iteratively lowers the support threshold by half. This links each new candidate FER with at least one "new" axis value. The procedure terminates when a very small threshold is reached.

Let X be an itemset. The support base of X is denoted by S(X), which is the support value of the axis itemset that is a subset of X. For example, when choosing service and flag as axis attributes, the support base for the itemset

    X = (service = ftp, flag = S0, srchost = 128.1.1.1, destination = 121.1.1.1)

is defined by S(X) = Support(service = ftp, flag = S0).
The base-support fraction f for itemset X is defined by:

    f(X) = Support(X) / S(X)    (3)

Similarly, the base-support fraction f of an episode is defined as the ratio of the number of minimal episode occurrences to the total number of records in T that contain the most uncommon axis attributes embedded in this episode. The minimum support base value of an episode I1, I2, …, In is denoted by Min{ S(Ii) }. We use a simplified approach in Algorithm 1 to generate FERs over a few days. To generate an episode, its base-support fraction must exceed a threshold f0, which is a lower bound on the support base f(X) for all qualified itemsets.

To construct the normal network profiles, the attack-free training connection records are fed into the datamining engine. After finding FERs from each day's audit record, we simply merge them into a large rule set by removing the redundant rules. Using the level-wise algorithm, it is likely for a common service to appear in an episode rule with an extremely low support value. This is especially true when the individual connections are independent of one another. However, our base-support algorithm solves this problem by requiring all FERs related to the common services to occur more often than others. For normal traffic profiling, we use training datasets that are attack free.

We tested the performance of the two traffic datamining algorithms over the USC Internet trace file. Figure 2 shows the testing results on all TCP connections in two weeks of attack-free traffic data. Both algorithms apply a minimum confidence value of 0.6 and a window size of 3 sec. We choose the destination host as the reference attribute and the current service as the axis attribute. Fewer rules are generated by our base-support algorithm than by Lee's level-wise algorithm. Most of the rules discarded by our scheme are those related to common services that have appeared with fairly low support.

[Figure 2. Rule generation effects by traffic datamining on each of the 9 days of TCP connections in the USC Internet trace file. Axes: number of rules generated (100-600) vs. day of training in the USC trace data.]

Our base-support datamining algorithm is fair to various axis attribute values. This is due to the fact that the same percentage of records is required for different axis attributes. A higher base-support value results in fewer FERs. With different axis attribute values, we offer the advantage of being sensitive to the base-support threshold f0. This makes it easier to control the generation rate of useful episode rules. Fewer rules imply faster database search and matching in the traffic datamining process. We avoided using frequent episode rules with extremely low support values.

In using the level-wise algorithm, changing the initial support value has limited impact only at the first few iterations. Since the initial support value is halved at each iteration, it has little impact on the rule generation rate after several iterations, and it is hard for users to control the generation of rules using infrequent axis attributes. The base-support algorithm enables us to control the number of rules generated.

4. Pruning of Ineffective Episode Rules

Because of the large number of records in a TCPdump, many uninteresting or ineffective rules are generated for slightly different episodes. We need to reduce the number of rules generated and to provide a simplified view of the traffic profiles. We have developed in Qin and Hwang [24] three pruning techniques to reduce the episode rule space.
Here, we use a few example FERs to demonstrate how to apply the rule pruning techniques to speed up the detection process.

4.1 Transformation of Frequent Episode Rules

Without reduction, the rule search space may grow too large to be practical for even one day's collection of traffic records. We consider an FER effective if it is more applicable and more frequently used in the anomaly detection process. An episode rule is said to be ineffective if it is rarely used in detecting anomalies in a network traffic trace. Some FERs differ from each other only at the LHS (left-hand side) or at the RHS (right-hand side) of the rule. Keeping all those rules will increase the search space and thus lower the detection efficiency. The following examples show the advantages of transforming FERs to shorten the rule search and matching process.

A. Transposition of Episode Rules

Comparing the following two FERs, the first one is more effective than the second. The itemset (service = http, flag = S0) is implied by the itemset (service = smtp, flag = SF) in the first rule. Therefore, the second rule can be induced from the first rule, and we need only include the first rule in the normal rule set.

(service = smtp, flag = SF) → (service = http, flag = S0), (service = http, flag = SF)

(service = smtp, flag = SF), (service = http, flag = S0) → (service = http, flag = SF)

During the detection phase, we generate only one FER from a frequent episode. A large number of redundant rule comparisons can be avoided if more complex rules can be reduced or removed. The general rule of thumb is to make the LHS as short as possible. An itemset that is implied by an earlier itemset on the LHS should be moved to the RHS. This is called the transposition property.

B. Elimination of Episode Rules

In general, rules with a shorter LHS are more effective than rules with a longer LHS, because shorter rules are often easier to apply or to compare.
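The elimination idea, that a rule with a longer LHS is redundant whenever a shorter rule with the same RHS already exists, can be sketched as follows. The `lhs`/`rhs` dictionary representation is a hypothetical encoding for illustration, not the paper's rule format:

```python
def eliminate_rules(rules):
    """Drop any rule whose LHS strictly contains the LHS of another
    rule with the same RHS; the shorter rule is more effective."""
    kept = []
    for r in rules:
        redundant = any(
            other is not r
            and other["rhs"] == r["rhs"]
            and set(other["lhs"]) < set(r["lhs"])  # strict subset
            for other in rules
        )
        if not redundant:
            kept.append(r)
    return kept

# Elimination example from the text: the longer rule is dropped because
# (service = authentication) -> (service = smtp) already covers it.
rules = [
    {"lhs": (("service", "http"), ("service", "authentication")),
     "rhs": (("service", "smtp"),)},
    {"lhs": (("service", "authentication"),),
     "rhs": (("service", "smtp"),)},
]
print(eliminate_rules(rules))
```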
Clustering of shorter rules is desired, and reducing long rules to shorter ones is useful to enhance the performance of an IDS. For example, the following rule:

(service = http), (service = authentication) → (service = smtp) (0.6, 0.1)

is ineffective because of the existence of the following rule:

(service = authentication) → (service = smtp) (0.65, 0.1)

The authentication service is related only to the smtp operation; the http service does not affect the other two itemsets. Therefore, the itemset (service = http) can be ignored. Because many ineffective rules are not in the normal traffic profile, removing them can reduce the false alarms to some extent. For an intrusion FER with many associations, we can identify the key associations inside the FER. The elimination law is applicable here.

C. Reconstruction of Episode Rules

Many FERs detected from the network traffic have transitive patterns. If we have two rules A → B and B → C in the rule set, the rule A → B, C is implied. Here A, B, and C are associations. The rule A → B, C is redundant, since we can reconstruct it from the two shorter rules. We are mainly interested in daily network traffic; in particular, we pay attention to the TCPdump.

For example, the rule (service = ftp, srcbyte = 1000) → (service = smtp), (service = authentication) is ineffective, because it can be reconstructed from the following two rules: (service = ftp, srcbyte = 1000) → (service = smtp) and (service = smtp) → (service = authentication). To reconstruct the first rule, we follow the transitive sequence of itemsets:

(service = ftp, srcbyte = 1000), (service = smtp), (service = auth).

Rule reconstruction helps us split long FERs into short ones, thus facilitating the detection process. This reconstruction law is particularly powerful when the window size is large.
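The transitive check behind the reconstruction law can be sketched directly. The dictionary encoding of rules below is a hypothetical simplification (itemsets are collapsed to strings) used only to show the A → B, B → C ⇒ A → B, C redundancy test:

```python
def is_reconstructible(rule, rule_set):
    """Check whether a rule A -> B, C can be rebuilt from shorter
    rules A -> B and B -> C already present in the rule set."""
    if len(rule["rhs"]) != 2:
        return False
    a, (b, c) = rule["lhs"], rule["rhs"]
    return ({"lhs": a, "rhs": (b,)} in rule_set
            and {"lhs": (b,), "rhs": (c,)} in rule_set)

# The paper's reconstruction example, in simplified notation.
ftp = ("service=ftp", "srcbyte=1000")
short_rules = [
    {"lhs": ftp, "rhs": ("service=smtp",)},
    {"lhs": ("service=smtp",), "rhs": ("service=authentication",)},
]
long_rule = {"lhs": ftp, "rhs": ("service=smtp", "service=authentication")}
print(is_reconstructible(long_rule, short_rules))
```

A rule flagged by this test can be dropped, since matching the two short rules in sequence detects the same episode.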
For smaller window sizes, the occurrence of the episode L1, R1, R2 may often have a duration longer than the window size, which violates our "approximated" assumption.

Rule pruning may affect the false alarm rate (false positives). During the detection phase, many ineffective rules will be generated from real-life traffic. If the normal profile does not contain these ineffective rules, the system may raise false alarms. However, rule pruning eliminates such ineffective rules automatically; thus the pruning scheme results in fewer false alarms.

4.2 Episode Rule Pruning Algorithm and Results

We generated both association and episode rules depending on the connection patterns encountered. Figure 3 plots the rule set growth against the window size under three different network traffic conditions, using the MIT/LL intrusion dataset and the mixed NetAttack dataset.

Figure 3. Effects of pruning on episode rules generated from the USC Internet trace, the MIT/LL IDS evaluation dataset, and the USC NetAttack mixture of both traffic datasets: (a) episode rule growth for the MIT/LL dataset and the USC NetAttack dataset; (b) episode rule pruning rates in testing the MIT/LL vs. USC NetAttack datasets

In general, the results show that 40-70% of the episode rules can be pruned by applying the pruning techniques introduced. Figure 3(a) shows the growth of the FER space over two traffic datasets: MIT/LL vs. USC NetAttack. The pruning effect is more pronounced in testing over the NetAttack dataset, due to the randomness of the background traffic encountered. Figure 3(b) shows the pruning rate plotted for the two traffic datasets.
After a window size of 20 seconds, both datasets result in a pruning rate between 40% and 70%.

To apply the association and frequent episode rules in real-time intrusion analysis, we use a scan window (sliding window) to collect the connection data. The scan window continuously moves forward by a fixed amount of time, called the step size. For each window, we apply our datamining framework to generate the FERs and the association rules with high support. This approach is similar to the association rules used by Barbara et al. [3].

To apply an FER within a scan window, we calculate the minimal number of occurrences of this rule as an additional feature to characterize the traffic. For this purpose, we amend the format of an FER as follows:

X → Y, (c, s, m)    (4.a)

where c, s, and m are the confidence, support, and minimal occurrence of this FER, respectively. During the training phase, the maximum occurrence number is calculated over multiple attack-free scan windows. We denote this maximum value as M. An FER is anomalous if its minimal occurrence number exceeds the maximum number, as formally specified below:

m ≥ γ · M    (4.b)

where γ ≥ 1 is a relaxation factor. For large γ, we accept more FER occurrences within a scan window. We have experimented with different γ values and their effect on the number of false alarms triggered by the TCP connections extracted from the NetAttack dataset. Here the scan window was set from 100 sec to 1000 sec, and the episode window was set between 2 sec and 10 sec. The result shown in Fig. 4 indicates an initial sharp drop in false alarms. Increasing the relaxation factor γ beyond a certain limit (about 1.2 in Fig. 4) does not reduce the number of false alarms further.

The process of datamining for anomaly-based intrusion detection is illustrated in Algorithm 2.
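The sliding-window test of Eq. (4.b) can be sketched as below. This is a minimal illustration, not the paper's Algorithm 2; `count_occurrences` stands in for a hypothetical routine that counts minimal FER occurrences among the connections in a window:

```python
def is_anomalous(m, max_occurrence, gamma=1.2):
    """Eq. (4.b): flag an FER when its minimal occurrence count m in the
    current scan window reaches gamma * M, where M is the maximum
    occurrence observed over attack-free training scan windows."""
    return m >= gamma * max_occurrence

def scan(connection_times, window, step, count_occurrences, M, gamma=1.2):
    """Slide a scan window of the given size forward by `step` seconds
    and report start times of windows whose FER count is anomalous."""
    alarms = []
    t, end = min(connection_times), max(connection_times)
    while t <= end:
        in_window = [ts for ts in connection_times if t <= ts < t + window]
        if is_anomalous(count_occurrences(in_window), M, gamma):
            alarms.append(t)
        t += step
    return alarms
```

Raising `gamma` relaxes the test, which mirrors the observed trade-off: false alarms drop sharply at first, then level off once γ passes roughly 1.2.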
To evaluate pure anomaly detection using our FERs, we use the audit data sets collected from the first and the third weeks of the NetAttack dataset. Only TCPdump data was applied. We use the last two weeks of NetAttack data to generate the FERs and to compare with the rules generated from the normal traffic profiles.
