Probability Based Optimal Algorithms for Multi-sensor Multi-target Detection
Embedded systems paper (English)

J Sign Process Syst, DOI 10.1007/s11265-011-0650-6

Instruction Cache Locking for Embedded Systems using Probability Profile

Tiantian Liu · Minming Li · Chun Jason Xue
Received: 27 August 2010 / Revised: 31 August 2011 / Accepted: 21 November 2011
© Springer Science+Business Media, LLC 2011
T. Liu · M. Li · C. J. Xue, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong. e-mail: tiantianster@

Abstract Cache is effective in bridging the gap between processor and memory speed. It is also a source of unpredictability because of its dynamic and adaptive behavior. Many modern processors provide a cache locking capability, which locks instructions or data of a program into the cache so that a more precise estimation of execution time can be obtained. The selection of instructions or data to be locked in the cache has a dramatic influence on system performance. For real-time systems, cache locking is mostly utilized to improve the Worst-Case Execution Time (WCET). However, Average-Case Execution Time (ACET) is also an important criterion for some embedded systems, especially soft real-time embedded systems such as image processing systems. This paper aims to utilize the instruction cache (I-Cache) locking technique to guarantee a minimized, estimable ACET for embedded systems by exploring probability profile information. A Probability Execution Flow Tree (PEFT) is introduced to model an embedded application with runtime profile information. The static I-Cache locking problem is proved to be NP-Hard, and two kinds of locking, fully locking and partially locking, are proposed to find the instructions to be locked. Dynamic I-Cache locking can further improve the ACET. For dynamic I-Cache locking, an algorithm that leverages the application's branching information is proposed. All the algorithms are executed at compilation time and the results are applied at runtime. Experimental results show that the proposed algorithms reduce the ACET of embedded applications further compared to state-of-the-art techniques.

Keywords Cache locking · Probability profile · ACET

1 Introduction

Caches are known for their effectiveness in bridging the gap between processor and memory speed, but notorious for their unpredictability. With the utilization of a cache, the Average-Case Execution Time (ACET) of an application improves dramatically. However, the unpredictable dynamic behavior of the cache also makes the estimation of an application's ACET difficult and imprecise. ACET is an important metric for the design of some embedded systems [1, 2], especially soft real-time embedded systems such as image processing systems. With the use of a cache, the ACET of an application is most likely to be overestimated, which in turn leads to unnecessarily increased hardware cost of embedded systems.

To remedy the unpredictability of caches, the cache locking technique is provided by a wide selection of modern processors, such as the ARM9 series [3], MIPS32 series [4] and MCF5249 [5]. Cache locking selects and locks specific program content or data in a cache. For an application-specific embedded system, cache locking can effectively guarantee the precision of some cache hit/miss behaviors, so a tighter ACET bound can be obtained. Embedded systems are mostly application-specific, which means that the application to be executed on a specific system is known beforehand.
This characteristic enables researchers to utilize the application's properties to make informed decisions before execution. Therefore, in this paper, we utilize the instruction cache (I-Cache) locking technique to find an estimable, minimized ACET for an embedded system based on the probability profiling information of the specific application running on the system.

In this paper, a Probability Execution Flow Tree (PEFT) is introduced to model an embedded application's program with its probability profile information and application-specific information. Two schemes of cache locking are considered: static and dynamic. In the static locking scheme, cache contents are loaded at application start-up and remain unchanged until the end. In the dynamic locking scheme, locked cache contents can be changed at specific reloading points based on runtime information. The cache can be fully locked or partially locked. The I-Cache locking problem in this paper aims to analyze the application at compilation time, and select a set of nodes to be locked in the I-Cache statically, or sets of nodes to be reloaded and locked in the I-Cache dynamically. The goal is to optimize the ACET of an embedded system. The contributions of this paper are as follows:

1. Propose I-Cache locking techniques to minimize the average execution time of embedded applications by exploring applications' statistical profile information and application-specific foreknowing information.
2. Prove that the static I-Cache locking problem for ACET reduction is an NP-Hard problem, and propose a fully locking algorithm and a partially locking algorithm.
3. Propose an off-line algorithm for dynamic I-Cache locking by exploring runtime branching information. The outputs of the algorithm are used during runtime.

The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 analyzes the cache architecture and presents the PEFT model of an application. In Section 4, the static I-Cache locking problem is formulated as an integer linear program and proved to be NP-Hard; fully locking and partially locking algorithms are proposed respectively. For the dynamic I-Cache locking problem, an off-line algorithm using the static locking results and branching information to obtain the dynamic locking decisions is proposed in Section 5. The cache conflict problem caused by cache locking is discussed in Section 6. Section 7 shows the experimental results compared with previous work. Finally, concluding remarks are presented in Section 8.

2 Related Work

Several previous works have addressed the cache locking problem in embedded systems. The works most related to this paper are [6, 7]. Both target eliminating the conflict miss rate within a cache set to reduce the ACET. Anand et al. [6] devise a cost-benefit model to discover the memory addresses to be locked in the I-Cache. Their experiments confirm that cache locking is beneficial in improving average-case performance. However, their cost/benefit formulation contains some profile information which is hard to obtain or not accurate. Additionally, they focus on finding the beneficial blocks which are mapped to the same cache set. This leads to an imbalance between different sets, because some sets may contain more valuable blocks while others may not. Liang et al. [7] introduce a temporal reuse profile to model the cost and benefit of locking memory blocks in the cache. They propose an optimal algorithm and a heuristic approach that use the temporal reuse profile to determine the most beneficial memory blocks. However, each cache set is also analyzed individually in their work. When implementing their methods, both works use the trampolines approach [8] to introduce the locking instruction into the binary code so that the mapping addresses of the blocks will not be changed.

Most other researchers utilize I-Cache locking in real-time applications to guarantee a tighter estimation of Worst-Case Execution Time (WCET). Puaut et al. propose heuristic methods for I-Cache locking to minimize WCET and Worst-Case Utilization (WCU) [9, 10]. Campoy et al. use genetic algorithms for both static locking [11] and dynamic locking [12] in multitask, preemptive real-time systems. Falk et al. [13] take the changing of the worst-case execution path into consideration and adopt a greedy strategy to choose instructions into the cache. Liu et al. [14] study the static I-Cache locking problem to minimize WCET for real-time embedded systems. The problem is proved to be NP-Hard and optimal algorithms are proposed for subsets of the general problem with special properties and patterns.

Scratchpad memory is an alternative to cache. The allocation of code/data to the scratchpad memory is under software control. Significant effort has been invested in developing efficient allocation techniques for scratchpad memories. [15, 16] aim at reducing the ACET of programs through memory access profiles. Puaut et al. [17] propose an algorithm for off-line content selection of on-chip memory, supporting both locked cache and scratchpad memory. They find that the performance of applications using the two types of memory is very close in most cases.

Little previous work has explored the statistical information and the foreknowing information of embedded applications for the I-Cache locking problem. Liang et al. [18] utilize the probability information of an application for cache configuration design, which is orthogonal to this paper's work. In [19], an approach for early branch resolution and subsequent folding is presented. The application-specific information is captured by the micro-architecture through low-cost reprogrammable hardware, thus attaining the twin benefits of processor standardization and application-specific customization. Several works have used foreknowing information to provide scheduling methods that improve timing performance for embedded systems [20–22].

Although there have been a number of previous efforts on the cache locking problem, most of them focus on reducing the WCET [9–14]. The work most related to this paper primarily targets eliminating the conflict miss rate within one cache set to improve the ACET [6, 7]. The unbalanced and random distribution of the beneficial blocks across different sets may weaken their methods. In this paper, we consider the problem from a different angle. We first find the most efficient blocks within the whole set of blocks to minimize ACET, then we use compilation techniques, such as padding and code positioning [26], to avoid conflicts among these selected blocks.
As concluded in [17], locked cache and scratchpad memory perform very closely to each other in most cases, so the algorithms proposed in this paper can also be applied to scratchpad memory allocation.

3 Cache Architecture and Task Model

This section introduces the notations used in this paper concerning the cache architecture and the task model.

3.1 Cache Architecture

The cache locking technique is supported by several commercial processors [3, 4], with different implementation methods. Some processors, for example the Intel XScale [23] and MPC603E [5], allow developers to lock the entire cache, while others, for example the RC64574 [24], allow developers to lock only part of the cache. Some processors [4, 23] insert specific cache locking operations into the application's code to perform locking, while others [5, 24] use specific lock/freeze bits in their cache control registers to lock each single cache line. In this paper, we assume the processor is equipped with an I-Cache with a total size of S. The proposed work applies to a general architecture based on the above processors, resulting in a cache architecture with the following characteristics:

1) I-Cache locking can be applied to each line of the I-Cache, which implies that the I-Cache can be totally locked or partially locked. This capability is provided by several commercial processors [5, 24].
2) The I-Cache can be loaded using a cache-filling instruction, which is provided by many processors [4, 23]. During system start-up, a small routine is executed to pre-load the cache using the cache-filling instruction. After pre-loading the blocks, the cache is locked. Under the static locking scheme, the locked cache content never changes. Under the dynamic locking scheme, the cache content can be changed at specific reloading points by invoking these cache-filling instructions.
3) The I-Cache can be either direct-mapped or set-associative. The mapping from memory space to the I-Cache, as well as possible cache conflicts within the locking selection, are resolved by existing compilation techniques, such as procedure re-ordering [25], padding and code positioning [26], as discussed in Section 6.
4) If the processor addresses an instruction that is locked in the I-Cache, this instruction is served from the I-Cache, resulting in fast access time (hit). If the processor addresses an instruction that is not locked in the I-Cache, this instruction is served from main memory, resulting in longer access time (miss).
5) This paper focuses on I-Cache locking. The data cache is assumed to work in a normal fashion.

3.2 PEFT

In this paper, a Probability Execution Flow Tree (PEFT) is used to model an embedded application. A PEFT embodies the control flow of the application's code and the profiling information of the application, so that we can analyze it to find which parts of the code should be selected into the I-Cache.

Definition 1 A PEFT = (V, E, B) is a weighted tree, where V represents the set of nodes and E represents the set of edges. B is the set of basic blocks in a program. Each b ∈ B is a context-specific code block associated with three attributes: block_miss(b) is the single processing time when basic block b is not in the cache, block_hit(b) is the single processing time when basic block b is in the cache, and block_s(b) is the size of basic block b.
Node v ∈ V represents the real execution of a code block b ∈ B under a certain context and therefore has two attributes: name(v) = b, where b ∈ B, representing that basic block b is executed in this node, and count(v), representing the average number of times b is executed in the current context. Edge e_vu ∈ E denotes a program control flow from node v to node u. Each edge has one attribute: edge_prob(e_vu), which represents the execution probability of this flow. For every node v, Σ_{u | e_vu ∈ E} edge_prob(e_vu) = 1.

To generate a PEFT, an algorithm PEFT_CON is used, as shown in Algorithm 1. An application is first run in a profiling tool, and the probabilities of edges are obtained and recorded in a probability matrix P[v][u]. Then, algorithm EFT_CON in [14] is used to construct an Execution Flow Tree (EFT) [14] (line 1). Finally, we attach the statistical probability to each edge (lines 2–4).

A PEFT example is shown in Fig. 1. Figure 1(a) is a segment of the benchmark "Audio beam former" [28] and Fig. 1(b) is the corresponding PEFT. The code segment of Fig. 1(a) is:

    if (!data_file) {
        print_usage();
        exit(1);
    }
    if (search_far_field == 1) {
        max_energy = search_far_field_angles(max_result, data_file, output_file, hamming);
    } else if (hill_climb == 1) {
        search_grid(source_location, data_file, output_file, hamming);
        max_result = (float*) malloc(ANGLE_ENERGY_WINDOW_SIZE*sizeof(float));
    } else {
        calc_single_pos(source_location, mic_locations, hamming, data_file, output_file);
    }
    exit(0);

[Figure 1(b), omitted here, shows the corresponding PEFT: nodes 1–14 such as "if (!data_file)", "print_usage()", "exit()", "if (search_far_field == 1)", "search_far_field_angles()", "if (hill_climb == 1)", "search_grid()", "calc_single_pos()" and the duplicated "exit()" nodes, with branch probabilities 95%/5%, 31%/69% and 73%/27% on the edges.]

Figure 1 A segment of a benchmark and its PEFT.

Some important features of the PEFT are as follows:

1) The framework of PEFT is similar to the framework of the CFG (Control Flow Graph) used in previous research [10]. In Algorithm EFT_CON [14], each code line is scanned and different control flows are processed accordingly. Sequential code is the simplest case and is treated as one basic block. For branches, loops and routine calls, we process their bodies recursively and attach the obtained EFT_sub to the main EFT. The difference between a PEFT and a CFG is that a PEFT is explicitly defined as a tree with probability information and other attributes related to cache behavior. A basic block can be one or more statements in the program depending on the context. For example, statement "exit(1)" in this example forms node 3 in Fig. 1(b). For simplicity, some of the call procedures of the PEFT in Fig. 1(b) are not presented recursively. For example, node 7 is an abstract presentation of routine "search_far_field_angles()". Algorithm EFT_CON does recursively process the subroutines.
2) In practical systems, the value of block_miss(b) or block_hit(b) of a basic block b is not an accurate value if we consider timing anomalies, cache and pipeline effects; it can be a range of values. In this paper, we use the average-case value of block_miss(b) or block_hit(b) to form a model for solving the locking problem and comparing with previous works. We run each benchmark several times using SimpleScalar [29] and obtain the profiling information.
From the cache miss/hit information, we obtain the estimated value of block_miss(b) or block_hit(b).
3) Node v has three additional attributes: node_miss(v), node_hit(v) and node_prob(v). node_miss(v) or node_hit(v) is the real execution time of node v, calculated as node_miss(v) = block_miss(name(v)) × count(v) or node_hit(v) = block_hit(name(v)) × count(v), depending on whether name(v) is put in the cache. node_prob(v) is the execution probability of node v, calculated as node_prob(v) = node_prob(u) × edge_prob(e_uv), where u is the parent of v. It is easy to deduce that node_prob(v) = node_prob(v0) × edge_prob(e_{v0,u1}) × edge_prob(e_{u1,u2}) × ··· × edge_prob(e_uv) for node v along the path from the root v0 to node v, where node_prob(v0) = 1.
4) Algorithm EFT_CON gives the main flow of a loop. For a node v in a loop, the execution time of its basic block can differ between its first execution and each successive repetition because of cache reuse [30]. The value of node_miss(v) is then calculated as node_miss(v) = block_miss(name(v)) + block_hit(name(v)) × (count(v) − 1), which is its execution time under the uncontrolled cache. node_hit(v) is still node_hit(v) = block_hit(name(v)) × count(v).
5) There is a procedure Duplicate() in Algorithm EFT_CON. If a node v has an indegree(v) of at least 2, Duplicate() instantiates the structure starting from v indegree(v) times, which ensures that the output is a tree. For example, nodes 12, 13 and 14 in Fig. 1(b) are duplicated nodes introduced by the procedure Duplicate(). Each duplicated substructure represents an invocation of the associated basic blocks (in this example, code line "exit(0)"), so they have the same name and count values, and thus the same node_miss(v) and node_hit(v). With these duplicated nodes, the PEFT structure is still equivalent to the EFG [13] or CFG [10] structure. From the definition and the EFT_CON algorithm, we know that every path in an EFG or CFG is enumerated in a PEFT, while every path in a PEFT corresponds to one possible path in an EFG or CFG.
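Feature 3) defines node_prob recursively along root-to-leaf paths. As a small illustration, the following sketch computes node_prob for a toy tree shaped like Fig. 1(b); the dict representation and the exact placement of the branch probabilities are our assumptions for illustration, not the paper's data structures.

    # Minimal sketch of the node_prob recursion from feature 3), assuming a
    # dict-based PEFT where each node maps to (child, edge_prob) pairs.
    def node_probs(tree, root):
        """Compute node_prob(v) = node_prob(parent) * edge_prob(e_parent,v)."""
        prob = {root: 1.0}              # node_prob(v0) = 1
        stack = [root]
        while stack:
            v = stack.pop()
            for child, edge_prob in tree.get(v, []):
                prob[child] = prob[v] * edge_prob
                stack.append(child)
        return prob

    # Toy PEFT mirroring the branch structure of Fig. 1(b); the probability
    # assignments per edge are illustrative.
    tree = {
        "if (!data_file)": [("print_usage()", 0.05),
                            ("if (search_far_field==1)", 0.95)],
        "if (search_far_field==1)": [("search_far_field_angles()", 0.31),
                                     ("if (hill_climb==1)", 0.69)],
        "if (hill_climb==1)": [("search_grid()", 0.73),
                               ("calc_single_pos()", 0.27)],
    }
    print(node_probs(tree, "if (!data_file)"))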
3.3 ACET of a PEFT

The ACET is the expected length of a root-leaf path in a PEFT. Let P_i = (p_{i0}, p_{i1}, ..., p_{i,l_i}) be a root-leaf path, where p_{i0}, ..., p_{i,l_i} ∈ V and l_i is the number of edges on path P_i. Each path P_i has two attributes, length(P_i) and probability(P_i). length(P_i) is defined as the summation of the weights of the nodes on P_i, which represents the execution time of this path. Let

W_real(v) = (1 − δ(name(v))) × node_miss(v) + δ(name(v)) × node_hit(v),

where δ(name(v)) = 1 if name(v) is put in the cache and δ(name(v)) = 0 otherwise. Then

length(P_i) = Σ_{j=0}^{l_i} W_real(p_{ij}).

The other attribute, probability(P_i), represents the execution probability of this path and is calculated as

probability(P_i) = Π_{j=0}^{l_i − 1} edge_prob(e_{p_{ij} p_{i,j+1}}).

Table 1 Notations used in this paper.

Notation — Description
b — A basic block
block_s(b) — Size of basic block b
block_miss(b) — Execution time of basic block b when b is not in cache
block_hit(b) — Execution time of basic block b when b is in cache
block_prob(b) — Execution weight of basic block b
each_saving(b) — ACET saving of basic block b
v — A node
name(v) — Basic block in node v
count(v) — Execution count of basic block in node v
node_prob(v) — Execution probability of node v
node_miss(v) — Execution time of node v when name(v) is not in cache
node_hit(v) — Execution time of node v when name(v) is in cache
W_real(v) — Real execution time of node v, equal to node_miss(v) or node_hit(v)
e_vu — An edge
edge_prob(e_vu) — Execution probability of edge e_vu
P_i — A root-leaf path
length(P_i) — Execution time of path P_i
probability(P_i) — Execution probability of path P_i

Denote the total number of root-leaf paths as |P|; then the ACET of a PEFT is calculated as:

Σ_{i=1}^{|P|} length(P_i) × probability(P_i)    (1)

The notations are summarized in Table 1.

4 Static I-Cache Locking

As discussed in Section 3.1, we first want to find the most efficient locking selection of memory blocks to minimize the ACET of the application. In this section, we discuss the static locking scheme, where cache contents are loaded at application start-up and remain unchanged until the end. We further consider two different locking strategies, fully locking and partially locking, depending on whether or not the whole I-Cache is locked.

4.1 Fully Locking

Fully locking means that the whole I-Cache is used as locked cache.

4.1.1 Problem Formulation

The ACET minimization problem using static I-Cache locking can be defined as follows. Given an I-Cache of size S and a PEFT representing a given program, the aim is to put a subset of basic blocks into the I-Cache so that the total size of the chosen basic blocks does not exceed S and the ACET of the PEFT is minimized. With Formula (1) discussed in Section 3.3, we formulate the fully static I-Cache locking problem as an integer linear programming (ILP) instance:

min Σ_{i=1}^{|P|} length(P_i) × probability(P_i)

s.t.
  length(P_i) = Σ_{j=0}^{l_i} [(1 − δ(name(p_{ij}))) × node_miss(p_{ij}) + δ(name(p_{ij})) × node_hit(p_{ij})]
  probability(P_i) = Π_{j=0}^{l_i − 1} edge_prob(e_{p_{ij} p_{i,j+1}})
  Σ_{b∈B} block_s(b) × δ(b) ≤ S
  δ(b) ∈ {0, 1}

The variables in this ILP formulation are δ(b) for each basic block, which can only be 0 or 1. The first two groups of equations give the calculation of length(P_i) and probability(P_i). The third inequality is the I-Cache size limitation. Because we use the locking technique with the entire cache, this limitation must hold no matter which kind of mapping is used; otherwise we would not be able to put all the selected nodes into the cache. It is true that some instructions may be mapped to the same cache line; such cache conflicts can happen with both direct-mapped and set-associative caches. We can apply compilation techniques [26] to solve the cache conflict problem after we have decided which nodes to lock, which is studied in Section 6. The goal of the problem is to minimize the ACET of the PEFT by determining δ(b) for each b ∈ B.
4.1.2 Problem Analysis

For each node v in a PEFT, define its set of outgoing edges as OutEdges_v = {e_{v,t_m} | e_{v,t_m} ∈ E, 1 ≤ m ≤ M_v}, where M_v is the out-degree of node v. Let u represent the preceding node of v, and t_i the successor of v on a path P_i that contains v. The terms relating to W_real(v) (that is, node_miss(v) or node_hit(v)) in Formula (1) can be combined and transformed as follows:

Σ_{P_i ∋ v} W_real(v) × probability(P_i)
= W_real(v) × Σ_{P_i ∋ v} [edge_prob(e_{p_{i0} p_{i1}}) × ··· × edge_prob(e_uv) × edge_prob(e_{v,t_i}) × ··· × edge_prob(e_{p_{i,l_i−1} p_{i,l_i}})]
= W_real(v) × [edge_prob(e_{p_{i0} p_{i1}}) × ··· × edge_prob(e_uv)] × Σ_{1≤m≤M_v} Σ_{P_i ∋ v, t_i = t_m} [edge_prob(e_{v,t_m}) × ··· × edge_prob(e_{p_{i,l_i−1} p_{i,l_i}})]
= W_real(v) × [edge_prob(e_{p_{i0} p_{i1}}) × ··· × edge_prob(e_uv)]
= W_real(v) × node_prob(v)

(The double sum equals 1: since the PEFT is a tree, all paths through v share the same prefix, and from v the probabilities of all downstream continuations sum to 1.)

As can be seen, for every node v the corresponding portion of Formula (1) is W_real(v) × node_prob(v). Formula (1) can therefore be expressed over nodes as:

Σ_{v∈V} W_real(v) × node_prob(v)    (2)

In a PEFT, one basic block can be called by different nodes; in other words, name(u) can equal name(v) even when u ≠ v. This scenario is denoted as reusing in this paper. Considering reusing, Formula (2) can be further transformed into:

Σ_{v∈V} W_real(v) × node_prob(v)
= Σ_{v∈V} [(1 − δ(name(v))) × block_miss(name(v)) + δ(name(v)) × block_hit(name(v))] × count(v) × node_prob(v)
= Σ_{b∈B} [(1 − δ(b)) × block_miss(b) + δ(b) × block_hit(b)] × Σ_{v∈V, name(v)=b} (count(v) × node_prob(v))

Let block_prob(b) = Σ_{v∈V, name(v)=b} (count(v) × node_prob(v)). This is a constant once a PEFT is given. It represents the execution weight of basic block b appearing at different nodes in the PEFT. It may be larger than 1, so we do not call it a probability. Finally, Formula (1) reduces to:

Σ_{b∈B} [(1 − δ(b)) × block_miss(b) + δ(b) × block_hit(b)] × block_prob(b)    (3)

Define each_saving(b) = (block_miss(b) − block_hit(b)) × block_prob(b), which is the ACET saving for an individual b ∈ B. The static I-Cache locking problem can be proved to be an NP-Hard problem.

Theorem 1 Static I-Cache locking for ACET minimization is NP-Hard.

Proof We prove that this problem is NP-Hard by a reduction from the 0/1 knapsack problem. Given a 0/1 knapsack instance, we have a finite set A with weight weight(a) and value value(a) for each a ∈ A, a value threshold K and a total weight limit W. The static I-Cache locking problem is constructed as follows. For each a ∈ A, we create a basic block b_a ∈ B with each_saving(b_a) = value(a) and block_s(b_a) = weight(a). This instance can be constructed in polynomial time from the 0/1 knapsack instance.

Define Total_time_unlock = Σ_{b_a∈B} block_miss(b_a) × block_prob(b_a). Formula (3) can then be written as Total_time_unlock − Σ_{b_a∈B} δ(b_a) × each_saving(b_a), where δ(b_a) = 1/0 indicates whether b_a is put into the cache or not. The decision version of the static I-Cache locking problem asks whether there is a δ(b_a) = 0/1 for every b_a ∈ B that achieves Total_time_unlock − Σ_{b_a∈B} δ(b_a) × each_saving(b_a) ≤ Total_time_unlock − K and Σ_{b_a∈B} block_s(b_a) × δ(b_a) ≤ S. This can be done if and only if there is a δ(a) = 0/1 for every a ∈ A such that Σ_{a∈A} δ(a) × value(a) ≥ K and Σ_{a∈A} weight(a) × δ(a) ≤ W.
Thus the desired δ(a) for every a ∈ A exists for the instance of 0/1 knapsack if and only if a δ(b_a) for every b_a ∈ B exists for the corresponding instance of the static I-Cache locking problem. □

4.1.3 Algorithm

When we consider locking the whole I-Cache of size S, a dynamic programming method can be used to solve the 0/1 knapsack problem optimally in pseudo-polynomial time [31]. In the static I-Cache locking problem, the transformed objective shown in Formula (3) does not depend on the PEFT's structure. Therefore, we can treat basic blocks as items in the 0/1 knapsack problem and carry out dynamic programming similarly. The algorithm SICL (Static I-Cache Locking for a PEFT) is shown in Algorithm 2. In Algorithm SICL, CalcFunProb(PEFT) is a procedure that calculates block_prob(b_i) and each_saving(b_i) for each b_i ∈ B (line 1). The choice made under each circumstance is kept in an array structure OPT[|B|][S + 1] (line 2). Each OPT[i][s] keeps the optimal solution for the basic-block subset {b_1, ..., b_i} under cache size s with two variables, Saving and Cachable (lines 7–8, 11–12), which respectively represent the maximized ACET saving for this subset and whether or not b_i is selected into the cache.
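Since Formula (3) turns the selection into a 0/1 knapsack over basic blocks (value each_saving(b), weight block_s(b), capacity S), the DP core that Algorithm SICL builds on can be sketched as follows. This is a generic knapsack sketch under that reduction, not the paper's Algorithm 2; the block sizes and savings are illustrative inputs.

    # 0/1-knapsack sketch of the selection behind Algorithm SICL:
    # items = basic blocks, weight = block_s(b), value = each_saving(b),
    # capacity = cache size S.
    def static_lock_selection(block_s, each_saving, S):
        n = len(block_s)
        # OPT[i][s] = best ACET saving using blocks b_1..b_i with size budget s
        OPT = [[0.0] * (S + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            w, val = block_s[i - 1], each_saving[i - 1]
            for s in range(S + 1):
                OPT[i][s] = OPT[i - 1][s]                     # b_i not locked
                if w <= s:                                    # lock b_i if it helps
                    OPT[i][s] = max(OPT[i][s], OPT[i - 1][s - w] + val)
        # Trace back which blocks were selected for locking
        locked, s = [], S
        for i in range(n, 0, -1):
            if OPT[i][s] != OPT[i - 1][s]:
                locked.append(i - 1)
                s -= block_s[i - 1]
        return OPT[n][S], sorted(locked)

    saving, locked = static_lock_selection([4, 2, 3], [10.0, 7.0, 4.0], S=5)
    print(saving, locked)   # prints 11.0 [1, 2]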
Practical Examples of Maximum Likelihood Estimation for the Exponential Distribution

1. Statisticians use maximum likelihood estimation to estimate the parameters of the exponential distribution.
2. By estimating the parameter λ, we can gain a better understanding of how events occur in an exponential distribution.
3. Maximum likelihood estimation is a commonly used statistical method that can be used to estimate various types of distributions.
4. We can use maximum likelihood estimation to estimate the median or mean of the exponential distribution.
5. The estimated parameters can be used to predict the occurrence of future events.
6. Maximum likelihood estimation requires collecting a sufficient amount of data for computation.
7. By comparing the likelihood function under different parameter values, we can find the most likely parameter value.
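As a worked example of points 4 and 7: for n i.i.d. observations x_1, …, x_n from an exponential distribution with density f(x; λ) = λe^{−λx}, the likelihood can be maximized in closed form (a standard derivation):

\[
L(\lambda)=\prod_{i=1}^{n}\lambda e^{-\lambda x_i}=\lambda^{n}e^{-\lambda\sum_{i=1}^{n}x_i},\qquad
\ell(\lambda)=\log L(\lambda)=n\log\lambda-\lambda\sum_{i=1}^{n}x_i,
\]
\[
\frac{d\ell}{d\lambda}=\frac{n}{\lambda}-\sum_{i=1}^{n}x_i=0
\;\Longrightarrow\;
\hat{\lambda}=\frac{n}{\sum_{i=1}^{n}x_i}=\frac{1}{\bar{x}}.
\]

So the MLE of the rate is the reciprocal of the sample mean; the estimated mean is x̄ and, since the exponential median is (ln 2)/λ, the estimated median is x̄ ln 2.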
Proximal Policy Optimization Algorithms (original text overview)

Proximal Policy Optimization (PPO) is a popular algorithm for reinforcement learning that has gained significant attention in recent years. In this article, we will provide an overview of PPO and discuss some of the key concepts and techniques used in the algorithm.

PPO is a type of policy optimization algorithm that is designed to find the optimal policy for a given reinforcement learning problem. The goal of PPO is to maximize the expected reward by updating the policy iteratively based on past experiences. Unlike some other policy optimization algorithms, PPO does not require any assumptions about the model dynamics and can be used with both discrete and continuous action spaces.

One of the main advantages of PPO is its simplicity and ease of implementation. The algorithm is based on the policy gradient method, which involves estimating the gradient of the policy by running multiple trajectories of the agent in the environment and computing the average reward. PPO uses a surrogate objective function that approximates the policy gradient and performs multiple updates to ensure stability and convergence.

The key idea behind PPO is to balance the exploration-exploitation trade-off. The algorithm achieves this by limiting the magnitude of the policy updates using a clipping parameter. This parameter ensures that the new policy does not deviate too much from the old policy, thereby avoiding catastrophic changes. Additionally, PPO introduces an adaptive penalty term that discourages the policy from changing too rapidly.

Another important concept in PPO is the use of value functions to estimate the expected rewards. Value functions can be used to calculate the advantage of taking a particular action, which is then used to update the policy. PPO uses a value function approximation to estimate the advantages and computes the surrogate objective function based on these estimates.

PPO also incorporates an importance sampling technique to handle off-policy training. This technique allows the algorithm to use past experiences for updating the policy, even if they were collected using an older policy. Through importance sampling, PPO can estimate the probabilities of actions under the new policy, which is necessary for the policy updates.

In conclusion, proximal policy optimization is a powerful algorithm for reinforcement learning that has shown promising results in various domains. Its simplicity, stability, and ability to handle both discrete and continuous action spaces make it a popular choice among researchers and practitioners. By balancing the exploration-exploitation trade-off and incorporating value functions and importance sampling, PPO has become an effective method for finding optimal policies in reinforcement learning problems.
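For reference, the clipping mechanism described above is usually written as the clipped surrogate objective from the PPO paper, where r_t(θ) is the probability ratio between the new and old policies, Â_t the advantage estimate, and ε the clip parameter:

\[
L^{\mathrm{CLIP}}(\theta)=
\hat{\mathbb{E}}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\;
\operatorname{clip}\!\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\big)\right],
\qquad
r_t(\theta)=\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)}.
\]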
An Efficient A*-based Algorithm for Optimal Graph Matching Applied to Computer Vision

An Efficient A*-based Algorithm for Optimal Graph Matching Applied to Computer Vision

Douglas Antony Louis Piriyakumar and Paul Levi
Institute for Parallel and Distributed High-Performance Systems, Department of Computer Science, University of Stuttgart, D-70565 Stuttgart, Germany
Phone: +49-711-7816-358 or 387, Fax: +49-711-7816-250
E-Mail: piriyaku levi@informatik.uni-stuttgart.de

Abstract. Many of the problems in artificial intelligence, including computer vision, demand expensive and computationally intensive searches. Particularly in computer vision, the crux of the problem is usually to match two abstract representations, mostly graphs. Even though several approaches are used, to ascertain optimality the A* strategy is chosen. In this paper, we present an efficient A*-based algorithm for optimal graph matching with the incorporation of two additional techniques. The lower-bound and upper-bound techniques help reduce the number of nodes generated by an appreciable amount, and thereby the computation time. The heuristics used in this algorithm provide further reductions, asserting the proper choice.

Keywords: Computer vision, Graph matching, Optimal matching, Heuristic methods, A* algorithm

1 Introduction

In most of the core application problems, viz. artificial intelligence, code optimization in compilers, CAD and computer vision, manoeuvring the combinatorial search still remains to be solved efficiently. Especially in computer vision, the crux of the problem is to match two abstract representations (graphs) [SlHr81]. As early as 1964 [StHU64], a heuristic program for testing pairs of directed line graphs for isomorphism was used. Using representative graphs and a reordered graph, another efficient algorithm for graph isomorphism was presented in [CDGC70]. With a backtrack procedure, directed graph isomorphism was solved in [BeAT73]. Following this, a fast backtracking algorithm for the same, not necessarily running in polynomial time, was developed [ScDC76]. An algorithm for subgraph isomorphism using graph-theoretical methods is presented in [UlJR76].

Mostly, two approaches, viz. the state-space method with branch and bound techniques [LeWd96] and nonlinear optimization methods with heuristic approximations [CKPm95], are employed to match graphs efficiently. Recently, graph matching with noise [GsRa96] and parallel algorithms [ACTS97] have also been investigated. Various strategies and the applicability of graph matching to computer vision are explained in [BDBC82]. However, here we combine the two methods to always obtain the optimal matching efficiently. The optimality is guaranteed by using the A* algorithm [PMPL98] with a function aptly suited to this problem. This demands formulating the problem in terms of the A* approach and developing heuristics for supplanting the upper bound for matching. The results of optimality have been verified by the enumeration-of-permutations method.

2 The Matching Problem

2.1 The Definition

Given two graphs G1 and G2 with vertex sets V1 and V2 along with edge sets E1 and E2, we consider here the number of vertices in both graphs to be the same (say n). A cost matrix C is defined, with c_ij as the cost involved in matching vertex v_i of G1 with vertex v_j of G2. Several issues are taken into consideration for incorporation into the matching process, viz. degree of mismatch [ACTS97] and others such as the difference between indegree and outdegree. The problem is to find a matching vector M, where M(i) is the vertex in G2 matched with vertex i in G1, such that Σ_{i=1..n} c_{i,M(i)} is minimal.

2.2 The Formulation

Each child node in the state-space of A* (explained in the next section) denotes a partial assignment, i.e., assigning a
non-assigned vertex in G1 to a non-assigned vertex in G2, apart from the already available assignments made in the parent node. Here f(x) = g(x) + h(x), where f(x) is the cost of the node, g(x) is the cost of getting to this node from the start node, i.e., the cost of the parent node, and h(x) is a lower bound on the cost of arriving at a solution node from this node, i.e., the sum of the static levels of the non-assigned vertices in G1. The rest is the same as the general A* strategy [PMPL98].

3 The New A*-Based Algorithm for Graph Matching

3.1 General Algorithm

As our algorithm is based on the A* algorithm, for the sake of clarity and for explaining our algorithm, the general A* algorithm used in most artificial intelligence problems is given here as in [NiNJ80]. In A*, the state-space graph is a tree called the search tree. Each node in the tree corresponds to the assignment of a particular vertex in one graph to a specific vertex in the other. All internal nodes in the tree correspond to partial (or incomplete) matchings, and all external (leaf) nodes correspond to either pruned nodes or complete graph matchings. Our problem here is to find the goal node, a leaf node corresponding to the optimal matching. Associated with a node v in the search tree is a cost function f(v) = g(v) + h(v), which is an underestimate of the minimum cost of an assignment, given that it includes the partial matching. The function g(v) is the cost of the path from the root to v, and the function h(v) is a lower-bound estimate of the minimum cost function h*(v) from node v to a leaf node which corresponds to an optimal matching in the subtree rooted at node v.

3.2 The Heuristic Solution

To set the upper bound so that any node with the cost of the partial matching, or together with h, can be pruned, an effective heuristic is defined here. We define a priority list based on the following partial order: a vertex has more priority than another provided its … is not less than … (the ordering criterion is lost in the source). The heuristic chooses, each time, a vertex from the set of all non-assigned vertices such that no other vertex has a higher priority. This vertex is assigned a vertex from the set of all non-assigned vertices in the other graph such that the matching cost is minimal over all such non-assigned vertices. We have also tried separately with solutions of N-Queen problems as heuristics.

3.3 The Heuristic Function

The heuristic function is defined as follows. At node x, let there be k vertices already assigned. Then g(x) = Σ_{i≤k} c_{i,M(i)}. Now, to find the f(x) value, we need h(x), the heuristic function. To always produce an optimal solution, h(x) ≤ h*(x) is indeed required. h(x) is defined as h(x) = Σ_{i>k} m_i, where m_i is the minimum in row i. In fact, it is easy to verify that h(x) ≤ h*(x), which ascertains the optimality.

3.4 New Techniques for Reducing Space and Time

3.4.1 Lower Bound

The lower bound is the minimum possible attainable solution. In the A* algorithm, the search has to continue even after finding a solution, as it need not necessarily be optimal. Now the question is how it can be proved that a given solution is the optimal solution, so that the algorithm can stop at once. The only possible way is that when the given solution is equal to the lower-bound solution, there obviously cannot be a better solution; hence the algorithm can stop. Now the problem boils down to finding the lower-bound solution, which, while normally difficult in the general case, is not so in graph matching. Let m_i be the minimum in row i of the cost matrix. Then the lower bound is defined as the sum of all such row minima, i.e., LB = Σ_{i=1..n} m_i. The major problem with the lower bound is that a many-to-many mapping is possible. However, in the case of multiple similar objects, denoted
as several occurrences of the same subgraph, this will indeed be more desirable. One should always be careful that all feasible optimal solutions need not necessarily be lower-bound solutions. The main advantage is that if the given problem has the lower-bound solution, the algorithm terminates at once when it finds such a solution, thereby reducing both the memory space required by further expansions and the time to compute them.

3.4.2 Upper Bound

The upper bound is the already available minimum solution. In the A* algorithm, the function f(x) has to be evaluated at every node. Supposing that f(x) is greater than the upper bound, that node need not be expanded further. This does not affect optimality, as by expanding this node the solution obtained would be more costly than the already available solution. However, to start with, one should have a heuristic solution; so this algorithm obviously needs a heuristic method to solve the problem. This also helps in another way, drastically: supposing that the heuristic solution is equal to the lower-bound solution, the algorithm stops without creating even a single node of the general A* algorithm. Even otherwise, the heuristic solution found initially serves as the upper bound. So, using the upper bound, the number of nodes generated is minimized, thereby reducing the memory space and CPU time.

3.5 Our Algorithm for Optimal Graph Matching (see the runnable sketch after this listing)

1. Compute the lower bound solution, LB.
2. Find a heuristic solution, UB, using say the N-queen problem.
3. IF (UB = LB) THEN print the solution and quit.
4. Construct the priority list of vertices.
5. c = 0 (* node count *).
6. Build the initial node and insert it in the list with f = 0.
7. REPEAT
8.   Select the node x with the smallest f value.
9.   IF (x is not a solution) THEN
       (a) Generate the successors, i.e., try all unassigned vertices.
       (b) For each such vertex, compare the vertex with all other vertices and assign.
       (c) FOR each such assignment DO
             Check whether it is already in the list, to eliminate duplication.
             IF (already available) THEN
                 Don't add the node
             ELSE
                 Compute f(x') = g(x') + h(x') for this node.
                 IF (f(x') < UB) THEN
                     c = c + 1
                     Insert it in the list
                     IF (x' is a solution) THEN
                         IF (f(x') = LB) THEN
                             Print the solution and quit.
                         ENDIF
                         IF (f(x') < UB) THEN
                             UB = f(x').
                         ENDIF
                     ENDIF
                 ELSE
                     Prune the node
                 ENDIF
             ENDIF
     ELSE
         Print the solution and quit
     ENDIF
10. UNTIL (x is a solution OR list is empty).
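The listing above is reconstructed from a garbled source; a compact, runnable sketch of the same best-first scheme (priority queue ordered by f = g + h, row-minima heuristic, lower/upper-bound pruning) might look as follows. The cost matrix and the greedy upper-bound heuristic are illustrative, and this is our reading of the algorithm, not the authors' code.

    import heapq

    def astar_matching(C):
        """A* sketch for minimum-cost one-to-one matching between two
        equal-size vertex sets, using the row-minima heuristic (admissible)."""
        n = len(C)
        row_min = [min(C[i]) for i in range(n)]
        LB = sum(row_min)                      # Sec. 3.4.1: sum of row minima

        # Heuristic upper bound: greedy feasible assignment (stands in for
        # the N-queen-style heuristic of Sec. 3.2).
        used, UB = set(), 0.0
        for i in range(n):
            j = min((j for j in range(n) if j not in used), key=lambda j: C[i][j])
            used.add(j)
            UB += C[i][j]
        if UB == LB:
            return UB                          # step 3: heuristic hit the lower bound

        # Best-first search on f = g + h, pruning nodes with f >= UB.
        heap = [(LB, 0.0, ())]                 # (f, g, partial assignment)
        while heap:
            f, g, assigned = heapq.heappop(heap)
            i = len(assigned)
            if i == n:
                return g                       # admissible h: first full solution popped is optimal
            for j in range(n):
                if j in assigned:
                    continue
                g2 = g + C[i][j]
                h2 = sum(row_min[k] for k in range(i + 1, n))
                if g2 + h2 < UB:               # upper-bound pruning (Sec. 3.4.2)
                    heapq.heappush(heap, (g2 + h2, g2, assigned + (j,)))
        return UB                              # nothing beat UB, so UB is optimal

    print(astar_matching([[4, 2, 8], [4, 3, 7], [3, 1, 6]]))   # prints 12.0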
3.6 The Algorithms Developed with Variations

We have developed four variations of the algorithm with our new techniques [PMPL98] of lower bound and upper bound. Variation A is plain A* without employing any technique. Variation B is plain A* with the above techniques. In Variation C, at each level of the state-space tree only one vertex is selected, based on the priority list. Variation D is the same as Variation C together with these techniques.

[Table I: Comparison of the variations of A* and the permutation algorithm (CPU time in seconds); the numeric entries were lost in extraction.]

… than Variation B, even though both produce the lower-bound solution, due to the power of the effective heuristic defined whenever possible. We would also like to note that these variations are only for those who long for optimality. The heuristic defined in this paper, as well as the N-Queen problem, will also serve the purpose of those who are not interested in an optimal solution, but in a quick, reasonable sub-optimal solution. As these algorithms are highly parallelizable, we are now proceeding with parallelization. We have tried permutations of the input cost matrix; however, the results are almost the same for the cases we have tried. As a check, the number of nodes was reduced from 9 to 7 and the whole procedure was repeated. Table II portrays the same vividly.

[Table II: Comparison of the variations of A* and the permutation algorithm for a smaller number of nodes (CPU time in seconds); the numeric entries were lost in extraction.]

References

[BeAT73] Berztiss, A.T., "A Backtrack Procedure for Isomorphism of Directed Graphs," Journal of the Association for Computing Machinery, vol. 20, no. 3, pp. 365-377, July 1973.
[CKPm95] Christmas, W.J., Kittler, J. and Petrou, M., "Structural Matching in Computer Vision using Probabilistic Relaxation," IEEE Trans. PAMI, vol. 17, no. 8, pp. 749-764, Aug. 1995.
[CDGC70] Corneil, D.G. and Gotlieb, C.C., "An Efficient Algorithm for Graph Isomorphism," Journal of the Association for Computing Machinery, vol. 17, no. 1, pp. 51-64, January 1970.
[GsRa96] Gold, S. and Rangarajan, A., "A Graduated Assignment Algorithm for Graph Matching," IEEE Trans. PAMI, vol. 18, no. 4, pp. 377-388, April 1996.
[LeWd96] Lawler, E. and Wood, D., "Branch and Bound Methods: A Survey," Operations Research, vol. 14, pp. 699-719, July-Aug. 1966.
[NiNJ80] Nilsson, N.J., Principles of Artificial Intelligence, Palo Alto, Calif., Tioga Publications, 1980.
[PMPL98] Piriyakumar, D.A.L., Murthy, C.S.R. and Levi, P., "A new A*-based Optimal Task Scheduling in Heterogeneous Multiprocessor Systems Applied to Computer Vision," HPCN'98 International Conference, Amsterdam, April 21-23, 1998.
[ScDC76] Schmidt, D.C., "A Fast Backtracking Algorithm to Test Directed Graphs for Isomorphism Using Distance Matrices," Journal of the Association for Computing Machinery, vol. 23, no. 3, pp. 433-445, July 1976.
[SlHr81] Shapiro, L.G. and Haralick, R.M., "Structural Descriptions and Inexact Matching," IEEE Trans. PAMI, vol. 3, pp. 504-519, Sept. 1981.
[UlJR76] Ullmann, J.R., "An Algorithm for Subgraph Isomorphism," Journal of the Association for Computing Machinery, vol. 23, no. 1, pp. 31-42, January 1976.
[StHU64] Unger, S.H., "GIT - A Heuristic Program for Testing Pairs of Directed Line Graphs for Isomorphism," Communications of the ACM, vol. 7, no. 1, pp. 26-34, January 1964.
Scenario reduction in stochastic programming: An approach using probability metrics

Received: July 2000 / Accepted: May 2002. Published online: February 14, 2003 – © Springer-Verlag 2003

Abstract. Given a convex stochastic programming problem with a discrete initial probability distribution, the problem of optimal scenario reduction is stated as follows: Determine a scenario subset of prescribed cardinality and a probability measure based on this set that is the closest to the initial distribution in terms of a natural (or canonical) probability metric. Arguments from stability analysis indicate that Fortet-Mourier type probability metrics may serve as such canonical metrics. Efficient algorithms are developed that determine optimal reduced measures approximately. Numerical experience is reported for reductions of electrical load scenario trees for power management under uncertainty. For instance, it turns out that after a 50% reduction of the scenario tree, the optimal reduced tree still has about 90% relative accuracy.

Key words: stochastic programming – quantitative stability – Fortet-Mourier metrics – scenario reduction – transportation problem – electrical load scenario tree

1. Introduction

Various important real-life decision problems can be formulated as convex stochastic programs, which can mostly be written in the form

min E_P f(ω, x) = min ∫ f(ω, x) P(dω) …
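To make the reduction idea concrete, here is a small sketch of the greedy backward-deletion rule that such algorithms build on: repeatedly delete the scenario whose probability-weighted distance to its nearest remaining scenario is smallest, and redistribute its probability to that nearest neighbour. This is a simplified illustration in the spirit of the abstract, not the authors' exact algorithm; the scenarios and the Euclidean cost are made up.

    import numpy as np

    def backward_reduction(scenarios, probs, keep):
        """Greedy scenario deletion: drop the scenario l minimizing
        p_l * min_{j != l} c(xi_l, xi_j); move its probability to the
        nearest kept scenario. c is plain Euclidean distance here."""
        idx = list(range(len(scenarios)))
        probs = probs.astype(float).copy()
        dist = np.linalg.norm(scenarios[:, None, :] - scenarios[None, :, :], axis=2)
        while len(idx) > keep:
            # probability-weighted cost of deleting each remaining scenario
            costs = {l: probs[l] * min(dist[l][j] for j in idx if j != l) for l in idx}
            l = min(costs, key=costs.get)
            j = min((j for j in idx if j != l), key=lambda j: dist[l][j])
            probs[j] += probs[l]            # redistribute probability
            probs[l] = 0.0
            idx.remove(l)
        return idx, probs

    scen = np.array([[0.0], [0.1], [1.0], [1.1], [5.0]])
    p = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
    print(backward_reduction(scen, p, keep=3))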
A multi-target tracking algorithm based on kernel density estimation and Gaussian mixture PHD filtering

A multi-target tracking algorithm based on kernel density estimation and Gaussian mixture PHD filtering
Zhou Weidong; Zhang Hebing; Qiao Xiangwei
Journal: Systems Engineering and Electronics, 2011, 33(9), pp. 1932-1936
Affiliation: College of Automation, Harbin Engineering University, Harbin 150001, China

Abstract: Considering the lower estimation accuracy of traditional algorithms in multi-target tracking systems, a Gaussian mixture probability hypothesis density (PHD) filtering algorithm based on kernel density estimation is proposed. After pruning and merging, the Mean-shift algorithm is introduced to perform kernel density estimation on the Gaussian mixture PHD distribution density function, replacing the traditional state estimation methods. Finally, the estimated peak values are used as the target state estimates. Simulation results show that, compared with traditional algorithms, the proposed algorithm has higher tracking accuracy.
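The role of Mean-shift here is to find the modes (peaks) of the posterior intensity. Below is a minimal sketch of the mode-seeking iteration on a weighted Gaussian mixture; the mixture parameters and bandwidth are illustrative, and this is a generic mean-shift, not the paper's exact procedure.

    import numpy as np

    def mean_shift_mode(x, centers, weights, bandwidth, iters=50):
        """Seek a mode of a weighted Gaussian mixture by mean-shift:
        x <- sum_i w_i K((x - c_i)/h) c_i / sum_i w_i K((x - c_i)/h)."""
        for _ in range(iters):
            d2 = np.sum((centers - x) ** 2, axis=1)
            k = weights * np.exp(-0.5 * d2 / bandwidth**2)  # Gaussian kernel responses
            x_new = (k[:, None] * centers).sum(axis=0) / k.sum()
            if np.linalg.norm(x_new - x) < 1e-8:
                break
            x = x_new
        return x

    # Two well-separated components plus a satellite; start near the first mode
    centers = np.array([[0.0, 0.0], [4.0, 4.0], [0.2, -0.1]])
    weights = np.array([0.5, 0.3, 0.2])
    print(mean_shift_mode(np.array([0.5, 0.5]), centers, weights, bandwidth=1.0))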
Setting the Crossover Probability in Genetic Algorithms

The crossover probability is a key parameter in genetic algorithms (GAs). It controls the rate at which genetic material is exchanged between parent chromosomes to create offspring. Setting the crossover probability too high or too low can have a significant impact on the performance of the GA.

The optimal crossover probability depends on a number of factors, including the size of the population, the selection pressure, and the mutation rate. In general, a higher crossover probability is more likely to result in convergence to a global optimum, while a lower crossover probability is more likely to result in premature convergence to a local optimum.

There are a number of different methods for setting the crossover probability. One common approach is to use a fixed value, such as 0.5 or 0.7. Another approach is to use an adaptive value, which changes over the course of the GA. For example, the crossover probability could be decreased as the population converges to a solution, in order to reduce the likelihood of destroying the best solutions.

The best approach for setting the crossover probability will vary depending on the specific problem being solved. However, by understanding the role of the crossover probability and the factors that affect it, GAs can be more effectively tuned to achieve optimal performance.
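A minimal sketch of how the crossover probability enters a GA loop, including a simple linearly decaying adaptive schedule as described above. The 0.9→0.5 decay range and one-point crossover are illustrative choices, not prescriptions.

    import random

    def one_point_crossover(a, b):
        """Classic one-point crossover of two equal-length chromosomes."""
        cut = random.randint(1, len(a) - 1)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]

    def make_offspring(parent1, parent2, pc):
        """Apply crossover with probability pc; otherwise copy the parents."""
        if random.random() < pc:
            return one_point_crossover(parent1, parent2)
        return parent1[:], parent2[:]

    def adaptive_pc(generation, n_generations, pc_start=0.9, pc_end=0.5):
        """Linearly decay pc as the run progresses (illustrative schedule)."""
        t = generation / max(1, n_generations - 1)
        return pc_start + t * (pc_end - pc_start)

    p1, p2 = [0, 1, 1, 0, 1, 0], [1, 1, 0, 0, 0, 1]
    for g in range(3):
        pc = adaptive_pc(g, 10)
        print(round(pc, 2), make_offspring(p1, p2, pc))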
How to Use a Bayesian Sampler to Handle (Embrace) Uncertainty

Recursive Bayesian Filtering

The quantity of interest is the posterior p(x_k | z_{1:k}) of the state x_k given all measurements z_{1:k} up to time k.

• Prediction:

p(x_k | z_{1:k−1}) = ∫ p(x_k | x_{k−1}) p(x_{k−1} | z_{1:k−1}) dx_{k−1}    (1)

• Update:

p(x_k | z_{1:k}) = p(z_k | x_k) p(x_k | z_{1:k−1}) / p(z_k | z_{1:k−1})
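When the integral in (1) and the update have no closed form, a sampler can approximate them. Below is a minimal bootstrap particle-filter step for a 1-D random-walk model with Gaussian observation noise; the model and noise levels are illustrative assumptions, not from the slides.

    import numpy as np

    rng = np.random.default_rng(0)

    def particle_filter_step(particles, weights, z, q=0.5, r=1.0):
        """One predict/update cycle of a bootstrap particle filter.
        Prediction samples p(x_k | x_{k-1}); update reweights by p(z_k | x_k)."""
        particles = particles + rng.normal(0.0, q, size=particles.shape)   # predict
        weights = weights * np.exp(-0.5 * (z - particles) ** 2 / r**2)     # update
        weights /= weights.sum()
        # Resample when the effective sample size degenerates
        if 1.0 / np.sum(weights**2) < 0.5 * len(particles):
            idx = rng.choice(len(particles), size=len(particles), p=weights)
            particles = particles[idx]
            weights = np.full(len(particles), 1.0 / len(particles))
        return particles, weights

    particles = rng.normal(0.0, 2.0, size=1000)
    weights = np.full(1000, 1.0 / 1000)
    for z in [0.3, 0.7, 1.2]:
        particles, weights = particle_filter_step(particles, weights, z)
    print(np.sum(particles * weights))   # posterior mean estimate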
Bayesian Inference: A Small Example

P(t_total | t_past) ∝ 1/t_total

[Slide figure omitted: the posterior probability P(t_total | t_past) plotted over t_total, with the 1/t_total "uninformative" prior and the observed t_past marked, illustrating random sampling from the posterior.]
How to Use a Bayesian Sampler to Handle (Embrace) Uncertainty

Liu Bin, School of Computer Science, Nanjing University of Posts and Telecommunications
2017-11-02 @ China R Conference, East China Normal University
Uncertainty and Probability

Sources of "uncertainty":
• The laws (rules) by which the world operates: they may themselves be random
• Unknown (or not-yet-known) factors
• Observation noise

Physical Probability
• The basic building-block: Importance Sampling
Importance Sampling
• Evaluate complex integrals using probabilistic techniques
• Assume we are trying to estimate a complicated integral of a function f over some domain D:

F = ∫_D f(x) dx
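The standard Monte Carlo trick is to draw samples from a proposal density q that is easy to sample and covers D, and to correct with weights f(x)/q(x). A minimal sketch, with an illustrative integrand and proposal:

    import numpy as np

    rng = np.random.default_rng(1)

    def importance_sampling(f, q_sample, q_pdf, n=100_000):
        """Estimate F = ∫ f(x) dx by F ≈ (1/N) Σ f(x_i)/q(x_i), x_i ~ q."""
        x = q_sample(n)
        return np.mean(f(x) / q_pdf(x))

    # Illustrative example: ∫_0^∞ e^{-x} dx = 1, proposal q = Exp(rate 0.5)
    f = lambda x: np.exp(-x)
    q_sample = lambda n: rng.exponential(scale=2.0, size=n)   # rate 0.5
    q_pdf = lambda x: 0.5 * np.exp(-0.5 * x)
    print(importance_sampling(f, q_sample, q_pdf))            # ≈ 1.0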
Abstract– The algorithm presented in this paper is designed to be used in automated multi-sensor surveillance systems which require observation of targets in a bounded area to optimize the performance of the system. There have been many approaches which deal with multi-sensor tracking and observation, but there haven't been many which deal purely with target detection, i.e. each target needs to be detected only once. The metric used to gauge the performance of the system is the percentage of targets detected among those that enter the area. Targets enter the area through source points on the sides of the area according to a Poisson distribution; the rate of entry is constant for all sources. The algorithm presented here uses target arrival information and sensor positions to generate an optimal motion strategy for the multi-sensor system every T time-steps, i.e. every T time-steps the probability of finding undetected targets is estimated and the optimal sensor paths for the next T time-steps are calculated. The algorithm performs robustly and optimally, detecting around 80% of the targets that enter the area.

1. Introduction

Many security, surveillance and reconnaissance systems require distributed autonomous observation of the movement of targets navigating in a bounded area of interest. This is done using a mobile sensor system in which the sensors either work autonomously or in collaboration with other sensors. Multi-sensor surveillance finds applications in border patrol, guarding of secured areas, search and rescue, and warehouse surveillance. There have been many approaches which deal with target detection and tracking. Some of these approaches are behavior-based, where a-priori knowledge of target behavior is used. There are also many approaches which try to optimize performance by using formations which maximize coverage. In most cases, not much is known about target arrival, and in some cases even the dimensions of the surveillance area aren't known.

In this paper, we take leads from Krishna's approach [2] (IROS 05). This algorithm is designed to monitor a large rectangular surveillance area with a limited number of sensors. On the boundary of the surveillance area lie sources from which targets emanate into the surveillance zone. The dimensions of the surveillance area and the Poisson rate of entry from the sources are known beforehand. A T-step-ahead algorithm is presented, which works by calculating optimal sensor paths every T time-steps for the next T time-steps. This is done by using target arrival information, sensor positions, and information about detected targets. This approach will be very useful in guarding large open areas which are crisscrossed by moving targets. Since the number of sensors at our disposal is limited, the sensors have to be mobile to try and cover as much of the region as possible.

2. Framework

The framework consists of a large rectangular area and a limited number of mobile sensors whose coverage area is much smaller than the total area to be covered. All along the boundary lie discrete points from which targets emanate into the surveillance area. The angle at which a target emanates from its source is a random variable taking values in [0, π], and targets enter according to a Poisson distribution. The targets move across the area with constant velocity in linear trajectories. We divide our surveillance zone into smaller rectangles and further divide these into smaller cells. This is done to help in computation, to make it less complicated. Also, certain assumptions are made in our framework. Apart from these, we have a-priori knowledge of the environment and of the statistics of target entry, and we use this information to try and come up with an optimal algorithm. This situation is not new in robotic and multi-robotic literature, where optimal path planning and scheduling algorithms require prior knowledge of the workspace in which they operate, in terms of its static and dynamic contents, vis-à-vis behavior-based approaches that do not guarantee optimality or completeness but require no prior knowledge.

2.1 Description of Surveillance Zone, Sensors, Targets

Consider the surveillance system depicted in Figure 1. The sides of the rectangle on the right form the boundary of the surveillance zone – the area enclosed by it is the area of interest where sensors attempt to optimize their rates of detection. The circles represent the FOVs of the sensors, and the radius of each circle is the effective sensor range. The field of vision (FOV) of a
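To make the arrival model concrete, the following sketch simulates targets entering from boundary sources with Poisson arrivals and uniformly random headings in [0, π], as described above. The source spacing, arrival rate and speed are illustrative parameters, not values from the paper.

    import numpy as np

    rng = np.random.default_rng(42)

    def spawn_targets(sources, lam, speed, dt):
        """One time-step of target entry: each boundary source emits
        Poisson(lam*dt) targets with headings uniform in [0, pi]."""
        targets = []
        for (sx, sy) in sources:
            for _ in range(rng.poisson(lam * dt)):
                theta = rng.uniform(0.0, np.pi)      # heading into the area
                targets.append({
                    "pos": np.array([sx, sy]),
                    "vel": speed * np.array([np.cos(theta), np.sin(theta)]),
                })
        return targets

    # Sources evenly spaced along the bottom edge of a 100 x 100 zone
    sources = [(x, 0.0) for x in range(10, 100, 20)]
    new_targets = spawn_targets(sources, lam=0.1, speed=1.0, dt=1.0)
    print(len(new_targets), new_targets[:1])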