PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning
Multi-objective Optimization Methods

Multi-objective optimization is a challenging task that involves optimizing multiple conflicting objectives simultaneously. It is widely used in fields such as engineering, finance, and operations research. One of the main difficulties in multi-objective optimization is the trade-off between different objectives: improving one objective often leads to the deterioration of another. This makes the decision-making process complex and requires specialized techniques to find a suitable solution.

One approach to tackling multi-objective optimization problems is the use of metaheuristic algorithms such as genetic algorithms, particle swarm optimization, and simulated annealing. These algorithms can explore the solution space efficiently and find diverse solutions that represent trade-offs between different objectives. By iteratively improving the solutions, metaheuristics help to find a set of Pareto-optimal solutions that give decision-makers a range of options.
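The notion of Pareto optimality mentioned above can be made concrete with a small sketch: given candidate solutions scored on two minimization objectives, keep only the non-dominated ones (the function and data names are illustrative, not from the article):

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Example: trade-off between, say, cost and delay.
candidates = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (5.0, 1.0)]
front = pareto_front(candidates)
print(front)  # (3.0, 5.0) is dominated by (2.0, 4.0) and drops out
```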
Translated Article: Inventory Control under Supply Chain Management

In a supply chain management environment, inventory control still faces several problems that enterprises need to address promptly. The main problems include the following:

1. Information asymmetry. Information asymmetry between the enterprises in a supply chain is fairly serious, making it hard for an enterprise to forecast market demand accurately, which in turn weakens the effectiveness of inventory control.
2. Order instability. Unstable orders in the supply chain are another important factor affecting inventory control: they make it difficult for enterprises to set inventory levels, hurting the overall performance of the supply chain.
3. Logistics and distribution problems. Poor logistics and distribution lead to inventory backlogs and increase an enterprise's inventory costs.
4. Lack of coordination. A lack of coordination among the enterprises in a supply chain leaves their inventory information out of sync, again hurting overall supply chain performance.

To solve these problems, enterprises need to take a series of measures, such as strengthening information sharing, optimizing order management, improving the logistics and distribution system, and establishing coordination mechanisms, so as to improve overall supply chain performance and the effectiveness of inventory control.

Although, from a macro perspective, inventory control under supply chain management has advantages over traditional management, many problems arise in practice because each enterprise understands supply chain management differently and their interests conflict. The main problems are:

1. Enterprises lack an overall view of supply chain management, and their siloed behavior lowers the efficiency of the whole chain.
2. Inaccurate delivery-status data leads to customer dissatisfaction and causes some enterprises in the chain to increase their inventory.
3. Inefficient information transfer systems cause delayed and inaccurate information, affecting the accuracy of inventory levels and the execution of short-term production plans.
4. Lack of cooperation and coordination; organizational barriers are an important cause of growing inventory.
5. Product and process design that ignores its impact on supply chain inventory, so that cost benefits are offset by inventory costs and problems arise when new products are introduced.

Therefore, under supply chain management, suitable inventory control strategies are needed, including establishing an overall view, improving the efficiency of information transfer, strengthening cooperation and coordination, and designing products with inventory impact in mind, so as to improve the efficiency of the whole supply chain.

For inventory management, we propose the following strategies:

1. Vendor-managed inventory strategy: the VMI (Vendor Managed Inventory) model.
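The inventory-level decisions discussed above can be illustrated with a textbook reorder-point calculation; the formula and the numbers below are illustrative, not taken from the article:

```python
import math

def reorder_point(daily_demand, lead_time_days, demand_std, z=1.65):
    """Reorder point = expected demand during lead time + safety stock.
    z = 1.65 corresponds roughly to a 95% service level under a normal demand model."""
    safety_stock = z * demand_std * math.sqrt(lead_time_days)
    return daily_demand * lead_time_days + safety_stock

# 40 units/day, 9-day lead time, standard deviation of daily demand = 10 units
print(round(reorder_point(40, 9, 10)))  # → 410 (360 expected demand + ~50 safety stock)
```

A higher z (better service level) or a noisier demand stream raises the safety stock, which is exactly the mechanism by which the order instability described above inflates inventories.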
Machine Learning Methods for Combinatorial Optimization Problems

In today's digital and intelligent era, combinatorial optimization problems appear frequently across many fields, from route planning in logistics, to resource allocation in manufacturing, to spectrum allocation in communication networks. Solving these problems is crucial for improving efficiency, reducing cost, and optimizing resource allocation. Traditional methods often fall short on complex, large-scale instances, while machine learning brings new ideas and methods for solving combinatorial optimization problems.

A combinatorial optimization problem typically seeks an optimal solution in a finite solution space. For example, the traveling salesman problem asks for a route that visits all cities with the shortest total distance, and the knapsack problem asks for the most valuable combination of items within a limited knapsack capacity. These problems share a huge solution space, so exhaustively enumerating all possible solutions to find the optimum is usually infeasible in practice.

Machine learning offers a new avenue for solving such problems. One common approach is based on reinforcement learning. The core idea of reinforcement learning is to let an agent learn an optimal policy by interacting with an environment. For combinatorial optimization, we can model the problem as an agent searching the solution space: the agent takes a sequence of actions (choosing part of the solution) to incrementally build a complete solution, and receives a reward according to the solution's quality. Through repeated trial and error, the agent gradually learns to choose better actions and thus finds better solutions.

Take the traveling salesman problem as an example. The cities are the nodes the agent must visit. At each step the agent picks an unvisited city as the next destination and receives a reward based on the current route length. After extensive training, the agent learns the relationships between cities and how to choose a good visiting order.
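The agent-based formulation above can be sketched with a tiny tabular Q-learning loop. This is a deliberate simplification for illustration: the state is just the current city (a real formulation would include the visited set), and the instance is small enough for a table:

```python
import random

def train_tsp_agent(dist, episodes=2000, alpha=0.1, gamma=0.9, eps=0.2):
    """Tabular Q-learning sketch for a tiny TSP instance.
    State = current city, action = next unvisited city, reward = negative edge length."""
    n = len(dist)
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        city, unvisited = 0, set(range(1, n))
        while unvisited:
            if random.random() < eps:                       # explore
                nxt = random.choice(list(unvisited))
            else:                                           # exploit
                nxt = max(unvisited, key=lambda c: Q[city][c])
            reward = -dist[city][nxt]
            future = max((Q[nxt][c] for c in unvisited - {nxt}), default=0.0)
            Q[city][nxt] += alpha * (reward + gamma * future - Q[city][nxt])
            city, unvisited = nxt, unvisited - {nxt}
    return Q

def greedy_tour(Q, n):
    """Read the learned policy off the Q-table, starting from city 0."""
    tour, unvisited = [0], set(range(1, n))
    while unvisited:
        nxt = max(unvisited, key=lambda c: Q[tour[-1]][c])
        tour.append(nxt)
        unvisited.discard(nxt)
    return tour

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
tour = greedy_tour(train_tsp_agent(dist), len(dist))
print(tour)  # a permutation of all four cities starting at city 0
```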
Another approach is to use deep learning to predict the solutions of combinatorial optimization problems directly. A deep learning model can learn the features and patterns of a problem and generate a candidate solution straight from the problem description. For example, for the vehicle routing problem in logistics, customer locations and demands can be fed into a deep neural network that outputs the vehicles' routes.

However, applying machine learning to combinatorial optimization is not plain sailing; several challenges remain. First, the solution space of a combinatorial problem is usually discrete, while machine learning models typically handle continuous data, which calls for special treatment and transformations.
Summary of Hive Optimization Tips

Even the best hardware is wasted if it is not fully utilized. The main points:

1. Model design matters. Work that an earlier task can do in passing should be done there, so that multiple later tasks can reuse the results. Closely tied to this is data model design; a good model is especially important.
2. Tune the number of reducers. Too few reducers fail to exploit Hadoop's parallelism; too many create a large number of small files. Only you know your data volume and resources, so find a reasonable compromise.
3. Choose the join type. If one of the tables is small, use a map join; otherwise use an ordinary reduce join. Note that Hive loads the table data appearing before the join into memory, so put the smaller table before the larger one to reduce memory consumption.
4. Handle small input files. Two common approaches in Hive:
   (a) Use CombineFileInputFormat to pack many small files into a single input split and reduce the number of map tasks:
       set mapred.max.split.size=256000000;
       set mapred.min.split.size.per.node=256000000;
       set mapred.min.split.size.per.rack=256000000;
       set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
   (b) Set Hive parameters so that an extra MR job is launched to merge small output files:
       hive.merge.mapredfiles=false (whether to merge reduce output files; default false)
       hive.merge.size.per.task=256*1000*1000 (size of the merged files)
5. Handle group-by data skew. Two common approaches in Hive:
   (a) hive.groupby.skewindata=true generates two MR jobs: the first distributes the map output randomly across reducers for a pre-aggregation, easing the skew caused by keys with too many (or too few) rows.
   (b) hive.map.aggr=true (the default) runs a map-side combiner. If the rows are essentially all distinct, aggregation is pointless and a combiner only adds overhead; Hive covers this case too: with hive.groupby.mapaggr.checkinterval=100000 (default) and hive.map.aggr.hash.min.reduction=0.5 (default), it pre-aggregates 100000 rows and gives up on map-side aggregation if (rows after aggregation) / 100000 > 0.5.
6. multi insert suits scenarios that process one source table with different logic at different granularities into different tables: the source table is scanned only once, while the number of jobs stays the same.
7. Use union all well to cut table scans and job count: generate the queries for the different conditions first, union all them, then do a single group by. A union all over different tables acts like multiple inputs; a union all over the same table acts like one map emitting multiple rows.
8. Cluster parameters are numerous; specific jobs can be given specific settings, e.g. JVM reuse or the number of reduce copy threads (suitable when maps are fast and output is large). If there are many tasks and they are small, say finishing within a minute, reduce the task count to cut task-initialization overhead.

(Source: blog.csdn.net/u011750989/article/details/12024301)
Multi-objective Optimization Methods with Worked Examples

Commonly used multi-objective optimization methods include genetic algorithms, particle swarm optimization, and simulated annealing. Below each method is introduced briefly with a worked example.

1. The genetic algorithm (Genetic Algorithm, GA) is an optimization algorithm that simulates biological inheritance and evolution. By designing suitable encoding, selection, crossover, and mutation operations, it mimics the genetic process found in nature and gradually approaches the optimal solution of the problem. An advantage of genetic algorithms is that they can handle multiple objective functions at once and keep multiple candidate solutions during the computation, improving efficiency.

Worked example: consider the traveling salesman problem (Traveling Salesman Problem, TSP), i.e. finding the shortest path among a given set of cities such that each city is visited exactly once. In a multi-objective setting, the total path length and the visiting order of the cities can be optimized simultaneously. With a genetic algorithm, a suitable encoding can represent a path, appropriate crossover and mutation operations can be chosen, and repeated iteration yields a set of good solutions.
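The GA loop described above can be sketched in a stripped-down form. This sketch uses only permutation encoding, swap mutation, and truncation selection; a full GA would add a crossover operator such as order crossover, and the distance matrix is made up for illustration:

```python
import random

def tour_length(tour, dist):
    """Length of the closed tour, including the return edge."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def ga_tsp(dist, pop_size=30, generations=200):
    """Minimal evolutionary search for TSP: keep the better half,
    produce children by swap mutation."""
    n = len(dist)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, dist))
        survivors = pop[: pop_size // 2]           # truncation selection
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = random.sample(range(n), 2)      # swap mutation
            child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda t: tour_length(t, dist))

dist = [[0, 1, 9, 9], [1, 0, 1, 9], [9, 1, 0, 1], [9, 9, 1, 0]]
best = ga_tsp(dist)
print(best, tour_length(best, dist))
```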
2. Particle swarm optimization (Particle Swarm Optimization, PSO) is an optimization algorithm that simulates the foraging behavior of bird flocks. Each particle in the algorithm represents a candidate solution; during the search it keeps adjusting its position and velocity by learning from the experience of other particles and from its own historical best, eventually finding a set of good solutions. The advantages of PSO are fast convergence and good results.

Worked example: consider feature selection in machine learning, i.e. choosing an optimal subset of features from a given feature set. In a multi-objective setting, both the classification accuracy and the number of features of a subset can be optimized simultaneously. With PSO, each particle represents a feature subset; by learning from other particles' experience and its own historical best, the composition of the subset is adjusted until subsets are found that combine high classification accuracy with a reasonable number of features.

3. Simulated annealing (Simulated Annealing, SA) is an optimization algorithm that simulates the annealing of solids. By mimicking how a solid relaxes at high temperature and then gradually cools into a stable state, the algorithm approaches the optimal solution. SA's advantage is its ability to escape local optima, giving it good global behavior.

Worked example: consider a layout optimization problem: place several objects within a given region so that they occlude each other as little as possible.
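The escape-from-local-optima mechanism that SA relies on can be shown on a simple one-dimensional multimodal function (the function and parameters are illustrative, not a layout problem): worse moves are accepted with probability exp(-Δ/T), and T decays geometrically.

```python
import math, random

def simulated_annealing(f, x0, step=0.5, t0=5.0, cooling=0.995, iters=3000):
    """Generic SA sketch: occasionally accept worse moves with probability
    exp(-delta/T) so the search can cross barriers between local minima."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(iters):
        cand = x + random.uniform(-step, step)
        fc = f(cand)
        # Accept improvements always, deteriorations with the Boltzmann probability
        if fc < fx or random.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling                      # geometric cooling schedule
    return best, fbest

# Multimodal test function: many local minima created by the sine term
f = lambda x: x * x + 4 * math.sin(3 * x) + 4
best, fbest = simulated_annealing(f, x0=4.0)
print(best, fbest)
```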
Multi-objective Particle Swarm Optimization

Multi-objective particle swarm optimization (Multi-objective Particle Swarm Optimization, MPSO) is a multi-objective optimization algorithm built on particle swarm optimization. Particle swarm optimization is a global optimization method based on swarm intelligence that searches for the optimum by simulating the foraging behavior of bird flocks. A multi-objective optimization problem is one with several optimization objectives, in which we seek a set of solutions under which all objectives are optimal or near-optimal. Compared with traditional single-objective optimization, multi-objective optimization is considerably more challenging and complex.

MPSO maintains a swarm of particles and treats each particle's position and velocity as a point in the search space of potential solutions. Each particle updates its position and velocity according to its own history and the swarm's experience. A particle's position represents a potential solution; the particles iterate through the search space guided by the objective functions, striving to find the global optimum. In the multi-objective case, MPSO must consider several objective values at once. It represents the optimal trade-offs among the objectives with the Pareto front: the set of non-dominated solutions, i.e. solutions that cannot be improved in one objective without worsening another. MPSO approximates the Pareto front through iterative search.

The core idea of MPSO is to search through cooperation and competition among the particles. Each particle updates its velocity and position drawing both on its own history and on the state of other particles; the velocity update depends on the particle's personal best and on the global best. Through iterative search, the particles keep adjusting their positions and velocities so as to approach the Pareto front.

MPSO's strength is that it handles multiple objectives simultaneously and can find a good Pareto front in the search space. Its cooperation-and-competition mechanism enables a global search, and the iterations converge toward optimal solutions. MPSO also has shortcomings. In high-dimensional problems, for example, the swarm's search space becomes enormous, which lowers search efficiency. Moreover, MPSO's parameter settings strongly affect the algorithm's performance and require some tuning and optimization to reach the best effect.

In short, multi-objective particle swarm optimization is an effective multi-objective optimization method that can find a good Pareto front in the search space. With well-chosen parameters and tuning, its performance and search efficiency can be improved further.
Parallel Computing Model Design and Optimization Methods

With the continuous development of technology and the growth of computing power, more and more computational problems call for parallel computing. Parallel computing decomposes a large problem into several smaller ones and processes them simultaneously to speed up computation. This article discusses the design and optimization of parallel computing models, and how to use these methods to improve computational efficiency.

Before computing in parallel, a suitable parallel computing model must be chosen. Common models include the Fork-Join model, the Pipeline model, and the Master-Worker model. The Fork-Join model splits a large task into several subtasks and waits for all of them to finish before moving on. The Pipeline model splits a large task into several interdependent small tasks and passes data between them through pipes. The Master-Worker model splits a large task into independent subtasks, with a master node coordinating and controlling their execution.
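A minimal Master-Worker sketch, using threads on one machine purely for illustration: the master splits the work into independent chunks, the workers process them, and the master combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    """Independent subtask: here, sum the squares of a slice of the data."""
    return sum(x * x for x in chunk)

def master(data, n_workers=4):
    """Split the problem, dispatch chunks to workers, combine partial results."""
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(worker, chunks)       # master collects in order
    return sum(partials)

print(master(list(range(1000))))  # same result as the serial sum of squares
```

The same structure carries over to a distributed setting; only the dispatch mechanism (here a thread pool) changes.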
When designing a parallel computing model, several factors must be considered: the task topology, communication overhead, load balance, and the data distribution strategy. The task topology determines the dependencies among tasks; communication overhead is the time and resources needed to pass data between tasks; load balance concerns whether the work is evenly distributed when tasks are assigned to different processing units; and the data distribution strategy governs how data is assigned to the processing units.

To optimize parallel performance, several methods can be applied: increasing the degree of parallelism, optimizing task scheduling, optimizing data layout, and optimizing communication. Increasing the degree of parallelism means scaling up the computation with more processing units to improve speed. Task scheduling optimization assigns tasks to processing units sensibly to avoid load imbalance and wasted resources. Data layout optimization assigns data to processing units so as to minimize transfer overhead and make data access more efficient. Communication optimization improves the communication patterns and mechanisms between tasks to reduce their cost.

In practice, beyond designing and optimizing the model itself, other factors also matter: the choice and configuration of the hardware environment, including the type and number of processors and the size and bandwidth of memory; the choice and configuration of the software environment, including the operating system and the compiler; and, for particular application scenarios, specific techniques and algorithms such as GPU acceleration or distributed parallel computing.
Joint Optimization of Multi-task Learning and Transfer Learning

Multi-task learning and transfer learning are two active research directions in machine learning. This article introduces the concepts and application areas of both and proposes a joint optimization method to improve model performance and generalization. The method combines multi-task learning with transfer learning, achieving optimization through shared model parameters and knowledge transfer.

1. Introduction

In the real world we often need to solve several related tasks at once. In natural language processing, for example, we may need to handle text classification, sentiment analysis, and named entity recognition simultaneously. Traditional machine learning methods, however, tend to treat each task as an independent problem, modeled and trained separately. This ignores the correlations that may exist between tasks and degrades model performance. On the other hand, in many situations we may already have accumulated a large amount of data and knowledge in a related domain and wish to apply that knowledge to a new domain. Training a high-performing model in the new domain usually requires a lot of labeled data and may suffer from overfitting and poor generalization.

Multi-task learning and transfer learning arose to address these problems. Multi-task learning aims to improve performance on several related tasks through shared model parameters and knowledge transfer; transfer learning aims to improve a model in a target domain by exploiting knowledge from a source domain.

2. Multi-task Learning

Multi-task learning solves several related tasks within one model. The tasks may be of the same type or of different types. By sharing model parameters, multi-task learning can exploit the relationships between tasks to improve performance. Traditional multi-task learning methods typically use hard or soft parameter sharing. Hard sharing makes every task use the same parameters, while soft sharing allows each task's parameters to differ to some degree. These methods usually train the model with a cross-entropy or mean-squared-error loss.
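Hard parameter sharing, as described above, amounts to a shared trunk with per-task heads. A toy numpy forward pass (layer sizes and the two example tasks are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk: one hidden layer whose weights are used by every task
W_shared = rng.normal(size=(16, 8))
# Task-specific heads: a 3-class classifier and a 1-output regressor
W_cls = rng.normal(size=(8, 3))
W_reg = rng.normal(size=(8, 1))

def forward(x):
    h = np.tanh(x @ W_shared)   # shared representation (hard parameter sharing)
    logits = h @ W_cls          # task 1: classification logits
    y_hat = h @ W_reg           # task 2: regression output
    return logits, y_hat

x = rng.normal(size=(4, 16))    # batch of 4 examples
logits, y_hat = forward(x)
print(logits.shape, y_hat.shape)  # (4, 3) (4, 1)
```

During training, gradients from both task losses would flow into W_shared, which is the mechanism by which the tasks regularize each other.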
However, traditional methods ignore the correlations and dependencies that may exist between tasks. Recently proposed methods, such as joint training, deep convolutional neural networks, and attention mechanisms, have achieved notable results on complex, high-dimensional data.

3. Transfer Learning

Transfer learning improves a model in a target domain by exploiting knowledge from a source domain. The source and target domains may differ in task, dataset, or feature space. Transfer learning can be realized through feature selection, parameter initialization, model fusion, and knowledge transfer, among other techniques.
Research on Optimization Algorithms for Distributed Multi-agent Systems

With the rapid development of artificial intelligence, multi-agent systems have become a research hotspot. A multi-agent system is a networked system composed of multiple agents; it is distributed by nature, each agent can communicate and cooperate with the others, and it has broad potential in practical applications. How to optimize the efficiency and performance of multi-agent systems has therefore become an important research topic. This article focuses on optimization algorithms for distributed multi-agent systems.

I. Introduction to Distributed Multi-agent Systems

A distributed multi-agent system (Distributed Multi-Agent System, DMAS) consists of multiple agents; each agent can perform independent tasks in its own environment or cooperate with the others, completing tasks through mutual communication. Thanks to their diversity, flexibility, robustness, and scalability, multi-agent systems are widely applied in autonomous driving, robot control, intelligent transportation, power control, distributed computing, and other fields.

II. Optimization Problems in Multi-agent Systems

In a multi-agent system, the interaction and cooperation among the agents are crucial to the efficiency and performance of the whole system, so optimizing cooperation and efficiency has become a hot research problem. Common optimization problems in multi-agent systems include resource allocation, task allocation, joint cooperation, and objective optimization.

1. Resource allocation. Resource allocation is one of the important optimization problems, covering the allocation of space, time, and various material and energy resources. In robot control, for example, several robots jointly completing tasks in one environment need resources and tasks allocated sensibly to raise the efficiency and performance of the whole system.

2. Task allocation. Task allocation is another important optimization problem: assigning tasks to specific agents and scheduling their execution order so as to maximize the efficiency and performance of the whole system. In autonomous driving, for example, multiple vehicles cooperating on path planning and traffic-flow control need tasks allocated sensibly to avoid congestion and accidents.
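Task allocation as described above can be sketched with a simple greedy assignment: each task goes to the cheapest still-free agent, with the cheapest-to-serve tasks handled first. This is a toy heuristic, not an optimal assignment algorithm such as the Hungarian method, and the cost matrix is made up:

```python
def greedy_assign(cost):
    """cost[t][a] = cost of agent a performing task t.
    Greedily give each task its cheapest still-free agent (one task per agent)."""
    n_tasks, n_agents = len(cost), len(cost[0])
    free = set(range(n_agents))
    assignment = {}
    # Handle tasks in order of how cheap their best option is
    for t in sorted(range(n_tasks), key=lambda t: min(cost[t])):
        a = min(free, key=lambda a: cost[t][a])
        assignment[t] = a
        free.discard(a)
    return assignment

# 3 tasks, 3 agents
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(greedy_assign(cost))
```

Greedy assignment is fast and fully decentralizable (each agent only needs local bids), but it can miss the globally optimal allocation, which is exactly the gap the optimization algorithms surveyed here try to close.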
3. Joint cooperation. The agents in a multi-agent system can cooperate jointly to complete complex tasks. When cooperative relationships exist among agents, the best cooperation strategy must be found to raise the efficiency and performance of the whole system. In a smart grid, for example, multiple generators must be controlled cooperatively to keep the grid stable and reliable, which requires finding the best cooperation strategy.
Multi-objective Optimization with Particle Swarm Algorithms

Particle swarm optimization (Particle Swarm Optimization, PSO) is a heuristic optimization algorithm first proposed by Eberhart and Kennedy in 1995, inspired by the foraging behavior of bird flocks. By simulating the flight of birds in a flock, it solves optimization problems over one or more objectives.

Traditional PSO optimizes only a single objective, but practical problems often have several objectives that must be optimized at once. Multi-objective optimization problems are complex, diverse, and conflicting; they usually cannot be solved simply by merging the objectives into one aggregate objective, so dedicated multi-objective algorithms are needed. Multi-objective particle swarm optimization (Multi-objective Particle Swarm Optimization, MOPSO) is an extension of PSO that can solve multi-objective problems. By modifying the mechanics of traditional PSO, it lets the particles maintain a set of non-dominated solutions (a Pareto set) during the search, yielding a family of optimal solutions that can serve the needs of different domains.

The steps of MOPSO are as follows:

1. Initialize the particles' positions and velocities, distributed randomly over the space.
2. Compute each particle's fitness from the values of the multiple objective functions, to assess its quality.
3. Update each particle's velocity and position. The velocity update involves parameters such as the inertia weight and the cognitive and social factors; the position update follows basic PSO.
4. Recompute each particle's fitness at its new position.
5. Update the global archive of best solutions, adding the new non-dominated solutions; non-dominated sorting is used for this.
6. If the termination condition is met, output all non-dominated solutions; otherwise return to step 3.
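Step 3 above, the core PSO update, can be sketched as follows. For clarity this is the single-objective form (MOPSO would instead pick its "global best" leader from the non-dominated archive), and the parameter values w = 0.7, c1 = c2 = 1.5 are common textbook defaults, not prescribed by the article:

```python
import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO update: v' = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x); x' = x + v'."""
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1, r2 = random.random(), random.random()
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (pbest[i][d] - positions[i][d])
                                + c2 * r2 * (gbest[d] - positions[i][d]))
            positions[i][d] += velocities[i][d]

# Minimize f(x, y) = x^2 + y^2 with a 10-particle swarm
f = lambda p: p[0] ** 2 + p[1] ** 2
pos = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(10)]
vel = [[0.0, 0.0] for _ in range(10)]
pbest = [p[:] for p in pos]
gbest = min(pos, key=f)[:]
for _ in range(100):
    pso_step(pos, vel, pbest, gbest)
    for i, p in enumerate(pos):
        if f(p) < f(pbest[i]):
            pbest[i] = p[:]
    gbest = min(pbest, key=f)[:]
print(f(gbest))  # small value near 0 after convergence
```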
Compared with traditional PSO, MOPSO's main modification lies in how the global best set is updated. Non-dominated sorting helps preserve the diversity of the solutions and avoids getting trapped in local optima. Multi-objective particle swarm algorithms offer clear advantages and practical value for multi-objective problems: they consider the optimization needs of several objectives at once and provide a family of optimal solutions for decision-makers to choose from. In practice, MOPSO has been applied successfully in control system design, image processing, machine learning, and other fields. In summary, the multi-objective particle swarm algorithm is an effective multi-objective optimization method.
PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning

Yongluan Zhou (1), Ying Yan (2), Feng Yu (1), and Aoying Zhou (2)
(1) National University of Singapore  (2) Fudan University

Abstract. In emerging data stream applications, data sources are typically distributed. Evaluating multi-join queries over streams from different sources may incur large communication cost. As queries run continuously, the precious bandwidth would be aggressively consumed without careful optimization of operator ordering and placement. In this paper, we focus on the optimization of continuous multi-join queries over distributed streams. We observe that by partitioning streams into substreams we can significantly reduce the communication cost, and hence propose a novel partition-based join scheme, PMJoin. A few partitioning techniques are studied. To generate the query plan for each substream, a heuristic algorithm is proposed based on a rate-based model. Results from an extensive experimental study show that our techniques can sufficiently reduce the communication cost.

1 Introduction

Many recently emerging applications, such as network management, financial monitoring, sensor networks and stock tickers, have fueled the development of continuous query processing techniques over data streams. In these applications, the data sources are typically distributed, e.g. the network hosts or routers in network management. Collecting all the data at a centralized server may not be cost-effective due to the high communication cost. Clearly, a distributed stream processing system is inevitable. Unlike traditional DBMSs, where the processing at each node involves expensive I/O operations, stream processing systems often perform main-memory operations, which are relatively inexpensive in comparison to the communication cost. As both the queries and the data streams are continuous, a lot of existing work, such as [2], focuses on minimizing the communication cost, especially when the source nodes are connected by a wide-area network. Furthermore, as the streams are continuous and unbounded, a rate-based cost model has to be used.

In this paper, we focus on multi-way window join queries, an important and expensive type of continuous query. These queries may involve multiple streams from different source nodes. Let us look at an example drawn from the network management application.

Example 1. We want to monitor the traffic that passes through three routers and has the same destination host within the last 0.5 seconds. Data collected from the three routers feeds three streams S1, S2 and S3 to three processing nodes n1, n2 and n3. The content of each stream tuple includes the destination host ip (dest) of a data packet and possibly other information. This task can be represented as the three-way window join query S1 ⋈[S1.dest = S2.dest] S2 ⋈[S2.dest = S3.dest] S3, where the window size of each stream is 0.5 seconds.

[Table 1 (Distribution, in tuples/second) lists the overall rates λi of S1, S2, S3 and of the join results S1,2, S1,3, S2,3, together with the per-value rates λ^a_i, λ^b_i, λ^c_i of tuples with dest = a, b, c; the numeric entries are garbled in this copy.]

[Figure 1: communication cost (bytes/second) of the alternative plans.]

In Table 1, λi denotes the rate of stream Si, and λ^a_i denotes the rate of tuples from Si whose value in the dest attribute is a. Furthermore, Si,j is the result stream of Si ⋈ Sj and its rate is denoted as λi,j. The minimum communication costs that can be achieved under the different schemes are as follows:

1. Centralized scheme: the best plan in this category is to send both S2 and S3 to n1. If we assume the tuple size of every stream is 1 byte, this scheme results in a communication cost of λ2 + λ3 = 100.1 bytes/sec.
2. Distributed scheme: in this category, the best plan is to send S3 to n2 first and then ship the result S2,3 to n1. If we assume the size of a join result tuple is the sum of the two input tuples, the communication cost of this plan is λ3 + λ2,3 × 2 ≈ 54.03 bytes/sec.
3. Partition-based scheme: taking a closer look at the problem, we find that the arrival rates of tuples vary with the values of the joining attributes. Furthermore, the popularity of the values varies across streams, and hence the optimal plans for these tuples also differ. For example, in Table 1, dest = a is popular in S3 while it is unpopular in the other two streams. Hence the best plan for these tuples is to ship them from S2 to n1 to join with S1, and then send the resulting tuples to n3 to join with S3. This results in a cost of 0.036 bytes/sec. However, for tuples with dest = b the best plan is totally different: those tuples from S3 should be sent to n1 and then to n2. By exploiting this characteristic, the minimum communication cost that we can get for Example 1 is approximately 0.07 bytes/sec.

In this paper, we focus on the static optimization of multi-join queries. Static optimization can be applied to applications where the streams' characteristics are relatively stable and their changes are predictable. Moreover, given that our problem has not been previously studied, it is important to examine how static optimization can be performed before extending the work to a dynamic context. To summarize, our main contributions are as follows:

- We formulate the problem and propose a heuristic-based optimization algorithm to decide the join operation locations and the tuple routing orders based on a rate-based cost model.
- To further reduce the communication cost, we propose a novel join scheme, PMJoin. We also study different partitioning strategies (e.g., rate-based, hash, etc.).
- We fully implemented the system and ran a simulation study. The study shows the efficiency of our techniques.

The rest of the paper is organized as follows. Section 2 reviews the related work.
Section 3 presents the proposed techniques. In Section 4, we perform extensive performance studies on our implementation. Section 6 concludes the paper.

2 Related Work

Distributed processing of multi-way joins has already been extensively studied in the context of traditional relational database systems. [13] provides a thorough survey of this area. The optimizers in Distributed INGRES [9] and System R* [14] consider both CPU and I/O cost as well as the communication cost of processing a whole dataset. In these systems the I/O costs are so high that they cannot be omitted. SDD-1 [7] uses heuristics to optimize the utilization of semi-joins. A semi-join is useful when a tuple is much larger than a single attribute and the selectivity is low. However, semi-joins are not readily applicable to window join processing, because streams are normally continuous and queries should be evaluated in nearly real time. For example, a tuple ti may be pruned away because there are no matching tuples in the opposite window, yet new tuples that match ti may still arrive at the opposite window; extra complicated mechanisms would have to be introduced to ensure correctness. As shown in our study, we believe our PMJoin, together with the optimization heuristics, is a promising alternative for reducing the communication cost. Our techniques can also be adapted for traditional passive data processing, whose performance needs further study. The above-mentioned systems and a considerable amount of other work (e.g. [8,16,21]) have also exploited horizontal fragmentation of relations to increase parallelism and consequently reduce response time. Static and dynamic data allocation [3,17,20] try to allocate replicas to reduce communication cost or to balance the load on servers. However, none of the above techniques exploits generating different plans for different partitions. Furthermore, a rate-based cost model has to be used in our problem.

[12] studies techniques for the evaluation of window join queries over data streams. [19,10] examine the processing of multiple joins over data streams. [4,15,5,6] investigate the static and adaptive ordering of operators for continuous queries over data streams. However, all these studies focus on centralized processing. There are also several recent efforts devoted to extending centralized schemes to a distributed context. [1] proposes the design of a distributed stream system. [2] studies the operator placement problem for stream processing. However, these approaches assume there is an already optimized query plan and then allocate the operators, while our approach does not impose such an assumption. Furthermore, they do not explore partitioning of the streams to further optimize the plans. In [18], the operators are assumed to have been allocated, and the proposed scheme adaptively decides the routing order of the tuples.

3 Distributed Multi-join

In this section, we first formulate the problem and then present the scheme for generating a query plan for each substream, which also applies to the case without stream partitioning. We then study how stream partitioning can be applied to minimize the communication cost.

3.1 Problem Formulation

In our system, there is a set of geographically distributed data stream sources Σ = {S1, S2, ..., S|Σ|} and a set of distributed processing nodes N = {n1, n2, ..., n|N|} interconnected by a widely distributed overlay network. Since the data stream sources in practice may not have the ability to communicate with multiple nodes, we separate the data sources from the processing system by assigning nodes as delegates of the data sources. Streams are routed to the various processing nodes through their delegated nodes. A multi-way window join query may involve streams from multiple nodes. For simplicity, we assume the queries do not involve stored tables. As mentioned before, our main concern is to minimize the communication cost.

We adopt the unit-time cost paradigm, and hence the communication cost of a processing scheme Ω can be computed as

    C(Ω) = (amount of communication, in bytes) / (observation period).

The formal problem statement is: given an m-way window join query Q (∀m < |Σ|), which involves a set of streams Σ located at a set of nodes N, find a join scheme Ω such that the total communication cost C(Ω) is minimized.

3.2 Join Operation Locations and Tuple Routing Orders

Before processing the queries, we first have to decide the placement of the join operators; then we have to route the streams, and the intermediate result streams if necessary, around the nodes. In this subsection, we focus on how to decide the locations of the join operations as well as the routing order of the tuples for each substream. Since the scheme also applies to streams without partitioning, and each substream is treated independently, we use the term "stream" instead of "substream" in the following discussion. The evaluation of the join operations allocated to each node can use any of the existing centralized join optimization and processing techniques, e.g. [19,10]. In this paper, we assume the join operations at each node are evaluated using MJoin [19]. In this technique, one in-memory index structure, e.g. a hash table, is built for each joining stream. A joining stream can be a source stream or an intermediate result stream generated by another node. When a tuple from a joining stream arrives, it is inserted into its corresponding index structure and used to probe the other index structures one by one to evaluate the query. The optimization of the probing order has already been studied in the centralized processing literature [4,6,19] and is not considered in this paper.
Notations and Cost Model. Let the set of streams and the set of nodes involved in the query Q be Σ and N, respectively. The set of streams located at node ni ∈ N is denoted as Σi. The result stream of Si ⋈ Sj is denoted as Si,j, the result stream of Si,j ⋈ Sk as Si,j,k, and so on. If two streams are located at one node, we say that they are co-located. A function col(i,j) is defined as follows:

    col(i,j) = 0 if Si and Sj are co-located; 1 otherwise.    (1)

We adopt a rate-based cost model similar to the one developed in [5]. The arrival rates of streams Si and Si,j are denoted as λi and λi,j, respectively. Let Wi and Wj be the expected numbers of tuples in the windows of Si and Sj, respectively. For a tuple-based window, Wi is equal to the window size Ki, while for a time-based window, Wi is equal to λi · Ti, where Ti is the window size. To estimate λi,j, note that in every unit of time, λi tuples from Si and λj tuples from Sj are used to probe the windows of Sj and Si, respectively. Out of the λi·Wj + λj·Wi pairs of tuples, f × (λi·Wj + λj·Wi) matches are expected to be found, where f is the join selectivity. Therefore the expected number of tuples generated by Si ⋈ Sj per unit of time can be estimated as

    λi,j = f × (λi · Wj + λj · Wi).    (2)

The tuples in the active window of the result stream Si,j are composed of the join results of the tuples in the active windows of Si and Sj. Hence the expected number of tuples in the active window of Si,j can be computed as

    Wi,j = f · Wi · Wj.    (3)

Eqs. (2) and (3) can be applied recursively to obtain the values for multiple joins. Furthermore, note that the output rate and the window of the join result of a set of streams are independent of how the join is performed. Hence, for a given distributed query plan, we can compute its unit-time communication cost by computing the rates of the streams that are sent over the network.

A Heuristic Algorithm. Given the above cost model, we could use a search algorithm to explore a specific solution space; for example, dynamic programming could select an optimal plan from all the left-deep tree plans, with computational complexity O(n!). However, as we will see, the search algorithm has to be applied several times in our partition-based join approach, so we propose a much cheaper algorithm that runs in O(n²) time.

Algorithm 1 shows the proposed stream join optimization algorithm. Its input is the set of streams Σ involved in the query as well as the join graph representation G of the query. A join graph consists of a set of vertices, each representing a stream, and a set of edges, each representing a join operation between the two connected streams. Each vertex in the graph is annotated with the source node of the corresponding stream. We use the following example to illustrate.

Example 2. A query joins 5 streams S0, S1, S2, S3 and S4, which are spread over 3 nodes. Figure 2(a) shows the join graph of this query: the location of each stream is drawn beside each vertex, and the selectivities of the join operations are drawn beside the corresponding edges. Columns 2-6 of Table 2 list the arrival rates λi and the expected numbers of tuples in the windows Wi of the source streams. For brevity, we assume in the following discussion that tuples from every stream (either a source stream or an intermediate result stream) have the same size. This assumption does not lose any generality, as we can always incorporate the tuple sizes in the calculation of the cost without changing the algorithm.

[Figure 2: join graphs (a)-(d) showing the processing steps for the example query; the edge selectivities are f = 0.0040, 0.01, 0.001 and 0.01.]

Table 2. Parameters of streams

    Si    S0    S1    S2    S3    S4    S2,3   S0,2,3
    λi    10    35    25    30    15    15     9
    Wi    100   350   250   300   150   75     30

Algorithm 1: StreamJoinOpt(Σ, G)
Input: Σ: a set of streams; G: a join graph over Σ
begin
     for each ni ∈ N do
         sort Σi in increasing arrival rates;
         for j = 0; j < |Σi|; j++ do
             for k = j + 1; k < |Σi|; k++ do
                 if λ(Σi[j] ⋈ Σi[k]) < λ(Σi[j]) then
                     label the join between Σi[j] and Σi[k] as local;
                     Σi[j] ← Σi[j] ⋈ Σi[k];
                     Σi ← Σi − Σi[k];
     sort Σ in increasing arrival rates;
     while |Σ| > 1 do
         Σp ← the slowest stream Si;  Σ −= Si;
         repeat
             cost ← MaxNumber;
             for each stream Sj joinable with any stream in Σp do
                 if C(Σp + Sj) < cost then
                     k ← j;  cost ← C(Σp + Sj);
             label the edges that connect any stream in Σp and Sk as pending;
             if case (1) is chosen then
                 assign all the pending join operations to the node of Sk;
                 Sp ← collapse Σp and Sk to one node;  Σp ← Sp;
             else
                 Σp += Sk;  Σ −= Sk;
         until |Σp| = 1;
         insert Σp into Σ;
end

In the first step (lines 2-9) of the algorithm, we find whether there is any locally evaluable join operation that results in a stream whose rate is smaller than both joining streams. Evaluating these joins locally tends to reduce the potential communication cost if some of the streams have to be shipped to other sites. For Example 2, there are two locally evaluable joins: S0 ⋈ S1 and S2 ⋈ S3. Using Equations (2) and (3), λ(S0 ⋈ S1) and λ(S2 ⋈ S3) can be estimated as 70 and 15, respectively. Hence we choose to allocate S2 ⋈ S3 to n1, and we label the corresponding edge with n1. For ease of processing, once a join operation is allocated, we collapse the two connected vertices in the join graph, with the resulting vertex representing their join result stream. Applying this to Figure 2(a) yields Figure 2(b). The rate and window size of S2,3 are also listed in column 6 of Table 2.

In the second part (lines 10-28) of the algorithm, we employ a heuristic approach to allocate the remaining join operations. There are two nested loops in this part. In each iteration of the outer loop, we determine the locations of a subset of the join operations. First, we pick the stream with the smallest rate, say Si, since it may result in less communication cost if Si has to be transmitted over the network. Next, to evaluate the join between Si and each of the other streams Sj that are joinable with Si, there are two cases:

1. Send Si to the node of Sj. The potential communication cost of this case is the sum of the cost of sending Si to the node of Sj and the potential cost of sending out the result stream of Si ⋈ Sj, i.e. λi · col(i,j) + λi,j. The second term counts the potential cost of sending out the result stream to perform further join operations.
2. Send both Si and Sj to a third site. The potential cost of this case is λi + λj.

For each stream, the case with the smaller cost is used. We greedily choose the stream Sk with the smallest estimated cost and move it from Σ to Σp. If case (1) is chosen for Sk, the join operation is already allocated: we remove streams Si and Sk from Σ, add the result stream Si,k to Σ, and start a new iteration. Correspondingly, in the join graph we collapse the nodes Si and Sk into one node Si,k. However, if case (2) is chosen for Sk, the join operation is still pending allocation, and we search for another stream Sl that is joinable with any stream in Σp with the smallest cost; the cost estimation is similar to the above analysis. To ease the presentation of the algorithm, we define the following function, where λ(Σp,j) denotes the rate of the result of joining all the streams in Σp with Sj:

    C(Σp + Sj) = min { Σ_{Si∈Σp} λi + λj ,  Σ_{Si∈Σp} λi · col(i,j) + λ(Σp,j) }    (4)

For example, in Figure 2(b), we first add the slowest stream S0 to Σp. For the three joinable streams S1, S2,3 and S4, using Eqs. (2), (3) and (4) we find that C(Σp + S2,3) is the smallest, and that case (1) applies: S0 is sent to node n1 to perform the join with S2,3. Hence we label the edge between S0 and S2,3 with n1, and then collapse the nodes S0 and S2,3 into one node S0,2,3, which yields Figure 2(c). The rate and window of S0,2,3, computed using Eqs. (2) and (3), are listed in column 7 of Table 2.

A new iteration of the outer loop then starts. The currently slowest stream is S0,2,3, so it is added to Σp. Of the two joinable streams S1 and S4, adding S4 has the smaller potential cost; this time case (2) is chosen, i.e. S0,2,3 and S4 have to be sent to a third site. We label the edge between the nodes S0,2,3 and S4 with a P to indicate that the join operation is pending allocation. The last stream S1 is then chosen, and S1 and S0,2,3 are sent to n0 to perform the joins; the two join operations can now be labeled with n0. All the join operations have now been allocated.

The output plan of Algorithm 1 can be represented as a tree in which each leaf is a source stream and each intermediate node is an MJoin operator. Each MJoin operator is located at one node and has two or more input streams. We order these streams so that the right-most stream (abbreviated as the right stream) has the same location as the MJoin operator; all the other input streams of the MJoin operator are sent to the location of the right stream to perform the join operations. Figure 3 shows the tree representation of the output plan for Example 2.

[Figure 3: the plan tree for Example 2.]

3.3 Stream Partitioning

In a partition-based scheme, each stream Si may be partitioned into D substreams S1_i, S2_i, ..., SD_i based on the values of the joining attribute. This is based on the observation that the arrival rates of tuples with different values may vary greatly within a single stream, and hence the optimal plans for these tuples also differ.

[Figure 4: plans for the three groups of substreams in Example 1.]
We denote the rate of a substream S k i asλk i.PMJoin.In this subsection,we will look at how the partition-based join can be applied to a multi-way equijoin query whose join predicates are specified on a single attribute,say attr.This kind of queries is common in a lot of applications,such as Example1in Section1.Furthermore,these could also be a subset of predicates in a multi-way join query that are specified on the same attribute.We propose a scheme that is called Partition-based Multi-way Join(PMJoin)to evaluate this set of join predicates.Every stream involved in these join predicates is partitioned into multiple substreams on attr.The substreams of all the streams can be grouped intoD groups.The k th group of substreams is{S k1,S k2,...,S k|N|}.For each group ofsubstreams,we can use Algorithm1to decide the allocation of the join operations.We illustrate the plan of PMJoin by using Example1.First,based on the value of the dest attribute,we partition each stream into three substreams S a i,S b i and S c i. 
These streams are grouped into three groups.Then for each group of substreams, we use Algorithm1to optimize the plan.The resulting plans for the three groups of substreams are shown in Figure4.To get the lowest cost,we can partition each stream into as many substreams as possible.For example,we can put tuples with each distinct value in the joining attribute into one substream.Let the number of these values be R then we could partition the stream into R substreams.However,it is clear that with more par-titions,more plans have to be generated and it complicates the processing.So we adopt a moreflexible approach where the number of partitions can be specified as any D.This can be viewed as clustering the abovefinest substreams(i.e.,one substream per value)into D partitions.In the following discussions,we refer to thesefinest substreams as FStreams.F S k i stands for the k th FStream from stream S i.And the unique attr value of the tuples of a FStream is called the value of the FStream.We consider three approaches:1.Hash partition.A hash function can be applied to hash the values of the FStreams into one of the D buckets.The FStreams in each bucket compose a substream.This is actually a random partitioning method.2.Range partition.Divide the data range into D sub-ranges.FStreams whose values fall into the i th sub-range compose the i th substream.3.Rate-based partition.The above two approaches ignore the arrival rates of the various FStreams.A good partitioning method should put those groups of FStreams whose optimal plans are similar to each other in one partition.In this way,the generated plan for that partition would be good for all its FStreams.Here we use an approximate approach to estimate the similarity of the optimal plansof two groups of FStreams.For each group of FStreams,{F S k1,F S k2,...,F S k|N|},we sort them in increasing order of their arrival rates.Then we create a vector V k where the i th element indicates the position of F S k i in the above sorted 
list. For example, if the sorted list is FS_3^k, FS_1^k, FS_2^k, then V_k = ⟨2, 3, 1⟩. The distance between the k-th and the l-th groups of FStreams is then measured by the distance between V_k and V_l, computed as |V_k − V_l|. The intuition is that the more similar the sorted lists of two groups of FStreams are, the more similar their optimal plans would be. We can now employ any clustering technique to cluster the groups of FStreams into D clusters; in this paper, we adopt the k-Means approach [11].

To apply all the above mechanisms, we need to know the rate of each FStream. To reduce the cost of maintaining such statistics, we can use traditional histogram approaches: only statistics of the histogram buckets are maintained, and the rate of an FStream is estimated from the statistics of the bucket it belongs to.

Multi-join on different attributes. For a generic multi-join query whose joins involve multiple attributes, our approach works as follows. We first run Algorithm 1 to determine the plan for the scheme without partitioning. Given the output plan of Algorithm 1, we then try to find several sets of join predicates to which we can apply PMJoin. We say an MJoin operator is partitionable on attr if the join predicates in the MJoin operator are all (equalities) on the same attribute attr. The procedure to find the subsets of join predicates on which to apply partitioning works in two steps. In the first step, starting from the output plan of Algorithm 1, we aggressively determine the subsets of join predicates that can be partitioned by using Algorithm 2.

Algorithm 2: FindPartition(O_i)
Input: O_i: an MJoin operator;
       R: a boolean array, where R[i] is true if O_i is the right child of its parent;
begin
1   if !R[i] AND O_i is partitionable on an attribute attr then
2       Mark O_i as PMJoin;
3       for each child operator O_j of O_i do
4           if O_j is partitionable on attr then
5               Merge O_j into O_i;
6   for each child operator O_j of O_i do
7       FindPartition(O_j);
end

The algorithm starts from the root. If the current operator is found to be partitionable on an attribute, say attr, it is marked as a PMJoin operator. Then, if any child of the current operator is also partitionable on attr, that child is merged with the current operator. Note that after the merge, the prior grandchildren become children of the current operator. These new children are also examined to see whether they can be merged. After the merging attempt, we recursively call the algorithm on each child of the current operator.

In the second step, we try to select some of the PMJoins from those found by the above algorithm. Note that the output stream of a PMJoin consists of a number of substreams that are located at several sites. For example, the result stream S_{1,2,3} of Example 1 consists of three substreams that are located at n_1, n_2, and n_3. Now suppose the result stream has to join with another stream, say S_i, on another attribute. If PMJoin is used to join S_{1,2,3} and S_i, we have to repartition the substreams of S_{1,2,3} that are located at the three nodes. Furthermore, the substreams of S_i may have to be sent to all three of these nodes. This results in high communication cost. Therefore, we opt to impose two constraints on the application of PMJoin. (1) The input streams of a PMJoin should be located at a single node.
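As a concrete illustration of the first step above, the traversal of Algorithm 2 could be sketched as follows. The operator-tree representation (the MJoin class, its fields, and the leaf stream names) is our own minimal assumption for illustration, not the paper's implementation:

```python
# Sketch of Algorithm 2 (FindPartition). The tree representation below
# (class and field names) is illustrative, not taken from the paper.

class MJoin:
    def __init__(self, attr, children, is_right_child=False):
        self.attr = attr                # join attribute if all predicates are
                                        # equalities on it, else None
        self.children = list(children)  # MJoin operators or leaf stream names
        self.is_right_child = is_right_child
        self.is_pmjoin = False

def find_partition(op):
    """Mark partitionable MJoins as PMJoins, merging same-attribute children."""
    if not op.is_right_child and op.attr is not None:
        op.is_pmjoin = True
        changed = True
        while changed:                  # re-examine newly promoted children too
            changed = False
            for child in list(op.children):
                if isinstance(child, MJoin) and child.attr == op.attr:
                    # the child's children (prior grandchildren) become
                    # children of the current operator
                    op.children.remove(child)
                    op.children.extend(child.children)
                    changed = True
    for child in op.children:
        if isinstance(child, MJoin):
            find_partition(child)

# A join on attribute a whose child also joins on a is flattened into a
# single PMJoin over the three streams.
root = MJoin('a', [MJoin('a', ['S1', 'S2']), 'S3'])
find_partition(root)
assert root.is_pmjoin
assert sorted(root.children) == ['S1', 'S2', 'S3']
```

The `while` loop reflects the remark in the text that newly promoted children are searched again for further merging.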
In other words, a PMJoin cannot be the child of another PMJoin. (2) The right child of an MJoin operator cannot be a PMJoin operator; otherwise, the other input streams of the MJoin operator would have to be sent over to the output nodes of that PMJoin. Our heuristic, given below, favors those PMJoins that have high input stream rates, because they may provide more opportunities to reduce the communication cost by using PMJoin.

1. Sort all the PMJoins on their total input stream rates.
2. Remove the one with the largest input stream rate.
3. Remove its parent PMJoin (if any) from the sorted list, and restore it back to one or more MJoin operators.
4. If the list is not empty, go to step 2.

4 Performance Study

In this section, we present a performance study of our techniques. We fully implemented the system using Java. To ease the control of experiments, we use a discrete