Network Processor Architecture for Next Generation Packet Format
NetLogic Microsystems Powers China's Next-Generation Internet Networks

…processor cores, combined with unique scalability from 1 to 128 nxCPUs across multiple XLP devices, delivers more than 20 Mpps of intelligent application-processing performance, making it the industry's highest-performance multicore communications processor for Layer 4-7 intelligent network, service and application processing. Behrooz Abdi, executive vice president and general manager at NetLogic Microsystems, said: "For us, China is an exciting and very important …"
… [this approach] improves network quality and reliability, but it requires the necessary software and hardware upgrades on the MSC Servers and MGWs, which adds complexity to the network data. In addition, the NRI and the TMSI that identifies the user share 25 bits (the NRI takes at most 10 of them). The more bits the NRI uses, the more MSC Servers a single MSC Server pool can accommodate, but the fewer users each MSC Server can serve. An NRI of 4 to 5 bits is therefore generally appropriate, giving each MSC Server a maximum of roughly 1.0-2.0 million users; owing to this limit, the maximum capacity of each MSC POOL is around 32 million users. Because Iu-Flex dispatches traffic using the NRI algorithm, the NRI values of two adjacent MSC Server pools must not repeat; the pools therefore need to be partitioned sensibly, by analogy with the four-color principle (Figure 9).

In Mini-Flex networking, the Iu and A interfaces of the live network are carried mainly over E1 and ATM. A large number of redundant circuits must be reserved to ensure that capacity is unaffected when an MGW is withdrawn, so circuit utilization is very low and the network structure is rather complex.

As [operators and services] multiply and market competition grows ever fiercer, the importance of network stability becomes increasingly prominent. This paper has analyzed the advantages of the MSC POOL + Mini-Flex networking approach and its impact on the network, which contributes positively and effectively to maintaining network stability.
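The NRI sizing trade-off described above is simple arithmetic. Below is a minimal sketch; the 25-bit TMSI share, the 10-bit NRI cap, and the roughly 1 million users per MSC Server come from the text, while the helper function itself is hypothetical:

```python
# Sketch of the NRI sizing trade-off described above.
# Assumptions from the text: the NRI and the per-server user space share
# 25 bits of the TMSI, the NRI may use at most 10 of them, and one
# MSC Server handles on the order of 1-2 million subscribers.

def pool_capacity(nri_bits: int, users_per_server: int = 1_000_000) -> dict:
    assert 1 <= nri_bits <= 10, "the text caps the NRI at 10 bits"
    servers = 2 ** nri_bits                 # distinct NRI values -> servers per pool
    tmsi_space = 2 ** (25 - nri_bits)       # TMSI values left per server
    per_server = min(users_per_server, tmsi_space)
    return {"servers": servers,
            "users_per_server": per_server,
            "pool_users": servers * per_server}

for bits in (4, 5, 6):
    print(bits, pool_capacity(bits))
# With 5 NRI bits: 32 servers x ~1M users ~= 32 million users per pool,
# matching the ceiling quoted above.
```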
Dell EMC S5148F-ON 25GbE Top-of-Rack (ToR) Open Networking Switch Datasheet

The Dell EMC S5148 switch is an innovative, future-ready Top-of-Rack (ToR) open networking switch providing excellent capabilities and cost-effectiveness for enterprise, mid-market, Tier 2 cloud and NFV service providers with demanding compute and storage traffic environments. The S5148F-ON 25GbE switch is Dell EMC's latest disaggregated hardware and software data center networking solution, providing state-of-the-art data plane programmability, backward-compatible 25GbE server port connections, 100GbE uplinks, a storage-optimized architecture, and a broad range of functionality to meet the growing demands of today's data center environment now and in the future.

The compact S5148F-ON design provides industry-leading density with up to 72 ports of 25GbE, or up to 48 ports of 25GbE and 6 ports of 100GbE, in a 1RU form factor. Using industry-leading hardware and a choice of Dell EMC's OS10 or select 3rd-party network operating systems and tools, the S5148F-ON series offers flexibility through configuration profiles and delivers non-blocking performance for workloads sensitive to packet loss. The compact S5148F-ON model provides multi-rate speeds, enabling denser footprints and simplifying migration to 25GbE server connections and 100GbE fabrics.

Data plane programmability allows the S5148F-ON to meet the demands of the converged software-defined data center by offering support for future or emerging protocols, including hardware-based VXLAN (Layer 2 and Layer 3 gateway) support. Priority-based flow control (PFC), data center bridge exchange (DCBX) and enhanced transmission selection (ETS) make the S5148F-ON an excellent choice for DCB environments.

The Dell EMC S5148F-ON model supports the open source Open Network Install Environment (ONIE) for zero-touch installation of alternate network operating systems.

Maximum performance and functionality
The Dell EMC Networking S-Series S5148F-ON is a high-performance, multi-function, 10/25/40/50/100 GbE ToR switch purpose-built for applications in high-performance data center, cloud and computing environments. In addition, the S5148F-ON incorporates multiple architectural features that optimize data center network flexibility, efficiency and availability, including IO-panel-to-PSU or PSU-to-IO-panel airflow for hot/cold aisle environments.

Key applications
• Organizations looking to enter the software-defined data center era with a choice of networking technologies designed to deliver the flexibility they need
• Use cases that require customization of packet processing steps or support for new protocols
• Native high-density 25 GbE ToR server access in high-performance data center environments
• 25 GbE backward compatible to 10G and 1G for future-proofing and data center server migration to faster uplink speeds
• Capability to support mixed 25G and 10G servers on front-panel ports without any limitations
• iSCSI storage deployment, including DCB converged lossless transactions
• Suitable as a ToR or leaf switch in 100G Active Fabric implementations
• As a high-speed VXLAN L2/L3 gateway that connects hypervisor-based overlay networks with non-virtualized infrastructure
• Emerging applications requiring hardware support for new protocols

Key features
• 1RU high-density 25/10/1 GbE ToR switch with up to forty-eight native 25 GbE (SFP28) ports supporting 25 GbE without breakout cables
• Multi-rate 100GbE ports support 10/25/40/50 GbE
• 3.6 Tbps (full-duplex) non-blocking, cut-through switching fabric delivers line-rate performance under full load**
• Programmable packet modification and forwarding
• Programmable packet mirroring and multi-pathing
• Converged network support for DCB and ECN capability
• IO panel to PSU airflow or PSU to IO panel airflow (reversible airflow)
• Redundant, hot-swappable power supplies and fans
• IEEE 1588v2 PTP hardware support
• FCoE transit (FIP Snooping)
• Full data center bridging (DCB) support for lossless iSCSI SANs, RoCE and converged networks
• VRF-lite enables sharing of networking infrastructure and provides L3 traffic isolation across tenants
• 16, 28, 40, 52, 64 10GbE ports available

Key features with Dell EMC Networking OS10
• Consistent DevOps framework across compute, storage and networking elements
• Standard networking features, interfaces and scripting functions for legacy network operations integration
• Standards-based switching hardware abstraction via the Switch Abstraction Interface (SAI)
• Pervasive, unrestricted developer environment via Control Plane Services (CPS)
• Open and programmatic management interface via Common Management Services (CMS)
• OS10 Premium Edition software enables Dell EMC Layer 2 and 3 switching and routing protocols with integrated IP services, quality of service, manageability and automation features
• Platform agnostic via a standard hardware abstraction layer (OCP-SAI)
• Unmodified Linux kernel and unmodified Linux distribution
• OS10 Open Edition software decoupled from the L2/L3 protocol stack and services
• Leverage common open source tools and best practices (data models, commit rollbacks)
• Increase VM mobility regions by stretching L2 VLANs within or across two DCs with unique VLT capabilities
• Scalable L2 and L3 Ethernet switching with QoS, ACLs and a full complement of standards-based IPv4 and IPv6 features, including OSPF, BGP and PBR
• Enhanced mirroring capabilities including local mirroring, Remote Port Mirroring (RPM), and Encapsulated Remote Port Mirroring (ERPM)
• Converged network support for DCB, with priority flow control (802.1Qbb), ETS (802.1Qaz), DCBX and iSCSI TLV
• Rogue NIC control provides hardware-based protection from NICs sending out excessive pause frames
Physical
48 line-rate 25 Gigabit Ethernet SFP28 ports
6 line-rate 100 Gigabit Ethernet QSFP28 ports
1 RJ45 console/management port with RS232 signaling
1 Micro-USB type B optional console port
1 10/100/1000Base-T Ethernet port used as management port
1 USB type A port for external mass storage
Size: 1 RU, 1.72" h x 17.1" w x 18.1" d (4.4 x 43.4 x 46 cm)
Weight: 22 lbs (9.97 kg)
ISO 7779 A-weighted sound pressure level: 59.6 dBA at 73.4°F (23°C)
Power supply: 100-240 VAC 50/60 Hz
Max. thermal output: 1956 BTU/h
Max. current draw per system: 5.73 A/4.8 A at 100/120 V AC; 2.87 A/2.4 A at 200/240 V AC
Max. power consumption: 516 Watts (AC)
Typ. power consumption: 421 Watts (AC) with all optics loaded
Max. operating specifications: operating temperature 32° to 113°F (0° to 45°C); operating humidity 5 to 90% (RH), non-condensing; Fresh Air compliant to 45°C
Max. non-operating specifications: storage temperature -40° to 158°F (-40° to 70°C); storage humidity 5 to 95% (RH), non-condensing

Redundancy
Hot-swappable redundant power supplies
Hot-swappable redundant fans

Performance
Switch fabric capacity: 3.6 Tbps
Packet buffer memory: 16 MB
CPU memory: 16 GB
MAC addresses: up to 512K
ARP table: up to 256K
IPv4 routes: up to 128K
IPv6 routes: up to 64K
Multicast hosts: up to 64K
Link aggregation: unlimited links per group, up to 36 groups
Layer 2 VLANs: 4K
MSTP: 64 instances
LAG load balancing: user configurable (MAC, IP, TCP/UDP port)
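The user-configurable LAG load balancing listed above works by hashing the selected header fields and mapping the result onto one of the LAG's member links, so that all packets of a flow stay on one link. The sketch below is a generic illustration of that idea; the field names and the CRC32 hash are illustrative assumptions, not Dell's actual algorithm:

```python
# Generic illustration of hash-based LAG member selection: hash the
# configured header fields (MAC, IP and L4 ports here, mirroring the
# "User Configurable (MAC, IP, TCP/UDP port)" option above) and pick
# a member link. The hash function is arbitrary, not Dell's own.
import zlib

def lag_member(pkt: dict, members: list, fields=("src_mac", "dst_mac",
                                                 "src_ip", "dst_ip",
                                                 "src_port", "dst_port")) -> str:
    key = "|".join(str(pkt.get(f, "")) for f in fields).encode()
    return members[zlib.crc32(key) % len(members)]

flow = {"src_mac": "00:aa:bb:cc:dd:01", "dst_mac": "00:aa:bb:cc:dd:02",
        "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
        "src_port": 49152, "dst_port": 443}
print(lag_member(flow, ["eth1/1/49", "eth1/1/50"]))
# Every packet of this flow hashes to the same member, preserving
# in-order delivery while spreading distinct flows across links.
```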
IEEE Compliance
802.1AB LLDP; TIA-1057 LLDP-MED; 802.1D Bridging, STP; 802.1p L2 Prioritization; 802.1Q VLAN Tagging, Double VLAN Tagging, GVRP; 802.1Qbb PFC; 802.1Qaz ETS; 802.1s MSTP; 802.1w RSTP; 802.1X Network Access Control; 802.3ab Gigabit Ethernet (1000BASE-T) or breakout; 802.3ac Frame Extensions for VLAN Tagging; 802.3ad Link Aggregation with LACP; 802.3ae 10 Gigabit Ethernet (10GBase-X); 802.3ba 40/100 Gigabit Ethernet (40GBase-SR4, 40GBase-CR4, 40GBase-LR4, 100GBase-SR10, 100GBase-LR4, 100GBase-ER4) on optical ports; 802.3bj 100 Gigabit Ethernet; 802.3i Ethernet (10Base-T); 802.3u Fast Ethernet (100Base-TX) on management ports; 802.3x Flow Control; 802.3z Gigabit Ethernet (1000Base-X) with QSA; ANSI/TIA-1057 LLDP-MED; jumbo MTU support 9,416 bytes

Layer 2 Protocols
802.1D Compatible; 802.1p L2 Prioritization; 802.1Q VLAN Tagging; 802.1s MSTP; 802.1w RSTP; 802.1t RPVST+; 802.3ad Link Aggregation with LACP; VLT Virtual Link Trunking

RFC Compliance
768 UDP; 793 TCP; 854 Telnet; 959 FTP; 1321 MD5; 1350 TFTP; 2474 Differentiated Services; 2698 Two Rate Three Color Marker; 3164 Syslog; 791 IPv4; 792 ICMP; 826 ARP; 1027 Proxy ARP; 1035 DNS (client); 1042 Ethernet Transmission; 1191 Path MTU Discovery; 1305 NTPv4; 1519 CIDR; 1812 Requirements for IPv4 Routers; 1858 IP Fragment Filtering; 1918 Address Allocation for Private Internets; 2131 DHCP (server and relay); 2474 Diffserv Field in IPv4 and IPv6 Headers; 2596 Assured Forwarding PHB Group; 3021 31-bit Prefixes; 3046 DHCP Option 82 (Relay); 3195 Reliable Delivery for Syslog; 3246 Expedited Forwarding PHB; 4364 VRF-lite (IPv4 VRF with OSPF and BGP)*; 5798 VRRP

General IPv6 Protocols
1981 Path MTU Discovery*; 2460 IPv6; 2461 Neighbor Discovery*; 2462 Stateless Address Autoconfiguration; 2463 ICMPv6; 2464 Transmission of IPv6 Packets over Ethernet Networks; 2675 Jumbograms; 2711 IPv6 Router Alert Option; 3587 Global Unicast Address Format; 4007 IPv6 Scoped Address Architecture; 4213 Basic Transition Mechanisms for IPv6 Hosts and Routers; 4291 IPv6 Addressing Architecture; 5095 Deprecation of Type 0 Routing Headers in IPv6; IPv6 management support (Telnet, FTP, TACACS, RADIUS, SSH, NTP)

OSPF (v2/v3)
1587 NSSA; 1745 OSPF/BGP Interaction; 1765 OSPF Database Overflow; 2154 MD5; 2328 OSPFv2; 2370 Opaque LSA; 3101 OSPF NSSA; 3623 OSPF Graceful Restart (helper mode)*

BGP
1997 Communities; 2385 MD5; 2439 Route Flap Damping; 2796 Route Reflection; 2842 Capabilities; 2918 Route Refresh; 3065 Confederations; 4271 BGP-4; 4360 Extended Communities; 4893 4-byte ASN; 5396 4-byte ASN Representation; 5492 Capabilities Advertisement

Linux Distribution
Debian Linux 8.4; Linux kernel 3.16

MIBs
IP MIB (Net-SNMP); IP Forward MIB (Net-SNMP); Host Resources MIB (Net-SNMP); IF MIB (Net-SNMP); LLDP MIB; Entity MIB; LAG MIB; Dell-Vendor MIB; TCP MIB (Net-SNMP); UDP MIB (Net-SNMP); SNMPv2 MIB (Net-SNMP)

Network Management
SNMPv1/2; SSHv2; FTP, TFTP, SCP; Syslog; Port Mirroring; RADIUS; 802.1X; SupportAssist (Phone Home); Netconf APIs; XML Schema; CLI Commit (Scratchpad)

Automation
Control Plane Services APIs; Linux utilities and scripting tools

Quality of Service
Access Control Lists; Prefix List; Route-Map; Rate Shaping (egress); Rate Policing (ingress); scheduling algorithms: Round Robin, Weighted Round Robin, Deficit Round Robin, Strict Priority, Weighted Random Early Detect

Security
2865 RADIUS; 3162 RADIUS and IPv6; 4250, 4251, 4252, 4253, 4254 SSHv2; 4301 Security Architecture for IPsec*; 4302 IPsec Authentication Header*; 4303 ESP Protocol*

Data Center Bridging
802.1Qbb Priority-based Flow Control; 802.1Qaz Enhanced Transmission Selection (ETS)*; Data Center Bridging eXchange (DCBX); DCBX Application TLV (iSCSI, FCoE*)

Regulatory compliance — Safety
UL/CSA 60950-1, Second Edition; EN 60950-1, Second Edition; IEC 60950-1, Second Edition, including all national deviations and group differences; EN 60825-1 Safety of Laser Products Part 1: Equipment Classification Requirements and User's Guide; EN 60825-2 Safety of Laser Products Part 2: Safety of Optical Fibre Communication Systems; FDA Regulation 21 CFR 1040.10 and 1040.11

Emissions & Immunity (EMC compliance)
FCC Part 15 (CFR 47) (USA) Class A; ICES-003 (Canada) Class A; EN 55032:2015 (Europe) Class A; CISPR 32 (International) Class A; AS/NZS CISPR 32 (Australia and New Zealand) Class A; VCCI (Japan) Class A; KN32 (Korea) Class A; CNS 13438 (Taiwan) Class A; CISPR 22; EN 55022; EN 61000-3-2; EN 61000-3-3; EN 61000-6-1; EN 300 386; EN 61000-4-2 ESD; EN 61000-4-3 Radiated Immunity; EN 61000-4-4 EFT; EN 61000-4-5 Surge; EN 61000-4-6 Low Frequency Conducted Immunity

NEBS
GR-63-Core; GR-1089-Core; ATT-TP-76200; VZ.TPR.9305

RoHS
RoHS 6 and China RoHS compliant

Certifications
Japan: VCCI V3/2009 Class A; USA: FCC CFR 47 Part 15, Subpart B:2009, Class A

Warranty
1 Year Return to Depot

Learn more at /Networking
* Future release
** Packet sizes over 147 bytes

IT Lifecycle Services for Networking
Experts, insights and ease: our highly trained experts, with innovative tools and proven processes, help you transform your IT investments into strategic advantages.
Plan & Design: Let us analyze your multivendor environment and deliver a comprehensive report and action plan to build upon the existing network and improve performance.
Deploy & Integrate: Get new wired or wireless network technology installed and configured with ProDeploy. Reduce costs, save time, and get up and running.
Educate: Ensure your staff builds the right skills for long-term success. Get certified on Dell EMC Networking technology and learn how to increase performance and optimize infrastructure.
Manage & Support: Gain access to technical experts and quickly resolve multivendor networking challenges with ProSupport. Spend less time resolving network issues and more time innovating.
Optimize: Maximize performance for dynamic IT environments with Dell EMC Optimize.
Benefit from in-depth predictive analysis, remote monitoring and a dedicated systems analyst for your network.
Retire: We can help you resell or retire excess hardware while meeting local regulatory guidelines and acting in an environmentally responsible way.
Learn more at /Services
Dell EMC Networking S4048T-ON Switch Datasheet

The Dell EMC Networking S4048T-ON switch is the industry's latest data center networking solution, empowering organizations to deploy modern workloads and applications designed for the open networking era. Businesses that have made the transition away from monolithic proprietary mainframe systems to industry-standard server platforms can now enjoy even greater benefits from Dell EMC open networking platforms. By using industry-leading hardware and a choice of leading network operating systems to simplify data center fabric orchestration and automation, organizations can tailor their network to their unique requirements and accelerate innovation. These new offerings provide the flexibility needed to transform data centers. High-capacity network fabrics are cost-effective and easy to deploy, providing a clear path to the software-defined data center of the future with no vendor lock-in.

The S4048T-ON supports the open source Open Network Install Environment (ONIE) for zero-touch installation of alternate network operating systems, including the feature-rich Dell Networking OS.

High-density 1/10G BASE-T switch
The Dell EMC Networking S-Series S4048T-ON is a high-density 100M/1G/10G/40GbE top-of-rack (ToR) switch purpose-built for applications in high-performance data center and computing environments. Leveraging a non-blocking switching architecture, the S4048T-ON delivers line-rate L2 and L3 forwarding capacity within a conservative power budget. The compact S4048T-ON design provides industry-leading density of 48 dual-speed 1/10G BASE-T (RJ45) ports, as well as six 40GbE QSFP+ uplinks, to conserve valuable rack space and simplify the migration to 40Gbps in the data center core. Each 40GbE QSFP+ uplink can also support four 10GbE (SFP+) ports with a breakout cable. In addition, the S4048T-ON incorporates multiple architectural features that optimize data center network flexibility, efficiency and availability, including I/O-panel-to-PSU or PSU-to-I/O-panel airflow for hot/cold aisle environments, and redundant, hot-swappable power supplies and fans. The S4048T-ON supports the feature-rich Dell Networking OS, VLT, network virtualization features such as VRF-lite and VXLAN Gateway, and the Dell Embedded Open Automation Framework.
• The S4048T-ON is the only switch in the industry that supports both traditional network-centric virtualization (VRF) and hypervisor-centric virtualization (VXLAN).
The switch fully supports L2 VXLAN gateway functionality.
• The S4048T-ON also supports Dell EMC Networking's Embedded Open Automation Framework, which provides enhanced network automation and virtualization capabilities for virtual data center environments.
• The Open Automation Framework comprises a suite of interrelated network management tools that can be used together or independently to provide a network that is flexible, available and manageable while helping to reduce operational expenses.

Key applications
• Dynamic data centers ready to make the transition to software-defined environments
• High-density 10GBASE-T ToR server access in high-performance data center environments
• Lossless iSCSI storage deployments that can benefit from innovative iSCSI and DCB optimizations unique to Dell Networking switches
When running Dell Networking OS9, Active Fabric™ implementations for large deployments, in conjunction with the Dell EMC Z-Series, create a flat, two-tier, non-blocking 10/40GbE data center network design:
• High-performance SDN/OpenFlow 1.3 enabled, with the ability to interoperate with industry-standard OpenFlow controllers
• As a high-speed VXLAN Layer 2 gateway that connects hypervisor-based overlay networks with non-virtualized infrastructure

Key features - general
• 48 dual-speed 1/10GbE (SFP+) ports and six 40GbE (QSFP+) uplinks (totaling 72 10GbE ports with breakout cables) with OS support
• 1.44 Tbps (full-duplex) non-blocking switching fabric delivers line-rate performance under full load, with sub-600 ns latency
• I/O panel to PSU airflow or PSU to I/O panel airflow
• Supports the open source ONIE for zero-touch installation of alternate network operating systems
• Redundant, hot-swappable power supplies and fans

Key features with Dell EMC Networking OS9
• Scalable L2 and L3 Ethernet switching with QoS and a full complement of standards-based IPv4 and IPv6 features, including OSPF, BGP and PBR (Policy-Based Routing) support
• VRF-lite enables sharing of networking infrastructure and provides L3 traffic isolation across tenants
• Increase VM mobility regions by stretching L2 VLANs within or across two DCs with unique VLT capabilities such as Routed VLT and VLT Proxy Gateway
• VXLAN gateway functionality for bridging non-virtualized and virtualized overlay networks with line-rate performance
• Embedded Open Automation Framework adds automated configuration and provisioning capabilities to simplify the management of network environments; supports Puppet agent for DevOps
• Modular Dell Networking OS software delivers inherent stability as well as enhanced monitoring and serviceability functions
• Enhanced mirroring capabilities including 1:4 local mirroring, Remote Port Mirroring (RPM), and Encapsulated Remote Port Mirroring (ERPM).
Rate shaping combined with flow-based mirroring enables the user to analyze fine-grained flows.
• Jumbo frame support for large data transfers
• 128 link aggregation groups with up to 16 members per group, using enhanced hashing
• Converged network support for DCB, with priority flow control (802.1Qbb), ETS (802.1Qaz), DCBx and iSCSI TLV
• S4048T-ON supports RoCE and Routable RoCE to enable convergence of compute and storage on Active Fabric
• User port stacking support for up to six units, and unique mixed-mode stacking that allows stacking of the S4048-ON with the S4048T-ON to provide a combination of 10G SFP+ and RJ45 ports in a stack (a sketch of the VXLAN gateway encapsulation mentioned above follows below).
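Several of the features above revolve around the VXLAN gateway, which bridges virtualized overlay networks to non-virtualized infrastructure by encapsulating Ethernet frames. The sketch below illustrates the standard VXLAN encapsulation (RFC 7348) that such a gateway performs in hardware; it is a generic illustration, not Dell code:

```python
# Minimal sketch of VXLAN encapsulation (RFC 7348): the original L2
# frame is wrapped in an 8-byte VXLAN header carrying a 24-bit VNI,
# then carried over UDP (destination port 4789). The outer UDP/IP/
# Ethernet headers are added by the lower layers and omitted here.
import struct

VXLAN_UDP_PORT = 4789

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    assert 0 <= vni < 2**24, "VNI is 24 bits"
    flags = 0x08                                     # I-bit set: VNI valid
    header = struct.pack("!BxxxI", flags, vni << 8)  # 8-byte VXLAN header
    return header + inner_frame

def vxlan_decap(payload: bytes) -> tuple:
    flags, word = struct.unpack("!BxxxI", payload[:8])
    return word >> 8, payload[8:]                    # (vni, inner frame)

vni, frame = vxlan_decap(vxlan_encap(b"\x00" * 64, vni=10042))
assert vni == 10042 and len(frame) == 64
```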
Physical
48 fixed 10GBase-T ports supporting 100M/1G/10G speeds
6 fixed 40 Gigabit Ethernet QSFP+ ports
1 RJ45 console/management port with RS232 signaling
1 USB 2.0 type A port to support mass storage devices
1 Micro-USB 2.0 type B serial console port
1 8 GB SSD module
Size: 1RU, 1.71" x 17.09" x 18.11" (4.35 x 43.4 x 46 cm) (H x W x D)
Weight: 23 lbs (10.43 kg)
ISO 7779 A-weighted sound pressure level: 65 dB at 77°F (25°C)
Power supply: 100-240 V AC 50/60 Hz
Max. thermal output: 1568 BTU/h
Max. current draw per system: 4.6 A at 460 W/100 V AC; 2.3 A at 460 W/200 V AC
Max. power consumption: 460 Watts
Typical power consumption: 338 Watts
Max. operating specifications: operating temperature 32°F to 113°F (0°C to 45°C); operating humidity 5 to 90% (RH), non-condensing
Max. non-operating specifications: storage temperature -40°F to 158°F (-40°C to 70°C); storage humidity 5 to 95% (RH), non-condensing

Redundancy
Hot-swappable redundant power supplies
Hot-swappable redundant fans

Performance — General
Switch fabric capacity: 1.44 Tbps (full-duplex), 720 Gbps (half-duplex)
Forwarding capacity: 1080 Mpps
Latency: 2.8 µs
Packet buffer memory: 16 MB
CPU memory: 4 GB

OS9 performance
MAC addresses: 160K
ARP table: 128K
IPv4 routes: 128K
IPv6 hosts: 64K
IPv6 routes: 64K
Multicast routes: 8K
Link aggregation: 16 links per group, 128 groups
Layer 2 VLANs: 4K
MSTP: 64 instances
VRF-Lite: 511 instances
LAG load balancing: based on Layer 2, IPv4 or IPv6 headers
Latency: sub 3 µs
QoS data queues: 8
QoS control queues: 12
Ingress ACL: 16K
Egress ACL: 1K
QoS: default 3K entries, scalable to 12K

IEEE compliance with Dell Networking OS9
802.1AB LLDP; 802.1D Bridging, STP; 802.1p L2 Prioritization; 802.1Q VLAN Tagging, Double VLAN Tagging, GVRP; 802.1Qbb PFC; 802.1Qaz ETS; 802.1s MSTP; 802.1w RSTP; 802.1X Network Access Control; 802.3ab Gigabit Ethernet (1000BASE-T); 802.3ac Frame Extensions for VLAN Tagging; 802.3ad Link Aggregation with LACP; 802.3ae 10 Gigabit Ethernet (10GBase-X) with QSA; 802.3ba 40 Gigabit Ethernet (40GBase-SR4, 40GBase-CR4, 40GBase-LR4) on optical ports; 802.3u Fast Ethernet (100Base-TX); 802.3x Flow Control; 802.3z Gigabit Ethernet (1000Base-X) with QSA; 802.3az Energy Efficient Ethernet; ANSI/TIA-1057 LLDP-MED; Force10 PVST+; max MTU 9216 bytes

RFC and I-D compliance with Dell Networking OS9
General Internet protocols: 768 UDP; 793 TCP; 854 Telnet; 959 FTP
General IPv4 protocols: 791 IPv4; 792 ICMP; 826 ARP; 1027 Proxy ARP; 1035 DNS (client); 1042 Ethernet Transmission; 1305 NTPv3; 1519 CIDR; 1542 BOOTP (relay); 1812 Requirements for IPv4 Routers; 1918 Address Allocation for Private Internets; 2474 Diffserv Field in IPv4 and IPv6 Headers; 2596 Assured Forwarding PHB Group; 3164 BSD Syslog; 3195 Reliable Delivery for Syslog; 3246 Expedited Forwarding PHB; 4364 VRF-lite (IPv4 VRF with OSPF, BGP, IS-IS and v4 multicast); 5798 VRRP
General IPv6 protocols: 1981 Path MTU Discovery Features; 2460 Internet Protocol, Version 6 (IPv6) Specification; 2464 Transmission of IPv6 Packets over Ethernet Networks; 2711 IPv6 Router Alert Option; 4007 IPv6 Scoped Address Architecture; 4213 Basic Transition Mechanisms for IPv6 Hosts and Routers; 4291 IPv6 Addressing Architecture; 4443 ICMP for IPv6; 4861 Neighbor Discovery for IPv6; 4862 IPv6 Stateless Address Autoconfiguration; 5095 Deprecation of Type 0 Routing Headers in IPv6; IPv6 management support (Telnet, FTP, TACACS, RADIUS, SSH, NTP); VRF-Lite (IPv6 VRF with OSPFv3, BGPv6, IS-IS)
RIP: 1058 RIPv1; 2453 RIPv2
OSPF (v2/v3): 1587 NSSA; 2154 OSPF Digital Signatures; 2328 OSPFv2; 2370 Opaque LSA; 4552 Authentication/Confidentiality for OSPFv3; 5340 OSPF for IPv6
IS-IS: 1142 Base IS-IS Protocol; 1195 IPv4 Routing; 5301 Dynamic hostname exchange mechanism for IS-IS; 5302 Domain-wide prefix distribution with two-level IS-IS; 5303 Three-way handshake for IS-IS point-to-point adjacencies; 5304 IS-IS MD5 Authentication; 5306 Restart signaling for IS-IS; 5308 IS-IS for IPv6; 5309 IS-IS point-to-point operation over LAN; draft-isis-igp-p2p-over-lan-06; draft-kaplan-isis-ext-eth-02
BGP: 1997 Communities; 2385 MD5; 2439 Route Flap Damping; 2545 BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing; 2796 Route Reflection; 2842 Capabilities; 2858 Multiprotocol Extensions; 2918 Route Refresh; 3065 Confederations; 4360 Extended Communities; 4893 4-byte ASN; 5396 4-byte ASN Representations; draft-ietf-idr-bgp4-20 BGPv4; draft-michaelson-4byte-as-representation-05 4-byte ASN Representation (partial); draft-ietf-idr-add-paths-04.txt ADD PATH
Multicast: 1112 IGMPv1; 2236 IGMPv2; 3376 IGMPv3; MSDP; PIM-SM; PIM-SSM
Security: 2404 The Use of HMAC-SHA-1-96 within ESP and AH; 2865 RADIUS; 3162 RADIUS and IPv6; 3579 RADIUS support for EAP; 3580 802.1X with RADIUS; 3768 EAP; 3826 AES Cipher Algorithm in the SNMP User-based Security Model; 4250, 4251, 4252, 4253, 4254 SSHv2; 4301 Security Architecture for IPsec; 4302 IPsec Authentication Header; 4303 ESP Protocol; 4807 IPsec Security Policy DB MIB; draft-ietf-pim-sm-v2-new-05 PIM-SM
Data center bridging: 802.1Qbb Priority-based Flow Control; 802.1Qaz Enhanced Transmission Selection (ETS); Data Center Bridging eXchange (DCBx); DCBx Application TLV (iSCSI, FCoE)
Network management: 1155 SMIv1; 1157 SNMPv1; 1212 Concise MIB Definitions; 1215 SNMP Traps; 1493 Bridges MIB; 1850 OSPFv2 MIB; 1901 Community-Based SNMPv2; 2011 IP MIB; 2096 IP Forwarding Table MIB; 2578 SMIv2; 2579 Textual Conventions for SMIv2; 2580 Conformance Statements for SMIv2; 2618 RADIUS Authentication MIB; 2665 Ethernet-Like Interfaces MIB; 2674 Extended Bridge MIB; 2787 VRRP MIB; 2819 RMON MIB (groups 1, 2, 3, 9); 2863 Interfaces MIB; 3273 RMON High Capacity MIB; 3410 SNMPv3; 3411 SNMPv3 Management Framework; 3412 Message Processing and Dispatching for SNMP; 3413 SNMP Applications; 3414 User-based Security Model (USM) for SNMPv3; 3415 VACM for SNMP; 3416 SNMPv2; 3417 Transport Mappings for SNMP; 3418 SNMP MIB; 3434 RMON High Capacity Alarm MIB; 3584 Coexistence between SNMP v1, v2 and v3; 4022 IP MIB; 4087 IP Tunnel MIB; 4113 UDP MIB; 4133 Entity MIB; 4292 MIB for IP; 4293 MIB for IPv6 Textual Conventions; 4502 RMONv2 (groups 1, 2, 3, 9); 5060 PIM MIB; ANSI/TIA-1057 LLDP-MED MIB; Dell_ITA.Rev_1_1 MIB; draft-grant-tacacs-02 TACACS+; draft-ietf-idr-bgp4-mib-06 BGP MIBv1; IEEE 802.1AB LLDP MIB; IEEE 802.1AB LLDP DOT1 MIB; IEEE 802.1AB LLDP DOT3 MIB; sFlowv5 MIB (version 1.3); DELL-NETWORKING-SMI; DELL-NETWORKING-TC; DELL-NETWORKING-CHASSIS-MIB; DELL-NETWORKING-PRODUCTS-MIB; DELL-NETWORKING-SYSTEM-COMPONENT-MIB; DELL-NETWORKING-TRAP-EVENT-MIB; DELL-NETWORKING-COPY-CONFIG-MIB; DELL-NETWORKING-IF-EXTENSION-MIB; DELL-NETWORKING-FIB-MIB
DELL-NETWORKING-FPSTATS-MIB; DELL-NETWORKING-LINK-AGGREGATION-MIB; DELL-NETWORKING-MSTP-MIB; DELL-NETWORKING-BGP4-V2-MIB; DELL-NETWORKING-ISIS-MIB; DELL-NETWORKING-FIPSNOOPING-MIB; DELL-NETWORKING-VIRTUAL-LINK-TRUNK-MIB; DELL-NETWORKING-DCB-MIB; DELL-NETWORKING-OPENFLOW-MIB; DELL-NETWORKING-BMP-MIB; DELL-NETWORKING-BPSTATS-MIB

Regulatory compliance — Safety
cUS UL 60950-1, Second Edition; CSA 60950-1-03, Second Edition; EN 60950-1, Second Edition; IEC 60950-1, Second Edition, including all national deviations and group differences; EN 60825-1, 1st Edition; EN 60825-1 Safety of Laser Products Part 1: Equipment Classification Requirements and User's Guide; EN 60825-2 Safety of Laser Products Part 2: Safety of Optical Fibre Communication Systems; FDA Regulation 21 CFR 1040.10 and 1040.11

Emissions
International: CISPR 22, Class A; Australia/New Zealand: AS/NZS CISPR 22:2009, Class A; Canada: ICES-003:2016 Issue 6, Class A; Europe: EN 55022:2010+AC:2011 / CISPR 22:2008, Class A; Japan: VCCI V-3/2014.04, Class A & V-4/2012.04; USA: FCC CFR 47 Part 15, Subpart B:2009, Class A

RoHS
All S-Series components are EU RoHS compliant.

Certifications
Japan: VCCI V3/2009 Class A; USA: FCC CFR 47 Part 15, Subpart B:2009, Class A; available with US Trade Agreements Act (TAA) compliance; USGv6 Host and Router certified on Dell Networking OS 9.5 and greater; IPv6 Ready for both Host and Router; UCR DoD APL (core and distribution ASLAN switch)

Immunity
EN 300 386 V1.6.1 (2012-09) EMC for Network Equipment; EN 55022, Class A; EN 55024:2010 / CISPR 24:2010; EN 61000-3-2 Harmonic Current Emissions; EN 61000-3-3 Voltage Fluctuations and Flicker; EN 61000-4-2 ESD; EN 61000-4-3 Radiated Immunity; EN 61000-4-4 EFT; EN 61000-4-5 Surge; EN 61000-4-6 Low Frequency Conducted Immunity

IT Lifecycle Services for Networking
Experts, insights and ease: our highly trained experts, with innovative tools and proven processes, help you transform your IT investments into strategic advantages.
Plan & Design: Let us analyze your multivendor environment and deliver a comprehensive report and action plan to build upon the existing network and improve performance.
Deploy & Integrate: Get new wired or wireless network technology installed and configured with ProDeploy. Reduce costs, save time, and get up and running.
Educate: Ensure your staff builds the right skills for long-term success. Get certified on Dell EMC Networking technology and learn how to increase performance and optimize infrastructure.
Manage & Support: Gain access to technical experts and quickly resolve multivendor networking challenges with ProSupport. Spend less time resolving network issues and more time innovating.
Optimize: Maximize performance for dynamic IT environments with Dell EMC Optimize. Benefit from in-depth predictive analysis, remote monitoring and a dedicated systems analyst for your network.
Retire: We can help you resell or retire excess hardware while meeting local regulatory guidelines and acting in an environmentally responsible way.
Learn more at /lifecycleservices
Learn more at /Networking
Network architecture capabilities - Ericsson

Growing the network's cognitive capabilities for all growing ecosystems #Automation

Intent-driven management using cognitive technologies
Intent can be defined as a "formal specification of all expectations including requirements, goals and constraints given to a technical system". It states which goals to achieve rather than how to achieve them. Intent enables the creation of autonomous sub-systems rather than tightly coupled management workflows.

Cognition is a psychology term referring to an "action or process of acquiring knowledge, by reasoning or by intuition or through the senses" [Oxford]. Cognitive technologies make it possible to implement a technical system with cognitive capabilities using AI techniques such as Machine Learning (ML) and Machine Reasoning (MR). Standardization in the areas of autonomous networks and intent-driven management is ongoing in several standardization organizations (e.g. TMF, 3GPP, ETSI), covering separate aspects of automation that include cognitive technologies such as AI, intent-driven management, digital twins, data-driven management, MLOps, and others.

As a step towards a fully autonomous network and intent-based management of a network, its architecture must be prepared by raising the level of abstraction in management, e.g. with a strong separation of concerns. Each instance of an Intent Management Function (IMF) then has a clear and non-overlapping scope of responsibility for a functional domain in the autonomous network architecture, as shown in Figure 1.

Figure 1. IMFs within an autonomous network architecture

IMFs receive intents from customers and other functions, exchange intents with each other, manage the life cycle of an intent, and coordinate, within their domain of responsibility, the needed actions with other management functions. The internal control loop of an IMF is a cognitive loop of five logical phases: Measurements, Assurance, Proposal, Evaluation and Actuation. Collaboration with, for example, service assurance and service orchestration is also required to ensure fulfilment.
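As a rough illustration of that five-phase loop, here is a minimal, hypothetical sketch. The phase names mirror the text; the class shape, the latency metric and the candidate actions are invented for illustration and are not Ericsson's implementation:

```python
# Hypothetical skeleton of an IMF's cognitive control loop. The five
# phases (measure, assure, propose, evaluate, actuate) follow the text;
# everything else (intent shape, metrics, actions) is illustrative only.
class IntentManagementFunction:
    def __init__(self, intent, domain):
        self.intent = intent          # e.g. {"target_latency_ms": 10}
        self.domain = domain          # adapter to the managed domain

    def run_once(self):
        state = self.domain.measure()                 # 1. Measurements
        gap = self.assure(state)                      # 2. Assurance
        if gap is None:
            return                                    # intent fulfilled
        proposals = self.propose(gap)                 # 3. Proposal
        best = max(proposals, key=self.evaluate)      # 4. Evaluation
        self.domain.apply(best)                       # 5. Actuation

    def assure(self, state):
        target = self.intent["target_latency_ms"]
        return state["latency_ms"] - target if state["latency_ms"] > target else None

    def propose(self, gap):
        # Candidate actions; a real IMF would use ML/MR here.
        return [{"action": "add_capacity", "amount": gap},
                {"action": "reroute", "amount": gap / 2}]

    def evaluate(self, proposal):
        return -proposal["amount"]    # toy utility: prefer the smaller change


class _StubDomain:
    def __init__(self): self.latency = 14.0
    def measure(self): return {"latency_ms": self.latency}
    def apply(self, action): self.latency -= action["amount"] / 2

imf = IntentManagementFunction({"target_latency_ms": 10}, _StubDomain())
imf.run_once()   # one pass of the cognitive loop
```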
Related articles/Additional reading:
Creating autonomous networks with intent-based closed loops
Multi domain orchestration business opportunities

Artificial Intelligence and MLOps
MLOps is a set of processes and technology capabilities for building, deploying, and operationalizing Machine Learning (ML) systems, including how data is refined and transformed to serve the ML system. It aims to unify ML system development and ML system operation in the spirit of DevOps, introducing software in a repeatable, reproducible and fault-tolerant workflow. MLOps thus advocates automation and monitoring in all steps of ML system construction and deployment, with the main goal of shortening time to market while maintaining high confidence in the automated processes of development, verification, and so on.

Certain additional challenges arise when adopting MLOps in highly reliable live telecom networks, such as the need to handle lifecycle management and automatic re-training of the many instances of ML models. Adopting MLOps enables more expedient handling of artifacts such as models, pipelines and datasets in a uniform way across the different stages of the process.

Targeting products and services, both internal and external, will require MLOps to be deployable in several scenarios, e.g. provided as-a-Service (aaS) or as licensed SW/product on the customer site, deployed on cloud infrastructure or on dedicated HW, and likely several more. CSPs' realities vary depending on their choice of cloud infrastructure, with a clear divide between CSPs that select a particular hyperscale cloud provider (HCP) and those that use a private cloud, a choice made for various reasons such as application execution, licensed SW or data storage.

Spreading MLOps over several large HCPs (e.g. AWS, Azure, GCP), with the limited compatibility between their APIs for AI services, requires a certain amount of adaptation of vendors' products and services to each of these HCPs. Although there may be benefits in using HCP tools and services, it will require certain efforts: for transferring data, for data refinement, for consumption, etc. A few abstraction-layer initiatives exist that may help to provide an abstraction layer over the different HCP services; none of these alternatives, however, provides a complete solution to the problem, and AI/ML is not at the top of their priority lists.

Figure 2. Ericsson AI architecture blueprint

Related articles/Additional reading:
Defining AI native: A key enabler for advanced intelligent telecom networks
AI-powered RAN: An energy efficiency breakthrough

Network Reliability, Availability and Resilience (NRAR)
Mobile broadband has become a society-critical service in recent years, with enterprises, governments and private citizens alike relying on its availability, reliability and resilience around the clock. Living up to continuously rising expectations, while simultaneously evolving networks to meet the requirements of emerging use cases beyond MBB, will require the ability to deliver increasingly higher levels of network robustness.

The 5G System (5GS) has been designed to provide the robustness required to support the growth of conventional MBB services, while also offering network support to new business segments and use cases with more advanced NRAR requirements. 5GS delivers new capabilities that enable enterprises with business-critical use cases in segments such as manufacturing, ports and automotive to take a major step forward in their digitalization journeys by replacing older means of communication with the 5GS. These capabilities also benefit mission-critical networks, such as national security and public safety deployments being modernized.

It is important to consider all parts of the network in the definition of robustness (as illustrated by the green part in Figure 3), as the weakest link in the E2E chain sets the limit for the network service characteristics. In addition, network-level design must consider both sunny-day scenarios and different disaster/failure cases in all parts of the network. The large orange section represents both new critical use cases and society-critical use cases with new, tougher requirements. The orange line between the application client and the server highlights the significance of the E2E perspective.

Figure 3. Shifting focus from node/NF-level to network robustness for demanding E2E applications

While both 4G and 5G can provide the high level of robustness required to deliver such services today, new and emerging use cases require new features and mechanisms in the network robustness toolbox. 5GS has been designed to meet even the most challenging network robustness requirements.
Beyond that, the creation of robust networks also requires careful network planning and deployment. The 5GS robustness toolbox consists of both standardized and vendor-specific network features and mechanisms. Highly flexible, it gives CSPs the power to activate the most appropriate mechanisms depending on the use cases and deployment variants. The toolbox also enables CSPs to activate different mechanisms for different user equipment within a single network.

Related articles/Additional reading:
Robustness evolution: Building robust critical networks with the 5G System (PDF)

Traffic Classification and QoS
Traffic classification is the mapping of different applications and application flows from a specific UE to different network resources (e.g. network slices, PDU sessions and radio bearers) in both uplink (UL) and downlink (DL). It is based on mechanisms such as:
NI-QoS (Network Initiated Quality of Service), standardized in 3GPP and based on the establishment of radio bearers and QoS Flows (bearers for short)
L4S (Low Latency, Low Loss, Scalable Throughput), an IETF-defined solution for time-critical communications, ensuring that latency-critical, high-rate applications can build on L4S information in the IP header
URSP (UE Route Selection Policy), standardized by 3GPP for a UE using multiple slices and/or PDU sessions (see the sketch after this list)

These network resources may have different QoS levels associated with them (see Figure 4). NI-QoS and URSP are examples of traffic classification mechanisms with different control points that can be used for QoS support in mobile networks. Additional functionality is needed to support a network with deployed QoS support; one example is SLA and SLA-assurance support. Most applications use multiple application flows with different requirements.

Figure 4. Traffic Classification

Existing 3GPP standards, and products designed to these standards, are not fully prepared to support QoS for data applications beyond VoLTE/IMS, and particular care needs to be taken in the RAN, where the number of radio bearers is limited. Another area of concern is how to handle Net Neutrality and Open Internet (NN/OI), which affects how a CSP can monetize QoS; one way of working within this could be to offer several subscriptions on a single device. The future direction will require a traffic classification toolbox addressing a wide set of needs, able to handle the ongoing alignment, settlement and potential standardization initiatives in the market.
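To make the URSP mechanism above concrete, the sketch below models how a UE might match an application's traffic descriptor against prioritized URSP rules to select a route (slice and data network). The rule structure loosely follows 3GPP's URSP concept; the concrete field names and values are illustrative assumptions, not a standards-accurate encoding:

```python
# Toy model of URSP-style route selection: rules are evaluated in
# precedence order; the first rule whose traffic descriptor matches the
# application decides the slice (S-NSSAI) and data network to use.
# Field names and values are illustrative, not 3GPP-exact encodings.
URSP_RULES = [
    {"precedence": 1, "match": {"app_id": "factory-plc"},
     "route": {"snssai": "SST=2 (URLLC)", "dnn": "industrial"}},
    {"precedence": 2, "match": {"domain": "streaming.example.com"},
     "route": {"snssai": "SST=1 (eMBB)", "dnn": "internet"}},
    {"precedence": 99, "match": {},          # empty descriptor: match-all
     "route": {"snssai": "SST=1 (eMBB)", "dnn": "internet"}},
]

def select_route(traffic: dict) -> dict:
    for rule in sorted(URSP_RULES, key=lambda r: r["precedence"]):
        if all(traffic.get(k) == v for k, v in rule["match"].items()):
            return rule["route"]
    raise LookupError("no URSP rule matched")

print(select_route({"app_id": "factory-plc"}))   # -> URLLC slice
print(select_route({"domain": "other.example"})) # -> default eMBB route
```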
Service exposure
See also the chapter on the Global Network Platform.

As CSPs seek to expand outside telecom and explore the exposure of network capabilities, e.g. to address enterprises, the exposed network resources must be made easy to consume and shaped to fit the needs and desired use cases of enterprises and their partners. To be successful, CSPs need to expand their service portfolio and turn their network into a programmable platform with the capability to onboard new applications, while leveraging their existing connectivity offerings and combining them with cloud and edge offerings from different players.

Exposure can be applied in different places, both in the network and in the device, as illustrated in Figure 5, which is based on the high-level network architecture.

Figure 5. Exposure interfaces

The Z interface layer represents higher-level and domain-specific abstractions, interfaces and services, within environments that developers trust, encapsulating or wrapping the C layer as needed.
The C interface layer contains a collection of northbound exposed capabilities and services of the network, reachable via service exposure frameworks and their APIs/protocols/SDKs, covering domains such as BSS, OSS, Packet Core and communication services.
The Y interface layer is a collection of exposed abstractions of capabilities and services in Z and C from the device side.
The X interface layer is a collection of network services exposed via the modem/UNI interface, typically AT commands; many are standardized, but a large set is proprietary to modem vendors.

Although the Z and C layers are drawn as thin lines (Figure 5), they can contain a set of functions common to all exposed services, e.g. discovery, access control, identity management, throttling, etc. This drives a consistent experience for the different consumers of the APIs (developers, integrators, enterprises, etc.), enabling scale and eliminating the need to proxy through the management, orchestration and monetization layer.

Related articles/Additional reading:
Programmable 5G for the Industrial Internet of Things
Monetizing API exposure for enterprises with evolved BSS
VMware vRealize Network Insight 5.3 Installation Guide

Installing vRealize Network Insight — VMware vRealize Network Insight 5.3. You can download the latest technical documentation from the VMware website: https:///cn/.
VMware, Inc., 3401 Hillview Ave., Palo Alto, CA 94304. VMware Information Technology (China) Co., Ltd.: Beijing office, 8/F (Room 801), Qihao Beijing East Tower, 8 Xinyuan South Road, Chaoyang District, Beijing (/cn); Shanghai office, Rooms 804-809, Shui On Plaza, 333 Huaihai Middle Road, Shanghai (/cn); Guangzhou office, Room 3502, Tower 1, Taikoo Hui, 385 Tianhe Road, Guangzhou (/cn). Copyright © 2020 VMware, Inc. All rights reserved.
Copyright and trademark information

Installing vRealize Network Insight — Contents
About the vRealize Network Insight Installation Guide
1 System Recommendations and Requirements
2 The vRealize Network Insight Installer
  Installation Workflow
  Deploying the vRealize Network Insight Platform OVA
    Deploying with the vSphere Web Client
    Deploying with the vSphere Windows Native Client
  Activating the License
  Generating the Shared Secret
  Setting Up the Network Insight Collector (OVA)
    Deploying with the vSphere Web Client
    Deploying with the vSphere Windows Native Client
  Setting Up the Network Insight Collector (AMI) in AWS for VMware SD-WAN
  Deploying Additional Collectors in an Existing Setup
3 Accessing vRealize Network Insight with an Evaluation License
  Adding a vCenter Server
  Analyzing Traffic Flows
  Generating Reports
4 Planning a Scale-Out Deployment
  Planning a Scale-Out Platform Cluster
  Planning Scale-Out Collectors
  Increasing the Brick Size of a Setup
5 Upgrading vRealize Network Insight
  Online Upgrade
  One-Click Offline Upgrade
  CLI Upgrade
6 Uninstalling vRealize Network Insight
  Removing the Collector IP When NetFlow Is Enabled in vCenter
  Removing the Collector IP When NetFlow Is Enabled in NSX

About the vRealize Network Insight Installation Guide: this guide is intended for administrators or specialists responsible for installing vRealize Network Insight.
Wind River Partners with Cavium to Advance Digital Home Networking

… [built-in] scalability and seamless interoperability across networks and network protocols, while its powerful services platform delivers breakthrough performance. […] Wind River will further develop its long-standing, successful business and engineering strategic collaboration with Cavium in the network infrastructure and product markets, jointly providing customers with highly optimized solutions that meet the needs of the fast-growing home Internet market. These solutions pair Wind River's dependable VxWorks and Wind River Linux with Cavium's energy-efficient ECONA processors. Cavium Networks is a global lead[ing …]

[Blue Coat announced] the Blue Coat Data Loss Prevention (DLP) appliance family, which provides comprehensive data loss prevention in a single integrated device, enabling regulatory compliance without adding complexity. By adding the DLP appliance to its secure gateway solutions, Blue Coat can now defend against inbound malicious threats and outbound data loss at the same time. The Blue Coat DLP appliance integrates data loss prevention for network traffic including email and web content, data in the data center, and files, on a single consolidated platform with a unified management system.

Quantum Launches New Deduplication Appliance
Quantum has announced a new deduplication and replication appliance that uses a VTL interface to bring unmatched performance, simplicity and value to the Fibre Channel SAN environments of midrange and enterprise customers. The DXi6700 appliance delivers backup performance of up to 3.2TB per hour and usable capacity of up to 56TB. This turnkey […]
Extreme Networks Summit X460-G2 Data Sheet

The Summit® X460-G2 series is based on Extreme Networks® revolutionary ExtremeXOS®, a highly resilient OS that provides continuous uptime, manageability and operational efficiency. Each switch offers the same high-performance, non-blocking hardware technology, in the Extreme Networks tradition of simplifying network deployments through the use of common hardware and software throughout the network.

The Summit X460-G2 switches are effective campus edge switches that support Energy Efficient Ethernet (EEE, IEEE 802.3az) with IEEE 802.3at PoE-plus, and can also serve as aggregation switches for traditional enterprise networks. The Summit X460-G2 series is also an option for DSLAM or CMTS aggregation, or for active Ethernet access.

The Summit X460-G2 can also be used as a top-of-rack switch in many data center environments, with features such as high-density Gigabit Ethernet for concentrated data center environments; XNV™ (ExtremeXOS Network Virtualization) for centralized network-based Virtual Machine (VM) inventory, VM location history and VM provisioning; Direct Attach™ to offload VM switching from servers, thereby improving performance; high-capacity Layer 2/Layer 3 scalability for highly virtualized data centers; and intra-rack and cross-rack stacking with industry-leading flexibility.

Comprehensive Security Management
• User policy and host integrity enforcement, and identity management
• Universal Port Dynamic Security Profiles to provide fine-granularity security policies in the network
• Threat detection and response instrumentation to react to network intrusion with the CLEAR-Flow Security Rules Engine
• Denial of Service (DoS) protection and IP security against man-in-the-middle and DoS attacks to harden the network infrastructure

Flexible Port Configuration
Summit X460-G2 offers flexible port configurations. For Summit X460-G2 24-port copper models with 10Gb uplinks, with four dedicated Gigabit Ethernet fiber ports and four shared Gigabit Ethernet fiber ports, the switch can have up to 8 fiber GbE ports while still providing 20 Gigabit Ethernet copper ports (PoE-plus or non-PoE). The Summit X460-G2 24-port copper models with 1Gb uplinks can provide up to 12 SFP ports with 20 Gigabit Ethernet ports, or eight SFP ports with 24 copper GbE ports.

All models come equipped with either 4 ports of SFP+ 10 GbE or 4 ports of SFP 1GbE resident on the faceplate of each model.
Through an optional VIM slot, Summit X460-G2 switches can be equipped with an additional 2 ports of 10 GbE, for a total of six 10 Gigabit Ethernet ports on the 10Gb uplink models. As another option, each unit can be equipped with 2 ports of QSFP+ 40 Gigabit Ethernet for uplinks or stacking.

High-Performance Stacking
Up to eight Summit X460-G2 switches can be stacked using three different methods: SummitStack, SummitStack-V, and SummitStack-V160.

SUMMITSTACK — STACKING USING COPPER CX4 CONNECTIONS
The Summit X460-G2 supports SummitStack by using the Summit X460-G2-VIM-2ss module, which offers high-speed 40 Gbps stacking performance and provides compatibility with the Summit X440, X460, X460-G2 and X480 stackable switches running the same version of ExtremeXOS.

SUMMITSTACK-V — FLEXIBLE STACKING OVER 10GbE
ExtremeXOS supports the SummitStack-V capability using 2 of the native 10 GbE ports on the faceplate as stacking ports, enabling the use of the standard cabling and optics technologies used for 10 GbE SFP+. SummitStack-V provides long-distance 40 Gbps stacking connectivity of up to 40 km while reducing the cable complexity of implementing a stacking solution. SummitStack-V is compatible with Summit X440, X460, X460-G2, X480, X670, X670V, X670-G2 and X770 switches running the same version of ExtremeXOS. SummitStack-V-enabled 10 GbE ports must be physically direct-connected.
Note: Stacking will NOT be supported on the 10GbE fiber VIM and the 10GbE copper VIM with initial X460-G2 shipments.
Note: SummitStack-V is NOT supported on the 1GbE (SFP) front-panel faceplate ports of non-10Gb X460-G2 models.

SUMMITSTACK-V160 — FLEXIBLE STACKING OVER 40GbE
The Summit X460-G2 also supports high-speed 160 Gbps stacking, which is ideal for demanding applications where a high volume of traffic traverses the stacking links, yet bandwidth is not compromised by stacking. SummitStack-V160 can use passive copper cable (up to 3 m), active multi-mode fiber cable (up to 100 m), and QSFP+ optical transceivers for 40 GbE up to 10 km. With SummitStack-V160, the Summit X460-G2 provides a flexible stacking solution inside the data center or central office to create a virtualized switching infrastructure across rows of racks. SummitStack-V160 is compatible with Summit X460-G2, X480, X670V, X670-G2 and X770 switches running the same version of ExtremeXOS.

Intelligent Switching and MPLS/H-VPLS Support
Summit X460-G2 supports sophisticated and intelligent Layer 2 switching, as well as Layer 3 IPv4/IPv6 routing, including policy-based switching/routing, Provider Bridges, bidirectional ingress and egress Access Control Lists, and bandwidth control at 8 Kbps granularity for both ingress and egress. To provide the scalable network architectures used mainly for Carrier Ethernet network deployments, Summit X460-G2 supports MPLS LSP-based Layer 3 forwarding and Hierarchical VPLS (H-VPLS) for transparent LAN services. With H-VPLS, transparent Layer 3 networks can be extended throughout the Layer 3 network cloud by using a VPLS tunnel between the regional transparent LAN services typically built with Provider Bridges (IEEE 802.1ad) technology.

IEEE 802.3at PoE-plus
IEEE 802.3af Power over Ethernet has been widely used at the campus enterprise edge for Ethernet-powered devices such as wireless access points, Voice over IP phones, and security cameras. Ethernet port extenders such as the Extreme Networks ReachNXT™ 100-8t can also utilize PoE, making installation and management easier and reducing maintenance costs.
The newer IEEE 802.3at PoE-plus standard expands upon Power over Ethernet by increasing the power limit up to 30 watts and by standardizing power negotiation using LLDP. Summit X460-G2 supports IEEE 802.3at PoE-plus and supports standards-compliant PoE devices today and into the future.

1588 Precision Time Protocol (PTP)
Summit X460-G2 offers Boundary Clock (BC), Transparent Clock (TC), and Ordinary Clock (OC) modes for synchronizing phase and frequency, allowing the network and the connected devices to be synchronized down to microseconds of accuracy over an Ethernet connection.
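The phase synchronization PTP provides rests on a simple two-way timestamp exchange. The sketch below shows the standard IEEE 1588 offset/delay computation from the four timestamps of a Sync/Delay_Req exchange; the arithmetic is from the standard, while the timestamp values are made up for illustration:

```python
# Standard IEEE 1588 arithmetic: t1 = master sends Sync, t2 = slave
# receives it, t3 = slave sends Delay_Req, t4 = master receives it.
# Assuming a symmetric path, the slave can solve for its clock offset
# and the mean one-way path delay.
def ptp_offset_and_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) - (t4 - t3)) / 2   # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # mean one-way path delay
    return offset, delay

# Illustrative nanosecond timestamps (invented): the slave runs 500 ns
# fast, and the true one-way delay is 1000 ns.
t1 = 0
t2 = t1 + 1000 + 500      # delay + offset
t3 = t2 + 10_000          # slave-side turnaround
t4 = t3 + 1000 - 500      # delay - offset
print(ptp_offset_and_delay(t1, t2, t3, t4))  # -> (500.0, 1000.0)
```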
Audio Video Bridging (AVB)
The X460-G2 series supports IEEE 802.1 Audio Video Bridging to enable reliable, real-time audio/video transmission over Ethernet. AVB technology delivers the quality of service required for today's high-definition and time-sensitive multimedia streams.

Ordering Notes
The X460-G2 base switches do not ship with fan trays or power supplies. The fan tray and power supplies must be ordered separately, as must any of the optional VIMs. There is only one optional VIM slot on each X460-G2 switch. The optional Timing Module has a separate dedicated slot on the back of the X460-G2 switch.

CPU/MEMORY
• 64-bit MIPS processor, 1 GHz clock
• 1 GB ECC DDR3 DRAM
• 4 GB eMMC flash
• 4 MB packet buffer

LED INDICATORS
• Per-port status LED including power status
• System status LEDs: management, fan and power

ENVIRONMENTAL SPECIFICATIONS
• EN/ETSI 300 019-2-1 v2.1.2 - Class 1.2 Storage
• EN/ETSI 300 019-2-2 v2.1.2 - Class 2.3 Transportation
• EN/ETSI 300 019-2-3 v2.1.2 - Class 3.1e Operational
• EN/ETSI 300 753 (1997-10) - Acoustic Noise
• ASTM D3580 Random Vibration Unpackaged 1.5 G

OPERATING CONDITIONS
• Temp: 0°C to 50°C (32°F to 122°F)
• Humidity: 10% to 95% relative humidity, non-condensing
• Altitude: 0 to 3,000 meters (9,850 feet)
• Shock (half sine): 30 m/s² (3 G), 11 ms, 60 shocks
• Random vibration: 3 to 500 Hz at 1.5 G rms

PACKAGING AND STORAGE SPECIFICATIONS
• Temp: -40°C to 70°C (-40°F to 158°F)
• Humidity: 10% to 95% relative humidity, non-condensing
• Packaged shock (half sine): 180 m/s² (18 G), 6 ms, 600 shocks
• Packaged vibration: 5 to 62 Hz at velocity 5 mm/s, 62 to 500 Hz at 0.2 G
• Packaged random vibration: 5 to 20 Hz at 1.0 ASD w/-3 dB/oct. from 20 to 200 Hz
• Packaged drop height: 14 drops minimum on sides and corners at 42 inches (<15 kg box)

REGULATORY AND SAFETY
North American ITE
• UL 60950-1 2nd Ed., Listed Device (U.S.)
• CSA 22.2 #60950-1-03 2nd Ed. (Canada)
• Complies with FCC 21 CFR 1040.10 (U.S. Laser Safety)
• CDRH Letter of Approval (US FDA approval)
European ITE
• EN 60950-1:2007 2nd Ed.
• EN 60825-1+A2:2001 (Laser Safety)
• TUV-R GS Mark by German Notified Body
• 2006/95/EC Low Voltage Directive
International ITE
• CB Report & Certificate per IEC 60950-1 2nd Ed. + National Differences
• AS/NZS 60950-1 (Australia/New Zealand)

EMI/EMC STANDARDS
North American EMC for ITE
• FCC CFR 47 Part 15 Class A (USA)
• ICES-003 Class A (Canada)
European EMC standards
• EN 55022:2006+A1:2007 Class A
• EN 55024:A2-2003 Class A, includes IEC 61000-4-2, 3, 4, 5, 6, 11
• EN 61000-3-2:2006 (Harmonics)
• EN 61000-3-3:2008 (Flicker)
• ETSI EN 300 386 v1.4.1 (2008-04) (EMC Telecommunications)
• 2004/108/EC EMC Directive
International EMC certifications
• CISPR 22:2006 Ed. 5.2, Class A (International Emissions)
• CISPR 24:A2:2003 Class A (International Immunity)
• IEC 61000-4-2:2008 / EN 61000-4-2:2009 Electrostatic Discharge, 8 kV contact, 15 kV air, Criteria A
• IEC 61000-4-3:2008 / EN 61000-4-3:2006+A1:2008 Radiated Immunity 10 V/m, Criteria A
• IEC 61000-4-4:2004 am1 ed.2 / EN 61000-4-4:2004/A1:2010 Transient Burst, 1 kV, Criteria A
• IEC 61000-4-5:2005 / EN 61000-4-5:2006 Surge, 2 kV L-L, 2 kV L-G, Level 3, Criteria A
• IEC 61000-4-6:2008 / EN 61000-4-6:2009 Conducted Immunity, 0.15-80 MHz, 10 V/m unmod. RMS, Criteria A
• IEC/EN 61000-4-11:2004 Power Dips & Interruptions, >30%, 25 periods, Criteria C

COUNTRY SPECIFIC
• VCCI Class A (Japan Emissions)
• ACMA (C-Tick) (Australia Emissions)
• CCC Mark
• KCC Mark, EMC Approval (Korea)

TELECOM STANDARDS
• ETSI EN 300 386:2001 (EMC Telecommunications)
• ETSI EN 300 019 (Environmental for Telecommunications)
• NEBS Level 3 compliant to portions of GR-1089 Issue 4 & GR-63 Issue 3 as defined in SR3580, with exception to the filter requirement
• CE 2.0 compliant

IEEE 802.3 MEDIA ACCESS STANDARDS
• IEEE 802.3ab 1000BASE-T
• IEEE 802.3z 1000BASE-X
• IEEE 802.3ae 10GBASE-X
• IEEE 802.3at PoE-plus
• IEEE 802.3az (EEE)

* Bystander sound pressure is presented for comparison to other products measured using bystander sound pressure. ** Declared sound power is presented in accordance with ISO 7779:2010(E), ISO 9296:2010 per ETSI/EN 300 753:2012-01.

SUMMIT X460-G2 VIM-2T
2-port 10 Gigabit Ethernet module; provides two 10GBase-T copper ports.
SUMMIT X460-G2 VIM-2SS
SummitStack module with two SummitStack stacking ports, providing a 40 Gigabit stacking solution. This stacking module offers compatibility with other Extreme Networks stackable switches: Summit X440, Summit X460, and Summit X480.

Ordered empty.
Required: first power supply, with airflow direction, ordered separately
Optional: redundant/additive power supply, with airflow direction, ordered separately
Optional: Timing Module for SyncE and 1588 PTP, ordered separately
Required: fan tray, with airflow direction, ordered separately
Optional: VIM cards, ordered separately
* = data networking, not stacking

/contact — Phone +1-408-579-2800
©2014 Extreme Networks, Inc. All rights reserved. Extreme Networks and the Extreme Networks logo are trademarks or registered trademarks of Extreme Networks, Inc. in the United States and/or other countries. All other names are the property of their respective owners. For additional information on Extreme Networks trademarks […]

Parallel Processing Letters, © World Scientific Publishing Company

AN EFFICIENT IMPLEMENTATION OF THE BSP PROGRAMMING LIBRARY FOR VIA

YANGSUK KEE and SOONHOI HA*
School of Electrical Engineering and Computer Science, Seoul National University, Seoul, 151-742, Korea

*Correspondence Address: School of Electrical Engineering and Computer Science, Seoul National University, Shinlim-Dong, Kwanak-Gu, Seoul, 151-742, Korea. Tel. 82-2-880-7292. Fax. 82-2-879-1532. Email: {enigma,sha}@iris.snu.ac.kr.

ABSTRACT
Virtual Interface Architecture (VIA) is a light-weight protocol for protected user-level zero-copy communication. In spite of the promised high performance of VIA, previous MPI implementations for GigaNet's cLAN revealed low communication performance. Two main sources of this low performance are the discrepancy in the communication model between MPI and VIA, and the multi-threading overhead. In this paper, we propose a new implementation of the Bulk Synchronous Parallel (BSP) programming library for VIA, called xBSP, to overcome these problems. To the best of our knowledge, xBSP is the first implementation of the BSP library for VIA. xBSP demonstrates that the selection of a proper library is important to exploit the features of light-weight protocols. Intensive use of Remote Direct Memory Access (RDMA) operations leads to high performance, close to the native VIA performance with respect to round-trip delay and bandwidth. Considering the effects of multi-threading, memory registration, and completion policy on performance, we obtained an efficient BSP implementation for cLAN, which was confirmed by experimental results.

Keywords: Bulk Synchronous Parallel, Virtual Interface Architecture, parallel programming library, light-weight protocol, cluster

1. Introduction
Even though the peak bandwidth of networks has increased rapidly over the years, the latency experienced by applications using these networks has decreased only modestly. The main reason for this disappointing performance is the high software overhead [1,2,3], which mainly results from context switches and data copies between the user and the kernel spaces. To overcome these problems, many light-weight protocols have been proposed that move the protocol stacks from the kernel to the user space [4,5,6,7,8,9,10].

One of these protocols is Virtual Interface Architecture (VIA) [6], which was jointly proposed by Intel, Compaq, and Microsoft. The VIA specification describes a network architecture for protected user-level zero-copy communication. For application developers, VIA provides an interface called the Virtual Interface Provider Layer (VIPL).

Even though the VIPL can be used directly to develop applications, it is desirable to build various popular programming libraries, such as PVM [11], MPI [12], and BSPlib [13], on top of it for portability of programs. Two previous works, for example, are the MPI implementations for cLAN by MPI Software Technology (MPI/Pro) [14] and by Rice University [15]. Parallel programming libraries based on other communication protocols can be found in [16,17,18]. The authors of [14] describe many implementation issues such as threading, long messages, asynchronous incoming messages, etc. In particular, they paid attention to the pre-posting constraint of VIA in implementing the asynchronous operations of MPI. The zero-copy strategy of VIA requires that the receiver be ready before the sender initiates its operation, which defines the pre-posting constraint. The results of these studies, however, are somewhat disappointing. Even
Even though the half round trip time (RTT) of cLAN using VIPL is 8.21 µs in our system, that of MPI/Pro is more than five times longer. Furthermore, MPI/Pro achieved only 81.7 percent of the peak bandwidth of VIPL. This means that the MPI library could not be efficiently integrated with VIA.

There are two main causes of this low performance. The primary one is the discrepancy in the communication model between MPI and VIA. VIA does not assume any intermediate buffers due to its zero-copy policy, while the various asynchronous operations of MPI require receiving queues. Therefore, the authors suggested the use of "unexpected queues" on the receiver side to handle asynchronous incoming messages. The implementation then incurs at least one extra copy on the receiver side and requires flow control for the queue. Moreover, they did not use the Remote Direct Memory Access (RDMA) operation for small messages, because only large messages can amortize the overhead of exchanging the addresses of RDMA buffers. The second cause is the overhead due to multi-threading. Although delegating the message handling task to a thread separate from the computation thread seems a good way to structure the implementation, it suffers significant overhead due to thread switching. The multi-threading overhead in our system is over ten microseconds, which is comparable to the application-level round trip delay. This means that the multi-threading overhead negates the gain obtained by reducing the latency at the hardware level.

These two problems motivated us to implement another VIA-based parallel library. In this paper, we implement the BSPlib standard of the Bulk Synchronous Parallel (BSP) programming library. The BSP model [19] was first proposed as a computing model to bridge the gap between software and hardware for parallel processing. Afterwards, it became a viable programming model with BSPlib. The performance of the BSPlib library was shown to be better than MPICH with respect to throughput and predictability [20], which means that BSPlib is not only theoretically but also practically useful. Moreover, the study on BSP clusters [21] demonstrated that the BSPlib library can be accelerated by rewriting the Fast Ethernet device driver to be optimized for the BSPlib operations. One of the main lessons of that study was that optimization with global knowledge about both the transport layer and the parallel library promises higher performance. This perspective is also applicable to implementing parallel libraries on light-weight protocols.
Indeed, BSPlib has a strong operational resemblance to VIA in memory registration, message-passing communication, and direct remote memory access. Our new implementation of BSPlib for cLAN is called express BSP (xBSP). To the best of our knowledge, xBSP is the first implementation of BSPlib for VIA. xBSP demonstrates that selecting a proper library is important in exploiting the features of light-weight protocols. Furthermore, we achieved performance close to that of native VIPL through significant efforts to reduce the overheads due to multi-threading, memory registration, and flow control. xBSP also supports reliable communication by using the reliable delivery mode of VIA.

In the following two sections, we address the key features of VIA required to implement the BSPlib library and discuss how well the library is matched with VIA. After that, we present experimental implementation alternatives to achieve the full performance of VIA. In sections 4 and 5, several benchmarks demonstrate the efficiency of xBSP, and we conclude our discussion in section 6.

2. VIA Features

In this section, we discuss the VIA features that should be carefully considered for an efficient implementation of BSPlib. They concern memory registration, communication mode, and descriptor processing.

2.1. Memory Registration

Table 1. Costs of memory registration and copying (µs)

message length (byte)    1    1K   2K   4K   8K   16K
registration             3    3    3    4    4    5
copying                  1    2    2    10   18   35

Communication buffers in the user space must be registered in order to eliminate data copying between the user space and the kernel space and to provide memory protection. The memory registration cost, however, is not negligible. For example, a Windows NT system experienced over 15 µs of latency for messages smaller than 16 Kbytes [15], while the overhead in our Linux system ranged from 3 to 5 µs, as shown in Table 1. Considering communication delay and copying overhead, it is important to reduce the registration overhead, especially for small messages. The sketch below illustrates this tradeoff.
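Table 1 implies a simple policy: below roughly 2K, copying a message into an already-registered staging buffer is cheaper than registering the user buffer. The following C sketch makes that decision concrete; it is our own illustration, and the threshold, the staging buffer, and the register_memory() wrapper are hypothetical stand-ins for the VIPL registration machinery, not published xBSP code.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

#define COPY_THRESHOLD 2048          /* ~2K: below this, copying beats registering (Table 1) */
#define STAGING_SIZE   (64 * 1024)

static char staging_buf[STAGING_SIZE];   /* imagine this registered once at startup */

/* Hypothetical stand-in for the VIPL memory-registration call. */
static void register_memory(void *addr, size_t len)
{
    printf("registering %zu bytes at %p\n", len, addr);
}

/* Returns the buffer the NIC should DMA from. */
static const void *prepare_send(const void *user_buf, size_t len)
{
    if (len <= COPY_THRESHOLD) {
        memcpy(staging_buf, user_buf, len);   /* one cheap copy, no registration */
        return staging_buf;
    }
    register_memory((void *)user_buf, len);   /* pay registration once, zero-copy send */
    return user_buf;
}

int main(void)
{
    char small[256] = "ping", large[8192] = {0};
    prepare_send(small, sizeof small);
    prepare_send(large, sizeof large);
    return 0;
}
```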
2.2. Communication Mode

After communication buffers are registered, processes can transfer data between the registered buffers. VIA supports two communication modes. One is the traditional message passing mode, in which both the sender and the receiver participate in communication, satisfying the pre-posting constraint. The other is the one-sided communication mode called RDMA, which is an extension of the local DMA operations that allows a user process to transparently access the buffers of a user process on another machine connected to the VIA network.

Fig. 1. Procedure of RDMA write operation

The procedure of the RDMA write operation is illustrated in Fig. 1. First, both processes register their buffers with their VIA device drivers, and process B informs process A of the address of its buffer by explicit message passing to avoid a protection violation. After that, process A initiates its operation by posting descriptors, and the device driver moves data from the user buffer to the network through DMA. When packets arrive at the target machine, the device driver of the target machine moves the data in the reverse way of the sender.

This RDMA operation has several advantages. First, the RDMA operation avoids the descriptor processing overhead in the target process, since it does not require any descriptor in the target process except when the initiator uses the immediate data field of a descriptor. Second, since only the VI-NIC of the target machine is involved in communication, the target process can continue without interruption. Finally, the initiator does not have to worry about flow control for the resources of the target machine. Therefore, we prefer the RDMA mode to the message passing mode.
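To make the initiation step of Fig. 1 concrete, the sketch below shows the information an RDMA write descriptor carries. The structure is a simplified stand-in of our own, not the real VIPL descriptor layout, and post_rdma_write() is a hypothetical wrapper around the VIPL post call.

```c
#include <stdint.h>
#include <stdio.h>

/* Conceptual sketch of the RDMA-write initiation in Fig. 1; simplified,
 * not the actual VIPL descriptor format. */

struct rdma_desc {
    uint64_t local_addr;    /* registered source buffer (process A) */
    uint64_t remote_addr;   /* registered target buffer, learned from process B */
    uint32_t length;
    uint32_t immediate;     /* optional tag; consumes a receive descriptor
                               at the target only when it is used */
};

/* Hypothetical wrapper around the VIPL post-send call. */
static void post_rdma_write(const struct rdma_desc *d)
{
    printf("RDMA write: %u bytes -> remote 0x%llx (imm=%u)\n",
           d->length, (unsigned long long)d->remote_addr, d->immediate);
}

int main(void)
{
    /* Process B's buffer address would arrive via explicit message passing. */
    struct rdma_desc d = { 0x1000, 0xB000, 4096, 0 };
    post_rdma_write(&d);   /* the NIC DMAs the data; B's CPU is not involved */
    return 0;
}
```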
2.3. Descriptor Processing Mode

When there are multiple VI connections to a process, mechanisms like select() in the socket interface are needed. We can implement such mechanisms using the Completion Queue: notifications of descriptor completion from multiple Receive Queues are directed to a single Completion Queue. The Completion Queue can be managed either by a dedicated communication thread or by the user thread itself. When a thread is dedicated to managing the Completion Queue, it prevents the interruption of user threads in a clustered SMP environment. However, this introduces the extra latency of thread switching. On the other hand, the user thread can receive messages directly, at the expense of CPU time, to avoid this multi-threading overhead. Since we aim at low latency communication, the user thread itself takes the role of managing the Completion Queue.

3. BSPlib Implementation

Based on the previous discussion, we explain in this section how well the BSPlib library is matched with VIA and how the library is realized.

3.1. BSP-Registration

In a BSP program, a user can access data in a remote memory block after registering the block with bsp_push_reg(void *ident, int nbytes). The registrations within a superstep take effect after the subsequent barrier synchronization identified by bsp_sync().

In the Oxford implementation [13], each node keeps track of the sequence of registrations and maintains a mapping table between the unique block number and the associated local address; it does not require any explicit message exchange. When a process initiates a one-sided operation with this block number, the target process translates the number into its local address for the block. The main objective of this mechanism is to reduce unnecessary network traffic in the registration step. This low-cost dynamic registration is beneficial for implementing user-level libraries and applications with recursion.

Since registration typically appears at the beginning of a program and rarely afterwards, it may be preferable to speed up ordinary communication operations at the expense of the registration. As discussed in section 2.2, the initiator of RDMA operations must know the address of the remote buffer. In xBSP, each node registers its local buffer with the VI-NIC in bsp_push_reg() and exchanges the addresses in the barrier synchronization step. At the end of the synchronization, each node builds a mapping table between the local address and the corresponding remote addresses. Since each node knows the actual address of the global memory block, it can transfer data to the remote buffer directly using the RDMA operation, unlike the Oxford implementation.

3.2. One-Sided Operation

A process can initiate a one-sided operation on a registered memory block. For example, bsp_hpput(int pid, void *src, void *dst, int offset, int nbytes) writes nbytes of data from the src buffer to the address dst+offset on node pid; the written data is valid in the next superstep. The bsp_hpput() function maps exactly to the RDMA write operation. As the initiator has the address information of the dst buffer after the registration step, it can transfer data to the dst buffer directly. The target process does not have to consider flow control, descriptor posting, or incoming message handling. Furthermore, it is free from multi-threading overhead. Consequently, the bsp_hpput() function is able to bring delay and bandwidth performance close to those of VIPL.
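A minimal sketch of how the registration table of section 3.1 lets bsp_hpput() become a direct RDMA write follows. The table layout, names, and the rdma_write() wrapper are our own illustration under the mechanism described above; the real xBSP internals are not published in this form.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_PROCS 8

/* After bsp_push_reg() and the address exchange at bsp_sync(), each node
 * can translate (pid, local block) into a remote virtual address. */
struct reg_entry {
    void    *local_addr;               /* address passed to bsp_push_reg() */
    uint64_t remote_addr[MAX_PROCS];   /* same block's address on each node */
};

static struct reg_entry table[64];
static int nregs;

/* Hypothetical stand-in for posting an RDMA write through VIPL. */
static void rdma_write(int pid, uint64_t remote, const void *src, int nbytes)
{
    (void)src;
    printf("write %d bytes to node %d at 0x%llx\n",
           nbytes, pid, (unsigned long long)remote);
}

/* bsp_hpput() then needs no cooperation from the target process: */
static void hpput(int pid, const void *src, void *dst, int offset, int nbytes)
{
    for (int i = 0; i < nregs; i++)
        if (table[i].local_addr == dst) {
            rdma_write(pid, table[i].remote_addr[pid] + offset, src, nbytes);
            return;
        }
}

int main(void)
{
    static int block[256];
    table[0].local_addr = block;
    table[0].remote_addr[3] = 0xB000;      /* learned during bsp_sync() */
    nregs = 1;
    int payload = 42;
    hpput(3, &payload, block, 16, sizeof payload);
    return 0;
}
```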
One problem related to the RDMA write operation is how the target process learns of the arrival of a message. There are two possible solutions. One is to enforce the use of a descriptor notifying the end of a message (EOM). An RDMA write operation consumes a descriptor in the Receive Queue only when there is immediate data in the source descriptor, so we can use this feature to mark the end of an RDMA message. When a message consists of n packets, the sender transfers n-1 packets and finishes the nth packet transfer with the EOM tag, while the receiver checks whether a descriptor has been consumed and the returned value is EOM. This approach requires one descriptor per message, while traditional message passing requires n descriptors.

The other approach is to send an additional control message to mark the end of a message. Even though this approach has more overhead than the first, it is preferable in the case of BSPlib. As the messages transferred in a superstep are only available in the next superstep, there is no need to handle incoming messages immediately. Since cLAN supports reliable in-order delivery, the arrival of a packet implies the successful arrival of the preceding packets. Therefore, a series of EOM control messages in a superstep can be replaced by the last EOM control message, and that EOM message can be piggybacked on the barrier synchronization packet. In effect, barrier synchronization is used implicitly to mark the end of transfers in place of EOM control messages.

3.3. Other Issues

The accumulated start-up costs of communication are significant if many small messages are outstanding to the network. This problem has been discussed in other studies [20,22] and can be overcome with a combining scheme. xBSP also combines small messages into a temporary buffer, since the copying overhead of small messages is smaller than the memory registration cost of VIA. This combining method increases the communication bandwidth while sacrificing little round trip time.

Table 2. Total exchange time with eight nodes for cLAN (µs)

message length (byte)   latin square   naive ordering   factor
8K                      1572           2358             1.5
16K                     2719           4871             1.8
32K                     4930           10308            2.1
64K                     9340           21535            2.3

Besides combining, reordering messages helps avoid serialization of message delivery [22], and we use a latin square indexing order to schedule the destinations of messages. A latin square is a p x p square in which all rows and columns are permutations of the integers 1 to p. In comparison, naive ordering distributes messages in the fixed index order implied by the code, as in for(j=0;j<p;j++). As presented in Table 2, the reordering affects the performance for large messages; the speed-up factor increases with the message size. This result indicates that poor destination scheduling can significantly decrease the performance of total exchange.
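A rotation schedule is one simple way to realize a latin square ordering: in round j, every process pid targets (pid + j) mod p, so each round is a permutation and no receiver is hit by two senders at once. The sketch below illustrates this; the paper does not spell out the exact square xBSP uses, so the rotation is our illustrative choice.

```c
#include <stdio.h>

int main(void)
{
    const int p = 4;
    for (int pid = 0; pid < p; pid++) {
        printf("process %d sends to:", pid);
        for (int j = 0; j < p; j++)
            printf(" %d", (pid + j) % p);      /* latin square row */
        printf("\n");
    }
    /* Naive ordering would be dest = j for every pid: all processes hit
     * node 0 first, then node 1, serializing delivery at each receiver. */
    return 0;
}
```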
4. Micro-benchmark Experiments

In this section, we demonstrate that BSPlib can be efficiently implemented on VIA through experimental results with two micro-benchmarks: half round trip time and bandwidth. Our Linux cluster consists of eight nodes connected by an 8-port cLAN switch. Each node has dual Pentium III 550 MHz processors with 256-Mbyte SDRAM and runs the Redhat Linux 6.2 SMP version.

4.1. Preliminary Experiments

We tested a few implementation alternatives to achieve the full performance of VIA and observed the effects of completion policies and threading on the round trip delay.

Fig. 2. Effects of threading and completion policy

With polling, each process repeatedly checks whether the transaction is completed, while with blocking it waits for the completion of the transaction. In the threaded version, a communication thread is dedicated to receiving incoming messages while a user thread continues its computation. Fig. 2 shows that the single-threaded version using polling achieves a significant reduction in delay. However, it is wasteful to dedicate all of the CPU resources to polling, especially in the case of long message transfers. A tradeoff can be made by mixing both schemes: xBSP polls for a certain number of iterations, anticipating the completion of short message transfers, and eventually blocks. Based on these experiments, we chose the single-threaded version using the mixed policy.
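The mixed policy amounts to a bounded polling loop with a blocking fallback, as in the sketch below. The two helpers are hypothetical wrappers around the VIPL completion-queue calls, and POLL_BUDGET is a tunable we invented for illustration; the actual iteration count used by xBSP is not stated in the paper.

```c
#include <stdbool.h>
#include <stdio.h>

#define POLL_BUDGET 1000

static bool cq_poll_once(void)     { return false; }  /* stub: nothing completed yet */
static void cq_wait_blocking(void) { puts("blocked until completion"); }

static void wait_for_completion(void)
{
    for (int i = 0; i < POLL_BUDGET; i++)
        if (cq_poll_once())
            return;              /* fast path: short transfer completed while polling */
    cq_wait_blocking();          /* slow path: yield the CPU for long transfers */
}

int main(void) { wait_for_completion(); return 0; }
```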
4.2. Half Round Trip Time and Bandwidth

With the micro-benchmarks, we measured the half RTT and the bandwidth. To measure the half RTT, two processes send equal amounts of data back and forth repeatedly. We vary the message size from 4 bytes to 64 Kbytes and take the average over 1000 runs. The bandwidth is computed by measuring the latency to transfer 1 Mbyte of data while varying the message size. The baseline is the performance of xBSP using traditional message passing with a single thread. We change the communication mode of VIA from message passing to RDMA and compare xBSP to VIPL and MPI/Pro. The benchmarks use the following configurations:

- VIPL-MP: VIPL using message passing (polling)
- VIPL-RDMA: VIPL using RDMA (polling)
- xBSP-MP: xBSP using message passing (mixed)
- xBSP-RDMA: xBSP using RDMA (mixed)
- MPI/Pro: MPI of MPI Software Technology

Fig. 3. Half round trip time (with combining overhead)
Fig. 4. Bandwidth (without combining advantage)

Fig. 3 and Fig. 4 show the experimental results for round trip delay and bandwidth in the various configurations. For comparison, the results of MPI/Pro [14] are also presented. The VIPL versions exhibit the minimum application-level latency, since they do not perform any supplementary work for communication, such as registration, and use the polling mechanism as the completion policy.

Comparing the two VIPL versions, we can estimate the overhead due to the pre-posting constraint, which includes descriptor posting and flow control. Even though the performance gap is not significant, the RDMA version consistently outperforms the MP version, and the experiments with xBSP show similar results.

According to Fig. 3, xBSP-RDMA is two times slower than VIPL-RDMA with 4-byte packets. The extra latency of xBSP-RDMA mainly results from the copying overhead of message combining and the blocking overhead of the mixed completion policy. In contrast, MPI/Pro is 8.8 times slower than VIPL-RDMA; on average, xBSP shows at least twice lower latency than MPI/Pro for small messages. In terms of peak bandwidth, xBSP-RDMA achieves about 94% of the VIPL bandwidth, while MPI/Pro achieved only 82%. Consequently, these results demonstrate that xBSP exploits the VIA features more effectively than MPI/Pro.

5. Benchmark Experiments

Even though micro-benchmarks can be used to measure the basic link properties, high micro-benchmark performance does not guarantee the same benefit in real applications. To evaluate the performance rigorously, we measure the BSP cost parameters and then the execution times of several real applications.

5.1. BSP Cost Model

The BSP model abstracts a parallel machine into three components, a set of processors, an interconnection network, and a barrier synchronizer, which are parameterized as {p, g, l}. Parameter p represents the number of processors in the cluster; parameter g, the gap between consecutive message sending operations; and parameter l, the barrier synchronization latency. A BSP program consists of a sequence of supersteps separated by barrier synchronizations. In every superstep, each process performs local computation or exchanges messages, which become available in the next superstep. Hence, the execution time for superstep i is modeled by w_i + g*h_i + l, where w_i is the longest duration of local computation in the ith superstep and h_i is the largest amount of packets exchanged by a process during this superstep.

Table 3. BSP cost parameters, s (Mflop/s) = 121

                xBSP-RDMA                  xBSP-MP                 BSPlib-UDP/IP
      L (µs)      g (µs/word)     L (µs)      g (µs/word)     L (µs)      g (µs/word)
P     min  max    shift  total    min  max    shift  total    min  max    shift  total
2     17   23     0.077  0.103    19   52     0.086  0.110    136  320    0.40   0.42
4     37   50     0.077  0.086    42   71     0.092  0.102    271  441    0.37   0.40
6     73   89     0.079  0.083    80   112    0.109  0.115    406  687    0.46   0.49
8     109  123    0.079  0.084    108  145    0.105  0.110    433  764    0.48   0.53

In Table 3, the cost parameters of xBSP and of the Oxford BSPlib implementation using UDP/IP over Fast Ethernet are compared. These parameters serve as a measure of the entire system under a non-trivial workload. The s parameter represents the instruction execution rate of each processor, taken from the average execution time of matrix multiplication and dot products. The minimum L value is the average latency of a long sequence of bsp_sync() calls, while the maximum value is the average latency of a long sequence of bsp_hpput()/bsp_sync() pairs with a one-word message. The g parameter is a measure of the global network bandwidth, not the point-to-point bandwidth: a smaller g value means higher global bandwidth. With the shift communication pattern, each process sends data to its neighbor; with total exchange, it broadcasts.

xBSP-RDMA exhibits much lower synchronization latency and higher bandwidth (smaller gap) than the others. xBSP-RDMA achieves a roughly constant global bandwidth of about 381 Mbps and xBSP-MP about 291 Mbps, while the performance of BSPlib-UDP/IP decreases beyond four nodes. xBSP therefore shows good scalability characteristics, and the RDMA operations are well matched with the BSPlib interfaces.
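As a worked illustration of the cost model, the snippet below plugs the P = 8 parameters from Table 3 into the superstep formula; the workload values (w_i and h_i, with h_i taken in words) are invented for the example and are not measurements from the paper.

```latex
% Superstep cost model with Table 3 parameters (illustrative numbers).
\[
  T_i \;=\; w_i + g\,h_i + l
\]
% xBSP-RDMA at P = 8 (g = 0.084 us/word, l = 123 us), for a superstep
% with w_i = 100 us of computation and h_i = 2400 words exchanged:
\[
  T_i \;=\; 100 + 0.084 \times 2400 + 123 \;\approx\; 425\ \mu s
\]
% BSPlib-UDP/IP at P = 8 (g = 0.53, l = 764) for the same superstep:
\[
  T_i \;=\; 100 + 0.53 \times 2400 + 764 \;\approx\; 2136\ \mu s
\]
```

The gap between the two estimates is dominated by the g and l terms, which is consistent with the communication-bound application results reported next.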
5.2. Applications

In this section, we compare the BSPlib libraries using the following two applications:

- ES: an application that solves a grid problem with a 300x300 matrix [23]
- LU: an application that solves a linear equation using LU decomposition [24]

Fig. 5. Execution time of ES with a 300 by 300 matrix

The execution time of the grid solver is presented in Fig. 5. The values above the bars represent the ratio of the sum of communication and synchronization times relative to xBSP-RDMA. In the grid solver program, each process exchanges data with its neighbors, so the communication pattern is similar to the shift pattern. Since ES spends most of its time (about 5.9 sec) in computation in the two-node case, the performance gap between xBSP-RDMA and xBSP-MP is not large. In contrast, since the packet size transferred in a superstep is 2400 bytes, the latency reduction of xBSP-RDMA over BSPlib/UDP is about 6.2. Fig. 5 coincides with the expected result, where the sum of the global communication time and the synchronization time is reduced to about 18%. As the number of nodes increases, the portion of computation decreases and the communication and synchronization costs become significant. xBSP-RDMA always outperforms both xBSP-MP and BSPlib/UDP.

Fig. 6. LU decomposition on two by two nodes

Fig. 6 shows the execution time of LU decomposition measured on four processors while varying the size of the input matrix. In the LU decomposition program, broadcast operations with small h-relations and barrier synchronizations are repeated; it therefore provides a good measure of the communication latency of a system. xBSP-RDMA shows about 1.4 times better communication performance than xBSP-MP, as expected from the cost model.

6. Conclusions

In this paper, we presented an efficient implementation of BSPlib for VIA called xBSP. xBSP demonstrates that BSPlib is more appropriate than MPI for exploiting the features of VIA. Furthermore, we achieved application performance similar to the native performance of VIPL by reducing the overheads associated with multi-threading, memory registration, and flow control.

Even though we focused only on implementing BSPlib, there are many possibilities for improving performance by relaxing the BSPlib semantics. In particular, barrier synchronization costs could be reduced by adopting mechanisms such as relaxed barrier synchronization [25] and zero-cost synchronization [26]. Currently, we are building a programming environment based on xBSP-RDMA for heterogeneous cluster systems that adopts a dynamic load balancing scheme.
Acknowledgements

This work was supported by the National Research Laboratory program (number M1-0104-00-0015). The RIACT at Seoul National University provided research facilities for this study.

References

[1] R. Caceres, P. B. Danzig, S. Jamin, and D. J. Mitzel, Characteristics of Wide-Area TCP/IP Conversations, ACM SIGCOMM Computer Communication Review 21(4) (1991), 101–112.
[2] J. Kay and J. Pasquale, The Importance of Non-Data Touching Processing Overheads in TCP/IP, ACM SIGCOMM Computer Communication Review 23(4) (1993), 259–268.
[3] J. Kay and J. Pasquale, Profiling and Reducing Processing Overheads in TCP/IP, IEEE/ACM Trans. Networking 4(6) (1996), 817–828.
[4] R. A. F. Bhoedjang, T. Ruhl, and H. E. Bal, User-Level Network Interface Protocols, IEEE Computer 31(11) (1998), 53–60.
[5] G. Chiola and G. Ciaccio, GAMMA: A Low Cost Network of Workstations Based on Active Messages, in Proc. PDP'97, 1997.
[6] D. Dunning, G. Regnier, G. McAlpine, D. Cameron, B. Shubert, F. Berry, A. Marie Merritt, E. Gronke, and C. Dodd, The Virtual Interface Architecture, IEEE Micro 18(2) (1998), 66–76.
[7] S. Pakin, V. Karamcheti, and A. A. Chien, Fast Messages: Efficient, Portable Communication for Workstation Clusters and MPPs, IEEE Concurrency 5(2) (1997), 60–72.
[8] L. Prylli and B. Tourancheau, BIP: A New Protocol Designed for High Performance Networking on Myrinet, PC-NOW'98, Vol. 1388 of Lect. Notes in Comp. Science, April 1998, 472–485.
[9] T. von Eicken, A. Basu, V. Buch, and W. Vogels, U-Net: A User-Level Network Interface for Parallel and Distributed Computing, Operating Systems Review 29(5) (1995), 40–53.
[10] T. von Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser, Active Messages: A Mechanism for Integrated Communications and Computation, Proc. 19th Symp. on Computer Architecture, May 1992, 256–266.
[11] V. S. Sunderam, PVM: A Framework for Parallel Distributed Computing, Concurrency: Practice and Experience 2(4) (1990), 315–339.
[12] Message Passing Interface Forum, MPI: A Message Passing Interface Standard, Tech. Report Version 1.1, Univ. of Tennessee, Knoxville, Tenn., 1995.
[13] J. M. D. Hill, B. McColl, D. C. Stefanescu, M. W. Goudreau, K. Lang, S. B. Rao, T. Suel, T. Tsantilas, and R. Bisseling, BSPlib: The BSP Programming Library, Parallel Computing 24(14) (1998), 1947–1980.
[14] R. Dimitrov and A. Skjellum, An Efficient MPI Implementation for Virtual Interface (VI) Architecture-Enabled Cluster Computing, MPI Software Technology, Inc.
[15] E. Speight, H. Abdel-Shafi, and J. K. Bennett, Realizing the Performance Potential of the Virtual Interface Architecture, ICS'99, June 1999, 184–192.
[16] M. Lauria and A. Chien, MPI-FM: High Performance MPI on Workstation Clusters, Journal of Parallel and Distributed Computing 40(1) (1997), 4–18.
[17] L. Prylli, B. Tourancheau, and R. Westrelin, The Design for a High Performance MPI Implementation on the Myrinet Network, EuroPVM/MPI'99, Vol. 1697 of Lect. Notes in Comp. Science, September 1999, 223–230.
[18] J. Worringen and T. Bemmerl, MPICH for SCI-connected Clusters, SCI Europe'99, September 1999, 3–11.
[19] Leslie G. Valiant, A Bridging Model for Parallel Computation, Comm. ACM 33(8) (1990), 103–111.
[20] S. R. Donaldson, J. M. D. Hill, and D. B. Skillicorn, Predictable Communication on Unpredictable Networks: Implementing BSP over TCP/IP, Concurrency: Practice and Experience 11(11) (1999), 687–700.
[21] S. R. Donaldson, J. M. D. Hill, and D. B. Skillicorn, BSP Clusters: High Performance, Reliable, and Very Low Cost, Parallel Computing 26(2-3) (2000), 199–242.
[22] J. M. D. Hill and D. B. Skillicorn, Lessons Learned from Implementing BSP, Journal of Future Generation Computer Systems 13(4-5) (1998), 327–335.
[23] D. E. Culler and J. Pal Singh, Parallel Computer Architecture, Morgan Kaufmann Publishers, Inc., 1999, 92–116.
[24] R. Bisseling, BSPEDUpack, /implmnts/oxtool.htm.
[25] J. S. Kim, S. Ha, and C. S. Jhon, Relaxed Barrier Synchronization for the BSP Model of Computation on Message-Passing Architectures, Information Processing Letters 66(5) (1998), 247–253.
[26] O. Bonorden, B. Juurlink, I. von Otte, and I. Rieping, The Paderborn University BSP (PUB) Library - Design, Implementation and Performance, IPPS/SPDP'99, April 1999, 99–104.
Network Processor Architecture for Next Generation Packet Format

Ankush Garg
Department of Computer Science
University of California, Davis
Davis, California 95616
garg@

Prantik Bhattacharyya
Department of Computer Science
University of California, Davis
Davis, California 95616
pbhattacharyya@

ABSTRACT

Over the past few years, network processors have become an important component in packet delivery over the internet, with their packet processing rate determining how much of the transmission medium's bandwidth can actually be used. The architectures of network processors have mainly focused on achieving a particular line rate for IPv4 packets. As the usage of the new IPv6 format increases, changes to architecture models become a primary concern. In this paper, we propose to modify an existing network processor architecture to improve performance by reducing the overhead of pre-processing a packet header. We also introduce the concept of a cache for forwarding-address lookup, used to find the packet destination address, to increase the efficiency of the lookup system.

1 Introduction

One of the most vital components of any network system is the path taken by a packet as it travels from host to destination machines. The current internet infrastructure is based on best-effort delivery, i.e., the setup is such that it will make its best effort to send the packet to its destination. The system does not work on the model of providing a guaranteed delivery service, i.e., it never promises the host or the destination machine that it will be able to deliver the packet with a hundred percent certainty. This model simplifies the building of the components, since service providers are never required to promise customers more than best-effort delivery.

With this model in focus, network processors sit in the data path and support data transmission over the infrastructure. We have so far used the term 'data' instead of packet so as to give an overview of internet data delivery without referring to packet switching or any other technology. From now on we will use the term packet, as packet switching is the predominant technology and all the components being built satisfy its demands. The main function of a network processor [1][2] is to accept packets on its incoming ports and, using the processors built inside it, 'route' these packets appropriately, i.e., forward them through the appropriate output ports.
Thus, the architecture of the router becomes vital to the packet transmission rate. As more and more bandwidth becomes available in the physical path, the performance of the router becomes a bottleneck to the packet forwarding rate [3]. A lot of research has been done to build better router architectures [4] so that routers can do justice to the bandwidth that is otherwise available. One of the primary focuses of research has been analyzing architectures that can exhaustively use the packet formats to make forwarding decisions. As the packet format changes from IPv4 to IPv6, a major change in the header format takes place. The change between these versions is not merely a four-fold upgrade from the 32-bit addresses of IPv4 to the 128-bit addresses of IPv6, but a substantial change in the header fields themselves; in other words, the packet formats, and especially the header formats, have been redesigned for the new version. As this transition takes place, one of the key challenges has been upgrading or installing new network processors in the data path so that the new formats can be identified and processed accordingly. Current technology solutions have mainly focused on integrating IPv6 packets with IPv4 packets, as the restricted deployment of the newer version has not provided companies enough motivation to deploy IPv6-only network processors. As the upgraded version finds widespread popularity, a key research issue has thus become designing new architecture models for network processors so that the new characteristics of IPv6 can be properly used for higher performance.

In this paper, we present our research on possible router architectures suitable for next generation packet formats. The paper is organized as follows. In Section 2, we survey existing network processors and their architectures. We also discuss how the current processing of IPv4 is done in these network processors and how IPv6 deployment on the current architectures may perform, thus stating the motivation for this research. In Section 3, we give details of our model. In Section 4, we mathematically analyze the performance enhancement that our model can provide, and Section 5 deals with the performance evaluation of the proposed model. These are followed by the conclusion and possible future work in Section 6.

2 Background

In this section, we study why network processors are required and the architectures of existing network processors. We observe that, due to the advent of optical fibres, the data transfer rate across transmission lines has increased to the order of Gbps (Gigabits/second).
According to Intel [5], a packet arrives every 35 ns on a 10 Gbps connection and every 8 ns on a 40 Gbps connection. Assuming a minimum packet size of 40 bytes, from Sherwood et al. [6] we observe that a line card operating at 10 Gbps can process up to 32 million packets every second. General purpose processors are incapable of handling data at such high rates, which has led to the use of Network Processors (NPs).
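The arithmetic behind these rates is simply line rate divided by packet size in bits; the short program below reproduces the figures quoted above and the packet rate used later in the simulation section.

```c
#include <stdio.h>

/* packets per second = line_rate / (packet_size_in_bits) */
static double pps(double gbps, double bytes)
{
    return gbps * 1e9 / (bytes * 8.0);
}

int main(void)
{
    /* 40-byte minimum packets at 10 Gbps: about 31 million packets/s,
     * i.e. one packet every ~32 ns, the order of magnitude of the
     * 35 ns figure quoted from Intel. */
    printf("40B  @ 10Gbps: %.2e packets/s\n", pps(10, 40));

    /* 550-byte average packets at 10 Gbps: the 2272727.3 packets/s
     * rate used in the simulations of Section 5. */
    printf("550B @ 10Gbps: %.1f packets/s\n", pps(10, 550));
    return 0;
}
```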
An excellent starting point for understanding what a network processor is, is given by Shah [1]. Extending that knowledge, we can define a network processor as a processing unit targeted at networking applications; in other words, a device with architectural features and/or special circuitry for packet processing. Network processors are otherwise quite similar to the general purpose central processing units used in many types of equipment and products. Below we discuss the architecture of a network processor.

Figure 1. A General Network Processor

A network processor has mainly a central processing unit where the data is processed. It also has two types of memory units, termed SRAM (Static Random Access Memory) and DRAM (Dynamic Random Access Memory). DRAM is generally used to store the large forwarding table(s), while SRAM is used to store the small incoming packets. A table lookup is done to determine the next forwarding address, other instructions are executed, and the packets are then put onto the output line. Figure 1 shows the environment in which a general network processor works. The core is equipped with special instructions suited to the demands of networking applications, i.e., instructions to speed up packet processing. These instructions are generally optimized to yield better performance for the specialized demands of packet transmission over a network.

Figure 2. Multi Core Network Processor

Network processors have evolved over time, and most processors currently available have multiple cores, with each core capable of running multiple threads. Figure 2 shows a high level diagram of this kind of processor. We have based our work on the Intel IXP2800 processor, which has a multi-core architecture capable of running threads and handling more than one packet at a time. A more detailed discussion of this processor is given later. The following paragraphs of the current section discuss the motivation for our project.

Thus, we have NPs with special instructions to handle packets. It is also interesting to note that in packet processing it is mainly the header of a packet that is processed; the data part of the packet is ignored for most of the computation. Thus, it becomes essential to have NPs with architectures and instruction sets suited specifically to header formats. Let us take a look at the packet formats of IPv4 and IPv6 (shown in Figures 3 and 4) to understand what adaptations are required to make NPs better suited to the new version of the packet format, and at why such a revision of the packet format was required.

Figure 3. IPv4 Packet
Figure 4. IPv6 Packet

The IP address in IPv4 is 32 bits long, implying that the maximum number of distinct IP addresses that can be assigned is 2^32. The phenomenal growth of computers and their connectivity to the internet has led to an IP address exhaustion problem [7]. This inspired research into a new addressing mode that can provide enough IP addresses for everyone asking for one and yet remain unexhausted over the years. IPv6 has emerged as the solution, and large scale deployment of IPv6 has already started. An IP address in IPv6 is 128 bits long and can thus represent 2^128 addresses, a number so huge that exhaustion of the available IP addresses is virtually impossible. As the packet header format changes, it naturally becomes important to look into other issues that may be affected by this change. Correspondingly, network processors also need to adapt to these changes in the near future.

In the new header format of IPv6, the network processor needs to read the 128 bits of the IP address(es) present in the header and then determine the next hop address for the packet from its forwarding table. We also observe that in IPv6 the header size is fixed at 40 bytes, which means that no preprocessing is required to separate the header and the data in a packet, and that a number of instructions need not be carried out compared to the previous packet format. For example, as the checksum field has been entirely removed from IPv6, the processor need not carry out the complex computation over the data bits to match the CRC field. IPv6 also removes in-network fragmentation, implying that an incoming packet need not be broken down into more packets, as was sometimes the case with IPv4. Thus, one may think of separating the header and the data of a packet at the input line itself so that faster processing can be done. This provided us with the possibility of doing research on this topic. In the next sections, we describe our proposed model and present our simulation results.

3 Proposed Model

We decided to work on the Intel IXP2800 network processor, as it has been widely used in industry. The Intel product brief [8] says that the IXP2800 is based on the same model as the earlier IXP1200 version. Basically, the chip has one XScale core that runs at 700 MHz and 16 microengines, each working at a frequency of 1.4 GHz. The microengines are capable of handling multiple threads. The SRAM unit in the IXP2800 supports 800 Mbytes/sec reads and 800 Mbytes/sec writes; the SRAM interface has a peak bandwidth of 1.6 GBytes/sec per channel using 200 MHz SRAMs.

Figure 5. Existing Model of Processing in IXP2800

In the existing model, as shown in Figure 5, the incoming packet first comes to the XScale core. The core checks the version of the packet (IPv4 or IPv6) and, after doing the CRC check and other operations, forwards the packet information to a microengine that is free and puts the packet in SRAM. If the input buffer is full, the packet is dropped. The information provided to the microengine is the location in SRAM where the packet is stored. The microengine operates on the packet and finds the output line on which the packet needs to be sent by comparing the source and destination addresses of the packet with the data of the forwarding table stored in DRAM.
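The sketch below restates the Figure 5 flow in plain C. It is conceptual only, not IXP2800 microcode, and all of the helper functions are invented stand-ins; its point is that the microengine receives only an SRAM handle, so it must read the addresses back out of SRAM before it can start the lookup.

```c
#include <stdbool.h>
#include <stdio.h>

struct packet { unsigned version; unsigned char bytes[1518]; };

static bool sram_buffer_full(void)              { return false; }
static int  sram_store(const struct packet *p)  { (void)p; return 7; }
static void dispatch_to_free_microengine(int h) { printf("handle %d queued\n", h); }

static void xscale_core_rx(const struct packet *p)
{
    if (sram_buffer_full())
        return;                               /* input buffer full: drop */
    if (p->version == 4) {
        /* IPv4 only: verify header checksum, handle options, etc. */
    }
    int handle = sram_store(p);               /* whole packet into SRAM */
    dispatch_to_free_microengine(handle);     /* the microengine must later
                                                 fetch the header from SRAM
                                                 before the DRAM lookup */
}

int main(void)
{
    struct packet p = { .version = 4 };
    xscale_core_rx(&p);
    return 0;
}
```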
Also notable is the fact that the microengine has to fetch the source and destination addresses of the packet from SRAM. This introduces latency, as many cycles are wasted in the process due to the slow speed of SRAM. The packet is then sent on the selected output line.

3.1 Reducing Latency

The IPv6 packet has a fixed header size, and no CRC check needs to be done on an IPv6 packet. This suggests that the XScale core can be replaced by a fast and less complex processing unit. As no extra checks need to be done on the packet, the only work left to the core is to direct packets to a free microengine. As shown in Figure 6, we propose to replace the core with another microengine operating at 1.4 GHz. Now the header of the packet is forwarded to a microengine and the rest of the data is stored in SRAM. Along with the header, the microengine also gets the location of the packet in SRAM. As the microengine does not need to fetch the source and destination addresses from SRAM, it can start processing the header immediately, reducing the idle time. After finding the output line for a packet, the microengine fetches the rest of the packet data from SRAM in the last step, reassembles the packet, and sends it on the output line. Clearly, the microengine can start working on the packet without wasting any time, thus reducing the latency.

Figure 6. Proposed Model of Processing in IXP2800

3.2 Introducing Cache

Another important point, as noted by Hu et al. [9], is that the microengines have no caches. An important observation made by Harai et al. [10], however, is that 40% of the packets follow the same path on the network. This is very significant, because the microengine has to read the forwarding table from DRAM again and again. The DRAM is even slower than the SRAM, implying that for each packet many cycles are wasted fetching the forwarding table data. Thus, the lookup time can be reduced drastically by introducing a cache. Since 40% of the time the microengine needs the same data from DRAM, the hit rate can be taken to be 0.4. This means that a cache can be incorporated to reduce the memory access time. The architectural modifications can be summarized as follows (a sketch of the cache lookup follows the list):

1. Modification 1 (M1) - Replace the XScale core with a microengine to exploit the change in the header from IPv4 to IPv6, and forward the header to the microengine directly, reducing the SRAM access time.
2. Modification 2 (M2) - Provide the microengines with a cache to exploit the inherent property of the internet that many packets follow a similar path.
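The following sketch illustrates M2 as a small direct-mapped next-hop cache in front of the DRAM forwarding table. The cache size, hash, and helper functions are our own choices for illustration (the paper leaves the cache organization and replacement policy to future work), and a 64-bit key stands in for a full 128-bit IPv6 destination.

```c
#include <stdint.h>
#include <stdio.h>

#define CACHE_LINES 256

struct cache_line { uint64_t dst; int next_hop; int valid; };
static struct cache_line cache[CACHE_LINES];

/* Slow path: stands in for the many-cycle DRAM table walk. */
static int dram_table_lookup(uint64_t dst)
{
    return (int)(dst % 16);                  /* dummy forwarding decision */
}

static int lookup_next_hop(uint64_t dst)     /* dst: 64-bit stand-in for IPv6 */
{
    struct cache_line *line = &cache[dst % CACHE_LINES];
    if (line->valid && line->dst == dst)
        return line->next_hop;               /* ~40% of packets hit here */
    int hop = dram_table_lookup(dst);        /* miss: pay full DRAM latency */
    *line = (struct cache_line){ dst, hop, 1 };
    return hop;
}

int main(void)
{
    printf("first lookup : port %d\n", lookup_next_hop(0xfeedULL));  /* miss */
    printf("second lookup: port %d\n", lookup_next_hop(0xfeedULL));  /* hit  */
    return 0;
}
```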
4 Theoretical Analysis

In this section, we compute the theoretical improvement that might be achieved with the changes we have proposed. Park et al. in [11] say that an average program written for the IXP2800 executes around 654 instructions; of those, 105 instructions are executed on the core and the rest (549) on a microengine. Tan et al. in [12] note that both the core and the microengines have 6 pipeline stages. Again, Tan et al. in [13] say that around 20% of the instructions executed on a microengine are memory accesses, with a latency of 54 microengine clock cycles. Thus, we can compute the time taken by the processor for one packet as follows:

time_old = (105 × 6)/(700 × 10^6) + (0.8 × 549 × 6)/(1.4 × 10^9) + (0.2 × 549 × 54)/(1.4 × 10^9) = 7.017 × 10^-6 s

In the above equation, the first term is for the core, the second for the microengine instructions that are not memory accesses, and the third for the microengine memory accesses. The average packet size over the internet is 550 bytes, and as there are 16 microengines in the IXP2800, we can compute the throughput to be:

throughput_old = (1/time_old) × 550 × 8 × 16 bits/s = 1.0032 × 10^10 bits/s = 10.032 Gbps

We can see that the original IXP2800 is able to achieve a line speed of over 10 Gbps. Now we compute the throughput for M1 (modification 1). The only change occurs in the first term of the time equation; the time for the processor with a microengine in place of the core is:

time_M1 = (105 × 6)/(1.4 × 10^9) + (0.8 × 549 × 6)/(1.4 × 10^9) + (0.2 × 549 × 54)/(1.4 × 10^9) = 6.567 × 10^-6 s

This gives a throughput of

throughput_M1 = (1/time_M1) × 550 × 8 × 16 bits/s = 1.0719 × 10^10 bits/s = 10.719 Gbps

Thus, from the introduction of M1, a performance enhancement of throughput_M1/throughput_old = 1.07 can be achieved. Now, if we also introduce M2, the memory access term of the time equation is modified. As we noted earlier, around 40% of the packets follow a similar path on the network. Thus, 40% of the time there will be a cache hit, reducing the latency from 54 to 6 cycles for those accesses. The time equation becomes:

time_M1+M2 = (105 × 6)/(1.4 × 10^9) + (0.8 × 549 × 6)/(1.4 × 10^9) + (0.2 × 549 × 54 × 0.6)/(1.4 × 10^9) + (0.2 × 549 × 6 × 0.4)/(1.4 × 10^9) = 5.0616 × 10^-6 s

The throughput of this machine is:

throughput_M1+M2 = (1/time_M1+M2) × 550 × 8 × 16 bits/s = 1.3908 × 10^10 bits/s = 13.908 Gbps

Thus, the total enhancement that can be achieved by introducing both M1 and M2 is throughput_M1+M2/throughput_old = 1.39. This means that our model can, in theory, give an enhancement of 1.39 over the original IXP2800. The following section illustrates the simulation results obtained for the proposed model.
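The three throughput figures can be checked mechanically. The snippet below reproduces the arithmetic of this section using only the constants quoted from [11], [12], and [13]; nothing in it is measured.

```c
#include <stdio.h>

#define CORE_HZ    700e6
#define UENGINE_HZ 1.4e9
#define CORE_INSN  105.0
#define UE_INSN    549.0            /* 654 - 105 */
#define STAGES     6.0
#define MEM_FRAC   0.2
#define MEM_LAT    54.0             /* microengine cycles per memory access */
#define PKT_BITS   (550.0 * 8.0)
#define N_ENGINES  16.0

/* Per-packet time model of section 4; a cache hit turns a 54-cycle
 * access into an ordinary 6-cycle instruction. */
static double throughput(double core_hz, double hit_rate)
{
    double t = CORE_INSN * STAGES / core_hz
             + (1.0 - MEM_FRAC) * UE_INSN * STAGES / UENGINE_HZ
             + MEM_FRAC * UE_INSN * ((1.0 - hit_rate) * MEM_LAT
                                     + hit_rate * STAGES) / UENGINE_HZ;
    return PKT_BITS * N_ENGINES / t;          /* bits per second */
}

int main(void)
{
    printf("original: %.3f Gbps\n", throughput(CORE_HZ,    0.0) / 1e9);  /* 10.032 */
    printf("M1      : %.3f Gbps\n", throughput(UENGINE_HZ, 0.0) / 1e9);  /* 10.719 */
    printf("M1+M2   : %.3f Gbps\n", throughput(UENGINE_HZ, 0.4) / 1e9);  /* 13.908 */
    return 0;
}
```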
5 Simulation Results

We wrote a discrete event simulator to simulate the architecture. The IXP2800 product brief [8] gives the size of the input buffer as 8192 bytes; assuming an average packet size of 550 bytes, we get a buffer size of 15 packets. The core checks the incoming packet and then directs it to a microengine that is free. Thus, the core has been modeled as an M/M/1 queue with a buffer size of 15. The microengines have also been modeled as M/M/1 queues. As soon as a packet is processed, the microengine becomes free and is ready for another packet.

To simulate M1 (modification 1), the speed of the core was increased to make it comparable to that of a microengine, so that it processes incoming packets faster. To model M2 (modification 2) on top of M1, the speed of the microengines was increased to account for the lower latency of memory accesses. The incoming rate (λ) was varied in steps of 0.2 (starting from 0.6) of the 10 Gbps packet rate (2272727.3 packets/s). The first table shows the number of packets dropped by the original processor and by the processors with the proposed modifications; the second table shows the number of packets processed. Thus, the first row of the upper table says that for an incoming packet rate of 6 Gbps the numbers of packets dropped by the three processors are 609, 233, and 6 respectively.

Packets dropped:

λ     Original   M1         M1+M2
0.6   609        233        6
0.8   28038      12126      352
1.0   296904     163124     7438
1.2   942287     669316     71784
1.4   1659431    1346105    321491
1.6   2297719    1980310    776349
1.8   2841874    2535799    1302008
2.0   3320071    3021577    1801040

Packets processed:

λ     Original   M1         M1+M2
0.6   4999697    4999885    4999999
0.8   4985987    4993939    4999826
1.0   4851553    4918446    4996283
1.2   4528865    4665351    4964111
1.4   4170293    4326955    4839263
1.6   3851144    4009853    4611828
1.8   3579071    3732109    4349005
2.0   3339972    3489220    4099489

We can see from the data that the processor with both M1 and M2 performs better than the other two. For λ = 1.4, the number of packets dropped with M1+M2 is roughly an order of magnitude smaller than for the other two processors. This shows that the results match the theoretical analysis.

Figure 7. Packets Dropped by different Processors
Figure 8. Packets Processed by different Processors

Figure 7 shows the number of packets dropped by the three processors, and Figure 8 shows the number of packets processed. The blue line marks the theoretical limit of 1.39 derived in the previous section. It can be seen from the graphs that the number of packets dropped by the processor with both M1 and M2 is the lowest, and that the explosion in the number of packets dropped occurs close to the blue line, showing that the simulation results are not very far off from the analytical results.
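To make the queueing assumptions concrete, the sketch below implements a single M/M/1 queue with a 15-packet buffer and counts drops at one of the offered loads from the tables. It models only one queue, not the full core-plus-sixteen-microengine simulator, so its absolute numbers will differ from those above.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE   15
#define N_ARRIVALS 5000000L

static double exp_rand(double rate)          /* exponential inter-event time */
{
    return -log(1.0 - drand48()) / rate;
}

int main(void)
{
    const double mu = 2272727.3;             /* 10 Gbps of 550-byte packets */
    const double lambda = 1.4 * mu;          /* offered load, as in the tables */
    long arrived = 0, dropped = 0;
    int queued = 0;
    double t_arr = exp_rand(lambda), t_dep = INFINITY;

    while (arrived < N_ARRIVALS) {
        if (t_arr < t_dep) {                 /* next event: packet arrival */
            arrived++;
            if (queued < BUF_SIZE) {
                if (queued++ == 0)
                    t_dep = t_arr + exp_rand(mu);
            } else {
                dropped++;                   /* buffer full: packet lost */
            }
            t_arr += exp_rand(lambda);
        } else {                             /* next event: service completion */
            t_dep = (--queued > 0) ? t_dep + exp_rand(mu) : INFINITY;
        }
    }
    printf("dropped %ld of %ld packets\n", dropped, arrived);
    return 0;
}
```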
6 Conclusion And Future Work

With the proposed microengine introduced in place of the core processor for the pre-processing of packet headers, we observe that an improvement in the packet processing rate can be achieved. Though the improvement seems small in numerical terms, the significant gain is in the cost of the processor, as a less complex processing unit is used. We further introduce the concept of a cache, since many of the packets from a particular source to a destination follow the same route, to increase the total throughput of the network processor. A significant improvement in the processing rates has been observed due to this concept.

An interesting direction for future research is inspecting various policies for cache replacement. This would require intensive analysis of how packets flow between different network processors and setting up an appropriate policy. Also, an important part of our research focus has been using a simple processing unit in place of the complex and costly XScale core. We believe that a smaller and less complex processor than our proposed microengine could be used to achieve an even better performance ratio, as the simple format of IPv6 requires only a small number of instructions for its pre-processing. We leave this topic as future work.

REFERENCES

[1] Niraj Shah. Understanding network processors. UCB.
[2] Patrick Crowley, Marc E. Fiuczynski, Jean-Loup Baer, and Brian N. Bershad. Characterizing processor architectures for programmable network interfaces. Proceedings of the 2000 International Conference on Supercomputing, 2000.
[3] Xiaoning Nie, Lajos Gazsi, Frank Engel, and Gerhard Fettweis. A new network processor architecture for high-speed communications. In Proceedings of the IEEE Workshop on Signal Processing Systems (SIPS'99), 1999.
[4] Mel Tsai, Chidamber Kulkarni, Christian Sauer, Niraj Shah, and Kurt Keutzer. A benchmarking methodology for network processors. 1st Network Processor Workshop, 8th Int. Symp. on High Performance Computer Architectures (HPCA), 2002.
[5] Intel. Next generation network processor technologies. Technical report, Network Processor Division, Intel Corporation, 2001.
[6] Timothy Sherwood, George Varghese, and Brad Calder. A pipelined memory architecture for high throughput network processors. In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA'03), 2003.
[7] Expert Research Team on Number Resources Utilization. Analysis and recommendations on the exhaustion of IPv4 address space. Technical report, Japan Network Information Center (JPNIC), 2006.
[8] Intel. Intel IXP2800 network processor. Technical report, 2004.
[9] Xianghui Hu, Xinan Tang, and Bei Hua. High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor. In PPoPP'06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 168-177, New York, NY, USA, 2006. ACM.
[10] Hiroaki Harai and Masayuki Murata. High-speed buffer management for 40 Gb/s-based photonic packet switches. IEEE/ACM Trans. Netw., 14(1):191-204, 2006.
[11] Jaehyung Park, Myoung Hee Jung, Sujeong Chang, Su il Choi, Min Young Chung, and Byung Jun Ahn. Performance evaluation of the flow-based router using Intel IXP2800 network processors. Workshop on Information Systems Information Technologies, 2006.
[12] Zhangxi Tan, Chuang Lin, Hao Yin, and Bo Li. Optimization and benchmark of cryptographic algorithms on network processors. IEEE Micro, 24(5):55-69, 2004.
[13] Yao Yue, Chuang Lin, and Zhangxi Tan. NpCryptBench: a cryptographic benchmark suite for network processors. SIGARCH Comput. Archit. News, 34(1):49-56, 2006.
[14] Intel. Intel IXP2855 network processor. Technical report, Intel.
[15] Cheng Sheng, Zhang Xu, Cao Yingxin, and Ding Wei. Implementation of 10 gigabit packet switching using IXP network processors. International Conference on Communication Technology (ICCT'2003), 2003.