(α, k)-anonymous data publishing


Big Data: Foreign-Language Literature Translation and Review

Big Data: Foreign-Language Literature Translation and Review (the document contains the English original and its Chinese translation)

Original text: Data Mining and Data Publishing

Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. The initial idea of privacy-preserving data mining (PPDM) was to extend traditional data mining techniques to work with data that has been modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. Privacy-preserving data mining considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. In contrast, privacy-preserving data publishing (PPDP) is not necessarily tied to a specific data mining task, and the data mining task may be unknown at the time of data publishing. PPDP studies how to transform raw data into a version that is immunized against privacy attacks but still supports effective data mining tasks. Privacy preservation for both data mining (PPDM) and data publishing (PPDP) has become increasingly popular because it allows privacy-sensitive data to be shared for analysis purposes. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity, t-closeness, and (α, k)-anonymity. In particular, all known mechanisms try to minimize information loss, and such an attempt provides a loophole for attacks. The aim of this paper is to present a survey of the most common attack techniques against anonymization-based PPDM and PPDP and to explain their effects on data privacy.

Although data mining is potentially useful, many data holders are reluctant to provide their data for data mining for fear of violating individual privacy. In recent years, studies have been carried out to ensure that the sensitive information of individuals cannot be identified easily.

Anonymity Models

k-anonymization techniques have been the focus of intense research in the last few years.
In order to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications, several extended models have been proposed, which are discussed as follows.

1. k-Anonymity

k-anonymity is one of the most classic models. It prevents joining attacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. A data set is k-anonymous (k ≥ 1) if each record in the data set is indistinguishable from at least (k − 1) other records within the same data set. The larger the value of k, the better the privacy is protected. k-anonymity ensures that individuals cannot be uniquely identified by linking attacks.

2. Extending Models

k-anonymity does not provide sufficient protection against attribute disclosure. The notion of l-diversity attempts to solve this problem by requiring that each equivalence class has at least l well-represented values for each sensitive attribute. l-diversity has advantages over k-anonymity because a k-anonymous data set permits strong attacks when the sensitive attributes lack diversity. In this model, an equivalence class is said to have l-diversity if there are at least l well-represented values for the sensitive attribute. However, there are semantic relationships among the attribute values, and different values have very different levels of sensitivity. Under (α, k)-anonymity, after anonymization the relative frequency of a sensitive value in any equivalence class is no more than α.

3. Related Research Areas

Several polls show that the public has an increased sense of privacy loss. Since data mining is often a key component of information systems, homeland security systems, and monitoring and surveillance systems, it gives the wrong impression that data mining is a technique for privacy intrusion.
This lack of trust has become an obstacle to the benefit of the technology. For example, the potentially beneficial data mining research project Terrorism Information Awareness (TIA) was terminated by the US Congress due to its controversial procedures for collecting, sharing, and analyzing the trails left by individuals. Motivated by the privacy concerns about data mining tools, a research area called privacy-preserving data mining (PPDM) emerged in 2000. The initial idea of PPDM was to extend traditional data mining techniques to work with data that has been modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. The solutions were often tightly coupled with the data mining algorithms under consideration. In contrast, privacy-preserving data publishing (PPDP) is not necessarily tied to a specific data mining task, and the data mining task is sometimes unknown at the time of data publishing. Furthermore, some PPDP solutions emphasize preserving data truthfulness at the record level, whereas PPDM solutions often do not preserve this property.

PPDP differs from PPDM in several major ways, as follows:

1) PPDP focuses on techniques for publishing data, not techniques for data mining. In fact, standard data mining techniques are expected to be applied to the published data. In contrast, the data holder in PPDM needs to randomize the data in such a way that data mining results can be recovered from the randomized data. To do so, the data holder must understand the data mining tasks and algorithms involved. This level of involvement is not expected of the data holder in PPDP, who usually is not an expert in data mining.

2) Neither randomization nor encryption preserves the truthfulness of values at the record level; therefore, the released data are basically meaningless to the recipients.
In such a case, the data holder in PPDM may consider releasing the data mining results rather than the scrambled data.

3) PPDP primarily “anonymizes” the data by hiding the identity of record owners, whereas PPDM seeks to directly hide the sensitive data.

Excellent surveys and books on randomization and cryptographic techniques for PPDM can be found in the existing literature. A family of research work called privacy-preserving distributed data mining (PPDDM) aims at performing some data mining task on a set of private databases owned by different parties. It follows the principle of Secure Multiparty Computation (SMC) and prohibits any data sharing other than the final data mining result. Clifton et al. present a suite of SMC operations, such as secure sum, secure set union, secure size of set intersection, and scalar product, that are useful for many data mining tasks. In contrast, PPDP does not perform the actual data mining task but is concerned with how to publish the data so that the anonymous data are useful for data mining. We can say that PPDP protects privacy at the data level, while PPDDM protects privacy at the process level. They address different privacy models and data mining scenarios.

In the field of statistical disclosure control (SDC), research focuses on privacy-preserving publishing methods for statistical tables. SDC considers three types of disclosure, namely identity disclosure, attribute disclosure, and inferential disclosure. Identity disclosure occurs if an adversary can identify a respondent from the published data. Revealing that an individual is a respondent of a data collection may or may not violate confidentiality requirements. Attribute disclosure occurs when confidential information about a respondent is revealed and can be attributed to the respondent. Attribute disclosure is the primary concern of most statistical agencies in deciding whether to publish tabular data.
Inferential disclosure occurs when individual information can be inferred with high confidence from statistical information of the published data.

Some other works in SDC study the non-interactive query model, in which the data recipients can submit one query to the system. This type of non-interactive query model may not fully address the information needs of data recipients because, in some cases, it is very difficult for a data recipient to accurately construct a query for a data mining task in one shot. Consequently, there is a series of studies on the interactive query model, in which the data recipients, including adversaries, can submit a sequence of queries based on previously received query results. The database server is responsible for keeping track of all queries of each user and determining whether the currently received query violates the privacy requirement with respect to all previous queries. One limitation of any interactive privacy-preserving query system is that it can only answer a sublinear number of queries in total; otherwise, an adversary (or a group of corrupted data recipients) will be able to reconstruct all but 1 − o(1) fraction of the original data, which is a very strong violation of privacy. When the maximum number of queries is reached, the query service must be closed to avoid a privacy leak. In the case of the non-interactive query model, the adversary can issue only one query and, therefore, the non-interactive query model cannot achieve the same degree of privacy defined by the interactive model. One may consider privacy-preserving data publishing a special case of the non-interactive query model.

This paper presents a survey of the most common attack techniques against anonymization-based PPDM and PPDP and explains their effects on data privacy.
k-anonymity protects respondents' identities and reduces linking attacks. In the case of a homogeneity attack, however, a simple k-anonymity model fails, and a concept that prevents this attack is needed; the solution is l-diversity. All tuples are arranged in a well-represented form, so the adversary is diverted to l places or l sensitive attribute values. l-diversity is limited in the case of a background knowledge attack, because no one can predict the knowledge level of an adversary. It is observed that generalization and suppression are also applied to attributes that do not need this extent of privacy, which reduces the precision of the published table. e-NSTAM (extended Sensitive Tuples Anonymity Method) is applied to sensitive tuples only and reduces information loss, but this method also fails in the case of multiple sensitive tuples. Generalization with suppression is also a cause of data loss, because suppression withholds values that do not fit the k factor. Future work on this front can include defining a new privacy measure along with l-diversity for multiple sensitive attributes, and generalizing attributes without suppression by using other techniques for achieving k-anonymity, because suppression reduces the precision of the published table.

Translation: Data Mining and Data Publishing

Data mining is the extraction of large numbers of interesting patterns or knowledge from large amounts of data.
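The three conditions discussed in the survey (k-anonymity, l-diversity, and the frequency bound of (α, k)-anonymity) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the table, the attribute names `zip`, `age`, and `disease`, and the function names are hypothetical, and "well-represented" is interpreted in its simplest form as "at least l distinct values", which is only one of several possible readings.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share identical quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return list(groups.values())


def is_k_anonymous(classes, k):
    """Every equivalence class must contain at least k records."""
    return all(len(group) >= k for group in classes)


def is_distinct_l_diverse(classes, sensitive, l):
    """Every class must contain at least l distinct sensitive values
    (assumed interpretation of 'well-represented')."""
    return all(len({r[sensitive] for r in group}) >= l for group in classes)


def is_alpha_k_anonymous(classes, sensitive, value, alpha, k):
    """k-anonymity plus: the relative frequency of the given sensitive
    value is at most alpha in every equivalence class."""
    if not is_k_anonymous(classes, k):
        return False
    for group in classes:
        hits = sum(1 for r in group if r[sensitive] == value)
        if hits / len(group) > alpha:
            return False
    return True


# Hypothetical, already-generalized table (illustrative data only).
TABLE = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cancer"},
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "HIV"},
    {"zip": "148**", "age": "30-39", "disease": "HIV"},
]
CLASSES = equivalence_classes(TABLE, ("zip", "age"))

print(is_k_anonymous(CLASSES, 2))                               # True
print(is_distinct_l_diverse(CLASSES, "disease", 2))             # False
print(is_alpha_k_anonymous(CLASSES, "disease", "HIV", 0.5, 2))  # False
```

The second equivalence class is 2-anonymous yet contains a single disease value, which is the homogeneity situation described above: the table passes the k-anonymity check but fails both the diversity check and the α bound for that value.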

Internet Translation Terminology

abandon 舍弃 abort退出 access 接驳、联通、接达、接收、接通、接取、衔接，登陆Acer Computer 宏（其+石）电脑adapter 监控器，接口add on card 加置卡address 位址、地址Adobe Corp. 土坯集团AI, artificial intelligence 人工智能, 人工智慧algorithm 演算法Alt key 交替AMD, Advance Micro Devices 超微科技America On-line, AOL 美国线上alphanumeric characters 字符analog 类比analog to digital converter 类比数位转化器anchor 锚子Andersen Consulting 安信达咨询公司animation 动态画面、动画anonymous FTP 匿名资料档ANSI, American National Standard Institute 美国国家标准局antivirus software 抗毒软件API, application program interface 应用软件介面append 增添（资料）Apple Chinese Dictation Kit, ACDK 苹果中文译写器Apple Computer Inc. 苹果电脑集团application software 应用软件arcade game 电子游戏Archie 阿奇，档案搜索软件architecture 架构archive 档案ARCnet, Attached Resource Computer net 资源附加网络ARPANET, Advanced Research Project Agency Net 先进研究计划局array 阵列arrow key 方向键ASCII, American Standard Code for information Interchange 美国信息互换标准码Ashton-Tate 安信达AsiaOne 亚洲第一站AsiaOnLine 亚洲线上assembler language 汇编语言AST 虹志电脑asterisk * 星号asynchronous 异步、非同步AT&T, America Telephone & Telegram 美国电话与电报公司ATM, asynchronous transfer mode 异步传输方式audio card 音效卡audio CD 音响光碟audio mail 声音讯息、声讯audio output 音响输出audio signal 声讯audiotext 声讯文字、声文AutoCAD 欧特克Aztech 爱捷特backslash 反斜线,退位键backspace 反向键backup 备份，（支持？）bad sector 毁损磁区bandwidth 频宽，宽带Banyan Systems Inc. 
企业网系统公司bar code, bar code reader 条码，条码阅读器based 基准BASIC, Beginner's all-purpose Symbolic Instruction Cod 培基语言batch file 批次档案batch processing 批次处理baud rate 传输速率, 波特率baudot code 波特码BBS, bulletin board system 电子公告栏Bell Lab 贝尔试验室benchmark test 基准测试beta software 测试版软件bi-direction 双向Big-5 code 大五码Big Blue 蓝色巨人BigGreen 绿色巨人binary 二进制BioMed Net 生物医药网Bloomberg 博BIOS, basic input/output system 基本输出输入系统bit, BinarydigiT 位元bit map 点阵backslash 反斜线block move 搬移区块Bob 鲍勃boldface 加黑体粗黑，粗体字Boolean algebra 布林代数boot sector 启动区Borland 宝兰bps, bits per second 每秒多少位元bridge 桥接broadband 宽频道、宽频browse/browser 浏览，浏览器, 阅览，阅览器British Computer Society, BCS 英国电脑学会BrushWriter 大学士BSA, Business Software Alliance 商业软件联盟Bubblejetprinter 喷泡式打印机buffer 缓冲区，缓冲器bug 虫、错误buildin 内建、内置bundled software 批套附送软件bus 汇流排bus network 串列网络byte 字节cache memory 快取记忆体高速缓冲存取CAD, computer-aided design 电脑辅助设计CADD, computer-aided design & drafting 电脑辅助设计与绘图CAE, Computer Aided Engineering 电脑辅助工程CAFE Vanda 万黛兰咖啡屋CAI, computer-aided instruction 电脑辅助教材Canon 佳能capslock 大写锁定capture image 捉取图像cartridge ribbon 匣式墨带、墨匣CASE, computer aided software engineering 电脑辅助软件工程case sensitive 注重大小写区分大小写catalog 目录CBT, computer based training 电脑辅助培训CD, compact disc 光碟CD, compact disc 激光唱片,镭射唱片CD-I,CD-Interactive 互动式光碟CD player 激光唱机CD-R,CD-Recordable 可录写光碟CD recorder 光碟录制机CD-ROM, CD-read only memory 唯读光碟, 影音光碟CD-ROM drive 光碟驱动器CERN, the European Particle Physics Laboratory 欧洲量子物理试验室CERNET, China Education and Research Net 中国教育和科研计算机网CGA, Color Graphic Adapter 彩色图像监控器CGI, common gateway interface 通用接驳介面Challenger Super Store 挑战者超值坊Chinese & Oriental Languages Information Processing Society, COLIPS 中文与东方语言信息处理学会Chinese Characters DOS, CCDOS 汉字操作系统Chinese localisation 汉化Chinese Star, CStar 中文之星chip 晶片chipset 晶片组chip swap 晶片切换Chip-Up 矽奥chooser 选配器cipher/ciphertext 密码，密文CISC, complex instruction set computer 复杂指令集Civil Service Computerisation Programs, CSCP 公共服务电脑化计划Classitext 新视分类资讯click 按压client server network 主从系统clipboard 剪贴板clock cycle/speed 钟速clone 
computer 仿制电脑cluster 磁区CMOS, complementary metal-oxide semiconductor 辅助性金氧半导体coaxial cable 同轴线code 码、程序coder 编码者coding 编写程序、编程、编码cold boot/start 冷启动collision 冲撞, 冲挤、撞挤、碰撞colour ribbon 彩墨带COMDEX Asia 亚洲电脑展Comdex Asia 亚洲电脑展column 行（直）专栏Comet 彗星网络command 指令CommerceAsia 亚洲商业（网络）communicopia 通讯社群compact casing 精巧机壳Compaq 康柏电脑公司compatibility 兼容性 compiler 编译器compress 压缩Compuserve 电脑服务Computer Associate, CA 电脑友伴computer laboratory 电脑工室Computer Recovery Centre 电脑修复中心computer workshop 电脑工室computing 计算、运算configuration 设置connector 接口Conner 康纳controller card 控制卡controlpanel 控制栏cool page 酷页coprocessor (math/numericcoprocessor) 附处理器（数学、数字附处理器）copy 抄，抄取、录、写，复制本CORENET, Construction and Real Estate Network 建筑与房地产网络corrupted file 混淆档案、乱码counterfeit software 盗版软件courseware 教材软件cpi, character per inch 每寸多少字cps, character per second 每秒多少字CPU, central processing unit 中央处理器crack, cracker 侵截，侵截者crash 冲撞Cray supercomputer 克雷超级电脑Creative Ngee Ann Multimedia Award 创新义安多媒体奖Creative (Taiwan) 创钜科技Creative Technology 创新科技CRT, cathode ray tube 荧光屏, 显像管cryptography 密码学cryptosystem 密码系统cryptoanalysis 密码分析CTD, cumulative trauma disorder 累积性精神失常Cubic CT 茜锑cursor 光标、游标custom install 一般性设置customize software 个别编写的软件cut and paste 剪贴cut-sheet feeder 输纸装置C-Win, Chinese Windows (微软)中文视窗cyber 电子、网际cybercafe 网际咖啡屋cybercash 电子钱cyber citizen 网际公民、网民cyber community 联网社群cybernaut 网客, 网友、网中人cybernetic 控制论Cybermouse 三维鼠、遥控鼠cyberphobia 电脑恐惧症cyberphone 联网电话cyberporn 网际色情cyberpunk /cypherpunk 网际彷客、网际浪人cyber sales 网际行销cybersex / cyberporn / cybersmut / cyberslut 网际色情cyberspace 电子空间, 联网、网际空间cyber surfer 网际漫游者Cyberway 讯威cyberworld 联网世界cycle time 周速daisywheel printer 转轮式打印机, 菊花轮打印机data 数据、资料database 数据库data roam 无线传讯DBMS, database management system 数据库管理系统debug, debugger 侦错，侦错程序除虫，捉臭虫decipher 解密、解码decode, decoder 解密，解码器decompression 释放、反压缩decryption 解密default 启始设置deinstall 反装置delimiter 界限符号、界符Dell Computer Corp. 
戴尔电脑集团demo disk, demo software 示范软盘，示范软碟，示范软件desktop computer 桌上型电脑developer kits 开发工具develop software 开发软件diagnostic program 诊断程序dialog box 对话框dial ondemand 随拨服务dial up connection 拨接digicash 电子钱Digital Equipment 数据器材digital, digitize 数位，数位化, 数码，数码化digital diary 电子记事本Digital Media Centre 数位媒体中心Digital Research 迪吉多科研digital tape 数码磁带directory 目录disc 光碟类disc drive 光碟驱动器disk 磁盘类diskette 软磁盘，软盘disk drive 磁盘驱动器disk drive cleaner 磁盘驱动器洗洁剂disk jacket 软盘外套display 显示DNS, domain name system 区位名址系统document, documentation 文件，使用说明docking 接驳domain 定义域、领域、组别door control system 门禁系统DOS based / version DOS基准／版本DOS, disk operating system 磁盘操作系统、操作系统作业系统dot-matrix printer 撞针式打印机dot pitch 光点double byte 双字节double side double density 两面双密downlink 下衔download 下载、下卸、卸载downsizing 功能下移downward compatibility 下携性dpi, dots per inch 每方寸多少点drag and drop 拖与放、拖放DRAM, dynamic random access memory 动态随机存取记忆体drive, driver 驱动器（机），驱动程序DSP, digital signal processor 数位讯号处理器DTP, desktop publishing 桌上排版DVD, digital video disc 数位录像光碟dye based ink 染料油墨dynamic data exchange, DDE 动态数据互换Dynasty 皇朝电脑EAN, European Article Number 欧洲商品编码EBCDIC, Extended Binary Coded Digital Interchange Code 延伸数位互换二进码echoplex （传输正确）回讯EDI, Electronic Data Interchange 电子数据互换editor 编辑程序，编辑器EDP, electronic data processing 电子数据处理edutainment software 教育娱乐软件、教娱软件EGA, enhanced graphic adapter 改良图像监控器EISA, Extended Industry Standard Architecture 延伸工业标准架构electromagnetic 电磁electronic cash, e-cash 电子钱electronic diary 电子记事本e-mail, electronic mail 电子邮递、电子邮件、电子信件、电邮、电子信E-mu Systems Inc. 
育苗公司emoticon 表情符号emulation 模拟Enablenet 能网encipher, encipter 加密，译成密码encode 锁码、加密encryption 加密end user 用户Energy Star 能源之星ENIAC, Electrical Numerical Integrator & Calculator 电子数字综合计算机Enter / Return key 回车键entry 登录environment （运行）环境EPA, Environmental Protection Agency 环境保护局EPROM, erasable programmable read only memory 可删可编程唯读记忆体Epson 爱普生erasable 可删除ergonomics 人机工程学, 人体工学Esc key 逸位键, 略过键ETen 倚天Ethernet 以太网, 乙太网eWorld 电子世界execution file 执行档Expanded Memory Specification, EMS 延伸记忆体系统expansion slot 扩充槽expert system 专家系统explorer 探游者、探游器export 输出extension (e.g.,.DAT) 后缀档名, 附注档名external drive 外接、外置磁盘驱动器external harddisk 外接、外置硬盘FAQ, Frequently Asked Question 答客问常见问题解答FAT, file allocation table 档案分配区fax on demand 自选传真服务FCC, Fedaral Communications Commission 联邦电信交通委员会feasibility study 可行性研究feature 功能Female connector 雌性接口fetch / FTP 抓取FDDI, Fiber Distributed Data Interface 光导数据介面fiber optics cable 光导纤缆、光纤缆field 栏位字段file 档案、文档filter software 过滤软件finger 查索firmware 固件, 韧件firewall 防火墙, 网络防护区flame war 火焰战floating-pointcalculation 浮点运算floppy disk 软磁盘、软盘floptical disc / floptical drive 光磁碟，光磁碟机flow chart 流程图folder 文件夹font 字型font cartridge 字库匣format,formatting 格式化fps, frames per second 每秒多少画面freenet 免费网络freeware 免费软件FTP, file transfer protocol 档案传输通讯协定FTP site 资料卸载站fuzzy logic 模糊逻辑, 捷思、快思逻辑function, function key功能，功能键function library 功能集, 函数馆garbage in garbage out 胡乱输入胡乱输出gateway 接驳器Gateway2000 普特维2000GB (Guo Biao) code 国家标准码，国标码G Code 易录宝GIF, graphic interchange format 图像互换格式gigabit, gigabyte (gb) 京位元，京字节GII, global informationinfrastructure 环球资讯体系GIRO 财路转帐服务GIS, Geography Information System 地理资讯系统glitch 故障global positioning system, GPS 环球定位系统go on line 上网Gopher 鼠窜介面, 考访graphic based 图像基准grasp image 捉取图像green PC 绿色电脑groupware 群组软件GUI, graphic user interface 图像用户介面hack, hacker 侵截，侵截者, 黑客hang 死机、当机HanVision 汉神hand input 手写输入handwriting recognision 手写识别,手写辩识harddisk / hard drive 硬磁盘、硬盘hardware 硬件Hayes modem 海斯调制解调器HD, high density 高密HDCD, high definition compatible digital 高密兼容数码HDD-FDD 
controller card 软硬盘监控卡HDTV, high density television 高密电视Hercules Graphic Adapter 大力士监控卡Hewlett Packard Co., HP 惠普hexadecimal, hex 16进制high levellanguage 高阶语言, 高级语言high resolution graphic 高解析图像、高解像hi-tech 高科技hits 阅读次数hologram 全息图, 浮影图homekey 本位键HomeLink 家铃homepage 主页、本页、网页Hong Kong Star Internet Service 香港星光国际网络服务公司hook up 衔接host computer 主机HotJava 热爪哇 Sun's Internet 浏览器hot news 热闻HPC, high performance computer / computing 高能电脑／运算HTML,Hypertext Markup Language 超文本标记语言http, Hypertext Transfer Protocol 超文本传输协定hub 中转站human interface 人机界面、人性化介面hybrid computer 并合电脑hyperband 超频道、超频Hypercard 超卡软件hyperlink 超联结、超联hypermedia 超媒体hyperspace 超空间hypertext 超级文本、超文本Hz, hertz 周、赫兹IBM Corp., International Business Machine 国际商业机器, 万国商业机器IC, integrated circuit 集成电路icon 图标、图签IDC, International Data Corp. 国际数据机构IDE, Integrated Devices Electronics 综合电器介面IDG, International Data Group. 国际数据组织IDNet, Inter Departmental Net 部门联网IEEE, Institute of Electrical and Electronic Engineers 电机及电子学工程师联合会image processing 影像处理iMedia 艾媒体Imfomix 英孚美IMHO, in my humble opinion 我的浅见impact printer 撞击式打印机index 索引infection 感染（病毒）infobahn 资讯渠道infocosm 资讯化空InfoLine 万象快讯InfoYouth 青年资讯infomercial 资讯商业Informatics 资讯科技展Informatics Computer School 英华美电脑学校Information Communication Institute of Singapore, ICIS 新加坡信息通讯学院information superhighway, I-way 资讯超速公路Infoseek 资讯查找inkcartridge 墨匣initialization 启始化inkjet printer 喷墨式打印机inport 输入insert (Ins) key 加插键installation program 装置／安装程序instruction 指令Intel Corp. 
英特尔, 英代尔intellectual property 知识产权, 智慧产权interactive 互动式, 交流式interactive TV, ITV 互动电视intercast 数据广播Interface 介面Inter Media Asia 亚洲多媒体展Internal hard disk 内置硬盘Internaut 网际漫游人, 网客，网友，网中人Internet, Net 网际网络, 大联网、国际、世界、交互、互联网纾Internet backbone 网际网络主干Internet content provider, ICP 网际网络资讯供应者Internet phone 网际电话Internet relay chat, IRC 网际清淡Internet search engines 网际网络检索机体Internet service provider, ISP 网际网络接驳服务供应者Internet Talks Radio, ITR 网际广播 InterNIC, Internet Network Information Centre 网际网络信息中心interpreter 解译器InTV 新视快速资讯inventory control system 库存管理系统I/O, input / output 输入／输出IP address 网络协定位址Iris, Inland Revenue Integrated System 艾丽斯、综合税务系统ISA, Industrial Standard Architectural 工业标准架构ISDN, integrated services digital network 综合服务数位网络ISO10646 国际标准化组织世界语文字符总集IT2000 资讯科技2000年新加坡智慧岛计划IT POWER, IT Programme for Office Workers 文员电脑课程Japan-Singapore Institute of Software Technology, JSIST 日新软件科技学院Japan-Singapore AI Centre 日新人工智能中心Jaring, Joint Advanced Research Integrated Networking 先进科研联网joystick 控制杆JPEG, Joint Photographic Experts Group 联合照相专家组jukebox （影像资料）存取盒jumper 跨接器。

SEO Homework

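SEO 11

Fill-in-the-blank questions

1) ( ) is an essential feature that a website provides for “the convenience of users using the site”, and it is also “an effective tool for studying the behavior of website users”.

Efficient on-site search lets users find the information they want quickly and accurately, which promotes sales of products and services more effectively; in addition, in-depth analysis of site visitors' ( ) is of great value for formulating more effective online marketing strategies.

2) SEO (Search Engine Optimization) can be divided into two types: ( ) and ( ).

3) The ultimate goal of website SEO is to bring in ( ); by analyzing the site's traffic statistics, you can learn which ( ) visitors searched for to find your pages.

4) Many factors affect SEO rankings.

Examples include the registration date of the domain name, the speed and stability of the hosting server, the overall structure of the site, whether ( ) is original, internal links, and external links.

5) Vertical search engines are a class of search engine that gradually emerged after 2006.

Unlike general-purpose web search engines, vertical search focuses on specific ( ) and ( ) (for example, flight search, travel search, local-life search, novel search, and video search) and offers a better user experience within its particular search domain.

Whereas general-purpose search often requires thousands of retrieval servers, vertical search has low hardware costs, specific user needs, and diverse query methods.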

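Reference answers

1) (A search engine) is an essential feature that a website provides for “the convenience of users using the site”, and it is also “an effective tool for studying the behavior of website users”.

Efficient on-site search lets users find the information they want quickly and accurately, which promotes sales of products and services more effectively; in addition, in-depth analysis of site visitors' (search behavior) is of great value for formulating more effective online marketing strategies.

2) SEO (Search Engine Optimization) can be divided into two types: (off-site SEO) and (on-site SEO).

3) The ultimate goal of website SEO is to bring in (traffic); by analyzing the site's traffic statistics, you can learn which (keywords) visitors searched for to find your pages.

4) Many factors affect SEO rankings.

Examples include the registration date of the domain name, the speed and stability of the hosting server, the overall structure of the site, whether (the site's content) is original, internal links, and external links.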

Application of Chaoxing-Based Flipped Classroom in

US-China Foreign Language, October 2020, Vol. 18, No. 10, 314-318
doi:10.17265/1539-8080/2020.10.005

Application of Chaoxing-Based Flipped Classroom in Teaching of Integrated Practical Activity Design for Primary Education

ZHAN Lili
Nanchang Normal University, Nanchang, China

The thesis aims to discuss the application of Chaoxing-based flipped classroom in the teaching of integrated practical activity design for primary school. The undergraduate students who learn this curriculum from Primary Education Classes 1 and 2 of Class 2017 of Nanchang Normal University are taken as the research subjects, of which Class 1 is regarded as the control group and Class 2 is considered the experimental group. The control group adopts the traditional teaching method while the experimental group applies the teaching method of Chaoxing-based flipped classroom. The result shows that the average point of the total grade of the experimental group is 86.2, which is significantly higher than 73.68 of the control group (p < 0.05, which indicates that there is a significant difference in the total grades between the two groups). Consequently, applying the Chaoxing-based flipped classroom to the undergraduate teaching of integrated practical activity design for primary education enables students to enhance their performance, especially their classroom teaching capability, by using what they have acquired in the flipped classroom.

Keywords: Chaoxing platform, flipped classroom, integrated practical activity design for primary education, teaching

Integrated Practical Activity Design for Primary Education is one of the most critical curricula for undergraduates majoring in primary education. The curriculum is a compulsory course for those studying primary education in higher normal universities and a necessary skill in their work as teachers after graduation.
Generally speaking, there are several defects with regard to traditional teaching methods: (1) The teaching contents are copious and convoluted; (2) the knowledge is mainly imparted to students via the teacher’s one-way teaching; and (3) the students’ initiative of self-directed learning cannot be prompted. Therefore, it is quite imperative to innovate our educational model and instructional approach. This thesis explores a new instructional model of developing the flipped classroom via Chaoxing platform (a Chinese e-book online reading platform).

ZHAN Lili, postgraduate, lecturer, Education Evaluation Institute, Nanchang Normal University, Nanchang, China. All Rights Reserved.

Subject and Approach

Research Subject

The research subjects, a total of 116 undergraduate students, are selected from the Primary Education Classes 1 and 2 of Class 2017 of Nanchang Normal University. The students learn the curriculum of Integrated Practical Activity Design for Primary Education in the first semester during the school year of 2019-2020. This experimental research was conducted with the two groups holding similar age, knowledge reserve, and discipline background, etc. The 58 students of the control group were taught via the traditional teaching approach, through which the teacher imparted theoretical knowledge to the students in accordance with the teaching plan. In contrast, the 58 students of the experimental group were taught via the Chaoxing-based flipped classroom, in which tasks were given to the students before the class, and self-directed learning and seminars relying on curriculum resources on the Chaoxing platform were conducted by students to discuss and internalize the knowledge. During the whole teaching process of this curriculum, a comparative teaching approach was applied for the two groups of students. The average age of the control group is 21.48 years old while that of the experimental group is 21.37 years old.
The difference in the average age between the two groups shows no statistical significance, and thus the results have comparability. The textbook of this research is the Integrated Practical Activity Design for Primary Education (2nd edition), which was edited by Professor Gu Jianjun and published by the Higher Education Press.

Teaching Approach and Procedure

In line with the undergraduate cultivation plan for primary education majors and the syllabus of Integrated Practical Activity Design for Primary Education, the author distributed the knowledge points according to teaching content and class hours and worked out the teaching plan. The teaching content is comprised of basic knowledge and extensive knowledge. The basic knowledge explicates basic theories of the curriculum, whereas the extensive knowledge consists of four parts: lectures, resources and materials, homework system, as well as case library.

The teaching approach and procedure for the control group. The control group adopts the traditional teaching approach dominated by the instructor. The instructor explains the relevant knowledge points in accordance with the syllabus and teaching plan. The students passively accept knowledge. In the end, the instructor conducts an analysis, explanation, and a summary of outstanding cases on integrated practical activity design for primary education in order to let students master the design and implementation of integrated practical activities in primary schools. During the whole class, students barely participated in group discussions. Students passively accepted and remembered the knowledge, because the knowledge was conveyed to them via the instructor’s one-way teaching. The instructor cannot ensure whether the students have done the preview for the curriculum or not.
The corresponding interactions and feedback were inadequate in the instruction process. The instructor can assign after-class tasks to deepen the students’ understanding and application of the curriculum, whereas targeted one-on-one instruction cannot be achieved, and the progress and quality of the tasks cannot be traced.

The teaching approach and procedure for the experimental group. The instruction of the curriculum for the experimental group was conducted via a Chaoxing-based flipped classroom. The instructor transformed such teaching materials as knowledge points, lectures, materials and resources, case library, and in-class tasks into instruction resources that can be launched on the Chaoxing platform. In order to stimulate students’ study interest while maintaining an appropriate class hour, the above-mentioned materials are within the framework of the following requirements: (1) The number of words of literal materials should be less than 2,000 (three pages or so); (2) the number of pages of PowerPoint should be less than 20; (3) the time span of instruction videos should last between 10 and 15 minutes; (4) the question type for the in-class test should be mainly multiple choice and/or True or False questions; (5) the material resources and case library should be structured primarily by the latest teaching cases, with one item released in class for students’ optional study after class; and (6) lectures on special topics should be delivered in a combination of offline and online approaches.

First of all, teaching materials (like literal materials, PowerPoint slides, and instruction videos) were uploaded to the Chaoxing platform by the instructor before the class was given. Then, the instructor briefly went over the basic knowledge in class, which took up 10% of the class hour.
Subsequently, the instructor illustrated a few difficult points that might not be fully grasped by students through the preview, which took up 20% of the class hour. Finally, the students were asked to discuss instruction cases, come up with their solutions, and then explicate their ideas according to theories from the textbook titled the Integrated Practical Activity Design for Primary Education. The instructor kept reminding, questioning, summarizing, and answering for the whole class, and thus a two-way teaching class (about 70% of the class hour) with adequate interactions was achieved. With the above procedures carried out during the curriculum, students would be able to conduct self-directed learning, independent thinking, interactions and discussions, and to resolve problems with their theoretical knowledge.

Total final scores. The total scores of the curriculum are composed of the practical score and the theoretical test score, with each accounting for 50%. Besides, the theoretical test score and simulated classroom teaching score, which were set after the end of the curriculum, are considered as objective assessment criteria. The theoretical part was tested by a 100-point test paper, and the test questions were short answer questions (40 points) and essay questions (60 points). The test questions for the quantitative assessment of teaching effect (theoretical test) conducted by the experimental group and the control group were all extracted from the standardized question system established in this curriculum. The qualitative assessment (practical scores) was based on the results of grouped simulated classroom teaching.

Questionnaire. When the curriculum was concluded, the experimental group (which adopted the Chaoxing-based flipped classroom) was invited to fill in a questionnaire titled “A survey form on curriculum satisfaction” by virtue of Wenjuanxing.
In addition, the students completed an anonymous comment on the curriculum, so as to express their actual experience of the instruction approach in this research as honestly as possible. The author gave out the questionnaire to the 58 students of the experimental group in an effort to know whether the students accepted such an instruction approach and to gauge their satisfaction.

Statistical Analysis

SPSS 24.0 statistical software was applied for data analysis. The measurement data were represented by X. The comparison between the control group and the experimental group was examined by the independent-samples t-test; p < 0.05 was considered statistically significant, and p < 0.001 was considered extremely statistically significant.

Results

Comparison of Total Theoretical Scores and Scores of Essay Questions Between the Two Groups

The average scores of the experimental group and the control group (taught with the traditional instruction approach) were 84.2 and 72.3 respectively (p < 0.05). For essay questions worth 60 points, the experimental group students gained an average score of 52.61, which was significantly higher than the control group students’ score of 43.68 points (p < 0.05). The average practical test scores of the two groups were 88.2 and 75.06 respectively.

Result of the Questionnaire on the Students From the Experimental Group

The author sent “A survey form on curriculum satisfaction” to the students of the experimental group via Wenjuanxing (an online questionnaire platform). After the anonymous comment was finished, 58 questionnaires were returned in total, all 58 of which were valid, giving a valid response rate of 100% (see Table 1). 
It can be seen from Table 1 that the Chaoxing-based flipped classroom approach in the teaching of integrated practical activity design for primary education was highly recognized by the students of the experimental group. The proportion of approval for each item exceeded 90%, which indicates that the new instruction approach was widely accepted and recognized by the students.

Table 1
Feedback on the New Instruction Approach by the Students of the Experimental Group (%)

Sequence | Feedback items | Approval | Disapproval
1 | This approach is better than the traditional teaching method. | 58 | 0
2 | I will pay active attention to and learn materials notified by Chaoxing platform. | 57 | 1
3 | It will not take too long to do preview. | 54 | 4
4 | My study initiative gets stronger via flipped classroom. | 55 | 3
5 | I can acquire knowledge from other classmates through in-class case discussions. | 56 | 2
6 | This instruction approach enhanced my study interest and participation rate. | 58 | 0
7 | This instruction approach enhanced my ability to analyze teaching problems with acquired knowledge. | 56 | 2
8 | It facilitated my critical thinking and innovative capacity. | 57 | 1
9 | I am more willing to interact anonymously on the Chaoxing teaching platform. | 58 | 0

Conclusion

A comparative study of the total test scores and essay question scores of the two groups of students shows that the experimental group students have a stronger grasp of theoretical knowledge and a stronger ability to design and implement integrated practical activities. On the other hand, judging from the anonymous qualitative feedback of the experimental group students on the new teaching method, the students in this group also recognized the teaching method.

The advantages of the Chaoxing-based flipped classroom can be summarized as follows: One is to integrate after-class online study with in-class instruction, while the other is to synergize the acquirable online study with the interactive class learning. 
The knowledge that is easier for students to master was placed in the online teaching resources of the Chaoxing platform before class (Zeng, Zhou, & Liu, 2020). The knowledge that requires deep learning and the activities that are designed for students’ competence advancement were incorporated into the in-class activities. Therefore, students can master the basic knowledge before the class by virtue of seemingly fragmented teaching materials such as text, PowerPoints, and videos. The flipped classroom has broken with the traditional teaching method, brought teachers and students closer, and improved the teachers’ teaching enthusiasm and students’ study interest. In short, the teaching method of the flipped classroom is to flip the traditional teaching method of “absorption in the classroom and internalization after the class” into “absorption before the class and internalization in the classroom”, which is a reform of the teaching method oriented by teaching effect. The author has noticed that four students of the experimental group disapproved of Item 3 in the anonymous feedback. Therefore, the teacher may need to further select materials that can be efficiently absorbed by students and upload such materials to the Chaoxing platform before class (Zhao & Jiang, 2017). With the ongoing improvement of such a new teaching method in the future, it is reasonable to expect that the teaching effect of the curriculum in this research will be further enhanced and that more excellent teachers for the curriculum of integrated practical activities can be cultivated for our primary schools.

References

Zeng, W. J., Zhou, Z. Y., & Liu, L. M. (2020). How to design learning-centered flipped classroom in universities. Modern Distance Education Research, 32(5), 77-84.
Zhao, H., & Jiang, T. (2017). Application of micro-class-based flipped classroom in college education. China Adult Education, 26, 97-99.

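Understanding the New Features of C# 3.x in Depth (1): Anonymous Type

C# 2.0 introduced a new feature, Anonymous Method, which lets us define a delegate inline and makes coding much more convenient for developers.

In C# 3.0 we get another, similar feature: Anonymous Type.

Anonymous Type lets us create, inline, an object that has the data structure we need but is based on an unknown (unnamed) type.

1. Anonymous Type Overview

In the traditional programming model, an object depends on a predefined type; we can only create instances on the basis of that type.

For example, to create an Employee instance we must already have a definition of the corresponding Employee type.

For example:

Note: the code above actually uses two other new features of C# 3.0: implicitly typed local variables and object initializers.

The biggest limitation of creating objects from a predefined type is that, for every object we need to create, we must first define the type that corresponds to it.

Anonymous Type effectively solves this problem.

I believe Anonymous Type was designed mainly for the following purpose. A type is an abstraction of the state (data) and behavior (methods) of a real-world entity.

For types that contain only state (data) (such objects usually serve as data packages passed between the layers of an application, or between applications in a distributed environment), all we care about is the structure formed by the data members: which data members the type consists of, what they are called, and what data types they have.

In other words, such a data-based type defines a data structure; correspondingly, we can say that a fixed data structure maps to a specific type.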

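Common Intrusion Ports

the original file.
If the volume is formatted as NTFS, it is somewhat more troublesome. Boot into Safe Mode, run pulist to list the processes, then use pskill (downloadable from hacker sites) to kill svchost.exe, and then copy the file over.
After overwriting, restart and run netstat -an; you will see that port 135 is gone on Windows 2000. On XP, TCP port 135 is still present, but it no longer appears under UDP.
2. Drive properties
Pick the drive whose share you want to remove, right-click it and choose Sharing and Security. In the dialog that opens, select "Do not share this folder" and click OK. This turns off sharing (including the default share).
3. Removing via Control Panel
Control Panel > Administrative Tools > Computer Management > Shared Folders > Shares
Stop the default shares listed there (including deleting admin$).
4. Editing the registry
Click Start > Run, type "Regedit" in the Run box to open Registry Editor, expand "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Lanmanworkstation\parameters", create a DWORD value named "AutoShareWks" in the right-hand pane and set it to 0 (Windows 2000 Professional, Windows XP); [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\
Port description: when a server is installed, the system partition is shared automatically. Although accessing it still requires the administrator password, this is a potential security risk; for the sake of server security it is best to turn off this "default share".
How to close it: there are many ways to turn off the default share. Based on what I know, here are the four most commonly used methods.
1. Deleting the share from DOS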

t_closeness_icde07pdf

k-1 other records with respect to the quasi-identifier. In other words, k-anonymity requires that each equivalence class contains at least k records. While k-anonymity protects against identity disclosure, it is insufficient to prevent attribute disclosure. To address this limitation of k-anonymity, Machanavajjhala et al. [12] recently introduced a new notion of privacy, called l-diversity, which requires that the distribution of a sensitive attribute in each equivalence class has at least l “well-represented” values. One problem with l-diversity is that it is limited in its assumption of adversarial knowledge. As we shall explain below, it is possible for an adversary to gain information about a sensitive attribute as long as she has information about the global distribution of this attribute. This assumption generalizes the specific background and homogeneity attacks used to motivate l-diversity. Another problem with privacy-preserving methods in general is that they effectively assume all attributes to be categorical; the adversary either does or does not learn something sensitive. Of course, especially with numerical attributes, being close to the value is often good enough.

We propose a novel privacy notion called t-closeness that formalizes the idea of global background knowledge by requiring that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). This effectively limits the amount of individual-specific information an observer can learn. Further, in order to incorporate distances between values of sensitive attributes, we use the Earth Mover Distance metric [14] to measure the distance between the two distributions. We discuss the rationale for t-closeness and illustrate its advantages through examples and experiments.

The rest of this paper is organized as follows. 
We give an overview of l-diversity in Section 2 and discuss its limitations in Section 3. We present the rationale and definition of t-closeness in Section 4, and discuss how to calculate the Earth Mover Distance in Section 5. Experimental results are presented in Section 6. Related work is discussed in Section 7. In Section 8, we discuss limitations of our approach and avenues for future research.
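
The t-closeness requirement described above can be illustrated with a small sketch. It is not the paper's implementation: it assumes a categorical sensitive attribute in which every pair of distinct values is at ground distance 1, in which case the Earth Mover Distance reduces to half the L1 distance between the two distributions. The function names, the `zip` and `illness` column names, and the toy tables are all hypothetical.

```python
from collections import Counter, defaultdict

def distribution(values):
    """Relative frequency of each value in a list."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def equal_distance_emd(p, q):
    """EMD under the ASSUMPTION that all distinct values are at distance 1.

    With that ground distance the EMD equals half the L1 distance.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def max_class_distance(rows, quasi_identifier, sensitive_attr):
    """Largest distance between any equivalence class and the whole table."""
    overall = distribution([row[sensitive_attr] for row in rows])
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in quasi_identifier)
        classes[key].append(row[sensitive_attr])
    return max(equal_distance_emd(distribution(vals), overall)
               for vals in classes.values())

def is_t_close(rows, quasi_identifier, sensitive_attr, t):
    """True if every equivalence class is within distance t of the table."""
    return max_class_distance(rows, quasi_identifier, sensitive_attr) <= t

# Hypothetical toy tables: same records, grouped differently.
homogeneous = [
    {"zip": "4350", "illness": "HIV"}, {"zip": "4350", "illness": "HIV"},
    {"zip": "5432", "illness": "flu"}, {"zip": "5432", "illness": "flu"},
]
mixed = [
    {"zip": "4350", "illness": "HIV"}, {"zip": "4350", "illness": "flu"},
    {"zip": "5432", "illness": "HIV"}, {"zip": "5432", "illness": "flu"},
]

print(max_class_distance(homogeneous, ("zip",), "illness"))
print(max_class_distance(mixed, ("zip",), "illness"))
```

In the first toy table each class holds a single illness, so every class is far from the overall 50/50 distribution; in the second, each class mirrors the overall distribution exactly. For numerical attributes a different ground distance would be needed, which this sketch does not cover.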

互联网行业英语术语

1、Access Method 网络的访问方法2、Access Right访问（存取）权3、Accounts 网络账号4、Accounting Services 记账服务5、Acknowledgment 确认6、Address Resolution Protocol(ARP)网址解决协议7、Addresses of Electronic Mail电子邮件地址8、Address of Network 网络地址9、Advertise online 网络广告10、Alpha 阿尔法试验11、Ask Job on Internet 网上求职12、Advertisement on Internet 网上广告13、AAMOF 事实上14、AFAIK 据我所知15、Angels 网络投资者16、ASAP 尽快17、Agent 代理程序18、Alpha AXP DEC 数字设备公司的电脑系统19、Amazon Business Model 亚马逊商务模式20、American National Standards Institute(ANSI) 美国国家标准协会21、Anonymous File Transfer Protocol 匿名文件传输协议22、AOL（American Online）美国在线23、Application Services Proviver(ASP) 网络应用服务供应商24、Asymmetrical Digital Subscriber Line(ADSL)异步数字用户线25、At Work Architecture Microsoft Auditing 网络运作体系结构26、Authentication and Authorization 验证和授权27、RSA 数据安全28、Backbone Networks 骨干网络29、Backup and Data Archiving 备份与归档30、Bandwidth 带宽31、B to B (Business to Business) 企业间的电子交易32、BBC Model BBC 模式33、B to C 电子零售，企业对消费者的交易34、Biz. 商业类新闻讨论组35、BBIAB 马上回来36、BBIAF 以后见37、BBL稍后便回38、BBYE 再见39、BTW 顺便提一下40、Bits and Bytes 比特与字节41、Business Software 商业软件42、Bookmark 书签43、Bookstore Online 网上书店44、BGP 边界网关协议45、Bridge 网桥46、Broadband ISDN(B-ISDN) 宽带综合业务数字网47、Broadband Service 宽带服务48、Browser 浏览器48、Browsing 浏览49、Bulletin Board System (BBS) 电子公告栏50、Buy Online 网上购物51、Commercial Software 商业软件52、Cyber economy 网络经济53、CES（Consumer Electronics Show）消费性电子用品展54、Carrier 电信公司55、CD 光盘56、CEO(Chief Executive Official) 首席执行官57、CGI（Common Gateway Interface）通用网关接口58、Channel 信道，通道，频道59、Certification Systems 确认认证系统60、Client/Server 客户服务器61、Principal and Subordinate Structure 主从结构62、Client-Server LAN Protocol 客户服务器局域网络协议63、Client Software 用户端软件64、Command Line Account 命令行账户65、Commerce Net 商业网66、Commercial Online Service 商业在线服务67、Comparative Buy 比较购物68、Common Mail Calls(CMC) 共同邮件呼叫69、Communication 通讯70、Communication Server 通讯服务器71、Computer Network 计算机网络72、Consumer Online 网络消费者73、CRM（Customer Relationship Management）客户关系74、Cryptography 密码术75、C to C 消费者之间的交易76、CPM 每千人收费77、Cybercafés 网吧78、CNNIC（China internet network information 
center）中国互联网络信息中心79、Customer Online 在线顾客、客户80、CWIS（Campus Wide Information System）全校园信息系统81、Cyber investigate for Consumer 网络消费者调查82、Cyberspace 网络空间83、Data base Management System (DBMS) 数据库管理系统84、Database Server 数据库服务器85、Data Communications 数据通信86、Data Encryption Standard(DES) 数据加密标准87、Datagram Delivery Protocol(DDP) 数据报传送协议88、Datagram Network Services 数据报网络服务89、Data Highway 数据高速公路90、Data Management 数据管理91、Data Migration 数据转移92、Data Protection 数据保护93、Data Transfer Rates 数据传输率94、Digital Cash 数字现金95、Dotcom(.com) 互联网络公司96、Desktop 台式计算机97、Dialup Line 拨号线98、Digital Certificate 数字凭证99、Digital Recording 数字录制100、Digital Signatures 数字签名101、Directory Management 目录管理102、Directory Services 目录服务103、Directory Services Netware Netware 的目录服务器104、Directory Tree 目录树状结构105、Distributed Computing 分布式计算机106、Distributed Data base 分布式数据库107、Document Management 文件管理108、Domain Name Service(DNS) 域名服务109、Domains 域110、Download 下载111、Dynamic Data Exchange (DDE) 动态数据交换112、Dynamic Routing 动态路由113、E-book 电子图书114、Ecash 电子现金115、Electronic Mall 电子购物中心116、Electronic money 电子货币117、Eyeball Economy “眼球”经济118、E-wallet 电子钱包119、E-banking 网上银行120、EC（Electronic Commerce）电子商务121、EC Website in China 中国电子商务网站122、EC of Wireless 无线电子商务123、Electronic Data Interchange(EDI) 电子数据交换124、Electronic Mail 电子邮件125、E-Traditional Industry 传统产业的电子化126、E-Mail System and Standard 电子邮件系统和标准127、SMTP 互联网简易邮件传输协议128、Novell MHS 信息管理服务系统129、Constitute Project E-Mail System 建立企业电子邮件系统130、E-Mail Application Program Interface Standard电子邮件应用程序接口标准131、Electronic Business 电子商务132、E-Marketplace 电子交易市场133、E-journal 电子刊物134、Enterprise Network 企业网135、E-zine 电子杂志136、Extranet 企业外部互联网137、EBay() 电子港湾138、Electronic Mail Broadcasts to a Roaming Computer (EMBARC)对移动计算机的电子邮件广播139、E-Consumer 电子消费者140、ECR（Electronic Cash Register）电子收款机141、ECP （Enterprise Customer Portal）企业客户门户142、EDI（Electronic Data Interchange）电子数据交换143、E-Distribution 电子分销144、EFT（Electronic Fund Transfer）电子资金转帐145、ERP （Enterprise Resource Planning）企业资源计划146、EOS（Electronic Ordering 
System）电子订货系统147、Encoder 编码148、Enterprise Networks 企业网络149、Expert system 专家系统150、E-Publishing 电子出版151、FAQ（Frequently Asked Questions）常见问题回答152、Big File Little Space 大文件，小空间153、File Transfer Access and Management (FTAM) 文件传输存取与管理154、File Transfer Protocol(FTP) 文件传输协议155、Follow up Article 后续新闻稿156、Free-Net 免费网络157、Freeware 免费软件158、FTP（File Transfer Protocol）匿名FTP159、Domain Name Service 域名服务160、Electronic Mail Gateway 电子邮件网关161、Gateway-to-Gateway Protocol 网关一网关协议162、Global Naming Service 全球命名服务163、Group Buy 网上集体议价164、Group 新闻组165、GIF（Graphic Interchange Format）图形交换格式166、Goods Online 在线商品167、Gopher space Gopher 公共Gopher服务器168、Price of Goods Online 在线商品价格169、Groupware 新闻组软件170、E-Mail And Groupware 电子邮件与新闻组软件171、Work Flow Software 工作流软件172、Gateway 网关173、Hacker 黑客174、Header 标题、报头、页眉175、Hierarchy 新闻组的分级176、Host 主机177、Home Page 主页178、Home Shopping 在家购物179、Hostname 主机名180、Hot list 热表181、HTTP（Hyper Text Transport Protocol）超文本传输协议182、Hyperli nk 超链接183、Hypermedia 超媒体184、Hypertext 超文本185、IC 我明白了186、IDK 我不知道187、IOW 换句话说188、Internet 内部互联网189、IDG（International Data Group）国际数据集团190、Inter NTC(internet network information center) 互联网网络信息中心191、ID（Identifier）标识符192、Reseller Online 网上中间商193、Information Superhighway 信息高速公路194、Imaging 图形化195、Interconnectivity 网络连接196、IRM（Information Resource Manage met）信息资源管理197、Interactive 交互的、互动的198、Interactive Marketing 互动营销199、Interactive Television 互动式电视200、Internet Account Internet 账户201、IP（Internet phone）网络电话202、Investigate on internet 网上调查203、ICQ（I seek you）网络寻呼机204、Internet Address Internet 地址205、Internet work 网际网206、IP（Internet Protocol）互联网络协议207、IP Address IP 地址208、Interior Gateway Protocols 内部网关协议209、International Organization for Standardization(ISO) 国际标准组织210、Internet 互联网211、Internet Mall 网络购物中心212、Internet Protocol(IP) 互联网协议213、IPX PACKETS IPX 信息包214、Internet work Routing 互联网上的路由215、Interoperability 交互操作性216、Interrupts 中断217、ISP（Internet Serve Provider 网络服务供应商218、IT (Information Technology) 信息技术219、KOOL 酷220、Kbps (kilo bit per second) 
每秒千比特221、Key Escrow 第三方保存密钥222、Kerberos Authentication Kerberos 认证223、Kernel 核心224、Key Encryption Technology 钥匙的加密技术225、LAN Drivers 局域网络驱动程序226、LAN and WAN 局域与广域网络227、Local Area Network (LAN) 局域网络228、Wide Area Network (WAN) 广域网络229、layered Architecture 阶层性结构230、Leaf objects 未端对象231、Learning Bridges 学习型网桥232、Leased Line 专线233、Login 登录234、Logic bomb 逻辑炸弹235、Lurking 潜伏236、Local Area Networks(LANs) 局域网络237、Netscape6 新版网景浏览器238、Local Area Transport(LAT) 局域传输239、Logistics 现代物流240、Management Group(Team) 管理团队241、Mobile Office 移动办公室242、Moore’s Law 摩尔定律243、MIS（Management Information System 管理信息系统244、MP3 电子音乐格式245、NASDAQ 纳斯达克246、Netizen 网民247、Netbug 网虫248、Netiquette 网络礼仪249、Netnews 网络新闻250、Network Language 网络语言251、Newbie 网络新手252、Newsreader 新闻阅读器253、New Economy 新经济254、NC（Network Computer）网络计算机255、NIC（Network Information Center）网络信息中心256、NOC（Network Operation Center）网络运行中心257、Node 节点258、object-Oriented 面向对象的259、Protocol Independence 协议独立260、Network Cant 网络“黑话”261、Memory Management 内存管理262、Networks 网络263、Why Constitute Network? 
为何建立电脑网络？264、Network Environment 网络环境265、Network Compose 网络的组成266、Network Connect Method 网络连结的方法267、Network Class 网络的种类268、Network Configuration 网络结构269、Offline 脱机，离线270、Online 联机，在线271、Online Career Center 网上求职中心272、Online Community 网络社区273、OEM（Original Equipment Manufacturer）原始设备制造厂商274、Online Service 在线服务275、Operating System 操作系统276、On-Line Transaction Processing(OLTP) 在线即时事务处理277、Open Data-link Interface(ODI) 开放式数据链接口278、Open Messaging Interface(OMI) 开放式信息接口279、Open Network Computing(ONC)SunSoft SunSoft 的开放式网络计算280、Remote Program Call(RPC) 远程程序呼叫281、External Data Representation(XDR) 外部数据展现282、Network File System(NFS) 网络文件系统283、Open System 开放系统284、Optical Libraries 光盘图书馆285、Packets 信息包286、Packet-Switching Network 包交换网络287、Parallel Processing 并行处理288、PDA（Personal Digital Assistant）个人数字辅助电脑289、Personal Marketing 个性化营销290、Place Online 在线销售渠道291、Platform 平台292、Promotion On Internet 网上促销293、Promotion of E-Webs it 电子商务网站的促销294、Portable Computer 便携式计算机295、Protocol 协议296、Public File 公共文件297、POS（Point of Sale）销售点信息系统298、Mobile Radio Networks 移动无线电网络299、Redirector 重新定向器300、Release 新产品发布301、RI&W 读完去哭吧302、ROTEL 捧腹大笑303、Real time 实时304、Remote Access Software 远程存取软件305、Remote Procedure Call (RPC) 远程程序呼叫306、Report of Business Plan 商业计划书307、Replication 复制308、Routers 路由器309、Multi Protocol Router 多协议路由器310、Interior Gateway Protocols 内部网关协议311、Exterior Gateway Protocol(EGP) 外部网关协议312、Interdomain Policy Routing Protocols 域间的政策性路由器协议313、Border Gateway Protocol(BGP) 边界网关协议314、Routing Information Protocol(RIP) 路由信息协议315、Routing Protocols 路由协议316、Path Logic 逻辑路由317、Exterior Protocol 外部/域协议318、RSA Data Security RSA 数据安全性319、Authentication and Accredit 认证和授权320、Private Key Method 私钥方法（对称性）321、Public Key Method 公钥方法（非对称性）322、Appraisal System 鉴定系统323、Number Idiographic 数字签名324、Segment Network 区段，网络325、Semaphore 信号灯326、Servers Network 网络服务器327、Directory Server 目录服务器328、Service after Sell Online 在线售后服务329、Service Access Point 服务存取点330、Service Advertising Protocol (SAP) 服务广告协议331、Set-Top Box 
机顶盒332、Shareware 共享软件333、Shell 外壳334、Simple Mail Transfer Protocol (SMTP) 简易邮件传送协议335、Simple Network Management Protocol (SNMP) 简易网络管理协议336、Signature 签名337、SOHO（Small Office Home Office）在家办公338、Sockets 插头339、Software Distribution 软件分布340、Spanning Tree Algorithm 伸缩树法341、Spam 垃圾邮件342、Subscribe 订阅343、Supercomputer 超级计算机344、Supply Chain Management 供应链管理345、Surfing 冲浪346、Sysop(System Operator) 系统操作员347、System Integration 系统集成348、Supervisor 管理者349、Switched Services 交换服务350、Integrated Service Digital Network(ISDN) 综合业务数字网351、Synchronous Communication 同步传输352、Synchronous Optical Network(SONET) 同步光纤网络353、System Fault Tolerance(SFT) 系统容错354、System Application Architecture(SAA) 系统应用程序体系结构355、Talk 对话356、Telecommuting 远程上班357、Telnets 远程网358、Terminal 终端机359、Terminal Emulation 终端服务器360、Third Party Logistics 第三方物流361、Logistics Center 物流中心，配送中心362、Timesharing Computer 分时计算机363、Internet Protocol (IP)网际协议364、Internet Protocol Address IP 地址365、Internet Applications Protocol 应用软件协议366、Trustees 受托人367、Twisted-Pair Cable 双绞线368、Line Limit 连线限制369、Time limit 时间限制370、Workstation Limit 工作站限制371、Virtual office 虚拟合作372、Virtual office 虚拟办公室373、Videoconferencing 视频会议374、Virtual Circuits 虚拟线路375、Virtual Community 虚拟社区376、Virtual Data Networks 虚拟数据网络377、Virtual File System(VFS) 虚拟文件系统378、Data Access and Access Environment 数据存取与存取环境379、The Repository Environment 库存环境380、Virtual Memory System(VMS) 虚拟存储系统381、Virtual Terminal (VT) 虚拟终端机382、Virtual Electronic Commerce City 电子商城383、Wide Area Networks(WAN) 广域网络384、Expert Network 专用网络385、Public Equipment 公用设备386、Circuitry Exchange Serve 线路交换服务387、Package Exchange Serve 包交换服务388、Leased Line 专用线路389、Integrated Services Digital Network (ISDN) 综合业务服务网390、W AN（Wide Area Network）广域网391、Wed 万维网392、Webonomics 网络经济家393、W AP（Wireless Application Protocol）无线应用协议394、WYSIWYG 所见即所得395、Webmaster 万维网设计管理师396、Website Brand 网站品牌397、Virtual Storage 虚拟存储器398、File and System Protect 文件与系统保护399、Network 网络400、Print Function 打印功能401、Login Startup Options 登入与启动选项402、Worm 
蠕虫403、WWW 万维网404、Workgroups 工作组405、Workplace OS 工作站操作系统406、Workstation 工作站407、Yellow pages 黄页1。


J Intell Inf Syst (2009) 33:209–234
DOI 10.1007/s10844-008-0075-2

(α, k)-anonymous data publishing

Raymond Wong · Jiuyong Li · Ada Fu · Ke Wang

Received: 2 June 2008 / Revised: 25 November 2008 / Accepted: 25 November 2008 / Published online: 8 January 2009
© Springer Science+Business Media, LLC 2008

Abstract Privacy preservation is an important issue in the release of data for mining purposes. The k-anonymity model has been introduced for protecting individual identification. Recent studies show that a more sophisticated model is necessary to protect the association of individuals to sensitive information. In this paper, we propose an (α, k)-anonymity model to protect both identifications and relationships to sensitive information in data. We discuss the properties of (α, k)-anonymity model. We prove that the optimal (α, k)-anonymity problem is NP-hard. We first present an optimal global-recoding method for the (α, k)-anonymity problem. Next we propose two scalable local-recoding algorithms which are both more scalable and result in less data distortion. The effectiveness and efficiency are shown by experiments. We also describe how the model can be extended to more general cases.

Keywords Privacy · Data mining · Anonymity · Privacy preservation · Data publishing

R. Wong
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong

J. Li (B)
School of Computer and Information Sciences, University of South Australia, Mawson Lakes, South Australia, Australia
e-mail: jiuyong.li@.au

A. Fu
Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, Hong Kong

K. Wang
Department of Computer Science, Simon Fraser University, Burnaby, Canada

1 Introduction

Privacy preservation has become a major issue in many data mining applications. 
When a data set is released to other parties for data mining, some privacy-preserving technique is often required to reduce the possibility of identifying sensitive information about individuals. This is called the disclosure-control problem (Cox 1980; Willenborg and de Waal 1996; Hundepool and Willenborg 1996) in statistics and has been studied for many years. Most statistical solutions are more concerned with maintaining the statistical invariants of the data. The data mining community has been studying this problem with the aim of building strong privacy-preserving models and designing efficient optimal and scalable heuristic solutions. The perturbing method (Agrawal and Srikant 2000; Agrawal and Aggarwal 2001; Rizvi and Haritsa 2002) and the k-anonymity model (Sweeney 2002a; Samarati 2001) are two major techniques for this goal. The k-anonymity model has been extensively studied recently because of its relative conceptual simplicity and effectiveness (e.g. Iyengar 2002; Wang et al. 2004; Fung et al. 2005; Bayardo and Agrawal 2005; Aggarwal et al. 2005; Meyerson and Williams 2004).

In this paper, we focus on a study of the k-anonymity property (Sweeney 2002a; Samarati 2001). The k-anonymity model assumes a quasi-identifier, which is a set of attributes that may serve as an identifier in the data set. It is assumed that the data set is a table and that each tuple corresponds to an individual. A data set satisfies k-anonymity if there are either zero or at least k occurrences of any quasi-identifier value. As a result, it is less likely that any tuple in the released table can be linked to an individual and thus personal privacy is preserved.

For example, we have a raw medical data set as in Table 1. Attributes job, birth and postcode¹ form the quasi-identifier. Two unique patient records, 1 and 2, may be re-identified easily since their combinations of job, birth and postcode are unique. 
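
The k-anonymity condition stated above (every quasi-identifier value occurs either zero times or at least k times) can be checked mechanically. The following is a minimal sketch rather than anything from the paper: the function names are hypothetical, the records follow the paper's raw medical table, and the generalized version suppresses the birth value of the first two records with `*`.

```python
from collections import Counter

def equivalence_class_sizes(rows, quasi_identifier):
    """Count how many tuples share each quasi-identifier value combination."""
    return Counter(tuple(row[a] for a in quasi_identifier) for row in rows)

def is_k_anonymous(rows, quasi_identifier, k):
    """True if every combination that occurs at all occurs at least k times."""
    sizes = equivalence_class_sizes(rows, quasi_identifier)
    return all(size >= k for size in sizes.values())

QI = ("job", "birth", "postcode")
COLUMNS = QI + ("illness",)

def make_table(records):
    """Turn plain tuples into dict rows keyed by column name."""
    return [dict(zip(COLUMNS, record)) for record in records]

# Raw medical data set (records 1 and 2 have unique quasi-identifier values).
raw = make_table([
    ("Cat1", "1975", "4350", "HIV"),
    ("Cat1", "1955", "4350", "HIV"),
    ("Cat1", "1955", "5432", "flu"),
    ("Cat1", "1955", "5432", "fever"),
    ("Cat2", "1975", "4350", "flu"),
    ("Cat2", "1975", "4350", "fever"),
])

# Same data with the birth value of the first two records suppressed.
generalized = make_table([
    ("Cat1", "*", "4350", "HIV"),
    ("Cat1", "*", "4350", "HIV"),
    ("Cat1", "1955", "5432", "flu"),
    ("Cat1", "1955", "5432", "fever"),
    ("Cat2", "1975", "4350", "flu"),
    ("Cat2", "1975", "4350", "fever"),
])

print(is_k_anonymous(raw, QI, 2))          # False
print(is_k_anonymous(generalized, QI, 2))  # True
```

Grouping by the quasi-identifier with a `Counter` keeps the check to a single pass over the table, which is enough for illustration; it says nothing about how to find a good generalization.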
The table is generalized as a 2-anonymous table as in Table 2. This table makes the two patients less likely to be re-identified.

In the literature of privacy preserving, there are two main models. One model is global recoding (Sweeney 2002a; LeFevre et al. 2005; Bayardo and Agrawal 2005; Samarati 2001; Iyengar 2002; Wang et al. 2004; Fung et al. 2005) while the other is local recoding (Sweeney 2002a, b; Aggarwal et al. 2005; Meyerson and Williams 2004; Hundepool and Willenborg 1996; Hundepool 2004). Assuming a conceptual hierarchy for each attribute, in global recoding, all values of an attribute come from the same domain level in the hierarchy. For example, all values in Birth date are in years, or all are in both months and years. One advantage is that an anonymous view has uniform domains but it may lose more information. For example, a global recoding of Table 1 may be Table 4 and it suffers from over-generalization. With local recoding, values may be generalized to different levels in the domain. For example, Table 2 is a 2-anonymous table by local recoding. In fact one can say that local recoding is a more general model and global recoding is a special case of local recoding. Note that, in the example, known values are replaced by unknown values (*). This is called suppression, which is one special case of generalization, which is in turn one of the ways of recoding.

¹ We use a simplified postcode scheme in this paper. There are four single digits, representing states, regions, cities and suburbs. Postcode 4350 indicates state-region-city-suburb.

Table 1 Raw medical data set
Job   Birth  Postcode  Illness
Cat1  1975   4350      HIV
Cat1  1955   4350      HIV
Cat1  1955   5432      flu
Cat1  1955   5432      fever
Cat2  1975   4350      flu
Cat2  1975   4350      fever

Table 2 A 2-anonymous data set of Table 1
Job   Birth  Postcode  Illness
Cat1  *      4350      HIV
Cat1  *      4350      HIV
Cat1  1955   5432      flu
Cat1  1955   5432      fever
Cat2  1975   4350      flu
Cat2  1975   4350      fever

Table 3 An alternative 2-anonymous data set of Table 1
Job   Birth  Postcode  Illness
*     1975   4350      HIV
*     *      4350      HIV
Cat1  1955   5432      flu
Cat1  1955   5432      fever
*     *      4350      flu
*     1975   4350      fever

Table 4 A (0.5, 2)-anonymous table of Table 1 by full-domain generalization
Job   Birth  Postcode  Illness
*     *      4350      HIV
*     *      4350      HIV
*     *      5432      flu
*     *      5432      fever
*     *      4350      flu
*     *      4350      fever

Let us return to the earlier example. If we inspect Table 2 again, we can see that though it satisfies the 2-anonymity property, it does not protect the two patients’ sensitive information, HIV infection. We may not be able to distinguish the two individuals for the first two tuples, but we can derive the fact that both of them are infected with HIV. Suppose one of them is the mayor; we can then confirm that the mayor has contracted HIV. Surely, this is an undesirable outcome. Note that this is a problem because the other individual whose generalized identifying attributes are the same as the mayor also has HIV. Table 3 is an appropriate solution. Since (*, 1975, 4350) is linked to multiple diseases (i.e. HIV and fever) and (*, *, 4350) is also linked to multiple diseases (i.e. HIV and flu), it protects individual identifications and hides the implication.

We see from the above that protection of relationship to sensitive attribute values is as important as identification protection. Thus there are two goals for privacy preservation: (1) to protect individual identifications and (2) to protect sensitive relationships. Our focus in this paper is to build a model to protect both in a disclosed data set. We propose an (α, k)-anonymity model, where α is a fraction and k is an integer. In addition to k-anonymity, we require that, after anonymization, in any equivalence class, the frequency (in fraction) of a sensitive value is no more than α. We first extend the well-known k-anonymity algorithm Incognito (LeFevre et al. 2005) to our (α, k)-anonymity problem. As the algorithm is not scalable to the size of the quasi-identifier and may introduce a lot of distortion into the data since it is global-recoding based, we also propose two efficient local-recoding based methods.

This proposal is different from the work of association rules hiding (Verykios et al. 
2004) in a transactional data set, where the rules to be hidden have to be known beforehand and each time only one rule can be hidden. Also, the implementation assumes that frequent itemsets of rules are disjoint, which is unrealistic. Our scheme blocks all rules from quasi-identifications to a sensitive class.

This work is also different from the work of template-based privacy preservation in classification problems (Wang et al. 2005, 2007), which considers hiding strong associations between some attributes and sensitive classes and combines k-anonymity with association hiding. There, the solution considers global recoding by suppression only and the aim is to minimize a distortion effect that is designed and dedicated for a classification problem. The model defined in this paper is more general in that we allow local recoding and that we aim at minimizing the distortions of data modifications without any attachment to a particular data mining method such as classification.

This work is proposed to handle the homogeneity attack as the l-diversity model (Machanavajjhala et al. 2006) does. A homogeneity attack is possible when a group of individuals, whose identities are indistinguishable in a published table, share the same sensitive value. In other words, an attacker does not need to identify an individual from a group, but can learn his/her sensitive information. We handle the problem in a different way from the l-diversity model. The l-diversity model requires that every group of indistinguishable identities in a published table has at least l different sensitive values. This gives a general principle for handling the homogeneity attack, but the l-diversity model suffers a major problem in practice. l-diversity does not specify the protective strength in terms of probability of leakage. 
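
To make the contrast concrete, here is a hedged sketch of an (α, k)-anonymity check for one chosen sensitive value, following the informal definition given in the introduction: every equivalence class has at least k tuples, and the fraction of tuples carrying the sensitive value is at most α. The function names are hypothetical, `local_2anon` and `alternative_2anon` follow the paper's two 2-anonymous versions of the medical table, and the `skewed` group is invented purely for illustration. Reading l-diversity as "at least l distinct values" is taken from the description above.

```python
from collections import defaultdict

def group_by_quasi_identifier(rows, quasi_identifier):
    """Collect rows into equivalence classes keyed by quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in quasi_identifier)].append(row)
    return groups

def is_alpha_k_anonymous(rows, quasi_identifier, sensitive_attr,
                         sensitive_value, alpha, k):
    """Check class size >= k and sensitive-value frequency <= alpha per class."""
    for members in group_by_quasi_identifier(rows, quasi_identifier).values():
        hits = sum(1 for row in members if row[sensitive_attr] == sensitive_value)
        if len(members) < k or hits > alpha * len(members):
            return False
    return True

def min_distinct_sensitive_values(rows, quasi_identifier, sensitive_attr):
    """Smallest number of distinct sensitive values found in any class."""
    groups = group_by_quasi_identifier(rows, quasi_identifier)
    return min(len({row[sensitive_attr] for row in members})
               for members in groups.values())

QI = ("job", "birth", "postcode")
COLUMNS = QI + ("illness",)

def make_table(records):
    return [dict(zip(COLUMNS, record)) for record in records]

# 2-anonymous, but both tuples of the first class carry HIV.
local_2anon = make_table([
    ("Cat1", "*", "4350", "HIV"), ("Cat1", "*", "4350", "HIV"),
    ("Cat1", "1955", "5432", "flu"), ("Cat1", "1955", "5432", "fever"),
    ("Cat2", "1975", "4350", "flu"), ("Cat2", "1975", "4350", "fever"),
])

# Alternative 2-anonymous table in which HIV is mixed with other illnesses.
alternative_2anon = make_table([
    ("*", "1975", "4350", "HIV"), ("*", "*", "4350", "HIV"),
    ("Cat1", "1955", "5432", "flu"), ("Cat1", "1955", "5432", "fever"),
    ("*", "*", "4350", "flu"), ("*", "1975", "4350", "fever"),
])

# Invented skewed class: two distinct illnesses, but nine of ten are HIV.
skewed = make_table([("*", "*", "4350", "HIV")] * 9 + [("*", "*", "4350", "flu")])

print(is_alpha_k_anonymous(local_2anon, QI, "illness", "HIV", 0.5, 2))
print(is_alpha_k_anonymous(alternative_2anon, QI, "illness", "HIV", 0.5, 2))
print(min_distinct_sensitive_values(skewed, QI, "illness"))
print(is_alpha_k_anonymous(skewed, QI, "illness", "HIV", 0.5, 2))
```

The skewed group has two distinct sensitive values in its only class, yet 90% of its members have HIV, so it fails the check for α = 0.5. Restricting the check to a single sensitive value is a simplification made for this sketch.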
Note that l-diversity does not mean that the probability of knowing one’s sensitive value is less than 1/l when the distribution of sensitive values is skewed. Also, it is quite difficult for users to set parameter l. In contrast, α in our model is a probabilistic parameter and is intuitive to set. Furthermore, the proposed algorithm in Machanavajjhala et al. (2006) is based on a global-recoding algorithm, Incognito, which may generate more distortion compared to a local recoding approach. We propose two local recoding algorithms which can give low information loss.

It is worth mentioning other works (Li and Li 2007; Xiao and Tao 2006, 2007; Bu et al. 2008) which are related to ours although they differ from it. Li and Li (2007) proposed a privacy model called t-closeness. With this model, the distribution in each A-group in T∗ with respect to the sensitive attribute is roughly equal to the distribution of the entire table T∗. The difference between the distribution in each A-group and the distribution of the entire table should be bounded by a parameter t. However, similar to l-diversity, it is difficult for users to set parameter t since it is not intuitive. Xiao and Tao (2006) proposed a personalized privacy model such that each individual can provide his/her preference on the protection of his/her sensitive value. The above works study the problem for a one-time publication. Xiao and Tao (2007) and Bu et al. (2008) proposed the problems for multiple-time publications. In this paper, we focus on the one-time publication.

We propose to handle issues of k-anonymity with protection of some sensitive values. This is based on the fact that we could not protect too many sensitive values in a data set. If we do, a published data set may be hardly useful because too many distortions have been applied to the data set. Practically, not all sensitive information is considered as privacy. For example, people care more about depression than virus infection. We consider our proposed method as a 
practical enhancement of k-anonymity that takes the utility of published data into consideration.

Our Contributions:

– We propose a simple and effective model to protect both identifications and sensitive associations in a disclosed data set. The model extends the k-anonymity model to the (α, k)-anonymity model, which limits the confidence of the implications from the quasi-identifier to a sensitive value (attribute) to within α in order to protect the sensitive information from being inferred by strong implications. We prove that optimal (α, k)-anonymity by local recoding is NP-hard.

– We extend Incognito (LeFevre et al. 2005), a global-recoding algorithm for the k-anonymity problem, to solve the (α, k)-anonymity problem. We also propose two local-recoding algorithms, which are scalable and generate less distortion. In our experiments, we show that, on average, the two local-recoding based algorithms perform about 4 times faster and give about 3 times less distortion of the data set compared with the extended Incognito algorithm.

2 Problem definition

We assume that each attribute has a corresponding conceptual hierarchy or taxonomy. A lower level domain in the hierarchy provides more details than a higher level domain. For example, birth date in D/M/Y (e.g. 15/Mar/1970) is a lower level domain and birth date in Y (e.g. 1970) is a higher level domain. We assume such hierarchies for numerical attributes too. In particular, we have a hierarchical structure defined with {value, interval, *}, where value is the raw numerical data, interval is the range of the raw data and * is a symbol representing any value. Intervals can be determined by users or by a machine learning algorithm (Fayyad and Irani 1993). In a hierarchy, domains with fewer values are more general than domains with more values for an attribute. The most general domain contains only one value. For example, the 10-year interval level in the birth domain is more general than the one-year level. The most general level of the birth domain contains the value unknown (e.g. *). Generalization
replaces lower level domain values with higher level domain values. For example, birth D/M/Y is replaced by M/Y.

Let D be a data set or a table. A record of D is a tuple or a row. An attribute defines all the possible values in a column. For a data set to be disclosed, any identifier column (e.g. secure id and passport number) is definitely removed. However, some attribute combinations after this removal may still identify some individuals.

Definition 1 (Quasi-identifier) A quasi-identifier is a minimum set of attributes of D that may serve as identifications for some tuples in D.

For example, a domain expert may decide that the attribute set {Job, Birth, Postcode} in Tables 1–4 is a quasi-identifier. The first goal of privacy preservation is to remove all possible identifications in a disclosed table (according to the quasi-identifier) so that individuals are not identifiable. We define an important concept, the equivalence class, which is fundamental to our (α, k)-anonymity model.

Definition 2 (Equivalence Class) Let Q be an attribute set. An equivalence class of a table with respect to attribute set Q is a collection of all tuples in the table containing identical values for attribute set Q.

For example, tuples 1 and 2 in Table 2 form an equivalence class with respect to attribute set {Job, Birth, Postcode}. The size of an equivalence class indicates the strength of identification protection of individuals in the equivalence class. The more tuples an equivalence class contains, the more difficult it is to re-identify an individual.

Definition 3 (k-Anonymity Property) Let Q be an attribute set. A data set D is k-anonymous with respect to attribute set Q if the size of every equivalence class with respect to attribute set Q is k or more.

The k-anonymity model requires that every value set for the quasi-identifier attribute set has a frequency of zero or at least k. For example, Table 1 does not satisfy the 2-anonymity property since tuples {Cat1, 1975, 4350} and {Cat1, 1955, 4350} occur once. Table 2 satisfies the 2-anonymity property.

Consider a large
collection of patient records with different medical conditions. Some diseases are sensitive, such as HIV, but many diseases are common, such as cold and fever. Only associations with sensitive diseases need protection. To start with, we assume only one sensitive value, such as HIV. We introduce the α-deassociation requirement for the protection.

Definition 4 (α-Deassociation Requirement) Given a data set D, an attribute set Q and a sensitive value s in the domain of attribute S ∉ Q. Let (E, s) be the set of tuples in equivalence class E containing s for S, and let α be a user-specified threshold, where 0 < α < 1. Data set D is α-deassociated with respect to attribute set Q and the sensitive value s if the relative frequency of s in every equivalence class is less than or equal to α. That is, |(E, s)|/|E| ≤ α for all equivalence classes E.

For example, Table 3 is 0.5-deassociated with respect to attribute set {Job, Birth, Postcode} and sensitive value HIV. There are three equivalence classes: {t1, t6}, {t2, t5} and {t3, t4}. In each of the first two equivalence classes, which have size two, only one tuple contains HIV and therefore |(E, s)|/|E| = 0.5. In the third equivalence class, no tuple contains HIV and therefore |(E, s)|/|E| = 0. Thus, for every equivalence class, |(E, s)|/|E| ≤ 0.5.

However, the above definition may be too restrictive. For example, suppose k is set to 2 and α is set to 0.1. If an equivalence class contains two tuples, none of them may contain the sensitive value, because the greatest allowed number of tuples containing the sensitive value, |(E, s)|, is α × |E| = 0.1 × 2 = 0.2, which is smaller than one. If all equivalence classes contain only two tuples, then no equivalence class can hold any tuple containing the sensitive value, which is an undesirable result. One solution is to generate equivalence classes E of greater size such that α × |E| is at least 1. However, this solution may lead to unnecessary generalization. Therefore our solution is to introduce a ceiling to the
formula α × |E|.

Definition 5 (Refined α-Deassociation) Given a data set D, an attribute set Q and a sensitive value s in the domain of attribute S ∉ Q. Let (E, s) be the set of tuples in equivalence class E containing s, and let α be a user-specified threshold, where 0 < α < 1. Data set D is α-deassociated with respect to attribute set Q and the sensitive value s if the number of tuples containing s in every equivalence class is less than or equal to ⌈α|E|⌉, i.e. |(E, s)| ≤ ⌈α|E|⌉ for all equivalence classes E.

Our objective is therefore to anonymize a data set so that it satisfies both the k-anonymity and the α-deassociation criteria.

Definition 6 ((α, k)-Anonymization) A view of a table is said to be an (α, k)-anonymization of the table if the view modifies the table such that the view satisfies both the k-anonymity and α-deassociation properties with respect to the quasi-identifier.

For example, Table 3 is a (0.5, 2)-anonymous view of Table 1 since the size of every equivalence class with respect to the quasi-identifier is 2 and in each equivalence class at most half of the tuples are associated with HIV. Both parameters α and k are intuitive and operable in real-world applications.
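The definitions above can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the function names, the `refined` flag and the six rows are hypothetical (the rows only mimic the shape described for Table 3, three equivalence classes of size two, and are not the paper's actual table). `Fraction` is used so that the ceiling ⌈α|E|⌉ is computed exactly.

```python
import math
from collections import defaultdict
from fractions import Fraction

def equivalence_classes(rows, qi):
    """Group rows that agree on every quasi-identifier attribute."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in qi)].append(row)
    return list(classes.values())

def is_alpha_k_anonymous(rows, qi, sens_attr, s, alpha, k, refined=True):
    """Check k-anonymity plus alpha-deassociation for sensitive value s.

    refined=True uses the ceiling bound |(E, s)| <= ceil(alpha * |E|);
    refined=False uses the strict ratio bound |(E, s)| / |E| <= alpha.
    """
    a = Fraction(str(alpha))  # exact arithmetic, avoids float rounding in the ceiling
    for cls in equivalence_classes(rows, qi):
        hits = sum(1 for r in cls if r[sens_attr] == s)  # |(E, s)|
        limit = math.ceil(a * len(cls)) if refined else a * len(cls)
        if len(cls) < k or hits > limit:
            return False
    return True

# Hypothetical data: quasi-identifier values are made up for illustration.
QI = ("Job", "Birth", "Postcode")
ROWS = [
    {"Job": "Cat1", "Birth": "*", "Postcode": "4350", "Illness": "HIV"},
    {"Job": "Cat1", "Birth": "*", "Postcode": "4350", "Illness": "flu"},
    {"Job": "Cat2", "Birth": "1955", "Postcode": "5432", "Illness": "HIV"},
    {"Job": "Cat2", "Birth": "1955", "Postcode": "5432", "Illness": "fever"},
    {"Job": "Cat2", "Birth": "1975", "Postcode": "4350", "Illness": "flu"},
    {"Job": "Cat2", "Birth": "1975", "Postcode": "4350", "Illness": "fever"},
]

print(is_alpha_k_anonymous(ROWS, QI, "Illness", "HIV", 0.5, 2))
print(is_alpha_k_anonymous(ROWS, QI, "Illness", "HIV", 0.1, 2, refined=False))
print(is_alpha_k_anonymous(ROWS, QI, "Illness", "HIV", 0.1, 2))
```

With α = 0.1 and k = 2, the strict ratio bound rejects any class of size two that holds a sensitive tuple, while the ceiling bound allows one such tuple, which is the situation the refinement is meant to address.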
Parameter α caps the confidence of implications from values in the quasi-identifier to the sensitive value, while parameter k specifies the minimum number of identical quasi-identifications.

Definition 7 (Local Recoding) Given a data set D of tuples, a function c that converts each tuple t in D to c(t) is a local recoding for D.

Local recoding typically distorts the values in the tuples of a data set. We can define a measurement for the amount of distortion generated by a recoding, which we shall call the recoding cost. If suppression is used for recoding, i.e. a value is modified to an unknown *, then the cost can be measured by the total number of suppressions, or the number of *'s in the resulting data set. Our objective is to find a local recoding with minimum cost. We call this the problem of optimal (α, k)-anonymization. The corresponding decision problem is defined as follows.

(α, k)-ANONYMIZATION: Given a data set D with a quasi-identifier Q and a sensitive value s, is there a local recoding for D by a function c such that, after recoding, (α, k)-anonymity is satisfied and the cost of the recoding is at most C?

Optimal k-anonymization by local recoding is NP-hard, as discussed in Meyerson and Williams (2004) and Aggarwal et al. (2005). Now, we show that optimal (α, k)-anonymization by local recoding is also NP-hard.

Theorem 1 (α, k)-anonymity is NP-hard for a binary alphabet (Σ = {0, 1}).

Proof Sketch The proof is by transforming the problem of EDGE PARTITION INTO 4-CLIQUES to the (α, k)-anonymity problem.

Edge partition into 4-cliques: Given a simple graph G = (V, E), with |E| = 6m for some integer m, can the edges of G be partitioned into m edge-disjoint 4-cliques?
(Holyer 1981)

Given an instance of EDGE PARTITION INTO 4-CLIQUES, set α = 0.5 and k = 12. For each vertex v ∈ V, construct a non-sensitive attribute. For each edge e ∈ E, where e = (v1, v2), create a pair of records r_{v1,v2} and r'_{v1,v2}, where the two records have the attribute values of both v1 and v2 equal to 1 and all other non-sensitive attribute values equal to 0, but record r_{v1,v2} has the sensitive attribute equal to 1 and record r'_{v1,v2} has the sensitive attribute equal to 0.

We define the cost of the (0.5, 12)-anonymity to be the number of suppressions applied in the data set. We show that the cost of the (0.5, 12)-anonymity is at most 48m if and only if E can be partitioned into a collection of m edge-disjoint 4-cliques.

Suppose E can be partitioned into a collection of m edge-disjoint 4-cliques. Consider a 4-clique Q with vertices v1, v2, v3 and v4. If we suppress the attributes v1, v2, v3 and v4 in the 12 records corresponding to the edges in Q, then a cluster of these 12 records is formed in which each modified record has four *'s. Note that the α-deassociation requirement is satisfied, as the frequency of the sensitive attribute value 1 is equal to 0.5. The cost of the (0.5, 12)-anonymity is equal to 12 × 4 × m = 48m.

Suppose the cost of the (0.5, 12)-anonymity is at most 48m. As G is a simple graph, any twelve records differ in at least four attributes. So, each record should have at least four *'s in the solution of the (0.5, 12)-anonymity. Then, the cost of the (0.5, 12)-anonymity is at least 12 × 4 × m = 48m. Combining this with the assumption that the cost is at most 48m, we obtain that the cost is exactly 48m and thus each record has exactly four *'s in the solution. Each cluster should have exactly 12 records (where six have sensitive value 1 and the other six have sensitive value 0). Suppose the twelve modified records contain four *'s in attributes v1, v2, v3 and v4; the records contain 0's in all other non-sensitive attributes. This corresponds to a 4-clique with vertices v1, v2, v3 and v4. Thus, we conclude that the solution corresponds to a
partition into a collection of m edge-disjoint 4-cliques.

Let p be the fraction of the tuples in the data set that contain the sensitive value. Suppose α is set smaller than p. Then no matter how we partition the data set, by the pigeonhole principle, there is at least one partition P in which the fraction of tuples containing the sensitive value is p or more, and which therefore cannot satisfy the α-deassociation property.

Lemma 1 (Choice of α) α should be set to a value greater than or equal to the frequency (given in fraction) of the sensitive value in the data set D.

Distortion Ratio or Recoding Cost: Since we want to analyze the published data, it is interesting to see how large the distortion of the published data is. There are many utility metrics (Machanavajjhala et al. 2006; Xu et al. 2006; Li et al. 2006) to define the distortion ratio of a published table. For example, in Machanavajjhala et al. (2006), a metric can be the average size of the equivalence classes, without using the taxonomy trees for attributes. Xu et al. (2006) and Li et al. (2006) define more complicated metrics with the use of the taxonomy trees. In this paper, we focus on the following distortion ratio. Note that how to define the distortion ratio is orthogonal to our (α, k)-anonymity model.

Since we assume the more general case of a taxonomy tree for each attribute, we define the cost of local recoding based on this model. The cost is given by the distortion ratio of the resulting data set and is defined as follows. If the value of an attribute of a tuple has not been generalized, there is no distortion. However, if the value is generalized to a more general value in the taxonomy tree, there is a distortion of that attribute of the tuple. The more the value is generalized (i.e. the closer the new value's node is to the root of the taxonomy), the greater the distortion. Thus, the distortion of a value is defined in terms of the height of the value generalized. For example, if the value has not been generalized, the height of the
value generalized is equal to 0. If the value has been generalized one level up in the taxonomy, the height of the value generalized is equal to 1. Let h_{i,j} be the height of the value generalized of attribute A_i of the tuple t_j. The distortion of the whole data set is equal to the sum of the distortions of all values in the generalized data set. That is, distortion = Σ_{i,j} h_{i,j}. The distortion ratio is equal to the distortion of the generalized data set divided by the distortion of the fully generalized data set, where the fully generalized data set is one in which all attribute values are generalized to the root of the taxonomy.

3 Global-recoding

In this section, we extend an existing global-recoding based algorithm called Incognito (LeFevre et al. 2005) to the (α, k)-anonymity model. The Incognito algorithm (LeFevre et al. 2005) is an optimal algorithm for the k-anonymity problem. It has also been used in Machanavajjhala et al. (2006) for the l-diversity problem.

Table 5 shows a data set containing three attributes (Gender, Birth and Postcode) and one sensitive attribute Sens, where c is the sensitive value and n represents some non-sensitive value. Figure 1a, b and c show the generalization hierarchies of attributes Postcode, Birth and Gender, respectively.

Table 5 A data set
Gender  Birth     Postcode  Sens
male    May 1965  4351      n
male    Jun 1965  4351      c
male    Jul 1965  4361      n
male    Aug 1965  4362      n

Fig. 1 Generalization hierarchy
(a) Postcode: P4 = {****}, P3 = {4***}, P2 = {43**}, P1 = {435*, 436*}, P0 = {4351, 4361, 4362}
(b) Birth: B2 = {*}, B1 = {1965}, B0 = {May 1965, Jun 1965, Jul 1965, Aug 1965}
(c) Gender: G1 = {Person}, G0 = {male, female}

Each node in a generalization hierarchy of attribute A corresponds to a generalization domain with respect to A.
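As an illustration of a generalization hierarchy, the sketch below derives the Postcode domains P0 to P4 of Fig. 1a from the postcodes in Table 5. It assumes that one level of generalization masks one trailing digit with `*`, which is consistent with the figure; the helper name `generalize_postcode` is hypothetical.

```python
def generalize_postcode(code, level):
    """Mask the last `level` characters of a postcode with '*'."""
    if not 0 <= level <= len(code):
        raise ValueError("level out of range")
    return code[: len(code) - level] + "*" * level

# Postcode column of Table 5.
postcodes = ["4351", "4351", "4361", "4362"]

# One generalization domain per level of the hierarchy.
domains = {
    f"P{level}": sorted({generalize_postcode(p, level) for p in postcodes})
    for level in range(5)
}

for name, values in domains.items():
    print(name, values)
```

Each higher level has no more distinct values than the one below it, which is why a more general domain yields equivalence classes that are at least as large.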
The generalization domain at a lower level has more detailed information than one at a higher level. For example, in Fig. 1a, generalization domain P0 (with respect to Postcode) has the most detailed information. It contains three postcodes 4351, 4361 and 4362. Generalization domain P1 (with respect to Postcode) has more general information. It contains two generalized postcodes 435* and 436*.

Lemma 2 (Generalization Property) Let T be a table and let Q be an attribute set in T. Let G and G′ be generalization domains with respect to Q, where G′ is more general than G. If the table T generalized with the generalization domain G with respect to Q is (α, k)-anonymous, then the table T generalized with the generalization domain G′ with respect to Q is also (α, k)-anonymous.

For example, consider generalization of the data set in Table 5, and let us set k = 2 and α = 0.5. Table 6(a), the table generalized with generalization domain <G0, B1, P1>, is (α, k)-anonymous. As <G0, B1, P2> is more general than <G0, B1, P1>, we know that the table generalized with domain <G0, B1, P2> is also (α, k)-anonymous (as shown in Table 6(b)).

Lemma 3 (SUBSET CLOSURE) Let T be a table. Let P and Q be attribute sets in T, where P ⊂ Q. If the table T generalized with the generalization domain G with respect to Q (e.g. <G0, B1, P1>) is (α, k)-anonymous, then the table T generalized with the generalization domain projected from G with respect to P (e.g. <G0, B1>) is also (α, k)-anonymous.

Table 6 Illustration of generalization property
(a)
Gender  Birth  Postcode  Sens
male    1965   435*      n
male    1965   435*      c
male    1965   436*      n
male    1965   436*      n
(b)
Gender  Birth  Postcode  Sens
male    1965   43**      n
male    1965   43**      c
male    1965   43**      n
male    1965   43**      n

Table 7 Illustration of subset property
(a)
Gender  Sens
male    n
male    c
male    n
male    n
(b)
Gender  Birth  Sens
male    1965   n
male    1965   c
male    1965   n
male    1965   n
(c)
Birth  Postcode  Sens
1965   435*      n
1965   435*      c
1965   436*      n
1965   436*      n

For example, we set k = 2 and α = 0.5. Table 6(a), the table that generalizes Table 5 with generalization domain <G0, B1, P1>, is (α, k)-anonymous. We note that
generalization domains <G0>, <G0, B1> and <B1, P1> are all subsets of generalization domain <G0, B1, P1>. It is obvious that Table 7(a) (the table generalized with <G0>), Table 7(b) (the table generalized with <G0, B1>) and Table 7(c) (the table generalized with <B1, P1>) are also (α, k)-anonymous.

Algorithm: The algorithm is similar to those of LeFevre et al. (2005) and Machanavajjhala et al. (2006). The difference is in the testing criterion for each candidate. LeFevre et al. (2005) test for the k-anonymity property and Machanavajjhala et al. (2006) test for the k-anonymity and l-diversity properties. Here, we check the (α, k)-anonymity property.

Initially, for each attribute A, we consider all possible generalization domains with respect to A. For example, if A = Postcode, we consider the generalization domains <P0>, <P1>, <P2>, <P3> and <P4>. For each generalization domain G, we test whether the table projected onto attribute A and then generalized with G is (α, k)-anonymous. If so, we mark the generalization domain. In this step, we can make use of the generalization property shown in Lemma 2 so that we do not need to test all candidates. For example, if <P1> is tested and the corresponding table satisfies (α, k)-anonymity, then we do not need to test <P2>, <P3> and <P4>. This is because, by Lemma 2, <P2>, <P3> and <P4> will also satisfy (α, k)-anonymity.

After the initial step, we obtain all generalization domains of each attribute which satisfy (α, k)-anonymity. The second step is to generate all possible generalization domains with respect to attribute sets of size 2, instead of a single attribute (e.g. <G0, B0>). This step is similar to candidate generation in the typical Apriori algorithm (Agrawal and Srikant 1994), which mines frequent itemsets.
In this algorithm, we make use of the subset property shown in Lemma 3 for the generation of candidate generalization domains of size 2. After the candidate generation, for each candidate, the algorithm tests whether the table generalized with the generalization domain is (α, k)-anonymous. If so, we mark the generalization domain. Similar to the first step, the second step can also make use of the generalization property for pruning. The step repeats until generalization domains of size |Q| are reached, where Q is the quasi-identifier. Then, among all marked domains of size |Q|, we choose the one with the minimum distortion as the final generalization domain G of the table. Next, G is applied to the given table to obtain an (α, k)-anonymous table, which is our output.
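The final selection step can be illustrated with a deliberately simplified sketch. It is not the extended Incognito algorithm: it enumerates every full-size generalization domain by brute force and omits the Apriori-style candidate generation and the pruning by Lemmas 2 and 3. The data come from Table 5 and Fig. 1; all function names are hypothetical, and the height of a value is taken to be its level index in the hierarchy.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

# Table 5: (Gender, Birth, Postcode, Sens); "c" is the sensitive value.
TABLE5 = [
    ("male", "May 1965", "4351", "n"),
    ("male", "Jun 1965", "4351", "c"),
    ("male", "Jul 1965", "4361", "n"),
    ("male", "Aug 1965", "4362", "n"),
]

def gen_gender(v, lvl):    # G0 -> G1
    return v if lvl == 0 else "Person"

def gen_birth(v, lvl):     # B0 (month year) -> B1 (year) -> B2 (*)
    return [v, v.split()[-1], "*"][lvl]

def gen_postcode(v, lvl):  # P0 -> P1 -> P2 -> P3 -> P4
    return v[: len(v) - lvl] + "*" * lvl

# (generalization function, maximum height) per quasi-identifier attribute.
HIERARCHIES = [(gen_gender, 1), (gen_birth, 2), (gen_postcode, 4)]

def generalize(table, levels):
    """Apply one level per attribute to every row (global recoding)."""
    return [
        tuple(f(v, l) for (f, _), v, l in zip(HIERARCHIES, row, levels)) + (row[-1],)
        for row in table
    ]

def is_alpha_k_anonymous(table, alpha, k, s="c"):
    sizes = Counter(row[:-1] for row in table)
    hits = Counter(row[:-1] for row in table if row[-1] == s)
    a = Fraction(str(alpha))
    return all(n >= k and hits[q] <= math.ceil(a * n) for q, n in sizes.items())

def distortion_ratio(levels, n_rows):
    """Sum of heights over all values, divided by the fully generalized sum."""
    full = sum(h for _, h in HIERARCHIES) * n_rows
    return Fraction(sum(levels) * n_rows, full)

def best_global_recoding(table, alpha, k):
    """Brute-force search for the valid domain with minimum distortion ratio."""
    best = None
    for levels in product(*(range(h + 1) for _, h in HIERARCHIES)):
        if is_alpha_k_anonymous(generalize(table, levels), alpha, k):
            cand = (distortion_ratio(levels, len(table)), levels)
            if best is None or cand < best:
                best = cand
    return best

ratio, levels = best_global_recoding(TABLE5, alpha=0.5, k=2)
print(ratio, levels)
```

For k = 2 and α = 0.5 the search selects levels (0, 1, 1), that is <G0, B1, P1>, the domain of Table 6(a), with a distortion ratio of 2/7 under the height convention assumed here. The lattice for this toy hierarchy has only 30 nodes; on realistic data the pruning described in the text is what keeps the search tractable.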