Large-scale mining of usage data on web sites
English Essay on the Natural and Human Causes of Forest Destruction

Forest destruction can be caused by both natural and human factors.
Natural factors include disasters such as wildfires, hurricanes, and insect infestations.
These events can devastate forests and lead to widespread destruction of vegetation and wildlife.
While natural factors play a significant role in forest destruction, human activities are often the primary cause of deforestation.
Deforestation is driven by various human activities, including logging, agriculture, mining, and urban expansion.
These activities clear vast areas of forest, leading to habitat destruction and loss of biodiversity.
English Essay on Measures to Protect the Amazon Rainforest

Three sample essays in total, for readers' reference.

Essay 1

Saving the Amazing Amazon Rainforest

The Amazon rainforest is one of the most incredible places on our planet Earth. It's a vast green wonderland teeming with weird and wonderful plants and animals found nowhere else in the world. From the tiny rainbow-colored poison dart frogs to the towering kapok trees, every part of this special ecosystem is precious and deserves to be protected.

I recently learned that the Amazon rainforest is in big trouble and disappearing at an alarming rate. Large areas are being burned and cleared to make way for cattle ranches, soybean farms, logging, and mining operations. This deforestation has terrible consequences for the plants, animals, Indigenous communities, and even the climate of our entire planet.

When I first heard about the threats facing the Amazon, I felt really sad and worried. How can we let such an extraordinary place be destroyed? We have to take action to save the incredible Amazon before it's too late! Here are some of the key things I think we need to do:

Stop Deforestation

The biggest threat to the Amazon is deforestation: the clearing and burning of the rainforest. Huge swaths of trees are cut down or burned every year, leaving behind barren land and clouds of smoke. We need stronger laws to stop this senseless destruction of irreplaceable rainforest.

Governments should crack down hard on illegal logging and mining operations. They should also limit the expansion of agriculture and cattle ranching into the rainforest. Big companies and local people need incentives to preserve the forest rather than clearing it.

Protect Indigenous Lands

Many Indigenous tribes have lived in the Amazon for thousands of years. Their traditional lifestyles are intertwined with the forest and they act as great stewards in preserving it.
But their ancestral lands are increasingly being invaded by outsiders seeking to exploit the Amazon's resources.

We need to respect the rights of Indigenous peoples and legally protect their territories from deforestation, mining, and other threats. They are the Amazon's original caretakers and we should support their efforts to defend the rainforest that has been their home for millennia.

Develop Sustainable Economic Options

Many people living in and around the Amazon engage in environmentally destructive activities like logging, mining, and clearing forests for agriculture because they have few other ways to make a living. We need to provide economic alternatives that allow them to sustainably benefit from keeping the rainforest intact.

This could include things like cultivating non-timber forest products like nuts, fruits, and natural medicines. Or offering training and funding for sustainable jobs in areas like community tourism, tree replanting, or environmental research and protection. With viable economic options, people won't be forced to resort to activities that harm the Amazon's biodiversity.

Support Environmental Laws & Enforcement

Although laws exist to safeguard the Amazon in countries like Brazil, enforcement is often lacking due to insufficient resources and staffing. We need stronger policies and funding for agencies that can properly monitor logging activities, prevent illegal deforestation, and punish violators.

Additionally, multinational agreements are needed to crack down on the international trade of illegally sourced Amazon timber, minerals, and agricultural products. Cutting off global markets for these ill-gotten goods will reduce the economic incentives driving deforestation.

Fund Conservation and Restoration

Large-scale reforestation and forest restoration projects are crucial to giving cleared areas of the Amazon a chance to regrow and recover.
This involves replanting native tree species, reintroducing wildlife, and giving the land an opportunity to become a lush, biodiverse rainforest once again.

These efforts to regreen the Amazon need substantial funding from governments, environmental groups, companies, and caring individuals around the world. We should all pitch in to ensure there are plenty of resources for undoing deforestation and protecting existing rainforest.

Reduce Climate Change

Climate change caused by human activities like burning fossil fuels is an immense threat to the future of the Amazon. Droughts, wildfires, changes in rainfall patterns, and other climate effects are stressing and damaging this delicate ecosystem.

We all need to do our part to cut greenhouse gas emissions and slow down global warming. This includes using less energy, driving and flying less, eating fewer meat and dairy products, and putting pressure on leaders to transition to clean renewable energy sources. A stable climate is essential for the Amazon's survival.

Educate and Inspire People

Finally, we need to keep educating people of all ages about the wonders of the Amazon and why it's so vital that we preserve this ecological treasure for future generations. We should inspire kids like me to appreciate the Amazon's beauty and biodiversity from an early age.

More people need to understand that the Amazon isn't just a big chunk of trees in a faraway place; it's a vital piece of the global environmental puzzle that helps regulate rainfall, absorb carbon, and sustain Indigenous cultures. Once they grasp how amazing and important the Amazon really is, they'll feel motivated to save it.

I may only be a kid, but I care deeply about protecting the Amazon rainforest and making sure it's still around for my future children and grandchildren to cherish. It would be a crime against nature to let this green paradise disappear forever. We must all take action in our own way to conserve the incredible Amazon!
The plants, animals, people, and planet Earth itself are counting on us.

Essay 2

Title: Saving the Amazing Amazon Rainforest

The Amazon rainforest is one of the most incredible places on Earth! It's a vast green wonderland, teeming with fascinating plants and incredible animals. The Amazon is often called the "lungs of our planet" because its dense vegetation produces a huge amount of the oxygen we all need to breathe. Sadly, this amazing rainforest is in danger from deforestation (cutting down trees) and other threats. We must take steps to save the Amazon before it's too late!

What makes the Amazon so special? Let me tell you! The Amazon basin, which spans eight countries in South America, contains the largest remaining tropical rainforest on Earth. It covers an area larger than the continental United States! The Amazon is home to around 390 billion trees comprising 16,000 different species. That's more tree species than in the entire United States and Canada combined!

Among the rainforest's many wonders are its staggering biodiversity and unique ecosystems. Over 2.5 million different species of insects, plants, birds, and other creatures inhabit the Amazon, with new species still being discovered regularly. Many indigenous tribes also make their home in the rainforest, continuing the traditional ways of life of their ancestors.

The Amazon's incredible biodiversity is only one reason why we need to protect this vital rainforest. The Amazon plays a crucial role in absorbing carbon dioxide, a greenhouse gas that contributes to climate change. The trees and other plants use carbon dioxide for photosynthesis and produce fresh oxygen. Estimates suggest the Amazon rainforest absorbs a whopping 2 billion tons of carbon dioxide per year!

Not only that, but the Amazon is a vital source of food, medicine, and other resources for humans. Many foods like açaí berries, hearts of palm, and Brazil nuts originate from the Amazon.
Indigenous tribes have also used rainforest plants for centuries to create lifesaving medicines. Up to 25% of modern pharmaceuticals may have active ingredients derived from Amazonian plants!

In spite of its immense value, the Amazon rainforest faces grave threats from deforestation, mining, agriculture, infrastructure projects, and more. Huge swaths of the rainforest are being cut down or burned at an alarming rate. In fact, over 20% of the Amazon has already been lost forever to deforestation. This not only destroys habitats and biodiversity, but also reduces the rainforest's capacity to absorb carbon dioxide, worsening climate change. If deforestation continues unchecked, we could face an environmental catastrophe.

So what can be done to protect this vital rainforest? Governments, businesses, and individuals all have a role to play. Here are some important measures:

Governments should strengthen laws to prevent illegal deforestation and mining in the Amazon. They must also work to support sustainable development and indigenous rights.

Companies that produce goods linked to deforestation, like beef, soybeans, palm oil and timber, need to ensure their supply chains are deforestation-free. We can demand companies take action.

Major infrastructure projects like roads, dams and pipelines in the Amazon should be subject to rigorous environmental reviews to limit damage.

We can support organizations working to protect the rainforest through conservation, reforestation, advocacy and sustainable economic opportunities for indigenous communities.

At home, we can reduce our carbon footprint and waste to lower the overall demand for resources that drive deforestation. Using less paper, eating less meat, and reducing energy use can all make a difference.

Most of all, we need to spread awareness about how vital the Amazon rainforest is and why we must act now to protect it before it's too late!

The amazing Amazon rainforest is a treasure we simply cannot afford to lose.
Its biodiversity, absorption of carbon dioxide, and importance to indigenous communities make it invaluable. While the threats are severe, we know what actions are needed. Now it's up to all of us to take those actions before this natural wonder is destroyed forever. We must be the voice for the voiceless Amazon!

Essay 3

The Amazing Amazon Rainforest

Have you ever heard of the Amazon rainforest? It's an incredible place, the largest rainforest in the world! It covers parts of nine different countries in South America, including Brazil, Peru, and Colombia. The Amazon is home to amazing plants and animals that you won't find anywhere else.

Unfortunately, the Amazon rainforest is in danger. Every year, large areas of the forest are being destroyed, which is bad news for the plants, animals, and people who live there. But there are things we can do to help protect this awesome forest!

What Makes the Amazon So Special?

The Amazon rainforest is simply ginormous: it's about the size of the entire United States! It's filled with towering trees, winding rivers, and all kinds of fascinating creatures.

There are jaguars, sloths, river dolphins, and poison dart frogs living in the Amazon. The rainforest has over 2.5 million different species of insects alone! Many of the animals found there exist nowhere else in the world.

The Amazon is also home to amazing plants. There are beautiful orchids, trees used to make chocolate and rubber, and even some trees that are taller than a 20-story building! Scientists believe there could be plants in the Amazon that can be used to cure diseases like cancer.

Many indigenous groups also live in the Amazon and rely on the forest for food, shelter, and their way of life. The Amazon rainforest provides homes for over 350 different indigenous tribes.

Why Is the Amazon in Danger?

Sadly, around 20% of the Amazon rainforest has already been destroyed.
The biggest threat is deforestation, which means cutting down trees and clearing away parts of the forest.

Farmers and ranchers clear land to grow crops like soybeans or to create pastures for cattle. Loggers harvest trees from the forest to make wood products. Oil companies damage the rainforest when looking for oil to drill. Mining for gold, iron, and other minerals also destroys the forest.

All this deforestation means less habitat and food for rainforest animals. When the forest gets cleared away, animals can lose their homes and may go extinct. The native people also lose resources they need to survive.

Deforestation also affects the entire planet. Trees absorb carbon dioxide and produce oxygen. With fewer trees, there is more carbon dioxide in the air which contributes to climate change and global warming.

What Can We Do to Protect the Amazon?

Even though the problems in the Amazon seem huge, there are solutions if we all pitch in to help! Here are some ways to protect this awesome rainforest:

Stop burning fossil fuels and use clean energy instead: Burning fossil fuels like gas, oil and coal releases greenhouse gases that contribute to climate change and make the planet warmer. Using renewable energy like solar or wind power is better for the environment.

Reduce, reuse, recycle: Cutting down on waste helps protect rainforests. Recycle paper, plastic, metal and glass instead of throwing them away. By recycling and reusing things, we don't need to cut down as many trees for new products.

Buy rainforest-friendly products: Look for products made from sustainable materials, not from trees cut down in the rainforest. Buy energy-saving lightbulbs, recycled paper and electronics rated for energy efficiency.

Support companies that don't destroy rainforests: Some companies contribute to deforestation for their products.
Do some research on a company before buying their stuff to make sure they don't damage forests.

Get involved: Join a club, protest or petition to help save rainforests. Every signature and every voice counts! You can also plant trees or donate to environmental organizations working to protect rainforests.

Spread awareness: Tell your family, friends, classmates and neighbors why rainforests like the Amazon are important to protect. The more people who care about the issue, the better!

The Amazon rainforest is so important for the animals that live there, and for the whole planet. By reducing our impact on the environment and supporting rainforest conservation, we can all be heroes and help protect this incredible rainforest!
Stanford University's free textbook on mining massive data: Mining of Massive Datasets

Mining of Massive Datasets

Anand Rajaraman
Kosmix, Inc.

Jeffrey D. Ullman
Stanford Univ.

Copyright © 2010, 2011 Anand Rajaraman and Jeffrey D. Ullman

Preface

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course CS345A, titled “Web Mining,” was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates.

What the Book Is About

At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort. The principal topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
4. The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
6. Algorithms for clustering very large, high-dimensional datasets.
7. Two key problems for Web applications: managing advertising and recommendation systems.

Prerequisites

CS345A, although its number indicates an advanced graduate course, has been found accessible by advanced undergraduates and beginning masters students. In the future, it is likely that the course will be given a mezzanine-level number.
The prerequisites for CS345A are:

1. The first course in database systems, covering application programming in SQL and other database-related languages such as XQuery.
2. A sophomore-level course in data structures, algorithms, and discrete math.
3. A sophomore-level course in software systems, software engineering, and programming languages.

Exercises

The book contains extensive exercises, with some for almost every section. We indicate harder exercises or parts of exercises with an exclamation point. The hardest exercises have a double exclamation point.

Support on the Web

You can find materials from past offerings of CS345A at:

/~ullman/mining/mining.html

There, you will find slides, homework assignments, project requirements, and in some cases, exams.

Acknowledgements

Cover art is by Scott Ullman. We would like to thank Foto Afrati and Arun Marathe for critical readings of the draft of this manuscript. Errors were also reported by Apoorv Agarwal, Susan Biancani, Leland Chen, Shrey Gupta, Xie Ke, Haewoon Kwak, Ellis Lau, Ethan Lozano, Justin Meyer, Brad Penoff, Philips Kokoh Prasetyo, Angad Singh, Sandeep Sripada, Dennis Sidharta, Mark Storus, Roshan Sumbaly, and Tim Triche Jr. The remaining errors are ours, of course.

A. R.
J. D. U.
Palo Alto, CA
June, 2011

Contents

1 Data Mining
1.1 What is Data Mining?
  1.1.1 Statistical Modeling
  1.1.2 Machine Learning
  1.1.3 Computational Approaches to Modeling
  1.1.4 Summarization
  1.1.5 Feature Extraction
1.2 Statistical Limits on Data Mining
  1.2.1 Total Information Awareness
  1.2.2 Bonferroni’s Principle
  1.2.3 An Example of Bonferroni’s Principle
  1.2.4 Exercises for Section 1.2
1.3 Things Useful to Know
  1.3.1 Importance of Words in Documents
  1.3.2 Hash Functions
  1.3.3 Indexes
  1.3.4 Secondary Storage
  1.3.5 The Base of Natural Logarithms
  1.3.6 Power Laws
  1.3.7 Exercises for Section 1.3
1.4 Outline of the Book
1.5 Summary of Chapter 1
1.6 References for Chapter 1

2 Large-Scale File Systems and Map-Reduce
2.1 Distributed File Systems
  2.1.1 Physical Organization of Compute Nodes
  2.1.2 Large-Scale File-System Organization
2.2 Map-Reduce
  2.2.1 The Map Tasks
  2.2.2 Grouping and Aggregation
  2.2.3 The Reduce Tasks
  2.2.4 Combiners
  2.2.5 Details of Map-Reduce Execution
  2.2.6 Coping With Node Failures
2.3 Algorithms Using Map-Reduce
  2.3.1 Matrix-Vector Multiplication by Map-Reduce
  2.3.2 If the Vector v Cannot Fit in Main Memory
  2.3.3 Relational-Algebra Operations
  2.3.4 Computing Selections by Map-Reduce
  2.3.5 Computing Projections by Map-Reduce
  2.3.6 Union, Intersection, and Difference by Map-Reduce
  2.3.7 Computing Natural Join by Map-Reduce
  2.3.8 Generalizing the Join Algorithm
  2.3.9 Grouping and Aggregation by Map-Reduce
  2.3.10 Matrix Multiplication
  2.3.11 Matrix Multiplication with One Map-Reduce Step
  2.3.12 Exercises for Section 2.3
2.4 Extensions to Map-Reduce
  2.4.1 Workflow Systems
  2.4.2 Recursive Extensions to Map-Reduce
  2.4.3 Pregel
  2.4.4 Exercises for Section 2.4
2.5 Efficiency of Cluster-Computing Algorithms
  2.5.1 The Communication-Cost Model for Cluster Computing
  2.5.2 Elapsed Communication Cost
  2.5.3 Multiway Joins
  2.5.4 Exercises for Section 2.5
2.6 Summary of Chapter 2
2.7 References for Chapter 2

3 Finding Similar Items
3.1 Applications of Near-Neighbor Search
  3.1.1 Jaccard Similarity of Sets
  3.1.2 Similarity of Documents
  3.1.3 Collaborative Filtering as a Similar-Sets Problem
  3.1.4 Exercises for Section 3.1
3.2 Shingling of Documents
  3.2.1 k-Shingles
  3.2.2 Choosing the Shingle Size
  3.2.3 Hashing Shingles
  3.2.4 Shingles Built from Words
  3.2.5 Exercises for Section 3.2
3.3 Similarity-Preserving Summaries of Sets
  3.3.1 Matrix Representation of Sets
  3.3.2 Minhashing
  3.3.3 Minhashing and Jaccard Similarity
  3.3.4 Minhash Signatures
  3.3.5 Computing Minhash Signatures
  3.3.6 Exercises for Section 3.3
3.4 Locality-Sensitive Hashing for Documents
  3.4.1 LSH for Minhash Signatures
  3.4.2 Analysis of the Banding Technique
  3.4.3 Combining the Techniques
  3.4.4 Exercises for Section 3.4
3.5 Distance Measures
  3.5.1 Definition of a Distance Measure
  3.5.2 Euclidean Distances
  3.5.3 Jaccard Distance
  3.5.4 Cosine Distance
  3.5.5 Edit Distance
  3.5.6 Hamming Distance
  3.5.7 Exercises for Section 3.5
3.6 The Theory of Locality-Sensitive Functions
  3.6.1 Locality-Sensitive Functions
  3.6.2 Locality-Sensitive Families for Jaccard Distance
  3.6.3 Amplifying a Locality-Sensitive Family
  3.6.4 Exercises for Section 3.6
3.7 LSH Families for Other Distance Measures
  3.7.1 LSH Families for Hamming Distance
  3.7.2 Random Hyperplanes and the Cosine Distance
  3.7.3 Sketches
  3.7.4 LSH Families for Euclidean Distance
  3.7.5 More LSH Families for Euclidean Spaces
  3.7.6 Exercises for Section 3.7
3.8 Applications of Locality-Sensitive Hashing
  3.8.1 Entity Resolution
  3.8.2 An Entity-Resolution Example
  3.8.3 Validating Record Matches
  3.8.4 Matching Fingerprints
  3.8.5 A LSH Family for Fingerprint Matching
  3.8.6 Similar News Articles
  3.8.7 Exercises for Section 3.8
3.9 Methods for High Degrees of Similarity
  3.9.1 Finding Identical Items
  3.9.2 Representing Sets as Strings
  3.9.3 Length-Based Filtering
  3.9.4 Prefix Indexing
  3.9.5 Using Position Information
  3.9.6 Using Position and Length in Indexes
  3.9.7 Exercises for Section 3.9
3.10 Summary of Chapter 3
3.11 References for Chapter 3

4 Mining Data Streams
4.1 The Stream Data Model
  4.1.1 A Data-Stream-Management System
  4.1.2 Examples of Stream Sources
  4.1.3 Stream Queries
  4.1.4 Issues in Stream Processing
4.2 Sampling Data in a Stream
  4.2.1 A Motivating Example
  4.2.2 Obtaining a Representative Sample
  4.2.3 The General Sampling Problem
  4.2.4 Varying the Sample Size
  4.2.5 Exercises for Section 4.2
4.3 Filtering Streams
  4.3.1 A Motivating Example
  4.3.2 The Bloom Filter
  4.3.3 Analysis of Bloom Filtering
  4.3.4 Exercises for Section 4.3
4.4 Counting Distinct Elements in a Stream
  4.4.1 The Count-Distinct Problem
  4.4.2 The Flajolet-Martin Algorithm
  4.4.3 Combining Estimates
  4.4.4 Space Requirements
  4.4.5 Exercises for Section 4.4
4.5 Estimating Moments
  4.5.1 Definition of Moments
  4.5.2 The Alon-Matias-Szegedy Algorithm for Second Moments
  4.5.3 Why the Alon-Matias-Szegedy Algorithm Works
  4.5.4 Higher-Order Moments
  4.5.5 Dealing With Infinite Streams
  4.5.6 Exercises for Section 4.5
4.6 Counting Ones in a Window
  4.6.1 The Cost of Exact Counts
  4.6.2 The Datar-Gionis-Indyk-Motwani Algorithm
  4.6.3 Storage Requirements for the DGIM Algorithm
  4.6.4 Query Answering in the DGIM Algorithm
  4.6.5 Maintaining the DGIM Conditions
  4.6.6 Reducing the Error
  4.6.7 Extensions to the Counting of Ones
  4.6.8 Exercises for Section 4.6
4.7 Decaying Windows
  4.7.1 The Problem of Most-Common Elements
  4.7.2 Definition of the Decaying Window
  4.7.3 Finding the Most Popular Elements
4.8 Summary of Chapter 4
4.9 References for Chapter 4

5 Link Analysis
5.1 PageRank
  5.1.1 Early Search Engines and Term Spam
  5.1.2 Definition of PageRank
  5.1.3 Structure of the Web
  5.1.4 Avoiding Dead Ends
  5.1.5 Spider Traps and Taxation
  5.1.6 Using PageRank in a Search Engine
  5.1.7 Exercises for Section 5.1
5.2 Efficient Computation of PageRank
  5.2.1 Representing Transition Matrices
  5.2.2 PageRank Iteration Using Map-Reduce
  5.2.3 Use of Combiners to Consolidate the Result Vector
  5.2.4 Representing Blocks of the Transition Matrix
  5.2.5 Other Efficient Approaches to PageRank Iteration
  5.2.6 Exercises for Section 5.2
5.3 Topic-Sensitive PageRank
  5.3.1 Motivation for Topic-Sensitive Page Rank
  5.3.2 Biased Random Walks
  5.3.3 Using Topic-Sensitive PageRank
  5.3.4 Inferring Topics from Words
  5.3.5 Exercises for Section 5.3
5.4 Link Spam
  5.4.1 Architecture of a Spam Farm
  5.4.2 Analysis of a Spam Farm
  5.4.3 Combating Link Spam
  5.4.4 TrustRank
  5.4.5 Spam Mass
  5.4.6 Exercises for Section 5.4
5.5 Hubs and Authorities
  5.5.1 The Intuition Behind HITS
  5.5.2 Formalizing Hubbiness and Authority
  5.5.3 Exercises for Section 5.5
5.6 Summary of Chapter 5
5.7 References for Chapter 5

6 Frequent Itemsets
6.1 The Market-Basket Model
  6.1.1 Definition of Frequent Itemsets
  6.1.2 Applications of Frequent Itemsets
  6.1.3 Association Rules
  6.1.4 Finding Association Rules with High Confidence
  6.1.5 Exercises for Section 6.1
6.2 Market Baskets and the A-Priori Algorithm
  6.2.1 Representation of Market-Basket Data
  6.2.2 Use of Main Memory for Itemset Counting
  6.2.3 Monotonicity of Itemsets
  6.2.4 Tyranny of Counting Pairs
  6.2.5 The A-Priori Algorithm
  6.2.6 A-Priori for All Frequent Itemsets
  6.2.7 Exercises for Section 6.2
6.3 Handling Larger Datasets in Main Memory
  6.3.1 The Algorithm of Park, Chen, and Yu
  6.3.2 The Multistage Algorithm
  6.3.3 The Multihash Algorithm
  6.3.4 Exercises for Section 6.3
6.4 Limited-Pass Algorithms
  6.4.1 The Simple, Randomized Algorithm
  6.4.2 Avoiding Errors in Sampling Algorithms
  6.4.3 The Algorithm of Savasere, Omiecinski, and Navathe
  6.4.4 The SON Algorithm and Map-Reduce
  6.4.5 Toivonen’s Algorithm
  6.4.6 Why Toivonen’s Algorithm Works
  6.4.7 Exercises for Section 6.4
6.5 Counting Frequent Items in a Stream
  6.5.1 Sampling Methods for Streams
  6.5.2 Frequent Itemsets in Decaying Windows
  6.5.3 Hybrid Methods
  6.5.4 Exercises for Section 6.5
6.6 Summary of Chapter 6
6.7 References for Chapter 6

7 Clustering
7.1 Introduction to Clustering Techniques
  7.1.1 Points, Spaces, and Distances
  7.1.2 Clustering Strategies
  7.1.3 The Curse of Dimensionality
  7.1.4 Exercises for Section 7.1
7.2 Hierarchical Clustering
  7.2.1 Hierarchical Clustering in a Euclidean Space
  7.2.2 Efficiency of Hierarchical Clustering
  7.2.3 Alternative Rules for Controlling Hierarchical Clustering
  7.2.4 Hierarchical Clustering in Non-Euclidean Spaces
  7.2.5 Exercises for Section 7.2
7.3 K-means Algorithms
  7.3.1 K-Means Basics
  7.3.2 Initializing Clusters for K-Means
  7.3.3 Picking the Right Value of k
  7.3.4 The Algorithm of Bradley, Fayyad, and Reina
  7.3.5 Processing Data in the BFR Algorithm
  7.3.6 Exercises for Section 7.3
7.4 The CURE Algorithm
  7.4.1 Initialization in CURE
  7.4.2 Completion of the CURE Algorithm
  7.4.3 Exercises for Section 7.4
7.5 Clustering in Non-Euclidean Spaces
  7.5.1 Representing Clusters in the GRGPF Algorithm
  7.5.2 Initializing the Cluster Tree
  7.5.3 Adding Points in the GRGPF Algorithm
  7.5.4 Splitting and Merging Clusters
  7.5.5 Exercises for Section 7.5
7.6 Clustering for Streams and Parallelism
  7.6.1 The Stream-Computing Model
  7.6.2 A Stream-Clustering Algorithm
  7.6.3 Initializing Buckets
  7.6.4 Merging Buckets
  7.6.5 Answering Queries
  7.6.6 Clustering in a Parallel Environment
  7.6.7 Exercises for Section 7.6
7.7 Summary of Chapter 7
7.8 References for Chapter 7

8 Advertising on the Web
8.1 Issues in On-Line Advertising
  8.1.1 Advertising Opportunities
  8.1.2 Direct Placement of Ads
  8.1.3 Issues for Display Ads
8.2 On-Line Algorithms
  8.2.1 On-Line and Off-Line Algorithms
  8.2.2 Greedy Algorithms
  8.2.3 The Competitive Ratio
  8.2.4 Exercises for Section 8.2
8.3 The Matching Problem
  8.3.1 Matches and Perfect Matches
  8.3.2 The Greedy Algorithm for Maximal Matching
  8.3.3 Competitive Ratio for Greedy Matching
  8.3.4 Exercises for Section 8.3
8.4 The Adwords Problem
  8.4.1 History of Search Advertising
  8.4.2 Definition of the Adwords Problem
  8.4.3 The Greedy Approach to the Adwords Problem
  8.4.4 The Balance Algorithm
  8.4.5 A Lower Bound on Competitive Ratio for Balance
  8.4.6 The Balance Algorithm with Many Bidders
  8.4.7 The Generalized Balance Algorithm
  8.4.8 Final Observations About the Adwords Problem
  8.4.9 Exercises for Section 8.4
8.5 Adwords Implementation
  8.5.1 Matching Bids and Search Queries
  8.5.2 More Complex Matching Problems
  8.5.3 A Matching Algorithm for Documents and Bids
8.6 Summary of Chapter 8
8.7 References for Chapter 8

9 Recommendation Systems
9.1 A Model for Recommendation Systems
  9.1.1 The Utility Matrix
  9.1.2 The Long Tail
  9.1.3 Applications of Recommendation Systems
  9.1.4 Populating the Utility Matrix
9.2 Content-Based Recommendations
  9.2.1 Item Profiles
  9.2.2 Discovering Features of Documents
  9.2.3 Obtaining Item Features From Tags
  9.2.4 Representing Item Profiles
  9.2.5 User Profiles
  9.2.6 Recommending Items to Users Based on Content
  9.2.7 Classification Algorithms
  9.2.8 Exercises for Section 9.2
9.3 Collaborative Filtering
  9.3.1 Measuring Similarity
  9.3.2 The Duality of Similarity
  9.3.3 Clustering Users and Items
  9.3.4 Exercises for Section 9.3
9.4 Dimensionality Reduction
  9.4.1 UV-Decomposition
  9.4.2 Root-Mean-Square Error
  9.4.3 Incremental Computation of a UV-Decomposition
  9.4.4 Optimizing an Arbitrary Element
  9.4.5 Building a Complete UV-Decomposition Algorithm
  9.4.6 Exercises for Section 9.4
9.5 The NetFlix Challenge
9.6 Summary of Chapter 9
9.7 References for Chapter 9
Moxa White Paper on Enabling Mobility in Network Monitoring

WHITE PAPER

Enabling Mobility in Network Monitoring

Yiwei Chen
Moxa Product Manager

Introduction

Engineers face different challenges during each stage of the industrial network management lifecycle. During the installation stage, manual device configuration and testing is time consuming and prone to human error. During the operation stage, engineers are required to monitor network status in real time and minimize system downtime. During the maintenance stage, engineers often face long labor hours doing firmware upgrades or configuration changes on multiple devices. During the diagnostics stage, being able to quickly identify where critical network issues occur is essential. To help minimize the total cost of ownership, engineers are always on the lookout for new industrial network management tools that can help them overcome all of these challenges.

Industrial network management software is usually installed in the control room, or is sometimes integrated with an existing SCADA system. But when you’re out of the control room or on the move, you could miss important messages such as network changes or errors, and fail to respond quickly enough. With the number of devices connected to industrial networks continually increasing, the ability to monitor and maintain your network anytime and anywhere is becoming more crucial than ever before to ensure that your operation is reliable and runs smoothly.

Current statistics show that globally, the number of mobile users is now greater than the number of desktop users, and we can expect this global trend to expand into the industrial automation workplace.
In fact, since engineers joining the workforce today are accustomed to using mobile devices in their private life, it is only natural that they would want to use the same devices to simplify their work life.

In this white paper, we discuss the challenges in industrial network management and show how a mobile monitoring tool can help keep you informed of network status, even when you’re on the move. In addition, we’ll share experiences we’ve had helping customers from the rail industry reduce system downtime by utilizing the right mobile tools to quickly respond to network changes.

Released on October 7, 2015
© 2015 Moxa Inc. All rights reserved.

Moxa is a leading manufacturer of industrial networking, computing, and automation solutions. With over 25 years of industry experience, Moxa has connected more than 30 million devices worldwide and has a distribution and service network that reaches customers in more than 70 countries. Moxa delivers lasting business value by empowering industry with reliable networks and sincere service for automation systems. Information about Moxa’s solutions is available at . You may also contact Moxa by email at *************.

How to contact Moxa
Tel: 1-714-528-6777
Fax: 1-714-528-6778

Major Challenges in Industrial Network Management

Managing a network can be a complex and often extensive operation, especially for industrial networks, and being able to monitor and manage devices is essential to ensuring that the network is running smoothly. However, with evolving business operations, administrators are often on the move, making it difficult to stay informed of or quickly respond to status changes in the network.

When doing regular maintenance or troubleshooting at a field site where many network devices are deployed, engineers often face the daunting task of identifying specific devices hiding among a multitude of identical devices.
Even with proper labeling and hardware placement, it can still take time to obtain the status information of a specific device onsite. As a result, faulty devices cannot be swapped out quickly enough to ensure that your operation runs smoothly.

With the development of mobile networking tools, engineers can now improve operational efficiency and maximize network availability.

Why Mobile Network Monitoring?

Like their enterprise counterparts, automation engineers can now access their operational applications from mobile devices by installing an appropriate network monitoring app. The mobile network monitoring app is usually a client software tool designed to work in tandem with the network management software installed in the control room.

The following diagram illustrates how a typical mobile app for network monitoring works to keep users informed of their network’s status. The app connects to the software server over an intranet or the Internet to access network status in real time. In addition, if the network is updated, the network management software server will send a push notification via the Apple cloud or Google cloud to alert the app user.

A mobile phone app for network monitoring usually works as the client of the main network management software. Through the app, engineers can access the network status anytime, anywhere.

How Mobile Networking Empowers Network Operators

A mobile network monitoring app should support the following three features to ensure that monitoring a network from a mobile device is worth the effort.

1. Sending Real-time Alerts

With a mobile network monitoring app, administrators can receive notifications of events pushed to their mobile devices. These real-time alerts allow administrators to take action immediately in response to critical events, even when they are out of the control room.
For example, once an alert is received, they can contact maintenance engineers to do onsite troubleshooting and consequently reduce system downtime.

2. Allowing Instant Network Checks

A mobile network app allows users to check the status of a network in real time. After you log in to the app, it will inform you whether or not the network is operating normally. The app will also display detailed information of a specific network device, keeping network administrators in the know while they are on the move or out of the control room.

Information, such as a device’s IP address, MAC address, location, and firmware version, can be viewed from the app. For example, if an engineer receives an alert for a link-down event, they can readily access the information needed to determine which port is faulty.

3. Finding Field Devices Quickly

In certain scenarios, it could take a long time to manually search for a specific device from racks and racks of similar devices. Moreover, if automation engineers need to access the parameters or settings of a specific device for onsite troubleshooting, they would need to physically connect the device to a laptop computer using a web console or CLI (command line interface), or physically read the MAC address or serial number printed on the device, and then check the information with the computer. Either way, the engineer could end up spending much more time than would be necessary if the same information could be checked using a mobile device.

To make the task easier and more efficient, mobile network monitoring apps now usually come with a function that allows users to quickly find a particular device, and even view detailed device information.

For example, each network device could be encoded with a unique QR code based on its MAC address.
If the mobile phone app supports a built-in QR code scanner, engineers can scan the device’s QR code onsite to pull up information about that device, without needing to boot up a laptop computer or enter a device ID manually.

With Moxa’s MXview ToGo app, users can not only scan the device to get detailed information, they can also activate the Device Locator function to find the device, which works by causing the device’s LED to blink in a way that is easy to recognize.

Success Story

Deploying a Server/Client Solution for Industrial Network Monitoring

To ensure that a network operates reliably, industrial network management software is usually installed in large-scale networks in mission-critical industries, such as transportation, mining, and oil & gas. In this section, we share a success story from a railway application that uses a fiber Ethernet backbone built for data transmission between several stations located across a wide area. Since the application involves multiple control rooms spread over a wide area, the industrial network management software and the mobile phone app can help engineers access network status in real time and then respond quickly, thereby greatly reducing system downtime.

This high-speed railway operator built a fiber Ethernet backbone for data transmission between its Operation Management Center and other railway stations to ensure high network availability. The customer used about 30 Moxa industrial rackmount switches (IKS-G6524) to connect to the pre-existing Layer 3 networks, and used the MXstudio industrial network management suite across the network management lifecycle, including for installation, operation, maintenance, and diagnostics.
The MXstudio suite includes the MXview industrial network management software, MXconfig industrial network configuration tool, and N-Snap network snapshot tool.

The railway operator’s network administrators recounted that they sometimes needed to leave the control room for patrol inspections within and around the station. Since MXview was already installed in the control room, they could install Moxa’s MXview ToGo mobile app, which works as a client of MXview, and then easily check the latest network status from their mobile phones. The dashboard design of the app makes it easy for engineers to tell whether the network is operating under Normal, Warning, or Critical conditions. In one notable incident, an IT engineer received a push notification about a downed link, used the app to determine where the broken link was located, and also connected to the MXview server to determine the cause. After determining the cause, the engineer contacted onsite staff immediately, allowing them to get the network link back up and running in no time.

The diagram shows that engineers on the move can still get real-time network status with the mobile app.

Conclusion

The use of effective network management applications can help network administrators accomplish tasks efficiently during different stages of the network management lifecycle. With the changing business environment and improvements in mobile device technology, a mobile app for network monitoring allows administrators to be efficient, effective, and responsive when monitoring and maintaining an industrial network.

Using a mobile app for network monitoring, administrators can view device and network status and receive real-time alerts from their mobile devices while on the move.
In the field, administrators can quickly search for any device and view that device’s detailed configuration parameters with the click of a button.

∙ Learn more about Moxa’s MXview ToGo mobile app here: /MXview_ToGo
∙ Scan the following QR code to download the MXview ToGo app: iPhone OS / Android OS

Disclaimer

This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied by law, including implied warranties and conditions of merchantability, or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document.
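
Illustrative sketch: the QR-code lookup described under "Finding Field Devices Quickly" can be pictured as a small key-value lookup. The Python below is only a sketch under stated assumptions. The "MAC:" payload format, the function names, and the inventory record are hypothetical and are not taken from MXview ToGo; a real app would query the network management server rather than a local dictionary.

```python
import re

# Hypothetical inventory keyed by normalized MAC address. In a real
# deployment this data would come from the network management server.
INVENTORY = {
    "0090E8112233": {
        "ip": "192.168.127.10",
        "location": "Station A, rack 3",
        "firmware": "4.2",
        "status": "Normal",  # one of Normal, Warning, Critical
    },
}


def normalize_mac(raw: str) -> str:
    """Drop common separators and upper-case a MAC address."""
    compact = re.sub(r"[-:.\s]", "", raw).upper()
    if not re.fullmatch(r"[0-9A-F]{12}", compact):
        raise ValueError(f"not a MAC address: {raw!r}")
    return compact


def qr_payload(mac: str) -> str:
    """Build the text a device's QR code would carry (assumed format)."""
    return "MAC:" + normalize_mac(mac)


def lookup_from_scan(payload: str, inventory=INVENTORY):
    """Return the inventory record for a scanned payload, or None if unknown."""
    prefix, _, value = payload.partition(":")
    if prefix != "MAC":
        raise ValueError(f"unexpected QR payload: {payload!r}")
    return inventory.get(normalize_mac(value))


if __name__ == "__main__":
    scanned = qr_payload("00:90:e8:11:22:33")
    print(scanned, lookup_from_scan(scanned))
```

Normalizing the MAC address before it is used as a key means that a label printed with colons, hyphens, or lower-case digits still resolves to the same device record.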
Zambia Mines and Minerals Development Act, 2008

Mines and Minerals Development [No. 7 of 2008

THE MINES AND MINERALS DEVELOPMENT ACT, 2008

ARRANGEMENT OF SECTIONS

PART I
PRELIMINARY
Section
1. Short title
2. Interpretation
3. Rights to minerals vested in President

PART II
MINING RIGHTS
4. Acquisition of mining rights
5. Prohibition of prospecting, mining, etc. without mining right or mineral processing licence
6. Types of rights
7. Certain persons disqualified from holding rights
8. Restrictions on mining rights and mineral processing licence
9. Priority of applications for mining rights
10. Mining right for area subject to other rights
11. Survey of land
12. Bids
13. Preference for Zambian products, etc.

PART III
LARGE-SCALE MINING OPERATIONS
Division I - Prospecting Licence
14. Application for prospecting licence
15. Consideration of application for prospecting licence
16. Grant of prospecting licence
17. Duration of prospecting licence
18. Rights conferred by prospecting licence
19. Obligations of holder of prospecting licence
20. Amendment of programme of prospecting operations
21. Transfer of prospecting licence
22. Restrictions on removal of minerals
23. Discovery of minerals not included in prospecting licence

Copies of this Act can be obtained from the Government Printer, P.O. Box 30136, 10101 Lusaka. Price K26.000 each.

24. Renewal of prospecting licence
Division II - Large-Scale Mining Licence
25. Application for large-scale mining licence
26. Consideration of application for large-scale mining licence
27. Grant of large-scale mining licence
28. Duration of large-scale mining licence
29. Rights conferred by large scale mining licence
30. Obligations of holder of large-scale mining licence
31. Amendment of programme of mining operations
32. Transfer of large scale mining licence
33. Discovery of minerals not included in large-scale mining licence
34. Suspension of production
35. Renewal of large-scale mining licence
Division III - Large-Scale Gemstone Licence
36. Application for large-scale gemstone licence
37. Consideration of application for large-scale gemstone licence
38. Grant of large-scale gemstone licence
39. Duration of large-scale gemstone licence
40. Rights conferred by large-scale gemstone licence
41. Obligations of holder of large-scale gemstone licence
42. Amendment of programme of mining operations
43. Transfer of large-scale gemstone licence
44. Discovery of minerals not included in large-scale gemstone licence
45. Suspension of production
46. Renewal of large-scale gemstone licence

PART IV
SMALL-SCALE MINING OPERATIONS
Division I - Prospecting Permit
47. Application for prospecting permit
48. Consideration of application for prospecting permit
49. Grant of prospecting permit
50. Duration of prospecting permit
51. Rights conferred by prospecting permit
52. Obligations of holder of prospecting permit
53. Restrictions on removal of minerals
Division II - Small-Scale Mining Licence
54. Application for small scale mining licence
55. Consideration of application for small-scale mining licence
56. Grant of small scale mining licence
57. Duration of small-scale mining licence
58. Rights conferred by small-scale mining licence
59. Obligations of holder of small scale mining licence
60. Renewal of small-scale mining licence
61. Transfer of small-scale mining licence
62. Requirement to convert small-scale mining licence to large-scale mining licence
63. Termination of small-scale mining licence for insufficient production
Division III - Small-Scale Gemstone Licence
64. Application for small-scale gemstone licence
65. Consideration of application for small-scale gemstone licence
66. Grant of small-scale gemstone licence
67. Duration of small-scale gemstone licence
68. Rights conferred by small-scale gemstone licence
69. Obligations of holder of small-scale gemstone licence
70. Renewal of small-scale gemstone licence
71. Transfer of small-scale gemstone licence
72. Requirement to convert small-scale gemstone licence to large-scale gemstone licence
73. Termination of small-scale gemstone licence for insufficient production

PART V
ARTISANAL MINING
74. Application for artisan's mining right
75. Grant of artisan's mining right
76. Duration of artisan's mining right
77. Renewal of artisan's mining right
78. Rights conferred by artisan's mining right
79. Suspension or cancellation of artisan's mining right
80. Right to building materials
81. Obligations of holder of artisan's mining right

PART VI
MINERAL PROCESSING LICENCE
82. Application for mineral processing licence
83. Consideration of application for mineral processing licence
84. Grant of mineral processing licence
85. Duration of mineral processing licence
86. Rights conferred by mineral processing licence
87. Obligations of holder of mineral processing licence
88. Amendment of programme of mineral processing operations
89. Transfer of mineral processing licence
90. Renewal of mineral processing licence
91. Termination of mineral processing licence for insufficient production

PART VII
GEMSTONE SALES CERTIFICATE
92. Prohibition of trading in gemstones without gemstone sales certificate
93. Application for gemstone sales certificate
94. Grant of gemstone sales certificate
95. Obligations of holder of gemstone sales certificate

PART VIII
GENERAL PROVISIONS RELATING TO LICENCES AND PERMITS
96. Annual operating permit
97. Holder to have office in Zambia
98. Alteration of prospecting area
99. Alteration of mining area
100. Mergers or co-ordination of mining operations
101. Abandonment of land subject to licence or permit
102. Suspension or cancellation of mining right or non-mining right
103. Transitional extension of mining right or non-mining right pending certain applications
104. Transfer of control of company
105. Surrender of records on termination of mining right
106. Export, import, etc. of minerals
107. Prohibition of acquisition, selling, etc. of radioactive minerals
108. Application to export, sell, etc. radioactive minerals
109. Insurance and indemnities
110. Obstruction of holder of mining right
111. Production of information
112. Register
113. Inspection of Register
114. Power to close area to prospecting

PART IX
SAFETY, HEALTH AND ENVIRONMENTAL PROTECTION
115. Environment and human health to be considered when granting mining rights or mineral processing licences
116. Conditions for protection of environment and human health
117. Direction to comply with conditions of mining right or mineral processing licence
118. Rehabilitation by Director of Mines Safety at holder's expense
119. Clearing away of mining plant or mineral processing plant
120. Sale of mining plant or mineral processing plant
121. Wasteful mining practices
122. Environmental Protection Fund
123. Liability and redress

PART X
GEOLOGICAL SERVICES AND MINERAL ANALYSIS
124. Responsibilities of Director of Geological Survey
125. Geological survey, mapping and prospecting on behalf of Republic
126. Prohibition of operation of mineral analysis laboratory and geological or mining consultancy firm without permit

PART XI
MINING RIGHTS AND SURFACE RIGHTS
127. Restrictions of rights of entry by holder of licence or permit
128. Rights under licence or permit to be exercised reasonably
129. Right to use and access water or graze stock
130. Acquisition of use of land by holder of licence or permit
131. Arbitration of disputes
132. Compensation for disturbance of rights, etc.

PART XII
ROYALTIES AND CHARGES
133. Royalties on production of minerals
134. Due date for mineral royalty
135. Commissioner-General to be responsible for royalties
136. Mineral royalty sharing mechanism
137. Mineral royalty returns
138. Returns and assessments
139. Remission of royalties
140. Deferment of royalties
141. Provisional assessment of royalty
142. Prohibition of disposal of minerals
143. Annual charge in respect of licences

PART XIII
ADMINISTRATION
144. Director and other officers
145. Mining cadastre offices
146. Execution and delegation of powers and functions of Director and other officers
147. Power of entry by Director
148. Obstruction of Director or authorised officer
149. Recovery of fees
150. Mining Advisory Committee
151. Disclosure of information

PART XIV
APPEALS
152. Appeals against decisions of Director
153. Appeals in relation to licences issued by Minister
154. Appeals in relation to insurance
155. Notification of decisions

PART XV
GENERAL PROVISIONS
156. General penalty
157. Miscellaneous offences
158. Offence committed by body corporate or un-incorporate body
159. Development agreements
160. Existing development agreements to cease to be binding on Republic
161. Regulations
162. Repeal of Cap. 213

FIRST SCHEDULE
SECOND SCHEDULE

GOVERNMENT OF ZAMBIA

ACT
No. 7 of 2008

Date of Assent: 27/03/08

An Act to revise the law relating to the prospecting for, mining and processing of minerals; to repeal and replace the Mines and Minerals Act, 1995; and to provide for matters connected with or incidental to the foregoing.

[4th April, 2008

Enactment
ENACTED by the Parliament of Zambia.

PART I
PRELIMINARY

Short title
1. (1) This Act may be cited as the Mines and Minerals Development Act, 2008.
(2) This Act shall come into operation on 1st April, 2008.

Interpretation
2. (1) In this Act, unless the context otherwise requires:

"access agreement" means an agreement entered into between the holder of a mining right and an owner or occupier of land over which the right subsists, for the regulation of prospecting, mining or other activities authorised by the mining right to be carried on upon the land;

"artisan's mining right" means an artisan's mining right granted under Part V of this Act;

"base metal" means a non-precious metal that is either common or more chemically active, or both common and chemically active and includes iron, copper, nickel, aluminium, lead, zinc, tin, magnesium, cobalt, manganese, titanium, scandium, vanadium and chromium;
"bird sanctuary" means an area declared as such under section one hundred and forty-four of the Zambia Wildlife Act, 1998 (Act No. 12 of 1998);

"cadastre unit" means a quadrilateral formed by the intersection of meridians and parallels and with a distance equal to six sexagesimal seconds, and that covers an average planimetric surface of three point three four zero zero hectares;

"Central Mining Cadastre Office" means the office established under section one hundred and forty-five;

"citizen-owned company" means a company where at least fifty point one per cent of its equity is owned by Zambian citizens and in which the Zambian citizens have significant control of the management of the company;

"Commissioner-General" means the Commissioner-General appointed under the Zambia Revenue Authority Act (Cap. 321);

"Director" means the Director of Mines appointed under section one hundred and forty-four;

"Director of Geological Survey" means the person appointed as such under subsection (3) of section one hundred and forty-four;

"Director of Mines Safety" means the person appointed as such under subsection (2) of section one hundred and forty-four;

"energy minerals" means a naturally occurring substance in the earth's crust used as a source of energy and includes coal, uranium and any other minerals used to generate energy but does not include petroleum;

"Environmental Council of Zambia" has the meaning assigned to it in the Environmental Protection and Pollution Control Act (Cap. 204);

"environmental impact study" has the meaning assigned to it in the Environmental Protection and Pollution Control (Environmental Impact Assessment) Regulations, 1997;

"Environment Management Plan" means a plan approved by the Environmental Council of Zambia in accordance with the Environmental Protection and Pollution Control Act (Cap. 204);

"game management area" means an area of land declared as such under section twenty-six of the Zambia Wildlife Act, 1998 (Act No. 12 of 1998);

"gemstone sales certificate" means a gemstone sales certificate granted under Part VII of this Act;

"gemstones" means amethyst, aquamarine, beryl, corundum, diamond, emerald, garnet, ruby, sapphire, topaz, tourmaline and any other non-metallic mineral substance, being a substance used in the manufacture of jewellery, that the Minister, by statutory instrument, declares to be a gemstone for the purposes of this Act;

"holder" means the person in whose name a mining right is registered under this Act;

"industrial minerals" means a rock or mineral other than gemstones, base metals, energy minerals or precious metals used either in their natural state or after physical or chemical transformation and includes but is not limited to barites, dolomite, feldspar, fluorspar, graphite, gypsum, ironstone when used as a fluxing agent, kyanite, limestone, phyllite, magnesite, mica, nitrate, phosphate, pyrophyllite, salt, sands, clay, talc, laterite, gravel and any other minerals when so used:
Provided that the Minister may, by statutory order, classify any other mineral as an industrial mineral;

"large-scale gemstone licence" means a large-scale gemstone licence granted under Part III of this Act to enable a person prospect for and mine gemstones;

"large scale mining licence" means a large scale mining licence granted under Part III of this Act;

"local forest" means an area declared as such under section seventeen of the Forests Act (Cap. 199);

"local office" means an office of the Ministry established for any area;

"mine" means any place, pit, shaft, drive, level or other excavation, and any drift, gutter, lead, vein, lode, reef, saltpan or working, in or on or by means of which any operation connected with mining is carried on, together with all the buildings, premises, erections and appliances, whether above or below the ground, that are used in connection with any such operation or for the extraction, treatment or preparation of any mineral or for the purpose of dressing mineral ores;

"mineral" means any substance, occurring naturally in or on the earth or in or under water and which was formed by or subjected to a geological process and includes any mineral occurring in residue stockpiles or in residue deposit, but excludes:
(a) water, other than water taken from the land or any water body for the extraction for any mineral from such water; and
(b) petroleum;

"mineral processing" means the practice of beneficiating or liberating valuable minerals from their ores which may combine a number of unit operations such as crushing, grinding, sizing, screening, classification, washing, froth flotation, gravity concentration, electrostatic separation, magnetic separation, leaching, smelting, refining, calcining and gasification or any other processes incidental thereto;

"mineral processing licence" means a mineral processing licence granted under Part VI of this Act;

"mineral royalty" means a payment received as consideration for the extraction of minerals;

"mining" means the extraction of material, whether solid, liquid or gaseous, from land or from beneath the surface of the earth in order to win minerals, or any operations directly or indirectly necessary or incidental thereto;

"Mining Advisory Committee" means the Mining Advisory Committee established by section one hundred and fifty;

"mining area" means an area of land subject to a licence or permit under this Act;

"Mining Cadastre Office" means the central administrative office established in Lusaka which is responsible for the processing and administration of mining rights and non-mining rights;

"mining operations" means any operation carried out under a mining right referred to in section six but does not include an operation carried out under a prospecting permit, prospecting licence or mineral processing licence;

"mining plant" means any building, plant, machinery, equipment, tools or other property used for mining, whether or not affixed to land, but does not include any timber or other material used or applied in the construction or support of any shaft, drive, gallery, terrace, race, dam or other work;

"mining right" means a right granted under subsection (1) of section six;

"National Forest" means an area declared as such under section eight of the Forests Act (Cap. 199);

"National Park" means an area declared as such under section ten of the Zambia Wildlife Act;

"non-mining right" means a mineral processing licence or gemstone sales certificate granted under this Act;

"ore" means a natural aggregate of one or more valuable minerals which may be mined or from which some parts may be extracted;

"ore body" means a continuous, well defined mass of ore;

"petroleum" has the meaning assigned to it in the Petroleum (Exploration and Production) Act (Cap. 435), but does not include coal or oil shale;

"preliminary investigation rights" means rights granted by the Director of Geological Survey under subsection (2) of section five;

"person" includes a partnership and a co-operative;

"prospect" means to search for any mineral by any means and to carry out such works, and remove such samples, as may be necessary to test the mineral bearing qualities of any land;

"prospecting area" means an area of land subject to a prospecting licence or a prospecting permit;

"prospecting licence" means a prospecting licence granted under Part III of this Act;

"prospecting operations" means operations carried out in the course of prospecting;

"prospecting permit" means a prospecting permit granted under Part IV of this Act;

"radioactive mineral" means a mineral which contains by weight at least one twentieth of one per centum of uranium or thorium or any combination thereof, and includes, but is not limited to:
(a) monazite sand and other ores containing thorium; and
(b) carnotite, pitchblende and other ores containing uranium;

"regional mining cadastre offices" means other mining cadastre offices, established in other districts throughout the Republic other than Lusaka, to enable the public lodge applications for mining rights and non-mining rights;

"Register" means the Register established and maintained pursuant to section one hundred and twelve;

"royalty" means the royalty charged under this Act;

"small-scale gemstone licence" means a small-scale gemstone licence granted under Part IV of this Act; and

"small-scale mining licence" means a small scale mining licence granted under Part IV of this Act.

(2) A reference, in any provision of this Act, to an authorised officer is a reference to a public officer or other person, designated under section one hundred and forty-four, who is duly authorised to exercise and perform the powers and functions conferred or imposed by that provision on an authorised officer.

(3) A reference in this Act to land subject to a mining right, is a reference to an area of land in respect of which a mining right has been granted and subsists.

Rights to minerals vested in President
3. (1) All rights of ownership in, searching for, mining and disposing of, minerals wheresoever located in the Republic are hereby vested in the President on behalf of the Republic.
7 of 2008Mines and Minerals Development(2) The provisions of this section have effect notwithstandingany right, title or interest which any person may possess in or overthe soil in, on or under which minerals are found.Acquisition ofmining rightsProhibition of prospecting, mining, etc. without mining right or mineral processing licenceTypes of rightsPART IIM INING RIGHTS4. Subject to the other provisions of this Act, rights of prospecting for, mining and disposing of, minerals shall be acquired and held under and in accordance with this Act.5. (1) Apersonshallnotprospect formineralsorcarryon mining operations or mineral processing operations except under the authority of a mining right or mineral processing licence granted under this Act.(2) The Director of Geological Survey may, for a period not exceeding ninety days, grant in writing, subject to such conditions, including conditions relating to work and expenditure, as the Director of Geological Survey may impose, the right to enter any area that is not subject to a mining right, or undertake an aerial survey, for the purpose of reconnaissance operations for the location of minerals by geo-physical, geo-chemical and photo-geological survey or by the study of surface geology.(3) A right granted by the Director of Geological Survey under subsection (2) shall not confer on the holder exclusive rights over the area to which it relates or any preference or priority in respect of an application for a mining right over that area.(4) A person who contravenes subsection (1) commits an offence and is liable upon conviction —(a) in the case of an individual, to a fine not exceeding onemillion penalty units or to imprisonment for a term notexceeding ten years, or to both; or(b) in the case of a body corporate or un-incorporate body,to a fine of five million penalty units.(1) The following mining rights may be granted under this6Act:(a) a prospecting licence;(b) a large-scale mining licence;(c) a large-scale gemstone 
licence;(d) a prospecting permit;Mines and Minerals Development [No. 7 of 2008 69(e) a small-scale mining licence;(f)a small-scale gemstone licence; and(g) an artisan's mining right.(2) The following non-mining rights may be granted underthis Act:(a) a mineral processing licence; and(b) a gemstone sales certificate.7.(1) A mining right or non-mining right shall not be granted to any person except in accordance with the provisions of this Act.(2) A mining right or non-mining right shall not be granted to or held by —(a) an individual who—(i) is under the age of eighteen years;(ii) is or becomes an undischarged bankrupt, having been adjudged or otherwise declared bankrupt under anywritten law, or enters into any agreement or schemeof composition with creditors, or takes advantage ofany legal process for the relief of bankrupt orinsolvent debtors; or(iii) has been convicted, within the previous ten years, of an offence involving fraud or dishonesty, or of anyoffence under this Act or any other law within oroutside Zambia, and been sentenced therefor toimprisonment without the option of a fine or to a fineexceeding fifty thousand penalty units; or(b) a company—(i) which is in liquidation, other than liquidation which formspart of a scheme for the reconstruction of thecompany or for its amalgamation with anothercompany;(ii) unless the company is incorporated under the Companies Act.(iii) which has not established an office in Zambia; or(iv) which has among its directors or shareholders any person who would be disqualified under sub-paragraphs (ii) or (iii) of paragraph (a).(3) A prospecting permit, small-scale mining licence, small-scale gemstone licence and an artisan's mining right shall not be granted to a person who is not a citizen of Zambia or a company which is not a citizen-owned company.(4) A mining right for industrial minerals shall only be granted to a citizen of Zambia and a citizen-owned company.(5) Any document or transaction purporting to grant a mining 
right to any person not entitled to hold the right shall be void and of no effect.
(6) For the purposes of this Act, "citizen of Zambia" means
(a) in relation to an individual, an individual who is a citizen of Zambia; and
(b) in relation to a partnership, a partnership which is composed exclusively of persons who are citizens of Zambia.
8. A mining right or mineral processing licence, and the rights conferred by it, shall be subject to the provisions of this Act, the conditions attached to it at the time it is granted and, to the extent that the amendment of such conditions during the currency of the mining right or mineral processing licence is permitted under this Act, to the conditions as amended.
9. Subject to this Act, where more than one person apply for a mining right over the same area of land, the Director of Geological Survey as the case may be shall dispose of the applications in the order in which they are received.
Marginal notes: Certain persons disqualified from holding mining rights; Cap. 388; Restrictions on mining rights and mineral processing licence; Priority of applications for mining rights; Mining right for area subject to other rights; Survey of land; Bids.
10. 
(1) An applicant for a mining right over an area subject to another mining right may apply for consent from the holder of the mining right, which consent shall not be unreasonably withheld.(2) A holder of a mining right over an area in respect of which an application is made under subsection (1) shall, within a period of ninety days, consent to the application where—(a) the minerals or metals applied for are different fromthose indicated on holder's licence or permit;(b) the geographical position of the minerals or metalsapplied for is different from the holder's ore bodyposition indicated in the approved programmes ofoperations;(c)the geological position of the minerals or metals appliedfor is different from the position of the holder's plantand infrastructure indicated in the approved programmeof operations; or(d) the mineral applied for is an industrial mineral and theholder is not eligible under the Act.(3) An applicant shall, where a holder of a mining right over an area in respect of which the application is made withholds consent, apply to the Mining Advisory Committee which shall determine the matter taking into account the matters referred to under paragraphs (a) to (d) of subsection (2).11.The Director or the Director of Geological Survey as the case may be, may before a mining right or mineral processing licence is issued, require that the land over which the mining right or mineral processing licence is to be issued be properly surveyed in accordance with the provisions of this Act.12. (1) Subject to the other provisions ofthis Act, the Minister。
The Causes of Earthquakes

Introduction: An earthquake, also called ground motion or ground vibration, is a natural phenomenon in which the Earth's crust releases energy rapidly, causing vibration and generating seismic waves.
The plates of the Earth press against and collide with one another, causing dislocation and rupture along plate edges and inside the plates. This is the main cause of ground shaking, i.e., earthquakes.

The causes of earthquakes
There are many causes of vibration in the Earth's surface layer. According to their cause, earthquakes can be divided into the following types:

1. Tectonic earthquakes
Earthquakes caused by the dislocation and rupture of rock strata deep underground are called tectonic earthquakes. Earthquakes of this type occur most often and do the greatest damage, accounting for about 90 percent of earthquakes worldwide.

2. Volcanic earthquakes
Earthquakes caused by volcanic action, such as magmatic activity and gas explosions, are called volcanic earthquakes. Volcanic earthquakes can occur only in areas of volcanic activity, and they account for only about 7 percent of earthquakes worldwide.
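Foreign-Language Literature on Web Crawlers

As an important means of data collection, crawler technology plays an important role in fields such as Internet information mining and data analysis.
This article recommends some foreign-language works on web crawlers for study and research.

1. "Web Scraping with Python: Collecting Data from the Modern Web"
Author: Ryan Mitchell
Summary: This book explains in detail how to scrape web pages with Python, from basic concepts to practical case studies, and covers many commonly used crawling techniques and tools. Readers can learn the basic principles of crawlers, anti-crawling strategies, and how to collect data efficiently.

2. "Scraping the Web: Strategies and Techniques for Data Mining"
Author: Dmitry Zinoviev
Summary: This book discusses a variety of crawling strategies and techniques, including distributed crawlers and incremental crawlers. It also introduces data mining and text analysis, giving readers a comprehensive guide to crawler technology.

3. "Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Instagram, Pinterest, and More"
Author: Matthew A. Russell
Summary: This book focuses on how to collect data from social media platforms (such as Facebook and Twitter). Through a rich set of case studies, it shows how to use crawler technology to mine valuable information from social media.

4. "Crawling the Web: An Introduction to Web Scraping and Data Mining"
Authors: Michael H. Goldwasser, David Letscher
Summary: This book offers beginners an introductory guide to crawler technology and data mining. Its contents include the basic concepts of crawlers, the HTTP protocol, regular expressions, data storage and data analysis.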
Big Data Mining: Foreign-Language Literature with Translation

Document information:
Title: A Study of Data Mining with Big Data
Authors: VH Shastri, V Sreeprada
Source: International Journal of Emerging Trends and Technology in Computer Science, 2016, 38(2): 99-103
Word count: 2,291 English words, 12,196 characters; 3,868 Chinese characters

Original text:

A Study of Data Mining with Big Data

Abstract
Data has become an important part of every economy, industry, organization, business, function and individual. Big Data is a term used to identify large data sets whose size is typically larger than that of a typical database. Big Data introduces unique computational and statistical challenges. Big Data is at present expanding in most domains of engineering and science. Data mining helps to extract useful data from huge data sets characterized by volume, variability and velocity. This article presents the HACE theorem, which characterizes the features of the Big Data revolution, and proposes a Big Data processing model from the data mining perspective.

Keywords: Big Data, Data Mining, HACE theorem, structured and unstructured.

I. Introduction
Big Data refers to the enormous amount of structured and unstructured data that overflows the organization. If this data is properly used, it can lead to meaningful information. Big Data includes a large amount of data which requires a lot of processing in real time. It provides room to discover new values, to gain in-depth knowledge from hidden values, and to manage the data effectively. A database is an organized collection of logically related data which can be easily managed, updated and accessed. Data mining is the process of discovering interesting knowledge, such as associations, patterns, changes, anomalies and significant structures, from large amounts of data stored in databases or other repositories.

Big Data has three V's as its characteristics: volume, velocity and variety. Volume means the amount of data generated every second. The data is in a state of rest. It is also known for its scale characteristics.
Velocity is the speed at which the data is generated; Big Data involves high-speed data. The data generated from social media is an example. Variety means that different types of data can be included, such as audio, video or documents. The data can be numerals, images, time series, arrays, etc.

Data mining analyses data from different perspectives and summarizes it into useful information that can be used for business solutions and for predicting future trends. Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns such as association rules. It applies many computational techniques from statistics, information retrieval, machine learning and pattern recognition. Data mining extracts only the required patterns from the database in a short time span. Based on the type of patterns to be mined, data mining tasks can be classified into summarization, classification, clustering, association and trend analysis.

Big Data is expanding in all domains of science and engineering, including the physical, biological and biomedical sciences.

II. BIG DATA with DATA MINING
Generally, Big Data refers to a collection of large volumes of data generated from various sources such as the internet, social media, business organizations, sensors, etc. We can extract useful information from it with the help of data mining, a technique for discovering patterns, as well as descriptive, understandable models, from large-scale data.

Volume is the size of the data, which is larger than petabytes and terabytes. The scale and growth in size make it difficult to store and analyse the data using traditional tools. Big Data should be used to mine large amounts of data within a predefined period of time.
Traditional database systems were designed to address small amounts of data which were structured and consistent, whereas Big Data includes a wide variety of data such as geospatial data, audio, video, unstructured text and so on.

Big Data mining refers to the activity of going through big data sets to look for relevant information. To process large volumes of data from different sources quickly, Hadoop is used. Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Its distributed file system supports fast data transfer rates among nodes and allows the system to continue operating uninterrupted when a node fails. It runs MapReduce for distributed data processing and works with structured and unstructured data.

III. BIG DATA characteristics - HACE THEOREM
We have a large volume of heterogeneous data. There exist complex relationships among the data. We need to discover useful information from this voluminous data.

Let us imagine a scenario in which blind people are asked to draw an elephant. From the information each of them collects, a blind person may think of the trunk as a wall, a leg as a tree, the body as a wall and the tail as a rope. The blind men can exchange information with each other.

Figure 1: Blind men and the giant elephant

Some of the characteristics are:

i. Vast data with heterogeneous and diverse sources: One of the fundamental characteristics of Big Data is the large volume of data represented by heterogeneous and diverse dimensions. For example, in the biomedical world, a single human being is represented by name, age, gender, family history, etc., while for X-ray and CT scans, images and videos are used.
Heterogeneity refers to the different types of representation of the same individual, and diversity refers to the variety of features used to represent a single piece of information.

ii. Autonomous with distributed and decentralized control: The sources are autonomous, i.e., automatically generated; they generate information without any centralized control. We can compare this with the World Wide Web (WWW), where each server provides a certain amount of information without depending on other servers.

iii. Complex and evolving relationships: As the size of the data becomes infinitely large, the relationships that exist within it also grow. In the early stages, when the data is small, there is no complexity in the relationships among the data. Data generated from social media and other sources has complex relationships.

IV. TOOLS: OPEN SOURCE REVOLUTION
Large companies such as Facebook, Yahoo, Twitter and LinkedIn benefit from and contribute to open source projects. In Big Data mining, there are many open source initiatives. The most popular of them are:

Apache Mahout: Scalable machine learning and data mining open source software based mainly on Hadoop. It has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.

R: Open source programming language and software environment designed for statistical computing and visualization. R was designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, beginning in 1993, and is used for statistical analysis of very large data sets.

MOA: Stream data mining open source software to perform data mining in real time. It has implementations of classification, regression, clustering, frequent item set mining and frequent graph mining. It started as a project of the Machine Learning group of the University of Waikato, New Zealand, famous for the WEKA software.
The streams framework provides an environment for defining and running stream processes using simple XML-based definitions and is able to use MOA, Android and Storm.

SAMOA: A new, upcoming software project for distributed stream mining that will combine S4 and Storm with MOA.

Vowpal Wabbit: Open source project started at Yahoo! Research and continuing at Microsoft Research to design a fast, scalable, useful learning algorithm. VW is able to learn from terafeature datasets. It can exceed the throughput of any single machine network interface when doing linear learning, via parallel learning.

V. DATA MINING for BIG DATA
Data mining is the process by which data coming from different sources is analysed to discover useful information. Data mining algorithms fall into four categories:

1. Association Rule
2. Clustering
3. Classification
4. Regression

Association is used to search for relationships between variables. It is applied in searching for frequently visited items. In short, it establishes relationships among objects. Clustering discovers groups and structures in the data. Classification deals with associating an unknown structure with a known structure. Regression finds a function to model the data.

The different data mining algorithms are:

Table 1. Classification of Algorithms

Data mining algorithms can be converted into big MapReduce algorithms on a parallel computing basis.

Table 2. Differences between Data Mining and Big Data

VI. Challenges in BIG DATA
Meeting the challenges of Big Data is difficult. The volume is increasing every day. The velocity is being increased by internet-connected devices.
The variety is also expanding, and organizations' capability to capture and process the data is limited.

The following are the challenges in the area of Big Data when it is handled:

1. Data capture and storage
2. Data transmission
3. Data curation
4. Data analysis
5. Data visualization

The challenges of Big Data mining are divided into three tiers. The first tier is the setup of data mining algorithms. The second tier includes:

1. Information sharing and data privacy.
2. Domain and application knowledge.

The third tier includes:

1. Local learning and model fusion for multiple information sources.
2. Mining from sparse, uncertain and incomplete data.
3. Mining complex and dynamic data.

Figure 2: Phases of Big Data Challenges

Generally, mining data from different data sources is tedious because the size of the data is large. Big Data is stored at different places; collecting it is a tedious task, and applying basic data mining algorithms to it is an obstacle. Next, we need to consider the privacy of the data. The third issue is the mining algorithms: when we apply data mining algorithms to these subsets of data, the result may not be very accurate.

VII. Forecast of the future
There are some challenges that researchers and practitioners will have to deal with during the next years:

Analytics Architecture: It is not yet clear what an optimal architecture for analytics systems should look like in order to deal with historic data and real-time data at the same time. An interesting proposal is the Lambda Architecture of Nathan Marz. The Lambda Architecture solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers: the batch layer, the serving layer and the speed layer. It combines in the same system Hadoop for the batch layer and Storm for the speed layer.
The properties of the system are: robust and fault tolerant, scalable, general and extensible, allows ad hoc queries, minimal maintenance, and debuggable.

Statistical significance: It is important to achieve significant statistical results, and not be fooled by randomness. As Efron explains in his book on large-scale inference, it is easy to go wrong with huge data sets and thousands of questions to answer at once.

Distributed mining: Many data mining techniques are not trivial to parallelize. To have distributed versions of some methods, a lot of research is needed, with practical and theoretical analysis, to provide new methods.

Time evolving data: Data may be evolving over time, so it is important that Big Data mining techniques are able to adapt and, in some cases, to detect change first. For example, the data stream mining field has very powerful techniques for this task.

Compression: When dealing with Big Data, the quantity of space needed to store it is very relevant. There are two main approaches: compression, where we do not lose anything, and sampling, where we choose the data that is most representative. Using compression, we may take more time and less space, so we can consider it a transformation from time to space. Using sampling, we lose information, but the gains in space may be orders of magnitude. For example, Feldman et al. use core sets to reduce the complexity of Big Data problems. Core sets are small sets that provably approximate the original data for a given problem. Using merge-reduce, the small sets can then be used for solving hard machine learning problems in parallel.

Visualization: A main task of Big Data analysis is how to visualize the results. As the data is so big, it is very difficult to find user-friendly visualizations.
New techniques and frameworks to tell and show stories will be needed, such as the photographs, infographics and essays in the beautiful book "The Human Face of Big Data".

Hidden Big Data: Large quantities of useful data are getting lost, since new data is largely untagged and unstructured. The 2012 IDC study on Big Data explains that in 2012, 23% (643 exabytes) of the digital universe would be useful for Big Data if tagged and analyzed. However, currently only 3% of the potentially useful data is tagged, and even less is analyzed.

VIII. CONCLUSION
The amount of data is growing exponentially due to social networking sites, search and retrieval engines, media sharing sites, stock trading sites, news sources and so on. Big Data is becoming the new area for scientific data research and for business applications. Data mining techniques can be applied to Big Data to acquire useful information from large data sets. They can be used together to acquire a useful picture from the data. Big Data analysis tools such as MapReduce over Hadoop and HDFS help organizations.

Chinese translation: A Study of Data Mining with Big Data

Abstract: Data has become an important part of every economy, industry, organization, business, function and individual.
Large-Scale Mining of Usage Data on Web Sites

Georgios Paliouras,* Christos Papatheodorou,+ Vangelis Karkaletsis,* Panayotis Tzitziras,+ Constantine D. Spyropoulos*

* Institute of Informatics and Telecommunications, + Division of Applied Technologies,
National Centre for Scientific Research (NCSR) “Demokritos”,
15310, Aghia Paraskevi, Attikis, GREECE
* e-mail: {paliourg, vangelis, costass}@iit.demokritos.gr
+ e-mail: {papatheodor, tzitziras}@lib.demokritos.gr

Abstract

In this paper we present an approach to the discovery of trends in the usage of large Web-based information systems. This approach is based on the empirical analysis of the users' interaction with the system and the construction of user groups with common interests (user communities). The empirical analysis is achieved with the use of cluster mining, a technique that processes data collected from the users' interaction with the Web site. Our main concern is the construction of meaningful communities, which can be used for improving the structure of the site as well as for making suggestions to the users at a personal level. Our case study on a site providing information for researchers in Chemistry shows that the proposed method provides effective mining of large usage databases.

Introduction

As the Web is expanding at an increasingly fast rate, embracing a large number of services, the issue of efficient information access is becoming a crucial factor in the design of Web sites. However, the manner in which a user accesses the information available on a Web site is heavily dependent on the user's needs, interests, knowledge and prejudices. As a result, the structure of the site should reflect the requirements of its users.

The first step towards providing efficient information access in a site is to understand its usage. This can be done by monitoring the daily usage of the site and analyzing the collected data.
Commonly, the data collected by the administrators of various sites consist of general-purpose statistical figures, such as the number of users who access a particular page within certain periods of the day. This information can be useful in drawing a few general conclusions on the usage of a site, but does not facilitate the adaptation of the site to the needs of the users.

In this paper we examine an alternative, more personalized approach to the collection and analysis of usage data. This approach is based on the analysis of access logs, which record the date and time each page is accessed, as well as the IP number of the visitor. We organize the access-log information in sessions, each of them providing a navigational pattern that associates a set of pages in the site, and then we construct community models.

A community (Orwant, 1995) is a group of users with common navigational behavior, and the community model describes the common features in the behavior of the users. The construction of communities is done with the use of the Cluster Mining algorithm (Perkowitz & Etzioni, 1998). The work presented in this paper builds upon our previous work on the construction of user communities from usage data on various information services on the Internet (Paliouras et al. 1998, 1999a, 1999b). The main difference from our previous work is the scale of the problem. Previously, we had only examined a small Web site, consisting of 41 pages (Paliouras et al. 1999b), while now we are looking at a site with a few thousand pages and a very high hit rate. The increase in scale introduces a number of important issues concerning feature selection, scalability of the clustering algorithms and interpretation of the results.
We address each of these issues individually, introducing new ideas to our method for constructing community models with the aid of clustering algorithms.

The first issue that we address is that of data engineering, which includes selecting the right representation for the training data and reducing the dimensionality of the problem. Regarding the representation of the data, we have seen in our previous work (Paliouras et al. 1999b) that representing access sessions by means of transitions between pages produces interesting navigation patterns for the community models. This representation is used again here, but we combine it with a simpler representation, where access sessions are represented as bags of pages.

Another issue, to which we pay substantial attention, is the characterization of the community models, i.e., the construction of meaningful communities that are useful for the system administrator. Ideally we would like to be able to construct a prototypical model for each community, which is representative of the participating users and significantly different from the users of other communities.

Figure 1. Non-normalised graph.

Such community descriptions can be used to:
• introduce structure or re-organise the existing structure of an information service,
• make suggestions at a personal level to the users within a specific community,
• support the expansion strategy for the service, etc.

Section 2 of the paper presents the algorithm that is used for the construction of communities and discusses the problem of constructing meaningful communities.
Section 3 presents the experimental setting and results and section 4 summarises the presented work, introducing our plans for the future.

Learning User Communities

Learning from navigation patterns

The most effective way to learn about the use of a Web-based information service and draw conclusions which may help to improve it is through the direct analysis of the users' interaction with it (i.e., user queries and/or navigation patterns). A number of interesting attempts at such analysis have been made lately, in the context of analysing usage data on the Web, using machine learning methods.

Machine learning has mainly been used for acquiring models of individual users interacting with an information system, e.g. (Bloedorn, Mani and MacMillan, 1996; Chiu, 1997; Raskutti and Beitz, 1996). In such situations, the use of the system by an individual is monitored and the collected data are used to construct the model of the user, i.e., his/her individual requirements.

We are concerned with a higher level of generalization of the users' interests: the construction of user communities. One approach to the construction of user communities is by generalizing the properties of user models. This approach requires the application of unsupervised learning techniques to the data, i.e., the users' characteristic features, which in our case are the visited Web pages.

Unsupervised learning tasks have been approached by a variety of methods, ranging from statistical clustering techniques to neural networks and symbolic machine learning. In this work, we have opted for the statistical learning methods. The statistical learning algorithm used here is a variant of the cluster mining algorithm used in PageGather (Perkowitz & Etzioni, 1998).
The cluster mining algorithm in PageGather is applied to Web access trails, i.e., it translates the access trails into a graph and searches for highly connected subgraphs.

Cluster Mining

The cluster mining algorithm that we use here discovers patterns of common behaviour by looking for all cliques in a graph that represents the users' characteristic features. We start by constructing a weighted graph G(V, E, W_V, W_E). The set of vertices V corresponds to the users' characteristic features. The set of edges E corresponds to the combinations of the users' characteristics as they are observed in their interaction with the system. For instance, if the Web site is a library on Chemistry, and the user visits pages concerning "Organic Chemistry" and "Polymers", we create an edge between the relevant vertices (Figure 1). The weights on the vertices W_V and the edges W_E are computed as the frequencies of the users' interests and their combinations respectively.

Edge frequencies are normalised by dividing them by the maximum of the frequencies of the two vertices that they connect. The effect of normalisation is to remove the bias for characteristics that appear very often in all users. For the previous example, the resulting normalised graph is given in Figure 2.

Figure 2. Normalised graph.

The connectivity of the resulting graph is usually very high. For this reason we make use of a connectivity threshold, aiming to reduce the number of edges in the graph. In our example in Figure 2, if the threshold equals 0.1 the edge ("Inorganic", "Biochemistry") is dropped. Despite the large theoretical complexity of the clique-finding problem, in practice the algorithm that has been implemented (Bron & Kerbosch, 1973) is fast (footnote 1). The efficiency of the algorithm allowed a full investigation of the effect of the connectivity threshold.

The algorithm that we use differs in two ways from the cluster mining algorithm used in the PageGather system: (a) PageGather does not normalise the weights W_E and (b) it restricts its search to cliques of size k and to connected components.

Cluster mining does not attempt to form independent user groups; instead, the generated clusters group together characteristic features of the users directly. If needed, a user can be associated with the clique(s) that best match the user's behaviour. Alternatively, each user can be associated probabilistically with each of the cliques. As mentioned above, the focus of our work is on the discovery of the behavioural patterns of user communities. For this reason, no attempt is made to match individual users with the cliques generated by the cluster mining algorithm.

Meaningful Communities

In contrast to the common clustering methods, such as COBWEB (Fisher, 1987) and Autoclass (Hanson, Stutz and Cheeseman, 1991), the clusters generated by cluster mining group together characteristic features of the users directly. Each clique discovered by the cluster mining algorithm is already a navigational pattern. This is an important advantage of this mining approach.

In order to examine the expressiveness of the communities produced by the cluster mining algorithm, we varied the connectivity threshold and measured the following two properties of the generated community descriptions:

Coverage: the proportion of features covered by the descriptions. Some of the features will not be covered, because their frequency will not have increased sufficiently. In order for this to happen in the proposed method, which generates all cliques in a graph, we need to ignore singleton cliques.

Overlap: the extent of overlap between the constructed descriptions.
This is measured simply as the ratio between the total number of features in the descriptions and the number of distinct features that are covered.

Case Study

Data Engineering

For this experiment, we used the access logs of the site “Information Retrieval in Chemistry” (http://macedonia.chem.demokritos.gr), which consists of a few thousand pages with a high hit rate. The log files consisted of 137,150 Web-server calls (log file entries) and covered the period between January and August 1999. Each log entry recorded a visitor's access date and time, the visitor's computer IP address and domain name, and the target page (URL).

Footnote 1: It generates all cliques (approx. 200) of a large graph (239 vertices), with an average clique size of 100 vertices, in about 5 mins on a common SparcServer.

In order to construct a training set for the clustering algorithm, the data in the log files passed through two stages of pre-processing:
1. Access sessions were extracted.
2. The paths recorded in the access sessions were translated into feature vectors.

Extracting access sessions from log files is a less deterministic process than one would initially imagine. This process involves the following stages:
1. Grouping the logs by date and IP address.
2. Selecting a time-frame within which two hits from the same IP address can be considered to belong to the same access session.
3. Grouping the pages accessed by the same IP address within the selected time-frame to form a session.

In order to select the appropriate time-frame, we generated the frequency distribution of the page transitions in minutes. According to this distribution, transitions from one page to another made with a time interval longer than one hour had very low frequency. Thus, a sensible definition of the access session is a sequence of page transitions for the same IP address, where each transition is done at a time interval smaller than one hour. Based on this definition, our log files consisted of 11,893 access sessions.
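The session definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `extract_sessions`, the `(ip, timestamp, url)` tuple layout and the sample entries are all hypothetical, and the sketch splits sessions purely on the one-hour gap per IP address, so it does not separately group by calendar date.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def extract_sessions(entries, max_gap=timedelta(hours=1)):
    """Group (ip, timestamp, url) log entries into access sessions.

    A new session starts whenever two consecutive hits from the same
    IP address are separated by max_gap or more (assumed layout).
    """
    by_ip = defaultdict(list)
    for ip, ts, url in entries:
        by_ip[ip].append((ts, url))

    sessions = []
    for ip, hits in by_ip.items():
        hits.sort(key=lambda h: h[0])          # chronological order per IP
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts < max_gap:
                current.append(url)            # same session continues
            else:
                sessions.append((ip, current)) # gap too long: close session
                current = [url]
        sessions.append((ip, current))
    return sessions


# Invented sample entries, for illustration only.
sample = [
    ("10.0.0.1", datetime(1999, 1, 5, 10, 0), "/organic.html"),
    ("10.0.0.1", datetime(1999, 1, 5, 10, 20), "/polymers.html"),
    ("10.0.0.1", datetime(1999, 1, 5, 12, 0), "/index.html"),
    ("10.0.0.2", datetime(1999, 1, 5, 10, 5), "/index.html"),
]
sessions = extract_sessions(sample)
```

In this toy input the third hit from the first address arrives more than an hour after the second, so it opens a new session. In practice the `max_gap` value would be chosen from the observed distribution of transition times, as described above.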
Concerning the translation of access sessions to feature vectors, we examined two alternative approaches. In the first approach each feature in the feature vector represented the absence or presence of a particular page of the Web site in the session. In the second approach, we used transitions between pages, rather than individual pages, as the basic path components.

There were 1,027 pages in the site that were visited at least once. Clearly the number of all possible transitions between these pages is prohibitively large. Even the number of different transitions that appear in the log files is very large. Thus, we needed a method to reduce the number of features in both experiments. This reduction was achieved by examining the frequency distributions of the pages and of the transitions from one page to another. The two distributions were highly skewed, i.e., there was a small number of very frequent pages and transitions. Thus, we decided on a cut-off frequency of 30 for pages and 20 for transitions, which were the points where the corresponding distributions became flat. Additionally we removed all transitions from a page to itself. As a result, 229 pages and 251 transitions survived this selection and were used to form the binary feature vectors.

We also tried a method that uses Mutual Information as a criterion for selecting features for unsupervised learning (Sahami, 1997). More than 90% of the features selected by this method were within the high-frequency range that we selected. However, some of the features that were eliminated were clearly important. For ...

Figure 5. Coverage of cliques for different values of the connectivity threshold. (Axes: connectivity threshold, coverage.)

For small values of the threshold, the graph is highly connected and contains many large cliques. It is only above the threshold value 0.2 that the number of cliques drops to manageable levels.
At that threshold level, almost all cliques are pairs of pages or transitions. The first unexpected observation in these data is the close proximity of the curves for the two different representations. One explanation for this can be given by considering the organization of the Web site. The site is organized roughly as a tree, which is very wide but shallow. This means that there are many pages at the same conceptual level, corresponding to different areas of chemistry. Given that the visitor can easily move from any of these pages to any other, the concept of navigational patterns between pages at the same conceptual level becomes very weak. As a result, frequent transition sequences become equivalent to frequent sets of pages, both corresponding to the characteristic interests of different communities. The second interesting observation is the sharp fall in the average size of the cliques for both representations. The result of this phenomenon is that the associations found between the pages are really co-occurrence patterns, rather than substantial groups of pages. This observation justifies the choice of a flat hierarchy for such a large Web site.

In addition to the number of cliques and their average size, we examined the coverage of the cliques and the overlap between them. Figures 5 and 6 present the results along those two dimensions. As expected, the overlap for small threshold values is very large, due to the large number of very large cliques. However, it falls sharply, following the sharp fall in the size of the cliques. The coverage of the cliques falls at a much slower pace. The result of this is that around the threshold value of 0.2, about half of the pages and the transitions appear in the cliques, while there is little overlap between the cliques.
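The interplay between the connectivity threshold, the cliques, and the two measures can be illustrated with a small sketch. Several choices here are assumptions made only for illustration: the edge weight (co-occurrence count divided by the larger of the two feature frequencies), the rule that an edge is kept when its weight is at least the threshold, the exclusion of single-feature cliques, and the reading of coverage as the share of features that appear in at least one clique. Overlap follows the ratio stated in the text, and the cliques are enumerated with the basic Bron-Kerbosch procedure (Bron and Kerbosch, 1973).

```python
from itertools import combinations


def build_graph(vectors, threshold):
    """Connect two features when their normalised co-occurrence
    reaches the threshold (the normalisation is an assumption)."""
    n = len(vectors[0])
    freq = [sum(v[i] for v in vectors) for i in range(n)]
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        both = sum(1 for v in vectors if v[i] and v[j])
        denom = max(freq[i], freq[j])
        if denom and both / denom >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj, min_size=2):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def coverage_and_overlap(cliques, n_features):
    """Coverage: covered features / all features.
    Overlap: total features in the cliques / distinct covered features."""
    covered = set().union(*cliques) if cliques else set()
    total = sum(len(c) for c in cliques)
    coverage = len(covered) / n_features
    overlap = total / len(covered) if covered else 0.0
    return coverage, overlap


# Toy binary feature vectors: five sessions over four features.
vectors = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
results = {}
for threshold in (0.2, 0.7):
    cliques = maximal_cliques(build_graph(vectors, threshold))
    coverage, overlap = coverage_and_overlap(cliques, len(vectors[0]))
    results[threshold] = (cliques, coverage, overlap)
    print(threshold, sorted(sorted(c) for c in cliques), coverage, overlap)
```

Even on this toy input the qualitative behaviour discussed above is visible: raising the threshold leaves fewer and smaller cliques, and the overlap falls to 1 (no shared features) while part of the feature set is still covered.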
In other words, the pairs of pages and transitions that are grouped together by the algorithm at that threshold level seem to be quite distinct and, as a result, they cover a large proportion of the original features. This observation suggests the selection of this threshold value for the formation of the communities, which then need to be examined further by the administrator of the site.

Figure 4. Average clique size for different values of the connectivity threshold. (Horizontal axis: connectivity threshold, 0 to 1.)

Conclusions

Efficient and effective access to on-line information becomes increasingly critical as the amount of information available on-line increases at an overwhelming pace. Web-based information systems constitute a vital component of this new style of information exchange, and user modelling technology can considerably facilitate access to them. This is a much more informative analysis than the simple Web usage statistics that are commonly used by system administrators. The results obtained by a Web usage analysis can be used to modify the structure of a Web site in reaction to the interests of the visitors and/or to make the site adaptive to different types of visitors.

We have examined two different representations of the access sessions: as simple bags of pages and as sets of page transitions. Higher-order representations, i.e., longer sequences of page transitions, may be interesting, but are likely to increase the dimensionality and radically reduce the density of the training data. This can have a seriously negative effect on the ability of the learning algorithms to generalize.

Finally, the work presented here could be extended in several ways. We are currently comparing the cluster mining method with clustering methods, including Autoclass (Hanson, Stutz and Cheeseman, 1991) and the neural clustering module in the IBM Intelligent Miner(TM) package. The same Web site is used as a testbed for this comparison.
Our longer-term plans focus on the next step to the approach presented here, i.e., the use of our results for the personalization of a Web site. The results presented here show that the discovery of behavioural patterns for user communities, with the use of cluster mining, is feasible even for large sites and can provide very valuable information to the Web-site administrator.

Figure 6. Overlap between cliques for different values of the connectivity threshold. (Axes: connectivity threshold, 0 to 1; overlap, 0 to 5.)

Acknowledgments

We would like to thank the coordinator of the team "Information Retrieval in Chemistry", Dr. E. Varveri, as well as the members of the team Mr. A. Varveris and Mr. P. Telonis, for the dataset used in our case study.

References

Bloedorn, E.; Mani, I.; and MacMillan, T.R. 1996. Machine Learning of User Profiles: Representational Issues. In Proceedings of the National Conference on Artificial Intelligence, 433-438. AAAI Press.
Bron, C.; and Kerbosch, J. 1973. Finding all cliques of an undirected graph. Communications of the ACM 16:575-577.
Chiu, P. 1997. Using C4.5 as an Induction Engine for Agent Modelling: An experiment of Optimisation. In Proceedings of the International Conference on User Modelling, Workshop on Machine Learning for User Modelling.
Fisher, D. 1987. Knowledge Acquisition via Incremental Conceptual Clustering. Machine Learning 2:139-172.
Hanson, R.; Stutz, J.; and Cheeseman, P. 1991. Bayesian Classification Theory, Technical Report FIA-90-12-7-01, NASA Ames Research Center, Artificial Intelligence Branch.
Orwant, J. 1995. Heterogeneous Learning in the Doppelgänger User Modelling System. User Modelling and User-Adapted Interaction 4:107-130.
Paliouras, G.; Papatheodorou, C.; Karkaletsis, V.; Spyropoulos, C.D.; and Malaveta, V. 1998. Learning User Communities for Improving the Services of Information Providers. In Proceedings of the Second European Conference on Digital Libraries, 367-384. Heraklion, Greece: Lecture Notes in Computer Science, n. 1513, Springer-Verlag.
Paliouras, G.; Karkaletsis, V.; Papatheodorou, C.; and Spyropoulos, C.D. 1999a. Exploiting Learning Techniques for the Acquisition of User Stereotypes and Communities. In Proceedings of the Seventh International Conference on User Modeling, 169-178. Banff, Canada: CISM Courses and Lectures, n. 407, Springer-Verlag.
Paliouras, G.; Papatheodorou, C.; Karkaletsis, V.; Spyropoulos, C.D.; and Tzitziras, P. 1999b. From Web Usage Statistics to Web Usage Analysis. In Proceedings of the IEEE Conference on Systems Man and Cybernetics, II-159-164. Tokyo, Japan: IEEE Press.
Perkowitz, M.; and Etzioni, O. 1998. Adaptive Web Sites: Automatically synthesizing Web pages. In Proceedings of the Fifteenth National Conference in Artificial Intelligence. Madison, Wisconsin: AAAI Press.
Perkowitz, M.; and Etzioni, O. 1999. Adaptive Web Sites: Conceptual Cluster Mining. In Proceedings of the Sixteenth International Joint Conference in Artificial Intelligence, 264-269. Stockholm, Sweden: International Joint Conferences on Artificial Intelligence, Inc.
Raskutti, B.; and Beitz, A. 1996. Acquiring User Preferences for Information Filtering in Interactive Multi-Media Services. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence, 47-58.
Sahami, M. 1998. Using Machine Learning to Improve Information Access, Ph.D. Thesis, Department of Computer Science, Stanford University.